David P. Woodruff, Mountain View US

David P. Woodruff, Mountain View, CA US

Patent application number	Description	Published
20110251976	COMPUTING CASCADED AGGREGATES IN A DATA STREAM - A method for efficiently approximating cascaded aggregates in a data stream in a single pass over a dataset, with entries presented to the methodology in an arbitrary order, includes receiving out-of-order data entries in the data stream, aggregating particular data entries into aggregated data sets from the data stream based on a first characteristic of the data entries, computing a normalized Euclidean norm around mean values of each of the aggregated data sets, calculating an average of all of the normalized Euclidean norms of each of the aggregated data sets, and calculating a value based on the first characteristic as a result of calculating the average of all of the normalized Euclidean norms.	10-13-2011
20110270835	COMPUTER INFORMATION RETRIEVAL USING LATENT SEMANTIC STRUCTURE VIA SKETCHES - A method, system and program product for computer information retrieval is disclosed. A matrix A is received. Random sign matrices S and R are generated. Matrix products of S^T A, AR, and S^T AR are computed. A Moore-Penrose pseudoinverse C of S^T AR is computed. A singular value decomposition is computed of the pseudoinverse C. Three matrices ARU, Sigma, and V^T S^T A are outputted as factorization in applications.	11-03-2011
20120215803	AGGREGATE CONTRIBUTION OF ICEBERG QUERIES - One or more embodiments determine a distance between at least two vectors of n coordinates. A set of heavy coordinates is identified from a set of n coordinates associated with at least two vectors. A set of light coordinates is identified from the set of n coordinates associated with the at least two vectors. A first estimation of a contribution is determined from the set of heavy coordinates to a rectilinear distance between the at least two vectors. A second estimation of a contribution is determined from the set of light coordinates to the rectilinear distance norm. The first estimation is combined with the second estimation.	08-23-2012
20120296935	AGGREGATE CONTRIBUTION OF ICEBERG QUERIES - One or more embodiments determine a distance between at least two vectors of n coordinates. A set of heavy coordinates is identified from a set of n coordinates associated with at least two vectors. A set of light coordinates is identified from the set of n coordinates associated with the at least two vectors. A first estimation of a contribution is determined from the set of heavy coordinates to a rectilinear distance between the at least two vectors. A second estimation of a contribution is determined from the set of light coordinates to the rectilinear distance norm. The first estimation is combined with the second estimation.	11-22-2012
20130073561	RANDOM SAMPLING FROM DISTRIBUTED STREAMS - Described herein are methods, systems, apparatuses and products for random sampling from distributed streams. An aspect provides a method for distributed sampling on a network with a plurality of sites and a coordinator, including: receiving at the coordinator a data element from a site of the plurality of sites, said data element having a weight randomly associated therewith deemed reportable by comparison at the site to a locally stored global value; comparing the weight of the data element received with a global value stored at the coordinator; and performing one of: updating the global value stored at the coordinator to the weight of the data element received; and communicating the global value stored at the coordinator back to the site of the plurality of sites. Other embodiments are disclosed.	03-21-2013
20130103711	COMPUTING CORRELATED AGGREGATES OVER A DATA STREAM - Described herein are approaches for computing correlated aggregates. An aspect provides for receiving a stream of data elements at a device, each data element having at least one numerical attribute; maintaining in memory a plurality of tree structures comprising a plurality of separate nodes for summarizing numerical attributes of the data elements with respect to a predicate value of a correlated aggregation query, said maintaining comprising: creating the plurality of tree structures in which each node implements one of: a probabilistic counter and a sketch, wherein said probabilistic counter and said sketch each act to estimate aggregated data element numerical attributes to form a summary of said numerical attributes; and responsive to a correlated aggregation query specifying said predicate value, using said plurality of tree structures as a summary of said data element numerical attributes to compute a response to said correlated aggregate query.	04-25-2013
20130103713	COMPUTING CORRELATED AGGREGATES OVER A DATA STREAM - Described herein are approaches for computing correlated aggregates. An aspect provides for receiving a stream of data elements at a device, each data element having at least one numerical attribute; maintaining in memory a plurality of tree structures comprising a plurality of separate nodes for summarizing numerical attributes of the data elements with respect to a predicate value of a correlated aggregation query, said maintaining comprising: creating the plurality of tree structures in which each node implements one of: a probabilistic counter and a sketch, wherein said probabilistic counter and said sketch each act to estimate aggregated data element numerical attributes to form a summary of said numerical attributes; and responsive to a correlated aggregation query specifying said predicate value, using said plurality of tree structures as a summary of said data element numerical attributes to compute a response to said correlated aggregate query.	04-25-2013
20140258253	SUMMARIZING A STREAM OF MULTIDIMENSIONAL, AXIS-ALIGNED RECTANGLES - A method for estimating aggregates over a stream of axis-aligned rectangles, includes: decomposing the stream along one-dimensional intervals, wherein vertices for the rectangle are located in a predetermined grid; assigning each grid row to buckets, wherein the one-dimensional intervals are placed into buckets according to the corresponding rows in which the one-dimensional intervals are positioned; and estimating a sum of a number of grid points touched by at least one of the rectangles in each row of the grid to approximate a volume of the axis-aligned rectangles by: using pairwise-independent hash functions in a multi-dimensional algorithm to determine buckets that include a first interval corresponding to a given rectangle, wherein the interval has hash function results that meet a predetermined threshold; and inserting a second interval for the rectangle corresponding to the first interval into a one-dimensional algorithm for the corresponding bucket meeting the predetermined threshold.	09-11-2014
20140258332	FAST DISTRIBUTED DATABASE FREQUENCY SUMMARIZATION - A mechanism is provided for computing the frequency of packets in network devices. Respective packets are associated with entities in a vector, where each of the entities is mapped to corresponding ones of the respective packets, and the entities correspond to computers. Upon a network device receiving the respective packets, a count is individually increased for the respective packets in the vector respectively mapped to the entities, and computing a matrix vector product of a matrix A and the vector. The matrix A is a product of at least a first matrix and a second matrix. The first matrix includes rows and columns where each of the rows has a single random location with a one value and remaining locations with zero values. The matrix vector product is transmitted to a centralized computer for aggregating with other matrix vector products.	09-11-2014
20140258333	FAST DISTRIBUTED DATABASE FREQUENCY SUMMARIZATION - A mechanism is provided for computing the frequency of packets in network devices. Respective packets are associated with entities in a vector, where each of the entities is mapped to corresponding ones of the respective packets, and the entities correspond to computers. Upon a network device receiving the respective packets, a count is individually increased for the respective packets in the vector respectively mapped to the entities, and computing a matrix vector product of a matrix A and the vector. The matrix A is a product of at least a first matrix and a second matrix. The first matrix includes rows and columns where each of the rows has a single random location with a one value and remaining locations with zero values. The matrix vector product is transmitted to a centralized computer for aggregating with other matrix vector products.	09-11-2014
20140280426	INFORMATION RETRIEVAL USING SPARSE MATRIX SKETCHING - Embodiments of the invention include a method of approximating a matrix of data using sparse matrices which includes receiving a first matrix and generating a second matrix based on the first matrix and a first sparse matrix. The method further includes generating a third matrix based on the first matrix and a second sparse matrix and generating a fourth matrix by generating a Moore-Penrose pseudo-inverse matrix based on the first matrix, the second matrix and the third matrix. The method also includes generating a fifth matrix based on a product of the second matrix, the third matrix, and a fourth matrix. The method further includes receiving, by a computer, a request to access at least one entry of the first matrix and responding to the request by accessing an entry of the fifth matrix.	09-18-2014
20140280428	INFORMATION RETRIEVAL USING SPARSE MATRIX SKETCHING - A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, RA, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, RA, has dimensions n×t. The processor generates a third matrix, AS	09-18-2014
20140351007	ESTIMATING THE TOTAL SALES OVER STREAMING BIDS - A mechanism is provided for computing an estimation of maximum total sales over streaming items. Each item having an associated value is designated as an item value pair. Value ranges are established to place the item value pairs. The value ranges are distinct. Each of the item value pairs is added into the value ranges according to each of the associated values for the item value pairs. Repeated item value pairs are removed that are in the same value ranges. A number of the item value pairs is reduced in each of the value ranges respectively based on an error factor, by randomly selecting the item value pairs to remove from each of the value ranges. An estimate of a total maximum value of the bids for the item value pairs in all of the value ranges is computed based on a scale factor.	11-27-2014
20140351020	ESTIMATING THE TOTAL SALES OVER STREAMING BIDS - A mechanism is provided for computing an estimation of maximum total sales over streaming items. Each item having an associated value is designated as an item value pair. Value ranges are established to place the item value pairs. The value ranges are distinct. Each of the item value pairs is added into the value ranges according to each of the associated values for the item value pairs. Repeated item value pairs are removed that are in the same value ranges. A number of the item value pairs is reduced in each of the value ranges respectively based on an error factor, by randomly selecting the item value pairs to remove from each of the value ranges. An estimate of a total maximum value of the bids for the item value pairs in all of the value ranges is computed based on a scale factor.	11-27-2014
20150052172	IDENTIFYING A SKETCHING MATRIX USED BY A LINEAR SKETCH - Embodiments relate to identifying a sketching matrix used by a linear sketch. Aspects include receiving an initial output of the linear sketch, generating a query vector and inputting the query vector into the linear sketch. Aspects further include receiving a revised output of the linear sketch based on inputting the query vector and iteratively repeating the steps of generating the query vector, inputting the query vector into the linear sketch, and receiving a revised output of the linear sketch based on inputting the query vector until the sketching matrix used by the linear sketch can be identified.	02-19-2015
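
The coordinator/site exchange summarized in the abstract of application 20130073561 (RANDOM SAMPLING FROM DISTRIBUTED STREAMS) can be illustrated with a small simulation. This is only a rough sketch under stated assumptions, not the claimed method: it keeps a single sample, draws weights uniformly from [0, 1), and treats the smallest weight as the winner. The names Coordinator, Site, and run_demo are hypothetical and do not come from the application.

```python
import random


class Coordinator:
    """Holds the global value (smallest weight seen) and the sampled element."""

    def __init__(self):
        self.global_value = 1.0  # assumption: weights are drawn from [0, 1)
        self.sample = None

    def receive(self, element, weight):
        """Handle a report from a site and return the coordinator's current value."""
        if weight < self.global_value:
            # The reported weight beats the stored value: update it.
            self.global_value = weight
            self.sample = element
        # Otherwise the site's copy was stale; the returned value refreshes it.
        return self.global_value


class Site:
    """Observes a local stream and reports only elements that look reportable."""

    def __init__(self, coordinator, rng):
        self.coordinator = coordinator
        self.rng = rng
        self.local_value = 1.0  # locally stored (possibly stale) global value
        self.messages_sent = 0

    def observe(self, element):
        weight = self.rng.random()  # weight randomly associated with the element
        if weight < self.local_value:
            self.messages_sent += 1
            self.local_value = self.coordinator.receive(element, weight)


def run_demo(num_sites=4, num_elements=1000, seed=0):
    """Feed elements 0..num_elements-1 to randomly chosen sites."""
    rng = random.Random(seed)
    coordinator = Coordinator()
    sites = [Site(coordinator, rng) for _ in range(num_sites)]
    for element in range(num_elements):
        rng.choice(sites).observe(element)
    return coordinator, sites


if __name__ == "__main__":
    coordinator, sites = run_demo()
    print("sampled element:", coordinator.sample)
    print("messages sent:", sum(site.messages_sent for site in sites))
```

Because every element receives an independent uniform weight, the element holding the smallest weight is a uniform random sample of everything seen so far. In this sketch a site contacts the coordinator only when its local copy of the global value suggests the element might win, which is what keeps the message count below the stream length.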

Patent applications by David P. Woodruff, Mountain View, CA US