Patent application number | Description | Published |
20080306903 | CARDINALITY ESTIMATION IN DATABASE SYSTEMS USING SAMPLE VIEWS - A system and method that facilitates and effectuates estimating the result of performing a data analysis operation on a set of data. Employing an approximation of the data analysis operation on a statistically valid random sample view of the data allows for a statistically accurate estimate of the result to be obtained. Sequential sampling in the view enables the approximated operation to evaluate accuracy conditions at intervals during the scan of the sample view and obtain the estimated result without having to scan the entire sample view. Feedback regarding the accuracy of the estimated result can be captured when the data analysis operation is performed against the set of data. Process control techniques can be employed with the feedback to maintain the statistical validity of the sample view. | 12-11-2008 |
20090064160 | Transparent lazy maintenance of indexes and materialized views - Described herein is a materialized view or index maintenance system that includes a task generator component that receives an indication that an update transaction has committed against a base table in a database system. The task generator component, in response to the update transaction being received, generates a maintenance task for one or more of a materialized view or an index that is affected by the update transaction. A maintenance component transparently performs the maintenance task when a workload of a CPU in the database system is below a threshold or when an indication is received that a query that uses the one or more of the materialized view or the index has been received. | 03-05-2009 |
20090327255 | VIEW MATCHING OF MATERIALIZED XML VIEWS - A materialized XML view matching system and method for processing of SQLXML queries using view matching of materialized XML views. The view matching process of the embodiments of the system and method uses a multi-path tree (MPT) data structure. Embodiments of the materialized XML view matching system and method construct an MPT data structure for each input query and view expression. View matching is performed on the MPT data structures to generate a set of partial matches, which then are cleaned to generate a set of candidate matches. A valid match definition is generated by testing each candidate match for different forms of compliance. Using the valid match definition, a set of valid matches is identified and extracted. For each valid match, a substitute query expression is constructed that can serve as a replacement for the original query. These substitute queries can be used to evaluate the original query. | 12-31-2009 |
20100175049 | SCOPE: A STRUCTURED COMPUTATIONS OPTIMIZED FOR PARALLEL EXECUTION SCRIPT LANGUAGE - Embodiments of the present invention relate to systems, methods and computer storage media for providing Structured Computations Optimized for Parallel Execution (SCOPE) that facilitate analysis of large-scale data sets utilizing row data of those data sets. SCOPE includes, among other features, an extract command for extracting data bytes from a data stream and structuring the data bytes as data rows having strictly defined columns. SCOPE also includes a process command and a reduce command that identify data rows as inputs. The reduce command also identifies a reduce key that facilitates the reduction based on the reduce key. SCOPE additionally includes a combine command that identifies two data row sets that are to be combined based on an identified join condition. Additionally, SCOPE includes a select command that leverages SQL and C# languages to create an expressive script that is capable of analyzing large-scale data sets in a parallel computing environment. | 07-08-2010 |
20100281005 | Asynchronous Database Index Maintenance - This disclosure provides techniques for asynchronously maintaining database indexes or sub-indexes. For example, a database management server may receive a data manipulation statement to modify particular data stored in a database and determine whether an index associated with executing the statement is maintained asynchronously. When the index is maintained asynchronously, maintenance of the index to reflect changes made to the particular data by executing the data manipulation statement may be delayed until an index maintenance event. The index maintenance may be based on an isolation level of a transaction including a query that triggered the index maintenance. | 11-04-2010 |
20110153593 | EXPLOITING PARTITIONING, GROUPING, AND SORTING IN QUERY OPTIMIZATION - An optimizer uses comprehensive reasoning regarding partitioning, sorting, and grouping properties for query optimization. When optimizing an input query expression, logical exploration generates alternative logical expressions. Physical optimization explores physical operator alternatives for logical operators. Required partitioning, sorting, and grouping properties of inputs to physical operators are determined. Additionally, delivered partitioning, sorting, and grouping properties of outputs from physical operators are determined. In some embodiments, enforcer rules are employed to modify structural property requirements to introduce alternatives for consideration. Property matching identifies valid execution plans in which the delivered partitioning, sorting, and grouping properties satisfy corresponding required partitioning, sorting, and grouping properties. An execution plan having the lowest cost is selected as the optimized execution plan. | 06-23-2011 |
20120096001 | AFFINITIZING DATASETS BASED ON EFFICIENT QUERY PROCESSING - Embodiments of the present invention relate to systems, methods, and computer-storage media for affinitizing datasets based on efficient query processing. In one embodiment, a plurality of datasets within a data stream is received. The data stream is partitioned based on efficient query processing. Once the data stream is partitioned, an affinity identifier is assigned to datasets based on the partitioning of the dataset. Further, when datasets are broken into extents, the affinity identifier of the parent dataset is retained in the resulting extent. The affinity identifier of each extent is then referenced to preferentially store extents having common affinity identifiers within close proximity of one another across a data center. | 04-19-2012 |
20120284719 | DISTRIBUTED MULTI-PHASE BATCH JOB PROCESSING - A distributed job-processing environment including a server, or servers, capable of receiving and processing user-submitted job queries for data sets on backend storage servers. The server identifies computational tasks to be completed on the job as well as a time frame to complete some of the computational tasks. Computational tasks may include, without limitation, preprocessing, parsing, importing, verifying dependencies, retrieving relevant metadata, checking syntax and semantics, optimizing, compiling, and running. The server performs the computational tasks, and once the time frame expires, a message is transmitted to the user indicating which tasks have been completed. The rest of the computational tasks are subsequently performed, and eventually, job results are transmitted to the user. | 11-08-2012 |
20130332446 | EFFICIENT PARTITIONING TECHNIQUES FOR MASSIVELY DISTRIBUTED COMPUTATION - A repartitioning optimizer identifies alternative repartitioning strategies and selects optimal ones, accounting for network transfer utilization and partition sizes in addition to traditional metrics. If prior partitioning was hash-based, the repartitioning optimizer can determine whether a hash-based repartitioning can result in not every computing device providing data to every other computing device. If prior partitioning was range-based, the repartitioning optimizer can determine whether a range-based repartitioning can generate similarly sized output partitions while aligning input and output partition boundaries, increasing the number of computing devices that do not provide data to every other computing device. Individual computing devices, as they are performing a repartitioning, assign a repartitioning index to each individual data element, which represents the computing device to which such a data element is destined. The indexed data is sorted by such repartitioning indices, thereby grouping together all like data, and then stored in a sequential manner. | 12-12-2013 |
20130346988 | PARALLEL DATA COMPUTING OPTIMIZATION - Statistics collected during the parallel distributed execution of the tasks of a job may be used to optimize the performance of the task or similar recurring tasks. An execution plan for a job is initially generated, in which the execution plan includes tasks. Statistics regarding operations performed in the tasks are collected while the tasks are executed via parallel distributed execution. Another execution plan is then generated for another recurring job, in which the additional execution plan has at least one task in common with the execution plan for the job. The additional execution plan is subsequently optimized based at least on the statistics to produce an optimized execution plan. | 12-26-2013 |
20140297680 | ANALYZING MULTIPLE DATA STREAMS AS A SINGLE DATA OBJECT - Embodiments of the present invention allow multiple data streams to be analyzed as a single data set. The single data set may be described as a stream set herein. The multiple streams that are included in the stream set may be specified through a user script or query. For example, a query may be used to gather all streams created within a date range. The query could include one or more filters to gather certain information from the data streams or to exclude certain data streams that otherwise are in the query's range. A stream may be an unstructured byte stream of data. The stream may be created by append-only writing to the end of the stream. The stream could also be a structured stream that includes metadata that defines column structure and affinity/clustering information. | 10-02-2014 |
20150058316 | Continuous Cloud-Scale Query Optimization and Processing - Runtime statistics from the actual performance of operations on a set of data are collected and utilized to dynamically modify the execution plan for processing a set of data. The operations performed are modified to include statistics collection operations, the statistics being tailored to the specific operations being quantified. Optimization policy defines how often optimization is attempted and how much more efficient an execution plan should be to justify transitioning from the current one. Optimization is based on the collected runtime statistics but also takes into account already materialized intermediate data to gain further optimization by avoiding reprocessing. | 02-26-2015 |
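
The last entry above (20150058316) describes an optimization policy that governs how often re-optimization is attempted and how much cheaper a new plan must be to justify switching, while giving credit for intermediate data that is already materialized. The following is only a minimal sketch of that idea, not the patented method: the class `OptimizationPolicy`, its fields `reoptimize_every` and `min_improvement`, the helper `remaining_cost`, and the per-stage cost dictionaries are all hypothetical names and structures introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class OptimizationPolicy:
    # Hypothetical knobs; the abstract does not name or quantify them.
    reoptimize_every: int = 2     # attempt re-optimization after every N completed stages
    min_improvement: float = 0.2  # fractional cost reduction required to switch plans

    def should_attempt(self, completed_stages: int) -> bool:
        """Decide whether re-optimization should be attempted at this point."""
        return completed_stages > 0 and completed_stages % self.reoptimize_every == 0

    def should_switch(self, current_cost: float, candidate_cost: float) -> bool:
        """Switch only if the candidate plan is sufficiently cheaper."""
        if current_cost <= 0:
            return False
        return (current_cost - candidate_cost) / current_cost >= self.min_improvement


def remaining_cost(stage_costs: Dict[str, float], materialized: Set[str]) -> float:
    """Sum the cost of stages still to run; materialized outputs are not recomputed."""
    return sum(cost for stage, cost in stage_costs.items() if stage not in materialized)


# Assumed example: estimated per-stage costs, refreshed from runtime statistics.
policy = OptimizationPolicy()
current_plan = {"extract": 10.0, "join": 50.0, "aggregate": 20.0}
candidate_plan = {"extract": 10.0, "broadcast_join": 30.0, "aggregate": 20.0}
materialized = {"extract"}  # output of this stage already exists

current = remaining_cost(current_plan, materialized)
candidate = remaining_cost(candidate_plan, materialized)
switch = policy.should_attempt(completed_stages=2) and policy.should_switch(current, candidate)
```

In this sketch the comparison uses only the cost of work that remains, so a candidate plan that can reuse materialized intermediate results is not penalized for stages that never need to run again.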