Patent application number | Description | Published |
20080235163 | SYSTEM AND METHOD FOR ONLINE DUPLICATE DETECTION AND ELIMINATION IN A WEB CRAWLER - As part of the normal crawling process, a crawler parses a page and computes a de-tagged hash, called a fingerprint, of the page content. A lookup structure consisting of the host hash (hash of the host portion of the URL) and the fingerprint of the page is maintained. Before the crawler writes a page to a store, this lookup structure is consulted. If the lookup structure already contains the tuple (i.e., host hash and fingerprint), then the page is not written to the store. Thus, a lot of duplicates are eliminated at the crawler itself, saving CPU and disk cycles which would otherwise be needed during current duplicate elimination processes. | 09-25-2008 |
20100114895 | System and Method for Administering Data Ingesters Using Taxonomy Based Filtering Rules - A method, system, and article are provided for management of a data ingester and associated content collected by the data ingester. The computer system is configured with a taxonomy together with rules and policies for ingesting and classifying the collected data. Based upon the classification of the collected data with respect to the taxonomy, the data is assigned to a location in the taxonomy. | 05-06-2010 |
20110078123 | MANAGING DATA CONSISTENCY BETWEEN LOOSELY COUPLED COMPONENTS IN A DISTRIBUTED COMPUTING SYSTEM - Embodiments of the present invention provide a method, system and computer program product for maintaining distributed state consistency in a distributed computing application. In an embodiment of the invention, a method for maintaining distributed state consistency in a distributed computing application can include registering a set of components of a distributed computing application, starting a transaction resulting in changes of state in different ones of the components in the registered set and determining in response to a conclusion of the transaction whether or not an inconsistency of state has arisen amongst the different components in the registered set in consequence of the changes of state in the different ones of the components in the registered set. If an inconsistency has arisen, each of the components in the registered set can be directed to rollback to a previously stored state. Otherwise a committal of state can be directed in each of the components in the registered set. | 03-31-2011 |
20120078719 | SYSTEMS AND METHODS FOR CLUSTER AUGMENTATION OF SEARCH RESULTS - Systems and associated methods for clustering a plurality of nodes based on connectivity among the plurality of nodes, determining relevant content of the clusters, and applying knowledge regarding the relevant content are described. The nodes can include for example web-based documents such as web pages. The clusters can include for example groups of web pages that are linked together, as via hyperlinks. The relevant content can include one or more topics associated with the web page, as for example determined via text mining. Applying the knowledge regarding the relevant content can include for example using the one or more topics associated with the web pages to augment search results and/or conduct contextual advertising. | 03-29-2012 |
20130018891 | REAL-TIME SEARCH OF VERTICALLY PARTITIONED, INVERTED INDEXES - Provided are techniques for processing a query. A query including constraints for at least two vertically partitioned, inverted indexes is received. The constraints in the query are separated based on the vertically partitioned, inverted indexes. A document identifier iterator is obtained for each of the constraints, wherein each document identifier iterator is associated with a posting list, and wherein each posting list is ordered by document identifier order. A run-time join of the posting lists is performed to obtain a final result set. | 01-17-2013 |
20130018916 | REAL-TIME SEARCH OF VERTICALLY PARTITIONED, INVERTED INDEXES - Provided are techniques for processing a query. A query including constraints for at least two vertically partitioned, inverted indexes is received. The constraints in the query are separated based on the vertically partitioned, inverted indexes. A document identifier iterator is obtained for each of the constraints, wherein each document identifier iterator is associated with a posting list, and wherein each posting list is ordered by document identifier order. A run-time join of the posting lists is performed to obtain a final result set. | 01-17-2013 |
20140195554 | SYSTEM AND METHOD FOR CASE ACTIVITY MONITORING AND CASE DATA RECOVERY USING AUDIT LOGS IN E-DISCOVERY - A method, apparatus and article of manufacture for analyzing data recorded in an audit log generated as part of an electronic discovery (e-Discovery) process in litigation is disclosed. In at least one embodiment of the present invention, a computer implemented method of analyzing data recorded in an audit log generated as part of an electronic discovery (e-Discovery) process in litigation is provided. The method comprises retrieving, on one or more computers, an audit log from a storage system accessible from the computer, the audit log comprising data regarding a chronological sequence of actions taken to produce case documents relevant in litigation. The data in the audit log is analyzed and a comprehensive overview of the electronic discovery process is compiled based on the analyzed data for presentation to a user. | 07-10-2014 |
20150100549 | EXTENDING A CONTENT REPOSITORY USING AN AUXILIARY DATA STORE - According to one embodiment of the present invention, a system extends a content repository by creating an auxiliary data store outside of the content repository and storing auxiliary data in the auxiliary data store, wherein the auxiliary data is associated with a collection of documents in the content repository. The system stores version information for the auxiliary data store and records of operations against the auxiliary data store in a log in the repository. In response to receiving a request for an operation against the auxiliary data store, the system determines that the auxiliary data store and repository are consistent based on the version information and applies the operation against the auxiliary data store. Embodiments of the present invention further include a method and computer program product for extending a content repository data model in substantially the same manners described above. | 04-09-2015 |
20150100550 | EXTENDING A CONTENT REPOSITORY USING AN AUXILIARY DATA STORE - According to one embodiment of the present invention, a system extends a content repository by creating an auxiliary data store outside of the content repository and storing auxiliary data in the auxiliary data store, wherein the auxiliary data is associated with a collection of documents in the content repository. The system stores version information for the auxiliary data store and records of operations against the auxiliary data store in a log in the repository. In response to receiving a request for an operation against the auxiliary data store, the system determines that the auxiliary data store and repository are consistent based on the version information and applies the operation against the auxiliary data store. Embodiments of the present invention further include a method and computer program product for extending a content repository data model in substantially the same manners described above. | 04-09-2015 |
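
The duplicate check summarized in application 20080235163 above (a lookup structure keyed on host hash and page fingerprint, consulted before a page is written to the store) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `fingerprint`, `host_hash`, and `DedupStore`, the choice of SHA-1, the regex-based tag stripping, and the in-memory set are all assumptions made for illustration; the abstract does not specify any of them.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Assumption: a crude regex stands in for real HTML de-tagging.
TAG_RE = re.compile(r"<[^>]+>")


def fingerprint(html: str) -> str:
    """Hash of the page content with tags removed and whitespace collapsed."""
    text = TAG_RE.sub(" ", html)
    text = " ".join(text.split())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def host_hash(url: str) -> str:
    """Hash of the host portion of the URL."""
    host = urlsplit(url).hostname or ""
    return hashlib.sha1(host.encode("utf-8")).hexdigest()


class DedupStore:
    """Hypothetical page store that consults a (host hash, fingerprint) lookup."""

    def __init__(self) -> None:
        self.seen = set()   # lookup structure of (host hash, fingerprint) tuples
        self.pages = {}     # stand-in for the real page store

    def write(self, url: str, html: str) -> bool:
        """Store the page unless the same content was already seen on this host."""
        key = (host_hash(url), fingerprint(html))
        if key in self.seen:
            return False    # duplicate: skip the write
        self.seen.add(key)
        self.pages[url] = html
        return True


store = DedupStore()
store.write("http://example.com/a", "<p>Hello world</p>")
store.write("http://example.com/b", "<div>Hello   world</div>")  # same text, same host
store.write("http://other.org/a", "<p>Hello world</p>")          # same text, other host
```

In this sketch the second write is skipped because its de-tagged text matches the first page on the same host, while the third is kept because the host differs. A production crawler would presumably need a persistent or distributed lookup structure rather than an in-memory set.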