Patent application number | Description | Published |
20100174686 | Generating Equivalence Classes and Rules for Associating Content with Document Identifiers - A system for reducing the possibility of crawling duplicate document identifiers partitions a plurality of document identifiers into multiple clusters, each cluster having a cluster name and a set of document parameters. The system generates an equivalence rule for each cluster of document identifiers, the rule specifying which document parameters associated with the cluster are content-relevant. Next, the system groups each cluster of document identifiers into one or more equivalence classes in accordance with its associated equivalence rule, each equivalence class including one or more document identifiers that correspond to a document content and having a representative document identifier identifying the document content. | 07-08-2010 |
20110022605 | DOCUMENT SCORING BASED ON LINK-BASED CRITERIA - A method may include receiving a document and an initial score for the document; determining that there has been a decrease in a rate or quantity of new links that point to the document over time; classifying the document as stale in response to the determining; decreasing the initial score for the document, resulting in an updated score; and ranking the document with regard to at least one other document based, at least in part, on the score. | 01-27-2011 |
20110035372 | Search Engine Cache Control - A search query containing one or more terms is received from a client system. In response to receiving the search query, one or more snippets obtained in response to a prior execution of said search query are requested from a cache. For a respective snippet received from the cache, it is determined whether the respective snippet is a current version. In response to a determination that the respective snippet is not the current version, the current version of the respective snippet is obtained from a corresponding document in which one or more terms from said search query are located and the snippet stored in the cache is updated using the obtained current version. Search query results including the respective snippet are transmitted to the client. | 02-10-2011 |
20110258185 | DOCUMENT SCORING BASED ON DOCUMENT CONTENT UPDATE - A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score. | 10-20-2011 |
20110264671 | DOCUMENT SCORING BASED ON DOCUMENT CONTENT UPDATE - A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score. | 10-27-2011 |
20120005199 | DOCUMENT SCORING BASED ON DOCUMENT CONTENT UPDATE - A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score. | 01-05-2012 |
20120016871 | DOCUMENT SCORING BASED ON QUERY ANALYSIS - A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results, and rank the document with regard to at least one other document based, at least in part, on the score. | 01-19-2012 |
20120066576 | Anchor Tag Indexing in a Web Crawler System - Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents, is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers. | 03-15-2012 |
20120173552 | Assigning Document Identification Tags - Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents. | 07-05-2012 |
20140222776 | Document Reuse in a Search Engine Crawler - Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion. | 08-07-2014 |
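
The reuse-flag idea in the last entry (20140222776) can be illustrated with a small sketch. This is not the method claimed in the application; it is a minimal illustration under stated assumptions. The `Document` class, the `threshold` value, the rule that low-scoring documents with a cached copy are the ones reused, and the `fetch` callback are all hypothetical choices made for this example. The abstract says only that the flag depends on a query-independent score and that a cached version is reused when the flag meets a predefined criterion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Document:
    url: str
    score: float                       # query-independent score (hypothetical 0..1 scale)
    cached_body: Optional[str] = None  # previously downloaded version, if any
    reuse: bool = False                # reuse flag, set before crawling


def set_reuse_flags(docs: List[Document], threshold: float = 0.5) -> None:
    """Set each document's reuse flag from its query-independent score.

    Assumption for this sketch: a document is reused only if a cached
    copy exists and its score is below the threshold.
    """
    for doc in docs:
        doc.reuse = doc.cached_body is not None and doc.score < threshold


def crawl(docs: List[Document], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Return a body per URL, downloading only documents not flagged for reuse."""
    pages: Dict[str, str] = {}
    for doc in docs:
        if not doc.reuse:
            # Flag not set: download the current version and refresh the cache.
            doc.cached_body = fetch(doc.url)
        # Here cached_body is either the reused copy or the fresh download.
        pages[doc.url] = doc.cached_body
    return pages
```

In this sketch `fetch` stands in for whatever downloads a document from its host, so the reuse decision can be exercised without network access.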