Patent application number | Description | Published |
20090089244 | METHOD OF DETECTING SPAM HOSTS BASED ON CLUSTERING THE HOST GRAPH - Systems and methods for identifying spam hosts are disclosed in which hosts are known to the system and initially classified as spam or non-spam. Then the hosts are partitioned into clusters based on how each host is linked to other hosts. Each cluster is then analyzed and, depending on the number of spam and non-spam hosts it contains, the cluster may be classified as a spam cluster or a non-spam cluster. The hosts within the cluster may then be reclassified based on the cluster's classification. The results may then be used in many different ways including to filter search results based on host classifications so that spam hosts are not displayed or displayed last in a results set. | 04-02-2009 |
20090089285 | METHOD OF DETECTING SPAM HOSTS BASED ON PROPAGATING PREDICTION LABELS - Systems and methods for identifying spam hosts are disclosed in which hosts are known to the system and initially classified as spam or non-spam by a baseline classifier. The accuracy of the initial host classifications is then improved by propagating them using a random walk algorithm. The random walk used may be modified in order to obtain a weighted or skewed characterization of the host. The hosts may then be reclassified based on the characterization obtained from the random walk to obtain a final spam/non-spam classification. The final classification may then be used in many different ways including to filter search results based on host classifications so that spam hosts are not displayed or displayed last in a results set. | 04-02-2009 |
20090089373 | SYSTEM AND METHOD FOR IDENTIFYING SPAM HOSTS USING STACKED GRAPHICAL LEARNING - Systems and methods for identifying spam hosts are disclosed in which hosts are known to the system and initially classified as spam or non-spam by a baseline classifier. Then for each node u in the host graph a new feature is computed. This feature is an aggregate function of the initial classifications produced by the baseline classifier for the neighbors of the node u. The set of neighbors can be defined in many different ways: in-link neighbors, out-link neighbors, bi-directional neighbors, k-hops neighbors, etc. The new feature computed above is then added to the existing set of features, and the baseline classifier is trained again, producing new predictions for each node. The results may then be used in many different ways including to filter search results based on host classifications so that spam hosts are not displayed or displayed last in a results set. | 04-02-2009 |
20090271388 | ANNOTATIONS OF THIRD PARTY CONTENT - The subject matter disclosed herein relates to creating a search query based on content and subject of a web page, for example. In one particular example, such a search query may be established by a selection of one or more keywords in a web page. Consequently, the search query may be affected by a determination of content and/or a subject of the web page. | 10-29-2009 |
20100036784 | SYSTEMS AND METHODS FOR FINDING HIGH QUALITY CONTENT IN SOCIAL MEDIA - The present invention is directed towards systems and methods for identifying high quality content in a social media environment. The method according to one embodiment of the present invention comprises retrieving a content item and retrieving a plurality of quality features associated with said content item wherein said quality features comprise intrinsic, usage and relationship features. The method then performs an analysis of said content item against said quality features and generates a quality score based on said analysis. | 02-11-2010 |
20100082694 | QUERY LOG MINING FOR DETECTING SPAM-ATTRACTING QUERIES - Disclosed are methods and apparatus for detecting spam-attracting queries. In one embodiment, one or more graphs are generated using data obtained from a query log, where the one or more graphs include at least one of an anticlick graph or a view graph. Values of one or more syntactic features of the one or more graphs are ascertained. Values of one or more semantic features of the one or more graphs are determined by propagating categories from a web directory among nodes in each of the one or more graphs. Spam-attracting queries are then detected based upon the values of the syntactic features and the semantic features. | 04-01-2010 |
20100082752 | QUERY LOG MINING FOR DETECTING SPAM HOSTS - Disclosed are methods and apparatus for detecting spam hosts. In one embodiment, one or more graphs are generated using data obtained from a query log, where the one or more graphs include at least one of an anticlick graph or a view graph. Values of one or more syntactic features of the one or more graphs are ascertained. Values of one or more semantic features of the one or more graphs are determined by propagating categories from a web directory among nodes in each of the one or more graphs. Spam hosts are then detected based upon the values of the syntactic features and the semantic features. | 04-01-2010 |
20100106719 | CONTEXT-SENSITIVE SEARCH - A method for performing a search based on a query term and a context document is described herein. The method involves receiving a search request comprising a query term and a context document, and identifying a target document of a plurality of documents based on a relationship of the context document with the target document and the query term, where the relationship of the context document with the target document is determined prior to receiving the search request. | 04-29-2010 |
20100114928 | DIVERSE QUERY RECOMMENDATIONS USING WEIGHTED SET COVER METHODOLOGY - A computer-implemented method provides suggested search queries based on an input search query. The search query is received (such as from a user providing the search query to a search engine service) and a first list of documents is determined that corresponds to processing the query by a search engine. A list of result queries is determined, wherein executing the list of result queries would correspond to a second list of documents that results from presenting the result queries to the search engine, and the documents of the second list of documents cover the documents of the first list of documents. The list of result queries is returned as the suggested queries. Determining a list of result queries may include, for example, determining a list of potential queries, wherein each potential query, when executed by the search engine, results in at least one document in the first list of documents; and processing the potential queries to determine which of the potential queries to include in the list of result queries. | 05-06-2010 |
20100114929 | DIVERSE QUERY RECOMMENDATIONS USING CLUSTERING-BASED METHODOLOGY - A computer-implemented method provides suggested search queries based on an input search query. The input search query is received. A first list of documents is determined that corresponds to processing the query by a search engine. Determining the list of result queries includes processing the first list of documents to determine clusters of documents and determining potential queries that correspond to the determined clusters by comparing results of the potential queries with documents in the determined clusters. A list of result queries is determined, wherein executing the list of result queries would correspond to a second list of documents that results from presenting the result queries to the search engine, and the documents of the second list of documents cover the documents of the first list of documents. The list of result queries is based on the potential queries determined to correspond to the determined clusters. | 05-06-2010 |
20100161643 | SEGMENTATION OF INTERLEAVED QUERY MISSIONS INTO QUERY CHAINS - The subject matter disclosed herein relates to segmentation of interleaved query missions into a plurality of query chains. | 06-24-2010 |
20110029475 | TAXONOMY-DRIVEN LUMPING FOR SEQUENCE MINING - Methods and apparatus are described for modeling sequences of events with Markov models whose states correspond to nodes in a provided taxonomy. Each state represents the events in the subtree under the corresponding node. By lumping observed events into states that correspond to internal nodes in the taxonomy, more compact models are achieved that are easier to understand and visualize, at the expense of a decrease in the data likelihood. The decision for selecting the best model is taken on the basis of two competing goals: maximizing the data likelihood, while minimizing the model complexity (i.e., the number of states). | 02-03-2011 |
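
The neighbor-aggregate feature described in the stacked graphical learning entry (20090089373) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `neighbor_average_feature`, the choice of the mean as the aggregate function, the use of in-link neighbors only, the fallback for hosts without neighbors, and the toy host names and scores are all assumptions made for illustration. Retraining the baseline classifier on the extended feature set is not shown.

```python
from typing import Dict, List


def neighbor_average_feature(
    in_links: Dict[str, List[str]],
    baseline_pred: Dict[str, float],
) -> Dict[str, float]:
    """For each host u, average the baseline spam scores of u's in-link neighbors.

    in_links maps a host to the hosts that link to it (assumed neighbor definition).
    baseline_pred maps a host to the baseline classifier's spam score in [0, 1].
    """
    feature: Dict[str, float] = {}
    for host, own_pred in baseline_pred.items():
        # Only neighbors that the baseline classifier has scored can contribute.
        neighbors = [n for n in in_links.get(host, []) if n in baseline_pred]
        if neighbors:
            feature[host] = sum(baseline_pred[n] for n in neighbors) / len(neighbors)
        else:
            # Assumption: a host with no scored neighbors keeps its own score.
            feature[host] = own_pred
    return feature


# Hypothetical toy host graph and baseline scores.
in_links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
}
baseline_pred = {"a.example": 0.25, "b.example": 1.0, "c.example": 0.5}

stacked_feature = neighbor_average_feature(in_links, baseline_pred)
```

In a full pipeline, each value in `stacked_feature` would be appended to the corresponding host's existing feature vector before the classifier is trained again.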