Patent application number | Description | Published (MM-DD-YYYY) |
---|---|---|
20080281922 | AUTOMATIC GENERATION OF EMAIL PREVIEWS AND SUMMARIES - An incoming electronic communication is broken down into message portions. Features of the message portions are extracted and the message portions are converted into sparse feature vectors. The probabilities of the message portions being of interest to the user are calculated and the message portions are converted back into text. Message portions with a relatively high probability of being of interest to a user are presented to the user as a summary. | 11-13-2008 |
20080300872 | SCALABLE SUMMARIES OF AUDIO OR VISUAL CONTENT - Providing for browsing a summary of content formed of keywords that can scale to a user-defined level of detail is disclosed herein. Components of a system can include a summarization component that extracts keywords related to the content and associates the keywords with portions thereof, and a zooming component that displays a number of keywords based on a keyword/keyphrase relevance rank and a zoom factor. Additionally, a speech to text component can translate speech associated with the content into text, wherein the keywords are extracted from the translated text. Consequently, the claimed subject matter can present a variable hierarchy of keywords to form a scalable summary of such recorded content. | 12-04-2008 |
20090006343 | MACHINE ASSISTED QUERY FORMULATION - Architecture for completing search queries by using artificial intelligence based schemes to infer search intentions of users. Partial queries are completed dynamically in real time. Additionally, search aliasing can also be employed. Custom tuning can be performed based on at least query inputs in the form of text, graffiti, images, handwriting, voice, audio, and video signals. Natural language processing occurs, along with handwriting recognition and slang recognition. The system includes a classifier that receives a partial query as input, accesses a query database based on contents of the query input, and infers an intended search goal from query information stored on the query database. A query formulation engine receives search information associated with the intended search goal and generates a completed formal query for execution. | 01-01-2009 |
20090006344 | MARK-UP ECOSYSTEM FOR SEARCHING - Architecture for completing search queries by using artificial intelligence based schemes to infer search intentions of users. Partial queries are completed dynamically in real time. Additionally, search aliasing can also be employed. Custom tuning can be performed based on at least query inputs in the form of text, graffiti, images, handwriting, voice, audio, and video signals. Natural language processing occurs, along with handwriting recognition and slang recognition. The system includes a classifier that receives a partial query as input, accesses a query database based on contents of the query input, and infers an intended search goal from query information stored on the query database. A query formulation engine receives search information associated with the intended search goal and generates a completed formal query for execution. | 01-01-2009 |
20090006345 | VOICE-BASED SEARCH PROCESSING - Architecture for completing search queries by using artificial intelligence based schemes to infer search intentions of users. Partial queries are completed dynamically in real time. Additionally, search aliasing can also be employed. Custom tuning can be performed based on at least query inputs in the form of text, graffiti, images, handwriting, voice, audio, and video signals. Natural language processing occurs, along with handwriting recognition and slang recognition. The system includes a classifier that receives a partial query as input, accesses a query database based on contents of the query input, and infers an intended search goal from query information stored on the query database. A query formulation engine receives search information associated with the intended search goal and generates a completed formal query for execution. | 01-01-2009 |
20090024356 | DETERMINATION OF ROOT CAUSE(S) OF SYMPTOMS USING STOCHASTIC GRADIENT DESCENT - Diagnosis of one or more root causes of symptoms is performed by using stochastic gradient descent to find the optimal parameters of a variational distribution. This methodology, called variational gradient descent, permits fast diagnosis for a large number (greater than 1,000) or very large number (greater than 1,000,000) of symptom observations. A real-time application of the root cause diagnosis can determine currently occurring intermittent root causes. Diagnosis can be performed in a number of scenarios, such as medical disease detection or computer/network failure. | 01-22-2009 |
20090099988 | ACTIVE LEARNING USING A DISCRIMINATIVE CLASSIFIER AND A GENERATIVE MODEL TO DETECT AND/OR PREVENT MALICIOUS BEHAVIOR - A malicious behavior detection/prevention system, such as an intrusion detection system, is provided that uses active learning to classify entries into multiple classes. A single entry can correspond to either the occurrence of one or more events or the non-occurrence of one or more events. During a training phase, entries are automatically classified into one of multiple classes. After classifying the entry, a generative model for the determined class is utilized to determine how well the entry corresponds to the model. Ambiguous classifications along with entries that do not fit the model well for the determined class are selected for labeling by a human analyst. The selected entries are presented to a human analyst for labeling. These labels are used to further train the classifier and the models. During an evaluation phase, entries are automatically classified using the trained classifier and a policy associated with the determined class is applied. | 04-16-2009 |
20090198654 | DETECTING RELEVANT CONTENT BLOCKS IN TEXT - A system that facilitates detecting a targeted topic in a document is described herein. The system includes a receiver component that receives a document. The system additionally includes a topic model component trained using a plurality of training documents including the topic and a plurality of training documents that do not include the topic. The topic model component analyzes the document and automatically determines which portions of the document include the topic and which portions of the document do not include the topic. | 08-06-2009 |
20110283204 | Pasting Various Data into a Programming Environment - Described is a technology by which a user pastes selected data into a command line of a program, including when the selected data is non-textual. Upon detecting the paste (or drop) action, a variable name is automatically generated and inserted at the current point in a command line, where it acts as a proxy for the pasted data itself. A data structure comprising the selected data or transformed data corresponding to that selected data is maintained in program storage, e.g., RAM allocated to the program. In one aspect, a handler may be used to transform the data from one format into another that may be used by a particular program. For example, text may be reformatted into an array on which the program operates. The handler may be selected from a plurality of possible handlers, including customized handlers. | 11-17-2011 |
20110314001 | PERFORMING QUERY EXPANSION BASED UPON STATISTICAL ANALYSIS OF STRUCTURED DATA - A method described herein includes an act of receiving a query from a user, wherein the query is configured to search over a plurality of documents belonging to a particular domain. The method also includes an act of providing data to the user for display on a display screen of a computing apparatus, wherein the data is provided based at least in part upon a statistical analysis undertaken with respect to structured data pertaining to the particular domain, wherein the structured data is based at least in part upon data included in the plurality of documents. | 12-22-2011 |
20120005282 | COLLABORATIVE RANKING AND FILTERING OF ELECTRONIC MAIL MESSAGES - Electronic mail messages may be collaboratively ranked and filtered. User actions on an electronic mail message received from a sender by one or more recipients may be monitored. Statistics may be generated based on the user actions. The generated statistics may be utilized to provide a quality ranking of the electronic mail message. | 01-05-2012 |
20120323829 | GRAPH-BASED CLASSIFICATION BASED ON FILE RELATIONSHIPS - A reliable automated malware classification approach with substantially low false positive rates is provided. Graph-based local and/or global file relationships are used to improve malware classification along with a feature selection algorithm. File relationships such as containing, creating, copying, downloading, modifying, etc. are used to assign malware probabilities and simultaneously reduce the false positive and false negative rates on executable files. | 12-20-2012 |
20120323968 | Learning Discriminative Projections for Text Similarity Measures - A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently. | 12-20-2012 |
20130268531 | Finding Data in Connected Corpuses Using Examples - In one embodiment, datasets are stored in a catalog. The datasets are enriched by establishing relationships among the domains in different datasets. A user searches for relevant datasets by providing examples of the domains of interest. The system identifies datasets corresponding to the user-provided examples. The system then identifies connected subsets of the datasets that are directly linked or indirectly linked through other domains. The user provides known relationship examples to filter the connected subsets and to identify the connected subsets that are most relevant to the user's query. The selected connected subsets may be further analyzed by business intelligence/analytics to create pivot tables or to process the data. | 10-10-2013 |
20130268552 | Brokered Exchange of Private Data - A data broker observes datasets that are opened or created by a user. The data broker looks for related datasets in a data catalog. If a related dataset is found, the data broker asks the user if they want to access the related dataset. If the user is interested, then the data broker asks the data owner if they are willing to share access to the related dataset with the user. The data owner may deny access, allow access, or request the user's identity. If the user does not want to provide his or her identity, then access to the related dataset is denied. If the user does provide his or her identity, then the data owner determines whether or not to share the data with that user. Once the owner approves sharing the related dataset, then the dataset or a link to the dataset is sent to the user. | 10-10-2013 |
20130275434 | DEVELOPING IMPLICIT METADATA FOR DATA STORES - A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of or communication regarding a data store is monitored and keywords are extracted from the usage or communication. The keywords are then written to or otherwise associated with the metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher. | 10-17-2013 |
20130275436 | PSEUDO-DOCUMENTS TO FACILITATE DATA DISCOVERY - Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents, each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects, each of which relationally points back to its associated structure within the database. Searches can now be performed against the pseudo-documents and, in turn, return a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure. | 10-17-2013 |
20140067368 | DETERMINING SYNONYM-ANTONYM POLARITY IN TERM VECTORS - A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix. | 03-06-2014 |
20140222747 | LEARNING WITH NOISY LABELS FROM MULTIPLE JUDGES - A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing an iterative procedure to determine the true labels, the characterizations of judge expertise and the labeling difficulties. | 08-07-2014 |
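
Application 20120323968 names cosine and Jaccard as candidate functions for scoring two output vectors, and it describes tuning a model against a loss over labeled pairs. The sketch below is a generic illustration under stated assumptions, not the patented method. It assumes a linear projection through a hypothetical weight matrix `W`. It uses the extended (Tanimoto) form of Jaccard for real-valued vectors and a squared-error loss. The abstract specifies none of these choices, and the parameter-tuning step itself is omitted.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def project(weights: Sequence[Vector], term_vector: Vector) -> list[float]:
    """Assumed linear projection: each output element is a weighted sum of input term values."""
    return [sum(w * x for w, x in zip(row, term_vector)) for row in weights]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extended_jaccard(u: Vector, v: Vector) -> float:
    """Extended (Tanimoto) Jaccard for real-valued vectors: u.v / (|u|^2 + |v|^2 - u.v)."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    if denom == 0.0:
        return 0.0
    return dot / denom


def squared_error_loss(
    pairs: Sequence[tuple[Vector, Vector]],
    labels: Sequence[float],
    weights: Sequence[Vector],
    sim: Callable[[Vector, Vector], float] = cosine,
) -> float:
    """Hypothetical loss: mean squared gap between projected-pair similarity and its label."""
    if not pairs:
        raise ValueError("need at least one labeled pair")
    total = 0.0
    for (x1, x2), label in zip(pairs, labels):
        score = sim(project(weights, x1), project(weights, x2))
        total += (score - label) ** 2
    return total / len(pairs)


if __name__ == "__main__":
    W = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # hypothetical 2x3 projection
    a, b = project(W, [1.0, 0.0, 1.0]), project(W, [1.0, 1.0, 0.0])
    print(a, b, cosine(a, b), extended_jaccard(a, b))
    print(squared_error_loss([([1.0, 0.0, 1.0], [1.0, 1.0, 0.0])], [1.0], W))
```

With this `W`, the term vectors `[1, 0, 1]` and `[1, 1, 0]` project to `[2, 0]` and `[1, 1]`. That gives a cosine of about 0.707 and an extended Jaccard of 0.5. Passing `sim=extended_jaccard` swaps the scoring function without touching the loss, which mirrors the abstract's point that the similarity function is interchangeable.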