Nexidia Inc. Patent applications |
Patent application number | Title | Published |
20140310000 | SPOTTING AND FILTERING MULTIMEDIA - In an aspect, in general, a computer implemented method includes receiving a query phrase, receiving a first data representing a first audio signal including an interaction among a number of speakers and at least one segment of one or more known audio items, receiving a second data comprising temporal locations of the at least one segment of one or more known audio items in the first audio signal, and searching the first data to identify putative instances of the query phrase that are temporally excluded from the temporal locations of the at least one segment of one or more known audio items. | 10-16-2014 |
20140297280 | SPEAKER IDENTIFICATION - In an aspect, in general, a system includes a first input for receiving a first data representing an interaction among a plurality of parties, the first data identifying a plurality of parts of the interaction and identifying a plurality of segments associated with each part of the plurality of parts, a second input for receiving a second data associating each of one or more labels with one or more corresponding query phrases, a searching module for searching the first data to identify putative instances of the query phrases, and a classifier for labeling the parts of the interaction associated with the identified putative instances of the query phrases with the labels corresponding to the identified query phrases. | 10-02-2014 |
20130294587 | SPEAKER ADAPTATION - A method for speaker adaptation includes receiving a plurality of media files, each associated with a call center agent of a plurality of call center agents and receiving a plurality of terms. Speech processing is performed on at least some of the media files to identify putative instances of at least some of the plurality of terms. Each putative instance is associated with a hit quality that characterizes a quality of recognition of the corresponding term. One or more call center agents for performing speaker adaptation are determined, including identifying call center agents that are associated with at least one media file that includes one or more putative instances with a hit quality below a predetermined threshold. Speaker adaptation is performed for each identified call center agent based on the media files associated with the identified call center agent and the identified instances of the plurality of terms. | 11-07-2013 |
20130110849 | QUERY GENERATION | 05-02-2013 |
20130060572 | TRANSCRIPT RE-SYNC - In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording, forming a plurality of search terms from the terms of the transcript, determining possible time locations of the search terms in the audio recording, determining a correspondence between time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording, and aligning the audio recording and the transcript including updating the time location associated with terms of the transcript based on the determined correspondence. | 03-07-2013 |
20130035936 | LANGUAGE TRANSCRIPTION - A transcription system is applicable to transcription for a language in which there is limited pronunciation and/or acoustic data. A transcription station is configured using pronunciation data and acoustic data for use with the language. The pronunciation data and/or the acoustic data is initially from another dialect of a language, another language from a language group, or is universal (e.g., not specific to any particular language). A partial transcription of the audio recording is accepted via the transcription station (e.g., from a transcriptionist). One or more repetitions of one or more portions of the partial transcription are identified in the audio recording, and can be accepted during transcription. The pronunciation data and/or the acoustic data is updated in a bootstrapping manner during transcription, thereby improving the efficiency of the transcription process. | 02-07-2013 |
20120284026 | SPEAKER VERIFICATION SYSTEM - In an aspect, in general, a method for computer assisted speaker authentication in a voice communication session includes establishing a voice communication session between a first speaker and an agent, accepting a first voice signal from the first speaker, determining a voice characteristic measure of the first voice signal, including characterizing a similarity of the first voice signal to each of one or more stored characterizations of voice signals previously acquired from one or more known speakers, and providing an interface to the agent during the voice communication session between the agent and the first speaker, including presenting an indicator based on the determined voice characteristic measure to the agent. | 11-08-2012 |
20120278071 | TRANSCRIPTION SYSTEM - A transcription system automates the control of the playback of the audio to accommodate the user's ability to transcribe the words spoken. In some examples, a delay between playback and typed input is estimated by processing the typed words using a wordspotting approach. The estimated delay is used as an input to an automated speed control, for example, to maintain a target or maximum delay between playback and typed input. | 11-01-2012 |
20120059656 | Speech Signal Similarity - A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing. | 03-08-2012 |
20120010736 | SPOTTING MULTIMEDIA - A method for detecting sections of a known input in an unknown input includes processing the known input to form a series of discrete-valued feature values associated with corresponding time locations in the known input. Index data associating a plurality of the feature values each with one or more time locations in the known input is then formed. The unknown input is processed to form a series of discrete-valued feature values. A time offset between the unknown input and the known input is determined by determining time locations in the known input associated with the feature values of the unknown input. Determining the time offset may include maintaining a distribution of time offsets based on successive determined time locations of the feature values of the unknown input. | 01-12-2012 |
20110216905 | CHANNEL COMPRESSION - Techniques implemented as systems, methods, and apparatuses, including computer program products, for logging multi-channel audio signals. The techniques include receiving a first audio input signal over a first audio channel and a second audio input signal over a second audio channel, the first audio channel and the second audio channel forming portions of a multi-channel call; generating supplemental information representative of characteristics of the first audio input signal, the second audio input signal, or both; after generating the supplemental information, combining the first audio input signal and the second audio input signal to form an audio output signal of a single-channel format; and storing the generated supplemental information in association with an identifier of the audio output signal, wherein at least a portion of the generated supplemental information is sufficient to enable information associated with the first audio input signal, the second audio input signal, or both to be derived from the audio output signal of the single-channel format. | 09-08-2011 |
20110125499 | SPEECH RECOGNITION - Systems, methods, and apparatus, including computer program products for accepting a predetermined vocabulary-dependent characterization of a set of audio signals, the predetermined characterization including an identification of putative occurrences of each of a plurality of vocabulary items in the set of audio signals, the plurality of vocabulary items included in the vocabulary; accepting a new vocabulary item not included in the vocabulary; accepting putative occurrences of the new vocabulary item in the set of audio signals; and generating, by an analysis engine of a speech processing system, an augmented characterization of the set of audio signals based on the identified putative occurrences of the new vocabulary item. | 05-26-2011 |
20110044447 | TREND DISCOVERY IN AUDIO SIGNALS - Techniques for processing data representative of text associated with one or more content sources to generate a specification of a set of keyphrases of interest; processing a first set of audio signals collected during a first time period to generate first data characterizing putative occurrences of one or more keyphrases of the set in the first set of audio signals; evaluating the first data to generate keyphrase-specific comparison values for the first set of audio signals; deriving first trending data between the first set of audio signals and a second set of audio signals based in part on an analysis of the keyphrase-specific comparison values for the first set of audio signals relative to stored keyphrase-specific baseline values; and generating a visual representation of at least some of the first trending data and causing the visual representation of the first trending data to be presented on a display terminal. | 02-24-2011 |
20110037766 | CLUSTER MAP DISPLAY - Systems and methods are provided for using cluster maps in managing multimedia content including, for example, analyzing audio files stored at a call center. Very generally, a cluster map can be used as an effective tool for visualizing condensed information and for improving the understanding of the characteristics and relationships of the data under study. For example, a set of nodes can be displayed in a cluster map as corresponding to a set of information objects. Each information object may represent the result of a respective query conducted against the data. In some embodiments, multiple relationships between various information objects (such as between different query results) can be displayed simultaneously as graphical links in the map, making data comparison and exploration easier and more intuitive. | 02-17-2011 |
20110033036 | REAL-TIME AGENT ASSISTANCE - Some general aspects of the invention relate to systems and methods for improving contact center agent performance, for instance, by integrating real-time call monitoring with speech analytics to present agents with information useful to the handling of the current calls. In some implementations, phonetically based speech analysis techniques are applied to process live audio streams to identify key words and/or phrases of relevance, based on which knowledge articles can be selectively presented to agents to drive more efficient business processes. | 02-10-2011 |
20100332477 | Enhancing Call Center Performance - Some general aspects of the invention relate to systems and methods of processing data, for example, to improve customer interactions. One aspect, in particular, relates to a computer-implemented method that includes accepting user input for analysis of a database having media data and metadata. The media data includes a group of audio recordings and the metadata includes descriptive information of the group of audio recordings. A representation of a set of call series is formed based on user input, and processed to generate an analysis report. A visual representation of the analysis report is formed for presentation to a user. | 12-30-2010 |
20100332225 | TRANSCRIPT ALIGNMENT - Some general aspects relate to systems and methods for media processing. One aspect, for example, relates to a method for aligning multimedia recording with a transcript. A group of search terms are formed from the transcript, with each search term being associated with a location within the transcript. Putative locations of the search terms are determined in a time interval of the multimedia recording. For each search term, zero or more putative locations are determined and, for at least some of the search terms, multiple putative locations are determined in the time interval of the multimedia recording. According to a first sequencing constraint, a first representation of a group of sequences each of a subset of the putative locations of the search terms is formed. A second representation of a group of sequences each of a subset of the search terms is formed. Using the first and the second representations, the time interval of the multimedia recording is partially aligned with the transcript. | 12-30-2010 |
20100329437 | Enterprise Speech Intelligence Analysis - A method includes accepting, via an input interface, a caller identifier parameter and a target value of at least one call series parameter; identifying, using a data processor, a plurality of calls each associated with the caller identifier parameter from amongst a set of calls stored in a call center database; analyzing, using the data processor, the identified plurality of calls to determine a value of the at least one call series parameter for the identified plurality of calls; comparing, using the data processor, the determined value of the at least one call series parameter with the target value of the at least one call series parameter; and defining, using the data processor, the identified plurality of calls as a call series based at least in part on results of the comparing. | 12-30-2010 |
20100299131 | TRANSCRIPT ALIGNMENT - Some general aspects relate to systems, software, and methods for media processing. In one aspect, a script associated with a multimedia recording is accepted, wherein the script includes dialogue, speaker indications and video event indications. A group of search terms are formed from the dialogue, with each search term being associated with a location within the script. Zero or more putative locations of each of the search terms are identified in a time interval of the multimedia recording. For at least some of the search terms, multiple putative locations are identified in the time interval of the multimedia recording. The time interval of the multimedia recording and the script are partially aligned using the determined putative locations of the search terms and one or more of the following: a result of matching audio characteristics of the multimedia recording with the speaker indications, and a result of matching video characteristics of the multimedia recording with the video event indications. Based on a result of the partial alignment, event-localization information is generated. Further processing of the generated event-localization information is enabled. | 11-25-2010 |
20100274667 | MULTIMEDIA ACCESS - A computer-implemented method provides access to multimedia content, which includes units of content that include audio components. Meta data for the units of content is formed to represent an association of key phrases detected in the audio components and the units. In some examples, forming the meta data includes determining a candidate set of key phrases associated with the unit of multimedia and searching for the presence of the candidate key phrases in the audio components. Forming the meta data then includes forming data representing the presence of key phrases in the audio components. | 10-28-2010 |
20100217596 | WORD SPOTTING FALSE ALARM PHRASES - In one aspect, a method for processing media includes accepting a query. One or more language patterns are identified that are similar to the query. A putative instance of the query is located in the media. The putative instance is associated with a corresponding location in the media. The media in a vicinity of the putative instance is compared to the identified language patterns and data characterizing the putative instance of the query is provided according to the comparing of the media to the language patterns, for example, as a score for the putative instance that is determined according to the comparing of the media to the language patterns. | 08-26-2010 |
20100138411 | Segmented Query Word Spotting - An approach to word spotting processes a query including a sequence of terms (e.g., words) to identify one or more subsequences that constitute segments (e.g., phrases) that are likely to occur spoken together in the audio being searched. The segments are searched for as units. An advantage can include improved accuracy as compared to searching for the terms individually. | 06-03-2010 |
20100094622 | FEATURE NORMALIZATION FOR SPEECH AND AUDIO PROCESSING - Systems, methods, and apparatus for processing a speech utterance or audio record that includes receiving one or more feature vectors characterizing the speech utterance or audio record, each feature vector having a plurality of feature elements, each feature element being associated with a spectral representation of a characteristic of one of a plurality of sequential segments of the speech utterance or audio record; and processing the one or more feature vectors in a rank order filter to obtain one or more normalized feature vectors, each normalized feature vector having a plurality of normalized feature elements corresponding to the plurality of feature elements. | 04-15-2010 |
20100042644 | TREE-STRUCTURED DATA DISPLAY - Some general aspects of the invention relate to systems and computer-implemented methods of generating a treemap display. A collection of data elements characterized by a first attribute is accepted, and some data elements are grouped into a first set of data elements according to a first rule associated with the first attribute. A treemap field is partitioned into a collection of cells according to the grouping result, and the collection of cells includes a first cell representing the first set of data elements. The first cell has a first dimension corresponding to a value of the first attribute of the first set of data elements. The first set of data elements is then divided into a collection of subsets of data elements according to a second rule. Correspondingly, the first cell of the treemap field is partitioned into a collection of sub-cells according to the division. Each sub-cell represents a respective one of the plurality of subsets of data elements. | 02-18-2010 |
20090164217 | MULTIRESOLUTION SEARCHING - This invention relates to processing of audio files, and more specifically, to an improved technique of searching audio. More particularly, a method and system for processing audio using a multi-stage searching process is disclosed. | 06-25-2009 |
20090119101 | Transcript Alignment - An approach to alignment of transcripts with recorded audio is tolerant of moderate transcript inaccuracies, untranscribed speech, and significant non-speech noise. In one aspect, a number of search terms are formed from the transcript such that each search term is associated with a location within the transcript. Possible locations of the search terms are then determined in the audio recording. The audio recording and the transcript are then aligned using the possible locations of the search terms. In another aspect a search expression is accepted, and then a search is performed for spoken occurrences of the search expression in an audio recording. This search includes searching for text occurrences of the search expression in a text transcript of the audio recording, and searching for spoken occurrences of the search expression in the audio recording. | 05-07-2009 |
20090063151 | KEYWORD SPOTTING USING A PHONEME-SEQUENCE INDEX - In some aspects, a wordspotter is used to locate occurrences in an audio corpus of each of a set of predetermined subword units, which may be phoneme sequences. To locate a query (e.g., a keyword or phrase) in the audio corpus, constituent subword units in the query are identified and then locations of those subwords are determined based on the locations of those subword units determined earlier by the wordspotter, for example, using a pre-built inverted index that maps subword units to their locations. | 03-05-2009 |
20090055360 | CONSISTENT USER EXPERIENCE IN INFORMATION RETRIEVAL SYSTEMS - An information retrieval system for searching a corpus is configured to operate in a manner that optimizes the consistency of a user experience given a subset of a corpus and a search query. | 02-26-2009 |
20090037176 | CONTROL AND CONFIGURATION OF A SPEECH RECOGNIZER BY WORDSPOTTING - A wordspotting system is applied to a speech source in a preliminary processing phase. The putative hits corresponding to queries (e.g., keywords, key phrases, or more complex queries that may include Boolean expressions and proximity operators) are used to control a speech recognizer. The control can include one or more of: application of a time specification that is determined from the putative hits for selecting an interval of the speech source to which to apply the speech recognizer; application of a grammar specification determined from the putative hits that is used by the speech recognizer; and application of a specification of a lattice or pruning specification that is used by the recognizer to limit or guide the recognizer in recognition of the speech source. | 02-05-2009 |
20080300874 | SPEECH SKILLS ASSESSMENT - An approach to evaluating a person's speech skills includes automatically processing speech of a person and text some or all of which corresponds to the speech. In some examples, a job application procedure includes collecting speech from an applicant, and using text corresponding to the collected speech to automatically assess speech skills of the applicant. The text may include text that is presented to the applicant and the speech collected from the applicant can include the applicant reading the presented text. | 12-04-2008 |
20080208872 | ACCESSING MULTIMEDIA - An approach to accessing audio or multimedia content uses associated text sources to segment the content and/or to locate entities in the content. A user interface then provides a user with a way to navigate the content in a non-linear manner based on the segmentation or linking of text entities with locations in the content. The user interface can also provide a way to edit segment-specific content and to publish individual segments of the content. The output of the system, for instance the individual segments of annotated content, can be used to syndicate and/or to improve discoverability of the content. | 08-28-2008 |
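
The index-and-offset idea described in the SPOTTING MULTIMEDIA entry (20120010736) can be illustrated with a minimal sketch. This is not the patented method: the function names `build_index` and `estimate_offset`, the use of integer frame positions as time locations, and the simple vote count standing in for the "distribution of time offsets" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_index(known_features):
    """Map each discrete feature value to the time locations (frame
    positions, an assumption here) where it occurs in the known input."""
    index = defaultdict(list)
    for t, value in enumerate(known_features):
        index[value].append(t)
    return index


def estimate_offset(index, unknown_features):
    """Accumulate votes for candidate offsets (known time minus unknown time)
    and return the best-supported offset with its vote count."""
    votes = Counter()
    for u, value in enumerate(unknown_features):
        # Each matching location in the known input suggests one offset.
        for k in index.get(value, ()):
            votes[k - u] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


# Hypothetical feature sequences: the unknown input is a slice of the known one.
known = [3, 7, 7, 1, 4, 9, 2, 6, 5, 8]
unknown = [4, 9, 2, 6]

index = build_index(known)
offset, support = estimate_offset(index, unknown)
```

In this toy case every feature of the unknown input votes for the same offset, so the distribution has a single sharp peak. With real audio features, spurious matches would likely spread votes across many offsets, and a consistent peak is what would indicate that a section of the known input is present.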