Entries |
Document | Title | Date |
20080201147 | Distributed speech recognition system and method and terminal and server for distributed speech recognition - Provided are a distributed speech recognition system, a distributed speech recognition method, and a terminal and a server for distributed speech recognition. The distributed speech recognition system includes a terminal which decodes a feature vector that is extracted from an input speech signal into a sequence of phonemes and generates the final recognition result by rescoring a candidate list provided from the outside; and a server which generates the candidate list by performing symbol matching on the recognized sequence of phonemes provided from the terminal and transmits the candidate list for the rescoring to the terminal. | 08-21-2008 |
20080215328 | METHOD AND SYSTEM FOR AUTOMATICALLY DETECTING MORPHEMES IN A TASK CLASSIFICATION SYSTEM USING LATTICES - The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal. | 09-04-2008 |
20080228485 | AURAL SIMILARITY MEASURING SYSTEM FOR TEXT - The aural similarity measuring system and method provides a measure of the aural similarity between a target text ( | 09-18-2008 |
20080319749 | GENERIC SPELLING MNEMONICS - A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary. | 12-25-2008 |
20090043581 | Methods and apparatus relating to searching of spoken audio data - This invention relates to a method of searching spoken audio data for one or more search terms comprising performing a phonetic search of the audio data to identify likely matches to a search term and producing textual data corresponding to a portion of the spoken audio data including a likely match. An embodiment of the method comprises the steps of taking phonetic index data corresponding to the spoken audio data, searching the phonetic index data for likely matches to the search term, wherein when a likely match is detected a portion of the spoken audio data or phonetic index data is selected which includes the likely match and said selected portion of the spoken audio data or phonetic index data is processed using a large vocabulary speech recogniser. The large vocabulary speech recogniser may derive textual data which can be used for further processing or may be used to present a transcript to a user. The present invention therefore combines the benefit of phonetic searching of audio data with the advantages of large vocabulary speech recognition. | 02-12-2009 |
20090048837 | Phonetic tone mark system and method thereof - A system and method that utilizes common symbols for marking the tones of alphabet letters of different languages. The marking system and method employs the symbols from the standard English typing keyboard to denote tones. There are seven phonetic tone marks. Each mark represents a unique tone. The system can be applied to any alphabetic writing letters of different languages to denote specific language tones. The method makes it possible for alphabetic writing of any kind of language and for people to effectively capture the tones of words in different languages. | 02-19-2009 |
20090048838 | System and method for client voice building - Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected both by an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved, stored on the server, and set up on the server to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed. | 02-19-2009 |
20090063151 | KEYWORD SPOTTING USING A PHONEME-SEQUENCE INDEX - In some aspects, a wordspotter is used to locate occurrences in an audio corpus of each of a set of predetermined subword units, which may be phoneme sequences. To locate a query (e.g., a keyword or phrase) in the audio corpus, constituent subword units in the query are identified and then locations of those subword units are determined based on the locations determined earlier by the wordspotter, for example, using a pre-built inverted index that maps subword units to their locations. | 03-05-2009 |
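The inverted-index lookup this abstract describes can be sketched in a few lines. This is an illustrative sketch only, not the patent's implementation; the `(phoneme, time)` corpus layout, the bigram granularity, and the function names are all assumptions.

```python
from collections import defaultdict

def build_phoneme_index(corpus):
    """Map each phoneme bigram to the times where it occurs.

    `corpus` is a hypothetical list of (phoneme, start_time) pairs, as a
    wordspotter pass over the audio might produce.
    """
    index = defaultdict(list)
    for i in range(len(corpus) - 1):
        bigram = (corpus[i][0], corpus[i + 1][0])
        index[bigram].append(corpus[i][1])  # start time of the bigram
    return index

def locate_query(index, query_phonemes):
    """Return candidate start times for a query given as a phoneme list.

    Candidates are the locations of the query's first bigram, kept only
    when every bigram of the query occurs somewhere in the index.
    """
    bigrams = list(zip(query_phonemes, query_phonemes[1:]))
    if bigrams and all(b in index for b in bigrams):
        return index[bigrams[0]]
    return []
```

A fuller system would verify that the bigram occurrences are temporally adjacent; the sketch shows only the index-then-lookup split the abstract highlights.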
20090089059 | METHOD AND APPARATUS FOR ENABLING MULTIMODAL TAGS IN A COMMUNICATION DEVICE - A method and apparatus for enabling multimodal tags in a communication device is disclosed. The method comprises receiving a first training signal and receiving a second training signal in conjunction with the first training signal. A multimodal tag is created to represent a combination of the first training signal and the second training signal and a function is associated with the created multimodal tag. | 04-02-2009 |
20090112594 | SYSTEM AND METHOD OF USING ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION WHICH DISTINGUISH PRE- AND POST-VOCALIC CONSONANTS - Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon. | 04-30-2009 |
20090125308 | PLATFORM FOR ENABLING VOICE COMMANDS TO RESOLVE PHONEME BASED DOMAIN NAME REGISTRATIONS - A method, apparatus, and system are directed towards employing machine representations of phonemes to generate and manage domain names, and/or messaging addresses. A user of a computing device may provide an audio input signal such as obtained from human language sounds. The audio input signal is received at a phoneme encoder that converts the sounds into machine representations of the sounds using a phoneme representation viewable as a sequence of alpha-numeric values. The sequence of alpha-numeric values may then be combined with a host name, or the like, to generate a URI, a message address, or the like. The generated URI, message address, or the like, may then be used to communicate over a network. | 05-14-2009 |
20090132251 | SPOKEN DOCUMENT RETRIEVAL SYSTEM - The present invention provides a spoken document retrieval system capable of high-speed and high-accuracy retrieval of where a user-specified keyword is uttered from spoken documents, even if the spoken documents are large in amount. Candidate periods are narrowed down in advance on the basis of a sequence of subwords generated from a keyword, and then the count values of the candidate periods containing the subwords are each calculated by adding up certain values. Through such simple process, the candidate periods are prioritized and then selected as retrieved results. In addition, the sequence of subwords generated from the keyword is complemented assuming that speech recognition errors occur, and then, candidate period generation and selection are performed on the basis of the complemented sequence of subwords. | 05-21-2009 |
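The additive count-based prioritization this abstract describes (adding up values for candidate periods that contain the keyword's subwords) can be sketched as follows. The dictionary layout and uniform +1 weighting are illustrative assumptions, not the patent's exact scoring.

```python
from collections import Counter

def score_candidate_periods(keyword_subwords, periods):
    """Rank candidate periods by how many keyword subwords each contains.

    `periods` is a hypothetical mapping from a period id to the set of
    subwords recognised in that stretch of audio.  Each contained subword
    adds a fixed value of 1; periods containing none are omitted.
    """
    scores = Counter()
    for period_id, subwords in periods.items():
        for sw in keyword_subwords:
            if sw in subwords:
                scores[period_id] += 1
    # Highest count first: the prioritised retrieval results.
    return [p for p, _ in scores.most_common()]
```

The patent additionally complements the subword sequence to anticipate recognition errors before this counting step; that complementation is omitted here.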
20090138266 | APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT FOR RECOGNIZING SPEECH - A contiguous word recognizing unit recognizes speech as a morpheme string, based on an acoustic model and a language model. A sentence obtaining unit obtains an exemplary sentence related to the speech out of a correct sentence storage unit. Based on the degree of matching, a sentence correspondence bringing unit brings first morphemes contained in the recognized morpheme string into correspondence with second morphemes contained in the obtained exemplary sentence. A disparity detecting unit detects one or more of the first morphemes each of which does not match the corresponding one of the second morphemes as disparity portions. A cause information obtaining unit obtains output information that corresponds to a condition satisfied by each of the disparity portions out of a cause information storage unit. An output unit outputs the obtained output information. | 05-28-2009 |
20090150152 | METHOD AND APPARATUS FOR FAST SEARCH IN CALL-CENTER MONITORING - A method and apparatus for indexing one or more audio signals using a speech to text engine and a phoneme detection engine, and generating a combined lattice comprising a text part and a phoneme part. A word to be searched is searched for in the text part; if it is not found, or is found with low certainty, it is divided into phonemes and searched for in the phoneme part of the lattice. | 06-11-2009 |
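The text-first, phoneme-fallback control flow of this abstract can be sketched as below. The flat dictionaries stand in for the patent's lattice structures, and `g2p`, the confidence threshold, and all names are assumptions for illustration.

```python
def search_combined_lattice(query, text_hits, phoneme_index, g2p, threshold=0.8):
    """Search the text part first; fall back to phonemes on a weak match.

    `text_hits` maps words to (position, confidence) pairs from the
    speech-to-text pass, `phoneme_index` maps phoneme tuples to positions,
    and `g2p` converts a word to its phoneme list -- all hypothetical
    interfaces simplifying the combined lattice.
    """
    hit = text_hits.get(query)
    if hit is not None and hit[1] >= threshold:
        return hit[0]                      # confident textual match
    phonemes = tuple(g2p(query))           # not found / low certainty
    positions = phoneme_index.get(phonemes)
    return positions[0] if positions else None
```

The design point is that the cheap, precise text search handles confident hits, while the phoneme search recovers words the text engine misrecognized or never knew.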
20090150153 | GRAPHEME-TO-PHONEME CONVERSION USING ACOUSTIC DATA - Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model. | 06-11-2009 |
20090150154 | Method and system of generating and detecting confusing phones of pronunciation - A method of generating and detecting confusing phones/syllables is disclosed. The method includes a generating stage and a detecting stage. The generating stage includes: (a) input a Mandarin utterance; (b) partition the Mandarin utterance into segmented phones/syllables and generate the most likely route in a recognition net via Forced Alignment of Viterbi decoding; (c) compare the segmented phones/syllables with a Mandarin acoustic model; (d) determine whether a confusing phone/syllable exists; (e) add the confusing phone/syllable into the recognition net and repeat step (b), (c), and (d) when the confusing phone/syllable exists; (f) stop and output all generated confusing phones/syllables to a confusing phone/syllable file when a confusing phone/syllable does not exist. The detecting stage includes: (g) input a spoken sentence; (h) align the spoken sentence with the recognition net; (i) determine the most likely route of the spoken sentence; and (j) compare the most likely route of the spoken sentence with the target route of the spoken sentence to detect pronunciation error and give high-level pronunciation suggestions. | 06-11-2009 |
20090157402 | METHOD OF CONSTRUCTING MODEL OF RECOGNIZING ENGLISH PRONUNCIATION VARIATION - A method of constructing a model of recognizing English pronunciation variations is used to recognize English pronunciations with different intonations influenced by native languages. The method includes collecting a plurality of sound information corresponding to English expressions; corresponding phonetic alphabets of the native language and English of a region to International Phonetic Alphabets (IPAs), so as to form a plurality of pronunciation models; converting the sound information with the pronunciation models to form a pronunciation variation network of the corresponding English expressions, thereby detecting whether the English expressions have pronunciation variation paths; and finally summarizing the pronunciation variation paths to form a plurality of pronunciation variation rules. Furthermore, the pronunciation variations are represented by phonetics features to infer possible pronunciation variation rules, which are stored to form pronunciation variation models. The construction of the pronunciation variation models enhances applicability of an English recognition system and accuracy of voice recognition. | 06-18-2009 |
20090157403 | HUMAN SPEECH RECOGNITION APPARATUS AND METHOD - A speech recognition apparatus generates a feature vector series corresponding to a speech signal, and recognizes a phoneme series corresponding to the feature vector series using sounds corresponding to phonemes and a phoneme language model. In addition, the speech recognition apparatus recognizes vocabulary that corresponds to the recognized phoneme series. At this time, the phoneme language model represents connection relationships between the phonemes, and is modeled according to time-variant characteristics of the phonemes. | 06-18-2009 |
20090164217 | MULTIRESOLUTION SEARCHING - This invention relates to processing of audio files, and more specifically, to an improved technique of searching audio. More particularly, a method and system for processing audio using a multi-stage searching process is disclosed. | 06-25-2009 |
20090164218 | METHOD AND APPARATUS FOR UNITERM DISCOVERY AND VOICE-TO-VOICE SEARCH ON MOBILE DEVICE - A method, system and communication device for enabling uniterm discovery from audio content and voice-to-voice searching of audio content stored on a device using discovered uniterms. Received audio/voice input signal is sent to a uniterm discovery and search (UDS) engine within the device. The audio data may be associated with other content that is also stored within the device. The UDS engine retrieves a number of uniterms from the audio data and associates the uniterms with the stored content. When a voice search is initiated at the device, the UDS engine generates a statistical latent lattice model from the voice query and scores the uniterms from the audio database against the latent lattice model. Following a further refinement, the best group of uniterms is then determined and segments of the stored audio data and/or other content corresponding to the best group of uniterms are outputted. | 06-25-2009 |
20090177472 | APPARATUS, METHOD, AND PROGRAM FOR CLUSTERING PHONEMIC MODELS - A node initializing unit generates a root node including inputted phonemic models. A candidate generating unit generates candidates of a pair of child sets by partitioning a set of phonemic models included in a node having no child node into two. A candidate deleting unit deletes candidates each including only phonemic models attached with determination information indicating that at least one of the child sets has a small amount of speech data for training. A similarity calculating unit calculates a sum of similarities among the phonemic models included in the child sets. A candidate selecting unit selects one of the candidates having a largest sum. A node generating unit generates two nodes including the two child sets included in the selected candidate, respectively. A clustering unit clusters the phonemic models in units of phonemic model sets each included in a node. | 07-09-2009 |
20090187406 | VOICE RECOGNITION SYSTEM - A voice recognition system is provided that outputs a talk-back voice in a manner such that a user can distinguish the accuracy of a voice-recognized character string more easily. A voice recognition unit performs voice recognition on a user's articulation in which a character string such as the telephone number “024 636 0123” is entered via a microphone. Based on each sound existing period delimited by silent intervals, each recognized partial character string “024”, “636” and “0123” is obtained. A talk-back voice data generating unit connects each recognized partial character string “024”, “636” and “0123” together in a manner such that space characters are inserted, and generates a character string “024 636 0123”. The generated character string “024 636 0123” is supplied to a voice generating device as talk-back voice data. A voice signal to be produced by the speaker 2 is generated in the form of the talk-back voice. | 07-23-2009 |
20090216535 | Engine For Speech Recognition - A computerized method for speech recognition in a computer system. Reference word segments are stored in memory. The reference word segments when concatenated form spoken words in a language. Each of the reference word segments is a combination of at least two phonemes, including a vowel sound in the language. A temporal speech signal is input and digitized to produce a digitized temporal speech signal. The digitized temporal speech signal is transformed piecewise into the frequency domain to produce a time and frequency dependent transform function. The energy spectral density of the temporal speech signal is proportional to the absolute value squared of the transform function. The energy spectral density is cut into input time segments of the energy spectral density. Each of the input time segments includes at least two phonemes including at least one vowel sound of the temporal speech signal. For each of the input time segments, (i) a fundamental frequency is extracted from the energy spectral density during the input time segment, (ii) a target segment is selected from the reference segments and thereby a target energy spectral density of the target segment is input. A correlation between the energy spectral density during the time segment and the target energy spectral density of the target segment is performed after calibrating the fundamental frequency to the target energy spectral density, thereby improving the correlation. | 08-27-2009 |
20090222266 | APPARATUS, METHOD, AND RECORDING MEDIUM FOR CLUSTERING PHONEME MODELS - A phoneme model clustering apparatus stores a classification condition of a phoneme context, generates a cluster by performing a clustering of context-dependent phoneme models having different acoustic characteristics of central phoneme for each model having a common central phoneme according to the classification condition, sets a conditional response for each cluster according to acoustic characteristics of context-dependent phoneme models included in the cluster, generates a set of clusters by performing a clustering on clusters according to the conditional response, and outputs the context-dependent phoneme models included in the set of clusters. | 09-03-2009 |
20090271200 | Speech recognition assembly for acoustically controlling a function of a motor vehicle - The invention relates to a speech recognition assembly for acoustically controlling a function of a motor vehicle, wherein the speech recognition assembly comprises a microphone disposed in the motor vehicle for inputting a voice command, a data base disposed in the motor vehicle in which respectively at least one meaning is allocated to phonetic representations of voice commands and an on-board-speech-recognition-system disposed in the motor vehicle for determining a meaning of the voice command by use of a meaning of a phonetic representation of a voice command stored in the data base, and wherein the speech recognition assembly further comprises an off-board-speech-recognition-system disposed spatially separated from the motor vehicle for determining a meaning of the voice command. | 10-29-2009 |
20090281807 | VOICE QUALITY CONVERSION DEVICE AND VOICE QUALITY CONVERSION METHOD - A voice quality conversion device converts voice quality of an input speech using information of the speech. The device includes: a target vowel vocal tract information hold unit ( | 11-12-2009 |
20090313019 | EMOTION RECOGNITION APPARATUS - An emotion recognition apparatus is capable of performing accurate and stable speech-based emotion recognition, irrespective of individual, regional, and language differences of prosodic information. The emotion recognition apparatus is an apparatus for recognizing an emotion of a speaker from an input speech, and includes: a speech recognition unit ( | 12-17-2009 |
20090326945 | METHODS, APPARATUSES, AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING A MIXED LANGUAGE ENTRY SPEECH DICTATION SYSTEM - An apparatus may include a processor configured to receive vocabulary entry data. The processor may be further configured to determine a class for the received vocabulary entry data. The processor may be additionally configured to identify one or more languages for the vocabulary entry data based upon the determined class. The processor may also be configured to generate a phoneme sequence for the vocabulary entry data for each identified language. Corresponding methods and computer program products are also provided. | 12-31-2009 |
20100049518 | SYSTEM FOR PROVIDING CONSISTENCY OF PRONUNCIATIONS - A system for providing consistency between the pronunciation of a word by a user and a confirmation pronunciation issued by a voice server ( | 02-25-2010 |
20100088098 | SPEECH RECOGNIZER, SPEECH RECOGNITION METHOD, AND SPEECH RECOGNITION PROGRAM - A speech recognition apparatus includes a speech collating unit that calculates similarities at each time between a feature amount converted by a speech analyzing unit and a word model generated by a word model generating unit. The speech collating unit extracts a word model from word models generated by the word model generating unit, whose minimum similarity among similarities at each time or whose overall similarity obtained from similarities at each time satisfies a second threshold value condition, and whose similarity at each time in a section among vocalization sections of utterance speech and corresponding to either a phoneme or a phoneme string associated with a first threshold value condition satisfies the first threshold value condition, and outputs as a recognition result the recognized word corresponding to the extracted word model. | 04-08-2010 |
20100094630 | ASSOCIATING SOURCE INFORMATION WITH PHONETIC INDICES - The present invention relates to creating a phonetic index of phonemes from an audio segment that includes speech content from multiple sources. The phonemes in the phonetic index are directly or indirectly associated with the corresponding source of the speech from which the phonemes were derived. By associating the phonemes with a corresponding source, the phonetic index of speech content from multiple sources may be searched based on phonetic content as well as the corresponding source. | 04-15-2010 |
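The source-tagged phonetic indexing this abstract describes can be sketched with a plain dictionary. The `(source, phonemes)` segment layout and global position counter are illustrative assumptions, not the patent's data structures.

```python
def index_phonemes_with_source(segments):
    """Build a phonetic index that remembers who spoke each phoneme.

    `segments` is a hypothetical list of (source, phoneme_list) pairs,
    e.g. one pair per speaker turn.  Each index entry keeps
    (source, position), so a search can filter by phonetic content
    and by the speaker it came from.
    """
    index = {}
    pos = 0
    for source, phonemes in segments:
        for ph in phonemes:
            index.setdefault(ph, []).append((source, pos))
            pos += 1
    return index
```

With this association in place, a query such as "occurrences of this phoneme sequence spoken by Alice" becomes a lookup followed by a filter on the source field.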
20100100382 | Detecting Segments of Speech from an Audio Stream - The disclosure describes a speech detection system for detecting one or more desired speech segments in an audio stream. The speech detection system includes an audio stream input and a speech detection technique. The speech detection technique may be performed in various ways, such as using pattern matching and/or signal processing. The pattern matching implementation may extract features representing types of sounds as in phrases, words, syllables, phonemes and so on. The signal processing implementation may extract spectrally-localized frequency-based features, amplitude-based features, and combinations of the frequency-based and amplitude-based features. Metrics may be obtained and used to determine a desired word in the audio stream. In addition, a keypad stream having keypad entries may be used in determining the desired word. | 04-22-2010 |
20100121642 | Speech Data Retrieval Apparatus, Speech Data Retrieval Method, Speech Data Retrieval Program and Computer Usable Medium Having Computer Readable Data Retrieval Program Embodied Therein - A speech data retrieval apparatus ( | 05-13-2010 |
20100121643 | MELODIS CRYSTAL DECODER METHOD AND DEVICE - The technology disclosed relates to a system and method for fast, accurate and parallelizable speech search, called Crystal Decoder. It is particularly useful for search applications, as opposed to dictation. It can achieve both speed and accuracy, without sacrificing one for the other. It can search different variations of records in the reference database without a significant increase in elapsed processing time. Even the main decoding part can be parallelized as the number of words increase to maintain a fast response time. | 05-13-2010 |
20100125457 | SYSTEM AND METHOD FOR DISCRIMINATIVE PRONUNCIATION MODELING FOR VOICE SEARCH - Disclosed herein are systems, computer-implemented methods, and computer-readable media for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by (1) identifying word and phone alignments and corresponding likelihood scores, and (2) discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information (MMI), maximum likelihood (MLE) training, minimum classification error (MCE) training, or other functions known to those of skill in the art. Speech utterances can be names. The speech utterances can be received as part of a multimodal search or input. The step of discriminatively adapting pronunciation weights can further include stochastically modeling pronunciations. | 05-20-2010 |
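The normalization constraint in this abstract (pronunciation weights summing to 1 at the unit-of-speech level) can be sketched directly. The nested-dictionary layout is an assumption; the patent's discriminative adaptation step is not shown.

```python
def normalize_pronunciation_weights(weights):
    """Normalise pronunciation weights so each word's variants sum to 1.

    `weights` is a hypothetical mapping from a word to a dictionary of
    {pronunciation: raw weight}.  Normalisation is per word, mirroring
    the abstract's per-unit sum-to-1 constraint.
    """
    out = {}
    for word, prons in weights.items():
        total = sum(prons.values())
        out[word] = {p: w / total for p, w in prons.items()}
    return out
```

After normalization the weights behave as a probability distribution over pronunciation variants, which is what makes the subsequent discriminative adaptation and likelihood scoring well defined.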
20100161335 | METHOD AND SYSTEM FOR DETECTING A RELEVANT UTTERANCE - A method and apparatus for detecting use of an utterance. A voice session including voice signals generated during a conversation between a first participant and a second participant is monitored by a speech analytics processor. The speech analytics processor detects the use of an utterance. A speech recognition processor channel is selected from a pool of speech recognition processor channels and is coupled to the voice session. The speech recognition processor provides speech recognition services to a voice-enabled application. The speech recognition processor channel is then decoupled from the voice session. The speech analytics processor continues to monitor the conversation for subsequent use of the utterance. | 06-24-2010 |
20100217598 | SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION RESULT OUTPUT METHOD, AND SPEECH RECOGNITION RESULT OUTPUT PROGRAM - A speech recognition system in which, even when the user makes an utterance including a word that satisfies a predetermined condition such as an unknown word, such a fact can be presented to the user, and the user can confirm the fact easily, is provided. The speech recognition system includes a word speech recognition section that converts input speech to a recognition result word sequence by using a predetermined word dictionary for recognition, a syllable recognition section that converts input speech to a recognition result syllable sequence, a segment determination section that determines a segment that corresponds to a predetermined condition which is a ground for estimating that a word in the converted recognition result word sequence is an unknown word, and an output section that obtains a partial syllable sequence from the recognition result syllable sequence corresponding to the determined segment, and outputs one or more word entries, which are in the vicinity of a position at which the partial syllable sequence is arranged in the word dictionary for recognition in which words are arranged in the order defined for word entries, together with the recognition result word sequence. | 08-26-2010 |
20100324900 | Searching in Audio Speech - A computerized method of detecting a target word in a speech signal. A speech recognition engine and a previously constructed phoneme model is provided. The speech signal is input into the speech recognition engine. Based on the phoneme model, the input speech signal is indexed. A time-ordered list is stored representing n-best phoneme candidates of the input speech signal and phonemes of the input speech signal in multiple phoneme frames. The target word is transcribed into a transcription of target phonemes. The time-ordered list of n-best phoneme candidates is searched for a locus of said target phonemes. While searching, scoring is based on the ranking of the phoneme candidates among the n-best phoneme candidates and based on the number of the target phonemes found. A composite score of the probability of an occurrence of the target word is produced. When the composite score is higher than a threshold, start and finish times are output which bound the locus. The start and finish times are input into an algorithm adapted for sequence alignment based on dynamic programming for aligning a portion of the phoneme frames with the target phonemes. | 12-23-2010 |
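The rank-plus-coverage scoring this abstract describes (scoring by the candidate's rank in the n-best list and by how many target phonemes are found) can be sketched as below. The `1 / (rank + 1)` weighting and the frame-per-phoneme pairing are illustrative choices, not the patent's formula.

```python
def composite_score(target_phonemes, nbest_frames):
    """Score a putative occurrence of the target word.

    `nbest_frames` is a hypothetical list, one entry per phoneme frame,
    each holding the n-best phoneme candidates ranked best-first.  A
    found target phoneme contributes more the higher it ranks; the final
    score also scales with the fraction of target phonemes found.
    """
    if not target_phonemes:
        return 0.0
    score, found = 0.0, 0
    for phone, frame in zip(target_phonemes, nbest_frames):
        if phone in frame:
            found += 1
            score += 1.0 / (frame.index(phone) + 1)  # rank 0 -> 1.0, rank 1 -> 0.5
    return (score / len(target_phonemes)) * (found / len(target_phonemes))
```

When this composite score clears a threshold, the patent hands the bounded locus to a dynamic-programming alignment pass; only the scoring stage is sketched here.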
20100332231 | LEXICAL ACQUISITION APPARATUS, MULTI DIALOGUE BEHAVIOR SYSTEM, AND LEXICAL ACQUISITION PROGRAM - A lexical acquisition apparatus includes: a phoneme recognition section | 12-30-2010 |
20110010175 | TEXT DATA PROCESSING APPARATUS, TEXT DATA PROCESSING METHOD, AND RECORDING MEDIUM STORING TEXT DATA PROCESSING PROGRAM - Provided is a text data processing apparatus, method, and program to add a symbol at an appropriate position. The apparatus according to this embodiment is a text data processing apparatus that executes edit of a symbol in input text, the apparatus including symbol edit determination means | 01-13-2011 |
20110054901 | METHOD AND APPARATUS FOR ALIGNING TEXTS - A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and a method of automatically searching a multimedia resource. | 03-03-2011 |
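Word-level alignment driven by phoneme similarity, as this abstract describes, is commonly realized with a dynamic-programming alignment; a minimal sketch follows. The Needleman-Wunsch formulation, the gap penalty, and the `phoneme_sim` interface are assumptions standing in for the patent's method.

```python
def align_words(target, reference, phoneme_sim):
    """Align two word sequences, preferring pairs that sound alike.

    `phoneme_sim(a, b)` is a hypothetical callable returning a similarity
    in [0, 1] between the phoneme renderings of words a and b.
    """
    n, m = len(target), len(reference)
    gap = -1.0
    # score[i][j]: best alignment score of target[:i] against reference[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        score[i][0] = gap * i
    for j in range(m + 1):
        score[0][j] = gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + phoneme_sim(target[i - 1], reference[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back to recover the aligned word pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + phoneme_sim(target[i - 1], reference[j - 1]):
            pairs.append((target[i - 1], reference[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))
```

Scoring on phoneme similarity rather than exact spelling is what lets the alignment pair a recognizer's mis-spelled output (e.g. "kat") with the reference word it actually represents.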
20110066437 | METHODS AND APPARATUS TO MONITOR MEDIA EXPOSURE USING CONTENT-AWARE WATERMARKS - Methods and apparatus to construct and transmit content-aware watermarks are disclosed herein. An example method of creating a content-aware watermark includes selecting at least one word associated with a media composition; representing the word with at least one phonetic notation; obtaining a proxy code for each phonetic notation; and locating the proxy code in the content-aware watermark. | 03-17-2011 |
20110093270 | REPLACING AN AUDIO PORTION - A method includes identifying a first syllable in a first audio of a first word and a second syllable in a second audio of a second word, the first syllable having a first set of properties and the second syllable having a second set of properties; detecting the first syllable in a first instance of the first word in an audio file, the first syllable in the first instance having a third set of properties; determining one or more transformations for transforming the first set of properties to the third set of properties; applying the one or more transformations to the second set of properties of the second syllable to yield a transformed second syllable; and replacing the first syllable in the first instance of the first word with the transformed second syllable in the audio file. | 04-21-2011 |
20110153329 | Audio Comparison Using Phoneme Matching - Audio comparison using phoneme matching is described, including evaluating audio data associated with a file, identifying a sequence of phonemes in the audio data, associating the file with a product category based on a match indicating the sequence of phonemes is substantially similar to another sequence of phonemes, the file being stored, and accessing the file when a request associated with the product category is detected. | 06-23-2011 |
20110166860 | SPOKEN MOBILE ENGINE - Systems and methods are disclosed to operate a mobile device by capturing user input; transmitting the user input over a wireless channel to an engine; analyzing, at the engine, a music clip or video in a multimedia data stream; and sending an analysis wirelessly to the mobile device. | 07-07-2011 |
20110184737 | SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND SPEECH RECOGNITION ROBOT - A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result. | 07-28-2011 |
20110251844 | GRAPHEME-TO-PHONEME CONVERSION USING ACOUSTIC DATA - Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model. | 10-13-2011 |
20110282667 | Methods and System for Grammar Fitness Evaluation as Speech Recognition Error Predictor - A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement. | 11-17-2011 |
20110313769 | Method and System for Automatically Detecting Morphemes in a Task Classification System Using Lattices - In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user. | 12-22-2011 |
20110320203 | METHOD AND SYSTEM FOR IDENTIFYING AND CORRECTING ACCENT-INDUCED SPEECH RECOGNITION DIFFICULTIES - A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one lexicon model. | 12-29-2011 |
20120035932 | Disambiguating Input Based on Context - In one implementation, a computer-implemented method includes receiving, at a mobile computing device, ambiguous user input that indicates more than one of a plurality of commands; and determining a current context associated with the mobile computing device that indicates where the mobile computing device is currently located. The method can further include disambiguating the ambiguous user input by selecting a command from the plurality of commands based on the current context associated with the mobile computing device; and causing output associated with performance of the selected command to be provided by the mobile computing device. | 02-09-2012 |
20120059656 | Speech Signal Similarity - A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing. | 03-08-2012 |
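The comparison in 20120059656 can be sketched as weighting phoneme n-gram counts for each audio source and comparing the two weighted vectors. The log damping and the cosine comparison below are assumed stand-ins; the patent does not fix particular weighting or comparison formulas:

```python
import math
from collections import Counter

def weighted_frequencies(phonemes, n=3):
    """Count phoneme n-grams and damp the raw counts with a log
    weighting (a TF-style weighting assumed here for illustration)."""
    grams = Counter(tuple(phonemes[i:i + n])
                    for i in range(len(phonemes) - n + 1))
    return {g: 1.0 + math.log(c) for g, c in grams.items()}

def similarity(freqs_a, freqs_b):
    """Cosine similarity between two weighted-frequency vectors:
    1.0 for identical sources, 0.0 for sources sharing no n-grams."""
    dot = sum(w * freqs_b[g] for g, w in freqs_a.items() if g in freqs_b)
    norm_a = math.sqrt(sum(w * w for w in freqs_a.values()))
    norm_b = math.sqrt(sum(w * w for w in freqs_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

source_a = ["t", "ah", "m", "ey", "t", "ow", "t", "ah"]
source_b = ["k", "ae", "t"]
print(similarity(weighted_frequencies(source_a),
                 weighted_frequencies(source_a)))  # 1.0
print(similarity(weighted_frequencies(source_a),
                 weighted_frequencies(source_b)))  # 0.0
```

Working on phoneme sequences rather than decoded words lets the comparison run on recognizer output without a full transcription.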
20120065975 | SYSTEM AND METHOD FOR PRONUNCIATION MODELING - Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes. | 03-15-2012 |
20120078630 | Utterance Verification and Pronunciation Scoring by Lattice Transduction - In language learning systems, proper pronunciation of words and phrases is an integral aspect of language learning; determining the proximity of the language learner's pronunciation to a standardized, i.e. ‘perfect’, pronunciation is utilized to guide the learner from imperfect toward perfect pronunciation. In this regard, a phoneme lattice scoring system is utilized, whereby an input from a user is transduced into the perfect pronunciation example in a phoneme lattice. The cost of this transduction may be determined based on a summation of the substitutions, deletions and insertions of phonemes needed to transduce from the input to the perfect pronunciation of the utterance. | 03-29-2012 |
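The transduction cost in 20120078630, a summation of phoneme substitutions, deletions, and insertions, reduces with uniform unit costs to a phoneme-level edit distance. A minimal sketch: a real lattice transducer would weight each operation and score paths through the lattice, and the ARPAbet-style symbols here are illustrative:

```python
def transduction_cost(spoken, reference):
    """Minimum number of phoneme substitutions, deletions and insertions
    needed to turn `spoken` into `reference` (Levenshtein dynamic program)."""
    m, n = len(spoken), len(reference)
    # dp[i][j]: cost of transducing spoken[:i] into reference[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i                    # delete all remaining spoken phonemes
    for j in range(1, n + 1):
        dp[0][j] = j                    # insert all remaining reference phonemes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if spoken[i - 1] == reference[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # match / substitution
                           dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1)        # insertion
    return dp[m][n]

# Learner says /t ow m ey t ow/ against reference /t ah m ey t ow/:
print(transduction_cost(["t", "ow", "m", "ey", "t", "ow"],
                        ["t", "ah", "m", "ey", "t", "ow"]))  # 1
```

A lower cost means the learner's utterance is closer to the perfect pronunciation, which is exactly the guidance signal the system needs.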
20120078631 | RECOGNITION OF TARGET WORDS USING DESIGNATED CHARACTERISTIC VALUES - Target word recognition includes: obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set; performing segmentation of the characteristic computation data to generate a plurality of text segments; combining the plurality of text segments to form a text data combination set; determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations; determining a plurality of designated characteristic values for the plurality of text data combinations; based at least in part on the plurality of designated characteristic values and according to at least a criterion, recognizing among the plurality of text data combinations target words whose characteristic values fulfill the criterion. | 03-29-2012 |
20120089398 | METHODS AND SYSTEMS FOR IMPROVING TEXT SEGMENTATION - Methods and systems for improving text segmentation are disclosed. In one embodiment, at least a first segmented result and a second segmented result are determined from a string of characters, a first frequency of occurrence for the first segmented result and a second frequency of occurrence for the second segmented result are determined, and an operable segmented result is identified from the first segmented result and the second segmented result based at least in part on the first frequency of occurrence and the second frequency of occurrence. | 04-12-2012 |
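The selection step in 20120089398 amounts to comparing corpus frequencies of the candidate segmentations. A toy sketch with a made-up frequency table; summing log counts is an assumed scoring rule, as the patent only requires that the choice be based on the frequencies of occurrence:

```python
import math

def pick_segmentation(candidates, corpus_freq):
    """Return the candidate segmentation whose words are, in aggregate,
    most frequent in a reference corpus (unseen words default to count 1,
    i.e. a zero log contribution -- an illustrative smoothing choice)."""
    def score(segmentation):
        return sum(math.log(corpus_freq.get(word, 1)) for word in segmentation)
    return max(candidates, key=score)

freq = {"new": 500, "york": 300, "newyork": 2}     # made-up counts
print(pick_segmentation([["new", "york"], ["newyork"]], freq))  # ['new', 'york']
```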
20120101823 | SYSTEM AND METHOD FOR RECOGNIZING PROPER NAMES IN DIALOG SYSTEMS - Embodiments of a dialog system that utilizes contextual information to perform recognition of proper names are described. Unlike present name recognition methods on large name lists, which generally focus strictly on the static aspect of the names, embodiments of the present system take into account the temporal, recency, and context effects when names are used, and formulate new questions to further constrain the search space or grammar for recognition of the past and current utterances. | 04-26-2012 |
20120116766 | METHOD AND APPARATUS FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION - A method and apparatus combining the advantages of phonetic search, such as rapid implementation and deployment with medium accuracy, with the advantages of speech-to-text, including providing the full text of the audio and rapid search. | 05-10-2012 |
20120116767 | METHOD AND SYSTEM OF SPEECH EVALUATION - A method is provided for user speech performance evaluation with respect to a reference performance for which a phoneme mark-up is available. The method includes capturing input speech from the user and formatting it as frames. For a respective frame of the input speech, the method generates probability values for a plurality of phonemes, generates a probability value for a phoneme class based upon the generated probability values for a plurality of phonemes belonging to that phoneme class. For a plurality of frames of the input speech, the method further includes averaging the phoneme class probability values corresponding to the plurality of frames of the input speech. The method also includes calculating a user speech performance score based upon the average. | 05-10-2012 |
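The per-frame aggregation in 20120116767 can be sketched directly: a phoneme class's probability in a frame is the sum of the probabilities of its member phonemes, and those per-frame values are averaged over the input. The ARPAbet-style labels and example numbers below are assumptions:

```python
def class_probability(frame_probs, phoneme_class):
    """Per-frame probability of a phoneme class: the sum of the
    probabilities of the phonemes belonging to that class."""
    return sum(frame_probs.get(p, 0.0) for p in phoneme_class)

def average_class_probability(frames, phoneme_class):
    """Average the per-frame class probabilities over all frames of the
    input speech -- the quantity the performance score is based upon."""
    if not frames:
        return 0.0
    return sum(class_probability(f, phoneme_class) for f in frames) / len(frames)

# Two frames of per-phoneme probabilities, scoring the vowel class:
frames = [{"aa": 0.5, "iy": 0.2, "t": 0.3},
          {"aa": 0.1, "t": 0.9}]
print(average_class_probability(frames, {"aa", "iy", "uw"}))  # ≈ 0.4
```

Grouping phonemes into classes makes the score tolerant of within-class confusions that do not matter for the evaluation.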
20120116768 | Systems and Methods for Extracting Meaning from Multimodal Inputs Using Finite-State Devices - Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input. | 05-10-2012 |
20120136660 | VOICE-ESTIMATION BASED ON REAL-TIME PROBING OF THE VOCAL TRACT - A voice-estimation device that probes the vocal tract of a user with sub-threshold acoustic waves to estimate the user's voice while the user speaks silently or audibly in a noisy or socially sensitive environment. The waves reflected by the vocal tract are detected and converted into a digital signal, which is then processed segment-by-segment. Based on the processing, a set of formant frequencies is determined for each segment. Each such set is then analyzed to assign a phoneme to the corresponding segment of the digital signal. The resulting sequence of phonemes is converted into a digital audio signal or text representing the user's estimated voice. | 05-31-2012 |
20120136661 | CONVERTING TEXT INTO SPEECH FOR SPEECH RECOGNITION - The present invention discloses converting text into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition. | 05-31-2012 |
20120136662 | SPEECH RECOGNITION SYSTEM WITH HUGE VOCABULARY - The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, and words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block, and with computer readable code for implementing the method. | 05-31-2012 |
20120166196 | Word-Dependent Language Model - This document describes word-dependent language models, as well as their creation and use. A word-dependent language model can permit a speech-recognition engine to accurately verify that a speech utterance matches a multi-word phrase. This is useful in many contexts, including those where one or more letters of the expected phrase are known to the speaker. | 06-28-2012 |
20120166197 | CONVERTING TEXT INTO SPEECH FOR SPEECH RECOGNITION - The present invention discloses converting text into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition. | 06-28-2012 |
20120173240 | Subspace Speech Adaptation - Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance. | 07-05-2012 |
20120191456 | POSITION-DEPENDENT PHONETIC MODELS FOR RELIABLE PRONUNCIATION IDENTIFICATION - A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable. | 07-26-2012 |
20120197644 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, INFORMATION PROCESSING SYSTEM, AND PROGRAM - An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation. | 08-02-2012 |
20120215539 | HYBRIDIZED CLIENT-SERVER SPEECH RECOGNITION - A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be assigned to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device. | 08-23-2012 |
20120232904 | METHOD AND APPARATUS FOR CORRECTING A WORD IN SPEECH INPUT TEXT - A method and apparatus for correcting a named entity word in a speech input text. The method includes recognizing a speech input signal from a user, obtaining a recognition result including named entity vocabulary mark-up information, determining a named entity word recognized incorrectly in the recognition result according to the named entity vocabulary mark-up information, displaying the named entity word recognized incorrectly, and correcting the named entity word recognized incorrectly. | 09-13-2012 |
20120239403 | Downsampling Schemes in a Hierarchical Neural Network Structure for Phoneme Recognition - An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features. | 09-20-2012 |
20120245942 | Computer-Implemented Systems and Methods for Evaluating Prosodic Features of Speech - Systems and methods are provided for scoring speech. A speech sample is received, where the speech sample is associated with a script. The speech sample is aligned with the script. An event recognition metric of the speech sample is extracted, and locations of prosodic events are detected in the speech sample based on the event recognition metric. The locations of the detected prosodic events are compared with locations of model prosodic events, where the locations of model prosodic events identify expected locations of prosodic events of a fluent, native speaker speaking the script. A prosodic event metric is calculated based on the comparison, and the speech sample is scored using a scoring model based upon the prosodic event metric. | 09-27-2012 |
20120253812 | SPEECH SYLLABLE/VOWEL/PHONE BOUNDARY DETECTION USING AUDITORY ATTENTION CUES - In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm. | 10-04-2012 |
20120253813 | SPEECH SEGMENT DETERMINATION DEVICE, AND STORAGE MEDIUM - A speech segment determination device includes a frame division portion, a power spectrum calculation portion, a power spectrum operation portion, a spectral entropy calculation portion and a determination portion. The frame division portion divides an input signal in units of frames. The power spectrum calculation portion calculates, using an analysis length, a power spectrum of the input signal for each of the frames that have been divided. The power spectrum operation portion adds a value of the calculated power spectrum to a value of power spectrum in each of frequency bins. The spectral entropy calculation portion calculates spectral entropy using the power spectrum whose value has been increased. The determination portion determines, based on a value of the spectral entropy, whether the input signal is a signal in a speech segment. | 10-04-2012 |
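The core of the determination in 20120253813 is spectral entropy: treat the normalized power spectrum as a probability distribution and measure how flat it is. A minimal sketch of that core step; the patent additionally accumulates power spectrum values across frames before the entropy calculation, and the decision threshold below is an illustrative assumption:

```python
import math

def spectral_entropy(power_spectrum):
    """Normalize the per-bin power into a probability distribution and
    take its Shannon entropy: flat, noise-like spectra score high while
    peaky, speech-like spectra score low."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    return -sum((p / total) * math.log2(p / total)
                for p in power_spectrum if p > 0)

def is_speech(power_spectrum, threshold=3.0):
    """Assumed decision rule: entropy below a tuned threshold indicates
    a speech segment.  The threshold value is illustrative."""
    return spectral_entropy(power_spectrum) < threshold

flat = [1.0] * 16             # white-noise-like frame
peaky = [10.0] + [0.1] * 15   # harmonic, speech-like frame
print(spectral_entropy(flat))  # 4.0 (log2 of 16 equal bins)
print(is_speech(peaky))        # True
```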
20120265531 | Speech based learning/training system using semantic decoding - An intelligent query system for processing voice-based queries is disclosed, which uses semantic based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries. | 10-18-2012 |
20120271635 | SPEECH RECOGNITION BASED ON PRONUNCIATION MODELING - A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model. | 10-25-2012 |
20120278079 | COMPRESSED PHONETIC REPRESENTATION - An audio processing system makes use of a number of levels of compression or data reduction, thereby providing reduced storage requirements while maintaining a high accuracy of keyword detection in the original audio input. | 11-01-2012 |
20120296653 | SPEECH RECOGNITION OF CHARACTER SEQUENCES - A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based on, at least in part, a weighting of individual characters that comprise the known character sequence. | 11-22-2012 |
20120316879 | SYSTEM FOR DETECTING SPEECH INTERVAL AND RECOGNIZING CONTINUOUS SPEECH IN A NOISY ENVIRONMENT THROUGH REAL-TIME RECOGNITION OF CALL COMMANDS - A continuous speech recognition system to recognize continuous speech smoothly in a noisy environment. The system selects call commands, configures a minimum recognition network in tokens consisting of the call commands and mute intervals including noises, recognizes the inputted speech continuously in real time, analyzes the reliability of speech recognition continuously, and recognizes the continuous speech from a speaker. When a speaker delivers a call command, the system measures the reliability of the speech after recognizing the call command, and recognizes the speech from the speaker by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment when the system recognizes the call command. | 12-13-2012 |
20120316880 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, INFORMATION PROCESSING SYSTEM, AND PROGRAM - An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation. | 12-13-2012 |
20120323577 | SPEECH RECOGNITION FOR PREMATURE ENUNCIATION - Methods of automatic speech recognition for premature enunciation. In one method, a) a user is prompted to input speech, then b) a listening period is initiated to monitor audio via a microphone, such that there is no pause between the end of step a) and the beginning of step b), and then the begin-speaking audible indicator is communicated to the user during the listening period. In another method, a) at least one audio file is played including both a prompt for a user to input speech and a begin-speaking audible indicator to the user, b) a microphone is activated to monitor audio, after playing the prompt but before playing the begin-speaking audible indicator in step a), and c) speech is received from the user via the microphone. | 12-20-2012 |
20130035939 | System and Method for Discriminative Pronunciation Modeling for Voice Search - Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art. | 02-07-2013 |
20130060572 | TRANSCRIPT RE-SYNC - In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording; forming a plurality of search terms from the terms of the transcript; determining possible time locations of the search terms in the audio recording; determining a correspondence between the time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording; and aligning the audio recording and the transcript, including updating the time locations associated with terms of the transcript based on the determined correspondence. | 03-07-2013 |
20130085757 | APPARATUS AND METHOD FOR SPEECH RECOGNITION - An embodiment of an apparatus for speech recognition includes a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device; a selection unit configured to select, utilizing a signal from one or more sensors embedded in the device, a trigger detection unit appropriate to the usage environment of the device from among the trigger detection units; and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit. | 04-04-2013 |
20130124205 | Providing Programming Information in Response to Spoken Requests - A system allows a user to obtain information about television programming and to make selections of programming using conversational speech. The system includes a speech recognizer that recognizes spoken requests for television programming information. A speech synthesizer generates spoken responses to the spoken requests for television programming information. A user may use a voice user interface as well as a graphical user interface to interact with the system to facilitate the selection of programming choices. | 05-16-2013 |
20130138441 | METHOD AND SYSTEM FOR GENERATING SEARCH NETWORK FOR VOICE RECOGNITION - Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers. | 05-30-2013 |
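A toy stand-in for the pronunciation-transduction idea above: each rule optionally rewrites a phone pattern into its assimilated form at a weight (cost), yielding a weighted set of pronunciation variants. Real systems would compose weighted finite state transducers (e.g. with a library such as OpenFst) rather than use this naive string matching; the rule, phones, and costs here are illustrative:

```python
def apply_transduction(phones, rules):
    # Each rule is (pattern, replacement, cost). For every rule we keep
    # the unrewritten variant and, if the pattern occurs, also add the
    # rewritten variant with the rule's cost accumulated.
    variants = [(phones, 0.0)]
    for pattern, replacement, cost in rules:
        new = []
        for seq, w in variants:
            new.append((seq, w))
            joined = " ".join(seq)
            if pattern in joined:  # naive matching; fine for this toy
                rewritten = joined.replace(pattern, replacement).split()
                new.append((rewritten, w + cost))
        variants = new
    return sorted(variants, key=lambda v: v[1])

# Toy rule: across a word boundary, "t y" assimilates to "ch" (cost 0.5),
# as in "got you" -> "gotcha".
print(apply_transduction(["g", "aa", "t", "y", "uw"], [("t y", "ch", 0.5)]))
```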
20130159000 | Spoken Utterance Classification Training for a Speech Recognition System - The subject disclosure is directed towards training a classifier for spoken utterances without relying on human-assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels and updates a classification model associated with the classifier using the pseudo-semantic label. | 06-20-2013 |
20130179169 | CHINESE TEXT READABILITY ASSESSING SYSTEM AND METHOD - A Chinese text readability assessing system analyzes and evaluates the readability of text data. A word segmentation module compares the text data with a corpus to obtain a plurality of word segments from the text data and provide part-of-speech settings corresponding to the word segments. A readability index analysis module analyzes the word segments and the part-of-speech settings based on readability indices to calculate index values of the readability indices in the text data. The index values are inputted to a readability mathematical model in a knowledge-evaluated training module, and the readability mathematical model produces a readability analysis result. Accordingly, the Chinese text readability assessing system of the present invention evaluates the readability of Chinese texts by word segmentation and the readability indices analysis in conjunction with the readability mathematical model. | 07-11-2013 |
20130185073 | SPEECH RECOGNITION SYSTEM WITH HUGE VOCABULARY - The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method. | 07-18-2013 |
20130191128 | CONTINUOUS PHONETIC RECOGNITION METHOD USING SEMI-MARKOV MODEL, SYSTEM FOR PROCESSING THE SAME, AND RECORDING MEDIUM FOR STORING THE SAME - A continuous phonetic recognition method using a semi-Markov model, a system for processing the method, and a recording medium for storing the method. In an embodiment of the phonetic recognition method of recognizing phones using a speech recognition system, a phonetic data recognition device receives speech, and a phonetic data processing device recognizes phones from the received speech using a semi-Markov model. | 07-25-2013 |
20130191129 | Information Processing Device, Large Vocabulary Continuous Speech Recognition Method, and Program - System and method for performing speech recognition using acoustic invariant structure for large vocabulary continuous speech. An information processing device receives sound as input and performs speech recognition. The information processing device includes: a speech recognition processing unit for outputting a speech recognition score, a structure score calculation unit for calculating a structure score that, for each hypothesis, is found by applying phoneme pair-by-pair weighting to the phoneme pair inter-distribution distance likelihood over all phoneme pairs comprising the hypothesis and then performing summation, and a ranking unit for ranking the multiple hypotheses based on a sum value of the speech recognition score and the structure score. | 07-25-2013 |
20130218563 | SPEECH UNDERSTANDING METHOD AND SYSTEM - A speech recognition system includes a mobile device and a remote server. The mobile device receives the speech from the user and extracts the features and phonemes from the speech. Selected phonemes and measures of uncertainty are transmitted to the server, which processes the phonemes for speech understanding and transmits a text of the speech (or the context or understanding of the speech) back to the mobile device. | 08-22-2013 |
20130226583 | AUTOMATIC SPOKEN LANGUAGE IDENTIFICATION BASED ON PHONEME SEQUENCE PATTERNS - A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring each time in the audio files in the set of two or more potential languages in which the UPD was trained on. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs that are based on the set of unique phoneme patterns created for each language. | 08-29-2013 |
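The SLM-per-language scoring idea above can be sketched with phoneme bigram models over a shared phoneme set: each language's model scores a decoded phoneme sequence, and the highest-scoring language wins. The bigram order, add-one smoothing, vocabulary size, and toy training data are all illustrative assumptions:

```python
import math
from collections import Counter

def train_bigram_model(sequences):
    # Phoneme-bigram and unigram counts for one language's decodings.
    bigrams, unigrams = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams

def score(model, seq, vocab_size=50):
    # Add-one-smoothed log-likelihood of a phoneme sequence.
    bigrams, unigrams = model
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(seq, seq[1:])
    )

def identify_language(models, seq):
    return max(models, key=lambda lang: score(models[lang], seq))

# Toy training decodings over a shared (universal) phoneme set.
models = {
    "en": train_bigram_model([["dh", "ax", "k", "ae", "t"]] * 5),
    "es": train_bigram_model([["e", "l", "g", "a", "t", "o"]] * 5),
}
print(identify_language(models, ["dh", "ax", "k"]))
```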
20130231934 | Speech Recognition on Large Lists Using Fragments - A system and method is provided for recognizing a speech input and selecting an entry from a list of entries. The method includes recognizing a speech input. A fragment list of fragmented entries is provided and compared to the recognized speech input to generate a candidate list of best matching entries based on the comparison result. The system includes a speech recognition module, and a data base for storing the list of entries and the fragmented list. The speech recognition module may obtain the fragmented list from the data base and store a candidate list of best matching entries in memory. A display may also be provided to allow a user to select from a list of best matching entries. | 09-05-2013 |
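The fragment-matching step above can be sketched by scoring each list entry by the fraction of its fragments that appear among the recognized fragments, then keeping the best matches as the candidate list. Word-level fragmentation and the overlap score are simplifying assumptions; the patent's fragments need not be whole words:

```python
def fragment(entry):
    # Split a list entry into lowercase word fragments.
    return set(entry.lower().split())

def candidate_list(entries, recognized_fragments, top_n=3):
    # Rank entries by overlap between their fragments and the
    # fragments recognized from the speech input.
    rec = set(recognized_fragments)
    scored = []
    for e in entries:
        frags = fragment(e)
        scored.append((len(frags & rec) / len(frags), e))
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [e for s, e in scored[:top_n] if s > 0]

entries = [
    "Main Street Springfield",
    "Main Road Shelbyville",
    "Oak Avenue Springfield",
]
print(candidate_list(entries, ["main", "springfield"]))
```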
20130262116 | METHOD AND APPARATUS FOR ELEMENT IDENTIFICATION IN A SIGNAL - A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences. | 10-03-2013 |
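The final search step above (an element subsequence common to at least a predetermined number of the extracted sequences) can be sketched by brute force: enumerate contiguous runs of the first sequence and keep the longest one present in enough of the others. The contiguity assumption and the O(n³)-ish enumeration are illustrative simplifications:

```python
def contains(seq, sub):
    # True if sub occurs as a contiguous run inside seq.
    m = len(sub)
    return any(seq[i:i + m] == sub for i in range(len(seq) - m + 1))

def common_subsequence(sequences, min_count):
    # Longest contiguous element run shared by at least min_count
    # of the extracted element sequences.
    best = []
    first = sequences[0]
    for i in range(len(first)):
        for j in range(i + 1, len(first) + 1):
            sub = first[i:j]
            hits = sum(contains(s, sub) for s in sequences)
            if hits >= min_count and len(sub) > len(best):
                best = sub
    return best

# Phoneme sequences decoded from the initial segment and two
# partially overlapping segments of the same signal.
seqs = [
    ["h", "eh", "l", "ow", "w"],
    ["ah", "h", "eh", "l", "ow"],
    ["h", "ah", "l", "ow", "w"],
]
print(common_subsequence(seqs, 2))
```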
20130289994 | EMBEDDED SYSTEM FOR CONSTRUCTION OF SMALL FOOTPRINT SPEECH RECOGNITION WITH USER-DEFINABLE CONSTRAINTS - Techniques disclosed herein include systems and methods that enable a voice trigger that wakes up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode, without quickly exhausting a battery supply. | 10-31-2013 |
20130289995 | Method and Device for Voice Controlling - The present invention discloses a method and device for voice control, which are used to solve the problem of low success rate of voice control in the prior art. The method includes: classifying stored recognition information used for voice recognition to obtain a syntax packet corresponding to each type of recognition information ( | 10-31-2013 |
20130304472 | AUTOMATIC MEASUREMENT OF SPEECH FLUENCY - Techniques are described for automatically measuring fluency of a patient's speech based on prosodic characteristics thereof. The prosodic characteristics may include statistics regarding silent pauses, filled pauses, repetitions, or fundamental frequency of the patient's speech. The statistics may include a count, average number of occurrences, duration, average duration, frequency of occurrence, standard deviation, or other statistics. In one embodiment, a method includes receiving an audio sample that includes speech of a patient, analyzing the audio sample to identify prosodic characteristics of the speech of the patient, and automatically measuring fluency of the speech of the patient based on the prosodic characteristics. These techniques may present several advantages, such as objectively measuring fluency of a patient's speech without requiring a manual transcription or other manual intervention in the analysis process. | 11-14-2013 |
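The statistics named in this abstract (count, average duration, frequency of occurrence, standard deviation) over detected silent pauses can be sketched directly; the function interface and the notion of a pause list already being detected are assumptions for illustration:

```python
import statistics

def pause_statistics(pause_durations, sample_length_s):
    # Summary statistics over detected silent pauses, of the kinds
    # the abstract names: count, mean duration, rate, std deviation.
    if not pause_durations:
        return {"count": 0, "mean": 0.0, "per_minute": 0.0, "stdev": 0.0}
    return {
        "count": len(pause_durations),
        "mean": statistics.mean(pause_durations),
        "per_minute": 60.0 * len(pause_durations) / sample_length_s,
        "stdev": statistics.pstdev(pause_durations),
    }

# Four pauses (in seconds) detected in a 2-minute speech sample.
stats = pause_statistics([0.4, 0.6, 0.5, 0.5], sample_length_s=120.0)
print(stats)
```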
20130339020 | DISPLAY APPARATUS, INTERACTIVE SERVER, AND METHOD FOR PROVIDING RESPONSE INFORMATION - A display apparatus, an interactive server, and a method for providing response information are provided. The display apparatus includes: a voice collector which collects a user's uttered voice, a communication unit which communicates with an interactive server; and a controller which, if response information corresponding to the uttered voice which is transmitted to the interactive server is received from the interactive server, controls the display apparatus to perform an operation corresponding to the user's uttered voice based on the response information, wherein the response information is generated in a different form according to a function of the display apparatus which is classified based on an utterance element extracted from the uttered voice. Accordingly, the display apparatus can execute the function corresponding to each of the uttered voices and can output the response message corresponding to each of the uttered voices, even if a variety of uttered voices are input from the user. | 12-19-2013 |
20140006029 | SYSTEMS AND METHODS FOR MODELING L1-SPECIFIC PHONOLOGICAL ERRORS IN COMPUTER-ASSISTED PRONUNCIATION TRAINING SYSTEM | 01-02-2014 |
20140012578 | SPEECH-RECOGNITION SYSTEM, STORAGE MEDIUM, AND METHOD OF SPEECH RECOGNITION - A speech recognition system that recognizes speech data is provided. The speech recognition system includes a speech recognition part that performs speech recognition of the speech data, and calculates a likelihood of the speech data with respect to a registered word that is pre-registered, a reliability judgment part that performs reliability judgment on the speech recognition based on the likelihood, and a judgment reference change processing part that changes a judgment reference for the reliability judgment, according to an utterance speed of the speech data. | 01-09-2014 |
20140019131 | METHOD OF RECOGNIZING SPEECH AND ELECTRONIC DEVICE THEREOF - A method of recognizing a speech and an electronic device thereof are provided. The method includes: segmenting a speech signal into a plurality of sections at preset time intervals; performing a phoneme recognition with respect to one of the plurality of sections of the speech signal by using a first acoustic model; extracting a candidate word of the one of the plurality of sections of the speech signal by using the phoneme recognition result; and performing a speech recognition with respect to the one of the plurality of sections of the speech signal by using the candidate word. | 01-16-2014 |
20140058732 | METHOD TO PROVIDE INCREMENTAL UI RESPONSE BASED ON MULTIPLE ASYNCHRONOUS EVIDENCE ABOUT USER INPUT - Techniques disclosed herein include systems and methods for managing user interface responses to user input including spoken queries and commands. This includes providing incremental user interface (UI) response based on multiple recognition results about user input that are received with different delays. Such techniques include providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modify the initial UI response after receiving secondary recognition results. Since an initial response begins immediately, instead of waiting for results from all recognizers, it reduces the perceived delay by the user before complete results get rendered to the user. | 02-27-2014 |
20140074476 | Method and System for Building a Phonotactic Model for Domain Independent Speech Recognition - The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication. | 03-13-2014 |
20140088968 | SYSTEM AND METHOD FOR SPEECH RECOGNITION USING TIMBRE VECTORS - The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition. | 03-27-2014 |
20140108013 | SYSTEM AND METHOD OF SUPPORTING ADAPTIVE MISRECOGNITION IN CONVERSATIONAL SPEECH - A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method creates, stores, and uses extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command. | 04-17-2014 |
20140114662 | SYSTEM AND METHOD FOR RECOGNIZING SPEECH WITH DIALECT GRAMMARS - Disclosed herein are systems, computer-implemented methods, and computer-readable media for recognizing speech. The method includes receiving speech from a user, perceiving at least one speech dialect in the received speech, selecting at least one grammar from a plurality of optimized dialect grammars based on at least one score associated with the perceived speech dialect and the perceived at least one speech dialect, and recognizing the received speech with the selected at least one grammar. Selecting at least one grammar can be further based on a user profile. Multiple grammars can be blended. Predefined parameters can include pronunciation differences, vocabulary, and sentence structure. Optimized dialect grammars can be domain specific. The method can further include recognizing initial received speech with a generic grammar until an optimized dialect grammar is selected. Selecting at least one grammar from a plurality of optimized dialect grammars can be based on a certainty threshold. | 04-24-2014 |
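The certainty-threshold selection described above (use the generic grammar until an optimized dialect grammar is confidently selected) can be sketched as follows; the grammar names, score dictionary, and threshold value are illustrative assumptions:

```python
def select_grammar(dialect_scores, threshold=0.6):
    # Pick an optimized dialect grammar once its score clears a
    # certainty threshold; fall back to a generic grammar until then.
    dialect = max(dialect_scores, key=dialect_scores.get)
    if dialect_scores[dialect] >= threshold:
        return f"grammar-{dialect}"
    return "grammar-generic"

# Confident detection selects the optimized grammar...
print(select_grammar({"en-US-southern": 0.82, "en-US-midwest": 0.11}))
# ...while an uncertain one keeps recognizing with the generic grammar.
print(select_grammar({"en-US-southern": 0.40, "en-US-midwest": 0.35}))
```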
20140129226 | PRIVACY-SENSITIVE SPEECH MODEL CREATION VIA AGGREGATION OF MULTIPLE USER MODELS - Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data contain no human- or machine-readable information that would enable reconstruction of the audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users. | 05-08-2014 |
20140129227 | LANGUAGE PROCESSING METHOD AND INTEGRATED CIRCUIT - A parse unit parses an input sequence of token elements for an input string, wherein each token element contains a token and/or at least one corresponding token classifier. In a first mode the parse unit applies regular production rules on the token elements and on multi-token classifiers for phrases obtained from the token classifiers. If the first mode parsing does not result in a multi-token classifier encompassing all tokens of the input string, a control unit controls the parse unit to parse the input sequence in a second mode that applies both the regular and artificial production rules. A rule generator unit generates the artificial production rules based on the input sequence and/or intermediate results of the parsing. The parser unit provides a complete parse tree for ungrammatical sentences and a solution where the regular production rules do not cover the complete grammar of the respective natural language. | 05-08-2014 |
20140142945 | Application Services Interface to ASR - An application services interface system includes an automatic speech recognition control application program interface that receives a request for a recognition session from an application-based automatic speech recognition controller. An automatic speech recognition control engine directs the performance of an automatic speech recognition module. The automatic speech recognition module compares a spoken utterance to a vocabulary of active grammars to generate recognition results through limited data interchanges or exchanges. | 05-22-2014 |
20140156278 | SYSTEM AND METHOD FOR DYNAMICALLY GENERATING A RECOGNITION GRAMMAR IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT - The system and method described herein may dynamically generate a recognition grammar associated with a conversational voice user interface in an integrated voice navigation services environment. In particular, in response to receiving a natural language utterance that relates to a navigation context at the voice user interface, a conversational language processor may generate a dynamic recognition grammar that organizes grammar information based on one or more topological domains. For example, the one or more topological domains may be determined based on a current location associated with a navigation device, whereby a speech recognition engine may use the grammar information organized in the dynamic recognition grammar according to the one or more topological domains to generate one or more interpretations associated with the natural language utterance. | 06-05-2014 |
20140163987 | SPEECH RECOGNITION APPARATUS - In accordance with alphabet input method information for each user, a word formed of an alphabet string is registered in a word dictionary, in a state where “dotto” is added before each alphabet and one of a set of alphabets that are difficult to distinguish from each other, like “M and N” and “B and P”, is repeated twice. For example, a word “PAM” and a time-series feature corresponding to “dotto P P dotto A dotto M” are registered in association with each other. When a user performs a speech input of “PAM”, in accordance with the user's alphabet input method information, the user utters “dotto P P dotto A dotto M”. A speech recognition is performed on this sound data using the word dictionary corresponding to the user's alphabet input method information. | 06-12-2014 |
20140180692 | INTENT MINING VIA ANALYSIS OF UTTERANCES - According to example configurations, a speech processing system can include a syntactic parser, a word extractor, word extraction rules, and an analyzer. The syntactic parser of the speech processing system parses the utterance to identify syntactic relationships amongst words in the utterance. The word extractor utilizes word extraction rules to identify groupings of related words in the utterance that most likely represent an intended meaning of the utterance. The analyzer in the speech processing system maps each set of the sets of words produced by the word extractor to a respective candidate intent value to produce a list of candidate intent values for the utterance. The analyzer is configured to select, from the list of candidate intent values (i.e., possible intended meanings) of the utterance, a particular candidate intent value as being representative of the intent (i.e., intended meaning) of the utterance. | 06-26-2014 |
20140188475 | FAST OUT-OF-VOCABULARY SEARCH IN AUTOMATIC SPEECH RECOGNITION SYSTEMS - A method including: receiving, on a computer system, a text search query, the query including one or more query words; generating, on the computer system, for each query word in the query, one or more anchor segments within a plurality of speech recognition processed audio files, the one or more anchor segments identifying possible locations containing the query word; post-processing, on the computer system, the one or more anchor segments, the post-processing including: expanding the one or more anchor segments; sorting the one or more anchor segments; and merging overlapping ones of the one or more anchor segments; and searching, on the computer system, the post-processed one or more anchor segments for instances of at least one of the one or more query words using a constrained grammar. | 07-03-2014 |
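The post-processing steps named in this abstract (expand, sort, and merge overlapping anchor segments) amount to a classic interval merge, sketched below; the padding value and the sample segments are illustrative assumptions:

```python
def postprocess_anchors(anchors, pad=0.25):
    # Expand each anchor segment by a padding margin, sort, and merge
    # any segments that overlap after expansion.
    expanded = sorted((max(0.0, s - pad), e + pad) for s, e in anchors)
    merged = []
    for s, e in expanded:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

# Hypothetical anchor segments (start, end) in seconds for one query word:
# the first and third overlap once padded and collapse into one segment.
print(postprocess_anchors([(3.0, 3.5), (12.0, 12.5), (3.75, 4.0)]))
```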
20140188476 | CONTENT DELIVERY SYSTEM WITH BARGE-IN MECHANISM AND METHOD OF OPERATION THEREOF - A method of operation of a content delivery system includes: receiving a command phrase based on determining an utterance type according to a travel context; determining a trigger match with a control unit based on the command phrase matching a trigger phrase; and stopping a prompt according to a prompt type based on the trigger match for controlling the prompt presented by a device. | 07-03-2014 |
20140195239 | Systems and Methods for an Automated Pronunciation Assessment System for Similar Vowel Pairs - Computer-implemented systems and methods are provided for assessing non-native speech proficiency. A non-native speech sample is processed to identify a plurality of vowel sound boundaries in the non-native speech sample. Portions of the non-native speech sample are analyzed within the vowel sound boundaries to extract vowel characteristics associated with a first vowel sound and a second vowel sound represented in the non-native speech sample. The vowel characteristics are processed to identify a first vowel pronunciation metric for the first vowel sound and a second vowel pronunciation metric for the second vowel sound, and the first vowel pronunciation metric and the second vowel pronunciation metric are processed to determine whether the non-native speech sample exhibits a distinction in pronunciation of the first vowel sound and the second vowel sound. | 07-10-2014 |
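One plausible realization of the vowel-distinction test above is to compare the distance between the mean (F1, F2) formant vectors of the two vowel categories against a threshold. The use of formants, the formant values, and the threshold are all illustrative assumptions, not details taken from the patent:

```python
import math

def mean(vals):
    return sum(vals) / len(vals)

def vowels_distinct(tokens_a, tokens_b, threshold=150.0):
    # Judge whether two vowel categories are pronounced distinctly by
    # comparing the distance between their mean (F1, F2) formant vectors.
    ca = (mean([t[0] for t in tokens_a]), mean([t[1] for t in tokens_a]))
    cb = (mean([t[0] for t in tokens_b]), mean([t[1] for t in tokens_b]))
    return math.hypot(ca[0] - cb[0], ca[1] - cb[1]) >= threshold

# Toy (F1, F2) measurements in Hz for /i/ ("beat") vs /I/ ("bit") tokens.
ee = [(300, 2300), (310, 2250)]
ih = [(430, 2000), (440, 1980)]
print(vowels_distinct(ee, ih))
```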
20140222430 | System and Method for Multimodal Utterance Detection - The disclosure describes a system and method for detecting one or more segments of desired speech utterances from an audio stream using timings of events from other modes that are correlated to the timings of the desired segments of speech. The redundant information from other modes results in a highly accurate and robust utterance detection. | 08-07-2014 |
20140222431 | METHOD AND APPARATUS FOR SPEECH RECOGNITION - A computer-implemented method, apparatus and computer program product. The computer-implemented method performed by a computerized device, comprising: transforming a hidden Markov model to qubits; transforming data into groups of qubits, the data being determined upon the hidden Markov model and features extracted from an audio signal, the data representing a likelihood observation matrix representing likelihood of phoneme and state combinations in an audio signal; applying a quantum search algorithm for finding a maximal value of the qubits; and transforming the maximal value of the qubits into a number, the number representing an entry in a delta array used in speech recognition. | 08-07-2014 |
20140236601 | COMMUNICATION APPARATUS - In an agent function, when a character image in motion is displayed, a number of the character images to be displayed is changed, depending on whether or not a speech utterance of a user is being received. In other words, while receiving the speech utterance of the user, the communication apparatus sends the character images to the vehicle-mounted apparatus at a first reduced frequency. Thus, even when a process of receiving the speech utterance and a process of displaying the character images are concurrently performed, the vehicle-mounted apparatus is not overloaded with the processes. Therefore, a stopping state of a process caused by overload of the vehicle-mounted apparatus with processes can be prevented and stoppage of the motion of the character image can also be prevented even during the receipt of the speech utterance. | 08-21-2014 |
20140244259 | SPEECH RECOGNITION UTILIZING A DYNAMIC SET OF GRAMMAR ELEMENTS - Speech recognition is performed utilizing a dynamically maintained set of grammar elements. A plurality of grammar elements may be identified, and the grammar elements may be ordered based at least in part upon contextual information. In other words, contextual information may be utilized to bias speech recognition. Once a speech input is received, the ordered plurality of grammar elements may be evaluated, and a correspondence between the received speech input and a grammar element included in the plurality of grammar elements may be determined. | 08-28-2014 |
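The context-biased ordering described above can be sketched by sorting grammar elements so that those tagged with the current context move to the front of the evaluation order; the tag structure and element names are illustrative assumptions:

```python
def order_grammar_elements(elements, context):
    # Bias recognition by ordering grammar elements: elements tagged
    # with the current context sort first, ties broken alphabetically.
    return sorted(
        elements,
        key=lambda e: (context not in e["tags"], e["name"]),
    )

elements = [
    {"name": "play music", "tags": {"home"}},
    {"name": "navigate home", "tags": {"driving"}},
    {"name": "call contact", "tags": {"driving", "home"}},
]
ordered = order_grammar_elements(elements, "driving")
print([e["name"] for e in ordered])
```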
20140244260 | METHOD AND APPARATUS FOR RECOGNIZING AND REACTING TO USER PERSONALITY IN ACCORDANCE WITH SPEECH RECOGNITION SYSTEM - Techniques are disclosed for recognizing user personality in accordance with a speech recognition system. For example, a technique for recognizing a personality trait associated with a user interacting with a speech recognition system includes the following steps/operations. One or more decoded spoken utterances of the user are obtained. The one or more decoded spoken utterances are generated by the speech recognition system. The one or more decoded spoken utterances are analyzed to determine one or more linguistic attributes (morphological and syntactic filters) that are associated with the one or more decoded spoken utterances. The personality trait associated with the user is then determined based on the analyzing step/operation. | 08-28-2014 |
20140278423 | Audio Transmission Channel Quality Assessment - A device, system and method for audio transmission quality assessment that occurs during the transmission. A transmission channel such as the internet is used to transmit speech that is spoken by a human speaker, captured at a first end, and transmitted over the transmission channel for reproduction at a second end. The processors at each end of the transmission channel are configured to determine one or more characteristics of the speech such as phonemes. The phonemes are transmitted over a backchannel of the transmission channel to a processor that compares the speech characteristics that were determined at both ends of the call. The participants are notified of a transmission problem that has had an effect on the intelligibility of the speech that was reproduced at the far end if the comparison does not meet a predetermined quality metric. | 09-18-2014 |
20140288934 | SYSTEM AND METHOD FOR PROVIDING A NATURAL LANGUAGE VOICE USER INTERFACE IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT - A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment. | 09-25-2014 |
20140288935 | APPARATUS AND METHOD FOR FORMING SEARCH ENGINE QUERIES BASED ON SPOKEN UTTERANCES - A combination and a method are provided. Automatic speech recognition is performed on a received utterance. A meaning of the utterance is determined based, at least in part, on the recognized speech. At least one query is formed based, at least in part, on the determined meaning of the utterance. The at least one query is sent to at least one searching mechanism to search for an address of at least one web page that satisfies the at least one query. | 09-25-2014 |
20140297282 | Auto-Generation of Parsing Grammars from a Concept Ontology - An ontology stores information about a domain of an automatic speech recognition (ASR) application program. The ontology is augmented with information that enables subsequent automatic generation of a speech understanding grammar for use by the ASR application program. The information includes hints about how a human might talk about objects in the domain, such as preludes (phrases that introduce an identification of the object) and postludes (phrases that follow an identification of the object). | 10-02-2014 |
20140316785 | SPEECH RECOGNITION SYSTEM INTERACTIVE AGENT - A speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. A number of different speech models for different languages are used to support and detect a language spoken by a user. In some implementations an interactive electronic agent responds in the user's language to facilitate a real-time, human like dialogue. | 10-23-2014 |
20140324433 | METHOD AND DEVICE FOR LEARNING LANGUAGE AND COMPUTER READABLE RECORDING MEDIUM - A method and a device for learning a language and a computer readable recording medium are provided. The method includes the following steps. An input voice from a voice receiver is transformed into an input sentence according to a grammar rule. Whether the input sentence is the same as a learning sentence displayed on a display is determined. If the input sentence is different from the learning sentence, ancillary information is generated containing at least one error word in the input sentence that differs from the learning sentence. | 10-30-2014 |
20140358543 | LINKED-WORK ASSISTANCE APPARATUS, METHOD AND PROGRAM - According to one embodiment, a linked-work assistance apparatus includes an analysis unit, an identification unit and a control unit. The analysis unit analyzes the speech of each user by using a keyword list, to acquire a speech analysis result indicating a relation between a first keyword and a classification of the first keyword, the keyword list being a list of keywords classified based on the concepts and intentions of the keywords. The identification unit identifies the role of each user according to the classification of the first keyword, to acquire a correspondence relation between each user and a role. If the speech includes the name of a role, the control unit transmits the speech to the other users related to that role by referring to the correspondence relation. | 12-04-2014 |
20140358544 | SYSTEMS AND METHODS FOR ADAPTIVE PROPER NAME ENTITY RECOGNITION AND UNDERSTANDING - Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted. Recognition of the other words in the utterances in question—e.g., words that are not part of the proper name entities—may occur at regular, high recognition accuracy. Various embodiments provide as output not only accurately transcribed running text of the complete utterance, but also a symbolic representation of the meaning of the input, including appropriate symbolic representations of proper name entities, adequate to allow a computer system to respond appropriately to the spoken request without further analysis of the user's input. | 12-04-2014 |
20140372121 | SPEECH PROCESSING DEVICE AND METHOD - A speech processing device includes a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute: obtaining input speech, detecting a vowel segment contained in the input speech, estimating an accent segment contained in the input speech, calculating a first vowel segment length containing the accent segment and a second vowel segment length excluding the accent segment, and controlling at least one of the first vowel segment length and the second vowel segment length. | 12-18-2014 |
20140379347 | SYSTEM AND METHOD FOR EFFICIENT SIGNAL PROCESSING TO IDENTIFY AND UNDERSTAND SPEECH - A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified. | 12-25-2014 |
20140379348 | METHOD AND APPARATUS FOR IMPROVING DISORDERED VOICE - There is provided a method and an apparatus for processing a disordered voice. A method for processing a disordered voice according to an exemplary embodiment of the present invention includes: receiving a voice signal; recognizing the voice signal by phoneme; extracting multiple voice components from the voice signal; acquiring restored voice components by processing at least some disordered voice components of the multiple voice components by phoneme; and synthesizing a restored voice signal based on at least the restored voice components. | 12-25-2014 |
20150019225 | SYSTEMS AND METHODS FOR RESULT ARBITRATION IN SPOKEN DIALOG SYSTEMS - A method for arbitrating spoken dialog results includes receiving a spoken utterance from a user within an environment; receiving first recognition results and a first confidence level associated with the spoken utterance from a first source; receiving second recognition results and a second confidence level associated with the spoken utterance from a second source; receiving human-machine-interface (HMI) information associated with the user; selecting between the first recognition results and the second recognition results based on at least one of the first confidence level, the second confidence level, and the HMI information. | 01-15-2015 |
20150019226 | COMPUTERIZED INFORMATION APPARATUS - A computerized information apparatus for providing information to a user of a transport device. In one embodiment, the apparatus includes data processing apparatus, speech recognition and synthesis apparatus, and a network interface to enable voice-driven provision of information obtained both locally within the transport device and from a remote source such as a networked server. In one implementation, the information relates to one or more business entities in an area local to the transport device's location. Information can be both displayed and provided to the user audibly in another implementation. | 01-15-2015 |
20150032453 | SYSTEMS AND METHODS FOR PROVIDING INFORMATION DISCOVERY AND RETRIEVAL - This invention relates generally to software and computers, and more specifically, to systems and methods for providing information discovery and retrieval. In one embodiment, the invention includes a system for providing information discovery and retrieval, the system including a processor module, the processor module configured to perform the steps of receiving an information request from a consumer device over a communications network; decoding the information request; discovering information using the decoded information request; preparing instructions for accessing the information; and communicating the prepared instructions to the consumer device, wherein the consumer device is configured to retrieve the information for presentation using the prepared instructions. | 01-29-2015 |
20150039314 | SPEECH RECOGNITION METHOD AND APPARATUS BASED ON SOUND MAPPING - A method and system for speech recognition using a microphone array directed at the face of a person speaking. The output from the microphone array is read/scanned to determine which part of the face sound is emitted from, and this information is used as input to a speech recognition system to improve speech recognition. | 02-05-2015 |
20150046163 | LEVERAGING INTERACTION CONTEXT TO IMPROVE RECOGNITION CONFIDENCE SCORES - On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance. | 02-12-2015 |
20150073803 | SMOOTHENING THE INFORMATION DENSITY OF SPOKEN WORDS IN AN AUDIO SIGNAL - A portion of an audio signal is identified corresponding to a spoken word and its phonemes. A set of alternate spoken words satisfying phonetic similarity criteria to the spoken word is generated. A subset of the set of alternate spoken words is also identified; each member of the subset shares the same phoneme in a similar temporal position as the spoken word. A significance factor is then calculated for the phoneme based on the number of alternates in the subset and on the total number of alternates. The calculated significance factor may then be used to lengthen or shorten the temporal duration of the phoneme in the audio signal according to its significance in the spoken word. | 03-12-2015 |
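The significance-factor idea in 20150073803 can be sketched numerically. The abstract says the factor depends on the number of alternates sharing the phoneme and the total number of alternates; the exact formula below (one minus the sharing ratio, so a phoneme shared by few alternates scores higher) is an assumption for illustration, not the patent's disclosed computation.

```python
def significance_factor(num_sharing_alternates: int, num_total_alternates: int) -> float:
    """Score a phoneme's significance within a spoken word.

    num_sharing_alternates: alternate words sharing the phoneme in a
    similar temporal position. num_total_alternates: all phonetically
    similar alternates. A phoneme shared by few alternates is more
    discriminative, so it scores higher (this formula is hypothetical).
    """
    if num_total_alternates == 0:
        return 1.0  # no competing alternates: the phoneme is fully significant
    return 1.0 - (num_sharing_alternates / num_total_alternates)

# A phoneme shared by 2 of 10 alternates (score 0.8) is a better
# candidate for lengthening than one shared by 9 of 10 (score ~0.1).
print(significance_factor(2, 10))  # 0.8
```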
20150081304 | SYSTEM FOR SAY-FEEL GAP ANALYSIS IN VIDEO - Systems and techniques using observed emotional data are described herein. An audio stream of a subject corresponding in time to a sequence of visual observations of the subject can be received. A transcript of speech uttered in the audio stream can be produced. A meaning of a string in the transcript can be determined. The sequence of visual observations that correspond to speech that produced the string can be received. An emotional state of the subject can be determined based on the sequence of visual observations. A correlation value can be calculated for the string by comparing the meaning and the emotional state. | 03-19-2015 |
20150081305 | SYSTEM AND METHOD FOR RECOGNIZING SPEECH WITH DIALECT GRAMMARS - Disclosed herein are systems, computer-implemented methods, and computer-readable media for recognizing speech. The method includes receiving speech from a user, perceiving at least one speech dialect in the received speech, selecting at least one grammar from a plurality of optimized dialect grammars based on at least one score associated with the perceived at least one speech dialect, and recognizing the received speech with the selected at least one grammar. Selecting at least one grammar can be further based on a user profile. Multiple grammars can be blended. Predefined parameters can include pronunciation differences, vocabulary, and sentence structure. Optimized dialect grammars can be domain specific. The method can further include recognizing initial received speech with a generic grammar until an optimized dialect grammar is selected. Selecting at least one grammar from a plurality of optimized dialect grammars can be based on a certainty threshold. | 03-19-2015 |
20150095031 | SYSTEM AND METHOD FOR CROWDSOURCING OF WORD PRONUNCIATION VERIFICATION - Disclosed herein are systems, methods, and computer-readable storage media for crowdsourcing verification of word pronunciations. A system performing word pronunciation crowdsourcing identifies spoken words, or word pronunciations in a dictionary of words, for review by a turker. The identified words are assigned to one or more turkers for review. Assigned turkers listen to the word pronunciations, providing feedback on the correctness/incorrectness of the machine-made pronunciation. The feedback can then be used to modify the lexicon, or can be stored for use in configuring future lexicons. | 04-02-2015 |
20150112683 | DOCUMENT SEARCH DEVICE AND DOCUMENT SEARCH METHOD - An utterance content estimator estimates a document ID corresponding to an answer to user input analysis results from a document on the basis of an utterance estimating model that is generated by learning a correspondence between hypothetical questions each as to a content of the document and document IDs each of which is an answer to one of the hypothetical questions. A result integrator integrates document estimation results of the utterance estimating model and document search results of search indexes so as to generate final search results. | 04-23-2015 |
20150127346 | SELECTING ALTERNATES IN SPEECH RECOGNITION - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting alternates in speech recognition. In some implementations, data is received that indicates multiple speech recognition hypotheses for an utterance. Based on the multiple speech recognition hypotheses, multiple alternates for a particular portion of a transcription of the utterance are identified. For each of the identified alternates, one or more features scores are determined, the features scores are input to a trained classifier, and an output is received from the classifier. A subset of the identified alternates is selected, based on the classifier outputs, to provide for display. Data indicating the selected subset of the alternates is provided for display. | 05-07-2015 |
20150134336 | Robust Information Extraction From Utterances - The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size, scarce training data, and noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical. | 05-14-2015 |
20150134337 | CONVERSATION BASED SEARCH SYSTEM AND METHOD - A conversation based search method includes the steps of: proposing an utterance phrase corresponding to a query input by a user; and reforming the query by means of an answer from the user and offering a search result corresponding to the reformed query. | 05-14-2015 |
20150302848 | SPEECH RETRIEVAL METHOD, SPEECH RETRIEVAL APPARATUS, AND PROGRAM FOR SPEECH RETRIEVAL APPARATUS - A method for speech retrieval includes acquiring a keyword designated by a character string, and a phoneme string or a syllable string, detecting one or more coinciding segments by comparing a character string that is a recognition result of word speech recognition with words as recognition units performed for speech data to be retrieved and the character string of the keyword, calculating an evaluation value of each of the one or more segments by using the phoneme string or the syllable string of the keyword to evaluate a phoneme string or a syllable string that is recognized in each of the detected one or more segments and that is a recognition result of phoneme speech recognition with phonemes or syllables as recognition units performed for the speech data, and outputting a segment in which the calculated evaluation value exceeds a predetermined threshold. | 10-22-2015 |
20150302851 | GESTURE-BASED CUES FOR AN AUTOMATIC SPEECH RECOGNITION SYSTEM - A method of recognizing continuous digits uttered by a speaker using an automatic speech recognition (ASR) system includes receiving continuous digits via a microphone as speech from a user; detecting that recognition of one or more of the continuous digits falls below a predetermined confidence threshold; prompting the user to identify the continuous digits using a body gesture; detecting the body gesture made by the user; and identifying one or more of the continuous digits based on the body gesture. | 10-22-2015 |
20150310853 | SYSTEMS AND METHODS FOR SPEECH ARTIFACT COMPENSATION IN SPEECH RECOGNITION SYSTEMS - A method for speech recognition includes generating a speech prompt, receiving a spoken utterance from a user in response to the speech prompt, wherein the spoken utterance includes a speech artifact, and compensating for the speech artifact. Compensating for the speech artifact may include, for example, utilizing a recognition grammar that includes the speech artifact as a speech component, or modifying the spoken utterance to eliminate the speech artifact. | 10-29-2015 |
20150310854 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM - There is provided an information processing device including an analysis unit configured to analyze a character string indicating contents of utterance obtained as a result of speech recognition, and a display control unit configured to display the character string indicating the contents of the utterance and an analysis result on a display screen. | 10-29-2015 |
20150310860 | SPEECH RETRIEVAL METHOD, SPEECH RETRIEVAL APPARATUS, AND PROGRAM FOR SPEECH RETRIEVAL APPARATUS - A method for speech retrieval includes acquiring a keyword designated by a character string, and a phoneme string or a syllable string, detecting one or more coinciding segments by comparing a character string that is a recognition result of word speech recognition with words as recognition units performed for speech data to be retrieved and the character string of the keyword, calculating an evaluation value of each of the one or more segments by using the phoneme string or the syllable string of the keyword to evaluate a phoneme string or a syllable string that is recognized in each of the detected one or more segments and that is a recognition result of phoneme speech recognition with phonemes or syllables as recognition units performed for the speech data, and outputting a segment in which the calculated evaluation value exceeds a predetermined threshold. | 10-29-2015 |
20150310865 | VEHICLE VOICE RECOGNITION SYSTEMS AND METHODS - A voice recognition system includes a microphone for receiving speech from a user and processing electronics. The processing electronics are in communication with the microphone and are configured to use a plurality of rules to evaluate user interactions with the voice recognition system. The processing electronics automatically determine and set an expertise level in response to and based on the evaluation. The processing electronics are configured to automatically adjust at least one setting of the voice recognition system in response to the set expertise level. | 10-29-2015 |
20150317973 | SYSTEMS AND METHODS FOR COORDINATING SPEECH RECOGNITION - Methods and systems are provided for coordinating recognition of a speech utterance between a speech system of a vehicle and a speech system of a user device. In one embodiment, a method includes: receiving the speech utterance from a user; performing speech recognition on the speech utterance to determine a topic of the speech utterance; determining whether the speech utterance was meant for the speech system of the vehicle or the speech system of the user device based on the topic of the speech utterance; and selectively providing the speech utterance to the speech system of the vehicle or the speech system of the user device based on the determination of whether the speech utterance was meant for the speech system of the vehicle or the speech system of the user device. | 11-05-2015 |
20150340026 | EXTRACTING CANDIDATE ANSWERS FOR A KNOWLEDGE BASE FROM CONVERSATIONAL SOURCES - A method and system is provided to extract candidate utterances from conversational data. The conversational data includes a plurality of utterances and is stored in an electronic storage. A superficial property algorithm is applied to the stored conversational data. The superficial property algorithm is used to (i) search at least a portion of the stored conversational data by applying at least one superficial property, (ii) determine whether the searched portion of the conversational data includes a candidate utterance, and (iii) store the portion of the conversational data determined to be a candidate utterance. | 11-26-2015 |
20150340030 | OPERATION ASSISTING METHOD AND OPERATION ASSISTING DEVICE - An operation assisting method includes comparing input spoken voices with a previously stored keyword associated with an operation target to determine whether the keyword is spoken, and determining whether similarity among the input spoken voices falls within a predetermined range. When it is determined that the keyword is not spoken, the method determines whether the user's eyes are directed at the operation target; if the similarity falls within the predetermined range and the user's eyes are determined to be directed at the operation target, it is determined that the keyword is spoken. | 11-26-2015 |
20150340033 | CONTEXT INTERPRETATION IN NATURAL LANGUAGE PROCESSING USING PREVIOUS DIALOG ACTS - Features are disclosed for processing and interpreting natural language, such as interpretations of user utterances, in multi-turn dialog interactions. Context information regarding interpretations of user utterances and system responses to the user utterances can be maintained. Subsequent user utterances can be interpreted using the context information, rather than being interpreted without context. In some cases, interpretations of subsequent user utterances can be merged with interpretations of prior user utterances using a rule-based framework. Rules may be defined to determine which interpretations may be merged and under what circumstances they may be merged. | 11-26-2015 |
20150364133 | SYSTEM AND METHOD FOR DELIVERING TARGETED ADVERTISEMENTS AND/OR PROVIDING NATURAL LANGUAGE PROCESSING BASED ON ADVERTISEMENTS - The system and method described herein may use various natural language models to deliver targeted advertisements and/or provide natural language processing based on advertisements. In one implementation, an advertisement associated with a product or service may be provided for presentation to a user. A natural language utterance of the user may be received. The natural language utterance may be interpreted based on the advertisement and, responsive to the existence of a pronoun in the natural language utterance, a determination of whether the pronoun refers to one or more of the product or service or a provider of the product or service may be effectuated. | 12-17-2015 |
20150371628 | USER-ADAPTED SPEECH RECOGNITION - One embodiment of the present disclosure sets forth an approach for performing speech recognition. A speech recognition system receives an electronic signal that represents human speech of a speaker. The speech recognition system converts the electronic signal into a plurality of phonemes. The speech recognition system, while converting the plurality of phonemes into a first group of words based on a first voice recognition model, encounters an error when attempting to convert one or more of the phonemes into words. The speech recognition system transmits a message associated with the error to a server machine. The speech recognition system causes the server machine to convert the one or more phonemes into a second group of words based on a second voice recognition model resident on the server machine. The speech recognition system receives the second group of words from the server machine. | 12-24-2015 |
20160004502 | SYSTEM AND METHOD FOR CORRECTING SPEECH INPUT - A system and method for correcting speech input are disclosed. A particular embodiment includes: receiving a base input string; detecting a correction operation; receiving a replacement string in response to the correction operation; generating a base object set from the base input string and a replacement object set from the replacement string; identifying a matching base object of the base object set that is most phonetically similar to a replacement object of the replacement object set; and replacing the matching base object with the replacement object in the base input string. | 01-07-2016 |
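The correction flow in 20160004502 (find the base word most phonetically similar to the replacement, then swap it in) can be sketched as below. The toy `phonetic_key` function stands in for a real phonetic encoder such as Soundex; the abstract does not specify which similarity measure the actual system uses.

```python
import difflib

def phonetic_key(word: str) -> str:
    """Crude phonetic key: lowercase, keep the first letter, drop later vowels."""
    w = word.lower()
    return w[:1] + "".join(ch for ch in w[1:] if ch not in "aeiou")

def correct(base_input: str, replacement: str) -> str:
    """Replace the base word whose phonetic key best matches the replacement's."""
    words = base_input.split()
    key_r = phonetic_key(replacement)
    # Score each base word by string similarity of phonetic keys.
    best = max(range(len(words)),
               key=lambda i: difflib.SequenceMatcher(
                   None, phonetic_key(words[i]), key_r).ratio())
    words[best] = replacement
    return " ".join(words)

# "to" is phonetically closest to "two", so it is the word replaced.
print(correct("there are to cats", "two"))  # there are two cats
```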
20160005398 | METHOD AND SYSTEM FOR EFFICIENT SPOKEN TERM DETECTION USING CONFUSION NETWORKS - Systems and methods for spoken term detection are provided. A method for spoken term detection, comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries. | 01-07-2016 |
20160019882 | SYSTEMS AND METHODS FOR SPEECH ANALYTICS AND PHRASE SPOTTING USING PHONEME SEQUENCES - A contact center system can receive audio messages. The system can review audio messages by identifying phoneme strings within the audio messages associated with a characteristic. A phoneme can be a component of spoken language. Identified phoneme strings are used to analyze subsequent audio messages to determine the presence of the characteristic without requiring human analysis. Thus, the identification of phoneme strings then can be used to determine a characteristic of audio messages without transcribing the messages. | 01-21-2016 |
20160019889 | SPEAKER VERIFICATION USING CO-LOCATION INFORMATION - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score. | 01-21-2016 |
20160027437 | METHOD AND APPARATUS FOR SPEECH RECOGNITION AND GENERATION OF SPEECH RECOGNITION ENGINE - A method and apparatus for speech recognition and for generation of a speech recognition engine, and a speech recognition engine, are provided. The method of speech recognition involves receiving a speech input, transmitting the speech input to a speech recognition engine, and receiving a speech recognition result from the speech recognition engine, in which the speech recognition engine obtains a phoneme sequence from the speech input and provides the speech recognition result based on a phonetic distance of the phoneme sequence. | 01-28-2016 |
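The abstract of 20160027437 only says recognition is scored by a "phonetic distance" between phoneme sequences. A plain Levenshtein edit distance over phoneme symbols is one common way to realize that idea; the weighting actually used by the engine is an assumption here.

```python
def phoneme_distance(a: list, b: list) -> int:
    """Levenshtein edit distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

# "cat" /K AE T/ vs "cut" /K AH T/ differ by one substituted phoneme.
print(phoneme_distance(["K", "AE", "T"], ["K", "AH", "T"]))  # 1
```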
20160035344 | IDENTIFYING THE LANGUAGE OF A SPOKEN UTTERANCE - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving a plurality of audio frames that collectively represent at least a portion of a spoken utterance; processing the plurality of audio frames using a long short term memory (LSTM) neural network to generate a respective language score for each of a plurality of languages, wherein the respective language score for each of the plurality of languages represents a likelihood that the spoken utterance was spoken in the language; and classifying the spoken utterance as being spoken in one of the plurality of languages using the language scores. | 02-04-2016 |
20160035347 | SYSTEMS AND METHODS FOR PERFORMING ASR IN THE PRESENCE OF HETEROGRAPHS - Systems and methods for performing ASR in the presence of heterographs are provided. Verbal input is received from the user that includes a plurality of utterances. A first of the plurality of utterances is matched to a first word. It is determined that a second utterance in the plurality of utterances matches a plurality of words that is in a same heterograph set. It is identified which one of the plurality of words is associated with a context of the first word. A function is performed based on the first word and the identified one of the plurality of words. | 02-04-2016 |
20160063992 | SYSTEM AND METHOD FOR MULTI-AGENT ARCHITECTURE FOR INTERACTIVE MACHINES - Systems, methods, and computer-readable storage devices are disclosed for an event-driven multi-agent architecture that improves via a semi-hierarchical multi-agent reinforcement learning approach. A system receives a user input during a speech dialog between a user and the system. The system then processes the user input, identifying an importance of the user input to the speech dialog based on a user classification and identifying a variable strength turn-taking signal inferred from the user input. An utterance selection agent selects an utterance for replying to the user input based on the importance of the user input, and a turn-taking agent determines whether to output the utterance based on the utterance and the variable strength turn-taking signal. When the turn-taking agent indicates the utterance should be output, the system selects when to output the utterance. | 03-03-2016 |
20160063993 | FACET RECOMMENDATIONS FROM SENTIMENT-BEARING CONTENT - A “Facet Recommender” creates conversational recommendations for facets of particular conversational topics, and optionally for things associated with those facets, from consumer reviews or other social media content. The Facet Recommender applies a machine-learned facet model and optional sentiment-model, to identify facets associated with spans or segments of the content and to determine neutral, positive, or negative consumer sentiment associated with those facets and, optionally, things associated with those facets. These facets are selected by the facet model from a list or set of manually defined or machine-learned facets for particular conversational topic types. The Facet Recommender then generates new conversational utterances (i.e., short neutral, positive or negative suggestions) about particular facets based on the sentiments associated with those facets. In various implementations, utterances are fit to one or more predefined conversational frameworks. Further, responses or suggestions provided as utterances may be personalized to individual users. | 03-03-2016 |
20160063998 | AUTOMATIC SPEECH RECOGNITION BASED ON USER FEEDBACK - Systems and processes for processing speech in a digital assistant are provided. In one example process, a first speech input can be received from a user. The first speech input can be processed using a first automatic speech recognition system to produce a first recognition result. An input indicative of a potential error in the first recognition result can be received. The input can be used to improve the first recognition result. For example, the input can include a second speech input that is a repetition of the first speech input. The second speech input can be processed using a second automatic speech recognition system to produce a second recognition result. | 03-03-2016 |
20160104479 | ACOUSTIC IMPULSE RESPONSE SIMULATION - At least one spoken utterance and a stored vehicle acoustic impulse response can be provided to a computing device. The computing device is programmed to provide at least one speech file based at least in part on the spoken utterance and the vehicle acoustic impulse response. | 04-14-2016 |
20160104480 | HOTWORD DETECTION ON MULTIPLE DEVICES - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data. | 04-14-2016 |
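The multi-device hotword arbitration in the entry above can be sketched as a simple comparison: each device scores the utterance locally, and a device proceeds to full speech recognition only if no peer reported a higher hotword likelihood. The function name, threshold, and tie-breaking rule below are illustrative assumptions, not details from the patent.

```python
# Hypothetical sketch of multi-device hotword arbitration: each device
# computes its own hotword likelihood, exchanges scores with nearby
# devices, and only the most confident device initiates recognition.

def should_start_recognition(own_score: float,
                             peer_scores: list[float],
                             threshold: float = 0.5) -> bool:
    """Return True if this device should run speech recognition."""
    if own_score < threshold:
        # The utterance probably does not contain the hotword at all.
        return False
    # Defer to any peer that is more confident the hotword was spoken.
    return all(own_score >= peer for peer in peer_scores)
```

A real system would also need to handle ties and network delays when exchanging scores; here ties are broken in favor of the local device.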
20160118040 | INITIATING ACTIONS BASED ON PARTIAL HOTWORDS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data; determining that an initial portion of the audio data corresponds to an initial portion of a hotword; in response to determining that the initial portion of the audio data corresponds to the initial portion of the hotword, selecting, from among a set of one or more actions that are performed when the entire hotword is detected, a subset of the one or more actions; and causing one or more actions of the subset to be performed. | 04-28-2016 |
20160125875 | INITIATING ACTIONS BASED ON PARTIAL HOTWORDS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data; determining that an initial portion of the audio data corresponds to an initial portion of a hotword; in response to determining that the initial portion of the audio data corresponds to the initial portion of the hotword, selecting, from among a set of one or more actions that are performed when the entire hotword is detected, a subset of the one or more actions; and causing one or more actions of the subset to be performed. | 05-05-2016 |
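The partial-hotword scheme in the two entries above can be illustrated with a prefix check: when only the beginning of the hotword has been heard, a low-cost subset of the wake actions fires early, and the remainder waits for full confirmation. The hotword string and action names below are hypothetical examples, not taken from the patents.

```python
# Illustrative sketch of acting on a partial hotword: a prefix of the
# hotword triggers only a cheap subset of the actions (e.g. waking the
# screen), while the full hotword triggers the complete set.

HOTWORD = "ok computer"
FULL_ACTIONS = ["wake_screen", "open_microphone", "start_recognizer"]
PARTIAL_ACTIONS = ["wake_screen"]  # low-cost subset performed early

def actions_for(transcript_prefix: str) -> list[str]:
    """Select which actions to perform given what has been heard so far."""
    if transcript_prefix == HOTWORD:
        return FULL_ACTIONS
    if transcript_prefix and HOTWORD.startswith(transcript_prefix):
        return PARTIAL_ACTIONS
    return []
```

Performing the cheap actions on the prefix hides their latency: by the time the full hotword is confirmed, the screen is already awake.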
20160148613 | DISPLAY APPARATUS AND METHOD FOR REGISTRATION OF USER COMMAND - A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees a high recognition rate among user commands defined by a user. | 05-26-2016 |
20160155437 | BEHAVIOR ADJUSTMENT USING SPEECH RECOGNITION SYSTEM | 06-02-2016 |
20160171984 | SYSTEM AND METHOD FOR ADAPTING AUTOMATIC SPEECH RECOGNITION PRONUNCIATION BY ACOUSTIC MODEL RESTRUCTURING | 06-16-2016 |
20160180834 | VOICE RETRIEVAL APPARATUS, VOICE RETRIEVAL METHOD, AND NON-TRANSITORY RECORDING MEDIUM | 06-23-2016 |
20160379671 | WEARABLE WORD COUNTER - This disclosure generally relates to a portable device, specifically a portable word counter device. The portable word counter device includes a digital microcontroller circuit. The digital microcontroller circuit includes a syllable detector that detects syllables in spoken speech. The syllable detector aggregates the number of detected syllables and applies a syllable-to-word ratio. Based on the syllable-to-word ratio, the syllable detector determines the number of words spoken, and transmits the number of words spoken to a mobile device. | 12-29-2016 |
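The word-count estimate in the entry above reduces to dividing the detected syllable count by an average syllables-per-word ratio. The ratio value of 1.5 below is an assumed figure for conversational English, chosen for illustration; the patent does not specify one.

```python
# Rough sketch of the wearable word counter's estimate: count detected
# syllables, then divide by an assumed average syllables-per-word ratio.

def estimate_words_spoken(syllable_count: int,
                          syllables_per_word: float = 1.5) -> int:
    """Estimate the number of words from a detected syllable count."""
    return round(syllable_count / syllables_per_word)
```

On the device this estimate would be accumulated over time and transmitted to the paired mobile device as described in the abstract.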
20180025723 | GENERATION DEVICE, RECOGNITION SYSTEM, AND GENERATION METHOD FOR GENERATING FINITE STATE TRANSDUCER | 01-25-2018 |