Entries |
Document | Title | Date |
20080201139 | Generic framework for large-margin MCE training in speech recognition - A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met. | 08-21-2008 |
20080201140 | AUTOMATIC IDENTIFICATION OF SOUND RECORDINGS - Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device. | 08-21-2008 |
20080208576 | Digital Video Reproducing Apparatus - Character information recognition means ( | 08-28-2008 |
20080208577 | Multi-stage speech recognition apparatus and method - Provided are a multi-stage speech recognition apparatus and method. The multi-stage speech recognition apparatus includes a first speech recognition unit performing initial speech recognition on a feature vector, which is extracted from an input speech signal, and generating a plurality of candidate words; and a second speech recognition unit rescoring the candidate words, which are provided by the first speech recognition unit, using a temporal posterior feature vector extracted from the speech signal. | 08-28-2008 |
20080215318 | EVENT RECOGNITION - Recognition of events can be performed by accessing an audio signal having static and dynamic features. A value for the audio signal can be calculated by utilizing different weights for the static and dynamic features such that a frame of the audio signal can be associated with a particular event. A filter can also be used to aid in determining the event for the frame. | 09-04-2008 |
20080215319 | Query by humming for ringtone search and download - Described is a technology by which a user hums, sings or otherwise plays a user-provided rendition of a ringtone (or ringback tone) through a mobile telephone to a ringtone search service (e.g., a WAP, interactive voice response or SMS-based search platform). The service matches features of the user's rendition against features of actual ringtones to determine one or more matching candidate ringtones for downloading. Features may include pitch contours (up or down), pitch intervals and durations of notes. Matching candidates may be ranked based on the determined similarity, possibly in conjunction with weighting criterion such as the popularity of the ringtone and/or the importance of the matched part. The candidate set may be augmented with other ringtones independent of the matching, such as the most popular ones downloaded by other users, ringtones from similar artists, and so forth. | 09-04-2008 |
20080215320 | Apparatus And Method To Reduce Recognition Errors Through Context Relations Among Dialogue Turns - Disclosed are an apparatus and method to reduce recognition errors through context relations among multiple dialogue turns. The apparatus includes a rule set storage unit having a rule set containing one or more rules, an evolutionary rule generation module connected to the rule set storage unit, and a rule trigger unit connected to the rule set storage unit. The rule set uses the dialogue turn as the unit of information described by each rule. The method analyzes a dialogue history through an evolutionary massive parallelism approach to obtain a rule set describing the context relations among dialogue turns. Based on the rule set and the recognition result of an ASR system, it reevaluates the recognition result and measures the confidence measure of the reevaluated recognition result. After each successful dialogue turn, the rule set is dynamically adapted. | 09-04-2008 |
20080235012 | System and method of identifying contact information - A system and method for identifying contact information is provided. A system to identify contact information may include an input to receive a data stream. The data stream may include audio content, video content or both. The system may also include an analysis module to detect contact information within the data stream. The system may also include a memory to store a record of the contact information. | 09-25-2008 |
20080243498 | METHOD AND SYSTEM FOR PROVIDING INTERACTIVE SPEECH RECOGNITION USING SPEAKER DATA - An interactive speech recognition process and system is disclosed. A user is prompted for selection of one of a number of designated phrases represented in a grammar database. Speech recognition processing is applied to an uttered response from the user to match data in the grammar database, thereby identifying the selected phrase. The user is requested to confirm a determined match. If the match is not confirmed, the data corresponding to the matched phrase is removed from the grammar database and the user is re-prompted to select from the remaining phrases. | 10-02-2008 |
20080243499 | SYSTEM AND METHOD OF SPEECH RECOGNITION TRAINING BASED ON CONFIRMED SPEAKER UTTERANCES - An interactive speech recognition training process and system is disclosed. A speech recognition process is applied to a received speaker utterance. Utterance data are matched by the system with data in a grammar database and the speaker is requested to confirm a determined match. If the system determines from the speaker's response that the match is not confirmed, a negative score is assigned to the utterance data. If the match is determined by the system to be confirmed, a positive score is assigned to the utterance data. Scores for a plurality of such speaker utterances are accumulated in a log file, the accumulated scores used to adjust acoustic models for the grammar database. | 10-02-2008 |
20080249770 | Method and apparatus for searching for music based on speech recognition - Provided is a method and apparatus for searching music based on speech recognition. By calculating search scores with respect to a speech input using an acoustic model, calculating preferences in music using a user preference model, reflecting the preferences in the search scores, and extracting a music list according to the search scores in which the preferences are reflected, a personal expression of a search result using speech recognition can be achieved, and an error or imperfection of a speech recognition result can be compensated for. | 10-09-2008 |
20080255835 | USER DIRECTED ADAPTATION OF SPOKEN LANGUAGE GRAMMER - A method and system for interacting with a speech recognition system. A lattice of candidate words is displayed. The lattice of candidate words may include the output of a speech recognizer. Candidate words representing temporally serial utterances may be directly joined in the lattice. A path through the lattice represents a selection of one or more candidate words interpreting one or more corresponding utterances. An interface allows a user to select a path in the lattice. A selection of the path in the lattice may be received and the selection may be stored. The selection may be provided as positive feedback to the speech recognizer. | 10-16-2008 |
20080255836 | METHOD AND SYSTEM FOR A RECOGNITION SYSTEM HAVING A VERIFICATION RECOGNITION SYSTEM - A method and system for performing computer implemented recognition is disclosed. In one method embodiment, the present invention first accesses user input stored in a memory of a mobile device. On the mobile device, the present invention performs a coarse recognition process on the user input to generate a coarse result. The coarse process may operate in real-time. The embodiment then displays a portion of the coarse result on a display screen of the mobile device. The embodiment further performs a detailed recognition process on the user input to generate a detailed result. The detailed process has more recognition patterns and computing resources available to it. The present embodiment performs a comparison of the detailed result and the coarse result. The present embodiment displays a portion of the comparison on the display screen. | 10-16-2008 |
20080275699 | Systems and methods of performing speech recognition using global positioning (GPS) information - Embodiments of the present invention improve content selection systems and methods using speech recognition. In one embodiment, the present invention includes a speech recognition method comprising receiving location parameters from a global positioning system, retrieving location data using the location parameters, and configuring one or more recognition sets of a speech recognizer using the location data. | 11-06-2008 |
20080281590 | Method of Deriving a Set of Features for an Audio Input Signal - The invention describes a method of deriving a set of features (S) of an audio input signal (M), which method comprises identifying a number of first-order features (f | 11-13-2008 |
20080294431 | Displaying text of speech in synchronization with the speech - Displays a character string representing content of speech in synchronization with reproduction of the speech. An apparatus includes: a unit for obtaining scenario data representing the speech; a unit for dividing textual data resulting from recognition of the speech to generate pieces of recognition data; a unit for detecting in the scenario data a character matching each character contained in each piece of recognition data for which no matching character string has been detected, to detect in the scenario data a character string that matches the piece of recognition data; and a unit for setting the display timing of each character string contained in the scenario data to the timing at which the speech recognized as the piece of recognition data matching that character string is reproduced. | 11-27-2008 |
20080300870 | Method and Module for Improving Personal Speech Recognition Capability - A method and a module for improving personal speech recognition capability for use in a portable electronic device are provided. The portable electronic device has a predetermined recognition model constructed of a phoneme model for recognizing at least a command speech from a user. The method comprises the steps of: establishing a database having specific characters which are related to the command speech; constructing an adaptation parameter by retrieving a plurality of speech data spoken by the user according to the database; and modulating the recognition model by integrating the phoneme model and the adaptation parameter. The user can effectively adapt the recognition model to improve the recognition capability according to the above steps. | 12-04-2008 |
20080306735 | SYSTEMS AND METHODS FOR INDICATING PRESENCE OF DATA - Included are systems and methods for indicating presence of data. At least one embodiment of a method includes receiving communications data associated with a communication session and determining at least one point of audio silence in the communications session. Some embodiments include creating tagging data configured to indicate the at least one point of audio silence in the communications session. | 12-11-2008 |
20080319741 | SYSTEM AND METHOD FOR IMPROVING ROBUSTNESS OF SPEECH RECOGNITION USING VOCAL TRACT LENGTH NORMALIZATION CODEBOOKS - Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook. | 12-25-2008 |
20090006087 | SYNCHRONIZATION OF AN INPUT TEXT OF A SPEECH WITH A RECORDING OF THE SPEECH - A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data including a ratio between the respective pronunciation times of words included in the received text in the generated synthetic speech is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device. | 01-01-2009 |
20090012785 | SAMPLING RATE INDEPENDENT SPEECH RECOGNITION - A sampling-rate-independent method of automated speech recognition (ASR). Speech energies of a plurality of codebooks generated from training data created at an ASR sampling rate are compared to speech energies in a current frame of acoustic data generated from received audio created at an audio sampling rate below the ASR sampling rate. A codebook is selected from the plurality of codebooks, and has speech energies that correspond to speech energies in the current frame over a spectral range corresponding to the audio sampling rate. Speech energies above the spectral range are copied from the selected codebook and appended to the current frame. | 01-08-2009 |
20090018827 | Media usage monitoring and measurement system and method - Media monitoring and measurement systems and methods are disclosed. Some embodiments of the present invention provide a media measurement system and method that utilizes audience data to enhance content identifications. Some embodiments analyze media player log data to enhance content identification. Other embodiments of the present invention analyze sample sequence data to enhance content identifications. Other embodiments analyze sequence data to enhance content identification and/or to establish channel identification. Yet other embodiments provide a system and method in which sample construction and selection parameters are adjusted based upon identification results. Yet other embodiments provide a method in which play-altering activity of an audience member is deduced from content offset values of identifications corresponding to captured samples. Yet other embodiments provide a monitoring and measurement system in which a media monitoring device is adapted to receive a wireless or non-wireless audio signal from a media player, the audio signal also being received wirelessly by headphones of a user of the monitoring device. | 01-15-2009 |
20090024388 | METHOD AND APPARATUS FOR SEARCHING A MUSIC DATABASE - A method for a user to buy a song from a remote music source, the method comprising the steps of | 01-22-2009 |
20090043576 | SYSTEM AND METHOD FOR TUNING AND TESTING IN A SPEECH RECOGNITION SYSTEM - Systems and methods for improving the performance of a speech recognition system. In some embodiments a tuner module and/or a tester module are configured to cooperate with a speech recognition system. The tester and tuner modules can be configured to cooperate with each other. In one embodiment, the tuner module may include a module for playing back a selected portion of a digital data audio file, a module for creating and/or editing a transcript of the selected portion, and/or a module for displaying information associated with a decoding of the selected portion, the decoding generated by a speech recognition engine. In other embodiments, the tester module can include an editor for creating and/or modifying a grammar, a module for receiving a selected portion of a digital audio file and its corresponding transcript, and a scoring module for producing scoring statistics of the decoding based at least in part on the transcript. | 02-12-2009 |
20090076811 | Decision Analysis System - A decision analysis system ( | 03-19-2009 |
20090076812 | Media usage monitoring and measurement system and method - Media monitoring and measurement systems and methods are disclosed. Some embodiments of the present invention provide a media measurement system and method that utilizes audience data to enhance content identifications. Some embodiments analyze media player log data to enhance content identification. Other embodiments of the present invention analyze sample sequence data to enhance content identifications. Other embodiments analyze sequence data to enhance content identification and/or to establish channel identification. Yet other embodiments provide a system and method in which sample construction and selection parameters are adjusted based upon identification results. Yet other embodiments provide a method in which play-altering activity of an audience member is deduced from content offset values of identifications corresponding to captured samples. Yet other embodiments provide a monitoring and measurement system in which a media monitoring device is adapted to receive a wireless or non-wireless audio signal from a media player, the audio signal also being received wirelessly by headphones of a user of the monitoring device. | 03-19-2009 |
20090112583 | Language Processing System, Language Processing Method and Program - A language processing system, method, and program for obtaining text analysis results automatically and in a timely manner. The system comprises a plurality of text analysis units, each performing a different type of text analysis processing; an analysis order control unit controlling the order in which each of the text analysis units analyzes text; and an additional processing execution unit receiving, from a user, additional processing for the text analysis results from each of the text analysis units and executing it. At the stage at which a text analysis result from any one of the text analysis units is output and the additional processing execution unit operates, the analysis order control unit performs control to start text analysis processing for the other text analysis units. | 04-30-2009 |
20090157399 | APPARATUS AND METHOD FOR EVALUATING PERFORMANCE OF SPEECH RECOGNITION - An apparatus for evaluating the performance of speech recognition includes a speech database for storing N-number of test speech signals for evaluation. A speech recognizer is located in an actual environment and executes the speech recognition of the test speech signals reproduced using a loud speaker from the speech database in the actual environment to produce speech recognition results. A performance evaluation module evaluates the performance of the speech recognition by comparing correct recognition answers with the speech recognition results. | 06-18-2009 |
20090164213 | Digital Media Recognition Apparatus and Methods - One of the embodiments of the invention includes a method of identifying illegal uses of copyright material. The steps of the method preferably include the steps of: (a) providing a primary digital media object, (b) associating an auxiliary construct with the object, (c) transforming the construct using at least one of the attributes of the object to generate a unique key representative of the primary object, (d) receiving a plurality of secondary digital media objects, (e) performing steps (b) and (c) on the secondary objects to generate unique keys representative of the secondary objects, (f) comparing the keys of the secondary objects with the key of the primary object to identify if any of the secondary objects are substantially similar to the primary object. | 06-25-2009 |
20090204398 | Measurement of Spoken Language Training, Learning & Testing - The fluency of a spoken utterance or passage is measured and presented to the speaker and to others. In one embodiment, a method is described that includes recording a spoken utterance, evaluating the spoken utterance for accuracy, evaluating the spoken utterance for duration, and assigning a score to the spoken utterance based on the accuracy and the duration. | 08-13-2009 |
20090210223 | APPARATUS AND METHOD FOR SOUND RECOGNITION IN PORTABLE DEVICE - Provided are an apparatus and a method capable of recognizing a sound through a reduced burden of computations and a noise-tolerant technique. The sound recognition apparatus in a portable device includes a memory unit that stores at least one base sound and a sound input unit that receives a sound input. The sound recognition apparatus also includes a control unit that receives the sound input from the sound input unit, extracts peak values of the sound input, calculates statistical data by using the peak values, and determines whether the sound input is equal to a base sound by using the statistical data. | 08-20-2009 |
20090222262 | Systems And Methods For Blind Source Signal Separation - Signal separation techniques based on frequency dependency are described. In one implementation, a blind signal separation process is provided that avoids the permutation problem of previous signal separation processes. In the process, two or more signal sources are provided, with each signal source having recognized frequency dependencies. The process uses these inter-frequency dependencies to more robustly separate the source signals. The process receives a set of mixed signal input signals, and samples each input signal using a rolling window process. The sampled data is transformed into the frequency domain, which provides channel inputs to the inter-frequency dependent separation process. Since frequency dependencies have been defined for each source, the process is able to use the frequency dependency to more accurately separate the signals. The process can use a learning algorithm that preserves frequency dependencies within each source signal, and can remove dependencies between or among the signal sources. | 09-03-2009 |
20090281804 | PROCESSING UNIT, SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION METHOD, STORAGE MEDIUM STORING SPEECH RECOGNITION PROGRAM - A processing unit is provided which executes speech recognition on speech signals captured by a microphone for capturing sounds uttered in an environment. The processing unit has: an initial reflection component extraction portion that extracts initial reflection components by removing diffuse reverberation components from a reverberation pattern of an impulse response generated in the environment; and an acoustic model learning portion that learns an acoustic model for the speech recognition by reflecting the initial reflection components to speech data for learning. | 11-12-2009 |
20090287483 | METHOD AND SYSTEM FOR IMPROVED SPEECH RECOGNITION - A method for speech recognition includes: prompting a user with a first query to input speech into a speech recognition engine; determining if the inputted speech is correctly recognized; wherein in the event the inputted speech is correctly recognized proceeding to a new task; wherein in the event the inputted speech is not correctly recognized, prompting the user repeatedly with the first query to input speech into the speech recognition engine, and determining if the inputted speech is correctly recognized until a predefined limit on repetitions has been met; wherein in the event the predefined limit has been met without correctly recognizing the inputted user speech, prompting speech input from the user with a secondary query for redundant information; and cross-referencing the user's n-best result from the first query with the n-best result from the second query to obtain a top hypothesis. | 11-19-2009 |
20090287484 | System and Method for Targeted Tuning of a Speech Recognition System - A system and method of targeted tuning of a speech recognition system are disclosed. In a particular embodiment, a method includes determining a frequency of occurrence of a particular type of utterance and determining whether the frequency of occurrence exceeds a threshold. The method further includes tuning a speech recognition system to improve recognition of the particular type of utterance when the frequency of occurrence of the particular type of utterance exceeds the threshold. | 11-19-2009 |
20100023328 | Audio Recognition System - A system and method of identifying an audio track uses music identification software that produces a fingerprint or audio profile for an audio segment recorded with a portable communication device. The audio profile is transmitted from the portable communication device to a remote service provider over a communication network. The remote server receives the transmitted audio track profile and compares the profile to a stored database of audio tracks. If a matching audio track is identified by the remote server, metadata relating to the identified audio track is transmitted from the remote server to the portable communication device. The received audio track metadata is then displayed on the portable communication device. | 01-28-2010 |
20100036660 | Emotion Detection Device and Method for Use in Distributed Systems - A prosody analyzer enhances the interpretation of natural language utterances. The analyzer is distributed over a client/server architecture, so that the scope of emotion recognition processing tasks can be allocated on a dynamic basis based on processing resources, channel conditions, client loads etc. The partially processed prosodic data can be sent separately or combined with other speech data from the client device and streamed to a server for a real-time response. Training of the prosody analyzer with real world expected responses improves emotion modeling and the real-time identification of potential features such as emphasis, intent, attitude and semantic meaning in the speaker's utterances. | 02-11-2010 |
20100049513 | AUTOMATIC CONVERSATION SYSTEM AND CONVERSATION SCENARIO EDITING DEVICE - A conversation scenario editor generates/edits a conversation scenario for an automatic conversation system. The system includes a conversation device and a conversation server. The conversation device generates an input sentence through speech recognition of an utterance by a user. The conversation server determines the reply sentence based on the conversation scenario when a reply sentence to the input sentence is requested from the conversation device. The editor includes a language model generator for generating a language model to be used for the speech recognition based on the conversation scenario. According to the editor, a non-expert can generate the language model to provide an adequate conversation based on the speech recognition. | 02-25-2010 |
20100057450 | Hybrid Speech Recognition - A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results. | 03-04-2010 |
20100057451 | Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition. | 03-04-2010 |
20100063813 | SYSTEM AND METHOD FOR MULTIDIMENSIONAL GESTURE ANALYSIS - Hand gestures are translated by first detecting the hand gestures with an electronic sensor and converting the detected gestures into respective electrical transfer signals in a frequency band corresponding to that of speech. These transfer signals are inputted in the audible-sound frequency band into a speech-recognition system where they are analyzed. | 03-11-2010 |
20100070273 | SPEECH SYNTHESIS AND VOICE RECOGNITION IN METROLOGIC EQUIPMENT - An electronic test equipment apparatus is provided. A metrologic device is adapted for creating stimulus signals and capturing responses from electronic devices under test (DUTs). An auditory device is in communication with the metrologic device. The auditory device is adapted for converting an output of the metrologic device to an audio signal to be heard by a user. | 03-18-2010 |
20100106497 | INTERNAL AND EXTERNAL SPEECH RECOGNITION USE WITH A MOBILE COMMUNICATION FACILITY - In embodiments of the present invention improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criteria. | 04-29-2010 |
20100145693 | METHOD OF DECODING NONVERBAL CUES IN CROSS-CULTURAL INTERACTIONS AND LANGUAGE IMPAIRMENT - A method for extracting verbal cues is presented which enhances a speech signal to increase the saliency and recognition of verbal cues including emotive verbal cues. In a further embodiment of the method, the method works in conjunction with a computer that displays a face which gestures and articulates non-verbal cues in accord with speech patterns that are also modified to enhance their verbal cues. The methods work to provide a means for allowing non-fluent speakers to better understand and learn foreign languages. | 06-10-2010 |
20100169088 | AUTOMATED DEMOGRAPHIC ANALYSIS - A method of generating demographic information relating to an individual is provided. The method includes monitoring an environment for a voice activity of an individual and detecting the voice activity of the individual. The method further includes analyzing the detected voice activity of the individual and determining, based on the detected voice activity of the individual, a demographic descriptor of the individual. | 07-01-2010 |
20100179810 | Method for recognizing and distributing music - A customer for music distributed over the internet may select a composition from a menu of written identifiers (such as the song title and singer or group) and then confirm that the composition is indeed the one desired by listening to a corrupted version of the composition. If the customer has forgotten the song title or the singer or other words that provide the identifier, he or she may hum or otherwise vocalize a few bars of the desired composition, or pick the desired composition out on a simulated keyboard. A music-recognition system then locates candidates for the selected composition and displays identifiers for these candidates to the customer. | 07-15-2010 |
20100191528 | SPEECH SIGNAL PROCESSING APPARATUS - A speech signal processing apparatus comprising: a control signal output unit configured to receive as an input signal either one of a first speech signal corresponding to a sound uttered by a user and a second speech signal corresponding to a sound output from an eardrum of the user when the user utters a sound, and output a control signal corresponding to a noise level of the input signal; and a speech signal output unit configured to output either one of the first speech signal and the second speech signal according to the control signal. | 07-29-2010 |
20100191529 | Systems And Methods For Managing Multiple Grammars in a Speech Recognition System - Systems and methods are described for a speech system that manages multiple grammars from one or more speech-enabled applications. The speech system includes a speech server that supports different grammars and different types of grammars by exposing several methods to the speech-enabled applications. The speech server supports static grammars that do not change and dynamic grammars that may change after a commit. The speech server provides persistence by supporting persistent grammars that enable a user to issue a command to an application even when the application is not loaded. In such a circumstance, the application is automatically launched and the command is processed. The speech server may enable or disable a grammar in order to limit confusion between grammars. Global and yielding grammars are also supported by the speech server. Global grammars are always active (e.g., “call 9-1-1”) while yielding grammars may be deactivated when an interaction whose grammar requires priority is active. | 07-29-2010 |
20100198591 | PORTABLE TERMINAL AND MANAGEMENT SYSTEM - A portable terminal having an audio pickup means that acquires sound, an absolute position detection unit that detects the absolute position of the portable terminal, a relative position detection unit that detects the relative position of the portable terminal, and a speech recognition and synthesis unit that recognizes the audio acquired by the audio pickup means as speech, is achieved with a simple configuration. | 08-05-2010 |
20100217588 | APPARATUS AND METHOD FOR RECOGNIZING A CONTEXT OF AN OBJECT - Moving information of an object is input, and first sound information around the object is input. A motion status of the object is recognized based on the moving information. Second sound information is selectively extracted from the first sound information, based on the motion status. A first feature quantity is extracted from the second sound information. A plurality of models is stored in a memory. Each model has a second feature quantity and a corresponding specified context. The second feature quantity is previously extracted by the second extraction unit before the first feature quantity is extracted. A present context of the object is decided based on the specified context corresponding to the second feature quantity most similar to the first feature quantity. The present context of the object is output. | 08-26-2010 |
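The matching step this abstract describes (deciding the present context from the stored model whose feature quantity is most similar to the one just extracted) amounts to a nearest-neighbor lookup. A minimal sketch in Python; the feature values, context labels, and Euclidean distance are illustrative assumptions, not details from the patent:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def decide_context(first_feature, models):
    """Return the specified context of the stored model whose feature
    quantity is most similar to the extracted first feature quantity."""
    best = min(models, key=lambda m: euclidean(first_feature, m["feature"]))
    return best["context"]

# Invented example models: placeholder feature quantities and contexts.
models = [
    {"feature": [0.9, 0.1], "context": "walking"},
    {"feature": [0.1, 0.8], "context": "riding_train"},
]
print(decide_context([0.8, 0.2], models))  # -> walking
```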
20100235167 | SPEECH RECOGNITION LEARNING SYSTEM AND METHOD - One or more embodiments include a speech recognition learning system for improved speech recognition. The learning system may include a speech optimizing system. The optimizing system may receive a first stimulus data package including spoken utterances having at least one phoneme, and contextual information. A number of result data packages may be retrieved which include stored spoken utterances and contextual information. A determination may be made as to whether the first stimulus data package requires improvement. A second stimulus data package may be generated based on the determination. A number of speech recognition implementation rules for implementing the second stimulus data package may be received. The rules may be associated with the contextual information. A determination may be made as to whether the second stimulus data package requires further improvement. Based on the determination, one or more additional speech recognition implementation rules for improved speech recognition may be generated. | 09-16-2010 |
20100235168 | TERMINAL AND METHOD FOR EFFICIENT USE AND IDENTIFICATION OF PERIPHERALS HAVING AUDIO LINES - A communication system comprises a terminal configured for being able to communicate with a computer and to operate according to at least one operational parameter. A peripheral device for use with the terminal has a characterizing parameter associated therewith. The terminal is operable for reading the characterizing parameter from the peripheral device when the device is coupled to the terminal. The terminal is further operable for configuring itself to operate according to an operational parameter associated with the characterizing parameter of the peripheral device. | 09-16-2010 |
20100312555 | LOCAL AND REMOTE AGGREGATION OF FEEDBACK DATA FOR SPEECH RECOGNITION - A local feedback mechanism for customizing training models based on user data and directed user feedback is provided in speech recognition applications. The feedback data is filtered at different levels to address privacy concerns for local storage and for submittal to a system developer for enhancement of generic training models. | 12-09-2010 |
20100318353 | COMPRESSOR AUGMENTED ARRAY PROCESSING - The present invention relates generally to the use of compressors, with an optional noise extractor, to improve audio sensing performance of one or more microphones. The audio sensing performance of a single element microphone array with dynamic range compression can be improved by the use of a noise extractor, to modify the operation of the compressor, typically to avoid noise floor amplification. Dynamic range compression can be applied to the output of two or more element microphone array processing with the optional use of a noise extractor. Dynamic range compression can precede the microphone array processing with the optional use of a noise extractor. Syllabic dynamic range compression may be used in one or more element microphone arrays, with the optional use of a noise extractor, which increases speech recognition accuracy. | 12-16-2010 |
20100332224 | METHOD AND APPARATUS FOR CONVERTING TEXT TO AUDIO AND TACTILE OUTPUT - In accordance with an example embodiment of the present invention, an apparatus comprises a controller configured to process punctuated text data, and to identify punctuation in said punctuated text data; and an output unit configured to generate audio output corresponding to said punctuated text data, and to generate tactile output corresponding to said identified punctuation. | 12-30-2010 |
20110010170 | USE OF MULTIPLE SPEECH RECOGNITION SOFTWARE INSTANCES - A wireless communication device is disclosed that accepts recorded audio data from an end-user. The audio data can be in the form of a command requesting user action. Likewise, the audio data can be converted into a text file. The audio data is reduced to a digital file in a format that is supported by the device hardware, such as .wav or .mp3. | 01-13-2011 |
20110015924 | ACOUSTIC SOURCE SEPARATION - A method of separating a mixture of acoustic signals from a plurality of sources comprises: providing pressure signals indicative of time-varying acoustic pressure in the mixture; defining a series of time windows; and for each time window: a) providing from the pressure signals a series of sample values of measured directional pressure gradient; b) identifying different frequency components of the pressure signals c) for each frequency component defining an associated direction; and d) from the frequency components and their associated directions generating a separated signal for one of the sources. | 01-20-2011 |
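Step d) above, generating a separated signal from frequency components and their associated directions, can be approximated by keeping only the components whose direction falls near the target source. A toy sketch, assuming components are already reduced to (frequency, amplitude, direction) triples for one time window; the tolerance and all values are invented:

```python
def separate_source(components, source_angle_deg, tolerance_deg=20.0):
    """components: list of (freq_hz, amplitude, direction_deg) for one
    time window. Keep only the frequency components whose associated
    direction lies within `tolerance_deg` of the target source."""
    return [(f, a, d) for (f, a, d) in components
            if abs(d - source_angle_deg) <= tolerance_deg]

# Invented example: two sources at roughly 0 and 90 degrees.
window = [(200, 1.0, 2.0), (400, 0.5, 88.0), (800, 0.7, -5.0)]
print(separate_source(window, 0.0))  # components near 0 degrees
```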
20110022384 | WIND TURBINE CONTROL SYSTEM AND METHOD FOR INPUTTING COMMANDS TO A WIND TURBINE CONTROLLER - A method and a control system are provided for inputting commands to a wind turbine controller during a service or maintenance procedure. A command orally input by a user is transformed into an electrical signal representing the orally input command. The electrical signal is transformed into an input command signal which is further transformed into a reproduction signal. The user is provided with the reproduction signal along with a confirmation request in a form the user can recognize, such as a visual or spoken representation. After the user confirms the request, a signal based on the input command is sent to the wind turbine controller. | 01-27-2011 |
20110022385 | METHOD AND EQUIPMENT OF PATTERN RECOGNITION, ITS PROGRAM AND ITS RECORDING MEDIUM - The present invention provides a method and equipment of pattern recognition capable of efficiently pruning partial hypotheses without lowering recognition accuracy, its pattern recognition program, and its recording medium. In a second search unit, a likelihood calculation unit calculates an acoustic likelihood by matching time series data of acoustic feature parameters against a lexical tree stored in a second database and an acoustic model stored in a third database to determine an accumulated likelihood by accumulating the acoustic likelihood in a time direction. A self-transition unit causes each partial hypothesis to make a self-transition in a search process. An LR transition unit causes each partial hypothesis to make an LR transition. A reward attachment unit adds a reward R(x) in accordance with the number of reachable words to each partial hypothesis to raise the accumulated likelihood. A pruning unit excludes partial hypotheses with lower likelihood from search targets. | 01-27-2011 |
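The reward-then-prune idea in this entry (raise each partial hypothesis's accumulated likelihood by a reward that grows with the number of reachable words, then drop hypotheses too far below the best) can be sketched as a beam-pruning step. The reward function, beam width, and scores below are illustrative assumptions, not the patent's R(x):

```python
import math

def prune(partial_hypotheses, beam_width):
    """partial_hypotheses: dict name -> (accumulated_log_likelihood,
    n_reachable_words). A reward growing with the number of reachable
    words raises each accumulated likelihood; hypotheses scoring more
    than `beam_width` below the best are excluded from the search."""
    reward = lambda n: 0.5 * math.log(n)   # illustrative reward, not the patent's R(x)
    scored = {h: ll + reward(n) for h, (ll, n) in partial_hypotheses.items()}
    best = max(scored.values())
    return {h for h, s in scored.items() if s >= best - beam_width}

# Invented toy hypotheses: (log likelihood, reachable word count).
hyps = {"a": (-10.0, 100), "b": (-11.0, 1), "c": (-18.0, 50)}
print(prune(hyps, 4.0))  # "c" is pruned despite 50 reachable words
```

Note how "b", with only one reachable word, survives here because its raw likelihood is high: the reward only biases the comparison, it does not decide it.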
20110029306 | AUDIO SIGNAL DISCRIMINATING DEVICE AND METHOD - An audio discriminating device includes a plurality of audio discriminators for discriminating an input audio signal as a speech signal or a non-speech signal by using at least one feature parameter, and determines, based on each audio discriminator's result, whether to drive the audio discriminator connected next to it. | 02-03-2011 |
20110029307 | SYSTEM AND METHOD FOR MOBILE AUTOMATIC SPEECH RECOGNITION - A system and method of updating automatic speech recognition parameters on a mobile device are disclosed. The method comprises storing user account-specific adaptation data associated with ASR on a computing device associated with a wireless network, generating new ASR adaptation parameters based on transmitted information from the mobile device when a communication channel between the computing device and the mobile device becomes available and transmitting the new ASR adaptation data to the mobile device when a communication channel between the computing device and the mobile device becomes available. The new ASR adaptation data on the mobile device more accurately recognizes user utterances. | 02-03-2011 |
20110035215 | METHOD, DEVICE AND SYSTEM FOR SPEECH RECOGNITION - Disclosed is a method and apparatus for signal processing and signal pattern recognition. According to some embodiments of the present invention, events in the signal to be processed/recognized may be used to pace or clock the operation of one or more processing elements. The detected events may be based on signal energy level measurements. The processing/recognition elements may be neuron models. The signal to be processed/recognized may be a speech signal. | 02-10-2011 |
20110040559 | SYSTEMS, COMPUTER-IMPLEMENTED METHODS, AND TANGIBLE COMPUTER-READABLE STORAGE MEDIA FOR TRANSCRIPTION ALIGNMENT - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event. | 02-17-2011 |
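A simple way to realize the anchor-word selection described above is to treat words that occur exactly once in both the ASR output and the transcription as candidates, with stop-list words ineligible. This sketch uses exact string match as the similarity test, which is only one possible threshold; the word lists are invented:

```python
from collections import Counter

def anchor_words(asr_words, transcript_words, stop_list=frozenset()):
    """Candidate anchors: words occurring exactly once in both the ASR
    output and the transcript (exact match stands in for the similarity
    threshold), with common stop-list words ineligible."""
    a, t = Counter(asr_words), Counter(transcript_words)
    return [w for w in asr_words
            if a[w] == 1 and t[w] == 1 and w not in stop_list]

asr = "the cat sat on the mat".split()
ref = "a cat sat near the mat".split()
print(anchor_words(asr, ref, stop_list={"the", "a"}))  # -> ['cat', 'sat', 'mat']
```

Captions would then be generated by aligning the transcript against the ASR timestamps between successive anchor pairs.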
20110046948 | AUTOMATIC SOUND RECOGNITION BASED ON BINARY TIME FREQUENCY UNITS - The invention relates to a method of automatic sound recognition. The object of the present invention is to provide an alternative scheme for automatically recognizing sounds, e.g. human speech. The problem is solved by providing a training database comprising a number of models, each model representing a sound element in the form of a binary mask comprising binary time frequency (TF) units which indicate the energetic areas in time and frequency of the sound element in question, or of characteristic features or statistics extracted from the binary mask; providing an input signal comprising an input sound element; estimating the input sound element based on the models of the training database to provide an output sound element. The method has the advantage of being relatively simple and adaptable to the application in question. The invention may e.g. be used in devices comprising automatic sound recognition, e.g. for sound, e.g. voice control of a device, or in listening devices, e.g. hearing aids, for improving speech perception. | 02-24-2011 |
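Matching an input sound against stored binary time-frequency masks, as this entry describes, can be sketched as picking the model mask with the highest unit-by-unit agreement. The masks and the agreement measure below are invented for illustration; the patent also allows matching on features extracted from the masks:

```python
def mask_agreement(mask_a, mask_b):
    """Fraction of binary time-frequency units on which two masks agree."""
    units = [x == y
             for row_a, row_b in zip(mask_a, mask_b)
             for x, y in zip(row_a, row_b)]
    return sum(units) / len(units)

def recognize(input_mask, model_masks):
    """Pick the sound element whose stored binary mask best matches."""
    return max(model_masks,
               key=lambda name: mask_agreement(input_mask, model_masks[name]))

# Invented 2x3 masks: rows are frequency bands, columns time frames.
models = {
    "s": [[1, 1, 0], [0, 1, 0]],
    "a": [[0, 0, 1], [1, 0, 1]],
}
print(recognize([[1, 1, 0], [0, 0, 0]], models))  # closest to "s"
```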
20110046949 | DEVICE, METHOD AND SYSTEM FOR DETECTING UNWANTED CONVERSATIONAL MEDIA SESSION - Some embodiments of the invention relate to a method and a system for detecting unwanted conversational media session data. In accordance with one aspect of the invention, a method of detecting unwanted conversational media session data according to some embodiments of the invention may include calculating two or more progressive similarity scores each with respect to a different instant during a progress of a real-time conversational media session, wherein each of said scores is associated with a similarity between the conversational media session's media data that was available at the associated instant and a reference data item corresponding to media data of a previous conversational media session, and evaluating progressive similarity between the real-time conversational media session and the reference data item based upon the two or more progressive similarity scores. | 02-24-2011 |
20110054890 | APPARATUS AND METHOD FOR AUDIO MAPPING - A mobile phone, and corresponding method, which is arranged to detect sounds of different types and to indicate to a user the direction from which those sounds are coming. The mobile phone includes a microphone for recording sound and a display for providing feedback to the user. The phone also includes a sound mapping program which is arranged to interpret the sound recorded by the microphone and to provide an audio map of detected sounds. This is presented to the user on the display. | 03-03-2011 |
20110060586 | VOICE APPLICATION NETWORK PLATFORM - A distributed voice applications system includes a voice applications rendering agent and at least one voice applications agent that is configured to provide voice applications to an individual user. A management system may control and direct the voice applications rendering agent to create voice applications that are personalized for individual users based on user characteristics, information about the environment in which the voice applications will be performed, prior user interactions and other information. The voice applications agent and components of customized voice applications may be resident on a local user device which includes a voice browser and speech recognition capabilities. The local device, voice applications rendering agent and management system may be interconnected via a communications network. | 03-10-2011 |
20110071823 | SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION METHOD, AND STORAGE MEDIUM STORING PROGRAM FOR SPEECH RECOGNITION - The purpose is to suppress recognition-process delay caused by signal-processing load. | 03-24-2011 |
20110082694 | REAL-TIME DATA PATTERN ANALYSIS SYSTEM AND METHOD OF OPERATION THEREOF - A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units. | 04-07-2011 |
20110087490 | ADJUSTING RECORDER TIMING - A portion of audio content of a multimedia program, such as a television program, is captured from a network. An audio fingerprint is generated based on the portion of audio content, and the audio fingerprint is matched to one of multiple theme song fingerprints stored in a database. An expected theme song time offset associated with the matched theme song fingerprint is retrieved from the database. It is determined whether the program is running on-schedule, based on the time the portion of audio content occurred, a scheduled start time of the program, and/or the expected theme song time offset. If it is determined that the program is running off-schedule, an adjusted start time and/or an adjusted end time of the program are calculated. The program is recorded by a recorder based on the adjusted start time and/or the adjusted end time. | 04-14-2011 |
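The timing arithmetic in this entry reduces to a lag computation: compare when the theme song was actually heard with when it was expected, then shift the recording window by the difference. A minimal sketch; the second-based time values are invented:

```python
def adjusted_times(scheduled_start, scheduled_end,
                   observed_theme_time, expected_theme_offset):
    """All times in seconds. If the theme song fingerprint is matched
    later (or earlier) than its expected offset from the scheduled
    start, the program is off-schedule by that lag; shift both the
    recording start and end times accordingly."""
    lag = observed_theme_time - (scheduled_start + expected_theme_offset)
    return scheduled_start + lag, scheduled_end + lag

# Invented example: a 30-minute program whose theme song was expected
# 30 s in but was heard 150 s after the scheduled start.
print(adjusted_times(0, 1800, 150, 30))  # -> (120, 1920)
```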
20110125496 | SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND PROGRAM - A speech recognition device includes a sound source separation unit configured to separate a mixed signal of outputs of a plurality of sound sources into signals corresponding to individual sound sources and generate separation signals of a plurality of channels; a speech recognition unit configured to input the separation signals of the plurality of channels, the separation signals being generated by the sound source separation unit, perform a speech recognition process, generate a speech recognition result corresponding to each channel, and generate additional information serving as evaluation information on the speech recognition result corresponding to each channel; and a channel selection unit configured to input the speech recognition result and the additional information, calculate a score of the speech recognition result corresponding to each channel by applying the additional information, and select and output a speech recognition result having a high score. | 05-26-2011 |
20110131040 | MULTI-MODE SPEECH RECOGNITION - A method and an in-vehicle system having a speech recognition component are provided for improving speech recognition performance. The speech recognition component may have multiple vocabulary dictionaries, each of which may include phonetics associated with commands. When the in-vehicle system receives speech input, the speech recognition component may determine whether the received speech input includes a speech access command. If the received speech input is determined to include a speech access command, then a dictionary changing component may transition a currently-used dictionary of the speech recognition component to a vocabulary dictionary associated with the determined speech access command. Otherwise, the dictionary changing component may transition the currently-used dictionary to a first vocabulary dictionary. A command included in the received speech input may then be recognized by the speech recognition component using the transitioned currently-used dictionary. | 06-02-2011 |
20110137648 | SYSTEM AND METHOD FOR IMPROVED AUTOMATIC SPEECH RECOGNITION PERFORMANCE - Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. A scheduling algorithm can tailor a particular combination of speech recognition resources and release the supplemental speech recognizer based on increased demand. | 06-09-2011 |
20110137649 | METHOD FOR DYNAMIC SUPPRESSION OF SURROUNDING ACOUSTIC NOISE WHEN LISTENING TO ELECTRICAL INPUTS - A listening instrument includes a) a microphone unit for picking up an input sound from the current acoustic environment of the user and converting it to an electric microphone signal; b) a microphone gain unit for applying a specific microphone gain to the microphone signal and providing a modified microphone signal; c) a direct electric input signal representing an audio signal; d) a direct gain unit for applying a specific direct gain to the direct electric input signal and providing a modified direct electric input signal; e) a detector unit for classifying the current acoustic environment and providing one or more classification parameters; f) a control unit for controlling the specific microphone gain applied to the electric microphone signal and/or the specific direct gain applied to the direct electric input signal based on the one or more classification parameters. | 06-09-2011 |
20110161076 | Intuitive Computing Methods and Systems - A smart phone senses audio, imagery, and/or other stimulus from a user's environment, and acts autonomously to fulfill inferred or anticipated user desires. In one aspect, the detailed technology concerns phone-based cognition of a scene viewed by the phone's camera. The image processing tasks applied to the scene can be selected from among various alternatives by reference to resource costs, resource constraints, other stimulus information (e.g., audio), task substitutability, etc. The phone can apply more or less resources to an image processing task depending on how successfully the task is proceeding, or based on the user's apparent interest in the task. In some arrangements, data may be referred to the cloud for analysis, or for gleaning. Cognition, and identification of appropriate device response(s), can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed. | 06-30-2011 |
20110161077 | METHOD AND SYSTEM FOR PROCESSING MULTIPLE SPEECH RECOGNITION RESULTS FROM A SINGLE UTTERANCE - A method of and system for accurately determining a caller response by processing speech-recognition results and returning that result to a directed-dialog application for further interaction with the caller. Multiple speech-recognition engines are provided that process the caller response in parallel. Returned speech-recognition results comprising confidence-score values and word-score values from each of the speech-recognition engines may be modified based on context information provided by the directed-dialog application and grammars associated with each speech-recognition engine. An optional context database may be used to further reduce or add weight to confidence-score values and word-score values, remove phrases and/or words, and add phrases and/or words to the speech-recognition engine results. In situations where a predefined threshold-confidence-score value is not exceeded, a new dynamic grammar may be created. A set of n-best hypotheses of what the caller uttered is returned to the directed-dialog application. | 06-30-2011 |
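The score-adjustment step above (modifying confidence scores from parallel engines with context weights, then returning the n-best hypotheses) can be sketched as follows; the phrases, confidences, and boost values are invented, and pooling by max is only one plausible combination rule:

```python
def n_best(engine_results, context_boost, n=3):
    """engine_results: (phrase, confidence) pairs pooled from several
    recognition engines run in parallel. Context information adds (or
    subtracts) weight; the n best distinct phrases by adjusted score
    are returned."""
    scores = {}
    for phrase, conf in engine_results:
        adjusted = conf + context_boost.get(phrase, 0.0)
        scores[phrase] = max(scores.get(phrase, float("-inf")), adjusted)
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Invented results from two engines plus invented context weights.
results = [("pay my bill", 0.70), ("play my bill", 0.70), ("pay my bill", 0.60)]
boost = {"pay my bill": 0.15, "play my bill": -0.10}
print(n_best(results, boost, n=2))  # -> ['pay my bill', 'play my bill']
```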
20110166855 | Systems and Methods for Hands-free Voice Control and Voice Search - In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step. | 07-07-2011 |
20110202337 | Method and Discriminator for Classifying Different Segments of a Signal - For classifying different segments of a signal which has segments of at least a first type and second type, e.g. audio and speech segments, the signal is short-term classified on the basis of the at least one short-term feature extracted from the signal and a short-term classification result is delivered. The signal is also long-term classified on the basis of the at least one short-term feature and at least one long-term feature extracted from the signal and a long-term classification result is delivered. The short-term classification result and the long-term classification result are combined to provide an output signal indicating whether a segment of the signal is of the first type or of the second type. | 08-18-2011 |
20110202338 | SYSTEM AND METHOD FOR RECOGNITION OF ALPHANUMERIC PATTERNS INCLUDING LICENSE PLATE NUMBERS - Voice recognition technology is combined with external information sources and/or contextual information to enhance the quality of voice recognition results specifically for the use case of reading out or speaking an alphanumeric identifier. The alphanumeric identifier may be associated with a good, service, person, account, or other entity. For example, the identifier may be a vehicle license plate number. | 08-18-2011 |
20110202339 | SPEECH SOUND DETECTION APPARATUS - A speech sound detection apparatus receives an input audio signal (as a sound reception unit), and computes input power that indicates a magnitude of the sound represented by the audio signal (as an input power computation unit). The apparatus estimates a correction function that is a continuous function defining a relation between a certain frequency and a correction coefficient used to approximate the input power computed at that frequency to the reference power predetermined for that frequency (as a correction function estimation unit). The apparatus corrects the input power at every frequency, based upon the correction coefficient that is obtained in accordance with the relation defined by the estimated correction function (as an input power correcting unit). The apparatus further determines whether or not the sound represented by the received audio signal is speech sound, based upon the corrected input power (as a speech sound detection unit). | 08-18-2011 |
20110208519 | REAL-TIME DATA PATTERN ANALYSIS SYSTEM AND METHOD OF OPERATION THEREOF - A method of operation of a real-time data-pattern analysis system includes: providing a memory module, a computational unit, and an integrated data transfer module arranged within an integrated circuit die; storing a data pattern within the memory module; transferring the data pattern from the memory module to the computational unit using the integrated data transfer module; and comparing processed data to the data pattern using the computational unit. | 08-25-2011 |
20110224978 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM - An information processing device includes an audio-based speech recognition processing unit which is input with audio information as observation information of a real space, executes an audio-based speech recognition process, thereby generating word information that is determined to have a high probability of being spoken, an image-based speech recognition processing unit which is input with image information as observation information of the real space, analyzes mouth movements of each user included in the input image, thereby generating mouth movement information, an audio-image-combined speech recognition score calculating unit which is input with the word information and the mouth movement information and executes a score setting process in which a mouth movement close to the word information is set with a high score, and an information integration processing unit which is input with the score and executes a speaker specification process. | 09-15-2011 |
20110238415 | Hybrid Speech Recognition - A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results. | 09-29-2011 |
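The arbitration engine described here chooses between (or combines) client-side and server-side results. A minimal sketch of one plausible arbitration rule, preferring the higher-confidence result; the rule and all values are assumptions, since the abstract does not specify how arbitration works:

```python
def arbitrate(client_result, server_result):
    """Each result is a (text, confidence) pair, or None when that
    recognizer produced nothing. Simple rule: prefer the result with
    the higher confidence; a real engine may instead combine both."""
    candidates = [r for r in (client_result, server_result) if r is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[1])[0]

print(arbitrate(("call home", 0.62), ("call home now", 0.88)))  # -> call home now
```

Falling back to whichever side responded also handles the offline case, where only the client-side engine produces output.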
20110257969 | MAIL RECEIPT APPARATUS AND METHOD BASED ON VOICE RECOGNITION - A mail receipt method based on voice recognition, includes receiving input voice data required for a mail receipt; and recognizing information about the mail receipt from the received input voice data. Further, the mail receipt method based on the voice recognition includes storing the recognized information about the mail receipt to complete the mail receipt. | 10-20-2011 |
20110257970 | VOICED PROGRAMMING SYSTEM AND METHOD - A voiced programming system and methods are provided herein. | 10-20-2011 |
20110282661 | METHOD FOR SPEAKER SOURCE CLASSIFICATION - A method for classifying a pair of audio signals into an agent audio signal and a customer audio signal. One embodiment relates to unsupervised training, in which the training corpus comprises a multiplicity of audio signal pairs, wherein each pair comprises an agent signal and a customer signal, and wherein it is unknown for each signal if it is by the agent or by the customer. Training is based on the agent signals being more similar to one another than the customer signals. An agent cluster and a customer cluster are determined. The input signals are associated with the agent or the customer according to the higher score combination of the input signals and the clusters. | 11-17-2011 |
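The final classification step above (associating the two input signals with the agent and customer clusters according to the higher-scoring combination) can be sketched as comparing the two possible assignments. The scoring function here is an invented toy stand-in for similarity to the trained clusters:

```python
def classify_pair(signal_a, signal_b, score):
    """score(signal, cluster) -> similarity of a signal to the learned
    'agent' or 'customer' cluster. Choose the assignment of the two
    input signals whose combined score is higher."""
    if (score(signal_a, "agent") + score(signal_b, "customer")
            >= score(signal_b, "agent") + score(signal_a, "customer")):
        return {"agent": signal_a, "customer": signal_b}
    return {"agent": signal_b, "customer": signal_a}

# Invented toy scorer: pretend agent-like signals contain "script".
score = lambda sig, cluster: 1.0 if (cluster == "agent") == ("script" in sig) else 0.0
print(classify_pair("script greeting", "free speech", score))
```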
20110282662 | Customer Service Data Recording Device, Customer Service Data Recording Method, and Recording Medium - A customer service data recording device enables determining the correlation between customer satisfaction and employee satisfaction by means of a speech acquisition unit. | 11-17-2011 |
20110288859 | LANGUAGE CONTEXT SENSITIVE COMMAND SYSTEM AND METHOD - A system and method implements a command system in a speech recognition context in such a way as to enable a user to speak a voice command in a first spoken language to a computer that is operating an application in a second spoken language configuration. The command system identifies the first spoken language the user is speaking, recognizes the voice command, identifies the second spoken language of a target application, and selects the command action in the second spoken language that correlates to the voice command provided in the first spoken language. | 11-24-2011 |
20110295601 | SYSTEM AND METHOD FOR AUTOMATIC IDENTIFICATION OF SPEECH CODING SCHEME - Methods and systems for extracting speech from packet streams. The methods and systems analyze the encoded speech in a given packet stream, and automatically identify the actual speech coding scheme that was used to produce it. These techniques may be used, for example, in interception systems where the identity of the actual speech coding scheme is sometimes unavailable or inaccessible. For instance, the identity of the actual speech coding scheme may be sent in a separate signaling stream that is not intercepted. As another example, the identity of the actual speech coding scheme may be sent in the same packet stream as the encoded speech, but in encrypted form. | 12-01-2011 |
20110301949 | SPEAKER-CLUSTER DEPENDENT SPEAKER RECOGNITION (SPEAKER-TYPE AUTOMATED SPEECH RECOGNITION) - In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user. | 12-08-2011 |
20110301950 | SPEECH INPUT DEVICE, SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNITION METHOD - A device for speech input includes a speech input unit configured to convert a speech of a user to a speech signal; an angle detection unit configured to detect an angle of the speech input unit; a distance detection unit configured to detect a distance between the speech input unit and the user; and an input switch unit configured to control on and off of the speech input unit based on the angle and the distance. | 12-08-2011 |
20110307250 | Modular Speech Recognition Architecture - A speech recognition system is provided. The speech recognition system includes a speech recognition module; a plurality of domain specific dialog manager modules that communicate with the speech recognition module to perform speech recognition; and a speech interface module that communicates with the plurality of domain specific dialog manager modules to selectively enable the speech recognition. | 12-15-2011 |
20110307251 | Sound Source Separation Using Spatial Filtering and Regularization Phases - Described is a multiple phase process/system that combines spatial filtering with regularization to separate sound from different sources such as the speech of two different speakers. In a first phase, frequency domain signals corresponding to the sensed sounds are processed into separated spatially filtered signals including by inputting the signals into a plurality of beamformers (which may include nullformers) followed by nonlinear spatial filters. In a regularization phase, the separated spatially filtered signals are input into an independent component analysis mechanism that is configured with multi-tap filters, followed by secondary nonlinear spatial filters. Separated audio signals are then provided via an inverse-transform. | 12-15-2011 |
20110313762 | SPEECH OUTPUT WITH CONFIDENCE INDICATION - A method, system, and computer program product are provided for speech output with confidence indication. The method includes receiving a confidence score for segments of speech or text to be synthesized to speech. The method includes modifying a speech segment by altering one or more parameters of the speech proportionally to the confidence score. | 12-22-2011 |
20120016670 | METHODS AND APPARATUSES FOR IDENTIFYING AUDIBLE SAMPLES FOR USE IN A SPEECH RECOGNITION CAPABILITY OF A MOBILE DEVICE - Techniques are provided which may be implemented using various methods and/or apparatuses in a mobile device to allow for speech recognition based, at least in part, on context information associated with at least a portion of at least one navigational region, e.g., associated with a location of the mobile device. A speech recognition capability may, for example, be provided with a set of audible samples based, at least in part, on the context information. Such speech recognition capability may be provided by the mobile device and/or by one or more other devices coupled to the mobile device. | 01-19-2012 |
20120022862 | SPEECH RECOGNITION CIRCUIT AND METHOD - A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying the scores by adding the score updates to the scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to the scores. | 01-26-2012 |
20120029916 | METHOD FOR PROCESSING MULTICHANNEL ACOUSTIC SIGNAL, SYSTEM THEREFOR, AND PROGRAM - A method for processing multichannel acoustic signals which is characterized by calculating the feature quantity of each channel from the input signals of a plurality of channels, calculating similarity between the channels in the feature quantity of each channel, selecting channels having high similarity, and separating signals using the input signals of the selected channels. | 02-02-2012 |
20120035922 | METHOD AND APPARATUS FOR CONTROLLING WORD-SEPARATION DURING AUDIO PLAYOUT - A word-separation control capability is provided herein. An apparatus having a word-separation control capability includes a processor configured for controlling a length of separation between adjacent words of audio during playout of the audio. The processor is configured for analyzing a locator analysis region of buffered audio for identifying boundaries between adjacent words of the buffered audio, and, for each identified boundary between adjacent words, associating a boundary marker with the identified boundary. The locator analysis region of the buffered audio may be analyzed using syntactic and/or non-syntactic speech recognition capabilities. The boundary markers may all have the same thickness, or the thickness of the boundary markers may vary based on the length of separation between the adjacent words of the respective boundaries. The boundary markers are associated with the buffered audio for use in controlling the word-separation during the playout of the audio. | 02-09-2012 |
20120065968 | SPEECH RECOGNITION METHOD - In a speech recognition method, a number of audio signals are obtained from a voice input of a number of utterances of at least one speaker into a pickup system. The audio signals are examined using a speech recognition algorithm and a recognition result is obtained for each audio signal. For a reliable recognition of keywords in a conversation, it is proposed that a recognition result for at least one other audio signal is included in the examination of one of the audio signals by the speech recognition algorithm. | 03-15-2012 |
20120072211 | USING CODEC PARAMETERS FOR ENDPOINT DETECTION IN SPEECH RECOGNITION - Systems, methods and apparatus for determining an estimated endpoint of human speech in a sound wave received by a mobile device having a speech encoder for encoding the sound wave to produce an encoded representation of the sound wave. The estimated endpoint may be determined by analyzing information available from the speech encoder, without analyzing the sound wave directly and without producing a decoded representation of the sound wave. The encoded representation of the sound wave may be transmitted to a remote server for speech recognition processing, along with an indication of the estimated endpoint. | 03-22-2012 |
20120072212 | SYSTEM AND METHOD FOR MOBILE AUTOMATIC SPEECH RECOGNITION - A system and method of updating automatic speech recognition parameters on a mobile device are disclosed. The method comprises storing user account-specific adaptation data associated with ASR on a computing device associated with a wireless network, generating new ASR adaptation parameters based on transmitted information from the mobile device when a communication channel between the computing device and the mobile device becomes available and transmitting the new ASR adaptation data to the mobile device when a communication channel between the computing device and the mobile device becomes available. The new ASR adaptation data on the mobile device more accurately recognizes user utterances. | 03-22-2012 |
20120072213 | SPEECH SOUND INTELLIGIBILITY ASSESSMENT SYSTEM, AND METHOD AND PROGRAM THEREFOR - The speech sound intelligibility assessment system includes: an output section for presenting a speech sound to a user; a biological signal measurement section for measuring an electroencephalogram signal of the user; a positive component determination section for determining presence/absence of a positive component of an event-related potential in the electroencephalogram signal in a zone from 600 ms to 800 ms from a starting point, which is a point in time at which the output section presents a speech sound; a negative component determination section for determining presence/absence of a negative component of an event-related potential in the electroencephalogram signal in a zone from 100 ms to 300 ms from the same starting point; and an assessment section for evaluating whether the user has clearly aurally comprehended the presented speech sound or not based on the results of determination of presence/absence of the positive and negative components, respectively. | 03-22-2012 |
20120078621 | SPARSE REPRESENTATION FEATURES FOR SPEECH RECOGNITION - Techniques are disclosed for generating and using sparse representation features to improve speech recognition performance. In particular, principles of the invention provide sparse representation exemplar-based recognition techniques. For example, a method comprises the following steps. A test vector and a training data set associated with a speech recognition system are obtained. A subset of the training data set is selected. The test vector is mapped with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint. An acoustic model is trained on the new test feature set. | 03-29-2012 |
20120078622 | SPOKEN DIALOGUE APPARATUS, SPOKEN DIALOGUE METHOD AND COMPUTER PROGRAM PRODUCT FOR SPOKEN DIALOGUE - According to one embodiment, a spoken dialogue apparatus includes a detection unit configured to detect speech of a user; a recognition unit configured to recognize the speech; an output unit configured to output a response voice corresponding to the result of speech recognition; an estimate unit configured to estimate probability variation of a barge-in utterance, the probability variation of the barge-in utterance being the time variation of the probability that the user will interrupt with a barge-in utterance while the response voice is being output; and a control unit configured to determine whether to adopt the barge-in utterance based on the probability variation of the barge-in utterance. | 03-29-2012 |
20120078623 | Method and Apparatus for Communication Between Humans and Devices - This invention relates to methods and apparatus for improving communications between humans and devices. The invention provides a method of modulating operation of a device, comprising: providing an attentive user interface for obtaining information about an attentive state of a user; and modulating operation of a device on the basis of the obtained information, wherein the operation that is modulated is initiated by the device. Preferably, the information about the user's attentive state is eye contact of the user with the device that is sensed by the attentive user interface. | 03-29-2012 |
20120089392 | SPEECH RECOGNITION USER INTERFACE - Speech recognition techniques are disclosed herein. In one embodiment, a novice mode is available such that when the user is unfamiliar with the speech recognition system, a voice user interface (VUI) may be provided to guide them. The VUI may display one or more speech commands that are presently available. The VUI may also provide feedback to train the user. After the user becomes more familiar with speech recognition, the user may enter speech commands without the aid of the novice mode. In this “experienced mode,” the VUI need not be displayed. Therefore, the user interface is not cluttered. | 04-12-2012 |
20120089393 | ACOUSTIC SIGNAL PROCESSING DEVICE AND METHOD - A highlight section including an exciting scene is appropriately extracted with smaller amount of processing. A reflection coefficient calculating unit ( | 04-12-2012 |
20120101817 | SYSTEM AND METHOD FOR GENERATING MODELS FOR USE IN AUTOMATIC SPEECH RECOGNITION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a model for use with automatic speech recognition. These principles can be implemented as part of a streamlined tool for automatic training and tuning of speech, or other, models with a fast turnaround and with limited human involvement. A system configured to practice the method receives, as part of a request to generate a model, input data and a seed model. The system receives a cost function indicating accuracy and at least one of speed and memory usage. The system processes the input data based on the seed model and based on parameters that optimize the cost function to yield an updated model, and outputs the updated model. | 04-26-2012 |
20120101818 | DEVICE AND METHOD FOR CREATING DATA RECORDS IN A DATA-STORE BASED ON MESSAGES - Updating a data-store associated with an electronic communications device includes wirelessly communicating an electronic message. A location identifier representative of a physical location is identified within the electronic message. The physical location of the electronic communications device is measured or estimated as needed, after which validating the location identifier occurs when the measured or estimated physical location is calculated to be within a threshold distance of the physical location represented by the location identifier. Initiating creation of a new data record in the data-store is then performed, with the new data record storing at least the validated location identifier and a time identifier. | 04-26-2012 |
20120130710 | ONLINE DISTORTED SPEECH ESTIMATION WITHIN AN UNSCENTED TRANSFORMATION FRAMEWORK - Noise and channel distortion parameters in the vectorized logarithmic or the cepstral domain for an utterance may be estimated, and subsequently the distorted speech parameters in the same domain may be updated using an unscented transformation framework during online automatic speech recognition. An utterance, including speech generated from a transmission source for delivery to a receiver, may be received by a computing device. The computing device may execute instructions for applying the unscented transformation framework to speech feature vectors, representative of the speech, in order to estimate, in a sequential or online manner, static noise and channel distortion parameters and dynamic noise distortion parameters in the unscented transformation framework. The static and dynamic parameters for the distorted speech in the utterance may then be updated from clean speech parameters and the noise and channel distortion parameters using non-linear mapping. | 05-24-2012 |
20120130711 | SPEECH DETERMINATION APPARATUS AND SPEECH DETERMINATION METHOD - A signal portion per frame is extracted from an input signal, thus generating a per-frame signal. The per-frame signal in the time domain is converted into a per-frame signal in the frequency domain, thereby generating a spectral pattern of spectra. It is determined whether an energy ratio is higher than a threshold level. The energy ratio is a ratio of each spectral energy to subband energy in a subband that involves the spectrum. The subband is involved in subbands into which a frequency band is separated with a specific bandwidth. It is determined whether the per-frame signal is a speech segment, based on a result of the determination. Average energy is derived in the frequency direction for the spectra in the spectral pattern in each subband. Subband energy is derived per subband by averaging the average energy in the time domain. | 05-24-2012 |
20120130712 | MOBILE TERMINAL AND MENU CONTROL METHOD THEREOF - A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal. | 05-24-2012 |
20120136658 | SYSTEMS AND METHODS FOR CUSTOMIZING BROADBAND CONTENT BASED UPON PASSIVE PRESENCE DETECTION OF USERS - Systems and methods for customizing broadband content based upon passive presence detection of users are provided. A sample of ambient audio may be collected by a customer premise device configured to output programming content received from a service provider. One or more audio components associated with the output of the customer premise device may be removed. Following the removal, a remainder of the collected sample may be compared to one or more stored user voice samples. Based at least in part on the comparison, one of an identity of a user or one or more user characteristics may be determined. Based at least in part on the determination, the content output by the customer premise device may be customized. | 05-31-2012 |
20120136659 | APPARATUS AND METHOD FOR PREPROCESSING SPEECH SIGNALS - Disclosed herein are an apparatus and method for preprocessing speech signals to perform speech recognition. The apparatus includes a voiced sound interval detection unit, a preprocessing method determination unit, and a clipping signal processing unit. The voiced sound interval detection unit detects a voiced sound interval including a voiced sound signal in a voice interval. The preprocessing method determination unit detects a clipping signal present in the voiced sound interval. The clipping signal processing unit extracts signal samples adjacent to the clipping signal, and performs interpolation on the clipping signal using the adjacent signal samples. | 05-31-2012 |
20120150536 | MODEL RESTRUCTURING FOR CLIENT AND SERVER BASED AUTOMATIC SPEECH RECOGNITION - Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components N | 06-14-2012 |
20120173232 | ACOUSTIC PROCESSING APPARATUS AND METHOD - An acoustic processing apparatus is provided. The acoustic processing apparatus including a first extracting unit configured to extract a first acoustic model that corresponds with a first position among positions set in a speech recognition target area, a second extracting unit configured to extract at least one second acoustic model that corresponds with, respectively, at least one second position in proximity to the first position, and an acoustic model generating unit configured to generate a third acoustic model based on the first acoustic model, the second acoustic model, or a combination thereof. | 07-05-2012 |
20120173233 | COMMUNICATION METHOD AND APPARATUS FOR PHONE HAVING VOICE RECOGNITION FUNCTION - A method and apparatus for communicating through a phone having a voice recognition function are provided. The method of performing communication using a phone having a voice recognition function includes converting to an incoming call notification and voice recognition mode when a phone call is received; converting to a communication connection and speakerphone mode when voice information related to a communication connection instruction is recognized; performing communication using a speakerphone; and ending communication when voice information related to a communication end instruction is recognized during communication using the speakerphone. Therefore, when a phone call is received, a mode of a phone is converted to a speakerphone mode with a voice instruction using a voice recognition function, and thus communication can be performed without using a hand. | 07-05-2012 |
20120179463 | CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture. | 07-12-2012 |
20120179464 | CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture. | 07-12-2012 |
20120191448 | SPEECH RECOGNITION USING DOCK CONTEXT - Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data. | 07-26-2012 |
20120191449 | SPEECH RECOGNITION USING DOCK CONTEXT - Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data. | 07-26-2012 |
20120215531 | Increased User Interface Responsiveness for System with Multi-Modal Input and High Response Latencies - A multi-modal user interface with increased responsiveness is described. A graphical user interface (GUI) supports multiple different user input modalities including low delay inputs which respond to user inputs without significant delay, and high latency inputs which have a significant response latency after receiving a user input before providing a corresponding completed response. The GUI accepts user inputs in a sequence of mixed input modalities independently of response latencies without waiting for responses to high latency inputs. The GUI also provides interim indication during response latencies of pending responses at a position in the GUI where the completed response will be presented. | 08-23-2012 |
20120226497 | SOUND RECOGNITION METHOD AND SYSTEM - A method for generating an anti-model of a sound class is disclosed. A plurality of candidate sound data is provided for generating the anti-model. A plurality of similarity values between the plurality of candidate sound data and a reference sound model of a sound class is determined. An anti-model of the sound class is generated based on at least one candidate sound data having the similarity value within a similarity threshold range. | 09-06-2012 |
20120232891 | SPEECH COMMUNICATION SYSTEM AND METHOD, AND ROBOT APPARATUS - This invention realizes a speech communication system and method, and a robot apparatus capable of significantly improving entertainment property. A speech communication system with a function to make conversation with a conversation partner is provided with a speech recognition means for recognizing speech of the conversation partner, a conversation control means for controlling conversation with the conversation partner based on the recognition result of the speech recognition means, an image recognition means for recognizing the face of the conversation partner, and a tracking control means for tracing the existence of the conversation partner based on one or both of the recognition result of the image recognition means and the recognition result of the speech recognition means. The conversation control means controls conversation so as to continue depending on tracking of the tracking control means. | 09-13-2012 |
20120232892 | SYSTEM AND METHOD FOR ISOLATING AND PROCESSING COMMON DIALOG CUES - A method, system and machine-readable medium are provided. Speech input is received at a speech recognition component and recognized output is produced. A common dialog cue from the received speech input or input from a second source is recognized. An action is performed corresponding to the recognized common dialog cue. The performed action includes sending a communication from the speech recognition component to the speech generation component while bypassing a dialog component. | 09-13-2012 |
20120232893 | MULTI-LAYERED SPEECH RECOGNITION APPARATUS AND METHOD - A multi-layered speech recognition apparatus and method, the apparatus includes a client checking whether the client recognizes the speech using a characteristic of speech to be recognized and recognizing the speech or transmitting the characteristic of the speech according to a checked result; and first through N-th servers, wherein the first server checks whether the first server recognizes the speech using the characteristic of the speech transmitted from the client, and recognizes the speech or transmits the characteristic according to a checked result, and wherein an n-th (2≦n≦N) server checks whether the n-th server recognizes the speech using the characteristic of the speech transmitted from an (n−1)-th server, and recognizes the speech or transmits the characteristic according to a checked result. | 09-13-2012 |
20120232894 | DEVICE FOR RECONSTRUCTING SPEECH BY ULTRASONICALLY PROBING THE VOCAL APPARATUS - The invention provides a portable device for recognizing and/or reconstructing speech by ultrasound probing of the vocal apparatus, the device including at least one ultrasound transducer ( | 09-13-2012 |
20120239393 | MULTIPLE AUDIO/VIDEO DATA STREAM SIMULATION - A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams. | 09-20-2012 |
20120245932 | VOICE RECOGNITION APPARATUS - According to one embodiment, a voice recognition apparatus includes a determination unit, an estimating unit, and a voice recognition unit. The determination unit determines whether a component with a frequency of not less than 1000 Hz and with a level not lower than a predetermined level is included in a sound input from a plurality of microphones. The estimating unit estimates a sound source direction of the sound when the determination unit determines that the component is included in the sound. The voice recognition unit recognizes whether the sound obtained in the sound source direction coincides with a voice model registered beforehand. | 09-27-2012 |
20120253799 | SYSTEM AND METHOD FOR RAPID CUSTOMIZATION OF SPEECH RECOGNITION MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data. | 10-04-2012 |
20120253800 | System and Method for Modifying and Updating a Speech Recognition Program - The system provides a speech recognition program, an update website for updating a speech recognition program, and a way of storing data. A user may utilize an update website to add, modify, and delete items that may comprise speech commands, dll's, multimedia files, executable code, and other information. The speech recognition program may communicate with the update website to request information about possible updates. The update website may send a response consisting of information to the speech recognition program. The speech recognition program may utilize received information to decide what items to download. A speech recognition program may send one or more requests to the update website to download items. The update website may respond by transmitting requested items to a speech recognition program that overwrites existing items with newly received items. | 10-04-2012 |
20120259627 | Efficient Exploitation of Model Complementariness by Low Confidence Re-Scoring in Automatic Speech Recognition - A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result. | 10-11-2012 |
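The two-pass scheme in the entry above lends itself to a short sketch. This is a minimal illustration, not the patented method: the recognizers, the confusion table, and the 0.5 threshold are all invented for the example.

```python
# Sketch of two-pass low-confidence re-scoring: words from an initial
# recognition pass whose per-word reliability falls below a threshold are
# handed to a complementary re-evaluation recognizer. Names are illustrative.

def rescore(words, reliabilities, reevaluate, threshold=0.5):
    """Return the final result: re-evaluated words where reliability is low,
    the initial recognition result everywhere else."""
    final = []
    for word, score in zip(words, reliabilities):
        if score < threshold:
            final.append(reevaluate(word))  # second, complementary pass
        else:
            final.append(word)
    return final

# Hypothetical re-evaluation recognizer correcting a known confusion pair.
corrections = {"wreck a nice beach": "recognize speech"}
final = rescore(["wreck a nice beach", "today"], [0.2, 0.9],
                lambda w: corrections.get(w, w))
# → ["recognize speech", "today"]
```

The key property the patent relies on is that the two recognizers are complementary, so the second pass is only worth invoking where the first pass is unreliable.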
20120284022 | NOISE REDUCTION SYSTEM USING A SENSOR BASED SPEECH DETECTOR - Speech detection is a technique to determine and classify periods of speech. In a normal conversation, each speaker speaks less than half the time; the remaining time is devoted to listening to the other end and to pauses between speech and silence. Embodiments of the current invention provide systems and methods that may be implemented in a communication device. A system may include one or more sensors for detecting information corresponding to a user who is in a state of verbal communication. The system further includes one or more sensors for determining periods of speech and non-speech in the verbal communication, based on the detected information and the audio signal captured by the microphones. The determined periods of speech and non-speech may be used in the coding, compression, noise reduction, and other aspects of signal processing. | 11-08-2012 |
20120296644 | Hybrid Speech Recognition - A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results. | 11-22-2012 |
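The arbitration step in the hybrid entry above can be sketched as a confidence comparison. This rule is an assumption for illustration only; the patent does not specify its arbitration logic, and the result pairs here are hypothetical.

```python
# Sketch of an arbitration engine combining client-side and server-side
# recognition results for the same speech. Each result is a
# (text, confidence) pair; either may be None (e.g. server unreachable).

def arbitrate(client_result, server_result):
    """Return the chosen transcription from one or both results."""
    if client_result is None:
        return server_result[0]
    if server_result is None:
        return client_result[0]
    # Assumed rule: prefer whichever engine reports higher confidence.
    return max(client_result, server_result, key=lambda r: r[1])[0]

chosen = arbitrate(("call tom", 0.4), ("call mom", 0.8))  # → "call mom"
```

A fallback to whichever engine responded at all lets the client keep working offline, which is the usual motivation for the hybrid arrangement.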
20120296645 | Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition. | 11-22-2012 |
20120303365 | Audio Signal De-Identification - Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio. | 11-29-2012 |
20120316870 | COMMUNICATION DEVICE WITH SPEECH RECOGNITION AND METHOD THEREOF - A communication unit, a voice input unit, a storage unit, and a processor are included in a communication device. The communication unit enables communication between the device and other communication devices. The voice input unit receives voice signals, each of which may correspond to a stored speech command and a related operation. The processor detects a match and executes the desired operation. A related communication method is also provided. | 12-13-2012 |
20120316871 | Speech Recognition Using Loosely Coupled Components - An automatic speech recognition system includes an audio capture component, a speech recognition processing component, and a result processing component which are distributed among two or more logical devices and/or two or more physical devices. In particular, the audio capture component may be located on a different logical device and/or physical device from the result processing component. For example, the audio capture component may be on a computer connected to a microphone into which a user speaks, while the result processing component may be on a terminal server which receives speech recognition results from a speech recognition processing server. | 12-13-2012 |
20120330654 | IDENTIFYING AND GENERATING AUDIO COHORTS BASED ON AUDIO DATA INPUT - A computer implemented method, system, and/or computer program product generates an audio cohort. Audio data from a set of audio sensors is received by an audio analysis engine. The audio data, which is associated with a plurality of objects, comprises a set of audio patterns. The audio data is processed to identify audio attributes associated with the plurality of objects to form digital audio data. This digital audio data comprises metadata that describes the audio attributes of the set of objects. A set of audio cohorts is generated using the audio attributes associated with the digital audio data and cohort criteria, where each audio cohort in the set of audio cohorts is a cohort of accompanied customers in a store, and where processing the audio data identifies a type of zoological creature that is accompanying each of the accompanied customers. | 12-27-2012 |
20130006620 | SYSTEM AND METHOD FOR PROVIDING NETWORK COORDINATED CONVERSATIONAL SERVICES - A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service. | 01-03-2013 |
20130006621 | CONTEXT-BASED GRAMMARS FOR AUTOMATED SPEECH RECOGNITION - Methods, apparatus, and computer program products for providing a context-based grammar for automatic speech recognition, including creating by a multimodal application a context, the context comprising words associated with user activity in the multimodal application, and supplementing by the multimodal application a grammar for automatic speech recognition in dependence upon the context. | 01-03-2013 |
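Grammar supplementation as described in the entry above can be illustrated with a small sketch; the word sets, names, and lower-casing rule are assumptions, not the patented implementation.

```python
# Sketch of supplementing an ASR grammar with context words drawn from
# recent user activity in a multimodal application. All values are
# illustrative placeholders.

base_grammar = {"open", "close", "help"}

def supplement_grammar(grammar, context_words):
    """Return a new grammar extended with words from the activity context."""
    return grammar | {w.lower() for w in context_words}

context = ["Invoice", "March", "Totals"]   # words from recent user activity
active_grammar = supplement_grammar(base_grammar, context)
# active_grammar now accepts "invoice", "march", "totals" as well
```

Returning a new set rather than mutating the base grammar keeps the static command vocabulary intact when the context changes.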
20130030802 | MAINTAINING AND SUPPLYING SPEECH MODELS - Maintaining and supplying a plurality of speech models is provided. A plurality of speech models and metadata for each speech model are stored. A query for a speech model is received from a source. The query includes one or more conditions. The speech model with metadata most closely matching the supplied one or more conditions is determined. The determined speech model is provided to the source. A refined speech model is received from the source, and the refined speech model is stored. | 01-31-2013 |
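The query-by-metadata step in the entry above can be sketched as follows; scoring a model by the count of matching key/value conditions is an assumed reading of "most closely matching", and the model names are hypothetical.

```python
# Sketch of a speech-model store queried by metadata conditions.
# The matching rule (count of satisfied key/value conditions) is an
# assumption for illustration.

class ModelStore:
    def __init__(self):
        self.models = []  # list of (model, metadata) pairs

    def add(self, model, metadata):
        self.models.append((model, metadata))

    def query(self, conditions):
        """Return the model whose metadata satisfies the most conditions."""
        def score(entry):
            _, meta = entry
            return sum(1 for k, v in conditions.items() if meta.get(k) == v)
        return max(self.models, key=score)[0]

store = ModelStore()
store.add("model_a", {"lang": "en", "domain": "medical"})
store.add("model_b", {"lang": "en", "domain": "legal"})
best = store.query({"lang": "en", "domain": "legal"})  # → "model_b"
```

The entry's refinement loop would then store an improved model back with `add`, making it a candidate for later queries.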
20130060566 | SPEECH COMMUNICATION SYSTEM AND METHOD, AND ROBOT APPARATUS - This invention realizes a speech communication system and method, and a robot apparatus, capable of significantly improving their entertainment value. A speech communication system with a function to hold a conversation with a conversation partner is provided with a speech recognition means for recognizing speech of the conversation partner, a conversation control means for controlling the conversation with the conversation partner based on the recognition result of the speech recognition means, an image recognition means for recognizing the face of the conversation partner, and a tracking control means for tracking the presence of the conversation partner based on one or both of the recognition result of the image recognition means and the recognition result of the speech recognition means. The conversation control means controls the conversation so that it continues in accordance with the tracking performed by the tracking control means. | 03-07-2013 |
20130066629 | Speech & Music Discriminator for Multi-Media Applications - The present invention relates to means and methods of classifying speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music. | 03-14-2013 |
20130080159 | DETECTION OF CREATIVE WORKS ON BROADCAST MEDIA - This disclosure relates to systems and methods for proactively determining identification information for a plurality of audio segments within a plurality of broadcast media streams, and providing identification information associated with specific audio portions of a broadcast media stream automatically or upon request. | 03-28-2013 |
20130080160 | DOCUMENT READING-OUT SUPPORT APPARATUS AND METHOD - According to one embodiment, a document reading-out support apparatus is provided with first to third acquisition units, an extraction unit, a decision unit and a user verification unit. The first acquisition unit acquires a document having texts. The second acquisition unit acquires metadata having definitions each of which includes an applicable condition and a reading-out style. The extraction unit extracts features of the document. The third acquisition unit acquires execution environment information. The decision unit decides candidates of parameters of reading-out based on the features and the information. The user verification unit presents the candidates and accepts a verification instruction. | 03-28-2013 |
20130080161 | SPEECH RECOGNITION APPARATUS AND METHOD - According to one embodiment, a speech recognition apparatus includes the following units. The service estimation unit estimates a service being performed by a user, by using non-speech information, and generates service information. The speech recognition unit performs speech recognition on speech information in accordance with a speech recognition technique corresponding to the service information. The feature quantity extraction unit extracts a feature quantity related to the service of the user, from the speech recognition result. The service estimation unit re-estimates the service by using the feature quantity. The speech recognition unit performs speech recognition based on the re-estimation result. | 03-28-2013 |
20130090923 | Framework For User-Created Device Applications - A method to provide an interface for launching applications is described. The method includes receiving information indicative of a record stored in an electronic device application. The method also includes determining whether the record is associated with a software application command. In response to determining that the record is associated with a software application command, the software application command is activated. Apparatus and computer readable media are also described. | 04-11-2013 |
20130138435 | CHARACTER-BASED AUTOMATED SHOT SUMMARIZATION - Methods, devices, systems and tools are presented that allow the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms. High-content media files are shortened in a manner that preserves important details, by splitting the files into segments, rating the segments, and reassembling preferred segments into a final abridged piece. Summarization of media can be customized by user selection of criteria, and opens new possibilities for delivering entertainment, news, and information in the form of dense, information-rich content that can be viewed by means of broadcast or cable distribution, “on-demand” distribution, internet and cell phone digital video streaming, or can be downloaded onto an iPod™ and other portable video playback devices. | 05-30-2013 |
20130151249 | INFORMATION PRESENTATION DEVICE, INFORMATION PRESENTATION METHOD, INFORMATION PRESENTATION PROGRAM, AND INFORMATION TRANSMISSION SYSTEM - An information presentation device includes an audio signal input unit configured to input an audio signal, an image signal input unit configured to input an image signal, an image display unit configured to display an image indicated by the image signal, a sound source localization unit configured to estimate direction information for each sound source based on the audio signal, a sound source separation unit configured to separate the audio signal to sound-source-classified audio signals for each sound source, an operation input unit configured to receive an operation input and generates coordinate designation information indicating a part of a region of the image, and a sound source selection unit configured to select a sound-source-classified audio signal of a sound source associated with a coordinate which is included in a region indicated by the coordinate designation information, and which corresponds to the direction information. | 06-13-2013 |
20130166290 | VOICE RECOGNITION APPARATUS - A voice recognition apparatus includes a command recognizer and a data recognizer. The command recognizer recognizes a command portion of a voice input and outputs a command based on a voice recognition result of the voice input. The data recognizer recognizes a data portion of a voice input and outputs data based on a voice recognition result of the voice input. The data recognizer further includes a plurality of data-category recognizers, each using a data-category dictionary for recognizing the data portion of the voice input and outputting a data result. A voice recognition result selection unit of the voice recognition apparatus selects one of the data results from the data-category recognizers based on the command recognized by the command recognizer. | 06-27-2013 |
20130173264 | METHODS, APPARATUSES AND COMPUTER PROGRAM PRODUCTS FOR IMPLEMENTING AUTOMATIC SPEECH RECOGNITION AND SENTIMENT DETECTION ON A DEVICE - An apparatus for utilizing textual data and acoustic data corresponding to speech data to detect sentiment may include a processor and memory storing executable computer code causing the apparatus to at least perform operations including evaluating textual data and acoustic data corresponding to voice data associated with captured speech content. The computer program code may further cause the apparatus to analyze the textual data and the acoustic data to detect whether the textual data or the acoustic data includes one or more words indicating at least one sentiment of a user that spoke the speech content. The computer program code may further cause the apparatus to assign at least one predefined sentiment to at least one of the words in response to detecting that the word(s) indicates the sentiment of the user. Corresponding methods and computer program products are also provided. | 07-04-2013 |
20130179162 | TOUCH FREE OPERATION OF DEVICES BY USE OF DEPTH SENSORS - An inventive system and method for touch free operation of a device is presented. The system can comprise a depth sensor for detecting a movement, motion software to receive the detected movement from the depth sensor, deduce a gesture based on the detected movement, and filter the gesture to accept an applicable gesture, and client software to receive the applicable gesture at a client computer for performing a task in accordance with client logic based on the applicable gesture. The client can be a mapping device and the task can be one of various mapping operations. The system can also comprise hardware for making the detected movement an applicable gesture. The system can also comprise voice recognition providing voice input for enabling the client to perform the task based on the voice input in conjunction with the applicable gesture. The applicable gesture can be a movement authorized using facial recognition. | 07-11-2013 |
20130191122 | Voice Electronic Listening Assistant - The invention comprises music and information delivery systems and methods. One system comprises a voice-activated sound system wherein a user speaks and the sound system recognizes the speech, searches an internet database such as Rhapsody™ to obtain a list of matching audio files, and displays the list on a dashboard screen of a vehicle. The user is able to identify the audio file by voice activation, and the system is configured to receive the audio file. | 07-25-2013 |
20130191123 | Automatic Door - In some implementations, a storage device having a voice-recognition engine stored thereon is coupled to a microcontroller, and a device-controller for an automatic door is operably coupled to the microcontroller. | 07-25-2013 |
20130197906 | TECHNIQUES TO NORMALIZE NAMES EFFICIENTLY FOR NAME-BASED SPEECH RECOGNITION GRAMMARS - Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization, during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed. | 08-01-2013 |
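The caching scheme in the entry above can be sketched with a dictionary keyed by (culture, name); the normalizer and names are placeholders, not the patented normalization rules.

```python
# Sketch of per-culture name-normalization caching: normalize each name
# once per culture, then reuse the cached result. The normalizer below is
# a stand-in for the real (expensive) normalization step.

_cache = {}

def normalize_name(name, culture, normalizer):
    key = (culture, name)
    if key not in _cache:          # not yet normalized for this culture
        _cache[key] = normalizer(name)
    return _cache[key]             # cached result passed to the next step

calls = []
def expensive_normalizer(name):
    calls.append(name)             # track how often real work happens
    return name.strip().lower()

normalize_name("  McDonald ", "en-US", expensive_normalizer)
normalize_name("  McDonald ", "en-US", expensive_normalizer)
# the second call hits the cache; expensive_normalizer ran only once
```

Keying on the culture as well as the name matters because the same spelling can normalize differently across locales.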
20130197907 | SERVICES IDENTIFICATION AND INITIATION FOR A SPEECH-BASED INTERFACE TO A MOBILE DEVICE - A method of providing hands-free services using a mobile device having wireless access to computer-based services includes establishing a short range wireless connection between a mobile device and one or more audio devices that include at least a microphone and speaker; receiving at the mobile device speech inputted via the microphone from a user and sent via the short range wireless connection; wirelessly transmitting the speech input from the mobile device to a speech recognition server that provides automated speech recognition (ASR); receiving at the mobile device a speech recognition result representing the content of the speech input; determining a desired service by processing the speech recognition result using a first, service-identifying grammar; determining a user service request by processing at least some of the speech recognition result using a second, service-specific grammar associated with the desired service; initiating the user service request and receiving a service response; generating an audio message from the service response; and presenting the audio message to the user via the speaker. | 08-01-2013 |
20130226574 | SYSTEMS AND METHODS FOR TUNING AUTOMATIC SPEECH RECOGNITION SYSTEMS - A tuning system for tuning a speech recognition system includes a transmitter for sending a user response to a speech recognition system. The user response is based at least in part on a test stimulus that may be generated by the control system. A receiver receives a recognized response from the speech recognition system; this recognized response is based at least in part on the associated user response. An adjustment module adjusts at least one parameter of the speech recognition system based at least in part on at least one of the test stimulus, the associated user response, and the recognized response. | 08-29-2013 |
20130238326 | APPARATUS AND METHOD FOR MULTIPLE DEVICE VOICE CONTROL - In an environment including multiple electronic devices that are each capable of being controlled by a user's voice command, an individual device is able to distinguish a voice command intended particularly for the device from among other voice commands that are intended for other devices present in the common environment. The device is able to accomplish this distinction by identifying unique attributes belonging to the device itself from within a user's voice command. Thus only voice commands that include attribute information that are supported by the device will be recognized by the device, and other voice commands that include attribute information that are not supported by the device may be effectively ignored for voice control purposes of the device. | 09-12-2013 |
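Attribute-based command filtering as described in the entry above can be sketched as follows; the attribute sets and the word-matching rule are illustrative assumptions, not the patented mechanism.

```python
# Sketch of multi-device voice control: each device acts only on commands
# that name one of its own unique attributes, and effectively ignores the
# rest. Device names and attributes are hypothetical.

class VoiceDevice:
    def __init__(self, attributes):
        self.attributes = set(attributes)

    def accepts(self, command):
        """True if any word in the command matches this device's attributes."""
        return any(word in self.attributes for word in command.lower().split())

tv = VoiceDevice({"tv", "television", "screen"})
lamp = VoiceDevice({"lamp", "light"})

cmd = "turn on the lamp"
# Only the lamp recognizes this command; the TV ignores it.
```

Because each device filters locally on its own attributes, no central coordinator is needed to route commands in the shared environment.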
20130262102 | Systems and Methods for Off-Board Voice-Automated Vehicle Navigation - A method of providing navigational information comprises processing destination information spoken by a user of a mobile processing system. The processed voice information is transmitted to a remote data center, where it is analyzed to recognize components of the destination information. The data center generates a list of hypothetical recognized components of the destination information, with confidence levels calculated for each component analyzed. The hypothetical recognized component list is displayed with confidence levels at the data center for selective checking by a human data center operator. A set of hypothetical components is selected based on confidence levels in the list. The accuracy of the selected set of hypothetical recognized components of the destination information is confirmed through interactive voice exchanges between the mobile system user and the remote data center. A destination is determined from the confirmed components of the destination information. | 10-03-2013 |
20130268269 | Systems and Methods for Off-Board Voice-Automated Vehicle Navigation - A system for providing navigational information comprises a mobile system processing and transmitting via a wireless link a continuous voice stream spoken by a user of the mobile system, the continuous voice stream including a complete destination address and a data center processing the continuous voice stream received via the wireless link into voice navigational information. The data center performs automated voice recognition processing on the voice navigational information to recognize destination components of the complete destination address, confirms the recognized destination components through interactive speech exchanges with the mobile system user via the wireless link and the mobile system, selectively allows human data center operator intervention to assist in identifying the selected recognized destination components having a recognition confidence below a selected threshold value, and downloads the complete destination address for transmission to the mobile system derived from the confirmed recognized destination components. | 10-10-2013 |
20130282371 | Recognizing Repeated Speech in a Mobile Computing Device - A method is disclosed herein for recognizing a repeated utterance in a mobile computing device via a processor. A first utterance is detected being spoken into a first mobile computing device. Likewise, a second utterance is detected being spoken into a second mobile computing device within a predetermined time period. The second utterance substantially matches the first spoken utterance and the first and second mobile computing devices are communicatively coupled to each other. The processor enables capturing, at least temporarily, a matching utterance for performing a subsequent processing function. The performed subsequent processing function is based on a type of captured utterance. | 10-24-2013 |
20130297304 | APPARATUS AND METHOD FOR SPEECH RECOGNITION - Disclosed is an apparatus for speech recognition and automatic translation operated on a PC or a mobile device. The apparatus for speech recognition according to the present invention includes a display unit that displays to a user a screen for selecting a domain, a unit of the speech recognition region previously sorted for speech recognition; a user input unit that receives a selection of a domain from the user; and a communication unit that transmits the user selection information for the domain. According to the present invention, an apparatus for speech recognition with an intuitive and simple user interface is provided, enabling the user to easily select/correct a designated domain of a speech recognition system and improving the accuracy and performance of speech recognition and automatic translation by the designated speech recognition system. | 11-07-2013 |
20130304462 | SIGNAL PROCESSING APPARATUS AND METHOD AND PROGRAM - Disclosed herein is a signal processing apparatus including: a first A/D converter configured to execute A/D conversion by adjusting an input signal with a first gain; a second A/D converter configured to execute A/D conversion by adjusting an input signal with a second gain that is smaller than the first gain; a synthesis block configured to synthesize a first signal obtained by conversion by the first A/D converter with a second signal obtained by conversion by the second A/D converter to output a resultant synthesized signal if the first signal is clipped; and a signal processing block configured to execute signal processing by use of the signal outputted from the synthesis block. | 11-14-2013 |
20130325459 | SPEECH RECOGNITION ADAPTATION SYSTEMS BASED ON ADAPTATION DATA - Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text. | 12-05-2013 |
20130325460 | METHOD OF PROVIDING VOICE RECOGNITION SERVICE AND ELECTRONIC DEVICE THEREFOR - A method and an electronic device provide a voice recognition service. The method includes displaying one or more application programs according to a voice command input through a microphone, determining an additional service to be driven in a selected application program in consideration of the voice command when any one of the one or more application programs is selected, and displaying the additional service. | 12-05-2013 |
20130339013 | PROCESSING APPARATUS, PROCESSING SYSTEM, AND OUTPUT METHOD - A processing apparatus includes: a search result acquisition unit that acquires a search result searched based on a voice of a user recognized by a voice recognition unit; a user data storage unit that stores therein a knowledge level so as to be associated with a user; an expression data storage unit that stores therein a plurality of pieces of expression data expressing provision contents provided to the user as the search result so as to be associated with a plurality of different knowledge levels, the plurality of pieces of expression data having different professional levels; a knowledge level identifying unit that identifies a knowledge level of the user with reference to the user data storage unit; an editing unit that edits the search result based on the expression data associated with the identified knowledge level; and an output unit that outputs the edited search result. | 12-19-2013 |
20140012572 | SYSTEM AND METHOD FOR CONTENT RECOGNITION IN PORTABLE DEVICES - According to a preferred aspect of the instant invention, there is provided a system and method for content recognition in portable devices. Content, preferably audio content, is recorded by the instant invention, preferably as a sample with a length between 1 and 10 seconds. A fingerprint is generated from the recorded sample and automatically, preferably without further user interaction, prompting, or notification (i.e., invisibly to the user), compared with the fingerprints in a fingerprint database stored locally in the portable device, and the result is thereafter presented to the user. | 01-09-2014 |
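Local fingerprint lookup as described in the entry above can be sketched with a toy hash-based fingerprint. A real system would derive robust acoustic features from the audio; everything below, including the quantization step, is illustrative.

```python
# Toy sketch of on-device content recognition: hash a recorded sample into
# a compact fingerprint and look it up in a local database. The fingerprint
# function is a placeholder, not an actual audio fingerprinting algorithm.

import hashlib

def fingerprint(samples):
    # Coarsely quantize the samples, then hash to a short hex string.
    quantized = bytes(int(s * 8) % 256 for s in samples)
    return hashlib.sha1(quantized).hexdigest()[:16]

local_db = {}  # fingerprint -> title, stored locally on the device

def register(title, samples):
    local_db[fingerprint(samples)] = title

def identify(samples):
    return local_db.get(fingerprint(samples), "unknown")

register("Song A", [0.1, 0.5, 0.9])
match = identify([0.1, 0.5, 0.9])  # → "Song A"
```

Keeping fingerprints compact, as the entry notes (under 100 bytes per vector), is what makes a database of millions of entries fit on a portable device.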
20140039885 | METHODS AND APPARATUS FOR VOICE-ENABLING A WEB APPLICATION - Methods and apparatus for voice-enabling a web application, wherein the web application includes one or more web pages rendered by a web browser on a computer. At least one information source external to the web application is queried to determine whether information describing a set of one or more supported voice interactions for the web application is available, and in response to determining that the information is available, the information is retrieved from the at least one information source. Voice input for the web application is then enabled based on the retrieved information. | 02-06-2014 |
20140052441 | INPUT AUXILIARY APPARATUS, INPUT AUXILIARY METHOD, AND PROGRAM - An object of the present invention is to provide an input auxiliary apparatus equipped with an input section to input character strings; an embellishment information retaining section to retain, in a storing section in advance, embellishment information on a plurality of postures, linking each posture with its embellishment information; a posture detecting section to detect the posture; a reading section to read out from the storing section the embellishment information linked with the posture detected by the posture detecting section; and an embellishment applying section to apply the embellishment information read out by the reading section to the character strings. The input section preferably includes a speech recognition section to recognize voice data based on speech recognition and convert the voice data to character strings. Accordingly, this enables a speaker's emotions to be correctly judged and suitable embellishments to be appended when performing speech recognition. | 02-20-2014 |
20140081632 | COMPUTER PRODUCT, ANALYSIS SUPPORT APPARATUS, AND ANALYSIS SUPPORT METHOD - A computer-readable recording medium stores a program causing a computer to execute a process that includes acquiring first scenario information that formalizes and indicates a character set of characters appearing in a first scene, a knowledge set of knowledge items retained by the characters, and an action set of actions taken by the characters; receiving into the action set, an input of speech contents of a speech action from a first character to a second character who appear in the first scene; producing second scenario information that inherits the character set and the knowledge set of the first scenario information and, formalizes and indicates for a second scene, a character set, a knowledge set, and an action set; and registering into the knowledge set of the second character indicated in the second scenario information, the speech contents of the speech action for which the input is received. | 03-20-2014 |
20140088960 | VOICE RECOGNITION DEVICE AND METHOD, AND SEMICONDUCTOR INTEGRATED CIRCUIT DEVICE - A semiconductor integrated circuit device for voice recognition includes: a signal processing unit which generates a feature pattern representing a state of distribution of frequency components of an input voice signal; a voice recognition database storage unit which stores a voice recognition database including a standard pattern representing a state of distribution of frequency components of plural phonemes; a conversion list storage unit which stores a conversion list including plural words or sentences to be conversion candidates; a standard pattern extraction unit which extracts a standard pattern corresponding to character data representing the first syllable of each word or sentence included in the conversion list, from the voice recognition database; and a matching detection unit which compares the feature pattern generated from the first syllable of the voice signal with the extracted standard pattern and thus detects the matching of the syllable. | 03-27-2014 |
20140108009 | Multimedia Search Application for a Mobile Device - In accordance with one aspect of the present invention, a method selects a program from a library of programs. A user selection is determined based upon a voice command, and the program is presented at a display device in accordance with the voice command. In accordance with another aspect of the present invention, a system selects a program from a library of programs. The system includes a processor that determines a user selection based upon a voice command, and also includes a display device that presents the program in accordance with the voice command. In accordance with yet another embodiment of the present invention, a computer-readable medium contains a set of instructions that when executed by a processor cause the processor to determine a user selection based upon a voice command and to command a display device to present the program, in accordance with the voice command. | 04-17-2014 |
20140114655 | EMOTION RECOGNITION USING AUDITORY ATTENTION CUES EXTRACTED FROM USERS VOICE - Emotion recognition may be implemented on an input window of sound. One or more auditory attention features may be extracted from an auditory spectrum for the window using one or more two-dimensional spectro-temporal receptive filters. One or more feature maps corresponding to the one or more auditory attention features may be generated. Auditory gist features may be extracted from feature maps, and the auditory gist features may be analyzed to determine one or more emotion classes corresponding to the input window of sound. In addition, a bottom-up auditory attention model may be used to select emotionally salient parts of speech and execute emotion recognition only on the salient parts of speech while ignoring the rest of the speech signal. | 04-24-2014 |
20140129217 | Senone Scoring For Multiple Input Streams - Embodiments of the present invention include an apparatus, method, and system for calculating senone scores for multiple concurrent input speech streams. The method can include the following: receiving one or more feature vectors from one or more input streams; accessing the acoustic model one senone at a time; and calculating separate senone scores corresponding to each incoming feature vector. The calculation uses a single read access to the acoustic model for a single senone and calculates a set of separate senone scores for the one or more feature vectors, before proceeding to the next senone in the acoustic model. | 05-08-2014 |
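The single-read-per-senone access pattern this abstract describes can be sketched as follows; this is a minimal illustration assuming a diagonal-covariance Gaussian acoustic model keyed by senone ID (all names and the model form are illustrative, not taken from the application):

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian component."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def score_senones(acoustic_model, feature_vectors):
    """For each senone (read from the model once), score every stream's
    feature vector before moving on to the next senone."""
    scores = [dict() for _ in feature_vectors]  # one score table per stream
    for senone_id, (mean, var) in acoustic_model.items():  # single pass
        for stream, x in enumerate(feature_vectors):
            scores[stream][senone_id] = log_gaussian(x, mean, var)
    return scores
```

The point of the ordering is memory traffic: each senone's parameters are fetched once and amortized across all concurrent streams, instead of re-reading the model per stream.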
20140129218 | Recognition of Speech With Different Accents - Computer-based speech recognition can be improved by recognizing words with an accurate accent model. In order to provide a large number of possible accents, while providing real-time speech recognition, a language tree data structure of possible accents is provided in one embodiment such that a computerized speech recognition system can benefit from choosing among accent categories when searching for an appropriate accent model for speech recognition. | 05-08-2014 |
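Choosing among accent categories in a language tree can be sketched as a greedy descent; the node structure and scoring function below are assumptions for illustration, not details from the application:

```python
class AccentNode:
    """A node in a tree of accent categories; leaves carry accent models."""
    def __init__(self, name, model=None, children=()):
        self.name, self.model, self.children = name, model, list(children)

def best_accent(node, score_fn):
    """Greedy descent: at each level pick the child category whose model
    scores the utterance best; stop when a leaf accent model is reached."""
    while node.children:
        node = max(node.children, key=score_fn)
    return node
```

Searching category-by-category keeps the per-utterance cost proportional to tree depth rather than to the total number of accent models.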
20140156268 | INCREMENTAL SPEECH RECOGNITION FOR DIALOG SYSTEMS - A system and method for integrating incremental speech recognition in dialog systems. An example system configured to practice the method receives incremental speech recognition results of user speech as part of a dialog with a user, and copies a dialog manager operating on the user speech to generate temporary instances of the dialog manager. Then the system evaluates actions the temporary instances of the dialog manager would take based on the incremental speech recognition results, and identifies an action that would advance the dialog and a corresponding temporary instance of the dialog manager. The system can then execute the action in the dialog and optionally replace the dialog manager with the corresponding temporary instance of the dialog manager. The action can include making a turn-taking decision in the dialog, such as whether, what, and when to speak or whether to be silent. | 06-05-2014 |
20140156269 | PORTABLE DEVICE AND METHOD FOR PROVIDING VOICE RECOGNITION SERVICE - A portable device and a method for providing a voice recognition service are disclosed. The portable device includes a mechanical vibration sensor configured to sense vibrations having a magnitude equal to or larger than a threshold and generate an electrical signal, a motion sensor configured to sense a motion of the portable device, an audio sensor configured to receive a voice command, a sensor hub configured to control a plurality of sensors including the motion sensor and the audio sensor, and a main processor configured to execute an application and control the portable device. When the portable device is placed in standby mode, upon receipt of the electrical signal from the mechanical vibration sensor, the sensor hub is configured to switch from inactive state to active state and activate the motion sensor. | 06-05-2014 |
20140156270 | APPARATUS AND METHOD FOR SPEECH RECOGNITION - Disclosed herein is an apparatus and a method for speech recognition. The apparatus includes a controller that is configured to receive a speech signal including a speech recognition waveform from a user and the waveform of speech generated within a vehicle, when a speech recognition operation initiates. The controller is further configured to generate an offset waveform corresponding to a speech output waveform generated from a speech output device within the vehicle, using feature information of the speech output waveform, when the speech recognition operation initiates. Additionally, the controller is configured to extract the speech recognition waveform of the user by removing a predetermined amount or more of the speech output waveform from a speech signal input by overlapping the offset waveform to the speech signal and to perform speech recognition based on the speech recognition waveform. | 06-05-2014 |
20140163974 | Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition. | 06-12-2014 |
20140163975 | METHOD AND APPARATUS FOR CORRECTING SPEECH RECOGNITION ERROR - Disclosed are a speech recognition error correction method and an apparatus thereof. The speech recognition error correction method includes determining a likelihood that a speech recognition result is erroneous, and if the likelihood that the speech recognition result is erroneous is higher than a predetermined standard, generating a parallel corpus according to whether the speech recognition result matches the correct answer corpus, generating a speech recognition model based on the parallel corpus, and correcting an erroneous speech recognition result based on the speech recognition model and the language model. Accordingly, speech recognition errors are corrected. | 06-12-2014 |
20140163976 | METHOD AND USER DEVICE FOR PROVIDING CONTEXT AWARENESS SERVICE USING SPEECH RECOGNITION - A method for providing a context awareness service is provided. The method includes defining a control command for the context awareness service depending on a user input, triggering a playback mode and the context awareness service in response to a user selection, receiving external audio through a microphone in the playback mode, determining whether the received audio corresponds to the control command, and executing a particular action assigned to the control command when the received audio corresponds to the control command. | 06-12-2014 |
20140172423 | SPEECH RECOGNITION METHOD, DEVICE AND ELECTRONIC APPARATUS - A speech recognition method, device and electronic apparatus are provided. The method includes: receiving a speech input, recognizing the speech input as a wake-up instruction by a wake-up engine, waking up a search engine according to the wake-up instruction, and determining a recognition scope corresponding to the wake-up instruction. The recognition scope corresponding to the wake-up instruction, compared with the entire recognition scope of the recognition engine, is relatively small. Hence, the recognition scope of the recognition engine is narrowed. Compared with the search within a large recognition scope, the precision in searching the target is improved by searching within a relatively small scope. | 06-19-2014 |
20140195226 | METHOD AND APPARATUS FOR CORRECTING ERROR IN SPEECH RECOGNITION SYSTEM - A method of correcting errors in a speech recognition system includes a process of searching a speech recognition error-answer pair DB based on a sound model for a first candidate answer group for a speech recognition error, a process of searching a word relationship information DB for a second candidate answer group for the speech recognition error, a process of searching a user error correction information DB for a third candidate answer group for the speech recognition error, a process of searching a domain articulation pattern DB and a proper noun DB for a fourth candidate answer group for the speech recognition error, and a process of aligning candidate answers within each of the retrieved candidate answer groups and displaying the aligned candidate answers. | 07-10-2014 |
20140195227 | SYSTEM AND METHOD FOR ACOUSTIC TRANSFORMATION - An acoustic transformation system and method. A specific embodiment is the transformation of acoustic speech signals produced by speakers with speech disabilities in order to make those utterances more intelligible to typical listeners. These modifications include the correction of tempo or rhythm, the adjustment of formant frequencies in sonorants, the removal or adjustment of aberrant voicing, the deletion of phoneme insertion errors, and the replacement of erroneously dropped phonemes. These methods may also be applied to general correction of musical or acoustic sequences. | 07-10-2014 |
20140214415 | USING VISUAL CUES TO DISAMBIGUATE SPEECH INPUTS - Embodiments related to recognizing speech inputs are disclosed. One disclosed embodiment provides a method for recognizing a speech input including receiving depth information of a physical space from a depth camera, determining an identity of a user in the physical space based on the depth information, receiving audio information from one or more microphones, and determining a speech input from the audio information. If the speech input comprises an ambiguous term, the ambiguous term in the speech input is compared to one or more of depth image data received from the depth camera and digital content consumption information for the user to identify an unambiguous term corresponding to the ambiguous term. After identifying the unambiguous term, an action is taken on the computing device based on the speech input and the unambiguous term. | 07-31-2014 |
20140214416 | METHOD AND SYSTEM FOR RECOGNIZING SPEECH COMMANDS - A method of recognizing speech commands includes generating a background acoustic model for a sound using a first sound sample, the background acoustic model characterized by a first precision metric. A foreground acoustic model is generated for the sound using a second sound sample, the foreground acoustic model characterized by a second precision metric. A third sound sample is received and decoded by assigning a weight to the third sound sample corresponding to a probability that the sound sample originated in a foreground using the foreground acoustic model and the background acoustic model. The method further includes determining if the weight meets predefined criteria for assigning the third sound sample to the foreground and, when the weight meets the predefined criteria, interpreting the third sound sample as a portion of a speech command. Otherwise, recognition of the third sound sample as a portion of a speech command is forgone. | 07-31-2014 |
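The weighting step in this abstract can be illustrated as a posterior probability computed from the foreground and background models' log-likelihoods; the equal priors and the 0.5 threshold below are assumptions for the sketch, not details from the application:

```python
import math

def foreground_weight(fg_loglik, bg_loglik):
    """Posterior probability the sample originated in the foreground,
    under equal priors, computed with log-sum-exp for stability."""
    m = max(fg_loglik, bg_loglik)
    denom = m + math.log(math.exp(fg_loglik - m) + math.exp(bg_loglik - m))
    return math.exp(fg_loglik - denom)

def interpret(fg_loglik, bg_loglik, threshold=0.5):
    """Treat the sample as part of a speech command only if the weight
    meets the predefined criterion; otherwise forgo recognition."""
    w = foreground_weight(fg_loglik, bg_loglik)
    return ("foreground" if w >= threshold else "ignored"), w
```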
20140222422 | SCALING STATISTICAL LANGUAGE UNDERSTANDING SYSTEMS ACROSS DOMAINS AND INTENTS - A scalable statistical language understanding (SLU) system uses a fixed number of understanding models that scale across domains and intents (i.e. single vs. multiple intents per utterance). For each domain added to the SLU system, the fixed number of existing models is updated to reflect the newly added domain. Information that is already included in the existing models and the corresponding training data may be re-used. The fixed models may include a domain detector model, an intent action detector model, an intent object detector model and a slot/entity tagging model. A domain detector identifies different domains identified within an utterance. All/portion of the detected domains are used to determine associated intent actions. For each determined intent action, one or more intent objects are identified. Slot/entity tagging is performed using the determined domains, intent actions, and intent object detector. | 08-07-2014 |
20140236592 | SYSTEMS AND METHODS FOR GATHERING RESEARCH DATA - Methods and systems are provided for gathering research data that includes information pertaining to audio signals received on a portable device, such as a cell phone. Frequency domain data is received or produced, a signature is extracted from the frequency domain data and an ancillary code is read from the frequency domain data. | 08-21-2014 |
20140236593 | SPEAKER RECOGNITION METHOD THROUGH EMOTIONAL MODEL SYNTHESIS BASED ON NEIGHBORS PRESERVING PRINCIPLE - A speaker recognition method through emotional model synthesis based on the Neighbors Preserving Principle is disclosed. The method includes the following steps: (1) training the reference speaker's and user's speech models; (2) extracting the neutral-to-emotion transformation/mapping sets of GMM reference models; (3) extracting the emotion reference Gaussian components mapped by or corresponding to several neutral reference Gaussian components close to the user's neutral training Gaussian component; (4) synthesizing the user's emotion training Gaussian component and then synthesizing the user's emotion training model; (5) synthesizing all user's GMM training models; (6) inputting test speech and conducting the identification. This invention extracts several reference speeches similar to the neutral training speech of a user from a speech library by employing neighbor preserving principles based on KL divergence and combines an emotion training speech of the user using the emotion reference speech in the reference speech, improving the performance of the speaker recognition system when the training speech and the test speech are mismatched and increasing the robustness of the speaker recognition system. | 08-21-2014 |
20140249811 | DETECTING THE END OF A USER QUESTION - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for classifying voice inputs. The methods, systems, and apparatus include actions of providing an answer to a first voice input from a user and receiving visual or audio data corresponding to a second voice input. Further actions include classifying the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data. Additionally, the actions include determining whether to provide a response to the second voice input based on the classification of the second voice input. | 09-04-2014 |
20140278387 | SYSTEM AND METHOD FOR IMPROVING SPEECH RECOGNITION ACCURACY IN A WORK ENVIRONMENT - Apparatus and method that improve speech recognition accuracy by monitoring the position of a user's headset-mounted speech microphone and prompting the user to reconfigure the speech microphone's orientation if required. A microprocessor or other application specific integrated circuit provides a mechanism for comparing the relative transit times between a user's voice, a primary speech microphone, and a secondary compliance microphone. The difference in transit times may be used to determine if the speech microphone is placed in an appropriate proximity to the user's mouth. If required, the user is automatically prompted to reposition the speech microphone. | 09-18-2014 |
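The transit-time comparison can be sketched as a cross-correlation delay estimate between the primary and secondary microphone signals; a large lag suggests the speech microphone sits too far from the mouth. The lag threshold and function names are assumptions:

```python
def estimate_delay(primary, secondary):
    """Lag (in samples) of the secondary signal relative to the primary
    that maximizes their cross-correlation."""
    n = len(primary)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-n + 1, n):
        score = sum(
            primary[i] * secondary[i + lag]
            for i in range(n)
            if 0 <= i + lag < n
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def needs_reposition(primary, secondary, max_lag_samples=8):
    """Prompt the user when the transit-time difference is too large."""
    return abs(estimate_delay(primary, secondary)) > max_lag_samples
```

At a typical 16 kHz sampling rate, one sample of lag corresponds to roughly 2 cm of extra acoustic path, so a few samples of tolerance covers normal boom placement.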
20140278388 | SYSTEMS AND METHODS FOR IDENTIFYING PATIENT DISTRESS BASED ON A SOUND SIGNAL - A sound signal from a patient may include information that may be used to determine multiple patient parameters. A patient monitor may determine respiration information such as respiration rate from the sound signal, for example based on modulations of the sound signal due to patient breathing. The patient monitor may also determine indications of patient distress based on a trained classifier, speech commands, or sound patterns. | 09-18-2014 |
20140278389 | Method and Apparatus for Adjusting Trigger Parameters for Voice Recognition Processing Based on Noise Characteristics - A method and apparatus for adjusting a trigger parameter related to voice recognition processing includes receiving into the device an acoustic signal comprising a speech signal, which is provided to a voice recognition module, and comprising noise. The method further includes determining a noise profile for the acoustic signal, wherein the noise profile identifies a noise level for the noise and identifies a noise type for the noise based on a frequency spectrum for the noise, and adjusting the voice recognition module based on the noise profile by adjusting a trigger parameter related to voice recognition processing. | 09-18-2014 |
20140297274 | NESTED SEGMENTATION METHOD FOR SPEECH RECOGNITION BASED ON SOUND PROCESSING OF BRAIN - A method of segmenting input speech signal into plurality of frames for speech recognition is disclosed. The method includes extracting a low frequency signal from the speech signal, and segmenting the speech signal into a plurality of time-intervals according to a plurality of instantaneous phase-sections of the low frequency signal. | 10-02-2014 |
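A rough sketch of segmenting by instantaneous phase sections of a low-frequency component, assuming NumPy; the cutoff frequency, FFT-based filtering, and number of phase sections are all assumptions made for illustration:

```python
import numpy as np

def lowfreq_phase_segments(signal, fs, cutoff=10.0, n_phase_bins=4):
    """Return boundary indices where the instantaneous phase of the
    low-frequency component enters a new phase section."""
    # FFT-based low-pass to extract the low-frequency component
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[freqs > cutoff] = 0.0
    low = np.fft.irfft(spec, n=len(signal))
    # analytic signal via the Hilbert transform (FFT construction)
    full = np.fft.fft(low)
    h = np.zeros(len(low))
    h[0] = 1
    if len(low) % 2 == 0:
        h[len(low) // 2] = 1
        h[1 : len(low) // 2] = 2
    else:
        h[1 : (len(low) + 1) // 2] = 2
    phase = np.angle(np.fft.ifft(full * h))     # in [-pi, pi)
    # quantize phase into sections; a section change marks a frame boundary
    bins = np.floor((phase + np.pi) / (2 * np.pi) * n_phase_bins).astype(int)
    bins = np.clip(bins, 0, n_phase_bins - 1)
    return np.flatnonzero(np.diff(bins) != 0) + 1
```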
20140297275 | SPEECH PROCESSING DEVICE, INTEGRATED CIRCUIT DEVICE, SPEECH PROCESSING SYSTEM, AND CONTROL METHOD FOR SPEECH PROCESSING DEVICE - A speech processing device includes: a dialog execution control unit that controls speech output and timings of speech recognition in accordance with dialog information including speech output information, speech recognition information and control information; a speech output control unit that outputs an output speech signal designated by the speech output information; and a speech recognition unit that executes speech recognition processing for an input speech signal using the speech recognition information. The control information includes speech output timing information for the output speech signal and speech recognition start timing information for the input speech signal. The speech recognition start timing information is specified by a time period that elapses from a first timing specified by the speech output timing information. | 10-02-2014 |
20140303969 | SPEECH RECOGNITION CONTROL DEVICE - A speech recognition control device has a plurality of microphones placed at different positions, a speech transmission control unit, and a speech recognition execution control unit. The speech transmission control unit stores data based on the speeches which are input from the microphones and time data related to ranks among the microphones, assigns ranks to the plurality of microphones using the time data based on a preset condition, and transmits a speech data signal corresponding to the microphone to the speech recognition execution control unit in the order of the ranks. The speech recognition execution control unit executes the speech recognition process according to the order of the speech data signals transmitted from the speech transmission control unit. | 10-09-2014 |
20140309993 | SYSTEM AND METHOD FOR DETERMINING QUERY INTENT - A method for training a system is provided. The method may include storing one or more backend communication logs, each of the one or more backend communication logs including a user query and a corresponding backend query. The method may further include parsing the one or more backend communication logs to extract statistical information and generating a mapping between each user query and a corresponding set of language tags. The method may also include sorting the one or more backend communication logs based upon, at least in part, the extracted statistical information. | 10-16-2014 |
20140316776 | VOICE RECOGNITION CLIENT SYSTEM FOR PROCESSING ONLINE VOICE RECOGNITION, VOICE RECOGNITION SERVER SYSTEM, AND VOICE RECOGNITION METHOD - A voice/speech recognition client system, a voice recognition server system, and a voice recognition method. The voice recognition system indicates a result of voice recognition in a voice signal inputted from a starting time for voice recognition to an ending time. The voice recognition client system comprises: a communication unit that transmits a unit voice signal input at intervals from the starting time to the ending time, to the voice recognition server system at the intervals and receives an intermediate result of voice recognition from the voice recognition server system; and a display unit that displays the intermediate result received between the starting time and the ending time. | 10-23-2014 |
20140316777 | USER DEVICE AND OPERATION METHOD THEREOF - A user device having a voice recognition function and an operation method thereof are provided. The operation method includes detecting whether there is an input from at least one sensor in response to execution of an application which may use voice recognition and activating or inactivating the voice recognition in response to the detection of the input. | 10-23-2014 |
20140337022 | SYSTEM AND METHOD FOR LOAD BALANCING IN A SPEECH RECOGNITION SYSTEM - The various implementations described herein include systems, methods and/or devices used to enable load balancing in a speech recognition system. For example, in some implementations, the method includes, at a speech access server: (1) initializing the speech access server, (2) receiving a speech request from a terminal, (3) determining, in accordance with a predefined load balancing algorithm, a first speech recognition server to process the speech request, (4) determining whether the first speech recognition server is available for processing, (5) if the first speech recognition server is available, forwarding the speech request to the first speech recognition server for processing, and (6) if the first speech recognition server is not available: (a) determining whether other speech recognition servers are available for processing, and (b) if a second speech recognition server is available, forwarding the speech request to the second speech recognition server for processing. | 11-13-2014 |
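The dispatch flow in steps (3) through (6) can be sketched as follows; round-robin stands in for the unspecified "predefined load balancing algorithm," and the server interface is illustrative, not from the application:

```python
import itertools

class SpeechAccessServer:
    def __init__(self, recognition_servers):
        """(1) Initialize the speech access server with its server pool."""
        self.servers = list(recognition_servers)
        self._rr = itertools.cycle(range(len(self.servers)))

    def dispatch(self, speech_request):
        """(2)-(6) Route a speech request to an available recognizer."""
        first = self.servers[next(self._rr)]   # (3) load-balancing pick
        if first.available():                  # (4) availability check
            return first.process(speech_request)   # (5) forward
        for server in self.servers:            # (6) fall back to any other
            if server is not first and server.available():
                return server.process(speech_request)
        raise RuntimeError("no speech recognition server available")
```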
20140350924 | METHOD AND APPARATUS FOR USING IMAGE DATA TO AID VOICE RECOGNITION - A device performs a method for using image data to aid voice recognition. The method includes the device capturing image data of a vicinity of the device and adjusting, based on the image data, a set of parameters for voice recognition performed by the device. The set of parameters for the device performing voice recognition includes, but is not limited to: a trigger threshold of a trigger for voice recognition; a set of beamforming parameters; a database for voice recognition; and/or an algorithm for voice recognition, wherein the algorithm can include using noise suppression or using acoustic beamforming. | 11-27-2014 |
20140350925 | VOICE RECOGNITION APPARATUS, VOICE RECOGNITION SERVER AND VOICE RECOGNITION GUIDE METHOD - A voice recognition apparatus includes a communication part configured to communicate with a voice recognition server, a voice receiver configured to receive a user's voice signal, a storage part configured to store guide information comprising at least an example command for voice recognition; and a controller. The controller is configured to generate a guide image comprising at least a part of the example command, transmit the received user's voice signal to the voice recognition server through the communication part in response to receiving the user's voice signal by the voice receiver, and update the stored guide information based on update information received through the communication part. | 11-27-2014 |
20140358533 | PRONUNCIATION ACCURACY IN SPEECH RECOGNITION - A reading accuracy-improving system includes: a reading conversion unit for retrieving a plurality of candidate word strings from speech recognition results to determine the reading of each candidate word string; a reading score calculating unit for determining the speech recognition score for each of one or more candidate word strings with the same reading to determine a reading score; and a candidate word string selection unit for selecting a candidate to output from the plurality of candidate word strings on the basis of the reading score and speech recognition score corresponding to each candidate word string. | 12-04-2014 |
20150046157 | User Dedicated Automatic Speech Recognition - A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering. The user interface switches listening modes in response to one or more switching cues. | 02-12-2015 |
20150058000 | COMPUTERIZED INFORMATION AND DISPLAY APPARATUS - Computerized apparatus useful for obtaining and presenting information to users. In one embodiment, the computerized apparatus includes a display device and speech digitization apparatus configured to receive user speech input and enable performance of various tasks, such as obtaining desired information relating to an entity, maps or directions, weather, news, or any number of other topics. In one variant, the user causes the apparatus to enter a mode whereby the user can immediately say a portion of a name, and the apparatus will locate two or more possible matches, and solicit further user input to allow the user to rapidly converge on identifying and navigating to a desired entity. | 02-26-2015 |
20150058001 | Microphone and Corresponding Digital Interface - Analog signals are received from a sound transducer. The analog signals are converted into digitized data. A determination is made as to whether voice activity exists within the digitized signal. Upon the detection of voice activity, an indication of voice activity is sent to a processing device. The indication is sent across a standard interface, and the standard interface is configured to be compatible to be coupled with a plurality of devices from potentially different manufacturers. | 02-26-2015 |
20150066495 | Robust Feature Extraction Using Differential Zero-Crossing Counts - A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal and compared to a sound parameter reference stored locally with the sound recognition sensor to detect when the signature sound is received in the analog signal. A portion of the sparse sound parameter information is differential zero crossing (ZC) counts. Differential ZC rate may be determined by measuring a number of times the analog signal crosses a threshold value during each of a sequence of time frames to form a sequence of ZC counts and taking a difference between selected pairs of ZC counts to form a sequence of differential ZC counts. | 03-05-2015 |
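The differential ZC feature can be sketched directly from the abstract: count threshold crossings within each time frame, then difference consecutive counts. The frame size and threshold below are assumed parameters:

```python
def zc_counts(samples, frame_size, threshold=0.0):
    """Number of crossings of `threshold` within each non-overlapping frame."""
    counts = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start : start + frame_size]
        above = [s > threshold for s in frame]
        counts.append(sum(1 for a, b in zip(above, above[1:]) if a != b))
    return counts

def differential_zc(counts):
    """Differences between consecutive ZC counts."""
    return [b - a for a, b in zip(counts, counts[1:])]
```

Differencing makes the feature robust to a constant crossing rate (e.g. steady background tones), emphasizing frames where the dominant frequency content changes.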
20150073786 | GAMING HEADSET WITH PROGRAMMABLE AUDIO PATHS - A headset having game, chat and microphone audio signals is provided with a programmable signal processor for individually modifying the audio signals and a memory configured to store a plurality of user-selectable signal-processing parameter settings that determine the manner in which the audio signals will be altered by the signal processor. The parameter settings collectively form a preset, and one or more user-operable controls can select and activate a preset from the plurality of presets stored in memory. The parameters stored in the selected preset can be loaded into the signal processor such that the sound characteristics of the audio paths are modified in accordance with the parameter settings in the selected preset. | 03-12-2015 |
20150081288 | SPEECH RECOGNITION DEVICE AND THE OPERATION METHOD THEREOF - Described herein is a speech recognition device comprising: a communication module receiving speech data corresponding to speech input from a speech recognition terminal and multi-sensor data corresponding to input environment of the speech; a model selection module selecting a language and acoustic model corresponding to the multi-sensor data among a plurality of language and acoustic models classified according to the speech input environment on the basis of previous multi-sensor data; and a speech recognition module controlling the communication module to apply a feature vector extracted from the speech data to the language and acoustic model and transmit speech recognition result for the speech data to the speech recognition terminal. | 03-19-2015 |
20150081289 | Computer, Internet and Telecommunications Based Network - A method and apparatus for a computer and telecommunication network which can receive, send and manage information from or to a subscriber of the network, based on the subscriber's configuration. The network is made up of at least one cluster containing voice servers which allow for telephony, speech recognition, text-to-speech and conferencing functions, and is accessible by the subscriber through standard telephone connections or through internet connections. The network also utilizes a database and file server allowing the subscriber to maintain and manage certain contact lists and administrative information. A web server is also connected to the cluster thereby allowing access to all functions through internet connections. | 03-19-2015 |
20150081290 | CALL STEERING DATA TAGGING INTERFACE WITH AUTOMATIC SEMANTIC CLUSTERING - A system and method for providing an easy-to-use interface for verifying semantic tags in a steering application in order to generate a natural language grammar. The method includes obtaining user responses to open-ended steering questions, automatically grouping the user responses into groups based on their semantic meaning, and automatically assigning preliminary semantic tags to each of the groups. The user interface enables the user to validate the content of the groups to ensure that all responses within a group have the same semantic meaning and to add or edit semantic tags associated with the groups. The system and method may be applied to interactive voice response (IVR) systems, as well as customer service systems that can communicate with a user via a text or written interface. | 03-19-2015 |
20150095024 | FUNCTION EXECUTION INSTRUCTION SYSTEM, FUNCTION EXECUTION INSTRUCTION METHOD, AND FUNCTION EXECUTION INSTRUCTION PROGRAM - To appropriately execute functions based on words that are consecutively input, a function-execution instruction server of a function-execution instruction system includes: a function-execution instruction unit that issues an instruction for the execution of one or more tasks, with categories preset for the respective tasks; a word input unit that inputs information containing a word; a category identifying unit that identifies the category of a word; an executed-function determination unit that determines a task based on the category thus identified; and an executed-function storage unit that stores a function the execution of which was instructed. The executed-function determination unit determines, based on the identified category and the category related to the previously instructed task, whether to instruct execution of the previously instructed function again. | 04-02-2015 |
20150095025 | Decoding-Time Prediction of Non-Verbalized Tokens - Non-verbalized tokens, such as punctuation, are automatically predicted and inserted into a transcription of speech in which the tokens were not explicitly verbalized. Token prediction may be integrated with speech decoding, rather than performed as a post-process to speech decoding. | 04-02-2015 |
20150100311 | SYSTEM AND METHOD FOR CORRECTING ACCENT INDUCED SPEECH IN AN AIRCRAFT COCKPIT UTILIZING A DYNAMIC SPEECH DATABASE - A system and method for recognizing speech on board an aircraft that compensates for different regional dialects over an area comprised of at least first and second distinct geographical regions, comprises analyzing speech in the first distinct geographical region using speech data characteristics representative of speech in the first distinct geographical region, detecting a change in position from the first distinct geographical region to the second geographical region, and analyzing speech in the second distinct geographical region using speech data characteristics representative of speech in the second distinct geographical region upon detecting that the aircraft has transitioned from the first distinct geographical region to the second distinct geographical region. | 04-09-2015 |
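The region-based dialect switching in the entry above can be sketched as a lookup of the aircraft's current position against per-region speech databases. This is a minimal illustration, not the patented method: the region names and bounding boxes below are invented, and real region boundaries would not be simple latitude/longitude rectangles.

```python
def select_dialect_model(position, regions):
    """Return the name of the dialect-specific speech database for the
    region containing the current position, or a default otherwise.

    regions: list of (name, (lat_min, lat_max, lon_min, lon_max)) boxes,
    a simplified stand-in for real geographic boundaries.
    """
    lat, lon = position
    for name, (lat0, lat1, lon0, lon1) in regions:
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            return name
    return "default"
```

Detecting the transition between regions then reduces to comparing the model selected for consecutive position fixes and reloading speech data characteristics when the name changes.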
20150106085 | SPEECH RECOGNITION WAKE-UP OF A HANDHELD PORTABLE ELECTRONIC DEVICE - A system and method for parallel speech recognition processing of multiple audio signals produced by multiple microphones in a handheld portable electronic device. In one embodiment, a primary processor transitions to a power-saving mode while an auxiliary processor remains active. The auxiliary processor then monitors the speech of a user of the device to detect a wake-up command by speech recognition processing the audio signals in parallel. When the auxiliary processor detects the command it then signals the primary processor to transition to active mode. The auxiliary processor may also identify to the primary processor which microphone resulted in the command being recognized with the highest confidence. Other embodiments are also described. | 04-16-2015 |
20150106086 | Building Automation Systems with Voice Control - A regional monitoring system can include a plurality of voice sensing units each of which incorporates speech recognition circuitry. In response to recognizing a verbal command at a unit, a coded representation, or token, along with location information, can be transmitted to a system control apparatus. Upon receipt of the token, the control apparatus can carry out the requested command or provide requested information. | 04-16-2015 |
20150120287 | SYSTEM AND METHOD FOR MANAGING MODELS FOR EMBEDDED SPEECH AND LANGUAGE PROCESSING - Disclosed herein are systems, methods, and computer-readable storage devices for fetching speech processing models based on context changes in advance of speech requests using the speech processing models. An example local device configured to practice the method, having a local speech processor, and having access to remote speech models, detects a change in context. The change in context can be based on geographical location, language translation, speech in a different language, user language settings, installing or removing an app, and so forth. The local device can determine a speech processing model that is likely to be needed based on the change in context, and that is not stored on the local device. Independently of an explicit request to process speech, the local device can retrieve, from a remote server, the speech processing model for use on the mobile device. | 04-30-2015 |
20150120288 | SYSTEM AND METHOD OF PERFORMING AUTOMATIC SPEECH RECOGNITION USING LOCAL PRIVATE DATA - A method of providing hybrid speech recognition between a local embedded speech recognition system and a remote speech recognition system relates to receiving speech from a user at a device communicating with a remote speech recognition system. The system recognizes a first part of speech by performing a first recognition of the first part of the speech with the embedded speech recognition system that accesses private user data, wherein the private user data is not available to the remote speech recognition system. The system recognizes the second part of the speech by performing a second recognition of the second part of the speech with the remote speech recognition system. The final recognition result is a combination of these two recognition processes. The private data can be such local information as a user location, a playlist, frequently dialed numbers or texted people, user contact list information, and so forth. | 04-30-2015 |
20150120289 | PREDICTING RECOGNITION QUALITY OF A PHRASE IN AUTOMATIC SPEECH RECOGNITION SYSTEMS - A method for predicting a speech recognition quality of a phrase comprising at least one word includes: receiving, on a computer system including a processor and memory storing instructions, the phrase; computing, on the computer system, a set of features comprising one or more features corresponding to the phrase; providing the phrase to a prediction model on the computer system and receiving a predicted recognition quality value based on the set of features; and returning the predicted recognition quality value. | 04-30-2015 |
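The pipeline in the entry above (compute features for a phrase, feed them to a prediction model, return a predicted quality value) can be sketched as follows. The features and the linear model are invented stand-ins for the patent's unspecified feature set and model; a trained regressor would replace the hand-set weights.

```python
def phrase_features(phrase):
    # Word count, average word length, and character count: simple
    # stand-ins for the feature set a real predictor would consume.
    words = phrase.split()
    if not words:
        return [0, 0.0, len(phrase)]
    return [len(words),
            sum(len(w) for w in words) / len(words),
            len(phrase)]

def predict_quality(phrase, weights, bias):
    # Linear prediction model over the features, clamped to [0, 1]
    # so the output reads as a recognition-quality score.
    score = bias + sum(w * f for w, f in zip(weights, phrase_features(phrase)))
    return max(0.0, min(1.0, score))
```

Such a score lets a grammar designer flag phrases likely to be misrecognized before deploying them.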
20150120290 | CLIENT-SERVER ARCHITECTURE FOR AUTOMATIC SPEECH RECOGNITION APPLICATIONS - A client-server architecture for Automatic Speech Recognition (ASR) applications, includes: (a) a client side including: a client being part of a distributed front end for converting acoustic waves to feature vectors; a VAD for separating speech from non-speech acoustic signals; and an adaptor for WebSockets; and (b) a server side including: a web layer utilizing HTTP protocols and including a Web Server having a Servlet Container; an intermediate layer for transport based on Message-Oriented Middleware, being a message broker; a recognition server and an adaptation server both connected to said intermediate layer; a Speech processing server; a Recognition Server for instantiation of a recognition channel per client; an Adaptation Server for adapting acoustic and linguistic models for each speaker; a bidirectional communication channel between the Speech processing server and the client side; and a persistent layer storing a Language Knowledge Base connected to said Speech processing server. | 04-30-2015 |
20150120291 | Scene Recognition Method, Device and Mobile Terminal Based on Ambient Sound - The present document provides a scene recognition method and device based on ambient sound and a mobile terminal. The device includes: a sound collection module, a preprocessing module, a feature extraction module, a scene recognition module and a database. The method includes: collecting a sound signal; processing the sound signal into a frequency domain signal; extracting sound feature information from the frequency domain signal; inputting the sound feature information into a preset model, matching the model output result with weight values of sound sample models of scenes, and determining the scene corresponding to the sound feature information. The present document implements locating based on background sound information as the feature of the scene, so that the mobile terminal quickly and correctly recognizes the current scene while maintaining a low-consumption state. | 04-30-2015 |
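The matching step in the entry above (score the model's feature output against per-scene weight vectors, pick the best scene) can be sketched as a dot-product comparison. The weight values and scene names are invented for the example; the patent does not specify the model family.

```python
def recognize_scene(feature_output, scene_weights):
    """Match a model's sound-feature output against per-scene weight
    vectors and return the scene whose weighted score is highest.

    scene_weights: dict mapping scene name -> weight vector of the
    scene's sound sample model (illustrative placeholder values).
    """
    best_scene, best_score = None, float("-inf")
    for scene, weights in scene_weights.items():
        score = sum(w * f for w, f in zip(weights, feature_output))
        if score > best_score:
            best_scene, best_score = scene, score
    return best_scene
```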
20150127334 | Speech Recognition Based Asset Management System For Tagging Legacy LPG Containers - Speech recognition based asset management systems and related methods are disclosed for tagging legacy LPG (liquefied petroleum gas) containers, such as LPG cylinders, and tracking associated data for the LPG containers. Embodiments described herein utilize a speech recognition processor to capture data associated with legacy LPG containers that are being tagged with electronic identification tags. The operator reads aloud the container information for the legacy LPG containers being tagged to produce voice signals for the container information, and the asset management system uses speech recognition to convert these voice signals to digital data that can be reviewed, modified, and verified by the operator before it is stored along with a tag identifier within a database system for later retrieval, as needed. | 05-07-2015 |
20150127335 | VOICE TRIGGER - Voice trigger. In accordance with a first method embodiment, a long term average audio energy is determined based on a one-bit pulse-density modulation bit stream. A short term average audio energy is determined based on the one-bit pulse-density modulation bit stream. The long term average audio energy is compared to the short term average audio energy. Responsive to the comparing, a voice trigger signal is generated if the short term average audio energy is greater than the long term average audio energy. Determining the long term average audio energy may be performed independent of any decimation of the bit stream. | 05-07-2015 |
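The long-term versus short-term energy comparison described above can be sketched on a one-bit pulse-density-modulated stream, where average energy is simply the pulse density over a window. The window sizes and the synthetic bit patterns are invented for illustration; real hardware would compute the running averages without buffering, and the patent notes the long-term average may be computed without decimating the stream.

```python
def voice_trigger(bits, long_window=1000, short_window=100):
    """For each sample past the first long window, report whether the
    short-term average energy exceeds the long-term average energy
    (the voice-trigger condition). bits is a sequence of 0/1 PDM samples."""
    long_sum = sum(bits[:long_window])
    short_sum = sum(bits[long_window - short_window:long_window])
    out = []
    for i in range(long_window, len(bits)):
        out.append(short_sum / short_window > long_sum / long_window)
        # Slide both running sums forward by one sample.
        long_sum += bits[i] - bits[i - long_window]
        short_sum += bits[i] - bits[i - short_window]
    return out
```

A burst of speech raises the short-term density well above the slowly moving long-term density, asserting the trigger.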
20150134329 | CONTENT IDENTIFICATION SYSTEM - The content of a media program is recognized by analyzing its audio content to extract therefrom prescribed features, which are compared to a database of features associated with identified content. The identity of the content within the database that has features that most closely match the features of the media program being played is supplied as the identity of the program being played. The features are extracted from a frequency domain version of the media program by a) filtering the coefficients to reduce their number, e.g., using triangular filters; b) grouping a number of consecutive outputs of triangular filters into segments; and c) selecting those segments that meet prescribed criteria, such as those segments that have the largest minimum segment energy with prescribed constraints that prevent the segments from being too close to each other. The triangular filters may be log-spaced and their output may be normalized. | 05-14-2015 |
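The segment-selection criterion in the entry above (keep the segments with the largest minimum energy, while preventing selected segments from being too close to each other) can be sketched as a greedy pick over per-segment minimum energies. The energies, counts, and gap below are invented; the patent's filtering and grouping stages are not reproduced.

```python
def select_segments(seg_min_energies, n_select, min_gap):
    """Greedily pick up to n_select segment indices with the largest
    minimum energy, skipping any candidate closer than min_gap to an
    already-selected segment."""
    order = sorted(range(len(seg_min_energies)),
                   key=lambda i: seg_min_energies[i], reverse=True)
    chosen = []
    for i in order:
        if all(abs(i - j) >= min_gap for j in chosen):
            chosen.append(i)
        if len(chosen) == n_select:
            break
    return sorted(chosen)
```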
20150142428 | IN-VEHICLE NAMETAG CHOICE USING SPEECH RECOGNITION - According to an embodiment of the disclosure, there is provided a method of choosing a nametag using automatic speech recognition (ASR). The method includes receiving a spoken nametag via a microphone; performing a first speech recognition analysis on the spoken nametag; determining that the first speech recognition analysis outputs only handheld wireless device nametags; performing a second speech recognition analysis that excludes the handheld wireless device nametags stored at the handheld wireless device; and combining the results of the first speech recognition analysis and the second speech recognition analysis. | 05-21-2015 |
20150142429 | Recording and Entertainment System - A recording and entertainment system is provided. A mobile kiosk is capable of being outfitted with one or more cameras for capturing visual content, an audio recording system, a display device, and a computer to control recording, storing, processing, and playing the captured visual and audio content. The captured video and audio are then processed and edited to produce remembrance products in various forms, including videos. A karaoke function is provided by a means for recognizing songs being performed and displaying lyrics on the display. Camera tilt, noise suppression, noise cancelation, voice control of the computer, a battery pack to free the kiosk for movement, means for internet communication of the visual and audio content, means for confirming that objects of the visual content capture are within the field of view, robotic control of the kiosk, and interactive gaming means are additional features that can be utilized with the system. | 05-21-2015 |
20150142430 | PRE-PROCESSING APPARATUS AND METHOD FOR SPEECH RECOGNITION - A pre-processing apparatus for speech recognition may include: a trailing silence period detection unit configured to detect the length of a trailing silence period contained in a speech signal; a reference trailing silence period storage unit configured to store the length of a reference trailing silence period; and a trailing silence period adjusting unit configured to adjust the length of the trailing silence period contained in the speech signal based on the length of the reference trailing silence period. | 05-21-2015 |
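The three units in the entry above (detect the trailing silence, compare against a stored reference length, adjust) can be sketched as trimming or padding the end of the signal to the reference length. The amplitude threshold and sample values are invented; a real detector would work on frame energies rather than raw samples.

```python
def adjust_trailing_silence(samples, threshold, ref_silence_len):
    """Detect the trailing silence period (samples below threshold at the
    end of the signal) and replace it with exactly ref_silence_len silent
    samples, trimming or padding as needed."""
    end = len(samples)
    while end > 0 and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[:end] + [0] * ref_silence_len
```

Normalizing the trailing silence this way keeps the recognizer's end-of-utterance behavior consistent across inputs.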
20150142431 | MULTILINGUAL SPEECH RECOGNITION AND PUBLIC ANNOUNCEMENT - Embodiments of the present invention provide a system, method, and program product to deliver an announcement to people, such as a public announcement. A computer receives input representative of audio from one or more people speaking in one or more natural languages. The computer processes the input to identify the languages being spoken, and identifies a relative proportion of each of the identified languages. Using these proportions, the computer determines one or more languages in which to deliver the announcement. The computer then causes to be delivered the announcement in the determined languages. In other embodiments, the computer can also determine an order in which to deliver the announcement. Further, the computer can transmit the announcement in the determined languages and order for delivery in aural or visual form. | 05-21-2015 |
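The selection step in the entry above (identify the relative proportion of each detected language, then choose and order the announcement languages) can be sketched over per-utterance language identifications. The share threshold is an invented parameter; the patent leaves the exact selection rule open.

```python
from collections import Counter

def announcement_languages(detected_langs, min_share=0.2):
    """Choose announcement languages from a list of per-utterance language
    IDs, ordered by their relative proportion among the speakers heard.
    Languages below min_share of the total are dropped."""
    counts = Counter(detected_langs)
    total = sum(counts.values())
    return [lang for lang, n in counts.most_common() if n / total >= min_share]
```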
20150149162 | MULTI-CHANNEL SPEECH RECOGNITION - Disclosed herein are systems, methods, and computer-readable storage devices for performing per-channel automatic speech recognition. An example system configured to practice the method combines a first audio signal of a first speaker in a communication session and a second audio signal from a second speaker in the communication session as a first audio channel and a second audio channel. The system can recognize speech in the first audio channel of the recording using a first model associated with the first speaker, and recognize speech in the second audio channel of the recording using a second model associated with the second speaker, wherein the first model is different from the second model. The system can generate recognized speech as an output from the communication session. The system can identify the models based on identifiers of the speakers, such as a telephone number, an IP address, a customer number, or account number. | 05-28-2015 |
20150149163 | VOICE INPUT CORRECTION - An embodiment provides a method, including: accepting, at an audio receiver of an information handling device, voice input of a user; interpreting, using a processor, the voice input; thereafter receiving, at the audio receiver, repeated voice input of the user; identifying a correction using the repeated voice input; and correcting, using the processor, the voice input using the repeated voice input, wherein the repeated voice input does not include a predetermined voice command. Other aspects are described and claimed. | 05-28-2015 |
20150149164 | APPARATUS AND METHOD FOR RECOGNIZING VOICE - An apparatus and a method for recognizing a voice include a plurality of array microphones configured to have at least one microphone, and a seat controller configured to check a position of a seat provided in a vehicle. A microphone controller is configured to set a beam forming region based on the checked position of the seat and controls an array microphone so as to obtain sound source data from the set beam forming region. | 05-28-2015 |
20150318002 | MOOD MONITORING OF BIPOLAR DISORDER USING SPEECH ANALYSIS - A system that monitors and assesses the moods of subjects with neurological disorders, like bipolar disorder, by analyzing normal conversational speech to identify speech data that is then analyzed through an automated speech data classifier. The classifier may be based on a vector, separator, hyperplane, decision boundary, or other set of rules to classify one or more mood states of a subject. The system classifier is used to assess current mood state, predicted instability, and/or a change in future mood state, in particular for subjects with bipolar disorder. | 11-05-2015 |
20150325240 | METHOD AND SYSTEM FOR SPEECH INPUT - Inputting speech includes receiving feature information obtained by a client, the feature information comprising speech signals and user feature image signals, recognizing first candidate recognition data matching the user feature image signals, determining target recognition data based at least on the first candidate recognition data, and outputting the target recognition data. | 11-12-2015 |
20150325254 | METHOD AND APPARATUS FOR DISPLAYING SPEECH RECOGNITION INFORMATION - A method and an apparatus for displaying speech recognition information are provided. The method includes acquiring at least one of speech recognition information based on speech recognized by performing speech recognition, and response information indicating a processing result of the speech recognition information, displaying a speech recognition history list including the acquired information, in a first window region, selecting at least one of the acquired information included in the speech recognition history list, and updating response information corresponding to the selected at least one piece of acquired information. | 11-12-2015 |
20150341005 | AUTOMATICALLY CONTROLLING THE LOUDNESS OF VOICE PROMPTS - A system and method of regulating automatic speech recognition (ASR) playback of audible prompts includes: generating an audible prompt via a speaker; detecting ambient sound during the audible prompt via a microphone; obtaining a speech recognition confidence value for speech recognition performed on the ambient sound; and reducing a volume level of the audible prompt based on the speech recognition confidence value while continuing to generate the audible prompt. | 11-26-2015 |
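The regulation loop in the entry above (recognize speech in the ambient sound while a prompt plays, then lower the prompt volume when recognition confidence is high enough) can be sketched as a single adjustment step. The threshold, step size, and volume floor are invented tuning parameters, not values from the patent.

```python
def adjust_prompt_volume(volume, confidence, conf_threshold=0.5,
                         step=0.1, min_volume=0.2):
    """Return the new prompt volume: reduced by one step (down to a
    floor) when ambient speech was recognized with confidence at or
    above the threshold (i.e. the user is likely talking over the
    prompt), unchanged otherwise."""
    if confidence >= conf_threshold:
        return max(min_volume, volume - step)
    return volume
```

Calling this once per recognition result while the prompt continues playing yields the gradual duck-down behavior described.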
20150356971 | MODIFICATION OF VISUAL CONTENT TO FACILITATE IMPROVED SPEECH RECOGNITION - Technologies described herein relate to modifying visual content for presentment on a display to facilitate improving performance of an automatic speech recognition (ASR) system. The visual content is modified to move elements further away from one another, wherein the moved elements give rise to ambiguity from the perspective of the ASR system. The visual content is modified to take into consideration accuracy of gaze tracking. When a user views an element in the modified visual content, the ASR system is customized as a function of the element being viewed by the user. | 12-10-2015 |
20150356981 | Augmenting Speech Segmentation and Recognition Using Head-Mounted Vibration and/or Motion Sensors - Example methods and systems use multiple sensors to determine whether a speaker is speaking. Audio data in an audio-channel speech band detected by a microphone can be received. Vibration data in a vibration-channel speech band representative of vibrations detected by a sensor other than the microphone can be received. The microphone and the sensor can be associated with a head-mountable device (HMD). It is determined whether the audio data is causally related to the vibration data. If the audio data and the vibration data are causally related, an indication can be generated that the audio data contains HMD-wearer speech. Causally related audio and vibration data can be used to increase accuracy of text transcription of the HMD-wearer speech. If the audio data and the vibration data are not causally related, an indication can be generated that the audio data does not contain HMD-wearer speech. | 12-10-2015 |
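The causality check in the entry above (decide whether the microphone audio and the head-mounted vibration data come from the same source, i.e. the wearer speaking) can be sketched as a normalized correlation at zero lag. The threshold is invented, and a real system would search over lags and band-limit both channels first.

```python
def causally_related(audio, vibration, threshold=0.8):
    """Judge whether microphone audio and vibration-sensor data are
    causally related, using normalized zero-lag correlation as a proxy.
    Returns True when the channels co-vary strongly enough."""
    n = min(len(audio), len(vibration))
    a, v = audio[:n], vibration[:n]
    num = sum(x * y for x, y in zip(a, v))
    den = (sum(x * x for x in a) * sum(y * y for y in v)) ** 0.5
    return den > 0 and num / den >= threshold
```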
20150364135 | SPEECH RECOGNITION METHODS, DEVICES, AND SYSTEMS - Speech recognition methods, devices, and systems are described herein. One system includes a number of microphones configured to capture sound in an area, a digital signal processor configured to segregate the captured sound into a plurality of signals, wherein each respective signal corresponds to a different portion of the area, and an automatic speech recognition engine configured to separately process each of the plurality of signals to recognize a speech command in the captured sound. | 12-17-2015 |
20150364139 | SENSOR ENHANCED SPEECH RECOGNITION - A system for sensor enhanced speech recognition is disclosed. The system may obtain visual content or other content associated with a user and an environment of the user. Additionally, the system may obtain, from the visual content, metadata associated with the user and the environment of the user. The system may also include determining, based on the visual content and metadata, if the user is speaking. If the user is determined to be speaking, the system may obtain audio content associated with the user and the environment. The system may then adapt, based on the visual content, audio content, and metadata, one or more acoustic models that match the user and the environment. Once the one or more acoustic models are adapted and loaded, the system may enhance a speech recognition process or other process associated with the user. | 12-17-2015 |
20150380013 | LEARNING ALGORITHM TO DETECT HUMAN PRESENCE IN INDOOR ENVIRONMENTS FROM ACOUSTIC SIGNALS - A system is described that constantly learns the sound characteristics of an indoor environment to detect the presence or absence of humans within that environment. A detection model is constructed and a decision feedback approach is used to constantly learn and update the statistics of the detection features and sound events that are unique to the environment in question. The learning process may not only rely on the acoustic signal, but may also make use of signals derived from other sensors such as range sensors, motion sensors, pressure sensors, and video sensors. | 12-31-2015 |
20160005401 | ENGINE FOR HUMAN LANGUAGE COMPREHENSION OF INTENT AND COMMAND EXECUTION - The invention provides a computer system for interacting with a user. A set of concepts initially forms a target set of concepts. An input module receives a language input from the user. An analysis system executes a plurality of narrowing cycles until a concept packet having at least one concept has been identified. Each narrowing cycle includes identifying at least one portion of the language and determining a subset of concepts from the target set of concepts to form a new target subset. An action item identifier identifies an action item from the action items based on the concept packet. An action executer executes an action based on the action item that has been identified. | 01-07-2016 |
20160019886 | METHOD AND APPARATUS FOR RECOGNIZING WHISPER - A method and an apparatus of recognizing whisper are provided. The method of recognizing a whisper may include recognizing a whispering action performed by a user through a first sensor, recognizing a loudness change through a second sensor, and activating a whisper recognition mode based on the whispering action and the loudness change. | 01-21-2016 |
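The two-sensor activation rule in the entry above (a whispering action on one sensor combined with a loudness change on another) can be sketched as a simple conjunction. The decibel baseline and required drop are invented parameters for illustration.

```python
def whisper_mode_active(whisper_action_detected, loudness_db,
                        baseline_db=60, loudness_drop_db=20):
    """Activate whisper-recognition mode only when the first sensor saw
    the whispering action AND the second sensor measured a marked drop
    in loudness relative to the (assumed) normal-speech baseline."""
    return whisper_action_detected and (baseline_db - loudness_db) >= loudness_drop_db
```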
20160111111 | SYSTEMS, METHODS, AND DEVICES FOR INTELLIGENT SPEECH RECOGNITION AND PROCESSING - Systems, methods, and devices for intelligent speech recognition and processing are disclosed. According to one embodiment, a method for improving intelligibility of a speech signal may include (1) at least one processor receiving an incoming speech signal comprising a plurality of sound elements; (2) the at least one processor recognizing a sound element in the incoming speech signal to improve the intelligibility thereof; (3) the at least one processor processing the sound element by at least one of modifying and replacing the sound element; and (4) the at least one processor outputting the processed speech signal comprising the processed sound element. | 04-21-2016 |
20160125882 | Voice Control System with Multiple Microphone Arrays - A voice controlled medical system with improved speech recognition includes a first microphone array, a second microphone array, a controller in communication with the first and second microphone arrays, and a medical device operable by the controller. The controller includes a beam module that generates a first beamed signal using signals from the first microphone array and a second beamed signal using signals from the second microphone array. The controller also includes a comparison module that compares the first and second beamed signals and determines a correlation between the first and second beamed signals. The controller also includes a voice interpreting module that identifies commands within the first and second beamed signals if the correlation is above a correlation threshold. The controller also includes an instrument control module that executes the commands to operate said medical device. | 05-05-2016 |
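The comparison module in the entry above (correlate the two beamformed signals and only pass commands to the voice interpreter when the correlation clears a threshold) can be sketched with Pearson correlation. The threshold and sample vectors are invented; real beamformed signals would be windowed audio frames.

```python
def beams_agree(beam_a, beam_b, threshold=0.7):
    """Return True when two beamformed signals are correlated strongly
    enough (Pearson correlation >= threshold) that commands heard in
    them should be interpreted rather than rejected as noise."""
    n = len(beam_a)
    mean_a = sum(beam_a) / n
    mean_b = sum(beam_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(beam_a, beam_b))
    var_a = sum((a - mean_a) ** 2 for a in beam_a)
    var_b = sum((b - mean_b) ** 2 for b in beam_b)
    if var_a == 0 or var_b == 0:
        return False
    return cov / (var_a * var_b) ** 0.5 >= threshold
```

Requiring agreement between two independent arrays suppresses commands that only one array "hears", e.g. localized noise near a single microphone cluster.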
20160140963 | SPEECH RECOGNITION CANDIDATE SELECTION BASED ON NON-ACOUSTIC INPUT - A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions. | 05-19-2016 |
20160140964 | SPEECH RECOGNITION SYSTEM ADAPTATION BASED ON NON-ACOUSTIC ATTRIBUTES - A method includes the following steps. A vicinity from which speech input to a speech recognition system originates is determined. Non-acoustic data from the vicinity of the speech is obtained using one or more non-acoustic sensors. A subject speaker is identified as the source of the speech input from the obtained non-acoustic data. One or more non-acoustic attributes of the subject speaker is analyzed. A speech recognition system is adjusted based on the one or more analyzed non-acoustic attributes. | 05-19-2016 |
20170236519 | SPEECH RECOGNITION USING ELECTRONIC DEVICE AND SERVER | 08-17-2017 |
20190147099 | AUTOMATIC IDENTIFICATION OF RETRAINING DATA IN A CLASSIFIER-BASED DIALOGUE SYSTEM | 05-16-2019 |
20190147852 | SIGNAL PROCESSING AND SOURCE SEPARATION | 05-16-2019 |
20190147882 | AUTOMATED COGNITIVE RECORDING AND ORGANIZATION OF SPEECH AS STRUCTURED TEXT | 05-16-2019 |
20190147884 | ACCELERATED DATA TRANSFER FOR LATENCY REDUCTION AND REAL-TIME PROCESSING | 05-16-2019 |
20190147904 | METHOD, DEVICE AND APPARATUS FOR SELECTIVELY INTERACTING WITH MULTI-DEVICES, AND COMPUTER-READABLE MEDIUM | 05-16-2019 |
20220139371 | SIMULTANEOUS ACOUSTIC EVENT DETECTION ACROSS MULTIPLE ASSISTANT DEVICES - Implementations can detect respective audio data that captures an acoustic event at multiple assistant devices in an ecosystem that includes a plurality of assistant devices, process the respective audio data locally at each of the multiple assistant devices to generate respective measures that are associated with the acoustic event using respective event detection models, process the respective measures to determine whether the detected acoustic event is an actual acoustic event, and cause an action associated with the actual acoustic event to be performed in response to determining that the detected acoustic event is the actual acoustic event. In some implementations, the multiple assistant devices that detected the respective audio data are anticipated to detect the respective audio data that captures the actual acoustic event based on a plurality of historical acoustic events being detected at each of the multiple assistant devices. | 05-05-2022 |
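The cross-device decision in the entry above (each device produces a local event-detection measure; the measures are combined to decide whether the event is real) can be sketched as threshold voting. The threshold and the required device count are invented parameters; the patent leaves the combination rule open.

```python
def is_actual_event(device_measures, device_threshold=0.5, min_devices=2):
    """Decide whether an acoustic event detected on several assistant
    devices is an actual event: require at least min_devices whose local
    event-detection measure clears device_threshold."""
    confident = [m for m in device_measures if m >= device_threshold]
    return len(confident) >= min_devices
```

Corroboration across devices filters out false positives from a single device, e.g. a TV audible in only one room.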