Google introduced offline voice typing for Android Jelly Bean at the 2012 Google IO Developer's Conference.
At the recently completed 2012 Google IO Developer's Conference, Google answered Apple's Siri by adding voice responses to the search functions in the new Android version 4.1, aka Jelly Bean. Voice search in Android will now be very Siri-like, using what Google calls its Knowledge Graph to formulate contextually aware spoken responses.
Google then went a step further than Apple by introducing offline voice typing in Android Jelly Bean. The average iPhone user may be unaware that Siri relies on a connection to cloud servers, as Google's voice search has until now, but Google claims to have "shrunk the Google speech recognizer" to fit into smartphones. While offline recognition will not provide the intelligent responses of online voice search, it addresses one of the fundamental problems with speech UIs: the need for a continuous internet connection to execute the recognition algorithms and language database search.
Another problem with speech UIs is response time, or latency. As Google said in their IO presentation, a slow connection can make voice input unusable. With the speech recognizer embedded in the device, Android developers can more confidently include voice input in their applications. However, the demo device at Google IO was a top-of-the-line Nexus smartphone, so it remains to be seen how pervasive the offline Android voice typing functions will be, at least initially. Google appears to be relying on general-purpose application processor horsepower, and on-board memory resources, to execute these functions in software. It is noteworthy that Google qualified the introduction by saying only U.S. English will be supported at launch. Shipping a multi-language database was presumably infeasible at this time.
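For application developers, the entry point is the same whether recognition runs on-device or in the cloud: Android's standard speech-recognition intent. Below is a minimal sketch of invoking it from an Activity; the class name and request code are illustrative, and on Jelly Bean devices the system decides whether to recognize offline based on the installed language pack.

```java
import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

public class VoiceInputActivity extends Activity {
    // Illustrative request code; any app-unique int works.
    private static final int VOICE_REQUEST = 1001;

    private void startVoiceInput() {
        // Standard speech-recognition intent; on Jelly Bean devices with the
        // offline language pack installed, recognition can run on-device.
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US"); // U.S. English only at launch
        startActivityForResult(intent, VOICE_REQUEST);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == VOICE_REQUEST && resultCode == RESULT_OK && data != null) {
            // The recognizer returns a ranked list of candidate transcriptions;
            // results.get(0) is the most likely one.
            ArrayList<String> results =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        }
    }
}
```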
Spansion's Acoustic Co-Processor combines custom logic and flash memory to offload CPUs in speech recognition applications.
Alvin Wong, Spansion's VP of Marketing and Business Development, says that the acoustic co-processor is inserted into the speech processing path immediately after the analog-to-digital conversion of the voice input. The processor utilizes voice technology from Nuance Communications, a provider of speech recognition solutions for PC applications, call centers, and healthcare. The acoustic co-processor logic, which is Spansion's own design, executes algorithms to score sound packets (similar to syllables or phonemes) from the digitized voice against the acoustic database, stored in flash memory on the same chip. The co-processor transmits sound scores over a Serial Peripheral Interface (SPI) to an application processor, which then executes a search algorithm to select the most likely spoken words from a language database.
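The division of labor Wong describes can be made concrete with a toy simulation. The sketch below is purely illustrative and assumes a naive one-frame-per-phoneme alignment; the actual Spansion/Nuance scoring and search algorithms are proprietary. Stage one plays the co-processor's role, scoring each digitized frame against acoustic templates; stage two plays the application processor's role, searching a small lexicon for the best-scoring word.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoStagePipeline {
    // Stage 1 (the co-processor's role): score one digitized frame against
    // every acoustic template in the database. A crude negative squared
    // distance stands in for the real scoring algorithm.
    static Map<String, Double> scoreFrame(double[] frame, Map<String, double[]> acousticDb) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, double[]> e : acousticDb.entrySet()) {
            double d = 0;
            for (int i = 0; i < frame.length; i++) {
                double diff = frame[i] - e.getValue()[i];
                d += diff * diff;
            }
            scores.put(e.getKey(), -d); // higher = better match
        }
        return scores; // in hardware, these scores would cross the SPI link
    }

    // Stage 2 (the application processor's role): pick the lexicon word whose
    // phoneme sequence best matches the per-frame scores.
    static String search(List<Map<String, Double>> frameScores,
                         Map<String, List<String>> lexicon) {
        String best = null;
        double bestTotal = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, List<String>> e : lexicon.entrySet()) {
            List<String> phonemes = e.getValue();
            if (phonemes.size() != frameScores.size()) continue; // toy alignment
            double total = 0;
            for (int i = 0; i < phonemes.size(); i++) {
                total += frameScores.get(i)
                        .getOrDefault(phonemes.get(i), Double.NEGATIVE_INFINITY);
            }
            if (total > bestTotal) {
                bestTotal = total;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy acoustic templates and a two-word lexicon, for demonstration only.
        Map<String, double[]> acousticDb = Map.of(
                "k", new double[]{0.9, 0.1}, "ae", new double[]{0.2, 0.8},
                "t", new double[]{0.5, 0.5}, "d", new double[]{0.1, 0.9});
        Map<String, List<String>> lexicon = Map.of(
                "cat", List.of("k", "ae", "t"),
                "tad", List.of("t", "ae", "d"));
        // Three digitized frames resembling "k", "ae", "t" in turn.
        double[][] frames = {{0.9, 0.1}, {0.2, 0.8}, {0.5, 0.5}};
        List<Map<String, Double>> frameScores = List.of(
                scoreFrame(frames[0], acousticDb),
                scoreFrame(frames[1], acousticDb),
                scoreFrame(frames[2], acousticDb));
        System.out.println(search(frameScores, lexicon)); // prints "cat"
    }
}
```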
Wong says that by implementing the scoring algorithms in their acoustic co-processor hardware, Spansion is able to significantly improve both response time and accuracy over conventional voice interfaces. In a benchmark experiment with an automobile infotainment system, Spansion claims to have reduced CPU load and response time by 50% compared to a standalone 800 MHz ARM processor. Spansion attributes much of the speedup in the scoring process to their design of a dedicated 1.2 GB/s wide data bus between the acoustic processor logic and the flash memory. The embedded flash memory in the Spansion acoustic co-processor allows for storage of as many as 10 to 12 language models, according to Wong, each with its own library of sounds provided by Nuance. The larger acoustic databases provide finer granularity, and hence greater accuracy, in the matching process. In addition, offloading the scoring process frees the application processor to execute a more natural language interface.
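A back-of-envelope budget suggests why the dedicated wide bus matters. Acoustic scoring must sweep the model data for every audio frame; assuming a typical 10 ms frame period (the article does not state Spansion's frame rate), a 1.2 GB/s bus lets the scorer stream roughly 12 MB of model data per frame without contending with the application processor for memory bandwidth.

```java
public class BusBudget {
    public static void main(String[] args) {
        double busBytesPerSec = 1.2e9;  // 1.2 GB/s internal bus, per Spansion
        double framePeriodSec = 0.010;  // assumed 10 ms acoustic frame period
        double mbPerFrame = busBytesPerSec * framePeriodSec / 1e6;
        // ~12 MB of acoustic model data can be swept per frame
        System.out.printf("Model data sweepable per frame: %.0f MB%n", mbPerFrame);
    }
}
```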
Spansion is planning to deliver design samples of their automotive platform in Q3, and is targeting Q1 of 2013 for full production. Device scaling will be based on how much flash memory is required to store language models. The company plans to introduce a low-end co-processor supporting 1 or 2 language models, along with a high-end device capable of 10 to 12 models.