Monday, April 27, 2009

Voice Recognition That Parses Speech to Text - New Generation of IT Products on the Way!

New Generation of Speech Recognition Software

James Fallows of The Atlantic visited IBM's China Research Lab and saw hardware/software that accurately converts human speech into text and can then translate that text into other languages. Something like this could be the tipping point for a new wave of innovation and productivity improvement.


Apparently IBM has been in the voice recognition business for over 40 years, and has been steadily making the improvements needed to overcome the barriers to market acceptance. Most of us are familiar with the "I'm sorry, I didn't understand your last response" inadequacies of current over-the-phone software, and of course,

this embarrassing live demonstration of MS Vista's voice recognition feature, which may have contributed to the delay of the MS Vista release:
http://www.youtube.com/watch?v=2Y_Jp6PxsSQ

but at IBM, there are signs the days of awkward and malfunctioning voice recognition software may soon be a thing of the past.

IBM voice recognition presentation from 2001, pretty slick:
http://www.youtube.com/watch?v=ZZsL9UCn_3A


http://www.theatlantic.com/doc/200904/chinese-innovation/3
China's Way Forward - James Fallows, The Atlantic

At the far-opposite end of Greater Beijing, in a special government-sponsored research park, I visited the China Research Lab of IBM. The lab’s director, Thomas Li, has a life story like those I have heard at many successful tech and manufacturing companies. He was raised in Taiwan, by parents who had grown up on the mainland. He went to America for his doctorate, had a successful career with a U.S. firm—and then decided, for reasons of opportunity and sentiment, to be part of everything going on in mainland China. In 2002 Li moved with his family to Beijing, where he directs a 200-person team of mainly Chinese-trained computer scientists.

One product demo made me wish I could get out a checkbook on the spot. It addressed two of the real-world problems most difficult for computers to handle: converting spoken language to written text, and converting written text from one language to another. Computers have “done” both of these tasks for years, but they have not done them accurately enough to be worth the bother. Having watched many similar demonstrations, I was startled by this one. My wife and I were the only native speakers of English in the room. But when each of us spoke into the voice-recognition system, it produced nearly perfect real-time versions of what we said. I had been speaking with deliberate clarity, so as a test I said the following words at fast conversational speed: “I never worry that my apartment is bugged in Beijing, because I figure there aren’t that many non-native speakers who can understand high-speed slangy American speech.” Those very words, except “slangy” (which had become “slinky”), were on the screen. Hmmmm.

Although everyone in Li’s lab speaks English, differences in accent can be a barrier in discussions with native speakers. So on video conference calls with their IBM colleagues in Armonk, New York, the Chinese scientists listen to what is said in English—and see a nearly real-time English transcription running across the bottom of the screen, which greatly aids their comprehension. I am sure it is not perfect, but I have seen enough such projects through the decades to be impressed with this one. Based on another demo I saw, it is already mature enough to allow spoken words—from TV, radio, commercials, YouTube—to be indexed and therefore retrieved as accurately as ordinary text. The words could then be translated and searched, in the original language or others, so that video clips, say, would be easy to find by a phrase (“axis of evil”) someone says in them.
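
To make that last idea concrete, here is a rough sketch of how transcribed speech could be indexed so that clips can be found by a spoken phrase. This is not IBM's system, which the article does not describe in any detail; it is just a minimal positional inverted index in Python. The clip names, the sample transcripts, and the functions `tokenize`, `build_index`, and `find_phrase` are all made up for illustration, and the sketch assumes the speech has already been turned into plain text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each word to the set of (clip_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for clip_id, text in transcripts.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].add((clip_id, pos))
    return index


def find_phrase(index, phrase):
    """Return the sorted ids of clips whose transcript contains the exact phrase."""
    words = tokenize(phrase)
    if not words:
        return []
    hits = set()
    # Start from every occurrence of the first word, then require each
    # following word to appear at the next consecutive position.
    for clip_id, pos in index.get(words[0], ()):
        if all(
            (clip_id, pos + offset) in index.get(word, ())
            for offset, word in enumerate(words[1:], start=1)
        ):
            hits.add(clip_id)
    return sorted(hits)


# Hypothetical transcripts, standing in for the output of a speech recognizer.
transcripts = {
    "clip_a": "The president spoke of an axis of evil in his address",
    "clip_b": "Evil lurks on the axis of the graph",
}
index = build_index(transcripts)
print(find_phrase(index, "axis of evil"))
```

Only `clip_a` matches, because `clip_b` contains the same words but not in consecutive order. A real system would presumably also store timestamps so a search could jump straight to the moment in the video, and would need to cope with recognition errors like "slinky" for "slangy".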
