If you’re fed up with chatbots mishearing you, Microsoft is making machine ears a little more attentive. Researchers from the tech giant have achieved an impressively low error rate for speech-recognition software — just 6.3 percent, according to a paper published last week. The company hopes this milestone will help refine and personalize its AI assistant, Cortana, and features like Skype Translator.
The latest error rate of Microsoft’s conversational speech-recognition system is the lowest in the industry, according to Xuedong Huang, Microsoft’s chief speech scientist. IBM, meanwhile, recently announced an error rate of 6.6 percent, improving on its 6.9 percent error rate from April and the 8 percent milestone that the company achieved last year. Two decades ago, the lowest error rate of a published system was more than 43 percent, Microsoft notes in a blog post.
In artificial intelligence development, researchers often model machines on humans by equipping the systems with the abilities to speak, see, and hear. Although Microsoft’s error rate is just 0.3 percentage points below IBM’s, incremental advancements like these bring machines closer to human-like capabilities. In speech recognition, the human error rate is around 4 percent, according to IBM.
“This new milestone benefited from a wide range of new technologies developed by the AI community from many different organizations over the past 20 years,” Microsoft’s Huang said.
A few of these technologies include biologically inspired systems called neural networks, a training technique known as deep learning, and the adoption of graphics processing units (GPUs) to process algorithms. Over the past two years, neural networks and deep learning have enabled AI researchers to develop and train systems in advanced speech recognition, image recognition, and natural language processing. Just last year, Microsoft created image-recognition software that outperformed humans.
Although initially designed for computer graphics, GPUs are now regularly used to process sophisticated algorithms. Cortana can process up to 10 times more data using GPUs than with previous methods, according to Microsoft.
With steady advances like these, repeating your question to a chatbot may be a thing of the past.