Microsoft reaches ‘human parity’ with new speech recognition system

Researchers at Microsoft have announced a ‘major breakthrough’ in speech recognition. The company has created a technology that recognizes the words in a conversation as well as a person does. The team is calling it ‘human parity’ and plans to use the technology in its personal voice assistant, Cortana, as well as in its speech-to-text transcription software.

The team at Microsoft Artificial Intelligence and Research has reported a speech recognition system that makes as many errors as professional transcriptionists, or fewer. The researchers reported a word error rate (WER) of 5.9 percent, down from the 6.3 percent WER the team reported just last month.

However, this research doesn’t imply that the computer will recognize every word perfectly. But then, neither do we! According to the company, it means that the error rate, or the rate at which the computer mishears a word, such as “have” for “is” or “a” for “the”, is about the same as you would expect from a person hearing the same conversation.
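
To make the metric concrete, here is a minimal sketch of how a word error rate is commonly computed: count the fewest word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, then divide by the number of words in the reference. The function name and the sample sentences are made up for illustration, and this is not necessarily the exact scoring procedure Microsoft's team used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word was dropped
                curr[j - 1] + 1,                  # extra word was inserted
                prev[j - 1] + substitution_cost,  # match or misheard word
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: "is" misheard as "have" and "a" misheard as "the".
reference = "there is a cat on the mat"
hypothesis = "there have the cat on the mat"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints 28.6%
```

Two misheard words out of seven give a WER of about 28.6 percent in this toy case; a 5.9 percent WER corresponds to roughly six errors per hundred words.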

Existing tools used for this milestone include the Computational Network Toolkit, an open source Microsoft system for applying deep learning to computing tasks. It allows specialized graphics processing units (GPUs) running in parallel to process deep-learning algorithms faster.

The system also uses neural language models for better recognition. Deep neural networks use large amounts of data to teach computer systems to recognize patterns from inputs such as images or sounds.

No arguments here that this is an exciting breakthrough for Microsoft. An improvement in voice recognition will prove beneficial to consumers and enterprises alike. At the moment, the company has not announced when we can expect this system to make its way into its products.