Humans often find it hard to pick out a particular voice in a crowd. It is harder still for a microphone to tell sounds apart, as anyone who has tried giving instructions to a smart speaker at a house party or in a crowded place will have noticed.
Just as most smartphone cameras now allow users to focus on a single object among many, it may soon be possible to pick out individual voices in a crowd by suppressing all other sounds, thanks to a new Artificial Intelligence (AI) system developed by Google researchers.
On Wednesday, researchers at Google unveiled this incredible yet simultaneously terrifying technology. The team had been working for a long time on isolating individual audio sources, such as speech, in videos, something automated systems have difficulty with.
HOW DOES THIS AI VOICE RECOGNITION WORK?
This is an important development, as computers are not as good as humans at focusing their attention on a particular person in a noisy environment. The capability to mentally “mute” all other voices and sounds, known as the cocktail party effect, comes naturally to us humans.
The system is built on an Audio-Visual Speech Separation Model that can identify voices by monitoring people’s faces when they speak. Its neural network was trained to pick out the sounds of different individuals using ‘fake parties’ created by the researchers.
Background noise was mixed into these virtual parties in order to teach the AI how to separate distinct audio tracks by isolating multiple voices. The results were mind-blowing, as the system could entirely separate not just the noise but also the speech of two people talking simultaneously.
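To make the idea of a ‘fake party’ concrete, here is a minimal Python sketch of how such a training example might be put together. It is not Google's actual pipeline: it covers only the audio side (the real system also uses video of the speakers' faces), the sine tones are stand-ins for clean speech clips, and the names tone, rms, make_fake_party and the snr_db parameter are all hypothetical. Because the clean tracks are known in advance, a model trained on the mixture can be scored on how well it recovers them.

```python
import math
import random


def tone(freq_hz, n_samples, sample_rate=16000):
    """Stand-in for a clean speech clip: a pure sine tone (illustrative only)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def make_fake_party(speaker_a, speaker_b, noise, snr_db=10.0):
    """Mix two clean tracks with background noise at a chosen speech-to-noise ratio.

    Returns the noisy mixture (the model's input) and the two clean tracks
    (the targets the model should learn to recover).
    """
    n = min(len(speaker_a), len(speaker_b), len(noise))
    a, b, noise = speaker_a[:n], speaker_b[:n], noise[:n]
    speech = [x + y for x, y in zip(a, b)]

    noise_level = rms(noise)
    if noise_level == 0:
        raise ValueError("noise track is silent")

    # Scale the noise so that speech is snr_db decibels louder than it.
    gain = rms(speech) / (noise_level * 10 ** (snr_db / 20))
    mixture = [s + gain * x for s, x in zip(speech, noise)]
    return mixture, (a, b)


# Example: two "speakers" plus random background noise.
rng = random.Random(0)
clip_a = tone(220.0, 1600)
clip_b = tone(330.0, 1600)
crowd = [rng.uniform(-1.0, 1.0) for _ in range(1600)]
party, (target_a, target_b) = make_fake_party(clip_a, clip_b, crowd, snr_db=10.0)
```

Generating many such mixtures with different speakers and noise levels would give a network both the problem (party) and the answer (target_a, target_b) for every example.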
Where and how Google implements this in its product line remains to be seen, but its Hangouts chat client and YouTube videos seem like ideal places to test it out. Further, if you added a camera to a Google Home speaker, the device could do a much better job of knowing who’s talking and delivering personalized results.