Sundar Pichai, Google’s CEO, announced the most important Google product updates at Google I/O 2018, including Android P and updates to Google’s suite of apps. Gmail gets Smart Compose to help us write messages faster. Google Lens can now match the style of an object. Google News now understands the full story behind a news event and supports the news sources you love. But the most stunning part of the first-day keynote was a Google Assistant demo.
You can hear the demo conversation here.
It shows that Google Assistant can make phone calls to help people schedule a hair salon appointment. It’s a real conversation between an AI system and a person in a business setting. Google announced this new technology as Google Duplex. The first impression is that the conversation simply sounds natural, which is exactly the goal of Google Duplex. People may not even realize they are talking with a robot.
Google tries to make the conversation comfortable by using a human-sounding voice instead of a stilted, computerized one. Training a voice model is a heavy effort that usually takes several months. Google said they trained a deep neural network model, WaveNet, which can generate realistic-sounding audio from a small amount of original speech data. This reduces the effort from several months to hundreds of hours. WaveNet can also generate some non-speech sounds, such as breathing and mouth movements.
*Automatic Speech Recognition (ASR) [1]
Google Duplex combines a concatenative text-to-speech (TTS) engine and a synthesis TTS engine (using Tacotron and WaveNet) to make the system sound more natural. In the demo, we can hear Google Assistant say “hmm” and “uh”. These sounds give the other speaker a natural reaction. They also let the system signal, in a natural way, that it is still processing. This is familiar from everyday life: we use the same kind of speech disfluencies when we talk to each other.
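To make the idea of using a filler sound to signal "still processing" more concrete, here is a toy sketch in Python. It is only an illustration and not how Google Duplex is implemented: the function name `respond`, the `threshold` value and the filler text are all assumptions made up for this example.

```python
def respond(answer: str, processing_seconds: float,
            threshold: float = 0.5, filler: str = "Hmm,") -> str:
    """Toy illustration (hypothetical, not Duplex's real logic).

    If preparing the answer took longer than `threshold` seconds,
    prefix it with a filler so the pause sounds natural to the listener.
    """
    if processing_seconds > threshold:
        return f"{filler} {answer}"
    return answer


# A slow answer gets a filler, a fast one does not.
print(respond("the 3rd at 10 am works.", processing_seconds=1.2))
print(respond("yes.", processing_seconds=0.1))
```

A real system would presumably decide when to insert such sounds from the audio and the conversation context rather than from a single fixed threshold.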
Some people have raised serious concerns that this technology will be used for phone fraud and election scams. When a new technology becomes a reality, it will certainly have an impact on society. We are already used to the idea that photos can be manipulated. Now we know audio can be manipulated too. Google said Duplex is built for simple tasks, like making a reservation. It is not able to handle anything more complex than making an appointment.
[1] Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone