Skip to main content

Baidu’s Deep Voice 2 text-to-speech engine can imitate hundreds of human accents

baidu
Image used with permission by copyright holder
Baidu, the Beijing-based juggernaut that commands 80 percent of the Chinese internet search market, is investing heavily in artificial intelligence. In 2013, it opened the Institute of Deep Learning, an R&D center focused on machine learning. And in May, it took the wraps off the newest version of Deep Voice, its AI-powered text-to-speech engine.

Deep Voice 2, which follows on the heels of Deep Voice’s public debut earlier this year, can produce real-time speech that’s nearly indistinguishable from a human voice. All the more impressive, it needs just thirty minutes of audio to build a working model, and can imitate the regional accents of hundreds of different speakers.

That’s leaps and bounds better than early versions of Deep Voice, which took multiple hours to learn one voice.

They key is Deep Voice 2’s ability to identify similarities between hundreds of different speakers to build a working model of a human voice. Then, it autonomously derives unique voices from that model — unlike voice assistants like Apple’s Siri, which require that a human record thousands of hours of speech that engineers tune by hand, Deep Voice 2 doesn’t require guidance or manual intervention.

Baidu (sign)
Image used with permission by copyright holder

“Give it the right data, and it can learn on [its] own what sort of features are important,” Andrew Gibiansky, a research scientist at Baidu’s Silicon Valley AI Lab, told The Verge.

Baidu isn’t the only company investing in high-quality text-to-speech tech. Google’s WaveNet, a product of the company’s DeepMind division, generates voices by sampling real human speech and independently creating its own sounds in a variety of voices. Adobe’s Project VoCo transcribes human speech to editable text in real time. And Lyrebird, a Canadian AI startup, licenses algorithms that can imitate any voice with just a single minute of sample audio, create one thousand sentences in less than half a second, and can infuse the speech it creates with emotions like anger, sympathy, and stress.

But don’t expect Deep Voice 2 or WaveNet to replace Siri, the Google Assistant, or Amazon’s Alexa anytime soon — AI-powered translation apps require more resources than today’s phones can reasonably supply. But Baidu sees potential in applications like text-to-speech apps and voice-based assistants. “The ability to quickly synthesize multiple human voices will have a huge effect on products such as personal assistants and eBook readers in the future. For example, each character of your eBook could have a unique voice when you listen to the eBook.”

Kyle Wiggers
Former Digital Trends Contributor
Kyle Wiggers is a writer, Web designer, and podcaster with an acute interest in all things tech. When not reviewing gadgets…
AT&T just made it a lot easier to upgrade your phone
AT&T Storefront with logo.

Do you want to upgrade your phone more than once a year? What about three times a year? Are you on AT&T? If you answered yes to those questions, then AT&T’s new “Next Up Anytime” early upgrade program is made for you. With this add-on, you’ll be able to upgrade your phone three times a year for just $10 extra every month. It will be available starting July 16.

Currently, AT&T has its “Next Up” add-on, which has been available for the past several years. This program costs $6 extra per month and lets you upgrade by trading in your existing phone after at least half of it is paid off. But the new Next Up Anytime option gives you some more flexibility.

Read more
Motorola is selling unlocked smartphones for just $150 today
Someone holding the Moto G Stylus 5G (2024).

Have you been looking for phone deals but don’t want to spend a ton of money on flagship devices from Apple and Samsung? Have you ever considered investing in an unlocked Motorola? For a limited time, the company is offering a $100 markdown on the Motorola Moto G 5G. It can be yours for just $150, and your days and nights of phone-shopping will finally be over!

Why you should buy the Motorola Moto G 5G
Powered by the Snapdragon 480+ 5G CPU and 4GB of RAM, the Moto G delivers exceptional performance across the board. From UI navigation to apps, games, and camera functions, you can expect fast load times, next to no buffering, and smooth animations. You’ll also get up to 128GB of internal storage that you’ll be able to use for photos, videos, music, and any other mobile content you can store locally. 

Read more
The Nokia 3210 is the worst phone I’ve used in 2024
A person holding the Nokia 3210, showing the screen.

Where do I even start with the Nokia 3210? Not the original, which was one of the coolest phones to own back in a time when Star Wars: Episode 1 -- The Phantom Menace wasn’t even a thing, but the latest 2024 reissue that has come along to save us all from digital overload, the horror of social media, and the endless distraction that is the modern smartphone.

Except behind this facade of marketing-friendly do-goodery hides a weapon of torture, a device so foul that I’d rather sit through multiple showings of Jar Jar Binks and the gang hopelessly trying to bring back the magic of A New Hope than use it.
The Nokia 3210 really is that bad

Read more