
Nvidia’s new voice A.I. sounds just like a real person

The “uncanny valley” is often invoked to describe artificial intelligence (A.I.) that mimics human behavior almost, but not quite, convincingly. Nvidia’s new voice A.I., however, is far more realistic than anything we’ve heard before. Using a combination of A.I. and a human reference recording, the synthetic voice sounds almost identical to a real one.

All the Feels: NVIDIA Shares Expressive Speech Synthesis Research at Interspeech

In a video (above), Nvidia’s in-house creative team describes the process of achieving accurate voice synthesis. The team equates speech to music, featuring complex and nuanced rhythms, pitches, and timbres that aren’t easy to replicate. Nvidia is creating tools to reproduce these intricacies with A.I.


The company unveiled its latest advancements at Interspeech, a technical conference dedicated to speech processing research. Nvidia’s voice tools are available through the open-source NeMo toolkit, and they’re optimized to run on Nvidia GPUs (according to Nvidia, of course).

The A.I. voice isn’t just a demo, either. Nvidia has transitioned to an A.I. narrator for its I Am A.I. video series, which shows the impacts of machine learning across various industries. Nvidia is now able to use an artificial voice as a narrator, free of the usual audio artifacts that come along with synthesized voices.

Nvidia tackles A.I. voices in one of two ways. The first is to train a text-to-speech model on recordings of human speech. After enough training, the model can take any text input and convert it into speech. The other method is voice conversion: the program takes an audio file of a human speaking and converts the voice to a synthetic one, matching the original’s pacing and intonation.
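For the curious, the first of those pipelines maps roughly onto the open-source NeMo toolkit mentioned above. The sketch below is illustrative only, not runnable as-is: the checkpoint names and method calls reflect NeMo’s TTS collection as commonly documented, but they are assumptions that may differ between toolkit versions, and the models require a download and ideally a GPU.

```python
# Illustrative sketch: text -> mel spectrogram -> waveform with NeMo's TTS models.
# Checkpoint names and method signatures are assumptions, not confirmed by the article.
from nemo.collections.tts.models import Tacotron2Model, HifiGanModel

spec_generator = Tacotron2Model.from_pretrained("tts_en_tacotron2")  # text to spectrogram
vocoder = HifiGanModel.from_pretrained("tts_hifigan")                # spectrogram to audio

tokens = spec_generator.parse("I am A.I.")          # tokenize/normalize the input text
spectrogram = spec_generator.generate_spectrogram(tokens=tokens)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
# `audio` now holds raw waveform samples that can be written out as a .wav file.
```

The two-stage split (one model predicts a spectrogram, a separate vocoder turns it into audio) is the standard structure for neural text-to-speech, and it is what lets researchers tune rhythm, pitch, and timbre at the spectrogram stage.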

For practical applications, Nvidia points to the countless virtual assistants helming customer service lines, as well as assistants like Alexa and Google Assistant built into smart devices. Nvidia says this technology reaches much further, however. “Text-to-speech can be used in gaming, to aid individuals with vocal disabilities or to help users translate between languages in their own voice,” Nvidia’s blog post reads.

Nvidia is developing a knack for tricking people using A.I. The company recently went into detail about how it created a virtual CEO for its GPU Technology Conference, aided in part by its own Omniverse software.

Jacob Roach
Lead Reporter, PC Hardware
Jacob Roach is the lead reporter for PC hardware at Digital Trends. In addition to covering the latest PC components, from…