This startup wants to deepfake clone your voice and sell it to the highest bidder

There’s a video that pops up periodically on my YouTube feed. It’s a conversation between rappers Snoop Dogg and 50 Cent bemoaning the fact that, compared to their generation, all modern hip-hop artists apparently sound the same. “When a person decides to be themselves, they offer something no-one else can be,” says 50 Cent. “Yeah, ‘cos once you be you — who can be you but you?” Snoop responds.

When the video was uploaded in October 2014, that may have broadly been true. But just a few years later it certainly isn’t. In a world of audio deepfakes, it’s possible to train an A.I. to sound eerily similar to another person by feeding it an audio corpus consisting of hours of their spoken data. The results are unnervingly accurate.

Public figures like the rapper Jay-Z and the psychologist Jordan Peterson have already complained about people misappropriating their voices by creating audio deepfakes and then making them say silly things on the internet. “Wake up,” wrote Peterson. “The sanctity of your voice, and your image, is at serious risk.” Those are just the mischievous cases. In others, the results can tip over into outright criminality. In one 2019 incident, criminals used an audio deepfake to impersonate the voice of the CEO of an energy company and persuade an underling over the phone to urgently transfer $243,000 to a bank account.

Veritone, an A.I. company that creates smart tools for labeling media for the entertainment industry, is putting the audio deepfake power back in the hands (or, err, the throats) of those to whom it rightly belongs. This month, the company announced Marvel.ai, what company president Ryan Steelberg described to Digital Trends as a “complete voice-as-a-service solution.” For a fee, Veritone will build an A.I. model that sounds just like you (or, more likely, a famous person with an immediately recognizable voice), which can then be licensed out on loan like a high-tech version of Ariel’s voice-as-collateral bargain from The Little Mermaid.

“Your voice is just as valuable as any other content or brand attribute that you have,” said Steelberg. “[It’s on a level with] your name and likeness, your face, your signature, or a song you’ve written or piece of content you’ve created.”

“We can repurpose a lot”

Certain individuals have, of course, long sold their voices in the form of recording commercials or voiceovers, singing songs, and countless other forms of monetization. But these endeavors all required the person to actually say the words. What Veritone’s solution promises to do is to make this individually scalable.

What if, for instance, Kevin Hart could license his voice to a luxury brand, which could then use it to create personalized ads featuring the viewer’s name, the location of their nearest brick-and-mortar store, and the particular product they would be most likely to buy? Rather than Hart spending literally days in the recording booth, A.I. could make this possible with little more on his part than signing on the dotted line to let that third party harness his vocal likeness. While he was off shooting a movie, doing a comedy tour, taking a vacation, or even sleeping, his digital voice could be raking in the cash.

“We can repurpose a lot,” Steelberg explained, regarding the training process. “People who are already speaking a ton, if they’re producing a podcast or in the media, there’s a lot of data out there. We probably have a ton of it already if they happen to be a customer of ours.”

Steelberg said that the voice-as-a-service idea occurred to Veritone several years ago. However, at the time he was unconvinced that machine learning models were able to create the hyper-realistic synthetic voices he was looking for. This is especially important when it comes to voices we know intimately, even if we’ve never actually met the speaker in question. The results could be some kind of audible uncanny valley, with every wrong sound alerting listeners to the fact that they’re listening to a fake. But here in 2021 he is convinced that things have advanced to the point where this is now possible. Hence Marvel.ai.

Steelberg speaks in excited buzzwords about the massive potential of the technology, talking up its possible plethora of “modalities of execution.” Veritone can create models for text-to-speech. It can also build models for speech-to-speech, whereby a voice actor can “drive” a vocal performance by reading the words with suitable inflection and then having the finished voice overlaid at the end like a Snapchat filter. The company can also fingerprint each voice so it can tell if a piece of apparently real audio that pops up someplace was created using its technology.

“The more you think about it … you’ll literally come up with 50 more [possible use-cases],” he said. “What we find so fascinating about this new category of A.I. is the extensibility and the variability.”

Consider some others. A famous athlete might be a god on the basketball court but a devil when it comes to reading lines from a script in a way that sounds natural. Using Veritone’s technology, their part in video game cutscenes, or the audiobook of their memoir (which they may also not have written), could be performed by a voice actor whose performance is then digitally tweaked to sound like the athlete. Alternatively, a movie could be dubbed for other countries, with the same actor’s voice reading the lines in French, Mandarin, or any of a number of other languages, even if the actor doesn’t actually speak them.

How will the public react?

A big question hanging over all of this, of course, is how members of the public are going to respond to it all. This is the tricky, unpredictable bit. Celebrities today must play a complex role: Both larger-than-life figures worthy of having their face plastered on billboards, and also relatable individuals who have relationship problems, tweet about watching TV in their pajamas, and make silly faces when they eat hot sauce.

What happens, then, when ads feature a celebrity reading lines that we know the performer never actually said, because their voice was programmatically generated to deliver a targeted ad? Steelberg said that this is little different from a celebrity handing over control of their social media to a third-party account manager. If we see Taylor Swift tweet, we know that it’s quite possibly not Taylor herself tapping out the message, especially if it’s an endorsement or a piece of promotional content.

But voice is, in a very real way, different, because it’s more personal. That is especially true when it’s paired with personalization, which is one of the use-cases that makes the most sense. The truth is that, to quote the screenwriter William Goldman, nobody knows what the public response will be, precisely because nobody has done exactly this before.

“It’s going to run the spectrum, right?” Steelberg said. “[Some] people are going to say, ‘I’m going to use this tool a little bit to augment my day to help me save time.’ Others are going to say, full-blown, ‘I want my voice everywhere to extend my brand, and I’m going to license it out.’”

His best guess is that acceptance will be on a case-by-case basis. “You need to be in tune with the reaction of your audience, and if you see things are working or not working,” he said. “They may love it. They may say, ‘You know what? I love the fact that you’re putting out 10 times more content or more personal content to me, even though I know you used synthetic content to augment it. Thank you. Thank you.’”

Think about the future

As for the future? “We want to work with all the major talent agencies,” Steelberg said. “We think anybody who is in the business of making money around a scarce brand should be thinking about their voice strategy.”

And don’t expect it to remain purely about audio, either. “We’ve always been fascinated by the potential of using synthetic content to either extend, augment, or potentially completely replace some of the legacy forms of content production,” he continued. “Be that in an audio sense or, ultimately in the future, a video sense.”

That’s right: Once it has cornered the market in the world of audio deepfakes, Veritone plans to go one step further and enter the world of fully realized virtual avatars that both sound and look indistinguishable from their source.

Suddenly those personalized ads from Minority Report sound a whole lot less like science fiction.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…