This AI cloned my voice using just three minutes of audio

There’s a scene in Mission: Impossible 3 that you might recall. In it, our hero Ethan Hunt (Tom Cruise) tackles the movie’s villain, holds him at gunpoint, and forces him to read a bizarre series of sentences aloud.

“The pleasure of Busby’s company is what I most enjoy,” he reluctantly reads. “He put a tack on Miss Yancy’s chair, and she called him a horrible boy. At the end of the month, he was flinging two kittens across the width of the room ...”

Although the words he’s reading sound random and unimportant, it quickly becomes clear that they aren’t random at all: they’re deliberately designed to help a software program clone his voice. Once he finishes the passage, the software parses the audio and instantly gives Hunt the ability to speak and sound exactly like the bad guy, the final piece of his near-perfect disguise.

Now if you take that scene and subtract all the espionage, guns, and dramatic tension, you’re left with a pretty solid example of what I experienced at CES today during a demo of My Own Voice, an AI-powered “voice banking” service from a French startup called Acapela Group.

The company’s raison d’être is to help people who will eventually lose the ability to speak. This typically happens as a result of injury or of illnesses like ALS, Huntington’s disease, and laryngeal cancer. Whatever the cause may be, the company’s My Own Voice platform allows a person to synthetically clone their voice and preserve the unique tone, timbre, and personality that make it theirs, something that’s typically lost with most text-to-speech software (think Stephen Hawking).

Now to be fair, voice cloning tech isn’t necessarily new or technologically groundbreaking at this point. Such services have existed for years, and thanks in part to the advent of deepfakes, there are currently dozens of other companies that can do the same thing that Acapela Group does. But there are two big things that set My Own Voice apart from the rest of the pack: speed and purpose.

My Own Voice is impressively quick. Unlike other services, which often require hours of reference audio to create a realistic-sounding clone, My Own Voice’s AI can spin up an astonishingly good synthetic voice after hearing just 50 short sentences, or roughly three minutes of recorded audio. It’s basically just like that Mission: Impossible scene; they’ve developed a streamlined set of reference sentences that make it easier for their AI to learn how you sound, so instead of manually recording every conceivable word, all you have to do is talk through a handful of easy phrases.

Arguably more important than the software’s speed, though, is its purpose. Again, this tech isn’t particularly new. A handful of noteworthy startups have built similar voice-cloning tech, such as the Canadian startup Lyrebird and the London-based firm Sonantic. But both of those startups were quickly acquired, and their voice-cloning tech ended up being used for AI overdubbing in movies and video-editing software.

That’s not to say that those aren’t good uses of voice cloning tech. They absolutely are, and they’re probably quite profitable ones to boot — but that’s precisely what makes My Own Voice so cool. It’s not often that you encounter such a powerful technology that, rather than being built for entertainment or productivity, was developed specifically to help disadvantaged people and quite literally give them a voice.

Drew Prindle
Former Digital Trends Contributor
Drew Prindle is an award-winning writer, editor, and storyteller who currently serves as Senior Features Editor for Digital…