Baidu’s SwiftScribe uses AI to transcribe audio files up to an hour in length

Baidu may be known as “the Google of China,” but that doesn’t mean the Asian search giant doesn’t come up with its own unique applications. On Monday, it debuted SwiftScribe, a web app that automatically transcribes speech files with the help of artificial intelligence.

SwiftScribe is about as simple as web apps come. It recognizes files in .wav and .mp3 format, and once the upload’s complete, the transcription process gets underway. A 30-second file takes about 10 seconds, and a one-minute file less than 30. An hour of audio, the maximum length SwiftScribe will allow, takes 20 minutes.

It’s not always perfect. SwiftScribe sometimes misspells words, and its capitalization and punctuation aren’t always on point. But it offers an editable field that lets users correct mistakes, and a built-in speed-shifting tool that plays the uploaded audio clip at a faster or slower speed.

Baidu project manager Tian Wu, who was inspired partly by her experience transcribing interviews as a graduate student at the University of California, Santa Barbara, said that SwiftScribe has the potential to save hours. “English is not my first language,” Wu told VentureBeat. “It took 10 hours to transcribe one hour of audio. That’s my personal experience. Usually, it will take a professional four to six hours to transcribe a one-hour audio clip.”

Wu told VentureBeat that SwiftScribe can help transcribe audio 1.67 times faster on average. She envisions transcriptionists doing more work and ultimately getting paid more for it.

Right now, SwiftScribe is more proof of concept than polished product. In the coming months, the team plans to enhance the app with video transcription and captioning, support for more file formats, and an option for automatically adding punctuation.

It’s free to use for now, but Baidu’s considering a paid option. “In the future, we hope to turn it into a business,” Wu said.

Baidu may not have the name recognition in the United States that it does in mainland China, where the Beijing-based juggernaut commands roughly 80 percent of the internet search market and amasses quarterly profits that regularly top the hundreds of millions. But it’s hoping to change that. In 2013, it opened the Institute of Deep Learning, a research center devoted to advancing the firm’s artificial intelligence efforts.

In the immediate future, the Chinese company aims to use the lab to increase revenue by building augmented reality marketing tools. But it may be considering a significant expansion into health-care and education applications.

Kyle Wiggers
Former Digital Trends Contributor
Kyle Wiggers is a writer, Web designer, and podcaster with an acute interest in all things tech. When not reviewing gadgets…