Scientists encode the novel ‘Wonderful Wizard of Oz’ in DNA

A few years ago, Harvard scientists encoded a low-resolution GIF of a galloping horse into the DNA of E. coli bacteria. Now, researchers have shown off the next level of DNA encoding by storing the entire L. Frank Baum novel “The Wonderful Wizard of Oz” (the basis for the classic 1939 Hollywood movie of almost the same name) as DNA.

“We start with the digital version of the text,” Stephen Jones, a research scientist who collaborated on the project, told Digital Trends. “We send that information to our program, which spits out a bunch of DNA sequences, made of A,C,G and Ts. Each sequence is used to make actual pieces of DNA. Those pieces could be stored in some pretty rough conditions for thousands to even millions of years, much like we’ve seen with sequenced dinosaur DNA.”

Coding and decoding the text

Should someone, as Jones said, get the “burning desire” to read the novel in Esperanto, the constructed international auxiliary language it was translated into before encoding, they would take these DNA pieces and read back their sequence using a DNA sequencer. The sequence would then go through the algorithm developed by the team, which would translate it back into a digital version readable on a computer. “So basically, a computer’s zeros and ones get turned into DNA’s As, Cs, Gs and Ts for storage, then the process is reversed when you’re ready to read,” Jones said.
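
To make the zeros-and-ones-to-letters idea concrete, here is a minimal sketch in Python. It is not the team's algorithm, which the article does not spell out; it assumes the simplest possible mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T), and the function names are made up for illustration.

```python
# Naive two-bits-per-base mapping (an assumption for illustration,
# not the encoding described in the PNAS paper).
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def encode(data: bytes) -> str:
    """Turn each byte into four DNA letters, most significant bits first."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def decode(sequence: str) -> bytes:
    """Reverse encode(): every four letters become one byte."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


print(encode(b"Oz"))          # CATTCTGG
print(decode(encode(b"Oz")))  # b'Oz'
```

A mapping this simple has no protection at all: a single lost or extra letter shifts everything after it, which is exactly the weakness the rest of the article is about.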

Digital-to-DNA conversion has been possible for a long time. What makes this work exciting is how the conversion takes place. Digital and DNA storage have different weaknesses: digital storage is sensitive to electricity, temperature, water, and more. DNA is more robust in these areas, but parts of a sequence can be accidentally deleted or added during the encoding process.

“Academics and big companies like Google and Microsoft have been trying to figure a way around this for a long time,” Jones explained. “Usually, people just read enough copies of the DNA information that if one gets messed up, they can depend on another. You can imagine that process is very inefficient.”
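
The "read enough copies" approach Jones describes can be sketched as a simple majority vote. The snippet below is an illustration only, with a made-up function name, and it makes a simplifying assumption: every read has the same length, so it can only repair swapped letters, not the deletions or additions mentioned earlier.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Pick the most common letter at each position across redundant reads.

    Assumes all reads are the same length (substitution errors only).
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) walks the reads column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


reads = ["CATTCTGG", "CATTCTGG", "CAGTCTGG"]  # third copy has one bad letter
print(consensus(reads))  # CATTCTGG
```

The inefficiency is easy to see here: tolerating a single bad letter took three full copies of the data.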

An algorithm to overcome the problems

To overcome this, the team’s encoding algorithm has some neat qualities. To begin with, the information in each DNA sequence helps correct errors in every other DNA sequence’s information, so that they build upon each other. The method also accounts for those deletions or additions. It is flexible enough that it can be made stronger when a piece of information is really important (a character name in “Wizard of Oz,” for example) and weaker when the information doesn’t matter so much (a random word in the novel). It will also specifically avoid DNA sequences known to be problematic, such as a long string of As in a row. Finally, the method encrypts the information as it’s converted to DNA sequence, adding a layer of protection and privacy that could be useful with data more sensitive than a 120-year-old public domain novel.
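
One of those qualities, steering clear of long runs of the same letter, is easy to illustrate. The check below is a hypothetical sketch rather than the team's code: the function names are invented, and the limit of three identical letters in a row is an assumption, since the article does not give the actual threshold.

```python
from itertools import groupby


def longest_run(sequence: str) -> int:
    """Length of the longest stretch of one repeated letter."""
    return max((len(list(group)) for _, group in groupby(sequence)), default=0)


def is_problematic(sequence: str, max_run: int = 3) -> bool:
    """Flag sequences whose longest repeat exceeds max_run (assumed limit)."""
    return longest_run(sequence) > max_run


print(longest_run("ACGAAAAT"))     # 4
print(is_problematic("ACGAAAAT"))  # True
print(is_problematic("ACGAAAT"))   # False
```

An encoder could use a check like this to reject a candidate sequence and pick a different one that carries the same data.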

“A top [real-world] use would be for long-term storage when you must keep the information, but use it infrequently,” Jones said, giving the example of historical banking data for years past. “Tech companies would see value for dormant accounts that no one’s using, but they don’t want to delete. There could [additionally] be a huge cost savings during storage. Storing DNA takes almost no energy — especially compared to keeping data servers plugged in and happy.”

At least one DNA storage company is working on this problem, although the technology is likely several years away from being viable. Nonetheless, work like this is a reminder that the science is getting closer all the time.

A paper describing the work was recently published in the journal PNAS.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…