Photo Illustration by Sarah Rogers/MITTR | Photos Getty
It turns out that you don’t need to be a scientist to encode data in DNA. Researchers have been working on DNA-based data storage for decades, but a new template-based method inspired by our cells’ chemical processes is easy enough for even nonscientists to practice. The technique could pave the way for an unusual but ultra-stable way to store information.
The idea of storing data in DNA was first proposed in the 1950s by the physicist Richard Feynman. Genetic material has exceptional storage density and durability; a single gram of DNA can store a trillion gigabytes of data and retain the information for thousands of years. Decades later, a team led by George Church at Harvard University put the idea into practice, encoding a 53,400-word book.
This early approach relied on DNA synthesis—stringing genetic sequences together piece by piece, like beads on a thread, using the four nucleotide building blocks A, T, C, and G to encode information. The process was expensive, time consuming, and error prone, creating only one bit (or an eighth of a byte) with each nucleotide added to a strand. Crucially, the process required skilled expertise to carry out.
The new method, published in Nature last week, is more efficient, storing 350 bits at a time by encoding strands in parallel. Rather than hand-threading each DNA strand, the team assembles strands from pre-built DNA bricks about 20 nucleotides long, encoding information by altering some and not others along the way. Peking University’s Long Qian and team got the idea for such templates from the way cells share the same basic set of genes but behave differently in response to chemical changes in DNA strands. “Every cell in our bodies has the same genome sequence, but genetic programming comes from modifications to DNA. If life can do this, we can do this,” she says.
Qian and her colleagues encoded data through methylation, a chemical reaction that switches genes on and off by attaching a methyl compound—a small methane-related molecule. Once the bricks are locked into their assigned spots on the strand, researchers select which bricks to methylate, with the presence or absence of the modification standing in for binary values of 0 or 1. The information can then be deciphered using nanopore sequencers to detect whether a brick has been methylated. In theory, the new method is simple enough to be carried out without detailed knowledge of how to manipulate DNA.
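The presence-or-absence scheme described above can be sketched in a few lines of Python. This is a toy model rather than the team's software: the function names are hypothetical, and the choice that a methylated brick means 1 and an unmethylated brick means 0 is an assumption made for illustration.

```python
# Toy model: one brick position carries one bit.
# Assumption: methylated = 1 (True), unmethylated = 0 (False).

def encode_bits(bits: str) -> list[bool]:
    """Turn a string of '0'/'1' characters into per-brick methylation flags."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    return [b == "1" for b in bits]


def decode_flags(flags: list[bool]) -> str:
    """Recover the bit string from the methylation state read off each brick."""
    return "".join("1" if flag else "0" for flag in flags)


pattern = encode_bits("1011001")
assert decode_flags(pattern) == "1011001"
```

In the real system the "read" step is done by a nanopore sequencer, which may miscall a brick; this sketch assumes a perfect read.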
Each DNA strand can hold at most roughly 70 bits. For larger files, the researchers split the data across multiple strands, each identified by a unique barcode encoded in the bricks. The strands were then read simultaneously and put back in order according to their barcodes. With this technique, researchers encoded the image of a tiger rubbing from the Han dynasty, troubleshooting the encoding process until the image came back with no errors. The same process worked for more complex images, like a photorealistic print of a panda.
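The barcode idea can be illustrated with a short Python sketch. It assumes, purely for illustration, that a barcode is just the chunk's index and that every strand carries up to 70 payload bits; the names `split_into_strands` and `reassemble` are hypothetical, and the actual barcode design in the study may differ.

```python
import random

STRAND_BITS = 70  # approximate per-strand capacity reported in the article


def split_into_strands(bits: str, size: int = STRAND_BITS) -> list[tuple[int, str]]:
    """Cut a bit string into (barcode, payload) pairs; the barcode is the chunk index."""
    return [(i // size, bits[i:i + size]) for i in range(0, len(bits), size)]


def reassemble(strands: list[tuple[int, str]]) -> str:
    """Sort strands by barcode and join their payloads, whatever order they were read in."""
    return "".join(payload for _, payload in sorted(strands))


message = "10" * 100                 # a 200-bit example file
strands = split_into_strands(message)
random.shuffle(strands)              # strands in a shared pool have no inherent order
assert reassemble(strands) == message
```

Because each strand carries its own position label, the strands can be mixed and read together, and the original order is still recoverable.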
To gauge the real-world applicability of their approach, the team enlisted 60 students from diverse academic backgrounds, not just scientists, to encode any writing of their choice. The volunteers converted their writing into binary code through a web server. Then, with a kit sent by the team, they pipetted an enzyme into a 96-well plate of the DNA bricks, marking which would be methylated. The team then ran the samples through a sequencer to read the DNA strands. Once the computer received the sequence, researchers ran a decoding algorithm and sent the restored message back to a web server for students to retrieve with a password. The writing came back with a 1.4% error rate in letters, and the errors were eventually corrected through language-learning models.
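For readers curious what "transcribing writing into binary" and a per-letter error rate look like in practice, here is a minimal Python sketch. The article does not say how the web server encoded text or how the error rate was computed, so the UTF-8 encoding and the position-by-position comparison below are assumptions, and all function names are hypothetical.

```python
def text_to_bits(text: str) -> str:
    """Encode text as a string of bits, 8 bits per UTF-8 byte (assumed encoding)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Decode a bit string back to text; undecodable bytes become a placeholder."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8", errors="replace")


def letter_error_rate(original: str, recovered: str) -> float:
    """Fraction of character positions that differ between two equal-length strings."""
    if len(original) != len(recovered):
        raise ValueError("this simple metric needs equal-length strings")
    if not original:
        return 0.0
    mismatches = sum(a != b for a, b in zip(original, recovered))
    return mismatches / len(original)


assert bits_to_text(text_to_bits("DNA")) == "DNA"
```

A single flipped bit changes one character under this encoding, which is why a small fraction of misread bricks shows up as a small fraction of wrong letters.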
Once it’s more thoroughly developed, Qian sees the technology becoming useful as long-term storage for archival information that isn’t accessed every day, like medical records, financial reports, or scientific data.
The success nonscientists achieved using the technique in coding trials suggests that DNA storage could eventually become a practical technology. “Everyone is storing data every day, and so to compete with traditional data storage technologies, DNA methods need to be usable by the everyday person,” says Jeff Nivala, co-director of the University of Washington’s Molecular Information Systems Lab. “This is still an early demonstration of going toward nonexperts, but I think it’s pretty unique that they’re able to do that.”
DNA storage still has many strides left to make before it can compete with traditional data storage. The new system is more expensive than either traditional data storage techniques or previous DNA-synthesis methods, Nivala says, though the encoding process could become more efficient with automation on a larger scale. With future development, template-based DNA storage might become a more secure method of tackling ever-climbing data demands.