Data Storage on DNA Can Keep It Safe for Centuries
In two recent experiments, a team of computer scientists at the University of Washington and Microsoft, and a separate group at the University of Illinois, have shown that DNA molecules can be the basis for an archival storage system potentially capable of storing all of the world's digital information in roughly nine liters of solution, about the amount of liquid in a case of wine.
The new research demonstrates that specific digital files can be retrieved from a potentially vast pool of data. The new storage technology would also be capable of keeping immense amounts of information safely for a millennium or longer, researchers said.
It would also address a glaring weakness of microelectronic data storage systems: Magnetic disks, tape and even optical storage systems can safely store information for only a handful of decades at most.
The recent advances suggest there may be a new way to store the exploding amount of computer data for centuries rather than decades.
The raw storage capacity of DNA is staggering compared with even the most advanced electronic or magnetic storage systems. It is theoretically possible to store an exabyte of information, if it were coded into DNA, in the volume of a grain of sand. An exabyte is roughly equivalent to 200 million DVDs.
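The DVD comparison can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a decimal exabyte (10^18 bytes) and a single-layer DVD of 4.7 gigabytes; neither figure is stated in the article, and the variable names are illustrative only.

```python
# Assumption: decimal exabyte, 10**18 bytes.
EXABYTE_BYTES = 10**18
# Assumption: single-layer DVD capacity of 4.7 GB (not stated in the article).
DVD_BYTES = 4.7e9

# Number of DVDs needed to hold one exabyte.
dvds_per_exabyte = EXABYTE_BYTES / DVD_BYTES

print(f"{dvds_per_exabyte / 1e6:.0f} million DVDs per exabyte")
```

Under those assumptions the result is a little over 210 million discs, consistent with the "roughly 200 million" figure.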
In nature, DNA molecules carry the genetic instructions that govern the development and function of living organisms. The cost of sequencing or "reading" the genetic code is falling faster than the cost of computer memory, and technologists are beginning to make progress in their ability to more rapidly synthesize strands composed of arbitrary sequences of the small organic molecules known as nucleotides, the basic DNA building blocks.
Computer scientists say they believe that as costs of sequencing and creating synthetic DNA continue to fall, it will soon be possible to create a new class of hybrid storage systems.
"In the last year, it suddenly hit us that this fusion of computer technology and biology will be where future advances come from," said Douglas M. Carmean, a Microsoft researcher who had been a leading designer of microprocessor chips at Intel.
The shared history of the two fields dates back to the start of interactive computing. The first true personal computer, known as the LINC, was designed by Wesley A. Clark in 1961 for biomedical researchers.
"Information technology has helped biotech in the past," said Luis Ceze, a University of Washington computer scientist and one of the designers of the new DNA storage system. "Now biotech has to pay back."
Early signs of a possible convergence of computing and biology can be found in a visit to a cramped laboratory in the basement of the Paul G. Allen Center for Computer Science & Engineering on the University of Washington campus.
It is crammed with equipment more readily found in a biology laboratory — a desktop DNA sequencing system and a separate machine that is used to amplify fragments of DNA by making billions of precise copies.
Together, the two machines form a prototype of a data-archiving approach that could spread more widely as soon as five years from now. The researchers note that it could be used by Hollywood studios and modern hospitals that need long-term storage for digitized movies as well as X-ray and M.R.I. images.
Previous experiments performed by scientists at the European Bioinformatics Institute in Hinxton, England, in 2013, and in 2012 at Harvard, showed that it was possible to store data files in DNA and then read the information back in digital form. The Harvard group received international attention for storing billions of copies of "Regenesis," a book written by the Harvard geneticist George Church and Ed Regis.
The research teams from the University of Illinois, and the University of Washington and Microsoft, have built on that work by storing information in DNA form and then retrieving a specific file from the data. The Illinois scientists were able to encode parts of the Wikipedia pages of six universities, and then select and edit parts of the text written in DNA corresponding to three of the colleges.
The University of Washington and Microsoft researchers decided that because of the vast potential storage capacity of DNA, it would be better used for simply storing data rather than rewriting it. They were able to store four small image files and then retrieve them independently with just a single error.
Computer storage systems resemble cities with precise street addresses where data can be retrieved. With DNA storage, the scientists instead exploit the self-assembling nature of the biological molecules that encode information. Replication, the basic function of DNA in living organisms, is essential for life.
A digitized picture, for example, might be broken into thousands of pieces that are in turn mapped into thousands of individual strands of DNA. When they encode the information, the researchers add a unique identifier that makes it possible to later reassemble the complete picture or file, like putting together a jigsaw puzzle.
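The split-tag-reassemble idea can be illustrated in code. This is only a sketch under stated assumptions: the two-bits-per-base mapping (A, C, G, T for 00, 01, 10, 11), the 2-byte index header, the chunk size and all function names are hypothetical and are not the encoding the researchers actually used.

```python
import random

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11 (two bits per base)


def bytes_to_dna(data: bytes) -> str:
    """Translate each byte into four bases, most significant bits first."""
    return "".join(BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0))


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna: every four bases become one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def encode_file(data: bytes, chunk_size: int = 16) -> list:
    """Split data into chunks and prefix each with a 2-byte index (hypothetical layout)."""
    strands = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        header = index.to_bytes(2, "big")  # the "unique identifier" for this piece
        strands.append(bytes_to_dna(header + data[start:start + chunk_size]))
    return strands


def decode_file(strands: list) -> bytes:
    """Reassemble the file from strands in any order, using the index header."""
    pieces = {}
    for strand in strands:
        raw = dna_to_bytes(strand)
        pieces[int.from_bytes(raw[:2], "big")] = raw[2:]
    return b"".join(pieces[i] for i in sorted(pieces))


picture = bytes(range(256)) * 4   # stand-in for a small image file
pool = encode_file(picture)
random.shuffle(pool)              # strands in solution have no inherent order
assert decode_file(pool) == picture
```

Because every strand carries its own index, the order in which strands are read back does not matter, which is the jigsaw-puzzle property described above.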
The scientists use the ability to amplify specific DNA strands rapidly and efficiently using a technique known as "polymerase chain reaction," or P.C.R., to make it easier to find the information they wish to retrieve. Invented by the chemist Kary Mullis in 1983, P.C.R. makes it possible to amplify a single copy of a DNA molecule into millions of copies of a single sequence.
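A rough model shows why P.C.R. helps with retrieval. The sketch below assumes idealized amplification in which the number of copies doubles on every cycle, and it treats a primer as a plain string prefix that marks which strands belong to a file; real P.C.R. is less efficient, and the function names and toy sequences are hypothetical.

```python
def pcr_copies(initial_copies: int, cycles: int) -> int:
    """Copies after a number of cycles, assuming perfect doubling each cycle."""
    return initial_copies * 2 ** cycles


def select_by_primer(pool: list, primer: str) -> list:
    """Keep only strands that begin with the given primer (simplified model of targeting)."""
    return [strand for strand in pool if strand.startswith(primer)]


# Toy pool: two files, each identified by an assumed four-base primer.
pool = ["ACGTTTGA", "ACGTCCAT", "GGCATTGA", "GGCAAACC"]
wanted = select_by_primer(pool, "ACGT")

print(wanted)
print(pcr_copies(1, 20))  # about a million copies from a single molecule
print(pcr_copies(1, 30))  # about a billion
```

In this model, 20 cycles turn one molecule into roughly a million copies and 30 cycles into roughly a billion, so the targeted strands quickly outnumber everything else in the pool.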
In addition to refining the computerized reassembly techniques, the research groups are continuing to work on improving the basic storage technology.
"We have scaled up our 2012 work about a hundredfold," Dr. Church said. His laboratory is working in collaboration with Technicolor S.A., a French company that has a large business in digital data and film archiving. "The big issue is lowering the cost by another thousandfold, which is our current focus," he added.
The Harvard laboratory is trying to encode and retrieve "A Trip to the Moon," a 1902 French silent film.
The University of Washington and Microsoft researchers have partnered with Twist Bioscience, a San Francisco start-up that has developed a semiconductor-based system that accelerates the production of custom DNA strands in which digital data can be encoded.
The scientists acknowledge that their current bottleneck is in the ability to write the information in DNA, but they say they expect that technology to begin to improve rapidly.
"It is absolutely about the technology and miniaturizing the scale of the reaction" used to create synthetic DNA, said Emily Leproust, the chief of Twist.
Currently, it takes just seconds to store or retrieve data using magnetic tape cartridges — widely used by corporate computing centers to keep archival copies. But the cartridges themselves are often stored on shelves or in elaborate robotic retrieval systems; retrieving them and putting the data online for access can take hours.
The cost and speed of encoding digital information in DNA will soon come down by several orders of magnitude, said Dr. Leproust, making it competitive with magnetic storage.
Although it is snaillike in retrieval speed compared with electronic and magnetic memory, DNA will be far better in terms of how much data it can store and how long it can preserve it.
"DNA is a remarkable media for long-term storage," said Karin Strauss, a Microsoft computer architect. "All you have to do is keep it cold and dry."
Correction: December 3, 2015
An earlier version of a picture caption with this article misidentified the employer and current job of Douglas Carmean. He designs computers at Microsoft, no longer microprocessor chips at Intel.