There is a lot of digital information in the world — about three zettabytes’ worth (that’s 3000 billion billion bytes) — and the constant influx of new digital content poses a real challenge for archivists.
This is a growing problem in the life sciences, where massive volumes of data (including DNA sequences) make up the fabric of the scientific record.
One solution is to use DNA, a compact and robust molecule, as a storage medium.
“If you had to think about the information you’d like to store for tens of thousands of years, it is most likely digitally stored, but these methods are becoming less and less permanent,” Nick Goldman told delegates at BCX Disrupt 2017.
“Stone tablets and even paper can last a long time, but modern forms of storage, such as hard disk drives, have very short lifespans, and even if they do last, the equipment you need to read it
Data centres are big, power-hungry and expensive, and the hardware within them has a lifespan of only 5 to 10 years.
Considering the growing problem of big data, especially in the life sciences, Goldman and his team began looking at how effectively DNA stores and passes on data through evolution. In doing so they created the modern field of “DNA-storage”: the use of DNA to archive digital information.
Goldman and his team developed a code to translate the zeroes and ones that make up digital files into As, Cs, Gs and Ts — the letters that correspond to the basic components of DNA.
“It might not seem like such a hard thing to do, but we had to use some other rules to make sure the experiment would work, such as requiring that the new, alphabetic code would not have any repeats,” Goldman said.
Repeating letters in the code could confuse the machines that write and read DNA.
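The no-repeats rule can be illustrated with a minimal sketch. This is not the team's actual code: it assumes a simplified scheme in which each byte is written as six base-3 digits, and each digit selects one of the three letters that differ from the previous letter. The function names and the starting letter are hypothetical.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729, enough to cover the 256 byte values


def byte_to_trits(value):
    """Write one byte as six base-3 digits, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]


def encode(data, start="A"):
    """Translate bytes into DNA letters with no letter repeated back to back.

    `start` is an assumed reference letter; it is not part of the output.
    """
    prev = start
    letters = []
    for value in data:
        for trit in byte_to_trits(value):
            # Only the three letters that differ from the previous one are allowed.
            choices = [b for b in BASES if b != prev]
            prev = choices[trit]
            letters.append(prev)
    return "".join(letters)


def decode(dna, start="A"):
    """Reverse encode(): recover the base-3 digits, then rebuild the bytes."""
    prev = start
    trits = []
    for letter in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(letter))
        prev = letter
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


message = b"DNA"
strand = encode(message)
print(strand, decode(strand) == message)
```

Because every letter is chosen from the three that differ from its predecessor, the output cannot contain the same letter twice in a row, at the cost of storing fewer than two bits per letter.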
“We also had to work out how to break each message into many pieces, since humans can only reliably create DNA fragments about 200 letters long, sort them out and put them back together again when they are read,” he added.
“We had to do all this in a manner that could recover the information perfectly, even when there were inevitable writing and reading errors.”
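The splitting and reassembly steps can be sketched in the same spirit. This is an illustration under assumptions, not the team's method: it assumes the pieces overlap so that most letters appear in several fragments, that each fragment carries its position in the message, and that disagreements are settled by majority vote. The fragment size, step and function names are hypothetical.

```python
from collections import Counter


def fragment(dna, size=100, step=25):
    """Cut a long strand into overlapping (offset, piece) pairs."""
    last = max(len(dna) - size, 0)
    offsets = list(range(0, last + 1, step))
    if offsets[-1] != last:
        offsets.append(last)  # make sure the tail of the strand is covered
    return [(offset, dna[offset:offset + size]) for offset in offsets]


def reassemble(fragments, length):
    """Rebuild the strand from fragments in any order, voting on each letter."""
    votes = [Counter() for _ in range(length)]
    for offset, piece in fragments:
        for i, letter in enumerate(piece):
            votes[offset + i][letter] += 1
    # "N" marks a position that no surviving fragment covered.
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)


strand = "ACGT" * 75                 # a 300-letter stand-in for encoded data
pieces = fragment(strand)
offset, piece = pieces[4]
pieces[4] = (offset, piece[:10] + "A" + piece[11:])  # simulate one misread letter
pieces.reverse()                     # reading order is not guaranteed
print(reassemble(pieces, len(strand)) == strand)
```

In this sketch the letters near the two ends of the strand are covered by fewer fragments than those in the middle, so a practical scheme would need extra protection there.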
This coded information can be fed into DNA synthesis machines, which transform it into the physical material in much the same way an inkjet printer lays down ink on paper.
What you get in the end is an almost imperceptible smidgen of dust, which itself contains thousands of DNA copies of the encoded files.
“Because DNA is so robust, the material will last for many thousands of years if it is kept safe, dry and cool, and DNA sequencing machines can be used to read the files back,” Goldman said.
On an “almost invisible speck of dust worth of DNA”, Goldman’s team could store about 1 MB of data, and every single byte could be decoded back perfectly.
Big companies such as IBM, Microsoft, Intel, SanDisk and Google, along with many startups, have expressed interest in DNA storage and have been conducting experiments of their own.
“At the moment, though, it is astonishingly expensive; however, it is getting cheaper, especially if we can make it smaller and more convenient,” Goldman added.
“We are already seeing that DNA technologies are advancing rapidly, and in 3 or 5 years, these could be viable technologies that anyone could own.”
Archives of DNA would take up very little space, and you’d be able to get all the information in the world into the back of a medium-sized vehicle.
“This would work for a half-million year time-span, so it would have an unbelievably long lifespan in easy-to-create conditions – even a refrigerator would work,” Goldman told delegates.
“With DNA it is also easy to make copies of your digital data. The first copy is quite difficult to make, but additional copies can be made in only a few minutes and can self-assemble in an exponential way.
“Lastly, DNA will never become obsolete, as other forms of technology have become, because we all have DNA inside us and people will always be interested in DNA and the ability to read it will always exist.”
Optimistically, Goldman believes that in 3 to 5 years, this form of storage will start becoming more viable and will disrupt data centres as we know them today.