DNA Storage in the Yottabyte Era
Demand for data storage is skyrocketing
Did you know we are living in the “Zettabyte Era”? Honestly, did you even know what a zettabyte is? Kilobytes, megabytes, gigabytes, maybe even terabytes, sure, but zettabytes? Well, if you ran data centers you’d know, and you’d care, because demand for data storage is skyrocketing. (All those TikTok videos and Netflix shows add up!) Believe it or not, a great deal of that data, especially archival data, is still stored on magnetic tape, which has served us well for some seventy years. At some point, though, there won’t be enough tape, or enough places to keep it, to meet our data storage needs.
That’s why people are so keen on DNA storage — including me.
A zettabyte, for the record, is one sextillion (10²¹) bytes. A kilobyte is 1000 bytes; a zettabyte is 1000⁷ bytes! Between kilobytes and zettabytes, each a factor of 1000 larger than the last, come megabytes, gigabytes, terabytes, petabytes, and exabytes; after zettabytes come yottabytes. Back in 2016, Cisco announced we were in the Zettabyte Era, with annual global internet traffic reaching 1.2 zettabytes. We’ll be in the Yottabyte Era before the decade is out.
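To make that ladder of prefixes concrete, here is a small Python sketch that computes each unit as a power of 1000. It assumes the decimal (SI) definitions used above; the binary convention, where a kibibyte is 1024 bytes, gives somewhat larger numbers. The names `UNITS`, `bytes_in`, and `cisco_traffic_tb` are made up for this illustration.

```python
# Decimal (SI) byte units, in order; each step up multiplies by 1000.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Number of bytes in one `unit`: kilobyte = 1000**1, megabyte = 1000**2, ..."""
    return 1000 ** (UNITS.index(unit) + 1)


# Print the ladder from kilobytes up to yottabytes.
for power, name in enumerate(UNITS, start=1):
    print(f"1 {name:<9} = 1000^{power} = 10^{3 * power} bytes")

# Cisco's 1.2 zettabytes of traffic, expressed in terabytes
# (integer math: 12/10 of a zettabyte, divided by one terabyte).
cisco_traffic_tb = 12 * bytes_in("zettabyte") // 10 // bytes_in("terabyte")
print(f"1.2 ZB = {cisco_traffic_tb:,} TB")
```

The last line works out to 1.2 billion terabytes, which is one way to picture how much a zettabyte-scale year of traffic really is.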
People have been working on DNA storage for many years; I first wrote about it in 2016, when I speculated it might mean we could literally be our own medical record. We’re not at the stage of practical DNA storage yet, and we probably won’t be for many more years, but it’s hard to believe we’re not going to be there eventually. Unlike every other form of recording we’ve come up with, DNA can persist almost indefinitely, and as long as there are intelligent species based on DNA, they’ll want to read it.
Most importantly, DNA can store a lot of data. As MIT professor Mark Bathe told NPR: “All the data in the world could fit in your coffee cup that you’re drinking in the morning if it were stored in DNA.”
Mind. Blown.
What prompted me to write about this now was an announcement from Microsoft. Working with researchers from the University of Washington’s Molecular Information Systems Laboratory, Microsoft researchers published a paper demonstrating a “proof of concept” molecular controller that allowed them to write to DNA “three orders of magnitude” (that’s 1000×) more densely. As the announcement said: “Ultimately, we were able to use the system to encode a message onto four strands of synthetic DNA, proof that nanoscale DNA writing is possible at dimensions necessary for practical DNA data storage.”
I’ll spare readers the detail of what they did — I don’t pretend to understand it — but the paper concludes:
…we project that the technology will scale further to billions of features per square centimeter, enabling synthesis throughput to reach megabytes-per-second levels in a single write module, competitive with the write throughput of other storage devices… We foresee these assemblers being used in other areas like material science, synthetic biology, diagnostics, and closed-loop massive molecular biology experimental assays.
Similarly, the announcement concludes: “We foresee the technology reaching arrays containing billions of electrodes capable of storing megabytes per second of data in DNA. This will bring DNA data storage performance and cost significantly closer to tape.”
You can bet Microsoft is taking this seriously.
Lest anyone think only Microsoft is working on this, there have been several other promising developments in recent weeks. Interesting Engineering highlighted a few of them:
- Georgia Tech Research Institute researchers have developed a microchip that allows faster writing to DNA, and expect it to be 100× faster than current technologies. Lead researcher Nicholas Guise told the BBC that, since DNA can survive so long, “the cost of ownership drops to almost zero.”
- Northwestern University scientists have demonstrated a new “enzymatic system” that encodes three bits of data per hour. The NU announcement explains: “Our method is much cheaper to write information because the enzyme that synthesizes the DNA can be directly manipulated.” The researchers believe the technique could be used to install “molecular recorders” inside cells to act as biosensors; the possibilities are astounding.
- A team at China’s Southeast University used a new process that splits content across multiple short sequences rather than one long chain, while also “downsizing” the instruments used. TechRadar speculates this could lead to the first mass-market DNA storage device. Professor Liu Hong told Global Times: “Now we are aiming at the combination of electronic information technology and biology, which might be used in various aspects including data storage and nucleic test for virus.”
Interesting Engineering may have missed the most interesting use yet: Business Insider India reports that Roddenberry Entertainment has created an NFT (non-fungible token) of Gene Roddenberry’s signature on the first Star Trek contract and is storing it on DNA implanted in bacteria — “the first-ever living ecological non-fungible token (NFT).” The bacteria are currently dormant, but if revived, they will duplicate the NFT as they reproduce (which sort of goes against what I thought NFTs were).
Somehow I don’t think that’s what the Microsoft researchers were intending DNA storage to accomplish, but, hey, anything for Star Trek.
As Professor Bathe told NPR, if cost and efficacy issues are solved — and they are well on their way — “Then, you know, the sky’s the limit in terms of just storing everything that we ever wanted to and ever will need to.”
It’s possible that DNA storage will never get fast enough or cheap enough to replace existing storage methods. It’s possible that some other new technique will emerge that will be even better than DNA storage (e.g., holographic storage?). But we are DNA-based creatures, and the possibility of using the technique that nature builds us with to store and manipulate the data we generate is irresistible.
There are already DNA-based “robots” and DNA-based computers, so honestly, DNA storage doesn’t surprise me at all. We should be expecting molecular DNA recorders, and trying to anticipate what we do and don’t want them used for.
In the 21st century, biology is computing, and vice versa. DNA isn’t just our genetic history and future; it is also information that we can read and write. We call it “synthetic biology” now, but as the field keeps growing, we’re likely to drop the “synthetic,” just as “digital health” may become simply “health” and “cryptocurrency” simply “currency.”
Life in the Yottabyte Era should be very interesting.