DNA is the future for data storage. That future is coming very soon. – SynBioBeta


We have read that DNA data storage is about to go viral. At SynBioBeta 2019, this became even clearer when a panel of leaders in the field forecast that, with the right investment, the cost of storing information in DNA could drop to $100 per terabyte in as little as five years. While challenges remain in automating the process of writing and reading DNA, experts increasingly see DNA as a long-term information storage solution, particularly for archiving culturally significant data.
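To put the $100-per-terabyte forecast in perspective, the short Python calculation below works out how many bases a terabyte needs and what each base would then have to cost. It is a back-of-the-envelope sketch under our own assumptions (a decimal terabyte and an ideal two bits per base, with no addressing, no error-correction overhead, and no extra physical copies), not a figure from the panel.

```python
# Back-of-the-envelope arithmetic for the forecast price of $100 per terabyte.
# Assumptions (ours, not the panel's): a decimal terabyte (10**12 bytes) and an
# ideal 2 bits per nucleotide, ignoring addressing, error-correction overhead,
# and the many physical copies of each strand that are actually synthesized.

BITS_PER_BYTE = 8
BITS_PER_NUCLEOTIDE = 2          # four bases -> log2(4) = 2 bits each
TERABYTE_BYTES = 10**12          # decimal terabyte
TARGET_PRICE_PER_TB = 100.0      # US dollars, the forecast quoted above


def nucleotides_for(num_bytes: int) -> int:
    """Minimum number of bases needed to encode num_bytes at 2 bits per base."""
    return num_bytes * BITS_PER_BYTE // BITS_PER_NUCLEOTIDE


def implied_price_per_base(price_per_tb: float) -> float:
    """Cost per synthesized base implied by a given price per terabyte."""
    return price_per_tb / nucleotides_for(TERABYTE_BYTES)


if __name__ == "__main__":
    print(f"{nucleotides_for(TERABYTE_BYTES):.3e} bases per TB")          # 4.000e+12 bases per TB
    print(f"${implied_price_per_base(TARGET_PRICE_PER_TB):.1e} per base")  # $2.5e-11 per base
```

Real systems need several times more bases than this ideal minimum, so the per-base price would have to fall even further than the sketch suggests.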

“A lot of our interactions every day are mired in data transfer,” said Henry Lee, co-founder of Kern Systems.

From cat memes to satellite photos, the amount of data we’re generating worldwide is growing exponentially. The technologies for storing that data are not advancing as quickly. Fortunately, nature has evolved its own elegant solution for information storage: DNA.

DNA stores all of the information required to make a human or a plant in an incredibly tiny package. A small but growing group of scientists is now working to replicate that storage strategy to preserve digital data.

“It’s all based on translating bits into bases,” said Karin Strauss, Principal Research Manager at Microsoft. Every two bits of information are translated into one of the four DNA nucleotides (A, C, G, or T). Once the sequences are mapped out in software, the DNA is synthesized.
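The two-bits-per-base idea can be sketched in a few lines of Python. The table mapping 00, 01, 10 and 11 to A, C, G and T is our own assumption for illustration; it is not Microsoft’s encoding, and practical schemes add constraints (such as avoiding long runs of the same base) along with addressing and error correction.

```python
# Hypothetical 2-bit-to-base table (an illustrative assumption); real systems
# choose their own mappings and usually add sequence constraints on top.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bytes_to_dna(data: bytes) -> str:
    """Translate each byte into four bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)  # e.g. 'H' -> '01001000'
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    print(bytes_to_dna(b"Hi"))  # CAGACGGC
```

Each byte becomes exactly four bases, which is the ideal density before any overhead is added.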

The other half of the DNA storage equation is recovering the bits (or reading the DNA) via sequencing.
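Decoding reverses the mapping once the sequencer has produced a read. This sketch assumes a hypothetical A=00, C=01, G=10, T=11 table and an error-free read whose length is a multiple of four; any other character (for example an ambiguous base call) is rejected rather than guessed.

```python
# Hypothetical table, the inverse of A=00, C=01, G=10, T=11 (an illustrative assumption).
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_bytes(sequence: str) -> bytes:
    """Turn an error-free read back into bytes, four bases per byte."""
    if len(sequence) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    try:
        bits = "".join(BASE_TO_BITS[base] for base in sequence.upper())
    except KeyError as err:
        raise ValueError(f"unexpected base {err.args[0]!r}") from err
    # Every 8 bits (4 bases) becomes one byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    print(dna_to_bytes("CAGACGGC"))  # b'Hi'
```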

“Now that we know how to read DNA, we’ll always be able to read it, so it’s an eternally relevant means of data storage,” said Strauss.

DNA data storage

DNA synthesis and DNA sequencing technologies were not designed for writing and reading digital information. A lot of energy has gone into making perfect DNA, said Bill Peck, Chief Technology Officer at Twist Bioscience. But good software might be able to compensate for error-ridden sequences instead.

Essentially, when we’re making DNA, we’re actually making millions and millions of copies of the same molecule at the same time, said Lee. In a data storage system, you can use that as redundancy. Digital storage media such as DVDs already rely on error-correcting algorithms that encode redundancy into the data, and that redundancy is what allows errors to be corrected.
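Lee’s point about millions of identical molecules can be illustrated with a per-position majority vote: if several reads of the same strand disagree at a position, the most common base wins. The sketch below assumes the reads are already aligned and differ only by substitutions; real sequencing data also contain insertions and deletions, which would need alignment first, so treat this as a toy model of consensus calling.

```python
from __future__ import annotations

from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across equal-length reads of one strand.

    Ties go to the base seen first, because Counter.most_common keeps
    insertion order for equal counts.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch only handles equal-length (substitution-only) reads")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    noisy_copies = [
        "CAGACGGC",  # error-free copy
        "CAGTCGGC",  # substitution at position 3
        "CAGACGAC",  # substitution at position 6
    ]
    print(consensus(noisy_copies))  # CAGACGGC
```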

“We very much can tolerate errors in the DNA and we are willing to give up on some quality for other benefits,” said Strauss. The beauty of computer science is that we can still recover the data bit by bit.
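The idea that a coding layer can recover what the chemistry gets wrong can be shown with a deliberately simplified stand-in for the stronger codes used in practice (DVDs, for instance, rely on Reed-Solomon codes). The sketch below adds one XOR parity block to a group of equal-length data blocks, so that any single block lost during synthesis or sequencing can be rebuilt from the others. The block contents and sizes are illustrative assumptions.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(blocks: list[bytes]) -> list[bytes]:
    """Append one parity block: the XOR of all equal-length data blocks."""
    return blocks + [reduce(xor_bytes, blocks)]


def recover(blocks: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one lost block (marked None), then drop the parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one lost block")
    repaired = list(blocks)
    if missing:
        # XOR of every surviving block (data + parity) equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, [b for b in blocks if b is not None])
    return repaired[:-1]


if __name__ == "__main__":
    stored = add_parity([b"DNA ", b"stor", b"age!"])
    stored[1] = None  # simulate a strand that was never recovered
    print(b"".join(recover(stored)))  # b'DNA storage!'
```

A single parity block tolerates only one loss per group; production systems use codes that survive many losses and corrupted strands at a modest cost in extra DNA.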

So far, the process isn’t cheap.

When DNA is synthesized, it’s essentially “printed” onto silicon chips, and silicon is expensive. Twist is pushing the limits on how much DNA you can print on a single chip, said Peck, but that innovation is also expensive. The panelists almost unanimously agreed that significant investments are required to make DNA-based data storage a practical reality.

Another significant cost involved in the writing-storage-reading workflow is labor.

“There are writers and readers that are fully automated today,” said Strauss, “but the entire process is not automated.” Everything between DNA synthesis and sequencing, such as preparing sequencing libraries, is still done by hand. Liquid handling robots can help, but Strauss’s team is trying to find ways to automate more affordably, so that the entire process is scalable.

Kern Systems and Molecular Assemblies are working to make synthesis more scalable by innovating the manufacturing process. They’re focused on enzyme-based synthesis, a change in paradigm from the chemical-based methods we’ve been using for the last 30 years.

Investment is an issue here too.

“We’re trying to come up with the ink that will drive the printer to write DNA,” said Bill Efcavitch, co-founder of Molecular Assemblies, “but we’re going to need partnerships to engineer those enzymes at scale.”

While increased investments are needed to make DNA-based data storage practical at scale, Lee predicts people will start using the technology within the next 2-3 years.

Government agencies could be early customers, said Efcavitch, because they need to store massive amounts of data for long periods of time.

Peck and Strauss agreed that the first use of the technology will likely be archival. There is a lot of intrinsic value in figuring out how to store culturally significant information like music for millennia, said Peck.

Down the line, Lee hopes to see the technology in many more hands. “We’re interested in how we can miniaturize this,” he said. If the technology isn’t siloed, he expects biohackers to help build additional applications.

Fundamentally, storing digital information in DNA is a very simple idea. When you begin to imagine how the technology might be used in the real world, it gets a lot more complicated.

For instance, when it comes to actually retrieving information that is stored in DNA, you probably don’t want to have to sequence an entire library. We need to develop the DNA equivalent of a digital search function. Strauss’s team is using machine learning to develop search capabilities within molecules.
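One way to avoid sequencing an entire library is to tag every strand of a file with the same address sequence and select only those strands, loosely analogous to using primers to amplify one file’s strands before sequencing. The sketch below models that idea in software: the filter stands in for the chemical selection step. The `Strand` record, its fields and the tag values are hypothetical, and the sketch does not attempt the machine-learning search Strauss’s team is developing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    address: str  # hypothetical per-file tag shared by all of a file's strands
    index: int    # position of this chunk within the file
    payload: str  # encoded data bases


def retrieve(pool: list[Strand], address: str) -> str:
    """Reassemble one file from only the strands carrying its address tag."""
    hits = sorted((s for s in pool if s.address == address), key=lambda s: s.index)
    if not hits:
        raise KeyError(address)
    return "".join(s.payload for s in hits)


if __name__ == "__main__":
    pool = [
        Strand("AAAA", 1, "GGC"),
        Strand("TTTT", 0, "CCCC"),   # belongs to a different file; ignored
        Strand("AAAA", 0, "CAGAC"),
    ]
    print(retrieve(pool, "AAAA"))  # CAGACGGC
```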

Right now, the focus has been on cold data: data that doesn’t need to be accessed very often. “DNA sequencers until recently were based on batch processes,” said Strauss. But newer sequencing technologies, such as Oxford Nanopore’s, work closer to real time. Real-time sequencing is a step toward “hot” data storage, but there is still a long way to go.

“The digital storage world is so new, we really don’t know what it’s going to look like in 5 years,” said Peck. Yet conventional digital storage is itself a recent technology, and now nothing is considered archived until it has been digitized, so DNA storage may catch on faster than we think.

When it comes to hot storage — the kind of instantaneous, on-demand access to data that flash drives provide — “the best way to make things happen is to tell a bunch of scientists and engineers that it’s impossible,” said Peck. “So, it’s impossible…”
