Los Alamos Develops Binary-to-DNA Translator – HPCwire


Tape storage has dominated high-volume data storage for many decades, and with data production continuing to grow exponentially, researchers are eager to find an improved solution. Many are exploring the possibility of high-volume data storage in DNA, using the four constituent nucleotides (A, T, G and C) to encode information with great efficiency. While the core technology has been proven many times, substantial hurdles continue to stand in the path of wide-scale adoption of DNA data storage. Now, researchers at Los Alamos National Laboratory have developed a translator that converts binary files into ATGC encoding for molecular storage.

“DNA storage could disrupt the way we think about archival storage, because the data retention is so long and the data density so high,” explained Bradley Settlemyer, a storage systems researcher and HPC programmer at Los Alamos. “You could store all of YouTube in your refrigerator, instead of in acres and acres of data centers.”

The new software is called the Adaptive DNA Storage Codex, or “ADS Codex.” It supports both encoding from binary to DNA and decoding from DNA to binary, and it accommodates a variety of DNA synthesis methods. “[It] translates data files from what a computer understands into what biology understands,” said Latchesar Ionkov, the principal investigator for the project and a computer scientist at Los Alamos. “It’s like translating from English to Chinese, only harder.”
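
To make the idea of translating binary into nucleotides concrete, here is a minimal Python sketch. It assumes the simplest possible mapping, two bits per nucleotide, and is not the scheme ADS Codex actually uses, which the article does not describe. The names `BITS_TO_BASE`, `encode` and `decode` are hypothetical.

```python
# Illustrative only: a naive 2-bits-per-nucleotide mapping.
# This is an assumption for the example, not the ADS Codex encoding.
BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn each byte into four nucleotides, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Turn every group of four nucleotides back into one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

A real codec would also have to respect the constraints of the chosen synthesis method, which this sketch ignores.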

One of the main hurdles for DNA data storage is a comparatively high error rate for data writing. “You’re writing A, C, G, and T, but sometimes you try to write A, and nothing appears, so the sequence of letters shifts to the left, or it types AAA,” Ionkov said. “Normal error correction codes don’t work well with that.” To compensate, ADS Codex adds error detection codes to the data. When the DNA is converted back to binary, the software checks whether the codes match; if they do not, it tries adding or removing nucleotides until verification succeeds.
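
The detect-then-repair loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not the ADS Codex algorithm: it uses a CRC-32 checksum from the standard `zlib` module as the error detection code, a naive two-bits-per-nucleotide mapping, and a brute-force search that only repairs a single missing or extra nucleotide. The names `write_strand`, `read_strand` and `check` are hypothetical.

```python
import zlib

BASES = "ACGT"  # assumed mapping: index 0..3 is the 2-bit value


def encode(data: bytes) -> str:
    """Naive 2-bits-per-nucleotide encoding (illustrative assumption)."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Inverse of encode; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def write_strand(payload: bytes) -> str:
    """Append a CRC-32 of the payload, then encode everything as DNA."""
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return encode(payload + checksum)


def check(strand: str):
    """Return the payload if the strand decodes and its CRC matches, else None."""
    if len(strand) % 4 != 0 or len(strand) < 16:
        return None
    raw = decode(strand)
    payload, checksum = raw[:-4], raw[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == checksum:
        return payload
    return None


def read_strand(strand: str):
    """Verify the strand; on failure, try single insertions and deletions."""
    payload = check(strand)
    if payload is not None:
        return payload
    # A dropped nucleotide is repaired by inserting one back.
    for pos in range(len(strand) + 1):
        for base in BASES:
            payload = check(strand[:pos] + base + strand[pos:])
            if payload is not None:
                return payload
    # A spurious extra nucleotide is repaired by deleting one.
    for pos in range(len(strand)):
        payload = check(strand[:pos] + strand[pos + 1:])
        if payload is not None:
            return payload
    return None  # more damage than this toy search can repair


if __name__ == "__main__":
    strand = write_strand(b"cold storage")
    dropped = strand[:5] + strand[6:]  # simulate one missing nucleotide
    print(read_strand(dropped))
```

The sketch shows why an insertion or deletion is harder to handle than a flipped symbol: every nucleotide after the error shifts position, so the reader has to guess where the shift happened and re-verify each guess.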

The quest for DNA data storage is part of a long-standing tradition of bleeding-edge storage work at Los Alamos, which also has a lot to gain from finding a more efficient data storage solution. “At Los Alamos, we have some of the oldest digital-only data and largest stores of data, starting from the 1940s,” Settlemyer said. “It still has tremendous value. Because we keep data forever, we’ve been at the tip of the spear for a long time when it comes to finding a cold-storage solution.”