SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
DNA data storage will dramatically affect the way organizations think about data retention, data protection, and archival by providing capacity density and longevity several orders of magnitude beyond anything available today, while reducing requirements for power, cooling, and fixity checks. One of the challenges of any long-term archival storage is being able to recover the data after possibly decades or longer. To do this, the reader must be able to bootstrap the archive, akin to how an OS is loaded after the master boot record is loaded. This talk will describe our initial work to define a standard schema for a self-describing DNA data archive sector zero, which will be as generic as possible, exploiting the format immutability of the natural DNA molecule to ensure the archive can be bootstrapped by sequencers decades in the future, all while enabling archive writers to continue innovation in how the balance of the archive is synthesized. We call this the “DNA Rosetta Stone” project.
The information explosion is making the storage industry look to new media to meet increasing demands. Molecular storage, and specifically synthetic DNA, is a rapidly evolving technology that provides high levels of physical data density and longevity for archival storage systems. Major challenges to adoption are the higher error rates for synthesis and sequencing, as well as the different nature of the errors. Unlike in traditional storage media, erroneous insertions and deletions are a common source of errors in DNA-based storage systems. These errors require a different approach to encoding and recovery of binary data. Further, the quickly evolving fields of synthesis and sequencing require a codec that can accommodate rapid technology changes in parallelism, error rates, and DNA lengths while allowing the writing and reading technologies to be mixed and matched to best effect. Here we describe ACOMA, an open source end-to-end codec that has been demonstrated to achieve 0.99 bits of data per nucleotide while successfully recovering data across a variety of real industrial DNA processes for writing and reading data.
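To illustrate why insertions and deletions call for a different approach than substitutions, here is a minimal sketch. It is not ACOMA's encoding: it assumes a naive mapping of two bits per nucleotide with no error correction, and the function names (`encode`, `decode`, `corrupted_bytes`) are hypothetical. A substituted base damages a single byte, while a deleted base shifts every base after it.

```python
# Illustrative only: a naive 2-bits-per-base mapping, NOT the ACOMA codec.
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Inverse of encode; trailing bases that do not fill a byte are dropped."""
    out = bytearray()
    usable = len(strand) - len(strand) % 4
    for i in range(0, usable, 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def corrupted_bytes(original: bytes, received: bytes) -> int:
    """Count differing byte positions, plus any bytes missing or extra."""
    differing = sum(a != b for a, b in zip(original, received))
    return differing + abs(len(original) - len(received))


message = b"DNA storage"
strand = encode(message)

# Substitution error: swap the base at index 5 for a different one.
wrong_base = "A" if strand[5] != "A" else "C"
substituted = strand[:5] + wrong_base + strand[6:]

# Deletion error: drop the base at index 5, shifting everything after it.
deleted = strand[:5] + strand[6:]

print("substitution damages", corrupted_bytes(message, decode(substituted)), "byte(s)")
print("deletion damages", corrupted_bytes(message, decode(deleted)), "byte(s)")
```

Under this naive mapping the deletion corrupts far more of the message than the substitution does, which is the kind of failure a DNA-specific codec has to detect and resynchronize from.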
DNA-based data storage systems have the potential to offer unprecedented increases in density and longevity over conventional storage media. Starting from the assumption that advances in synthesis and sequencing technology will soon make DNA-based storage cost competitive with conventional media, we will need ways of organizing, accessing, and manipulating the data stored in DNA to harness its full potential. There is a range of possible storage system designs. This talk will cover three systems that the speaker co-developed and prototyped at NC State / DNAli Data Technologies. First, we'll show how we expanded the set of uniquely addressable files by nesting primers, the chemical labels that identify each file, to ensure that system capacity can reach the high densities afforded by DNA. Second, in our File Preview system, we exploit the thermodynamics of primer binding to create a new file access operation that allows either full or partial access to a file's data, thereby saving sequencing bandwidth when a partial file read is sufficient. While the first two systems rely on double-stranded DNA, the third system, DORIS, is composed of a T7 promoter and a single-stranded overhang domain (ss-dsDNA). The overhang serves as a physical address for accessing specific DNA strands and enables a range of in-storage file operations such as renaming and deletion. Meanwhile, the T7 promoter enables repeatable information access by transcribing information from DNA without destroying it.
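The capacity argument for nested primers can be sketched with simple counting. This is a simplified model under stated assumptions, not the speaker's design: it assumes a file address is one outer primer combined with one inner primer, that any outer primer can pair with any inner primer, and the primer names and the `nested_addresses` helper are hypothetical. Real primer sets face chemical compatibility constraints that are not modeled here.

```python
from itertools import product


def nested_addresses(outer_primers, inner_primers):
    """Enumerate every (outer, inner) primer pair as one file address.

    Assumption: all outer/inner combinations are usable; real designs
    must exclude pairs that cross-hybridize or otherwise misbehave.
    """
    return list(product(outer_primers, inner_primers))


# Hypothetical primer labels, for illustration only.
outer = ["P1", "P2", "P3"]
inner = ["Q1", "Q2", "Q3", "Q4"]

flat_count = len(outer) + len(inner)    # each primer names one file
nested = nested_addresses(outer, inner)  # each primer pair names one file

print("flat addressing:", flat_count, "files")
print("nested addressing:", len(nested), "files")
```

In this model the number of addressable files grows with the product of the two primer sets rather than their sum, so a modest number of well-behaved primers can label a much larger number of files.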
The most expensive factor in traditional archival storage is its lack of durability: over the years, many migrations are needed because of media degradation and technology obsolescence. Because the format of the DNA molecule is immutable, DNA reading technology will not become obsolete, which removes the obsolescence problem. However, low-cost DNA storage does come with some imperatives. DNA outside the cell, like any biological molecule, is subject to aggressive degradation factors, the main one being water. Even when the DNA is dehydrated, degradation due to water cannot be completely avoided because no plastic container is watertight. Imagene developed and industrialized a process that keeps dehydrated DNA at room temperature under an inert atmosphere in a hermetic stainless steel capsule. This standalone storage system makes it possible to store and retrieve digital data encoded in DNA for millennia.
Enabling data storage on DNA relies on advancements in semiconductor technology to make DNA synthesis cheaper, which is a must-have for this field to emerge. The talk will introduce storage people to the concept of how semiconductors are used to create DNA and how the two are tied together, as well as how the advancements in semiconductors are crucial to bringing DNA data storage costs down.
DNA sequencing using sequencing-by-synthesis (SBS) technology is today responsible for the majority of sequencing done worldwide. This presentation will cover the fundamentals behind SBS, the steps involved in going from a DNA sample to data, and the current state of the art of sequencing platforms. The presentation will end by discussing how DNA sequencing can be applied to DNA-based data storage.
DNA data storage is an attractive option for digital data storage because of its extreme density, durability, and eternal relevance. This is especially attractive when contrasted with the exponential growth in worldwide digital data production. In this talk we will present our efforts in building an end-to-end system, from the computational component of encoding and decoding to the molecular biology component of random access, sequencing, and fluidics automation. We will also discuss some early efforts in building a hybrid electronic/molecular computer system that can offer more than just data storage, for example, image similarity search.