SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
DNA lacks many key attributes found in traditional storage media, including locality and addressability. The Rosetta Stone workgroup aims to enable archive readers to understand key metadata about an archive so that they are positioned to consume its contents. This session will provide an overview of the workgroup's progress toward a recommended approach to this problem.
Deoxyribonucleic acid (DNA), a storage medium with high density and long-term preservation properties, can satisfy the archival storage requirements of rapidly growing digital data volumes. However, the read and write processes of DNA storage are error-prone. Images, which are widely used in social media, are inherently fault-tolerant, which makes them a good fit for DNA storage. Prior work has mainly investigated the feasibility of storing different types of data in DNA and has stored images without fully exploiting their fault-tolerant potential. In this talk, we introduce new image-based DNA storage systems that store images efficiently with improved robustness and density. First, a new DNA architecture is proposed to fit JPEG-based images and improve their robustness in DNA storage. Moreover, barriers inserted in the DNA sequences prevent errors from propagating through the stored images. To improve the overall encoding density, a hybrid lossy and lossless encoding scheme is used. Finally, the experimental results indicate that the proposed schemes are more robust to injected errors than other DNA storage codes. The schemes also improve the encoding density of DNA storage, bringing it much closer to the ideal case.
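For readers new to the term, encoding density is usually expressed in bits per nucleotide. The abstract does not define its ideal case; the sketch below assumes the common baseline of an unconstrained four-letter alphabet, where each base carries at most two bits. The function names and the base ordering are illustrative choices, not the scheme presented in the talk.

```python
# Illustrative baseline only: a plain 2-bits-per-base mapping with no error
# correction, barriers, or biochemical constraints. The A/C/G/T ordering is
# an arbitrary assumption, not taken from the talk.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per base, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def density_bits_per_nt(payload_bytes: int, total_nucleotides: int) -> float:
    """Payload bits divided by the nucleotides actually synthesized."""
    return 8 * payload_bytes / total_nucleotides
```

Any redundancy added for robustness increases the nucleotide count for the same payload, so a practical scheme reports a density below this two-bit ceiling.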
A new error correction code for DNA data storage is presented. The code uses information about the noise in the DNA data channel to clean up errors and erasures, hence its name: DNA Noise Aware Errors Erasures Cleaner (DNAe2c). Modeling the sources of errors and erasures with different state-of-the-art distributions and with real data, we see a 10x improvement over Reed-Solomon codes with less than 20% overhead, making DNAe2c a promising candidate to accelerate the adoption of DNA data storage.
Synthetic DNA-based data storage has been on the rise as a candidate storage technology due to its longer shelf life and higher data density. It is expected to address the ever-increasing demand for cold storage and to reduce the energy data centers consume to preserve information over long periods of time. In 2021, Lenovo and IPT joined the race towards DNA synthesis for data storage applications. In the traditional chemical method, the established “base-by-base” synthesis promises the highest information density on short DNA strands. To make this technology viable, it is imperative to build a microscale system with lower reagent consumption and faster cycle times than the DNA synthesizers used in the biotechnology industry. Moreover, adding electronically controlled selectivity to the process allows a first degree of parallelization and the increased throughput needed to be compatible with current LTO tape standards. An alternative to the harsh chemicals of this method is DNA synthesis using enzymes. Although it is a more recent and less established technology, interest has grown because of its storage rate prospects, which resemble how fast DNA strands are replicated in cells. Another interesting feature of enzymes is the possibility of building long ternary sequences. As with the chemical method, a microfluidic platform is needed to reduce reagent consumption, alongside thermal control for optimal enzymatic activity. This presentation offers insights into the use of bit-to-DNA writing machines in the data centers of the future. Facility requirements, the challenges of using microfluidic platforms, and a technical overview will be provided. The knowledge shared here will help pave the way for the deployment and use of DNA data storage technology that we envision.
There are several well-known advantages of using synthetic DNA for cold-data storage, such as higher density, reduced energy consumption, and durability compared with the standard storage media used for the same purpose. Bringing this technology to market requires cost-effective DNA synthesizers that can write data at an appropriate throughput and a CODEC able to handle data from different synthesis and sequencing technologies. In the last two years, the Prometheus project, a partnership between Lenovo and IPT Institute in Brazil, has made significant progress in developing DNA writing machines and a versatile CODEC. This presentation offers a comprehensive overview of the DNA data storage pipeline, drawing on real-world experience from data encoding to storage and retrieval. Our primary goal is to provide the audience with valuable insights and practical knowledge regarding coding and decoding techniques, with specific emphasis on the error correction architecture we designed. The CODEC includes not only the standard methods for storage systems, such as encoding and decoding algorithms, addressing, and error correction coding, but also standard techniques from the bioinformatics field known as the sanitizing process, such as removal of low-quality reads, adapter removal, and contaminant filtering, followed by alignment and clustering of the sequenced reads. The latest released version of the Lenovo DNA Data Storage CODEC, named Pantheon, already applies the Sector scheme proposed by the SNIA DNA Archive Rosetta Stone (DARS) technical working group. Exciting results from experiments with this CODEC will be demonstrated and discussed.
Finally, our presentation will inspire participants and provide a comprehensive overview of the complexity of implementing coding and adaptive decoding techniques for a functional DNA data storage system, including practical considerations, potential roadblocks, and viable solutions, drawing from our real-world experiences.
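The sanitizing steps named in this abstract (removal of low-quality reads and adapter removal) can be illustrated with a minimal sketch. It is not the Pantheon implementation: the function names, the Phred+33 quality encoding, the exact-match adapter search, and the default thresholds are all assumptions made for illustration, and alignment and clustering are left out.

```python
# Hypothetical read-sanitizing sketch; not the Pantheon CODEC.
# Assumes FASTQ-style reads given as (sequence, quality string) pairs with
# Phred+33 quality encoding and an adapter that appears as an exact substring.


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (Phred+33 by default)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def trim_adapter(sequence: str, adapter: str) -> str:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    index = sequence.find(adapter)
    return sequence if index == -1 else sequence[:index]


def sanitize(reads, adapter, min_quality=20.0, min_length=1):
    """Drop low-quality reads, trim the adapter, and keep non-trivial reads."""
    kept = []
    for sequence, quality in reads:
        # Skip reads without quality data or below the assumed threshold.
        if not quality or mean_phred(quality) < min_quality:
            continue
        trimmed = trim_adapter(sequence, adapter)
        if len(trimmed) >= min_length:
            kept.append(trimmed)
    return kept


reads = [
    ("ACGTTTAGG", "IIIIIIIII"),  # high quality, ends with the adapter
    ("ACGT", "!!!!"),            # Phred 0 everywhere, discarded
]
clean = sanitize(reads, adapter="AGG")
```

Real pipelines typically tolerate mismatches when locating adapters and filter on per-base rather than mean quality; the exact-match version here only shows where each step sits in the flow.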
Users of DNA as a digital data storage medium must have confidence that they can reliably recover their stored data, and they must be able to understand the competing capabilities and claims of codecs, readers, writers, and container systems as a multi-vendor DNA data storage ecosystem emerges. To facilitate this, the DNA Data Storage Alliance is working to create standard methods and metrics that enable objective verification of endurance and data retention claims for DNA-based storage media container systems, and, more generally, to standardize how the reliability and endurance of DNA-based data are characterized. This talk will review work on two specifications being developed: 1) a standard methodology for rating the expected half-life/shelf life of different DNA preservation/storage mechanisms, from sealed stainless steel capsules filled with inert gas to filter paper, such that objective comparisons can be made and verified; and 2) standard, “FDA-like” labeling for DNA media, such that the recipient/owner of the media can select an optimal sequencing solution based on knowledge of how the media was created and stored.
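To show how a half-life rating could be used to compare preservation mechanisms, here is a minimal sketch. It assumes simple exponential (first-order) decay of intact strands, which is a modeling assumption and not the methodology under development by the Alliance; the function names are illustrative.

```python
import math

# Assumption: intact strands decay exponentially, so a fixed half-life fully
# describes retention. The specification discussed in the talk may use a
# different model.


def fraction_intact(years: float, half_life_years: float) -> float:
    """Expected fraction of strands still intact after the given time."""
    return 0.5 ** (years / half_life_years)


def years_until_fraction(target_fraction: float, half_life_years: float) -> float:
    """Time until the intact fraction falls to target_fraction (0 < f <= 1)."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    return half_life_years * -math.log2(target_fraction)
```

Under this model, a medium rated at a 100-year half-life would be expected to retain a quarter of its strands after 200 years, and two mechanisms can be compared by the time each takes to reach the minimum fraction a given codec can still decode.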