|
SNIA Leadership and Progress in Digital Information Preservation
By Jeff K. Porter, Chair, SNIA Data Management Forum and Senior Technologist, EMC IMSG CTO Office
The requirements for data and information management in the data center have changed fundamentally over the last few years. Data is growing at an annual rate of more than 40 percent, and that is a conservative figure. The primary drivers are compliance requirements covering data access (who can see data), search (e-discovery), and proof of data authenticity (chain of custody), combined with the requirement to maintain timely access to years of data. Data centers have responded by implementing tiered storage environments with data classification and search tools. This has lowered capital and operational costs and improved regulatory compliance. Unfortunately, these data center solutions do not address the long-term requirements for information preservation and access.
| Preservation: [Context: Long-term digital information retention] The processes and operations involved in ensuring the technical and intellectual survival of authentic information objects through time. |
Successful long-term digital information preservation requires the use of processes, tools, and techniques that ensure access to authenticated and secure information spanning multiple hardware and software refresh cycles over many years. Current IT industry practices must be updated to meet these challenges.
The SNIA Data Management Forum (DMF) has been working to advance the state of digital information management practices since its inception. The DMF is organized into three primary work areas covering information lifecycle management, data protection, and long-term digital information preservation. Each area approaches digital information preservation from its own perspective.
The DMF Information Lifecycle Management Initiative (ILMI) works to develop improved methods for automated data management activities. The ILMI has focused on combining the use of data classification practices with the definition of storage-specific service levels to establish the framework for an automated, policy-driven storage environment. Service level attributes can be used to control information preservation including data retention periods, access permissions, and disposition processing.
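As a rough illustration of how a data classification might map to a storage service level whose attributes drive retention, access, and disposition, consider the minimal Python sketch below. The class names, role names, retention periods, and the `ServiceLevel` structure are all hypothetical and chosen only for this example; they do not come from any ILMI deliverable.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ServiceLevel:
    """Hypothetical storage service level (illustrative attributes only)."""
    name: str
    retention_days: int        # how long the information must be kept
    allowed_roles: frozenset   # roles permitted to read the information
    disposition: str           # action once retention expires


# Assumed mapping from data classification to service level.
SERVICE_LEVELS = {
    "financial_record": ServiceLevel(
        "compliance", 7 * 365, frozenset({"auditor", "finance"}), "review"),
    "project_file": ServiceLevel(
        "standard", 365, frozenset({"staff"}), "delete"),
}


def service_level_for(classification: str) -> ServiceLevel:
    """Look up the service level assigned to a classification."""
    return SERVICE_LEVELS[classification]


def may_read(level: ServiceLevel, role: str) -> bool:
    """Access permission check driven by the service level."""
    return role in level.allowed_roles


def disposition_due(level: ServiceLevel, created: date, today: date) -> bool:
    """True once the retention period has elapsed and disposition may run."""
    return today >= created + timedelta(days=level.retention_days)
```

In this sketch a policy engine would classify each object once and then consult only the service level, so retention and access rules could change without touching the application that wrote the data.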
The DMF Data Protection Initiative (DPI) is focused on market education and development of best practices around existing and evolving data protection technologies. The DPI covers data protection techniques ranging from tape backup to disk-to-disk backup, replication, continuous data protection, virtual tape libraries, and more. Its current focus is on demystifying data deduplication by educating IT professionals on the various methods of implementation and on its value in reducing data space, transmission, and storage costs in the data center. In the digital preservation space, the DPI works to define best practices for protecting digital preservation environments for data and disaster recovery.
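To make the idea of deduplication concrete, the sketch below shows one of the simplest approaches: fixed-size chunking with content hashing. It is an illustrative example only, not a method described by the DPI; the function names and the 4096-byte chunk size are assumptions, and production systems often use variable-size chunking and other refinements.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns (store, recipe): store maps a SHA-256 digest to chunk bytes,
    and recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # keep only the first copy
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)
```

With input made of repeated blocks, the store holds fewer chunks than the recipe references, which is where the savings in space and transmission come from.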
The DMF Long-Term Archive and Compliance Storage Initiative (LTACSI, pronounced L-Taxi) is the driving force for digital preservation work within SNIA. Last year, the LTACSI team released the 100 Year Archive Requirements Survey, a study of the state of digital archiving and its requirements. The survey highlighted three points:
- The term archive means 'saving original documents for later access' to most business people, whereas it means 'a safe storage location for backup tapes' to most IT professionals
- Long-term information retention requirements span 20, 50, 100 years and longer
- IT professionals do not believe they can recall and present digital information in its original context after 10-15 years
| Information: [Context: Long-term digital information retention] A logical grouping of data and reference information, giving it context and relevance. |
Based on the findings of the Archive Survey, the LTACSI group refined its focus from archiving to solving the long-term digital preservation requirements of business. The overall objective is to define practices and develop technologies to allow information, not just data, to be preserved through multiple hardware and software refresh cycles while maintaining its authenticity, access control, and availability to meet compliance and legal discovery regulations. LTACSI has initiated several programs to address these challenges.
First, the team is speaking at conferences both in the USA and internationally to raise awareness. Second, it is developing a common set of terminology for discussing business information retention requirements across business disciplines. This work will be available soon and includes a comparison of terminology definitions from other trade associations (ARMA, AIIM, the Sedona Conference, etc.), projects (CASPAR, InterPARES), and government sources (NARA, DOD, and the Federal Rules of Civil Procedure). This is extremely important because it allows different business disciplines to communicate openly with a common understanding.
The team has also initiated a program to define a digital information preservation reference model to address the processes, software, and physical architecture needed to support digital preservation requirements. This will provide data center professionals with a model to evaluate their progress in deploying digital information preservation programs. This work will eventually evolve into a best practice guide for implementing digital preservation storage environments.
Finally, the team is working to address the preservation of digital information as it migrates through physical and logical system refresh cycles over time. The LTACSI group led the effort to create the recently announced SNIA Long Term Retention Technical Work Group (the LTR TWG). An initial objective of this group is to define a new logical format for information retention and preservation (currently dubbed SD-SCDF, a self-describing, self-contained data format) that gives applications a common format for writing data and its associated metadata. This format will allow the requirements for integrity, authenticity, access control, security, and availability to be met over extended periods. This is not a new concept. The work builds on the framework defined in the Open Archival Information System (OAIS) reference model (ISO 14721:2003) and will include preservation-specific metadata such as reference information, integrity and authenticity controls, audit records, and potentially readers. It will address both logical and physical migration issues as information and its access environments age.
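The general idea of a self-describing, self-contained object can be sketched in a few lines of Python. This is a hypothetical illustration only: the field names, the JSON-style layout, and the `example-preservation-object/0` identifier are invented for this example and do not represent SD-SCDF, XAM, or any OAIS-defined schema. The sketch bundles a payload with reference information, a fixity digest, and an audit trail, and re-checks integrity before each migration.

```python
import hashlib
import json
from datetime import datetime, timezone


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def package(payload: bytes, reference: dict) -> dict:
    """Bundle a payload with its descriptive and integrity metadata."""
    return {
        "format": "example-preservation-object/0",  # hypothetical identifier
        "reference": reference,                      # context for the data
        "fixity": {
            "algorithm": "sha256",
            "digest": hashlib.sha256(payload).hexdigest(),
        },
        "audit": [{"event": "created", "at": _now()}],
        "payload_hex": payload.hex(),
    }


def verify(obj: dict) -> bool:
    """Recompute the digest and compare it with the recorded fixity value."""
    payload = bytes.fromhex(obj["payload_hex"])
    return hashlib.sha256(payload).hexdigest() == obj["fixity"]["digest"]


def migrate(obj: dict, note: str) -> dict:
    """Copy the object to a 'new system', recording the move in the audit trail."""
    if not verify(obj):
        raise ValueError("integrity check failed; refusing to migrate")
    copy = json.loads(json.dumps(obj))  # deep copy via a JSON round trip
    copy["audit"].append({"event": "migrated", "note": note, "at": _now()})
    return copy
```

Note that a bare digest only detects accidental or unrecorded change; demonstrating authenticity would additionally require something like a digital signature or a trusted record of the digest, which this sketch leaves out.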
When completed, this new preservation format will give solution providers a standard, interoperable format for writing long-term preservation applications. The new format will be based on the XAM (eXtensible Access Method) specification developed by the SNIA Fixed Content Aware Storage Technical Work Group (FCAS TWG). When XAM and the preservation format are combined, they will provide a new method for addressing digital preservation issues. Solution providers will be able to move data across hardware platforms without application knowledge or involvement, making it possible to implement Information Lifecycle Management across tiers of storage and across heterogeneous storage platforms. This will provide a new approach to migration of data: migration as a normal daily work function, not as an expensive standalone project. The support of extensible metadata in XAM will allow the preservation format to encapsulate the data needed to interpret the information logically, ensuring that the original information representation can be recreated long into the future.
As you can see, the DMF has multiple efforts underway to advance the state of the art of digital information preservation in the data center. The Data Management Forum, the DPI, the ILMI, and the LTACSI work groups invite you to visit www.snia.org/forums/dmf to learn more about our programs and how you can get involved in the development of new standards related to the storage, management, and preservation of digital information.
About the Author
Jeff K. Porter, senior technologist in the Information Management Solution Group at EMC Corporation, has more than twenty-five years of experience in a variety of technology disciplines including engineering, product management, consulting, customer service, and technology management. He is an active member of SNIA and is Chairman of the Board of Directors of the SNIA Data Management Forum.
|