







In our recent webcast Is the Data Really Gone? A Primer on the Sanitization of Storage Devices, our presenters Jonmichael Hands (Chia Network), Jim Hatfield (Seagate), and John Geldman (KIOXIA) took an in-depth look at exactly what sanitization is, what the standards are, and where sanitization is being practiced today. If you missed it, you can watch it on-demand to hear their recommendations for verifying sanitization and ensuring that devices meet stringent requirements, and you can access the presentation slides at the SNIA Educational Library. Here, in our Q&A blog, our experts answer more of your questions on data sanitization.
Is Over Provisioning part of the spare blocks or separate?
The main intent of an overprovisioning strategy is to manage the asymmetry between NAND Block Erase (e.g., MBs) and Page Write (e.g., KBs) operations so that a NAND die's endurance capability is used efficiently. In other words, it is a store-over capability that is used routinely, leaving older versions of a Logical Block Address (LBA) in the media until it is appropriate to garbage collect them.
Spares are a subset of overprovisioning, but a spare block strategy is different from an overprovisioning strategy. The main intent of a spare strategy is failover: it is used mainly on some kind of failure (this can be a temporary vibration issue on a hard disk drive or a bad sector).
The National Institute of Standards and Technology (NIST) lists the NVMe® Format command with the Secure Erase Settings field set to 1 (User Data Erase) or 2 (Cryptographic Erase) as a purge method. From what I can gather, the sanitize behavior was more a fallout of the format than anything that was designed. With the NVMe Sanitize command, would you expect the Format command with those data erasure options to be deprecated or moved back to a clear method?
The Format NVM command does have a crypto erase, but it is entirely unspecified, vendor specific, and without any requirements. It is not to be trusted. Sanitize, however, can be trusted, has specific TESTABLE requirements, and is sanctioned by IEEE 2883.
The Format NVM command was silent on some requirements that are explicit in both the NVMe Sanitize commands and IEEE 2883. It was possible, but not required, for an NVMe Format with Secure Erase Settings set to Crypto to also purge other internal buffers. Such behavior beyond the specification is vendor specific. Without assurance from the vendor, be wary of assuming the vendor made additional design efforts. The NVMe Sanitize command does meet the requirements of purge as defined in IEEE 2883.
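For readers who want to see what this looks like in practice, here is a minimal sketch (not from the webcast) of starting an NVMe Sanitize crypto erase and polling the Sanitize Status log with the nvme-cli utility from Python. The device path, the polling logic, and the exact option spellings are assumptions about your environment; check your nvme-cli version's documentation before relying on it.

```python
# Minimal sketch: start an NVMe Sanitize crypto erase and poll its status
# using nvme-cli. Device path and option spellings are assumptions -- check
# `nvme sanitize --help` and `nvme sanitize-log --help` on your system.
import subprocess
import time

DEVICE = "/dev/nvme0"  # hypothetical controller path; adjust for your system

def start_crypto_erase(device: str) -> None:
    # Sanitize Action 4 = start crypto erase
    subprocess.run(["nvme", "sanitize", device, "--sanact=4"], check=True)

def wait_for_completion(device: str, poll_seconds: int = 10) -> None:
    # Re-read the human-readable Sanitize Status log until it no longer
    # reports an operation in progress (a simplification; a robust tool
    # would parse the SSTAT field rather than matching text).
    while True:
        log = subprocess.run(["nvme", "sanitize-log", device],
                             check=True, capture_output=True, text=True).stdout
        print(log)
        if "in progress" not in log.lower():
            break
        time.sleep(poll_seconds)

if __name__ == "__main__":
    start_crypto_erase(DEVICE)
    wait_for_completion(DEVICE)
```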
My question is about the logical level (file-level, OS/filesystem, logical volumes; not applicable to physical DDMs): what can be done at the technical level, and at what point does it go beyond what modern arrays can do (e.g., too many logical layers) and therefore fall under procedural controls? Can you comment on regulatory alignment with technically (or procedurally) acceptable practices?
The IEEE Security in Storage Working Group (SISWG) has not had participation by subject matter experts in this area, and therefore has not made any requirements, recommendations, or acceptable practices for it. Should such experts participate, we can consider developing them.
Full verification is very expensive, especially if you are doing lots of drives simultaneously. Why can't you seed like you could do for crypto, verify the seeding is gone, and then do representative sampling?
The problem with seeding before a crypto erase is that you don't know the before and after data to actually compare. Reading after a crypto erase returns garbage, but you don't know if it is the right garbage. In addition, in some implementations, doing a crypto erase also destroys the CRC/EDC/ECC information, making the data unreadable after the crypto erase.
Seeding is not a commonly defined term. If what was intended by seeding was writing known values into known locations, be aware that there are multiple problems with that process. Consider an Overwrite Sanitize operation: it writes the same pattern into every accessible and non-accessible block. That means the device is completely written with no free media (even the overprovisioning holds that pattern). For SSDs, a new write into that device then has to erase data before it can be re-written. This lack of free overprovisioned media in SSDs results in artificially accelerated endurance issues.
A common solution implemented by multiple companies is to de-allocate after sanitization. After a de-allocation, a logical block address will not access physical media until that logical block address is written by the host. This means that even if known data was written before the sanitize, and even if the sanitize did not do its job, the read-back will not return the data from the physical media that used to be allocated to that address (i.e., that physical block is de-allocated), so the intended test will not be effective.
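For illustration only, here is a minimal Python sketch of representative-sample verification after an Overwrite sanitize, assuming the device was overwritten with a known one-byte pattern and remains readable. As the answers above explain, this approach does not apply after a crypto erase or when blocks have been deallocated.

```python
# Sketch: representative-sample verification after an Overwrite sanitize.
# Assumes the device remains readable and was overwritten with a known
# one-byte pattern; it does NOT work after a crypto erase or deallocation.
import os
import random

def sample_verify(path: str, pattern: bytes = b"\x00",
                  samples: int = 1000, chunk: int = 4096) -> bool:
    """Read `samples` randomly chosen chunks and check that each one matches
    the expected overwrite pattern. Returns False on the first mismatch."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)   # device size in bytes (Linux)
        expected = pattern * chunk
        for _ in range(samples):
            offset = random.randrange(size // chunk) * chunk
            os.lseek(fd, offset, os.SEEK_SET)
            if os.read(fd, chunk) != expected:
                return False                   # sampled data does not match
        return True
    finally:
        os.close(fd)

# Example (requires appropriate privileges and a sanitized device):
# print(sample_verify("/dev/nvme0n1", pattern=b"\x00"))
```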
Are there other problems with Sanitize?
Another problem with Sanitize is that internal protection information (e.g., CRC data, Integrity Check data, and Error Correction Code data) has also been neutralized until each block is written again with new data. Most SSDs are designed to never return bad data (e.g., data that fails Integrity Checks) as a protection and reliability feature.
What are some solutions for Data Sanitization?
One solution that has been designed into NVMe is for the vendor to support a full overwrite of media after a crypto erase or a block erase sanitize operation. Note that such an overwrite has unpopular side-effects.
A unique complication for a Block Erase sanitization operation is that NAND left in an erased state is not stable at the NAND layer, so a full write of the deallocated media can be scheduled to be done over time, or the device can be designed to complete such an overwrite before the sanitize operation returns a completion. In either case, the media remains deallocated until the blocks are written by the host.
Can you kindly clarify "DEALLOCATE all storage before leaving sanitize"? What does that mean physically?
Deallocation (by itself) is not acceptable for sanitization. It is allowable AFTER a proper and thorough sanitization has taken place. Also, in some implementations, reading a deallocated logical block results in a read error. Deallocation must be USED WITH CAUTION; there are many knobs and switches that have to be set correctly to do it right.
Deallocation means removing the internal addressing that mapped a logical block to a physical block. After deallocation, the media is not accessed, so a read of a logical block address provides no help in determining whether the media was actually sanitized. Deallocation gives performance that is as close to factory-fresh, out-of-the-box performance as possible.
Last month, Bill Martin, SNIA Technical Council Co-Chair, presented a detailed update on what’s happening in the development and deployment of the NVMe Key-Value standard. Bill explained where Key Value fits within an architecture, why it’s important, and the standards work that is being done between NVM Express and SNIA. The webcast was one of our highest rated. If you missed it, it’s available on-demand along with the webcast slides. Attendees at the live event had many great questions, which Bill Martin has answered here:
Q. Two of the most common KV storage mechanisms in use today are AWS S3 and RocksDB. How do the NVMe KV standards align with or differ from them? How difficult would it be to map the APIs and semantics of those other technologies to NVMe KV devices?
A. KV Storage is intended as a storage layer that would support these and other object storage mechanisms. KVRocks, a RocksDB-compatible key value store and MyRocks-compatible storage engine designed for KV SSDs, is publicly available on GitHub. There is also a Ceph Object storage design available. These are example implementations that can help an implementer make efficient use of NVMe KV storage.
Q. At which layer will my app stack need to change to take advantage of KV storage? Will VMware or Linux or Windows need to change at the driver level? Or do the apps need to be changed to treat data differently? If the apps don't need to change, doesn't this then just take the data layout tables and move them up the stack into the server?
A. The application stack needs to change at the point where it interfaces to a filesystem, where the interface would change from a filesystem interface to a KV storage interface. In order to take advantage of Key Value storage, the application itself may need to change, depending on what the current application interface is. If the application is talking to a RocksDB or similar interface, then the driver could simply be changed out to allow the app to talk directly to Key Value Storage. In this case, the application does not care about the API or the underlying storage. If the application is currently interfacing to a filesystem, then the application itself would indeed need to change and the KV API provides a standardized interface that multiple vendors can support to provide both the necessary libraries and access to a Key Value storage device. There will need to be changes in the OS to support this in providing a kernel layer driver for the NVMe KV device. If the application is using an existing driver stack that goes through a filesystem and does not change, then you cannot take advantage of KV Storage, but if the application changes or already has an object storage interface then the kernel filesystem and mapping functions can be removed from the data path.
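As a purely hypothetical illustration of where that interface change happens, the sketch below contrasts a filesystem write with a key/value put. The KVStore class and its method names are placeholders for whatever library exposes a KV device, not the actual SNIA KV API.

```python
# Hypothetical contrast between a filesystem write and a key/value store.
# The KVStore class and its methods are placeholders, not real API names.

class KVStore:
    """Stand-in for a library exposing an NVMe KV namespace (hypothetical)."""
    def __init__(self):
        self._pairs = {}             # in-memory substitute for the device

    def put(self, key: bytes, value: bytes) -> None:
        self._pairs[key] = value     # one command: store the whole value

    def get(self, key: bytes) -> bytes:
        return self._pairs[key]      # one command: retrieve the whole value

# Filesystem path: the application names a file; the filesystem maps it to blocks.
with open("/tmp/record-0001", "wb") as f:
    f.write(b"payload bytes")

# KV path: the application supplies the key directly; no filesystem mapping layer.
kv = KVStore()
kv.put(b"record-0001", b"payload bytes")
assert kv.get(b"record-0001") == b"payload bytes"
```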
Q. Is there a limit to the length of a key or value in the KV Architecture?
A. There are limits to the key and value sizes in the current NVMe standard. The current implementation limits the key to 16 bytes due to a desire to pass the key within the NVMe command. The other architectural limit on a key is that the key length is specified in a field that allows up to 255 bytes. To utilize that, an alternative mechanism for passing the key to the device is necessary. For the value, the size limit is 4 GBytes.
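To make those limits concrete, here is a short, hypothetical Python check; the constant and function names are illustrative, not part of the NVMe specification.

```python
# Sketch of the size constraints described above: a 16-byte key when the key
# is passed inside the NVMe command, and a 4 GByte ceiling on the value.
MAX_INLINE_KEY_BYTES = 16          # key passed inside the NVMe command
MAX_VALUE_BYTES = 4 * 1024 ** 3    # 4 GBytes

def check_kv_pair(key: bytes, value: bytes) -> None:
    if len(key) > MAX_INLINE_KEY_BYTES:
        raise ValueError(f"key is {len(key)} bytes; only "
                         f"{MAX_INLINE_KEY_BYTES} fit within the command")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError("value exceeds the 4 GByte limit")

check_kv_pair(b"user:42", b"x" * 1024)                            # passes
# check_kv_pair(b"a key much longer than sixteen bytes", b"")     # would raise
```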
Q. Are there any atomicity guarantees (e.g. for overwrites)?
A. The current specification makes atomicity at the KV level mandatory. In other words, if a KV Store command overwrites an existing KV pair and there is a power failure, you get either all of the original value or all of the new value.
Q. Is KV storage for a special class of storage called computational storage or can it be used for general purpose storage?
A. This is for any application that benefits from storing objects as opposed to storing blocks. It is unrelated to computational storage but may be of use in computational storage applications. One application that has been considered is a filesystem: rather than using the filesystem to store blocks and map each file handle to a set of blocks that contain the file contents, you would use KV storage where the file handle is the key and the value holds the file contents.
Q. What are the most frequently used devices to use the KV structure?
A. If what is being asked is, what are the devices that provide a KV structure, then the answer is, we expect the most common devices using the KV structure will be KV SSDs.
Q. Does the NVMe KV interface require two accesses in order to get a value (i.e., one access to get the value size in order to allocate the buffer and then a second access to read the value)?
A. If you know the size of the object, or if you can pre-allocate enough space for your maximum-size object, then you can do a single access. This is no different from current implementations where you actually have to specify how much data you are retrieving from the storage device by specifying a starting LBA and a length. If you do not know the size of the value and require that in order to retrieve the value, then you would indeed need to submit two commands to the NVMe KV storage device.
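The two access patterns can be sketched as follows; the helper functions and the in-memory dict are stand-ins for a real KV device and its API, not actual command names.

```python
# Hypothetical illustration of the one-access and two-access patterns,
# using an in-memory dict as a stand-in for a KV SSD namespace.
store = {b"photo-123": b"\x89PNG...example bytes"}

def kv_value_length(key: bytes) -> int:
    # First access: ask the device how large the value is.
    return len(store[key])

def kv_retrieve(key: bytes, buffer_len: int) -> bytes:
    # Retrieve up to buffer_len bytes of the value.
    return store[key][:buffer_len]

MAX_EXPECTED_VALUE = 1024 * 1024   # application-chosen ceiling (1 MiB)

# One access: pre-allocate for the largest value the application ever stores.
value = kv_retrieve(b"photo-123", MAX_EXPECTED_VALUE)

# Two accesses: learn the exact length first, then retrieve exactly that much.
length = kv_value_length(b"photo-123")
value = kv_retrieve(b"photo-123", length)
```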
Q. Does the device know whether an object was compressed, and if not how can a previously compressed object be stored?
A. The hardware knows if it does compression automatically and therefore whether it should de-compress the object. If the storage device supports compression and the no-compress option, then the device will store metadata with the KV pair indicating if no-compress was specified when storing the file in order to return appropriate data. If the KV storage device does not perform compression, it can simply support storage and retrieval of previously compressed objects. If the KV storage device performs its own compression and is given a previously-compressed object to store and the no-compress option is not requested, the device will recompress the value (which typically won’t result in any space savings) or if the no-compress option is requested the device will store the value without attempting additional compression.
Q. On flash, erased blocks are fixed sizes, so how does Key Value handle defrag after a lot of writes and deletes?
A. This is implementation specific and depends on the size of the values that are stored. This is much more efficient on values that are approximately the size of the device’s erase block size as those values may be stored in an erase block and when deleted the erase block can be erased. For smaller values, an implementation would need to manage garbage collection as values are deleted and when appropriate move values that remain in a mostly empty erase block into a new erase block prior to erasing the erase block. This is no different than current garbage collection. The NVMe KV standard provides a mechanism for the device to report optimal value size to the host in order to better manage this as well.
Q. What about encryption? Supported now or will there be SED versions of [key value] drives released down the road?
A. There is no reason that a product could not support encryption with the current definition of key value storage. The release of SED (self-encrypting drive) products is vendor specific.
Q. What are considered to be best use cases for this technology? And for those use cases - what's the expected performance improvement vs. current NVMe drives + software?
A. The initial use case is for database applications where the database is already storing key/value pairs. In this use case, experimentation has shown that a 6x performance improvement from RocksDB to a KV SSD implementing KV-Rocks is possible.
Q. Since writes are complete (value must be written altogether), does this mean values are restricted to NVMe's MDTS?
A. Yes. Values are limited by MDTS (maximum data transfer size). A KV device may set this value to something greater than a block storage device does in order to support larger value sizes.
Q. How do protection schemes (erasure coding, RAID, ...) work with key value?
A. Since key value deals with complete values as opposed to the blocks that make up user data, RAID and erasure coding are usually not applicable to key value systems. The most appropriate data protection scheme for key value storage devices would be a mirrored scheme. If a storage solution performed erasure coding on data first, it could store the resulting EC fragments or symbols on key value SSDs.
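As a rough illustration of that host-side approach, the sketch below mirrors a KV pair onto two stand-in devices and stores erasure-coded fragments under derived keys. The dict-based "devices" and the key-naming convention are assumptions for illustration only.

```python
# Sketch of host-side protection for KV pairs: mirror the whole pair onto two
# devices, or store host-computed EC fragments under derived keys.
kv_dev_a, kv_dev_b = {}, {}        # stand-ins for two KV SSD namespaces

def put_mirrored(key: bytes, value: bytes) -> None:
    # Mirroring: write the whole key/value pair to both devices.
    kv_dev_a[key] = value
    kv_dev_b[key] = value

def put_fragments(key: bytes, fragments: list, devices: list) -> None:
    # Host-side erasure coding: each fragment lands on its own device
    # under a derived key such as b"record-7#0", b"record-7#1", ...
    for i, (frag, dev) in enumerate(zip(fragments, devices)):
        dev[key + b"#" + str(i).encode()] = frag

put_mirrored(b"record-7", b"payload")
put_fragments(b"record-7", [b"frag0", b"frag1", b"frag2"], [{}, {}, {}])
```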
Q. So Key Value is not something built on top of block like Object and NFS are? Object and NFS data are still stored on disks that operate on sectors, so object and NFS are layers on top of block storage? KV is drastically different, uses different drive firmware and drive layout? Or do the drives still work the same and KV is another way of storing data on them alongside block, object, NFS?
A. Today, there is only one storage paradigm at the drive level -- block. Object and NFS are mechanisms in the host to map data models onto block storage. Key Value storage is a mechanism for the storage device to map from an address (a key) to a physical location where the value is stored, avoiding a translation in the host from the Key/value pair to a set of block addresses which are then mapped to physical locations where data is then stored. A device may have one namespace that stores blocks and another namespace that stores key value pairs. There is not a difference in the low-level storage mechanism only in the mapping process from address to physical location. Another difference from block storage is that the value stored is not a fixed size.
Q. Could you explain more about how tx/s is increased with KV?
A. The increase in transfers/second occurs for two reasons: one is because the translation layer in the host from key/value to block storage is removed; the second is that the commands over the bus are reduced to a single transfer for the entire key value pair. The latency savings from this second reduction is less significant than the savings from removing translation operations that have to happen in the host.
Keep up-to-date on work SNIA is doing on the Key Value Storage API Specification at the SNIA website.