Can Cloud Storage and Big Data Live Happily Ever After?

Chip Maurer

Aug 31, 2021

“Big Data” has pushed the storage envelope, creating a seemingly perfect relationship with Cloud Storage. But local storage is the third wheel in this relationship, and won’t go down easily. Can this marriage survive when Big Data is being pulled in two directions? Should Big Data pick one, or can the three of them live happily ever after? This will be the topic of discussion on October 21, 2021 at our live SNIA Cloud Storage Technologies webcast, “Cloud Storage and Big Data, A Marriage Made in the Clouds.” Join us as our SNIA experts cover:
  • A short history of Big Data
  • The impact of edge computing
  • The erosion of the data center
  • Managing data-on-the-fly
  • Grid management
  • Next-gen Hadoop and related technologies
  • Supporting AI workloads
  • Data gravity and distributed data
Register today! Our speakers will be ready to take your questions, and black tie is not required for this wedding!

What’s New in Computational Storage? A Conversation with SNIA Leadership

SNIAOnStorage

Aug 27, 2021

The latest revisions of the SNIA Computational Storage Architecture and Programming Model Version 0.8 Revision 0 and the Computational Storage API v0.5 rev 0 are now live on the SNIA website. To get the details on what has been added to the specifications, SNIAOnStorage met “virtually” with Jason Molgaard, Co-Chair of the SNIA Computational Storage Technical Work Group, and Bill Martin, Co-Chair of the SNIA Technical Council and editor of the specifications.

Both SNIA volunteer leaders stressed that they welcome ideas about the specifications and invite industry colleagues to join them in continuing to define computational storage standards. The two documents are working documents – continually being refined and enhanced. If you are not a SNIA member, you can submit public comments via the SNIA Feedback Portal. To learn if your company is a SNIA member, check the SNIA membership list. If you are a SNIA member, go here to join the Computational Storage Technical Work Group member work area. The Computational Storage Technical Work Group chairs also welcome your emails. Reach out to them at computationaltwg-chair@snia.org.

SNIAOnStorage (SOS): What is the overall objective of the Computational Storage Architecture and Programming Model?

Jason Molgaard (JM): The overall objective of the document is to define recommended behavior for hardware and software that supports computational storage. This is the second release of the Architecture and Programming Model, and it is very stable. While the changes are dramatic, that is primarily because of feedback we received both from the public and, to a larger extent, from new Technical Work Group members who have provided insight and perspective.

SOS: Could you summarize what has changed in the 0.8 version of the Model?

JM: Version 0.8 has four main takeaways:
  1. It renames the Computational Storage Processor.  The component within a Computational Storage Device (CSx) is now called a Computational Storage Engine (CSE).  The Computational Storage Processor (CSP) now only refers to a device that contains a Computational Storage Engine (CSE) and no storage.
  2. It defines a new architectural concept of a Computational Storage Engine Environment (CSEE).  This is something that is attached to a specific CSE and defines the environment that a Computational Storage Function (CSF) operates in.
  3. It defines a new architectural element of a Resource Repository that contains CSEEs that are available for activation on a CSE and also CSFs that are available for activation on a CSEE.
  4. Discovery and configuration flows are now documented in Version 0.8.
SOS: Why did the TWG decide to work on the release of a unique API document?

Bill Martin (BM): The overall objective of the Computational Storage API document is to define an interface between an application and a CSx. Version 0.5 is the first release to the public by the Technical Work Group. There are three key takeaways from version 0.5:
  1. The document defines an Application Programming Interface (API) to CSxs.
  2. The API allows a user application on a host to have a consistent interface to any vendor’s CSx.
  3. A vendor defines a library for their device that implements the API. Mapping to the wire protocol for the device is done by this library. Functions that are not available on a specific CSx may be implemented in software. (A hedged host-side sketch of this flow follows below.)
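To make that host-side flow concrete, here is a minimal C sketch of how an application might call into such a vendor library. The type and function names (csx_discover, csx_load_function, csx_execute, csx_release) are hypothetical placeholders invented for this illustration; they are not the identifiers defined in the Computational Storage API v0.5, so treat this as a shape-of-the-flow sketch and consult the specification for the real interfaces.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical vendor-library types and entry points; the actual names and
     * signatures are defined by the SNIA Computational Storage API specification
     * and each vendor's implementation of it. */
    typedef struct csx_device   csx_device_t;
    typedef struct csx_function csx_function_t;

    extern int  csx_discover(csx_device_t **devices, int max_devices);
    extern int  csx_load_function(csx_device_t *dev, const char *csf_image,
                                  csx_function_t **csf);
    extern int  csx_execute(csx_function_t *csf, const void *args, size_t args_len,
                            void *result, size_t result_len);
    extern void csx_release(csx_device_t *dev);

    int main(void)
    {
        csx_device_t   *devs[8];
        csx_function_t *filter;
        char            result[4096];

        /* 1. Discover the Computational Storage Devices (CSxs) visible to the host. */
        int count = csx_discover(devs, 8);
        if (count <= 0) {
            fprintf(stderr, "no CSx found\n");
            return EXIT_FAILURE;
        }

        /* 2. Activate a Computational Storage Function (CSF) on the device; if the
         *    CSx cannot run it natively, the library may fall back to software. */
        if (csx_load_function(devs[0], "filter_csf.bin", &filter) != 0) {
            csx_release(devs[0]);
            return EXIT_FAILURE;
        }

        /* 3. Execute the CSF close to the data and copy back only the reduced result. */
        const char query[] = "illustrative filter arguments";
        csx_execute(filter, query, sizeof(query), result, sizeof(result));

        csx_release(devs[0]);
        return EXIT_SUCCESS;
    }

The point the API is making is that this host-side code can stay the same regardless of which vendor's CSx, or which transport, sits underneath the library; functions the device cannot run natively may be emulated in software by the library.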
SOS: How can vendors use these documents?

BM: The Computational Storage Architecture and Programming Model is what I would categorize as a “descriptive” document. There are no “shalls” or “shoulds” in the document. Rather, the Model is something implementers can use to view the elements they should be considering. It shows the components that are in the architecture, what they mean, and how they interact with each other. This allows users to understand the frameworks and options that can be implemented with a common language and understanding. The API document, in contrast, is a “prescriptive” document. It describes how to use the elements defined within the architectural document – how to do discovery and configuration, and how to utilize the architecture. These documents are meant to be used together. Some implementations may not use all of the elements of the architecture, but all of the elements are logically there.

JM: Individuals who are looking to implement computational storage – and who are developing their own computational storage devices – should absolutely review both documents and use them to provide feedback and questions. Many vendors are considering what their computational storage device should look like. This architecture framework provides good guidance and baseline nomenclature we can all use to speak the same language.

SOS: Are there any specific areas where you are looking for feedback?

BM: In the API specification, we’d like feedback on whether or not the discovery section covers everything implementers want to discover about a CSx. We’d like more depth and detail on what things people think they want to discover about a device. We’d also like comments on items that need to be added on how you interact with the devices to execute a Computational Storage Function (CSF).

JM: For the Model, we’d like feedback on whether people see the value of this descriptive document and are actually following it. We’d like to know if there are additional ideas or areas of definition that users want to see when constructing architectures, or whether there are gaps in defined activities.

SOS: Where can folks find out more information about the specifications?

BM: We invite everyone to attend the upcoming SNIA Storage Developer Conference (SDC). We will be virtual this year on September 28 and 29. Registrants can view 12 presentations on computational storage, including a Computational Storage Update from the Working Group that the Co-Chairs of the Computational Storage TWG, Scott Shadley and Jason Molgaard, are presenting; one I will be giving on Computational Storage Moving Forward with an Architecture and API; and another by my Computational Storage TWG colleague Oscar Pinto on Computational Storage APIs. And anyone with interest in computational storage can attend an open discussion during SDC on computational storage advances that will be featured in a Birds-of-a-Feather session via Zoom on September 29 at 4:00 pm Pacific. Go here to learn how to attend this SDC special event and all the Birds-of-a-Feather sessions.

SOS: Thanks to you both. Our readers may want to know that SNIA’s work in computational storage is led by the 250+ volunteer vendor members of the Computational Storage Technical Work Group. In addition to these two specifications, the TWG has also updated computational storage terms in the Online SNIA Dictionary.
The SNIA Computational Storage Special Interest Group accelerates the awareness of computational storage concepts and influences industry adoption and implementation of the technical specifications and programming models. Learn more at http://www.snia.org/cmsi

Deploying Confidential Computing Q&A

Michael Hoard

Aug 27, 2021

The third live webcast in our SNIA Cloud Storage Technologies Initiative confidential computing series focused on real-world deployments of confidential computing and included case studies and demonstrations. If you missed the live event, you can watch it on demand here. Our live audience asked some interesting questions; here are our expert presenters’ answers.

Q. What is the overhead in CPU cycles for running in a trusted enclave?

A. We have been running some very large machine learning applications in secure enclaves using the latest available hardware, and we are seeing very close to “near-native” performance, with no more than 5% performance overhead compared to normal non-secure operations. This performance is significantly better than with older versions of hardware. With new hardware, we are ready to take on bigger workloads with minimal overhead. Also, it is important to note that encryption and isolation are done in hardware at memory access speeds, so that is not where you will tend to see a performance issue. Regardless of which secure enclave hardware capability you choose, each uses a different technology to manage the barrier between secure enclaves. The important thing is to look at how often an application crosses the barrier, since that is where careful attention is needed.

Q. How do you get around the extremely limited memory space of SGX?

A. With the latest Intel® Ice Lake processors, SGX-related memory limits have been significantly relaxed. With previous generations, memory was limited to 256MB of secure enclave base cache, or Enclave Page Cache (EPC). With Ice Lake processors, this has been relaxed so that SGX supports memory sizes from hundreds of gigabytes (GB) up to one terabyte (TB) of EPC. With this update, we are seeing very large applications now fully fit within secure enclaves and thus gain significant performance increases. However, we should still be wary about running large commercial database suites within secure enclaves, not only with respect to memory size, but also with respect to how database operations run within CPUs. Databases are large and complex applications that use native CPU and memory features (for example, shared memory) that don’t lend themselves well to constrained environments like enclaves. AMD Secure Encrypted Virtualization (SEV) and AWS Nitro Enclaves have different characteristics, but it’s important to note that they support very large applications in secure enclaves.

Q. How can I test this stuff out? Where can I start?

A. There are a number of demonstration environments where users can test their own applications. Confidential computing offerings are available from many cloud service providers (CSPs). When you’re thinking about confidential computing, it is recommended that you talk with someone who has been through the process. They should be able to help identify challenges and demonstrate how deployment can be easier than you initially anticipated.
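As a concrete first step for that “how can I test this out” question, the short C sketch below checks whether a Linux host even exposes SGX. It is a minimal illustration under two assumptions: that the CPU advertises the “sgx” flag in /proc/cpuinfo, and that a kernel with built-in SGX support (5.11 or later) has created the /dev/sgx_enclave device node. It is not a substitute for a vendor’s or CSP’s enclave tooling and attestation flow.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Minimal SGX availability probe for Linux: (1) look for the "sgx" CPU
     * feature flag in /proc/cpuinfo, and (2) check for the /dev/sgx_enclave
     * node that SGX-enabled kernels (5.11+) expose. */
    static int cpu_reports_sgx(void)
    {
        char buf[2048];
        int found = 0;
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f)
            return 0;
        /* Rough token check on the flags line(s). */
        while (!found && fgets(buf, sizeof(buf), f))
            found = (strstr(buf, " sgx ") != NULL) || (strstr(buf, " sgx\n") != NULL);
        fclose(f);
        return found;
    }

    int main(void)
    {
        int cpu_ok = cpu_reports_sgx();
        int dev_ok = (access("/dev/sgx_enclave", F_OK) == 0);

        printf("CPU advertises SGX:       %s\n", cpu_ok ? "yes" : "no");
        printf("/dev/sgx_enclave present: %s\n", dev_ok ? "yes" : "no");

        if (!cpu_ok || !dev_ok)
            printf("Enclaves not usable here; consider a CSP confidential-computing instance.\n");
        return 0;
    }

The equivalent checks differ for AMD SEV and AWS Nitro Enclaves, so treat this purely as an SGX-side starting point before moving on to a proper SDK or CSP demonstration environment.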
Q. How does confidential computing compare with other technologies for data security, such as homomorphic encryption and multi-party computation?

A. Homomorphic encryption allows data in encrypted memory to be processed and acted upon without moving it out of the encrypted space, but it’s currently computationally expensive. In contrast, there is interest in multi-party compute. For example, someone owns data in a bank, someone else owns AI models to detect money laundering, and a third party owns the compute, which acts as a trusted place where the data and algorithms can come together for secure multi-party processing. Confidential computing makes this possible in a way that was previously not feasible. Homomorphic encryption, multi-party computation and confidential computing may all be used together to implement data protection. Which method is most suitable depends upon performance, security model, ease of use, scalability, flexibility of deployment architecture, and whether or not the application requires significant or regular change.

Q. What industry sectors are you seeing with the most traction for confidential computing?

A. We are seeing many applications related to machine learning, applications which request data from a secure database, and applications in highly regulated data privacy and data protection environments. Other applications include distribution of secret information, for example web certificates across web services, web servers, key management and key distribution systems. And finally, distributed compute applications where data needs to be locally processed on secure edge platforms. We are seeing significant interest in securing the hundreds of thousands of existing enterprise applications, data, and workloads in the public cloud. Bad actors are focused on the cloud because they know legacy security is easily undermined. Confidential clouds quickly and easily put confidential computing to work to provide instant hardware-grade enclave protections for these cloud assets with no changes to the application, deployment or IT processes.

Q. How does confidential computing help with meeting compliance requirements like GDPR, CCPA, etc.?

A. Regulated organizations are now seeing value in confidential computing. Applications, like the UCSF example we shared earlier, can achieve HIPAA regulatory compliance much faster when using confidential computing versus other approaches. Additionally, use of new security primitives which come as part of confidential computing can make it easier to prove to regulators that an environment is secure and meets all of the necessary security regulations. The ability to show that data in use is protected, as well as data at rest, is becoming increasingly important, and auditability down to an individual application, process and CPU further demonstrates compliance.

Closing thoughts… Confidential computing will be widely prevalent in the next five years, but now is the time to begin adoption. Suitable environments and hardware are available now via various CSPs and on-premises platforms.

Q&A (Part 1) from “Storage Trends for 2021 and Beyond” Webcast

STA Forum

Aug 24, 2021


Questions from “Storage Trends for 2021 and Beyond” Webcast Answered

It was a great pleasure for Rick Kutcipal, board director, SCSI Trade Association (STA), to welcome Jeff Janukowicz, research vice president at IDC, and Chris Preimesberger, former editor-in-chief of eWeek, in a roundtable talk to discuss prominent data storage technologies shaping the market. If you missed this webcast, titled “Storage Trends for 2021 and Beyond,” it’s now available on demand here.

The well-attended event generated a lot of questions! So many, in fact, that we’re authoring a two-part blog series with the answers. In part one, we recap the questions that were asked and answered during the webcast; since we ran out of time to answer them all, please watch for part two, where we tackle the rest.

Q1. How far along is 24G in development?
A1. Rick: The specification is done and most of the major players are investing in it today. Products have been announced and we’re also expecting to see server shipments in 2022. STA has a plugfest scheduled for July 6, 2021. It’s a busy time and everybody’s pretty excited about it!

Q2. What’s after 24G SAS?
A2. Rick: Naturally, one would think it would be a 48G speed bump, but it’s not clear that’s necessary. There’s still a lot of room for innovation within the SCSI stack, not just in the physical layer. The physical layer is the one that people can relate to and think “oh, it’s faster.” Keep in mind that there are a lot of features and functionality that can be added on top of that physical layer. The layered architecture of the SCSI stack enables changes, whether at the protocol layer or another higher layer, without impacting the physical layer. These are happening in real time: STA is holding T10 technical committee meetings on a regular basis, and innovations are in the works.

Q3. Where do NVMe HDDs and 25G Ethernet HDDs fit in?
A3. Jeff: Generally speaking, it’s still unclear how that’s going to evolve. As we look out over time, in the enterprise market on the SSD side, clearly, we’re seeing NVMe move into the majority of the shipments, and SSDs are growing as a percentage of the overall unit shipments and petabytes. However, right now we’re seeing a mix of technologies that are used within a storage array or in an enterprise system. And clearly, they are SAS-based SSDs and HDDs. And with that transition to more SSDs, it’s sort of a natural question to say, “hey, what about putting the NVMe interface on HDDs?” Now you obviously don’t necessarily need it for all the performance reasons or the optimizations around non-volatile media, which is why NVMe was introduced, but there are some initiatives, and these could help bring some cost savings and further system optimizations to the industry. There are some things underway from OCP in terms of looking at NVMe-based HDDs, but they’re still relatively early on, at least from my perspective, in terms of their development. But there are definitely some activities underway that are looking at the technology.
Rick: From my perspective, I’m seeing a surge in NVMe HDD work within OCP. My concern with NVMe HDDs is the amount of standards work that still has to be done to make them work in an enterprise environment. I think people forget it’s not just taking some media and putting an NVMe interface in front of it. How do all the drive inquiries get mapped to NVMe? How do you manage enterprise large scale spin up? I think it’s an exciting time. I think there are a lot of good possibilities, but the amount of work that’s needed can be underestimated sometimes.

Q4. Could you discuss the adoption of SAS, SATA and NVMe in all flash arrays?
A4. Jeff: IDC has seen a lot of investment in terms of all flash arrays. And we’ve seen pretty rapid growth over the last couple of years. In 2020, about 40% of the spending on external storage was on all flash arrays. And the reality is, if you look at that today, the vast majority of those are really still built upon SAS-based SSDs. There have been some announcements from a lot of the large storage providers around NVMe-based arrays, whether it’s Dell EMC, NetApp, Pure Storage, IBM, etc. Today, these solutions have already started to become available in the market. And we do see NVMe AFAs as a very high growth category over the next few years, but right now they’re still targeted primarily at a lot of the higher-end and more performance-oriented types of applications. We’re really just starting to see them move down into the more mainstream portion of the all flash array market. From IDC’s perspective, if it was 40% last year, we see it growing as an overall category to about 50% of the overall spend on external storage by 2023. So clearly there is a lot going on in this market as well.
Rick: My question with regard to NVMe and all flash arrays is always about scalability. I know there’s a lot of work going on regarding NVMe over Fabrics, but if you go back and look at the amount of computational resources, memory and system resources that it takes to scale these things, there are still some pretty big challenges ahead. I’m not saying it’s not going to happen, and of course the ecosystem has solved hard problems in the past.

Q5. How do you differentiate between M.2 SSDs and NVMe in client system deployments?
A5. Rick: The SoCs, or controllers, on these devices are very different. There are enterprise-class M.2 drives, so the form factor doesn’t necessarily preclude a drive from fitting into one of these categories. While M.2 is designed more for the client, it’s not a hard and fast thing. Typically, enterprise drives use the traditional 2.5-inch form factor.
Jeff: Rick, you’re pretty much spot on. There are some differences at the SoC level and design level, such as power-fail protection. But there does tend to be a different firmware load a lot of times for the enterprise-class drives. There can also be some differences in terms of endurance and how those drives are designed. But if the question is about form factors, we really are at an interesting point for the industry, because historically form factors have always been dictated by HDDs. But as flash has grown, we’ve seen a lot of new form factors. M.2 is obviously one that was originally designed for some of the client market, and has now found its way into a lot of enterprise applications. E1.S is a slight variant of M.2 but is on the roadmap to be a more enterprise-optimized form factor. But we also see some other ones out there like E1.L, which is a longer version of E1.S. There’s also U.3 and others, which are pretty interesting in terms of ways to optimize around some of the new storage media, i.e., SSDs and solid state.

Q6. Is the NVMe takeover sooner than 3-5 years?
A6. Rick: That’s a very logical question. People that aren’t in the ecosystem day-to-day might not be seeing the 24G SAS adoption. Right now, there’s a lot of investment at the system and sub-system level. For 24G SAS there are multiple adapter vendors, same as there has been in the past for 12G SAS. And from the media side, there are numerous drive vendors sampling 24G SAS drives today, and one has been announced. I think some people are going to be shocked by the 24G adoption, and that’s going to start coming to light at STA’s next plugfest, with some big demos and press announcements as products get ready to launch. So I would say stay tuned for that one, because I think some people are going to be pretty surprised.

What is eBPF, and Why Does it Matter for Computational Storage?

SNIAOnStorage

Jul 28, 2021

Recently, a question came up in the SNIA Computational Storage Special Interest Group on new developments in a technology called eBPF and how they might relate to computational storage. To learn more, SNIA on Storage sat down with Eli Tiomkin, SNIA CS SIG Chair with NGD Systems; Matias Bjørling of Western Digital; Jim Harris of Intel; Dave Landsman of Western Digital; and Oscar Pinto of Samsung.

SNIA On Storage (SOS): The eBPF.io website defines eBPF, extended Berkeley Packet Filter, as a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. Why is it important?

Dave Landsman (DL): eBPF emerged in Linux as a way to do network filtering, and enables the Linux kernel to be programmed. Intelligence and features can be added to existing layers, and there is no need to add additional layers of complexity.

SOS: What are the elements of eBPF that would be key to computational storage?

Jim Harris (JH): The key to eBPF is that it is architecturally agnostic; that is, applications can download programs into a kernel without having to modify the kernel. Computational storage allows a user to do the same types of things – develop programs on a host and have the controller execute them without having to change the firmware on the controller. Using a hardware-agnostic instruction set is preferred to having an application need to download x86 or ARM code based on what architecture is running.

DL: It is much easier to establish a standard ecosystem with architecture independence. Instead of an application needing to download x86 or ARM code based on the architecture, you can use a hardware-agnostic instruction set where the kernel can interpret and then translate the instructions based on the processor. Computational storage would not need to know the processor running on an NVMe device with this “agnostic code”.

SOS: How has the use of eBPF evolved?

JH: It is more efficient to run programs directly in the kernel I/O stack rather than have to return packet data to the user, operate on it there, and then send the data back to the kernel. In the Linux kernel, eBPF began as a way to capture and filter network packets. Over time, eBPF use has evolved to additional use cases.

SOS: What are some use case examples?

DL: One of the use cases is performance analysis. For example, eBPF can be used to measure things such as latency distributions for file system I/O, details of storage device I/O and TCP retransmits, and blocked stack traces and memory.

Matias Bjørling (MB): Other examples in the Linux kernel include tracing and gathering statistics. However, while the eBPF programs in the kernel are fairly simple and can be verified by the Linux kernel VM, computational programs are more complex and longer running. Thus, there is a lot of work ongoing to explore how to efficiently apply eBPF to computational programs: for example, what is the right set of run-time restrictions to be defined by the eBPF VM, are there any new instructions to be defined, and how to make the program run as close as possible to the instruction set of the target hardware.

JH: One of the big use cases involves data analytics and filtering. A common data flow for data analytics involves large database table files that are often compressed and encrypted. Without computational storage, you read the compressed and encrypted data blocks to the host, decompress and decrypt the blocks, and maybe do some filtering operations like a SQL query. All this, however, consumes a lot of extra host PCIe, host memory, and cache bandwidth because you are reading the data blocks and doing all these operations on the host. With computational storage, inside the device you can tell the SSD to read data and transfer it not to the host but to some memory buffers within the SSD. The host can then tell the controller to do a fixed-function program like decrypting the data and putting it in another local location on the SSD, and then do a user-supplied program like eBPF to do some filtering operations on that local decrypted data. In the end you would transfer the filtered data to the host. You are doing the compute closer to the storage, saving memory and bandwidth.
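To make the “user-supplied program” in that flow concrete, here is a minimal sketch of the kind of restricted C a host might compile and hand to a computational storage device. The function name, record layout, and buffer-passing convention are hypothetical placeholders (no CSx-side eBPF ABI has been standardized yet, which is exactly what the discussion below turns to), but the style, a bounded loop over fixed-size buffers with no library calls, reflects what existing eBPF toolchains can compile.

    /* filter_csf.c: illustrative eBPF-style filter for a computational storage
     * device. Hypothetical ABI: the device passes in a buffer of fixed-size
     * records that were already decrypted/decompressed on the SSD, plus an
     * output buffer, and the program copies out only the matching records. */

    #define MAX_RECORDS 4096   /* bounded loop keeps the program verifier-friendly */
    #define PAYLOAD_LEN 60

    struct record {
        unsigned int  key;
        unsigned char payload[PAYLOAD_LEN];
    };

    /* Returns the number of records copied to 'out'. */
    __attribute__((section("csf"), used))
    int filter_records(const struct record *in, int in_count,
                       struct record *out, int out_max,
                       unsigned int match_key)
    {
        int copied = 0;

        if (in_count > MAX_RECORDS)
            in_count = MAX_RECORDS;

        for (int i = 0; i < in_count; i++) {
            if (copied >= out_max)
                break;
            if (in[i].key != match_key)
                continue;
            out[copied].key = in[i].key;
            for (int b = 0; b < PAYLOAD_LEN; b++)   /* no libc memcpy available here */
                out[copied].payload[b] = in[i].payload[b];
            copied++;
        }
        return copied;
    }

A function like this can be compiled to an eBPF object file with an existing toolchain (for example, clang -O2 -target bpf -c filter_csf.c -o filter_csf.o) and downloaded to the device; as the panel notes below, the Linux kernel’s verifier restrictions would not apply unchanged on a storage device, which is one of the open standardization questions.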
SOS: How does using eBPF for computational storage look the same? How does it look different?

JH: There are two parts to this answer. Part 1 is the eBPF instruction set with registers and how eBPF programs are assembled. Where we are excited about computational storage and eBPF is that the instruction set is common. There are already existing toolchains that support eBPF. You can take a C program and compile it into an eBPF object file, which is huge. If you add computational storage aspects to standards like NVMe, where developing unique toolchain support can take a lot of work, you can now leverage what is already there for the eBPF ecosystem.

Part 2 of the answer centers around the Linux kernel’s restrictions on what an eBPF program is allowed to do when downloaded. For example, the eBPF instruction set allows for unbounded loops, and toolchains such as gcc will generate eBPF object code with unbounded loops, but the Linux kernel will not permit those to execute – and rejects the program. These restrictions are manageable when doing packet processing in the kernel. The kernel knows a packet’s specific data structure and can verify that data is not being accessed outside the packet. With computational storage, you may want to run an eBPF program that operates on a set of data that has a very complex data structure – perhaps unbounded arrays or multiple levels of indirection. Applying Linux kernel verification rules to computational storage would limit or even prevent processing this type of data.

SOS: What are some of the other challenges you are working through with using eBPF for computational storage?

MB: We know that x86 works fast with high memory bandwidth, while other cores are slower. We have some general compute challenges in that eBPF needs to be able to hook into today’s hardware like we do for SSDs. What kind of operations make sense to offload for these workloads? How do we define a common implementation API for all of them and build an ecosystem on top of it? Do we need an instruction-based compiler, or a library to compile up to – and if you have it on the NVMe drive side, could you use it? eBPF in itself is great, but getting a whole ecosystem together and getting all of us to agree on what delivers value will be the challenge in the long term.

Oscar Pinto (OP): The Linux kernel for eBPF today is more geared towards networking in its functionality but light on storage. That may be a challenge in building a computational storage framework. We need to think through how to enhance this given that we download and execute eBPF programs in the device. As Matias indicated, x86 is great at what it does in the host today. But if we have to work with smaller CPUs in the device, they may need help, say with dedicated hardware or similar additional logic, to aid the eBPF programs. One question is how these programs would talk to them. We don’t have a setup for storage like this today, and there are a variety of storage services that can benefit from eBPF.
SOS: Is SNIA addressing this challenge?

OP: On the SNIA side we are building on program functions that are downloaded to computational storage engines. These functions run on the engines, which are CPUs or some other form of compute that is tied to an FPGA, DPU, or dedicated hardware. We are defining these abstracted functionalities in SNIA today, and the SNIA Computational Storage Technical Work Group is developing a Computational Storage Architecture and Programming Model and Computational Storage APIs to address it. The latest versions, v0.8 and v0.5, have been approved by the SNIA Technical Council and are now available for public review and comment at the SNIA Feedback Portal.

SOS: Is there an eBPF standard? Is it aligned with storage?

JH: We have a challenge around what an eBPF standard should look like. Today it is defined in the Linux kernel. But if you want to incorporate eBPF in a storage standard you need to have something specified for that storage standard. We know the Linux kernel will continue to evolve, adding and modifying instructions. But if you have an NVMe SSD or other storage device you have to have something set in stone – the version of eBPF that the standard supports. We need to know what the eBPF standard will look like and where it will live. Will standards organizations need to define something separately?

SOS: What would you like an eBPF standard to look like from a storage perspective?

JH: We’d like an eBPF standard that can be used by everyone. We are looking at how computational storage can be implemented in a way that is safe and secure but also able to solve use cases that are different.

MB: Security will be a key part of an eBPF standard. Programs should not access data they should not have access to. This will need to be solved within a storage device. There are some synergies with external key management.

DL: The storage community has to figure out how to work with eBPF and make this standard something that a storage environment can take advantage of and rely on.

SOS: Where do you see the future of eBPF?

MB: The vision is that you can build eBPFs and they work everywhere. When we build new database systems and integrate eBPFs into them, we then have embedded kernels that can be sent to any NVMe device over the wire and be executed. The cool part is that it can be anywhere on the path, so there become a lot of interesting ways to build new architectures on top of this. And together with the open system ecosystem we can create a body of accelerators with which we can fast-track the build of these ecosystems. eBPF can put this into overdrive with use cases outside the kernel.

DL: There may be some other environments where computational storage is being evaluated, such as WebAssembly.

JH: An eBPF run time is much easier to put into an SSD than a WebAssembly run time.

MB: eBPF makes more sense – it is simpler to start and build upon, as it is not set in stone for one particular use case.
Eli Tiomkin (ET): Different SSDs have different levels of constraints. Every computational storage SSD in production, and even those in development, has very unique capabilities that are dependent on the workload and application.

SOS: Any final thoughts?

MB: At this point, technologies are coming together that are going to change the industry in a way that we can redesign storage systems, both with computational storage and in how we manage security in NVMe devices for these programs. We have the perfect storm pulling things together. Exciting platforms can be built using open standards specifications not previously available.

SOS: Looking forward to this exciting future. Thanks to you all.

Q&A: Security of Data on NVMe-oF

John Kim

Jul 28, 2021


Ensuring the security of data on NVMe® over Fabrics was the topic of our SNIA Networking Storage Forum (NSF) webcast “Security of Data on NVMe over Fabrics, the Armored Truck Way.” During the webcast our experts outlined industry trends, potential threats, security best practices and much more. The live audience asked several interesting questions and here are answers to them.

Q. Does use of strong authentication and network encryption ensure I will be compliant with regulations such as HIPAA, GDPR, PCI, CCPA, etc.?

A. Not by themselves. Proper use of strong authentication and network encryption will reduce the risk of data theft or improper data access, which can help achieve compliance with data privacy regulations. But full compliance also requires establishment of proper processes, employee training, system testing and monitoring. Compliance may also require regular reviews and audits of systems and processes plus the involvement of lawyers and compliance consultants.

Q. Does using encryption on the wire such as IPsec, FC_ESP, or TLS protect against ransomware, man-in-the-middle attacks, or physical theft of the storage system?

A. Proper use of data encryption on the storage network can protect against man-in-the-middle snooping attacks because any data intercepted would be encrypted and very difficult to decrypt. Use of strong authentication such as DH-HMAC-CHAP can reduce the risk of a man-in-the-middle attack succeeding in the first place. However, encrypting data on the wire does not by itself protect against ransomware nor against physical theft of the storage systems because the data is decrypted once it arrives on the storage system or on the accessing server.
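To give a feel for what the strong authentication mentioned above is doing, the short C sketch below illustrates the challenge-response idea at the core of schemes such as DH-HMAC-CHAP: the controller issues a random challenge and the host proves possession of a pre-shared secret by returning an HMAC computed over that challenge. This is a simplified, OpenSSL-based illustration only; it is not the NVMe authentication wire format, which also defines sequence numbers, transaction identifiers, hash negotiation and an optional Diffie-Hellman exchange.

    #include <stdio.h>
    #include <openssl/crypto.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <openssl/rand.h>

    /* Conceptual challenge-response only; not the NVMe DH-HMAC-CHAP protocol.
     * Build with: cc chap_sketch.c -lcrypto */
    int main(void)
    {
        const unsigned char secret[] = "shared-secret-provisioned-out-of-band";
        unsigned char challenge[32];
        unsigned char host_response[EVP_MAX_MD_SIZE], expected[EVP_MAX_MD_SIZE];
        unsigned int resp_len = 0, exp_len = 0;

        /* Controller side: generate a random challenge. */
        if (RAND_bytes(challenge, sizeof(challenge)) != 1)
            return 1;

        /* Host side: respond with HMAC-SHA-256 over the challenge using the secret. */
        HMAC(EVP_sha256(), secret, (int)(sizeof(secret) - 1),
             challenge, sizeof(challenge), host_response, &resp_len);

        /* Controller side: recompute the expected value and compare in constant time. */
        HMAC(EVP_sha256(), secret, (int)(sizeof(secret) - 1),
             challenge, sizeof(challenge), expected, &exp_len);

        int ok = (resp_len == exp_len) &&
                 (CRYPTO_memcmp(host_response, expected, resp_len) == 0);
        printf("authentication %s\n", ok ? "succeeded" : "failed");
        return ok ? 0 : 1;
    }

In practice you would rely on the storage stack's built-in DH-HMAC-CHAP support rather than anything hand-rolled; the point of the sketch is simply that the shared secret itself never crosses the wire, only proof of possession does.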

Q. Does "zero trust" mean I cannot trust anybody else on my IT team or trust my family members?

A. Zero Trust does not mean your coworker, mother or cousin is a hacker. But it does require assuming that any server, user (even your coworker or mother), or application could be compromised and that malware or hackers might already be inside the network, as opposed to assuming all threats are being kept outside the network by perimeter firewalls. As a result, Zero Trust means regular use of security technologies – including firewalls, encryption, IDS/IPS, anti-virus software, monitoring, audits, penetration testing, etc. – on all parts of the data center to detect and prevent attacks in case one of the applications, machines or users has been compromised.

Q. Great information! Is there any reference security practice for eBOF and NVMe-oF™ that you recommend?

A. Generally, security practices with an eBOF using NVMe-oF would be similar to those with traditional storage arrays (whether they use NVMe-oF, iSCSI, FCP, or a NAS protocol). You should authenticate users, emplace fine-grained access controls, encrypt data, and back up your data regularly. You might also want to physically or logically separate your storage network from the compute traffic or user access networks. Some differences may arise from the fact that with an eBOF, it's likely that multiple servers will access multiple eBOFs directly, instead of each server going to a central storage controller that in turn accesses the storage shelves or JBOFs.

Q. Are there concerns around FC-NVMe security when it comes to Fibre Channel Fabric services? Can a rogue NVMe initiator discover the subsystem controllers during the discovery phase and cause a denial-of-service kind of attack? Under such circumstances can DH-CHAP authentication help?

A. A rogue initiator might be able to discover storage arrays using the FC-NVMe protocol but this may be blocked by proper use of Fibre Channel zoning and LUN masking. If a rogue initiator is able to discover a storage array, proper use of DH-CHAP should prevent it from connecting and accessing data, unless the rogue initiator is able to successfully impersonate a legitimate server. If the rogue server is able to discover an array using FC-NVMe, but cannot connect due to being blocked by strong authentication, it could initiate a denial-of-service attack and DH-CHAP by itself would not block or prevent a denial-of-service attack.

Q. With the recent example of Colonial Pipeline cyber-attack, can you please comment on what are best practice security recommendations for storage with regards to separation of networks for data protection and security?

A. It's a best practice to separate storage networks from the application and/or user networks. This separation can be physical or logical and could include access controls and authentication within each physical or logical network. A separate physical network is often used for management and monitoring. In addition, to protect against ransomware, storage systems should be backed up regularly with some backups kept physically offline, and the storage team should practice restoring data from backups on a regular basis to verify the integrity of the backups and the restoration process.

For those of you who follow the many educational webcasts that the NSF hosts, you may have noticed that we are discussing the important topic of data security a lot. In fact, there is an entire Storage Networking Security Webcast Series that dives into protecting data at rest, protecting data in flight, encryption, key management, and more.

We’ve also been talking about NVMe-oF a lot. I encourage you to watch “NVMe-oF: Looking Beyond Performance Hero Numbers” where our SNIA experts explain why it is important to look beyond test results that demonstrate NVMe-oF’s dramatic reduction in latency. And if you’re ready for more, you can “Geek Out” on NVMe-oF here, where we’ve curated several great basic and advanced educational assets on NVMe-oF.

Moving Genomics to the Cloud

Alex McDonald

Jul 27, 2021

The study of genomics in modern biology has revolutionized the discovery of medicines, and the COVID pandemic response has accelerated genetic research and driven the rapid development of vaccines. Genomics, however, requires a significant amount of compute power and data storage to make new discoveries possible. Making sure compute and storage are not a roadblock for genomics innovations will be the topic of discussion at the SNIA Cloud Storage Technologies Initiative live webcast, “Moving Genomics to the Cloud: Compute and Storage Considerations.” This session will feature expert viewpoints from both bioinformatics and technology perspectives, with a focus on some of the compute and data storage challenges for genomics workflows. We will discuss:
  • How to best store and manage large genomics datasets
  • Methods for sharing large datasets for collaborative analysis
  • Legal and ethical implications of storing shareable data in the cloud
  • Transferring large data sets and the impact on storage and networking
Join us for this live event on September 9, 2021 for a fascinating discussion on an area of technology that is rapidly evolving and changing the world.
