Blog

Computational Storage – Driving Success, Driving Standards Q&A


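SNIA CMS Community

Feb 16, 2022

Our recent SNIA Compute, Memory, and Storage Initiative (CMSI) webcast, “Computational Storage – Driving Success, Driving Standards,” explained the key elements of the SNIA Computational Storage Architecture and Programming Model and the SNIA Computational Storage API. If you missed the live event, you can watch it on demand and view the presentation slides. Our audience asked a number of questions, and Bill Martin, Editor of the Model, and Jason Molgaard, Co-Chair of the SNIA Computational Storage Technical Work Group, teamed up to answer them.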

What's being done in SNIA to implement data protection (e.g. RAID) and CSDs? Can data be written/striped to CSDs in such a way that it can be computed on within the drive?

Bill Martin: The challenges of computation on a RAID system are outside the scope of the Computational Storage Architecture and Programming Model. The Model does not address data protection in that it does not specify how data is written or how computation is done on the data. Section 3 of the Model discusses the Computational Storage Array (CSA), a storage array that is able to execute one or more Computational Storage Functions (CSFs). As a storage array, a CSA contains control software, which provides virtualization to storage services, storage devices, and Computational Storage Resources for the purpose of aggregating, hiding complexity, or adding new capabilities to lower-level storage resources. The Computational Storage Resources in the CSA may be centrally located or distributed across CSDs/CSPs within the array.

When will Version 1.0 of the Computational Storage Architecture and Programming Model be available and when is operating system support expected?

Bill Martin: We expect Version 1.0 of the Model to be available in Q2 2022. The Model is agnostic with regard to operating systems, but we anticipate a publicly available API library for Computational Storage over NVMe.

Will the Computational Storage library support CXL accelerators as well? How is the collaboration between these two technology consortiums going?

Jason Molgaard: The Computational Storage Architecture and Programming Model is agnostic to the device interface protocol. Computational Storage can work with CXL. SNIA currently has an alliance agreement in place with the CXL Consortium and will interface with that group to help enable the CXL interface with Computational Storage. We anticipate there will be technical work to develop a computational storage library utilizing the CS API that will support CXL in the future.

System memory is required for PCIe/NVMe SSD. How does computational storage bypass system memory?

Bill Martin: The computational storage architecture relies on computation using memory that is local to the Computational Storage Device (CSx). Section B.2.4 of the Model describes Function Data Memory (FDM) on the CSx and the movement of data from media to FDM and back. Note that a device does not need to access system memory for computation, only to read and write data. Figure B.2.8 from the Model illustrates CSx usage.
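The data path described above can be sketched as a toy Python model. It assumes a device that owns its media and a Function Data Memory buffer: data is copied from media into FDM, a function runs entirely on FDM, and only the small result is read back toward host memory. The class ToyCSx and its methods are hypothetical illustrations, not the SNIA Model or the CS API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCSx:
    """Hypothetical Computational Storage Device (not a SNIA API).

    `media` stands in for the device's storage media and `fdm` for its
    Function Data Memory (FDM).
    """
    media: bytes
    fdm: bytearray = field(default_factory=lambda: bytearray(4096))

    def load_to_fdm(self, media_offset: int, length: int, fdm_offset: int) -> None:
        # Media -> FDM copy happens inside the device; host memory is not used.
        self.fdm[fdm_offset:fdm_offset + length] = self.media[media_offset:media_offset + length]

    def run_count(self, fdm_offset: int, length: int, needle: int, result_offset: int) -> None:
        # Toy Computational Storage Function: count one byte value in an FDM
        # range and write the 4-byte count back into FDM.
        count = self.fdm[fdm_offset:fdm_offset + length].count(needle)
        self.fdm[result_offset:result_offset + 4] = count.to_bytes(4, "little")

    def read_fdm(self, fdm_offset: int, length: int) -> bytes:
        # The only step that moves data toward the host.
        return bytes(self.fdm[fdm_offset:fdm_offset + length])


dev = ToyCSx(media=b"\x00\x01\x01\x02" * 256)  # 1 KiB of "media"
dev.load_to_fdm(0, 1024, 0)
dev.run_count(0, 1024, needle=1, result_offset=2048)
result = int.from_bytes(dev.read_fdm(2048, 4), "little")
print(result)  # 512
```

The host only ever receives the 4-byte count, not the 1 KiB of media data it was computed from.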


Is this CS API Library vendor specific, or is this a generic library which could also be provided for example by an operating system vendor?

Bill Martin: The Computational Storage API is not a library; it is a generic interface definition. It describes the software application interface definitions for a Computational Storage device (CSx). There will be a generic library for a given protocol layer, but there may also be vendor-specific additions to that generic library for vendor-specific CSx enhancements beyond the standard protocol definition.
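The split between a generic, protocol-level library and vendor-specific additions can be illustrated with a minimal Python sketch. GenericCSLibrary, VendorXLibrary, and their methods are hypothetical names invented for illustration; they are not the SNIA Computational Storage API.

```python
import zlib
from abc import ABC, abstractmethod


class GenericCSLibrary(ABC):
    """Hypothetical generic library for one protocol layer (e.g., NVMe)."""

    @abstractmethod
    def list_functions(self) -> list[str]:
        """Return the names of the functions the device offers."""

    @abstractmethod
    def execute(self, function: str, data: bytes) -> bytes:
        """Run a function on the device and return its output."""


class VendorXLibrary(GenericCSLibrary):
    """Hypothetical vendor library: implements the generic calls, adds one extra."""

    _functions = {
        "upper": bytes.upper,          # stand-in for a standard function
        "reverse": lambda b: b[::-1],  # stand-in for another
    }

    def list_functions(self) -> list[str]:
        return sorted(self._functions)

    def execute(self, function: str, data: bytes) -> bytes:
        return self._functions[function](data)

    def checksum(self, data: bytes) -> int:
        # Vendor-specific enhancement beyond the generic definition
        # (a stand-in for an extra on-device function).
        return zlib.crc32(data)


lib = VendorXLibrary()
portable = lib.execute("upper", b"csx")  # uses only the generic interface
print(portable)  # b'CSX'
```

Application code that calls only the generic methods stays portable across conforming libraries; calling an extension such as checksum ties it to that vendor.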

Are there additional use cases out there? Where could I see them and get more information?

Jason Molgaard: Section B.2.5 of the Computational Storage Architecture and Programming Model provides an example of application deployment. The API specification will have a library that could be used and/or modified for a specific device. If the CSx does not support everything in NVMe, an individual could write a vendor-specific library that supports some host activity.

There are a lot of acronyms and terms used in the discussion. Is there a place where they are defined?

Jason Molgaard: Besides the Model and the API, which provide the authoritative definitions of the terms and acronyms, there are some great resources. Recent presentations at the SNIA Storage Developer Conference, “Computational Storage Moving Forward with an Architecture and API” and “Computational Storage APIs,” provide a broad view of how the specifications affect the growing industry computational storage efforts. Additional videos and presentations are available in the SNIA Educational Library; search for “Computational Storage.”

Olivia Rhye

Product Manager, SNIA

Find a similar article by tags

Computational Storage Standards


Using SNIA Swordfish™ to Manage Storage on Your Network


Barry Kittner

Feb 16, 2022

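Consider how we charge our phones: we can plug them into a computer’s USB port, into a wall outlet using a power adapter, or into an external/portable power bank. We can even place them on top of a Qi-enabled pad for wireless charging. None of these options is complicated; we routinely charge our phones throughout the day, and thanks to USB and standardized charging interfaces, our decision boils down to what is available and convenient.

Now consider how a storage administrator chooses to add storage capacity to a datacenter. There are so many ways to do it: add one or more physical drives to a single server; add storage nodes to a software-defined storage cluster; or add storage to a dedicated storage network device that provides storage to be used by other (data) servers.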

These options all require consideration of the data protection method used, such as RAID or Erasure Coding, and the performance expectations each entails. Complicating matters further are the many different devices and standards to choose from, including traditional spinning HDDs, SSDs, Flash memory, optical drives, and Persistent Memory.

Each storage instance can also be deployed as file, block, or object storage, which can affect performance. The choice of communication protocol, such as iSCSI or FC/FCoE, can limit scalability options. And finally, with some vendors requiring their own management paradigm to control these assets, it’s easy to see how these choices can be daunting.

But… it doesn’t need to be so complicated!

The Storage Networking Industry Association (SNIA) has a mission to lead the storage industry in developing and promoting vendor-neutral architectures, standards and education services that facilitate the efficient management, movement and security of information. To that end, the organization created SNIA Swordfish™, a specification that provides a unified approach for the management of storage and servers in hyperscale and cloud infrastructure environments.

Swordfish is an API specification that defines a simplified model that is client-oriented, designed to integrate with the technologies used in cloud data center environments and can be used to accomplish a broad range of simple-to-advanced storage management tasks. These tasks focus on what IT administrators need to do with storage equipment and storage services in a data center. As a result, the API provides functionality that simplifies the way storage can be allocated, monitored, and managed, making it easier for IT administrators to integrate scalable solutions into their data centers.

SNIA Swordfish can provide a stand-alone solution, or act as an extension to the DMTF Redfish® specification, using the same easy-to-use RESTful interface and JavaScript Object Notation (JSON) to seamlessly manage storage equipment and storage services.

REST stands for REpresentational State Transfer. We won’t discuss REST architecture in this article, but we use it to show how complex tasks are simplified. A REST API allows an administrator to retrieve information from, or perform a function on, a computer system. Although the syntax can be challenging, most of the requests and responses are based on JSON, which is human-readable, so you can read and understand the messages to determine the state of your networked devices. This article assumes we are not programmers creating object code but rather administrators who need tools to monitor our network.

To examine a network in a REST/JSON environment, you can start with nothing more than a browser. The easiest way to show this is with an example, or “mockup.” Swordfish is a hypermedia API, which means resources are reached by following URLs returned in other API responses. These URLs consist of a node (for example, www.snia.org, or an IP address such as 127.0.0.1) and a resource identifier (for example, redfish/v1/storage). The Redfish ‘service root’ is at http://127.0.0.1/redfish/v1/, and the storage resources in this example are reached at http://127.0.0.1/redfish/v1/storage.
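As a minimal sketch, the node and resource identifier can be combined into a full URL with Python's standard urllib.parse. The helper name resource_url is hypothetical; the default http scheme mirrors the article's example, although production Redfish/Swordfish services typically use HTTPS.

```python
from urllib.parse import urljoin, urlsplit


def resource_url(node: str, resource_path: str, scheme: str = "http") -> str:
    """Combine a node (host name or IP address) with a resource identifier."""
    base = f"{scheme}://{node}/"
    # urljoin against a base ending in "/" appends the relative path.
    return urljoin(base, resource_path.lstrip("/"))


url = resource_url("127.0.0.1", "redfish/v1/storage")
print(url)  # http://127.0.0.1/redfish/v1/storage
parts = urlsplit(url)
print(parts.netloc, parts.path)  # 127.0.0.1 /redfish/v1/storage
```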

Redfish objects are mainly Systems (typically servers), Managers (typically a BMC or enclosure manager), and Chassis (physical components and infrastructure). Swordfish adds another: Storage. These are all collections of resources; each resource has properties, including a Name and an Id, and may also include Actions and Oem. Actions tells the user which actions can be performed, and Oem contains vendor-specific extensions.
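Because Swordfish is a hypermedia API, a client discovers resources by following the @odata.id links inside each collection's Members array rather than hard-coding paths. The sketch below walks an in-memory mock instead of a live service; the mock's contents are hypothetical, modeled on the paths used in this article.

```python
# Hypothetical mock of a few Redfish/Swordfish resources, keyed by @odata.id.
MOCK = {
    "/redfish/v1/Systems": {
        "Name": "Computer System Collection",
        "Members": [{"@odata.id": "/redfish/v1/Systems/Sys-1"}],
    },
    "/redfish/v1/Systems/Sys-1": {
        "Id": "Sys-1",
        "Name": "System 1",
        "Storage": {"@odata.id": "/redfish/v1/Systems/Sys-1/Storage"},
        "Actions": {},
        "Oem": {},
    },
    "/redfish/v1/Systems/Sys-1/Storage": {
        "Name": "Storage Collection",
        "Members": [{"@odata.id": "/redfish/v1/Systems/Sys-1/Storage/NVMeSSD-EG"}],
    },
    "/redfish/v1/Systems/Sys-1/Storage/NVMeSSD-EG": {
        "Id": "NVMeSSD-EG",
        "Name": "NVMe SSD",
        "Actions": {},
        "Oem": {},
    },
}


def get(path):
    # Stand-in for an HTTP GET; a real client would fetch the URL instead.
    return MOCK[path]


def member_paths(collection_path):
    """Follow the links in a collection's Members array."""
    return [m["@odata.id"] for m in get(collection_path).get("Members", [])]


found = []
for system in member_paths("/redfish/v1/Systems"):
    storage_link = get(system)["Storage"]["@odata.id"]
    for storage in member_paths(storage_link):
        res = get(storage)
        found.append((res["Id"], res["Name"]))

print(found)  # [('NVMeSSD-EG', 'NVMe SSD')]
```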

Let’s look at two brief examples of how Swordfish is used.

Here is the response to a query of objects and properties in a standalone Swordfish installation:

Ignoring the extra characters that are part of the JSON syntax, the information is easy to read and understand compared to object code. We can also see the designated Servers, Managers and Chassis with the network paths for each.

Of course, a network is more complicated than a single storage installation, so it is represented by a tree diagram:

[Figure: Network Tree Diagram of how SNIA Swordfish is Used]

Within the context of the network tree, a simple GET request tells us the capacity of our target storage device:

GET /redfish/v1/Systems/Sys-1/Storage/NVMeSSD-EG/Volumes/Namespace1
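The same GET can also be prepared from a script instead of a browser. This is a sketch using Python's standard urllib; the host, the credentials, and the use of HTTP Basic authentication are assumptions for illustration (a service may instead require a session token), and the request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_get(base_url: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Prepare an HTTP GET for a Redfish/Swordfish resource (not sent here)."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    req.add_header("Accept", "application/json")
    # HTTP Basic authentication (an assumption; check what your service supports).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


def fetch_json(req: urllib.request.Request) -> dict:
    # Sends the request and decodes the JSON body.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


req = build_get(
    "https://127.0.0.1",
    "/redfish/v1/Systems/Sys-1/Storage/NVMeSSD-EG/Volumes/Namespace1",
    "admin",   # hypothetical credentials
    "secret",
)
```

Calling fetch_json(req) against a live service would return the volume properties as a Python dictionary.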

The above command returns all the properties for the selected volume, including the capacity:

{
  "@Redfish.Copyright": "Copyright 2014-2020 SNIA. All rights reserved.",
  "@odata.id": "/redfish/v1/Systems/Sys-1/Storage/NVMeSSD-EG/Volumes/Namespace1",
  "@odata.type": "#Volume.v1_5_0.Volume",
  "Id": "1",
  "Name": "Namespace 1",
  "LogicalUnitNumber": 1,
  "Capacity": {
    "Data": {
      "ConsumedBytes": 0,
      "AllocatedBytes": 10737418240,
      "ProvisionedBytes": 10737418240
    }
  },
  …
}
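A script can pull the capacity figures out of a response like the one above. This minimal sketch parses the body shown (with the elided properties omitted) and converts AllocatedBytes to GiB; the percentage calculation is our own illustration, not a Swordfish property.

```python
import json

# The response body from above, trimmed to the fields used here.
body = """
{
  "@odata.id": "/redfish/v1/Systems/Sys-1/Storage/NVMeSSD-EG/Volumes/Namespace1",
  "@odata.type": "#Volume.v1_5_0.Volume",
  "Id": "1",
  "Name": "Namespace 1",
  "LogicalUnitNumber": 1,
  "Capacity": {
    "Data": {
      "ConsumedBytes": 0,
      "AllocatedBytes": 10737418240,
      "ProvisionedBytes": 10737418240
    }
  }
}
"""

volume = json.loads(body)
data = volume["Capacity"]["Data"]
allocated_gib = data["AllocatedBytes"] / 2**30  # bytes -> GiB
used_pct = 100 * data["ConsumedBytes"] / data["AllocatedBytes"]
print(f'{volume["Name"]}: {allocated_gib:.1f} GiB allocated, {used_pct:.1f}% consumed')
# Namespace 1: 10.0 GiB allocated, 0.0% consumed
```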

The easy-to-execute query can be done directly from a web browser and returns data that is simple, readable, and informational about our target.

The author is not a programmer; many reading this are not either. But as you can see from the example above, being a programmer is not necessary to successfully use a Swordfish storage management interface.

There is, of course, much more that can be done with Swordfish and REST; the intent of this short article was to show how adding storage and monitoring it can be easily done in a network running Swordfish without being a programmer. Many of the queries (like the one shown above) are already available so you don’t have to create them from scratch.

Find a similar article by tags

DMTF Swordfish Standards Storage


Multi-cloud Use Has Become the Norm


Alex McDonald

Feb 15, 2022

Multiple clouds within an organization have become the norm. This strategy enables organizations to reduce risk and dependence on a single cloud platform. The SNIA Cloud Storage Technologies Initiative (CSTI) discussed this topic at length in our live webcast last month, “Why Use Multiple Clouds?” We polled our webcast attendees on their use of multiple clouds, and here’s what we learned about the cloud platforms that comprise their multi-cloud environments:

Our expert presenters, Mark Carlson and Gregory Touretsky, also discussed the benefits of a storage abstraction layer that insulates the application from the underlying cloud provider’s interfaces, something the SNIA Cloud Data Management Interface (CDMI) 2.0 enables.

Cost is always an issue with cloud. One of our session attendees asked: do you have an example of a cloud vendor that does not charge for egress? There may be a few vendors that don’t charge, but one we know of that is toll-free on egress is Seagate’s Lyve Cloud; they only charge for used capacity.

We were also challenged on the economics, and on the increased cost due to the perceived complexity of multi-cloud, specifically security. While it’s true that there’s no standard security model for multi-cloud, there are third-party security solutions that can simplify its management, something we covered in the webinar.

If you missed this webinar, you can access it on demand and get a copy of the presentation slides in the SNIA Educational Library.
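The idea of a storage abstraction layer can be sketched in a few lines of Python: the application codes against one provider-neutral interface, and adapters hide each cloud's own API. The classes here are hypothetical, in-memory stand-ins for provider SDKs; this is not CDMI, and writing every object to every cloud is just one possible policy.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical provider-neutral interface the application codes against."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryCloud:
    """Stand-in for one provider's SDK; a real adapter would wrap that SDK."""
    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if absent


class ReplicatedStore:
    """Writes to every configured cloud; reads from the first that has the key."""
    def __init__(self, *backends: ObjectStore) -> None:
        self.backends = backends

    def put(self, key: str, data: bytes) -> None:
        for backend in self.backends:
            backend.put(key, data)

    def get(self, key: str) -> bytes:
        for backend in self.backends:
            try:
                return backend.get(key)
            except KeyError:
                continue
        raise KeyError(key)


cloud_a, cloud_b = InMemoryCloud("cloud-a"), InMemoryCloud("cloud-b")
store = ReplicatedStore(cloud_a, cloud_b)
store.put("reports/q1.csv", b"region,total\n")
del cloud_a._objects["reports/q1.csv"]  # simulate losing one provider's copy
print(store.get("reports/q1.csv"))  # still served from cloud-b
```

The application never touches a provider-specific call, so adding or swapping a cloud means writing one more adapter.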

Find a similar article by tags

CDMI cloud Cloud Standards Cloud Storage Hybrid Cloud Multi-Cloud


Object Storage: Got Questions?


Christine McMonigal

Feb 14, 2022

Q: Today object storage allows many new capabilities but also new challenges, such as the need for geographic and local load balancers in a distributed scale out infrastructure that at the same time do not become the bottleneck of the object services at an unsustainable cost. Are there any solutions available today that have these features built in?

A: Some object storage solutions have features such as load balancing and geographic distribution built into the software, though often the storage administrator must manually configure parts of these features at the network and/or server level. Most object storage cloud (StaaS) implementations include a distributed, scale-out infrastructure (including load balancing) in their implementation.

Q: What's the approximate current market share of block vs. file vs. object storage deployed today? Where do you see this going in the next 5 years?

A: You can analyze this based on spending or capacity, since object storage typically costs less per terabyte than block or file storage. Including all private and public cloud storage worldwide, object storage probably makes up between 20-30% of the spending and between 40-60% of all storage capacity. If we look only at enterprise (not cloud) storage, then object storage probably constitutes 10-15% of spending and 20-30% of capacity.

Q: There was a comment at the start of the discussion that object storage is less performant than block/file storage, which was described as a myth. Can you share some performance numbers for a given size of data?

A: On average, existing object storage is less performant than existing block/file storage because it is usually deployed on top of slower storage media, slower servers, and slower networks. But there is no reason object storage needs to be any slower than block/file storage for throughput and large I/O sizes. If deployed using fast infrastructure, the fastest object storage solutions run just as fast, in throughput terms, as the fastest block or file storage. However, in many cases, object storage may not be appropriate for highly transactional small I/O workloads, which typically run on top of block or file storage.

Q: Do I need to transform to key value or can I just query S3?

A: You don’t query S3. Retrieving an object via S3 is simply an HTTP GET request, which can be done from a browser. Many types of object storage support the S3 API, either natively or through translation, but there may be some types that require switching your applications to support a different key value storage API.
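Retrieving an object really is just an HTTP GET on the object's URL. This sketch builds a virtual-hosted-style S3 URL with Python's standard library; the bucket, key, and region are hypothetical, and an unsigned request like this only works for publicly readable objects (private objects need a signed or presigned request).

```python
import urllib.request
from urllib.parse import quote


def s3_object_url(bucket: str, key: str, region: str) -> str:
    """Virtual-hosted-style URL for an S3 object; the key is percent-encoded."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


url = s3_object_url("example-bucket", "logs/2022/feb 15.json", "us-east-1")
print(url)  # https://example-bucket.s3.us-east-1.amazonaws.com/logs/2022/feb%2015.json

req = urllib.request.Request(url, method="GET")
# urllib.request.urlopen(req) would fetch the object, but only if it is public.
```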

Q: Where does NVMe KV Command Set (in NVMe 2.0) sit in the S3 Amazon stack? How does it change the API structure?

A: The NVMe Key Value Command Set does not sit at the same level as the S3 API; the S3 API sits above protocols like the NVMe KV Command Set. The SNIA Key Value Storage API allows a library to be written to the NVMe KV Command Set specification, which is part of NVMe 2.0. Amazon S3 today supports use of key value pairs but does not currently employ the SNIA Key Value Storage API.
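To illustrate the layering, here is a toy Python sketch of an S3-like put/get gateway sitting above a device that speaks a key value interface. ToyKVDevice and ToyObjectGateway are hypothetical; the 16-byte key limit reflects our understanding of the NVMe KV Command Set and should be treated as an assumption. Because object names can be longer than that, the gateway hashes them into fixed-size device keys.

```python
import hashlib

MAX_KEY_BYTES = 16  # assumption: NVMe KV Command Set key size limit


class ToyKVDevice:
    """Stand-in for a namespace formatted for the NVMe KV Command Set.

    store/retrieve echo the Store and Retrieve commands only in spirit;
    this is not a driver or the SNIA Key Value Storage API.
    """
    def __init__(self) -> None:
        self._kv: dict[bytes, bytes] = {}

    def store(self, key: bytes, value: bytes) -> None:
        if not 1 <= len(key) <= MAX_KEY_BYTES:
            raise ValueError("key length out of range")
        self._kv[key] = value

    def retrieve(self, key: bytes) -> bytes:
        return self._kv[key]


class ToyObjectGateway:
    """S3-like put/get layered above the KV device."""
    def __init__(self, device: ToyKVDevice) -> None:
        self.device = device

    @staticmethod
    def _device_key(bucket: str, name: str) -> bytes:
        # Map an arbitrarily long bucket/object name to a 16-byte digest.
        return hashlib.blake2b(f"{bucket}/{name}".encode(), digest_size=MAX_KEY_BYTES).digest()

    def put_object(self, bucket: str, name: str, body: bytes) -> None:
        self.device.store(self._device_key(bucket, name), body)

    def get_object(self, bucket: str, name: str) -> bytes:
        return self.device.retrieve(self._device_key(bucket, name))


gw = ToyObjectGateway(ToyKVDevice())
name = "2022/02/14/a/rather/long/object/name/image-0001.jpg"
gw.put_object("photos", name, b"jpeg-bytes")
print(gw.get_object("photos", name))  # b'jpeg-bytes'
```

A real gateway would also need to handle hash collisions, for example by storing the full object name alongside the value and checking it on retrieval.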

Q: Aren’t analytics on Object Storage slow and difficult? Have there been any changes in this area that make analytics faster?

A: This is one of the myths about object storage that we wanted to debunk in this webcast. Analytics on object storage is only slow if the storage itself is slow. It’s difficult only if the analytics tools or query cannot query object storage. While it is true that most traditional object storage deployed in the past ran on slower storage media (and connected with slower networks), there are now fast object storage solutions that can perform just as well as block or file storage solutions. In fact, some object storage software/service options include analytics capabilities built into the storage servers, and computational storage can include analytics capabilities within the drives themselves.

Q: For Kubernetes, if the client is the app why is CSI required (COSI)?

A: CSI provides an interface between the containerized app and persistent storage outside of the Kubernetes orchestrator. It allows storage vendors to support containerized applications.

Q: Is the entire KV database from a given S3 bucket being downloaded to the local drive?

A: The AWS CLI command aws s3 sync can synchronize an entire bucket to a local directory, but there are multiple ways to move data between AWS S3 and your local directories or other instance types.

Q: Given the volume, sensitivity, and the hybrid nature of data generation, location, and access -- does object storage include security/encryption/key management built into the solution deployments?

A: Some object storage products include encryption and key management. Others do encryption while integrating with an external key management solution. At a high level, any object storage solution should include support for encryption and other security features.

Q: Does object storage support compression and dedupe?

A: Most object storage solutions include the ability to support dedupe or single-instance storage (storing only one copy of identical objects if the same object is submitted multiple times). Some object storage solutions include support for compression performed within the storage service, but it’s more common for objects to be compressed by the application or client before being sent to the object storage system.
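Single-instance storage can be sketched with content hashing: identical object contents hash to the same digest, so the bytes are stored once no matter how many names point at them. This toy Python example also compresses the payload on the client side before upload, matching the more common pattern described above. The class and object names are hypothetical.

```python
import hashlib
import zlib


class SingleInstanceStore:
    """Toy object store that keeps one copy of identical object contents."""
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}    # content hash -> stored bytes
        self.objects: dict[str, str] = {}    # object name -> content hash

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store the bytes only once
        self.objects[name] = digest

    def get(self, name: str) -> bytes:
        return self.blobs[self.objects[name]]


raw = b"sensor,value\n" * 1000
payload = zlib.compress(raw)  # compressed by the client before upload

store = SingleInstanceStore()
store.put("a/readings.csv.z", payload)
store.put("b/copy-of-readings.csv.z", payload)  # identical object submitted again

print(len(store.objects), len(store.blobs))  # 2 1
```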

Q: Amazon's S3 in-the-cloud storage means saving on data ingress/egress, but losing on the Amazon CPU needed to perform the analysis in Amazon's cloud compute platform, doesn't it? I'm not understanding how data remains “local.”

A: If you’re comparing AWS S3 to on-premises local storage, whether it will be less expensive to run analytics using AWS or using your own on-prem servers depends on the scale, maturity, and efficiency of your in-house analytics. Typically, an IT department building a small or new analytics operation will find it less costly to use AWS cloud storage and cloud analytics, while a large IT organization running a scalable, mature, and efficient analytics operation may find it can do so at a lower cost than outsourcing it to AWS. Whether on-prem or in the cloud, object storage solutions can typically scale out further in capacity, while supporting a customizable level of processing performance based on the user’s requirements.

Q: Cheap and deep describes OpenStack Swift, which claims to be hardware agnostic (deploys on readily available commodity hardware), but then you have to add network bandwidth, CPU, SSD, etc., for what you want to do at speed, which makes it cheaper in the long run to go for a purpose-built array and fabric. Why not stay client-server at the outset, with a fast array, fast processing and fast network?

A: For geographic location, data remains local if you store it in your local data center without replicating it to remote locations. For data analytics purposes, data is “local” if it’s stored in the same data center or on the same network segment as the analytics servers. When the data and the analytics servers are in different data centers and are not connected by a high-bandwidth, low-latency network, analytics performance may suffer. This is true for object storage or any other type of storage solution. If the data is stored on Amazon servers, there may be less control over where the data remains.

Q: Does supporting the NVMe KV Command Set in NVMe SSDs/HDDs improve performance or latency compared to the standard NVM Command Set?

A: If objects are stored as key-value pairs, using SSDs/HDDs that support the NVMe KV Command Set should improve performance and latency over using the standard NVM Command Set.
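One way to see why a KV namespace can help is to compare where the key mapping happens. With the NVM (block) Command Set, host software must keep its own key-to-LBA index and issue one command per block. A KV namespace instead accepts the key directly and does the mapping in the device. The toy model below counts device commands for each path. `BlockDevice`, `ObjectOverBlock`, and `KVDevice` are hypothetical, and the 4 KiB block size is an assumption. The `store`/`retrieve` method names only loosely echo the KV Command Set's Store and Retrieve commands. The model illustrates the layering difference and does not measure real latency.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed logical block size in bytes


@dataclass
class BlockDevice:
    """Hypothetical block namespace: it only understands LBAs."""
    blocks: dict[int, bytes] = field(default_factory=dict)
    commands: int = 0

    def write(self, lba: int, data: bytes) -> None:
        self.commands += 1
        self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        self.commands += 1
        return self.blocks[lba]


class ObjectOverBlock:
    """Host-side layer that maps object keys to lists of LBAs.

    Persisting this index would cost extra writes; the sketch ignores that.
    """

    def __init__(self, dev: BlockDevice) -> None:
        self.dev = dev
        self.index: dict[bytes, list[int]] = {}  # key -> LBAs holding the value
        self.next_lba = 0

    def put(self, key: bytes, value: bytes) -> None:
        lbas = []
        for off in range(0, len(value), BLOCK_SIZE):
            self.dev.write(self.next_lba, value[off:off + BLOCK_SIZE])
            lbas.append(self.next_lba)
            self.next_lba += 1
        self.index[key] = lbas

    def get(self, key: bytes) -> bytes:
        return b"".join(self.dev.read(lba) for lba in self.index[key])


@dataclass
class KVDevice:
    """Hypothetical KV namespace: the device maps keys to values itself."""
    pairs: dict[bytes, bytes] = field(default_factory=dict)
    commands: int = 0

    def store(self, key: bytes, value: bytes) -> None:
        self.commands += 1
        self.pairs[key] = value

    def retrieve(self, key: bytes) -> bytes:
        self.commands += 1
        return self.pairs[key]


if __name__ == "__main__":
    value = bytes(10_000)  # a 10,000-byte object spans three 4 KiB blocks
    blk = BlockDevice()
    obj = ObjectOverBlock(blk)
    obj.put(b"photo-1", value)
    assert obj.get(b"photo-1") == value
    kv = KVDevice()
    kv.store(b"photo-1", value)
    assert kv.retrieve(b"photo-1") == value
    print("block commands:", blk.commands, "KV commands:", kv.commands)
    # prints "block commands: 6 KV commands: 2"
```

Real KV SSDs still map keys to flash internally. Any gain comes from removing a host-side translation layer, including the index writes this sketch leaves out, so actual results depend on the device and the workload.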

Q: Do SSDs need to support both command sets or just one?

A: An SSD can support just the NVMe KV Command Set, just the NVM Command Set, or both. A namespace on an NVMe SSD is formatted for one or the other. To get the benefits of the NVMe KV Command Set, an SSD only needs to implement that command set.

Q: Are there any recent updates on the KV Command Set ecosystem in Linux?

A: The latest Linux drivers are available in a public GitHub repository: https://github.com/OpenMPDK/KVSSD

Q: Computational storage with S3 Select: an object storage solution usually doesn't write objects to a single disk. There is some kind of erasure coding for data protection, and probably some file system as an abstraction layer that the disk may not be aware of. Also, the data is usually encrypted. How would S3 Select be able to parse the original object data on a single drive?

A: Yes, most object storage solutions use erasure coding or a simple mirroring mechanism to ensure each object is stored in redundant locations. And yes, erasure coding usually splits each object across multiple drives. A storage-side query such as AWS S3 Select runs on or near the object storage servers and returns only a subset of the data to the client or requestor, instead of returning the entire object. For this type of query, the object storage servers can decrypt the data before executing the local query, as long as the encryption was done on the server side. (If the client encrypted the data before sending it to the object storage, the query could not run at or on the storage servers.) The storage servers could also reassemble an erasure-coded object locally in order to run the query. Alternatively, they could distribute the query and run it on the multiple erasure-coding destinations for that object.
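The last answer can be sketched end to end under several simplifying assumptions. An object is split into `k` data shards plus one XOR parity shard, a RAID-5-like stand-in for the Reed-Solomon style codes many object stores use. One shard is then lost, the server rebuilds the object, and a server-side filter returns only the matching CSV rows, loosely in the spirit of S3 Select. All function names are hypothetical. The data is assumed to be unencrypted, or server-side encrypted and already decrypted by the server.

```python
import csv
import io
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size data shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")  # zero-pad the last shard
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def reassemble(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the object, recovering at most one missing shard via XOR."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost shard")
    shards = list(shards)
    if missing:
        # XOR of all surviving shards (data and parity) equals the lost one.
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return b"".join(shards[:-1])[:original_len]  # drop parity and padding


def select_rows(obj: bytes, column: str, value: str) -> str:
    """Server-side filter: return only the CSV rows whose column matches."""
    reader = csv.DictReader(io.StringIO(obj.decode("utf-8")))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    table = b"id,region\n1,us\n2,eu\n3,us\n"
    pieces = split_with_parity(table, k=3)
    pieces[1] = None  # simulate one unavailable drive
    rebuilt = reassemble(pieces, len(table))
    print(select_rows(rebuilt, "region", "us"))
```

If the client had encrypted the object before upload, `select_rows` would receive ciphertext and could not parse it. That is the limitation the answer describes.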

Interested in more information on object storage? Check out the SNIA Educational Library.

Olivia Rhye

Product Manager, SNIA

Find a similar article by tags

Object Storage

Object Storage: Got Questions?

Christine McMonigal

Feb 14, 2022

Computational Storage Key Value Storage NVMe Object Storage

Why Cryptocurrency and Computational Storage?

Marty Foltyn

Feb 11, 2022

Our new SNIA Compute, Memory, and Storage webcast focuses on a hot topic – storage-based cryptocurrency.

Blockchains, cryptocurrency, and the internet of markets are working to transform finance, wealth, safety, digital security, and trust. Storage-based cryptocurrencies had a breakout year in 2021. Proof of Space and Time is a new blockchain consensus mechanism that uses storage capacity to secure the blockchain. Decentralized file storage will enable alternatives to hyperscale data centers for hosting files and objects. Understanding the TCO of a storage system and optimizing the utilization of the storage hardware are critical to scaling these systems.

Join our speakers, Jonmichael Hands of Chia Network and Eli Tiomkin of NGD Systems, for a discussion of how a new approach that combines auto-plotting SSDs with computational storage can lower TCO. Registration is free for this webcast on Tuesday, February 15 at 10:00 am Pacific time. Click the link to register, and see you there! https://www.brighttalk.com/webcast/663/526154

Case Studies Computational Storage
