Can Cloud Storage and Big Data Live Happily Ever After?

Chip Maurer

Aug 31, 2021

“Big Data” has pushed the storage envelope, creating a seemingly perfect relationship with Cloud Storage. But local storage is the third wheel in this relationship, and won’t go down easily. Can this marriage survive when Big Data is being pulled in two directions? Should Big Data pick one, or can the three of them live happily ever after? This will be the topic of discussion on October 21, 2021 at our live SNIA Cloud Storage Technologies webcast, “Cloud Storage and Big Data, A Marriage Made in the Clouds.” Join us as our SNIA experts cover:
  • A short history of Big Data
  • The impact of edge computing
  • The erosion of the data center
  • Managing data-on-the-fly
  • Grid management
  • Next-gen Hadoop and related technologies
  • Supporting AI workloads
  • Data gravity and distributed data
Register today! Our speakers will be ready to take your questions, and black tie is not required for this wedding!

What’s New in Computational Storage? A Conversation with SNIA Leadership

SNIAOnStorage

Aug 27, 2021

The latest revisions of the SNIA Computational Storage Architecture and Programming Model Version 0.8 Revision 0 and the Computational Storage API v0.5 rev 0 are now live on the SNIA website. To get the details on what has been added to the specifications, SNIAOnStorage met “virtually” with Jason Molgaard, Co-Chair of the SNIA Computational Storage Technical Work Group, and Bill Martin, Co-Chair of the SNIA Technical Council and editor of the specifications.

Both SNIA volunteer leaders stressed that they welcome ideas about the specifications and invite industry colleagues to join them in continuing to define computational storage standards. The two documents are working documents – continually being refined and enhanced. If you are not a SNIA member, you can submit public comments via the SNIA Feedback Portal. To learn if your company is a SNIA member, check the SNIA membership list. If you are a SNIA member, go here to join the Computational Storage Technical Work Group member work area. The Computational Storage Technical Work Group chairs also welcome your emails. Reach out to them at computationaltwg-chair@snia.org.

SNIAOnStorage (SOS): What is the overall objective of the Computational Storage Architecture and Programming Model?

Jason Molgaard (JM): The overall objective of the document is to define recommended behavior for hardware and software that supports computational storage. This is the second release of the Architecture and Programming Model, and it is very stable. While the changes are dramatic, that is primarily because of feedback we received both from the public and, to a larger extent, from new Technical Work Group members who have provided insight and perspective.

SOS: Could you summarize what has changed in the 0.8 version of the Model?

JM: Version 0.8 has four main takeaways:
  1. It renames the Computational Storage Processor.  The component within a Computational Storage Device (CSx) is now called a Computational Storage Engine (CSE).  The Computational Storage Processor (CSP) now only refers to a device that contains a Computational Storage Engine (CSE) and no storage.
  2. It defines a new architectural concept of a Computational Storage Engine Environment (CSEE).  This is something that is attached to a specific CSE and defines the environment that a Computational Storage Function (CSF) operates in.
  3. It defines a new architectural element of a Resource Repository that contains CSEEs that are available for activation on a CSE and also CSFs that are available for activation on a CSEE.
  4. Discovery and configuration flows are now documented in Version 0.8.
SOS: Why did the TWG decide to work on the release of a unique API document?

Bill Martin (BM): The overall objective of the Computational Storage API document is to define an interface between an application and a CSx. Version 0.5 is the first release to the public by the Technical Work Group. There are three key takeaways from version 0.5:
  1. The document defines an Application Programming Interface (API) to CSxs.
  2. The API allows a user application on a host to have a consistent interface to any vendor’s CSx.
  3. A vendor defines a library for their device that implements the API. Mapping to the wire protocol for the device is done by this library. Functions that are not available on a specific CSx may be implemented in software. (A hedged host-side sketch of this flow follows below.)
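To make that host-side flow concrete, here is a minimal C sketch of how an application might call into such a vendor library. The type and function names (csx_discover, csx_load_function, csx_execute, csx_release) are hypothetical placeholders invented for this illustration; they are not the identifiers defined in the Computational Storage API v0.5, so treat this as a shape-of-the-flow sketch and consult the specification for the real interfaces.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical vendor-library types and entry points; the actual names and
     * signatures are defined by the SNIA Computational Storage API specification
     * and each vendor's implementation of it. */
    typedef struct csx_device   csx_device_t;
    typedef struct csx_function csx_function_t;

    extern int  csx_discover(csx_device_t **devices, int max_devices);
    extern int  csx_load_function(csx_device_t *dev, const char *csf_image,
                                  csx_function_t **csf);
    extern int  csx_execute(csx_function_t *csf, const void *args, size_t args_len,
                            void *result, size_t result_len);
    extern void csx_release(csx_device_t *dev);

    int main(void)
    {
        csx_device_t   *devs[8];
        csx_function_t *filter;
        char            result[4096];

        /* 1. Discover the Computational Storage Devices (CSxs) visible to the host. */
        int count = csx_discover(devs, 8);
        if (count <= 0) {
            fprintf(stderr, "no CSx found\n");
            return EXIT_FAILURE;
        }

        /* 2. Activate a Computational Storage Function (CSF) on the device; if the
         *    CSx cannot run it natively, the library may fall back to software. */
        if (csx_load_function(devs[0], "filter_csf.bin", &filter) != 0) {
            csx_release(devs[0]);
            return EXIT_FAILURE;
        }

        /* 3. Execute the CSF close to the data and copy back only the reduced result. */
        const char query[] = "illustrative filter arguments";
        csx_execute(filter, query, sizeof(query), result, sizeof(result));

        csx_release(devs[0]);
        return EXIT_SUCCESS;
    }

The point the API is making is that this host-side code can stay the same regardless of which vendor's CSx, or which transport, sits underneath the library; functions the device cannot run natively may be emulated in software by the library.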
SOS: How can vendors use these documents?

BM: The Computational Storage Architecture and Programming Model is what I would categorize as a “descriptive” document. There are no “shalls” or “shoulds” in the document. Rather, the Model is something implementers can use to view the elements they should be considering. It shows the components that are in the architecture, what they mean, and how they interact with each other. This allows users to understand the frameworks and options that can be implemented with a common language and understanding. The API document, in contrast, is a “prescriptive” document. It describes how to use the elements defined within the architectural document – how to do discovery and configuration, and how to utilize the architecture. These documents are meant to be used together. Some implementations may not use all of the elements of the architecture, but all of the elements are logically there.

JM: Individuals who are looking to implement computational storage – and who are developing their own computational storage devices – should absolutely review both documents and use them to provide feedback and questions. Many vendors are considering what their computational storage device should look like. This architecture framework provides good guidance and baseline nomenclature we can all use to speak the same language.

SOS: Are there any specific areas where you are looking for feedback?

BM: In the API specification, we’d like feedback on whether or not the discovery section covers everything implementers want to discover about a CSx. We’d like more depth and detail on what things people think they want to discover about a device. We’d also like comments on items that need to be added on how you interact with the devices to execute a Computational Storage Function (CSF).

JM: For the Model, we’d like feedback on whether people see the value of this descriptive document and are actually following it. We’d like to know if there are additional ideas or areas of definition that users want to see when constructing architectures, or whether there are gaps in defined activities.

SOS: Where can folks find out more information about the specifications?

BM: We invite everyone to attend the upcoming SNIA Storage Developer Conference (SDC). We will be virtual this year on September 28 and 29. Registrants can view 12 presentations on computational storage, including a Computational Storage Update from the Working Group that the Co-Chairs of the Computational Storage TWG, Scott Shadley and Jason Molgaard, are presenting; one I will be giving on Computational Storage Moving Forward with an Architecture and API; and another by my Computational Storage TWG colleague Oscar Pinto on Computational Storage APIs. And anyone with interest in computational storage can attend an open discussion during SDC on computational storage advances that will be featured in a Birds-of-a-Feather session via Zoom on September 29 at 4:00 pm Pacific. Go here to learn how to attend this SDC special event and all the Birds-of-a-Feather sessions.

SOS: Thanks to you both. Our readers may want to know that SNIA’s work in computational storage is led by the 250+ volunteer vendor members of the Computational Storage Technical Work Group. In addition to these two specifications, the TWG has also updated computational storage terms in the Online SNIA Dictionary.
The SNIA Computational Storage Special Interest Group accelerates the awareness of computational storage concepts and influences industry adoption and implementation of the technical specifications and programming models. Learn more at http://www.snia.org/cmsi

Deploying Confidential Computing Q&A

Michael Hoard

Aug 27, 2021

The third live webcast in our SNIA Cloud Storage Technologies Initiative confidential computing series focused on real-world deployments of confidential computing and included case studies and demonstrations. If you missed the live event, you can watch it on demand here. Our live audience asked some interesting questions; here are our expert presenters’ answers.

Q. What is the overhead in CPU cycles for running in a trusted enclave?

A. We have been running some very large machine learning applications in secure enclaves using the latest available hardware, and we are seeing very close to “near-native” performance, with no more than 5% performance overhead compared to normal non-secure operations. This performance is significantly better than with older versions of hardware. With new hardware, we are ready to take on bigger workloads with minimal overhead. Also, it is important to note that encryption and isolation are done in hardware at memory access speeds, so that is not where you will tend to see a performance issue. Regardless of which secure enclave hardware capability you choose, each uses a different technology to manage the barrier between secure enclaves. The important thing is to look at how often an application crosses the barrier, since that is where careful attention is needed.

Q. How do you get around the extremely limited memory space of SGX?

A. With the latest Intel® Ice Lake processors, SGX-related memory limits have been significantly relaxed. With previous generations, memory was limited to 256MB of secure enclave base cache, or Enclave Page Cache (EPC). With Ice Lake processors, this has been relaxed so that SGX supports memory sizes from hundreds of gigabytes (GB) up to one terabyte (TB) of EPC. With this update, we are seeing very large applications now fully fit within secure enclaves and thus gain significant performance increases. However, we should still be wary about running large commercial database suites within secure enclaves, not only with respect to memory size, but also with respect to how database operations run within CPUs. Databases are large and complex applications that use native CPU and memory features (for example, shared memory) that don’t lend themselves well to constrained environments like enclaves. AMD Secure Encrypted Virtualization (SEV) and AWS Nitro Enclaves have different characteristics, but it’s important to note that they support very large applications in secure enclaves.

Q. How can I test this stuff out? Where can I start?

A. There are a number of demonstration environments where users can test their own applications. Confidential computing offerings are available from many cloud service providers (CSPs). When you’re thinking about confidential computing, it is recommended that you talk with someone who has been through the process. They should be able to help identify challenges and demonstrate how deployment can be easier than you initially anticipated.
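As a concrete first step for that “how can I test this out” question, the short C sketch below checks whether a Linux host even exposes SGX. It is a minimal illustration under two assumptions: that the CPU advertises the “sgx” flag in /proc/cpuinfo, and that a kernel with built-in SGX support (5.11 or later) has created the /dev/sgx_enclave device node. It is not a substitute for a vendor’s or CSP’s enclave tooling and attestation flow.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Minimal SGX availability probe for Linux: (1) look for the "sgx" CPU
     * feature flag in /proc/cpuinfo, and (2) check for the /dev/sgx_enclave
     * node that SGX-enabled kernels (5.11+) expose. */
    static int cpu_reports_sgx(void)
    {
        char buf[2048];
        int found = 0;
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f)
            return 0;
        /* Rough token check on the flags line(s). */
        while (!found && fgets(buf, sizeof(buf), f))
            found = (strstr(buf, " sgx ") != NULL) || (strstr(buf, " sgx\n") != NULL);
        fclose(f);
        return found;
    }

    int main(void)
    {
        int cpu_ok = cpu_reports_sgx();
        int dev_ok = (access("/dev/sgx_enclave", F_OK) == 0);

        printf("CPU advertises SGX:       %s\n", cpu_ok ? "yes" : "no");
        printf("/dev/sgx_enclave present: %s\n", dev_ok ? "yes" : "no");

        if (!cpu_ok || !dev_ok)
            printf("Enclaves not usable here; consider a CSP confidential-computing instance.\n");
        return 0;
    }

The equivalent checks differ for AMD SEV and AWS Nitro Enclaves, so treat this purely as an SGX-side starting point before moving on to a proper SDK or CSP demonstration environment.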
Q. How does confidential computing compare with other technologies for data security, such as homomorphic encryption and multi-party computation?

A. Homomorphic encryption allows data in encrypted memory to be processed and acted upon without moving it out of the encrypted space, but it’s currently computationally expensive. In contrast, there is interest in multi-party compute. For example, someone owns data in a bank, someone else owns AI models to detect money laundering, and a third party owns the compute, which acts as a trusted place where the data and algorithms can come together for secure multi-party processing. Confidential computing makes this possible in a way that was previously not feasible. Homomorphic encryption, multi-party computation and confidential computing may all be used together to implement data protection. Which method is most suitable depends upon performance, security model, ease of use, scalability, flexibility of deployment architecture, and whether or not the application requires significant or regular change.

Q. What industry sectors are you seeing with the most traction for confidential computing?

A. We are seeing many applications related to machine learning, applications which request data from a secure database, and applications in highly regulated data privacy and data protection environments. Other applications include distribution of secret information, for example web certificates across web services, web servers, key management and key distribution systems. And finally, distributed compute applications where data needs to be locally processed on secure edge platforms. We are seeing significant interest in securing the hundreds of thousands of existing enterprise applications, data, and workloads in the public cloud. Bad actors are focused on the cloud because they know legacy security is easily undermined. Confidential clouds quickly and easily put confidential computing to work to provide instant hardware-grade enclave protections for these cloud assets with no changes to the application, deployment or IT processes.

Q. How does confidential computing help with meeting compliance requirements like GDPR, CCPA, etc.?

A. Regulated organizations are now seeing value in confidential computing. Applications, like the UCSF example we shared earlier, can achieve HIPAA regulatory compliance much faster when using confidential computing versus other approaches. Additionally, use of new security primitives which come as part of confidential computing can make it easier to prove to regulators that an environment is secure and meets all of the necessary security regulations. The ability to show that data in use is protected, as well as data at rest, is becoming increasingly important, and auditability down to an individual application, process and CPU further demonstrates compliance.

Closing thoughts… Confidential computing will be widely prevalent in the next five years, but now is the time to begin adoption. Suitable environments and hardware are available now via various CSPs and on-premises platforms.

Q&A (Part 1) from “Storage Trends for 2021 and Beyond” Webcast

STA Forum

Aug 24, 2021


Questions from “Storage Trends for 2021 and Beyond” Webcast Answered

It was a great pleasure for Rick Kutcipal, board director, SCSI Trade Association (STA), to welcome Jeff Janukowicz, research vice president at IDC, and Chris Preimesberger, former editor-in-chief of eWeek, in a roundtable talk to discuss prominent data storage technologies shaping the market. If you missed this webcast, titled “Storage Trends for 2021 and Beyond,” it’s now available on demand here.

The well-attended event generated a lot of questions! So many, in fact, that we’re authoring a two-part blog series with the answers. In part one, we recap the questions that were asked and answered during the webcast; since we ran out of time to answer them all, please watch for part two, where we tackle the rest.

Q1. How far along is 24G in development?
A1. Rick: The specification is done and most of the major players are investing in it today. Products have been announced and we’re also expecting to see server shipments in 2022. STA has a plugfest scheduled for July 6, 2021. It’s a busy time and everybody’s pretty excited about it!

Q2. What’s after 24G SAS?
A2. Rick: Naturally, one would think it would be a 48G speed bump, but it’s not clear that’s necessary. There’s still a lot of room for innovation within the SCSI stack, not just in the physical layer. The physical layer is the one that people can relate to and think “oh, it’s faster.” Keep in mind that there are a lot of features and functionality that can be added on top of that physical layer. The layered architecture of the SCSI stack enables changes, whether at the protocol layer or another higher layer, without impacting the physical layer. These are happening in real time: STA is holding T10 technical committee meetings on a regular basis, and innovations are in the works.

Q3. Where do NVMe HDDs and 25G Ethernet HDDs fit in?
A3. Jeff: Generally speaking, it’s still unclear how that’s going to evolve. As we look out over time, in the enterprise market on the SSD side, clearly, we’re seeing NVMe move into the majority of the shipments, and SSDs are growing as a percentage of the overall unit shipments and petabytes. However, right now we’re seeing a mix of technologies that are used within a storage array or in an enterprise system. And clearly, they are SAS-based SSDs and HDDs. And with that transition to more SSDs, it’s sort of a natural question to say, “hey, what about putting the NVMe interface on HDDs?” Now you obviously don’t necessarily need it for all the performance reasons or the optimizations around non-volatile media, which is why NVMe was introduced, but there are some initiatives, and these could help bring some cost savings and further system optimizations to the industry. There are some things underway from OCP in terms of looking at NVMe-based HDDs, but they’re still relatively early on, at least from my perspective, in terms of their development. But there are definitely some activities underway that are looking at the technology.
Rick: From my perspective, I’m seeing a surge in NVMe HDD work within OCP. My concern with NVMe HDDs is the amount of standards work that still has to be done to make them work in an enterprise environment. I think people forget it’s not just taking some media and putting an NVMe interface in front of it. How do all the drive inquiries get mapped to NVMe? How do you manage enterprise large scale spin up? I think it’s an exciting time. I think there are a lot of good possibilities, but the amount of work that’s needed can be underestimated sometimes.

Q4. Could you discuss the adoption of SAS, SATA and NVMe in all flash arrays?
A4. Jeff: IDC has seen a lot of investment in terms of all flash arrays. And we’ve seen pretty rapid growth over the last couple of years. In 2020, about 40% of the spending on external storage was on all flash arrays. And the reality is, if you look at that today, the vast majority of those are really still built upon SAS-based SSDs. There have been some announcements from a lot of the large storage providers around NVMe-based arrays, whether it’s Dell EMC, NetApp, Pure Storage, IBM, etc. Today, these solutions have already started to become available in the market. And we do see NVMe AFAs as a very high growth category over the next few years, but right now they’re still targeted primarily at a lot of the higher-end and more performance-oriented types of applications. We’re really just starting to see them move down into the more mainstream portion of the all flash array market. From IDC’s perspective, if it was 40% last year, we see it growing as an overall category to about 50% of the overall spend on external storage by 2023. So clearly there is a lot going on in this market as well.
Rick: My question with regard to NVMe and all flash arrays is always about scalability. I know there’s a lot of work going on regarding NVMe over Fabrics, but if you go back and look at the amount of computational resources, memory and system resources that it takes to scale these things, there are still some pretty big challenges ahead. I’m not saying it’s not going to happen, and of course the ecosystem has solved hard problems in the past.

Q5. How do you differentiate between M.2 SSDs and NVMe in client system deployments?
A5. Rick: The SoCs, or controllers, on these devices are very different. There are enterprise-class M.2 drives, so the form factor doesn’t necessarily preclude a drive from fitting into one of these categories. While M.2 is designed more for the client, it’s not a hard and fast thing. Typically, enterprise drives use the traditional 2.5-inch form factor.
Jeff: Rick, you’re pretty much spot on. There are some differences at the SoC level and design level, such as power-fail protection. But there does tend to be a different firmware load a lot of times for the enterprise-class drives. There can also be some differences in terms of endurance and how those drives are designed. But if the question is about form factors, we really are at an interesting point for the industry, because historically form factors have always been dictated by HDDs. But as flash has grown, we’ve seen a lot of new form factors. M.2 is obviously one that was originally designed for some of the client market, and has now found its way into a lot of enterprise applications. E1.S is a slight variant of M.2 but is on the roadmap to be a more enterprise-optimized form factor. But we also see some other ones out there like E1.L, which is a longer version of E1.S. There’s also U.3 and others, which are pretty interesting in terms of ways to optimize around some of the new storage media, i.e., SSDs and solid state.

Q6. Is the NVMe takeover sooner than 3-5 years?
A6. Rick: That’s a very logical question. People that aren’t in the ecosystem day-to-day might not be seeing the 24G SAS adoption. Right now, there’s a lot of investment at the system and sub-system level. For 24G SAS there are multiple adapter vendors, same as there has been in the past for 12G SAS. And from the media side, there are numerous drive vendors sampling 24G SAS drives today, and one has been announced. I think some people are going to be shocked by the 24G adoption, and that’s going to start coming to light at STA’s next plugfest, with some big demos and press announcements as products get ready to launch. So I would say stay tuned for that one, because I think some people are going to be pretty surprised.

What is eBPF, and Why Does it Matter for Computational Storage?

SNIAOnStorage

Jul 28, 2021

Recently, a question came up in the SNIA Computational Storage Special Interest Group on new developments in a technology called eBPF and how they might relate to computational storage. To learn more, SNIA on Storage sat down with Eli Tiomkin, SNIA CS SIG Chair with NGD Systems; Matias Bjørling of Western Digital; Jim Harris of Intel; Dave Landsman of Western Digital; and Oscar Pinto of Samsung.

SNIA On Storage (SOS): The eBPF.io website defines eBPF, extended Berkeley Packet Filter, as a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. Why is it important?

Dave Landsman (DL): eBPF emerged in Linux as a way to do network filtering, and enables the Linux kernel to be programmed. Intelligence and features can be added to existing layers, and there is no need to add additional layers of complexity.

SOS: What are the elements of eBPF that would be key to computational storage?

Jim Harris (JH): The key to eBPF is that it is architecturally agnostic; that is, applications can download programs into a kernel without having to modify the kernel. Computational storage allows a user to do the same types of things – develop programs on a host and have the controller execute them without having to change the firmware on the controller. Using a hardware-agnostic instruction set is preferred to having an application need to download x86 or ARM code based on what architecture is running.

DL: It is much easier to establish a standard ecosystem with architecture independence. Instead of an application needing to download x86 or ARM code based on the architecture, you can use a hardware-agnostic instruction set where the kernel can interpret and then translate the instructions based on the processor. Computational storage would not need to know the processor running on an NVMe device with this “agnostic code”.

SOS: How has the use of eBPF evolved?

JH: It is more efficient to run programs directly in the kernel I/O stack rather than have to return packet data to the user, operate on it there, and then send the data back to the kernel. In the Linux kernel, eBPF began as a way to capture and filter network packets. Over time, eBPF use has evolved to additional use cases.

SOS: What are some use case examples?

DL: One of the use cases is performance analysis. For example, eBPF can be used to measure things such as latency distributions for file system I/O, details of storage device I/O and TCP retransmits, and blocked stack traces and memory.

Matias Bjørling (MB): Other examples in the Linux kernel include tracing and gathering statistics. However, while the eBPF programs in the kernel are fairly simple and can be verified by the Linux kernel VM, computational programs are more complex and longer running. Thus, there is a lot of work ongoing to explore how to efficiently apply eBPF to computational programs: for example, what is the right set of run-time restrictions to be defined by the eBPF VM, are there any new instructions to be defined, and how to make the program run as close as possible to the instruction set of the target hardware.

JH: One of the big use cases involves data analytics and filtering. A common data flow for data analytics involves large database table files that are often compressed and encrypted. Without computational storage, you read the compressed and encrypted data blocks to the host, decompress and decrypt the blocks, and maybe do some filtering operations like a SQL query. All this, however, consumes a lot of extra host PCIe, host memory, and cache bandwidth because you are reading the data blocks and doing all these operations on the host. With computational storage, inside the device you can tell the SSD to read data and transfer it not to the host but to some memory buffers within the SSD. The host can then tell the controller to do a fixed-function program like decrypting the data and putting it in another local location on the SSD, and then do a user-supplied program like eBPF to do some filtering operations on that local decrypted data. In the end you would transfer the filtered data to the host. You are doing the compute closer to the storage, saving memory and bandwidth.
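To make the “user-supplied program” in that flow concrete, here is a minimal sketch of the kind of restricted C a host might compile and hand to a computational storage device. The function name, record layout, and buffer-passing convention are hypothetical placeholders (no CSx-side eBPF ABI has been standardized yet, which is exactly what the discussion below turns to), but the style, a bounded loop over fixed-size buffers with no library calls, reflects what existing eBPF toolchains can compile.

    /* filter_csf.c: illustrative eBPF-style filter for a computational storage
     * device. Hypothetical ABI: the device passes in a buffer of fixed-size
     * records that were already decrypted/decompressed on the SSD, plus an
     * output buffer, and the program copies out only the matching records. */

    #define MAX_RECORDS 4096   /* bounded loop keeps the program verifier-friendly */
    #define PAYLOAD_LEN 60

    struct record {
        unsigned int  key;
        unsigned char payload[PAYLOAD_LEN];
    };

    /* Returns the number of records copied to 'out'. */
    __attribute__((section("csf"), used))
    int filter_records(const struct record *in, int in_count,
                       struct record *out, int out_max,
                       unsigned int match_key)
    {
        int copied = 0;

        if (in_count > MAX_RECORDS)
            in_count = MAX_RECORDS;

        for (int i = 0; i < in_count; i++) {
            if (copied >= out_max)
                break;
            if (in[i].key != match_key)
                continue;
            out[copied].key = in[i].key;
            for (int b = 0; b < PAYLOAD_LEN; b++)   /* no libc memcpy available here */
                out[copied].payload[b] = in[i].payload[b];
            copied++;
        }
        return copied;
    }

A function like this can be compiled to an eBPF object file with an existing toolchain (for example, clang -O2 -target bpf -c filter_csf.c -o filter_csf.o) and downloaded to the device; as the panel notes below, the Linux kernel’s verifier restrictions would not apply unchanged on a storage device, which is one of the open standardization questions.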
SOS: How does using eBPF for computational storage look the same? How does it look different?

JH: There are two parts to this answer. Part 1 is the eBPF instruction set with registers and how eBPF programs are assembled. Where we are excited about computational storage and eBPF is that the instruction set is common. There are already existing toolchains that support eBPF. You can take a C program and compile it into an eBPF object file, which is huge. If you add computational storage aspects to standards like NVMe, where developing unique toolchain support can take a lot of work, you can now leverage what is already there for the eBPF ecosystem.

Part 2 of the answer centers around the Linux kernel’s restrictions on what an eBPF program is allowed to do when downloaded. For example, the eBPF instruction set allows for unbounded loops, and toolchains such as gcc will generate eBPF object code with unbounded loops, but the Linux kernel will not permit those to execute – and rejects the program. These restrictions are manageable when doing packet processing in the kernel. The kernel knows a packet’s specific data structure and can verify that data is not being accessed outside the packet. With computational storage, you may want to run an eBPF program that operates on a set of data that has a very complex data structure – perhaps unbounded arrays or multiple levels of indirection. Applying Linux kernel verification rules to computational storage would limit or even prevent processing this type of data.

SOS: What are some of the other challenges you are working through with using eBPF for computational storage?

MB: We know that x86 works fast with high memory bandwidth, while other cores are slower. We have some general compute challenges in that eBPF needs to be able to hook into today’s hardware like we do for SSDs. What kind of operations make sense to offload for these workloads? How do we define a common implementation API for all of them and build an ecosystem on top of it? Do we need an instruction-based compiler, or a library to compile up to – and if you have it on the NVMe drive side, could you use it? eBPF in itself is great, but getting a whole ecosystem together and getting all of us to agree on what delivers value will be the challenge in the long term.

Oscar Pinto (OP): The Linux kernel for eBPF today is more geared towards networking in its functionality but light on storage. That may be a challenge in building a computational storage framework. We need to think through how to enhance this given that we download and execute eBPF programs in the device. As Matias indicated, x86 is great at what it does in the host today. But if we have to work with smaller CPUs in the device, they may need help, say with dedicated hardware or similar additional logic, to aid the eBPF programs. One question is how these programs would talk to them. We don’t have a setup for storage like this today, and there are a variety of storage services that can benefit from eBPF.
SOS: Is SNIA addressing this challenge?

OP: On the SNIA side we are building on program functions that are downloaded to computational storage engines. These functions run on the engines, which are CPUs or some other form of compute that is tied to an FPGA, DPU, or dedicated hardware. We are defining these abstracted functionalities in SNIA today, and the SNIA Computational Storage Technical Work Group is developing a Computational Storage Architecture and Programming Model and Computational Storage APIs to address it. The latest versions, v0.8 and v0.5, have been approved by the SNIA Technical Council and are now available for public review and comment at the SNIA Feedback Portal.

SOS: Is there an eBPF standard? Is it aligned with storage?

JH: We have a challenge around what an eBPF standard should look like. Today it is defined in the Linux kernel. But if you want to incorporate eBPF in a storage standard you need to have something specified for that storage standard. We know the Linux kernel will continue to evolve, adding and modifying instructions. But if you have an NVMe SSD or other storage device you have to have something set in stone – the version of eBPF that the standard supports. We need to know what the eBPF standard will look like and where it will live. Will standards organizations need to define something separately?

SOS: What would you like an eBPF standard to look like from a storage perspective?

JH: We’d like an eBPF standard that can be used by everyone. We are looking at how computational storage can be implemented in a way that is safe and secure but also able to solve use cases that are different.

MB: Security will be a key part of an eBPF standard. Programs should not access data they should not have access to. This will need to be solved within a storage device. There are some synergies with external key management.

DL: The storage community has to figure out how to work with eBPF and make this standard something that a storage environment can take advantage of and rely on.

SOS: Where do you see the future of eBPF?

MB: The vision is that you can build eBPFs and they work everywhere. When we build new database systems and integrate eBPFs into them, we then have embedded kernels that can be sent to any NVMe device over the wire and be executed. The cool part is that it can be anywhere on the path, so there become a lot of interesting ways to build new architectures on top of this. And together with the open system ecosystem we can create a body of accelerators with which we can fast-track the build of these ecosystems. eBPF can put this into overdrive with use cases outside the kernel.

DL: There may be some other environments where computational storage is being evaluated, such as WebAssembly.

JH: An eBPF run time is much easier to put into an SSD than a WebAssembly run time.

MB: eBPF makes more sense – it is simpler to start and build upon, as it is not set in stone for one particular use case.
Eli Tiomkin (ET): Different SSDs have different levels of constraints. Every computational storage SSD in production, and even those in development, has very unique capabilities that are dependent on the workload and application.

SOS: Any final thoughts?

MB: At this point, technologies are coming together that are going to change the industry in a way that we can redesign storage systems, both with computational storage and in how we manage security in NVMe devices for these programs. We have the perfect storm pulling things together. Exciting platforms can be built using open standards specifications not previously available.

SOS: Looking forward to this exciting future. Thanks to you all.

Q&A: Security of Data on NVMe-oF

John Kim

Jul 28, 2021


Ensuring the security of data on NVMe® over Fabrics was the topic of our SNIA Networking Storage Forum (NSF) webcast “Security of Data on NVMe over Fabrics, the Armored Truck Way.” During the webcast our experts outlined industry trends, potential threats, security best practices and much more. The live audience asked several interesting questions and here are answers to them.

Q. Does use of strong authentication and network encryption ensure I will be compliant with regulations such as HIPAA, GDPR, PCI, CCPA, etc.?

A. Not by themselves. Proper use of strong authentication and network encryption will reduce the risk of data theft or improper data access, which can help achieve compliance with data privacy regulations. But full compliance also requires establishment of proper processes, employee training, system testing and monitoring. Compliance may also require regular reviews and audits of systems and processes plus the involvement of lawyers and compliance consultants.

Q. Does using encryption on the wire such as IPsec, FC_ESP, or TLS protect against ransomware, man-in-the-middle attacks, or physical theft of the storage system?

A. Proper use of data encryption on the storage network can protect against man-in-the-middle snooping attacks because any data intercepted would be encrypted and very difficult to decrypt. Use of strong authentication such as DH-HMAC-CHAP can reduce the risk of a man-in-the-middle attack succeeding in the first place. However, encrypting data on the wire does not by itself protect against ransomware nor against physical theft of the storage systems because the data is decrypted once it arrives on the storage system or on the accessing server.
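To give a feel for what the strong authentication mentioned above is doing, the short C sketch below illustrates the challenge-response idea at the core of schemes such as DH-HMAC-CHAP: the controller issues a random challenge and the host proves possession of a pre-shared secret by returning an HMAC computed over that challenge. This is a simplified, OpenSSL-based illustration only; it is not the NVMe authentication wire format, which also defines sequence numbers, transaction identifiers, hash negotiation and an optional Diffie-Hellman exchange.

    #include <stdio.h>
    #include <openssl/crypto.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <openssl/rand.h>

    /* Conceptual challenge-response only; not the NVMe DH-HMAC-CHAP protocol.
     * Build with: cc chap_sketch.c -lcrypto */
    int main(void)
    {
        const unsigned char secret[] = "shared-secret-provisioned-out-of-band";
        unsigned char challenge[32];
        unsigned char host_response[EVP_MAX_MD_SIZE], expected[EVP_MAX_MD_SIZE];
        unsigned int resp_len = 0, exp_len = 0;

        /* Controller side: generate a random challenge. */
        if (RAND_bytes(challenge, sizeof(challenge)) != 1)
            return 1;

        /* Host side: respond with HMAC-SHA-256 over the challenge using the secret. */
        HMAC(EVP_sha256(), secret, (int)(sizeof(secret) - 1),
             challenge, sizeof(challenge), host_response, &resp_len);

        /* Controller side: recompute the expected value and compare in constant time. */
        HMAC(EVP_sha256(), secret, (int)(sizeof(secret) - 1),
             challenge, sizeof(challenge), expected, &exp_len);

        int ok = (resp_len == exp_len) &&
                 (CRYPTO_memcmp(host_response, expected, resp_len) == 0);
        printf("authentication %s\n", ok ? "succeeded" : "failed");
        return ok ? 0 : 1;
    }

In practice you would rely on the storage stack's built-in DH-HMAC-CHAP support rather than anything hand-rolled; the point of the sketch is simply that the shared secret itself never crosses the wire, only proof of possession does.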

Q. Does "zero trust" mean I cannot trust anybody else on my IT team or trust my family members?

A. Zero Trust does not mean your coworker, mother or cousin is a hacker. But it does require assuming that any server, user (even your coworker or mother), or application could be compromised and that malware or hackers might already be inside the network, as opposed to assuming all threats are being kept outside the network by perimeter firewalls. As a result, Zero Trust means regular use of security technologies – including firewalls, encryption, IDS/IPS, anti-virus software, monitoring, audits, penetration testing, etc. – on all parts of the data center to detect and prevent attacks in case one of the applications, machines or users has been compromised.

Q. Great information! Is there any reference security practice for eBOF and NVMe-oF™ that you recommend?

A. Generally, security practices with an eBOF using NVMe-oF would be similar to those with traditional storage arrays (whether they use NVMe-oF, iSCSI, FCP, or a NAS protocol). You should authenticate users, emplace fine-grained access controls, encrypt data, and back up your data regularly. You might also want to physically or logically separate your storage network from the compute traffic or user access networks. Some differences may arise from the fact that with an eBOF, it's likely that multiple servers will access multiple eBOFs directly, instead of each server going to a central storage controller that in turn accesses the storage shelves or JBOFs.

Q. Are there concerns around FC-NVMe security when it comes to Fibre Channel Fabric services? Can a rogue NVMe initiator discover the subsystem controllers during the discovery phase and cause a denial-of-service kind of attack? Under such circumstances can DH-CHAP authentication help?

A. A rogue initiator might be able to discover storage arrays using the FC-NVMe protocol but this may be blocked by proper use of Fibre Channel zoning and LUN masking. If a rogue initiator is able to discover a storage array, proper use of DH-CHAP should prevent it from connecting and accessing data, unless the rogue initiator is able to successfully impersonate a legitimate server. If the rogue server is able to discover an array using FC-NVMe, but cannot connect due to being blocked by strong authentication, it could initiate a denial-of-service attack and DH-CHAP by itself would not block or prevent a denial-of-service attack.

Q. With the recent example of Colonial Pipeline cyber-attack, can you please comment on what are best practice security recommendations for storage with regards to separation of networks for data protection and security?

A. It's a best practice to separate storage networks from the application and/or user networks. This separation can be physical or logical and could include access controls and authentication within each physical or logical network. A separate physical network is often used for management and monitoring. In addition, to protect against ransomware, storage systems should be backed up regularly with some backups kept physically offline, and the storage team should practice restoring data from backups on a regular basis to verify the integrity of the backups and the restoration process.

For those of you who follow the many educational webcasts that the NSF hosts, you may have noticed that we are discussing the important topic of data security a lot. In fact, there is an entire Storage Networking Security Webcast Series that dives into protecting data at rest, protecting data in flight, encryption, key management, and more.

We’ve also been talking about NVMe-oF a lot. I encourage you to watch “NVMe-oF: Looking Beyond Performance Hero Numbers” where our SNIA experts explain why it is important to look beyond test results that demonstrate NVMe-oF’s dramatic reduction in latency. And if you’re ready for more, you can “Geek Out” on NVMe-oF here, where we’ve curated several great basic and advanced educational assets on NVMe-oF.

Moving Genomics to the Cloud

Alex McDonald

Jul 27, 2021

The study of genomics in modern biology has revolutionized the discovery of medicines, and the COVID pandemic response has accelerated genetic research and driven the rapid development of vaccines. Genomics, however, requires a significant amount of compute power and data storage to make new discoveries possible. Making sure compute and storage are not a roadblock for genomics innovations will be the topic of discussion at the SNIA Cloud Storage Technologies Initiative live webcast, “Moving Genomics to the Cloud: Compute and Storage Considerations.” This session will feature expert viewpoints from both bioinformatics and technology perspectives, with a focus on some of the compute and data storage challenges for genomics workflows. We will discuss:
  • How to best store and manage large genomics datasets
  • Methods for sharing large datasets for collaborative analysis
  • Legal and ethical implications of storing shareable data in the cloud
  • Transferring large data sets and the impact on storage and networking
Join us for this live event on September 9, 2021 for a fascinating discussion on an area of technology that is rapidly evolving and changing the world.
