
What is eBPF, and Why Does it Matter for Computational Storage?

SNIAOnStorage

Jul 28, 2021

Recently, a question came up in the SNIA Computational Storage Special Interest Group about new developments in a technology called eBPF and how they might relate to computational storage. To learn more, SNIA on Storage sat down with Eli Tiomkin, SNIA CS SIG Chair with NGD Systems; Matias Bjørling of Western Digital; Jim Harris of Intel; Dave Landsman of Western Digital; and Oscar Pinto of Samsung.

SNIA On Storage (SOS): The eBPF.io website defines eBPF, extended Berkeley Packet Filter, as a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. Why is it important?

Dave Landsman (DL): eBPF emerged in Linux as a way to do network filtering, and it enables the Linux kernel to be programmed. Intelligence and features can be added to existing layers, and there is no need to add additional layers of complexity.

SOS: What are the elements of eBPF that would be key to computational storage?

Jim Harris (JH): The key to eBPF is that it is architecturally agnostic; that is, applications can download programs into a kernel without having to modify the kernel. Computational storage allows a user to do the same types of things – develop programs on a host and have the controller execute them without having to change the firmware on the controller. Using a hardware-agnostic instruction set is preferable to having an application download x86 or ARM code based on which architecture is running.

DL: It is much easier to establish a standard ecosystem with architecture independence. Instead of an application needing to download x86 or ARM code based on the architecture, you can use a hardware-agnostic instruction set where the kernel can interpret and then translate the instructions based on the processor. With this "agnostic code," computational storage would not need to know which processor is running on an NVMe device.

SOS: How has the use of eBPF evolved?

JH: It is more efficient to run programs directly in the kernel I/O stack rather than return packet data to user space, operate on it there, and then send the data back to the kernel. In the Linux kernel, eBPF began as a way to capture and filter network packets. Over time, eBPF use has expanded to additional use cases.

SOS: What are some use case examples?

DL: One of the use cases is performance analysis. For example, eBPF can be used to measure things such as latency distributions for file system I/O, details of storage device I/O and TCP retransmits, and blocked stack traces and memory.

Matias Bjørling (MB): Other examples in the Linux kernel include tracing and gathering statistics. However, while the eBPF programs in the kernel are fairly simple and can be verified by the Linux kernel, computational programs are more complex and longer running. Thus, there is a lot of ongoing work to explore how to apply eBPF efficiently to computational programs: for example, what the right set of run-time restrictions enforced by the eBPF VM should be, whether any new instructions need to be defined, and how to make programs run as close as possible to the native instruction set of the target hardware.
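
To make the kernel-side examples above concrete, here is a minimal sketch, in the libbpf style and assuming the libbpf development headers are available, of the kind of small, easily verified program being described: it attaches to the block layer's request-completion tracepoint and counts completed I/O requests in a map that a user-space tool can read.

    /* count_bio.bpf.c - minimal tracing sketch (assumes libbpf headers).
     * Build: clang -O2 -g -target bpf -c count_bio.bpf.c -o count_bio.bpf.o */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } completions SEC(".maps");

    SEC("tracepoint/block/block_rq_complete")
    int count_block_completions(void *ctx)
    {
        __u32 key = 0;
        __u64 *count = bpf_map_lookup_elem(&completions, &key);

        if (count)
            __sync_fetch_and_add(count, 1);  /* user space reads the total from the map */
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";

Tools such as bcc and bpftrace automate exactly this compile, load, and read workflow for tracing and statistics gathering.
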
JH: One of the big use cases involves data analytics and filtering. A common data flow for data analytics involves large database table files that are often compressed and encrypted. Without computational storage, you read the compressed and encrypted data blocks to the host, decompress and decrypt the blocks, and maybe do some filtering operations like a SQL query. All of this, however, consumes a lot of extra host PCIe, host memory, and cache bandwidth, because you are reading the data blocks and doing all these operations on the host. With computational storage, you can tell the SSD to read data and transfer it not to the host but to memory buffers within the SSD. The host can then tell the controller to run a fixed-function program, such as decrypting the data and placing it in another local location on the SSD, and then run a user-supplied program, such as an eBPF program, to do some filtering operations on that local, decrypted data. In the end you transfer only the filtered data to the host. You are doing the compute closer to the storage, saving memory and bandwidth.
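
As a rough illustration of that flow, the host-side orchestration could look something like the sketch below. Every csx_* name is hypothetical, invented only to label the steps; no standard API for this exists yet, which is part of what the standards work discussed later in this conversation aims to define.

    /* Hypothetical sketch of the near-data flow described above.
     * The csx_* calls and CSX_FN_DECRYPT are invented names, not a real API;
     * device buffers are represented as opaque integer handles. */
    #include <stddef.h>

    int  csx_alloc_device_buffer(int dev, size_t nblocks);
    void csx_read_to_device_buffer(int dev, unsigned long lba, size_t nblocks, int buf);
    void csx_run_fixed_function(int dev, int fn, int in_buf, int out_buf);
    void csx_run_program(int dev, const void *prog, size_t prog_len, int buf);
    int  csx_transfer_to_host(int dev, int buf);
    #define CSX_FN_DECRYPT 1

    int filter_on_device(int dev, unsigned long lba, size_t nblocks,
                         const void *ebpf_prog, size_t prog_len)
    {
        int src  = csx_alloc_device_buffer(dev, nblocks);
        int work = csx_alloc_device_buffer(dev, nblocks);

        csx_read_to_device_buffer(dev, lba, nblocks, src);        /* 1. data stays on the SSD      */
        csx_run_fixed_function(dev, CSX_FN_DECRYPT, src, work);   /* 2. fixed-function decrypt     */
        csx_run_program(dev, ebpf_prog, prog_len, work);          /* 3. user-supplied eBPF filter  */
        return csx_transfer_to_host(dev, work);                   /* 4. only filtered results move */
    }
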
SOS: How does using eBPF for computational storage look the same? How does it look different?

JH: There are two parts to this answer. Part 1 is the eBPF instruction set, with its registers and how eBPF programs are assembled. What excites us about computational storage and eBPF is that the instruction set is common, and there are already existing toolchains that support eBPF. You can take a C program and compile it into an eBPF object file, which is huge. If you add computational storage aspects to standards like NVMe, where developing unique toolchain support can take a lot of work, you can now leverage what is already there in the eBPF ecosystem.

Part 2 of the answer centers around the Linux kernel's restrictions on what a downloaded eBPF program is allowed to do. For example, the eBPF instruction set allows for unbounded loops, and toolchains such as gcc will generate eBPF object code with unbounded loops, but the Linux kernel will not permit those to execute – it rejects the program. These restrictions are manageable when doing packet processing in the kernel. The kernel knows a packet's specific data structure and can verify that data is not being accessed outside the packet. With computational storage, you may want to run an eBPF program that operates on a set of data with a very complex data structure – perhaps unbounded arrays or multiple levels of indirection. Applying the Linux kernel's verification rules to computational storage would limit or even prevent processing this type of data.
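
As a small example of the toolchain point, the routine below can be compiled into an architecture-neutral eBPF object file with the existing LLVM toolchain; the record layout and section name are invented for illustration. Note the constant loop bound, which is exactly the kind of property a verifier, whether the Linux kernel's or a future device-side one, needs to be able to prove. As written it is a freestanding function compiled to eBPF instructions, not a loadable Linux kernel program, which would additionally have to satisfy the kernel's memory-access checks described above.

    /* filter.c - illustrative only; the struct layout and section name are invented.
     * Build: clang -O2 -target bpf -c filter.c -o filter.o */
    #define SEC(name) __attribute__((section(name), used))

    struct record {
        unsigned int key;
        unsigned int value;
    };

    #define BATCH 16   /* constant bound keeps the loop verifier-friendly */

    SEC("csx/filter")
    int count_matches(const struct record *recs, unsigned int wanted)
    {
        int hits = 0;

        for (int i = 0; i < BATCH; i++) {   /* bounded loop: provably terminates */
            if (recs[i].key == wanted)
                hits++;
        }
        return hits;
    }
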
SOS: What are some of the other challenges you are working through with using eBPF for computational storage?

MB: We know that x86 works fast with high memory bandwidth, while other cores are slower. We have some general compute challenges in that eBPF needs to be able to hook into today's hardware like we do for SSDs. What kinds of operations make sense to offload for these workloads? How do we define a common implementation API for all of them and build an ecosystem on top of it? Do we need an instruction-based compiler, or a library to compile to – and if you have it on the NVMe drive side, could you use it? eBPF in itself is great, but getting a whole ecosystem in place and getting all of us to agree on where the value is will be the long-term challenge.

Oscar Pinto (OP): The Linux kernel's eBPF support today is geared more towards networking and is light on storage. That may be a challenge in building a computational storage framework. We need to think through how to enhance this, given that we download and execute eBPF programs in the device. As Matias indicated, x86 is great at what it does in the host today. But if we have to work with smaller CPUs in the device, they may need help from, say, dedicated hardware or similar additional logic to aid the eBPF programs. One question is how these programs would talk to that hardware. We don't have a setup for storage like this today, and there are a variety of storage services that can benefit from eBPF.

SOS: Is SNIA addressing this challenge?

OP: On the SNIA side we are building on program functions that are downloaded to computational storage engines. These functions run on the engines, which are CPUs or some other form of compute tied to an FPGA, DPU, or dedicated hardware. We are defining these abstracted functionalities in SNIA today, and the SNIA Computational Storage Technical Work Group is developing a Computational Storage Architecture and Programming Model and Computational Storage APIs to address it. The latest versions, v0.8 and v0.5 respectively, have been approved by the SNIA Technical Council and are now available for public review and comment at the SNIA Feedback Portal.

SOS: Is there an eBPF standard? Is it aligned with storage?

JH: We have a challenge around what an eBPF standard should look like. Today it is defined in the Linux kernel. But if you want to incorporate eBPF into a storage standard, you need to have something specified for that storage standard. We know the Linux kernel will continue to evolve, adding and modifying instructions. But if you have an NVMe SSD or other storage device, you have to have something set in stone – the version of eBPF that the standard supports. We need to know what the eBPF standard will look like and where it will live. Will standards organizations need to define something separately?

SOS: What would you like an eBPF standard to look like from a storage perspective?

JH: We'd like an eBPF standard that can be used by everyone. We are looking at how computational storage can be implemented in a way that is safe and secure but also able to solve use cases that are different.

MB: Security will be a key part of an eBPF standard. Programs should not access data they should not have access to. This will need to be solved within a storage device. There are some synergies with external key management.

DL: The storage community has to figure out how to work with eBPF and make this standard something that a storage environment can take advantage of and rely on.

SOS: Where do you see the future of eBPF?

MB: The vision is that you can build eBPF programs and they work everywhere. When we build new database systems and integrate eBPF programs into them, we then have embedded kernels that can be sent to any NVMe device over the wire and be executed. The cool part is that it can be anywhere on the path, so there are a lot of interesting ways to build new architectures on top of this. And together with the open systems ecosystem we can create a body of accelerators with which we can fast-track the build of these ecosystems. eBPF can put this into overdrive with use cases outside the kernel.

DL: There may be some other environments where computational storage is being evaluated, such as WebAssembly.

JH: An eBPF runtime is much easier to put into an SSD than a WebAssembly runtime.

MB: eBPF makes more sense – it is simpler to start with and build upon, as it is not set in stone for one particular use case.

Eli Tiomkin (ET): Different SSDs have different levels of constraints. Computational storage SSDs in production, and even those in development, have very unique capabilities that depend on the workload and application.

SOS: Any final thoughts?

MB: At this point, technologies are coming together that are going to change the industry, letting us redesign storage systems around both computational storage and how we manage security in NVMe devices for these programs. We have the perfect storm pulling things together. Exciting platforms can be built using open standards specifications not previously available.

SOS: Looking forward to this exciting future. Thanks to you all.

The post What is eBPF, and Why Does it Matter for Computational Storage? first appeared on SNIA Compute, Memory and Storage Blog.


Q&A: Security of Data on NVMe-oF

John Kim

Jul 28, 2021


Ensuring the security of data on NVMe® over Fabrics was the topic of our SNIA Networking Storage Forum (NSF) webcast “Security of Data on NVMe over Fabrics, the Armored Truck Way.” During the webcast our experts outlined industry trends, potential threats, security best practices and much more. The live audience asked several interesting questions and here are answers to them.

Q. Does use of strong authentication and network encryption ensure I will be compliant with regulations such as HIPAA, GDPR, PCI, CCPA, etc.?

A. Not by themselves. Proper use of strong authentication and network encryption will reduce the risk of data theft or improper data access, which can help achieve compliance with data privacy regulations. But full compliance also requires establishment of proper processes, employee training, system testing and monitoring. Compliance may also require regular reviews and audits of systems and processes plus the involvement of lawyers and compliance consultants.

Q. Does using encryption on the wire such as IPsec, FC_ESP, or TLS protect against ransomware, man-in-the middle attacks, or physical theft of the storage system?

A. Proper use of data encryption on the storage network can protect against man-in-the-middle snooping attacks because any data intercepted would be encrypted and very difficult to decrypt. Use of strong authentication such as DH-HMAC-CHAP can reduce the risk of a man-in-the-middle attack succeeding in the first place. However, encrypting data on the wire does not by itself protect against ransomware or against physical theft of the storage systems, because the data is decrypted once it arrives on the storage system or on the accessing server.
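
To illustrate why challenge-response authentication blunts man-in-the-middle attacks, the basic pattern looks like the sketch below: the target issues a random challenge and the host returns a keyed hash of it, so the shared secret itself never crosses the wire and a snooper only ever sees challenge/response pairs. This is a simplified illustration, not the DH-HMAC-CHAP protocol itself (which adds Diffie-Hellman key agreement and other protections), and hmac_sha256() is a stand-in for any real HMAC implementation.

    #include <stddef.h>
    #include <stdint.h>

    /* Stand-in for a real HMAC implementation from a crypto library. */
    void hmac_sha256(const uint8_t *key, size_t key_len,
                     const uint8_t *msg, size_t msg_len, uint8_t out[32]);

    /* Target side: verify that the host's response proves knowledge of the
     * shared secret without the secret ever being sent over the network. */
    int verify_host_response(const uint8_t *secret, size_t secret_len,
                             const uint8_t challenge[32],
                             const uint8_t response[32])
    {
        uint8_t expected[32];
        uint8_t diff = 0;

        hmac_sha256(secret, secret_len, challenge, 32, expected);

        /* Constant-time compare to avoid leaking timing information. */
        for (int i = 0; i < 32; i++)
            diff |= expected[i] ^ response[i];

        return diff == 0;   /* 1 = authenticated, 0 = reject */
    }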

Q. Does "zero trust" mean I cannot trust anybody else on my IT team or trust my family members?

A. Zero Trust does not mean your coworker, mother or cousin is a hacker.  But it does require assuming that any server, user (even your coworker or mother), or application could be compromised and that malware or hackers might already be inside the network, as opposed to assuming all threats are being kept outside the network by perimeter firewalls. As a result, Zero Trust means regular use of security technologies--including firewalls, encryption, IDS/IPS, anti-virus software, monitoring, audits, penetration testing, etc.--on all parts of the data center to detect and prevent attacks in case one of the applications, machines or users has been compromised.

Q. Great information! Is there any reference security practice for eBOF and NVMe-oF™ that you recommend?

A. Generally, security practices with an eBOF using NVMe-oF would be similar to those with traditional storage arrays (whether they use NVMe-oF, iSCSI, FCP, or a NAS protocol). You should authenticate users, implement fine-grained access controls, encrypt data, and back up your data regularly. You might also want to physically or logically separate your storage network from the compute traffic or user access networks. Some differences may arise from the fact that with an eBOF, it's likely that multiple servers will access multiple eBOFs directly, instead of each server going to a central storage controller that in turn accesses the storage shelves or JBOFs.

Q. Are there concerns around FC-NVMe security when it comes to Fibre Channel Fabric services? Can a rogue NVMe initiator discover the subsystem controllers during the discovery phase and cause a denial-of-service kind of attack? Under such circumstances can DH-CHAP authentication help?

A. A rogue initiator might be able to discover storage arrays using the FC-NVMe protocol but this may be blocked by proper use of Fibre Channel zoning and LUN masking. If a rogue initiator is able to discover a storage array, proper use of DH-CHAP should prevent it from connecting and accessing data, unless the rogue initiator is able to successfully impersonate a legitimate server. If the rogue server is able to discover an array using FC-NVMe, but cannot connect due to being blocked by strong authentication, it could initiate a denial-of-service attack and DH-CHAP by itself would not block or prevent a denial-of-service attack.

Q. With the recent example of Colonial Pipeline cyber-attack, can you please comment on what are best practice security recommendations for storage with regards to separation of networks for data protection and security?

A. It's a best practice to separate storage networks from the application and/or user networks. This separation can be physical or logical and could include access controls and authentication within each physical or logical network. A separate physical network is often used for management and monitoring. In addition, to protect against ransomware, storage systems should be backed up regularly with some backups kept physically offline, and the storage team should practice restoring data from backups on a regular basis to verify the integrity of the backups and the restoration process.

For those of you who follow the many educational webcasts that the NSF hosts, you may have noticed that we are discussing the important topic of data security a lot. In fact, there is an entire Storage Networking Security Webcast Series that dives into protecting data at rest, protecting data in flight, encryption, key management, and more.

We’ve also been talking about NVMe-oF a lot. I encourage you to watch “NVMe-oF: Looking Beyond Performance Hero Numbers” where our SNIA experts explain why it is important to look beyond test results that demonstrate NVMe-oF’s dramatic reduction in latency. And if you’re ready for more, you can “Geek Out” on NVMe-oF here, where we’ve curated several great basic and advanced educational assets on NVMe-oF.


Moving Genomics to the Cloud

Alex McDonald

Jul 27, 2021

The study of genomics in modern biology has revolutionized the discovery of medicines and the COVID pandemic response has quickened genetic research and driven the rapid development of vaccines. Genomics, however, requires a significant amount of compute power and data storage to make new discoveries possible. Making sure compute and storage are not a roadblock for genomics innovations will be the topic of discussion at the SNIA Cloud Storage Technologies Initiative live webcast “Moving Genomics to the Cloud: Compute and Storage Considerations.” This session will feature expert viewpoints from both bioinformatics and technology perspectives with a focus on some of the compute and data storage challenges for genomics workflows. We will discuss:
  • How to best store and manage large genomics datasets
  • Methods for sharing large datasets for collaborative analysis
  • Legal and ethical implications of storing shareable data in the cloud
  • Transferring large data sets and the impact on storage and networking
Join us for this live event on September 9, 2021 for a fascinating discussion on an area of technology that is rapidly evolving and changing the world.


Extending Storage to the Edge

Jim Fister

Jul 19, 2021

Data gravity has pulled computing to the Edge and enabled significant advances in hybrid cloud deployments. The ability to run analytics from the datacenter to the Edge, where the data is generated and lives, also creates new use cases for nearly every industry and company. However, this movement of compute to the Edge is not the only pattern to have emerged. How might other use cases impact your storage strategy? That’s the topic of our next SNIA Cloud Storage Technologies Initiative (CSTI) live webcast on August 25, 2021 “Extending Storage to the Edge – How It Should Affect Your Storage Strategy” where our experts, Erin Farr, Senior Technical Staff Member, IBM Storage CTO Innovation Team and Vincent Hsu, IBM Fellow, VP & CTO for Storage will join us for an interactive session that will cover:
  • Emerging patterns of data movement and the use cases that drive them
  • Cloud Bursting
  • Federated Learning across the Edge and Hybrid Cloud
  • Considerations for distributed cloud storage architectures to match these emerging patterns
It is sure to be a fascinating and insightful discussion. Register today. Our esteemed experts will be on hand to answer your questions.


A Storage Debate Q&A: Hyperconverged vs. Disaggregated vs. Centralized

John Kim

Jul 12, 2021

The SNIA Networking Storage Forum recently hosted another webcast in our Great Storage Debate webcast series. This time, our SNIA experts debated three competing visions about how storage should be done: Hyperconverged Infrastructure (HCI), Disaggregated Storage, and Centralized Storage. If you missed the live event, it's available on-demand. Questions from the webcast attendees made the panel debate quite lively. As promised, here are answers to those questions.

Q. Can you imagine a realistic scenario where the different storage types are used as storage tiers? How much are they interoperable?

A. Most HCI solutions already have a tiering/caching structure built in. However, a user could use HCI for hot to warm data, and also tier less frequently accessed data out to a separate backup/archive. Some of the HCI solutions have close partnerships with backup/archive vendor solutions just for this purpose.

Q. Does Hyperconverged (HCI) use primarily object storage with erasure coding for managing the distributed storage, such as vSAN for VxRail (from Dell)?

A. That is accurate for vSAN, but other HCI solutions are not necessarily object based. Even if object-based, the object interface is rarely exposed. Erasure coding is a common method of distributing the data across the cluster for increased durability with efficient space sharing.
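
As a footnote to the erasure-coding point, the simplest version of the idea is single parity: each stripe stores one extra block that is the XOR of its data blocks, so any single lost block can be rebuilt from the survivors. Production systems generally use more general codes (for example, Reed-Solomon) that tolerate multiple failures, but the durability-versus-capacity trade-off works the same way. A minimal sketch:

    #include <stddef.h>
    #include <stdint.h>

    /* Compute a parity block as the XOR of n data blocks (single-parity sketch). */
    void compute_parity(const uint8_t *const data[], size_t n,
                        uint8_t *parity, size_t block_len)
    {
        for (size_t b = 0; b < block_len; b++) {
            uint8_t p = 0;
            for (size_t i = 0; i < n; i++)
                p ^= data[i][b];
            parity[b] = p;
        }
    }

    /* Rebuild one missing block by XORing the parity with the surviving blocks. */
    void rebuild_block(const uint8_t *const surviving[], size_t n_surviving,
                       const uint8_t *parity, uint8_t *rebuilt, size_t block_len)
    {
        for (size_t b = 0; b < block_len; b++) {
            uint8_t p = parity[b];
            for (size_t i = 0; i < n_surviving; i++)
                p ^= surviving[i][b];
            rebuilt[b] = p;
        }
    }

With n data blocks per stripe, the overhead is a single extra block rather than a full replica, which is where the "efficient space sharing" comes from.
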
Q. Is there a possibility that two or more classifications of storage can co-exist or be deployed? Examples please?

A. Often IT organizations have multiple types of storage deployed in their data centers, particularly over time with various types of legacy systems. Also, HCI solutions that support iSCSI can interface with these legacy systems to enable better sharing of data and avoid silos.

Q. How would you classify HPC deployment, given it is more distributed file systems and converged storage? Does it need a new classification?

A. Often HPC storage is deployed on large, distributed file systems (e.g. Lustre), which I would classify as distributed, scale-out storage, but not hyperconverged, as the compute is still on separate servers.

Q. A lot of HCI solutions already allow heterogeneous nodes within a cluster. What about these "new" disaggregated HCI solutions that use "traditional" storage arrays in the solution (thus not using a software-defined storage solution)? Doesn't that sound like a step back? It seems most of the innovation comes in the software.

A. The solutions marketed as disaggregated HCI are not HCI. They are traditional servers and storage combined in a chassis. This would meet the definition of converged, but not hyperconverged.

Q. Why is HCI growing so quickly and why does it seem so popular of late? It seems to be one of the fastest growing "data storage" use cases.

A. HCI has many advantages, as I shared in the slides up front. The #1 reason for the growth and popularity is the ease of deployment and management. Any IT person who is familiar with deploying and managing a VM can now easily deploy and manage the storage with the VM. No specialized storage system skillsets are required, which makes better use of limited IT people resources and reduces OpEx.

Q. Where do you categorize newer deployments like Vast Data? Is that considered NAS since it presents as NFS and CIFS?

A. I would categorize Vast Data as scale-out, software-defined storage. HCI is also a type of scale-out, software-defined storage, but with compute as well, so that is the key difference.

Q. So what happens when HCI works with ANY storage, including centralized solutions? What is HCI then?

A. I believe this question is referencing iSCSI interface support. HCI solutions that support iSCSI can interface with other types of storage systems to enable better sharing of data and avoid silos.

Q. With NVMe/RoCE becoming more available, giving DAS-like performance while massively reducing CPU usage on the hosts and potentially saving license costs (we are only in the pilot phase), does the ball swing back towards disaggregated?

A. I'm not sure I fully understand the question, but RDMA can be used to streamline the inter-node traffic across the HCI cluster. Network performance becomes more critical as the size of the cluster, and therefore the traffic between nodes, increases, and RDMA can reduce any network bottlenecks. RoCEv2 is popular, and some HCI solutions also support iWARP. Therefore, as HCI solutions adopt RDMA, this is not a driver to disaggregated.

Q. HCI was initially targeted at SMB and had difficulty scaling beyond 16 nodes. Why would HCI be the choice for large-scale enterprise implementations?

A. HCI has proven itself capable of running a broad range of workloads in small to large data center environments at this point. Each HCI solution can scale to different numbers of nodes, but usage data shows that single clusters rarely exceed about 12 nodes, and then users start a new cluster. There is a mix of reasons for this: concerns about the size of failure domains, departmental or remote site deployment size requirements, but often it's the software license fees for the applications running on the HCI infrastructure that limit the typical cluster sizes in practice.

Q. SPC (Storage Performance Council) benchmarks are still the gold standard (maybe?) and my understanding is they typically use an FC SAN. Is that changing? I understand that the underlying hardware is what determines performance, but I'm not aware of SPC benchmarks using anything other than SAN.

A. Myriad benchmarks are used to measure HCI performance across a cluster. I/O benchmarks that are variants of FIO are common for measuring storage performance, and compute performance is often measured using other benchmarks, such as TPC benchmarks for database performance, LoginVSI for VDI performance, etc.

Q. What is the current implementation mix ratio in the industry? What is the long-term projected mix ratio?

A. Today the enterprise is dominated by centralized storage, with HCI in second place and growing more rapidly. Large cloud service providers and hyperscalers are dominated by disaggregated storage, but also use some centralized storage, and some have their own customized HCI implementations for specific workloads. HPC and AI customers use a mix of disaggregated and centralized storage. In the long term, it's possible that disaggregated will have the largest overall share since cloud storage is growing the most, with centralized storage and HCI splitting the rest.

Q. Is the latency high for HCI vs. disaggregated vs. centralized?

A. It depends on the implementation. HCI and disaggregated might have slightly higher latency than centralized storage if they distribute writes across nodes before acknowledging them or if they must retrieve reads from multiple nodes. But HCI and disaggregated storage can also be implemented in a way that offers the same latency as centralized.

Q. What about GPUDirect?

A. GPUDirect Storage allows GPUs to access storage more directly to reduce latency. Currently it is supported by some types of centralized and disaggregated storage. In the future, it might be supported with HCI as well.
Q. Splitting so many hairs here. Each of the three storage types is more about HOW the storage is consumed by the user/application versus the actual architecture.

A. Yes, that is largely correct, but the storage architecture can also affect how it's consumed.

Q. Besides technical qualities, is there a financial differentiator between solutions? For example, OpEx and CapEx, ROI?

A. For very large-scale storage implementations, disaggregated generally has the lowest CapEx and OpEx because the higher initial cost of managing distributed storage software is amortized across many nodes and many terabytes. For medium to large implementations, centralized storage usually has the best CapEx and OpEx. For small to medium implementations, HCI usually has the lowest CapEx and OpEx because it's easy and fast to acquire and deploy. However, it always depends on the specific type of storage and the skill set or expertise of the team managing the storage.

Q. Why wouldn't disaggregating storage, compute and memory be the next trend? The hyperscalers have already done it. What are we waiting for?

A. Disaggregating compute is indeed happening, supported by VMs, containers, and faster network links. However, disaggregating memory across different physical machines is more challenging because even today's very fast network links have much higher latency than memory. For now, memory disaggregation is largely limited to being done "inside the box" or within one rack with links like PCIe, or to cases where the compute and memory stick together and are disaggregated as a unit.

Q. Storage lends itself as the first choice for disaggregation, as mentioned before. What about disaggregation of other resources (such as networking, GPU, memory) in the future, and how do you believe it will impact the selection of centralized vs. disaggregated storage? Will Ethernet stay the first choice of fabric for disaggregation?

A. See the above answer about disaggregating memory. Networking can be disaggregated within a rack by using a very low-latency fabric, for example PCIe, but usually networking is used to support disaggregation of other resources. GPUs can be disaggregated but normally still travel with some CPU and memory in the same box, though this could change in the near future. Ethernet will indeed remain the first networking choice for disaggregation, but other network types will also be used (InfiniBand, Fibre Channel, Ethernet with RDMA, etc.).

Don't forget to check out our other great storage debates, including: File vs. Block vs. Object Storage, Fibre Channel vs. iSCSI, FCoE vs. iSCSI vs. iSER, RoCE vs. iWARP, and Centralized vs. Distributed. You can view them all on our SNIAVideo YouTube Channel.


An Easy Path to Confidential Computing

Michael Hoard

Jul 6, 2021

To counter the ever-increasing likelihood of catastrophic disruption and cost due to enterprise IT security threats, data center decision makers need to be vigilant in protecting their organization's data. Confidential Computing is architected to provide security for data in use to meet this critical need for enterprises today. The next webcast in our Confidential Computing series is "How to Easily Deploy Confidential Computing." It will provide insight into how data center, cloud and edge applications may easily benefit from cost-effective, real-world Confidential Computing solutions. This educational discussion on July 28, 2021 will provide end-user examples, tips on how to assess systems before and after deployment, as well as key steps to complete along the journey to mitigate threat exposure. Presenting will be Steve Van Lare (Anjuna), Anand Kashyap (Fortanix), and Michael Hoard (Intel), who will discuss:
  • What would it take to build your own Confidential Computing solution?
  • The emergence of easily deployable, cost-effective Confidential Computing solutions
  • Real-world usage examples and key technical, business and investment insights
Hosted by the SNIA Cloud Storage Technologies Initiative (CSTI), this webinar is the grand finale of our three-part Confidential Computing series. Earlier we covered an introduction, "What is Confidential Computing and Why Should I Care?" and a look at how Confidential Computing works in multi-tenant cloud environments, "Confidential Computing: Protecting Data in Use." Please join us on July 28th for this exciting discussion.


Confidential Computing FAQ

Jim Fister

Jun 25, 2021

Recently, the SNIA Cloud Storage Technologies Initiative (CSTI) hosted a lively panel discussion, "What is Confidential Computing and Why Should I Care?" It was the first in a three-part series of Confidential Computing security discussions. You can learn about the series here. The webcast featured three experts who are working to define the Confidential Computing architecture: Mike Bursell of the Enarx Project, David Kaplan of AMD, and Ronald Perez of Intel. This session served as an introduction to the concept of Confidential Computing and examined the technology and its initial uses. The audience asked several interesting questions. We're answering some of the more basic questions here, as well as some that did not get addressed directly during the live event.

Q. What is Confidential Computing? How does it complement existing security efforts, such as the Trusted Platform Module (TPM)?

A. Confidential Computing is an architectural approach to security that uses virtualization to create a Trusted Execution Environment (TEE). This environment can run any amount of code within it, though the code placed in the protected environment is usually kept selective. This allows data to be completely protected, even from other code and data running in the system.

Q. Is Confidential Computing only for a CPU architecture?

A. The current architecture is focused on delivering this capability via the CPU, but nothing prevents other system components such as a GPU, FPGA, or the like from implementing a similar architecture.

Q. It was mentioned that with Confidential Computing, one only needs to trust their own code along with the hardware. With the prevalence of microarchitectural attacks that break down various isolation mechanisms, can the hardware really be trusted?

A. Most of the implementations used to create a TEE rely on fairly well-tested hardware and security infrastructure. As such, the threat profile is fairly low. However, any implementation in the market does need to ensure that it's following proper protocol to best protect data. An example would be ensuring that data in the TEE is only used or accessed there and is not passed to non-trusted execution areas.

Q. Are there potential pitfalls in the TEE implementations that might become security issues later, similar to speculative execution? Are there potential side-channel attacks using TEE?

A. No security solution is 100% secure and there is always a risk of vulnerabilities in any product. But perfect cannot be the enemy of good, and TEEs are a great defense-in-depth tool to provide an additional layer of isolation on top of existing security controls, making data that much more secure. Additionally, the recent trend has been to consider security much earlier in the design process and to perform targeted security testing to try to identify and mitigate issues as early as possible.

Q. Is this just a new technology, or is there a bigger value proposition? What's in it for the CISO or the CIO?

A. There are a variety of answers to this. One is that running a TEE in the cloud provides protection for vital workloads that otherwise would not be able to run on a shared system. Another benefit is that key secrets can be secured inside the TEE while much of the rest of the code runs at a lower privilege level, which helps with costs. In terms of many security initiatives, Confidential Computing might be one that is easier to explain to the management team.
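
As a thought experiment for the point above about securing only the key secrets, an application might be partitioned so that just the key-handling routine runs inside the TEE while the rest of the code stays at a lower privilege level. The sketch below uses invented enclave_* placeholders rather than any vendor SDK's real API; actual deployment tools hide most of this plumbing.

    #include <stddef.h>
    #include <stdint.h>

    /* Invented placeholders for whatever a real TEE SDK provides. */
    typedef struct tee_session tee_session;
    tee_session *enclave_launch(const char *signed_image);         /* load attested code       */
    int          enclave_call_sign(tee_session *s,                 /* key never leaves the TEE */
                                   const uint8_t *msg, size_t len,
                                   uint8_t signature[64]);
    void         enclave_shutdown(tee_session *s);

    /* Untrusted application code: it can request signatures, but it never
     * sees the private key, which lives only inside the trusted environment. */
    int sign_request(const uint8_t *msg, size_t len, uint8_t signature[64])
    {
        tee_session *s = enclave_launch("signer.enclave");
        if (!s)
            return -1;

        int rc = enclave_call_sign(s, msg, len, signature);
        enclave_shutdown(s);
        return rc;
    }
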
Q. Anybody have a guess at what a regulation/law might look like? A certification test analogous to FCC certification (obviously more complex)? Other approaches?

A. This technology is in response to the need for stronger security and privacy, which includes legal compliance with regulations being passed by states like California. But it has not taken the form of certifications at this time. Individual vendors will retain the necessary functions of their virtualization products and may consider security as one of the characteristics within their certification.

To hear the answers to all the questions that our esteemed panel addressed during the live event, please watch this session on-demand.
