Erin Farr

Sep 14, 2023

Confidential AI is a new collaborative platform for data and AI teams to work with sensitive data sets and run AI models in a confidential environment. It includes infrastructure, software, and workflow orchestration to create a secure, on-demand work environment that meets an organization's privacy requirements and complies with regulatory mandates. It's a topic the SNIA Cloud Storage Technologies Initiative (CSTI) covered in depth at our webinar, "The Rise in Confidential AI." At this webinar, our experts, Parviz Peiravi and Richard Searle, provided a deep and insightful look at how this dynamic technology works to ensure data protection and data privacy. Here are their answers to the questions from our webinar audience.

Q. Are businesses using Confidential AI today?
A. Absolutely. We have seen a big increase in the adoption of Confidential AI, particularly in industries such as financial services, healthcare, and government, where Confidential AI is helping these organizations enhance risk mitigation, including cybercrime prevention, anti-money laundering, fraud prevention, and more.

Q. With compute capabilities on the Edge increasing, how do you see Trusted Execution Environments evolving?
A. One of the important things about Confidential Computing is that although it's a discrete privacy-enhancing technology, it's part of the underlying, broader distributed data center compute hardware. However, the Edge is going to be increasingly important as we look ahead to things like 6G communication networks. We see a role for AI at the Edge in terms of things like signal processing and data quality evaluation, particularly in situations where the data is being sourced from different endpoints.

Q. Can you elaborate on attestation within a Trusted Execution Environment (TEE)?
A. One of the critical things about Confidential Computing is the need for an attested Trusted Execution Environment. In order to have the reassurance of confidentiality and the isolation and integrity guarantees that we spoke about during the webinar, attestation is the foundational truth of Confidential Computing and is absolutely necessary. In every secure implementation of Confidential AI, attestation provides the assurance that you're working in that protected memory region, that data and software instructions can be secured in memory, and that the AI workload itself is shielded from the other elements of the computing system. If you're starting with hardware-based technology, then you have the utmost security, removing the majority of actors outside the boundary of your trust. However, this also creates a level of isolation that you might not want to use for an application that doesn't need this high level of security. You must balance utmost security with your application's appetite for risk.
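To make the role of attestation concrete, below is a deliberately simplified, vendor-neutral sketch of the relying-party side of the flow: before releasing data or keys to an AI workload, verify that the evidence is fresh and that the reported measurement matches an approved value. Every name here is illustrative, and a real deployment would also verify a hardware-rooted signature chain over the evidence, which is omitted for brevity.

```python
import hashlib
import secrets

# Hash of a known-good workload image; a placeholder standing in for values
# published by whoever builds and approves the confidential AI workload.
KNOWN_GOOD_IMAGE = b"ai-model-server-v1.2"
APPROVED_MEASUREMENTS = {hashlib.sha256(KNOWN_GOOD_IMAGE).hexdigest()}

def request_evidence(nonce: str) -> dict:
    """Stand-in for asking the TEE for attestation evidence.

    A real system receives a signed quote/report rooted in hardware; this mock
    just returns a structure with the same shape.
    """
    running_image = b"ai-model-server-v1.2"
    return {
        "measurement": hashlib.sha256(running_image).hexdigest(),
        "nonce": nonce,
    }

def verify_evidence(evidence: dict, expected_nonce: str) -> bool:
    """Relying-party checks: freshness plus an approved measurement."""
    if evidence["nonce"] != expected_nonce:
        return False  # stale or replayed evidence
    return evidence["measurement"] in APPROVED_MEASUREMENTS

nonce = secrets.token_hex(16)        # freshness challenge
evidence = request_evidence(nonce)   # obtain (mock) evidence from the TEE
if verify_evidence(evidence, nonce):
    print("Attestation passed: provision keys and sensitive data.")
else:
    print("Attestation failed: withhold secrets.")
```

The point is the ordering: secrets and sensitive data are only provisioned after verification succeeds, which is what makes attestation the "foundational truth" described above.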
Q. What is your favorite reference for implementing Confidential Computing that bypasses the OS, BIOS, VMM (Virtual Machine Manager) and uses the root trust certificate?
A. It's important to know that there are different implementations of Trusted Execution Environments, and they are relevant to different purposes. For example, there are process-based TEEs that enable a very discrete definition of a TEE and provide the ability to write specific code and protect very sensitive information because of the isolation from things like the hypervisor and virtual machine manager. There are also technologies available now that have a virtualization basis and include a guest operating system within their trusted computing base; they provide greater flexibility in terms of implementation, so you might want to use those when you have a larger application or a more complex deployment. The Confidential Computing Consortium, which is part of The Linux Foundation, is also a good resource to keep up with Confidential AI guidance.

Q. Can you please give us a picture of the upcoming standards for strengthening security? Do you believe that the European Union's AI Act (EU AI Act) is going in the right direction and that it will have a positive impact on the industry?
A. That's a good question. The draft EU AI Act was approved in June 2023 by the European Parliament, but the UN Security Council has also put out a call for international regulation in the same way that we have treaties and conventions. We think what we're going to see is different nation states taking discrete approaches. The UK has taken an open approach to AI regulation in order to stimulate innovation. The EU already has a very prescriptive approach to data protection regulation, and the EU AI Act takes a similar approach: it's quite prescriptive and designed to complement the data privacy regulations that already exist.

Q. Where do you think some of the biggest data privacy issues are within generative AI?
A. There's quite a lot of debate already about how these massive generative AI systems have used data scraped from the web, whether things like copyright provisions have been acknowledged, and whether data privacy in imagery from social media has been respected. At an international level, it's going to be interesting to see whether people can agree on a cohesive framework to regulate AI and whether different countries can agree. There's also the issue of the time required to develop legislation being outpaced by technological developments; we saw how disruptive ChatGPT was last year. There are also ethical considerations around this topic, which the SNIA CSTI covered in the webinar "The Ethics of Artificial Intelligence."

Q. Are you optimistic that regulators can come to an agreement on generative AI?
A. In the last four or five years, regulators have become more open to working with financial institutions to better understand the impact of adopting new technologies such as AI and generative AI. This collaboration between regulators and the financial sector is creating momentum. Regulators such as the Monetary Authority of Singapore are leading this strategy, actively working with vendors to understand how the technology is applied within financial services and how to guide the rest of the banking industry.



How Edge Data is Impacting AI

Erin Farr

Sep 6, 2023


AI is disrupting so many domains and industries, and as it does, AI models and algorithms are becoming increasingly large and complex. This complexity is driven by the proliferation in size and diversity of localized data everywhere, which creates the need for a unified data fabric and/or federated learning. It could be argued that whoever wins the data race will win the AI race, a race inherently built on two premises: 1) data is available in a central location for AI to have full access to it, and 2) compute is centralized and abundant.

The impact of edge AI is the topic of our next SNIA Cloud Storage Technologies Initiative (CSTI) live webinar, "Why Distributed Edge Data is the Future of AI," on October 3, 2023. Centralized (or cloud-based) AI is a single superpower and super expert, whereas edge AI is a community of many smart wizards whose cumulative knowledge can rival that central superpower. In this webinar, our SNIA experts will discuss:

  • The value and use cases of distributed edge AI
  • How data fabric on the edge differs from the cloud and its impact on AI
  • Edge device data privacy trade-offs and distributed agency trends
  • Privacy mechanisms for federated learning, inference, and analytics
  • How interoperability between cloud and edge AI can happen

Register here to join us on October 3rd. Our experts will be ready to answer your questions.


Michael Hoard

Aug 21, 2023

Unification of structured and unstructured data has long been a goal, and a challenge, for organizations. Data fabric is an architecture, set of services, and platform that standardizes and integrates data across the enterprise regardless of data location (on-premises, cloud, multi-cloud, hybrid cloud), enabling self-service data access to support various applications, analytics, and use cases. The data fabric leaves data where it lives and applies intelligent automation to govern, secure, and bring AI to your data. How a data fabric abstraction layer works and the benefits it delivers was the topic of our recent SNIA Cloud Storage Technologies Initiative (CSTI) webinar, "Data Fabric: Connecting the Dots between Structured and Unstructured Data." If you missed it, you can watch it on-demand and access the presentation slides at the SNIA Educational Library. We did not have time to answer audience questions at the live session. Here are answers from our expert, Joseph Dain.

Q. What are some of the biggest challenges you have encountered when building this architecture?
A. The scale of unstructured data makes it challenging to build a catalog of this information. With structured data you may have thousands or hundreds of thousands of table assets, but with unstructured data you can have billions of files and objects that need to be tracked at massive scale. Another challenge is masking unstructured data. With structured data you have a well-defined schema, so it is easier to mask specific columns, but unstructured data has no such schema, so you need to be able to understand which terms need to be masked in an unstructured document and where those fields are located, without the luxury of a well-defined schema to guide you.

Q. There can be lots of data access requests from many users. How is this handled?
A. The data governance layer has two aspects that are leveraged to address this. The first is data privacy rules, which are automatically enforced during data access requests and are typically controlled at a group level. The second is the ability to create custom workflows with personas that enable users to initiate data access requests, which are then sent to the appropriate approvers.

Q. What are some of the next steps with this architecture?
A. One area of interest is leveraging computational storage to do the classification and profiling of data to identify aspects such as personally identifiable information (PII). In particular, profiling vast amounts of unstructured data for PII is a compute-, network-, storage-, and memory-intensive operation. By performing this profiling with computational storage close to the data, we gain efficiencies in the rate at which we can process data, with less resource consumption.

We continue to offer educational webinars on a wide range of cloud-related topics throughout the year. Please follow us @sniacloud_com to make sure you don't miss any.
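The PII profiling described in the last answer above can be pictured with a small example. This is a rough, hypothetical sketch using simple regular expressions on a local file; it is not the computational storage implementation discussed in the webinar, and the pattern set and file name are illustrative only.

```python
import re
from collections import Counter
from pathlib import Path

# Illustrative PII patterns; production profilers use far richer detection logic.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def profile_document(path: Path) -> Counter:
    """Count candidate PII occurrences in one unstructured document."""
    text = path.read_text(errors="ignore")
    counts = Counter()
    for label, pattern in PII_PATTERNS.items():
        counts[label] = len(pattern.findall(text))
    return counts

def mask_text(text: str) -> str:
    """Redact the same patterns in place, since there is no schema to point at."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

if __name__ == "__main__":
    sample = Path("sample_contract.txt")  # hypothetical input document
    if sample.exists():
        print(profile_document(sample))
        print(mask_text(sample.read_text(errors="ignore"))[:500])
```

Running this kind of loop over billions of files is exactly the compute-, network-, and memory-intensive work the answer describes; pushing it into computational storage mainly changes where the loop runs, close to the data rather than across the network.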


Erik Smith

Aug 7, 2023

NVMe®/TCP Q&A
The SNIA Networking Storage Forum (NSF) had an outstanding response to our live webinar, "NVMe/TCP: Performance, Deployment, and Automation." If you missed the session, you can watch it on-demand and download a copy of the presentation slides at the SNIA Educational Library. Our live audience gave the presentation a 4.9 rating on a scale of 1-5, and they asked a lot of detailed questions, which our presenter, Erik Smith, Vice Chair of SNIA NSF, has answered here.

Q: Does the Centralized Discovery Controller (CDC) layer also provide drive access control, or is it simply for discovery of drives visible on the network?
A: As defined in TP8010, the CDC only provides transport layer discovery. In other words, the CDC will allow a host to discover transport layer information (IP, Port, NQN) about the subsystem ports (on the array) that each host has been allowed to communicate with. Provisioning storage volumes to a particular host is additional functionality that could be added to an implementation of the CDC (e.g., Dell has a CDC implementation that we refer to as SmartFabric Storage Software (SFSS)).

Q: Can you provide some examples of companies that provide CDC and drive access control functionalities?
A: To the best of my knowledge, the only CDC implementation currently available is Dell's SFSS.

Q: You addressed the authentication piece of the security picture, but what about the other half, encryption? Are there encryption solutions available or in the works?
A: I was running out of time and flew through that section. Both authentication (DH-HMAC-CHAP) and secure channels (TLS 1.3) may be used per the specification. Dell does not support either of these yet, but we are working on it.

Q: I believe NVMe/Fibre Channel is widely deployed as well. Is that true?
A: Not based on what I'm seeing. NVMe/FC has been around for a while, it works well, and Dell does support it. However, adoption has been slow. Again, based on what I'm seeing, NVMe/TCP seems to be gaining more traction.

Q: Is nvme-stas an "in-box" solution, an EPEL solution, or a prototype solution?
A: It currently depends on the distro.
  • SLES 15 SP4 and SP5 - Inbox
  • RHEL 9.X - Inbox (Tech Preview) [RHEL 8.X: not available]
  • Ubuntu 22.04 - Universe (Community support)
Q: Regarding the slide comparing iSCSI, NVMe-oF, and FC speeds, how do these numbers compare to RDMA transport over Ethernet or InfiniBand (iSCSI Extensions for RDMA (iSER) or NVMe-oF RDMA)? Closer to the FC NVMe-oF numbers? Did you consider NVMe-oF RoCE, or is there not enough current or perceived future adoption? As a follow-on, do you see the same pitfalls with connectivity/hops as seen with FCoE?
A: When we first started looking at NVMe over fabrics, we spent quite a bit of time working with RoCE, iWARP, NVMe/TCP and NVMe/FC. Some of these test results were presented during a previous webinar, NVMe-oF: Looking Beyond Performance Hero Numbers. The RoCE performance numbers were actually amazing, especially at 100GbE, and were much better than anything else we looked at, with the exception of NVMe/TCP when hardware offload was used. The downsides to RoCE are described in the Hero Numbers webinar referenced above. But the short version is, the last time I worked with it, it was difficult to configure and troubleshoot. I know NVIDIA has done a lot of work to make this better recently, but I think most end users will eventually end up using NVMe/TCP for general purpose IP SAN connectivity to external storage.

Q: Can you have multiple CDCs, like in a tree, where you might have a CDC in an area of subnets that are segregated LAN-wise, but which would report to or be managed by a manager of CDCs, so that you could have one centralized CDC with a presence in each of the different storage networks that are accessible by the segregated servers?
A: Theoretically, yes. We have worked out the protocol details to provide this functionality. However, we could currently provide this functionality by providing a single CDC instance that has multiple network interfaces on it. We could then connect each interface to a different subnet. It would be a bit of work to configure, but it would get you out of needing to maintain multiple CDC instances.

Q: Does NVMe/TCP provide block-level or file-level access to the storage?
A: Block. More information can be found in the blog post titled Storage Protocol Stacks for NVMe.

Q: Which one will give the best performance: NVMe/TCP on 40G or NVMe/FC on 32G?
A: It's impossible to say without knowing the implementation we are talking about. I have also not seen any performance testing results for NVMe/TCP over 40GbE.

Q: OK, but creating two Ethernet fabrics for SAN A and SAN B goes against an ancient single-fabric network deployment standard. Besides, wouldn't this procedure require ripping out Fibre Channel and replacing it with Ethernet?
A: I agree. Air-gapped SAN A and SAN B using Ethernet does not go over very well with IP networking teams. A compromise could be to have the networking team allocate two VLANs (one for SAN A and the other for SAN B). This mostly side-steps the concerns I have. With regards to ripping out FC and replacing it with Ethernet, I think absolutely nobody will replace their existing FC SAN with an Ethernet-based one. It doesn't make sense from an economics perspective. However, I do think that as end users plan to deploy new applications or environments, using Ethernet as a substitute for FC would make sense. This is mainly because the provisioning process we defined for NVMe/TCP was based on the FC provisioning process, and this was done to allow legacy FC customers to move to Ethernet as painlessly as possible should they need to migrate off of FC.

Q: Can you share the scripts again that you used to connect?
A: Please refer to slide 47. The scripts are available here: https://github.com/dell/SANdbox/tree/main/Toolkit
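For readers who want to see what basic NVMe/TCP discovery and connection look like without the toolkit above, here is a minimal, hypothetical sketch that simply shells out to the open-source nvme-cli. The CDC address 10.10.23.2 is taken from the slide discussed just below, 8009 is the well-known NVMe-oF discovery service port for TCP, and the exact steps will differ on a distro where nvme-stas handles discovery automatically.

```python
import subprocess

CDC_ADDR = "10.10.23.2"    # CDC / discovery controller IP from the example slide
DISCOVERY_PORT = "8009"    # well-known NVMe-oF discovery service port for TCP

def run(cmd):
    """Run an nvme-cli command and print its output (requires root and nvme-cli)."""
    print("$", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)

# 1. Ask the discovery controller which subsystems this host is allowed to see.
run(["nvme", "discover", "-t", "tcp", "-a", CDC_ADDR, "-s", DISCOVERY_PORT])

# 2. Connect to everything returned in the discovery log page.
run(["nvme", "connect-all", "-t", "tcp", "-a", CDC_ADDR, "-s", DISCOVERY_PORT])

# 3. Confirm the NVMe/TCP namespaces now appear as block devices.
run(["nvme", "list"])
```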
Q: Any commitment from Microsoft for a Windows NVMe/TCP driver to be developed?
A: I can't comment on another company's product roadmap. I would highly recommend that you reach out to Microsoft directly.

Q: There is a typo in that slide: shouldn't 10.10.23.2 be 10.10.3.2?
A: 10.10.23.2 is the IP address of the CDC in that diagram. The "mDNS response" is telling the host that a CDC is available at 10.10.23.2.

Q: What is the difference between -1500 and -9000?
A: This is the MTU (Maximum Transmission Unit) size.

Q: When will TP-8010 be ratified?
A: It was ratified in February of 2022.

Q: Does the CDC sit at the end storage (end point) or in the fabric?
A: The CDC can theoretically reside anywhere. Dell's CDC implementation (SFSS) can currently be deployed as a VM (or on an EC2 instance in AWS). Longer term, you can expect to see SFSS running on a switch.

Q: In FC-NVMe it was 32Gb adapters. What was used for testing Ethernet/NVMe over TCP?
A: We used Intel E810 adapters that were set to 25GbE.

Q: Will a higher speed Ethernet adapter give better results for NVMe over TCP, as 100Gb Ethernet adapters are more broadly available and 128Gb FC is still not a ratified standard?
A: A higher speed Ethernet adapter will give better results for NVMe/TCP. A typical modern host should be able to drive a pair of 100GbE adapters to near line rate with NVMe/TCP IO. The problem is, attempting to do this would consume a lot of CPU and could negatively impact the amount of CPU left for applications/VMs unless offloads in the NIC are utilized to offset utilization. Also, the 128GFC standard was ratified earlier this year.

Q: Will the CDC be a separate device? An appliance?
A: The CDC currently runs as a VM on a server. We also expect CDCs to be deployed on a switch.

Q: What storage system was used for this testing?
A: The results were for Dell PowerStore. The testing results will vary depending on the storage platform being used.

Q: Slides 20-40: who are you expecting to do this configuration work, the server team, the network team, or the storage team?
A: These slides were intended to show the work that needs to be done, not which team needs to do it. That said, the fully automated solution could be driven by the storage admin with only minimal involvement from the networking and server teams.

Q: Are the CPU utilization results for the host or the array?
A: Host.

Q: What was the HBA card & Ethernet NIC used for the testing?
A: HBA = QLE2272. NIC = Intel E810.

Q: What were the FC HBA & NIC speeds?
A: The HBA was running at 32GFC. Ethernet was running at 25GbE.

Q: How do you approach multi-site redundancy or single-site redundancy?
A: Single-site redundancy can be accomplished by deploying more than one CDC instance and setting up a SAN A / SAN B type of configuration. Multi-site redundancy depends on the scope of the admin domain. If the admin domain spans both sites, then a single CDC instance could provide discovery services for both sites. If the admin domain is restricted to an admin domain per site, then it would currently require one CDC instance per site/admin domain.

Q: When a host admin decides to use the "direct connect" discovery approach instead of Centralized Discovery, what functionality is lost?
A: This configuration works fine up to a point (~tens of end points), but it results in full-mesh discovery, and this can lead to wasting resources on both the host and the storage.
Q: Are there also test results with larger / more regular block sizes?
A: Yes. Please see the Transport Performance Comparison white paper.

Q: Is auto discovery supported natively within, for example, VMware ESXi?
A: Yes. With ESXi 8.0u1, dynamic discovery is fully supported.

Q: So NVMe/TCP does support LAG, versus iSCSI which does not?
A: LAG can be supported for both NVMe/TCP and iSCSI. There are some limitations with ESXi, and these are described in the SFSS Deployment Guide.

Q: So NVMe/TCP does support routing?
A: Yes. I was showing how to automate the configuration of routing information that would typically need to be done on the host to support NVMe/TCP over L3.

Q: You are referring to Dell open source host software; do other vendors also have the same multipathing / storage path handling concept?
A: I do not believe there is another discovery client available right now. Dell has gone to great lengths to make sure that the discovery client can be used by any vendor.

Q: FC has moved to 64G as the new standard. Does this solution work well with mixed end-device speeds, as most environments have today, or was the testing conducted with all devices running the same NIC and storage speeds?
A: We've tested with mixed speeds and have not encountered any issues. That said, mixed speed configurations have not gotten anywhere near the amount of testing that homogeneous speed configurations have.

Q: Any reasoning on why a lower MTU produces better performance results than jumbo MTU on NVMe/TCP? This seems to go against the conventional thought process to enable 9K MTU when a SAN is involved.
A: That is a great question that I have not been able to answer up to this point. We also found it counterintuitive.

Q: Is the CDC a vendor thing or a protocol thing?
A: The CDC is defined in the NVM Express® standard. See TP8010 for more information.

Q: Do you see any issues with virtual block storage behind NVMe-oF? Specifically, ZFS zvols in my case vs. raw NVMe disks. Is this already something done with iSCSI?
A: As long as the application does not use SCSI commands (e.g., vendor-unique SCSI commands for array management purposes) to perform a specialized task, it will not know if the underlying storage volume is NVMe or SCSI based.

Q: In your IOPS comparison, was there significant hardware offload specific to NVMe/TCP or just general IP/TCP offload?
A: There were no HW offloads used for NVMe/TCP testing. It was all software based.

Q: Is IPv6 supported for NVMe/TCP? If so, is there any improvement in response times (on the same subnet)?
A: Yes, IPv6 is supported and it does not impact performance.

Q: The elephant in the room between Link Aggregation and Multipath is that only the latter actually aggregates the effective bandwidth between any two devices in a reliable manner...
A: I am not sure I would go that far, but I do agree they are both important and can be combined if both the network and storage teams want to make sure both cases are covered. I personally would be more inclined to use multipathing because I am more concerned about inadvertently causing a data unavailability (DU) event than about making sure I get the best possible performance.

Q: Effective performance is likely to be limited to only the bandwidth of a single link too... MPIO is the way to go.
A: I think this is heavily dependent on the workload, but I agree that multipathing is the best way to go overall if you have to choose one or the other.
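Since several of the questions above come down to multipathing, here is a small illustrative sketch, not something from the webinar, of how a Linux host's native NVMe multipathing can be inspected: the nvme_core module parameter reports whether native multipath is enabled, and nvme-cli's list-subsys shows each subsystem with all of its discovered paths.

```python
from pathlib import Path
import subprocess

# Native NVMe multipathing is a kernel feature; this parameter reads Y when it
# is enabled (otherwise dm-multipath would typically be used instead).
param = Path("/sys/module/nvme_core/parameters/multipath")
state = param.read_text().strip() if param.exists() else "nvme_core not loaded"
print("native NVMe multipath:", state)

# Show each NVMe subsystem with every path (one entry per fabric connection).
subprocess.run(["nvme", "list-subsys"])
```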
Q: You'd need Proxy ARP for the interface routes to work in this way, correct?
A: YES! And thank you for mentioning this. You do need proxy ARP enabled on the switches in order to bypass the routing table issue with the L3 IP SAN.

Q: Were the tests on 1500 byte frames? Can we do jumbo frames?
A: The test results included MTUs of 1500 and 9000.

Q: It seems like a lot of configuration steps for discovery. How does this compare to Fibre Channel in terms of complexity of configuration?
A: When we first started working on discovery automation for NVMe/TCP, we made a conscious decision to ensure that the user experience of provisioning storage to a host via NVMe/TCP was as close as possible to the process used to provision storage to a host over a FC SAN. We included concepts like a name server and zoning to make it as easy as possible for legacy FC customers to work with NVMe/TCP. I think we successfully met our goal.

Make sure you know about all of SNIA NSF's upcoming webinars by following us on Twitter @SNIANSF. The post NVMe®/TCP Q&A first appeared on SNIA on Network Storage.


Open Standards Featured at FMS 2023

SNIA CMS Community

Jul 31, 2023


SNIA welcomes colleagues to join them at the upcoming Flash Memory Summit, August 8-10, 2023 in Santa Clara, CA. SNIA is pleased to join standards organizations CXL Consortium™ (CXL™), PCI-SIG®, and Universal Chiplet Interconnect Express™ (UCIe™) in an Open Standards Pavilion, Booth #725, in the Exhibit Hall. CMSI will feature SNIA member companies in a computational storage cross-industry demo by Intel, MinIO, and Solidigm and a data filtering demo by ScaleFlux; a software memory tiering demo by VMware; a persistent memory workshop and hackathon; and the latest on the E1 and E3 SSD form factor work by the SNIA SFF TA Technical Work Group. SMI will showcase SNIA Swordfish® management of NVMe SSDs on Linux with demos by Intel, Samsung, and Solidigm. CXL will discuss their advances in coherent connectivity. PCI-SIG will feature their PCIe 5.0 (32GT/s) and PCIe 6.0 (64GT/s) architectures and industry adoption, along with the upcoming PCIe 7.0 specification development (128GT/s). UCIe will discuss their new open industry standard establishing a universal interconnect at the package level. The SNIA STA Forum will also be in Booth #849 – learn more about the SCSI Trade Association joining SNIA.

These demonstrations and discussions will augment FMS program sessions in the SNIA-sponsored System Architecture Track on memory, computational storage, CXL, and UCIe standards. A SNIA mainstage session on Wednesday, August 9 at 2:10 pm will discuss Trends in Storage and Data: New Directions for Industry Standards. SNIA colleagues and friends can receive a $100 discount off the 1-, 2-, or 3-day full conference registration by using code SNIA23. Visit snia.org/fms to learn more about the exciting activities at FMS 2023 and join us there! The post Open Standards Featured at FMS 2023 first appeared on SNIA Compute, Memory and Storage Blog.


So Just What Is An SSD?

Jonmichael Hands

Jul 19, 2023

It seems like an easy enough question, "What is an SSD?" but surprisingly, most of the search results for this get somewhat confused quickly on media, controllers, form factors, storage interfaces, performance, reliability, and different market segments. The SNIA SSD SIG has spent time demystifying various SSD topics like endurance, form factors, and the different classifications of SSDs, from consumer to enterprise and hyperscale SSDs.

"Solid state drive is a general term that covers many market segments, and the SNIA SSD SIG has developed a new overview of 'What is an SSD?'," said Jonmichael Hands, SNIA SSD Special Interest Group (SIG) Co-Chair. "We are committed to helping make storage technology topics, like endurance and form factors, much easier to understand coming straight from the industry experts defining the specifications."

The "What is an SSD?" page offers a concise description of what SSDs do, how they perform, and how they connect, and it also provides a jumping-off point for more in-depth clarification of the many aspects of SSDs. It joins an ever-growing category of 20 one-page "What Is?" answers that provide a clear, concise, vendor-neutral definition of often-asked technology terms, a description of what they are, and how each of these technologies works. Check out all the "What Is?" entries at https://www.snia.org/education/what-is

And don't miss other topics of interest from the SNIA SSD SIG, including the Total Cost of Ownership Model for Storage and SSD videos and presentations in the SNIA Educational Library. Your comments and feedback on this page are welcomed. Send them to askcmsi@snia.org. The post So just what is an SSD? first appeared on SNIA Compute, Memory and Storage Blog.


Your Questions Answered on Persistent Memory, CXL, and Memory Tiering

SNIA CMS Community

Jul 10, 2023

With the persistent memory ecosystem continuing to evolve with new interconnects like CXL™ and applications like memory tiering, our recent Persistent Memory, CXL, and Memory Tiering: Past, Present, and Future webinar was a big success. If you missed it, watch it on demand HERE! Many questions were answered live during the webinar, but we did not get to all of them. Our moderator Jim Handy from Objective Analysis, and experts Andy Rudoff and Bhushan Chithur from Intel, David McIntyre from Samsung, and Sudhir Balasubramanian and Arvind Jagannath from VMware have taken the time to answer them in this blog. Happy reading!

Q: What features or support is required from a CXL-capable endpoint, e.g. an accelerator, to support memory pooling? Any references?
A: You will have two interfaces, one for the primary memory accesses and one for the management of the pooling device. The primary memory interface is the .mem and the management interface will be via the .io or via a sideband interface. In addition, you will need to implement a robust failure recovery mechanism, since the blast radius is much larger with memory pooling.

Q: How do you recognize weak information security (in CXL)?
A: CXL has multiple features around security and there is considerable activity around this in the Consortium. For specifics, please see the CXL Specification or send us a more specific question.

Q: If the system (e.g. an x86 host) wants to deploy CXL memory (Type 3) now, is there any OS kernel configuration or BIOS configuration needed to make the hardware run with VMware (ESXi)? How easy or difficult is this setup process?
A: A simple CXL Type 3 Memory Device providing volatile memory is typically configured by the pre-boot environment and reported to the OS along with any other main memory. In this way, a platform that supports CXL Type 3 Memory can use it without any additional setup and can run an OS that contains no CXL support, and the memory will appear as memory belonging to another NUMA node. That said, using an OS that does support CXL enables more complex management, error handling, and more complex CXL devices.
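To see the "appears as another NUMA node" point in practice, here is a small sketch that lists the NUMA nodes a Linux kernel exposes, with their memory sizes and CPUs, by reading sysfs. This assumes a bare-metal Linux host (the question was about ESXi, which surfaces this differently) and is purely illustrative; on a platform configured as described above, CXL Type 3 capacity would typically show up as an additional, CPU-less node.

```python
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")

def list_numa_nodes():
    """Print each NUMA node's total memory and local CPUs from sysfs."""
    for node_dir in sorted(NODE_ROOT.glob("node[0-9]*")):
        meminfo = (node_dir / "meminfo").read_text()
        # Lines look like: "Node 1 MemTotal:  792723456 kB"
        total_kb = next(
            int(line.split()[3])
            for line in meminfo.splitlines()
            if "MemTotal" in line
        )
        cpus = (node_dir / "cpulist").read_text().strip()
        print(f"{node_dir.name}: {total_kb / 1048576:.1f} GiB, cpus: {cpus or 'none'}")

if __name__ == "__main__":
    list_numa_nodes()
```

A memory-only node (an empty cpulist) with substantial capacity is the usual signature of tiered or CXL-attached memory on such a system.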
Q: There was a question on "Hop" length. Would you clarify?
A: In the webinar, around minute 48, it was stated that a Hop was 20ns, but this is not correct. A Hop is often spoken of as "around 100ns." The Microsoft Azure Pond paper quantifies it four different ways, which range from 85ns to 280ns.

Q: Do we have any idea how much longer the latency will be?
A: The language CXL folks use is "Hops." An address going into CXL is one Hop, and data coming back is another. In a fabric it would be twice that, or four Hops. The latency for a Hop is somewhere around 100ns, although other latencies are accepted.

Q: For a memory semantic SSD: there appears to be a trend among 2LM device vendors to presume the host system will be capable of providing telemetry data for a device-side tiering mechanism to decide what data should be promoted and demoted. Meanwhile, software vendors seem to be focused on the devices providing telemetry for a host-side tiering mechanism to tell the device where to move the memory. What is your opinion on how and where tiering should be enforced for 2LM devices like a memory semantic SSD?
A: Tiering can be managed both by the host and within computational storage drives that could have an integrated compute function to manage local tiering; think edge applications.

Q: Regarding VM performance in tiering: it appears you're comparing the performance of 2 VMs against 1. It looked like the performance of each individual VM on the tiering system was slower than the DRAM-only VM. Can you explain why we should take the performance of 2 VMs against the 1 VM? Is the proposal that we otherwise would have required those 2 VMs to run on separate NUMA nodes, and now they're running on the same NUMA node?
A: Here the use case was lower TCO and increased memory capacity, along with aggregate VM performance, versus running fewer VMs on DRAM alone. In this use case, the DRAM per NUMA node was 384GB, the Tier2 memory per NUMA node was 768GB, and the VM RAM was 256GB. In the DRAM-only case, if we have to run business-critical workloads, e.g., Oracle with a VM RAM of 256GB, we could only run 1 VM (256GB) per NUMA node (DRAM of 384GB); we cannot over-provision memory in the DRAM-only case, as every NUMA node has 384GB only. So potentially we could run 4 such VMs (VM RAM of 256GB) with NUMA node affinity set, as we did in this use case, or maybe 5 such VMs without completely maxing out the server RAM if we don't set NUMA node affinity. Remember, we set NUMA node affinity in this use case to eliminate any cross-NUMA latency. Now with Tier2 memory in the mix, each NUMA node has 384GB DRAM and 768GB Tier2 memory, so theoretically one could run 16-17 such VMs (VM RAM of 256GB). Hence we are able to increase resource maximization, run more workloads, increase transactions, and so on, which means lower TCO, increased capacity, and aggregate performance improvement.

Q: CXL is changing very fast; we have had 3 protocol versions in 2 years. As a new consumer of CXL, what are the top 3 advantages of adopting CXL right away versus waiting a couple more years?
A: All versions of CXL are backward compatible. Users should have no problem using today's CXL devices with newer versions of CXL, although they won't be able to take advantage of any new features that are introduced after the hardware is deployed.

Q: (What is the) ideal when using Agilex FPGAs as accelerators?
A: CXL 3.0 supports multiple accelerators via the CXL switching fabric. This is good for memory sharing across heterogeneous compute accelerators, including FPGAs.

Thanks again for your support of SNIA education, and we invite you to write askcmsi@snia.org with your ideas for future webinars and blogs! The post Your Questions Answered on Persistent Memory, CXL, and Memory Tiering first appeared on SNIA Compute, Memory and Storage Blog.

