Michael Hoard

Aug 21, 2023

Unification of structured and unstructured data has long been a goal, and a challenge, for organizations. A data fabric is an architecture, set of services, and platform that standardizes and integrates data across the enterprise regardless of where the data lives (on-premises, cloud, multi-cloud, or hybrid cloud), enabling self-service data access to support a wide range of applications, analytics, and use cases. The data fabric leaves data where it lives and applies intelligent automation to govern, secure, and bring AI to your data. How a data fabric abstraction layer works and the benefits it delivers was the topic of our recent SNIA Cloud Storage Technologies Initiative (CSTI) webinar, “Data Fabric: Connecting the Dots between Structured and Unstructured Data.” If you missed it, you can watch it on-demand and access the presentation slides at the SNIA Educational Library. We did not have time to answer audience questions at the live session. Here are answers from our expert, Joseph Dain.

Q. What are some of the biggest challenges you have encountered when building this architecture?

A. The scale of unstructured data makes it challenging to build a catalog of this information. With structured data you may have thousands or hundreds of thousands of table assets, but with unstructured data you can have billions of files and objects that need to be tracked at massive scale. Another challenge is masking unstructured data. With structured data you have a well-defined schema, so it is easier to mask specific columns, but unstructured data has no such schema, so you need to understand which terms need to be masked in an unstructured document and locate those fields without the luxury of a well-defined schema to guide you.

Q. There can be lots of data access requests from many users. How is this handled?

A. The data governance layer has two aspects that are leveraged to address this. The first is data privacy rules, which are automatically enforced during data access requests and are typically controlled at a group level. The second is the ability to create custom workflows with personas that enable users to initiate data access requests, which are sent to the appropriate approvers.

Q. What are some of the next steps with this architecture?

A. One area of interest is leveraging computational storage to do the classification and profiling of data to identify aspects such as personally identifiable information (PII). In particular, profiling vast amounts of unstructured data for PII is a compute-, network-, storage-, and memory-intensive operation. By performing this profiling with computational storage close to the data, we gain efficiencies in the rate at which we can process data with less resource consumption.

We continue to offer educational webinars on a wide range of cloud-related topics throughout the year. Please follow us @sniacloud_com to make sure you don’t miss any.
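The masking challenge described above, finding and redacting sensitive terms in documents that have no schema, can be illustrated with a small sketch. The pattern names and regular expressions below are illustrative assumptions only, not part of any particular data fabric product; real deployments typically combine pattern matching with trained classifiers and catalog-driven policies.

```python
import re

# Illustrative PII patterns (assumptions for this sketch, not an exhaustive or
# production-grade set): email addresses and US-style Social Security numbers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_unstructured_text(text: str) -> str:
    """Replace anything matching a PII pattern with a labeled placeholder.

    Unlike column-level masking of a structured table, the sensitive terms can
    appear anywhere in the document, so every pattern is scanned over the full text.
    """
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

if __name__ == "__main__":
    doc = "Contact Jane at jane.doe@example.com; SSN on file: 123-45-6789."
    print(mask_unstructured_text(doc))
    # -> Contact Jane at [EMAIL REDACTED]; SSN on file: [SSN REDACTED].
```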


Erik Smith

Aug 7, 2023

NVMe®/TCP Q&A
The SNIA Networking Storage Forum (NSF) had an outstanding response to our live webinar, “NVMe/TCP: Performance, Deployment, and Automation.” If you missed the session, you can watch it on-demand and download a copy of the presentation slides at the SNIA Educational Library. Our live audience gave the presentation a 4.9 rating on a scale of 1-5, and they asked a lot of detailed questions, which our presenter, Erik Smith, Vice Chair of SNIA NSF, has answered here.

Q: Does the Centralized Discovery Controller (CDC) layer also provide drive access control, or is it simply for discovery of drives visible on the network?

A: As defined in TP8010, the CDC only provides transport layer discovery. In other words, the CDC will allow a host to discover transport layer information (IP, Port, NQN) about the subsystem ports (on the array) that each host has been allowed to communicate with. Provisioning storage volumes to a particular host is additional functionality that could be added to an implementation of the CDC (e.g., Dell has a CDC implementation that we refer to as SmartFabric Storage Software (SFSS)).

Q: Can you provide some examples of companies that provide CDC and drive access control functionalities?

A: To the best of my knowledge, the only CDC implementation currently available is Dell’s SFSS.

Q: You addressed the authentication piece of the security picture, but what about the other half, encryption? Are there encryption solutions available or in the works?

A: I was running out of time and flew through that section. Both authentication (DH-HMAC-CHAP) and secure channels (TLS 1.3) may be used per the specification. Dell does not support either of these yet, but we are working on it.

Q: I believe NVMe/Fibre Channel is widely deployed as well. Is that true?

A: Not based on what I’m seeing. NVMe/FC has been around for a while, it works well, and Dell does support it. However, adoption has been slow. Again, based on what I’m seeing, NVMe/TCP seems to be gaining more traction.

Q: Is nvme-stas an “in-box” solution, an EPEL solution, or a prototype solution?

A: It currently depends on the distro:
  • SLES 15 SP4 and SP5 - Inbox
  • RHEL 9.X - Inbox (Tech Preview) [RHEL 8.X: not available]
  • Ubuntu 22.04 - Universe (Community support)
Q: Regarding the slide comparing iSCSI, NVMe-oF, and FC speeds, how do these numbers compare to RDMA transports over Ethernet or InfiniBand (iSCSI Extensions for RDMA (iSER) or NVMe-oF RDMA)? Closer to the FC NVMe-oF numbers? Did you consider NVMe-oF RoCE, or is there not enough current or perceived future adoption? As a follow-on, do you see the same pitfalls with connectivity/hops as seen with FCoE?

A: When we first started looking at NVMe over Fabrics, we spent quite a bit of time working with RoCE, iWARP, NVMe/TCP, and NVMe/FC. Some of those test results were presented during a previous webinar, “NVMe-oF: Looking Beyond Performance Hero Numbers.” The RoCE performance numbers were actually amazing, especially at 100GbE, and were much better than anything else we looked at, with the exception of NVMe/TCP when hardware offload was used. The downsides to RoCE are described in the Hero Numbers webinar referenced above, but the short version is that the last time I worked with it, it was difficult to configure and troubleshoot. I know NVIDIA has done a lot of work to make this better recently, but I think most end users will eventually end up using NVMe/TCP for general-purpose IP SAN connectivity to external storage.

Q: Can you have multiple CDCs, like in a tree, where you might have a CDC in an area of subnets that are segregated LAN-wise, but which would report to or be managed by a CDC manager, so that you could have one centralized “CDC” with a presence in each of the different storage networks accessible by the segregated servers?

A: Theoretically, yes. We have worked out the protocol details to provide this functionality. However, we could currently provide this functionality with a single CDC instance that has multiple network interfaces, connecting each interface to a different subnet. It would be a bit of work to configure, but it would get you out of needing to maintain multiple CDC instances.

Q: Does NVMe/TCP provide block-level or file-level access to the storage?

A: Block. More information can be found in the blog post titled “Storage Protocol Stacks for NVMe.”

Q: Which one will give the best performance: NVMe/TCP on 40GbE or NVMe/FC on 32GFC?

A: It’s impossible to say without knowing the implementations we are talking about. I have also not seen any performance testing results for NVMe/TCP over 40GbE.

Q: OK, but creating two Ethernet fabrics for SAN A and SAN B goes against a long-standing single-fabric network deployment standard. Besides, wouldn’t this approach require ripping out Fibre Channel and replacing it with Ethernet?

A: I agree. Air-gapped SAN A and SAN B using Ethernet does not go over very well with IP networking teams. A compromise could be to have the networking team allocate two VLANs (one for SAN A and the other for SAN B). This mostly side-steps the concerns I have. With regards to ripping out FC and replacing it with Ethernet, I think absolutely nobody will replace their existing FC SAN with an Ethernet-based one. It doesn’t make sense from an economics perspective. However, I do think that as end users plan to deploy new applications or environments, using Ethernet as a substitute for FC would make sense. This is mainly because the provisioning process we defined for NVMe/TCP was based on the FC provisioning process, and this was done to allow legacy FC customers to move to Ethernet as painlessly as possible should they need to migrate off of FC.

Q: Can you share the scripts again that you used to connect?

A: Please refer to slide 47. The scripts are available here: https://github.com/dell/SANdbox/tree/main/Toolkit

Q: Any commitment from Microsoft for a Windows NVMe/TCP driver to be developed?

A: I can’t comment on another company’s product roadmap. I would highly recommend that you reach out to Microsoft directly.

Q: There is a typo in that slide: shouldn’t 10.10.23.2 be 10.10.3.2?

A: 10.10.23.2 is the IP address of the CDC in that diagram. The “mDNS response” is telling the host that a CDC is available at 10.10.23.2.

Q: What is the difference between -1500 and -9000?

A: This is the MTU (Maximum Transmission Unit) size.

Q: When will TP-8010 be ratified?

A: It was ratified in February of 2022.

Q: Does the CDC sit at the end storage (end point) or in the fabric?

A: The CDC can theoretically reside anywhere. Dell’s CDC implementation (SFSS) can currently be deployed as a VM (or on an EC2 instance in AWS). Longer term, you can expect to see SFSS running on a switch.

Q: In FC-NVMe it was 32Gb adapters. What was used for testing Ethernet/NVMe over TCP?

A: We used Intel E810 adapters that were set to 25GbE.

Q: Will a higher speed Ethernet adapter give better results for NVMe over TCP, since 100Gb Ethernet adapters are more broadly available and 128Gb FC is still not a ratified standard?

A: A higher speed Ethernet adapter will give better results for NVMe/TCP. A typical modern host should be able to drive a pair of 100GbE adapters to near line rate with NVMe/TCP I/O. The problem is that attempting to do this would consume a lot of CPU and could negatively impact the amount of CPU left for applications/VMs, unless offloads in the NIC are utilized to offset utilization. Also, the 128GFC standard was ratified earlier this year.

Q: Will the CDC be a separate device? An appliance?

A: The CDC currently runs as a VM on a server. We also expect CDCs to be deployed on a switch.

Q: What storage system was used for this testing?

A: The results were for Dell PowerStore. The testing results will vary depending on the storage platform being used.

Q: Slides 20-40: Who are you expecting to do this configuration work, the server team, the network team, or the storage team?

A: These slides were intended to show the work that needs to be done, not which team needs to do it. That said, the fully automated solution could be driven by the storage admin with only minimal involvement from the networking and server teams.

Q: Are the CPU utilization results for the host or the array?

A: Host.

Q: What HBA card and Ethernet NIC were used for the testing?

A: HBA = QLE2272. NIC = Intel E810.

Q: What were the FC HBA and NIC speeds?

A: The HBA was running at 32GFC. Ethernet was running at 25GbE.

Q: How do you approach multi-site redundancy or single-site redundancy?

A: Single-site redundancy can be accomplished by deploying more than one CDC instance and setting up a SAN A / SAN B type of configuration. Multi-site redundancy depends on the scope of the admin domain. If the admin domain spans both sites, then a single CDC instance could provide discovery services for both sites. If the scope is restricted to an admin domain per site, then it would currently require one CDC instance per site/admin domain.

Q: When a host admin decides to use the “direct connect” discovery approach instead of Centralized Discovery, what functionality is lost?

A: This configuration works fine up to a point (roughly tens of end points), but it results in full-mesh discovery, and this can lead to wasted resources on both the host and the storage.
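Several of the answers above refer to host-side discovery and connection steps (for example, the scripts on slide 47). As a rough, hedged sketch of what those steps can look like from a Linux host, the snippet below drives nvme-cli from Python; the addresses, port number, and values are placeholders, and flag behavior may differ slightly between nvme-cli versions, so treat it as an outline rather than a drop-in tool.

```python
import subprocess

# Placeholder values for this sketch only; substitute your own environment.
DISCOVERY_IP = "10.10.23.2"   # e.g., the CDC or a direct discovery controller
DISCOVERY_PORT = "8009"       # NVMe/TCP discovery service port

def run(cmd):
    """Run an nvme-cli command and return its stdout (raises on failure)."""
    print("+", " ".join(cmd))
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1. Ask the discovery controller which subsystems this host may reach.
discovery_log = run(["nvme", "discover", "-t", "tcp",
                     "-a", DISCOVERY_IP, "-s", DISCOVERY_PORT])
print(discovery_log)

# 2. Connect to everything returned in the discovery log page.
#    (Alternatively, use "nvme connect" with an explicit subsystem NQN.)
run(["nvme", "connect-all", "-t", "tcp",
     "-a", DISCOVERY_IP, "-s", DISCOVERY_PORT])

# 3. Confirm the new namespaces are visible as block devices.
print(run(["nvme", "list"]))
```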
Q: Are there also test results with larger / more typical block sizes?

A: Yes. Please see the Transport Performance Comparison white paper.

Q: Is auto discovery supported natively within, for example, VMware ESXi?

A: Yes. With ESXi 8.0u1, dynamic discovery is fully supported.

Q: So NVMe/TCP does support LAG, versus iSCSI which does not?

A: LAG can be supported for both NVMe/TCP and iSCSI. There are some limitations with ESXi, and these are described in the SFSS Deployment Guide.

Q: So NVMe/TCP does support routing?

A: Yes. I was showing how to automate the configuration of routing information that would typically need to be done on the host to support NVMe/TCP over L3.

Q: You are referring to Dell open source host software; do other vendors also have the same multipathing / storage path handling concept?

A: I do not believe there is another discovery client available right now. Dell has gone to great lengths to make sure that the discovery client can be used by any vendor.

Q: FC has moved to 64G as the new standard. Does this solution work well with mixed end-device speeds, as most environments have today, or was the testing conducted with all devices running the same NIC and storage speeds?

A: We’ve tested with mixed speeds and have not encountered any issues. That said, mixed-speed configurations have not gotten anywhere near the amount of testing that homogeneous-speed configurations have.

Q: Any reasoning on why a lower MTU produces better performance results than jumbo MTU on NVMe/TCP? This seems to go against the conventional thought process of enabling 9K MTU when a SAN is involved.

A: That is a great question that I have not been able to answer up to this point. We also found it counterintuitive.

Q: Is the CDC a vendor thing or a protocol thing?

A: The CDC is defined in the NVM Express® standard. See TP8010 for more information.

Q: Do you see any issues with virtual block storage behind NVMe-oF? Specifically, ZFS zvols in my case vs. raw NVMe disks. Is this already something done with iSCSI?

A: As long as the application does not use SCSI commands (e.g., vendor-unique SCSI commands for array management purposes) to perform a specialized task, it will not know whether the underlying storage volume is NVMe or SCSI based.

Q: In your IOPS comparison, was there significant hardware offload specific to NVMe/TCP or just general IP/TCP offload?

A: There were no hardware offloads used for the NVMe/TCP testing. It was all software based.

Q: Is IPv6 supported for NVMe/TCP? If so, is there any improvement in response times (on the same subnet)?

A: Yes, IPv6 is supported, and it does not impact performance.

Q: The elephant in the room between link aggregation and multipathing is that only the latter actually aggregates the effective bandwidth between any two devices in a reliable manner...

A: I am not sure I would go that far, but I do agree they are both important and can be combined if both the network and storage teams want to make sure both cases are covered. I personally would be more inclined to use multipathing because I am more concerned about inadvertently causing a data unavailability (DU) event than about making sure I get the best possible performance.

Q: Effective performance is likely to be limited to only the bandwidth of a single link too... MPIO is the way to go.

A: I think this is heavily dependent on the workload, but I agree that multipathing is the best way to go overall if you have to choose one or the other.

Q: You'd need proxy ARP for the interface routes to work in this way, correct?

A: Yes! And thank you for mentioning this. You do need proxy ARP enabled on the switches in order to bypass the routing table issue with the L3 IP SAN.

Q: Were the tests on 1500-byte frames? Can we do jumbo frames?

A: The test results included MTUs of 1500 and 9000.

Q: It seems like a lot of configuration steps for discovery. How does this compare to Fibre Channel in terms of complexity of configuration?

A: When we first started working on discovery automation for NVMe/TCP, we made a conscious decision to ensure that the user experience of provisioning storage to a host via NVMe/TCP was as close as possible to the process used to provision storage to a host over an FC SAN. We included concepts like a name server and zoning to make it as easy as possible for legacy FC customers to work with NVMe/TCP. I think we successfully met our goal.

Make sure you know about all of SNIA NSF's upcoming webinars by following us on Twitter @SNIANSF.




Open Standards Featured at FMS 2023

SNIA CMS Community

Jul 31, 2023


SNIA welcomes colleagues to join them at the upcoming Flash Memory Summit, August 8-10, 2023 in Santa Clara, CA. SNIA is pleased to join standards organizations CXL Consortium™ (CXL™), PCI-SIG®, and Universal Chiplet Interconnect Express™ (UCIe™) in an Open Standards Pavilion, Booth #725, in the Exhibit Hall. CMSI will feature SNIA member companies in a cross-industry computational storage demo by Intel, MinIO, and Solidigm; a data filtering demo by ScaleFlux; a software memory tiering demo by VMware; a persistent memory workshop and hackathon; and the latest on the E1 and E3 SSD form factor work by the SNIA SFF TA Technical Work Group. SMI will showcase SNIA Swordfish® management of NVMe SSDs on Linux with demos by Intel, Samsung, and Solidigm. CXL will discuss their advances in coherent connectivity. PCI-SIG will feature their PCIe 5.0 (32GT/s) and PCIe 6.0 (64GT/s) architectures, industry adoption, and the development of the upcoming PCIe 7.0 specification (128GT/s). UCIe will discuss their new open industry standard establishing a universal interconnect at the package level. The SNIA STA Forum will also be in Booth #849; learn more about the SCSI Trade Association joining SNIA. These demonstrations and discussions will augment FMS program sessions in the SNIA-sponsored System Architecture Track on memory, computational storage, CXL, and UCIe standards. A SNIA mainstage session on Wednesday, August 9 at 2:10 pm will discuss “Trends in Storage and Data: New Directions for Industry Standards.” SNIA colleagues and friends can receive a $100 discount off the 1-, 2-, or 3-day full conference registration by using code SNIA23. Visit snia.org/fms to learn more about the exciting activities at FMS 2023 and join us there!


So Just What Is An SSD?

Jonmichael Hands

Jul 19, 2023

It seems like an easy enough question, “What is an SSD?” but surprisingly, most of the search results for this get somewhat confused quickly on media, controllers, form factors, storage interfaces, performance, reliability, and different market segments. The SNIA SSD SIG has spent time demystifying various SSD topics like endurance, form factors, and the different classifications of SSDs, from consumer to enterprise and hyperscale SSDs. “Solid state drive is a general term that covers many market segments, and the SNIA SSD SIG has developed a new overview of ‘What Is an SSD?’,” said Jonmichael Hands, SNIA SSD Special Interest Group (SIG) Co-Chair. “We are committed to helping make storage technology topics, like endurance and form factors, much easier to understand, coming straight from the industry experts defining the specifications.” The “What Is an SSD?” page offers a concise description of what SSDs do, how they perform, and how they connect, and also provides a jumping-off point for more in-depth clarification of the many aspects of SSDs. It joins an ever-growing category of 20 one-page “What Is?” answers that provide a clear, concise, vendor-neutral definition of often-asked technology terms, a description of what they are, and how each of these technologies works. Check out all the “What Is?” entries at https://www.snia.org/education/what-is. And don’t miss other topics of interest from the SNIA SSD SIG, including the Total Cost of Ownership Model for Storage, and SSD videos and presentations in the SNIA Educational Library. Your comments and feedback on this page are welcome. Send them to askcmsi@snia.org.


Your Questions Answered on Persistent Memory, CXL, and Memory Tiering

SNIA CMS Community

Jul 10, 2023

With the persistent memory ecosystem continuing to evolve with new interconnects like CXL™ and applications like memory tiering, our recent Persistent Memory, CXL, and Memory Tiering: Past, Present, and Future webinar was a big success. If you missed it, watch it on demand HERE! Many questions were answered live during the webinar, but we did not get to all of them. Our moderator Jim Handy from Objective Analysis, and experts Andy Rudoff and Bhushan Chithur from Intel, David McIntyre from Samsung, and Sudhir Balasubramanian and Arvind Jagannath from VMware have taken the time to answer them in this blog. Happy reading!

Q: What features or support are required from a CXL-capable endpoint, e.g. an accelerator, to support memory pooling? Any references?

A: You will have two interfaces, one for the primary memory accesses and one for the management of the pooling device. The primary memory interface is the .mem, and the management interface will be via the .io or via a sideband interface. In addition, you will need to implement a robust failure recovery mechanism, since the blast radius is much larger with memory pooling.

Q: How do you recognize weak information security (in CXL)?

A: CXL has multiple features around security, and there is considerable activity around this in the Consortium. For specifics, please see the CXL Specification or send us a more specific question.

Q: If the system (e.g. an x86 host) wants to deploy CXL memory (Type 3) now, is there any OS kernel or BIOS configuration needed to make the hardware run with VMware (ESXi)? How easy or difficult is this setup process?

A: A simple CXL Type 3 Memory Device providing volatile memory is typically configured by the pre-boot environment and reported to the OS along with any other main memory. In this way, a platform that supports CXL Type 3 Memory can use it without any additional setup and can run an OS that contains no CXL support; the memory will appear as memory belonging to another NUMA node. That said, using an OS that does support CXL enables more complex management, error handling, and more complex CXL devices.

Q: There was a question on “Hop” length. Would you clarify?

A: In the webinar around minute 48, it was stated that a Hop was 20ns, but this is not correct. A Hop is often spoken of as “around 100ns.” The Microsoft Azure Pond paper quantifies it four different ways, which range from 85ns to 280ns.

Q: Do we have any idea how much longer the latency will be?

A: The language CXL folks use is “Hops.” An address going into CXL is one Hop, and data coming back is another. In a fabric it would be twice that, or four Hops. The latency for a Hop is somewhere around 100ns, although other latencies are accepted.

Q: For a memory semantic SSD: there appears to be a trend among 2LM device vendors to presume the host system will be capable of providing telemetry data for a device-side tiering mechanism to decide what data should be promoted and demoted. Meanwhile, software vendors seem to be focused on the devices providing telemetry for a host-side tiering mechanism to tell the device where to move the memory. What is your opinion on how and where tiering should be enforced for 2LM devices like a memory semantic SSD?

A: Tiering can be managed both by the host and within computational storage drives that could have an integrated compute function to manage local tiering; think edge applications.

Q: Regarding VM performance in tiering: it appears you’re comparing the performance of two VMs against one. It looked like the performance of each individual VM on the tiering system was slower than the DRAM-only VM. Can you explain why we should take the performance of two VMs against the one VM? Is the proposal that we otherwise would have required those two VMs to run on separate NUMA nodes, and now they’re running on the same NUMA node?

A: Here the use case was lower TCO and increased memory capacity, along with the aggregate performance of many VMs, versus running a few VMs on DRAM only. In this use case, the DRAM per NUMA node was 384GB, the Tier2 memory per NUMA node was 768GB, and the VM RAM was 256GB. In the DRAM-only case, if we have to run business-critical workloads (e.g., Oracle with VM RAM = 256GB), we could only run one VM (256GB) per NUMA node (DRAM = 384GB); we cannot over-provision memory in the DRAM-only case because every NUMA node has 384GB only. So potentially we could run four such VMs (VM RAM = 256GB) with NUMA node affinity set, as we did in this use case, or, without NUMA node affinity, maybe five such VMs without completely maxing out the server RAM. Remember, we used NUMA node affinity in this use case to eliminate any cross-NUMA latency. Now with Tier2 memory in the mix, each NUMA node has 384GB DRAM and 768GB Tier2 memory, so theoretically one could run 16-17 such VMs (VM RAM = 256GB). Hence we are able to increase resource maximization, run more workloads, increase transactions, etc., so we get lower TCO, increased capacity, and an aggregate performance improvement.

Q: CXL is changing very fast; we have had three protocol versions in two years. As a new consumer of CXL, what are the top three advantages of adopting CXL right away versus waiting a couple more years?

A: All versions of CXL are backward compatible. Users should have no problem using today’s CXL devices with newer versions of CXL, although they won’t be able to take advantage of any new features that are introduced after the hardware is deployed.

Q: What is ideal when using Agilex FPGAs as accelerators?

A: CXL 3.0 supports multiple accelerators via the CXL switching fabric. This is good for memory sharing across heterogeneous compute accelerators, including FPGAs.

Thanks again for your support of SNIA education, and we invite you to write askcmsi@snia.org with your ideas for future webinars and blogs!
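To make the capacity arithmetic in the tiering answer above concrete, here is a small sketch. The four-NUMA-node server size is an assumption inferred from the numbers quoted (four VMs in the DRAM-only case at one VM per node), not something stated explicitly in the webinar; with per-node affinity the math gives 16 VMs, and relaxing affinity would allow up to 18, consistent with the "16-17" figure quoted.

```python
# Capacity figures quoted in the answer above.
DRAM_PER_NODE_GB = 384
TIER2_PER_NODE_GB = 768
VM_RAM_GB = 256
NUM_NUMA_NODES = 4   # assumption: implied by "four such VMs" at 1 VM per node

# DRAM-only with NUMA affinity: memory cannot be over-provisioned,
# so each node fits floor(384 / 256) = 1 VM.
vms_dram_only = NUM_NUMA_NODES * (DRAM_PER_NODE_GB // VM_RAM_GB)

# DRAM + Tier2: each node now exposes 384 + 768 = 1152 GB,
# i.e. 4 VMs per node with affinity, 16 VMs per server.
vms_tiered = NUM_NUMA_NODES * ((DRAM_PER_NODE_GB + TIER2_PER_NODE_GB) // VM_RAM_GB)

print(f"DRAM only : {vms_dram_only} VMs")   # -> 4
print(f"With Tier2: {vms_tiered} VMs")      # -> 16
```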


Considerations and Options for NVMe/TCP Deployment

David McIntyre

Jun 16, 2023

NVMe®/TCP has gained a lot of attention over the last several years due to its great performance characteristics and relatively low cost. Since its ratification in 2018, the NVMe/TCP protocol has been enhanced to add features such as discovery automation, authentication, and secure channels that make it more suitable for use in enterprise environments. Now, as organizations evaluate their options and consider adopting NVMe/TCP for use in their environment, many find they need a bit more information before deciding how to move forward. That’s why the SNIA Networking Storage Forum (NSF) is hosting a live webinar on July 19, 2023, “NVMe/TCP: Performance, Deployment and Automation,” where we will provide an overview of deployment considerations and options, and answer questions such as:
  • How does NVMe/TCP stack up against my existing block storage protocol of choice in terms of performance?
  • Should I use a dedicated storage network when deploying NVMe/TCP or is a converged network ok?
  • How can I automate interaction with my IP-Based SAN?
Register today for an open discussion on these questions as well as answers to questions that you may have about your environment. We look forward to seeing you on July 19th.




Connector and Cable differences: SAS-3 vs. SAS-4

STA Forum

Jun 14, 2023


By: David Einhorn, SCSI Trade Association Board of Directors; Business Development Manager, North America, Amphenol Corp., June 14, 2023

This blog post examines the differences between SAS-3 and SAS-4 connectors and cables. With the new generation of SAS, we see multiple upgrades and improvements.

Drive connector
[Note: 24G SAS uses the SAS-4 physical layer, which operates at a baud rate of 22.5Gb/s.]

The 29-position receptacle and plug connectors used in SAS-4 feature hot-plugging, blind-mating, connector misalignment correction, and a PCB retention mechanism for robust SMT attachment. The connectors are SATA compliant and available from many suppliers in a range of vertical and right-angle configurations. Typical applications are consistent with previous generations of server and storage equipment, HDDs, HDD carriers, and SSDs.

To fulfill the needs of next-generation servers, several improvements have been implemented. Raw materials have been upgraded, and housing designs and terminal geometries have been modified to meet signal integrity requirements at 24G SAS speeds, all while maintaining the footprint of existing SAS-3 connectors for easy upgrades.

  • Compliant with the SFF-8681 specification
  • Footprint backward compatible with 3Gb/s, 6Gb/s, and 12G SAS connectors
  • Staggered contact lengths for hot-plugging applications
  • Receptacles are available in SMT, through-hole, and hybrid PCB attach methods
  • Header is available in right-angle and vertical orientations
  • Supports both SAS and SATA drives

Ultimately, the goal was to design a connector with the mechanical and electrical reliability that has been part of every previous SAS generation, while improving the signal integrity to meet 24G SAS needs and maintaining backward compatibility.

Drive cable assembly
SAS-4 cables for next-generation servers perform at 24G SAS speeds with a significant size and density improvement over previous generations. An 8x SlimSAS connector consumes the same area as a 4x MiniSAS HD. From a construction standpoint, SlimSAS series plug connectors include an anti-skew feature for misalignment correction. An optimized raw cable structure and an upgraded cable manufacturing process enable the enhanced signal integrity performance required by SAS-4. Additionally, the plug connector's internal components have been optimized to control and stabilize the impedance. The connectors are compliant with SFF-8654 and are available in a wide range of straight, right-angle, left-side, and right-side exit configurations to address most mechanical/dimensional constraints.

  • Compliant with the SFF-8654 specification
  • Supports various plug connector types: straight, right angle, left-side exit, and right-side exit
  • Anti-skew feature is optional for special applications
  • Pull-tab is available for all connector types
  • Anti-reverse right-angle plug connector is available to support special applications
  • Metal latch adds robustness and improves on previous generations (MiniSAS HD)
  • Supports SAS, PCIe, UPI 1.0, NVM Express® and NVLink® 25G applications
  • Supports 4x, 6x, 8x, and 12x configurations

The industry set out to design a high-performance cable assembly with mechanical and electrical reliability that improves upon every previous SAS generation, and with the signal integrity to meet SAS-4 requirements. The standards-based SlimSAS product lines have been proven to reliably meet or exceed storage industry needs.


Training Deep Learning Models Q&A

Erin Farr

May 19, 2023

The estimated impact of Deep Learning (DL) across all industries cannot be overstated. In fact, analysts predict deep learning will account for the majority of cloud workloads, and training of deep learning models will represent the majority of server applications in the next few years. It’s the topic the SNIA Cloud Storage Technologies Initiative (CSTI) discussed at our webinar “Training Deep Learning Models in the Cloud.” If you missed the live event, it’s available on-demand at the SNIA Educational Library, where you can also download the presentation slides. The audience asked our expert presenters, Milind Pandit from Habana Labs (Intel) and Seetharami Seelam from IBM, several interesting questions. Here are their answers:

Q. Where do you think most of the AI will run, especially training? Will it be in the public cloud, on-premises, or both?

[Milind:] It’s probably going to be a mix. There are advantages to using the public cloud, especially because it’s pay as you go. So, when experimenting with new models, new innovations, and new uses of AI, and when scaling deployments, it makes a lot of sense. But there are still a lot of data privacy concerns. There are increasing numbers of regulations regarding where data needs to reside physically and in which geographies. Because of that, many organizations are deciding to build out their own data centers, and once they have large-scale training or inference successfully underway, they often find it cost effective to migrate their public cloud deployment into a data center where they can control the cost and other aspects of data management.

[Seelam:] I concur with Milind. We are seeing a pattern of dual approaches. There are some small companies that don’t have the capital necessary, nor the expertise or teams necessary, to acquire GPU-based servers and deploy them. They are increasingly adopting public cloud. We are seeing some decent-sized companies adopting this same approach as well. Keep in mind these GPU servers tend to be very power hungry, so you need the right floor plan, power, cooling, and so forth. So, public cloud definitely helps you get easy access and pay for only what you consume. We are also seeing trends where certain organizations have constraints that restrict moving certain data outside their walls. In those scenarios, we are seeing customers deploy GPU systems on-premises. I don’t think it’s going to be one or the other. It is going to be a combination of both, but adopting more of a common platform technology will help unify the usage model in public cloud and on-premises.

Q. What is GDR? You mentioned using it with RoCE.

[Seelam:] GDR stands for GPUDirect RDMA. There are several ways a GPU on one node can communicate with a GPU on another node; there are at least three ways of doing this. First, the GPU can use TCP, where GPU data is copied back into the CPU, which orchestrates the communication to the CPU and GPU on the other node. That obviously adds a lot of latency from going through the whole TCP protocol. Another way is through RoCEv2 or RDMA, where CPUs, FPGAs and/or GPUs actually talk to each other through industry-standard RDMA channels, so you send and receive data without the added latency of traditional networking software layers. A third method is GDR, where a GPU on one node can talk to a GPU on another node directly. This is done through network interfaces where the GPUs are talking to each other, again bypassing traditional networking software layers.

Q. When you are talking about RoCE, do you mean RoCEv2?

[Seelam:] That is correct, I’m talking only about RoCEv2. Thank you for the clarification.

Q. Can you comment on storage needs for DL training, and have you considered the use of scale-out cloud storage services for deep learning training? If so, what are the challenges and issues?

[Milind:] The storage needs are 1) massive and 2) based on the kind of training that you’re doing (data parallel versus model parallel). With different optimizations, you will need parts of your data to be local in many circumstances. It’s not always possible to do efficient training when data is physically remote and there’s a large latency in accessing it. Some sort of caching infrastructure will be required in order for your training to proceed efficiently. Seelam may have other thoughts on scale-out approaches for training data.

[Seelam:] Yes, absolutely, I agree 100%. Unfortunately, there is no silver bullet to address the data problem with large-scale training. We take a three-pronged approach. Predominantly, we recommend users put their data in object storage, and that becomes the source where all the data lives. Many training jobs, especially training jobs that deal with text data, don’t tend to be huge in size because these are all characters, so we use object store as a source directly to read the data and feed the GPUs to train. That’s one model of training, but it only works for relatively smaller data sets; they get cached once you access them the first time, because you shard them quite nicely, so you don’t have to go back to the data source many times. There are other data sets where the data volume is larger. So, if you’re dealing with pictures, video, or these kinds of training domains, we adopt a two-pronged approach. In one scenario we have a distributed cache mechanism where the end users have a copy of the data in the file system, and that becomes the source for AI training. In another scenario, we deployed that system with sufficient local storage and asked users to copy the data into that local storage to use as a local cache. So as the AI training continues, once the data is accessed it is cached on the local drive, and subsequent iterations of the data come from that cache. This is much bigger than the local memory; it’s about 12 terabytes of local cache storage with 1.5 terabytes of data. So, we could serve data sets in the 10-terabyte range per node just from the local storage. If they exceed that, then we go to the distributed cache. If the data sets are small enough, then we just use object storage. So, there are at least three different ways, depending on the use case and the model you are trying to train.

Q. In a fully sharded data parallel model, there are three communication calls when compared to DDP (distributed data parallel). Does that mean it needs about three times more bandwidth?

[Seelam:] Not necessarily three times more, but you will use the network a lot more than you would in DDP. In a DDP (distributed data parallel) model you will not use the network at all in the forward pass, whereas in an FSDP (fully sharded data parallel) model you use the network in both the forward pass and the backward pass. In that sense you use the network more, but at the same time, because you don’t have parts of the model within your system, you need to get the model from the other neighbors, and so that means you will be using more bandwidth. I cannot give you the 3x number; I haven’t seen the 3x, but it’s more than DDP for sure.

The SNIA CSTI has an active schedule of webinars to help educate on cloud technologies. Follow us on Twitter @sniacloud_com and sign up for the SNIA Matters Newsletter, so that you don’t miss any.
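As a hedged illustration of the DDP-versus-FSDP distinction discussed above, the sketch below shows how a model might be wrapped with either strategy in PyTorch. The model, process-group settings, and launch assumptions are placeholders for this sketch, not the configuration used by the presenters.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: torch.nn.Module, strategy: str) -> torch.nn.Module:
    """Wrap a model for multi-GPU training (placeholder sketch).

    DDP keeps a full replica of the model on every GPU and only communicates
    gradients (all-reduce) in the backward pass. FSDP shards parameters across
    GPUs and must gather them on demand, so it communicates in the forward
    pass as well as the backward pass, which is the extra traffic discussed above.
    """
    if strategy == "ddp":
        return DDP(model)
    elif strategy == "fsdp":
        return FSDP(model)
    raise ValueError(f"unknown strategy: {strategy}")

if __name__ == "__main__":
    # Assumes launch via torchrun so rank/world-size env vars are set,
    # one GPU per process, and NCCL as the backend.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    model = torch.nn.Linear(4096, 4096).cuda()   # placeholder model
    model = wrap_model(model, strategy="fsdp")   # or "ddp"
```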

