Ethernet Roadmap for Networked Storage Q&A

David Fair

Jul 17, 2015

Almost 200 people attended our joint Webcast with the Ethernet Alliance: "The 2015 Ethernet Roadmap for Networked Storage." We had a lot of great questions during the live event, but we did not have time to answer them all. As promised, we've compiled answers for all of the questions that came in. If you think of additional questions, please feel free to comment on this blog.

Q. What did you mean by parity of flash with HDD?

A. We were referring to the O'Reilly article in "Network Computing." O'Reilly is predicting parity in both capacity and price in 2016.

Q. When do we expect IEEE standards ratification for 25G speed?

A. 2016. You can see the exact schedule here.

Q. Do you envision the Enterprise, Cloud Providers, HPC, Financials getting rid of their 10/40GbE infrastructure and replacing that with 25/100GbE infrastructure in 2017? Will these customers deploy 100GbE/25GbE switches in the leaf layer in 2017?

A. Deployment will occur over a multi-year time span, if only because switch infrastructure is expensive to upgrade, as reflected in the Crehan Research forecast. New deployments will likely move to 25/100GbE as new switches with 100GbE downstream ports become available in 2016. Because the Cloud Service Providers are currently the most aggressive in driving new infrastructure purchases, they represent the largest early volumes for 25/100GbE. Enterprise is still in the midst of the transition from 1GbE to 10GbE.

Q. What are some of the developments on spanning-tree derivatives vs. Dijkstra-based derivatives such as OSPF and FSPF for switches?

A. Beyond the scope of this presentation on Ethernet. Ethernet is defined by the IEEE for L1 and L2 in the ISO model. Your questions are at L3 and L4, which are handled by organizations like the IETF.

Q. With all the speeds possible, who is working on flow control?

A. Flow control at the 802.1 level is supported in the Layer 1/2 PHY and MAC by setting upper bounds on the delay through each layer, which allows higher layers to comprehend the delays and response times to pause frames. Each new speed and PHY in 802.3 is accompanied by delay constraint specifications to support this.
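To make the pause-frame mechanism concrete, the sketch below lays out the IEEE 802.3x PAUSE frame as a packed C struct. This is purely illustrative (real NICs generate and honor these frames in hardware); the constants shown are the ones defined by the standard.

```c
#include <stdint.h>

/* Illustrative layout of an IEEE 802.3x PAUSE (MAC Control) frame.
 * All multi-byte fields are big-endian on the wire. */
#pragma pack(push, 1)
struct pause_frame {
    uint8_t  dst[6];      /* 01-80-C2-00-00-01, reserved multicast address */
    uint8_t  src[6];      /* sender's MAC address */
    uint16_t ethertype;   /* 0x8808 = MAC Control */
    uint16_t opcode;      /* 0x0001 = PAUSE */
    uint16_t pause_time;  /* in quanta of 512 bit times; 0 = resume */
    uint8_t  pad[42];     /* zero padding up to the 64-byte minimum frame */
};
#pragma pack(pop)
```

Because a quantum is defined in bit times, its absolute duration shrinks as speed rises (51.2 µs at 10Mb/s but only 51.2 ns at 10GbE), which is why each new 802.3 PHY must publish delay constraints tight enough for a pause to arrive and take effect in time.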
Q. Do you have an overlay graphic that shows the Ethernet RDMA roadmap? If so, is Ethernet storage the primary driver for that technology?

A. Beyond the scope of this presentation on Ethernet. Ethernet is defined by the IEEE for L1 and L2 in the ISO model. Your questions are at L3 and L4, which are handled by organizations like the IETF and the InfiniBand Trade Association.

Q. The adoption of faster and new Ethernet always has to do with the costs of acquiring new technology. How long do you think it will take to adopt/acquire faster Ethernet in datacenters now that the development is happening much faster than in the last 20 years?

A. Please see the chart on slide 7, where Crehan Research predicts how fast the technology will diffuse into deployments.

Q. What do you expect as a cost comparison between Ethernet and InfiniBand going forward? Also, what work is being done to reduce latency?

A. Beyond the scope of this presentation. Latency is primarily a consequence of design methodologies and semiconductor process technology, and thus under the control of the silicon device manufacturers. Some vendors prioritize latency more than others.

Q. What's the technical limitation as speeds go higher and higher?

A. A number of factors limit speeds from going faster and faster, but the main problem is that materials attenuate signals more severely at higher frequencies.

Q. Will 1GbE used for manageability purposes disappear from the public cloud? If so, what is the expected time frame?

A. This is a choice for end users. Most equipment is managed on a separate network for security reasons, but users can eliminate these management networks at any time.

Q. What are the relative market size predictions for the expanding number of standards (25G, 50G, 100G, 200G, etc.)?

A. See the Crehan Research forecast in the presentation.

Q. What is the major difference between SMF & MMF for the not so initiated?

A. SMF has a 9µm core, while MMF has a 50µm core. Different lasers are used for each fiber type; at 10GbE and above, MMF typically reaches 100 meters, while SMF reaches from 500m to 10km.

Q. Will 25G be available through both copper and fibre connectivity?

A. Yes. IEEE 802.3 work is currently underway to specify 25Gb/s on twinax ("direct attach copper") to 5 meters, printed circuit backplane up to ~1m, twisted-pair copper to 30m, and multimode fiber to 100m. There is no technology barrier to 25G on SMF; a standards project to specify it simply has not started yet.

Q. This is interesting from a hardware viewpoint, but has nothing to do with storage yet. Are we going to get to how this relates to storage, other than saying flash drives are fast and only Ethernet can keep up?

A. Beyond the scope of this presentation on Ethernet. Ethernet is defined by the IEEE for L1 and L2 in the ISO model. Your questions are directed at the higher layers. The key point of this webcast is that storage networking engineers need to pay much more attention to the Ethernet roadmap than they have historically, primarily because of NVM.

Q. How does SFP28 fit in this mix? Is it required for 25G?

A. SFP28 connectors and modules are required for 25GbE because they give better performance than SFP+, which only works to 10GbE.

Q. Can you provide the quick difference between copper & optical on speed & costs?

A. Copper and optical Ethernet links are usually standardized at the same speed. 400GbE is not defining a copper link, but an active Direct Attach Cable (DAC) will probably support 400GbE. Cost depends on volume and many other factors and is beyond the scope of this presentation; copper is usually a fraction of the cost of optical links.

Q. Do you think people will try to use multiple Cat5e links to get more aggregate bandwidth to the access points, to avoid having to run fibre to them?

A. IEEE is defining 2.5GBASE-T and 5GBASE-T to enable Cat5e to support faster wireless access points.

Q. When are higher speeds and PoE going to reach the point when copper-based Ethernet will become a viable heat source for buildings, thus helping the environment?

A. :) IEEE is defining 4-pair PoE to deliver at least 60W to end devices. You can find out more here.

Q. What are the use cases for 2.5Gb and 5.0Gb BASE-T?

A. The leading use case for 2.5G/5GBASE-T is to provide the uplink for wireless LAN access points that support 802.11ac and future wireless technology. Wireless LAN technology has advanced to the point where >1Gb/s bandwidth is needed upstream from the AP, and 2.5G/5G provide a higher-speed uplink while preserving the user's investment in Cat5e/Cat6 cabling.

Q. Why not have only CFP2 sockets right away, with things disabled for lower speeds, for all the intervening years leading to full-fledged CFP2?

A. CFP2 is defined for 100GbE, and 8 ports can be used on a 1U switch. 100GbE switches are shifting to QSFP28 so that 32 ports of 100GbE can be supported in a 1U switch at low cost. The CFP2 is much more expensive than QSFP28 and will not be used for lower speeds because of its high cost.


Benefits of RDMA in Accelerating Ethernet Storage Q&A

Mike Jochimsen

Mar 9, 2015


At our recent live Webcast “Benefits of RDMA in Accelerating Ethernet Storage Connectivity,” experts from Emulex, Intel and Microsoft had an insightful discussion on the ways RDMA is having an impact on Ethernet storage. The live event was attended by nearly 200 people, and feedback was overwhelmingly positive, with several attendees thanking us for our vendor-neutral presentation and one attendee commenting that it was, “Probably the most clearly comprehensible yet comprehensive webinar I’ve attended in some time.” If you missed the Webcast, it’s now available on demand. We did not have time to get to everyone’s questions, so as promised, below are answers to all of them. If you have additional questions, please ask them in the comments section of this blog and we’ll get back to you as soon as possible.

Q. Is RDMA over RoCEv2 in production?

A. The IBTA released the RoCEv2 specification in September 2014. In order to support that specification, changes may be required across the RDMA stack, including firmware, drivers, and operating systems, and schedules for implementing it will vary by operating system. For example, the OpenFabrics Alliance (OFA) has not yet released an OpenFabrics Enterprise Distribution (OFED) version that implements the standard, although one is in process now. Once OFA completes its OFED stack implementation, the Linux distribution vendors will then incorporate and support the updated OFED stack. Implementations provided prior to full OFA and distro vendor support would be preliminary, potentially incompatible with the OFED release, and would require confirmation from the distro vendor as to the nature and level of support provided.

Q. I would have liked a list of Windows applications that take advantage of SMB Direct - either in a Hyper-V host or on bare metal.

A. In Windows, any file-based application can make use of SMB3 and SMB Direct due to the native file-based programming interface support. No application changes are required. For certain enterprise applications such as Hyper-V and SQL Server, SMB3 is officially supported, and more information can be found in the product catalog at www.microsoft.com.

Q. Are there any particular benefits in using one network protocol over another for SMB Direct/RDMA (iWARP vs. RoCE vs. IB)?

A. There are no hard and fast rules; any adapter or protocol can be suitable for many scenarios. Of the Ethernet-based protocols we considered in today’s webcast:

  • iWARP offers the benefit of operation over TCP with its reliability and routability, well-suited to a broad range of installed infrastructure.
  • RoCE offers a lightweight, efficient protocol when a DCB-enabled switched fabric is available. RoCE, however, is not routable.
  • RoCEv2 offers similar properties to RoCE, with the possibility to scale to larger routed and DCB-enabled fabrics.

Q. Who are the vendors offering iWARP capable RNICs?

A. Chelsio Communications has production iWARP adapters today, and both Intel and QLogic have publicly committed to future iWARP controllers.

Q. How much testing has been done with SMB3, and in particular SMB Direct, over WAN connections?

A. The SMB2 protocol was originally designed to adapt to WAN scenarios, and it supports credit-based management that allows large amounts of data to be outstanding, making best use of long WAN pipes. The SMB3 protocol retains these design attributes, and the SMB Direct protocol also supports similar deep pipelining. The iWARP protocol, being layered on standard TCP, is well suited to such deployments, and RoCE WAN adapters are potentially available. Please contact the respective technology vendors for information on any available testing results.
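As a rough illustration of that credit-based pipelining, here is a conceptual C sketch (not the actual SMB3 implementation; the session struct, send_request, and poll_response names are hypothetical stand-ins for the real transport). The client keeps issuing requests while it holds credits, and each server response grants fresh credits, keeping a long, high-latency pipe full:

```c
#include <stddef.h>
#include <stdint.h>

/* Conceptual sketch of SMB2/3-style crediting.  Each in-flight request
 * consumes a credit; each response grants new credits, so a deep
 * pipeline of requests stays outstanding over a long WAN link. */
struct smb_session {
    uint16_t credits;   /* credits currently granted by the server */
};

/* Stand-ins for the real transport layer. */
extern int send_request(struct smb_session *s, uint64_t offset, size_t len);
extern int poll_response(struct smb_session *s, uint16_t *credits_granted);

void pipelined_read(struct smb_session *s, uint64_t off, uint64_t total,
                    size_t chunk)
{
    uint64_t next = off, end = off + total;
    unsigned outstanding = 0;

    while (next < end || outstanding > 0) {
        /* Issue as many reads as the current credit balance allows. */
        while (s->credits > 0 && next < end) {
            send_request(s, next, chunk);
            s->credits--;
            outstanding++;
            next += chunk;
        }
        /* Block for one response; the server replenishes the credit
         * window in every response header. */
        uint16_t granted = 0;
        if (poll_response(s, &granted) == 0) {
            outstanding--;
            s->credits += granted;
        }
    }
}
```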

Q. I’d love a future webcast on RDMA-enabled distributed filesystems.

A. Thanks for the suggestion! We’re always looking for ideas for future webcasts and SNIA-ESF will consider this as a potential follow-on.

Q. Is Live Migration the scenario where “packet size” is 1MB?

A. All SMB Direct scenarios have workloads that range anywhere up to 8MB. For large file copies, most SMB3 clients request from 1MB to 8MB per operation; for Hyper-V live migration, transfers during the bulk transfer phase are typically similar.

Q. SMB3 is being compared to FC for enterprise. If Ethernet based protocols are of interest, wouldn’t FCoE give the same performance as FC (same stack) vs. SMB3?

A. SMB3 with SMB Direct enables many workloads not possible with Fibre Channel over Ethernet, and performance comparisons are therefore difficult. Perhaps another SNIA webcast could investigate this!

Q. Regarding your SMB Direct example with lots of small operations, how do you deal with the overhead of registering and unregistering buffers for the RDMA operations?

A. As answered later in the session, registration and unregistration are not a protocol matter, but in the case of the Windows implementation, registration is strictly performed for the specific buffers of each operation, which is critical for security, data integrity, and system protection. The standard “Fast Register Work Request” method is used, and careful implementation has shown that the overhead does not negatively impact performance, even for small I/O (4KB/operation). Check out Jose Barreto’s blog, which contains many benchmark results.
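For readers wondering what registering and unregistering buffers actually involves, here is a simplified user-space sketch using the open-source libibverbs API. It is not the Windows SMB Direct code path (which runs in kernel mode and uses fast-register work requests), but it shows the per-operation scoping the answer describes: the NIC is granted access to exactly one buffer for exactly one I/O.

```c
#include <stdint.h>
#include <infiniband/verbs.h>

/* Simplified sketch: register exactly the buffer for one RDMA read,
 * post the operation, and deregister afterward.  Error handling and
 * completion polling are abbreviated. */
int rdma_read_once(struct ibv_pd *pd, struct ibv_qp *qp,
                   void *buf, size_t len,
                   uint64_t remote_addr, uint32_t rkey)
{
    /* Pin and map only this I/O's buffer -- the tight scoping that
     * matters for security and system protection. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .opcode     = IBV_WR_RDMA_READ,
        .sg_list    = &sge,
        .num_sge    = 1,
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad = NULL;
    int rc = ibv_post_send(qp, &wr, &bad);

    /* ... wait for a completion on the QP's CQ before touching buf ... */

    ibv_dereg_mr(mr);   /* revoke the NIC's access immediately */
    return rc;
}
```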

Q. But isn’t Live Migration done in 1MB “chunks”? So not “small” I/Os?

A. As answered later in the session, Hyper-V Live Migration is done in several phases. The first phase is the initial bulk copy of memory, done in large chunks, but it is immediately followed by a second phase that copies the individual pages dirtied by the live-running VM, and these operations are typically 4KB. Note: the faster the initial phase goes, the less work there is in the second phase; but in both phases, the faster the better, and RDMA accelerates both.

Q. Are iSER and iWARP alternatives to one another?

A. iWARP is an RDMA protocol; iSER is a mapping of iSCSI onto RDMA transports, including iWARP as well as RoCE and InfiniBand. The two are therefore complementary rather than alternatives.

Q. What’s Intel’s roadmap for RoCE and/or iWARP?

A. Intel is committed to iWARP and plans to incorporate it in future server chipsets and SoCs. See http://www.intel.com/content/www/us/en/ethernet-products/accelerating-ethernet-iwarp-video.html for more information.

Q. Is any transport other than IB being used to create a reliable transport for RoCEv2? Is that even theoretically possible?

A. RoCE was developed to leverage Infiniband as much as possible.  For that reason, the Infiniband transport was chosen when the RoCE standard was developed.  As the RoCEv2 standard was developed, the underlying Infiniband network protocol was replaced with IPv4 / IPv6 in order to provide the layer 3 routability and UDP to provide stateless encapsulation (and indication) of the Infiniband transport header that was retained.  While it may be possible to develop a reliable transport to replace Infiniband, the RoCE standards body has elected not to go that route as of this writing.




New ESF Webcast: Benefits of RDMA in Accelerating Ethernet Storage Connectivity

David Fair

Jan 30, 2015


We’re kicking off our 2015 ESF Webcasts on March 4th with what we believe is an intriguing topic – how RDMA technologies can accelerate Ethernet Storage. Remote Direct Memory Access (RDMA) has existed for many years as an interconnect technology, providing low latency and high bandwidth in computing clusters. More recently, RDMA has gained traction as a method for accelerating storage connectivity and interconnectivity on Ethernet. In this Webcast, experts from Emulex, Intel and Microsoft will discuss:

  • Storage protocols that take advantage of RDMA
  • Overview of iSER for block storage
  • Deep dive of SMB Direct for file storage
  • Benefits of available RDMA technologies to accelerate your Ethernet storage connectivity, both iWARP and RoCE

Register now. This live Webcast will provide attendees with a vendor-neutral look at RDMA technologies and should prove to be an interactive and informative event. I hope you’ll join us!

