Training Deep Learning Models Q&A

Erin Farr

May 19, 2023

The estimated impact of Deep Learning (DL) across all industries cannot be overstated. In fact, analysts predict deep learning will account for the majority of cloud workloads, and training of deep learning models will represent the majority of server applications in the next few years. It’s the topic the SNIA Cloud Storage Technologies Initiative (CSTI) discussed at our webinar “Training Deep Learning Models in the Cloud.” If you missed the live event, it’s available on-demand at the SNIA Educational Library, where you can also download the presentation slides. The audience asked our expert presenters, Milind Pandit from Habana Labs (an Intel company) and Seetharami Seelam from IBM, several interesting questions. Here are their answers:

Q. Where do you think most of the AI will run, especially training? Will it be in the public cloud, on-premises, or both?

[Milind]: It's probably going to be a mix. There are advantages to using the public cloud, especially because it's pay as you go. So, when experimenting with new models, new innovations, and new uses of AI, and when scaling deployments, it makes a lot of sense. But there are still a lot of data privacy concerns, and there are increasing numbers of regulations regarding where data needs to reside physically and in which geographies. Because of that, many organizations are deciding to build out their own data centers, and once they have large-scale training or inference successfully underway, they often find it cost effective to migrate their public cloud deployment into a data center where they can control the cost and other aspects of data management.

[Seelam]: I concur with Milind. We are seeing a pattern of dual approaches. There are some small companies that don't have the capital, expertise, or teams necessary to acquire GPU-based servers and deploy them. They are increasingly adopting public cloud, and we are seeing some decent-sized companies adopting the same approach as well. Keep in mind these GPU servers tend to be very power hungry, so you need the right floor plan, power, cooling, and so forth. Public cloud definitely gives you easy access and lets you pay for only what you consume. We are also seeing trends where certain organizations have constraints that restrict moving certain data outside their walls. In those scenarios, we see customers deploy GPU systems on-premises. I don't think it's going to be one or the other; it is going to be a combination of both, but adopting a common platform technology will help unify the usage model in public cloud and on-premises.

Q. What is GDR? You mentioned using it with RoCE.

[Seelam]: GDR stands for GPUDirect RDMA. There are at least three different ways a GPU on one node can communicate with a GPU on another node. The GPU can use TCP, where GPU data is copied back into the CPU, which orchestrates the communication to the CPU and GPU on another node. That obviously adds a lot of latency going through the whole TCP protocol. Another way is RoCEv2 or RDMA, where CPUs, FPGAs, and/or GPUs talk to each other through industry-standard RDMA channels, so you send and receive data without the added latency of traditional networking software layers. A third method is GDR, where a GPU on one node talks to a GPU on another node directly through the network interfaces, again bypassing traditional networking software layers.
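To make the transport discussion more concrete, here is a minimal sketch of how a PyTorch training job might initialize multi-node communication with the NCCL backend, which can use RoCEv2/RDMA and GPUDirect RDMA when the fabric supports them. The environment variable settings below are illustrative assumptions only; the exact names, values, and defaults depend on your NCCL version and network configuration, so verify them against the NCCL documentation for your cluster.

# Minimal sketch (assumption-laden): multi-node PyTorch init over NCCL, which can
# use RoCEv2/RDMA and GPUDirect RDMA (GDR) for inter-node GPU-to-GPU transfers.
import os
import torch
import torch.distributed as dist

def init_distributed():
    # Illustrative tuning knobs; confirm names/values for your NCCL version and fabric.
    os.environ.setdefault("NCCL_DEBUG", "INFO")         # log which transport NCCL selects
    os.environ.setdefault("NCCL_IB_DISABLE", "0")       # allow the InfiniBand/RoCE transport
    os.environ.setdefault("NCCL_NET_GDR_LEVEL", "PIX")  # how aggressively to use GPUDirect RDMA

    dist.init_process_group(backend="nccl")             # torchrun supplies rank, world size, master addr
    local_rank = int(os.environ["LOCAL_RANK"])          # also set by torchrun
    torch.cuda.set_device(local_rank)
    return local_rank

if __name__ == "__main__":
    rank = init_distributed()
    x = torch.ones(1, device=f"cuda:{rank}")
    dist.all_reduce(x)                                  # travels over RDMA/GDR when available
    print(f"rank {dist.get_rank()} sum = {x.item()}")

Launched with torchrun across two or more nodes, the NCCL INFO logs report whether plain sockets, RDMA, or GPUDirect RDMA was actually used for inter-node traffic.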
Q. When you are talking about RoCE, do you mean RoCEv2?

[Seelam]: That is correct, I'm talking only about RoCEv2. Thank you for the clarification.

Q. Can you comment on storage needs for DL training? Have you considered the use of scale-out cloud storage services for deep learning training, and if so, what are the challenges and issues?

[Milind]: The storage needs are 1) massive and 2) dependent on the kind of training that you're doing (data parallel versus model parallel). With different optimizations, you will need parts of your data to be local in many circumstances. It's not always possible to do efficient training when data is physically remote and there's a large latency in accessing it. Some sort of caching infrastructure will be required in order for your training to proceed efficiently. Seelam may have other thoughts on scale-out approaches for training data.

[Seelam]: Yes, absolutely, I agree 100%. Unfortunately, there is no silver bullet to address the data problem with large-scale training. We take a three-pronged approach. Predominantly, we recommend users put their data in object storage, and that becomes the source where all the data lives. Many training jobs, especially those that deal with text data, don't tend to be huge in size because these are all characters, so we use the object store as a source to read the data directly and feed the GPUs to train. That's one model of training, but it only works for relatively smaller data sets; they get cached once you access them the first time, because you shard the data nicely so you don't have to go back to the data source many times. There are other data sets where the data volume is larger. If you're dealing with pictures, video, or these kinds of training domains, we adopt a two-pronged approach. In one scenario we have a distributed cache mechanism, where the end users have a copy of the data in the file system and that becomes the source for AI training. In another scenario, we deployed the system with sufficient local storage and asked users to copy the data into that local storage and use it as a local cache. As the AI training continues, once the data is accessed it is cached on the local drive, and subsequent iterations of the data come from that cache. This cache is much bigger than the local memory (about 1.5 terabytes): the local cache storage is about 12 terabytes. So, we could handle data sets in the 10-terabyte range per node just from the local storage. If they exceed that, then we go to the distributed cache, and if the data sets are small enough, we just use object storage. So, there are at least three different ways, depending on the use case and the model you are trying to train.
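As a concrete illustration of the local-cache pattern described above, here is a minimal sketch of a dataset wrapper that pulls each object from object storage on first access and serves later epochs from a local drive. The bucket, keys, and cache path are hypothetical, and boto3 against an S3-compatible store is just one example of an object-storage client.

# Minimal sketch: the first epoch fetches samples from object storage, later epochs hit
# the local cache drive. Bucket, key, and path names are hypothetical placeholders.
import os
import boto3
from torch.utils.data import Dataset

class CachedObjectStoreDataset(Dataset):
    def __init__(self, bucket, keys, cache_dir="/local_ssd/train_cache", loader=None):
        self.bucket = bucket
        self.keys = keys                              # object keys, one per training sample
        self.cache_dir = cache_dir
        self.loader = loader or (lambda path: open(path, "rb").read())
        self.s3 = boto3.client("s3")                  # any S3-compatible endpoint works here
        os.makedirs(cache_dir, exist_ok=True)

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        key = self.keys[idx]
        local_path = os.path.join(self.cache_dir, key.replace("/", "_"))
        if not os.path.exists(local_path):            # cache miss: download once
            self.s3.download_file(self.bucket, key, local_path)
        return self.loader(local_path)                # cache hit on subsequent epochs

A real deployment would also need cache eviction once the working set approaches the size of the local drive, which is where the distributed cache tier mentioned above takes over.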
Q. In a fully sharded data parallel model, there are three communication calls when compared to DDP (distributed data parallel). Does that mean it needs about three times more bandwidth?

[Seelam]: Not necessarily three times more, but you will use the network a lot more than you would with DDP. In a DDP, or distributed data parallel, model you will not use the network at all in the forward pass, whereas in an FSDP (fully sharded data parallel) model you use the network in both the forward pass and the backward pass. In that sense you use the network more, and because you don't have all parts of the model within your system, you need to get the model shards from your neighbors, which means you will be using more bandwidth. I cannot give you the 3x number; I haven't seen 3x, but it's more than DDP for sure.

The SNIA CSTI has an active schedule of webinars to help educate on cloud technologies. Follow us on Twitter @sniacloud_com and sign up for the SNIA Matters Newsletter, so that you don’t miss any.


Web 3.0 – The Future of Decentralized Storage

Joseph White

May 8, 2023

Decentralized storage is bridging the gap between Web 2.0 and Web 3.0, and its impact on enterprise storage is significant. Decentralized storage and Web 3.0 will be the focus of an expert panel discussion the SNIA Networking Storage Forum is hosting on June 1, 2023, “Why Web 3.0 is Important to Enterprise Storage.”

In this webinar, we will provide an overview of enterprise decentralized storage and explain why it is more relevant now than ever before. We will delve into the benefits and demands of decentralized storage and discuss the evolution from on-premises, to cloud, to decentralized storage (cloud 2.0). We will also explore various use cases of decentralized storage, including its role in data privacy and security and the potential for decentralized applications (dApps) and blockchain technology.

As part of this webinar, we will introduce you to the Decentralized Storage Alliance, a group of like-minded individuals and organizations committed to advancing the adoption of decentralized storage. We will provide insights into the members of the Alliance and the working groups that are driving innovation and progress in this exciting field, and answer questions such as:
  • Why is enterprise decentralized storage important?
  • What are the benefits, the demand, and why now?
  • How will storage evolve from on-premises, to cloud, to decentralized?
  • What are the use cases for decentralized storage?
  • Who are the members and working groups of the Decentralized Storage Alliance?
Join us on June 1st to gain valuable insights into the future of decentralized storage and discover how you can be part of this game-changing technology.


It’s A Wrap – But Networking and Education Continue From Our C+M+S Summit!

SNIA CMSI

May 1, 2023

Our 2023 SNIA Compute+Memory+Storage Summit was a success! The event featured 50 speakers in 40 sessions over two days. Over 25 SNIA member companies and alliance partners participated in creating content on computational storage, CXL™ memory, storage, security, and UCIe™. All presentations and videos are free to view at www.snia.org/cms-summit.

“For 2023, the Summit scope expanded to examine how the latest advances within and across compute, memory and storage technologies should be optimized and configured to meet the requirements of end customer applications and the developers that create them,” said David McIntyre, Co-Chair of the Summit. “We invited our SNIA Alliance Partners Compute Express Link™ and Universal Chiplet Interconnect Express™ to contribute to a holistic view of application requirements and the infrastructure resources that are required to support them,” McIntyre continued. “Their panel on the CXL device ecosystem and usage models and presentation on UCIe innovations at the package level, along with three other sessions on CXL, added great value to the event.”

Thirteen computational storage presentations covered what is happening in NVMe™ and SNIA to support computational storage devices and define new interfaces with computational storage APIs that work across different hardware architectures. New applications for high performance data analytics, discussions of how to integrate computational storage into high performance computing designs, and new approaches to integrate compute, data, and I/O acceleration closely with storage systems and data nodes were only a few of the topics covered.

“The rules by which the memory game is played are changing rapidly and we received great feedback on our nine presentations in this area,” said Willie Nelson, Co-Chair of the Summit. “SNIA colleagues Jim Handy and Tom Coughlin always bring surprising conclusions and opportunities for SNIA members to keep abreast of new memory technologies, and their outlook was complemented by updates on SNIA standards on memory-to-memory data movement and on JEDEC memory standards; presentations on thinking memory, fabric-attached memory, and optimizing memory systems using simulations; a panel examining where the industry is going with persistent memory; and much more.”

Additional highlights included an EDSFF panel covering the latest SNIA specifications that support these form factors, sharing an overview of platforms that are EDSFF-enabled, and discussing the future for new product and application introductions; a discussion on NVMe as a cloud interface; and a session on detecting ransomware with computational storage. New to the 2023 Summit, and continuing to get great views, was a “mini track” on Security, led by Eric Hibbard, chair of the SNIA Storage Security Technical Work Group, with contributions from IEEE Security Work Group members, including presentations on cybersecurity, fine-grain encryption, storage sanitization, and zero trust architecture.

Co-Chairs McIntyre and Nelson encourage everyone to check out the video playlist and send your feedback to askcmsi@snia.org. The “Year of the Summit” continues with networking opportunities at the upcoming SmartNIC Summit (June), Flash Memory Summit (August), and SNIA Storage Developer Conference (September). Details on all these events and more are at the SNIA Event Calendar page. See you soon!


Storage Threat Detection Q&A

Michael Hoard

Apr 28, 2023

Stealing data, compromising data, and holding data hostage have always been the main goals of cybercriminals. Threat detection and response methods continue to evolve as the bad guys become increasingly sophisticated, but for the most part, storage has been missing from the conversation. Enter “Cyberstorage,” a topic the SNIA Cloud Storage Technologies Initiative recently covered in our live webinar, “Cyberstorage and XDR: Threat Detection with a Storage Lens.” It was a fascinating look at enhancing threat detection at the storage layer. If you missed the live event, it’s available on-demand along with the presentation slides. We had some great questions from the live event, as well as interesting results from our audience poll questions, that we wanted to share here.

Q. You mentioned antivirus scanning is redundant for threat detection in storage, but could provide value during recovery. Could you elaborate on that?

A. Yes, antivirus can have a high value during recovery, but it's not always intuitive why this is the case. If malware makes it into your snapshots or your backups, it's because it was unknown and not detected. Then, at some point, that malware gets activated on your live system and your files get encrypted. Suddenly, you know something happened, either because you can’t use the files or because there’s a ransomware banner note. Next, the incident responders come in, and a signature for that malware is identified; the malware becomes known. The antivirus/EDR vendors quickly add a patch to their signature-scanning software for you to use. Since malware can dwell on your systems without being activated for days or weeks, you want to use that updated signature scan and/or utilize a file malware scanner to validate that you're not reintroducing malware that was sitting dormant in your snapshots or backups. This way you can ensure that as you restore data, you are not reintroducing dormant malware.

Audience Poll Results

Here’s how our live audience responded to our poll questions. Let us know what you think by leaving us a comment on this blog.

Q. What are other possible factors to consider when assessing Cyberstorage solutions?

A. Folks generally tend to look at CPU usage for any solution, and looking at that for threat detection capabilities also makes sense. However, you might want to look at this in the context of where the threat detection is occurring across the data life cycle. For example, if the threat detection software runs on your live system, you'll want lower CPU usage. But if the detection is occurring against a snapshot outside your production workloads, or if it's against secondary storage, higher CPU usage may not matter as much.
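As a rough illustration of the recovery-time checking described above, here is a minimal sketch that walks a mounted snapshot and flags files matching known-bad hashes before they are restored. The mount point and hash list are hypothetical placeholders; in practice you would run the updated antivirus/EDR engine or a dedicated file malware scanner against the snapshot rather than a bare hash comparison.

# Minimal sketch: offline check of a mounted snapshot against known-bad file hashes
# before restore. Paths and hash values are placeholders, not a real deployment.
import hashlib
import os

KNOWN_BAD_SHA256 = {"0" * 64}                 # fill with hashes from your incident responders

def sha256_of(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scan_snapshot(mount_point="/mnt/snapshots/2023-04-01"):
    hits = []
    for root, _dirs, files in os.walk(mount_point):
        for name in files:
            path = os.path.join(root, name)
            try:
                if sha256_of(path) in KNOWN_BAD_SHA256:
                    hits.append(path)         # quarantine or skip these files during restore
            except OSError:
                pass                          # unreadable file; log it in practice
    return hits

if __name__ == "__main__":
    for path in scan_snapshot():
        print("dormant malware indicator found:", path)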


Scaling Management of Storage and Fabrics

Richelle Ahlvers

Apr 19, 2023


Composable disaggregated infrastructures (CDI) provide a promising solution to address the provisioning and computational efficiency limitations, as well as the hardware and operating costs, of integrated, siloed systems. But how do we solve these problems in an open, standards-based way?

DMTF, SNIA, the OFA, and the CXL Consortium are working together to provide elements of the overall solution, with Redfish® and SNIA Swordfish manageability providing the standards-based interface.
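For readers new to these interfaces, here is a minimal sketch (with a hypothetical endpoint and credentials) of how a client might walk the top of a Redfish/Swordfish service over HTTPS; /redfish/v1/ is the service root defined by the Redfish specification, and storage-related collections appear among its links on Swordfish-capable implementations.

# Minimal sketch: list the top-level collections a Redfish/Swordfish service exposes.
# Host and credentials are hypothetical; use session tokens and TLS verification in practice.
import requests

BASE = "https://bmc.example.com"
AUTH = ("admin", "password")

root = requests.get(f"{BASE}/redfish/v1/", auth=AUTH, verify=False).json()
print("Redfish version:", root.get("RedfishVersion"))

# Print links to the top-level resource collections (Systems, Chassis, Fabrics, and
# storage collections on Swordfish-capable services).
for name, link in root.items():
    if isinstance(link, dict) and "@odata.id" in link:
        print(f"{name}: {link['@odata.id']}")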

The OpenFabrics Alliance (OFA) is developing an OpenFabrics Management Framework (OFMF) designed for configuring fabric interconnects and managing composable disaggregated resources in dynamic HPC infrastructures using client-friendly abstractions.

Want to learn more? On Wednesday, May 17, 2023, SNIA Storage Management Initiative (SMI) and the OFA are hosting a live webinar entitled “Casting the Net: Scaling Management of Storage and Fabrics” to share use cases for scaling management of storage and fabrics and beyond.

They’ll dive into:

  • What is CDI? Why should you care? How will it help you?
  • Why does standards-based management help?
  • What does a management framework for CDI look like?
  • How can you get involved? How can your engagement accelerate solutions?

In under an hour, this webinar will give you a solid understanding of how SMI and OFA, along with other alliance partners, are creating the approaches and standards to solve the puzzle of how to effectively address computational efficiency limitations.

Register here to join us on May 17th.


Questions & Answers from our January 2023 Webcast: Storage Trends in 2023 and Beyond

STA Forum

Apr 17, 2023


These questions were asked and mostly answered during our webcast, Storage Trends in 2023 and Beyond. Graphics included in this article were shown during the webcast, and several of the questions refer to the data in the charts.

Thank you to our panelists:

Don Jeanette, Vice President, TRENDFOCUS
Patrick Kennedy, Principal Analyst, ServeTheHome
Rick Kutcipal, At-Large Director, SCSI Trade Association and Product Planner, Data Center Solutions Group, Broadcom

Q1: What does the future hold for U.3? (SFF-TA-1001) Was it included in the U.2 numbers?

U.2 should really be U.X in the pie chart. There are some shipments out there today, customers are taking it, and it will likely grow, but all the efforts and priorities are really E3.S, E1.S and then, to some extent, E1.L.

Q2: How does the number of units compare to exabytes shipped?

For hard drives, the sweet spot for nearline SATA or SAS is about 18 terabytes today. For SSDs, we’ve been running at about a terabyte for SATA for several years now; for SAS, we’re averaging about 3.6 to 3.9 terabytes today.

PCIe, interestingly enough, is lower than SAS today, but with these new form factors we keep talking about, the average capacity for PCIe is really going to pop up because of the data center implementations: with E1.S, one terabyte is going to 2, 4, and 8; the ruler, or E1.L, is 16 terabytes today going to 32 TB; and U.2 average capacity is about 6 terabytes today, going to 8 and then 16 down the road. That’s why we always look at exabytes, because that’s how pretty much every vendor and customer looks at what they are going to buy and how much storage. As the capacities go up, the units flatten out in the forecast. You could see a potential dip in units in late 2024 to 2025 because of some big migrations to new form factors, where capacity is going to go 4X in some cases; one terabyte goes to four terabytes.

Total units are roughly 16 million for one quarter versus about 39 exabytes shipped, so the exabyte figure is more than 2X the unit count.
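A quick back-of-the-envelope check of those quarterly figures, treating them as round numbers:

# Rough check of the figures quoted above (round numbers, one quarter of shipments).
units = 16e6                                 # ~16 million enterprise SSD units
terabytes = 39e6                             # ~39 exabytes = 39 million terabytes
print(f"average capacity ~= {terabytes / units:.1f} TB per drive")   # about 2.4 TB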

Q3: What needs to be true for SSDs to make significant gains into the HDD share of the market?

10K and 15K HDDs are pretty much dead already, and we said a lot about SSDs going into nearline. SSDs will shave off some of that market, and you have some storage RAID companies making an effort to sell their solid-state storage systems and cut into the nearline hard drive market. They are going to shave off a little bit of that top layer of high-capacity, nearline-based systems, the ones that are still priced higher, where they can start to get to a price-point differential. You might see some elasticity with certain customers moving over, but we will not see a major transition in the near term.

10K and 15K HDDs have struggled because folks realize that SSDs are faster, but there are other factors as well, for example, boot drives. Boot drives used to be hard drives; now they are M.2 drives. We also look at applications that are beyond just the traditional data centers.

One good example is video data analytics at the edge. Many are building systems now that are currently hard drive-based; they deploy them at the edge, ingest data onto an array of hard drives, usually with AI acceleration, do inference on those video feeds live at the box, and push any anomalies up to the cloud.

Almost all of those boxes are currently hard drive boxes. As SSD capacities grow, companies are looking at cases where, if you have an edge box on every street corner or in every retail location, rolling a truck to fix a hard drive is difficult; there are power and size constraints. Adding SSDs, especially as they start to eclipse hard drives in capacity, opens a new type of application that folks can go after, not just the high-performance segment but also high capacity in a more compact space.

Q4: Do your numbers for SAS include both HDD and SSD?

(NOTE TO READERS: This data comes from TRENDFOCUS analyst firm, specifically from VP Don Jeanette. You are welcome to contact them if you are interested in purchasing their reports.)

Of the 255 exabytes of hard drive capacity in the slides presented, it’s about a 75-25 split in favor of SATA. Of the enterprise SSDs, SATA is about 5 exabytes a quarter, SAS is about 3 to 4 exabytes a quarter, and the bulk is PCIe at 24 to 26 exabytes.

Q5: When you say “Enterprise,” does that include the Cloud Service Providers, or is that a separate category?

Yes, CSPs are included in Enterprise. We distinguish Enterprise & CSP from Client.

Historically, at the product level there are two buckets: enterprise or client. When someone says enterprise, I ask what their end markets are: traditional end market or data center/hyperscale end market? So, products versus markets. Enterprise for me, at the device level, still encompasses both of those markets, but if they want to go further, depending on where those products go, it’s data center versus traditional enterprise market.

From my perspective, that categorization is totally right. Initially, there was client and enterprise. Because Cloud Service Providers have become such a big market in themselves, we tend to say data center versus enterprise. There are even some pockets of folks that operate at smaller than hyperscale size but operate and build infrastructure much more like a hyperscaler would, versus a traditional enterprise.

Another way to say it is that sometimes we talk about data center class SSDs and enterprise class SSDs, and a delineating point would be the level of qualification.

Q6: What is the cost difference between HDD vs. SSD? And do you see the cost parity getting closer?

For HDDs, we are at one-plus penny per gigabyte for nearline today. It’s not going down 30 or 40% in a quarter or a year, depending on supply, like we are seeing in NAND today. NAND or SSD pricing can go down 40% in two quarters, but it might jump back up another 20 or 30% depending on which way the wind blows.

HDD pricing will continue to go down. For SSDs, we are at sub-ten cents per gigabyte today on average, plus or minus, and that’s going to go down too, even with QLC implementations in certain spots; but my answer to “same price” is probably never. Even if that gap gets cut in half, it’s going to be tough for a lot of teams out there to make the switch. Product specifications often win on paper when it comes to comparisons, but at the device level, ask any procurement team in any company in the world, and it would have to be nearly the same price to transition from one technology to the other.
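Using the rough per-gigabyte figures quoted above (treated purely as illustrative numbers), the gap is easy to see for a single nearline drive’s worth of capacity:

# Illustrative only, using the rough $/GB figures quoted above.
hdd_usd_per_gb = 0.012                       # "one-plus penny a gigabyte" nearline HDD
ssd_usd_per_gb = 0.09                        # "sub-ten cents a gigabyte" enterprise SSD
capacity_gb = 18_000                         # one 18 TB nearline drive's worth of capacity

print(f"HDD: ~${hdd_usd_per_gb * capacity_gb:,.0f}")   # roughly $216
print(f"SSD: ~${ssd_usd_per_gb * capacity_gb:,.0f}")   # roughly $1,620, still several times more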

We need to remember that the three remaining HDD vendors are still investing very heavily in capacity-optimized technologies. We’ve seen incremental improvements with things like SMR. On the horizon we’re watching heat-assisted magnetic recording (HAMR); if and when HAMR happens, that could be a real boost on the HDD side. So it is a very active playing field.

There are three levers: HAMR, SMR, and adding components. Not long ago we were shipping nearline hard drives with 4 or 5 platters; we’re at 9 or 10 now, and we are discussing going to 11. They always find a way. For the three vendors left, nearline constitutes the majority of what they have, so they are going to do everything in their power to keep that competitive edge.

Q7: Why are hyperscalers still planning to use SAS for storage?

In a modern server, for example a single-socket Sapphire Rapids Supermicro server, PCIe Gen 5 lanes are so much faster than PCIe Gen 3 lanes. When you have a SAS controller, you can handle many more hard drives for a given lane of PCIe bandwidth, and a hard drive can actually feed it. An interesting way to think about it is that a PCIe x16 slot now has more bandwidth on a server like this than you had in an entire server a couple of years ago. One of the biggest reasons for the SAS architecture is that you can put a lot of storage behind it; another is dual-port SAS, especially if you are running a multi-homed solution. There are dual-port hard drives and dual-port NVMe drives, but not all of them are dual port, even the enterprise drives.
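The arithmetic behind that point is roughly as follows (approximate, best-case sequential numbers, not figures from the webcast):

# Approximate, best-case sequential figures to show why many HDDs can sit behind
# one PCIe slot via a SAS controller and expanders.
pcie_gen5_x16_gb_per_s = 64                  # ~64 GB/s of raw PCIe Gen 5 x16 bandwidth
hdd_seq_gb_per_s = 0.27                      # ~270 MB/s sustained from one nearline HDD

print(f"~{pcie_gen5_x16_gb_per_s / hdd_seq_gb_per_s:.0f} streaming HDDs per Gen 5 x16 slot")  # ~240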

Scalability is a main driver for hyperscalers using SAS. The scale of some of these machines is absolutely enormous. It’s about their nearline capacity, nearline storage and the scale of it.

Another primary driver is dollars per gigabyte. How are you going to store all this stuff? Think about the amount of capacity that must be in a hyperscale data center. It’s absolutely enormous. If they have to pay a dollar more per drive and buy ten million drives a year, that’s ten million dollars off their bottom line.

At the storage device level, SAS hard drives and SSDs offer more throughput, dual port versus single port, better ECC, and other technical and reliability metrics that favor SAS over SATA.

Additionally, another key consideration is supporting these storage requirements from a production perspective. The entire NAND industry currently produces about 170 exabytes in a given quarter. That must cover every PC out there, all the enterprise SSDs we’re talking about now, every smartphone in the world (which consumes about 33-35% of the NAND in a quarter), and then all the other applications: DIY, embedded, the spot market. Nearline HDD in the quarter we just talked about did 250 exabytes alone. Production support for that is not insignificant.
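Putting the quarterly production figures quoted above side by side makes the point (rough numbers):

# Rough comparison of the quarterly figures quoted above.
nand_eb_per_quarter = 170                    # total NAND industry output, all applications
smartphone_share = 0.34                      # ~33-35% of NAND goes to smartphones
nearline_hdd_eb = 250                        # nearline HDD exabytes in the same quarter

remaining_nand = nand_eb_per_quarter * (1 - smartphone_share)
print(f"NAND left after smartphones: ~{remaining_nand:.0f} EB vs {nearline_hdd_eb} EB of nearline HDD")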

Q8: What do you think will be the biggest storage challenge for 2023?

(Storage Influencer Perspective) The biggest storage challenge for 2023 is the newest generation of servers and their consolidation ratios; it’s like a hockey stick. People were used to getting maybe one and a half previous-generation servers’ worth of performance in a current-generation server, and now we’re talking two, three, 4X. So the challenge is how you balance and rethink your storage architecture, and not just in the servers themselves: a 400G NIC gives you the ability to have an entire 2017-generation server’s worth of PCIe bandwidth over a NIC. What does that do for an architecture, and how do you build with that? That’s my biggest storage challenge.

(Data Storage Analyst Perspective) Two things, one technical and one business. PCIe Gen 5 is obviously the hot topic today. I hear expectations from certain companies, and I know it’s going to happen, but my response is always that it never goes the way you plan; it gets pushed out, and volumes come in lower than you think they will.

Number two, on the business side, my concern for 2023 is the health of various companies out there, because of the current environment we are in and what some of these storage devices are being sold at. That’s a legitimate concern: what does the landscape look like in January of 2023 versus what we will be talking about in December 2023? We’ve had three quarters of very tough times, and we have multiple quarters ahead. It is going to be interesting how certain companies navigate these waters.

(Manufacturer / Supplier Perspective) I break mine up into two pieces, coming from an infrastructure play, which is what I’m very focused on. One, meeting the demands of the scale required by our data center customers. How do we provide that scale? It’s more than just how many things we can hook up; it’s making them all work well together, providing the appropriate quality of service, and spinning up large systems. It’s very complicated. The second is signal integrity around PCIe Gen 5. From an infrastructure perspective, that’s a big challenge the whole ecosystem is facing, and it’s going to be a very focused effort in 2023.

Q9: Your 2021 vs 2026 exabyte slide: is that only SSD, or does it include HDD exabytes?

(NOTE TO READERS: This data comes from TRENDFOCUS analyst firm, specifically from VP Don Jeanette. You are welcome to contact them if you are interested in purchasing their reports.)

The exabytes referred to in that slide (graphic below) are for SSD.

Q11: Is there a difference in performance with U.2 and U.3?

No, there is no difference in performance between U.2 and U.3. Both are the same 2.5” form factor.

Q12: Is there going to be a 48G SAS, and/or what is the status of 24G+ SAS? What are the challenges with the next version of Serial Attached SCSI?

Based on current market demands, we believe that the SAS-4 physical layer, 24G SAS (and 24G+ SAS), satisfies performance requirements. 24G+ SAS includes emerging requirements (see the technology roadmap graphic below, which includes a detailed call-out for 24G+ SAS).

From a technical standpoint, it is achievable, but we will go to 48G SAS as the market demands.

Q13: Will there be NVMe hard drives soon or will HDDs always be SAS/SATA behind a SAS infrastructure?

OCP is working on an NVMe hard drive specification.  According to market data, the vast majority of HDDs will be connected behind a SAS infrastructure.

Q15: Do you have any data on trends for NVMe RAID?

No, we do not have data on trends for NVMe RAID. The market data discussed in this webcast comes from the TRENDFOCUS analyst firm, specifically their VP Don Jeanette. You are welcome to contact Don if you are interested in purchasing their reports.

Q16: Do you have any data on the SAS-4 SSD deployment in 2023?

No, we do not have that data. The market data discussed in this webcast comes from TRENDFOCUS analyst firm, specifically their VP Don Jeanette. You are welcome to contact Don if you are interested in purchasing their reports.
