
Storage Trends: Your Questions Answered

STA Forum

Apr 26, 2024


At our recent SNIA SCSI Trade Association Forum webinar, “Storage Trends 2024,” our industry experts discussed the storage trends developing in the coming year and the applications and other factors driving them, and shared market data illustrating these trends. If you missed the live event, you can watch it on-demand in the SNIA Educational Library. Questions from the audience ranged from projections about the split between on-prem and public cloud to queries about technologies and terms such as NVMe, LTO tape, EDSFF, and cyber storage. Here are answers to the audience’s questions.

Q1: What is the future of HDDs? 

A1: HDDs are not in any hurry to depart the market. Despite high-capacity flash being out there, the cost per terabyte for HDDs is still very much in the platters’ favor. Not all jobs need to be done on flash.

Q2: Are hard drives out in 2024? Is the world ready for full flash? 

A2: We are seeing a lot of flash for backup. Not necessarily to make your backup faster, but for things like instant restores or Veeam Instant Recovery; being able to run those on robust infrastructure is interesting. Also, from the AI perspective, being able to run AI jobs against backup data, so that they don’t hit production or require a big batch of copy data to be stood up (leaving us with multiple copies of data), is an interesting use case. This is very much in the minority, but it is something we’re seeing.

Q3: What will be the role of LTO tape in this future?

A3: We’re seeing object storage deployed on tape. Quantum is doing this, and a lot of clouds are doing this, despite how they may market it. Tape is still very much a thing; it’s outside of today’s scope, but it is storage. Look at some of these big libraries, like the Diamondback from IBM, and SpectraLogic has several – both were very popular booths and technologies at the SC’23 industry event. The HPC installations are using tape, and I would imagine a massive chunk of the Fortune 500 is as well.

Q4: What about NVMe HDDs? 

A4: The IDC chart about NVMe HDD penetration in the market didn’t even have NVMe HDDs on the line, but they do show up in the storage sessions at OCP. NVMe HDDs have been shown at industry events before, and StorageReview.com has written about them a couple of times. While they do not have a large presence in the market yet, they are certainly interesting and fun to think about. As with EDSFF, where the industry is unifying on a connector, it is interesting to think about a universal connector for drives, and there is certainly some interest. When you really step back, it is an interface change, but one of the reasons adoption is slow is that, as the earlier IDC chart showed, a huge amount of deployed capacity and infrastructure is SAS-based, and that is hard to displace. There are obvious benefits to unifying around NVMe, and over the long term there could be TCO benefits, but there is a very large installed SAS base and the market continues to see value in it. That said, it is interesting and exciting; you can see how some of these technologies may continue to evolve, and as flash becomes a bigger percentage of the ecosystem, that may help move things along. There is still a lot of ecosystem built around SAS.

Q5: Does it make sense for HDDs to stay on a SAS interface when SSDs are moving to NVMe? Wouldn’t it make more sense to have HDDs with an NVMe interface, to leverage a single interface?

A5: This is an active project within OCP. It simplifies design but also introduces new challenges, such as PCIe lane allocation in system designs.
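To make the lane-allocation concern concrete, here is a minimal back-of-the-envelope sketch. The host lane budget, chassis sizes, and per-drive link widths are hypothetical assumptions for illustration, not figures from the webinar or the OCP project.

```python
# Back-of-the-envelope PCIe lane budgeting for a hypothetical NVMe HDD backplane.
# All numbers below are assumptions for illustration only.

def lanes_required(drive_count: int, lanes_per_drive: int) -> int:
    """Total PCIe lanes consumed by a direct-attach NVMe backplane."""
    return drive_count * lanes_per_drive

def fits_budget(drive_count: int, lanes_per_drive: int, host_lanes: int) -> bool:
    """True if the drives fit the host lane budget without a PCIe switch."""
    return lanes_required(drive_count, lanes_per_drive) <= host_lanes

if __name__ == "__main__":
    host_lanes = 128  # assumed lanes available from a single-socket server
    for drives in (24, 60, 90):      # assumed HDD-dense chassis sizes
        for width in (1, 2, 4):      # candidate NVMe HDD link widths
            need = lanes_required(drives, width)
            verdict = "fits" if fits_budget(drives, width, host_lanes) else "needs PCIe switching"
            print(f"{drives} drives @ x{width}: {need} lanes -> {verdict}")
```

The same arithmetic helps explain why SAS remains attractive for very dense HDD enclosures: an expander lets one host connection fan out to many drives without consuming a host lane per drive link.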

Q6: This question is for Jeff Janukowicz (IDC): any projections on the split between public cloud and on-prem AI storage? Can you share any trends with regard to on-prem vs. cloud-based data archival?

A6: IDC has done some work, but hasn’t published anything around that quite yet. Obviously, the initial wave of AI is being driven by the public cloud vendors. IDC survey work suggests that the way this will ultimately play out is that a lot of folks will look to customize or build upon some of the publicly available models that are out there. Those workloads are likely to move back on-prem, for compliance or security reasons, or simply because people want to keep that data in-house. When we say AI, it doesn’t mean that everything is going to the cloud. In this AI evolution, there will be a place for it in the cloud, on-prem, and at the edge as well. The edge is where a lot of the data will be collected, and when we think about inferencing and more, much of that will be done better at the edge. Then there are client devices such as the PC, where Apple and Microsoft are pushing to integrate AI features directly into the device, and mobile is next. The idea of “AI everywhere” will proliferate, and we are confident in saying yes, there will be a strong AI presence in the on-prem data center as well.

Q7: It seems "Cyber Storage" is a trend. Is this just a feature of storage, or an entirely new product category?

A7: It’s a term coined by Gartner, which is defined as doing threat detection and response in storage software or hardware. The SNIA Cloud Storage Technologies Initiative did a webinar on it. 

Q8: Is the storage demand trend mainly affected by hyperscalers? If yes, what do you expect for enterprise on-premises infrastructure?

A8: Demand from hyperscalers continues to be very large and represents most of the demand. Even the IDC chart shown earlier around capacity optimization reflects hyperscaler demand. This doesn’t mean that on-prem data centers are going away. People tend to keep infrastructure on-prem for security, compliance, or legacy reasons, so on-prem doesn’t go away. We see both continuing to co-exist and continuing to have value for different customers with different needs. Additionally, especially in the SSD development process, there is a strong desire by SSD manufacturers to make fewer variants of their drives. We’re seeing a desire to manufacture drives that go to hyperscalers, who typically buy in volume and, as a by-product, dictate what gets made for the enterprise. We are seeing more interest in having one SKU, or possibly one SKU with different firmware, for hyperscalers and enterprise; if we get there, that efficiency should be positive for the overall market, enabling SSD vendors to make one product at scale and then tune it a little for an enterprise vs. a hyperscale use. There are potential efficiencies coming.

Q9: There are so many form factors as part of "EDSFF"; how can we really call it a standard when it seems like there are 10 to choose from, plus more to come? How are drive suppliers going to focus efforts on commonality with so many choices?

A9: The standard aims to accommodate as many use cases as possible, but many of the aspects it defines are optional; just a handful of them are required. A couple more aspects have become de facto standards, but there are several optional outliers that are choices companies can develop around for more specific use cases. Initially, E1.S and E3.S are most likely going to be the most prevalent form factor versions, with E1.L and E3.S 2T perhaps the next most common. There are a lot of variants defined simply so companies have options for specific use cases.

Q10: QLC has been asserted by some as the end of HDDs. Given the projections shown today, is the death of HDDs realistic?

A10: QLC SSDs offer tremendous capacity gains over TLC SSDs and HDDs. They’re not less expensive per TB though, and that’s where HDDs will continue to hold a very critical spot. And while QLC SSDs are of course faster than HDDs, there are plenty of workloads where that speed simply isn’t needed.
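A quick way to see the cost-per-terabyte point is to compute $/TB directly. The prices and capacities below are placeholder assumptions, not market data, so substitute current quotes before drawing any conclusions.

```python
# Illustrative $/TB comparison; prices and capacities are placeholder assumptions.

def cost_per_tb(price_usd: float, capacity_tb: float) -> float:
    return price_usd / capacity_tb

drives = {
    "Nearline HDD": {"price_usd": 400.0,  "capacity_tb": 24.0},   # assumed
    "QLC SSD":      {"price_usd": 4000.0, "capacity_tb": 61.44},  # assumed
    "TLC SSD":      {"price_usd": 1800.0, "capacity_tb": 15.36},  # assumed
}

for name, d in drives.items():
    print(f"{name:12s}: ${cost_per_tb(d['price_usd'], d['capacity_tb']):6.2f} per TB")
```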

Q11: While QLC has been around for years, QLC chatter and activity seem to have really picked up over the last 6 months or so. Is QLC on the precipice of having meaningful shipping volume compared to TLC? What are the drivers?

A11: For workloads that are heavily read-dependent (arguably, most workloads), QLC performance is on par with TLC. Even the endurance of QLC is more robust than most think; we’ve proven that out with our various Pi world record calculations. Density, of course, is another major benefit for QLC: 61.44TB in a U.2 form factor, and even more in E1.L form factors. TLC will remain the go-to for mixed or heavy write workloads, however.

Q12: What do you think about composable memory systems? Are they going to be a trend?

A12: Future versions of CXL promise composability and sharing of certain resources like DRAM, but this is still very fluid.

Q13: It’s not just capacity; performance is important. Why purchase a bunch of expensive GPUs just to have them idle while waiting for data? Speed is about keeping these assets highly utilized.

A13: In our experience exploring and utilizing AI, the bottleneck in keeping GPUs busy is the fabric, not storage performance. Once 800GbE proliferates, that math may change some and force Gen5/Gen6 SSDs to be the default choice, but for now, it’s networking that’s limiting GPU utilization. Also remember that not all AI is created equal, and different use cases will have different storage performance needs; edge inferencing, for instance, calls for a different model than data center training.
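As a rough illustration of why the fabric matters for keeping GPUs fed, the sketch below computes how long a hypothetical batch of data takes to move at several link speeds. The batch size, link speeds, and efficiency factor are assumptions for illustration, not benchmark results.

```python
# Rough transfer-time math: how long a GPU could sit idle waiting for a batch
# of data at different link speeds. All inputs are illustrative assumptions.

def seconds_to_move(gigabytes: float, link_gbps: float, efficiency: float = 0.9) -> float:
    """Time to move `gigabytes` over a link running at `link_gbps`, assuming ~90% efficiency."""
    bits = gigabytes * 8e9
    return bits / (link_gbps * 1e9 * efficiency)

if __name__ == "__main__":
    batch_gb = 50.0  # hypothetical working set pulled per training step
    for gbps in (100, 200, 400, 800):
        print(f"{gbps:4d} Gb/s link: {seconds_to_move(batch_gb, gbps):5.2f} s to move {batch_gb:.0f} GB")
```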

Q14: We are seeing NAND tightening from our suppliers on availability of the larger-capacity SSDs on SAS and NVMe; what is worse is non-FIPS vs. FIPS. What’s up from your perspective? Thanks.

A14: The NAND flash industry continues to recover from the recent memory downturn and is ramping production back up. Until the industry fully recovers, NAND flash supply (and SSD supply) will remain tight. As for FIPS vs. non-FIPS, the lead time for FIPS 140-3 certification is long, and that’s affecting the whole industry. The storage industry generally values FIPS certification highly and is following the process closely; NIST is the organization managing the certification program.

Q15: What power limit trends are you seeing from devices being deployed in the E3 form factor? The trend I see is that the drive companies are continuing to consume more and more power.

A15: E3 offers a variety of power envelopes based on each form factor, ranging up to 70 watts. As you go up in wattage, you gain capacity and performance, but this needs to be managed carefully to maintain a sustainable balance with data center efficiency. Efficient performance per watt is the goal.
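Since the answer frames the goal as performance per watt, here is a tiny sketch of that metric. The throughput and power figures are hypothetical placeholders, not specifications of any E3 device.

```python
# Performance-per-watt comparison; all figures are hypothetical placeholders.

def perf_per_watt(throughput_gb_s: float, watts: float) -> float:
    return throughput_gb_s / watts

candidates = [
    {"name": "E3 device @ 25 W", "watts": 25.0, "throughput_gb_s": 14.0},  # assumed
    {"name": "E3 device @ 40 W", "watts": 40.0, "throughput_gb_s": 26.0},  # assumed
    {"name": "E3 device @ 70 W", "watts": 70.0, "throughput_gb_s": 40.0},  # assumed
]

for c in candidates:
    print(f"{c['name']}: {perf_per_watt(c['throughput_gb_s'], c['watts']):.2f} GB/s per watt")
```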

Q16: Is AI accelerating the need for SSDs, or is there something new needed? What's the biggest unique SSD requirement that isn't in all the other existing applications? It seems higher density, lower power, and performance improvements are not new.

A16: There is not much increased demand that we can see directly tied to AI yet, but as AI becomes more widely deployed, we will likely see an increase in overall SSD demand, as well as in demand for high-capacity SSDs.

Q17: I have heard that storage as a percentage of total IT spend has dropped significantly. As a lot of dollars are now going toward GPUs, how will this trend respond?

A17: There is an evolution underway in which GPU spend ends up supporting increased storage spend.

About the Authors: Jeff Janukowicz, Research Vice President at IDC; Brian Beeler, Owner and Editor in Chief, StorageReview.com; and Cameron T. Brett, SNIA STA Forum Chair. Note to our readers: We had quite a few questions regarding SSDs, forecasting for SSDs, and comparing that with the future of HDDs. As a result, we are preparing a blog that addresses these questions in more detail.


Pratik Gupta

Apr 25, 2024

Moving well beyond “fix it when it breaks,” AIOps introduces intelligence into the fabric of IT thinking and processes. The impact of AIOps and the shift in IT practices were the focus of a recent SNIA Cloud Storage Technologies Initiative (CSTI) webinar, “AIOps: Reactive to Proactive – Revolutionizing the IT Mindset.” If you missed the live session, it’s available on-demand, together with the presentation slides, at the SNIA Educational Library. The audience asked several intriguing questions. Here are answers to them all: Q. How do you align your AIOps objectives with your company’s overall AI usage policy when it is still fairly restrictive in terms of AI use and acceptance? A. There are a lot of misconceptions about company policies, what constitutes AI, and the actual risk. So, there are several steps you can take:
  • Understand the policy and intent
  • Focus on low risk and high value use cases, for example, data used in IT management is often low risk and high value – e.g. metrics, or number of incidents or events
  • Start with a well-controlled and small environment and show value
  • Be transparent and demonstrate transparency. Even put a human in the loop for a while.
  • Maintain data governance – responsible data handling.
  • Use the industry’s best practices.
Q. What are the best AIOps tools in the market? A. There are many tools that claim to be an AIOps tool. But as the webinar shows, there is no single good tool and there will never be one best tool. It depends on what problem you are trying to solve.
  • Step 1: Identify the areas of the software development life cycle (SDLC) that you are focused on
  • Step 2: Identify the problem areas
  • Step 3: Identify the tools that can help catch the problems earlier and solve them
Q. What kind of coding and tool experience is needed for AIOps? A. Different parts of the lifecycle require different levels of experience with coding or tools. Many don’t need any coding experience. However, a number of them require a thorough understanding of processes and best practices in software development or IT management to use them effectively. Q. How can a DevOps engineer upskill to AIOps? A. It is very easy for a DevOps engineer to upskill to use AIOps tools. A lot of these capabilities are available as open source. It is best to start experimenting with open-source tools and see their value. Second, focus on a smaller section of the problem (looking at the lifecycle) and then identify the tools that solve that problem. Free tiers, open-source tools, and even manual scripts help upskill without buying these tools. A lot of online course sites like Udemy are now offering AIOps classes as well. Q. What are examples of existing AI cloud cost optimization tools? A. There are two types of cloud cost optimization tools:
  • ITOps tools – automate actions to optimize cost
  • FinOps tools – analyze and recommend actions to optimize cost.
The analysis tools are good at identifying issues but fall short of actually providing value unless you manually take action. The tools that automate actions provide value immediately but need greater buy-in from the organization to allow a tool to act. Some available optimization tools include Turbonomic from IBM, along with offerings from Flexera, Apptio, and Densify, AWS Cost Explorer, and Azure Cost Management + Billing; some are built into the cloud providers themselves. Q. Can you please explain runbooks further? A. Runbooks are a sequence of actions, often coded as scripts, that are used to automate the action or remediation in response to a problem or incident. These are pre-defined procedures. Usually, they are built out of a set of manual actions an operator takes, which are then codified in the form of a procedure and then code.
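To make the runbook idea concrete, here is a minimal sketch of a runbook written as a script: detect a condition, remediate, re-check, and escalate if the remediation did not help. The threshold, paths, and clean-up action are hypothetical examples (Unix-style paths), not recommendations from the webinar.

```python
# Minimal runbook sketch: a pre-defined detect -> remediate -> re-check sequence.
# Threshold, paths, and the clean-up action are hypothetical examples.

import shutil
import subprocess

DISK_USAGE_THRESHOLD = 0.90  # assumed policy: act when a filesystem is 90% full

def disk_usage_fraction(path: str = "/") -> float:
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def remediate_low_disk(log_dir: str = "/var/log") -> None:
    """Remediation step: delete rotated logs to reclaim space (illustrative only)."""
    subprocess.run(["find", log_dir, "-name", "*.log.1", "-delete"], check=False)

def runbook_disk_space(path: str = "/") -> str:
    """A tiny runbook: detect, remediate, re-check, escalate if still unhealthy."""
    if disk_usage_fraction(path) < DISK_USAGE_THRESHOLD:
        return "no action needed"
    remediate_low_disk()
    if disk_usage_fraction(path) < DISK_USAGE_THRESHOLD:
        return "remediated"
    return "escalate to operator"

if __name__ == "__main__":
    print(runbook_disk_space("/"))
```

In an AIOps pipeline, the trigger would normally come from a monitoring or anomaly-detection system rather than a manual run, and the outcome would be written back to the incident record.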


Just What is an IOTTA? Inquiring Minds Learn Now!

SNIA CMS Community

Apr 9, 2024

SNIA’s twelve Technical Work Groups collaborate to develop and promote vendor-neutral architectures, standards, and education for management, movement, and security for technologies related to handling and optimizing data. One of the more unique work groups is the  SNIA Input/Output Traces, Tools, and Analysis Technical Work Group (IOTTA TWG). SNIA Compute, Memory, and Storage Initiative recently sat down with IOTTA TWG Chairs Geoff Kuenning of Harvey Mudd College and Tom West of hyperI/O LLC to learn about some exciting new developments in their work activities and how SNIA members and colleagues can get involved. Q: What does the IOTTA TWG do? A: The IOTTA TWG is for those interested in the use of empirical data/metrics to better understand the actual operation and performance characteristics of storage I/O, especially as they pertain to application workloads. We summarize our work in this SNIA video https://www.youtube.com/watch?v=4EVW5IHHhEk One of our most important activities is to sponsor a collaborative worldwide repository for storage-related I/O trace collection and analysis tools, application workloads, I/O traces, and best practices around such topics. Q: What are the goals of the IOTTA Repository collaboration? A: The primary goal of the IOTTA Repository collaboration is to create a worldwide repository for storage related I/O trace files, associated tools, and other related information, all of which are made available free of charge to the storage research and development communities in both academia and industry. Repository data is often cited in research publications, with 627 citations to date listed on the IOTTA Repository website. Q: Why is keeping and sharing information by way of a Repository important? A: The IOTTA Repository provides a common facility through which a broad community (including storage vendors, storage users, and the academic community) can avail themselves of a variety of storage related I/O traces (especially contemporary I/O traces). We like to think of it as a “One-Stop-Shop”. Q: What kind of information are you gathering for the Repository?  Is some information more important than other(s)? A: The Repository contains a wide variety of storage related I/O trace types, including Block I/O, HPC Summaries, Key-Value Traces, NFS Traces, Parallel Traces, Static Snapshots, System Call Traces, and Workload Summaries. Reliability Traces are the latest category of traces added to the IOTTA Repository. Generally, the Reliability Traces category includes records of storage system reliability, for example, long-term records of hard-drive failures. The IOTTA Repository additionally provides an off-site link to traces that cannot be included directly within the repository (e.g., unable to obtain permission to host a particular trace within the repository). Q: Who downloads this information? What groups can make use of this information? A: Academic institutions are among the most frequent downloaders of Repository information, along with storage companies. Practitioners can make use of various IOTTA Repository traces to gain a better understanding of actual I/O storage operation activity within various environments and scenarios.  Traces can also be used as a basis for benchmarking and testing proposed solutions. SNIA IOTTA TWG members receive a monthly report that shows the number and types (i.e., trace names) of the traces downloaded during the month, including the downloader region (e.g., Asia, Europe, North America). 
The report also includes company/institution names associated with the downloaders. More information on joining the IOTTA TWG is at http://iotta.snia.org/faqs/joinIOTTA. Q: What is some of the latest information in the Repository? A: In February 2024, we posted NVMe drive reliability traces collected by Alibaba. The collection includes both fail-stop and fail-slow data for a large drive population in Alibaba’s servers. Q: What is the importance of these traces? A: The authors of the associated USENIX ATC 2022 paper indicate that the Alibaba Fail-Stop dataset is the first large-scale public dataset on real-world operational data of NVMe SSDs. From their analysis of the dataset, they identified a series of major reliability changes in NVMe SSDs. In addition, the authors of the associated USENIX FAST 2023 paper indicate that the Alibaba Fail-Slow dataset is the first large-scale, clearly labeled public dataset of real-world operational traces aimed at fail-slow detection (i.e., where the drive continues to run but with poor performance). Based upon the dataset, the authors have provided a root cause analysis of fail-slow drives. With the growing importance of NVMe SSDs in the data center, it is critical to understand the reliability of hardware in the cloud. The Repository provides the trace downloads and also links to the papers and presentation videos that discuss these large-scale SSD reliability studies. Q: What new activity would you like to see in the Repository? A: We’d like to see more trace downloads for analysis. Most downloads today are related to benchmarking and replay. Trace activity could feed into a simulated computer system to test activities like failures. We would also like to see more input of data related to tape storage. The Repository does not have much information on cold storage and multilevel storage between hot and cold storage. Finally, we would like feedback on how people are using what they download – for analysis, reliability, benchmarks, and other areas where they have found the downloads useful. We also want to know what else you would like to be able to download. You can contact us directly at iottachairs@snia.org. Thanks for your time and the great information about the IOTTA Repository. Learn more about the IOTTA Repository on their FAQ page.
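For readers who want to experiment with a downloaded trace, here is a minimal first-pass analysis sketch. It assumes a simple CSV layout with operation and size columns; traces in the IOTTA Repository come in a variety of formats, so adapt the field names to the documentation that accompanies each trace.

```python
# First-pass summary of a block I/O trace. The CSV column names used here
# ("operation", "size") are assumptions for illustration; check each trace's
# own documentation for its actual format.

import csv
from collections import Counter

def summarize_block_trace(path: str):
    ops = Counter()
    total_bytes = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):           # assumes a header row
            ops[row["operation"].lower()] += 1  # e.g., "read" / "write"
            total_bytes += int(row["size"])
    return ops, total_bytes

if __name__ == "__main__":
    counts, nbytes = summarize_block_trace("example_trace.csv")  # hypothetical filename
    print(f"reads={counts['read']} writes={counts['write']} bytes_moved={nbytes}")
```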


SNIA Networking Storage Forum – New Name, Expanded Charter

Christine McMonigal

Apr 1, 2024

Anyone who follows technology knows that it is a fast-paced world with rapid changes and constant innovations. SNIA, together with its members, technical work groups, Forums, and Initiatives, continues to embrace, educate, and develop standards to make technology more available and better understood. At the SNIA Networking Storage Forum, we’ve been at the forefront of diving into technology topics that extend beyond traditional networked storage, providing education on AI, edge, acceleration and offloads, hyperconverged infrastructure, programming frameworks, and more. We still care about and spend a lot of time on networked storage and storage protocols, but we felt it was time that the name of the group better reflected the broad range of timely topics we’re covering. We are excited to announce our new name: SNIA Data, Networking & Storage Forum (DNSF). This group name aligns with SNIA’s data-centric focus and summarizes our belief that data is the center of networking and storage. In the same way that storage solutions have moved beyond silos to deliver an array of data services, so are we embracing a range of extended, but interrelated data technologies. We've also updated our charter to include the breadth of topics we cover that extend beyond traditional networked storage. Our Charter: The SNIA Data, Networking & Storage Forum (DNSF) educates and provides insights and leadership for applying technologies to a broad spectrum of end-to-end solutions. The DNSF mission is to:
  • Educate the industry and end users about technologies, standards and implementations in storage, networking, and data
  • Promote a broad range of solutions, including storage area networks (SAN), network-attached storage (NAS), software-defined storage (SDS), and disaggregated and hyperconverged infrastructure (HCI)
  • Improve understanding of the impacts and opportunities of a wide variety of emerging technologies and use cases by leveraging cross-industry expertise and collaboration
We hope you’ve had an opportunity to attend some of the educational webinars we’ve produced. We’re proud of the vast webinar library we’ve built and the positive feedback we get from our attendees. They know they can count on us to tackle technologies in a vendor-neutral way. It can be challenging to do that, but it really is a key tenet that sets SNIA apart. Our webinars in the past few years have ranged from storage networking security, storage performance metrics, and SAN basics to accelerating generative AI, NVMe/TCP and data center sustainability. In addition to our robust and highly-rated webinar program, our members also author and publish white papers, contributed articles, and blogs. We are excited about our new name and expanded charter! We have many great initiatives planned for 2024. Want to join us? Our DNSF members are highly committed and active at our weekly meetings, and we welcome new insights and expertise. Join one of our meetings to see what it’s all about!  Learn more about joining us here. Or email us if you have questions or would like an invite to a meeting. We hope you’ll give consideration to joining our team!


Q&A for Accelerating Gen AI Dataflow Bottlenecks

Erik Smith

Mar 25, 2024

Generative AI is front page news everywhere you look. With advancements happening so quickly, it is hard to keep up. The SNIA Networking Storage Forum recently convened a panel of experts from a wide range of backgrounds to talk about Gen AI in general and specifically discuss how dataflow bottlenecks can constrain Gen AI application performance well below optimal levels. If you missed this session, “Accelerating Generative AI: Options for Conquering the Dataflow Bottlenecks,” it’s available on-demand at the SNIA Educational Library. We promised to provide answers to our audience questions, and here they are. Q: If ResNet-50 is a dinosaur from 2015, which model would you recommend using instead for benchmarking? A: Setting aside the unfair aspersions being cast on the venerable ResNet-50, which is still used for inferencing benchmarks 😊, we suggest checking out the MLCommons website. In the benchmarks section you’ll see multiple use cases on Training and Inference. There are multiple benchmarks available that can provide more information about the ability of your infrastructure to effectively handle your intended workload. Q: Even if/when we use optics to connect clusters, there is a roughly 5ns/meter delay for the fiber between clusters. Seems like that physical distance limit almost mandates alternate ways of programming optimization to ‘stitch’ the interplay between data and compute? A: With regards to the use of optics versus copper to connect clusters, signals propagate through fiber and copper at about the same speed, so moving to an all-optical cabling infrastructure for latency reduction reasons is probably not the best use of capital. Also, even if there were a slight difference in the signal propagation speed through a particular optical or copper based medium, 5ns/m is small compared to switch and NIC packet processing latencies (e.g., 200-800 ns per hop) until you get to full metro distances. In addition, the software latencies are 2-6 us on top of the physical latencies for the most optimized systems. For AI fabrics data/messages are pipelined, so the raw latency does not have much effect. Interestingly, the time for data to travel between nodes is only one of the limiting factors when it comes to AI performance limitations and it’s not the biggest limitation either. Along these lines, there’s a phenomenal talk by Stephen Jones (NVIDIA) “How GPU computing works” that explains how latency between GPU and Memory impacts the overall system efficiency much more than anything else. That said, the various collective communication libraries (NCCL, RCCL, etc) and in network compute (e.g., SHARP) can have a big impact on the overall system efficiency by helping to avoid network contention. Q: Does this mean that GPUs are more efficient to use than CPUs and DPUs? A: GPUs, CPUs, AI accelerators, and DPUs all provide different functions and have different tradeoffs. While a CPU is good at executing arbitrary streams of instructions through applications/programs, embarrassingly parallelizable workloads (e.g., matrix multiplications which are common in deep learning) can be much more efficient when performed by GPUs or AI accelerators due to the GPUs’ and accelerators’ ability to execute linear algebra operations in parallel. Similarly, I wouldn’t use a GPU or AI accelerator as a general-purpose data mover, I’d use a CPU or an IPU/DPU for that. Q: With regards to vector engines, are there DPUs, switches (IB or Ethernet) that contain vector engines? 
A: There are commercially available vector engine accelerators, but currently there are no IPUs/DPUs or switches that provide this functionality natively. Q: One of the major bottlenecks in modern AI is GPU-to-GPU connectivity. For example, NVIDIA uses a proprietary GPU-GPU interconnect: with DGX-2 the focus was on 16 GPUs within a single box with NVSwitch, but then with A100 NVIDIA pulled this back to 8 GPUs, and then expanded on that to a super-pod and a second level of switching to get to 256 GPUs. How do NVLink, or other proprietary GPU-to-GPU interconnects, address bottlenecks? And why has the industry focused on an 8-GPU deployment vs. a 16-GPU deployment, given that LLMs are not training on tens of thousands of GPUs? A: GPU-GPU interconnects all address bottlenecks in the same way that other high-speed fabrics do. They are direct connections featuring large bandwidth, an optimized interconnect (point-to-point or parallel paths), and lightweight protocols. These interconnects have so far been proprietary and not interoperable across GPU vendors. The number of GPUs in a server chassis depends on many practical factors; e.g., 8 Gaudis per server leveraging standard RoCE ports provides a good balance to support training and inference. Q: How do you see the future of blending of memory and storage being enabled for generative AI workloads and the direction of “unified” memory between accelerators, GPUs, DPUs and CPUs? A: If by unified memory, you mean centralized memory that can be treated like a resource pool and be consumed by GPUs in place of HBM or by CPUs/DPUs in place of DRAM, then we do not believe we will see unified memory in the foreseeable future. The primary reason is latency. To have a unified memory would require centralization. Even if you were to constrain the distance (i.e., between the end-devices and the centralized memory) to be a single rack, the latency increase caused by the extra circuitry and physical length of the transport media (at 5ns per meter) could be detrimental to performance. However, the big problem with resource sharing is contention. Whether it be congestion in the network or contention at the centralized resource access point (interface), sharing resources requires special handling that will be challenging in the general case. For example, with 10 “compute” nodes attempting to access a pool of memory on a CXL Type 3 device, many of the nodes will end up waiting an unacceptably long period of time for a response. If by unified memory, you mean creating a new “capacity” tier of memory that is more performant than SSD and less performant than DRAM, then CXL Type 3 devices appear to be the way the industry will address that use case, but it may be a while before we see mass adoption. Q: Do you see hardware design becoming more specialized for the AI/ML phases (training, inference, etc.)? In today’s enterprise deployments you can have the same hardware performing several tasks in parallel. A: Yes. Not only have specialized HW offerings (e.g., accelerators) already been introduced (such as consumer laptops combining CPUs with inference engines), but we also expect specialized configurations optimized for specific use cases (e.g., inferencing) to be introduced. The reason is related to the diverse requirements for each use case. For more information, see the OCP Global Summit 23 presentation “Meta’s evolution of network AI” (specifically starting at time stamp 4:30).
They describe how different use cases stress the infrastructure in different ways. That said, there is value in accelerators and hardware being able to address any of the work types for AI so that a given cluster can run whichever mix of jobs is required at a given time. Q: Google leaders like Amin Vahdat have been casting doubts on the possibility of significant acceleration far from the CPU. Can you elaborate further on positioning data-centric compute in the face of that challenge? A: This is a multi-billion-dollar question! There isn’t an obvious answer today. You could imagine building a data processing pipeline with data transform accelerators ‘far’ from where the training and inferencing CPU/accelerators are located. You could build a full “accelerator only” training pipeline if you consider a GPU to be an accelerator not a CPU. The better way to think about this problem is to consider that there is no single answer for how to build ML infrastructure. There is also no single definition of CPU vs accelerator that matters in constructing useful AI infrastructure solutions. The distinction comes down to the role of the device within the infrastructure. With emerging ‘chiplet’ and similar approaches we will see the lines and distinctions blur further. What is significant in what Vahdat and others have been discussing: fabric/network/memory construction plus protocols to improve bandwidth, limit congestion, and reduce tail latency when connecting the data to computational elements (CPU, GPU, AI accelerators, hybrids) will see significant evolution and development over the next few years.  
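As a back-of-the-envelope companion to the latency discussion earlier in this Q&A (roughly 5 ns/m of propagation delay, 200-800 ns per switch/NIC hop, and 2-6 µs of software latency), the sketch below totals those terms to show how small the cabling contribution is. The per-hop and software figures come from the answer above; the 30 m path and 3-hop topology are illustrative assumptions.

```python
# Totals the latency terms cited in the optics answer above: ~5 ns/m propagation,
# 200-800 ns per switch/NIC hop, and 2-6 us of software overhead. The 30 m path
# and 3-hop topology are assumptions for illustration.

NS_PER_METER = 5.0

def one_way_latency_ns(meters: float, hops: int, ns_per_hop: float, software_us: float) -> float:
    return meters * NS_PER_METER + hops * ns_per_hop + software_us * 1_000

if __name__ == "__main__":
    for ns_per_hop, sw_us in ((200.0, 2.0), (800.0, 6.0)):  # best/worst cases from the answer
        total = one_way_latency_ns(meters=30, hops=3, ns_per_hop=ns_per_hop, software_us=sw_us)
        fiber = 30 * NS_PER_METER
        print(f"hop={ns_per_hop:.0f} ns, software={sw_us:.0f} us: "
              f"total {total:.0f} ns, propagation share {100 * fiber / total:.1f}%")
```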


Hidden Costs of AI Q&A

Erik Smith

Mar 14, 2024

At our recent SNIA Networking Storage Forum webinar, “Addressing the Hidden Costs of AI,” our expert team explored the impacts of AI, including sustainability and areas where there are potentially hidden technical and infrastructure costs. If you missed the live event, you can watch it on-demand in the SNIA Educational Library. Questions from the audience ranged from training Large Language Models to fundamental infrastructure changes from AI and more. Here are answers to the audience’s questions from our presenters. Q: Do you have an idea of where the best tradeoff is for high IO speed cost and GPU working cost? Is it always best to spend maximum and get the highest IO speed possible? A: It depends on what you are trying to do. If you are training a Large Language Model (LLM) then you’ll have a large collection of GPUs communicating with one another regularly (e.g., All-reduce) and doing so at throughput rates that are up to 900GB/s per GPU! For this kind of use case, it makes sense to use the fastest network option available. Any money saved by using a cheaper/slightly less performant transport will be more than offset by the cost of GPUs that are idle while waiting for data. If you are more interested in fine-tuning an existing model or using Retrieval Augmented Generation (RAG) then you won’t need quite as much network bandwidth and can choose a more economical connectivity option. It’s worth noting that a group of companies has come together to work on the next generation of networking that will be well suited for use in HPC and AI environments. This group, the Ultra Ethernet Consortium (UEC), has agreed to collaborate on an open standard and has wide industry backing. This should allow even large clusters (1000+ nodes) to utilize a common fabric for all the network needs of a cluster. Q: We (all industries) are trying to use AI for everything. Is that cost effective? Does it cost fractions of a penny to answer a user question, or is there a high cost that is being hidden or eaten by someone now because the industry is so new? A: It does not make sense to try and use AI/ML to solve every problem. AI/ML should only be used when a more traditional, algorithmic technique cannot easily be used to solve a problem (and there are plenty of these). Generative AI aside, one example where AI has historically provided an enormous benefit for IT practitioners is Multivariate Anomaly Detection. These models can learn what normal is for a given set of telemetry streams and then alert the user when something unexpected happens. A traditional approach (e.g., writing source code for an anomaly detector) would be cost and time prohibitive and probably not be anywhere nearly as good at detecting anomalies. Q: Can you discuss typical data access patterns for model training or tuning (sequential/random, block sizes, repeated access, etc.)? A: There is no simple answer, as the access patterns can vary from one type of training to the next. Assuming you’d like a better answer than that, I would suggest starting to look into two resources:
  1. Meta’s OCP Presentation: “Meta’s evolution of network for AI” includes a ton of great information about AI’s impact on the network.
  2. Blocks and Files article: “MLCommons publishes storage benchmark for AI” includes a table that provides an overview of benchmark results for one set of tests.
Q: Will this video be available after the talk? I would like to forward to my co-workers. Great info. A: Yes. You can access the video and a PDF of the presentations slides here. Q: Does this mean we're moving to fewer updates or write once (or infrequently) read mostly storage model?  I'm excluding dynamic data from end-user inference requests. A: For the active training and finetuning phase of an AI model the data patterns are very read heavy. There is quite a lot of work done before a training or finetuning job begins that is much more balanced between read & write. This is called the “data preparation” phase of an AI pipeline. Data prep takes existing data from a variety of sources (inhouse data lake, dataset from a public repo, or a database) and performs data manipulation tasks to accomplish data labeling and formatting at a minimum. So, tuning for just read may not be optimal. Q: Fibre Channel seems to have a lot of the characteristics required for the fabric. Could a Fibre Channel fabric over NVMe be utilized to handle the data ingestion for AI component on dedicated adapters for storage (disaggregate storage)? A: Fibre Channel is not a great fit for AI use cases for a few reasons:
  • With AI, data is typically accessed as either Files or Objects, not Blocks, and FC is primarily used to access block storage.
  • If you wanted to use FC in place of IB (for GPU to GPU traffic) you’d need something like an FC-RDMA to make FC suitable.
  • All of that said, FC currently maxes out at 128GFC and there are two reasons why this matters:
    1. AI optimized storage starts at 200Gbps and based on some end user feedback, 400Gbps is already not fast enough.
    2. GPU-to-GPU traffic requires up to 900GB/s (7200Gbps) of throughput per GPU; that’s about 56 128GFC interfaces per GPU (the short sketch after this list walks through the arithmetic).
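The following minimal sketch reproduces the arithmetic referenced in the last point above, using nominal line rates as the answer does (treating 128GFC as 128 Gb/s) and adding a couple of Ethernet-class link speeds for comparison.

```python
# Links needed to match per-GPU interconnect bandwidth, using nominal line rates
# as in the answer above (128GFC treated as 128 Gb/s).

def links_needed(gpu_gigabytes_per_s: float, link_gbps: float) -> float:
    gpu_gbps = gpu_gigabytes_per_s * 8  # 900 GB/s -> 7200 Gb/s
    return gpu_gbps / link_gbps

if __name__ == "__main__":
    for link_gbps in (128, 400, 800):
        print(f"{link_gbps:3d} Gb/s links per GPU: {links_needed(900, link_gbps):5.1f}")
```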
Q: Do you see something like GPUDirect Storage from NVIDIA becoming the standard? So does this mean NVMe will win (over FC or TCP)? Will other AI chip providers have to adopt their own GPUDirect-like protocol? A: It’s too early to say whether or not GPUDirect Storage will become a de facto standard or if alternate approaches (e.g., pNFS) will be able to satisfy the needs of most environments. The answer is likely to be “both”. Q: You’ve mentioned demand for higher throughput for training, and lower latency for inference. Is there a demand for low-cost, high-capacity, archive-tier storage? A: Not specifically for AI. Depending on what you are doing, training and inference can be latency or throughput sensitive (sometimes both). Training an LLM (which most users will never actually attempt to do) requires massive throughput from storage for reads and writes, literally the faster the better when loading data into the GPUs or when the GPUs are saving checkpoints. An inference workload wouldn’t require the same throughput as training would, but to the extent that it needs to access storage, it would certainly benefit from low latency. If you are trying to optimize AI storage for anything but performance (e.g., cost), you are probably going to be disappointed with the overall performance of the system. Q: What are the presenters’ views on the industry trend for where to run workloads or train a model? Is it in cloud data centers like AWS or GCP, or on-prem? A: It truly depends on what you are doing. If you want to experiment with AI (e.g., an AI version of a “Hello World” program), or even something a bit more involved, there are lots of options that allow you to use the cloud economically. Check out this collection of Colab notebooks for an example and give it a try for yourself. Once you get beyond simple projects, you’ll find that using cloud-based services will become prohibitively expensive, and you’ll quickly want to start running your training jobs on-prem. The downside to this is the need to manage the infrastructure elements yourself, and this assumes that you can even get the right GPUs, although there are reports that supply issues are easing in this space. The bottom line: whether to run on-prem or in the cloud still comes down to whether you can realistically get the same ease of use and freedom from HW maintenance from your own infrastructure as you could from a CSP. Sometimes the answer is yes. Q: Do AI accelerators in PCs (recently advertised for new CPUs) have any impact/benefit on using large public AI models? A: AI accelerators in PCs will be a boon for all of us, as they will enable inference at the edge. They will also allow exploration and experimentation on your local system for building your own AI work. You will, however, want to focus on small or mini models at this time. Without large amounts of dedicated GPU memory to help speed things up, only the small models will run well on your local PC. That being said, we will continue to see improvements in this area, and PCs are a great starting point for AI projects. Q: Fundamentally, is AI radically changing what is required from storage? Or is it simply accelerating some of the existing trends of reducing power, higher-density SSDs, and pushing faster on computational storage, new NVMe transport modes (such as RDMA), and ever more file system optimizations?
A: From the point of view of a typical enterprise storage deployment (e.g., block storage being accessed over an FC SAN), AI storage is completely different. Storage is accessed as either files or objects, not as blocks, and the performance requirements already exceed the maximum speeds that FC can deliver today (i.e., 128GFC). This means most AI storage is using either Ethernet or IB as a transport. Raw performance seems to be the primary driver in this space right now, rather than reducing power consumption or increasing density. You can expect protocols such as GPUDirect and pNFS to become increasingly important to meet performance targets. Q: What are the innovations in HDDs relative to AI workloads? This was mentioned in the SSD + HDD slide. A: The point of the SSD + HDD slide was to point out that the introduction of SSDs:
  1. dramatically improved overall storage system efficiency, leading to a dramatic performance boost. This performance boost impacted the amount of data that a single storage port could transmit onto a SAN and this had a dramatic impact on the need to monitor for congestion and congestion spreading.
  2. didn’t completely displace the need for HDDs, just as GPUs won’t replace the need for CPUs. They provide different functions and excel at different types of jobs.
Q: What is the difference between (1) Peak Inference, (2) Mainstream Inference, (3) Baseline Inference, and (4) Endpoint Inference, specifically from a cost perspective? A: This question was answered Live during the webinar (see timestamp 44:27) the following is a summary of the responses: Endpoint inference is inference that is happening on client devices (e.g., laptops, smartphones) where much smaller models that have been optimized for the very constrained power envelope of these devices. Peak inference can be thought about as something like Chat GPT or Bings AI chatbot, where you need large / specialized infrastructure (e.g., GPUs, specialized AI Hardware accelerators). Mainstream and Baseline inference is somewhere in between where you're using much smaller models or specialized models. For example, you could have a mistral 7 billion model which you have fine-tuned for your enterprise use case of document summarization or to find insights in a sales pipeline, and these use cases can employ much smaller models and hence the requirements can vary. In terms of cost the deployment of these models for edge inference would be low as compared to peak inference like a chat GPT which would be much higher. In terms of infrastructure requirements some of the Baseline and mainstream inference models can be served just by using a CPU alone or with a CPU plus a GPU, or with a CPU plus a few GPUs, or CPU plus a few AI accelerators. CPUs available today do have built AI accelerators which can provide an optimized cost solution for Baseline and mainstream inference which will be the typical scenario in many enterprise environments. Q: You said utilization of network and hardware is changing significantly but compared to what? Traditional enterprise workloads or HPC workloads? A: AI workloads will drive network utilization unlike anything the enterprise has ever experienced before. Each GPU (of which there are currently up to 8 in a server) can currently generate 900GB/s (7200 Gbps) of GPU to GPU traffic. To be fair, this GPU to GPU traffic can and should be isolated to a dedicated “AI Fabric” that has been specifically designed for this use. Along these lines new types of network topologies are being used. Rob mentioned one of them during his portion of the presentation (i.e., the Rail topology). Those end users already familiar with HPC will find many of the same constraints and scalability issues that need to be dealt with in HPC environments also impact AI infrastructure. Q: What are the key networking considerations for AI deployed at Edge (i.e. stores, branch offices)? A: AI at the edge is a talk all on its own. Much like we see large differences between training, fine tuning, and inference in the data center, inference at the edge has many flavors and performance requirements that differ from use case to use case. Some examples are a centralized set of servers ingesting the camera feeds for a large retail store, aggregating them, and making inferences as compared to a single camera watching an intersection and using an on-chip AI accelerator to make streaming inferences. All forms of devices from medical test equipment, your car, or your phone are all edge devices with wildly different capabilities.      


Hidden Costs of AI Q&A

Erik Smith

Mar 14, 2024

At our recent SNIA Networking Storage Forum webinar, “Addressing the Hidden Costs of AI,” our expert team explored the impacts of AI, including sustainability and areas where there are potentially hidden technical and infrastructure costs. If you missed the live event, you can watch it on-demand in the SNIA Educational Library. Questions from the audience ranged from training Large Language Models to fundamental infrastructure changes from AI and more. Here are answers to the audience’s questions from our presenters.

Q: Do you have an idea of where the best tradeoff is between high IO speed cost and GPU working cost? Is it always best to spend the maximum and get the highest IO speed possible?

A: It depends on what you are trying to do. If you are training a Large Language Model (LLM), then you’ll have a large collection of GPUs communicating with one another regularly (e.g., all-reduce) and doing so at throughput rates of up to 900GB/s per GPU! For this kind of use case, it makes sense to use the fastest network option available. Any money saved by using a cheaper, slightly less performant transport will be more than offset by the cost of GPUs that sit idle while waiting for data. If you are more interested in fine-tuning an existing model or using Retrieval Augmented Generation (RAG), then you won’t need quite as much network bandwidth and can choose a more economical connectivity option. It’s worth noting that a group of companies has come together to work on the next generation of networking, one well suited for use in HPC and AI environments. This group, the Ultra Ethernet Consortium (UEC), has agreed to collaborate on an open standard and has wide industry backing. This should allow even large clusters (1,000+ nodes) to utilize a common fabric for all of a cluster’s network needs.

Q: We (all industries) are trying to use AI for everything. Is that cost effective? Does it cost fractions of a penny to answer a user question, or is there a high cost that is being hidden or eaten by someone now because the industry is so new?

A: It does not make sense to try to use AI/ML to solve every problem. AI/ML should only be used when a more traditional, algorithmic technique cannot easily solve the problem (and there are plenty of these). Generative AI aside, one example where AI has historically provided an enormous benefit for IT practitioners is multivariate anomaly detection (see the sketch after the resource list below). These models can learn what normal looks like for a given set of telemetry streams and then alert the user when something unexpected happens. A traditional approach (e.g., hand-writing source code for an anomaly detector) would be cost and time prohibitive and probably nowhere near as good at detecting anomalies.

Q: Can you discuss typical data access patterns for model training or tuning (sequential/random, block sizes, repeated access, etc.)?

A: There is no simple answer, as the access patterns can vary from one type of training to the next. Assuming you’d like a better answer than that, I would suggest starting with two resources:
  1. Meta’s OCP Presentation: “Meta’s evolution of network for AI” includes a ton of great information about AI’s impact on the network.
  2. Blocks and Files article: “MLCommons publishes storage benchmark for AI” includes a table that provides an overview of benchmark results for one set of tests.
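Coming back to the multivariate anomaly detection example mentioned above, here is a minimal sketch of the idea, assuming scikit-learn and NumPy are available; the telemetry metrics, values, and thresholds are invented for illustration and are not from the webinar.

  # A minimal sketch, not a production implementation: assumes scikit-learn and
  # NumPy are installed and that storage telemetry arrives as rows of numeric
  # features (here: IOPS, latency in ms, and queue depth; all values made up).
  import numpy as np
  from sklearn.ensemble import IsolationForest

  rng = np.random.default_rng(seed=0)

  # Synthetic "normal" telemetry: 10,000 samples of three metrics.
  normal = rng.normal(loc=[5000.0, 1.2, 8.0], scale=[300.0, 0.1, 1.0],
                      size=(10_000, 3))

  # Learn what "normal" looks like for these streams.
  model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

  # New observations; the last row simulates a latency/queue-depth spike.
  new = np.array([[5100.0, 1.25, 9.0],
                  [4900.0, 1.15, 7.0],
                  [5050.0, 9.80, 64.0]])

  for sample, label in zip(new, model.predict(new)):  # 1 = normal, -1 = anomaly
      if label == -1:
          print("Anomaly detected in telemetry sample:", sample)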
Q: Will this video be available after the talk? I would like to forward it to my co-workers. Great info.

A: Yes. You can access the video and a PDF of the presentation slides here.

Q: Does this mean we’re moving to a fewer-updates or write-once (or write-infrequently), read-mostly storage model? I’m excluding dynamic data from end-user inference requests.

A: For the active training and fine-tuning phases of an AI model, the data patterns are very read heavy. However, quite a lot of work is done before a training or fine-tuning job begins that is much more balanced between reads and writes. This is the “data preparation” phase of an AI pipeline. Data prep takes existing data from a variety of sources (an in-house data lake, a dataset from a public repo, or a database) and performs data manipulation tasks to accomplish, at a minimum, data labeling and formatting. So tuning for reads alone may not be optimal.

Q: Fibre Channel seems to have a lot of the characteristics required for the fabric. Could NVMe over a Fibre Channel fabric be utilized to handle data ingestion for the AI component on dedicated adapters for storage (disaggregated storage)?

A: Fibre Channel is not a great fit for AI use cases for a few reasons:
  • With AI, data is typically accessed as either Files or Objects, not Blocks, and FC is primarily used to access block storage.
  • If you wanted to use FC in place of IB (for GPU to GPU traffic) you’d need something like an FC-RDMA to make FC suitable.
  • All of that said, FC currently maxes out at 128GFC and there are two reasons why this matters:
    1. AI-optimized storage starts at 200Gbps, and based on some end-user feedback, 400Gbps is already not fast enough.
    2. GPU-to-GPU traffic requires up to 900GB/s (7200Gbps) of throughput per GPU; that is roughly 56 128GFC interfaces per GPU.
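For reference, the arithmetic behind those figures can be checked in a few lines (nominal link rates, ignoring encoding overhead):

  # Back-of-the-envelope check of the numbers above: 900 GB/s of GPU-to-GPU
  # traffic per GPU expressed in Gbps, and how many 128GFC links that implies.
  gpu_to_gpu_gbytes_per_s = 900
  gpu_to_gpu_gbits_per_s = gpu_to_gpu_gbytes_per_s * 8   # 7200 Gbps
  gfc128_gbits_per_s = 128                               # one 128GFC link

  links_per_gpu = gpu_to_gpu_gbits_per_s / gfc128_gbits_per_s
  print(f"{gpu_to_gpu_gbits_per_s} Gbps / {gfc128_gbits_per_s} Gbps "
        f"≈ {links_per_gpu:.0f} links per GPU")           # prints: ≈ 56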
Q: Do you see something like GPUDirect Storage from NVIDIA becoming the standard? Does this mean NVMe will win (over FC or TCP)? Will other AI chip providers have to adopt their own GPUDirect-like protocol?

A: It’s too early to say whether GPUDirect Storage will become a de facto standard or if alternate approaches (e.g., pNFS) will be able to satisfy the needs of most environments. The answer is likely to be “both”.

Q: You’ve mentioned demand for higher throughput for training, and lower latency for inference. Is there a demand for low cost, high capacity, archive-tier storage?

A: Not specifically for AI. Depending on what you are doing, training and inference can be latency or throughput sensitive (sometimes both). Training an LLM (which most users will never actually attempt to do) requires massive throughput from storage for both reads and writes; quite literally, the faster the better when loading data into the GPUs or when the GPUs are saving checkpoints. An inference workload wouldn’t require the same throughput as training, but to the extent that it needs to access storage, it would certainly benefit from low latency. If you are trying to optimize AI storage for anything but performance (e.g., cost), you are probably going to be disappointed with the overall performance of the system.

Q: What are the presenters’ views on where the industry trend is heading for running workloads or training models? Is it in cloud datacenters like AWS or GCP, or on-prem?

A: It truly depends on what you are doing. If you want to experiment with AI (e.g., an AI version of a “Hello World” program), or even something a bit more involved, there are lots of options that allow you to use the cloud economically. Check out this collection of colab notebooks for an example and give it a try for yourself. Once you get beyond simple projects, you’ll find that cloud-based services become prohibitively expensive and you’ll quickly want to start running your training jobs on-prem. The downside is the need to manage the infrastructure elements yourself, and this assumes you can even get the right GPUs, although there are reports that supply issues are easing in this space. The bottom line: whether to run on-prem or in the cloud still comes down to whether you can realistically get the same ease of use and freedom from hardware maintenance from your own infrastructure as you could from a CSP. Sometimes the answer is yes.

Q: Do the AI accelerators in PCs (recently advertised for new CPUs) have any impact/benefit on using large public AI models?

A: AI accelerators in PCs will be a boon for all of us, as they will enable inference at the edge. They will also allow exploration and experimentation on your local system for building your own AI work. You will, however, want to focus on small or mini models at this time. Without large amounts of dedicated GPU memory to help speed things up, only the small models will run well on your local PC. That being said, we will continue to see improvements in this area, and PCs are a great starting point for AI projects.

Q: Fundamentally, is AI radically changing what is required from storage? Or is it simply accelerating existing trends: reducing power, higher density SSDs, computational storage, new NVMe transport modes (such as RDMA), and ever more file system optimizations?
A: From the point of view of a typical enterprise storage deployment (e.g., block storage accessed over an FC SAN), AI storage is completely different. Storage is accessed as either files or objects, not as blocks, and the performance requirements already exceed the maximum speed that FC can deliver today (i.e., 128GFC). This means most AI storage uses either Ethernet or InfiniBand as a transport (a short sketch of object-style access follows the list below). Raw performance seems to be the primary driver in this space right now, rather than reducing power consumption or increasing density. You can expect protocols such as GPUDirect and pNFS to become increasingly important to meet performance targets.

Q: What are the innovations in HDDs relative to AI workloads? This was mentioned in the SSD + HDD slide.

A: The point of the SSD + HDD slide was that the introduction of SSDs:
  1. dramatically improved overall storage system efficiency, resulting in a major performance boost. That boost increased the amount of data a single storage port could transmit onto a SAN, which in turn significantly increased the need to monitor for congestion and congestion spreading.
  2. didn’t completely displace the need for HDDs, just as GPUs won’t replace the need for CPUs. They provide different functions and excel at different types of jobs.
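As noted a couple of answers above, AI storage is typically accessed as files or objects rather than blocks. Below is a minimal sketch of object-style access using boto3; the bucket and key names are hypothetical, and a real training pipeline would hand shards like this to a framework-specific data loader rather than reading them ad hoc.

  # A minimal sketch of object-based data access (not a complete data loader).
  # Assumes boto3 is installed and cloud credentials are configured; the bucket
  # and key below are hypothetical placeholders.
  import io
  import boto3

  BUCKET = "example-training-data"          # hypothetical bucket name
  KEY = "datasets/shard-00000.tar"          # hypothetical object key

  s3 = boto3.client("s3")

  # Fetch one training shard as an object; no block device is involved.
  response = s3.get_object(Bucket=BUCKET, Key=KEY)
  shard = io.BytesIO(response["Body"].read())

  print(f"Fetched {shard.getbuffer().nbytes} bytes from s3://{BUCKET}/{KEY}")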
Q: What is the difference between (1) Peak Inference, (2) Mainstream Inference, (3) Baseline Inference, and (4) Endpoint Inference, specifically from a cost perspective?

A: This question was answered live during the webinar (see timestamp 44:27); the following is a summary of the responses. Endpoint inference happens on client devices (e.g., laptops, smartphones), using much smaller models that have been optimized for the very constrained power envelopes of those devices. Peak inference can be thought of as something like ChatGPT or Bing’s AI chatbot, where you need large, specialized infrastructure (e.g., GPUs, specialized AI hardware accelerators). Mainstream and baseline inference sit somewhere in between, where you’re using much smaller or more specialized models. For example, you could have a Mistral 7B model that you have fine-tuned for an enterprise use case such as document summarization or finding insights in a sales pipeline; these use cases can employ much smaller models, so the requirements vary. In terms of cost, deploying these models for edge inference is low compared to peak inference like ChatGPT, which is much higher. In terms of infrastructure, some baseline and mainstream inference models can be served with a CPU alone, a CPU plus a GPU, a CPU plus a few GPUs, or a CPU plus a few AI accelerators. CPUs available today do have built-in AI accelerators, which can provide a cost-optimized solution for baseline and mainstream inference; this will be the typical scenario in many enterprise environments.

Q: You said utilization of network and hardware is changing significantly, but compared to what? Traditional enterprise workloads or HPC workloads?

A: AI workloads will drive network utilization unlike anything the enterprise has ever experienced before. Each GPU (of which there are currently up to 8 in a server) can generate up to 900GB/s (7200Gbps) of GPU-to-GPU traffic. To be fair, this GPU-to-GPU traffic can and should be isolated to a dedicated “AI fabric” that has been specifically designed for this use. Along these lines, new types of network topologies are being used; Rob mentioned one of them, the rail topology, during his portion of the presentation. End users already familiar with HPC will find that many of the same constraints and scalability issues that must be dealt with in HPC environments also impact AI infrastructure.

Q: What are the key networking considerations for AI deployed at the edge (i.e., stores, branch offices)?

A: AI at the edge is a talk all on its own. Much as we see large differences between training, fine-tuning, and inference in the data center, inference at the edge has many flavors, and performance requirements differ from use case to use case. Compare, for example, a centralized set of servers ingesting and aggregating the camera feeds for a large retail store and making inferences, versus a single camera watching an intersection and using an on-chip AI accelerator to make streaming inferences. Devices ranging from medical test equipment to your car to your phone are all edge devices with wildly different capabilities.
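To make the mainstream/baseline inference tier described above a bit more concrete, here is a minimal sketch of serving a small summarization model on a CPU with the Hugging Face transformers library; the model shown is a commonly used distilled summarizer chosen purely for illustration, not a recommendation from the presenters.

  # A minimal sketch of small-model inference on a CPU (no GPU required).
  # Assumes the transformers library is installed; the model id below is a
  # commonly used distilled summarizer, chosen only for illustration.
  from transformers import pipeline

  summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

  document = (
      "Quarterly sales grew 12% year over year, driven by strong demand for "
      "all-flash arrays, while hard drive revenue remained flat. The pipeline "
      "for the next two quarters is dominated by AI infrastructure projects."
  )

  summary = summarizer(document, max_length=40, min_length=10, do_sample=False)
  print(summary[0]["summary_text"])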


2024 Year of the Summit Kicks Off – Meet us at MemCon

SNIA CMS Community

Mar 6, 2024

2023 was a great year for SNIA CMSI to meet with IT professionals and end users in “Summits” to discuss technologies, innovations, challenges, and solutions. Our outreach at six industry events reached over 16,000, and we thank all who engaged with our CMSI members. We are excited to continue with a second “Year of the Summit” and a variety of opportunities to network and converse with you. Our first networking event will take place March 26-27, 2024 at MemCon in Mountain View, CA. MemCon 2024 focuses on systems design for the data-centric era: working with data-intensive workloads, integrating emerging technologies, and overcoming data movement and management challenges. It’s the perfect event to discuss SNIA’s focus on developing global standards and delivering education on all technologies related to data. SNIA and MemCon have prepared a video highlighting several of the key topics to be discussed.
MemCon 2024 Video Preview
At MemCon, SNIA CMSI member and SDXI Technical Work Group Chair Shyam Iyer of Dell will moderate a panel discussion on “How are Memory Innovations Impacting the Total Cost of Ownership in Scaling-Up and Power Consumption,” discussing impacts on hyperscalers, AI/ML compute, and cost/power. SNIA Board member David McIntyre will participate in a panel on “How are Increased Adoption of CXL, HBM, and Memory Protocol Expected to Change the Way Memory and Storage is Used and Assembled?”, with insights on the markets and emerging memory innovations. The full MemCon agenda is here. In the exhibit area, SNIA leaders will be on hand to demonstrate updates to the SNIA Persistent Memory Programming Workshop featuring new CXL® memory modules (get an early look at our programming exercises here) and to provide a first look at a Smart Data Accelerator Interface (SDXI) specification implementation. We’ll also provide updates on SNIA technical work on form factors like those used for CXL. We will feature a drawing for gift cards at the SNIA-hosted coffee receptions and at the Tuesday evening networking reception. SNIA colleagues and friends can register for MemCon with a 15% discount using code SNIA15. And stay tuned for more ways to engage with SNIA at upcoming events in 2024, including the return of the SNIA Compute, Memory, and Storage Summit in May 2024; FMS: the Future of Memory and Storage in August 2024; SNIA SDC in September; and SC24 in Atlanta in November 2024. We’ll discuss each of these in depth in our Year of the Summit blog series.

