At our recent SNIA Networking Storage Forum webinar, “Addressing the Hidden Costs of AI,” our expert team explored the impacts of AI, including sustainability considerations and the potentially hidden technical and infrastructure costs. If you missed the live event, you can watch it on-demand in the SNIA Educational Library. Audience questions ranged from training Large Language Models to the fundamental infrastructure changes AI is driving. Here are our presenters’ answers to those questions.
Q: Do you have an idea of where the best tradeoff is between the cost of high IO speed and the cost of keeping GPUs working? Is it always best to spend the maximum and get the highest IO speed possible?
A: It depends on what you are trying to do. If you are training a Large Language Model (LLM), you’ll have a large collection of GPUs communicating with one another regularly (e.g., during all-reduce operations) at throughput rates of up to 900GB/s per GPU! For this kind of use case, it makes sense to use the fastest network option available. Any money saved by using a cheaper, slightly less performant transport will be more than offset by the cost of GPUs sitting idle while waiting for data.
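To make that tradeoff concrete, here is a back-of-envelope sketch in Python. Every number in it (cluster size, GPU pricing, run length, idle fraction, fabric savings) is a hypothetical assumption for illustration, not a figure from the webinar:

```python
# Back-of-envelope comparison: savings from a cheaper fabric vs. the cost
# of the extra GPU time lost while GPUs sit idle waiting for data.
# All values below are hypothetical assumptions.

GPU_COUNT = 1024                  # assumed cluster size
GPU_COST_PER_HOUR = 4.00          # assumed $/GPU-hour
TRAINING_HOURS = 90 * 24          # assumed three-month training run
FABRIC_SAVINGS = 500_000          # assumed savings from the cheaper fabric ($)
EXTRA_IDLE_FRACTION = 0.10        # assumed extra GPU idle time on the slower fabric

idle_cost = GPU_COUNT * GPU_COST_PER_HOUR * TRAINING_HOURS * EXTRA_IDLE_FRACTION
print(f"Extra idle-GPU cost: ${idle_cost:,.0f}")
print(f"Fabric savings:      ${FABRIC_SAVINGS:,}")
print("Cheaper fabric pays off" if FABRIC_SAVINGS > idle_cost
      else "Faster fabric pays off")
```

Under these assumed numbers the extra idle time costs roughly $885K, well above the $500K saved on the fabric, which is the dynamic the answer above describes; a smaller cluster or shorter run could tip the math the other way.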
If you are more interested in fine-tuning an existing model or using Retrieval Augmented Generation (RAG), then you won’t need quite as much network bandwidth and can choose a more economical connectivity option.
It’s worth noting that a group of companies has come together to work on the next generation of networking, well suited for use in HPC and AI environments. This group, the Ultra Ethernet Consortium (UEC), has agreed to collaborate on an open standard and has wide industry backing. This should allow even large clusters (1000+ nodes) to utilize a common fabric for all of their networking needs.
Q: We (all industries) are trying to use AI for everything. Is that cost effective? Does it cost fractions of a penny to answer a user question, or is there a high cost that is being hidden or eaten by someone now because the industry is so new?
A: It does not make sense to try and use AI/ML to solve every problem. AI/ML should only be used when a more traditional, algorithmic technique cannot easily solve the problem (and there are plenty of these). Generative AI aside, one example where AI has historically provided an enormous benefit for IT practitioners is multivariate anomaly detection. These models can learn what normal looks like for a given set of telemetry streams and then alert the user when something unexpected happens. A traditional approach (e.g., hand-writing source code for an anomaly detector) would be cost and time prohibitive and would probably not be anywhere near as good at detecting anomalies.
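As a concrete illustration of that idea, here is a minimal sketch of multivariate anomaly detection using the Mahalanobis distance. The telemetry metrics and all numbers are synthetic assumptions; a production system would typically use richer learned models, but the "learn normal, flag the unexpected" pattern is the same:

```python
# Minimal multivariate anomaly detection sketch: learn what "normal"
# telemetry looks like, then score new samples by Mahalanobis distance.
# All data here is synthetic / illustrative.
import numpy as np

rng = np.random.default_rng(0)

# "Normal" telemetry: 3 correlated metrics (e.g., IOPS, latency ms, queue depth)
normal = rng.multivariate_normal(
    mean=[1000.0, 2.0, 8.0],
    cov=[[2500.0, 1.0, 10.0],
         [1.0, 0.04, 0.1],
         [10.0, 0.1, 4.0]],
    size=5000,
)

# Learn the shape of "normal" from historical samples.
mu = normal.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal, rowvar=False))

def anomaly_score(sample: np.ndarray) -> float:
    """Mahalanobis distance of one telemetry sample from learned normal."""
    d = sample - mu
    return float(np.sqrt(d @ cov_inv @ d))

# A sample near the learned distribution scores low; an outlier scores high,
# even if each individual metric looks plausible on its own.
print(anomaly_score(np.array([1000.0, 2.0, 8.0])))   # ~0: normal
print(anomaly_score(np.array([1000.0, 9.0, 40.0])))  # large: alert
```

The key property, as noted above, is that nobody hand-codes the definition of "anomalous": the model learns the joint behavior of the streams and flags combinations that don't fit.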
Q: Can you discuss typical data access patterns for model training or tuning (sequential/random, block sizes, repeated access, etc.)?
A: There is no simple answer, as the access patterns can vary from one type of training to the next. Assuming you’d like a better answer than that, I would suggest starting with two resources:
- Meta’s OCP Presentation: “Meta’s evolution of network for AI” includes a ton of great information about AI’s impact on the network.
- Blocks and Files article: “MLCommons publishes storage benchmark for AI” includes a table that provides an overview of benchmark results for one set of tests.
A related question is whether Fibre Channel (FC) has a role to play in AI infrastructure:
- With AI, data is typically accessed as either Files or Objects, not Blocks, and FC is primarily used to access block storage.
- If you wanted to use FC in place of InfiniBand (for GPU-to-GPU traffic), you’d need something like FC-RDMA to make FC suitable.
- All of that said, FC currently maxes out at 128GFC, and there are two reasons why this matters:
  - AI-optimized storage starts at 200Gbps, and based on some end-user feedback, 400Gbps is already not fast enough.
  - GPU-to-GPU traffic requires up to 900GB/s (7200Gbps) of throughput per GPU; that’s about 56 128GFC interfaces per GPU (see the quick calculation after this list).
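For the curious, the arithmetic behind that last bullet can be checked with a few lines of Python (these are nominal line rates; real-world protocol efficiency would reduce them further):

```python
# How many interfaces would one GPU's 900 GB/s of traffic need?
GPU_THROUGHPUT_GBPS = 900 * 8  # 900 GB/s per GPU = 7200 Gbps

for link_gbps, name in [(128, "128GFC"), (200, "200GbE"), (400, "400GbE")]:
    links = GPU_THROUGHPUT_GBPS / link_gbps
    print(f"{name:>7}: {links:5.1f} links per GPU")  # 128GFC -> ~56.2
```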
As a historical parallel from the storage world, when SSDs were introduced, they:
- dramatically improved overall storage system efficiency, delivering a major performance boost. That boost increased the amount of data a single storage port could transmit onto a SAN, which in turn made monitoring for congestion and congestion spreading far more important.
- didn’t completely displace the need for HDDs, just as GPUs won’t replace the need for CPUs. They provide different functions and excel at different types of jobs.