SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
Data enhances foundational LLMs (e.g., GPT-4, Mistral Large, and Llama 2), enabling context-aware outputs. In this session, we cover using unstructured, multi-modal data (e.g., PDFs, images, or videos) in retrieval augmented generation (RAG) systems. Learn how cloud object storage can be an ideal file system for Andrej Karpathy's LLM OS concept, including the transformation and use of domain-specific data, the storage of user context, and much more.
Explain why domain-specific data and the RAG pattern are so important to building customized AI applications for different industries and domains.
Understand the LLM OS concept, including the notion of a filesystem for storing domain-specific data, user context, etc., and why cloud object storage is an ideal storage system for this filesystem.
Use cloud object storage in conjunction with a vector database to build sample RAG applications, as in the sketch below.
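To make the last objective concrete, here is a minimal sketch that indexes documents pulled from an S3-compatible object store into a vector database and retrieves context for a prompt. The endpoint, bucket, and the choice of boto3, sentence-transformers, and chromadb are illustrative assumptions, not part of the session material.

    # Minimal RAG sketch: object storage as the document source, a vector DB for retrieval.
    import boto3
    import chromadb
    from sentence_transformers import SentenceTransformer

    s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")  # hypothetical endpoint
    bucket, prefix = "domain-docs", "manuals/"                                # hypothetical bucket/prefix

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    collection = chromadb.Client().create_collection("domain_docs")

    # 1. Pull domain-specific documents out of object storage and index them.
    for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
        text = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read().decode("utf-8")
        collection.add(
            ids=[obj["Key"]],
            documents=[text],
            embeddings=[embedder.encode(text).tolist()],
        )

    # 2. At query time, retrieve the most relevant documents and prepend them to the prompt.
    question = "How do I rotate the access keys for the billing service?"
    hits = collection.query(query_embeddings=[embedder.encode(question).tolist()], n_results=3)
    context = "\n\n".join(hits["documents"][0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # `prompt` is then sent to the LLM of choice (GPT-4, Mistral Large, Llama 2, ...).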
It is well known that sensor data on storage systems can reveal abnormal symptoms that lead to failures. With this abnormal sensor data and machine learning techniques, we can predict a storage component failure ahead of time and proactively remove the component before it impacts the rest of the system or interrupts the customer's operations. A successful predictive maintenance model must trade off detection rate, false positive rate, and lead time. Through a machine learning feature selection process, we train and build failure prediction models that alert us to critical component failures before they happen. Our approach has successfully predicted failures and proactively removed many electronic components such as hard drives, solid state drives, and voltage regulators. The worldwide supply chain shortage during the Covid re-opening created a new challenge for these predictive maintenance models: even if you know which component will fail soon, long supply chain lead times mean you may not be able to get replacement parts before the failure. To address this supply chain management issue, we developed a separate prediction model to assist our supply chain management team, with the lead time stretched to months or more than a quarter. The key difference between the two models is that while the predictive maintenance models catch the dying anomaly, the supply chain model looks for stress and activity that can accelerate a failure. Our current models achieve a detection rate of 70-80% while maintaining a very low false positive rate, and our quarterly quantity predictions are 90-95% accurate in real-world deployment.
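As an illustration of the kind of model described above, the sketch below trains a failure-prediction classifier on component sensor telemetry, with a feature selection step and a decision threshold that trades detection rate against false positives. The dataset, column names, and choice of scikit-learn are hypothetical and for illustration only.

    # Illustrative failure-prediction sketch, assuming scikit-learn and a hypothetical
    # telemetry table with per-component sensor readings and a failed_within_30d label.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("drive_telemetry.csv")  # hypothetical dataset
    X, y = df.drop(columns=["failed_within_30d"]), df["failed_within_30d"]

    # Feature selection: keep the sensor signals most informative about failure.
    selector = SelectKBest(mutual_info_classif, k=20).fit(X, y)
    X_sel = selector.transform(X)

    X_train, X_test, y_train, y_test = train_test_split(X_sel, y, stratify=y, test_size=0.2)

    # A high decision threshold trades some detection rate for a low false-positive rate.
    model = RandomForestClassifier(n_estimators=300, class_weight="balanced").fit(X_train, y_train)
    preds = (model.predict_proba(X_test)[:, 1] >= 0.8).astype(int)

    print("detection rate (recall):", recall_score(y_test, preds))
    print("precision:", precision_score(y_test, preds))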
As Machine Learning continues to forge its way into diverse industries and applications, optimizing computational resources, particularly memory, has become a critical aspect of effective model deployment. This session, "Memory Optimizations for Machine Learning," aims to offer an exhaustive look into the specific memory requirements in Machine Learning tasks and the cutting-edge strategies to minimize memory consumption efficiently.
We'll begin by demystifying the memory footprint of typical Machine Learning data structures and algorithms, elucidating the nuances of memory allocation and deallocation during model training phases. The talk will then focus on memory-saving techniques such as data quantization, model pruning, and efficient mini-batch selection. These techniques offer the advantage of conserving memory resources without significant degradation in model performance. Additional insights into how memory usage can be optimized across various hardware setups, from CPUs and GPUs to custom ML accelerators, will also be presented.
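As a rough illustration of two of the techniques named above, the sketch below applies post-training dynamic quantization and magnitude pruning to a small model and bounds peak activation memory through the mini-batch size. It assumes PyTorch; the model and the numbers are placeholders rather than anything from the session.

    # Memory-saving sketch: dynamic quantization, magnitude pruning, small mini-batches.
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10))

    # Dynamic quantization: store Linear weights as int8 instead of float32,
    # cutting weight memory roughly 4x at a small accuracy cost.
    quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    # Magnitude pruning: zero out the 50% smallest weights in each Linear layer;
    # paired with a sparse storage format, this shrinks the pruned layers' footprint.
    for module in model:
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)
            prune.remove(module, "weight")  # make the sparsity permanent

    # Mini-batch selection: smaller batches bound peak activation memory during training.
    dataset = torch.utils.data.TensorDataset(torch.randn(10000, 1024), torch.randint(0, 10, (10000,)))
    loader = torch.utils.data.DataLoader(dataset, batch_size=32)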
With the increased business value that AI-enabled applications can unlock, there is a need to support Gen AI models at varying degrees of scale, from foundation model training in data centers to inference deployment on edge and mobile devices. Flash storage, and PCIe/NVMe storage in particular, can play an important role in enabling this thanks to its density and cost benefits. Enabling NVMe offload for Gen AI requires a combination of careful ML model design and its effective deployment on a memory-flash storage tier. Using inference as an example, with the Microsoft DeepSpeed library, we highlight the benefits of NVMe offload and call out specific optimizations and improvements that NVMe storage can target to demonstrate improved LLM inference metrics.
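For a sense of what this can look like in practice, the sketch below configures DeepSpeed ZeRO stage-3 parameter offload to a local NVMe path for inference. The model name, NVMe mount point, and AIO tuning values are placeholder assumptions, and the exact configuration keys should be checked against the DeepSpeed version in use; this is a sketch, not the session's reference setup.

    # Hedged sketch: ZeRO-3 parameter offload to NVMe with DeepSpeed for LLM inference.
    # Typically launched with the deepspeed launcher so distributed init is handled.
    import deepspeed
    import torch
    from transformers import AutoModelForCausalLM

    ds_config = {
        "train_micro_batch_size_per_gpu": 1,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            "offload_param": {                  # park model weights on local NVMe
                "device": "nvme",
                "nvme_path": "/local_nvme",     # hypothetical mount point
                "pin_memory": True,
            },
        },
        "aio": {                                # async I/O tuning for the flash tier
            "block_size": 1048576,
            "queue_depth": 8,
            "single_submit": False,
            "overlap_events": True,
        },
    }

    model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")  # example model
    engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
    engine.eval()

    tokens = torch.randint(0, 50000, (1, 128)).to(engine.device)
    with torch.no_grad():
        logits = engine(tokens).logits  # parameters stream in from NVMe as needed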
The performance gap between compute and storage has always been fairly considerable. The mismatch between what applications need from storage and what storage delivers in large-scale deployments keeps widening, especially in deployments catering to AI workloads. The storage requirements encompass nearly everything, i.e., capacity/$, availability, reliability, IOPS, throughput, security, etc. Moreover, these requirements are highly dynamic across the different phases of AI pipelines. For instance, emerging storage-centric AI applications such as vector DBs and RAG have unique performance, capacity, and throughput requirements compared to the most talked-about workloads, AI training and inference. However, the main challenge common to all is to reduce data entropy across the network, i.e., from storage to compute and vice versa, all while balancing foreground and background IO processing.
In this talk, we will discuss the core issues and requirements for storage deployed in data centers running AI, with a focus on emerging application use cases. The talk also aims to provide a holistic full-stack view of the infrastructure and a look under the hood at how AI impacts each layer of computing, i.e., compute, network, and storage. This is followed by a discussion of the myriad existing solutions and optimizations that are currently deployed piecemeal but need a refresh to enable storage to meet the challenges posed by AI.