
With the increased business value that AI-enabled applications can unlock, there is a need to support Gen AI models at varying degrees of scale - from foundation model training in data centers to inference deployment on edge and mobile devices. Flash storage, and PCIe/NVMe storage in particular, can play an important role in enabling this thanks to its density and cost benefits. Enabling NVMe offload for Gen AI requires a combination of careful ML model design and effective deployment on a memory-flash storage tier. Using inference with the Microsoft DeepSpeed library as an example, we highlight the benefits of NVMe offload and call out specific optimizations and improvements that NVMe storage can target to deliver improved LLM inference metrics.
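
To make the idea concrete, the sketch below shows what NVMe offload can look like with DeepSpeed: a ZeRO stage-3 configuration that offloads model parameters to a local NVMe drive so a model larger than GPU memory can still run inference. This is a minimal sketch, not material from the talk; the model name, the /local_nvme mount point, and the buffer and aio tuning values are illustrative assumptions.

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-1.3b"   # placeholder model for illustration
NVME_PATH = "/local_nvme"          # assumed local NVMe mount point

# ZeRO stage-3 config that offloads parameters to NVMe (ZeRO-Inference style).
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "nvme",
            "nvme_path": NVME_PATH,
            "pin_memory": True,
            "buffer_count": 5,
            "buffer_size": 1e8,
        },
    },
    # Async I/O knobs that govern how parameter tensors are read from NVMe;
    # values here are illustrative, not tuned recommendations.
    "aio": {
        "block_size": 1048576,
        "queue_depth": 8,
        "thread_count": 1,
        "single_submit": False,
        "overlap_events": True,
    },
    "train_micro_batch_size_per_gpu": 1,  # required by DeepSpeed even for inference-only use
}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Wrap the model in the DeepSpeed engine; parameters are partitioned and
# paged in from NVMe on demand during the forward pass.
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
engine.eval()

inputs = tokenizer("NVMe offload lets large models run on small GPUs because",
                   return_tensors="pt").to(engine.device)
with torch.no_grad():
    outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

In a setup like this, inference throughput becomes sensitive to NVMe read bandwidth and the async I/O settings, which is where the storage-side optimizations discussed in the talk come into play.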

Bonus Content
Off
Presentation Type
Presentation
Learning Objectives

Recognize the need to democratize training and inference at scale
Understand what enabling NVMe offload of LLMs requires
Be aware of opportunities for NVMe flash to enable improved LLM inference performance

Start Date/Time
End Date/Time
YouTube Video ID
fA363YzYEV0
Zoom Meeting Completed
Off
Main Speaker / Moderator
Room Location
Cypress