HDDs have been the traditional hardware infrastructure for object stores such as S3, Google Cloud Storage, and Azure Blob in data lakes. But as AI deployments transition to production scale in organizations (Meta's Tectonic-Shift platform being a good example), they impose demands on the data storage and ingestion pipeline that have not been seen before. Using Deep Learning Recommendation Model (DLRM) training as an AI use case, we first introduce the challenges object stores can expect to face as AI deployments scale: the growth in the scale of available data, the advent of ever-faster training GPUs, and the growth of AI/ML ops deployments. We then explain how flash storage is well positioned to meet the bandwidth and power needs of these systems. We will share key observations from storage trace analysis of several MLPerf DLRM preprocessing and training captures, and conclude with a call to action for more work on standardizing benchmarks that characterize data ingestion performance and power efficiency.
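As a rough illustration of the bandwidth-and-power argument (a sketch, not material from the talk), the short Python snippet below estimates how many drives, and how much power, it takes to sustain an assumed ingest bandwidth. All device figures are illustrative ballpark assumptions, not measured data.

# Back-of-envelope sketch: drives and power needed to sustain an assumed
# DLRM training-ingest bandwidth. All device figures below are illustrative
# assumptions, not measurements.
import math

INGEST_GBPS = 20.0  # assumed aggregate ingest bandwidth for a training job (GB/s)

# (name, sequential read bandwidth in GB/s, active power in watts) -- assumed
devices = [
    ("HDD, 7200 rpm", 0.25, 8.0),
    ("NVMe SSD, PCIe Gen4", 7.0, 12.0),
]

for name, bw, watts in devices:
    n = math.ceil(INGEST_GBPS / bw)  # drives needed for bandwidth alone
    print(f"{name}: {n} drives, ~{n * watts:.0f} W, "
          f"{bw / watts * 1000:.0f} MB/s per watt")

Under these assumed figures the flash configuration delivers more than an order of magnitude better bandwidth per watt, which is the shape of the argument the talk develops.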

Presentation Type
Presentation
Learning Objectives

Understand the role of data ingestion in the AI pipeline
Describe how AI deployment at scale has changed and what is expected of object stores
Understand how flash storage can help address this problem and what more is needed

YouTube Video ID
wgHAhbXuVKc
Room Location
Cypress