
Modern AI systems typically require diverse data processing and feature engineering at tremendous scale and employ heavy, complex deep learning models that demand expensive accelerators or GPUs. As a result, data processing and AI training are usually run on two separate platforms, which causes severe data movement issues and makes efficient end-to-end AI solutions hard to build. One goal of AI democratization is to converge the software and hardware infrastructure and to unify data processing and training on the same cluster, where a high-performance, scalable data platform is a foundational component. In this session, we will introduce the motivations and challenges of AI democratization, then propose a data platform architecture for E2E AI systems from both software and hardware infrastructure perspectives. The architecture includes a distributed compute and storage platform, parallel data processing, and a connector to the deep learning training framework. We will also showcase how this data platform improved pipeline efficiency for democratized AI solutions on a commodity CPU cluster for several recommender system workloads, such as DLRM, DIEN, and WnD, with orders-of-magnitude performance speedups.
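As a concrete illustration of the kind of pipeline described above, the minimal sketch below assumes Spark for parallel data processing, Parquet as the exchange format between the data platform and training, and a plain PyTorch DataLoader as the training-side connector. This is not the presenters' implementation; all paths, column names, and parameters are hypothetical.

```python
# Illustrative sketch only: parallel preprocessing on a CPU cluster with Spark,
# Parquet hand-off, and a PyTorch DataLoader as the connector to training.
# Paths and column names (click_logs, clicked, timestamp, ...) are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("recsys-preprocessing").getOrCreate()

# Parallel data processing / feature engineering on the data platform.
clicks = spark.read.parquet("/data/click_logs")          # hypothetical input path
features = (
    clicks
    .withColumn("hour", F.hour("timestamp"))             # simple time feature
    .groupBy("user_id", "item_id", "hour")
    .agg(F.count("*").alias("impressions"),
         F.sum("clicked").alias("clicks"))
    .withColumn("ctr", F.col("clicks") / F.col("impressions"))
)
features.write.mode("overwrite").parquet("/data/train_features")

# Connector side: the training framework reads the same Parquet output.
import pyarrow.parquet as pq
import torch
from torch.utils.data import DataLoader, TensorDataset

tbl = pq.read_table("/data/train_features")
# Toy feature/label split, purely illustrative.
feats = torch.tensor(tbl.column("ctr").to_pylist(), dtype=torch.float32).unsqueeze(1)
labels = torch.tensor(tbl.column("clicks").to_pylist(), dtype=torch.float32)
loader = DataLoader(TensorDataset(feats, labels), batch_size=1024, shuffle=True)
```

In a real deployment the connector would stream batches directly from the distributed storage layer rather than materializing them on a single node; the sketch only shows the shape of the hand-off between data processing and training.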

Bonus Content
Off
Presentation Type
Presentation
Learning Objectives
  • Data platform for AI
  • AI democratization
  • Big data and AI convergence
Display Order
235
Start Date/Time
End Date/Time
YouTube Video ID
obsvJvunjDE
Zoom Meeting Completed
Off
Main Speaker / Moderator
Room Location
Salon IV
Webform Submission ID
470