
The recent uptick in generative artificial intelligence (GAI) has put more pressure on hardware vendors to reduce the carbon footprint of running these power-hungry large language models (LLMs) in the datacenter. One way to achieve a lower in-silicon power profile is to break the von Neumann bottleneck by tightly integrating traditional SRAM memory cells with interleaved programmable processors on the same die. We report on our progress in this area, in particular on leveraging recent open research in both mixed-precision mathematics and extreme low-bit quantization of deep learning model parameters and activations running on our custom "In-SRAM" processor.

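As a rough illustration of the kind of extreme low-bit quantization the abstract mentions, the sketch below shows symmetric per-tensor quantization of float weights to a 4-bit signed integer range in Python. The bit width, rounding scheme, and per-tensor scaling are assumptions for illustration only, not the presenters' specific method or the In-SRAM processor's data format.

# Illustrative sketch only: symmetric per-tensor quantization of weights to a
# small signed integer range (4-bit here). The bit width, rounding, and
# scaling choices are assumptions, not the presenters' actual scheme.
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int = 4):
    """Map float weights onto signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = np.max(np.abs(weights)) / qmax          # per-tensor scale factor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(8).astype(np.float32)
    q, s = quantize_symmetric(w, bits=4)
    print("original  :", w)
    print("quantized :", q)
    print("recovered :", dequantize(q, s))

Storing weights at 4 bits instead of 32 cuts parameter memory by roughly 8x, which is the kind of reduction in data movement that an in-memory (In-SRAM) architecture aims to exploit.
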
Bonus Content
Off
Presentation Type
Presentation
Learning Objectives
  • Learn about the challenges of running generative AI and large language models in the datacenter.
  • Learn about a novel computer architecture, "In-SRAM" computing.
  • Learn about recent advances in new compressed data types suitable for large-scale deep learning models.
Start Date/Time
End Date/Time
YouTube Video ID
Y7tw7UOg_Wk
Zoom Meeting Completed
Off
Main Speaker / Moderator
Track
Room Location
Salon IV
Webform Submission ID
904