PM+CS Summit 2022 Presentation Abstracts


2022 Persistent Memory + Computational Storage Summit Presentations

Opening Remarks

David McIntyre, SNIA PM+CS Summit Planning Team Chair, SNIA CMSI Marketing Co-Chair; Director, Product Planning and Business Enablement, Samsung Corporation


Join David McIntyre as he kicks off our virtual Summit with an overview of the Summit content and presenters, and how the technologies of persistent memory and computational storage are so important.

Inventing Our Way Around the Memory Wall

Jim Handy, General Director, Objective Analysis; Tom Coughlin, President, Coughlin Associates


Phenomenal change is coming to computing architecture. Persistence is moving closer to the processor, eventually to find its way into caches and registers. Memory interfaces are being completely reconsidered to remove one of the biggest bottlenecks in the system while cutting energy consumption. In certain cases the processor is being moved into the memory chip. Meanwhile, new and completely different workloads, like the broad adoption of AI, will change the entire way that computers are used and configured. This is causing pioneers to create new approaches that combine emerging memory technologies like MRAM, ReRAM, and FRAM with in-memory computing algorithms to solve problems in ways never before thought possible. This session will lay out the path of these changes and will detail the kind of work that must be done in both hardware and software to support them.


Innovation with SmartSSD for Green Computing

Yang Seok Ki, Vice President and CTO, Memory Solutions Lab, Samsung Semiconductor


Carbon neutrality is becoming an important issue in the industry. Data centers are currently known to consume 1-2% of global electricity usage. Data centers are transforming their architecture to improve total cost of ownership (TCO), but environmental impact is now also becoming an important factor. Green computing, which increases computational efficiency at low energy consumption, therefore takes on great significance. This is also a problem stemming from the traditional CPU-centric general-purpose computing model, so accelerators based on domain-specific architectures are becoming important not only from a basic TCO point of view but also from a green computing point of view, and can be a potential solution. In particular, Samsung has been approaching these issues from the perspective of memory and storage, and has emphasized the importance of data-centric computing as an alternative to the existing compute-centric computing.
This keynote will showcase the innovations Samsung is making in storage devices through the SmartSSD concept and share its long-term vision to enable power-efficient green computing through data-centric computing. We would also like to invite the industry to collaborate on realizing data-centric computing across various accelerators for more power-efficient data processing.

NVMe Computational Storage: An Update on the Standard

Stephen Bates, NVM Express/Chief Technology Officer, Eideticom
Kim Malone, NVM Express/Storage Software Architect, Intel Corporation


Learn what is happening in NVMe to support Computational Storage devices. The development is ongoing and not finalized, but this presentation will describe the directions the proposal is taking. Kim and Stephen will describe the high-level architecture being defined in NVMe for Computational Storage, which provides for programs based on standardized eBPF. We will describe how this new command set fits within the NVMe I/O Command Set architecture, cover the commands necessary for Computational Storage, and discuss a proposed new controller memory model that can be used for computational programs.

CXL and UCIe

Cheolmin Park, CVP, Samsung


The ongoing increase in application performance requirements, from cloud to edge to on-premises use cases, requires tighter coupling of compute, memory, and storage resources. Memory coherency and low-latency attributes across converged compute infrastructures are being addressed in part with interconnect technologies including CXL 2.0 and UCIe. This presentation will provide a forward look into computational storage and computational memory developments and the interconnect standardization that enables them.

Persistent Memory Today

Andy Rudoff, Software Architect, Intel Corporation


Andy will talk about the most recent developments around PMem use cases, OS developments, and future directions for PMem, such as how it will use the emerging Compute Express Link (CXL) interconnect.


Computational Storage in Virtualized Environments

Jinpyo Kim, Senior Staff Engineer, VMware; Michael Mesnier, Principal Engineer, Intel Labs


In this talk we will share our thoughts on computational storage and the special considerations for virtualized environments. We will present real data from a prototype computational storage stack (based on NVMe/TCP) and practical first steps that the industry can take toward fixed-function offloads, like data integrity and search. More programmable offloads can build upon such a foundation.
We will also discuss the silicon opportunities and the wide range of solutions at play, including DPUs/IPUs, FPGAs, GPUs, computational SSDs, and special-purpose accelerators. The best “recipe” is still a matter of great debate, and we will share our experience with integrating various accelerators into a computational storage server.
We are actively seeking fellow travelers, so this talk is also a call to action for others to get involved.


AI Memory at Meta: Challenges and Potential Solutions 

Chris Petersen, Hardware Systems Technologist, Meta


AI at Meta spans many applications and services at scale, driving a portion of Meta's overall hardware and software infrastructure. AI models are scaling faster than the underlying memory technology. This presentation discusses how an additional tier of memory can help, and the options for enabling this new memory tier.

Day 1 Panel Discussion

Moderated by Dave Eggleston, Principal, Intuitive Cognition Consulting


Join the presenters from Day 1 in a lively panel discussion and ask your questions.

Scaling NVDIMM-N Architecture for System Acceleration in DDR5 and CXL-Enabled Applications

Arthur Sainio, Director of Product Marketing, SMART Modular Technologies and Pekon Gupta, Solutions Architect, SMART Modular Technologies


Server and storage applications are taking advantage of fast persistent memory in the form of NVDIMMs. With the industry transition to DDR5 and CXL, an NVDIMM architecture aligned with industry standards is emerging to solve the ongoing need for high-speed access to persistent memory in write-acceleration applications.


Recap of the Day

David McIntyre, SNIA PM+CS Summit Planning Chair


Missed a session?  Of course you can watch all sessions on demand at the end of the day, but don't miss the fastest 10-minute recap of key points and take-aways.



Birds-of-a-Feather: Computational Storage

Moderators: Scott Shadley and Jason Molgaard, Co-Chairs, SNIA Computational Storage TWG


Come to this BoF and discuss computational storage, how everything works together and how to satisfy different needs with different configurations. We will have some primer slides and look for input from the attendees on direction and value of each of these new solutions.

Opening Remarks, Day 2

David McIntyre, SNIA PM+CS Summit Planning Team Chair, SNIA CMSI Marketing Co-Chair; Director, Product Planning and Business Enablement, Samsung Corporation


Join David McIntyre as he kicks off the second day of the virtual Summit with an overview of the Summit content and presenters, and how the technologies of persistent memory and computational storage are so important.

HPC for Science Based Motivations for Computation Near Storage

Gary Grider, HPC Division Leader, Los Alamos National Laboratory


Scientific data is mostly stored as linear bytes in files, but it almost always has hidden structure that resembles records with keys and values, oftentimes in multiple dimensions. Further, the bandwidths required to service HPC simulation workloads will soon approach tens of terabytes/sec, with single data files surpassing a petabyte and single sets of data from a campaign approaching 100 petabytes. Multiple tasks, from distributed analytical/indexing functions to data management tasks like compression, erasure encoding, and dedup, are all potentially more efficiently and economically performed near storage devices.
Demonstrating a standards-based ecosystem for offloading computation to near storage is a valued contribution to the computing, networking, and storage communities. The goal of the joint Accelerated Box of Flash (ABOF) project (a collaboration between Eideticom, Nvidia, Aeon, SK hynix, and LANL) was to produce a first version of a network-attached computational storage enclosure that allows a host application to directly leverage distributed computational elements in the storage enclosure without hiding any of the computation behind a data block interface. The first demo of the ABOF accelerates a commonly deployed kernel-based file system, ZFS, to appeal to the large ZFS community while also making it easy for vendors to deploy accelerated ZFS appliances, creating interesting business opportunities.
Motivations for in-network and near-storage computation for HPC-based computational science will be presented.

Big Memory and Multi-Cloud

Dr. Charles Fan, CEO and Founder, MemVerge 


Many people are not aware that persistent memory is deployed in massive hyperscale public clouds, and in private clouds managed by next-wave cloud service providers delivering specialty or regional cloud services. In this session, Charles Fan describes cloud computing issues that are uniquely addressed by Big Memory Computing featuring persistent memory. Charles will also reveal use cases and deployments in public and private clouds. The presentation will wrap up with predictions of how CXL will impact Big Memory Computing with persistent memory in the cloud.


The Latest Efforts in the SNIA Computational Storage Technical Work Group

Scott Shadley and Jason Molgaard, SNIA Computational Storage TWG Co-Chairs 


With the ongoing work in the CS TWG, the chairs will present the latest updates from the membership of the working group. In addition, the latest release will be reviewed at a high level to give attendees a view into next steps and implementation of the specification in progress. Use cases, security considerations, and other key topics will also be addressed.

Compute Express Link™ (CXL™): Advancing the Next Generation of Data Centers

Alan Benjamin, CXL Consortium/CEO and President, GigaIO


Compute Express Link™ (CXL™) is an open industry-standard interconnect offering coherency and memory semantics using high-bandwidth, low-latency connectivity between the host processor and devices such as accelerators, memory buffers, and smart I/O devices. 
CXL technology is designed to deliver an open standard that accelerates next-generation data center performance. This has now become reality, with member companies delivering CXL solutions that showcase interoperability between vendors and enable a new ecosystem for high-performance, heterogeneous computing. The first CXL hardware solutions feature memory expansion, end-point support, memory disaggregation, and more.
The CXL 2.0 specification adds switching support providing fan-out to enable connection to more devices; memory pooling for increased memory utilization efficiency and providing memory capacity on demand; and support for persistent memory. The presentation will also share an overview of the CXL 2.0 ECNs to enhance performance, reliability, software interface, and testability while offering design simplification. Additionally, the presentation will provide a high-level overview of the latest advancements in the CXL 3.0 specification development, its new use cases and industry differentiators.

GPU+ DPU for Computational Storage

Rob Davis, VP, NVIDIA


One of the challenges for computational storage is getting flexible and powerful compute close enough to the storage to make it worthwhile. FPGAs have potential but are hard to program and not very flexible. Traditional CPU complexes have a large footprint and lack the parallel processing abilities ideal for AI/ML applications. Data Processing Units (DPUs) tightly coupled with GPUs are the answer. The DPU integrates a CPU and hardware accelerators for I/O and storage into a single chip, while the GPU provides rapid computation of multiple parallel processes from a single chip, which is beneficial for computational storage applications, including AI. This talk will detail a GPU+DPU solution along with the use cases it will enable.

SNIA's SDXI TWG and Its Forays Towards Standardizing Memory Data Movement

Shyamkumar Iyer, SNIA Smart Data Accelerator Initiative Technical Work Group Chair/Dell 


Data-in-use performance needs are expanding. At the same time, given the diversity of accelerator and memory technologies, standardizing memory-to-memory data movement and acceleration is gaining ground. SNIA's SDXI (Smart Data Accelerator Interface) TWG is at the forefront of this standardization, working toward a v1.0 release and having recently put out a v0.9-rev1 version of the specification for public review. Many TWG members have discussed their motivation for contributing to this specification.
A memory-to-memory data movement interface is increasingly seen as an enabler for persistent memory technologies and computational storage use cases. This talk will summarize the various use cases, the features discussed in the SDXI v0.9-rev1 specification, and SNIA's efforts around standardizing this work.
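The core idea behind a standardized data mover can be sketched in a few lines: software submits copy descriptors to a ring, and an engine (hardware in the real design) drains the ring. The sketch below is a toy model only; the class and field names are illustrative and do not reflect the actual SDXI descriptor layout.

```python
from dataclasses import dataclass

# Toy model of a descriptor-ring memory-to-memory data mover, loosely
# inspired by the SDXI concept. Names are illustrative, not spec-defined.

@dataclass
class CopyDescriptor:
    src: bytearray        # source buffer (stands in for a source address)
    dst: bytearray        # destination buffer
    length: int           # bytes to move
    completed: bool = False

class DataMover:
    """Consumes descriptors from a ring and performs the copies,
    standing in for a DMA engine that the CPU merely supervises."""
    def __init__(self):
        self.ring = []

    def submit(self, desc):
        self.ring.append(desc)

    def process(self):
        # A real engine would complete asynchronously; here we copy
        # synchronously and mark each descriptor done.
        for d in self.ring:
            d.dst[:d.length] = d.src[:d.length]
            d.completed = True
        self.ring.clear()

src = bytearray(b"persistent-memory payload")
dst = bytearray(len(src))
mover = DataMover()
desc = CopyDescriptor(src, dst, len(src))
mover.submit(desc)
mover.process()
assert dst == src and desc.completed
```

The point of the standard is that the submit/descriptor interface stays the same regardless of which accelerator implements the copy.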

Computational Storage for Storage Applications

Andy Walls, Chief Architect, IBM Fellow, IBM Corporation


When we think of computational storage, we often think of offloading applications like databases. However, the SSD is also an ideal location to offload the storage controller itself. For example, data reduction is a must-have requirement for today's storage controllers, and offloading compression to the SSD delivers data reduction at line speed. This talk will focus on this and other functions that can be offloaded to the SSD, enabling lower latency and higher IOPS while freeing CPU cycles for the things the controller does best, like replication. Come listen to a well-known expert in the field of high performance speak to fundamental and essential computational storage approaches.

Day 2 Panel Discussion

Moderated by Dave Eggleston, Principal, Intuitive Cognition Consulting


Join the presenters from Day 2 in a lively panel discussion and ask your questions.


Recap of the Day and Closing Remarks

David McIntyre, SNIA Persistent Memory + Computational Storage Planning Team Chair


With all sessions now available on-demand, join us for a recap of Day 2 highlights.



Accelerating Oracle Workloads using Intel DCPMM, Oracle PMEM Filestore on VMware

Sudhir Balasubramanian, Sr. Staff Solution Architect and Global Oracle Practice Lead, VMware
Arvind Jagannath, Product Line Manager for vSphere Platform, VMware


Enabling, sustaining, and ensuring the highest possible performance along with continued application availability is a major goal for all mission-critical applications in meeting demanding business SLAs, all the way from on-premises to VMware Hybrid Clouds.
Please join this session to learn how the performance of business-critical Oracle workloads can be accelerated via persistent memory technology, using Intel Optane DC PMM in App Direct mode to back the Oracle 21c Persistent Memory Filestore on the VMware vSphere platform, thereby achieving those stringent business goals and enhancing performance.

An NVMe-based SQL Query Engine for accelerating Big-Data Applications

Stephen Bates, CTO, Eideticom


NVMe-based Computational Storage offers the key benefits of computational storage and aligns with a popular and open standard. Using Computational Storage, we can build systems that have improved performance and higher efficiency compared to legacy computer architectures. As Computational Storage becomes more pervasive, we are seeing a move from basic Computational Storage Functions (e.g., compression) to more complex functions. In this paper we present an NVMe-based SQL Query Engine based on our NoLoad Computational Storage Device. We show how this Query Engine can be tied into big-data applications and how this can lead to improved performance and efficiency in data-center architectures.

Enabling Memory Tiering in CacheLib

Daniel Byrne, Intel Corporation


The emerging Compute Express Link (CXL) is a new memory interconnect technology that addresses a wide range of use cases, from memory expansion in a single server, through a standardized PMem interface, to dynamic memory pooling across many compute nodes. CXL is likely to bring significant cost savings to infrastructure providers through increased memory utilization and a reduced number of server configuration variants. Software should therefore be adapted to efficiently use the available heterogeneous memory resources.
Our talk highlights our experience prototyping heterogeneous memory tiering for CacheLib, a general-purpose caching engine developed by Meta with support for hybrid caching using DRAM and flash. Our prototype adds transparent application-level support for heterogeneous memory. We describe the techniques used to reduce the overhead of data movement among heterogeneous memory tiers and the challenges we encounter today.

What is the Carbon Footprint Benefit of Computational Storage?

Jerome Gaysse, Sr. Technology and Market Analyst, Silinnov Consulting 


Computational storage brings power and performance benefits compared to a traditional CPU-centric architecture. But what about the carbon footprint? Is it simply proportional to the power consumption? No, it is a little more complex to calculate.
In this talk, a carbon footprint analysis methodology will be presented, explaining the different parameters that need to be considered and the different outputs that need to be observed to really understand a carbon footprint analysis. A carbon footprint analysis example will be provided with a CS system benchmark.
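The reason the answer is "more complex than power" is that a footprint has both an embodied (manufacturing) term and an operational (use-phase) term. The sketch below shows the shape of such a calculation; all numbers (embodied carbon, power draws, grid intensity) are invented for illustration and are not figures from the talk.

```python
# Illustrative carbon-footprint comparison: total emissions are embodied
# (manufacturing) plus operational (power x time x grid intensity).
# All numbers here are made-up assumptions, not data from the talk.

GRID_INTENSITY = 0.4   # assumed kg CO2e per kWh of the grid mix

def footprint(embodied_kg, avg_power_w, hours):
    """Total kg CO2e = embodied + operational emissions."""
    operational = avg_power_w / 1000 * hours * GRID_INTENSITY
    return embodied_kg + operational

# Hypothetical trade-off: a CS system carries more embodied carbon
# (extra silicon) but draws less power for the same yearly workload.
cpu_centric  = footprint(embodied_kg=300, avg_power_w=450, hours=8760)
comp_storage = footprint(embodied_kg=340, avg_power_w=320, hours=8760)
print(f"CPU-centric : {cpu_centric:.0f} kg CO2e/yr")
print(f"Comp.storage: {comp_storage:.0f} kg CO2e/yr")
```

Note that with different embodied-carbon or utilization assumptions the comparison can flip, which is exactly why a full methodology is needed rather than a power reading alone.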

Make Sense of Memory Tiering

Alessandro Goncalves, Cloud Solution Architect, Intel Corporation


Memory footprint is a major source of capital expenditure and will soon account for more than half of the cost of a compute host. New technologies like persistent memory and CXL enable new memory topologies and volatile memory usage that deliver memory density and cost savings. Memory tiering can be broken down into types, each with its own challenges and requirements, and can be enabled in hardware, kernel, or user space. The presentation will go over the definition, architecture, and use cases for the memory tiering options in the market today and in the future.
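Whether implemented in hardware, kernel, or user space, tiering policies share a common skeleton: track hotness, promote hot pages to the fast tier, demote cold ones when it fills. The following is a minimal sketch of such a policy under assumed, arbitrary thresholds; real implementations use sampled access bits or hardware counters rather than exact counts.

```python
# Minimal sketch of an access-frequency-based two-tier placement policy,
# of the kind a tiering layer might apply between fast (DRAM) and slow
# (CXL/PMem) memory. Threshold and capacity values are arbitrary.

PROMOTE_THRESHOLD = 3   # accesses before a page moves to the fast tier

class TieredMemory:
    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = set()    # pages resident in the fast tier
        self.hits = {}       # per-page access counters

    def access(self, page):
        self.hits[page] = self.hits.get(page, 0) + 1
        if page not in self.fast and self.hits[page] >= PROMOTE_THRESHOLD:
            if len(self.fast) >= self.fast_capacity:
                # Demote the coldest fast-tier page to make room.
                coldest = min(self.fast, key=lambda p: self.hits[p])
                self.fast.remove(coldest)
            self.fast.add(page)
        return "fast" if page in self.fast else "slow"

mem = TieredMemory(fast_capacity=2)
for _ in range(3):
    mem.access("A")          # "A" becomes hot and is promoted
print(mem.access("A"))       # -> fast
print(mem.access("B"))       # -> slow (only one access so far)
```

The interesting engineering questions the talk covers (who tracks hotness, who migrates, at what granularity) are exactly what distinguishes the hardware, kernel, and user-space variants of this loop.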


Methods to Evaluate or Identify Suitable Storage for IoT/AI Boards

Mythri K, Senior Staff Engineer, Samsung


Every day, huge amounts of unstructured data are generated that need to be processed and used for artificial intelligence (AI) and the Internet of Things (IoT). Intense investment in AI and IoT is leading to rapid innovation in these technologies. These technologies rely on the cloud for storing structured and unstructured data, but fetching and processing data from the cloud is slow due to increased latency, and power consumption is high. Beyond these issues, end users are not comfortable storing data in the cloud due to privacy concerns. These issues are pushing AI applications to implement their intelligence at the edge of the network instead of in data centers.
Storage is an important module at the network edge, one that must accommodate memory-hungry AI processes like training, which uses off-chip memory to keep up with performance improvements. As AI and IoT become popular, many professionals and companies use boards like the Raspberry Pi and Arduino, which are small, inexpensive boards that allow connecting various external accessories such as sensors to create applications.
Popular boards like the Raspberry Pi use low-cost MicroSD cards for booting and storing data. This paper explains methods to obtain benchmarking results and lists important parameters and their values that help evaluate which cards should be chosen to implement any application on a Raspberry Pi board. This paper also explains a method to emulate an AI application that helps determine the lifetime of a MicroSD card for various workloads.

SSDs That Think

Mats Oberg, Associate Vice President, DSP Architecture, Storage Office of the CTO, Marvell Corporation


With the amount of generated data growing steeply, obviously not all of it can be saved, and an even smaller portion actually gets analyzed and leveraged. In addition, many expensive host cycles must be invested in pre-processing before the data is actually used for computation. The ability of computational storage to offline-process the data stored locally on the drive, at rest, and generate a compact and relevant representation of it can enable more efficient host processing. While inline processing of data on its way to/from the drive can significantly improve overall performance, computational storage offline processing is an enabler for more data to be uncovered and more use cases to emerge. This presentation will include concrete examples of AI inferencing done at the storage device in the context of the needed host computation.

Accelerating Near Real-time Algorithms Using Disaggregated Computational Storage

Mayank Saxena, Sr. Director, Engineering, Samsung


Computational storage can bring unique benefits in increasing the efficiency of CPU utilization in a data processing system. Here we discuss the benefits of leveraging computational storage in a disaggregated storage environment. We demonstrate the ability of the solution to complement the CPU by taking over tasks that benefit from in-situ processing within the storage, thereby improving overall system performance while lowering TCO. Disaggregated storage is particularly attractive when using computational storage, since scaling storage naturally yields scaling of the tasks that can be accelerated by computational storage. We experimented with accelerating the S3 Select functionality using our disaggregated computational storage (DCS) platform. Data tagging and partitioning utilizing the sharding aspect of the DCS platform further enhance the ability to provide even greater performance for large data processing with parallel execution.

Empowering Real-Time Decision Making for Large-Scale Datasets with SSD-like Economics 

Prasad Venkatachar, Director of Technical Marketing, Pliops


In the technology-driven world we live in, the speed of data access and the real-time nature of data can make or break a business for data-driven organizations. Hence many organizations leverage in-memory solutions for real-time decision-making to capture trends and stay competitive. However, as data sets extend beyond memory footprints, many enterprises take a performance- and cost-compromised approach, keeping critical datasets in memory for real-time access and perceived less-critical datasets in SSD storage, resulting in delayed business decisions and missed business opportunities. This session walks through a new class of data processor that enables in-memory-class performance with SSD-like cost economics and no compromise on performance or data sets for real-time decision making. We will showcase how Redis with SSDs, accelerated by this new class of data processor, can support 10x more data with performance and four-nines latency close to DRAM-based solutions.


Accelerating Operations on Persistent Memory Device Via Hardware-based Memory Offloading Technique

Ziye Yang, Staff Software Development Engineer, Intel Corporation


With more and more fast devices (especially persistent memory, a.k.a. PMem) equipped in the data center, there is great pressure on the CPU to drive those devices (e.g., Intel Optane DC persistent memory) for persistency purposes under heavy workloads: unlike HDDs and SSDs, persistent memory provides no DMA-related capability, so the CPU needs to participate in data operations on persistent memory all the time. In this talk, we mitigate this CPU pressure via hardware-based memory offloading devices (e.g., Intel's IOAT and DSA). To demonstrate our work, we introduce the memory offloading device solution in two kinds of workloads that use PMem: (1) storage applications based on the SPDK framework, and (2) leveraging DSA in Ceph's BlueStore with a PMem device. On the SPDK side, we introduce how to design and implement a new block device based on persistent memory (e.g., Intel Optane DC persistent memory) and the SPDK acceleration framework (which leverages IOAT or DSA). On the Ceph side, we describe the main change in BlueStore's pmem module (i.e., src/blk/) and state how we achieve the offloading via DSA, including the challenges. With our approach, we do see performance improvement for workloads on the PMem device, e.g., IOPS increasing and latency decreasing for storage applications based on the SPDK framework. We will also share some early performance results if Intel's SPR platform is publicly available.


Persistent Memory-based Storage Node for HPC Domain

Harikrishnan B, Scientist E, CPAC

I/O is recognized as a performance bottleneck in HPC domains, and this is expected to worsen as the disparity between computation and I/O capacity increases on future HPC machines. The fundamental consideration for all potential technologies and architectures is the trade-off between cost and performance, so we need to re-evaluate old architectures and consider new storage architectures and technologies that address the requirements for capacity and speed. This presentation discusses how storage architectures based on persistent memory can greatly ease the performance bottleneck. Persistent memory data can be accessed even after the process that created or last modified it has ended, and persistent memory moves the storage closer to compute.


Computational Memory: Moving compute near data

Pekon Gupta, Solutions Architect, SMART Modular Technologies
Arthur Sainio, Director of Product Marketing, SMART Modular Technologies


In most architectures data are copied from far memory or storage to local memory for processing. When there are hundreds of gigabytes of data, it is more efficient to move static compute functions nearer to the data. Moving compute near the memory, or building a compute engine inside the memory device, is called computational memory. Static functions like compression, sorting, search, and encryption can be offloaded to the computational memory, and hardware engines can filter the data near the source before it is consumed by heavy compute engines like GPUs and multi-core CPUs. Offloading static functions to computational memory not only reduces the wasteful copy-and-discard approach but also improves efficiency and frees the bus for other performance-critical workloads. This presentation discusses the use of an Intel Optane™ PMem-based accelerator that attaches up to 2TB of memory to a single PCIe Gen 4.0 interface.
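The "filter near the source" benefit described above can be made concrete with a toy comparison: shipping the whole dataset to the host versus having the device return only matching records. The dataset, record format, and functions below are invented for illustration; only the bytes-moved accounting matters.

```python
# Toy illustration of near-data filtering: a search offloaded to a
# computational-memory device returns only the matching records,
# instead of shipping the whole dataset to the host. Data is made up.

records = [f"user{i},region{'A' if i % 4 == 0 else 'B'}"
           for i in range(1000)]

def host_side_filter(data):
    # Traditional path: move everything across the bus, filter on host.
    moved = sum(len(r) for r in data)
    return [r for r in data if ",regionA" in r], moved

def near_data_filter(data):
    # Offloaded path: the device filters at the source, then transfers
    # only the matches.
    matches = [r for r in data if ",regionA" in r]
    moved = sum(len(r) for r in matches)
    return matches, moved

full, moved_all = host_side_filter(records)
near, moved_near = near_data_filter(records)
assert full == near        # identical results either way
print(f"host filter moved {moved_all} bytes, "
      f"near-data filter moved {moved_near} bytes")
```

The result is the same either way; what changes is the traffic on the bus, which is exactly the "copy and discard" waste the abstract describes.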


Application Requirements for Wider Adoption of Computational Storage Technology

Ramdas Kachare, Sr. Director, System Architecture, MSL, Samsung Semiconductor, Inc.


As humanity embraces digital technologies at a rapid pace, data generation and consumption is growing at very high rate. In modern IT infrastructure, humongous amounts of data are being generated by various applications, devices and processes such as autonomous vehicles, social networks, genomics, and smart sensors. New AI and ML algorithms are being developed to effectively analyze the collected data and use it to achieve even greater efficiencies and productivity of applications. In traditional system architectures, data is fetched from persistent storage to high performance servers using high performance networks. Moving such large amounts of raw data to CPU for processing and analyzing is expensive in terms of amount of energy consumed, as well as compute and network resources deployed. Computational Storage (CS) technology promises to alleviate some of these inefficiencies in the system architecture by processing some of the data closer to the storage and thereby reducing large, unnecessary data movements. This evolution of system architectures must address application needs and requirements to achieve a viable and feasible CS solution. This presentation discusses some of the common application requirements that must be addressed by CS solutions for successful and wider adoption of CS technology.

Organic Redesign of Abstractions for Computational Storage Devices using CISCOps

Sudarsun Kannan, Assistant Professor, Rutgers University


This work exploits near-storage computational capability for fast I/O and data processing, consequently reducing I/O bottlenecks. Drawing inspiration from seminal CISC processor ISAs, we introduce a new abstraction for storage devices, CISCOps, which combines multiple I/O and data processing operations into one fused operation offloaded for near-storage processing. By offloading, CISCOps significantly reduces dominant I/O overheads such as system calls, data movement, communication, and other software overheads. Further, to enhance the use of CISCOps, we introduce MicroTx for fine-grained crash consistency and fast (automatic) recovery of I/O and data processing operations. We also explore scheduling techniques to ensure fair and efficient use of in-storage compute and memory resources across tenants. Our evaluation of an emulated prototype against state-of-the-art user-level and kernel-level file systems, using microbenchmarks, macrobenchmarks, and real-world applications, shows good performance gains. More details of this research can be found in our FAST '22 paper.

File System Acceleration using Computational Storage for Efficient Data Storage

Srija Malyala, Software Engineer 2, Advanced Micro Devices
Vaishnavi S G, Software Developer, Advanced Micro Devices
Amar Nagula, Senior Software Development Manager, Advanced Micro Devices


In this paper, we examine the benefits of using computational storage devices like the Xilinx SmartSSD to offload compression, achieving an ideal compression scheme in which higher compression ratios are attained with lower CPU resources. Offloading the compute-intensive task of compression frees the CPU to serve real customer applications. The scheme proposed in this paper comprises Xilinx Storage Services (XSS) with Xilinx Runtime (XRT) software and an HLS-based GZIP compression kernel that runs on the FPGA. The hardware platform chosen is the Xilinx SmartSSD, which also has a unique P2P data transfer feature: data input/output to/from the FPGA moves directly from/to the storage device without passing through host system (x86) memory. This further improves overall system efficiency by reducing DDR memory traffic and moving computation closer to where the data resides. There are different places in the application/OS software stack where data compression can be offloaded to hardware. We have chosen to do this at the file system level because it enables all applications using the filesystem to benefit without necessarily making any changes to the applications themselves. We have selected the Linux ZFS filesystem, as it is one of the most widely used and popular file systems today.
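The reason filesystem-level offload is transparent to applications is that the app writes plain bytes while the filesystem layer compresses on write and decompresses on read. The sketch below models that layering; it uses software gzip as a stand-in for the FPGA GZIP kernel, and the class and paths are invented for illustration.

```python
import gzip

# Sketch of filesystem-level transparent compression: the application
# sees its original bytes, while the storage layer holds compressed
# blocks. In the paper this compression step is offloaded to an FPGA
# GZIP kernel; software gzip stands in for that engine here.

class CompressingFS:
    def __init__(self):
        self.blocks = {}                 # path -> compressed bytes

    def write(self, path, data: bytes):
        # The "offloadable" step: compress before hitting the media.
        self.blocks[path] = gzip.compress(data)

    def read(self, path) -> bytes:
        return gzip.decompress(self.blocks[path])

    def stored_size(self, path) -> int:
        return len(self.blocks[path])

fs = CompressingFS()
payload = b"log line: request ok\n" * 500   # highly compressible data
fs.write("/var/log/app.log", payload)
assert fs.read("/var/log/app.log") == payload   # app sees original bytes
print(f"{len(payload)} bytes stored as "
      f"{fs.stored_size('/var/log/app.log')} bytes")
```

Because every application goes through the same read/write path, none of them needs modification to benefit, which is the argument the paper makes for choosing the ZFS layer over per-application offload.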

Computational Storage:  How Do NVMe CS and SNIA CS Work Together?

William Martin, SNIA Technical Council Co-Chair/Principal Engineer, SSD IO Standards, Samsung Semiconductor


NVMe and SNIA are both working on standards related to Computational Storage. The question continually asked is whether these efforts are compatible or at odds with each other. The truth is that many of the same people are working on both of these standards efforts and are very interested in ensuring that they work together rather than conflict with each other. This presentation will discuss how the two standards efforts go hand in hand, the aspects of the SNIA API that support the NVMe efforts, and the NVMe efforts to share common architectural structures with those defined in SNIA. As part of this discussion, a lexicon of terminology used in both organizations will be presented, along with the decoder ring that allows you to understand one document in terms of the other in spite of some differences in the names used.

Programming with Computational Storage

Oscar Pinto, Principal Software Architect, Samsung Semiconductor


There is exponential growth in stored data and in applications processing data in the cloud and at the edge. Applications based on traditional CPU-centric architectures may run into resource limits. Recent developments in Computational Storage have emerged as a promising solution to alleviate the limitations associated with traditional models: compute is performed near the data, thereby overcoming CPU, memory, and fabric limitations. The SNIA Computational Storage TWG is leading the way toward an application programming model, while the NVMe CS TG is defining the device programming model at the physical protocol layer. This presentation will provide an overview of how Computational Storage APIs may be used to connect with a Computational Storage capable device. Learn what the device may be able to provide for such applications and how to program them.


Evolving Storage for a New Generation of AI/ML

Somnath Roy, Principal Engineer, System Architecture, Samsung Semiconductor


AI/ML is not new, but innovations in ML model development have made it possible to process data at unprecedented speeds. Data scientists have used standard POSIX file systems for years, but as scale and the need for performance have grown, many face new storage challenges. Samsung has been working with customers on new ways of approaching storage issues with object storage designed for use with AI/ML. Hear how software and hardware are evolving to allow unprecedented performance and scale of storage for machine learning.



How CXL Will Change the Data Center

Bernie Wu, VP Strategic Alliances, MemVerge


Compute Express Link will serve as a new computing backbone, carrying the weight of petabytes of memory shared by heterogeneous processors. At the heart of the new CXL fabrics will be software that transforms the hardware into a pool of composable resources for fine-grained provisioning of capacity, performance, availability, and mobility. In this session, Bernie Wu will provide a sneak preview of how CXL will change the data center by demonstrating three technologies that provide these capabilities today and that will move onto CXL when it is available. Bernie will show memory sharing in a Composable Datacenter Infrastructure (CDI) environment; Intel® Optane™ Persistent Memory being used by heterogeneous processors; and how memory orchestration will work with Linux, OpenStack, Kubernetes Infrastructure (LOKI).






