PM+CS Summit 2022 Presentation Abstracts


2022 Persistent Memory + Computational Storage Summit Presentations

Opening Remarks

David McIntyre, SNIA PM+CS Summit Planning Team Chair, SNIA CMSI Marketing Co-Chair; Director, Product Planning and Business Enablement, Samsung Corporation


Join David McIntyre as he kicks off our virtual Summit with an overview of the Summit content and presenters, and how the technologies of persistent memory and computational storage are so important.

Inventing Our Way Around the Memory Wall

Jim Handy, General Director, Objective Analysis; Tom Coughlin, President, Coughlin Associates


Phenomenal change is coming to computing architecture. Persistence is moving closer to the processor, eventually to find its way into caches and registers. Memory interfaces are being completely reconsidered to remove one of the biggest bottlenecks in the system while cutting energy consumption. In certain cases the processor is being moved into the memory chip. Meanwhile, new and completely different workloads, like the broad adoption of AI, will change the entire way that computers are used and configured. This is causing pioneers to create new approaches that combine emerging memory technologies like MRAM, ReRAM, and FRAM with in-memory computing algorithms to solve problems in ways never before thought possible. This session will lay out the path of these changes and will detail the kind of work that must be done in both hardware and software to support them.


Innovation with SmartSSD for Green Computing

Yang Seok Ki, Vice President and CTO, Memory Solutions Lab, Samsung Semiconductor


Carbon neutrality is becoming an important issue in the industry. Data centers are currently known to consume 1-2% of global electricity usage. Data centers are transforming their architecture to improve total cost of ownership (TCO), but environmental impact is now also becoming an important factor. Green computing, which increases computational efficiency at low energy consumption, therefore takes on great significance. This is also a problem stemming from the traditional CPU-centric general-purpose computing model, so accelerators based on domain-specific architectures are becoming important not only from a basic TCO point of view but also from a green computing point of view, and can be a potential solution. In particular, Samsung has been approaching these issues from the perspective of memory and storage, and has emphasized the importance of data-centric computing as an alternative to the existing compute-centric computing.
This keynote will showcase the innovations Samsung is making in storage devices through the SmartSSD concept and share its long-term vision to enable power-efficient green computing through data-centric computing. We would also like to invite the industry to collaborate on realizing data-centric computing across various accelerators for more power-efficient data processing.

NVMe Computational Storage: An Update on the Standard

Stephen Bates, NVM Express/Chief Technology Officer, Eideticom
Kim Malone, NVM Express/Storage Software Architect, Intel Corporation


Learn what is happening in NVMe to support Computational Storage devices. The development is ongoing and not finalized, but this presentation will describe the directions the proposal is taking. Kim and Stephen will describe the high-level architecture being defined in NVMe for Computational Storage, which provides for programs based on standardized eBPF. We will describe how this new command set fits within the NVMe I/O Command Set architecture, cover the commands necessary for Computational Storage, and discuss a proposed new controller memory model that can be used for computational programs.

CXL and UCIe

Cheolmin Park, CVP, Samsung


The ongoing increase in application performance requirements, from cloud to edge to on-premises use cases, requires tighter coupling of compute, memory, and storage resources. Memory coherency and low-latency attributes across converged compute infrastructures are being addressed in part with interconnect technologies including CXL 2.0 and UCIe. This presentation will provide a forward look into computational storage and computational memory developments and the interconnect standardization that enables them.

Persistent Memory Today

Andy Rudoff, Software Architect, Intel Corporation


Andy will talk about the most recent developments around PMem use cases, OS developments, and future directions for PMem, such as how it will use the emerging Compute Express Link (CXL) interconnect.


Computational Storage in Virtualized Environments

Jinpyo Kim, Senior Staff Engineer, VMware; Michael Mesnier, Principal Engineer, Intel Labs


In this talk we will share our thoughts on computational storage and the special considerations for virtualized environments. We will present real data from a prototype computational storage stack (based on NVMe/TCP) and practical first steps that the industry can take toward fixed-function offloads, like data integrity and search. More programmable offloads can build upon such a foundation.
We will also discuss the silicon opportunities and the wide range of solutions at play, including DPUs/IPUs, FPGAs, GPUs, computational SSDs, and special-purpose accelerators. The best “recipe” is still a matter of great debate, and we will share our experience with integrating various accelerators into a computational storage server.
We are actively seeking fellow travelers, so this talk is also a call to action for others to get involved.


AI Memory at Meta: Challenges and Potential Solutions 

Chris Petersen, Hardware Systems Technologist, Meta


AI at Meta spans many applications and services at scale, driving a portion of Meta's overall hardware and software infrastructure. AI models are scaling faster than the underlying memory technology. This presentation discusses how an additional tier of memory can help, and the options for enabling this new memory tier.

Day 1 Panel Discussion

Moderated by Dave Eggleston, Principal, Intuitive Cognition Consulting


Join the presenters from Day 1 in a lively panel discussion and ask your questions.

Scaling NVDIMM-N Architecture for System Acceleration in DDR5 and CXL-Enabled Applications

Arthur Sainio, Director of Product Marketing, SMART Modular Technologies and Pekon Gupta, Solutions Architect, SMART Modular Technologies


Server and storage applications are taking advantage of fast persistent memory in the form of NVDIMMs. With the industry transition to DDR5 and CXL, an NVDIMM architecture aligned with industry standards is emerging to solve the ongoing need for high-speed access to persistent memory in write-acceleration applications.


Recap of the Day

David McIntyre, SNIA PM+CS Summit Planning Chair


Missed a session?  Of course you can watch all sessions on demand at the end of the day, but don't miss the fastest 10-minute recap of key points and take-aways.



Birds-of-a-Feather: Computational Storage

Moderators: Scott Shadley and Jason Molgaard, Co-Chairs, SNIA Computational Storage TWG


Come to this BoF and discuss computational storage, how everything works together and how to satisfy different needs with different configurations. We will have some primer slides and look for input from the attendees on direction and value of each of these new solutions.

Opening Remarks, Day 2

David McIntyre, SNIA PM+CS Summit Planning Team Chair, SNIA CMSI Marketing Co-Chair; Director, Product Planning and Business Enablement, Samsung Corporation


Join David McIntyre as he kicks off the second day of the virtual Summit with an overview of the Summit content and presenters, and how the technologies of persistent memory and computational storage are so important.

HPC for Science Based Motivations for Computation Near Storage

Gary Grider, HPC Division Leader, Los Alamos National Laboratory


Scientific data is mostly stored as linear bytes in files, but it almost always has hidden structure that resembles records with keys and values, oftentimes in multiple dimensions. Further, the bandwidths required to service HPC simulation workloads will soon approach tens of terabytes/sec, with single data files surpassing a petabyte and single sets of data from a campaign approaching 100 petabytes. Multiple tasks, from distributed analytical/indexing functions to data management tasks like compression, erasure encoding, and dedup, are all potentially more efficiently and economically performed near storage devices.
Demonstrating a standards-based ecosystem for offloading computation to near storage is a valued contribution to the computing, networking, and storage communities. The goal of the joint Accelerated Box of Flash (ABOF) project (a collaboration between Eideticom, Nvidia, Aeon, SK hynix, and LANL) was to produce a first version of a network-attached computational storage enclosure that allows a host application to directly leverage distributed computational elements in the storage enclosure without hiding any of the computation behind a data block interface. The first demo of the ABOF accelerates a commonly deployed kernel-based file system, ZFS, to appeal to the large ZFS community while also making it easy for vendors to deploy accelerated ZFS appliances, creating interesting business opportunities.
Motivations for in-network and near-storage computation for HPC-based computational science will be presented.

Big Memory and Multi-Cloud

Dr. Charles Fan, CEO and Founder, MemVerge 


Many people are not aware that persistent memory is deployed in massive hyperscale public clouds, and in private clouds managed by next-wave cloud service providers delivering specialty or regional cloud services. In this session, Charles Fan describes cloud computing issues that are uniquely addressed by Big Memory Computing featuring persistent memory. Charles will also reveal use cases and deployments in public and private clouds. The presentation will wrap up with predictions of how CXL will impact Big Memory Computing with persistent memory in the cloud.


The Latest Efforts in the SNIA Computational Storage Technical Work Group

Scott Shadley and Jason Molgaard, SNIA Computational Storage TWG Co-Chairs 


With the ongoing work in the CS TWG, the chairs will present the latest updates from the membership of the working group. In addition, the latest release will be reviewed at a high level to give attendees a view into next steps and implementation of the specification in progress. Use cases, security considerations, and other key topics will also be addressed.

Compute Express Link™ (CXL™): Advancing the Next Generation of Data Centers

Alan Benjamin, CXL Consortium/CEO and President, GigaIO


Compute Express Link™ (CXL™) is an open industry-standard interconnect offering coherency and memory semantics using high-bandwidth, low-latency connectivity between the host processor and devices such as accelerators, memory buffers, and smart I/O devices. 
CXL technology is designed to deliver an open standard that accelerates next-generation data center performance. This has now become reality, with member companies delivering CXL solutions that showcase interoperability between vendors and enable a new ecosystem for high-performance, heterogeneous computing. The first CXL hardware solutions feature memory expansion, end-point support, memory disaggregation, and more.
The CXL 2.0 specification adds switching support providing fan-out to enable connection to more devices; memory pooling for increased memory utilization efficiency and providing memory capacity on demand; and support for persistent memory. The presentation will also share an overview of the CXL 2.0 ECNs to enhance performance, reliability, software interface, and testability while offering design simplification. Additionally, the presentation will provide a high-level overview of the latest advancements in the CXL 3.0 specification development, its new use cases and industry differentiators.

GPU+ DPU for Computational Storage

Rob Davis, VP, NVIDIA


One of the challenges for computational storage is getting flexible and powerful compute close enough to the storage to make it worthwhile. FPGAs have potential but are hard to program and not very flexible. Traditional CPU complexes have a large footprint and lack the parallel processing abilities ideal for AI/ML applications. Data Processing Units (DPUs) tightly coupled with GPUs are the answer. The DPU integrates a CPU and hardware accelerators for I/O and storage into a single chip, while the GPU provides rapid computation of multiple parallel processes from a single chip, which is beneficial for computational storage applications, including AI. This talk will detail a GPU+DPU solution along with the use cases it will enable.

SNIA's SDXI TWG and Its Forays Towards Standardizing Memory Data Movement

Shyamkumar Iyer, SNIA Smart Data Accelerator Initiative Technical Work Group Chair/Dell 


Data-in-use performance needs are expanding. At the same time, given the diversity of accelerator and memory technologies, standardizing memory-to-memory data movement and acceleration is gaining ground. SNIA's SDXI (Smart Data Accelerator Interface) TWG is at the forefront of this standardization, working toward a v1.0 release and having recently put out a v0.9-rev1 version of the specification for public review. Many TWG members have discussed their motivation for contributing to this specification.
A memory-to-memory data movement interface is increasingly seen as an enabler for persistent memory technologies and computational storage use cases. This talk will summarize the various use cases, the features discussed in the SDXI v0.9-rev1 specification, and SNIA's efforts around standardizing this work.
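The core idea behind a standardized data mover can be sketched in a few lines: software submits copy descriptors to a ring, and an engine (hardware in the real design) drains the ring. The sketch below is a toy model only; the class and field names are illustrative and do not reflect the actual SDXI descriptor layout.

```python
from dataclasses import dataclass

# Toy model of a descriptor-ring memory-to-memory data mover, loosely
# inspired by the SDXI concept. Names are illustrative, not spec-defined.

@dataclass
class CopyDescriptor:
    src: bytearray        # source buffer (stands in for a source address)
    dst: bytearray        # destination buffer
    length: int           # bytes to move
    completed: bool = False

class DataMover:
    """Consumes descriptors from a ring and performs the copies,
    standing in for a DMA engine that the CPU merely supervises."""
    def __init__(self):
        self.ring = []

    def submit(self, desc):
        self.ring.append(desc)

    def process(self):
        # A real engine would complete asynchronously; here we copy
        # synchronously and mark each descriptor done.
        for d in self.ring:
            d.dst[:d.length] = d.src[:d.length]
            d.completed = True
        self.ring.clear()

src = bytearray(b"persistent-memory payload")
dst = bytearray(len(src))
mover = DataMover()
desc = CopyDescriptor(src, dst, len(src))
mover.submit(desc)
mover.process()
assert dst == src and desc.completed
```

The point of the standard is that the submit/descriptor interface stays the same regardless of which accelerator implements the copy.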

Computational Storage for Storage Applications

Andy Walls, Chief Architect, IBM Fellow, IBM Corporation


When we think of computational storage, we often think of offloading applications like databases. However, the SSD is also an ideal location to offload the storage controller itself. For example, data reduction is a must-have requirement for today's storage controllers, and offloading compression to the SSD delivers data reduction at line speed. This talk will focus on this and other functions that can be offloaded to the SSD, enabling lower latency and higher IOPS while freeing CPU cycles for the things the controller does best, like replication. Come listen to a well-known expert in the field of high performance speak to fundamental and essential computational storage approaches.

Day 2 Panel Discussion

Moderated by Dave Eggleston, Principal, Intuitive Cognition Consulting


Join the presenters from Day 2 in a lively panel discussion and ask your questions.


Recap of the Day and Closing Remarks

David McIntyre, SNIA Persistent Memory + Computational Storage Planning Team Chair


With all sessions now available on-demand, join us for a recap of Day 2 highlights.



Accelerating Oracle Workloads using Intel DCPMM, Oracle PMEM Filestore on VMware

Sudhir Balasubramanian, Sr. Staff Solution Architect and Global Oracle Practice Lead, VMware
Arvind Jagannath, Product Line Manager for vSphere Platform, VMware


Enabling, sustaining, and ensuring the highest possible performance along with continued application availability is a major goal for all mission-critical applications in meeting demanding business SLAs, all the way from on-premises to VMware Hybrid Clouds.
Please join this session to learn how the performance of business-critical Oracle workloads can be accelerated via persistent memory technology, using Intel Optane DC PMM in App Direct mode to back the Oracle 21c Persistent Memory Filestore on the VMware vSphere platform, thereby achieving those stringent business goals and enhancing performance.

An NVMe-based SQL Query Engine for accelerating Big-Data Applications

Stephen Bates, CTO, Eideticom


NVMe-based Computational Storage offers the key benefits of computational storage and aligns with a popular and open standard. Using Computational Storage, we can build systems that have improved performance and higher efficiency compared to legacy computer architectures. As Computational Storage becomes more pervasive, we are seeing a move from basic Computational Storage Functions (e.g., compression) to more complex functions. In this paper we present an NVMe-based SQL Query Engine based on our NoLoad Computational Storage Device. We show how this Query Engine can be tied into big-data applications and how this can lead to improved performance and efficiency in data-center architectures.

Enabling Memory Tiering in CacheLib

Daniel Byrne, Intel Corporation


The emerging Compute Express Link (CXL) is a new memory interconnect technology that addresses a wide range of use cases, from memory expansion in a single server, through a standardized PMem interface, to dynamic memory pooling across many compute nodes. CXL is likely to bring significant cost savings to infrastructure providers through increased memory utilization and a reduced number of server configuration variants. Software should therefore be adapted to efficiently use the available heterogeneous memory resources.
Our talk highlights our experience prototyping heterogeneous memory tiering for CacheLib, a general-purpose caching engine developed by Meta with support for hybrid caching using DRAM and flash. Our prototype adds transparent application-level support for heterogeneous memory. We describe the techniques used to reduce the overhead of data movement among heterogeneous memory tiers and the challenges we encounter today.

What is the Carbon Footprint Benefit of Computational Storage?

Jerome Gaysse, Sr. Technology and Market Analyst, Silinnov Consulting 


Computational storage brings power and performance benefits compared to a traditional CPU-centric architecture. But what about the carbon footprint? Is it simply proportional to the power consumption? No, it is a little more complex to calculate.
In this talk, a carbon footprint analysis methodology will be presented, explaining the different parameters that need to be considered and the different outputs that need to be observed to really understand a carbon footprint analysis. A carbon footprint analysis example will be provided with a CS system benchmark.
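The reason the answer is "more complex than power" is that a footprint has both an embodied (manufacturing) term and an operational (use-phase) term. The sketch below shows the shape of such a calculation; all numbers (embodied carbon, power draws, grid intensity) are invented for illustration and are not figures from the talk.

```python
# Illustrative carbon-footprint comparison: total emissions are embodied
# (manufacturing) plus operational (power x time x grid intensity).
# All numbers here are made-up assumptions, not data from the talk.

GRID_INTENSITY = 0.4   # assumed kg CO2e per kWh of the grid mix

def footprint(embodied_kg, avg_power_w, hours):
    """Total kg CO2e = embodied + operational emissions."""
    operational = avg_power_w / 1000 * hours * GRID_INTENSITY
    return embodied_kg + operational

# Hypothetical trade-off: a CS system carries more embodied carbon
# (extra silicon) but draws less power for the same yearly workload.
cpu_centric  = footprint(embodied_kg=300, avg_power_w=450, hours=8760)
comp_storage = footprint(embodied_kg=340, avg_power_w=320, hours=8760)
print(f"CPU-centric : {cpu_centric:.0f} kg CO2e/yr")
print(f"Comp.storage: {comp_storage:.0f} kg CO2e/yr")
```

Note that with different embodied-carbon or utilization assumptions the comparison can flip, which is exactly why a full methodology is needed rather than a power reading alone.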

Make Sense of Memory Tiering

Alessandro Goncalves, Cloud Solution Architect, Intel Corporation


Memory footprint is a major source of capital expenditure and will soon account for more than half of the cost of a compute host. New technologies like persistent memory and CXL enable new memory topologies and volatile memory usage that deliver memory density and cost savings. Memory tiering can be broken down into types, each with its own challenges and requirements, and can be enabled in hardware, kernel, or user space. The presentation will go over the definition, architecture, and use cases for the memory tiering options in the market today and in the future.
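Whether implemented in hardware, kernel, or user space, tiering policies share a common skeleton: track hotness, promote hot pages to the fast tier, demote cold ones when it fills. The following is a minimal sketch of such a policy under assumed, arbitrary thresholds; real implementations use sampled access bits or hardware counters rather than exact counts.

```python
# Minimal sketch of an access-frequency-based two-tier placement policy,
# of the kind a tiering layer might apply between fast (DRAM) and slow
# (CXL/PMem) memory. Threshold and capacity values are arbitrary.

PROMOTE_THRESHOLD = 3   # accesses before a page moves to the fast tier

class TieredMemory:
    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = set()    # pages resident in the fast tier
        self.hits = {}       # per-page access counters

    def access(self, page):
        self.hits[page] = self.hits.get(page, 0) + 1
        if page not in self.fast and self.hits[page] >= PROMOTE_THRESHOLD:
            if len(self.fast) >= self.fast_capacity:
                # Demote the coldest fast-tier page to make room.
                coldest = min(self.fast, key=lambda p: self.hits[p])
                self.fast.remove(coldest)
            self.fast.add(page)
        return "fast" if page in self.fast else "slow"

mem = TieredMemory(fast_capacity=2)
for _ in range(3):
    mem.access("A")          # "A" becomes hot and is promoted
print(mem.access("A"))       # -> fast
print(mem.access("B"))       # -> slow (only one access so far)
```

The interesting engineering questions the talk covers (who tracks hotness, who migrates, at what granularity) are exactly what distinguishes the hardware, kernel, and user-space variants of this loop.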


Methods to Evaluate or Identify Suitable Storage for IoT/AI Boards

Mythri K, Senior Staff Engineer, Samsung


Every day, huge amounts of unstructured data are generated that need to be processed and used for artificial intelligence (AI) and the Internet of Things (IoT). Intense investment in AI and IoT is leading to rapid innovation in these technologies. These technologies rely on the cloud for storing structured and unstructured data, but fetching and processing data from the cloud is slow due to increased latency, and power consumption is high. Beyond these issues, end users are not comfortable storing data in the cloud due to privacy concerns. These issues are pushing AI applications to implement their intelligence at the edge of the network instead of in data centers.
Storage is an important module at the network edge, one that must accommodate memory-hungry AI processes like training, which uses off-chip memory to keep up with performance improvements. As AI and IoT become popular, many professionals and companies use boards like the Raspberry Pi and Arduino, which are small, inexpensive boards that allow connecting various external accessories such as sensors to create applications.
Popular boards like the Raspberry Pi use low-cost MicroSD cards for booting and storing data. This paper explains methods to obtain benchmarking results and lists important parameters and their values that help evaluate which cards should be chosen to implement any application on a Raspberry Pi board. This paper also explains a method to emulate an AI application that helps determine the lifetime of a MicroSD card for various workloads.

SSDs That Think

Mats Oberg, Associate Vice President, DSP Architecture, Storage Office of the CTO, Marvell Corporation


With the amount of generated data growing steeply, obviously not all of it can be saved, and an even smaller portion actually gets analyzed and leveraged. In addition, many expensive host cycles must be invested in pre-processing before the data is actually used for computation. The ability of computational storage to offline-process the data stored locally on the drive, at rest, and generate a compact and relevant representation of it can enable more efficient host processing. While inline processing of data on its way to/from the drive can significantly improve overall performance, computational storage offline processing is an enabler for more data to be uncovered and more use cases to emerge. This presentation will include concrete examples of AI inferencing done at the storage device in the context of the needed host computation.

Accelerating Near Real-time Algorithms Using Disaggregated Computational Storage

Mayank Saxena, Sr. Director, Engineering, Samsung


Computational storage can bring unique benefits in increasing the efficiency of CPU utilization in a data processing system. Here we discuss the benefits of leveraging computational storage in a disaggregated storage environment. We demonstrate the ability of the solution to complement the CPU by taking over tasks that benefit from in-situ processing within the storage, thereby improving overall system performance while lowering TCO. Disaggregated storage is particularly attractive when using computational storage, since scaling storage naturally yields scaling of the tasks that can be accelerated by computational storage. We experimented with accelerating the S3 Select functionality using our disaggregated computational storage (DCS) platform. Data tagging and partitioning utilizing the sharding aspect of the DCS platform further enhance the ability to provide even greater performance for large data processing with parallel execution.

Empowering Real-Time Decision Making for Large-Scale Datasets with SSD-like Economics 

Prasad Venkatachar, Director of Technical Marketing, Pliops


In the technology-driven world we live in, the speed of data access and the real-time nature of data can make or break a business for data-driven organizations. Hence many organizations leverage in-memory solutions for real-time decision-making to capture trends and stay competitive. However, as data sets extend beyond memory footprints, many enterprises take a performance- and cost-compromised approach, keeping critical datasets in memory for real-time access and perceived less-critical datasets in SSD storage, resulting in delayed business decisions and missed business opportunities. This session walks through a new class of data processor that enables in-memory-class performance with SSD-like cost economics and no compromise on performance or data sets for real-time decision making. We will showcase how Redis with SSDs, accelerated by this new class of data processor, can support 10x more data with performance and four-nines latency close to DRAM-based solutions.


Accelerating Operations on Persistent Memory Device Via Hardware-based Memory Offloading Technique

Ziye Yang, Staff Software Development Engineer, Intel Corporation


With more and more fast devices (especially persistent memory, a.k.a. PMem) equipped in the data center, there is great pressure on the CPU to drive those devices (e.g., Intel Optane DC persistent memory) for persistency purposes under heavy workloads: unlike HDDs and SSDs, persistent memory provides no DMA-related capability, so the CPU needs to participate in data operations on persistent memory all the time. In this talk, we mitigate this CPU pressure via hardware-based memory offloading devices (e.g., Intel's IOAT and DSA). To demonstrate our work, we introduce the memory offloading device solution in two kinds of workloads that use PMem: (1) storage applications based on the SPDK framework, and (2) leveraging DSA in Ceph's BlueStore with a PMem device. On the SPDK side, we introduce how to design and implement a new block device based on persistent memory (e.g., Intel Optane DC persistent memory) and the SPDK acceleration framework (which leverages IOAT or DSA). On the Ceph side, we describe the main change in BlueStore's pmem module (i.e., src/blk/) and state how we achieve the offloading via DSA, including the challenges. With our approach, we do see performance improvement for workloads on the PMem device, e.g., IOPS increasing and latency decreasing for storage applications based on the SPDK framework. We will also share some early performance results if Intel's SPR platform is publicly available.


Persistent Memory-based Storage Node for HPC Domain

Harikrishnan B, Scientist E, CPAC

I/O is recognized as a performance bottleneck in HPC domains, and this is expected to worsen as the disparity between computation and I/O capacity increases on future HPC machines. The fundamental consideration for all potential technologies and architectures is the trade-off between cost and performance, so we need to re-evaluate old architectures and consider new storage architectures and technologies that address the requirements for capacity and speed. This presentation discusses how storage architectures based on persistent memory can greatly ease the performance bottleneck. Persistent memory data can be accessed even after the process that created or last modified it has ended, and persistent memory moves the storage closer to compute.


Computational Memory: Moving compute near data

Pekon Gupta, Solutions Architect, SMART Modular Technologies
Arthur Sainio, Director of Product Marketing, SMART Modular Technologies


In most architectures data are copied from far memory or storage to local memory for processing. When there are hundreds of gigabytes of data, it is more efficient to move static compute functions nearer to the data. Moving compute near the memory, or building a compute engine inside the memory device, is called computational memory. Static functions like compression, sorting, search, and encryption can be offloaded to the computational memory, and hardware engines can filter the data near the source before it is consumed by heavy compute engines like GPUs and multi-core CPUs. Offloading static functions to computational memory not only reduces the wasteful copy-and-discard approach but also improves efficiency and frees the bus for other performance-critical workloads. This presentation discusses the use of an Intel Optane™ PMem-based accelerator that attaches up to 2TB of memory to a single PCIe Gen 4.0 interface.
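The "filter near the source" benefit described above can be made concrete with a toy comparison: shipping the whole dataset to the host versus having the device return only matching records. The dataset, record format, and functions below are invented for illustration; only the bytes-moved accounting matters.

```python
# Toy illustration of near-data filtering: a search offloaded to a
# computational-memory device returns only the matching records,
# instead of shipping the whole dataset to the host. Data is made up.

records = [f"user{i},region{'A' if i % 4 == 0 else 'B'}"
           for i in range(1000)]

def host_side_filter(data):
    # Traditional path: move everything across the bus, filter on host.
    moved = sum(len(r) for r in data)
    return [r for r in data if ",regionA" in r], moved

def near_data_filter(data):
    # Offloaded path: the device filters at the source, then transfers
    # only the matches.
    matches = [r for r in data if ",regionA" in r]
    moved = sum(len(r) for r in matches)
    return matches, moved

full, moved_all = host_side_filter(records)
near, moved_near = near_data_filter(records)
assert full == near        # identical results either way
print(f"host filter moved {moved_all} bytes, "
      f"near-data filter moved {moved_near} bytes")
```

The result is the same either way; what changes is the traffic on the bus, which is exactly the "copy and discard" waste the abstract describes.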


Application Requirements for Wider Adoption of Computational Storage Technology

Ramdas Kachare, Sr. Director, System Architecture, MSL, Samsung Semiconductor, Inc.


As humanity embraces digital technologies at a rapid pace, data generation and consumption is growing at very high rate. In modern IT infrastructure, humongous amounts of data are being generated by various applications, devices and processes such as autonomous vehicles, social networks, genomics, and smart sensors. New AI and ML algorithms are being developed to effectively analyze the collected data and use it to achieve even greater efficiencies and productivity of applications. In traditional system architectures, data is fetched from persistent storage to high performance servers using high performance networks. Moving such large amounts of raw data to CPU for processing and analyzing is expensive in terms of amount of energy consumed, as well as compute and network resources deployed. Computational Storage (CS) technology promises to alleviate some of these inefficiencies in the system architecture by processing some of the data closer to the storage and thereby reducing large, unnecessary data movements. This evolution of system architectures must address application needs and requirements to achieve a viable and feasible CS solution. This presentation discusses some of the common application requirements that must be addressed by CS solutions for successful and wider adoption of CS technology.

Organic Redesign of Abstractions for Computational Storage Devices using CISCOps

Sudarsun Kannan, Assistant Professor, Rutgers University


This work exploits near-storage computational capability for fast I/O and data processing, consequently reducing I/O bottlenecks. Drawing inspiration from seminal CISC processor ISAs, we introduce a new abstraction for storage devices, CISCOps, which combines multiple I/O and data processing operations into one fused operation offloaded for near-storage processing. By offloading, CISCOps significantly reduces dominant I/O overheads such as system calls, data movement, communication, and other software overheads. Further, to enhance the use of CISCOps, we introduce MicroTx for fine-grained crash consistency and fast (automatic) recovery of I/O and data processing operations. We also explore scheduling techniques to ensure fair and efficient use of in-storage compute and memory resources across tenants. Our evaluation of an emulated prototype against state-of-the-art user-level and kernel-level file systems, using microbenchmarks, macrobenchmarks, and real-world applications, shows good performance gains. More details of this research can be found in our FAST '22 paper.

File System Acceleration using Computational Storage for Efficient Data Storage

Srija Malyala, Software Engineer 2, Advanced Micro Devices
Vaishnavi S G, Software Developer, Advanced Micro Devices
Amar Nagula, Senior Software Development Manager, Advanced Micro Devices


In this paper, we examine the benefits of using computational storage devices like the Xilinx SmartSSD to offload compression, achieving an ideal compression scheme in which higher compression ratios are attained with lower CPU resources. Offloading the compute-intensive task of compression frees the CPU to serve real customer applications. The scheme proposed in this paper comprises Xilinx Storage Services (XSS) with Xilinx Runtime (XRT) software and an HLS-based GZIP compression kernel that runs on the FPGA. The hardware platform chosen is the Xilinx SmartSSD, which also has a unique P2P data transfer feature: data input/output to/from the FPGA moves directly from/to the storage device without passing through host system (x86) memory. This further improves overall system efficiency by reducing DDR memory traffic and moving computation closer to where the data resides. There are different places in the application/OS software stack where data compression can be offloaded to hardware. We have chosen to do this at the file system level because it enables all applications using the filesystem to benefit without necessarily making any changes to the applications themselves. We have selected the Linux ZFS filesystem, as it is one of the most widely used and popular file systems today.
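The reason filesystem-level offload is transparent to applications is that the app writes plain bytes while the filesystem layer compresses on write and decompresses on read. The sketch below models that layering; it uses software gzip as a stand-in for the FPGA GZIP kernel, and the class and paths are invented for illustration.

```python
import gzip

# Sketch of filesystem-level transparent compression: the application
# sees its original bytes, while the storage layer holds compressed
# blocks. In the paper this compression step is offloaded to an FPGA
# GZIP kernel; software gzip stands in for that engine here.

class CompressingFS:
    def __init__(self):
        self.blocks = {}                 # path -> compressed bytes

    def write(self, path, data: bytes):
        # The "offloadable" step: compress before hitting the media.
        self.blocks[path] = gzip.compress(data)

    def read(self, path) -> bytes:
        return gzip.decompress(self.blocks[path])

    def stored_size(self, path) -> int:
        return len(self.blocks[path])

fs = CompressingFS()
payload = b"log line: request ok\n" * 500   # highly compressible data
fs.write("/var/log/app.log", payload)
assert fs.read("/var/log/app.log") == payload   # app sees original bytes
print(f"{len(payload)} bytes stored as "
      f"{fs.stored_size('/var/log/app.log')} bytes")
```

Because every application goes through the same read/write path, none of them needs modification to benefit, which is the argument the paper makes for choosing the ZFS layer over per-application offload.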

Computational Storage:  How Do NVMe CS and SNIA CS Work Together?

William Martin, SNIA Technical Council Co-Chair/Principal Engineer, SSD IO Standards, Samsung Semiconductor


NVMe and SNIA are both working on standards related to Computational Storage. The question continually asked is whether these efforts are compatible or at odds with each other. The truth is that many of the same people are working on both of these standards efforts and are very interested in ensuring that they work together rather than conflict with each other. This presentation will discuss how the two standards efforts go hand in hand, the aspects of the SNIA API that support the NVMe efforts, and the NVMe efforts to share common architectural structures with those defined in SNIA. As part of this discussion, a lexicon of terminology used in both organizations will be presented, along with the decoder ring that allows you to understand one document in terms of the other in spite of some differences in the names used.

Programming with Computational Storage

Oscar Pinto, Principal Software Architect, Samsung Semiconductor


There is exponential growth in stored data and in applications processing data in the cloud and at the edge. Applications based on traditional CPU-centric architectures may run into resource limits. Recent developments in Computational Storage have emerged as a promising solution to alleviate the limitations associated with traditional models: compute is performed near the data, thereby overcoming CPU, memory, and fabric limitations. The SNIA Computational Storage TWG is leading the way toward an application programming model, while the NVMe CS TG is defining the device programming model at the physical protocol layer. This presentation will provide an overview of how Computational Storage APIs may be used to connect with a Computational Storage capable device. Learn what the device may be able to provide for such applications and how to program them.


Evolving Storage for a New Generation of AI/ML

Somnath Roy, Principal Engineer, System Architecture, Samsung Semiconductor


AI/ML is not new, but innovations in ML model development have made it possible to process data at unprecedented speeds. Data scientists have used standard POSIX file systems for years, but as scale and the need for performance have grown, many face new storage challenges. Samsung has been working with customers on new ways of approaching storage issues with object storage designed for use with AI/ML. Hear how software and hardware are evolving to allow unprecedented performance and scale of storage for machine learning.



How CXL Will Change the Data Center

Bernie Wu, VP Strategic Alliances, MemVerge


Compute Express Link will serve as a new computing backbone, carrying the weight of petabytes of memory shared by heterogeneous processors. At the heart of the new CXL fabrics will be software that transforms the hardware into a pool of composable resources for fine-grained provisioning of capacity, performance, availability, and mobility. In this session, Bernie Wu will provide a sneak preview of how CXL will change the data center by demonstrating three technologies that provide these capabilities today and that will move onto CXL when it is available. Bernie will show memory sharing in a Composable Datacenter Infrastructure (CDI) environment; Intel® Optane™ Persistent Memory being used by heterogeneous processors; and how memory orchestration will work with Linux, OpenStack, Kubernetes Infrastructure (LOKI).






