SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
Typical selling points don't reflect actual performance and overall costs.
Up-front prices are often not indicative of the actual long-term cost of a given storage solution.
Simulated experiments can demonstrate the best performance and TCO trade-offs depending on the selected target workloads.
Energy and cooling play a larger role in modern AI data centers than is often considered.
“Speeds and Feeds” no longer works. Period. Storage vendors have spent the better part of two decades presenting these figures as if they mean something to the user. While they have some value, the real focus needs to shift to ownership. Price per GB, IOPS per drive, and GB/s all describe a single product, not the net solution most people are looking for today. This is where real-world, long-term ownership costs and performance matter most. Total Cost of Ownership (TCO) is a metric with so many inputs that it becomes a challenge to understand unless you have tools to make it work. This session will deliver an analysis of a complete storage system using practical, real-world scenarios and workloads. It will take inputs from all the contributing elements, use simulations to examine data along the analysis path, and provide the metrics actually needed for making infrastructure choices, including storage specifics. Looking at so many factors simultaneously can seem impossible, but with informed, measured starting data, the analysis can be done quickly, giving customers the tools they need to make those decisions. Join us to explore these aspects and learn from our simulation-driven studies. You will learn that marketing "speeds and feeds" don't reflect actual performance or costs. We will show how initial CapEx is often dwarfed by the true long-term OpEx of storage solutions, and how simulation-based experiments reveal optimal performance and TCO trade-offs tailored to specific workload demands.
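To make the kind of multi-input TCO analysis described above concrete, the following is a minimal sketch of a multi-year storage TCO model in Python. The parameter names and values are hypothetical placeholders rather than figures from the session; a real model would draw its inputs from measured workload, power, and facility data.

```python
# Minimal multi-year storage TCO sketch. All inputs are hypothetical
# placeholders; a real analysis would use measured workload, power,
# and pricing data.

def storage_tco(capex_usd, drive_count, watts_per_drive, pue,
                energy_cost_per_kwh, admin_cost_per_year, years):
    """Return total cost of ownership and the CapEx/OpEx split."""
    hours_per_year = 24 * 365
    # Facility energy includes cooling overhead via PUE.
    kwh_per_year = drive_count * watts_per_drive * pue * hours_per_year / 1000
    energy_per_year = kwh_per_year * energy_cost_per_kwh
    opex = years * (energy_per_year + admin_cost_per_year)
    return {"capex": capex_usd, "opex": opex, "tco": capex_usd + opex}

# Example: 500 drives at 10 W each, PUE 1.4, over 5 years (all assumed).
result = storage_tco(capex_usd=400_000, drive_count=500, watts_per_drive=10,
                     pue=1.4, energy_cost_per_kwh=0.12,
                     admin_cost_per_year=60_000, years=5)
print(result)
```

Even this toy model shows how operating costs accumulate year over year and can overtake the purchase price, which is the dynamic the session explores at full system scale.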
CXL.mem enables hosts to expand their memories beyond individual servers and access memory regions using load and store instructions. In addition, CXL.mem enables memory sharing among its endpoints. Realizing memory sharing requires extending the coherency management protocol beyond individual hosts. Hosts and devices need to track the state of each memory region using individual finite state machines. This enables devices to modify the state of memory at specific hosts when needed, which is referred to as back-invalidation snooping. Our goal is to democratize the exploration of large-scale CXL.mem deployments using simulations. In this talk, we describe the design and implementation of our packet-level CXL.mem simulator consisting of hosts, CXL switches, and CXL endpoints. We implemented our simulator on top of SimPy, which is a discrete-event simulation framework based on Python. Users can use the simulator APIs to (i) construct CXL topologies by connecting hosts with their CXL endpoints, switches, and devices, (ii) define control and data flow of applications, and (iii) submit applications to specific hosts for execution. The simulator can also be utilized to explore new CXL switch architectures. Next, we show how our simulator helps explore a variety of use cases. First, we characterize the overheads of CXL coherency to realize memory sharing. These overheads include access, serialization, snooping, and eviction delays. Second, we discuss how our simulator helps analyze the performance of existing task schedulers and memory allocators. In addition, we describe how the simulator enables designing a new CXL-aware task scheduler based on our simulation-driven insights.
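As a rough illustration of the SimPy-based approach described above, here is a minimal discrete-event sketch of hosts issuing CXL.mem loads through a single switch port to a memory endpoint. The latency values and component names are hypothetical and are not the simulator's actual API; the real simulator models packets, coherency state machines, and back-invalidation snooping, which are omitted here.

```python
# Minimal SimPy sketch of hosts issuing loads through a CXL switch to a
# memory endpoint. Latencies and names are hypothetical placeholders.
import simpy

SWITCH_DELAY_NS = 30      # per-hop serialization/forwarding delay (assumed)
DEVICE_ACCESS_NS = 150    # media access time at the endpoint (assumed)

def host(env, name, switch_port, n_loads, results):
    """Issue n_loads back-to-back loads and record completion latency."""
    for i in range(n_loads):
        issued = env.now
        # Traverse the switch, modeled as a shared port with queuing.
        with switch_port.request() as req:
            yield req
            yield env.timeout(SWITCH_DELAY_NS)
        # Access the memory device.
        yield env.timeout(DEVICE_ACCESS_NS)
        results.append((name, i, env.now - issued))

env = simpy.Environment()
switch_port = simpy.Resource(env, capacity=1)   # one downstream port
results = []
for h in range(2):                               # two hosts share the port
    env.process(host(env, f"host{h}", switch_port, n_loads=4, results=results))
env.run()

for name, i, latency in results:
    print(f"{name} load {i}: {latency} ns")
```

Queuing on the shared switch port is what makes the second host's loads slower than the first's, the same kind of contention effect the full simulator exposes when characterizing coherency and serialization overheads.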
As organizations increasingly rely on cloud storage, managing costs without sacrificing performance has become essential. Fortunately, cloud providers now offer smart features to optimize storage spend—most notably through access tiers and lifecycle management policies. Storage tiering aligns your data with the right storage class—hot, cool, or archive—based on access frequency. Even better, lifecycle policies and intelligent tiering tools automate data movement across tiers as your workload changes, minimizing manual effort. Modern cloud platforms also offer rich telemetry and analytics, revealing data usage patterns and cost drivers. Paired with AI-powered recommendations, these insights empower users to make proactive, data-driven storage decisions. Together, these tools reduce costs, simplify operations, and support a more efficient, sustainable cloud strategy. This presentation will highlight how to harness tiering, automation, and AI to unlock greater value from your cloud storage—and make every gigabyte count.
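As one concrete example of the lifecycle automation described above, the sketch below uses the AWS SDK for Python (boto3) to attach a lifecycle rule that transitions objects to cooler tiers as they age. The bucket name, prefix, and day thresholds are hypothetical; other cloud providers expose equivalent policies through their own APIs.

```python
# Sketch: attach a lifecycle rule that moves aging objects to cooler tiers.
# Bucket name, prefix, and thresholds are hypothetical examples.
import boto3

s3 = boto3.client("s3")

lifecycle_rules = {
    "Rules": [
        {
            "ID": "tier-down-cold-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # cool tier
                {"Days": 180, "StorageClass": "GLACIER"},     # archive tier
            ],
            "Expiration": {"Days": 730},                      # delete after 2 years
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket="example-telemetry-bucket",    # hypothetical bucket
    LifecycleConfiguration=lifecycle_rules,
)
```

Once a rule like this is in place, data moves between tiers on a schedule without manual intervention; the telemetry and AI-driven recommendations discussed in the session help decide what those thresholds should be.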
We are moving to an era where being first to market is key. However, hardware availability presents multiple problems:
1. Reduced prototype hardware
2. Reduced and tight schedules
3. High prototype hardware cost
These constraints create bottlenecks in design, development, and validation cycles, potentially compromising product quality and market positioning. This presentation introduces an innovative approach leveraging artificial intelligence and open industry standards to create sophisticated Digital Twins of hardware infrastructure. By utilizing SNIA Swordfish and DMTF Redfish specifications, organizations can simulate complex datacenter environments without physical hardware dependencies. The solution employs Large Language Models (LLMs) to dynamically generate device configurations and responses that strictly adhere to industry standards, enabling authentic hardware behavior simulation. The framework combines open-source specifications, data models, and JSON schemas from standards bodies with AI capabilities to create a flexible, scalable simulation environment. Through intelligent prompt engineering and real-time validation against Redfish/Swordfish specifications, the system generates standardized data representations that mirror actual hardware responses. This approach enables teams to prototype, test, and validate solutions against virtually unlimited hardware configurations, including edge cases and disruptive scenarios that would be costly or impossible to replicate physically. Attendees will learn how to implement AI-driven Digital Twins using industry standards, understand the technical architecture for standards-compliant simulation, and explore practical applications for accelerating product development. The presentation demonstrates how this approach reduces costs, eliminates hardware dependencies, and enables true "design anywhere, test everywhere" capabilities while maintaining full compliance with SNIA/DMTF standards.
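To illustrate the validation step described above, here is a minimal sketch that checks an LLM-generated, Redfish-style resource payload against a JSON schema before admitting it into the digital twin. The schema shown is a small hypothetical excerpt rather than the official DMTF Redfish Drive schema, and the generation function is a stub standing in for an actual LLM call.

```python
# Sketch: validate an LLM-generated, Redfish-style resource against a schema
# before admitting it into the digital twin. The schema here is a small
# hypothetical excerpt, not the official DMTF Redfish Drive schema.
import json
from jsonschema import validate, ValidationError

DRIVE_SCHEMA_EXCERPT = {
    "type": "object",
    "required": ["@odata.id", "Id", "CapacityBytes", "MediaType"],
    "properties": {
        "@odata.id": {"type": "string"},
        "Id": {"type": "string"},
        "CapacityBytes": {"type": "integer", "minimum": 0},
        "MediaType": {"enum": ["HDD", "SSD", "SMR"]},
    },
}

def generate_drive_resource(prompt: str) -> dict:
    """Stub standing in for an LLM call that returns a JSON resource."""
    return {
        "@odata.id": "/redfish/v1/Systems/1/Storage/1/Drives/0",
        "Id": "0",
        "CapacityBytes": 3_840_755_982_336,
        "MediaType": "SSD",
    }

candidate = generate_drive_resource("Generate a Redfish Drive for a 3.84 TB SSD")
try:
    validate(instance=candidate, schema=DRIVE_SCHEMA_EXCERPT)
    print("Accepted:", json.dumps(candidate, indent=2))
except ValidationError as err:
    print("Rejected, regenerate with feedback:", err.message)
```

Gating generated responses behind schema validation like this is one simple way to keep a simulated device's behavior within the bounds the standards define, even when the payloads themselves come from a model.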
Drawing from recent surveys of the end user members of the HPC-AI Leadership Organization (HALO), Addison Snell of Intersect360 Research will present the trends, needs, and "satisfaction gaps" for buyers of HPC and AI technologies. The talk will focus primarily on the Storage and Networking modules of the survey, with some highlights from others (e.g. processors, facilities, cloud) as appropriate. Addison will also provide overall market context of the total AI or accelerated computing market at a data center level, showing the growth of hyperscale AI, AI-focused clouds, and national sovereign AI data centers, relative to the HPC-AI and enterprise segments, which are experiencing diminishing influence in a booming market.