Agentic AI – Use Cases, Benefits, Risks


Agentic AI is an emergent and disruptive paradigm beyond generative AI that carries great promise and great risk. With generative AI, a large language model (LLM) generates content as the result of a prompt. With agentic AI, the language model now generates actions to execute on behalf of a user using tools at its disposal. It has agency to act on your behalf and even run autonomously.  But how can this be done safely and effectively for enterprises?

This webinar will introduce Agentic AI and cover:

Unlocking Sustainable Data Centers: Optimizing SSD Power Efficiency and Liquid Cooling for AI Workloads


Explore cutting-edge innovations in solid-state drive (SSD) power efficiency and liquid cooling, designed to mitigate AI workload bottlenecks and reduce Total Cost of Ownership (TCO) in data centers. By focusing on optimizing SSD performance per watt, we can significantly enhance data center sustainability, operational efficiency, and economic viability.

Key topics include:

Storage Trends in AI: Your Questions Answered

SNIA STA Community

Apr 22, 2025


As AI workloads grow in complexity and scale, choosing the right storage infrastructure has never been more critical. In the “Storage Trends in AI 2025” webinar, experts from SNIA, the SNIA SCSI Trade Association community, and ServeTheHome discussed how established technologies like SAS continue to play a vital role in supporting AI’s evolving demands—particularly in environments where reliability, scalability, and cost-efficiency matter most. The session also covered emerging interfaces like NVMe and CXL, offering a full-spectrum view of what’s next in AI storage. Below is a recap of the audience Q&A, with insights into how SAS and other technologies are rising to meet the AI moment.

Q: Could you explain how tail latency is tied to data type?

A: Tail latency is more closely tied to service level agreements (SLAs). In AI platforms, tail latency isn’t typically a key concern when it comes to model calculations.

Q: For smaller customers, what are best practices for mixing Normal Workloads (like regular VM Infra) with AI workloads? Do they need to separate those workloads on different storage solutions?

A: There are no "small customers," there are only "small workloads." You right-size the bandwidth and server memory/storage for the workload. The trade-off has to do with time: if you're negotiating for resources and utilization, then you must either dedicate the resources you have available or accept the extended time necessary for completion.

Q: Storage technology is moving towards NVMe and with less focus on spinning drives. How is SAS being used for AI workloads?

A: SAS still plays a critical role in the ingest phase of AI workloads, where large data volumes make it a key enabler.

Q: What are power usage budgets looking like for these workloads?

A: That's an excellent question but unfortunately it is vendor-specific because of the number of variables involved.

Q: Are the AI servers using all the on-chip PCIe lanes? For example, 5th Gen Xeon provides 80 PCIe 5.0 lanes per CPU. I would guess the servers have 4 or 8 CPUs, so 8 x 80 lanes = 640 PCIe lanes per AI server.

A: Short answer is no. No system uses 100% of the lanes for one type of workload. Usually there is a dedicated allocation for specific workloads that is a subset of the total available bandwidth. Efficiency of bandwidth is another challenge that is being tackled by both vendors and industry organizations.

Q: Is your example of GPU / CPU memory transfers a use for CXL? 

A: It certainly can be.

Q: Are any of the major storage OEMs using 24G SAS on the back-end?

A: Yes, both Dell and HPE offer AI targeted servers with SAS and NVMe storage options, among other options.

Q: Question for Patrick: There is a group of industry veterans who think smaller, hyper-focused, and hyper-specialized AI models will add more value for enterprises than large, general-purpose, ever-hallucinating ChatGPT-style models. What are your thoughts on this? If this scenario pans out, wouldn't "AI servers" be overbuilt? Would enterprise arrays, along with advancements in computational storage, lean towards an all-flash or flash+disk storage setup?

A: As AI gets better, it ends up becoming more useful, and the overall demand goes up. For example, is training and operating humanoid robots a value for a manufacturing enterprise? If so, we do not have the compute to do that at scale yet. Likewise, functions like finance, procurement, and others should eventually all be automated. If we need many specialized smaller models, then they still need to be trained, customized, and have inference run on an ongoing basis. Re-training on new data or techniques will continue. New application spaces will open up. Also, retrieval-augmented generation (RAG) can be incorporated into open-source solutions, and that cuts down on hallucinations.

Overall, this is an area where, when you think of the scope of what needs to be done to achieve the automation goals, there is nowhere near enough compute for either training or inference. Even once something is automated, the next question is how a company gets a competitive advantage by doing something better, and that will require more work. If we look back in 10 years, it is unlikely everything will be solved, but I imagine trying to explain to my son in 15 years how everyone had to drive themselves "back in my day." There are folks who deny this is going to happen, but I have also been driven by either Waymo or Tesla FSD for 30 minutes or more every day I have been home for the last six months. The inhibitor to adoption is regulatory at this point.

I think disk is still going to be dominant for lower-cost storage when the main metric is $/PB. Flash is needed to feed the big AI systems, so it is really a question of whether the performance of flash overcomes the price delta. Also, the larger-capacity SSDs can save cost not just in terms of $/TB of the media but also in connectivity costs. Computational storage removes the need to move data, so there is a decent chance we will see it not just in the persistent layer but also in the memory layer.

Q: Where do you see FC in the AI infrastructure?

A: Fibre Channel isn’t particularly relevant to this discussion. It has a role in networking, but it's not specific to AI infrastructure.

Q: Is the interface going to change from 24G SAS to 24G+ SAS? Are 29-pin connectors still SAS?

A: Both 24G SAS and 24G+ SAS operate on the SAS-4 physical layer, with no changes in the core interface. However, 24G+ SAS introduces new connector options. The existing 24G SAS connectors are fully compatible, and a new internal connector (SFF-TA-1016) has been introduced for 24G+ SAS, which is also backward compatible with 24G SAS.

 


Unlocking CXL's Potential Q&A

SNIA CMS Community

Apr 22, 2025


Compute Express Link® (CXL) is a groundbreaking technology that expands server memory beyond established limits and boosts bandwidth. CXL enables seamless memory sharing, reduces costs by optimizing resource utilization, and supports different memory types. In our Unlocking CXL’s Potential webinar, our speakers Arthur Sainio, SNIA Persistent Memory Special Interest Group Co-Chair, Jim Handy of Objective Analysis, Mahesh Natu of the CXL Consortium, and Torry Steed of SMART Modular Technologies discussed how CXL is transforming computing systems with its economic and performance benefits, and explored its future impact across various market segments.

You can learn more about CXL development by attending CXL DevCon, April 29-30, 2025, in Santa Clara, CA. And, as our webinar discussed, you can access and program CXL memory modules located in the SNIA Innovation Center using the training materials provided in the virtual SNIA Programming Workshop and Hackathon at www.snia.org/pmhackathon.

The audience was highly engaged and asked many interesting questions, and our speakers have answered them all below. Feel free to reach out to us at askcms@snia.org if you have more questions.

Q: How will the connections of a CXL switch be made physically possible with multiple hosts and multiple endpoints in real time?

A: It’s similar to PCI Express, in that you could have switches with multiple upstream ports that connect to multiple CPUs and multiple downstream ports that connect to multiple devices. It's a well-established approach that PCIe has championed, so we don't see any challenges making those connections in the architecture.

Q: Is there standards-based work being done for CXL/PCIe over co-packaged optics for disaggregated computing? 

A: That’s a really good question. There is work being done in PCI Express for transporting PCIe over optics (an optical interface), so once that happens we will probably just leverage it. CXL leverages what PCIe does in terms of the physical layer and form factors whenever possible, so I expect that will happen, as that is where we are heading.

Q: Will Trusted Security Protocol (TSP) be used with Intel Trust Domain Extensions (TDX) and AMD Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP)?

A: Both technologies work with TSP. We had great participation from Intel and AMD when defining TSP. What TSP does is define the interface between the CPU side of TDX and the device. That's the piece that is not part of the CPU architecture – the CPU definition – because it sits outside of that, and that's the piece that TSP builds. Device vendors can build devices that follow the TSP specification and will be compatible with both of these technologies. There are other CPU vendors with similar confidential compute technologies, and I think they can also be compatible with TSP.

Q: What is TDISP and what is the difference from TSP? 

A: TDISP, or TEE Device Interface Security Protocol, was developed by PCI Express. Again, it solves the same problem for PCI Express, meaning it will allow technologies like TDX to inspect a PCIe device, verify the device is in a healthy, good condition, and bring it into the trust boundary of the TD. TSP does something similar for CXL devices. Obviously, with CXL being coherent, it's a different problem to solve, but I think we have solved it, and it's ready to be deployed.

Q: Why are we not seeing CXL Type-1 or -2 adoption by the industry?

A: We think that it is coming – it's just not here yet. The big interest initially has been Type-3, which is pure memory expansion, but we are starting to see storage eventually moving to CXL as well. There are definite benefits there, and we are also starting to see memory with processing, so it's coming in the next wave of CXL adoption.

The whole ecosystem has to come together, and CXL is actually pretty new. We're sure there is an awful lot of work being done that has not been announced, mostly by hardware manufacturers, but there also needs to be an awful lot of software support to make everything fall into place. All of that comes together slowly, and the forecast shown in the webinar starts with very modest growth simply because the rest of the support network for CXL needs to be put together before CXL can really take off.

Q: We see in-memory databases (IMDBs) as a big use case for CXL. But we have not seen any announcements from SAP HANA, Oracle or MS-SQL, or anybody adopting CXL with in-memory databases. Why is that? 

A: We have not seen anything specific to date. We believe SAP HANA has published a paper and may have some work going on; see https://www.vldb.org/pvldb/vol17/p3827-ahn.pdf. IMDBs want as much memory capacity as possible, so CXL is definitely something they would benefit from.

Q: Do we expect GPUs to support CXL? Without that, the AI use case seems highly limited.

A: We don't really expect GPUs to support CXL or to talk with CXL memory directly. Once systems are fully disaggregated, it's possible you could have a layer where information is shared that way. It's more a question of whether memory expansion for the system itself aids AI use cases. We are seeing evidence that it does, but how much that plays out we have to see.

We haven't really spoken with GPU vendors about this, but the understanding is that one of the benefits of CXL is that it makes it easy to fill either the Graphics Double Data Rate (GDDR) memory on the GPU board or bring in data that can go into the High Bandwidth Memory (HBM), so it seems like there would be an opportunity for that, even if it's not something people are speaking about now.

There are two uses we see right now for GPUs. The first is when they want more memory: they could use what we call the unordered IO feature of CXL to reach right into CXL Type-3 memory and therefore get more memory expansion. The second use case is the GPU actually using CXL coherency to communicate with the CPU, so they are cache coherent with the CPU and can quickly exchange data back and forth. Again, both require heavy lifting – lots of software enabling – but I think those use cases do exist. It just takes effort to enable them and get the benefits.

Finally, we hope you will join us for more SNIA webinars. Visit https://www.snia.org/webinars for the complete list of scheduled and on-demand webinars.  And check out the SNIA Educational Library for great content on CXL, memory, and much more.


SC25


The International Conference for High Performance Computing, Networking, Storage, and Analysis

2025 OCP Global Summit


The OCP Summit is the premier event uniting the most forward-thinking minds in open IT Ecosystem development. The Summit presents a unique platform for our Community from around the globe to share their insights, foster partnerships and showcase cutting-edge advancements in open hardware and software.

Cloud Object Storage Incompatibilities Q&A

Michael Hoard

Apr 21, 2025


The SNIA Cloud Storage Technologies (CST) community hosted a live webinar, “Building Community to Tackle Cloud Object Storage Incompatibilities,” where we highlighted how a multi-vendor group of industry innovators came together to address the incompatibility issues between various object storage implementations that challenge many organizations. The webinar, along with the presentation slides, is available on-demand here.

Webinar panelists included engineers from Dell, Google, Hammerspace, IBM, Vast, and Versity Software, all of whom shared their Plugfest experiences, and the audience asked several intriguing questions.  This is a compilation of responses from our guest speakers; thanks to all for their insights and contributions. 

Q. Will you have more SNIA Cloud Object Storage (COS) Plugfests in 2025? 

A. Yes. To participate in the next SNIA COS Plugfest, April 28-30, 2025 (co-located with the One Day Regional SDC in Denver, CO on April 30th), please pre-register at SNIA Cloud Object Storage Plugfest. Our second SNIA COS Plugfest for 2025 will be hosted during SDC’25, Sept 15-17, 2025, in Santa Clara, CA. If you have questions, please contact us at askcloudplugfest@snia.org.

Note: A Plugfest is a collaborative developer event where industry experts come together, test their cloud object storage solutions, find problems, and fix them. SNIA provides the space and tools needed, and everyone agrees to keep the issues found and resolved confidential under NDA.

Q. Can you provide examples of how end-user customers may be impacted by unsupported API calls and unexpected behavior?

A. There are many ways interoperability issues can be introduced that impact user experience, such as the introduction of new features or functionality. Third-party object storage server implementations are always a bit behind on the latest changes, so from an application developer's perspective, it's not always obvious which features are new. Unfortunately, with new changes, even basic functionality can break, like CRUDs (Create, Read, Update, and Delete) or other essentials such as request signing.

One example was a feature that introduced new methods for validating data integrity with support for additional checksum algorithms. By changing the behavior of SDK clients, it halted previously stable workflows, with clients reporting 500 errors. This was a friendlier example, because the breakage was fairly obvious.

We've also seen this kind of thing happen when a third-party implementation ignores unrecognized HTTP request headers, which has been the default behavior for unrecognized HTTP headers since HTTP 1.0 (1996). One more non-obvious example of how introduction of new features (along with version skew) may impact customers is the introduction of headers that implement conditional writes, which are designed to prevent modification or deletion of objects if they do or do not exist. As an example, if a client were to enable a newly introduced conditional write header, which was not previously implemented by a third-party implementation, users could potentially experience unexpected behavior. 

This points to the need for more collaboration, where developers on both sides of the wire (end-to-end) ensure successful introduction of features, as well as explore how flexible their applications are to configure around feature changes or explore how to bake some of that flexibility into their application.  This is why it is critical to engage all developer stakeholders in open collaborative developer discussions. 
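To make the conditional-write scenario concrete, here is a minimal sketch (our illustration, not from the webinar) of a create-only upload that sends the If-None-Match: * header on a PUT using libcurl. The URL and payload file are hypothetical placeholders; whether the request is rejected with 412 Precondition Failed or silently overwrites the object depends on whether the particular object storage implementation honors the header.

```c
#include <curl/curl.h>
#include <stdio.h>

/* Sketch: create-only PUT. A server that implements conditional writes should
 * return 412 Precondition Failed if the object already exists; an older
 * implementation may ignore the header and overwrite the object instead. */
int main(void) {
    const char *url = "https://objects.example.com/bucket/key"; /* hypothetical presigned PUT URL */
    FILE *fp = fopen("payload.bin", "rb");                      /* hypothetical payload */
    if (!fp) return 1;
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    rewind(fp);

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    struct curl_slist *hdrs = curl_slist_append(NULL, "If-None-Match: *");

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);        /* issue an HTTP PUT */
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);  /* the conditional-write header */
    curl_easy_setopt(curl, CURLOPT_READDATA, fp);
    curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)size);

    CURLcode rc = curl_easy_perform(curl);
    long status = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
    printf("curl result=%d, HTTP status=%ld\n", (int)rc, status);

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    fclose(fp);
    return 0;
}
```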

Q. Is there anything that end users should be looking out for?

A. Yes. This is why API versioning is so important when you encounter things like new headers and new checksums. Ideally, you should be able to look at the version of any widely used protocol and identify whether or not it’s expected to have a given feature. This has historically been a fundamental expectation of user-friendly protocols – they should mandate certain features and certain versions of the API to ensure basic functionality remains consistent and stable. It’s one method that helps address unexpected behavior while building consensus around interoperability.

However, the current state of development allows different vendors to choose various API options in their respective implementations (these may not always match or overlap sufficiently to attain proper compatibility support). This allows, or even creates, situations where each implementation may work well within its own limited ecosystem, yet unexpected results may occur when configuring a variety of implementations within a new configuration or end-to-end solution. 

End users expect that everything should just work. They may not even be aware of the backing storage implementation or vendor. This is the primary motivation why organizations are eager to participate in the SNIA Cloud Object Storage Plugfest, not only to proactively broaden the set of their compatibility testing (to find bugs before customers find them) but also to gain agreement on industry best practices. 

Note: up until the formation of this SNIA Cloud Object Storage Plugfest community, there had been no vendor-neutral community available to help define an accepted set of protocol options. Until now, it had been up to separate developer teams to choose independently, and that’s exactly the issue we are resolving. We are organizing a method to help developer teams get on the same page at the same time.   

Q. My organization mandates seamless data portability between cloud instances. What do I need to do to make sure I am not locked into one solution?   

A. The goal of the SNIA Cloud Object Storage Plugfest is to bring the multi-vendor community together to help organizations ensure data portability. Our team recommends customers and end users encourage their cloud object storage providers to take part in this SNIA Cloud Object Storage Plugfest effort, to make sure products and services have gone through rigorous interoperability efforts and the hard work of testing against each other. 

Q. Amazon and Microsoft were notably missing in this webinar, do they plan to join this effort in the future? 

A. Microsoft actively participated in the inaugural SNIA Cloud Object Storage Plugfest in September at SDC’24. In fact, Microsoft brought a team including protocol experts who contributed to the Plugfest bug assessment and the Birds of a Feather (BOF) session, where we gathered insight to formulate next steps for the industry.  At the beginning of the Plugfest, Microsoft and Plugfest contributors quickly compared notes to identify trouble spots. Most everyone in the room knew right away what to look out for. There was a quick consensus on where to test, and the team invested time where needed. Microsoft was an integral part of this. 

AWS has been invited. We are continuing to welcome and encourage them to join the SNIA Cloud Object Storage Plugfest discussion.

Q. Ultimately, doesn’t AWS dictate S3?  If a consortium of vendors represented here added extensions to the API, or new self-describing API, wouldn’t this break things assuming AWS is the genesis of S3 API going forward? 

A. Yes, AWS S3 is a proprietary API for Cloud Object Storage, and any non-AWS implementation of S3 must conform with AWS S3 API and SDKs in order to interoperate. 

A similar situation existed in the late ‘90s with Microsoft and the SMB (CIFS) protocol. At that time, Microsoft owned the client, the server, and the SMB protocol, and dictated everything that happened. Yet an evolution occurred when Microsoft worked with SMB developers, including both proprietary and open source implementations. In 2008, Microsoft sponsored the SNIA CIFS/SMB Plugfest, organized and hosted by SNIA as a trusted third-party storage industry association. This event is still ongoing (now called the SNIA SMB Interoperability Lab, or “IOL”) and has proven to be a model for timely information exchange between Microsoft and the global ecosystem of SMB developers. Note that Microsoft continues to own its IP and drives the pace and direction of its SMB protocol roadmap, including changes and release timeframes.

We anticipate a similar working arrangement may be possible between SNIA and AWS (and others) as we expand community participation focused on multi-vendor, heterogeneous interoperability.   

Thanks to all the SNIA Cloud Object Storage Plugfest team for your time, effort and insights, 

Michael Hoard, SNIA, Chair for Cloud Storage Technologies (CST) community


SNIA: Experts on Data Explained

Richelle Ahlvers

Apr 14, 2025


Everybody in tech knows that change is constant and advancements happen really fast! Think back just five years—how many of your projects have transformed, morphed, or even disappeared? The same holds true for SNIA. For the past 25 years, SNIA has continuously adapted to the shifting tech landscape.  

SNIA was historically known as the Storage Networking Industry Association, and storage has been at the heart of our mission. However, as the explosion of data accelerates, SNIA’s scope has expanded to encompass all technologies related to data. In fact, our name now is just “SNIA.” We no longer spell out the acronym.

Last year, we redefined our mission to highlight our continued evolution and strategic vision to better reflect our 2,000 plus members’ expertise and projects. “SNIA: Experts on Data” highlights our broader, data-centric approach, covering acceleration, computation, and more. We have segmented our work into six data-focus areas: Accelerate, Protect, Optimize Infrastructure, Store, Transport and Format. 

Do some of these areas overlap? Yes. Does SNIA still care about Storage? Definitely.

For a deeper dive, we’ve launched a "Data Focus" podcast series where SNIA leaders break down each focus area, share real-world applications, and discuss the ongoing work driving the organization. I encourage you to listen to or watch these Data Focus podcasts here on the SNIA website.

If you’re passionate about data and technology, we invite you to join us. SNIA offers flexible membership options, allowing you to contribute to existing initiatives or bring new ideas to life. Be part of the innovation—help us shape the future of SNIA!


Erik Smith

Apr 9, 2025


Have you ever wondered how RDMA (Remote Direct Memory Access) actually works? You’re not alone. That’s why the SNIA Data, Storage & Networking Community (DSN) hosted a live webinar, “Everything You Wanted to Know About RDMA But Were Too Proud to Ask,” where our expert presenters, Michal Kalderon and Rohan Mehta, explained how RDMA works and the essential role it plays for AI/ML workloads due to its ability to provide high-speed, low-latency data transfer. The presentation is available on demand, along with the webinar slides, in the SNIA Educational Library.

The live audience was not “too proud” to ask questions, and our speakers have graciously answered all of them here. 

Q: Does the DMA chip ever reside on the newer CPUs or are they a separate chip?

A: Early system designs used an external DMA Controller. Modern systems have moved to an integrated DMA controller design. There is a lot of information about the evolution of the DMA Controller available online. The DMA Wikipedia article on this topic is a good place to start.   

Q: Slide 51: For RoCEv2, are you routing within the rack, across the data center, or off site as well?

A:  RoCEv2 operates over Layer 3 networks, and typically requires a lossless network environment achieved through mechanisms like PFC, DCQCN, and additional congestion control methods discussed in the webinar. These mechanisms are easier to implement and manage within controlled network domains, such as those found in data centers. While theoretically, RoCEv2 can span multiple data centers, it is not commonly done due to the complexity of maintaining lossless conditions across such distances. 

Q: You said that WQEs have opcodes. What do they specify and how are they defined?

A: The WQE opcodes are the actual RDMA operations referred to on slides 16 and 30: SEND, RECV, WRITE, READ, and ATOMIC. For each of these operations there are additional fields that can be set.
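To show where those opcodes appear in practice, here is a minimal libibverbs sketch (our illustration, not the presenters' code) that builds a single work request and posts it; the qp, mr, buffer, and remote address/rkey are assumed to have been set up earlier during connection establishment.

```c
#include <infiniband/verbs.h>
#include <stdint.h>

/* Sketch: post one RDMA WRITE work request. The opcode field selects the
 * operation (IBV_WR_SEND, IBV_WR_RDMA_WRITE, IBV_WR_RDMA_READ,
 * IBV_WR_ATOMIC_CMP_AND_SWP, ...); the extra fields that must be filled in
 * depend on the opcode (e.g. remote address and rkey for READ/WRITE). */
int post_write(struct ibv_qp *qp, struct ibv_mr *mr, void *buf, uint32_t len,
               uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,   /* local buffer, already registered */
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {0}, *bad_wr = NULL;

    wr.wr_id      = 0x1234;            /* opaque id returned in the completion */
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.opcode     = IBV_WR_RDMA_WRITE; /* the WQE opcode */
    wr.send_flags = IBV_SEND_SIGNALED; /* generate a completion entry */
    wr.wr.rdma.remote_addr = remote_addr; /* opcode-specific fields */
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}
```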

Q: Is there a mistake in slide 27? Should the QPs on the receive side be labeled QP-1, QP-2, and QP-3, or am I misunderstanding something?

A: Correct, good catch. The updated deck is here.

Q: Is the latency deterministic after connection?

A: Similar to all network protocols, RDMA is subject to factors such as network congestion, interfering data packets, and other network conditions that can affect latency.

Q: How does the buffering work if a server sends data and the client is unable to receive all the data due to buffer size limitations?

A: We will split the answer for the two groups of RDMA operations.

  • (a) Channel semantics (SEND and RECV): In this case, the application must make sure that the receiver has posted an RQ buffer before performing the send operation. This is typically done by the client side posting RQ buffers before initiating a connection request. If the send arrives and there is no RQ buffer posted, the packets are dropped and an RNR NAK message is sent to the sender. There is a configurable QP parameter called rnr_retry_cnt, which specifies how many times the RNIC should try resending messages if it gets an RNR NAK (see the sketch after this list).
  • (b) Memory semantics (READ, WRITE, ATOMIC): This cannot occur, since the data is placed in pre-registered memory at a specific location.
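As a rough illustration of where that retry setting lives (a sketch under assumed setup, not code from the webinar), the RNR retry count is programmed together with the other reliable-connection attributes when the QP is transitioned to the RTS state:

```c
#include <infiniband/verbs.h>

/* Sketch: transition a reliable-connected QP to RTS and set the RNR retry
 * behavior. rnr_retry = 7 means "retry indefinitely"; smaller values cause
 * the send to fail after that many RNR NAKs. */
int move_to_rts(struct ibv_qp *qp)
{
    struct ibv_qp_attr attr = {0};

    attr.qp_state      = IBV_QPS_RTS;
    attr.timeout       = 14;   /* transport retry timeout */
    attr.retry_cnt     = 7;    /* retries for transport errors */
    attr.rnr_retry     = 7;    /* retries after Receiver-Not-Ready NAKs */
    attr.sq_psn        = 0;
    attr.max_rd_atomic = 1;

    return ibv_modify_qp(qp, &attr,
                         IBV_QP_STATE | IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT |
                         IBV_QP_RNR_RETRY | IBV_QP_SQ_PSN |
                         IBV_QP_MAX_QP_RD_ATOMIC);
}
```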

Q: How/where are the packet headers handled?

A: The packet headers are handled by the R-NIC (RDMA NIC). 

Q: How is Object (S3) over RDMA implemented at a high level? Does it still involve HTTP/S?

A: There are proprietary solutions that provide Object (S3) over RDMA, but there is currently no industry standard available that could be used to create non-vendor specific implementations that would be interoperable with one another.    

Q: How many packets per second can a single x86 core transport versus a single Arm server core over TCP/IP?

A: When measuring RDMA performance, you measure messages per second rather than packets per second; unlike TCP, there is no per-packet processing in the host. The performance depends more on the R-NIC than on the host core, as RDMA bypasses CPU processing. If you’d like a performance comparison between RDMA (RoCE) and TCP, please refer to the “NVMe-oF Looking Beyond Performance Hero Numbers” webinar.

Q: Could you clarify the reason for higher latency using interrupt method?

A: It depends upon the mode of operation:

Polling Mode: In polling mode, the CPU continuously checks (or "polls") the completion queue for new events. This constant checking eliminates the delay associated with waiting for an interrupt signal, leading to faster detection and handling of events. However, this comes at the cost of higher CPU utilization since the CPU is always active, even when there are no events to process.

Interrupt Mode: In interrupt mode, the CPU is notified of new events via interrupts. When an event occurs, an interrupt signal is sent to the CPU, which then stops its current task to handle the event. This method is more efficient in terms of CPU usage because the CPU can perform other tasks while waiting for events. However, the process of generating, delivering, and handling interrupts introduces additional latency compared to the immediate response of polling.
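For readers who want to see the difference in code, here is a minimal libibverbs sketch (our illustration; the completion queue and completion channel are assumed to have been created earlier with ibv_create_cq and ibv_create_comp_channel) of the two ways an application can wait for a completion:

```c
#include <infiniband/verbs.h>

/* Polling mode: spin on the completion queue. Lowest latency, one core busy. */
int wait_polling(struct ibv_cq *cq, struct ibv_wc *wc)
{
    int n;
    do {
        n = ibv_poll_cq(cq, 1, wc);   /* returns 0 when nothing has completed */
    } while (n == 0);
    return n;                          /* 1 on success, <0 on error */
}

/* Interrupt (event) mode: arm the CQ, block on the completion channel,
 * then drain it. Frees the CPU but adds interrupt/wakeup latency. */
int wait_event(struct ibv_comp_channel *channel, struct ibv_cq *cq,
               struct ibv_wc *wc)
{
    struct ibv_cq *ev_cq;
    void *ev_ctx;

    if (ibv_req_notify_cq(cq, 0))                   /* arm: notify on next CQE */
        return -1;
    if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) /* blocks until interrupt */
        return -1;
    ibv_ack_cq_events(ev_cq, 1);
    if (ibv_req_notify_cq(ev_cq, 0))                /* re-arm for the next one */
        return -1;
    return ibv_poll_cq(ev_cq, 1, wc);               /* drain the completion */
}
```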

Q: Slide 58: Does RDMA complement or compete with CXL?

A: They are not directly related. RDMA is a protocol used to perform remote DMA operations (i.e., over a network of some kind), and CXL is used to provide high-speed, coherent communication within a single system. CXL.mem allows devices to access memory directly within a single system or a small, tightly coupled group of systems. If this question were specific to DMA, as opposed to RDMA, the answer would be slightly different.

Q: The RDMA demonstration had MTU size on the RDMA NIC set to 1K. Does RDMA traffic benefit from setting the MTU size to a larger setting (3k-9k MTU size) or is that really dependent on the amount of traffic the RoCE application generates over the RDMA NIC?

A: RDMA traffic, similar to other protocols like TCP, can benefit from setting the MTU size to a larger setting. It can reduce packet processing overhead and improve throughput. 
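As a small, hedged illustration (assumed variables, not from the demo), a verbs application typically reads the port's active MTU and uses that to cap the QP's path MTU:

```c
#include <infiniband/verbs.h>

/* Sketch: pick the largest MTU the port currently supports. */
enum ibv_mtu choose_mtu(struct ibv_context *ctx, uint8_t port_num)
{
    struct ibv_port_attr port;
    if (ibv_query_port(ctx, port_num, &port))
        return IBV_MTU_1024;   /* conservative fallback */
    return port.active_mtu;    /* e.g. IBV_MTU_1024 ... IBV_MTU_4096 */
}
```

The value returned here would then be assigned to attr.path_mtu (with the IBV_QP_PATH_MTU mask) when the QP is transitioned to the RTR state.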

Q: When an app sends a read/write request, how does it get the remote side's RKEY info? Q2: Is it possible to tweak the LKEY to point to the same buffer for debugging memory-related issues? Fairly new to this topic, so apologies in advance if any query doesn't make sense.

A: The RKEYs can be exchanged using the channel semantics, where SEND/RECV are used. In this case there is no need for an RKEY, as the message will arrive at the first buffer posted in the RQ on the peer. The LKEY refers to local memory. For every registered memory region, the LKEY and RKEY point to the same location: the LKEY is used by the local RNIC to access the memory, and the RKEY is provided to the remote application to access the memory.
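A short sketch (ours, with an assumed protection domain pd) showing where the LKEY and RKEY come from: both are produced by a single memory registration, and the access flags decide what the remote side is allowed to do with the RKEY.

```c
#include <infiniband/verbs.h>
#include <stdlib.h>

/* Sketch: register a buffer and expose its keys. The local RNIC uses lkey
 * to access the buffer; rkey is what you hand to the peer (e.g. via a
 * SEND/RECV exchange) so it can target this region with READ/WRITE. */
struct ibv_mr *register_buffer(struct ibv_pd *pd, size_t len,
                               uint32_t *lkey, uint32_t *rkey)
{
    void *buf = malloc(len);
    if (!buf)
        return NULL;

    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        free(buf);
        return NULL;
    }
    *lkey = mr->lkey;   /* used in local SGEs */
    *rkey = mr->rkey;   /* advertised to the remote peer */
    return mr;
}
```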

Q: What does SGE stand for?

A: Scatter Gather Element – an element inside an SGL (Scatter Gather List), used to reference non-contiguous memory.
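As a brief, assumed example (pre-registered buffers, established QP), an SGL lets one work request gather two non-contiguous buffers into a single message:

```c
#include <infiniband/verbs.h>
#include <stdint.h>

/* Sketch: gather a header buffer and a payload buffer, which live at
 * unrelated addresses, into one message using a two-element SGL. */
int post_gathered_send(struct ibv_qp *qp,
                       struct ibv_mr *hdr_mr, void *hdr, uint32_t hdr_len,
                       struct ibv_mr *pay_mr, void *pay, uint32_t pay_len)
{
    struct ibv_sge sgl[2] = {
        { .addr = (uintptr_t)hdr, .length = hdr_len, .lkey = hdr_mr->lkey },
        { .addr = (uintptr_t)pay, .length = pay_len, .lkey = pay_mr->lkey },
    };
    struct ibv_send_wr wr = {0}, *bad_wr = NULL;

    wr.sg_list    = sgl;
    wr.num_sge    = 2;              /* two SGEs -> one wire message */
    wr.opcode     = IBV_WR_SEND;
    wr.send_flags = IBV_SEND_SIGNALED;

    return ibv_post_send(qp, &wr, &bad_wr);
}
```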

Thanks to our audience for all these great questions. We encourage you to join us at future SNIA DSN webinars. Follow us @SNIA and on LinkedIn for upcoming webinars. This webinar was part of the “Everything You Wanted to Know But Were Too Proud to Ask” SNIA webinar series. If you found the information in this RDMA webinar helpful, I encourage you to check out the many other ones we have produced. They are all available here on the SNIAVideo YouTube Channel.   


Community Driven S3 Compatibility Testing


SNIA's Cloud Object Storage community organized and hosted the industry's first open multi-vendor compatibility testing event (Plugfest) at SDC’24 in Santa Clara, CA. Most of the participants focused on their S3 server and client implementations. Community-driven testing revealed significant areas for collaboration, including ambiguities among protocol options, access control mechanisms, missing or incorrect response headers, unsupported API calls, and unexpected behavior (as well as sharing best practices and fixes).
