“Year of the Summit” Kicks Off with Live and Virtual Events

SNIA CMS Community

Mar 24, 2023

For 11 years, the SNIA Compute, Memory and Storage Initiative (CMSI) has presented a Summit featuring industry leaders speaking on the key topics of the day. In the early years, it was persistent memory-focused, educating audiences on the benefits and uses of persistent memory. In 2020 it expanded to a Persistent Memory + Computational Storage Summit, examining that new technology, its architecture, and use cases. Now in 2023, the Summit is expanding again to focus on compute, memory, and storage. In fact, we’re calling 2023 the Year of the Summit – a year to get back to meeting in person and offering a variety of ways to listen to leaders, learn about technology, and network to discuss innovations, challenges, solutions, and futures.

We’re delighted that our first event of the Year of the Summit is a networking event at MemCon, taking place March 28-29 at the Computer History Museum in Mountain View, CA. At MemCon, SNIA CMSI member and IEEE President-elect Tom Coughlin of Coughlin Associates will moderate a panel discussion on Compute, Memory, and Storage Technology Trends for the Application Developer. Panel members Debendra Das Sharma of Intel and the CXL™ Consortium, David McIntyre of Samsung and the SNIA Board of Directors, Arthur Sainio of SMART Modular and the SNIA Persistent Memory Special Interest Group, and Arvind Jaganath of VMware and SNIA CMSI will examine how applications and solutions available today offer ways to address enterprise and cloud provider challenges – and they’ll provide a look to the future. SNIA leaders will be on hand to discuss work in computational storage, the Smart Data Accelerator Interface (SDXI), SSD form factor advances, and persistent memory trends. Share a libation or two at the SNIA-hosted networking reception on Tuesday evening, March 28. This inaugural MemCon event is perfect to start the conversation, as it focuses on the intersection between systems design, memory innovation (emerging memories, storage & CXL) and other enabling technologies. SNIA colleagues and friends can register for MemCon with a 15% discount using code SNIA15.

April 2023 Networking! We will continue the Year with a newly expanded SNIA Compute+Memory+Storage Summit coming up April 11-12 as a virtual event. Complimentary registration is now open for a stellar lineup of speakers, including Stephen Bates of Huawei, Debendra Das Sharma of Universal Chiplet Interconnect Express™, Jim Handy of Objective Analysis, Shyam Iyer of Dell, Bill Martin of Samsung, Jake Oshins of Microsoft, Andy Rudoff of Intel, Andy Walls of IBM, and Steven Yuan of StorageX. Summit topics include Memory’s Headed for Change, High Performance Data Analytics, CXL 3.0, Detecting Ransomware, Meeting Scaling Challenges, Open Standards for Innovation at the Package Level, and Standardizing Memory to Memory Data Movement. Great panel discussions are on tap as well. Kurt Lender of the CXL Consortium will lead a discussion on Exploring the CXL Device Ecosystem and Usage Models, Dave Eggleston of Microchip will lead a panel with Samsung and SMART Modular on Persistent Memory Trends, and Cameron Brett of KIOXIA will lead an SSD Form Factors Update. More details at www.snia.org/cms-summit.

Later in 2023… Opportunities for networking will continue throughout 2023. We look forward to seeing you at the SmartNIC Summit (June 13-15), Flash Memory Summit (August 8-10), SNIA Storage Developer Conference (September 18-21), OCP Global Summit (October 17-19), and SC23 (November 12-17). Details on SNIA participation coming soon!


Michael Hoard

Mar 9, 2023

A digital twin (DT) is a virtual representation of an object, system or process that spans its lifecycle, is updated from real-time data, and uses simulation, machine learning and reasoning to help decision-making. Digital twins can be used to help answer what-if AI-analytics questions, yield insights on business objectives and make recommendations on how to control or improve outcomes. It’s a fascinating technology that the SNIA Cloud Storage Technologies Initiative (CSTI) discussed at our live webcast “Journey to the Center of Massive Data: Digital Twins.” If you missed the presentation, you can watch it on-demand and access a PDF of the slides at the SNIA Educational Library. Our audience asked several interesting questions which are answered here in this blog.

Q. Will a digital twin make the physical twin more or less secure?
A. It depends on the implementation. If DTs are developed with security in mind, a DT can help augment the physical twin. For example, if the physical and digital twins are connected via an encrypted tunnel that carries all the control, management, and configuration traffic, then a firmware update of a simple sensor or actuator can include multi-factor authentication of the admin or strong authentication of the control application via features running in the DT, which augments the constrained environment of the physical twin. However, because DTs are usually hosted on systems that are connected to the internet, ill-protected servers could expose a physical twin to a remote intruder. Therefore, security must be designed in from the start.

Q. What are some of the challenges of deploying digital twins?
A. Without AI frameworks and real-time interconnected pipelines in place, digital twins’ value is limited.

Q. How do you see digital twins evolving in the future?
A. Here are a series of evolutionary steps:
  • From discrete DTs (for both pre- and post-production), followed by composite DTs (e.g., assembly lines, transportation systems), to organization DTs (e.g., supply chains, political parties).
  • From pre-production simulation, to operational dashboards of current state with human decisions and control, to autonomous limited control functions which ultimately eliminate the need for individual device-manager software separate from the DT.
  • In parallel, 2D DT content displayed on smartphones, tablets, and PCs, moving to 3D rendered content on the same, moving selectively to wearables (AR/VR) as the wearable market matures, leading to visualized live data that can be manipulated by voice and gesture.
  • Over the next 10 years, I believe DTs become the de facto graphical user interface for machines, buildings, etc., in addition to the GUI for consumer and commercial process management.

Q. Can you expand on your example of data ingestion at the edge please? Are you referring to data capture for transfer to a data center, or actual edge data capture and processing for the digital twin? If the latter, what use cases might benefit?
A. Where DTs are hosted and where AI processes are computed, like inference or training on time-series data, don’t have to be the same server or even the same location. The expected time-to-action and time-to-insight, plus how much data needs to be processed and the cost of moving that data, will dictate where digital twins are placed and how they are integrated within the control path and data path. For example, a high-speed robotic arm that must stop if a human puts their hand in the wrong space will likely have an attached or integrated smart camera which is capable of identifying (inferring) a foreign object. It will stop itself, and an associated DT will receive notice of the event after the fact. A digital twin of the entire assembly line may learn of the event from the robotic arm’s DT and inject control commands to the rest of the assembly line to gracefully slow down or stop. Both the DT of the discrete robotic arm and the composite DT of the entire assembly line are likely executing on compute infrastructure on the premises in order to react quickly. The “what if” capabilities of both types of DTs, by contrast, may run in the cloud or a local data center, as the optional simulation capability of the DT is not subject to real or near-real-time round-trip time-to-action constraints and may require more compute and storage capacity than is locally available. The point is that the “edge” is a key part of the calculus to determine where DTs operate. Time-actionable insights, the cost of data movement, governance restrictions on data movement, and the availability and cost of compute and storage infrastructure, plus access to data scientists, IT professionals, and AI frameworks, are increasingly driving more and more automation processing to the “edge,” and it’s natural for DTs to follow the data.

Q. Isn’t Google Maps also an example of a digital twin (especially when we use it to drive based on the directions we input and start driving based on its inputs)?
A. Good question! It is a digital representation of a physical process (a route to a destination) that ingests data from sensors (other vehicles whose operators are using Google Maps driving instructions along some portion of the route). So, yes. DTs are digital representations of physical things, processes or organizations that share data. But Google Maps is an interesting example of a self-organizing composite DT, meaning lots of users acting as both sensors (aka discrete DTs) and selective digital viewers of the behavior of many physical cars moving through a shared space.

Q. You brought up an interesting subject around regulations and compliance. Considering that some constructions would require approvals from regulatory authorities, how would a digital twin (especially when we have pics that re-construct/re-model soft copies of the blueprints based on modifications identified through the 14-1500 pics) comply with regulatory requirements?
A. Some safety regulations in various regions of the world apply to processes, e.g., worker safety in factories. Time to certify is very slow, as lots of documentation is compiled and analyzed by humans. DTs could use live data to accelerate documentation, simulation or replays of real data within digital twins, and could potentially enable self-certification of new or reconfigured processes, assuming that regulatory bodies evolve.

Q. A digital twin captures the state of its partner in real time. What happens to aging data? Do we need to store data indefinitely?
A. Data retention can shrink as DTs and AI frameworks evolve to perform ongoing distributed AI model refreshing. As AI models refresh more dynamically, the increasingly fewer anomalous events become the gold used for the next model refresh. In short, DTs should help reduce how much data is retained. Part of what a DT can be built to do is to filter out compliance data for long-term archival.

Q. Do we not run a high risk when model and reality do not align? What if we trust the twin too much?
A. Your question targets more general challenges of AI. There is a small but growing cottage industry evolving in parallel with DT and AI. Analysts refer to it as Explainable AI, whose intent is to explain to mere mortals how and why an AI model results in the predictions and decisions it makes. Your concern is valid, and for this reason we should expect that humans will likely be in the control loop wherein the DT doesn’t act autonomically for non-real-time control functions.
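To make the robotic-arm scenario in the edge-ingestion answer above a little more concrete, here is a minimal Python sketch of a discrete digital twin feeding a composite assembly-line twin. All class, event, and method names are invented for illustration only; they are not part of any digital-twin product or standard discussed in the webcast.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Event:
    """An after-the-fact notification reported by a physical device at the edge."""
    source: str
    kind: str
    timestamp: datetime = field(default_factory=datetime.utcnow)

class RoboticArmTwin:
    """Discrete DT: mirrors one robotic arm and records the events it reports."""
    def __init__(self, arm_id: str):
        self.arm_id = arm_id
        self.state = "running"
        self.history: List[Event] = []

    def ingest(self, event: Event) -> Event:
        # The physical arm already stopped itself via its on-board camera;
        # the twin just records the event and updates its mirrored state.
        self.history.append(event)
        if event.kind == "foreign_object_detected":
            self.state = "stopped"
        return event

class AssemblyLineTwin:
    """Composite DT: aggregates discrete twins and issues line-wide commands."""
    def __init__(self, arms: List[RoboticArmTwin]):
        self.arms = arms

    def on_event(self, event: Event) -> List[str]:
        commands = []
        if event.kind == "foreign_object_detected":
            # Gracefully slow down the rest of the line.
            for arm in self.arms:
                if arm.arm_id != event.source and arm.state == "running":
                    commands.append(f"slow_down {arm.arm_id}")
        return commands

# Usage: an edge event propagates from the discrete twin to the composite twin.
arm1, arm2 = RoboticArmTwin("arm-1"), RoboticArmTwin("arm-2")
line = AssemblyLineTwin([arm1, arm2])
evt = arm1.ingest(Event(source="arm-1", kind="foreign_object_detected"))
print(line.on_event(evt))   # ['slow_down arm-2']
```

The same pattern extends naturally: the composite twin could forward a summarized event stream to a cloud-hosted simulation twin for the non-real-time "what if" analysis described above.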


A Q&A on the Open Programmable Infrastructure (OPI) Project

Joseph White

Feb 23, 2023

Last month, the SNIA Networking Storage Forum hosted several experts leading the Open Programmable Infrastructure (OPI) project with a live webcast, “An Introduction to the OPI (Open Programmable Infrastructure) Project.” The project has been created to address a new class of cloud and datacenter infrastructure component. This new infrastructure element, often referred to as a Data Processing Unit (DPU), Infrastructure Processing Unit (IPU) or, as a general term, xPU, takes the form of a server-hosted PCIe add-in card or on-board chip(s), containing one or more ASICs or FPGAs, usually anchored around a single powerful SoC device. Our OPI experts provided an introduction to the OPI project and then explained lifecycle provisioning, APIs, use cases, the proof of concept, and the developer platform. If you missed the live presentation, you can watch it on demand and download a PDF of the slides at the SNIA Educational Library. The attendees at the live session asked several interesting questions. Here are answers to them from our presenters.

Q. Are there any plans for OPI to use GraphQL for API definitions, since GraphQL has a good development environment, better security, and a well-defined, typed, schema approach?
A. GraphQL is a good choice for frontend/backend services, with many benefits as stated in the question. These benefits are particularly compelling for data fetching. For communications between different microservices in OPI, we still see gRPC as a better choice. gRPC has a strong ecosystem in cloud and Kubernetes systems with fast execution, strong typing, and polyglot endpoints. We see gRPC as the best choice for most OPI APIs due to the strong containerized approach and the ease of building schemas with Protocol Buffers. We do keep alternatives like GraphQL in mind for specific cases.

Q. Will OPI add APIs for less common use cases like hypervisor offload, application verification, video streaming, storage virtualization, time synchronization, etc.?
A. OPI will continue to add APIs for various use cases, including less common ones. The initial focus of the APIs is to address the major areas of networking, storage, and security, and then expand to address other cases. The API discussions today are already expanding to consider virtualization (containers, virtual machines, etc.) as a key area to address.

Q. Do you communicate with the CXL™ Consortium too?
A. While we have not communicated formally with the Compute Express Link (CXL) Consortium, there have been a few conversations with CXL-interested parties. We will need to engage in discussions with the CXL Consortium as we have with SNIA, DASH, and others.

Q. Can you elaborate on the purpose of APIs for AI/ML?
A. DPU solutions contain accelerators and capabilities that can be leveraged by AI/ML-type solutions, and we will need to consider what APIs need to be exposed to take advantage of these capabilities. OPI believes there is a set of data movement and co-processor APIs to support DPU incorporation into AI/ML solutions. In keeping with its core mission, OPI is not going to attempt to redefine the existing core AI/ML APIs. We may look at how to incorporate those into DPUs directly as well.

Q. Have you considered creating a TEE (Trusted Execution Environment) oriented API?
A. This is something that has been considered and is a possibility in the future. There are two different sides to this: 1) OPI itself using a TEE on the DPU. This may be interesting, although we’d need a compelling use case. 2) Enabling OPI users to utilize the TEE via a vendor-neutral interface. This will likely be interesting, but potentially challenging for DPUs as OPI is considering them. We are currently focused on enabling applications running in containers on DPUs, and securing containers via TEE is currently a research area in the industry. For example, there is this project at the “sandbox” maturity level: https://www.cncf.io/projects/confidential-containers/

Q. Will OPI support integration with the OCP Caliptra project for ensuring silicon-level hardware authentication during boot? Reference: https://siliconangle.com/2022/10/18/open-compute-project-announces-caliptra-new-standard-hardware-root-trust/
A. OPI hasn’t looked at Caliptra yet. As Caliptra matures, OPI will follow the wider industry ecosystem direction in this area. We currently follow https://www.dmtf.org/standards/spdm for attestation, plus IEEE 802.1AR – Secure Device Identity and https://www.rfc-editor.org/rfc/pdfrfc/rfc8572.txt.pdf for secure device zero-touch provisioning and onboarding.

Q. When testing NVIDIA DPUs on some server models, the temperature of the DPU was often high because of a lack of server cooling, resulting in the DPU shutting itself down. First question: is there an open API to read sensors from the DPU card itself? Second question: what happens when the DPU shuts down, then cools, and comes back to life again? Will the server be notified as per standards, and will the DPU be usable again?
A. Qualified DPU servers from major manufacturers integrate closed-loop thermals to make sure that cooling is appropriate and temperature readout is implemented. If a DPU is used in a non-supported server, you may see the challenges that you experienced with overheating and high temperatures causing DPU shutdowns. Since the server is still in charge of the chassis, PDUs, fans and others, it is the BMC’s responsibility to take care of overall server cooling and temperature readouts. There are several different ways to measure temperature, like SMBus, PLDM and others already widely used with standard NICs, GPUs and other devices. OPI is looking into which is the best specification to adopt for handling temperature readout, DPU reboot, and overall thermal management. OPI is not looking to define any new standards in this area.

If you are interested in learning more about DPUs/xPUs, SNIA has covered this topic extensively in the last year or so. You can find all the recent presentations at the SNIAVideo YouTube Channel.
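To illustrate the gRPC style of microservice-to-microservice API that the first answer above favors, here is a small Python sketch using the grpcio package. To keep it self-contained it uses JSON payloads and a generic handler instead of generated Protocol Buffer stubs, and the service name, method, and message fields are hypothetical, not actual OPI APIs.

```python
# pip install grpcio
import json
from concurrent import futures
import grpc

SERVICE = "opi.example.DeviceInfo"   # hypothetical service name, not an OPI API

def get_capabilities(request: bytes, context) -> bytes:
    """Server-side behavior for a single unary-unary RPC (raw-bytes payloads)."""
    req = json.loads(request or b"{}")
    resp = {"device": req.get("device", "dpu0"), "offloads": ["crypto", "nvme-of"]}
    return json.dumps(resp).encode()

def serve() -> grpc.Server:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    handler = grpc.method_handlers_generic_handler(
        SERVICE,
        {"GetCapabilities": grpc.unary_unary_rpc_method_handler(get_capabilities)},
    )
    server.add_generic_rpc_handlers((handler,))
    server.add_insecure_port("localhost:50051")
    server.start()
    return server

def call() -> dict:
    with grpc.insecure_channel("localhost:50051") as channel:
        rpc = channel.unary_unary(f"/{SERVICE}/GetCapabilities")
        return json.loads(rpc(json.dumps({"device": "dpu0"}).encode()))

if __name__ == "__main__":
    srv = serve()
    print(call())      # {'device': 'dpu0', 'offloads': ['crypto', 'nvme-of']}
    srv.stop(grace=None)
```

In a real OPI-style API the request and response shapes would be defined in .proto files and compiled into strongly typed, polyglot stubs, which is the Protocol Buffers advantage the answer refers to.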


David McIntyre

Jan 19, 2023

How are Compute Express Link™ (CXL™) and the SNIA Smart Data Accelerator Interface (SDXI) related? It’s a topic we covered in detail at our recent SNIA Networking Storage Forum webcast, “What’s in a Name? Memory Semantics and Data Movement with CXL and SDXI,” where our experts, Rita Gupta and Shyam Iyer, introduced both SDXI and CXL, highlighted the benefits of each, discussed data movement needs in a CXL ecosystem, and covered SDXI advantages in a CXL interconnect. If you missed the live session, it is available in the SNIA Educational Library along with the presentation slides. The session was highly rated by the live audience, who asked several interesting questions. Here are answers to them from our presenters, Rita and Shyam.

Q. Now that SDXI v1.0 is out, can application implementations use SDXI today?
A. Yes. Now that SDXI v1.0 is out, implementations can start building to the v1.0 SNIA standard. If you are looking to influence a future version of the specification, please consider joining the SDXI Technical Working Group (TWG) in SNIA. We are now in the planning process for post-v1.0 features, so we welcome all new members and implementors to come participate in this new phase of development. Additionally, you can use the SNIA feedback portal to provide your comments.

Q. You mentioned SDXI is interconnect-agnostic, and yet we are talking about SDXI and a specific interconnect here, i.e., CXL. Is SDXI architected to work on CXL?
A. SDXI is designed to be interconnect-agnostic. It standardizes the memory structures, function setup, control, etc. to make sure that a standardized mover can have an architected global state. It does not preclude an implementation from taking advantage of the features of an underlying interconnect. CXL will be an important instance, which is why it was a big part of this presentation.

Q. I think you covered it in the talk, but can you highlight some specific advantages for SDXI in a CXL environment and some ways CXL can benefit from an SDXI standardized data mover?
A. A CXL-enabled architecture expands the targetable system memory space for an architected memory data mover like SDXI. Also, as I explained, SDXI implementors have a few unique implementation choices in a CXL-based architecture that can further improve and optimize data movement. So, while SDXI is interconnect-agnostic, SDXI and CXL can be great buddies :-). With CXL concepts like “shared memory” and “pooled memory,” SDXI can now become a multi-host data mover. This is huge because it eliminates a lot of software stack layers to perform both intra-host and inter-host bulk data transfers.

Q. CXL is termed low latency; what are the latency targets for CXL devices?
A. While overall CXL device latency targets may depend on the media, the guidance is to have CXL access latency be within one NUMA hop. In other words, CXL memory access should have similar latency to that of remote-socket DRAM access.

Q. How are SNIA and CXL collaborating on this?
A. SNIA and CXL have a marketing alliance agreement that allows SNIA and CXL to work on joint marketing activities, such as this webcast, to promote collaborative work. In addition, many of the contributing companies are members of both CXL and the SNIA SDXI TWG. This helps ensure the two groups stay connected.

Q. What is the difference between memory pooling and memory sharing? What are the advantages of either?
A. Memory pooling (also referred to as memory disaggregation) is an approach where multiple hosts allocate dedicated memory resources from the pool of CXL memory device(s) dynamically, as needed. The memory resources are allocated to one host at any given time. The technique ensures optimum and efficient usage of expensive memory resources, providing a TCO advantage. In a memory sharing usage model, allocated blocks of memory can be used by multiple hosts at the same time. Memory sharing provides optimum usage of memory resources and also provides efficiency in memory allocation and management.

Q. Can SDXI enable data movement across CXL devices in a peer-to-peer fashion?
A. Yes, indeed. SDXI devices can target all memory regions accessible to the host and, among other usage models, perform data movement across CXL devices in a peer-to-peer fashion. Of course, this assumes a few implications around platform support, but SDXI is designed for such data movement use cases as well.

Q. Trying to look for equivalent terms… can you think of SDXI as what NVMe® is for NVMe-oF™, and CXL as the underlying transport fabric like TCP?
A. There are some similarities, but the use cases are very different, and therefore I suspect the implementations would drive the development of these standards very differently. Like NVMe, which defines various opcodes to perform storage operations, SDXI defines various opcodes to perform memory operations. And it is also true that SDXI opcodes/descriptors can be used to move data using PCIe and CXL as the I/O interconnect, and a future expansion to Ethernet-based interconnects can be envisioned. Having said that, memory operations have different SLAs, performance characteristics, byte-addressability concerns, and ordering requirements, among other things. SDXI is enabling a new class of such devices.

Q. Is there a limitation on the granularity size of transfer? Is SDXI limited to bulk transfers only, or does it also address small granular transfers?
A. As a standard specification, SDXI allows implementations to process descriptors for data transfer sizes ranging from 1 byte to 4 GB. That said, software may use size thresholds to determine when to offload data transfers to SDXI devices, based on implementation quality.

Q. Will there be a standard SDXI driver available from SNIA, or is each company responsible for building a driver to be compatible with the SDXI-compatible hardware they build?
A. The SDXI TWG is not developing the common open-source driver because of license considerations in SNIA. The SDXI TWG is beginning to work on a common user-space open-source library for applications. The SDXI spec is enabling the development of a common class-level driver by reserving a class code with PCI-SIG for PCIe-based implementations. The driver implementations are being enabled and influenced by discussions in the SDXI TWG and other forums.

Q. Software development is throttled by the availability of standard CXL host platforms. When will those be available, and for what versions?
A. We cannot comment on specific product/platform availability and would advise connecting with the vendors for the same. There are CXL 1.1-based host platforms available in the market and publicly announced.

Q. Does a PCIe-based data mover with an SDXI interface actually DMA data across the PCIe link? If so, isn’t this higher latency and less power efficient than a memcpy operation?
A. There is quite a bit of prior-art research within academia and industry indicating that, above certain data transfer size thresholds, an offloaded data movement device like an SDXI device can be more performant than employing a CPU thread. While software can employ more CPU threads to do the same operation via memcpy, it comes at a cost. By offloading these transfers to SDXI devices, expensive CPU threads can be used for other computational tasks, helping improve overall TCO. Certainly, this will depend on implementation quality, but SDXI is enabling such innovations with a standardized framework.

Q. Will SDXI impact/change/unify NVMe?
A. SDXI is expected to complement the data movement and acceleration needs of systems comprising NVMe devices, as well as needs within an NVMe subsystem, to improve storage performance. In fact, SNIA has created a subgroup, the “CS+SDXI” subgroup, made up of members of SNIA’s Computational Storage TWG and SDXI TWG, to think about such use cases. Many computational storage use cases can be enhanced with a combination of NVMe and SDXI-enabled technologies.
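The size-threshold guidance in the answers above can be sketched as a simple offload policy in Python. The descriptor layout, device interface, and threshold value below are purely hypothetical stand-ins, not the SDXI specification's actual descriptor format or any vendor's driver API.

```python
from dataclasses import dataclass

# Below this size, the overhead of building and submitting a descriptor is
# assumed to outweigh the benefit of offloading; a real driver would tune this.
OFFLOAD_THRESHOLD = 16 * 1024          # illustrative value, not from the spec

@dataclass
class CopyDescriptor:
    """Simplified stand-in for a copy descriptor (not the SDXI spec layout)."""
    src_addr: int
    dst_addr: int
    length: int                         # the spec allows 1 byte up to 4 GB

class HypotheticalDataMover:
    """Pretend SDXI-like device: queues descriptors for asynchronous copies."""
    def __init__(self):
        self.queue = []

    def submit(self, desc: CopyDescriptor) -> None:
        self.queue.append(desc)         # hardware would DMA this in the background

def copy_buffer(src: memoryview, dst: memoryview, mover: HypotheticalDataMover) -> str:
    """Offload large copies to the data mover, keep small ones on the CPU."""
    n = len(src)
    if n < OFFLOAD_THRESHOLD:
        dst[:n] = src                   # plain CPU memcpy path
        return "cpu"
    # Addresses are faked here; a real driver would translate the buffers to
    # DMA-able addresses registered with the device.
    mover.submit(CopyDescriptor(src_addr=id(src), dst_addr=id(dst), length=n))
    return "offloaded"

# Usage: a 4 KiB copy stays on the CPU, a 1 MiB copy is queued for the mover.
mover = HypotheticalDataMover()
small_src, small_dst = bytearray(4096), bytearray(4096)
large_src, large_dst = bytearray(1 << 20), bytearray(1 << 20)
print(copy_buffer(memoryview(small_src), memoryview(small_dst), mover))  # cpu
print(copy_buffer(memoryview(large_src), memoryview(large_dst), mover))  # offloaded
```

In practice the crossover point depends on descriptor submission overhead, queue depth, and whether the copy can complete asynchronously, which is why the answer stresses implementation quality.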


An Overview of the Linux Foundation OPI (Open Programmable Infrastructure)

Joseph White

Jan 12, 2023

A new class of cloud and datacenter infrastructure component is emerging into the marketplace. This new infrastructure element, often referred to as a Data Processing Unit (DPU), Infrastructure Processing Unit (IPU) or, as a general term, xPU, takes the form of a server-hosted PCIe add-in card or on-board chip(s), containing one or more ASICs or FPGAs, usually anchored around a single powerful SoC device. The Open Programmable Infrastructure (OPI) project has been created to address the configuration, operation, and lifecycle of these devices. It also has the goal of fostering an open software ecosystem for DPUs/IPUs covering edge, datacenter, and cloud use cases. The project intends to delineate what a DPU/IPU is, to define frameworks and architecture for DPU/IPU-based software stacks applicable to any vendor’s hardware solution, to create a rich open-source application ecosystem, to integrate with existing open-source projects aligned to the same vision (such as the Linux kernel, IPDK.io, DPDK, DASH, and SPDK), and to create new APIs for interaction with and between the elements of the DPU/IPU ecosystem:
  • the DPU/IPU hardware
  • DPU/IPU hosted applications
  • the host node
  • remote provisioning software
  • remote orchestration software.
Want to learn more? On January 25, 2023, the SNIA Networking Storage Forum is hosting a live webcast, “An Introduction to the OPI (Open Programmable Infrastructure) Project,” where experts actively leading this initiative will provide an introduction to the OPI project, explain OPI workstream definitions and status, and describe how you can get involved. They’ll dive into:
  • Lifecycle provisioning
  • API
  • Use cases
  • Proof of Concept
  • Developer platform
In just 60 minutes this overview should provide you with a solid understanding of what the OPI Project is all about. Register here to join us on January 25th.


Is EDSFF Taking Center Stage? We Answer Your Questions!

SNIA CMS Community

Jan 11, 2023

Enterprise and Data Center Form Factor (EDSFF) technologies have come a long way since our 2020 SNIA CMSI webinar on the topic. While that webinar still provides an outstanding framework for understanding – and SNIA’s popular SSD Form Factors page gives the latest on the E1 and E3 specifications – SNIA Solid State Drive Special Interest Group co-chairs Cameron Brett and Jonmichael Hands joined to provide the latest updates at our live webcast: EDSFF Taking Center Stage in the Data Center. We had some great questions from our live audience, so our experts have taken the time to answer them in this blog.

Q: What does the EDSFF roadmap look like? When will we see PCIe® Gen5 NVMe™ and CXL™ devices?
A: As the form factors come out into the market, we anticipate that there will be feature updates and smaller additions to the existing specifications like SFF-TA-1008 and SFF-TA-1023. There may also be changes around defining LEDs and stack updates. The EDSFF specifications, however, are mature, and we have seen validation and support on the connector and how it works at higher interface speeds. You now have platforms, backplanes, and chassis to support these form factors in the marketplace. Going forward, we may see integration with other device types like GPUs, support of new platforms, and alignment with PCIe Gen 5. Regarding CXL, we see the buzz, and having this form factor serve as the vehicle for CXL will carry huge momentum.

Q: I’m looking for thoughts on recent comments I read about PCIe 5 NVMe drives likely needing or benefitting from larger form factors (like 25mm wide vs. 22mm) for cooling considerations. With mass-market price optimizations, what is the likelihood that client compute will need to transition away from existing M.2 (especially 2280) form factors in the coming years, and will that be a form factor shared with server compute (as has been the case with 5.25″, 3.5″, and 2.5″ drives)?
A: We are big fans of EDSFF being placed on reference platforms for OEMs and motherboard makers. Enterprise storage support would be advantageous on the desktop. At the recent OCP Global Summit, there was discussion on Gen 5 specifications and M.2 and U.2. With the increased demands for power and bandwidth, we think if you want more performance you will need to move to a different form factor, and EDSFF makes sense.

Q: On E1.S vs. E3.S market dominance, can you comment on their support for dual-port modules? Some traditional storage server designs favor E3.S because of the dual-port configuration. More modern storage designs do not rely on dual-port modules, and therefore prefer E1.S. Do you agree with this correlation? How will this affect the predictions on market share?
A: There is some confusion about the specification support versus what vendors support and what customers are demanding. The EDSFF specifications share a common pin-out and connection specifications. If a manufacturer wishes to support the dual-port functionality, they can do so now. Hyperscalers are now using E1.S in compute designs and may use E3 for their high-availability enterprise storage requirements. Our webcast showed the forecast from Forward Insights of larger shipments of E3 further out in time, reflecting the transition away from 2.5-inch to E3 as server and storage OEMs transition their backplanes.

Q: Have you investigated enabling conduction cooling of E1.S and E3.S to a water-cooled cold plate? If not, is it of interest?
A: The OCP Global Summit featured a presentation from Intel about immersion cooling with a focus on the sustainability aspect, as you can get your power usage effectiveness (PUE) down further by eliminating the fans in the system design while increasing cooling. There doesn’t seem to be anything preventing the use of EDSFF drives for immersion cooling. New CPUs have heat pipes, and new OEM designs have up to 36 drives in a 2U chassis. How do you cool that? Many folks are talking about cooling in the data center, and we’ll just need to wait and see what happens!

Thanks again for your interest in SNIA and Enterprise and Data Center SSD Form Factors. We invite you to visit our SSD Form Factor page, where we have videos, white papers, and charts explaining the many different SSD sizes and formats in a variety of form factors.


Kubernetes Trials & Tribulations Q&A: Cloud, Data Center, Edge

Michael Hoard

Jan 5, 2023

Kubernetes cloud orchestration platforms offer all the flexibility, elasticity, and ease of use — on premises, in a private or public cloud, even at the edge. The flexibility of turning on services when you want them, and turning them off when you don’t, is an enticing prospect for developers as well as application deployment teams, but it has not been without its challenges. At our recent SNIA Cloud Storage Technologies Initiative webcast, “Kubernetes Trials & Tribulations: Cloud, Data Center, Edge,” our experts, Michael St-Jean and Pete Brey, debated both the challenges and advantages of Kubernetes. If you missed the session, it is available on-demand along with the presentation slides. The live audience raised several interesting questions. Here are answers to them from our presenters.

Q: Are all these trends coming together? Where will Kubernetes be in the next 1-3 years?
A: Adoption rates for workloads like databases, artificial intelligence and machine learning, and data analytics in a container environment are on the rise. These applications are stateful and diverse, so a multi-protocol persistent storage layer built with Kubernetes services is essential. Additionally, Kubernetes-based platforms pave the way for application modernization, but when, and which, applications should you move… and how do you do it? There are companies who still have virtual machines in their environment, and maybe they’re deploying Kubernetes on top of VMs, but some are trying to move to a bare-metal implementation to avoid VMs altogether. Virtual machines are really good in a lot of instances… say, for example, for running your existing applications. But there’s a Kubernetes service called KubeVirt that allows you to run those applications in VMs on top of containers, instead of the other way around. This offers a lot of flexibility to those who are adopting a modern application development approach, while still maintaining existing apps. First, you can rehost traditional apps within VMs on top of Kubernetes. You can even refactor existing applications. For example, you can run Windows applications on Windows VMs within the environment, taking advantage of the container infrastructure. Then, while you are building new apps and microservices, you can begin to rearchitect your integration points across your application workflows. When the time is right, you can rebuild that functionality and retire the old application. Taking this approach is a lot less painful than rearchitecting entire workloads for cloud-native.

Q: Is cloud repatriation really a thing?
A: There are a lot of perspectives on repatriation from the cloud. Some hardware value-added resellers are of the opinion that it is happening quite a bit. Many of their customers had an initiative to move everything to the cloud. Then the company was merged or acquired and someone looked at the costs, and sure, they moved expenses from CapEx to OpEx, but there were runaway projects with little accountability and expanding costs. So, they started moving everything back from the cloud to the core datacenter. I think those situations do exist, but I also think the perspective is skewed a bit. I believe the reality of the situation is that where applications run is really more workload-dependent. We continue to see workloads moving to public clouds, and at the same time, some workloads are being repatriated. Let’s take, for example, a workload that may need processor accelerators like GPUs or deep learning accelerators for a short period of time. It would make perfect sense to offload some of that work in a public cloud deployment, because the analyst or data scientist could run the majority of their model on less expensive hardware and then burst to the cloud for the resources they need when they need them. In this way, the organization saves money by not making capital purchases for resources that would largely remain idle. At the same time, a lot of data is restricted or governed and cannot live outside of a corporate firewall. Many countries around the world even restrict companies within their borders from housing data on servers outside of the country domain. These workloads are clearly being repatriated to a datacenter. Many other factors such as costs and data gravity will also contribute to some workloads being repatriated. Another big trend we see is the proliferation of workloads to the edge. In some cases, these edge deployments are connected and can interact with cloud resources, and in others they are disconnected, either because they don’t have access to a network or due to security restrictions. The positive thing to note with this ongoing transformation, which includes hybrid and multi-cloud deployments as well as edge computing, is that Kubernetes can offer a common experience across all of these underlying infrastructures.

Q: How are traditional hardware vendors reinventing themselves to compete?
A: This is something we will continue to see unfold over time, but certainly, as we see Kubernetes platforms starting to take the place of virtual machines, there is a lot of interest in building architectures to support it. That said, right now, hardware vendors are starting to make their bets on which segments to go after. For example, there is a compact-mode deployment available built on servers targeted at public sector deployments. There is also an AI accelerator product built with GPUs. There are specific designs for telco and multi-access edge computing, and validated platforms and validated designs for Kubernetes that incorporate AI and deep learning accelerators, all running on Kubernetes. While the platform architectures and the target workloads or market segments are really interesting to follow, another emerging trend is for hardware companies to offer a fully managed service offering to customers built on Kubernetes. Full-scale hardware providers have also amassed quite a bit of expertise with Kubernetes, and they have a complete services arm that can provide managed services, not just for the infrastructure, but for the Kubernetes-based platform as well. What’s more, the sophisticated hardware manufacturers have redesigned their financing options so that customers can purchase the service as a utility, regardless of where the hardware is deployed. I don’t remember where I heard it, but some time ago someone said, “Cloud is not a ‘where,’ cloud is a ‘how.’” Now, with these service offerings and the cloud-like experience afforded by Kubernetes, organizations can operationalize their expenses regardless of whether the infrastructure is in a public cloud, on-site, at a remote location, or even at the edge.

Q: Where does the data live and how is the data accessed? Could you help parse the meaning of “hybrid cloud” versus “distributed cloud,” particularly as it relates to current industry trends?
A: Organizations have applications running everywhere today: in the cloud, on-premises, on bare-metal servers, and in virtual machines. Many are already using multiple clouds in addition to a private cloud or datacenter. Also, a lot of folks are used to running VMs, and they are trying to figure out if they should just run containers on top of existing virtual machines or move to bare metal. They wonder if they can move more of their processing to the edge. Really, there’s rarely an either-or scenario. There’s just this huge mix of technologies and methodologies taking place, which is why we use the term hybrid cloud. It is really hybrid in many ways, and the goal is to get to a development and delivery mechanism that provides a cloud-like experience. The term distributed cloud computing generally just encompasses the typical cloud infrastructure categories of public, private, hybrid, and multi-cloud.

Q: What workloads are emerging? How are edge computing architectures taking advantage of data in Kubernetes?
A: For many organizations, being able to gather and process data closer to data sources, in combination with new technologies like artificial intelligence/machine learning or new immersive applications, can help build differentiation. By doing so, organizations can react faster, connect everything, anywhere, and deliver better experiences and business outcomes. They are able to use data derived from sensors, video, devices, and other edge devices to make faster data-driven decisions, deploy latency-sensitive applications with the experience users expect, no matter where they are, and keep data within geographical boundaries to meet regulatory requirements on data storage and processing. Alongside these business drivers, many organizations also benefit from edge computing because it helps limit the data that needs to be sent to the cloud for processing, decreasing bandwidth usage and costs. It creates resilient sites that can continue to operate even if the connection to the core datacenter or cloud is lost. And you can optimize resource usage and costs, as only the necessary services and functionality are deployed to address a use case or problem.

Q: How and why will Kubernetes succeed? What challenges still need to be addressed?
A: Looking at the application modernization options, you can venture to guess the breakdown of what organizations are doing, i.e., how many are doing rehost, refactor, rearchitect, etc., and what drives those decisions. When we look at the current state of application delivery, most enterprises today have a mix of modern cloud-native apps and legacy apps. Also, a lot of large enterprises have a huge portfolio of existing apps that are built with traditional architectures and traditional languages (Java or .NET or maybe C++), or even mainframe apps. These are supporting stateful and stateless workloads. In addition, many are building new apps or modernizing some of those existing apps on new architectures (microservices, APIs) with newer languages and frameworks (Spring, Quarkus, Node.js, etc.). We’re also seeing more interest in building in added intelligence through analytics and AI/ML, and even automating workflows through distributed event-driven architectures, serverless, and functions. So, as folks are modernizing their applications, a lot of questions come up around when and how to transition existing applications, how they integrate with their business processes, and what development processes and methodologies they are adopting. Are they using an agile or waterfall methodology? Are they ready to adopt CI/CD pipelines and GitOps to operationalize their workflows and create a continuous application lifecycle?

Q: Based on slide #12 from this presentation, should we assume that the 76% for databases and data cache represents larger, stateful container use cases?
A: In most cases, it is safe to assume they will be stateful applications that are using databases, but they don’t necessarily have to be large applications. The beauty of cloud-native deployments is that code doesn’t have to be a huge monolithic application. It can be a set of microservices that work together, each piece of code being able to address a certain part of the overall workflow for a particular use case. As such, many pieces of code can be small in nature, but use an underlying database to store relational data. Even services like a container registry service or logging and metrics will use an underlying database. For example, a registry service may have an object store of container images, but then have a database that keeps an index and catalog of those images. If you’re looking for more educational information on Kubernetes, please check out the other webcasts we’ve done on this topic in the SNIA Educational Library.
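As a small illustration of the registry design mentioned in the last answer (an object store for image blobs plus a database that indexes and catalogs them), here is a toy Python sketch. SQLite and the local filesystem stand in for the database and object store; this is not how any particular registry product is implemented.

```python
import hashlib
import sqlite3
from pathlib import Path

class TinyRegistry:
    """Toy container-image registry: blobs on disk, catalog/index in SQLite."""
    def __init__(self, blob_dir: str = "blobs", db_path: str = "catalog.db"):
        self.blob_dir = Path(blob_dir)
        self.blob_dir.mkdir(exist_ok=True)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS images (name TEXT, tag TEXT, digest TEXT, "
            "PRIMARY KEY (name, tag))"
        )

    def push(self, name: str, tag: str, layer: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(layer).hexdigest()
        # "Object store": content-addressed blob written to the filesystem.
        (self.blob_dir / digest.replace(":", "_")).write_bytes(layer)
        # "Database index": catalog entry mapping name:tag to the blob digest.
        self.db.execute(
            "INSERT OR REPLACE INTO images VALUES (?, ?, ?)", (name, tag, digest)
        )
        self.db.commit()
        return digest

    def pull(self, name: str, tag: str) -> bytes:
        row = self.db.execute(
            "SELECT digest FROM images WHERE name = ? AND tag = ?", (name, tag)
        ).fetchone()
        if row is None:
            raise KeyError(f"{name}:{tag} not found")
        return (self.blob_dir / row[0].replace(":", "_")).read_bytes()

# Usage
reg = TinyRegistry()
reg.push("myapp", "v1", b"fake image layer bytes")
print(len(reg.pull("myapp", "v1")))   # 22
```

A production registry would content-address blobs the same way but back them with an object store service and a replicated database, each of which is itself a stateful workload needing the persistent storage layer discussed above.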


Kubernetes Trials & Tribulations Q&A: Cloud, Data Center, Edge

Michael Hoard

Jan 5, 2023

title of post
Kubernetes cloud orchestration platforms offer all the flexibility, elasticity, and ease of use — on premises, in a private or public cloud, even at the edge. The flexibility of turning on services when you want them, turning them off when you don’t, is an enticing prospect for developers as well as application deployment teams, but it has not been without its challenges. At our recent SNIA Cloud Storage Technologies Initiative webcast “Kubernetes Trials & Tribulations: Cloud, Data Center, Edge” our experts, Michael St-Jean and Pete Brey, debated both the challenges and advantages of Kubernetes. If you missed the session, it is available on-demand along with the presentation slides. The live audience raised several interesting questions. Here are answers to them from our presenters. Q: Are all these trends coming together? Where will Kubernetes be in the next 1-3 years? A: Adoption rates for workloads like databases, artificial intelligence & machine learning, and data analytics in a container environment are on the rise. These applications are stateful and diverse, so a multi-protocol persistent storage layer built with Kubernetes services is essential. Additionally, Kubernetes-based platforms pave the way for application modernization, but when, and which applications should you move… and how do you do it? There are companies who still have virtual machines in their environment, and maybe they’re deploying Kubernetes on top of VMs, but then some are trying to move to a bare-metal implementation to avoid VMs altogether. Virtual machines are really good in a lot of instances… say for example, for running your existing applications. But there’s a Kubernetes service called KubeVirt that allows you to run those applications in VMs on top of containers, instead of the other way around. This offers a lot of flexibility to those who are adopting a modern application development approach, while still maintaining existing apps. First, you can rehost traditional apps within VMs on top of Kubernetes. You can even refactor existing applications. For example, you can run Windows applications on Windows VMs within the environment taking advantage of the container infrastructure. Then while you are building new apps and microservices, you can begin to rearchitect your integration points across your application workflows. When the time is right, you can rebuild that functionality and retire the old application. Taking this approach is a lot less painful than rearchitecting entire workloads for cloud-native. Q: Is cloud repatriation really a thing? A: There are a lot of perspectives on repatriation from the cloud. Some hardware value-added resellers are of the opinion that it is happening quite a bit. Many of their customers had an initiative to move everything to the cloud. Then the company was merged or acquired and someone looked at the costs, and sure, they moved expenses from CapEx to OpEx, but there were runaway projects with little accountability and expanding costs. So, they started moving everything back from the cloud to the core datacenter. I think those situations do exist, but I also think the perspective is skewed a bit. I believe the reality of the situation is that where applications run is really more workload dependent. We continue to see workloads moving to public clouds, and at the same time, some workloads are being repatriated. Let’s take for example, a workload that may need processor accelerators like GPUs or Deep Learning accelerators for a short period of time. 
It would make perfect sense to offload some of that work in a public cloud deployment because the analyst or data scientist could run the majority of their model on less expensive hardware and then burst to the cloud for the resources they need when they need them. In this way, the organization saves money by not making capital purchases for resources that will largely remain idle. At the same time, a lot of data is restricted or governed and cannot live outside of a corporate firewall. Many countries around the world even restrict companies within their borders from housing data on servers outside of the country domain. These workloads are clearly being repatriated to a datacenter. Many other factors such as costs and data gravity will also contribute to some workloads being repatriated. Another big trend we see is the proliferation of workloads to the edge. In some cases, these edge deployments are connected and can interact with cloud resources, and in others they are disconnected, either because they don’t have access to a network, or due to security restrictions. The positive thing to note with this ongoing transformation, which includes hybrid and multi-cloud deployments as well as edge computing, is that Kubernetes can offer a common experience across all of these underlying infrastructures. Q: How are traditional hardware vendors reinventing themselves to compete? A: This is something we will continue to see unfold over time, but certainly, as we see Kubernetes platforms starting to take the place of virtual machines, there is a lot of interest in building architectures to support it. That said, right now, hardware vendors are starting to make their bets on what segments to go after. For example, there is a compact mode deployment available built on servers targeted at public sector deployments. There is also an AI Accelerator product built with GPUs. There are specific designs for Telco and multi-access edge computing and validated platforms and validated designs for Kubernetes that incorporate AI and Deep Learning accelerators all running on Kubernetes. While the platform architectures and the target workloads or market segments are really interesting to follow, another emerging trend is for hardware companies to offer a full managed service offering to customers built on Kubernetes. Full-scale hardware providers also have amassed quite a bit of expertise with Kubernetes and they have a complete services arm that can provide managed services, not just for the infrastructure, but for the Kubernetes-based platform as well. What’s more, the sophisticated hardware manufacturers have redesigned their financing options so that customers can purchase the service as a utility, regardless of where the hardware is deployed. I don’t remember where I heard it, but some time ago someone once said “Cloud is not a ‘where,’ Cloud is a ‘how.’” Now, with these service offerings, and the cloud-like experience afforded by Kubernetes, organizations can operationalize their expenses regardless of whether the infrastructure is in a public cloud, on-site, at a remote location, or even at the edge. Q: Where does the data live and how is the data accessed? Could you help parse the meaning of “hybrid cloud” versus “distributed cloud” particularly as it relates to current industry trends? A: Organizations have applications running everywhere today: In the cloud, on-premises, on bare metal servers, and in virtual machines. Many are already using multiple clouds in addition to a private cloud or datacenter. 
Also, a lot of folks are used to running VMs, and they are trying to figure out whether they should just run containers on top of existing virtual machines or move to bare metal. They wonder if they can move more of their processing to the edge. Really, there is rarely an either-or scenario. There is a huge mix of technologies and methodologies in play, which is why we use the term hybrid cloud. It really is hybrid in many ways, and the goal is to get to a development and delivery mechanism that provides a cloud-like experience. The term distributed cloud computing generally just encompasses the typical cloud infrastructure categories of public, private, hybrid, and multi-cloud.

Q: What workloads are emerging? How are edge computing architectures taking advantage of data in Kubernetes?

A: For many organizations, being able to gather and process data closer to its sources, in combination with new technologies like artificial intelligence/machine learning or new immersive applications, can help build differentiation. By doing so, organizations can react faster, connect everything anywhere, and deliver better experiences and business outcomes. They are able to use data derived from sensors, video, and other edge devices to make faster data-driven decisions, deploy latency-sensitive applications with the experience users expect no matter where they are, and keep data within geographical boundaries to meet regulatory requirements on data storage and processing. Alongside these business drivers, many organizations also benefit from edge computing because it limits the data that needs to be sent to the cloud for processing, decreasing bandwidth usage and costs. It creates resilient sites that can continue to operate even if the connection to the core datacenter or cloud is lost. And you can optimize resource usage and costs, since only the services and functionality necessary to address a given use case are deployed.

Q: How and why will Kubernetes succeed? What challenges still need to be addressed?

A: Looking at the application modernization options, you can venture to guess the breakdown of what organizations are doing, i.e., how many are rehosting, refactoring, rearchitecting, etc., and what drives those decisions. When we look at the current state of application delivery, most enterprises today have a mix of modern cloud-native apps and legacy apps. A lot of large enterprises have a huge portfolio of existing apps built with traditional architectures and traditional languages (Java, .NET, or maybe C++), or even mainframe apps, supporting both stateful and stateless workloads. In addition, many are building new apps or modernizing existing apps on new architectures (microservices, APIs) with newer languages and frameworks (Spring, Quarkus, Node.js, etc.). We are also seeing more interest in building in added intelligence through analytics and AI/ML, and even automating workflows through distributed event-driven architectures, serverless, and functions.

So, as folks modernize their applications, a lot of questions come up: when and how should they transition existing applications, how do those applications integrate with their business processes, and what development processes and methodologies are they adopting? Are they using an agile or waterfall methodology? Are they ready to adopt CI/CD pipelines and GitOps to operationalize their workflows and create a continuous application lifecycle?
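As a concrete illustration of the rehost option and the KubeVirt approach mentioned earlier, here is a minimal sketch, using the Python kubernetes client, of how an existing VM image might be declared as a KubeVirt VirtualMachine so it runs alongside containers. The names, registry path, and sizing below are assumptions for illustration only.

from kubernetes import client, config

# A hypothetical VirtualMachine definition: a small containerDisk-based VM.
VM_MANIFEST = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "VirtualMachine",
    "metadata": {"name": "legacy-app-vm"},
    "spec": {
        "running": True,
        "template": {
            "spec": {
                "domain": {
                    "devices": {"disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}]},
                    "resources": {"requests": {"memory": "2Gi"}},
                },
                "volumes": [
                    {
                        "name": "rootdisk",
                        # containerDisk boots the VM from a VM image packaged
                        # as a container image (hypothetical registry path).
                        "containerDisk": {"image": "registry.example.com/vm-images/legacy-app:latest"},
                    }
                ],
            }
        },
    },
}

def create_vm():
    config.load_kube_config()
    # VirtualMachine is a custom resource, so it is created through the
    # generic custom objects API rather than a typed client.
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="kubevirt.io",
        version="v1",
        namespace="default",
        plural="virtualmachines",
        body=VM_MANIFEST,
    )

if __name__ == "__main__":
    create_vm()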
Q: Based on slide #12 from this presentation, should we assume that the 76% for databases and data cache represents larger, stateful container use cases?

A: In most cases, it is safe to assume they will be stateful applications that use databases, but they don't necessarily have to be large applications. The beauty of cloud-native deployments is that code doesn't have to be a huge monolithic application. It can be a set of microservices that work together, each piece of code addressing a certain part of the overall workflow for a particular use case. As such, many pieces of code can be small in nature but use an underlying database to store relational data. Even services like a container registry or logging and metrics will use an underlying database. For example, a registry service may have an object store of container images, but then have a database that keeps an index and catalog of those images (a minimal sketch of the kind of persistent storage claim such a service might make appears below).

If you're looking for more educational information on Kubernetes, please check out the other webcasts we've done on this topic in the SNIA Educational Library.
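As promised above, here is a minimal sketch of the persistent volume claim a database-backed service, such as a registry's catalog database, might request, using the Python kubernetes client. The namespace, storage class name, and size are illustrative assumptions that would map to whatever persistent storage layer the cluster actually exposes.

from kubernetes import client, config

# A minimal PersistentVolumeClaim for a stateful workload such as the
# database behind a registry's image catalog.
PVC_MANIFEST = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "catalog-db-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "fast-block",   # hypothetical storage class
        "resources": {"requests": {"storage": "50Gi"}},
    },
}

def create_database_storage():
    config.load_kube_config()
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="registry", body=PVC_MANIFEST
    )

if __name__ == "__main__":
    create_database_storage()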

Programming Frameworks Q&A

Alex McDonald

Nov 17, 2022


Last month, the SNIA Networking Storage Forum made sense of the “wild west” of programming frameworks, covering xPUs, GPUs and computational storage devices at our live webcast, “You’ve Been Framed! An Overview of xPU, GPU & Computational Storage Programming Frameworks.” It was an excellent overview of what’s happening in this space.

There was a lot to digest, so our stellar panel of experts has taken the time to answer the questions from our live audience in this blog.

Q. Why is it important to have open-source programming frameworks?

A. Open-source frameworks enable community support and partnerships beyond what proprietary frameworks support. In many cases they allow ISVs and end users to write one integration that works with multiple vendors.

Q. Will different accelerators require different frameworks or can one framework eventually cover them all?

A. Different frameworks support different accelerator attributes and specific applications. Trying to build a single framework that does the job of all the existing frameworks and covers all possible use cases would be extremely complex and time-consuming, and in the end it would not produce the best results. Having separate frameworks that can co-exist is a more efficient and effective approach. That said, providing a well-defined hardware abstraction layer does complement different programming frameworks and can allow one framework to support different types of accelerators.
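As a purely illustrative sketch of that hardware abstraction idea, consider the following Python example. The class and method names are invented for this post and are not part of any SNIA or vendor framework; the point is simply that application code written against one interface can target different accelerator back ends.

from abc import ABC, abstractmethod

class Accelerator(ABC):
    """Minimal hardware abstraction layer: one interface, many back ends."""

    @abstractmethod
    def offload(self, operation: str, data: bytes) -> bytes:
        """Run an operation on the accelerator and return the result."""

class GpuAccelerator(Accelerator):
    def offload(self, operation: str, data: bytes) -> bytes:
        # A real implementation would call into a GPU programming framework.
        print(f"GPU back end running {operation} on {len(data)} bytes")
        return data

class ComputationalStorageAccelerator(Accelerator):
    def offload(self, operation: str, data: bytes) -> bytes:
        # A real implementation would hand the request to a computational
        # storage device so the data never leaves the drive.
        print(f"CSD back end running {operation} on {len(data)} bytes")
        return data

def run_pipeline(accelerator: Accelerator, payload: bytes) -> bytes:
    # Application code is written once against the abstract interface and
    # works with whichever accelerator the platform provides.
    return accelerator.offload("compress", payload)

if __name__ == "__main__":
    run_pipeline(GpuAccelerator(), b"example payload")
    run_pipeline(ComputationalStorageAccelerator(), b"example payload")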

Q. Is there a benefit to standardization at the edge?

A. The edge is a broad term with many different definitions; in this context it refers to a network of endpoints where data is generated, collected, and processed. Standardization helps develop a common foundation that can be referenced across application domains, which can make it easier to deploy different types of accelerators at the edge.

Q. Does adding a new programming framework in computational storage help to alleviate current infrastructure bottlenecks?

A. The SNIA Computational Storage API and the TP4091 programming framework enable a standard programming approach in place of proprietary methods that may be vendor limited. Computational storage itself significantly reduces resource constraints by moving compute closer to the data, while the programming framework provides improved resource access at the application layer. By making it easier to deploy computational storage, these frameworks may relieve some types of infrastructure bottlenecks.
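The general shape of an application-level computational storage flow can be sketched as follows. The names below are hypothetical stand-ins, not the actual SNIA Computational Storage API or TP4091 calls; the stub device only simulates the pattern of moving compute to where the data lives so that less data crosses the bus.

# Hypothetical, simplified computational storage flow. None of these names
# are real SNIA CS API or TP4091 calls.

class StubComputationalStorageDevice:
    """Stands in for a real computational storage device binding."""

    def __init__(self, blocks):
        self._blocks = blocks  # pretend on-drive data
        self._program = None

    def load_program(self, program):
        # A real device would accept a pre-compiled function (eBPF, FPGA
        # bitstream, etc.); here we just keep a Python callable.
        self._program = program
        return "program-0"

    def execute(self, program_id, block_range):
        # The "drive" filters its own data so only results return to the host.
        start, end = block_range
        return [b for b in self._blocks[start:end] if self._program(b)]

def filter_records_on_drive(device, block_range, predicate):
    program_id = device.load_program(predicate)
    return device.execute(program_id, block_range)

if __name__ == "__main__":
    device = StubComputationalStorageDevice(blocks=list(range(100)))
    # Only even-valued "records" in blocks 0-49 come back to the host.
    print(filter_records_on_drive(device, (0, 50), lambda rec: rec % 2 == 0))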

Q. Do these programming frameworks typically operate at a low level or high level?

A. They operate at both levels. The goal of a programming framework is to operate at the application resource-management level, with high-level command calls that can invoke underlying hardware resources. Under the hood, the framework typically engages those hardware resources through lower-level APIs or drivers.

Q. How does one determine which framework is best for a particular task?

A. Framework selection should be driven primarily by which accelerator type is best suited to run the workload. Additionally, when multiple frameworks could apply, the decision on which to use depends on the implementation details of the workload components. Multiple frameworks have been created and continue to evolve for exactly this reason, so there is not always a single answer to the question. The key idea motivating this webinar was to increase awareness of the frameworks available so that people can answer this question for themselves.

Q. Does using an open-source framework generally give you better or worse performance than using other programming options?

A. There is usually no significant performance difference between open-source and proprietary frameworks; however, the former tends to be more adaptable and scalable because of the open-source community behind it. A proprietary framework might offer better performance or access to a few more features, but it usually works only with accelerators from one vendor.

Q. I would like to hear more on accelerators to replace vSwitches. How are these different from NPUs?   

A. Many of these accelerators can offload a virtual network switch (vSwitch) onto purpose-built silicon as one of several tasks they accelerate, and they are usually deployed inside a server so the vSwitch no longer runs on the server's general-purpose CPU. A Network Processing Unit (NPU) is also an accelerator chip with purpose-built silicon, but it typically accelerates only networking tasks and is usually deployed inside a switch, router, load balancer, or other networking appliance rather than inside a server.

Q. I would have liked to have seen a slide defining GPU and DPU for those new to the technology.

A. SNIA has been working hard to help educate on this topic. A good starting point is our “What is an xPU” definition. There are additional resources on that page including the first webcast we did on this topic “SmartNICs to xPUs: Why is the Use of Accelerators Accelerating.”  We encourage you to check them out.

Q. How do computational storage devices (CSDs) deal with "data visibility" issues when the drives are abstracted behind a RAID stripe (e.g. RAID 0, 5, 6)? Is it expected that a CSD will never live behind such an abstraction?

A. The CSD can operate as a standard drive under RAID, as well as a drive with a complementary CSP (computational storage processor; see the SNIA CS Architecture Specification 1.0). If it is deployed under a RAID controller, then the RAID hardware or software would need to understand the computational capabilities of the CSD in order to take full advantage of them.

Q. Are any of the major OEM storage vendors (NetApp / Dell EMC / HPE / IBM, etc.) currently offering Computational Storage capable arrays?

A. A number of OEMs are offering arrays with compute resources that reside with the data. The computational storage initiative promoted by SNIA provides a common reference architecture and programming model for developers and end customers.
