The PCI Express (PCIe) interface is an essential component of AI systems due to its ability to provide high bandwidth and low latency over short distances. It was a timely topic at our recent SNIA Data, Storage & Networking Webinar, “Everything You Wanted to Know About PCIe But Were Too Proud to Ask,” where our presenter, Martin Chao, covered an introduction to PCIe, PCIe Enumeration / AER / DPC / Hot-Plug, Non-Transparent Bridge, SR-IOV, PCIe in AI Applications, and a look at what’s new in PCIe Generation 6.
As we hoped, the live audience was not “too proud” to ask questions, and Martin has kindly answered them here in this blog.
Q: Are there any efforts to standardize MR-IOV for device sharing over multi-host fabrics? MR-IOV or similar concepts... thinking of CXL MLDs as well.
A: MR-IOV has been standardized for a long time. But there are no widely available PCIe endpoint devices supporting it, and adoption is unlikely in the immediate future.
Q: What is the difference between SR-IOV and the latest SIOV?
A: SR-IOV is implemented via multiple Virtual Functions (VFs) on the Endpoint (EP). SIOV (Scalable IOV) was originally defined in OCP for hyperscale datacenters, and the latest PCIe spec revisions 6.4/7.0, released this year, also introduce new extended capability structures for it. According to the PCIe spec, SR-IOV and SIOV have different extended capability registers if implemented.
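For readers who want to see which extended capabilities a device actually advertises, here is a minimal Python sketch, assuming a Linux host with sysfs access to config space (reading past the first 64 bytes usually requires root). The SR-IOV Extended Capability ID (0x0010) is spec-defined; the example device address is hypothetical.

```python
# Minimal sketch: walk a device's PCIe extended capability list (offset 0x100+)
# and report the extended capabilities it advertises. The SR-IOV extended
# capability ID (0x0010) is defined by the PCIe spec; other IDs are printed raw.
# Reading config space past the first 64 bytes via sysfs typically requires root.
import struct
import sys

SRIOV_ECAP_ID = 0x0010  # PCIe SR-IOV Extended Capability ID

def list_extended_caps(bdf: str):
    """Yield (capability_id, offset) pairs for a device such as '0000:3b:00.0'."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        cfg = f.read()
    if len(cfg) <= 0x100:
        return  # No extended config space visible (not PCIe, or insufficient privilege)
    offset = 0x100
    seen = set()
    while offset and offset + 4 <= len(cfg) and offset not in seen:
        seen.add(offset)
        header, = struct.unpack_from("<I", cfg, offset)
        cap_id = header & 0xFFFF          # bits [15:0]: capability ID
        nxt = (header >> 20) & 0xFFC      # bits [31:20]: next capability offset (dword-aligned)
        if cap_id == 0:
            break
        yield cap_id, offset
        offset = nxt

if __name__ == "__main__":
    bdf = sys.argv[1]  # e.g. 0000:3b:00.0 (hypothetical)
    for cap_id, off in list_extended_caps(bdf):
        label = " (SR-IOV)" if cap_id == SRIOV_ECAP_ID else ""
        print(f"ecap 0x{cap_id:04x} at 0x{off:03x}{label}")
```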
Q: On slide 11, would each CPU (Xeon chip) have its own on-chip Root Complex?
A: The Root Complex is the logical entity that connects the CPU and memory subsystem to the PCIe fabric, which includes PCIe slots, switches, and endpoint devices. In modern server architectures, the Root Complex functionality is commonly integrated directly into the CPU die itself, especially for high-end server chips like Intel Xeon CPUs.
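As a minimal sketch of how this shows up on a Linux host (assuming sysfs): each host bridge/root bus appears under /sys/devices as pciDDDD:BB, and a multi-socket Xeon server will typically expose several of them, reflecting Root Complex logic integrated into each CPU.

```python
# Minimal sketch (Linux, sysfs): list the PCI host bridges / root buses visible
# to the OS. On a multi-socket server with Root Complex logic integrated into
# each CPU die, you typically see several root buses (e.g. pci0000:00, pci0000:80).
import glob
import os

for root in sorted(glob.glob("/sys/devices/pci[0-9a-f]*:[0-9a-f]*")):
    name = os.path.basename(root)          # e.g. "pci0000:80"
    domain, bus = name[3:].split(":")
    # Count devices that enumerate directly under this root bus.
    children = [d for d in os.listdir(root) if d.startswith(f"{domain}:{bus}:")]
    print(f"root bus {domain}:{bus} -> {len(children)} device(s) directly below")
```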
Q: When do you expect PCIe 6 to show up in Xeon Root Complex?
A: Intel defines the Xeon architecture, so this question would be better answered by Intel.
Q: Could you please provide examples of real devices using Non-Transparent Bridge functionality?
A: NTB implementation is vendor proprietary and is not a standard within the PCIe specs. NTB-equipped switches and adapters are widely used in enterprise systems for multi-host partitioning, failover, clustering, or wherever resource composability is required. The NTB function is typically found in: multi-host datacenter servers, storage appliances, high-availability embedded platforms, compute boards interconnecting multiple SoCs, and workstation clustering and resource pooling.
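As one concrete way to spot NTB hardware in practice, here is a minimal Python sketch, assuming a Linux host where the in-kernel NTB drivers (e.g., ntb_hw_intel, ntb_hw_switchtec) are bound to the hardware; it simply reports PCI devices whose driver name contains "ntb".

```python
# Minimal sketch (Linux, sysfs): spot PCI devices bound to an NTB driver.
# Assumes the Linux NTB stack (drivers/ntb/hw/*, e.g. ntb_hw_intel,
# ntb_hw_switchtec) is in use; the NTB hardware itself is vendor proprietary.
import glob
import os

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    driver_link = os.path.join(dev, "driver")
    if not os.path.islink(driver_link):
        continue  # No driver bound to this device
    driver = os.path.basename(os.readlink(driver_link))
    if "ntb" in driver.lower():
        print(f"{os.path.basename(dev)} is bound to NTB driver '{driver}'")
```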
Q: How does PCIe addressing take care of PCIe endpoints which are part of the same device vs. distributed scenario?
A: For ID-routed TLPs, targeting different functions in a single device or different devices in a PCIe domain, addressing relies on a hierarchical scheme using Bus, Device, and Function (BDF) numbers that are uniquely assigned during enumeration by the Root Complex.
For MemRd/MemWr TLPs, Endpoint and Root Complex memories are uniquely addressed within the same address space, regardless of whether the TLP travels downstream, upstream, or peer-to-peer (P2P).
In distributed systems (here assuming different PCIe hierarchies/partitions), PCIe has isolated address domains. PCIe component vendors use proprietary implementations to translate and map shared windows/BARs between address spaces, enabling communication and resource sharing while keeping independent enumeration and address isolation intact. The addressing mechanism adapts from straightforward (same device/domain) to more complex, translation-based (distributed multi-host/root), but always ensures endpoint uniqueness and domain security.
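To make the BDF portion of the answer concrete, here is a minimal sketch of the standard bit packing used in ID-routed TLP Requester/Completer IDs: 8 bits of bus, 5 bits of device, and 3 bits of function (with ARI enabled, the device and function fields merge into an 8-bit function number). The helper names and example values are just for illustration.

```python
# Minimal sketch: how a Bus/Device/Function triple packs into the 16-bit ID used
# for ID-routed TLPs (Requester/Completer ID): bus[15:8], device[7:3], function[2:0].
def encode_bdf(bus: int, device: int, function: int) -> int:
    assert 0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7
    return (bus << 8) | (device << 3) | function

def decode_bdf(bdf: int) -> tuple[int, int, int]:
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7

# Example: bus 0x3b, device 0, function 1 -> 0x3b01
print(hex(encode_bdf(0x3B, 0x00, 0x1)))   # 0x3b01
print(decode_bdf(0x3B01))                 # (59, 0, 1)
```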
Q: Does Scalable IOV require any changes to PCIe?
A: Originally, SIOV required the PASID prefix and a DVSEC structure to be utilized; it now has a dedicated extended capability structure in PCIe spec 6.4 and 7.0.
Q: In SR-IOV, is the hypervisor a generic system driver, or is that a vendor provided driver?
A: SR-IOV exposes an individual Configuration Space and BARs for each enabled VF, so from a PCIe perspective it is standard and doesn’t need a vendor-provided driver. The PF- or VF-specific vendor functions, however, need a device-specific driver, which is the industry norm.
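For readers who want to try this, here is a minimal Python sketch of the standard Linux sysfs interface for SR-IOV (sriov_totalvfs / sriov_numvfs), assuming a PF whose driver supports it and root privileges; the example BDF is hypothetical.

```python
# Minimal sketch (Linux, sysfs): read how many VFs a PF supports and enable them.
# Assumes an SR-IOV-capable device whose PF driver supports the standard
# sriov_numvfs interface; writing it usually requires root.
import sys

def enable_vfs(pf_bdf: str, count: int) -> None:
    base = f"/sys/bus/pci/devices/{pf_bdf}"
    with open(f"{base}/sriov_totalvfs") as f:
        total = int(f.read())
    if count > total:
        raise ValueError(f"PF {pf_bdf} supports at most {total} VFs")
    # Writing 0 first is the usual way to change an already-enabled VF count.
    with open(f"{base}/sriov_numvfs", "w") as f:
        f.write("0")
    with open(f"{base}/sriov_numvfs", "w") as f:
        f.write(str(count))

if __name__ == "__main__":
    enable_vfs(sys.argv[1], int(sys.argv[2]))   # e.g. 0000:3b:00.0 4 (hypothetical)
```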
Q: Does PCIe spec have a defined register available that indicates the current BER rate? Or would that be vendor specific?
A: The PCIe spec doesn’t define such a register. Component vendors may implement their own error counters.
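As a rough illustration of what is available in practice: Linux kernels with AER statistics expose per-device error counters in sysfs (aer_dev_correctable, aer_dev_nonfatal, aer_dev_fatal), which can serve as a coarse link-health proxy even though they are not a BER register. The sketch below assumes such a kernel and a hypothetical device address.

```python
# Minimal sketch (Linux, sysfs): there is no spec-defined BER register, but kernels
# with AER statistics expose per-device error counters that can serve as a rough
# link-health proxy. Assumes AER support is enabled for the device.
import os
import sys

def read_aer_counters(bdf: str) -> dict:
    base = f"/sys/bus/pci/devices/{bdf}"
    counters = {}
    for name in ("aer_dev_correctable", "aer_dev_nonfatal", "aer_dev_fatal"):
        path = os.path.join(base, name)
        if not os.path.exists(path):
            continue  # AER stats not exposed for this device
        with open(path) as f:
            for line in f:
                key, _, value = line.partition(" ")
                counters[f"{name}:{key}"] = int(value)
    return counters

if __name__ == "__main__":
    for k, v in read_aer_counters(sys.argv[1]).items():   # e.g. 0000:3b:00.0 (hypothetical)
        print(f"{k} = {v}")
```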
Q: Is SR-IOV standardized or vendor specific?
A: SR-IOV is a formal industry standard defined in the PCIe spec, not a vendor-specific technology. This allows for broad compatibility and interoperability in virtualized environments.
Q: Any work in progress on PCIe over optical interface and PCIe over Ethernet?
A: There is a newly formed committee, the PCI-SIG Optical Working Group, to address PCIe-over-optical efforts. Ethernet PHY-based PCIe is possible in future revisions, but likely not until at least Gen8.
Q: Any ongoing projects or working groups on using PCIe for composable infrastructure?
A: No, there is no such formal PCI-SIG working group for composable infrastructure.
Q: Any plans to standardize NTB for composable infrastructure?
A: There are ongoing discussions and research within the industry on the standardization of PCIe Non-Transparent Bridge (NTB) technology for composable infrastructure. However, a universal, ratified standard for NTB targeting composable infrastructure is not yet in place. Industry efforts toward standardization are visible but not yet complete or formalized.
The “Everything You Wanted to Know But Were Too Proud to Ask” series is a 15-part webinar series with vendor-neutral, expert explanations on a wide range of technologies. Watch them all at our SNIAVideo YouTube Playlist.