RDMA development and testing traditionally require expensive InfiniBand or RoCE hardware, limiting CI coverage and restricting developer access to a handful of lab machines. Soft-RoCE avoids the hardware dependency but runs entirely inside the guest kernel, offering no visibility into device-level behavior and no path to real-HCA passthrough.
This talk presents a different approach: a standalone userspace server that emulates a complete PCIe RDMA NIC via the vfio-user protocol. Built on Nutanix's open-source libvfio-user library, the server exposes PCI BARs to an unmodified QEMU/KVM guest, which loads a companion kernel driver and a custom rdma-core provider. Applications in the guest see a standard InfiniBand device and drive it through ordinary libibverbs calls such as ibv_post_send and ibv_poll_cq, and through standard tools like perftest and iperf3 -- with no guest-side awareness that the device is emulated.
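To make the guest-side picture concrete: the application data path is plain verbs code with nothing device-specific in it. A minimal sketch that posts one send and polls for its completion (QP and memory-region setup elided; the helper name and error handling are illustrative, not taken from the project's code):

```c
/* Post one signaled send WR and busy-poll the CQ for its completion.
 * Ordinary libibverbs usage -- the same calls an application would make
 * against a real HCA. Illustrative sketch only. */
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

static int send_one(struct ibv_qp *qp, struct ibv_cq *cq,
                    struct ibv_mr *mr, void *buf, size_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_wr = NULL;

    if (ibv_post_send(qp, &wr, &bad_wr))   /* same entry point as on real hardware */
        return -1;

    struct ibv_wc wc;
    int n;
    do {                                   /* busy-poll the completion queue */
        n = ibv_poll_cq(cq, 1, &wc);
    } while (n == 0);

    return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -1 : 0;
}
```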
The server supports pluggable backends: a loopback backend for single-process CI, a TCP mesh backend that connects multiple server instances into a virtual RDMA fabric, and a native verbs backend that proxies operations to a real HCA. On kernels 6.14 and later, the data path uses shared-memory ring buffers with atomic producer/consumer indices, bypassing the write()-based kernel uverbs path entirely.
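The ring-buffer idea, reduced to its essentials, is a single-producer/single-consumer queue in shared memory whose indices are updated with acquire/release atomics. The layout, slot type, and names below are illustrative assumptions rather than the server's actual ABI:

```c
/* Illustrative SPSC ring over shared memory using C11 atomics for the
 * producer/consumer indices. Not the server's real layout. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 256              /* power of two so wraparound arithmetic works */

struct wqe { uint8_t bytes[64]; };  /* placeholder work-queue entry */

struct ring {
    _Atomic uint32_t prod;          /* advanced by the guest-side producer */
    _Atomic uint32_t cons;          /* advanced by the server-side consumer */
    struct wqe slots[RING_SLOTS];
};

static bool ring_push(struct ring *r, const struct wqe *e)
{
    uint32_t p = atomic_load_explicit(&r->prod, memory_order_relaxed);
    uint32_t c = atomic_load_explicit(&r->cons, memory_order_acquire);

    if (p - c == RING_SLOTS)        /* ring full */
        return false;

    r->slots[p % RING_SLOTS] = *e;
    /* release: slot contents must be visible before the new producer index */
    atomic_store_explicit(&r->prod, p + 1, memory_order_release);
    return true;
}

static bool ring_pop(struct ring *r, struct wqe *out)
{
    uint32_t c = atomic_load_explicit(&r->cons, memory_order_relaxed);
    uint32_t p = atomic_load_explicit(&r->prod, memory_order_acquire);

    if (p == c)                     /* ring empty */
        return false;

    *out = r->slots[c % RING_SLOTS];
    atomic_store_explicit(&r->cons, c + 1, memory_order_release);
    return true;
}
```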
We share concrete performance results -- up to 2 GB/s bandwidth, sub-millisecond latency, bidirectional traffic, and a 24/24 stress-test pass rate -- along with the architectural trade-offs, debugging war stories, and the DMA-BUF plumbing that enables a GPUDirect RDMA path through the emulated device.
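For reference, the guest-visible half of that DMA-BUF path is the standard rdma-core entry point ibv_reg_dmabuf_mr(). A hypothetical registration helper, with acquisition of the dma-buf fd from the GPU stack elided:

```c
/* Sketch of registering a dma-buf-backed memory region through
 * ibv_reg_dmabuf_mr(). The fd, offset, and length are placeholders; how
 * the dma-buf fd is exported by the GPU stack is outside this snippet. */
#include <infiniband/verbs.h>

struct ibv_mr *register_dmabuf(struct ibv_pd *pd, int dmabuf_fd,
                               uint64_t offset, size_t len)
{
    /* iova == offset keeps the addressing trivial for this illustration */
    return ibv_reg_dmabuf_mr(pd, offset, len, offset, dmabuf_fd,
                             IBV_ACCESS_LOCAL_WRITE |
                             IBV_ACCESS_REMOTE_READ |
                             IBV_ACCESS_REMOTE_WRITE);
}
```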