Characterizing and Emulating FDP SSDs with WARP

Abstract

Flexible Data Placement (FDP) is the NVMe interface that hyperscalers such as Google and Meta have championed to reduce write amplification without the invasive application changes that OpenChannel and ZNS required. Major vendors are now shipping FDP-enabled SSDs. But FDP is a best-effort interface, not a guarantee: in our measurements, an adversarial three-stream workload with FDP enabled reaches a write amplification factor (WAF) of 4.49 on one commercial drive and 2.58 on another, and the causes lie in vendor-specific firmware policy that the host cannot see.
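For readers new to the metric, WAF is the ratio of bytes the SSD writes to NAND to bytes the host wrote. The sketch below shows the arithmetic; the function name and the counter values are illustrative assumptions, not tooling or raw data from the study.

```python
def write_amplification(host_bytes: int, nand_bytes: int) -> float:
    """Return WAF = NAND bytes written / host bytes written."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return nand_bytes / host_bytes


# Hypothetical counters: the host wrote 100 GiB and the device wrote 449 GiB to NAND.
GIB = 1 << 30
waf = write_amplification(100 * GIB, 449 * GIB)
print(f"WAF = {waf:.2f}")
```

A WAF of 1.0 means the device wrote nothing beyond what the host asked for; anything above that is extra NAND traffic, typically from garbage collection.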

This talk closes that visibility gap. Drawing on the first cross-device, cross-workload study of commercial FDP SSDs (two PCIe Gen5, NVMe 2.1 drives from different vendors, evaluated across synthetic microbenchmarks, CacheLib production traces from Meta, and F2FS filesystem workloads), we show when FDP keeps WAF near 1 and when it fails.

We identify two previously unreported behaviors. Noisy RUH: invalidations concentrated in one reclaim unit handle (RUH) inflate write amplification across other handles, breaking the isolation FDP is meant to provide. We observe this pattern on both commercial devices. Save Sequential: firmware GC heuristics can prematurely reclaim long sequential streams, so even capacity-dominant sequential traffic can end up as the largest contributor to WAF. Together, these effects show how FDP's benefits can erode even when host classification looks correct.

We also report a result relevant to F2FS users: in our ten-hour Fileserver runs, 99% of user data writes were tagged WARM and funneled into a single RUH, collapsing FDP back to conventional SSD behavior. F2FS does separate node and metadata segments into different RUHs, but this separation alone is insufficient: user data needs finer-grained classification for FDP's benefit to materialize. For CacheLib, the picture is different: FDP holds WAF near 1.0 without degrading hit ratio, and a simple small-RU optimization further reduces CacheLib's WAF from 1.37 to 1.16 at 40% SOC.
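The collapse into a single RUH can be seen by tallying write bytes per handle. The sketch below is a toy illustration under stated assumptions: the tag-to-RUH mapping and the byte counts are made up, and the real assignment depends on the kernel and device configuration.

```python
from collections import Counter

# Hypothetical mapping from temperature tags to reclaim unit handles.
TAG_TO_RUH = {"HOT": 0, "WARM": 1, "COLD": 2}


def ruh_write_share(writes):
    """Given (tag, bytes) pairs, return each RUH's fraction of total bytes written."""
    per_ruh = Counter()
    for tag, nbytes in writes:
        per_ruh[TAG_TO_RUH[tag]] += nbytes
    total = sum(per_ruh.values())
    if total == 0:
        return {}
    return {ruh: nbytes / total for ruh, nbytes in per_ruh.items()}


# Made-up workload in which 99% of user data bytes carry the WARM tag.
writes = [("WARM", 990), ("HOT", 5), ("COLD", 5)]
shares = ruh_write_share(writes)
dominant_ruh = max(shares, key=shares.get)
print(f"RUH {dominant_ruh} receives {shares[dominant_ruh]:.0%} of user data bytes")
```

When one handle absorbs nearly all bytes, data with different lifetimes shares the same reclaim units, which is the situation FDP is meant to avoid.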

To make these effects reproducible and to explore policies real firmware does not expose, we built WARP (Write Amplification Research Platform), the first open FDP emulator. WARP reproduces the WAF trends of both commercial devices while exposing configurable policies hidden in real hardware: Initially Isolated (II) vs. Persistently Isolated (PI) semantics, reclaim unit (RU) size, over-provisioning (OP) ratio, RUH count, and GC policy. Using WARP, we map the II vs. PI tradeoff and show that PI outperforms II only above a device-dependent OP threshold (~7–9% for 256 MB RUs); below that threshold, II is more resilient under limited slack. WARP is upstreamed to FEMU and available for community use.
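The policy knobs listed above can be pictured as a small configuration record. This is a hypothetical sketch, not WARP's or FEMU's actual interface: the class, the field names, the "greedy" policy name, and the helper are all assumptions, and the helper only encodes the reported ~7-9% OP band for 256 MB RUs as a rule of thumb.

```python
from dataclasses import dataclass

MIB = 1 << 20  # treating "256 MB" as 256 MiB is an assumption


@dataclass(frozen=True)
class EmulatorConfig:
    """Hypothetical knob set mirroring the policies named above (not WARP's API)."""
    isolation: str      # "II" (Initially Isolated) or "PI" (Persistently Isolated)
    ru_size_bytes: int  # reclaim unit size
    op_ratio: float     # over-provisioning as a fraction, e.g. 0.07 for 7%
    ruh_count: int      # number of reclaim unit handles
    gc_policy: str      # e.g. "greedy"; the policy name is an assumption

    def __post_init__(self):
        if self.isolation not in ("II", "PI"):
            raise ValueError("isolation must be 'II' or 'PI'")
        if not 0.0 <= self.op_ratio < 1.0:
            raise ValueError("op_ratio must be in [0, 1)")
        if self.ru_size_bytes <= 0 or self.ruh_count <= 0:
            raise ValueError("ru_size_bytes and ruh_count must be positive")


def suggested_isolation(op_ratio: float, low: float = 0.07, high: float = 0.09) -> str:
    """Rule of thumb from the reported ~7-9% OP threshold for 256 MB RUs.

    Below the band II is reported as more resilient, above it PI is reported
    to win, and inside the band the answer is device-dependent.
    """
    if op_ratio < low:
        return "II"
    if op_ratio > high:
        return "PI"
    return "device-dependent"


# Example: a 12% OP configuration sits above the reported band, so PI is suggested.
cfg = EmulatorConfig(
    isolation=suggested_isolation(0.12),
    ru_size_bytes=256 * MIB,
    op_ratio=0.12,
    ruh_count=8,
    gc_policy="greedy",
)
print(cfg)
```

The threshold band is specific to the RU size studied, so the defaults should be re-derived for other RU sizes rather than reused.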

This work was done in collaboration with Samsung Electronics and Western Digital. Attendees working on flash caches, filesystems, and SSD firmware will leave with concrete, measurement-backed guidance on when FDP helps, when it does not, and how configuration choices shape the outcome.