Abstract
Disaggregating storage from compute lets cloud builders maximize resource utilization, improve flexibility, and move from an inefficient direct-attached SSD model to a shared storage model in which compute and storage scale independently. Lightbits’ LightOS is a software-defined, disaggregated storage solution based on NVMe/TCP that offers increased performance, reduced latency, and useful data services. A cluster of LightOS servers replicates data internally and keeps it fully consistent, durable, and available in the presence of failures. Data replication is transparent and server failover is seamless. Cloud builders do not need to install any client-side drivers, because everything is done through standard NVMe/TCP. In this talk, we explain how to build cloud-native, standards-based clustered storage with NVMe/TCP, introduce LightOS clustering, and discuss some of the challenges we encountered while building it.