AutoStream: Automatic Stream Management for Multi-stream SSDs in Big Data Era

Publish Date: Monday, September 11, 2017
Abstract: 

This presentation discusses an algorithm that uses the recently standardized multi-stream feature in SCSI and NVMe to improve performance and latency for big data applications. Multi-stream SSDs, already specified in the NVMe and SCSI T10 standards, can isolate data with different lifetimes into separate erase blocks, which reduces garbage collection overhead and improves overall SSD performance and latency. Currently, applications are responsible for stream management, such as data-to-stream mapping. This requires application and system modifications, and users deploying the solution need to be able to identify the streams for their workload individually. As big data systems transition to micro-services and container/Docker environments, multiple applications run on a single system, and stream management becomes much more complicated because a device supports only a limited number of streams. For example, allocating streams to multiple applications or sharing streams across applications causes additional overhead.

To address these issues and reduce the overhead of stream management, especially for big data systems, we present automatic stream management algorithms that operate below the application layer. Our stream management algorithm, called AutoStream, is based on runtime workload detection and is independent of the application(s). We implemented our AutoStream prototype in the Linux NVMe device driver, and our performance evaluation with big data applications shows up to a 40% improvement in throughput and up to a 40% reduction in tail latency.
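
To make the idea of application-independent, runtime data-to-stream mapping more concrete, the following is a minimal sketch in Python. It is not the AutoStream algorithm presented in the talk, and it is not driver code: it assumes a simple stand-in heuristic in which the logical address space is split into fixed-size chunks, each chunk's write count is tracked and periodically halved, and frequently written chunks are assigned higher stream indices. The class name, parameters, and thresholds are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ChunkStreamMapper:
    """Toy data-to-stream mapper (hypothetical, for illustration only).

    Chunks that are written more often are assumed to hold shorter-lived
    data and are mapped to higher stream indices.
    """

    def __init__(self, num_streams=4, chunk_size_blocks=2048, decay_interval=1000):
        self.num_streams = num_streams              # streams the device is assumed to offer
        self.chunk_size_blocks = chunk_size_blocks  # logical blocks per tracked chunk
        self.decay_interval = decay_interval        # halve all counts every N writes
        self.write_counts = defaultdict(int)
        self.writes_seen = 0

    def stream_for_write(self, lba):
        """Record a write to `lba` and return the stream index to tag it with."""
        chunk = lba // self.chunk_size_blocks
        self.write_counts[chunk] += 1
        self.writes_seen += 1

        # Logarithmic buckets: count 1 -> 0, 2-3 -> 1, 4-7 -> 2, 8-15 -> 3, ...
        level = self.write_counts[chunk].bit_length() - 1
        stream = min(level, self.num_streams - 1)

        # Age the counters so chunks that stop being written cool down again.
        if self.writes_seen % self.decay_interval == 0:
            self._decay()
        return stream

    def _decay(self):
        for chunk in list(self.write_counts):
            self.write_counts[chunk] //= 2
            if self.write_counts[chunk] == 0:
                del self.write_counts[chunk]


mapper = ChunkStreamMapper(num_streams=4, chunk_size_blocks=100)
hot = [mapper.stream_for_write(10) for _ in range(16)]  # same chunk, written repeatedly
cold = mapper.stream_for_write(550)                     # a chunk written only once
print(hot[0], hot[-1], cold)
```

Because the mapper only observes write addresses, it needs no hints from the application, which is the property the abstract emphasizes. In a real system the returned index would still have to be translated into whatever stream identifiers the device and driver expose.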
