Abstract
Serverless computing is becoming increasingly popular, enabling users to quickly launch thousands of short-lived tasks in the cloud with high elasticity and fine-grain billing. While these properties make serverless computing appealing for interactive data analytics, a key challenge is managing intermediate data shared between tasks. Since communicating directly between short-lived serverless tasks is difficult, the natural approach is to store such ephemeral data in a common remote data store. However, existing storage systems are not designed to meet the demands of serverless applications in terms of elasticity, performance and cost. We present Pocket, a distributed data store that elastically scales to automatically provide applications with desired performance at low cost. Pocket dynamically rightsizes storage cluster resource allocations across multiple dimensions (storage capacity, network bandwidth and CPU cores) as application load varies. We show that Pocket cost-effectively rightsizes the type and number of resources such that applications are not bottlenecked on I/O.

Learning Objectives:
1. We identify the key characteristics of ephemeral data in serverless analytics and synthesize requirements for storage platforms used for sharing such data among serverless tasks.
2. We introduce Pocket, a storage platform targeting the important use case of efficient data sharing in serverless analytics workloads.
3. We present an evaluation of Pocket on AWS Lambda for serverless analytics workloads, demonstrating its effectiveness in terms of performance and resource utilization.
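
To make the idea of rightsizing across several dimensions concrete, the following is a minimal illustrative sketch and not Pocket's actual policy or API. It assumes a hypothetical set of storage node types, each with a fixed capacity, network bandwidth, CPU core count and hourly price, and a hypothetical per-application demand estimate. For each node type it computes how many nodes are needed so that no single dimension is the bottleneck, then picks the cheapest option. All names (`NodeType`, `Demand`, `nodes_needed`, `rightsize`) and all numbers are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """Hypothetical storage node type; all figures are illustrative."""
    name: str
    capacity_gb: float
    bandwidth_gbps: float
    cpu_cores: int
    hourly_cost: float


@dataclass(frozen=True)
class Demand:
    """Hypothetical estimate of what an application needs from storage."""
    capacity_gb: float
    bandwidth_gbps: float
    cpu_cores: int


def nodes_needed(demand: Demand, node: NodeType) -> int:
    """Smallest node count that satisfies every dimension at once.

    The most constrained dimension determines the count, so the
    allocation is not bottlenecked on capacity, bandwidth, or CPU.
    """
    return max(
        1,
        math.ceil(demand.capacity_gb / node.capacity_gb),
        math.ceil(demand.bandwidth_gbps / node.bandwidth_gbps),
        math.ceil(demand.cpu_cores / node.cpu_cores),
    )


def rightsize(demand: Demand, node_types: list[NodeType]) -> tuple[NodeType, int]:
    """Pick the node type and count with the lowest total hourly cost."""
    best = min(node_types, key=lambda n: nodes_needed(demand, n) * n.hourly_cost)
    return best, nodes_needed(demand, best)


# Illustrative node types (assumed values, not measured ones).
DRAM = NodeType("dram", capacity_gb=60, bandwidth_gbps=10, cpu_cores=8, hourly_cost=0.5)
FLASH = NodeType("flash", capacity_gb=2000, bandwidth_gbps=10, cpu_cores=8, hourly_cost=0.6)
```

Under these assumed numbers, a capacity-heavy job would be steered toward the larger, cheaper-per-gigabyte node type, while a small job would fit on a single node of the cheaper type. Re-running `rightsize` as the demand estimate changes is one simple way to picture an allocation that tracks application load.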