Blending Objects and Files in Google Cloud Storage
Cloud object storage systems were built for simple storage workloads, sacrificing traditional POSIX semantics for simplicity and scalability. With AI and analytics workloads migrating to hyperscale cloud computing, object storage users are increasingly requesting file-oriented access to their data. In this presentation, we will discuss how Google Cloud Storage has approached bridging the gap between object-oriented and file-oriented storage to provide the optimal foundation for a scalable and performant data lakehouse. This foundation has been validated for both GPUs and TPUs at a scale of more than 60,000 accelerators, within both the PyTorch and JAX frameworks.
Google Cloud Storage has recently launched several capabilities that bring file-oriented features to object storage at cloud scale:
* Hierarchical Namespace for directory-style bucket structure and access patterns, including atomic rename of folders and faster I/O ramping.
* Managed Folders allow access control grants to be managed at the folder/prefix level.
* Rapid Storage brings stateful handle-based filesystem protocols to object storage, along with sub-millisecond latencies for random reads and writes and low-latency appends.
* Cloud Storage FUSE implements the FUSE filesystem interface, allowing file-oriented clients to view Cloud Storage buckets as a filesystem.
* A gRPC endpoint for Cloud Storage enables new features, including binary transfers, HTTP/2 bidirectional streaming, and dynamic routing.
We will present the technical details of each of these features and how they work together to bring file-like performance and semantics to Cloud Storage.
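To illustrate why the atomic folder rename in the Hierarchical Namespace item matters, the following is a minimal sketch using a toy in-memory model. It is not the Cloud Storage API: the classes `FlatBucket` and `HierarchicalBucket` and their methods are hypothetical names invented for this illustration. It assumes that in a flat namespace a "folder" is only a shared name prefix, so renaming it means rewriting every object under that prefix, whereas a hierarchical namespace can move a folder by re-linking a single tree node.

```python
# Toy in-memory model for illustration only; none of these names come from
# the Cloud Storage client libraries.

class FlatBucket:
    """Flat namespace: object names are opaque keys, folders are just prefixes."""

    def __init__(self):
        self.objects = {}  # full object name -> bytes

    def rename_prefix(self, old, new):
        # Each object under the prefix is rewritten individually, so the cost
        # grows with the object count and an interruption partway through
        # would leave objects split between the old and new prefixes.
        ops = 0
        for name in [n for n in self.objects if n.startswith(old)]:
            self.objects[new + name[len(old):]] = self.objects.pop(name)
            ops += 1
        return ops


class HierarchicalBucket:
    """Hierarchical namespace: folders are first-class nodes in a tree."""

    def __init__(self):
        self.root = {}  # folder name -> dict, object name -> bytes

    def _walk(self, parts, create=False):
        node = self.root
        for part in parts:
            node = node.setdefault(part, {}) if create else node[part]
        return node

    def put(self, path, data):
        *folders, leaf = path.split("/")
        self._walk(folders, create=True)[leaf] = data

    def get(self, path):
        *folders, leaf = path.split("/")
        return self._walk(folders)[leaf]

    def rename_folder(self, old, new):
        # One metadata operation: detach the subtree and attach it under the
        # new name. Readers see either the old path or the new one.
        *old_parent, old_name = old.strip("/").split("/")
        *new_parent, new_name = new.strip("/").split("/")
        subtree = self._walk(old_parent).pop(old_name)
        self._walk(new_parent, create=True)[new_name] = subtree
        return 1


flat = FlatBucket()
hns = HierarchicalBucket()
for i in range(1000):
    flat.objects[f"runs/tmp/ckpt-{i}"] = b"x"
    hns.put(f"runs/tmp/ckpt-{i}", b"x")

flat_ops = flat.rename_prefix("runs/tmp/", "runs/final/")  # 1000 per-object rewrites
hns_ops = hns.rename_folder("runs/tmp", "runs/final")      # 1 folder-level operation
```

In this toy model, publishing a directory of 1,000 checkpoint files costs 1,000 operations in the flat bucket and one in the hierarchical bucket. This is the kind of pattern (write to a temporary directory, then rename it into place) that file-oriented AI and analytics clients commonly rely on.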