Apache Ozone - Balancing and Deleting Data At Scale

Publish Date: Wednesday, September 29, 2021
Abstract: 

Apache Ozone is an object store that scales to tens of billions of objects, hundreds of petabytes of data, and thousands of datanodes. Ozone supports not only high-throughput data ingestion but also high-throughput deletion, with performance similar to HDFS. At massive scale, data can become non-uniformly distributed because of the addition of new datanodes, deletion of data, and similar events. Non-uniform distribution can lower resource utilisation and reduce the overall throughput of the cluster. This talk discusses the balancer service in Ozone, which is responsible for distributing data uniformly across the cluster. It covers the service design and how the service improves upon the HDFS balancer. The talk also discusses how Ozone deletion matches the HDFS deletion performance of 1 million keys per hour while scaling much further. A simple design and asynchronous operations enable Ozone to achieve this deletion scale. The talk dives deeper into the design and performance enhancements.

  • Ozone architecture
  • How Ozone deletion design helps achieve high throughput
  • Balancer design considerations and performance
