Abstract
Data architectures that are more than a decade old are no longer sufficient for today’s data-driven businesses, which depend heavily on AI/ML/DL. As enterprises begin to operationalize these AI/ML workflows, they need to optimize storage I/O performance to feed massively parallel GPU-based compute. With a growing IoT footprint, data management and AI/ML compute challenges span from the edge to the core and to the cloud.
In this talk, we make the case for a modern data engineering and management pipeline to address these challenges. Specifically, we discuss how some existing data engineering workflows need to be re-thought, including dynamic data indexing and access-pattern-aware data layout. The talk also covers other emerging data engineering challenges, such as data reduction and data quality assessment, with a specific focus on edge/core vs. cloud, and highlights ongoing research that addresses these challenges.
Learning Objectives:
1. A modern data engineering and management pipeline spanning from edge-to-core-to-cloud
2. A re-think of existing storage-system services, such as data indexing and data layout, in the context of the new-age data engineering pipeline
3. Emerging data engineering challenges including data reduction and data quality assessment