2018 SDC India Abstracts

Main Stage Abstracts

Storage and Data Management Trends - 2018 and Beyond

Yogesh Anyapanawar

Abstract

With the ever-growing data churn, companies need newer ways to utilize their storage more efficiently than ever. Digital transformation is here to stay, bringing with it huge volumes of digital data, big-data challenges, and the need for an integrated data management and cloud strategy that can support the scale of the business. Together, these call for a strategy that plans, integrates and manages the storage infrastructure to be ‘transformation ready’. The talk will touch upon crucial factors in storage and data management, ranging from datacenter elements and data movement across the technology stack to backup, archival and business continuity, carefully highlighting the challenges and opportunities at hand. The talk will also summarize the most significant trends registered so far in traditional as well as software-defined storage, and sketch out what we should expect in 2018 and beyond.


TRACK A ABSTRACTS DAY ONE

 

Walking the PMEM Talk

Priya Sehgal

Abstract

Over the past few years, the SNIA NVM TWG has put considerable effort into standardizing the persistent memory programming model. More recently, we have seen a lot of development in support of persistent memory across the Linux and Windows kernels. Moreover, the Persistent Memory Development Kit (PMDK) from pmem.io provides a gamut of libraries built on top of the Direct Access (DAX) feature available on Linux and Windows, allowing applications to directly load/store to persistent memory by memory-mapping files on a persistent-memory file system. This talk will cover some of the advancements made in the Linux kernel and a few examples of how to use them from user space through PMDK (using libpmem and libpmemlog); a minimal libpmem sketch follows the list below. Additions to the Linux kernel that will be covered include:
1. Huge Pages
2. Libnvdimm subsystem - supports three types of NVDIMMs: PMEM, BLK, and devices that support both PMEM and BLK access modes simultaneously.
3. Device-DAX 
4. DAX-based file systems
5. ZUFS – Zero Copy User mode File System
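
To make the PMDK programming style concrete, here is a minimal sketch using libpmem (the path /mnt/pmem/hello and a DAX-mounted filesystem are assumptions; build with "cc example.c -lpmem"):

    #include <stdio.h>
    #include <string.h>
    #include <libpmem.h>

    int main(void) {
        size_t mapped_len;
        int is_pmem;

        /* Create (if needed) and memory-map a 4 KiB file on a DAX-capable fs. */
        char *addr = pmem_map_file("/mnt/pmem/hello", 4096, PMEM_FILE_CREATE,
                                   0666, &mapped_len, &is_pmem);
        if (addr == NULL) { perror("pmem_map_file"); return 1; }

        strcpy(addr, "hello, persistent world");

        /* Flush the stores from CPU caches to the persistence domain. */
        if (is_pmem)
            pmem_persist(addr, mapped_len);   /* user-space flush, no syscall */
        else
            pmem_msync(addr, mapped_len);     /* fall back to msync(2) */

        pmem_unmap(addr, mapped_len);
        return 0;
    }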

Learning Objectives

  • Additions and changes in the Linux kernel made with persistent memory in mind
  • How to use the PMDK (pmem.io) libraries to develop user-space applications on top of persistent memory

 


Case Study of NVMe performance on top-of-the-line x86 and ARM64 Servers

Vikas Aggarwal

Abstract

A comparative study was conducted on multiple dimensions of parallel I/O streams using user-space (SPDK) as well as Linux kernel-space NVMe drivers. We compared IOPS, latency and throughput for sequential and random reads/writes on ARM64 (ThunderX2) and x86 (Intel Skylake) storage servers using the latest NVMe PCIe SSDs. We also studied the effect of varying the I/O queue depth and the number of CPU cores. Submission, completion, peak and average latencies are compared for the ARM64 and x86 platforms. The purpose of the case study is to provide information on the current state of SPDK NVMe and Linux kernel NVMe driver performance, which can then be used by designers to architect their products.
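
The three metrics such a study compares are linked by Little's Law: outstanding I/Os (queue depth) = IOPS × average latency. A small sketch (with an illustrative, assumed device latency) shows the IOPS ceiling implied by each queue depth:

    #include <stdio.h>

    int main(void) {
        double avg_latency_us = 90.0;   /* hypothetical per-I/O latency */
        for (int qd = 1; qd <= 128; qd *= 2) {
            /* Little's Law: IOPS = queue depth / average latency */
            double iops = qd / (avg_latency_us / 1e6);
            printf("QD %3d -> ~%.0f IOPS at %.0f us latency\n",
                   qd, iops, avg_latency_us);
        }
        return 0;
    }

In practice latency itself grows with queue depth once the device saturates, which is exactly the behaviour such a benchmark sweep measures.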

Learning Objectives

  • Demonstration of ARM64-based NVMe storage servers standing at par with x86
  • Sharing of NVMe IOPS, latency and throughput data from Cavium storage labs
  • Insight into performance scalability

Manage flash storage efficiently in a multi-tenant cloud environment using QoS and adaptive throttling

Ketan Dinesh Mahajan

Abstract

In a multi-tier virtualized storage environment, there is continuous movement of data from higher tiers (flash) to lower tiers. If this movement is slow, higher tiers may run out of space due to high workload ingest rates. Especially in a multi-tenant cloud environment, the high-ingest behaviour of certain workloads may undesirably affect other, higher-priority workloads.
In this presentation, we will detail how, using storage QoS, the total ingest rate of all workloads can be made proportional to the residual space on the higher tier, with the remaining IOPS capacity reserved for flushing. Even among workloads, capacity may be distributed based on their space usage in the higher tier. We will present results showing that this approach reduces the application ingest rate continuously; at equilibrium the ingest rate matches the flush rate and space usage stays constant. Workloads with heavy ingest are throttled faster, while bursty workloads are given their fair share.
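
A simplified sketch of such an adaptive throttle (an assumed model for illustration, not the presented implementation): the ingest budget shrinks linearly with the space already consumed on the flash tier, reserving the rest of the tier's IOPS capacity for flush.

    #include <stdio.h>

    /* total_iops: tier IOPS capacity; used/capacity: space on the flash tier */
    static double allowed_ingest_iops(double total_iops,
                                      double used, double capacity) {
        double residual = (capacity - used) / capacity;  /* 1.0 = empty tier */
        if (residual < 0.0) residual = 0.0;
        return total_iops * residual;  /* the remainder is left for flush */
    }

    int main(void) {
        for (double used = 0; used <= 100; used += 25)
            printf("tier %3.0f%% full -> ingest budget %6.0f IOPS\n",
                   used, allowed_ingest_iops(100000, used, 100));
        return 0;
    }

As the tier fills up, the ingest budget falls until it meets the flush rate, which is the equilibrium described above.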

Learning Objectives

  • Patterns of I/Os on flash storage in a multi-tier storage environment and their impact on space usage.
  • Ways to apply QoS in a multi-tenant cloud environment.
  • The noisy-neighbour problem, its impact along various dimensions, and the solutions.

Using Machine Learning for Intelligent Storage Performance Anomaly Detection

Ramakrishna Vadla and Archana Chinnaiah

Abstract

Enterprise applications use huge amounts of data, which demands large-scale distributed storage subsystem deployments in data centers. Storage virtualization functionality brings additional complexity to storage management. The latest storage systems support performance data collection at high frequencies (seconds or minutes), which makes much more data available for performance analysis. The challenge with storage subsystems is finding performance bottlenecks, identifying their root cause and resolving them with a short turnaround. These bottlenecks cover a wide variety of issues such as inaccessible disks, I/O errors, port masking, volume errors and network congestion. Machine learning models (multivariate regression, time-series analysis and VAR models) help in proactively finding performance anomalies/bottlenecks and recovering from them intelligently. The performance metrics used in building the models include read/write (R/W) I/O rate, data rate, response time, cache hits, average data block size, port data rate, port-local node queue time, port protocol errors, port congestion index, etc. Based on the detected bottleneck, the storage system is updated with a corrective resolution or an alert message is sent to the storage administrator. We will share our experiences of using machine learning models for performance anomaly detection.
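
As a deliberately simple stand-in for the multivariate and VAR models the talk covers, the sketch below (with made-up response-time samples) flags a metric value as anomalous when it deviates from a learned baseline by more than three standard deviations:

    #include <stdio.h>
    #include <math.h>

    #define TRAIN 6   /* samples used to build the baseline */

    int main(void) {
        /* Hypothetical volume response times in ms, one sample per minute. */
        double rt[] = { 2.1, 2.3, 1.9, 2.2, 2.0, 2.4, 9.8, 2.1 };
        int n = sizeof(rt) / sizeof(rt[0]);

        /* Baseline mean and standard deviation from the training window. */
        double sum = 0, sq = 0;
        for (int i = 0; i < TRAIN; i++) { sum += rt[i]; sq += rt[i] * rt[i]; }
        double mean = sum / TRAIN;
        double sd = sqrt(sq / TRAIN - mean * mean);

        /* Flag later samples more than three standard deviations away. */
        for (int i = TRAIN; i < n; i++)
            if (fabs(rt[i] - mean) > 3 * sd)
                printf("sample %d (%.1f ms) looks anomalous\n", i, rt[i]);
        return 0;
    }

Real deployments correlate many metrics at once (the multivariate case), but the detect-against-a-learned-baseline principle is the same.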

Learning Objectives

  • Applying machine learning models to storage performance anomaly detection
  • Choosing the right feature set from a large number of storage performance metrics
  • Evaluating different ML models for different sets of performance bottlenecks

 


SNIA Session on Swordfish

Anand Nagarajan

Abstract

The SNIA’s Scalable Storage Management Technical Work Group (SSM TWG) has created and published an open industry standard specification for storage management that defines a customer centric interface for the purpose of managing storage and related data services. This specification builds on the DMTF’s Redfish specification using RESTful methods and JSON formatting. 

This presentation shows how Swordfish extends Redfish, details Swordfish concepts, and discusses the CSDL and JSON Schema formats and the OData protocol used for modelling resources.
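
Because Swordfish rides on Redfish's RESTful interface, any HTTPS client can talk to it. A minimal sketch (the address 192.0.2.1 and the credentials are placeholders) fetches the service root, from which Swordfish storage resources are discovered:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void) {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        /* The service root is defined by Redfish; Swordfish resources
           such as storage services hang off it. */
        curl_easy_setopt(curl, CURLOPT_URL, "https://192.0.2.1/redfish/v1/");
        curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:password"); /* placeholder */

        /* With no write callback set, the JSON body goes to stdout. */
        CURLcode rc = curl_easy_perform(curl);
        if (rc != CURLE_OK)
            fprintf(stderr, "GET failed: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 0;
    }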

Pre-conference learning

www.snia.org/forums/smi/swordfish

2017 presentation at SDC India


 

TRACK B ABSTRACTS DAY ONE

Ozone - Object Store in Apache Hadoop

Mukul Kumar Singh

Abstract

Ozone brings a new storage paradigm to Hadoop: object storage. It will co-exist with HDFS to provide file store and object store functionality in the same Hadoop cluster. Ozone also addresses the scalability and small-file problems of HDFS; users can store trillions of files in Ozone and access them as if they were on HDFS. Ozone plugs into existing Hadoop deployments seamlessly, and programs like Hive and Spark work without any modifications. This talk looks at the architecture, reliability and performance of Ozone. We will also explore the Hadoop Distributed Storage Layer, the block storage layer that makes this scaling possible, and how we plan to use it for scaling HDFS. We will demonstrate how to install an Ozone cluster; how to create volumes, buckets and keys; and how to run Hive and Spark against HDFS and Ozone file systems using federation, so that users don’t have to worry about where the data is actually stored. The Ozone SDK will also be covered. In other words, a full user primer on Ozone will be part of this talk.

Learning Objectives

  • Learn about Apache Hadoop
  • Future of Hadoop
  • Development of object storage

Pre-conference learning

SNIA Webcast: File vs Block vs Object Storage

 


Amalgamation of cognitive computing inside object storage for security compliance

Smita Raut

Abstract

Security compliance of unstructured data has become a ubiquitous business requirement, even more so with the upcoming GDPR regulation. Object store deployments continue to grow, hosting oceans of unstructured data. Understanding which object data falls into the compliance-governed category becomes vital, so that the required security compliance enforcement can be applied from the storage side.
 
On the other hand, there has been substantial progress in the field of cognitive computing, which allows deep analysis of unstructured data for pattern recognition, correlation, learning, etc. Cognitive computing over objects can help categorize objects for compliance and even tag them accordingly.
 
In this talk we present the architecture and design details of how cognitive computing can be embedded into object storage so that it proactively and autonomously applies machine learning techniques to objects, deriving metadata that helps categorize the objects as compliant or non-compliant. The object tags are then leveraged to enforce security compliance controls such as object retention, object encryption, object endurance, etc. The talk also presents how parallel computing can be used for object storage over a clustered filesystem to optimize the cognitive computing analysis.

Learning Objectives

  • The security requirements placed on object storage by compliance regimes like GDPR.
  • Basics of the Swift object storage architecture and cognitive computing.
  • Introduction to Object Storlet technology (based on the SNIA SDC keynote talk, Israel 2018).


Container Attached Storage (CAS) architecture for Stateful applications on containers

Umasankar Mukkara

Abstract

There is a huge need to containerize stateful applications in today's world of Docker and Kubernetes. These stateful applications need a storage architecture that is truly cloud native. Container Attached Storage, or CAS, is such an architecture for applications running in containers. In CAS, the storage software itself is containerized and hence gains the advantages of being a microservice. In the CAS architecture, each storage volume gets its own storage controller running completely in user space, attaining maximum agility and policy granularity.
 
The CAS architecture has gained tremendous traction through its reference implementation, OpenEBS. Through seamless integration with Kubernetes and associated tools for managing native disks on container orchestration platforms, CAS delivers a native hyperconverged solution for containers.
 
In this presentation, the author covers the need for the CAS architecture for containers and explains its architectural advantages in detail. The talk will also discuss how CAS solves the problem of storage scalability and management when the number of containers grows into the thousands.
 
Learning Objectives
  • Learn the requirements of storage for cloud native stateful applications
  • Learn the differences between DAS, NAS and CAS
  • Learn the CAS architecture and why it is truly cloud native


Data Center Networking - Existing challenges and new advances

Anupam Jagdish Chomal

Abstract

A typical datacenter architecture will be introduced along with the various bottlenecks it experiences. We will then look at newer datacenter designs and how they improve performance and throughput. Finally, we will look at relatively newly identified networking issues such as TCP Incast, TCP Outcast, etc.
 
We will start by looking at the typical core-access-edge architecture and compare it with other architectures like leaf-spine, fat-tree, etc. We will then look at the basic bottlenecks in a traditional datacenter, followed by new protocol-level enhancements such as DCTCP, A2DTCP and Google's BBR that aim to solve them.
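
As a small, concrete taste of how such enhancements surface to applications: on Linux, a congestion-control algorithm such as DCTCP can be selected per socket, provided the corresponding kernel module is available (a minimal sketch, not material from the talk):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Fails with ENOENT if the dctcp module is not loaded/built. */
        const char *cc = "dctcp";
        if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc)) < 0)
            perror("setsockopt(TCP_CONGESTION)");
        else
            printf("congestion control set to %s\n", cc);
        return 0;
    }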
 
We will also look at issues like TCP Incast, TCP Outcast and TCP Unfairness, and at solutions proposed to mitigate them.
 
Finally, we will take a look at Facebook's Open Compute Project and see how datacenter designs are being shared efficiently across our industry.
 
Learning Objectives
  • Understanding existing data center architectures and their performance bottlenecks
  • Learn about new data center architectures like leaf-and-spine and compare them with earlier designs
  • Learn how projects like DCTCP, A2DTCP, and Google's BBR address the basic bottlenecks in a typical data center



TRACK A ABSTRACTS DAY TWO

Data Centric Security

Srinivasan Narayanamurthy

Abstract

For decades, architecting data security solutions revolved around the idea of building a fortress around the data. This is called perimeter-centric security. 

However, perimeter-centric security is not relevant in this new age of computing where,
1. Data is exponentially increasing in terms of volume, velocity, and variety;
2. Users (including semi-trusted third-party contractors) are spread across the globe accessing data from home and on the road;
3. The infrastructure that accesses, stores and moves data includes mobile and other heterogeneous devices, the cloud, SaaS platforms, etc.;
4. Regulations that govern the data and hackers who steal the same data are both becoming increasingly sophisticated.
We are on the verge of witnessing a paradigm shift in data security architectures: a shift from perimeter-centricity to data-centricity.

This talk is aimed at introducing and discussing data-centric security in detail. Data-centric security is about securing data without artificial physical or infrastructure boundaries. That is, instead of securing the application (in-use), endpoint (at-rest) and network (in-motion) infrastructure that respectively uses, stores and moves data, data-centric security embeds security controls within the data itself.
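
A toy sketch of that idea (illustrative only, not a framework from the talk): the protected object carries its own controls, so protection travels with the data instead of living at a perimeter.

    #include <stdint.h>
    #include <stddef.h>

    /* A self-describing "data envelope": payload plus embedded controls. */
    struct data_envelope {
        uint8_t  cipher_id;      /* e.g. an identifier for AES-256-GCM */
        uint8_t  key_id[16];     /* reference to a key held by an external KMS */
        char     policy[256];    /* embedded access policy: who, when, where */
        uint8_t  auth_tag[16];   /* integrity tag over payload and policy */
        size_t   payload_len;
        uint8_t  payload[];      /* ciphertext: useless unless the policy
                                    check passes and the KMS releases the key */
    };

Whoever obtains the envelope, whether on a laptop, in a SaaS platform or in transit, gets nothing without satisfying the embedded policy, which is the essence of the data-centric model.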

Learning Objectives

  • Introduction to the notion of data-centric security
  • Presentation of a reference framework for architecting data-centric security solutions
  • Discussion about a bouquet of technologies that can act as building blocks

Pre-conference learning

Oxymoron: Computing on Encrypted Data - Srinivasan Narayanamurthy @SDC India 2017

 


Genomics Deployments: How To Get It Right With Software-Defined Storage

Sandeep Ramesh Patil

Abstract

The emerging field of Genomics Medicine requires physicians, data scientists and researchers to analyze huge amounts of genomics data quickly. This poses challenges for the backend infrastructure, including storage. In this talk, we present the genomic workload characteristics, their requirements on the backend storage subsystems, and how a composable-infrastructure approach based on a scale-out file system can enable IT architects to customize deployments for varying functional and performance needs.

Learning Objectives

  • Understand the workload characteristics of Genomic Medicine, an emerging and disruptive opportunity.
  • Understand the requirements that Genomic Medicine workloads place on backend storage
  • Learn fundamentals of composable infrastructure

 



TRACK B ABSTRACTS DAY TWO

 

How to Build a Reliable, Scalable Parallel Filesystem Solution using AWS Infrastructure

Sasikanth Eda

Abstract

In the last couple of years, the case for HPC in the cloud has grown stronger. Still, the HPC industry lies far behind enterprise IT in its willingness to outsource computational power. One of the reasons is storage: none of the built-in storage solutions available across the public cloud providers is suitable for applications with high bandwidth requirements.

A parallel, clustered file system built on top of block storage (e.g., AWS EBS) makes a good solution. However, there are multiple architectural approaches one can take to build such a filesystem (such as when and why to use placement groups, autoscaling, auto-recovery, replication across availability zones, etc.). Beyond building it, the cloud resources used for the filesystem also need to be managed effectively, as they can quickly become expensive.
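
The core mechanism underlying such filesystems is striping: consecutive file blocks are spread round-robin across block devices so that I/O fans out in parallel. A toy sketch (stripe size and device count are arbitrary assumptions, not recommendations from the talk):

    #include <stdio.h>
    #include <stdint.h>

    #define STRIPE_SIZE (1 << 20)   /* 1 MiB stripe unit */
    #define NUM_DEVICES 4           /* e.g. four EBS volumes */

    /* Map a logical file offset to (device, offset within that device). */
    static void map_offset(uint64_t off, int *dev, uint64_t *dev_off) {
        uint64_t stripe = off / STRIPE_SIZE;
        *dev = (int)(stripe % NUM_DEVICES);
        *dev_off = (stripe / NUM_DEVICES) * STRIPE_SIZE + off % STRIPE_SIZE;
    }

    int main(void) {
        for (uint64_t off = 0; off < 5ull * STRIPE_SIZE; off += STRIPE_SIZE) {
            int dev;
            uint64_t dev_off;
            map_offset(off, &dev, &dev_off);
            printf("file offset %llu MiB -> device %d, offset %llu MiB\n",
                   (unsigned long long)(off >> 20), dev,
                   (unsigned long long)(dev_off >> 20));
        }
        return 0;
    }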

The proposed presentation discusses in detail the different architectures (weighing their pros and cons) that can be used to build a reliable parallel filesystem in the cloud (showcasing AWS and IBM Spectrum Scale as an example), along with data lifecycle techniques that help reduce OPEX by managing the parallel filesystem in the cloud effectively.

Learning Objectives

  • Understand where the cloud falls short for HPC workloads
  • Introduction to various AWS compute and storage services
  • Learn different ways (weighing pros and cons) in which a parallel, clustered filesystem can be built using AWS services

Data Architecture for Data-Driven Enterprises: A Storage Practitioner’s View

Deepti Aggarwal

Abstract

A more-than-decade-old data architecture isn’t enough for today’s data-driven businesses, which are heavily dependent on AI/ML/DL. As enterprises begin to operationalize these AI/ML workflows, they will need to optimize storage I/O performance to feed massively parallel GPU-based compute. With a growing IoT footprint, data management and AI/ML compute challenges span the edge, the core and the cloud. In this talk we propose a modern data engineering and management pipeline to address these challenges. Specific learning objectives include how some existing data engineering workflows need to be rethought, including dynamic data indexing, access-pattern-aware data layout, etc. The talk will also cover other emerging data engineering challenges, such as data reduction and data quality assessment, with specific focus on edge/core vs. cloud, and will bring out ongoing research towards addressing these challenges.

Learning Objectives

  • A modern data engineering and management pipeline spanning from edge-to-core-to-cloud
  • A re-think of existing services provided by storage systems like data indexing and data layout in the context of the new-age data engineering pipeline
  • Emerging data engineering challenges including data reduction and data quality assessment

 

Accelerated Erasure Coding: The New Frontier of Software-Defined Storage

Dineshkumar Bhaskaran

Abstract

Efficient storage is critical to the success of datacenters and the functioning of enterprises. The exponential growth in the volume of data is forcing CIOs to rethink their storage strategies. One challenge they face is finding a replacement for aging RAID technology, which falls short in extreme I/O performance, data protection and resiliency.
A solution is erasure coding (EC), which is becoming the preferred choice for data protection in large datacenters. Erasure codes have evolved from the traditional Reed-Solomon algorithm to more sophisticated locally recoverable and regenerating codes that can perform more efficient data recovery. However, erasure codes are compute-intensive and impose a higher resource cost on distributed storage solutions.
Our approach to this problem is a hardened Ceph-based distributed storage solution built on a modular and scalable EC-offload-engine (ECoE) library. The ECoE comprises new-age EC algorithms implemented on general-purpose graphics processing units (GPUs) and can provide improvements of up to 40% in the encode/decode processes, depending on the algorithm.
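
For readers new to the topic, a toy illustration of the arithmetic erasure coding builds on (single-parity XOR, far simpler than the Reed-Solomon and LRC codes the talk covers): any one lost data block can be rebuilt from the survivors plus parity.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define K   3      /* data blocks */
    #define BLK 8      /* bytes per block, kept tiny for the demo */

    int main(void) {
        uint8_t data[K][BLK] = { "block-1", "block-2", "block-3" };
        uint8_t parity[BLK] = { 0 };

        /* Encode: parity = d0 XOR d1 XOR d2. */
        for (int i = 0; i < K; i++)
            for (int j = 0; j < BLK; j++)
                parity[j] ^= data[i][j];

        /* Simulate losing block 1, then rebuild it from the rest + parity. */
        uint8_t rebuilt[BLK];
        memcpy(rebuilt, parity, BLK);
        for (int i = 0; i < K; i++)
            if (i != 1)
                for (int j = 0; j < BLK; j++)
                    rebuilt[j] ^= data[i][j];

        printf("recovered: %s\n", (char *)rebuilt);   /* prints "block-2" */
        return 0;
    }

Reed-Solomon generalizes this XOR to arithmetic over a Galois field so that multiple failures can be tolerated; that heavier arithmetic is what a GPU-based offload engine like ECoE accelerates.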

Learning Objectives

  • Erasure codes and Ceph erasure code plugin infrastructure.
  • ECoE erasure code algorithms and implementation on GPUs
  • Performance and cost analysis of ECoE algorithms on Ceph