2018 SDC India Abstracts

Main Stage Abstracts

Aadhaar: Building the world's largest identity platform

Vivek Raghavan

Abstract

Aadhaar is a digital identity platform for 1.2 billion residents of India. The presentation will discuss the technology and business process innovations involved in the creation of Aadhaar. It will also discuss the role of Aadhaar as a key component of the India Stack, whose goal is to transform India into a digital nation where all services can be availed in a paperless, cashless and presence-less manner.

Storage and Data Management trends - 2018 and beyond

Yogesh Anyapanawar

Abstract

With the ever-growing data churn, companies need newer ways to utilize their storage more efficiently than ever. Digital transformation is here to stay, bringing huge volumes of digital data, big data issues to address, and the need for an integrated data management and cloud strategy that supports the scale of businesses. Taken together, this calls for a strategy that plans, integrates and manages the storage infrastructure to be ‘transformation ready’. The talk will touch upon crucial factors in storage and data management, ranging from datacenter elements and data movement across the technology stack up to backup, archival and business continuity, carefully highlighting the challenges and opportunities at hand. The talk will also summarize the most striking trends registered so far in traditional as well as software-defined storage, and sketch a picture of what we should expect in 2018 and beyond.


TRACK A ABSTRACTS DAY ONE

 

Energy Efficient Data Storage Methods for Consolidated Servers and Data Centers

David Bachu and Muthu Mohan

Abstract

In today’s technology-driven world, demand for computing is increasing on an almost daily basis. To meet this increasing demand, the implementation of consolidated servers and data centers is growing rapidly. However, computing and storage components consume more than 40% of a data center's energy requirements, in addition to the power drawn by distribution and cooling equipment.

According to the server energy model:

Pt = Pf + Pv

where Pt = total power consumption,

Pf = fixed power consumption by memory modules, disks, and I/O resources, and

Pv = variable power consumption by CPUs (different power requirements at different operating frequencies).

Today, we will discuss transforming Pf into a variable contributor by adopting energy-saving techniques while handling data in memory devices.
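The split between a fixed and a frequency-dependent term can be illustrated with a small numeric sketch. The coefficients and the cubic frequency scaling below are hypothetical (a classic DVFS-style assumption), chosen only to show how Pv varies while Pf stays constant:

```python
# Hypothetical server power model: Pt = Pf + Pv.
# Pf: fixed power drawn by memory modules, disks, and I/O resources.
# Pv: variable CPU power, modeled here as growing cubically with
#     operating frequency (an illustrative DVFS-style assumption).

def total_power(freq_ghz, pf_watts=120.0, k=25.0):
    """Return Pt for a given CPU frequency (coefficients are illustrative)."""
    pv = k * freq_ghz ** 3   # variable CPU contribution
    return pf_watts + pv

for f in (1.2, 2.0, 2.8):
    print(f"{f} GHz -> {total_power(f):.1f} W")
```

Lowering operating frequency shrinks only Pv; the talk's point is that energy-saving techniques in memory handling can make the Pf term variable as well.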
 

Walking the PMEM Talk

Priya Sehgal

Abstract

Over the past few years there has been a lot of effort by the SNIA NVM TWG to standardize the persistent memory programming model. More recently, we see a lot of development to support persistent memory in both the Linux and Windows kernels. Moreover, the Persistent Memory Development Kit (PMDK) from pmem.io provides a gamut of libraries built on top of the Direct Access (DAX) feature available on Linux and Windows, allowing applications to load/store directly to persistent memory by memory-mapping files on a persistent memory file system. This talk will cover some of the advancements made in the Linux kernel and a few examples of how to use them from user space through PMDK (using libpmem, libpmemlog). Additions to the Linux kernel that will be covered include:
1. Huge Pages
2. Libnvdimm subsystem - supports the PMEM and BLK NVDIMM device types. 
3. Device-DAX 
4. DAX-based file systems
5. ZUFS – Zero Copy User mode File System
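As a purely illustrative sketch of the memory-mapped load/store model these features enable (this is not PMDK itself; libpmem is a C library), the following uses an ordinary memory-mapped file, where `mmap` and `flush` stand in for the roles `pmem_map_file` and `pmem_persist` play on a DAX filesystem:

```python
# Sketch of the memory-mapped persistence model (not PMDK itself):
# map a file, store bytes directly into the mapping, then flush.
# On a DAX-enabled pmem filesystem, libpmem's pmem_map_file/pmem_persist
# take the roles that open+mmap and msync-style flush play here.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pmem.dat")
with open(path, "wb") as f:
    f.truncate(4096)                     # size the backing file

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 4096) as m:
        m[0:5] = b"hello"                # direct store into the mapping
        m.flush()                        # make the store durable

with open(path, "rb") as f:
    print(f.read(5))                     # b'hello'
```

On DAX, the kernel maps persistent memory straight into the address space, so the load/store pattern above touches media directly with no page cache in between.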

Learning Objectives

  • Additions and changes in the Linux kernel made with persistent memory in mind
  • How to use the pmem.io library kit (PMDK) to develop applications in user space on top of persistent memory

 


Case Study of NVMe performance on top-of-the-line x86 and ARM64 Servers

Vikas Aggarwal

Abstract

A comparative study was conducted on multiple dimensions of parallel I/O streams using user-space (SPDK) as well as Linux kernel-space NVMe drivers. We compared IOPS, latency and throughput for sequential & random reads/writes on ARM64 (ThunderX2) and x86 (Intel Skylake) storage servers using the latest NVMe PCIe SSDs. We also studied the effect of varying I/O queue depth and number of CPU cores. Submission, completion, peak and average latencies are compared for the ARM64 and x86 platforms. The purpose of the case study is to provide information on the current state of SPDK NVMe and Linux kernel NVMe driver performance, which can then be used by designers to architect their products.

Learning Objectives

  • Demonstration of an ARM64-based NVMe storage server standing on par with x86
  • NVMe IOPS, latency and throughput data shared from Cavium storage labs
  • Insight into performance scalability

Manage flash storage efficiently in a multi-tenant cloud environment using QoS and adaptive throttling

Ketan Dinesh Mahajan

Abstract

In a multi-tier virtualized storage environment, there is continuous movement of data from higher tiers (flash) to lower tiers. If this is slow, higher tiers may run out of space due to high workload ingest rates. Especially in a multi-tenant cloud environment, the high-ingest behaviour of certain workloads may undesirably affect other high-priority workloads.
In this presentation, we will detail how, using storage QoS, the total ingest rate of workloads can be made proportional to the residual space on the higher tier, with the remaining IOPS capacity reserved for flush. Even among workloads, capacity may be distributed based on their space usage in the higher tier. We will present results showcasing that the above reduces the application ingest rate continuously until, at equilibrium, it matches the flush rate and space usage stays constant. Workloads with heavy ingest are throttled faster, while bursty workloads are given their fair share.
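The proportional scheme can be sketched in a few lines (function names and numbers below are illustrative, not from the presentation): the permitted ingest rate shrinks linearly with the space already consumed on the flash tier, and whatever is withheld from ingest is reserved for the flush to lower tiers:

```python
def allowed_ingest_iops(used_bytes, capacity_bytes, max_iops):
    """Throttle ingest in proportion to residual space on the flash tier.
    At equilibrium, ingest matches the flush rate and space usage is flat.
    (Illustrative sketch of the proportional QoS idea, not the real code.)"""
    residual_frac = max(capacity_bytes - used_bytes, 0) / capacity_bytes
    ingest = max_iops * residual_frac     # proportional share for ingest
    flush_reserve = max_iops - ingest     # the rest is reserved for flushing
    return ingest, flush_reserve

# As the tier fills, ingest is throttled harder and more capacity
# is reserved for flush:
for used in (0, 50, 90):
    ing, fl = allowed_ingest_iops(used, 100, 10_000)
    print(f"{used}% full -> ingest {ing:.0f} IOPS, flush reserve {fl:.0f} IOPS")
```

A per-workload variant would split the ingest share further in proportion to each tenant's space usage, which is what throttles heavy ingesters faster than bursty ones.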

Learning Objectives

  • Patterns of IOs on flash storage in a multi-tier storage environment and its impact on space usage.
  • Ways to apply QoS in multi-tenant cloud environment.
  • Noisy neighbour problem, its impact in various dimensions and the solutions.

Using Machine Learning for Intelligent Storage Performance Anomaly Detection

Ramakrishna Vadla and Archana Chinnaiah

Abstract

Enterprise applications use huge amounts of data, which demands large-scale distributed storage subsystem deployments in data centers. Storage virtualization functionality brings more complexity into storage management. The latest storage systems support performance data collection at high frequencies, such as every second or minute, which enables gathering more data for performance analysis. The challenge with storage subsystems is finding performance bottlenecks, identifying the root cause and resolving them with a short turnaround. Performance bottlenecks include a wide variety of issues such as inaccessible disks, I/O errors, port masking, volume errors and network congestion. Machine learning models (multivariate regression, time series analysis & VAR models) help proactively find performance anomalies/bottlenecks and recover from them intelligently. The performance metrics used in building the machine learning models are I/O rate R/W (read & write), data rate R/W, response time R/W, cache hit R/W, average data block size R/W, port data rate R/W, port-local node queue time, port protocol errors, port congestion index, etc. Based on the bottleneck detected, the storage system is updated with the corrective resolution or an alert message is sent to the storage administrator. We will share our experiences of using machine learning models for performance anomaly detection.
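The models named above (multivariate regression, VAR) are richer than this, but the core idea of flagging a metric that deviates from its learned baseline can be sketched with a simple rolling z-score over a single metric such as read response time (a hypothetical illustration, not the talk's implementation):

```python
import statistics

def detect_anomalies(samples, window=10, z_threshold=3.0):
    """Flag points whose z-score vs. the preceding window exceeds a threshold."""
    anomalies = []
    for i in range(window, len(samples)):
        base = samples[i - window:i]
        mu = statistics.mean(base)
        sigma = statistics.stdev(base) or 1e-9   # guard against zero variance
        if abs((samples[i] - mu) / sigma) > z_threshold:
            anomalies.append(i)
    return anomalies

# Steady ~5 ms response times with one congestion spike at index 15:
resp_ms = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.2, 5.1, 4.9, 5.0,
           5.1, 4.9, 5.0, 5.2, 4.8, 40.0, 5.1, 5.0]
print(detect_anomalies(resp_ms))   # [15]
```

A multivariate model generalizes this by learning the joint baseline across many metrics (cache hit rate, port data rate, queue time, ...) so that correlated shifts are caught even when no single metric looks extreme.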

Learning Objectives

  • Applying Machine Learning models in storage performance anomaly detection
  • Choosing the right feature set from a large number of storage performance metrics
  • Evaluating different ML models for different set of performance bottlenecks

Automated Storage Tiering using machine learning

Anindya Banerjee and Niranjan Pendharkar

Abstract

In the current era there is a general transition from human-developed automation in storage systems to machine learning based intelligent storage systems. Many storage solutions support multiple tiers of storage, drawn from on-premise and cloud storage, in a single namespace. The storage tiers can vary in various dimensions such as performance, resiliency and cost. The storage solutions also provide mechanisms to move objects from one tier to another depending on various parameters, such as last access time, last modification time, access frequency and size of the object. There are thresholds on those parameters, and objects are moved when the thresholds are crossed. But administrators are expected to set the threshold values, which puts the onus on them to set the values properly to make optimal use of the storage.
The presentation will talk about the challenges an administrator faces in setting the correct threshold value and will propose a machine learning based approach to help the administrator find it. The approach (neural networks/ARIMA, clustering techniques) uses historical data to create a model that predicts future accesses for the objects, and uses the predicted values either to drive automated tiering directly or to suggest proper threshold values. The presentation will also cover some of the results obtained by applying these techniques on in-house data.
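A minimal sketch of the prediction-driven tiering idea (all names and numbers are illustrative; the talk's models are ARIMA/neural networks rather than this toy forecast): predict the next period's access count with exponential smoothing and demote objects whose predicted accesses fall below a threshold, which in the proposed approach would itself be ML-derived rather than administrator-set:

```python
def predict_next_accesses(history, alpha=0.5):
    """Exponentially smoothed forecast of next-period access count
    (a stand-in for the richer ARIMA/neural-network models)."""
    forecast = history[0]
    for observed in history[1:]:
        forecast = alpha * observed + (1 - alpha) * forecast
    return forecast

def tiering_decision(history, demote_threshold=2.0):
    """Demote an object to a colder tier if its predicted access count
    drops below the threshold."""
    if predict_next_accesses(history) < demote_threshold:
        return "demote"
    return "keep"

print(tiering_decision([8, 4, 2, 1, 0]))   # cooling object -> demote
print(tiering_decision([1, 2, 4, 8, 16]))  # heating object -> keep
```

The advantage over a static last-access-time rule is that a cooling-but-recently-touched object can still be demoted, and a heating object is never demoted by a stale threshold.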

Learning Objectives

  • Fundamentals of storage tiers and tiering policies
  • Challenges in setting threshold values for the policies
  • New approach to use historical data to predict future access patterns

SNIA Session on Swordfish

Anand Nagarajan

Abstract

The SNIA’s Scalable Storage Management Technical Work Group (SSM TWG) has created and published an open industry standard specification for storage management that defines a customer centric interface for the purpose of managing storage and related data services. This specification builds on the DMTF’s Redfish specification using RESTful methods and JSON formatting. 

This presentation shows how Swordfish extends Redfish, details Swordfish concepts, and talks about the CSDL and JSON schema formats and the OData protocol used for modelling resources.
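Because Swordfish inherits Redfish's RESTful/JSON conventions, its resources can be handled with plain JSON tooling. The payload below is an abbreviated, illustrative volume resource (the field set and paths are simplified from the published schemas; consult the Swordfish specification for the authoritative model):

```python
import json

# Illustrative (abbreviated) Swordfish-style volume resource; the real
# schemas are published by SNIA in CSDL and JSON Schema form.
payload = """
{
  "@odata.id": "/redfish/v1/StorageServices/1/Volumes/Disk.Virtual.0",
  "@odata.type": "#Volume.v1_0_0.Volume",
  "Id": "Disk.Virtual.0",
  "Name": "Virtual Disk 0",
  "CapacityBytes": 107374182400
}
"""

volume = json.loads(payload)
print(volume["Name"], volume["CapacityBytes"] // 2**30, "GiB")
```

A client would GET such a resource over HTTPS from the `@odata.id` path; the `@odata.type` annotation ties the instance back to the schema definitions the presentation discusses.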

Pre-conference learning

www.snia.org/forums/smi/swordfish

2017 presentation at SDC India


 

TRACK B ABSTRACTS DAY ONE

IoT - Implications to storage architecture

Girish Kumar BK

Abstract

The Internet of Things is enabling traditional and newer operational workflows to be digitized for continuous measurement. The measured data is useful in predicting operational failures/inefficiencies and taking action before they occur. The aggregation of measured data from millions of devices is overwhelming public network bandwidth. There are also privacy and data sovereignty concerns as far as data on the move is concerned. Thus, most IoT platforms (hyperscalers) offer an edge (device-side) footprint and enable data services at the edge that were available only in the cloud a few years back. The edge reduces latency and provides local analytics for quicker actions.

In this talk, we present the implications for storage architecture as IoT data pipelines keep evolving from client-server to distributed architectures to cater to millions of geographically spread devices. We also discuss how data latency, privacy, sovereignty and the need to govern massive amounts of data are driving newer storage constructs.


Ozone - Object Store in Apache Hadoop

Mukul Kumar Singh

Abstract

Ozone brings a new storage paradigm to Hadoop called object storage. It will co-exist with HDFS to provide file store and object store functionality in the same Hadoop cluster. Ozone will also solve the scalability and small-file problems of HDFS: users can now store trillions of files in Ozone and access them as if they were on HDFS. Ozone plugs into existing Hadoop deployments seamlessly, and programs like Hive and Spark work without any modifications. This talk looks at the architecture, reliability and performance of Ozone. We will also explore the Hadoop Distributed Storage Layer, a block storage layer that makes this scaling possible, and how we plan to use it for scaling HDFS. We will demonstrate how to install an Ozone cluster; how to create volumes, buckets and keys; and how to run Hive and Spark against HDFS and Ozone file systems using federation, so that users don’t have to worry about where the data is actually stored. The Ozone SDK will also be covered. In other words, a full user primer on Ozone will be part of this talk.

Learning Objectives

  • Learn about Apache Hadoop
  • Future of Hadoop
  • Development of object storage

Pre-conference learning

SNIA Webcast: File vs Block vs Object Storage

 


Amalgamation of cognitive computing inside object storage for security compliance

Smita Raut

Abstract

Security compliance of unstructured data has become a ubiquitous business requirement, even more so with the upcoming GDPR regulation. Object store deployments hosting oceans of unstructured data are growing continually. Understanding which object data falls under the compliance-governed category becomes vital so that the required security compliance enforcement can be applied from the storage side. 
 
On the other hand, there has been substantial progress in the field of cognitive computing, which allows deep analysis of unstructured data for pattern recognition, correlation, learning, etc. Cognitive computing over objects can help categorize objects for compliance and even tag them accordingly. 
 
In this talk we present the architecture and design details of how cognitive computing can be embedded into object storage, proactively and autonomously applying machine learning techniques to objects and deriving metadata that helps categorize them as compliant or non-compliant. The object tags are then leveraged for enforcing security compliance such as object retention, object encryption, object endurance, etc. The talk also presents how parallel computing can be used for object storage over a clustered filesystem to optimize the cognitive computing analysis.

Learning Objectives

  • The security requirements that compliance regimes like GDPR place on object storage.
  • Basics of the Swift object storage architecture and cognitive computing.
  • Introduction to the object Storlet technology (based on the SNIA SDC keynote talk, Israel 2018).


Container Attached Storage (CAS) architecture for Stateful applications on containers

Umasankar Mukkara

Abstract

There is a huge need to containerize stateful applications in today's world of Docker and Kubernetes. These stateful applications need a storage architecture that is truly cloud native. Container Attached Storage, or CAS, is a truly cloud native software architecture for applications running in containers. In CAS, the storage software itself is containerized and hence gains the advantages of being a microservice. In the CAS architecture, each storage volume gets its own storage controller running completely in user space, attaining maximum agility and policy granularity. 
 
The CAS architecture has gained tremendous traction through its reference implementation, OpenEBS. Through seamless integration with Kubernetes and associated tools for managing native disks on container orchestration platforms, CAS delivers a native hyperconverged solution for containers. 
 
In this presentation, the author covers the need for the CAS architecture for containers and explains its architectural advantages in detail. The use case of how CAS solves the problem of storage scalability management when the number of containers grows into the thousands will be discussed.
 
Learning Objectives
  • Learn the requirements of storage for cloud native stateful applications
  • Learn the differences between DAS, NAS and CAS
  • Learn the CAS architecture and why it is truly cloud native


Data Center Networking - Existing challenges and new advances

Anupam Jagdish Chomal

Abstract

A typical datacenter architecture will be introduced along with the various bottlenecks it experiences. We will then look at newer datacenter designs and how they improve performance/throughput. Finally, we will look at relatively newfound networking issues like TCP Incast, TCP Outcast, etc.
 
We will start by looking at the typical Core-Access-Edge architecture and compare it with other architectures like Leaf-Spine, Fat-Tree, etc. We will then look at the basic bottlenecks in a traditional datacenter, followed by new protocol-level enhancements like DCTCP, A2DTCP and Google's BBR that address them.
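DCTCP's core mechanism is compact enough to state directly: the sender tracks the fraction F of ECN-marked ACKs per window, smooths it into a congestion estimate alpha, and cuts its window in proportion to alpha instead of halving it on every congestion signal. A sketch (with g = 1/16, the gain used in the DCTCP paper; the trace values are illustrative):

```python
def dctcp_update(alpha, marked_fraction, g=1 / 16):
    """Smooth the congestion estimate: alpha <- (1 - g)*alpha + g*F,
    where F is the fraction of ECN-marked ACKs in the last window."""
    return (1 - g) * alpha + g * marked_fraction

def dctcp_cwnd(cwnd, alpha):
    """Cut the congestion window in proportion to congestion (cwnd*(1 - alpha/2))
    rather than by a fixed half, as classic TCP would."""
    return cwnd * (1 - alpha / 2)

alpha, cwnd = 0.0, 100.0
for F in (0.0, 0.5, 1.0, 1.0):        # rising fraction of ECN-marked ACKs
    alpha = dctcp_update(alpha, F)
    if F > 0:                          # react only on congestion signals
        cwnd = dctcp_cwnd(cwnd, alpha)
print(round(alpha, 3), round(cwnd, 1))
```

This proportional back-off is what lets DCTCP keep switch queues short without collapsing throughput, which is directly relevant to the Incast problem discussed next.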
 
We will also look at issues like TCP Incast, TCP Outcast, and TCP Unfairness and solutions proposed to mitigate them.
 
Finally, we will take a look at Facebook's Open Compute Project and see how datacenter designs are being shared efficiently in our industry.
 
Learning Objectives
  • Understanding existing data center architectures and their performance bottlenecks
  • Learning new data center architectures like Leaf-Spine and comparing them with earlier designs
  • Learning how projects like DCTCP, A2DTCP and Google's BBR address the basic bottlenecks in a typical data center



Track A Abstracts DAY Two

Data Centric Security

Srinivasan Narayanamurthy

Abstract

For decades, architecting data security solutions revolved around the idea of building a fortress around the data. This is called perimeter-centric security. 

However, perimeter-centric security is not relevant in this new age of computing where,
1. Data is exponentially increasing in terms of volume, velocity, and variety;
2. Users (including semi-trusted third-party contractors) are spread across the globe accessing data from home and on the road;
3. Infrastructure that accesses, stores and moves data includes mobile & other heterogeneous devices, cloud, SaaS platforms, etc.;
4. Regulations that govern the data and hackers who steal the same data are both becoming increasingly sophisticated.
We are on the verge of witnessing a paradigm shift in data security architectures: a shift from perimeter-centricity to data-centricity.

This talk is aimed at introducing and discussing data-centric security in detail. Data-centric security is about securing data without artificial physical/infrastructure boundaries. That is, instead of securing the application (in-use), endpoint (at-rest) & network (in-motion) infrastructure that uses, stores & moves data respectively, data-centric security embeds security controls within the data itself.

Learning Objectives

  • Introduction to the notion of data-centric security
  • Presentation of a reference framework for architecting data-centric security solutions
  • Discussion about a bouquet of technologies that can act as building blocks

Pre-conference learning

Oxymoron: Computing on Encrypted Data - Srinivasan Narayanamurthy @SDC India 2017

 


Comparison of write-back strategies

Mahesh Khatpe

Abstract

The ability to provide low-latency access to data has become an even more important problem to solve in today's world of data explosion. Caching techniques have evolved over the past decades to accommodate faster flash devices and improve the performance of storage systems.
 
The development of read caches mostly revolves around replacement algorithms, whereas approaches to delivering better write bandwidth have changed over the years as applications changed. In traditional storage systems built for OLTP applications, we see write-back cache implementations. With the increased bandwidth needs of writes, the write-log design evolved, bringing other benefits such as versioning and ease of recovery.
   
This paper mainly compares the data structure layouts of the write-back cache and the write-log, with their merits, and also how they respond when stressed with different workloads. The two solutions also enable different approaches to data management and protection. This analysis could help storage architects choose appropriate write-back solutions for various applications.
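The structural difference between the two layouts can be sketched in a few lines (a deliberately simplified model, not a production design): a write-back cache overwrites a block in place and flushes dirty blocks later, while a write-log appends every update, which is what makes versioning and recovery replay cheap:

```python
# Simplified models of the two layouts (illustrative only).

class WriteBackCache:
    """In-place updates; dirty blocks are flushed to backing store later."""
    def __init__(self):
        self.blocks, self.dirty = {}, set()

    def write(self, lba, data):
        self.blocks[lba] = data        # overwrite in place
        self.dirty.add(lba)            # remember to flush later

class WriteLog:
    """Append-only updates; old versions remain until the log is trimmed."""
    def __init__(self):
        self.log = []                  # sequence of (lba, data) records

    def write(self, lba, data):
        self.log.append((lba, data))   # never overwrite

    def read(self, lba):
        # Latest record wins; earlier records are recoverable versions.
        for rec_lba, data in reversed(self.log):
            if rec_lba == lba:
                return data

cache, log = WriteBackCache(), WriteLog()
for dev in (cache, log):
    dev.write(7, b"v1")
    dev.write(7, b"v2")
print(len(cache.blocks), len(log.log))   # 1 block in place vs 2 log records
```

The trade-off is visible even at this scale: the cache holds one copy per block and must track dirtiness, while the log keeps history at the cost of space and a reverse scan (or index) on reads.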

Learning Objectives

  • Understanding the storage layout in write-back & write-log strategies
  • Merits and demerits of each strategy while serving different workloads
  • High-level recommendations on which strategy to use for a specific functionality/use case

 

 


Genomics Deployments:  How To Get Right With Software Defined Storage

Sandeep Ramesh Patil

Abstract

The emerging field of genomics medicine requires physicians, data scientists and researchers to analyze huge amounts of genomics data quickly. This poses challenges for the backend infrastructure, including storage. In this talk, we present the genomic workload characteristics, their requirements on the backend storage subsystems, and how a composable infrastructure approach based on a scale-out file system can enable IT architects to customize deployments for varying functional and performance needs.

Learning Objectives

  • Understand the workload characteristics of genomic medicine, an emerging and disruptive opportunity
  • Understand the requirements that genomic medicine workloads pose on backend storage
  • Learn fundamentals of composable infrastructure

 



TRACK B ABSTRACTS DAY TWO

Next Generation Ecosystem Storage Management

Balaji Marimuthu

Abstract

With today's hyperscale datacenters, managing multi-vendor storage hardware through one simple, user-friendly tool is every datacenter admin's desire. The server and storage industries are trying to solve this common problem by providing a standard way of doing storage management. DMTF and SNIA attempted to standardize storage management using the CIM and SMI-S standards for a decade. Now DMTF and SNIA have reviewed the lessons learnt in that decade and have come up with Redfish and Swordfish: simplified standards, easy to implement and use, for the next generation of storage management. In addition to standards-based storage management, below are the common asks for the next generation of storage management.

  • AI/ML based data analysis for prediction, notification and automated error recovery
  • In-band and Out of band management
  • Capability to run as containerized / serverless application

 

How to Build a Reliable, Scalable Parallel Filesystem Solution using Cloud Infrastructure

Sasikanth Eda

Abstract

In the last couple of years, the case for HPC in the cloud has grown stronger. But the HPC industry still lies far behind enterprise IT in its willingness to outsource computational power. One of the reasons is storage: none of the built-in storage solutions available across the public cloud providers are suitable for applications with high bandwidth requirements.

A parallel, clustered file system built on top of block storage (e.g., AWS EBS) forms a good solution. However, there are multiple architectural approaches one can take to build such a filesystem (such as when & why to use placement groups, autoscaling, auto recovery, replication across availability zones, etc.). Apart from building it, there is a need to effectively manage the cloud resources used for the filesystem, as it can quickly become expensive.

The proposed presentation aims to discuss in detail the different architectures (weighing pros & cons) that can be used to build a reliable parallel filesystem in the cloud (showcasing AWS and IBM Spectrum Scale as an example), and the data lifecycle techniques that help reduce OPEX by effectively managing a parallel filesystem in the cloud.

Learning Objectives

  • Understand how the cloud is falling short for HPC workloads
  • Introduction to various AWS compute, storage services
  • Learn different ways (weighing pros & cons) in which a parallel, clustered filesystem can be built using AWS services

Data Architecture for Data-Driven Enterprises: A Storage Practitioner’s View

Deepti Aggarwal

Abstract

A data architecture that is more than a decade old isn't enough for today's data-driven businesses, which are heavily dependent on AI/ML/DL. As enterprises begin to operationalize these AI/ML workflows, they will need to optimize storage I/O performance to feed massively parallel GPU-based compute. With a growing IoT footprint, data management and AI/ML compute challenges span from the edge to the core to the cloud. In this talk we propose the need for a modern data engineering and management pipeline to address these challenges. Specific learning objectives include how some existing data engineering workflows need to be re-thought, including dynamic data indexing, access-pattern-aware data layout, etc. The talk will also cover other emerging data engineering challenges, like data reduction and data quality assessment, with specific focus on edge/core vs. cloud, and will bring out ongoing research towards addressing these challenges.

Learning Objectives

  • A modern data engineering and management pipeline spanning from edge-to-core-to-cloud
  • A re-think of existing services provided by storage systems like data indexing and data layout in the context of the new-age data engineering pipeline
  • Emerging data engineering challenges including data reduction and data quality assessment

 

Accelerated Erasure Coding:  The New Frontier of Software-Defined Storage

Dineshkumar Bhaskaran

Abstract

Efficient storage is critical to the success of datacenters and the functioning of enterprises. The exponential growth in the volume of data is forcing CIOs to rethink their storage strategies. One challenge they face is finding a replacement for aging RAID technology, which falls short in extreme I/O performance, data protection and resiliency. 
A solution is erasure coding (EC), which is becoming the preferred choice for data protection in large datacenters. Erasure codes have evolved from the traditional Reed-Solomon algorithm to more sophisticated locally recoverable and regenerating codes that can perform more efficient data recovery. Erasure codes are compute-intensive and impose a higher resource cost on distributed storage solutions.
Our approach to the above problem is a hardened Ceph-based distributed storage solution built on a modular and scalable EC-offload-engine (ECoE) library. The ECoE, comprising new-age EC algorithms running on general-purpose graphics processing units (GPUs), can provide improvements of up to 40% in encode/decode processes depending on the algorithm.
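The ECoE library targets GPU-accelerated Reed-Solomon and regenerating codes; the simplest possible erasure code, single-parity XOR (RAID-4/5 style), already illustrates the encode/recover contract those codes generalize. The sketch below is purely illustrative and unrelated to the ECoE implementation:

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_shards):
    """Compute one XOR parity shard over equal-length data shards
    (the k+1 case; Reed-Solomon generalizes to k+m parity shards)."""
    parity = data_shards[0]
    for shard in data_shards[1:]:
        parity = xor_bytes(parity, shard)
    return parity

def recover(surviving_shards, parity):
    """Rebuild the single missing data shard from survivors + parity."""
    missing = parity
    for shard in surviving_shards:
        missing = xor_bytes(missing, shard)
    return missing

shards = [b"AAAA", b"BBBB", b"CCCC"]
p = encode(shards)
# Lose shard 1 and rebuild it from the survivors and the parity:
print(recover([shards[0], shards[2]], p))   # b'BBBB'
```

The compute cost the abstract mentions comes from doing such arithmetic (over Galois fields, for Reed-Solomon) across every byte of every stripe, which is exactly the embarrassingly parallel work a GPU offload engine can absorb.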

Learning Objectives

  • Erasure codes and Ceph erasure code plugin infrastructure.
  • ECoE erasure code algorithms and implementation on GPUs
  • Performance and cost analysis of ECoE algorithms on Ceph

 

What Persona Are You – Managing Storage through Handheld Devices

Dhishankar Sengupta and Sandeep Lad

Abstract

There has been a tremendous rise in the computing power of handheld devices over the last 5 years or so. What used to be extremely limited devices, best used for making and receiving calls and text messages (anybody remember the pager?), are today more powerful than the computers that helped take Apollo 11 to the moon. Combine this with network infrastructure (read 4G, 5G, 6G and whatever comes our way) that gives us the ability to send and receive data from/to anywhere, and today handheld devices, particularly mobile phones, can play and are playing a huge role in changing the way we manage storage and infrastructure compared with managing them traditionally. 
This paper explores the extent to which different users with varying responsibilities would like to control their infrastructure (with buttons and knobs readily at their disposal) from the confines of their handheld devices, instead of logging in to the respective management software using traditional access methods. Three personas (Admin, IT Generalist, DC-Admin) are identified to give a flavor of the typical use cases and of how the transition can be made from merely monitoring alerts and events on our smartphones to actually managing storage infrastructure and other complicated operations.

Learning Objectives

  • What are the prevalent roles in managing data centre storage
  • What are the mobile technology advancements towards storage management
  • What you can and can't do from your handheld devices when it comes to storage management