2022 SDC India Abstracts

Day One Plenary
Day One Track 1
Day One Track 2
Day Two Plenary
Day Two Track 1
Day Two Track 2

Day One Plenary Abstracts

Evolution of NVMeoF Controllers in combining decentralized Edge with centralized public/private Cloud

Sridhar Sabesan, Director, Platform Engineering, Western Digital

Abstract

Edge data centers, devices, compute, and storage are becoming more decentralized and are combining with public/private cloud to form a new architecture. This emergence is more noticeable as many industry players begin adopting NVMeoF. The crux of the adoption lies in the evolution of NVMeoF controllers, as the ecosystem provides a path from 100G to 400G network connectivity, which is enabling broader adoption in this new-age architecture. There are many diverse real-world use cases, as there is a race towards more innovation in this emerging market.

Object Storage Service Framework

Dia Ali, Data Intelligence Global Solution Leader, Hitachi Vantara

Abstract

The cloud has been adopted at every level of IT organizations. The startup and small business sectors embrace the instant access to resources they traditionally could not leverage early in a market cycle. The medium to large business sectors are pushing hard into off-premises cloud options for cost savings and simplified management. The juggernauts of business are whispering that they are all in with cloud adoption. So why are we still developing on-premises solutions? The reality many early cloud adopters have come to terms with is that the cloud must include an on-premises component. The hybrid cloud is the answer. Defining a framework for Object Storage Service (OSS) will lead to greater adoption and investment in the cloud. A true framework provides a process to evaluate current and future cloud-enabled technologies. During this session we will describe a framework for a modern approach to how OSS should deliver intelligent data services. Compute Express Link (CXL) is a communications protocol which enables faster CPU-to-Device and CPU-to-Memory data transfers.

Object Store for Backup, Disaster Recovery and Analytics

Sumith Makam, Principal Engineer, NetApp

Abstract

At a high level, this talk covers a unified architecture that enables backup, disaster recovery, analytics, and further workflows by keeping a copy of primary data in the object store.
In detail, it covers how this architecture allows data in the object store to be used during disaster recovery events.

Protocol Analysis for CXL

Rakesh A Hanumanthappa, Manager, Customer Success, VIAVI Solutions

Abstract

Protocol analysis for Compute Express Link (CXL) links. CXL is a new open interconnect standard that leverages PCIe 5.0 Phy and electricals. As an emerging technology, CXL is gaining traction in the industry and is backed by many top suppliers in the server market. CXL is in its early phases of adoption, so it is important to understand the potential failure modes and how to root cause these issues using analyzer tools.

Day One Track 1 Abstracts

Separating storage & compute for Search Engine - AWS OpenSearch

Murali Krishna, Senior Principal Engineer, Amazon Web Services

Abstract

Search engines use a concept called “inverted index” for fast information retrieval on a huge corpus of documents. The compute involves “indexing” and “query”. Due to low latency requirements, storage and compute are closely tied in these systems, which prevents independent scalability and impacts price/performance. This talk will cover the traditional search engine architectures and how Amazon is innovating in this area with a “remote storage” concept in the OpenSearch engine.
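
As an illustration of the inverted index idea mentioned in the abstract, here is a minimal sketch in Python. It is not OpenSearch code: the function names, the whitespace tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Indexing step: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return index

def query(index, terms):
    """Query step: return ids of documents containing every term (AND semantics)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical sample corpus.
docs = {
    1: "remote storage for search",
    2: "search engine indexing",
    3: "storage and compute scale independently",
}
index = build_inverted_index(docs)
print(sorted(query(index, ["search", "storage"])))  # [1]
```

The query touches only the posting sets for the requested terms rather than scanning every document, which is why the structure suits low-latency retrieval.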

Learning Objectives

Education of product/engine
Innovation Awareness
Brand Awareness

Open standards – Opening up new possibilities

Shiva Pahwa, Senior Manager, & Sathyashankara Bhat Muguli, Director, Micron Technology

Abstract

The biggest barrier to innovation is its adoption. We see many innovative products fail because they are not compatible with the existing production infrastructure. Customers (especially hyperscalers) do not like to change their ecosystem, as a lot of cost has gone into integrating their platform with the application stack. Non-open-standard hardware needs application-stack-level changes, which lead to integration issues with the software stack. Debugging the software integration changes also requires a lot of time and effort, and a proprietary software stack might not be compatible with all OS platforms.

Open standards enable customers to drive their requirements, which can be integrated into the specifications. This enables better interoperability with the existing hardware and software stack, which in turn reduces procurement and training costs. As standards mature, root-causing issues will become more deterministic.

Our presentation focuses on highlighting how open standards like UCIe, OMI, CXL, OCP and NVMe are making an impact in the storage world and will continue to be the de facto choice going forward. We will go through some of the ways open standards are bringing the storage industry together, avoiding vendor lock-in and paving the path to the future.

Learning Objectives

Industry trends on open standard architectures
Past trends and future possibilities
How various standards are coming together

Optimizing file system performance through investigation of component bottlenecks

Pidad D'Souza, System Performance Architect, & John Lewars, Senior Technical Staff Member, IBM

Abstract

With the exponential growth in data generation, both at rest and in transit, advanced techniques are required to process the large volume of data and extract meaningful insights. Advancements in data processing have been exploited in areas such as high-performance computing (HPC), and Machine Learning/Artificial Intelligence, improving business insights, enhancing customer engagements, and accelerating many aspects of research.

Data architects and scientists continuously seek lower latencies, higher throughput, and more efficient, scalable access to stored data. File systems play a vital role in ensuring data is both highly available and efficiently manipulated. Many applications’ performance is greatly impacted by file system performance, which may depend on data access patterns and system metrics such as throughput, IOPS, and latency. These file system performance metrics are highly dependent on many components such as network architecture, storage subsystem, operating system, and compute/storage node hardware architecture (including CPU performance, memory bandwidth, and interaction with accelerators, e.g. GPUs).

Learning Objectives

Identification of file system performance metrics
Understanding the influence of various system components on file system performance
Discussion on modelling benchmark performance & using microbenchmarks to investigate components of file system performance

Demystifying AIOPS for Storage Management

Sandeep Patil, STSM, Master Inventor & Ramakrishna Vadla, Senior Software Engineer, IBM

Abstract

Artificial Intelligence (AI) for IT Operations (Ops), or AIOPS, is a methodology that combines big data, analytics, and machine learning to automate and improve IT operations. AIOPS for storage management is an evolving trend and is expected to become part of storage management and support. If implemented well, it directly benefits business goals and lowers the cost of IT operations.

As storage developers, it’s important to demystify AIOPS in the context of storage management and separate the hype from reality. In this session, we detail the applicability of AIOPS in the context of storage, covering different scenarios and use cases where it can be applied for storage management. The session covers the architectural building blocks required to implement AIOPS functionality that can lead to predictive support, anomaly detection, and causality determination for storage sub-systems. The talk will include a sample use of AIOPS for proactive support in a vendor-neutral manner.
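
To make the anomaly detection building block concrete, here is a minimal sketch in Python. It is not taken from the talk: it assumes a simple trailing-window z-score rule, and the function name, window size, threshold, and latency samples are all hypothetical.

```python
from statistics import mean, stdev

def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat windows to avoid dividing by zero.
        if sigma > 0 and abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical volume latency samples in milliseconds.
latency_ms = [2.1, 2.0, 2.2, 2.1, 2.0, 2.1, 9.5, 2.2, 2.0, 2.1]
print(detect_anomalies(latency_ms))  # [6]
```

A production system would likely use seasonality-aware or learned models; the rule above only shows where a detector sits between metric collection and proactive support.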

Learning Objectives

Understanding basic concepts of AIOPS
Understanding relationship of AIOPS with Storage Management and Support
Architecture building blocks required to implement AIOPS

Apache Ozone: Multi-Protocol Aware System Handles Both Files and Objects Efficiently

Rakesh Radhakrishnan, Staff Software Engineer, & Mukul Kumar Singh, Senior Engineering Manager, Cloudera

Abstract

Apache Ozone is a distributed, scalable, high-performance object store that can scale to billions of objects of varying sizes. Ozone recently implemented a multi-protocol-aware bucket feature, with which a single Ozone cluster offers the capabilities of both a Hadoop Compatible File System (HCFS) and an object store (like Amazon S3). In this talk we will dive deep into the unified and extensible architectural design in Ozone that represents directories, files, objects and buckets and allows interoperability between hierarchical file system and object store protocols. This multi-protocol capability will be attractive to systems that are primarily oriented towards file-system-like workloads but would like to add some object store feature support. For example, a user can ingest data into Apache Ozone using the FileSystem API, and the same data can be accessed via the Ozone S3 API (an Amazon S3 implementation). This can improve the efficiency of a user platform with an on-prem object store. Furthermore, data stored in Ozone can be shared across use cases, eliminating the need for data duplication, which in turn reduces risk and optimizes resource utilization.

Finally, we will talk about the roadmap for leveraging this new design to introduce a hash-based key path locking mechanism that allows more concurrent metadata namespace operations (a mixture of write and read workloads).
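
The idea of hash-based key path locking can be sketched as a fixed pool of locks selected by the hash of the key path. This is an illustrative Python sketch, not Ozone's design or code: the class name, the stripe count, the CRC32 hash, and the dictionary standing in for the metadata table are all assumptions. A real implementation would probably use read/write locks so that readers do not block each other.

```python
import threading
import zlib

class StripedPathLocks:
    """Fixed pool of locks; a key path is mapped to one lock by its hash."""

    def __init__(self, stripes=64):
        self._locks = [threading.Lock() for _ in range(stripes)]

    def lock_for(self, key_path):
        # CRC32 gives a stable hash of the path; modulo picks the stripe.
        stripe = zlib.crc32(key_path.encode("utf-8")) % len(self._locks)
        return self._locks[stripe]

namespace = {}              # stand-in for the metadata table (assumption)
locks = StripedPathLocks()

def put_key(key_path, value):
    # Only operations whose paths hash to the same stripe serialize.
    with locks.lock_for(key_path):
        namespace[key_path] = value

threads = [
    threading.Thread(target=put_key, args=(f"/vol1/bucket1/dir/file{i}", i))
    for i in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(namespace))  # 100
```

Compared with a single bucket-wide lock, operations on unrelated paths mostly proceed in parallel, at the cost of occasional false sharing when two paths hash to the same stripe.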

Learning Objectives

Learn the internal architecture of Apache Ozone suitable for on-prem workloads
Understand the design of a distributed storage system with both hierarchical file system and KV capabilities
Learn about scalability and performance characteristics of Apache Ozone for analytics workloads

Migrating Distributed Object Storage Software onto Kubernetes

Ujjwal Lanjewar, Software Architect & Madhavrao Vemuri, Software Architect, Seagate Technology

Abstract

CORTX is an open-source, S3-compatible, massively scalable and reliable distributed object storage software platform. CORTX has recently been migrated to deploy in a Kubernetes environment in a cloud-native fashion. This proposal aims to share the experience of migrating CORTX onto the Kubernetes platform.

The CORTX software stack includes data path components such as the distributed object store MOTR, the S3 server, and kernel components. It could previously be deployed only on a set of physical servers, which took a long time and made failure handling challenging. The CORTX stack was modified to follow a stateless microservice architecture, giving loosely coupled components that work independently and in isolation and reducing complexity. Software provisioning was redesigned with several improvements so that it can be distributed as container images and work in a containerised environment, which reduced deployment time significantly and improved failure handling. The stack was then deployed in the form of pods and containers, along with several other Kubernetes constructs to facilitate a service mesh for the components. The network components were ported to user space, and the communication channel was optimised for the Kubernetes environment. Storage layers were abstracted to work on devices provisioned by the container infrastructure, with swap space dependencies removed. The logging infrastructure was improved to help system administrators determine failures and probable actions to resolve them. We evaluated the performance of the system deployed on Kubernetes and found it on par with the one deployed on bare-metal physical servers. The upgrade path follows the rolling upgrade capabilities provided by the Kubernetes platform, which simplifies the upgrade process.

The presentation will cover the benefits of migrating to Kubernetes and various design aspects of the migration, such as user mode operations, starting up clustered services, working with storage devices, hardware-agnostic deployment, configuration management, logging, scaling, upgrades, cross-application co-existence, handling resource constraints and more. This should benefit anyone who is developing distributed storage applications and trying to migrate them to the Kubernetes container platform. It should give the community a perspective on provisioning storage applications in a cloud-native environment and on the complexity of such an activity.

Learning Objectives

Containerising Distributed Applications
Migrate a distributed storage system onto the Kubernetes platform
Deploy distributed applications on the Kubernetes platform

Ensuring Multi-cloud Security

Girija Swami, Principal SQA Engineer, & Prashant Wakchaure, Principal SQA Engineer, Veritas

Abstract

While adoption of cloud computing continues to grow, organizations are struggling with security issues observed across multiple cloud platforms.

Misconfigurations of cloud resource settings are a leading cause of cloud data and security breaches. Unauthorized and insecure access to cloud infrastructure, network and data are the main challenges facing enterprise cloud adoption.

In this multi-cloud era, where new types of security attacks are observed daily, organizations should be conversant with the variety of threats so that they can be prevented before causing much destruction. This allows most security issues to be fixed beforehand, avoiding business impact rather than learning from painful experience.

Learning Objectives

Get acquainted with Multi-cloud Security Challenges
Identify threat-causing configurations or behaviors
Preventive Actions

Design challenges and solutions - Unstructured Data Management platform perspective

Anand Reddy, Senior Performance Engineer, Komprise

Abstract

In a hybrid cloud/multicloud environment, organizations are looking to constantly move data back and forth between on-premises and public cloud to run desired cloud services (e.g., AWS, Azure for S3 and Cloud NAS Filesystem), and get the data back to wherever it is needed.

The organizations need a simple, efficient, secure, and automated way to rapidly move data across locations while keeping the data in sync and without the complexity of data formats (e.g., files to AWS S3 and vice versa). Tiers may be defined by performance, capacity, and/or resiliency requirements of the data/applications. These capabilities will help the customer benefit by getting the best of all the worlds: SSD-level performance for hot data, HDD prices for capacity, geographic and industry compliance by leveraging the right cloud, leverage of cloud-based analytics and, above all, automatic data placement to exploit all the benefits and cost savings.

Achieving this in a data management platform often raises interesting design challenges that require system-level and business-level trade-offs. This presentation will discuss common and uncommon performance and scale issues and the solutions implemented by a data management vendor.

Learning Objectives

Filesystems and protocols
Cloud NAS topologies
Performance engineering challenges

Day One Track 2 Abstracts

SSD read latency optimization

Varun Paliwal, Technical Lead, & Ankit Kumar, Senior Software Engineer, Mindteck

Abstract

SSDs built on advanced multi-level cell technology provide high storage density but suffer significant performance degradation due to multiple read-retry operations. The read-retry mechanism is nevertheless essential to ensure the reliability of the SSD's flash memory. It can significantly increase SSD read latency by introducing multiple retry steps that re-read the target page with adjusted read reference values. To reduce read-retry latency, we exploit two advanced features widely adopted in NAND flash-based SSDs: 1) the CACHE READ command and 2) a strong ECC engine. First, we can reduce the read-retry latency using the advanced CACHE READ command, which allows the NAND flash chip to perform consecutive reads in a pipelined manner. Second, a large ECC-capability margin exists in the final retry step and can be used to reduce chip-level read latency. Based on these findings, we discuss two new techniques that reduce read-retry latency effectively: 1) Pipelined Read-Retry (PR²) and 2) Adaptive Read-Retry (AR²). PR² reduces the latency of a read-retry operation by pipelining consecutive retry steps using the CACHE READ command. AR² shortens the latency of each retry step by dynamically reducing the chip-level read latency depending on the current operating conditions, which determine the ECC-capability margin. These techniques improve SSD response time by up to 31.5% (17% on average) over a state-of-the-art baseline with only small changes to the SSD controller.
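
The effect of pipelining retry steps can be illustrated with a toy latency model in Python. This is a simplified sketch, not the authors' evaluation model: it assumes each attempt consists of a sensing stage followed by a transfer-plus-ECC stage, and the function names and all timing values (in microseconds) are hypothetical.

```python
def sequential_retry_latency(retries, t_read, t_xfer, t_ecc):
    """Each attempt senses the page, transfers it, and decodes it
    before the next attempt starts."""
    return (retries + 1) * (t_read + t_xfer + t_ecc)

def pipelined_retry_latency(retries, t_read, t_xfer, t_ecc):
    """Sensing for attempt i+1 overlaps the transfer and decode of attempt i
    (two-stage pipeline, in the spirit of PR²)."""
    attempts = retries + 1
    stage = max(t_read, t_xfer + t_ecc)  # slower of the two overlapped stages
    return t_read + (attempts - 1) * stage + t_xfer + t_ecc

# Hypothetical timings in microseconds, with three retry steps.
t_read, t_xfer, t_ecc, retries = 80, 20, 20, 3
print(sequential_retry_latency(retries, t_read, t_xfer, t_ecc))  # 480
print(pipelined_retry_latency(retries, t_read, t_xfer, t_ecc))   # 360

# In the spirit of AR²: assume the ECC margin allows sensing time
# to be cut by 25% (an invented figure for illustration).
print(pipelined_retry_latency(retries, t_read * 0.75, t_xfer, t_ecc))  # 280.0
```

With no retries the two models agree, so the saving in this model comes entirely from overlapping consecutive retry steps and from shortening each sensing step.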

Effective device thermal management based on dynamic ranking of device cooling needs

Hemant Gaikwad, Test Senior Principal Engineer and Shelesh Chopra, Senior Director, Dell Technologies & Rahul Vishwakarma, Graduate Student, California State University

Abstract

Data centers supply constant cooling to their systems, while on-device fans cool components locally by spinning faster whenever required, which in turn raises power consumption. Every device in the data center is cooled by centrally conditioned air at a constant set temperature irrespective of the heat the device dissipates; what is needed instead is context-aware cooling per device or per group of devices. Devices that heat up faster than the cooling offered suffer degraded performance, with CPUs, DIMMs, disks, NICs, etc. under-performing at high temperatures. Within data center racks, too, devices placed higher in the rack get hotter than those placed lower, since hot air rises. Moreover, inlet and outlet temperatures are difficult to control, and this inefficient thermal management increases cost and reduces device availability.

On the other hand, overcooling is also a problem for the data center: it leads to high power bills because all devices receive the same amount of cooling whether or not they need it.

We present a mechanism that provides recommendations for effective device thermal management based on a priori predictions of future device statistics.

1. The solution can autonomously identify devices needing specialized cooling in a particular range and group those together and enforce business continuity for such devices.

2. Dynamic ranking of device cooling needs is done based on the forecast with continuous accuracy and detailed root-cause analysis using causal relationships.

3. A Fourier time series produces an n-step-ahead forecast of the device thermal metrics, along with device load and failure, and devices showing prolonged periods of thermally high state are dynamically ranked for special cooling needs.
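
The forecasting and ranking steps above can be sketched in Python as follows. This is only one possible reading of "Fourier time series", not the presenters' implementation: it fits the strongest harmonics of a discrete Fourier transform and extends them periodically. The function names, device names, and temperature histories are hypothetical.

```python
import cmath
import math

def fourier_forecast(series, steps, harmonics=3):
    """Forecast `steps` future values by extending the mean plus the
    strongest DFT harmonics of the history periodically."""
    n = len(series)
    # DFT coefficients for the non-negative frequencies only.
    coeffs = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n // 2 + 1)
    ]
    strongest = sorted(range(1, len(coeffs)),
                       key=lambda k: abs(coeffs[k]), reverse=True)[:harmonics]
    forecast = []
    for t in range(n, n + steps):
        value = coeffs[0].real / n  # mean level
        for k in strongest:
            # Non-Nyquist terms stand in for a conjugate pair, hence weight 2.
            weight = 1 if (n % 2 == 0 and k == n // 2) else 2
            value += weight * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real / n
        forecast.append(value)
    return forecast

def rank_by_cooling_need(history, steps=3):
    """Order device names by mean forecast temperature, hottest first."""
    forecasts = {name: fourier_forecast(temps, steps) for name, temps in history.items()}
    return sorted(forecasts, key=lambda name: sum(forecasts[name]) / steps, reverse=True)

# Hypothetical temperature histories (degrees Celsius, 24 samples each).
history = {
    "rack1-bottom": [45 + 5 * math.sin(2 * math.pi * t / 12) for t in range(24)],
    "rack1-top": [60 + 8 * math.sin(2 * math.pi * t / 12) for t in range(24)],
}
print(rank_by_cooling_need(history))  # ['rack1-top', 'rack1-bottom']
```

A periodic extension like this captures daily or weekly cycles but not trends, so a practical system would likely detrend the series first and validate forecast accuracy continuously.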

Learning Objectives

Challenges while deploying a solution for device cooling
Machine learning implementation for device thermal management
Recommendations with respect to effective device thermal management based on a-priori predictions

Future of Long term Archival Data storage - DNA

Kannadasan Palani, Senior Test Manager, MSys Technologies

Abstract

This session will demonstrate the methodology and principle of using DNA for long-term data storage. The audience will learn how data is stored on SSDs, HDDs, and tape drives and discuss their cost, reliability, and longevity. The session will help the audience understand the pros and cons of each drive type. We will learn what DNA storage is and how it works.

DNA data storage is the process of encoding and decoding binary data onto and from synthesized strands of DNA. We will learn how the original digital data is encoded, written (synthesized using chemical/biological processes), and stored. We will also understand how DNA molecules are sequenced to reveal each individual A, C, G, or T in order and remapped from DNA bases back to 1s and 0s when the stored data is needed again. We will also see the pros and cons of DNA storage and why we are calling it the "Future of Long-term Archival Data storage".
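
The encode and decode steps can be illustrated with a minimal Python sketch that maps two bits to one base. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions for this example; practical DNA storage schemes add error correction and avoid problematic sequences such as long runs of the same base.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T

def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    strand = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)

def decode(strand: str) -> bytes:
    """Remap each group of four bases back to one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```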

Learning Objectives

Methodology and principle of using DNA for long-term data storage
Process of encoding and decoding the binary data
Pros and cons of each drive type - SSD, HDD, Tape Drive

Memory Profiling of Java Based Microservice Architecture

Gururaj Kulkarni, Distinguished Member of Technical Staff, & Cherami Liu, Architect, Dell Technologies

Abstract

Distributed microservices-based architecture continues to grow as an architecture of choice for building complex application stacks. Microservices architecture is becoming a de facto choice for applications because it reduces multiple levels of dependencies in Agile methodologies and the DevOps cycle and improves go-to-market strategy. In a monolithic application, components invoke one another via function calls and may be written in a single programming language. The resource requirement (such as memory) for such a monolithic application is statically sized, since there is a smaller number of services, each performing a unique set of sub-operations for different use cases. These services may run on a single machine, on highly available clustered machines, or inside containers. In a microservice-based architecture, each service instance performs a unique set of tasks independent of other services and communicates with other microservices using either a REST API or a message bus architecture. Therefore, the resource requirement for an individual microservice varies from one service to another depending on its business logic.

A microservice can run independently inside a container (e.g., a Docker container) or outside one; either way, microservices are generally independent of each other. Memory usage and optimization is one of the key aspects of sizing a microservice. For microservices that run as independent entities outside containers, the sizing consideration generally applies to the JVM service, whereas for microservices running inside a container, it applies to the container. Architects generally face the challenge of how much memory to allocate for the heap while sizing microservices. Another key challenge in memory sizing is “how does garbage collection eat up memory, and which garbage collection algorithm suits different microservices?”. Various factors affect Java-based microservices. In this presentation, we provide a detailed overview of the following topics:

1. Monolithic and Microservice based architecture and high-level differences

2. Memory model, what architects should be worried about for sizing consideration of microservices

3. Detailed memory segments

• Resident, Swap, Heap, Non-heap

4. Monitoring (What are important factors to be measured and how to measure)

5. Recommendations about memory sizing

6. Key takeaways
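
To illustrate why heap size alone does not determine the container size, here is a rough sizing sketch in Python. It is not the presenters' recommendation: it assumes the process footprint is approximately heap plus non-heap regions (metaspace, thread stacks, code cache, direct buffers) plus headroom, and every parameter name and default value below is a hypothetical placeholder.

```python
import math

def estimate_container_memory_mb(heap_mb, metaspace_mb, threads,
                                 stack_mb=1.0, code_cache_mb=240,
                                 direct_mb=0, headroom=0.10):
    """Estimate a container memory limit in MB for one JVM-based service.

    All inputs are assumptions supplied by the caller; measure a running
    service to replace them with real figures.
    """
    non_heap_mb = metaspace_mb + threads * stack_mb + code_cache_mb + direct_mb
    return math.ceil((heap_mb + non_heap_mb) * (1 + headroom))

# Hypothetical service: 1 GiB heap, 128 MB metaspace, 100 threads.
print(estimate_container_memory_mb(heap_mb=1024, metaspace_mb=128, threads=100))  # 1642
```

In this example the non-heap portion adds roughly 45% on top of the heap, which is why sizing a container to the heap value alone tends to cause out-of-memory kills.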

Learning Objectives

Understanding of Monolithic and Microservices Architecture
Memory model and sizing consideration for Java based microservices
Potential tools to Monitor and measure the microservice memory usage

High Speed Magnetic recording without energy dissipation

Boris Tankhilevich, CEO, Magtera Inc.

Abstract

While recent developments in photonics enable lossless data transfer at speeds exceeding 1 Tb/s, current magnetic data storage can neither keep up with these data-flow rates nor decrease energy dissipation. Consequently, data centers are already becoming the biggest consumers of electricity worldwide.

Ultrafast writing of bits at speeds up to the THz range without dissipating energy in the recording medium is a monumental challenge.

Magtera invented a method for ultrafast writing of bits at speeds up to the THz range that does not use any electrical current, thus avoiding the dissipation of energy in the recording medium.

More specifically, the apparatus for this novel technique of high-speed magnetic recording is based on manipulating the pinning layer in magnetic tunnel junction-based memory by using a terahertz magnon laser. The apparatus comprises a terahertz writing head configured to generate a tunable terahertz writing signal and a memory cell including a spacer that comprises a thickness configured based on Ruderman-Kittel-Kasuya-Yosida (RKKY) interaction. The memory cell comprises two separate memory states: a first binary state and a second binary state; wherein the first binary memory state corresponds to a ferromagnetic sign of the RKKY interaction corresponding to a first thickness value of the spacer; and wherein the second binary memory state corresponds to an antiferromagnetic sign of the RKKY interaction corresponding to a second thickness value of the spacer. The thickness of the spacer is manipulated by the tunable terahertz writing signal.

Learning Objectives

Terahertz Magnon Laser
RKKY interaction
Ultra-fast magnetic recording without energy dissipation

Augmenting SPDK with xNVMe BDEV

Krishna Kanth Reddy, Associate Director & Kanchan Joshi, Associate Director, Samsung Semiconductor India Research (SSIR)

Abstract

SPDK creates a block-device abstraction layer, referred to as SPDK BDEV, on top of various low-level backends that are called BDEV modules. Linux aio, io_uring and NVMe are a few examples of such modules.

This talk presents a new BDEV module, xNVMe BDEV, which has recently made its way upstream and will be available as part of the SPDK 22.09 release.

xNVMe BDEV not only provides a way to use the existing aio and io_uring interfaces, but also allows using an asynchronous NVMe-passthrough interface that guarantees availability and scalability. We touch upon the new passthrough interface and elaborate on how the SPDK block layer can utilize it through the implementation of the xNVMe BDEV module. With the xNVMe BDEV module, applications that use SPDK, such as RocksDB and Ceph, can leverage the high-performance Linux asynchronous IOCTL feature without any change to their software interfaces.

The talk also showcases performance of the xNVMe BDEV module with the io_uring and the io_uring pass-through backends.

Learning Objectives

Understand SPDK block layer
Understand xNVMe and xNVMe BDEV
Understand io_uring passthrough for NVMe

Data Analytics with Computational Storage and PostgreSQL

Ajay Joshi, Technologist, Western Digital

Abstract

The data in the cloud is ever-evolving and rapidly expanding, leading to an explosion in storage and network infrastructure requirements. Unfortunately, data analytics on this disaggregated stored data leads to high latencies and increased IO load.

Our presentation shows how integrating Computational Storage with PostgreSQL's query planner allows it to utilize Computational Storage seamlessly to reduce the network & IO load.

Learning Objectives

Understand Computational Storage
Offloading of query planner to hardware
Understanding of advantages of the offload mechanism

S3select: Computational Storage in S3

Girjesh Rajoria, Software Engineer, Red Hat

Abstract

S3select is an S3 operation (introduced by Amazon in 2018) that implements a pushdown paradigm that pulls out only the data you need from an object, which can dramatically improve the performance and reduce the cost of applications that need to access data in S3. The talk will introduce s3select operation and architecture. It will describe what the pushdown technique is, why and where it is beneficial for the user. It will cover s3select supported features and their integration with analytic applications. It will discuss the main differences between columnar and non-columnar formats (CSV vs Parquet). We’ll also discuss recent developments for ceph/s3select. The presentation will show how easy it is to use ceph/s3select.
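
The pushdown idea can be illustrated with a small, self-contained Python simulation. It does not call S3 or Ceph: the in-memory CSV string stands in for a stored object, and `get_object` and `select_object` are hypothetical helpers written for this sketch, not real client APIs.

```python
import csv
import io

# Stand-in for a CSV object stored in an S3 bucket (hypothetical data).
OBJECT = "id,region,amount\n1,apac,10\n2,emea,25\n3,apac,40\n"

def get_object():
    """Without pushdown: the whole object crosses the network."""
    return OBJECT

def select_object(columns, predicate):
    """With pushdown: the store scans the object and returns only the
    requested columns of the rows that match."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.DictReader(io.StringIO(OBJECT)):
        if predicate(row):
            writer.writerow([row[c] for c in columns])
    return out.getvalue()

# Roughly equivalent to: SELECT s.amount FROM S3Object s WHERE s.region = 'apac'
full = get_object()
selected = select_object(["amount"], lambda r: r["region"] == "apac")
print(len(selected) < len(full))  # True: fewer bytes returned to the client
```

The filtering work is the same in both cases; the difference is where it runs and therefore how much data leaves the storage tier.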

Learning Objectives

How s3select improves the performance and reduces the cost of applications that need to access data in S3
The official S3 operation (introduced by AWS a few years ago) and how it is implemented in Ceph/s3select
How s3select can be used and where it is beneficial

File System Acceleration using Computational Storage for Efficient Data Storage

Srija Malyala, Software Developer, & Vaishnavi SG, Software Developer, AMD

Abstract

We examine the benefits of using computational storage devices like Xilinx SmartSSD to offload compression and achieve an ideal compression scheme in which higher compression ratios are obtained with fewer CPU resources. Offloading the compute-intensive task of compression frees up the CPU to serve real customer applications. The scheme proposed in this paper comprises Xilinx Storage Services (XSS) with Xilinx Runtime (XRT) software and an HLS-based GZIP compression kernel that runs on the FPGA. The hardware platform chosen is Xilinx SmartSSD, which also has a unique P2P data transfer feature: data input to or output from the FPGA moves directly from or to the storage device without going back through host system (x86) memory. This further improves overall system efficiency by reducing DDR memory traffic, since computation moves closer to where the data resides. There are different places in the application/OS software stack where data compression can be offloaded to hardware. We have chosen to do this at the file system level because it enables all applications using the file system to benefit without necessarily changing the applications themselves. We have selected the Linux ZFS file system, as it is widely used and popular.

Learning Objectives

We examine the benefits of using computational storage devices like Xilinx SmartSSD to offload the compression
Peer-to-peer data flow and its advantages
Advantages of offloading at File System

Day Two Plenary Abstracts

Storage architecture challenges in the Fintech sector – A call for innovation

Murali Brahmadesam, CTO & Head of Engineering, Razorpay

Abstract

The rapid development of the Fintech sector in recent years has brought with it a number of data related challenges such as managing data growth, data protection and security against hacking, compliance with money laundering and data privacy legislation, and demand for new applications in a competitive space. Many Fintech companies must overcome these challenges so they can continue to develop sustainably and build trust amongst their customer base.

This talk will outline the needs of the market sector and act as a motivation to storage developers, architects, and engineers to produce innovative storage products and solutions to overcome the challenges in the Fintech industry.

Ransomware attacks? Data protection holds the key!

Vijay Mhaskar, VP Engineering and Pune Center Site Lead, Veritas

Abstract

Data has become the oil of this century. It’s not just large enterprises; even consumers are not able to conduct their day-to-day chores without access to data. So, no wonder hackers have a field day. It’s not if, but when! 80% of organizations have publicly accepted that they have been attacked by ransomware, while the remaining 20% are just not vocal about it.

Data protection is being seen as the last defense against any security threat. So, what is the expectation from customers and what are the possibilities?

Streaming data platform

Karnendu Pattanaik, Senior Manager, & Somesh Joshi, Software Engineer, Dell Technologies

Abstract

Streaming data is data that is generated continuously by thousands of data sources, which typically send in the data records simultaneously, and in small sizes (order of Kilobytes). Streaming data includes a wide variety of data such as log files generated by customers using your mobile or web applications, ecommerce purchases, in-game player activity, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices or instrumentation in data centers. Streaming data processing is beneficial in most scenarios where new, dynamic data is generated on a continual basis. The presentation will talk about storage of streaming data and aspects of platforms that enable data processing capabilities on this streaming data.

Building a Storage Services Platform for Enterprise Containerized Applications

Tom Clark, Distinguished Engineer, Chief Architect Storage Software, IBM

Abstract

Modernization of applications is a major trend in the IT industry. Modernization implies creating containerized applications using microservices and DevOps approaches, and includes transitioning existing applications to containers. As more enterprise workloads move to containers, storage and storage services become a critical part of running containerized applications in a production environment. This talk will describe the current state of storage technology for containerized applications, along with the challenges and required solutions for providing enterprise-level storage services for critical applications running in a containerized environment.

Day Two Track 1 Abstracts

Cyberstorage - A security first storage approach

Sanjeev Kumar, Lead, Storage Center of Excellence, Tata Consultancy Services

Abstract

In this modern era, keeping information in physical form is outdated; digital data storage has become the better solution. As security has evolved, so has the fluidity of attacks on unstructured data storage. Cyber-attack is a familiar term, and the threat prevents storage organizations from providing a consolidated privacy experience to their users. Organizations have shown maturity in protecting end points for data centre products, but centralized storage is still vulnerable to most cyber-attacks. Unstructured data platforms provide inadequate protection from malicious deletion, encryption and data exfiltration, making them easy targets.

In this presentation, we will discuss Cyber Storage, an approach that uses active defence mechanisms to identify, protect against, detect, respond to and recover from cyber-attacks on unstructured data storage solutions, based on the NIST Framework. We will outline an end-to-end solution approach, tools and best practices for actively protecting unstructured data platforms from cyber-attacks.

The new, catch-all, fast path to NVMe in Linux

Kanchan Joshi, Associate Director, & Anuj Gupta, Software Engineer, Samsung Semiconductor India Research (SSIR)

Abstract

While the Linux kernel has elevated the efficiency of the I/O stack with constructs such as io_uring, NVMe storage continues to evolve with new features and command sets.

Many new NVMe features either do not fit well within the Linux abstraction layers or face adoption challenges due to the lack of an appropriate syscall interface. The existing NVMe passthrough interface does not help as it is tied to synchronous ioctls.

To address these challenges, we have added a new NVMe passthrough interface that will be available in the mainline kernel from 5.19 onwards.

This interface guarantees:

(a) availability - any existing/future NVMe command-set/feature remains usable with this path

(b) performance efficiency - as it is paired with io_uring

The talk presents the specifics, and how applications can go about using this new interface.

We also present the tooling/testing enhancements and provide an evaluation comparing this path against regular block and passthrough I/O.

Learning Objectives

Understand the limitations of the existing paths (block I/O and sync passthrough)
Understand how to use the new path (async passthrough)
Understand io_uring based async programming

PoseidonOS: An Innovative Open Source Storage System and its exploration using Trident

Badarinarayan Joshi, Associate Technical Director, & Arun V Pillai, Staff Engineer, Samsung Semiconductor India Research (SSIR)

Abstract

With the popularity of NVMe SSDs, the NVMe-oF interface is becoming important in disaggregated datacenters. However, it is non-trivial to fully utilize large numbers of high-performance and high-capacity NVMe SSDs. In this talk, we will introduce PoseidonOS, which is an NVMe-oF reference solution that was open-sourced last year. In particular, we will introduce its essential features, and our choices for an NVMe-oF storage system as a building block of a disaggregated datacenter. We will also address the challenges in developing test libraries for the POS software stack, and the design principles applied in order to build libraries which are powerful but also simple to use. We will help the audience explore PoseidonOS using a test tool for NVMe-oF systems called Trident. Trident enables using simple test scripts to test complex scenarios, including use cases wherein the details are handled by configuration-based core libraries.

Learning Objectives

NVMe-oF based storage systems
Test library design for complex storage system tests
Introduction to PoseidonOS

Stream processing and storage system: how streams help to make real-time insights

Krishan Rai, Senior Manager & Abhin Balur, Senior Principal Engineer, Dell Technologies

Abstract

The evolution of technology has enabled the continuous generation of massive data (e.g., from connected devices and sensors), and stream processing has been an active research field for more than 20 years. As more data becomes available, organizations are using cutting-edge tools and techniques to extract useful insights from the data as soon as it is generated.

We will talk about streaming storage systems, how streaming can help to derive real-time insights from streaming data, and how open-source Pravega can help to achieve this objective.

Learning Objectives

Challenges while performing analytics over historical data
Introduction to Streaming World
Pravega's contribution to stream processing and storage system

CXL Vs Computational Storage: Revving up the Compute in Data centers

Abhilash Nag, Principal Engineer-II & Shiva Pahwa, Senior Manager, Micron Technologies

Abstract

As Moore's law catches up with the CPU's technology enablement, vendors are finding it hard to reduce the cost-to-performance ratio of newer nodes to meet ever-evolving industry needs.

As microservice architecture evolves, more and more applications are being executed in parallel on the same compute node, making the CPU the bottleneck.

Today's industry is moving away from CPU-bound operations towards offloading a significant number of compute tasks to devices that are better capable of handling such operations. This has generated significant industry-wide interest in designing future standards for compute disaggregation.

The Storage and compute technology enablers have come together to define the next generation of standards and technologies.

This digital transformation of accelerators is being led by two different industry consortiums - CXL and Computational storage standards.

In our presentation, we will compare the two computational paradigms and the use cases of each by providing insights into the following:

1. CXL and Computational storage's driving force

2. Commonalities and differences between the two standards.

3. Use cases and fitment of each standard.

4. Advantages of each over the other.

Learning Objectives

CXL and Computational storage's driving force
Commonalities and differences between the two standards.
Use cases and fitment of each standard.

Breaking The Barrier – How To Apply Dynamic Machine Learning Modelling At Scale In Production Storage Systems

Supriya Kannery, Software Senior Principal Engineer, & Prajwala B Patil, Software Quality Senior Engineer, Dell Technologies

Abstract

For intelligent systems, Machine Learning (ML) has become inevitable. The recent proliferation of ML algorithms has led to the challenge of selecting the right model for a problem domain. We have seen both static (batch) models and dynamic (online) models being used for production systems. But there are barriers: How accurate are static models when applied to production systems where the data trend is dynamic? How frequently must the model be re-trained in dynamic modelling? Are the existing modelling approaches able to satisfy the increasing demands of the different regression and classification problems encountered in a storage system? Shouldn’t we emphasise the required speed and accuracy rather than focusing on which model to use?

In this talk, we will explain how we used AutoML (Automated Machine Learning) with enhanced algorithms to address real-world problems in a resource-constrained environment. We were able to address the major requirements related to machine learning modelling in production systems: applying ML modelling at scale, demand for faster predictions, restricted resource availability and the time taken to get the right model from a plethora of algorithms. AutoGluon, a popular open-source implementation of AutoML from AWS, was used in our trials. We will demonstrate how dynamic machine learning modelling can be applied at scale in production systems, taking the popular problems of capacity prediction and disk failure prediction as examples of regression and classification problems respectively.

Learning Objectives

How automated machine learning can be used for applying dynamic machine learning at scale in production systems
Fundamentals of hyperparameter tuning for a resource-constrained environment
Enhanced AutoML approach for dynamic modelling to get required speed and accuracy

Demystifying Edge devices cloud native storage services for different data sources

Umang Kumar, Master Technologist, Subhadip Das, Expert Technologist, & Chaitra Shankar, Sr Cloud Developer, HPE

Abstract

Edge is becoming the new core. More and more data will live in edge environments rather than in traditional data centers or the cloud, for reasons ranging from data gravity to data sovereignty. Different data sources (block, object, streaming, etc.) require different kinds of storage architecture at the edge. Data movement and data storage are key components of edge computing. Taking the cloud operating model to the edge is also gaining momentum.

In this talk we will try to demystify the different cloud native storage services that can be used at edge nodes for different data types, and the advantages they bring.

Learning Objectives

Understanding the cloud native operating model at the edge
Understanding cloud native storage services at the far edge, enterprise edge and network edge
Understanding different data sources (block, object, streaming, etc.) at the edge and their associated storage services

Building an Object based STaaS solution with Poseidon Storage

Swati Chawdhary, Senior Manager, & Abdul Ahad Amir, Senior Staff Engineer, Samsung

Abstract

Samsung recently contributed the Poseidon project, which is an OCP-based industry collaboration between a component vendor (Samsung), a system vendor (Inspur) and data centers.

Poseidon is an open-source storage software and hardware platform for NVMe-oF based systems. It is an EDSFF-based storage reference system, targeted at performance and density, and suitable for cloud data centers.

Today cloud service providers (CSP) are constantly challenged with large volumes of data and increasing demand from customers for cost effective storage.

To solve these challenges, CSPs provide software-defined STaaS (Storage as a Service) solutions by exposing the data as a service. STaaS services can be deployed with object, block and file storage.

Amongst these, object based STaaS service is becoming popular, as object storage is the dominant class of storage for the cloud.

We deployed object based STaaS on Poseidon, which provides a high-performance storage platform, suitable for AI and ML workloads. Also, we integrated our STaaS deployment with Kubernetes to manage storage resources, just like compute resources, delivering full scale automation of both stateful and stateless components.

In our talk, we will discuss how Poseidon helps in improving data center storage efficiency and performance and present the S3 performance benchmark numbers obtained for the STaaS deployment.

Learning Objectives

Poseidon Project
Significance of Object Storage in Cloud
How to build Object based STaaS with Kubernetes

Scale Out Cloud Storage Testing Challenges, Tools and Techniques

Pranav Sahasrabudhe, Senior Staff Engineer, & Sarang Sawant, Senior Staff Engineer, Seagate Technology

Abstract

With new age applications, cloud data has been growing exponentially over the last few years. Consequently, the underlying cloud infrastructure requires highly scalable, mass-capacity, cloud-native object storage, coupled with performance, reliability and availability. To address this problem, many scalable cloud object storage implementations have evolved, such as Ceph, MinIO, ActiveScale, CleverSafe and CORTX Object Storage. Testing mass-capacity distributed storage for scalability, reliability and availability has its own set of challenges. This proposal presents the challenges of testing a highly scalable, highly available, new age mass storage solution and ways to handle them.

CORTX is open-source, S3-compatible object storage software. It is a robust, massively scalable, highly available and reliable object storage platform. It can be deployed in a cloud-native fashion in a Kubernetes-based container environment and can work with any underlying storage enclosures or JBODs. Various aspects of scale-out object storage functions were evaluated during the development of CORTX. The objective is to share the experiences from testing the CORTX object storage platform for functional and non-functional aspects such as performance, security, availability, durability and scalability, and to provide a landscape of the tools, techniques and methods used for the same.

We plan to provide a holistic view of the testing tools and technologies that complement the human factor involved in developing and testing scalable storage. The solution set includes open-sourced tools such as 'CORIO' and 'CORBOT' to address the requirements of longevity, reliability, scalability and sustenance testing using a containerized automation platform, followed by tools such as 'PerfLine' and 'PerfPro' to fulfil the performance testing requirements. For testing availability features, availability models and Chaos Engineering tools (Chaos Mesh, kube-monkey) can be most effective. Further, it includes the tools and technologies used for various levels of security, ranging from source code and platform all the way to the ecosystem, to validate end-to-end security.

This presentation should benefit everyone working on the development and testing of distributed storage applications by helping them understand the entire landscape of storage testing. It should give the community a perspective on testing applications in a cloud-native environment and on the tools and techniques available for the same.

Learning Objectives

Scale Out Cloud Storage
Cloud Storage Testing challenges
Storage Testing Tools and Techniques

Azure SMB multichannel using an improved Linux client

Shyam Prasad, Senior Software Engineer, Microsoft

Abstract

Azure files service is a true clustered filesystem in the Azure storage offerings from Microsoft. The service can be accessed using SMB and NFS clients, in addition to the REST protocol. Last year, Azure files added support for the SMB 3.1.1 multichannel feature, which allows SMB clients to set up multiple connections to the Azure files service, thereby improving the overall achievable bandwidth. While the Linux SMB client has supported this feature since as early as the 5.8 kernel, there have been several changes in the recent past which make this feature a lot more robust, particularly for the Azure files scenario.

Learning Objectives

What is SMB multichannel feature?
How to get best perf out of Azure files?
How does Linux do SMB multichannel?

DIRL: An Innovative Fabric-Centric SAN Congestion Solution

Fausto Vaninetti, Technical Solutions Architect, Cisco Systems

Abstract

Congestion is a well-known challenge impacting all networks, and Fibre Channel Storage Area Networks are no exception. Several techniques have been devised over the years to alleviate and reduce the impact of SAN congestion on workloads and applications, but all implementations so far have come with some limitation or concern. Recently a new fabric-centric approach has been introduced, with the capability to dynamically rate limit ingress traffic to switch ports. The new approach can effectively prevent congestion in Fibre Channel fabrics of any kind. This session explains the basic concepts behind the DIRL solution and its benefits.

Learning Objectives

Illustrate the different congestion scenarios in a SAN
Describe the working principles of the DIRL solution
Present the benefits of DIRL in contrast to alternative approaches

Day Two Track 2 Abstracts

Zero Trust Security And Air Gapping For Secure Long Term Data Archival

Mohamed Ashraf Kottilungal, Senior Solutions Architect, MSys Technologies

Abstract

Little more than a week has passed since a pharmaceutical company found itself added to the list of recent cyberattack extortion survivors. While global enterprises and ISVs are moving towards predictive analytics and actionable insights drawn from long-term historical data, the possibility of cyberattacks and natural disasters cannot be ignored that easily. Among the recent developments in data archival, two have really stood out for fully managed data archival and protection: Zero Trust security and Logical Air Gapping.

In this session we will discuss the two approaches to security in detail. We will see how a zero trust security architecture can assure easy availability while allowing least privileges with role-based control. The discussion will then highlight how zero trust security has a strong effect in logically air gapping the data objects to create an impenetrable fortress around the archived data.

Learning Objectives

Know the existing challenges for long-term data archival and current cybersecurity trends
Learn the fundamental needs for data archive protection that led to Zero Trust Security measures
Understand the intrusion defence and data manipulation protection that can be achieved through the logical air gapping offered by Zero Trust

Intelligent Data Centre health monitoring and Remediation

Nibu Habel, Principal Engineer, NetApp

Abstract

Datacenter deployments are a complex task: they involve many steps and are quite error-prone. Providing guidance to customers that keeps a check on these deployments makes the administrator's life simpler. Once installations are complete, keeping them performing at their best and ensuring they are secure involves staying on the latest updates. Empowering customers with a continuous monitoring system that can learn, check and predict issues and failures can save data centres from catastrophes of data loss and downtime. This presentation is about how we can make data center health checks and validations more reliable by learning through community wisdom and predicting issues well in advance. Pointing out issues is not enough; auto remediations are the need of the hour. Everyone wants data centers to be just like a self-driving car that brings them home safe every time.

Learning Objectives

Checking validity of configurations
AI driven methods that could classify systems with issues
Auto remediation of issues

Securing APIs: Threats and security best practices

Anupam Chomal, Software Security Expert, CARIAD

Abstract

APIs are used extensively today across different businesses, such as autonomous driving, banking and IoT applications. We will start with a brief introduction to APIs and how APIs should be configured securely. The large store of data usually employed by such applications becomes an easy target for attackers. It is also possible to DoS services, which again could lead to high losses. In most cases these APIs would be running thousands of connections, along with millions of operations each day.

We will then look at the recent attacks against API security, and try to understand the best practices that could have avoided them.

We will also look at the OWASP API security top 10, and also some common API security testing tools.

We will conclude by summarizing the advantages of using APIs and how proper configuration can keep us safe from most issues.

Learning Objectives

API Security issues
Recent attacks against APIs
Learning the OWASP API Security top 10

Performance Study between On Wire Encryption through gRPC and Secure TCP interfaces for Data Replication

Sravan Kumar Reddy Bhavanam, Member Technical Staff, Nutanix

Abstract

The need for encryption of data on the wire has been gaining importance as a way to provide better security for data that is sent across sites. This was initially addressed in our architecture by sending RPCs and application data through gRPC interfaces with encryption enabled. While this solved the problem of sending encrypted data, throughput was significantly affected (~3-14%). By directly incorporating TLS to secure the TCP communications, the expectation was to move towards a more lightweight form of encrypted interface. The performance experiments carried out on the different flavours of data replication seem to indicate that, with secure TCP interfaces as the means of encryption, the degradation in throughput was around 2-7% less than with gRPC interfaces.

Learning Objectives

TLS equipped TCP interfaces
Comparative study between gRPC and secure TCP interfaces
Performance benefits of secure TCP over gRPC interfaces

Secure P2P File Sharing in Distributed Cloud Storage

Vishwas Saxena, Senior Technologist, Firmware Engineering, Western Digital

Abstract

Distributed Cloud Storage is built on the backbone of a P2P network of NAS nodes, and one cool use case enabled on top of this architecture is peer-to-peer File sharing. A user-friendly and well-designed security flow for peer-to-peer File sharing is crucial for the success of this feature. In a distributed cloud storage architecture, each File is sharded, encrypted and erasure coded before it is stored.

Various schemes have been proposed that utilize a centralized server to maintain a database of users' ownership of a File and allow the ownership database to be appended with a new user. While this remains a good solution for a small number of Files, it doesn’t scale well when the number of Files reaches billions.

We propose a new architecture that moves the ownership database to the NAS nodes contributed to the distributed storage. This way we ensure a fail-safe methodology for secure peer-to-peer File sharing in distributed cloud storage systems. In this architecture there are two types of NAS nodes: nodes that initiate the uploads/downloads of a File on the distributed network, called File owner nodes, and nodes storing the chunks of Files, called File storage nodes.

Each File owner node pushes the owner node ID along with the respective HMAC key to the File storage nodes. The HMAC key is used by the node owner to verify the HMAC provided by the node requesting access to a File. The AES encryption key for the File is stored in metadata on the File owner node and is not passed to the File storage node. Whenever a Primary node wants to share a File with a Secondary node, it raises a request to the File storage node, which updates the metadata on the node storing the data. After the File storage node’s metadata has been updated with the authorized owners for the File, which now include the Secondary node, a shared transport channel is established between the Primary and Secondary nodes using Diffie-Hellman key exchange, and the Secondary node is provided with the HMAC key of the File and the encryption key over the secure channel.

The Secondary node is now enabled as an owner of that File. It approaches the File storage nodes, providing the hash computed using the HMAC key that the Primary node supplied over the secure channel. Each File storage node verifies the hash provided by the Secondary File owner node by computing the hash itself using the File’s HMAC key. The Secondary node obtains all the shards/chunks of the File from the respective File storage nodes in the same fashion. After reassembling the shards, the Secondary node decrypts the File using the AES encryption key received from the Primary File owner node over the secure channel. This architecture ensures a fail-safe method of P2P File sharing between Primary and Secondary File owner nodes that is independent of any centralized database and thus remains scalable when the number of Files reaches billions.
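The storage-node side of the verification step described above can be sketched in a few lines. This is a minimal illustration using Python's standard `hmac` module, not the authors' implementation: the class `FileStorageNode`, the helper `make_tag`, the SHA-256 digest and the `file_id:node_id` message format are all assumptions, and the Diffie-Hellman channel, sharding and AES decryption are left out.

```python
import hashlib
import hmac
import secrets


def make_tag(hmac_key: bytes, file_id: str, node_id: str) -> bytes:
    """Compute the HMAC a requesting node presents (message format is assumed)."""
    message = f"{file_id}:{node_id}".encode()
    return hmac.new(hmac_key, message, hashlib.sha256).digest()


class FileStorageNode:
    """Hypothetical storage node holding per-File HMAC keys and owner lists."""

    def __init__(self):
        self._hmac_keys = {}  # file_id -> HMAC key pushed by the File owner node
        self._owners = {}     # file_id -> set of authorized owner node IDs

    def register_file(self, file_id, owner_id, hmac_key):
        self._hmac_keys[file_id] = hmac_key
        self._owners[file_id] = {owner_id}

    def add_owner(self, file_id, requester_id, new_owner_id):
        # Only an existing owner may extend the list of authorized owners.
        if requester_id not in self._owners.get(file_id, set()):
            raise PermissionError("requester is not an owner of this File")
        self._owners[file_id].add(new_owner_id)

    def verify_access(self, file_id, node_id, tag):
        if node_id not in self._owners.get(file_id, set()):
            return False
        expected = make_tag(self._hmac_keys[file_id], file_id, node_id)
        return hmac.compare_digest(expected, tag)  # constant-time comparison


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    storage = FileStorageNode()
    storage.register_file("file-1", "primary", key)
    storage.add_owner("file-1", "primary", "secondary")
    print(storage.verify_access("file-1", "secondary",
                                make_tag(key, "file-1", "secondary")))
```

Because the storage node only ever holds the HMAC key, it can authorize chunk requests without being able to decrypt the File.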

Learning Objectives

Learning state of the art secure Peer to Peer File sharing method
Learning state of the art distributed cloud storage architecture and algorithms
Feedback from storage developers developing distributed cloud storage

Challenges to Container storage from perspective of data protection

Sushantha Kumar, Lead Architect, & Pravin Ranjan, Senior System Architect, Huawei Technologies

Abstract

Container storage use cases and deployments are rapidly increasing, and with them come a lot of challenges.

One of the key challenges is data protection for containerised applications.

In this session, we would like to put forward various challenges related to data protection, along with the opportunities and developments in this area.

We shall also demonstrate how the SODA Foundation project deals with some of these use cases and what is coming next.

Learning Objectives

Some learning on container Data protection
Containerised storage usage patterns
Containerised application deployment view

Crash consistent Backup of Complex Containerised Application with multiple PVCs

Deepak Ghuge, Software Architect and Master Inventor, Rahul Nema, Lead Architect, & Rasika Vyawahare, Quality Engineer, IBM

Abstract

Real-life applications are complex and likely to consist of multiple loosely coupled parts (thanks to the adoption of microservice architecture). When an application is migrated to or designed for Kubernetes, it is usually a set of deployments, pods, PVCs and other Kubernetes artefacts. From the application's perspective these are logically connected, but for Kubernetes they are all individual entities. By design containers are stateless, but for a stateful application the state is maintained at the storage level. Application consistency in a Kubernetes environment means consistency of all containers belonging to that application, which in turn means consistency of all PVCs belonging to that application. As of today, Kubernetes only exposes a mechanism to take a snapshot of each PVC independently; there is no built-in mechanism to take a group snapshot of all PVCs belonging to an application while maintaining data consistency. The only way as of today is to take a long downtime whenever a backup is requested and then iterate over the PVCs for the snapshot/backup. In this session, we will go over how a complex application can be backed up easily, with or without downtime, using the existing Kubernetes API and with the help of a shared file system used as storage for containers.
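The "pause, iterate over the PVCs, resume" baseline described above can be sketched as follows. This is a plain-Python illustration, not the shared file system approach the session presents and not a Kubernetes client: PVCs are modelled as dictionaries, and `freeze`, `thaw` and `take_snapshot` are hypothetical callbacks standing in for whatever the platform provides.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(app_name, freeze, thaw):
    """Pause writes for the application and always resume them afterwards."""
    freeze(app_name)
    try:
        yield
    finally:
        thaw(app_name)


def group_snapshot(pvcs, app_name, take_snapshot, freeze, thaw):
    """Snapshot every PVC labelled with app_name inside one quiesce window."""
    # Select the PVCs that logically belong to the application via its label.
    members = [p for p in pvcs if p["labels"].get("app") == app_name]
    with quiesced(app_name, freeze, thaw):
        return {p["name"]: take_snapshot(p["name"]) for p in members}


if __name__ == "__main__":
    pvcs = [
        {"name": "db-data", "labels": {"app": "shop"}},
        {"name": "cache", "labels": {"app": "shop"}},
        {"name": "blog-data", "labels": {"app": "blog"}},
    ]
    snapshots = group_snapshot(
        pvcs, "shop",
        take_snapshot=lambda name: f"{name}-snap",
        freeze=lambda app: print(f"freeze {app}"),
        thaw=lambda app: print(f"thaw {app}"),
    )
    print(snapshots)
```

The downtime in this baseline lasts for the whole loop, which is why it grows with the number of PVCs in the application.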

Learning Objectives

Understand real-life containerised applications
Understand the current state of backup and restore in container environments
Understand consistent backup and restore of containerised applications

Testing Resiliency and Fault Tolerance in Containerized Environment using Chaos Tools

Aparna Bindage, Software Quality Assurance Engineer & Sheryl Francis, Software Quality Assurance Engineer, Veritas

Abstract

In a containerized environment, it is challenging to retain the stability and reliability of storage and applications. While running various applications in such an environment, unpredictable storage-related issues might appear. They can hamper performance and cause undesired behaviour. This session provides a procedure for assessing the stability and reliability of application pods that use container storage with the help of various chaos tools. This will also help in evaluating the resiliency, fault tolerance and built-in recovery features of the applications present. At the storage layer we can ensure data consistency across volumes, mirrors, snapshots, etc. The turbulent conditions introduced by these tools can produce two possible outcomes: either we identify future issues beforehand and fix them, or we verify the resiliency and fault tolerance, thereby increasing confidence in the environment.

Learning Objectives

Performance of applications running in a containerized environment in unpredictable situations
Evaluate the stability and reliability of storage resources used by applications
Evaluate fault tolerance, resiliency and recovery of applications

Build Kubernetes Native Storage Observability

Sanil Kumar D, TOC Co-Chair, Arch Lead, SODA Foundation & Chief Architect, Huawei & Joseph Vazhappilly, Lead Architect, Huawei/SODA Foundation

Abstract

Data Observability is confirmed as one of the key trends in the coming years. It will play a key role in infrastructure management, insights and automation, for example in building end-to-end AIOps solutions.

Deep storage metadata monitoring can provide critical health information for preventive and predictive maintenance of infrastructure deployments. Container application deployments are growing, and so are the demands of container storage management. Container storage monitoring in Kubernetes has certain features through CSI. However, it does not provide all the storage insights that a vendor management platform would. Hence there is a need for native storage monitoring support and, in turn, container storage observability.

We discuss different aspects of getting the storage resource, alert, and performance information in Kubernetes deployments. The session describes the architecture and solution for Kubernetes native storage observability. It also covers how we can work on an open-source solution in a collaborative way.

Learning Objectives

Introduce Storage Monitoring and Observability
What is available in Kubernetes for Storage monitoring
Heterogeneous container storage monitoring

Architecting Multi-Cloud Storage Solutions

Kaustubh Katruwar, Senior Software Engineer & Sasikanth Eda, Cloud Storage Engineer, IBM

Abstract

In the current era, a multi-cloud strategy has become inevitable for many enterprises for reasons such as flexibility, disaster recovery, cyber resiliency and compliance. However, the native storage services offered by clouds are not compatible with each other, which makes it difficult to set up data lake environments. This gives rise to a strong need to build a distributed file system that spans multiple clouds and seamlessly exchanges data while tolerating WAN latency and accounting for the storage service challenges of each cloud. To up the game, introducing a container-native approach that offers agility and portability is the need of the hour for seamlessly manoeuvring a multi-cloud strategy.

This presentation will talk about the challenges in building a multi-cloud native storage implementation that abstracts cloud-specific characteristics, and will discuss the pros and cons of different architectural methodologies, layouts and test/cost optimization scenarios.

Learning Objectives

Understand how native solution services offered by clouds fall short for unified data lake environments.
Learn how one can start architecting their storage solutions for multi-cloud (best practices, tools, tips).
Evolution of metrics to measure how multi-cloud storage solutions react to control and data path operations.