Storage Developer Conference Abstracts

Break Out Sessions and Agenda Tracks Include:

Note: This agenda is a work in progress. Check back for updates on additional sessions as well as the agenda schedule.

Birds of a Feather

Using Interoperable Cloud Encryption to Secure Healthcare Records

David Slik, NetApp
Xavier Xia, Philips
Mark Carlson, Toshiba

Abstract

This BOF will explore how the Cloud Data Management Interface (CDMI) addresses the data protection requirements of sharing health data across different cloud services, and will show implementations of CDMI extensions for a healthcare profile. The BOF will also demonstrate the results of testing the Encrypted Objects and Delegated Access Control extensions to CDMI 1.1.1 at the SDC Cloud Plugfest.


Improving the Interoperability and Resiliency of Data Replication

Moderator: Thomas Rivera, Sr. Technical Associate, Hitachi Data Systems

Abstract

Join the DPCO Special Interest Group in a discussion on the subject of data replication for disaster recovery. Customers are telling us there are interoperability and resiliency issues that the industry should address. What’s your experience?

Background
There is an assumption that disaster tolerance can be achieved by simply replicating data from the primary data center to a backup data center, which is far from the truth. At the heart of any disaster recovery solution is the ability to ensure that data loss will not exceed the defined RPO after recovering from the incident, so good disaster-tolerant solutions integrate some kind of recovery mechanism with the data replication to automate recovery and limit any downtime that may be incurred.

However, over the years enterprise and midrange array-based replication solutions have not been interoperable, which leads to complex, high-cost replication solutions and proprietary ‘lock-in’ for customers. Customers are now seeking direction from the industry to help them implement DR replication in a heterogeneous fashion, to apply a degree of interoperability between technologies, and to provide some best-practice guidance on what can be expected in the way of latency, security, and handling of different data types. Existing guidance (including standards) may not be sufficient, or there may be a lack of knowledge or awareness, or even difficulty in implementing certain standards; identifying this will be a major part of the SIG’s work.


Open Source Storage Networking

Michael Dexter, Senior Analyst, iXsystems, Inc.
Brad Meyer, Technical Marketing Engineer, iXsystems Inc.
Ronald Pagani, Jr., Open Technology Partners, LLC

Abstract

At this BoF session we’ll discuss how open source is enabling new storage and datacenter architectures. All are welcome who have an interest in open source, scale-up and scale-out storage, hyperconvergence and exploring open solutions for the datacenter.

  • How is open source solving your datacenter challenges?
  • What is working? What is not?
  • What would you like to see?
  • Which project(s) are you most excited about?

Long Term Data Preservation and the SIRF Container Format

Sam Fineberg, Distinguished Technologist, Hewlett Packard Enterprise

Abstract

Generating and collecting very large data sets is becoming a necessity in many domains that also need to keep that data for long periods. Examples include astronomy, genomics, medical records, photographic archives, video archives, and large-scale e-commerce. While this presents significant opportunities, a key challenge is providing economically scalable storage systems to efficiently store and preserve the data, as well as to enable search, access, and analytics on that data in the far future.

We'll present advantages and challenges in long term retention of digital data. We will also discuss recent work on the Self Contained Information Retention Format (SIRF) and the OpenSIRF project, which were developed to enable future applications to interpret stored data regardless of the application that originally produced it.

Capacity Optimization

Improving Copy-On-Write Performance in Container Storage Driver

Frank Zhao, Software Architect, CTO Office, EMC

Abstract

Authors: Kevin Xu and Randy Shain, EMC Corporation
Copy-On-Write (COW) is widely used in file systems, storage, and containers, for example via DevMapper and AUFS. In practice, however, COW is often criticized for its performance penalties: lookup overhead, disk I/O amplification, and memory amplification (duplicate copies). In this presentation, we share our solution for effectively mitigating these long-standing performance issues, especially in dense container environments.

Our solution, Data Relationship and Reference (DRR), is a lightweight and intelligent software acceleration layer on top of the existing COW framework that speeds up cross-layer lookup, reduces duplicated disk I/O, and can potentially enable a single data copy in memory. We present our design and a prototype based on DevMapper for Docker, and demonstrate the approach's promising performance gains in launch-storm and application data-load use cases.
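
As a rough illustration of the kind of cross-layer lookup DRR is meant to short-circuit (a hypothetical Python sketch, not the authors' DevMapper-based implementation; all class and function names are invented), compare a naive walk of the COW layer chain with a flat relationship index that resolves a block to its owning layer in one step:

    # Hypothetical sketch: resolving a block across COW layers.
    # A naive driver walks the layer chain top-down on every read;
    # a DRR-style index records which layer owns each block, so the
    # lookup (and the duplicated I/O it causes) collapses to one step.

    class Layer:
        def __init__(self, name, blocks, parent=None):
            self.name = name          # e.g. container layer, image layer
            self.blocks = blocks      # block number -> data owned by this layer
            self.parent = parent

    def chain_lookup(layer, block):
        """Naive COW lookup: walk parents until the block is found."""
        while layer is not None:
            if block in layer.blocks:
                return layer.name
            layer = layer.parent
        raise KeyError(block)

    def build_relationship_map(top):
        """DRR-like flat index: block -> owning layer, built once."""
        index, layer, chain = {}, top, []
        while layer is not None:
            chain.append(layer)
            layer = layer.parent
        for layer in reversed(chain):      # base image first
            for block in layer.blocks:     # upper layers override lower ones
                index[block] = layer.name
        return index

    base = Layer("image", {0: "bin", 1: "lib"})
    container = Layer("container", {1: "lib-modified"}, parent=base)

    print(chain_lookup(container, 0))      # "image", found after walking two layers
    index = build_relationship_map(container)
    print(index[0], index[1])              # same answers, one lookup each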

Learning Objectives

  • Dense Docker use case: challenge and opportunity
  • COW essentials and performance insights
  • New approach, DRR atop existing COW: why and how
  • Design and prototyping with DM-thin
  • Outlook: more opportunities

Smashing Bits: Comparing Data Compression Techniques in Storage

Juan Deaton, Research Engineer and Scientist, AHA

Abstract

As all-flash arrays mature into full-featured platforms, vendors are seeking new ways to differentiate themselves and prepare for the next killer app. This presentation will first discuss the industrial Internet of Things and how to prepare for these workloads. With these new workloads, storage controller overhead for data compression can be high and lead to significant performance impacts. This presentation compares the performance of different data compression algorithms (LZO, LZ4, GZIP, and BZIP2) using a CPU versus acceleration hardware. Comparisons will be shown in terms of throughput, CPU utilization, and power. By using data compression accelerators, storage arrays can increase write throughput, storage capacity, and flash cycle life. Storage system designers attending will understand the tradeoffs between CPU-based compression algorithms and data compression accelerators.
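
As a minimal, CPU-only illustration of the kind of measurement such a comparison involves (an assumed setup using only the Python standard library; LZO/LZ4 and the hardware-offload path discussed in the talk are not represented here), the sketch below times two software compressors on the same buffer and reports throughput and compression ratio:

    # Minimal CPU-only sketch: compare throughput and ratio of two
    # standard-library compressors on the same buffer.
    import bz2
    import time
    import zlib

    data = b"sensor_id=42,temp=21.7,vibration=0.03,status=OK\n" * 200_000

    def measure(name, compress):
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        mb = len(data) / (1024 * 1024)
        print(f"{name}: {mb / elapsed:.1f} MB/s, ratio {len(data) / len(out):.1f}x")

    measure("zlib (level 6)", lambda d: zlib.compress(d, 6))
    measure("bzip2 (level 9)", lambda d: bz2.compress(d, 9))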

Learning Objectives

  • Introduce Industrial IoT and workload types
  • Introduce different compression algorithms (LZO, LZ4, GZIP, and BZIP2)
  • Analyze performance differences between compression algorithms
  • Introduce hardware compression
  • Understand the tradeoffs of using CPU-based algorithms vs. acceleration hardware

Accelerate Finger Printing in Data Deduplication

Xiaodong Liu, Storage software engineer, Intel Asia R&D
Qihua Dai, Engineering Manager, Intel Asia R&D

Abstract

Fingerprinting algorithms are the foundation of data deduplication, and they are a prominent CPU-utilization hotspot due to their heavy computation. In this talk, we analyze the nature of the data deduplication feature in the ZFS file system and present methods to increase deduplication efficiency using multi-buffer hashing optimization technology. Multi-buffer hashing has usage limitations in data deduplication applications, such as memory bandwidth on multi-core systems and lower performance under light workloads. We designed a multi-buffer hashing framework for ZFS data deduplication to resolve these limitations; with this framework, ZFS improves data deduplication throughput by 2.5x. The framework is general and can conveniently benefit other data deduplication applications.
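
As a rough sketch of the batch-oriented interface behind multi-buffer hashing (the real speedup comes from filling SIMD lanes, as in an ISA-L-style implementation; here hashlib merely stands in so the flow is runnable, and all names are invented for illustration):

    # Hypothetical sketch of a multi-buffer hashing job-manager interface.
    # A real implementation computes several digests in lock-step across
    # vector lanes; hashlib stands in so the batch-oriented dedup flow runs.
    import hashlib

    class MultiBufferHasher:
        def __init__(self, lanes=8):
            self.lanes = lanes            # number of buffers hashed per batch

        def submit_batch(self, buffers):
            """Return one fingerprint per buffer, processed lanes at a time."""
            digests = []
            for i in range(0, len(buffers), self.lanes):
                batch = buffers[i:i + self.lanes]
                digests.extend(hashlib.sha256(b).hexdigest() for b in batch)
            return digests

    # Dedup flow: fingerprint incoming blocks in batches, then check a table.
    blocks = [bytes([i % 7]) * 4096 for i in range(32)]
    dedup_table = {}
    hasher = MultiBufferHasher()
    for block, fp in zip(blocks, hasher.submit_batch(blocks)):
        dedup_table.setdefault(fp, block)   # store only the first copy
    print(f"{len(blocks)} blocks -> {len(dedup_table)} unique")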

Learning Objectives

  • Analyzing the nature of the data deduplication feature in ZFS file system.
  • Understanding how multiple-buffer hashing works.
  • Analyzing the usage limitations of multiple-buffer hashing for data deduplication applications.
  • Designing a general framework of multiple-buffer hash for data deduplication applications.
  • Demonstrating multiple-buffer hashing framework for ZFS data deduplication solution and performance.

Cloud

What You Need to Know on Cloud Storage

David Slik, Technical Director, NetApp
Mark Carlson, Principal Engineer, Toshiba

Abstract

This session assumes no prior knowledge of cloud storage and is intended to bring a storage developer up to speed on the concepts, conventions, and standards in this space. The session will include a live demo of an operating storage cloud to reinforce the concepts presented.



Introduction to OpenStack Cinder

Sean McGinnis, Sr. Principal Software Engineer, Dell

Abstract

Cinder is the block storage management service for OpenStack. Cinder allows provisioning iSCSI, Fibre Channel, and remote storage services to attach to your cloud instances. LVM, Ceph, and other external storage devices can be managed and consumed through the use of configurable backend storage drivers. Led by some of the core members of Cinder, this session will provide an introduction to the block storage services in OpenStack as well as give an overview of the Cinder project itself. Whether you are looking for more information on how to use block storage in OpenStack, are looking to get involved in an open source project, or are just curious about how storage fits into the cloud, this session will provide a starting point to get going.

Learning Objectives

  • Cloud storage management
  • OpenStack storage

Cloud Access Control Delegation

David Slik, Technical Director, NetApp

Abstract

There is much value in using clouds for data storage, distribution and processing. However, security concerns are increasingly at odds with today's approaches to clouds, where the cloud provider has full access to the data, and makes the access control decisions on your behalf. This session describes and demonstrates how delegated access control can be integrated into cloud architectures in order to separate access control decisions and key disclosure from data storage and delivery, with examples for CDMI, Swift and S3.

Learning Objectives

  • Learn about traditional cloud access control
  • Learn what delegated access control can offer
  • Learn how delegated access control can be implemented for CDMI, Swift and S3
  • See a demonstration of delegated access control

USB Cloud Storage Gateway

David Disseldorp, Engineer, SUSE Linux

Abstract

Cloud block storage implementations, such as Ceph RADOS Block Device (RBD) and Microsoft Azure Page Blobs, are considered flexible, reliable and relatively performant.

Exposing these implementations for access via an embedded USB storage gadget can address a number of factors limiting adoption, namely:

  • Interoperability
    • Cloud storage can now be consumed by almost any system with a USB port
  • Ease of use
    • Configure once, then plug and play
  • Security
    • Encryption can be performed on the USB device itself, reducing reliance on cloud storage providers

This presentation will introduce and demonstrate a USB cloud storage gateway prototype developed during SUSE Hack Week, running on an embedded Linux ARM board.

Learning Objectives

  • Knowledge of existing Ceph and Azure block storage implementations
  • Awareness of problems limiting cloud storage adoption
  • Evaluate a USB cloud storage gateway device as a solution for factors limiting adoption

How to Ensure OpenStack Swift & Amazon S3 Conformance for Storage Products & Services Supporting Multiple Object APIs

Ankit Agrawal, Solution Developer, Tata Consultancy Services
Sachin Goswami, Solution Architect, Tata Consultancy Services

Abstract

Unstructured data is growing continuously, and cloud storage is the solution best suited to storing and managing this type of data.

In the cloud storage industry, multiple cloud APIs are available to store and manage data in the cloud, such as Amazon S3, OpenStack Swift, and SNIA CDMI. These APIs dominate the market, and hence storage industry players are focusing on cloud products and services that support all or some of these APIs to manage data in the cloud.

Various products are available in the market supporting different types of object APIs, such as EMC ECS (S3, Swift & Atmos), NetApp Storage Grid (S3 & CDMI), and so on. However, the question arises: how can an end user or vendor ensure that the implemented object APIs are correct and conformant to the actual standards?

To address this issue, TCS is focusing on a testing solution that conducts conformance testing against SNIA CDMI, Amazon S3, and OpenStack Swift standards.

This proposal details how a vendor can ensure that the object APIs (Swift and S3) supported in their products are implemented correctly and are conformant to the actual standards, namely OpenStack Swift and Amazon S3.

Learning Objectives

  • Understanding why conformance is critical for object APIs (S3 & Swift) supported by products and services
  • An approach to implementing conformance testing for S3 and OpenStack Swift
  • Sample test cases for conformance testing

Deploying and Optimizing for Cloud Storage Systems using Swift Simulator

Gen Xu, Performance Engineer, Intel

Abstract

Today’s data centers are built on traditional architectures that can take days or weeks to provision new services and typically run with low server utilization, limiting efficiency and flexibility while driving up costs. Meeting both storage capacity and SLA/SLO requirements also involves trade-offs.

If you are planning to deploy a cloud storage cluster, growth is what you should be concerned with and prepared for. So how exactly can you architect such a system, without breaking the bank, while sustaining sufficient capacity and performance across the scaling spectrum?

This session presents a novel simulation approach, both flexible and highly accurate, that can be used for cluster capacity planning, performance evaluation, and optimization before system provisioning. We will focus specifically on storage capacity planning and provide criteria for getting the best price-performance configuration by setting the memory, SSD, and magnetic disk ratio. We will also highlight performance optimization by evaluating different OS parameters (e.g., log flush and write barrier), software configurations (e.g., proxy and object worker numbers), and hardware setups (e.g., CPU, cluster size, the ratio of proxy servers to storage servers, and network topology selection, CLOS vs. Fat Tree).

Learning Objectives

  • Design challenges of a cloud storage deployment (Hardware Sizing, Hardware component selection and Software Stack Tuning)
  • Storage system modeling technology for OpenStack-Swift
  • Use Case study: Use simulation approach to plan and optimize a storage cluster to meet capacity and performance requirements.

Hyper Converged Cache Storage Infrastructure For Cloud

Authors: Yuan Zhou, Senior Software Development Engineer in the Software and Service Group for Intel Corporation
Chendi Xue, Software Engineer, Intel APAC R&D

Abstract

With the strong requirements of cloud computing and software-defined architecture, more and more data centers are adopting distributed storage solutions, which are usually centralized, based on commodity hardware, have large capacity, and are designed to scale out. However, the performance of the distributed storage system suffers when running multiple VMs on a compute node, because VM I/O is accessed remotely in this architecture, especially for database workloads. Meanwhile, critical enterprise-readiness features such as deduplication and compression are usually missing.

In this work we propose a novel client-side cache solution to improve the performance of cloud VM storage, turning the current common cloud storage solution into a hyper-converged one. Our cache provides strong reliability, crash consistency, and various data services such as deduplication and compression on a non-volatile storage backend, with configurable modes such as write-through and write-back. The cache interface is designed to be flexible enough to use external plugins or third-party cache software. Our evaluation shows that this solution delivers large performance improvements for both read-heavy and write-heavy workloads. We also investigated the potential usage of non-volatile memory technologies in this cache solution.

Learning Objectives

  • Dedup/Compression on non-volatile storage
  • Potential usage of Non-Volatile Memory Technologies in cache solution

Cold Storage Data

Introducing the View of SNIA Japan Cold Storage Technical Working Group on "Cold Storage"

Kazuhiko Kawamura, Senior Manager - System Engineering, Sony Corporation

Abstract

Nowadays, a huge amount of data is being generated every day, whether we are aware of it or not. Since nobody knows how much value that data may have in the future, we are obliged to retain more and more of it, a.k.a. "cold data", at a certain cost.

Fortunately, we have quite a few options; however, there is apparently no standard yardstick for making the decision. We have therefore been discussing this topic for many months. Today, I'd like to introduce our current view on cold storage, with a bit of taxonomy based on various characteristics of such storage devices.

Data Preservation

Long Term Retention of Data and the Impacts to Big Data Storage

Shawn Brume, Business Line Executive, Data Retention Infrastructure and The LTO Program, IBM

Abstract

Data retention and preservation is rapidly becoming the requirement with the greatest impact on data storage. Regulatory and corporate guidelines are putting stress on storage requirements. Cloud and Big Data environments are stressed even more by the growth of rarely touched data, due to the need to improve margins in storage. There are many choices in the market for data retention, but managing the data for decades must be as automated as possible. This presentation will outline the most effective storage for long-term data preservation, emphasizing total cost of ownership, ease of use and management, and lowering the carbon footprint of the storage environment.

Learning Objectives

  • How to address Regulatory Guidelines
  • How to manage cloud/big data environments
  • How to manage an energy-efficient data center
  • How to preserve assets for an extended timeframe
  • Total cost of ownership for these issues

The Role of Active Archive in Long-Term Data Preservation

Mark Pastor, Director of Archive and Technical Workflow Solutions, Quantum

Abstract

Anyone managing a mass storage infrastructure for HPC, Big Data, Cloud, research, etc., is painfully aware that the growth, access requirements and retention needs for data are relentless. At the heart of that problem is the need to rationalize the way that data is managed, and create online access to all that data without maintaining it in a continuous, power-consuming state. The solution lies in creating an active archive that enables straight-from-the-desktop access to data stored at any tier for rapid data access via existing file systems that expand over flash, disk and tape library storage technologies. Active archives provide organizations with a persistent view of the data and make it easier to access files whenever needed, regardless of the storage medium being utilized.

Learning Objectives

  • Understand how active archive technologies work and how companies are using them to enable reliable, online and efficient access to archived data.
  • Learn the implications of data longevity and planning considerations for long-term retention and data integrity assurance.
  • Learn why active archive solutions can achieve unmatched efficiency and cost savings as data continues to grow much faster than storage budgets.

Erasure Coding

Modern Erasure Codes for Distributed Storage Systems

Srinivasan Narayanamurthy, Member Technical Staff, NetApp

Abstract

Traditional disk based storage arrays commonly use various forms of RAID for failure protection and/or performance improvement. Modern distributed storage systems are built as shared nothing architectures using commodity hardware components and have different failure characteristics.

RAID has changed very little since its introduction three decades ago. There have been several new and powerful constructions of erasure codes in the recent past (post-2007) that were handcrafted to meet the needs of modern storage architectures. Several notable ones are fountain codes, locally repairable codes (LRC), and regenerating codes. However, these codes have made very little impact in the industry, because storage vendors have been highly resistant to investigating this space.

The subject of erasure codes is core to storage technologies, but is sadly ignored by most storage conventions. One of the reasons could be the complexity involved in understanding the space.

This talk will provide a non-mathematical, non-marketing, no-nonsense spin to the space of modern erasure codes. By the end of this talk, the audience will be able to pick specific categories from a bouquet of erasure codes that best suit a given storage system design.

Learning Objectives

  • History and classification of erasure codes.
  • Technical deep-dive into modern erasure codes.
  • Storage system parameters’ trade-off.
  • Storage system design & suitable codes.
  • Who is doing what in this space?

Fun with Linearity: How Encryption and Erasure Codes are Intimately Related

Jason Resch, Senior Software Architect, IBM

Abstract

Erasure codes are a common means to achieve availability within storage systems. Encryption, on the other hand, is used to achieve security for that same data. Despite the widespread use of both methods together, it remains little known that both of these functions are linear transformations of the data. This relation allows them to be combined in useful ways that are seemingly unknown and unused in practice. This presentation presents novel techniques built on this observation, including: rebuilding lost erasure code fragments without exposing any information, decrypting fragments produced from encrypted source data, and verifying the consistency and integrity of erasure coded fragments without exposing any information about the fragments or the data.
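
A minimal worked example of this linearity, assuming the simplest possible linear pieces (XOR parity as the erasure code and an XOR keystream as the cipher, not necessarily the exact scheme presented in the talk): because both maps are linear, encryption and encoding commute, so a lost encrypted fragment can be rebuilt from other encrypted fragments without exposing any plaintext.

    # Worked example under assumed primitives: XOR parity (linear code over
    # GF(2)) and an XOR keystream cipher. Encoding and encryption commute,
    # so repair can run entirely on ciphertext.
    import os

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    d1, d2 = os.urandom(16), os.urandom(16)     # two data fragments
    parity = xor(d1, d2)                        # linear erasure code: p = d1 + d2

    k1, k2 = os.urandom(16), os.urandom(16)     # per-fragment keystreams
    kp = xor(k1, k2)                            # parity keystream, derived linearly

    c1, c2, cp = xor(d1, k1), xor(d2, k2), xor(parity, kp)   # encrypted fragments

    # Linearity: encrypt-then-encode equals encode-then-encrypt.
    assert cp == xor(c1, c2)

    # Repair of a lost fragment happens entirely on ciphertext: c1 = cp + c2.
    rebuilt_c1 = xor(cp, c2)
    assert rebuilt_c1 == c1
    assert xor(rebuilt_c1, k1) == d1            # only the key holder sees plaintext
    print("lost fragment rebuilt from ciphertext only")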

Learning Objectives

  • What are linear functions?
  • Examples of linear functions
  • Combining encryption and erasure codes
  • Exploiting the linearity of erasure codes to securely rebuild
  • Using the linearity of CRCs to securely verify erasure coded data

Next Generation Scale-Out NAS

Philippe Nicolas, Advisor, Rozo Systems

Abstract

Rozo Systems develops a new generation of scale-out NAS with a radically new design that delivers a new level of performance. RozoFS is a highly scalable, high-performance, and highly resilient file storage product, fully hardware agnostic, that relies on a unique patented erasure coding technology developed at the University of Nantes in France. This new philosophy in file serving extends what is possible and available on the market today with fast and seamless data protection techniques. RozoFS is thus a strong companion for demanding environments such as HPC, life sciences, media and entertainment, and oil and gas.


SNIA Tutorial:
Optimize Storage Efficiency & Performance with Erasure Coding Hardware Offload

Dror Goldenberg, VP of Software Architecture, Mellanox Technologies

Abstract

Nearly all object storage, including Ceph and Swift, support erasure coding because it is a more efficient data protection method than simple replication or traditional RAID. However, erasure coding is very CPU intensive and typically slows down storage performance significantly. Now Ethernet network cards are available that offload erasure coding calculations to hardware for both writing and reconstructing data. This offload technology has the potential to change the storage market by allowing customers to deploy more efficient storage without sacrificing performance. Attend this presentation to learn how erasure coding hardware offloads work and how they can integrate with products such as Ceph.

Learning Objectives

  • Learn the benefits and costs of erasure coding
  • Understand how erasure coding works in products such as Ceph
  • See how erasure coding hardware offloads accelerate storage performance

/etc

Storage as an IoT Device

Walt Hubis, Principal, Hubis Technical Associates
Tom Coughlin, President, Coughlin Associates

Abstract

Ongoing changes in computing architectures, a continued trend toward virtualization (Software Defined Storage, SaaS, PaaS, etc.), and recent questions about the security of the data on drives have led to major architectural changes in both the interface and internal implementation of secure storage devices. This presentation looks into the drivers of these changes, including recent reports of drive vulnerabilities, and describes the changes currently being driven into storage standards and devices. In particular, the parallel between embedded security in storage and in IoT devices is examined to show how these technologies are beginning to converge. Techniques used in the development of secure embedded systems will also be explored.

Learning Objectives

  • Understand current threats to secure storage
  • Learn how IoT techniques can improve the security of storage devices
  • Understand embedded techniques for security

Early Developer Experiences with New Windows Nano Server as Embedded Enterprise Storage Appliance Platform OS

Terry Spear, President & CTO, IOLogix, Inc.

Abstract

This presentation will cover early developer experiences using the new Windows 2016 Nano Server as an embedded platform OS for Enterprise Storage Appliances. We will briefly review available built-in storage technologies of this OS configuration. Experiences with the different development and test methodologies required for this new OS will be discussed. Finally, a specific storage application development case of extending Nano Server by adding a Fibre Channel SCSI Target will be covered in detail including preliminary performance results.


Iometer for SMR Hard Disk Drives

Muhammad Ahmad, Staff Software Engineer, Seagate Technology

Abstract

Iometer is widely used in the industry to benchmark storage devices. Storage device vendors, and the OEMs that integrate their devices into systems, use these benchmarks to evaluate whether a new generation of drives is any better than the previous generation.

The problem arises when these Iometer benchmark profiles are used to evaluate Shingled Magnetic Recording (SMR) drives. Iometer has no knowledge of what a storage device is capable of.

Since ZAC/ZBC support is not available in common file systems or the Windows device driver stack, my presentation will show the modifications I made to Iometer to make its I/O generation aware of zoned-block ATA devices, in order to better evaluate their performance and compare those results to conventional magnetic recording (CMR) devices.
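
For illustration only (a hypothetical Python sketch, not the actual Iometer/dynamo changes; zone sizes and counts are invented), zone-aware write generation for a host-managed SMR device roughly means picking zones freely but always writing at each zone's write pointer:

    # Hypothetical sketch of zone-aware write generation for a host-managed
    # SMR (ZBC/ZAC) device: zones may be chosen at random, but within a
    # sequential-write-required zone each write must land on the zone's
    # write pointer, which then advances.
    import random

    ZONE_SIZE_BLOCKS = 65536       # assumed zone size in logical blocks
    NUM_ZONES = 8
    IO_SIZE_BLOCKS = 256

    write_pointer = [z * ZONE_SIZE_BLOCKS for z in range(NUM_ZONES)]

    def next_smr_write():
        """Pick a zone at random, but issue the write at its write pointer."""
        zone = random.randrange(NUM_ZONES)
        zone_end = (zone + 1) * ZONE_SIZE_BLOCKS
        if write_pointer[zone] + IO_SIZE_BLOCKS > zone_end:
            write_pointer[zone] = zone * ZONE_SIZE_BLOCKS   # reset zone (simplified)
        lba = write_pointer[zone]
        write_pointer[zone] += IO_SIZE_BLOCKS
        return zone, lba

    for _ in range(5):
        zone, lba = next_smr_write()
        print(f"zone {zone}: write {IO_SIZE_BLOCKS} blocks at LBA {lba}")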

Learning Objectives

  • Performance issues with SMR devices
  • Iometer (dynamo) code review
  • Changes to Iometer/dynamo code
  • Highlighting the challenges with this approach

File Systems

Design of a WORM Filesystem

Terry Stokes, Principal Software Engineer, EMC/Isilon

Abstract

There are many governmental regulations requiring the archival of documents and emails for a period of years on Write-Once-Read-Many (WORM) storage. In the past, companies would meet these requirements by archiving all data to optical storage. But optical storage is slow and difficult to manage, so today many companies are opting for hard disk storage that exhibits WORM properties.

This talk will walk through the process the presenter went through in designing a WORM-compliant filesystem. It will cover the customer expectations for such a system, and how they were met in the design of the WORM filesystem.

Learning Objectives

  • A survey of the government regulations requiring WORM storage.
  • Disadvantages of traditional optical WORM storage.
  • Difficulties in implementing an immutable file storage system.
  • Techniques to provide WORM properties to disk based file system storage.

MarFS: Near-POSIX Access to Object-Storage

Jeff Inman, Software Developer, Los Alamos National Laboratory
Gary Grider, HPC Div Director, LANL

Abstract

Many computing sites need long-term retention of mostly-cold data, often referred to as “data lakes”. The main function of this storage-tier is capacity, but non-trivial bandwidth/access requirements may also exist. For many years, tape was the most economical solution. However, data sets have grown larger more quickly than tape bandwidth has improved, such that disk is now becoming more economically feasible for this storage-tier. MarFS is a Near-POSIX File System, storing metadata across multiple POSIX file systems for scalable parallel access, while storing data in industry-standard erasure-protected cloud-style object stores for space-efficient reliability and massive parallelism. Our presentation will include: Cost-modeling of disk versus tape for campaign storage, Challenges of presenting object-storage through POSIX file semantics, Scalable Parallel metadata operations and bandwidth, Scaling metadata structures to handle trillions of files with billions of files/directory, Alternative technologies for data/metadata, and Structure of the MarFS solution.


Data Integrity support for Silent Data Corruption in Gfarm File System

Osamu Tatebe, Professor, University of Tsukuba

Abstract

Data stored on storage devices is often corrupted silently, without any explicit error. To cope with silent data corruption, detection at the file system level is effective; Btrfs and ZFS have a mechanism to detect it by adding a checksum to each block. However, data replication is required to correct the corrupted data, which wastes storage capacity in a local file system. The Gfarm file system is an open-source distributed file system that federates storage, even across several institutions over a wide area. It supports file replicas to improve access performance as well as fault tolerance; the number and locations of file replicas are specified by an extended attribute of a directory or file. This presentation describes the design and implementation of data integrity support against silent data corruption in the Gfarm file system. Because file replicas are a native and required feature of Gfarm, silent data corruption can be detected efficiently and the corrupted data can be corrected without wasting any storage capacity.
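
A minimal sketch of the general idea (assumed code, not Gfarm's actual implementation; node names and data are invented): a checksum recorded at write time detects silent corruption on read, and a replica on another node repairs the bad copy instead of extra redundancy inside one local file system.

    # Sketch: detect silent corruption with a recorded checksum and repair
    # from another replica held on a different storage node.
    import hashlib

    def checksum(data):
        return hashlib.sha256(data).hexdigest()

    # Two replicas of the same file on different nodes, plus the checksum
    # recorded as metadata when the file was written.
    replicas = {"node-a": bytearray(b"experiment results 42"),
                "node-b": bytearray(b"experiment results 42")}
    recorded = checksum(replicas["node-a"])

    replicas["node-a"][0] ^= 0xFF      # simulate a silent bit flip on node-a

    def read_with_repair(primary, others):
        data = replicas[primary]
        if checksum(data) == recorded:
            return bytes(data)
        for node in others:            # corruption detected: fall back to a replica
            candidate = replicas[node]
            if checksum(candidate) == recorded:
                replicas[primary][:] = candidate   # repair the bad copy in place
                return bytes(candidate)
        raise IOError("all replicas corrupted")

    print(read_with_repair("node-a", ["node-b"]))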


Optimizing Every Operation in a Write-optimized File System

Rob Johnson, Research Assistant Professor, Stony Brook University

Abstract

BetrFS is a new file system that outperforms conventional file systems by orders of magnitude on several fundamental operations, such as random writes, recursive directory traversals, and metadata updates, while matching them on other operations, such as sequential I/O, file and directory renames, and deletions. BetrFS overcomes the classic trade-off between random-write performance and sequential-scan performance by using new "write-optimized" data structures.

This talk explains how BetrFS's design overcomes multiple file-system design trade-offs and how it exploits the performance strengths of write-optimized data structures.



Hardware Based Compression in Ceph OSD with BTRFS

Weigang Li, Software Engineer, Intel

Abstract

Ceph is a distributed object store and file system designed to provide excellent performance, reliability, and scalability. BTRFS, with its compelling set of features, is recommended for non-production Ceph environments. In this talk we will introduce our experimental work on integrating hardware acceleration into BTRFS to optimize the data compression workload in the Ceph OSD. We analyze the nature of the compression feature in the BTRFS file system and the cost of the software compression library, and present an optimized solution that reduces CPU cycles and disk I/O with a hardware compression accelerator enabled in the Ceph OSD.


Efficient Data Tiering in GlusterFS

Rafi KC, Software Engineer, Red Hat India

Abstract

Modern Software Defined Storage systems use heterogeneous storage media including disks, SSDs, and (soon) persistent memory. It’s important for these systems to identify, segregate, and optimally utilize such storage media to gain advantages in performance, capacity, and cost. Gluster's Data Tiering feature aims to help tackle these problems by segregating fast and slow storage into separate tiers, and providing intelligent movement of data across tiers according to dynamically observed usage patterns. In this presentation, Rafi will discuss and demonstrate Gluster tiering, its use cases, and future enhancements of this feature. Rafi will also compare the heat-indexing techniques used in Gluster, Ceph, and dm-cache, with their pros and cons.

Learning Objectives

  • Huge Cache Index
  • Hierarchical Storage Management
  • Optimized Data Bases
  • Software Defined Storage
  • Data tiering or Data classification in GlusterFS

Multi-Chance Scalable ARC (Adaptive Replacement Cache)

Shailendra Tripathi, Architect, Tegile Systems

Abstract

ZFS uses an Adaptive Replacement Cache, ARC, algorithm to manage the data and metadata page cache buffers. The cache is partitioned into most recently used and most frequently used buckets. For each partition it uses ghost caches to derive even better cache hit patterns. This way the algorithm provides a very good short term and long term value differentiation and is scan resistant.

However, the ARC has serious scaling issues, especially on increasingly dense multi-processor systems. The fundamental scaling problem is caused by the inherent LRU algorithm used for movement, insertion, and eviction.

Multi-Chance Scalable ARC implements a lockless ARC. Buffer addresses are stored in page arrays using atomic instructions, so insertion and removal become simple swap operations in the page array. This avoids taking a global lock on an extremely hot path and minimizes excessive movement within or across the lists.
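
For illustration only, the sketch below shows a clock-style approximation of the idea (hypothetical code, not the Tegile implementation): buffers live in a fixed page array, accesses only bump per-slot "chance" counters, and insertion and eviction reduce to a slot swap, the operation a real lockless ARC would perform with an atomic compare-and-swap, so no LRU list is moved under a global lock.

    # Hypothetical clock-style illustration of "multi-chance" replacement in
    # a fixed page array; slot swaps stand in for the atomic compare-and-swap
    # of the real lockless design.
    class PageArrayCache:
        def __init__(self, slots):
            self.buffers = [None] * slots   # buffer addresses (cache keys)
            self.chances = [0] * slots      # multi-chance reference counters
            self.hand = 0                   # clock-style eviction cursor

        def access(self, key):
            for i, buf in enumerate(self.buffers):
                if buf == key:
                    self.chances[i] = min(self.chances[i] + 1, 3)  # hit: add a chance
                    return True
            self._insert(key)                                      # miss: swap in
            return False

        def _insert(self, key):
            while True:                     # find a victim whose chances ran out
                i = self.hand
                self.hand = (self.hand + 1) % len(self.buffers)
                if self.buffers[i] is None or self.chances[i] == 0:
                    self.buffers[i] = key   # slot swap: no list manipulation
                    self.chances[i] = 1
                    return
                self.chances[i] -= 1        # spend one chance, keep scanning

    cache = PageArrayCache(slots=4)
    for key in ["a", "b", "a", "c", "d", "e", "a"]:
        cache.access(key)
    print(cache.buffers)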

Learning Objectives

  • Objectives of a good page cache
  • ZFS ARC page cache
  • Scaling issues in the ARC
  • Lockless ARC
  • Implementation details of the lockless ARC algorithm

Green Storage

Using SPEC SFS with the SNIA Emerald Program for EPA Energy Star Data Center Storage Program

Vernon Miller, Senior Software Engineer, IBM
Nick Principe, Principal Software Engineer, EMC

Abstract

The next storage platform category to be added to the EPA Data Center Storage program is file storage. Come learn what it takes to set up a SNIA Emerald file testing environment with the SPEC SFS tools, along with the additional energy-related instrumentation and data collection tools. Don't wait to be kicked in the "NAS" when an Energy Star rating gates sales of your file storage solutions.

Learning Objectives

  • Understand how the EPA Energy Star Data Center Storage Program applies to file and NAS environments.
  • Understand the methodology used in the SNIA Emerald Program to evaluate energy consumption of file and NAS environments.
  • Understand how the SNIA Emerald Program uses SPEC SFS2014 to drive file and NAS workloads.

Environmental Conditions and Disk Reliability in Free-cooled Datacenters

Ioannis Manousakis, PhD Candidate, Department of Computer Science, Rutgers University

Abstract

Free-cooling lowers datacenter costs significantly, but may also expose servers to higher and more variable temperatures and relative humidities. It is currently unclear whether these environmental conditions have a significant impact on hardware component reliability. Thus, in this paper, we use data from nine hyperscale datacenters to study the impact of environmental conditions on the reliability of server hardware, with a particular focus on disk drives and free cooling. Based on this study, we derive and validate a new model of disk lifetime as a function of environmental conditions. Furthermore, we quantify the tradeoffs between energy consumption, environmental conditions, component reliability, and datacenter costs. Finally, based on our analyses and model, we derive server and datacenter design lessons. We draw many interesting observations, including:

  1. Relative humidity seems to have a dominant impact on component failures
  2. Disk failures increase significantly when operating at high relative humidity, due to controller/adaptor malfunction
  3. Though higher relative humidity increases component failures, software availability techniques can mask them and enable free-cooled operation, resulting in significantly lower infrastructure and energy costs that far outweigh the cost of the extra component failures.

Hardware

The Curse of Areal Density

Ralph Weber, Consultant, ENDL Texas

Abstract

During the past half decade, the uninterrupted march toward higher and higher areal densities has begun to encounter the unforgiving laws of physics. Where once the differences between the energies required to write and to read had not mattered one iota, the higher forces demanded by writes are putting pressure on a wide variety of I/O parameters that previously were thought to be inviolate for random access read/write storage devices. A new form of Whack-A-Mole has emerged in which point solutions for various problems are popping up faster than file system designers can respond. Never mind that the real hobgoblin is licking his chops and preparing to claim more victims.

Learning Objectives

  • Perspective on current storage technology issue
  • Why this issue is unlikely to disappear
  • Open a discussion on how to standardize coverage for the issue

Characterizing the Evolution of Disk Attributes via Absorbing Markov Chains

Rachel Traylor, Research Scientist, EMC

Abstract

The life cycle of any physical device is a series of transitions from an initial healthy state to ultimate failure. Since Markov chains are a general method for modeling state transitions, they can effectively model the transitions as well as life expectancy. As a specific example, HDD life cycle can be analyzed by developing a Markov model from various disk indicators (e.g. medium errors, RAS, usage). An illustration is given wherein the evolution of medium errors in HDDs is modeled by an absorbing Markov chain. Examples of methods to employ the information contained in the Markov chain are provided.
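
As a worked example of the technique (illustrative transition probabilities, not the presenter's fitted model), an absorbing Markov chain over disk health states yields the expected time to failure from the fundamental matrix N = (I - Q)^-1:

    # Worked example: absorbing Markov chain over disk health states.
    # Split P into the transient block Q and the absorbing state; the
    # fundamental matrix N = (I - Q)^-1 gives expected visits, and N @ 1
    # gives expected steps (e.g. weeks) until absorption (failure).
    import numpy as np

    # States: 0 = healthy, 1 = some medium errors, 2 = many medium errors,
    # 3 = failed (absorbing). Rows sum to 1. Numbers are illustrative only.
    P = np.array([
        [0.97, 0.02, 0.01, 0.00],
        [0.00, 0.90, 0.08, 0.02],
        [0.00, 0.00, 0.85, 0.15],
        [0.00, 0.00, 0.00, 1.00],
    ])

    Q = P[:3, :3]                        # transitions among transient states
    N = np.linalg.inv(np.eye(3) - Q)     # fundamental matrix
    expected_steps = N @ np.ones(3)      # expected steps to failure per start state

    for state, steps in zip(["healthy", "few errors", "many errors"], expected_steps):
        print(f"{state}: {steps:.1f} expected periods until failure")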

Learning Objectives

  • Understand how a Markov chain may be used to model a variety of disk qualities/errors
  • Understand the types of business/operations-related answers such a model provides
  • Gain insight into how one can employ such information to develop flexible business rules

HyperScaler

Hyperscaler Storage

Mark Carlson, Principal Engineer, Industry Standards, Toshiba

Abstract

Hyperscaler companies as well as large enterprises who build their own datacenters have specific requirements for new features in storage drives. The SNIA Technical Council has created a white paper on these requirements and how current and future standards and open source projects can address them.

This talk will present the results of the TC research in this area and discuss how SNIA and other standards bodies are making changes to accommodate these requirements.

Learning Objectives

  • Learn about Datacenter customer drive requirements
  • Learn about existing and new standards for drive interfaces
  • Learn about open source projects that address these requirements

Standards for Improving SSD Performance and Endurance

Bill Martin, Principal Engineer Storage Standards, Samsung

Abstract

Standardization efforts have continued for features that improve SSD performance and endurance. NVMe, SCSI, and SATA have completed standardization of streams and background operation control. Standardization is beginning on how SSDs may implement Key Value Storage (KVS) and In-Storage Compute (ISC). This effort is progressing in the SNIA Object Drive Technical Work Group (TWG), and may involve future work in the protocol standards (NVMe, SCSI, and SATA). Currently the Object Drive TWG is defining IP-based management for object drives utilizing DMTF Redfish objects. Future Object Drive TWG work will include APIs for KVS and ISC. This presentation will discuss the standardization of streams, background operation control, KVS, and ISC, and how each of these features works.

Learning Objectives

  • What streams are
  • What background operation control is
  • Progress of standardization in NVMe, SCSI, SATA, and SNIA

Open Storage Platform Delivers Hyper-scale Benefits for the Enterprise

Eric Slack, Sr. Analyst, Evaluator Group

Abstract

The Open Storage Platform (OSP) is an architecture for building storage and compute infrastructures using independent software and hardware components that leverages some popular concepts in IT – scale-out storage, software-defined storage and commodity hardware. This approach originated in the large, hyper-scale companies as a way to provide the enormous amounts of storage capacity they needed, cost effectively.

These companies popularized the use of low-cost server chassis filled with consumer-grade storage media. They also used complex software, often developed in-house, to provide resiliency at a cluster level and federate the large numbers of physical servers that comprised these infrastructures.

The Open Storage Platform provides an architecture for assembling these scale-out storage systems from commercially available products. Where the hyper-scalers’ experience was a “do-it-yourself” kind of process, OSP could be described as more “roll your own”.

Learning Objectives

  • Understand the OSP model and its origin
  • Understand where OSP is being used
  • Understand the benefits OSP can provide
  • Understand how to implement OSP

KEYNOTE SPEAKERS AND GENERAL SESSIONS

Cloud Architecture in the Data Center and the Impact on the Storage Ecosystem: A Journey Down the Software Defined Storage Rabbit Hole

Dan Maslowski, Global Engineering Head, Citi Storage/Engineered Systems - Citi Architecture and Technology Engineering (CATE), Citigroup



MarFS: A Scalable Near-POSIX File System over Cloud Objects for HPC Cool Storage

Gary Grider, HPC Div Director, LANL

Abstract

Many computing sites need long-term retention of mostly cold data, often called “data lakes”. The main function of this storage tier is capacity, but non-trivial bandwidth/access requirements exist. For many years, tape was the most economical solution, but data sets have grown more quickly than tape bandwidth has improved, and access demands have increased in the HPC environment, so disk can now be more economical for this storage tier. The cloud community has moved toward erasure-based object stores to gain scalability and durability using commodity hardware. The object interface works for new applications, but legacy applications rely on POSIX as their interface. MarFS is a near-POSIX file system that uses cloud storage for data and many POSIX file systems for metadata. Extreme HPC environments require that MarFS scale a POSIX metadata namespace to trillions of files, and billions of files in a single directory, while storing the data in efficient, massively parallel, industry-standard erasure-protected cloud-style object stores.

Learning Objectives

  • Storage tiering in future HPC and large scale computing environments
  • Economic drivers for implementing data lakes/tiered storage
  • HPC specific requirements for data lakes - multi way scaling
  • Overview of existing data lake solution space
  • How the MarFS solution works and for what types of situations

Application Access to Persistent Memory – The State of the Nation(s)!

Authors: Stephen Bates, Paul Grun, Tom Talpey, Doug Voigt

Abstract

After years of anticipation, an era of new memory technology is upon us. Next-generation non-volatile memories, such as NVDIMM and the 3D XPoint™ memory developed by Intel and Micron, offer low latency, byte-addressability, and persistence. The next step is figuring out how to ensure applications can utilize these new technologies. To this end there is a large amount of activity across a range of industry bodies exploring options for providing connectivity between NG-NVM and the applications that will use it. These groups include SNIA, NVM Express, the RDMA communities including the OpenFabrics Alliance and the InfiniBand Trade Association, JEDEC, the IETF, and others. In addition, operating systems and programming models are being extended to align with these standards and memories to complete the connection between NVM and the application. These models extend across networks, providing storage-class deployments that support advanced resiliency and availability.

In this presentation we give a “State of the Nation(s)” overview of where we are on this path. We provide a high-level update of the work going on within the industry bodies, the OSes and the programming models. We outline some of the highlights of what has been achieved so far and discuss some of the (many) challenges that still need to be solved.

NVMe over Fabrics

NVMe over Fabrics - High Performance Flash moves to Ethernet

Rob Davis, VP Storage Technology, Mellanox
Idan Burstein, Storage Architect, Mellanox

Abstract

There is a new PCIe-based, very high performance flash interface called NVMe, available today from many flash device suppliers. This session will explain how this local, server-based technology is now expanding its capabilities to the network, and how, with protocol offload technology, it is able to maintain local performance levels. The original NVMe technology was developed by an industry standards group called NVM Express. It was driven by the need for a faster storage hardware interface to the operating system, allowing applications to take advantage of the much higher performance of solid state versus hard drives. There is now a new standards effort underway to enhance this technology for use across a network. Called NVMe over Fabrics, it utilizes ultra-low-latency RDMA technology to achieve device sharing across a network without sacrificing local performance characteristics.

Learning Objectives

  • Understanding of NVMe
  • Understanding of NVMe over Fabrics
  • Understanding of how NVMe over Fabrics and RDMA work together

NVMe Over Fabrics Support in Linux

Christoph Hellwig
Sagi Grimberg, Co-founder and Principal Architect, LightBits Labs

Abstract

Linux is usually at the edge of implementing new storage standards, and NVMe over Fabrics is no different in this regard. This presentation gives an overview of the Linux NVMe over Fabrics implementation on the host and target sides, highlighting how it influenced the design of the protocol through early prototyping feedback. It also covers the lessons learned while developing NVMe over Fabrics support, and how they helped reshape parts of the Linux kernel to better support NVMe over Fabrics and other storage protocols.

Object Storage

Cutting the Cord: Why We Took the File System Out of Our Storage Nodes

Manish Motwani, Backend Storage Software Architect, Cleversafe, an IBM Company

Abstract

A major component of request latency is the time back-end servers spend reading and writing data. Object storage systems commonly use JBOD servers together with a general purpose file system to store object data in files. Yet there are far more efficient ways of implementing the back-end storage. In this presentation, we explore the lessons we learned, problems we encountered, and resulting performance gains in the process of creating a new back-end storage format. This format is applied to the raw disk over the SCSI interface, is append-only in its operation (SMR and SSD friendly) and enables single-seek access to any object on the disk. The result is significantly increased IOPS together with significantly reduced latency.
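
A minimal sketch of the general idea (assumed layout, not Cleversafe's actual on-disk format; all names are invented): writes are appended at the tail of the raw device and an index maps each object to its offset and length, so any object can be read back with a single seek.

    # Sketch of an append-only object store: sequential writes at the tail
    # (SMR/SSD friendly) and an index that allows a single-seek read.
    import io

    class AppendOnlyStore:
        def __init__(self):
            self.device = io.BytesIO()    # stands in for the raw disk / SCSI LUN
            self.index = {}               # object name -> (offset, length)
            self.tail = 0                 # next write position (append-only)

        def put(self, name, data):
            self.device.seek(self.tail)
            self.device.write(data)       # sequential write at the tail
            self.index[name] = (self.tail, len(data))
            self.tail += len(data)

        def get(self, name):
            offset, length = self.index[name]
            self.device.seek(offset)      # single seek to any object
            return self.device.read(length)

    store = AppendOnlyStore()
    store.put("obj-1", b"hello")
    store.put("obj-2", b"object storage without a file system")
    print(store.get("obj-2").decode())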

Learning Objectives

  • General purpose filesystems are not efficient for large scale object storage
  • An example of a raw-disk data format
  • Append-only writes work efficiently for both SMR and PMR drives
  • Memory requirements of a highly performant raw-disk storage backend

New Fresh Open Source Object Storage

Jean-Francois Smigielski, Co-founder and R&D Manager, OpenIO

Abstract

With a design started in 2006, OpenIO is a new flavor in the dynamic object storage market segment. Beyond Ceph and OpenStack Swift, OpenIO is the latest player in that space. The product relies on an open source core object storage software stack with several object APIs, file-sharing protocols, and application extensions. The inventors of the solution took a radically new approach to address large-scale environment challenges. Among other things, the product avoids the rebalancing that consistent-hashing-based systems always trigger; the impact is immediate, as new machines contribute immediately without any extra tasks that affect the platform service. OpenIO also introduces the Conscience, an intelligent data placement service that optimizes the location of data based on various criteria such as node workload and storage space. OpenIO is fully hardware agnostic, running on commodity x86 servers for total independence.


Dynamic Object Routing

Balaji Ganesan, Director Of Engineering, Cloudian
Bharat Boddu, Principal SW Engineer, Cloudian

Abstract

Problem description: In object store systems, objects are identified using a unique identifier, and objects on disk are immutable; any modification to an existing object results in a new object with a different timestamp being created on disk. A common method of distributing objects to storage nodes is consistent hashing. A storage node can have multiple disks to store the objects assigned to it, and depending on application usage patterns some hash buckets grow faster than others, resulting in some disks being used more heavily than others.

* Existing solution: One way to solve this problem is to move the hash bucket to one of the less-used disks, which moves data from disk to disk.

* Our solution: A routing table is used to determine an object's storage location, based on the object's hash value and its insertion timestamp. Each hash bucket is initially assigned to one of the available disks. When a hash bucket's disk utilization exceeds the overall average disk utilization, another less-used disk is assigned to that hash bucket with a new timestamp. New objects in that hash bucket are stored on the new disk, while existing objects are still accessed from the old disk via the routing table. This method avoids moving data.
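
A minimal sketch of this routing scheme (hypothetical code; the bucket count, naming, and data structures are invented for illustration): each hash bucket keeps a history of (assignment time, disk), new objects go to the bucket's latest disk, and existing objects are located by comparing their insertion timestamp against that history, so no data ever has to move.

    # Sketch: timestamped routing table so hot buckets can be repointed to
    # less-used disks without migrating existing objects.
    import bisect
    import hashlib

    NUM_BUCKETS = 64

    def bucket_of(object_id):
        return int(hashlib.md5(object_id.encode()).hexdigest(), 16) % NUM_BUCKETS

    # routing table: bucket -> sorted list of (assignment_time, disk)
    routing = {b: [(0, f"disk-{b % 4}")] for b in range(NUM_BUCKETS)}

    def reassign(bucket, new_disk, now):
        """Point a hot bucket at a less-used disk from time `now` onwards."""
        routing[bucket].append((now, new_disk))

    def locate(object_id, insertion_time):
        """Find the disk the bucket pointed to when the object was inserted."""
        history = routing[bucket_of(object_id)]
        times = [t for t, _ in history]
        i = bisect.bisect_right(times, insertion_time) - 1
        return history[i][1]

    obj, t_old = "photo-123.jpg", 100
    old_disk = locate(obj, t_old)            # stored under the original assignment
    reassign(bucket_of(obj), "disk-9", now=200)
    print(locate(obj, t_old) == old_disk)    # True: old object still found on old disk
    print(locate(obj, 250))                  # new writes to this bucket go to disk-9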


Storage Solutions for Private Cloud, Object Storage Implementation on Private Cloud, MAS/ACS

Ali Turkoglu, Principal Software Engineering Manager, Microsoft
Mallikarjun Chadalapaka, Principal Program Manager, Microsoft

Abstract

In MAS (Microsoft Azure Stack), we implemented scalable object storage on a unified namespace, leveraging Windows Server 2016 file system features and, in places, creating new storage back ends for Azure block and page blobs as well as table and queue services. This talk gives details about the capabilities of private cloud object storage and the key design/architectural principles and problems being faced.

Learning Objectives

  • Private cloud object storage
  • Block and page blob design challenges
  • Microsoft Azure Stack

Heterogeneous Architectures for Implementation of High-capacity Hyper-converged Storage Devices

Michaela Blott, Principal Engineer, Xilinx/Research
Endric Schubert, PhD, CTO & Founder, MLE

Abstract

Latest trends in software-defined storage indicate the emergence of hyper-convergence, where compute, networking, and storage are combined within one device. In this talk, we introduce a novel architecture to implement such a node in one device. The unique features include a combination of ARM processors for control-plane functionality and dataflow architectures in FPGAs to handle data processing. We leverage a novel hybrid memory system mixing NVMe drives and DRAM to deliver a multi-terabyte object store with 10Gbps access bandwidth. Finally, network connectivity is accelerated by leveraging a full TCP/IP endpoint dataflow implementation within the FPGA’s programmable fabric. A first proof of concept deploys a Xilinx UltraScale+ MPSoC to demonstrate the feasibility of a single-chip solution that can deliver unprecedented levels of performance (13M requests per second) and storage capacity (2TB) at minimal power consumption (<30W). Furthermore, the deployed dataflow architecture supports additional software-defined features such as video compression and object recognition without performance penalty, while resources fit in the device.


Object Storage Analytics : Leveraging Cognitive Computing For Deriving Insights And Relationships

Pushkar Thorat, Software Engineer, IBM

Abstract

Object storage has become the de facto cloud storage for both private and public cloud deployments. Analytics over data stored in an object store, for deriving greater insights, is an obvious exercise being looked at by implementers. In an object store, where the data resides within objects, the user-defined metadata associated with the objects can provide quick, relevant insights into the data; leveraging user-defined object metadata for analytics can therefore help derive early insights. But having relevant user-defined metadata with every object is one of the biggest inhibitors of such analytics. On the other hand, cognitive computing has been on an upward trend where fiction meets reality: various cognitive services are available that leverage extreme data analysis using machine learning techniques to help with data interpretation and beyond. In this presentation, we discuss how cognitive services can help enrich object stores for analytics by self-tagging objects, which can be used not only for data analytics but also for deriving object relationships that help short-list and categorize the data for analytics. The presentation includes a demonstration using IBM Spectrum Scale Object Store, based on OpenStack Swift, together with a popular cognitive service in the marketplace, IBM Watson.

Learning Objectives

  • Understand how to analyze data on an object store
  • Learn about cognitive computing and how it relates to analytics
  • Understand how emerging cognitive services can be applied to an object store for better analysis of the data it hosts
  • Learn, in practical terms, how one can apply cognitive services to media-based workloads hosted on an object store to derive better insights

Performance

Data Center Workload Analysis

Peter Murray, Principal System Engineer, Virtual Instruments

Abstract

For years, enterprises have desired a way to ensure that an application will operate properly at scale on a particular storage array and network infrastructure, and increasingly, ensure that multiple applications can safely run at capacity on an all-flash or hybrid architecture. Companies need an approach that enables users to reference detailed workload analysis profiles as they build and share workload models, and simulate performance across a variety of vendors, products and technologies.

Learning Objectives

  • How to determine what the traffic related to an enterprise application implementation looks like from the perspective of the network infrastructure between application servers and storage arrays
  • How the application I/O workload can be modeled to accurately represent the application in production using storage analytics or vendor performance statistics to generate a synthetic workload model
  • How a statistical application model can be used to validate both array and network performance at varying traffic levels and rates to rapidly test if/then scenarios
  • How multiple application models can be combined to determine appropriate throughput and response time levels on a given array
  • How using these models can enable engineers and architects to cost-effectively test and confidently deploy new networked storage infrastructure

IOPS: Changing Needs

Jim Handy, General Director, Objective Analysis
Thomas Coughlin, President, Coughlin Associates

Abstract

Four years have elapsed since our first IOPS survey – what has changed? Since 2012 we have been surveying SysAdmins and other IT professionals to ask how many IOPS they need and what latency they require. Things have changed over the past four years. Everyone understands that SSDs can give them thousands to hundreds of thousands of IOPS (I/Os Per Second), with flash arrays offering numbers in the million-IOPS range, while HDDs only support tens to hundreds of IOPS. But many applications don’t need the extreme performance of high-end SSDs. In the survey, users shared their IOPS needs with us in 2012 and again in 2016. What has changed? What has remained the same? This presentation examines the need for high IOPS and profiles applications according to these needs.

Learning Objectives

  • Hear what your peers consider the necessary level of performance in both IOPS and latency for various common enterprise applications
  • Learn how the proper combination of HDDs and SSDs can satisfy IOPS and latency requirements for common enterprise activities
  • See some examples of how users have combined HDDs and flash memory to achieve cost effective solutions that meet their application requirements.

Bridging the Gap Between NVMe SSD Performance and Scale Out Software

Anjaneya Chagam, Principal Engineer, Intel

Abstract

NVMe SSDs are becoming an increasingly popular choice in scale-out storage for latency-sensitive workloads such as databases, real-time analytics, and video streaming. NVMe SSDs provide significantly higher throughput and lower latency than SATA and SAS SSDs. It is not unrealistic to expect these devices to provide close to a million random I/Os per second. However, scale-out software stacks carry a significant amount of software overhead that limits the immense potential of NVMe SSDs. In this session, we present all-flash scale-out cluster performance, an analysis of data-path I/O overhead, and programming techniques to systematically address software performance barriers.

Learning Objectives

  • Scale out storage software data path flows
  • Performance profiling with NVMe SSDs
  • User-mode vs. kernel-mode NVMe SSD integration
  • Optimization techniques

Ceph Performance Benchmarking with CBT

Logan Blyth, Software Developer, Aquari, a division of Concurrent

Abstract

Once a Ceph cluster has been assembled, how do you know the bandwidth capabilities of the cluster? What if you want to compare your cluster to one of the clusters in a vendor whitepaper? Use the Ceph Benchmarking Tool (CBT). Learn about CBT so that you can measure the performance of your cluster. You can simulate OpenStack, S3/Swift via COSBench, or an application written against the librados API. The tool is config-file based, with parametric sweeping to test a range of values in one file; it ties in with collectl and blktrace, and will re-create a Ceph cluster if desired.

Learning Objectives

  • Overview of Ceph IO Path
  • Usage of CBT
  • Increase Awareness of the Ceph Benchmarking Tool

SPDK - Building Blocks for Scalable, High Performance Storage Applications

Benjamin Walker, Software Engineer, Intel Corporation

Abstract

Significant advances in throughput and latency for non-volatile media and networking require scalable and efficient software to capitalize on these advancements. This session will present an overview of the Storage Performance Development Kit, an open source software project dedicated to providing building blocks for scalable and efficient storage applications with breakthrough performance. There will be a focus on the motivations behind SPDK's userspace, polled-mode model, as well as details on the SPDK NVMe, CB-DMA, NVMe over Fabrics and iSCSI building blocks. http://spdk.io.
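
The polled-mode point is easy to illustrate outside of SPDK itself. The toy C sketch below is not SPDK code (the "device" is just a worker thread and all names are illustrative); it only shows the structural idea the session motivates: the I/O issuer spins on a completion flag instead of sleeping until an interrupt-driven wakeup, trading CPU cycles for submission-to-completion latency on microsecond-class devices.

```c
/* Toy illustration of polled-mode completion (not SPDK code).
 * A worker thread stands in for the storage device; the submitter
 * polls an atomic flag instead of blocking on an interrupt/wakeup. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

struct fake_io {
    atomic_bool done;            /* completion flag the submitter polls */
};

static void *fake_device(void *arg)
{
    struct fake_io *io = arg;
    usleep(50);                  /* pretend the media took ~50 microseconds */
    atomic_store_explicit(&io->done, true, memory_order_release);
    return NULL;
}

int main(void)
{
    struct fake_io io;
    atomic_init(&io.done, false);

    pthread_t dev;
    pthread_create(&dev, NULL, fake_device, &io);   /* "submit" the I/O */

    /* Polled mode: burn cycles checking for completion. For very fast
     * devices this beats the cost of a sleep/interrupt/wakeup cycle. */
    unsigned long spins = 0;
    while (!atomic_load_explicit(&io.done, memory_order_acquire))
        spins++;

    printf("completed after %lu polls\n", spins);
    pthread_join(dev, NULL);
    return 0;
}
```

Real SPDK goes much further (user-space NVMe queue pairs, no system calls on the I/O path), but this loop structure is the essence of "when polling is better than interrupts."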

Learning Objectives

  • Why use userspace drivers
  • When polling is better than interrupts
  • Applying shared-nothing architecture to storage

Introducing the EDA Workload for the SPEC SFS® Benchmark

Nick Principe, Principal Software Engineer, EMC Corporation
Jignesh Bhadaliya, CTO for EDA, EMC Corporation

Abstract

The SPEC SFS subcommittee is currently finalizing an industry-standard workload that simulates the storage access patterns of large-scale EDA environments. This workload is based upon dozens of traces from production environments at dozens of companies, and it will be available as an addition to the SPEC SFS benchmark suite. Join us to learn more about the storage characteristics of real EDA environments, how we implemented the EDA workload in the SPEC SFS benchmark, and how this workload can help you evaluate the performance of storage solutions.


Reducing Replication Bandwidth for Distributed Document-oriented Databases

Sudipta Sengupta, Principal Research Scientist, Microsoft

Abstract

With the rise of large-scale, Web-based applications, users are increasingly adopting a new class of document-oriented database management systems (DBMSs) that allow for rapid prototyping while also achieving scalable performance. As in other distributed storage systems, replication is important for document DBMSs in order to guarantee availability. The network bandwidth required to keep replicas synchronized is expensive and is often a performance bottleneck. As such, there is a strong need to reduce the replication bandwidth, especially for geo-replication scenarios where wide-area network (WAN) bandwidth is limited. This talk presents a deduplication system called sDedup that reduces the amount of data transferred over the network for replicated document DBMSs. sDedup uses similarity-based deduplication to remove redundancy in replication data by delta encoding against similar documents selected from the entire database. It exploits key characteristics of document-oriented workloads, including small item sizes, temporal locality, and the incremental nature of document edits. Our experimental evaluation of sDedup with real-world datasets shows that it is able to significantly outperform traditional chunk-based deduplication techniques in reducing data sent over the network while incurring negligible performance overhead.
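
sDedup's own design is covered in the talk; purely as a generic illustration of the similarity-sketch idea it builds on (and not a description of sDedup's actual implementation), the C sketch below hashes overlapping byte windows of a document and keeps the few smallest hash values. Two documents whose sketches overlap heavily are good candidates for delta encoding against each other. All names and parameters here are illustrative.

```c
/* Generic similarity sketch (illustrative only; not sDedup's code).
 * Hash every overlapping 8-byte window and keep the K smallest hashes;
 * documents sharing many sketch values are likely similar and therefore
 * good delta-encoding candidates. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WINDOW 8
#define K      4

static uint64_t fnv1a(const unsigned char *p, size_t n)
{
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Insert h into the sorted array of the K smallest hashes seen so far. */
static void sketch_add(uint64_t *sk, uint64_t h)
{
    for (int i = 0; i < K; i++) {
        if (h == sk[i])
            return;                         /* ignore duplicates */
        if (h < sk[i]) {
            for (int j = K - 1; j > i; j--)
                sk[j] = sk[j - 1];
            sk[i] = h;
            return;
        }
    }
}

static void sketch(const char *doc, uint64_t *sk)
{
    size_t len = strlen(doc);
    for (int i = 0; i < K; i++)
        sk[i] = UINT64_MAX;
    for (size_t i = 0; i + WINDOW <= len; i++)
        sketch_add(sk, fnv1a((const unsigned char *)doc + i, WINDOW));
}

int main(void)
{
    uint64_t a[K], b[K];
    sketch("{\"user\":\"alice\",\"status\":\"hello world\"}", a);
    sketch("{\"user\":\"alice\",\"status\":\"hello there\"}", b);

    int common = 0;
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            if (a[i] == b[j] && a[i] != UINT64_MAX)
                common++;
    printf("sketch overlap: %d of %d\n", common, K);  /* high overlap => similar */
    return 0;
}
```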

Learning Objectives

  • Replication in distributed databases
  • Techniques for network bandwidth reduction
  • Similarity detection in sDedup
  • sDedup system design

Performance Analysis of the Peer Fusion File System (PFFS)

Richard Levy, CEO and President, Peer Fusion

Abstract

The PFFS is a POSIX-compliant parallel file system capable of high resiliency and scalability. The user data is dispersed across a cluster of peers with no replication. This is an analysis of the performance metrics obtained as we ramped up the count of peers in the cluster. For each cluster configuration we adjusted the allowable count of peer failures (FEC - Forward Error Correction) from 14% to 77% of the cluster and measured the I/O performance of the cluster. Write operations consistently exceeded 700MB/s even with 77% of the peers faulted. Read operations always exceeded 400MB/s with no peer failures, and we observed a graceful performance degradation as faults were injected, so that even with 77% peer failures we never dropped below 300MB/s. As expected, large I/Os were faster when the cluster was mounted in direct_io mode, whereas smaller I/Os were faster when direct_io was disabled and the kernel was allowed to optimize I/O requests. We discuss how Ethernet latency vs. throughput issues affect PFFS performance. We explore how the PFFS can be made faster in the future. We draw conclusions on the scalability of the cluster’s performance based upon our measurements.

Learning Objectives

  • Design for performance
  • Resiliency vs. performance
  • Scaling performance
  • Issues with Ethernet

Persistent Memory

Persistent Memory Quick Start Programming Tutorial

Andy Rudoff, Architect Datacenter Software, Intel Corporation

Abstract

A tutorial on using the open source persistent memory (PMEM) programming library offered at pmem.io, including examples that provide power-fail safety while storing data on PMEM. Programming examples will zero in on pmem.io’s transactional object store library (“libpmemobj”), which is layered on the SNIA NVM Programming Model. The examples covered will demonstrate proper integration techniques, macros, the C API, and key theory concepts and terminology, and present an in-depth overview of what the library offers for PMEM programming initiatives.
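
To give a flavor of what the libpmemobj examples look like, here is a minimal sketch (assuming libpmemobj is installed and a pmem-capable path; the pool path is a placeholder and error handling is trimmed) that transactionally updates a counter in the pool's root object, so after a power failure the update is either fully persisted or not visible at all.

```c
/* Minimal libpmemobj sketch: transactionally increment a counter stored
 * in the pool's root object. Pool path is a placeholder. */
#include <stdint.h>
#include <stdio.h>
#include <libpmemobj.h>

struct my_root {
    uint64_t counter;
};

int main(void)
{
    const char *path = "/mnt/pmem/counter.pool";   /* placeholder pmem path */

    PMEMobjpool *pop = pmemobj_create(path, "counter_layout",
                                      PMEMOBJ_MIN_POOL, 0666);
    if (pop == NULL)
        pop = pmemobj_open(path, "counter_layout");
    if (pop == NULL)
        return 1;

    PMEMoid root = pmemobj_root(pop, sizeof(struct my_root));
    struct my_root *rootp = pmemobj_direct(root);

    /* Power-fail safe update: the old value is logged before modification,
     * so the increment is all-or-nothing across a crash. */
    TX_BEGIN(pop) {
        pmemobj_tx_add_range(root, 0, sizeof(struct my_root));
        rootp->counter++;
    } TX_END

    printf("counter = %lu\n", (unsigned long)rootp->counter);
    pmemobj_close(pop);
    return 0;
}
```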

Learning Objectives

  • Overview of persistent memory programming
  • Close examination of the transactional object store library (libpmemobj)
  • Code integration walkthrough

Networking New Persistent Memory Technologies

Idan Burstein, Architect, Mellanox
Rob Davis, VP Storage Technology, Mellanox

Abstract

New Persistent Memory technologies like 3D XPoint are on the near horizon. They will be inside SSDs, making them much faster than today’s NAND-based products, and will also be accessible on the memory bus. Will new network technologies like NVMe over Fabrics and 100Gb fabrics be fast enough to support networked Persistent Memory storage applications? If not, what new technologies are in development to harness the enhanced performance of Persistent Memory across a network? What curve balls does storage on a memory bus throw at the problem? This session will explore these questions and possible solutions.

Learning Objectives

  • Understanding new Persistent Memory Technologies
  • Potential issues when attaching Persistent Memory to networks
  • How networking Persistent Memory Technologies might be done

The SNIA NVM Programming Model

Doug Voigt, Distinguished Technologist, Hewlett Packard Enterprise

Abstract

The SNIA NVM Programming model enables applications to consume emerging persistent memory technologies through step-wise evolution to greater and greater value. Starting with an overview of the latest revision of the NVM programming model specification, this session summarizes the recent work of the NVM programming TWG in the areas of high availability and atomicity. We take an application view of ongoing technical innovation in a persistent memory ecosystem.

Learning Objectives

  • Learn what the SNIA NVM programming TWG has been working on.
  • Learn how applications can move incrementally towards greater and greater benefit from persistent memory.
  • Learn about the resources available to help developers plan and implement persistent memory aware software.

Enabling Remote Access to Persistent Memory on an IO Subsystem Using NVM Express and RDMA

Stephen Bates, Sr. Technical Director, Microsemi

Abstract

NVM Express is predominantly a block-based protocol where data is transferred to/from host memory using DMA engines inside the PCIe SSD. However, since NVMe 1.2 there has existed a memory access method called the Controller Memory Buffer (CMB), which can be thought of as a PCIe BAR managed by the NVMe driver. Also, the NVMe over Fabrics standard released this year extends NVMe over RDMA transports. In this paper we look at the performance of the CMB access methodology over RDMA networks. We discuss the implications of adding persistence semantics to both RDMA and these NVMe CMBs to enable a new type of NVDIMM (which we refer to as an NVRAM). These NVRAMs can reside on the IO subsystem and hence are decoupled from the CPU and memory subsystem, which has certain advantages and disadvantages over NVDIMMs, which we outline in the paper. We conclude with a discussion of how NVMe over Fabrics might evolve to support this access methodology and how the RDMA standard is also developing to align with this work.

Learning Objectives

  • NVM Express overview
  • NVMe over Fabrics overview
  • Using IO memory as an alternative to NVDIMM
  • Controller Memory Buffers in NVMe
  • Performance results for new access method

There and Back Again, a Practical Presentation on Using Persistent Memory with Applications

Mat Young, VP of Marketing, Netlist
Chad Dorton, App Architect, Netlist

Abstract

Storage class memory (SCM) fills the gap between DRAM and NVMe SSDs. SCM has the ability to deliver performance at or near that of DRAM with the persistence of flash. In order to realize these gains, however, software stacks must be updated to take advantage of SCM in a more native fashion. In this session we will demonstrate how recent advancements in the Linux kernel, like DAX and libpmem, allow firms to take advantage of the speed of SCM without any modification of the application. Go beyond bandwidth and IOPS to the real world of application performance acceleration with storage class memory.
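
A minimal sketch of the memory-mapped path this session demonstrates, assuming the NVM Library's libpmem and a DAX-mounted file system (the file path is a placeholder, error handling is trimmed): the file is mapped directly into the address space, stores go straight to the media, and pmem_persist() takes the place of msync().

```c
/* Minimal libpmem sketch: map a file on a DAX-capable file system,
 * write into it with ordinary stores, and make the data durable. */
#include <stdio.h>
#include <string.h>
#include <libpmem.h>

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    /* Create (or open) a 4 KiB file and map it into our address space. */
    char *addr = pmem_map_file("/mnt/dax/hello", 4096,
                               PMEM_FILE_CREATE, 0666,
                               &mapped_len, &is_pmem);
    if (addr == NULL)
        return 1;

    strcpy(addr, "hello, persistent memory");

    /* On true pmem this flushes CPU caches; otherwise fall back to msync(). */
    if (is_pmem)
        pmem_persist(addr, mapped_len);
    else
        pmem_msync(addr, mapped_len);

    pmem_unmap(addr, mapped_len);
    return 0;
}
```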

Learning Objectives

  • Myth busting on NVM
  • Show the real world beyond IOPS
  • Explanation and Demo of DAX
  • Explanation and demo of using memory-mapped files
  • Discussion and demo of an application modified to use NVML

Challenges with Persistent Memory in Distributed Storage Systems

Dan Lambright, Principal Software Engineer, Red Hat

Abstract

Persistent memory will significantly improve storage performance. But these benefits can be harder to realize in distributed storage systems such as Ceph or Gluster. In such architectures, several factors limit the gains from faster storage. They include the costly network overhead inherent to many operations, and the deep software layers supporting necessary services. It is also likely that the high cost of persistent memory will limit deployments in the near term. This talk will explore steps to tackle those problems. Strategies include tiering subsets of the data or metadata in persistent memory, incorporating high-speed networks such as InfiniBand, and the use of tightly optimized “thin” file systems. The talk will include discussion and results from a range of experiments in software and hardware.

Learning Objectives

  • Understand bottlenecks in scale-out storage when persistent memory is used
  • Learn the benefits and limits of faster networks in such configurations
  • Understand the current stacks and where improvements can be made

Building on The NVM Programming Model – A Windows Implementation

Paul Luse, Principal Engineer, Intel
Chandra Kumar Konamki, Sr Software Engineer, Microsoft

Abstract

In July 2012 the SNIA NVM Programming Model TWG was formed with just 10 participating companies who set out to create specifications to provide guidance for operating system, device driver, and application developers on a consistent programming model for next generation non-volatile memory technologies. To date, membership in the TWG has grown to over 50 companies and the group has published multiple revisions of The NVM Programming Model. Intel and Microsoft have been long time key contributors in the TWG and we are now seeing both Linux and Windows adopt this model in their latest storage stacks. Building the complete ecosystem requires more than just core OS enablement though; Intel has put considerable time and effort into a Linux based library, NVML, that adds value in multiple dimensions for applications wanting to take advantage of persistent byte addressable memory from user space. Now, along with Intel and HPE, Microsoft is moving forward with its efforts to further promote this library by providing a Windows implementation with a matching API. In this session you will learn the fundamentals of the programming model, the basics of the NVML library and get the latest information on the Microsoft implementation of this library. We will cover both available features/functions and timelines as well as provide some insight into how the open source project went from idea to reality with great contributions from multiple companies.

Learning Objectives

  • NVM Programming Model Basics
  • The NVM Libraries (NVML)
  • The Windows Porting Effort

Breaking Barriers: Making Adoption of Persistent Memory Easier

Sarah Jelinek, SW Architect, Intel Corp

Abstract

One of the major barriers to adoption of persistent memory is preparing applications to make use of its direct access capabilities. This presentation will discuss a new user-space file system for persistent memory and how it breaks these barriers. The presentation will introduce the key elements to consider for a user-space persistent memory file system and discuss the internals of this new file system. The discussion will conclude with a presentation of the current status and performance of this new persistent memory file system.

Learning Objectives

  • Discussion of current barriers to persistent memory adoption.
  • Introduce how this new file system breaks down the barriers to adoption of persistent memory.
  • Introduce the SW internals of this file system.
  • Present performance statistics and discussion of why this file system out-performs conventional, kernel based file systems.

Experience and Lessons from Accelerating Linux Application Performance Using NVMp

Vinod Eswaraprasad, Chief Architect, Wipro Technologies

Abstract

The SNIA Nonvolatile Memory Programming Model defines recommendations on how NVM behavior can be exposed to application software. There are Linux NVM library implementations that provide a rich set of interfaces which can be used by applications to drastically improve their performance on systems with NVM.

We analyzed these available interfaces and how they can be leveraged in typical applications. Our work demonstrates how a sample open source Linux application can make use of these interfaces to improve performance. We also give examples of analyzing application code and finding opportunities to use NVMp-style interfaces. The sample application we discuss is SQLite on Linux, in which we found nine such opportunities to use Linux NVMP-compatible interfaces. We also show how these storage optimizations can improve the overall I/O and performance of such applications.

Learning Objectives

  • NVMp - programming model and use cases
  • Linux NVMP implementations
  • Analyzing application for usage of NVM
  • Measure performance improvements in application - with specific workload focusing on I/O

RDMA Verbs Extensions for Persistency and Consistency

Idan Burstein, Storage Architect, Mellanox

Abstract

Standard block/file storage protocols assume that data will be bounced through an intermediate buffer in memory while being written to the remote disk. This assumption came from two interface characteristics of the underlying storage: 1. storage latency is higher than network latency; 2. storage is not byte addressable, so accessing it requires an asynchronous command/completion interface. Persistent memory (PMEM) / storage class memory (SCM) breaks these assumptions: PMEM devices expose byte-addressable storage that can be accessed over an RDMA network, with latency characteristics that fit the sub-microsecond latency of high-speed RDMA networks. In this talk I will discuss the current IB/RoCE verbs semantics for accessing memory through RDMA and the reliability model within them. I will share the challenges of using the verbs reliability model for accessing persistent memory, and the initial thoughts about the reliability and consistency extensions we are working on in the IBTA in order to meet the reliability requirements of storage access.

Learning Objectives

  • PMEM
  • RDMA
  • Reliability
  • File system
  • PMEM.IO

In-memory Persistence: Opportunities and Challenges

Dhruva Chakrabarti, Research Engineer, Hewlett Packard Labs

Abstract

The advent of persistent memory enables a new class of applications where objects can be persisted in-memory as well as reused across executions and applications. This storage paradigm gets rid of the impedance mismatch between in-memory and stored data formats that is inherent in today’s block storage. A single (object) format of data should result in both performance and programmability benefits.

However, it is far from clear how to program persistent memory in a failure-tolerant manner. Opening up persistence to arbitrary data structures implies that any failure-tolerance technique should be largely transparent to the programmer while incurring low overheads. In this talk, I will review prior work on persistent programming paradigms and describe some recent work that provides consistency support with zero to minimal code changes. The audience can expect to learn about a specific model and its APIs. Other challenges and possible future directions will be discussed.

Learning Objectives

  • Overview of persistent memory and examples of upcoming platforms surrounding it
  • The existing literature on persistent memory programming
  • In-memory persistence and associated consistency pitfalls
  • An example failure-resilient persistent memory programming platform

Low Latency Remote Storage - a Full-stack View

Tom Talpey, Architect, Microsoft

Abstract

A new class of ultra low latency remote storage is emerging - nonvolatile memory technology can be accessed remotely via high performance storage protocols such as SMB3, over high performance interconnects such as RDMA. A new ecosystem is emerging to "light up" this access end-to-end. This presentation will explore one path to achieve it, with performance data on current approaches, analysis of the overheads, and finally the expectation with simple extensions to well-established protocols.

Learning Objectives

  • Understand the potential for low latency remote storage
  • Survey the protocols and interfaces in use today
  • See current performance data, and future performance expectations
  • See a view of the future of the end-to-end storage revolution

RDMA Extensions for Accelerating Remote PMEM Access - HW and SW Considerations, Architecture, and Programming Models

Chet Douglas, Principal SW Engineer, Intel Corporation

Abstract


Learning Objectives

  • Introduce HW Architecture concepts of Intel platforms that will affect RDMA usages with PM
  • Introduce proposed high-level HW modifications that can be utilized to provide native HW support for pmem, reduce RDMA latency, and improve RDMA with pmem bandwidth
  • Focused discussion on proposed Linux libfabric and libibverbs interface extensions and modifications to support the proposed HW extensions
  • Discuss open architecture issues and limitations with the proposed HW and SW extensions
  • Discuss Intel plans for standardization and industry review

Professional Development

How Agile is a Game Changer in Storage Development, and Best Practices to Transform from the Traditional Model to the Agile Model

Saurabh Bhatia, Storage Test Specialist, IBM

Abstract

Over the past few years, Agile has been widely accepted by companies worldwide.

This flexible, holistic product development strategy, "where a development team works as a unit to reach a common goal", challenges the assumptions of the "traditional, sequential approach" to product development, and even the storage industry has welcomed it with open arms.

But it is not easy to transform from a decades-old development model to a new one. With this presentation we want to highlight the key areas, challenges, and solutions to the most common and sometimes complex problems that organizations face when the Scrum model is adopted over traditional development models.

We also touch upon the basics of the Agile model and the terminology, roles, and processes involved, as well as best practices and how to get the best outcome from the Scrum model.

Learning Objectives

  • Agile and scrum model deep dive
  • Challenges and problems faced by organizations adopting Scrum, and the best practices involved
  • Use case: how EMC's VPLEX team adopted Scrum and sailed through, emerging as clear winners

Security

SNIA Tutorial:
Implementing Stored-Data Encryption

Michael Willett, VP Marketing, Drive Trust Alliance

Abstract

Data security is top of mind for most businesses trying to respond to the constant barrage of news highlighting data theft, security breaches, and the resulting punitive costs. Combined with litigation risks, compliance issues and pending legislation, companies face a myriad of technologies and products that all claim to protect data-at-rest on storage devices. What is the right approach to encrypting stored data?

The Trusted Computing Group, with the active participation of the drive industry, has standardized the technology for self-encrypting drives (SEDs): the encryption is implemented directly in the drive hardware and electronics. Mature SED products are now available from all the major drive companies, both HDD (rotating media) and SSD (solid state), for both laptops and the data center. SEDs provide a low-cost, transparent, performance-optimized solution for stored-data encryption. SEDs do not protect data in transit, upstream of the storage system.

For overall data protection, a layered encryption approach is advised. Sensitive data (e.g., as identified by specific regulations such as HIPAA and PCI DSS) may require encryption outside of and upstream from storage, such as in selected applications or associated with database manipulations.

This tutorial will examine a ‘pyramid’ approach to encryption: selected, sensitive data encrypted at the higher logical levels, with full data encryption for all stored data provided by SEDs.

Learning Objectives

  • The mechanics of SEDs, as well as application and database-level encryption
  • The pros and cons of each encryption subsystem
  • The overall design of a layered encryption approach

Storage Security Evolution and Advancements in NVMe Interactions

Dr. Jorge Campello, Director of System and Software Technologies, Western Digital Corporation; TCG Storage Work Group Chair
Thomas Bowen, Storage Security Architect, Intel Corporation; Intel TCG Storage Workgroup Standards Representative

Abstract

This presentation explores the continued evolution of Trusted Computing Group's Storage Security standards, including definition of advanced configurations for Self-Encrypting Drive (SED) interactions with NVMe features like Namespaces and Fabrics; and the recent launch of a certification program for Opal SEDs.

Learning Objectives

  • New evolution of TCG Storage specifications
  • Interactions between SEDs and NVMe namespaces
  • Authentication for NVMe over fabrics
  • Certification Program information

When Will Self-Encrypting Drives Predominate?

Tom Coughlin, President, Coughlin Associates, Inc.
Walt Hubis, President, Hubis Consulting

Abstract

Self-Encrypting Drives (SEDs) have found applications in enterprise storage where crypto-erase allows rapid sanitization of retired drives but their use in client storage devices is still minimal. Learn about the history and uses for SEDs and what needs to happen to bring them into broader use based upon results from a recent poll of client end users.

Learning Objectives

  • What are SEDs used for?
  • What does a recent survey show are factors in client SED success?
  • What is needed to make SEDs widespread?

Active Directory Client Scaling Challenges

Marc VanHeyningen, Principal Software Engineer, Isilon/EMC

Abstract

Isilon’s OneFS is a clustered NAS capable of scaling to multi-petabyte sizes and handling millions of IOPS. The problems of scaling servers to this degree are well understood, but what about client operations? This talk discusses challenges of joining such a cluster to a complex Active Directory infrastructure, including customer war stories and implementation details of our scalable, resilient client implementation.

Note that this is something of a follow-on to a BOF session from last year’s SDC, with new stories and updated practices. The evening scheduling was helpful in producing an informal, collaborative session.

Learning Objectives

  • Challenges associated with integrating two complex distributed systems
  • Best practices for scalable clients of AD
  • Isilon’s implementation of AD client operation

Multi-vendor Key Management – Does It Actually Work?

Tim Hudson, CTO, Cryptsoft

Abstract

A standard for interoperable key management exists but what actually happens when you try to use products and key management solutions from multiple vendors? Does it work? Are any benefits gained?

Practical experience from implementing the OASIS Key Management Interoperability Protocol (KMIP) and from deploying and interoperability testing multiple vendor implementations of KMIP form the bulk of the material covered.

Guidance will be provided covering the key issues you should require your vendors to address, and how to distinguish between simple vendor tick-box approaches to standards conformance and actual interoperable solutions.

Learning Objectives

  • Knowledge of practical approaches to Key Management
  • Awareness of how to compare vendor approaches

SMB

Cephalopods and Samba?

Ira Cooper, Principal Software Engineer, Red Hat

Abstract

Ceph is an open source software-defined storage system with very mature object and block interfaces. The CephFS file system is a rapidly maturing part of Ceph that will be of increasing interest for many workloads. It is expected that, as it has with most file systems, SMB will quickly become a key access method for CephFS. Therefore, Samba integration with CephFS will be key to the futures of both Samba and Ceph.

Come and see how Samba has already integrated with CephFS and future directions for integration.

Learning Objectives

  • Basic Ceph
  • Basic Samba
  • Current Ceph + Samba
  • Future Ceph + Samba directions

SMB3.1.1 and Beyond in the Linux Kernel: Providing Optimal File Access to Windows, Mac, Samba and Other File Servers

Steven French, Principal Systems Engineer: Protocols, Samba Team/Primary Data

Abstract

With many SMB3 servers in the industry, some with optional extensions, getting optimal configuration and POSIX compliance can be confusing.

The Linux kernel client continues to improve with new performance and security features, and implementation of SMB3.1.1. This presentation will discuss recent enhancements to the Linux kernel client, as well as extensions to provide improved Apple (AAPL) interoperability and new POSIX extensions for Linux/Samba (see Jeremy Allison's presentation) and improved POSIX emulation to other servers.

New copy offload features will be demonstrated, as well as the current state of POSIX compatibility with different server types. Finally, new protocol features under development in the Linux kernel client will be discussed.

Learning Objectives

  • How POSIX compliant is access from Linux to SMB3 servers?
  • How can I best configure access depending on server type: Windows or Mac or Samba or other NAS?
  • What is the status of the Linux kernel client? What new features are available?
  • How can I get optimal performance out of the Linux kernel client?
  • What are the advantages to using SMB3 for Linux?

SMB3 and Linux - A Seamless File Sharing Protocol

Jeremy Allison, Engineer, Samba Team/Google

Abstract

SMB3 is the default Windows and MacOS X file sharing protocol, but what about making it the default on Linux? After developing the UNIX extensions to the SMB1 protocol, the Samba developers are planning to add UNIX extensions to SMB3 as well. Co-creator of Samba Jeremy Allison will discuss the technical challenges faced in making SMB3 into a seamless file sharing protocol between Linux clients and Samba servers, and how Samba plans to address them. Come learn how Samba plans to make NFS obsolete (again :-) !

Learning Objectives

  • SMB3
  • Linux
  • Windows interoperability

Improving DCERPC Security

Stefan Metzmacher, Developer, SerNet/Samba Team

Abstract

This talk explains the upcoming DCERPC security improvements in Samba after the badlock bug. These changes are designed to be backward compatible and hopefully implemented by other products as well.

Learning Objectives

  • What the problems are
  • What the urgent fixes look like
  • How the protocol can be further hardened

SMB3 in Samba – Multi-Channel and Beyond

Michael Adam, Architect and Manager, Red Hat

Abstract

The implementation of SMB3 is a broad and important set of topics on the Samba roadmap. After a longer period of preparations, the first and the most generally useful of the advanced SMB3 features has recently arrived in Samba: Multi-Channel. This presentation will explain Samba's implementation of Multi-Channel, especially covering the challenges that had to be solved for integration with Samba's traditional clustering with CTDB, which is invisible to the SMB clients and hence quite different from the clustering built into SMB3. Afterwards an outlook will be given on other areas of active development like persistent file handles, RDMA, and scale-out SMB clustering, reporting on status and challenges.


Clustered Samba Challenges and Directions

Volker Lendecke, Developer, Samba Team/SerNet

Abstract

Clustered Samba and ctdb have been successful together for many years now. This talk will present the latest developments in Samba and ctdb from the Samba perspective. It is directed towards implementors of Samba in clustered storage products, as well as developers interested in the challenges the SMB protocol carries into clustered environments.

* Performance / Scalability
The core ctdb engine is one single threaded daemon per cluster node. Samba will take over clusterwide messaging from ctdb.

* Stability
Samba has an inherent potential for deadlocking: smbd in many situations has to lock more than one tdb file simultaneously. Samba itself has proper locking hierarchies, but if other components like file systems and ctdb can also take locks and block, things become messy. Samba will offer changes such that no more than one kernel-level lock will be taken at a time.

* Databases for Persistent file handles
To implement clusterwide persistent file handles, Samba needs a middle ground between completely volatile databases that can go away with a cluster node and persistent databases that need to survive node crashes. Based on the database restructuring work mentioned above, Samba will introduce a new database consistency model to enable persistent file handles.

Learning Objectives

  • Status update for clustered Samba
  • Infrastructure challenges in ctdb and Samba
  • New directions of development in clustered Samba

SMB Server: Kernel versus User Space Learnings

Oleg Kravtsov, Lead Developer, Tuxera

Abstract

The SMB protocol can be implemented in both user and kernel space, but the choice depends on understanding the differences between the two modes and their use cases: enterprise and consumer/embedded NAS. Since each server mode adds its own overhead, it is important to understand when and where to implement them, considering their features and performance. In this presentation, I will compare the two versions and how their architectures address each use case’s needs.

Learning Objectives

  • Strengths and weaknesses of SMB Linux kernel and SMB user space versions.
  • Benchmarking and tuning the kernel and user space SMB (measured through download/upload speed, CPU utilization, memory consumption, memory fragmentation)
  • Zoom in on the architecture: version-specific differences and shared commonalities.

Sailing in Uncharted Waters. A Story of Implementing Apple MacOS X Extensions for SMB2 in EMC Isilon OneFS Server

Rafal Szczesniak, Principal Software Engineer, EMC

Abstract

Starting with Mac OS X 10.9, Apple has turned to SMB2 as the preferred protocol for file sharing. Although it was supported earlier, it has since become the primary file sharing protocol for the OS X platform. In order to keep supporting a number of features specific to the OS X file system, Apple's SMB2 implementation has been equipped with several extensions and optimisations aimed at maintaining a positive user experience when working over the new protocol. That naturally sets new expectations for clustered storage vendors, because many of their customers are heavyweight OS X users. Unfortunately, the scarcity of accurate documentation makes the task more difficult. The talk describes experiences from investigating and implementing some of the extensions in EMC Isilon scale-out clusters to provide a more Apple-friendly SMB2 server.

Learning Objectives

  • How does MacOS X file sharing translate to SMB2 protocol?
  • The actual extensions and their implementation based only on publicly available information
  • SMB2 client behaviour specific to MacOS X
  • Possible optimisations

Building a Highly Scalable and Performant SMB Protocol Server

Sunu Engineer, CTO, Ryussi Technologies

Abstract

We discuss the architecture and design of a high performance SMB server on Unix platforms especially Linux and FreeBSD. We elaborate on the considerations that go into getting high performance out of the SMB server when running against a variety of data stores.


Samba and NFS Integration

Steven French, Principal Systems Engineer Protocols, PrimaryData Corp

Abstract

Samba is used on a wide variety of local file systems, but it can also be used on cluster filesystems and now on NFS. This presentation will describe our experiences running Samba on NFS, with Samba as a gateway to pNFS storage, and the ways to best integrate the two protocols on Linux when run in this configuration.

The presentation will discuss how ACLs and SMB3 specific metadata can be handled, as well as some performance considerations.

Learning Objectives

  • How to configure Samba as a gateway over pNFS
  • What limitations does Samba have when run with NFS
  • What additional enhancements are needed to the NFS client to make this better?
  • What about the reverse?
  • What would happen if you tried to run NFS over SMB3?

Panel Discussion of SMB3 POSIX Protocol Extensions

Steven French, Principal Systems Engineer Protocols, PrimaryData Corp/Samba Team
Jeremy Allison, Samba Team, Google

Abstract

With the excellent clustering, reliability, performance and security features brought by SMB3, there has been considerable interest in improving its ability to better support POSIX clients such as Linux, Unix, and Mac. Earlier presentations discussed the details of the proposed extensions. This panel discussion will encourage feedback and information exchange on additional requirements for better practical POSIX compliance using extensions to the SMB3 protocol.

SMR

The Shingled Magnetic Recording (SMR) Revolution – Data Management Techniques Examined

Tom Coughlin, President, Coughlin Associates

Abstract

The unyielding growth of digital data continues to drive demand for higher capacity, lower-cost storage. With the advent of Shingled Magnetic Recording (SMR), which overlaps HDD tracks to provide a 25 percent capacity increase versus conventional magnetic recording technology, storage vendors are able to offer extraordinary drive capacities within existing physical footprints. That said, IT decision makers and storage system architects need to be cognizant of the different data management techniques that come with SMR technology, namely Drive Managed, Host Managed and Host Aware. This panel session will offer an enterprise HDD market overview from prominent storage analyst Tom Coughlin as well as presentations on SMR data management methods from leading SMR HDD manufacturers (Seagate, Toshiba and Western Digital).

Learning Objectives

  • Enterprise HDD market update
  • Brief introduction of Shingled Magnetic Recording (SMR)
  • Deep dive into SMR data management techniques

ZBC/ZAC Support in Linux

Damien Le Moal, Sr. Manager, Western Digital

Abstract

With support for the ZBC and ZAC shingled magnetic recording (SMR) command standards maturing in the Linux kernel, application developers can now more easily implement support for these emerging high-capacity disks. This presentation will go through the basics of SMR support in the Linux kernel and provide an overview of different application implementation approaches, including the use of available SMR abstraction layers (file systems and device mappers).
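
As a taste of what application-level SMR awareness looks like, the sketch below queries a zoned block device's layout through the zoned-block ioctl interface (linux/blkzoned.h) that accompanied this kernel work. Treat it as a hedged sketch: the device path is a placeholder, it assumes a kernel with zoned block device support, and error handling is trimmed; the abstraction layers the talk covers hide exactly this kind of detail from applications.

```c
/* Report the first few zones of a zoned (ZBC/ZAC) block device using the
 * Linux BLKREPORTZONE ioctl. Device path is a placeholder; requires a
 * kernel with zoned block device support and suitable permissions. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

#define NR_ZONES 8

int main(void)
{
    int fd = open("/dev/sdX", O_RDONLY);        /* placeholder SMR device */
    if (fd < 0)
        return 1;

    struct blk_zone_report *rep =
        calloc(1, sizeof(*rep) + NR_ZONES * sizeof(struct blk_zone));
    rep->sector = 0;                            /* start from the first zone */
    rep->nr_zones = NR_ZONES;

    if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
        perror("BLKREPORTZONE");
        return 1;
    }

    for (unsigned int i = 0; i < rep->nr_zones; i++) {
        struct blk_zone *z = &rep->zones[i];
        printf("zone %u: start=%llu len=%llu wp=%llu %s\n",
               i, (unsigned long long)z->start,
               (unsigned long long)z->len,
               (unsigned long long)z->wp,
               z->type == BLK_ZONE_TYPE_CONVENTIONAL ?
                   "conventional" : "sequential-write");
    }

    free(rep);
    close(fd);
    return 0;
}
```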

Learning Objectives

  • Linux kernel SMR disk management
  • Applications SMR constraints
  • SMR abstraction layers

ZDM: Using an STL for Zoned Media on Linux

Shaun Tancheff, Software Engineer, AeonAzure LLC

Abstract

As zoned block devices supporting the T10 ZBC and T13 ZAC specifications become available, there are few strategies for putting these low-TCO ($/TB and watts/TB) drives into existing storage clusters with minimal changes to the existing software stack.

ZDM is a Linux device mapper target for zoned media that provides a Shingled Translation Layer (STL) to support a normal block interface at near normal performance for certain use cases.

ZDM-Device-Mapper is an open source project available on github with a goal of being upstreamed to the kernel.

Learning Objectives

  • ZDM performance compared to existing disk drive options
  • How to determine if your workload is a likely candidate for using ZDM.
  • How to use and tune ZDM effectively for your workload.
  • Using ZDM in cloud storage environments.
  • Using ZDM in low resource embedded NAS / mdraid.

Software Defined Storage

Storage Spaces Direct - A Software Defined Storage Architecture

Vinod Shankar, Principal Development Lead, Microsoft Corporation

Abstract

Storage Spaces Direct is Microsoft’s Software Defined Storage architecture for the private cloud. This talk will cover the technical architecture of S2D and its evolution from our SDC 2015 presentation to the final form for Windows Server 2016. We will discuss implementation of higher order fault domains, technical details and rationale for multi-resilient virtual disks that blend mirror and parity encoding, stretch clusters with synchronous replication and more. Software defined storage solutions can (and should still!) be based on industry-standard hardware!

Learning Objectives

  • Software defined storage on windows
  • Hybrid storage

Solid State Storage

NVMe Virtualization Ideas for Virtual Machines on Cloud

Sangeeth Keeriyadath, Senior Staff Software Engineer, IBM

Abstract

Wider adoption of flash did bring some anxious moments for the seasoned storage industry, and it is expected to cause more jitters with increased adoption of the Non-Volatile Memory Express (NVMe) standard. Not only that, there is a need to rethink the existing storage virtualization solutions, predominantly focused around SCSI, and embrace NVMe.

Virtual Machine "storage stack" is bound to benefit from the optimized(low latency) and high performining(parallelism) of NMVe interface. This presentation is focused around detailing the various approaches available for virtualizing NMVe and make it available for Virtual Machines by maintaining same/better flexibility as today and reducing cost :

  1. Implementing SCSI to NVMe translation layer on the Host/Hypervisor. ( Blind Mode )
  2. Pure virtual NVMe stack by distributing I/O queues amongst hosted VMs. ( Virtual Mode )
  3. SR-IOV based NVMe controllers per virtual functions ( Physical mode )

Learning Objectives

  • Storage virtualization
  • NVMe
  • SSD

The Coming of Ultra-low Latency SSDs

Tien Shiah, Product Marketing Manager, Samsung

Abstract

For years, OEMs have attempted to find an optimal balance between DRAM and NAND for the best performance in end-user applications. Latency has always been a concern for the industry when it comes to high performance. The question has been how to make low-latency provisioning cost-efficient. 3D NAND technology is changing all that. With the transformation of 3D NAND into a manufacturing solution for the entire industry in 2016, OEMs will find ultra-low latency in SSD offerings a viable alternative to using more DRAM in caching and in-memory applications. Some believe that this will be the next big thing in NAND memory. Hear how a new class of SSDs will be able to fulfill the high performance requirements of premium applications using flash memory.

Learning Objectives

  • 3D NAND based SSDs can address low latency requirements of time-sensitive applications
  • This new class of SSDs addresses demanding requirements in the most cost-effective way
  • The new class of SSDs does not require any changes to existing server architecture

Accelerating OLTP performance with NVMe SSDs

Veronica Lagrange, Staff Engineer, Samsung
Changho Choi, Principal Engineer, Samsung
Vijay Balakrishnan, Director, Samsung

Abstract

We compare the performance of multiple SSDs when running OLTP applications on both MySQL Server and Percona Server. We discuss important configuration tuning to allow the server to benefit from faster storage, and present results using an implementation of the TPC-C standard. We show a paradigm shift: a typical OLTP workload running on HDDs is I/O bound, but by replacing the storage with NVMe SSDs, the same workload on the otherwise identical server may yield two orders of magnitude more throughput and, furthermore, become CPU bound.

Learning Objectives

  • How to optimize transactional, SQL servers for fast storage
  • Important parameters and tradeoffs
  • How much throughput improvement to expect
  • How much response time improvement to expect when the original bottleneck is storage
  • Potential server capacity gains when moving to faster storage

An Examination of Storage Workloads

Eden Kim, CEO, Calypso Testers

Abstract

SSD performance depends on the workload that it sees. IO streams are constantly changed as they traverse the IO stack such that IO streams generated in user (application) space differ from what the storage sees. Here, we look at a variety of real world workload captures and how these workloads affect SSD performance.

Learning Objectives

  • Understanding what real world workloads are
  • Seeing how real world workloads affect storage performance
  • Comparing workloads across different SSDs

Analysis of SSD health and Prediction of SSD life

Dr. M. K. Jibbe, Technical Director, NetApp
Bernard Chan, NetApp

Abstract

Unlike HDDs, which have some SMART parameters that are specific to magnetic hard drives, SSDs do not have such parameters. Instead, they have other variables representing the overall health of the disk, reported by SMART (Self-Monitoring, Analysis and Reporting Technology) tools. Such a tool calculates SSD health by analyzing the following variables: Reallocated Sectors Count, Current Pending Sectors Count, Uncorrectable Sector Count, as well as Percentage of the Rated Lifetime Used (i.e., SSD Life Left, whichever is available). In this paper we will show the two methods which we used to calculate the health of the drive. We will show how you can accurately predict the life of an SSD, when you must consider replacing an SSD, and when you need to consider online backup to cloud storage.
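
The paper's two methods are its own; as a back-of-the-envelope illustration of the kind of calculation involved (a naive linear extrapolation from a SMART-style "percentage of rated lifetime used" attribute, not the authors' model), consider the sketch below. Real wear is workload dependent and rarely linear, which is precisely why more careful methods are needed.

```c
/* Naive SSD life estimate: linearly extrapolate remaining lifetime from the
 * SMART "percentage of rated lifetime used" value and power-on hours.
 * Illustrative numbers only; not the method described in the paper. */
#include <stdio.h>

int main(void)
{
    double power_on_hours = 8760.0;   /* one year of operation (example)  */
    double pct_life_used  = 20.0;     /* SMART-reported lifetime used, %  */

    if (pct_life_used <= 0.0) {
        printf("not enough wear recorded to extrapolate\n");
        return 0;
    }

    double hours_per_pct   = power_on_hours / pct_life_used;
    double remaining_hours = hours_per_pct * (100.0 - pct_life_used);

    printf("estimated remaining life: %.0f hours (~%.1f years)\n",
           remaining_hours, remaining_hours / 8760.0);
    return 0;
}
```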

Learning Objectives

  • How do you calculate SSD life?
  • Do we need to calculate the SSD life?
  • Is the Wear level a good indicator for the SSD life?
  • What is the common root of SSD failure?
  • What is the right way to estimate SSD lifetime?

Breaking Through Performance and Scale Out Barriers - A Storage Solution for Today's Hot Scale Out Applications

Bob Hansen, V.P. Systems Architecture, Apeiron Data Systems

Abstract

This presentation will review how classic storage architectures create barriers to throughput and scalability for several of today's hot applications. Case studies and benchmarks will be used to illustrate how a very high performance, NVMe networked storage solution allows these applications to break through these barriers.

Learning Objectives

  • Understand how storage can limit scale out application performance
  • Understand how new scale out apps apply different tiers of storage
  • Understand barriers to application scale out
  • Understand barriers to application throughput
  • Understand how networked NVMe storage can dramatically improve app performance and scalability

Storage Architecture

The Magnetic Hard Disk Drive Today’s Technical Status and Its Future

Edward Grochowski, Consultant, Memory/Storage
Peter Goglia, President, VeriTekk Solutions

Abstract

The ubiquitous magnetic hard disk drive continues to occupy a principal role in all storage applications, shipping more bytes than all other competing product technologies. The emergence of cloud computing has firmly established future HDD products well into the next decade. This presentation will discuss today’s HDD products and their technical characteristics, such as performance, reliability and endurance, capacity, and cost per gigabyte. The technologies that enable these properties, such as form factor, interface, shingled writing, helium ambient, two-dimensional magnetic recording (TDMR), and the yet-to-be-implemented heat-assisted magnetic recording (HAMR), will be detailed. A projection of disk drives of the future will be made, and their competitiveness with flash as well as emerging non-volatile memories (NVM) will be discussed.


An Enhanced I/O Model for Modern Storage Devices

Martin Petersen, Architect, Oracle

Abstract

While originally designed for disk drives, the read/write I/O model has provided a common storage abstraction for decades, regardless of the type of storage medium.

Devices are becoming increasingly complex, however, and the constraints of the old model have compelled the standards bodies to develop specialized interfaces such as the Object Storage Device and the Zoned Block Commands protocols to effectively manage the storage.

While these protocols have their place for certain workloads, there are thousands of filesystems and applications that depend heavily on the old model. It is therefore compelling to explore how the read/write mechanism can be augmented using hints and stream identifiers to communicate additional information that enables the storage to make better decisions.

The proposed model is applicable to all types of storage devices and alleviates some of the common deficiencies with NAND flash and Shingled Magnetic Recording which both require careful staging of writes to media.
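
The talk's proposal is its own; as one concrete example of the general hint idea that later appeared upstream, Linux now exposes per-file write-lifetime hints through fcntl(). The sketch below (placeholder path; requires a recent kernel and glibc, and the hint is purely advisory) tells the stack that a file's data is expected to be short-lived, which a flash translation layer or an NVMe device with streams can use to group data with similar lifetimes and reduce write amplification.

```c
/* Tag a file with an expected write lifetime via the Linux fcntl()
 * read/write hint interface (F_SET_RW_HINT). Path is a placeholder;
 * older kernels/libcs do not provide this call. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/scratch.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return 1;

    /* Declare that data written to this file is expected to be short-lived,
     * so the device can place it with other short-lived data. */
    uint64_t hint = RWH_WRITE_LIFE_SHORT;
    if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
        perror("F_SET_RW_HINT");

    close(fd);
    return 0;
}
```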

Learning Objectives

  • Present the proposed enhancements to a wider audience
  • Solicit feedback and nurture discussion
  • Demonstrate how the proposed enhancements reduce write amplification on flash and assist with data placement on SMR

Corporate/Open Source Community Relationships: The OpenZFS Example

Michael Dexter, Senior Analyst, iXsystems

Abstract

Corporations and the global Open Source community have had a colorful relationship over the decades, with each group struggling to understand the priorities, abilities, and role of the other. Such relationships have ranged from hostile to prodigiously fruitful and have clearly resulted in the adoption of Open Source in virtually every aspect of computing. This talk will explore the qualities and precedents of strong Corporate/Open Source relationships and focus on the success of the OpenZFS enterprise file system project as a benchmark of contemporary success. I will explore:

  • Historical and contemporary corporate/open source relationship precedents
  • Corporation/Project non-profit foundation relationships
  • Pragmatic project collaboration and event participation strategies
  • Motivations for relationship building

Learning Objectives

  • How do I work with the Open Source community?
  • What organizations can I turn to for guidance and participation?
  • What tangible resources can the community provide?

SNIA Tutorial:
Your Cache is Overdue a Revolution: MRCs for Cache Performance and Isolation

Irfan Ahmad, CTO, CloudPhysics

Abstract

It is well-known that cache performance is non-linear in cache size and the benefit of caches varies widely by workload. Irrespective of whether the cache is in a storage system, database or application tier, no two real workload mixes have the same cache behavior! Existing techniques for profiling workloads don’t measure data reuse, nor do they predict changes in performance as cache allocations are varied.

Recently, a new, revolutionary set of techniques has been discovered for online cache optimization. Based on work published at top academic venues (FAST '15 and OSDI '14), we will discuss how to 1) perform online selection of cache parameters, including cache block size and read-ahead strategies, to tune the cache to actual customer workloads, 2) dynamically partition the cache to improve cache hit ratios without adding hardware, and finally, 3) size the cache and troubleshoot field performance problems in a data-driven manner. With average performance improvements of 40% across a large number of real, multi-tenant workloads, the new analytical techniques are worth learning more about.
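
For readers new to miss ratio curves, the naive construction is easy to state (the talk is about doing this online and cheaply, which this deliberately simple sketch does not attempt): replay the reference stream, record each access's LRU stack distance, and read the miss ratio for any cache size directly off the resulting histogram. A toy O(N·M) version:

```c
/* Naive miss-ratio-curve construction via LRU stack distances.
 * For each access, find how many distinct blocks were touched since the
 * last access to this block (its stack distance); a cache of size C hits
 * exactly those accesses with distance < C. Demo only -- far too slow
 * for production traces, which is the problem the talk addresses. */
#include <stdio.h>

#define MAX_BLOCKS 1024

int main(void)
{
    int trace[] = { 1, 2, 3, 1, 2, 3, 4, 1, 2, 5, 1, 2 };  /* toy block IDs */
    int n = sizeof(trace) / sizeof(trace[0]);

    int stack[MAX_BLOCKS];              /* LRU stack, most recent at index 0 */
    int depth = 0;
    int hist[MAX_BLOCKS + 1] = { 0 };   /* hist[d] = accesses at distance d  */
    int cold = 0;                       /* first-touch (infinite distance)   */

    for (int i = 0; i < n; i++) {
        int blk = trace[i], pos = -1;
        for (int j = 0; j < depth; j++)
            if (stack[j] == blk) { pos = j; break; }

        if (pos < 0) {                  /* cold miss: grow the stack         */
            cold++;
            pos = depth++;
        } else {
            hist[pos]++;                /* reuse at stack distance 'pos'     */
        }
        for (int j = pos; j > 0; j--)   /* move the block to the MRU slot    */
            stack[j] = stack[j - 1];
        stack[0] = blk;
    }

    /* Miss ratio at cache size C = 1 - (reuses with distance < C) / N. */
    for (int C = 1; C <= depth; C++) {
        int hits = 0;
        for (int d = 0; d < C; d++)
            hits += hist[d];
        printf("cache size %d -> miss ratio %.2f\n",
               C, (double)(n - hits) / n);
    }
    return 0;
}
```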

Learning Objectives

  • Storage cache performance is non-linear; Benefit of caches varies widely by workload mix
  • Working set size estimates don't work for caching
  • Miss ratio curves for online cache analysis and optimization
  • How to dramatically improve your cache using online MRC, partitioning, parameter tuning
  • How to implement QoS, performance SLAs/SLOs in caching and tiering systems using MRCs

SNIA Tutorial:
Snapshotting Eventually Consistent Scale-out Storage: The Pitfalls and Solutions

Alex Aizman, Chief Advisor and Founder, Nexenta Systems

Abstract

The most difficult to support is the requirement of consistency that often implies not only storage system’s own internal consistency (which is mandatory) but the application level consistency as well. Distributed clocks are never perfectly synchronized: temporal inversions are inevitable. While most I/O sequences are order indifferent, we cannot allow inconsistent snapshots that reference user data contradicting the user’s perception of ordering. Further, photo snapshots do not require the subjects to freeze: distributed snapshot operation must not require cluster-wide freezes. It must execute concurrently with updates and eventually (but not immediately) result in a persistent snapshot accessible for reading and cloning. This presentation will survey distributed snapshotting, explain and illustrate ACID properties of the operation and their concrete interpretations and implementations. We will describe MapReduce algorithms to snapshot a versioned eventually consistent object cluster. Lastly the benefits of atomic cluster-wide snapshots for distributed storage clusters will be reviewed. True cluster-wide snapshots enable capabilities and storage services that had been feared lost when storage systems scaled-out beyond transactional consistency of medium-size clusters.

Learning Objectives

  • Distributed copy-on-write snapshots - a lost art?
  • Usage of client-defined timestamps to support causal consistency
  • MapReduce programming model vs. cluster-wide snapshotting – a perfect fit

storhaug: High-Availability Storage for Linux

Jose Rivera, Software Engineer, Red Hat/Samba Team

Abstract

storhaug is an open source project that brings protocol-agnostic high-availability (HA) to Linux-based clustered storage systems. It aims to provide a storage-focused system design based on Pacemaker, an open source HA cluster resource manager. This presentation will take a look at the project's components and design, what it learned from previous solutions like CTDB, and what it's done differently.

Learning Objectives

  • HA basics and considerations for design
  • How the notion of state recovery differs between access protocols
  • Challenges in putting together HA cluster systems
  • What are the components of storhaug and how are they used

Storage Management

Introduction and Overview of Redfish

Patrick Boyd, Principal Systems Management Engineer, Dell
Jeff Autor, Distinguished Technologist, HP Enterprise

Abstract

Designed to meet the expectations of end users for simple, modern and secure management of scalable platform hardware, the DMTF’s Redfish is an open industry standard specification and schema that specifies a RESTful interface and utilizes JSON and OData to help customers integrate solutions within their existing tool chains.

This session provides an overview of the Redfish specification, including the base storage models and infrastructure that are used by the SNIA Swordfish extensions (see separate sessions for details).

We will cover details of the Redfish approach, as well as information about the new PCIe and memory models added to support storage use cases.
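
To make the RESTful model concrete, here is a minimal sketch of walking a Redfish service with plain HTTPS and JSON; /redfish/v1 is the standard service root defined by the DMTF specification, while the host name and credentials below are placeholders.

```python
# Minimal sketch of reading a Redfish service with plain HTTPS + JSON.
import requests

BASE = "https://bmc.example.com"          # hypothetical management endpoint
AUTH = ("admin", "password")              # placeholder credentials

def get(path):
    # verify=False only for a lab endpoint with a self-signed certificate
    r = requests.get(BASE + path, auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json()

if __name__ == "__main__":
    root = get("/redfish/v1")                      # Redfish service root document
    systems = get(root["Systems"]["@odata.id"])    # follow the Systems collection link
    for member in systems.get("Members", []):
        system = get(member["@odata.id"])
        print(system.get("Name"), system.get("SerialNumber"))
```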

Learning Objectives

  • Introduction to Redfish concepts
  • Application of REST APIs to standards management

What Can One Billion Hours of Spinning Hard Drives Tell Us?

Gleb Budman, CEO, Backblaze

Abstract

Over the past 3 years we’ve been collecting daily SMART stats from the 60,000+ hard drives in our data center. These drives have over one billion hours of operation on them. We have data from over 20 drive models from all major hard drive manufacturers, and we’d like to share what we’ve learned. We’ll start with the annual failure rates of the different drive models. Then we’ll look at the failure curve over time to see whether it follows the “bathtub curve” we expect. We’ll finish by looking at a couple of SMART stats to see if they can reliably predict drive failure.
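
For readers who want to reproduce this style of analysis, the sketch below computes an annualized failure rate (AFR) per drive model from daily per-drive records; the column names mirror the publicly released Backblaze data sets but should be treated as assumptions here.

```python
# Sketch of computing an annualized failure rate (AFR) per drive model from
# daily SMART snapshots, one row per drive per day:
#   AFR = failures / (drive_days / 365)
import pandas as pd

def annualized_failure_rates(csv_path):
    df = pd.read_csv(csv_path)                       # assumed columns: date, model, failure
    stats = df.groupby("model").agg(
        drive_days=("model", "size"),                # each row is one drive-day
        failures=("failure", "sum"),                 # failure == 1 on the day a drive dies
    )
    stats["afr_percent"] = 100.0 * stats["failures"] / (stats["drive_days"] / 365.0)
    return stats.sort_values("afr_percent", ascending=False)

if __name__ == "__main__":
    print(annualized_failure_rates("drive_stats.csv"))
```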

Learning Objectives

  • What is the annual failure rate of commonly used hard drives?
  • Do hard drives follow a predictable pattern of failure over time?
  • How reliable are drive SMART stats in predicting drive failure?

Managing Data by Objectives

Douglas Fallstrom, Senior Director of Product Management, Primary Data

Abstract

As data usage continues to skyrocket, IT teams are challenged to keep up with ever-growing storage infrastructure. Data virtualization allows enterprises to move beyond manual migrations and manage data by objectives, so that data moves automatically to the ideal storage type to meet evolving needs across performance, price and protection. This session will provide an introduction to how objective-based data management will enable IT to simplify and solve data growth challenges.
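
As a purely hypothetical illustration of the idea (the tier names, objectives, and costs below are invented), objective-based placement amounts to declaring what the data needs and letting the system pick the cheapest storage that satisfies it:

```python
# Hypothetical illustration only: declare objectives for a data set and let a
# placement function pick the cheapest tier that meets them, instead of
# migrating data by hand.
TIERS = [
    # name, worst-case latency (ms), protected (replicated), $ per GB-month
    {"name": "nvme-flash", "latency_ms": 0.2,  "protected": False, "cost": 0.30},
    {"name": "hybrid",     "latency_ms": 2.0,  "protected": True,  "cost": 0.10},
    {"name": "object",     "latency_ms": 20.0, "protected": True,  "cost": 0.02},
]

def place(objectives):
    """Return the cheapest tier meeting the latency and protection objectives."""
    candidates = [t for t in TIERS
                  if t["latency_ms"] <= objectives["max_latency_ms"]
                  and (t["protected"] or not objectives["require_protection"])]
    return min(candidates, key=lambda t: t["cost"])["name"] if candidates else None

if __name__ == "__main__":
    print(place({"max_latency_ms": 5.0, "require_protection": True}))   # -> hybrid
    print(place({"max_latency_ms": 0.5, "require_protection": False}))  # -> nvme-flash
```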

Learning Objectives

  • How data virtualization enables objective-based management
  • How objective-based data management saves IT admins time and budget
  • How dynamic data mobility works within existing storage environments
  • How objective-based data management will continue to evolve as a technology
  • How objective-based management simplifies data migrations

The Data Feedback Loop: Using Big Data to Enhance Data Storage

Shannon Loomis, Data Scientist, Nimble Storage

Abstract

Benchmarks are useful for comparing data storage arrays, but they don’t necessarily relate to how consumers utilize the product in the field. The best way to truly understand how arrays perform across the full breadth of use cases is through combined analysis of configuration elements, log events, and sensor data. These analyses can enhance all aspects of product monitoring and operation, including (but not limited to) performing health checks, modeling storage needs, and providing performance tuning. Here I will discuss both the benefits and challenges of ingesting, maintaining, and analyzing the volume, variety, velocity, and veracity of big data associated with storage arrays.

Learning Objectives

  • Gain insight into how customers use data storage arrays in real-life situations
  • Recognize how the data science/engineering feedback loop improves array technology
  • Understand the challenges associated with big data from storage arrays
  • Demonstrate the importance of data curation from product inception

Hyper-V Windows Server 2016 Storage QoS and Protocol Updates

Matt Kurjanowicz, Senior Software Engineer Lead, Microsoft
Adam Burch, Software Engineer II, Microsoft

Abstract

A discussion of the Hyper-V Storage Quality of Service protocol (MS-SQOS) and updates to networked storage requirements for Windows Server 2016 virtual machine hosts. This session is targeted at engineers and product managers of providers of networked storage to Hyper-V hosts, and anyone else interested in how Hyper-V hosts utilize storage. Those attending this session should be able to gain familiarity with customer scenarios around storage quality of service, describe the purpose and scope of the MS-SQOS protocol, and understand the changes Windows Server 2016 makes to how it uses networked storage.
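
One detail worth knowing in advance: Storage QoS in Windows Server 2016 accounts for I/O in "normalized IOPS", counting each request in 8 KB units so that policies are independent of request size. The toy sketch below illustrates only that accounting, not the MS-SQOS wire protocol.

```python
# Toy illustration (not the MS-SQOS protocol): Storage QoS counts each I/O in
# 8 KB "normalized" units, so a min/max IOPS policy means the same thing for
# small and large requests.
import math

NORMALIZATION_BYTES = 8 * 1024    # 8 KB normalization unit

def normalized_ios(request_bytes):
    """A request of up to 8 KB counts as 1; larger requests count as ceil(size / 8 KB)."""
    return max(1, math.ceil(request_bytes / NORMALIZATION_BYTES))

if __name__ == "__main__":
    for size in (4096, 8192, 65536, 262144):
        print(f"{size:>7} byte I/O -> {normalized_ios(size)} normalized IO(s)")
    # A maximum of 1000 normalized IOPS therefore allows ~1000 8 KB requests/s,
    # but only ~31 requests/s at 256 KB each.
```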

Learning Objectives

  • Describe the purpose and scope of the MS-SQOS protocol
  • Become aware of customer scenarios around storage quality of service
  • Enumerate updates in how Hyper-V hosts use networked storage in Windows Server 2016

Time to Say Goodbye to Storage Management with Unified Namespace, Write Once and Reuse Everywhere Paradigm

Anjaneya Chagam, Principal Engineer, Intel

Abstract

Cloud computing frameworks like Kubernetes are designed to address containerized application management using a "service"-level abstraction for delivering smart data center manageability. Storage management intelligence and interfaces need to evolve to support this "service"-oriented abstraction. Having every computing framework reinvent storage integration makes storage management more complex from the end user's perspective. Moreover, it adds a significant burden on storage vendors to write drivers and certify them for every orchestration stack, which is highly undesirable. In this session, we present an industry-wide effort to develop unified storage management interfaces that work across traditional and cloud computing frameworks and eliminate the need to reinvent storage integration.
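
As a hypothetical sketch of what such a unified interface could look like (every name below is invented for illustration), a storage backend would implement one provisioning contract once, and any orchestrator would call it:

```python
# Hypothetical illustration (names invented): a single volume-provisioning
# interface that a backend implements once and that any orchestrator can call,
# instead of each framework shipping its own vendor driver.
from abc import ABC, abstractmethod

class UnifiedVolumeDriver(ABC):
    @abstractmethod
    def create_volume(self, name: str, size_gib: int, params: dict) -> str:
        """Provision a volume and return an opaque volume id."""

    @abstractmethod
    def attach_volume(self, volume_id: str, node_id: str) -> str:
        """Make the volume reachable from a node; return the device path."""

    @abstractmethod
    def delete_volume(self, volume_id: str) -> None:
        """Release the volume and its capacity."""

class InMemoryDriver(UnifiedVolumeDriver):
    """Trivial stand-in backend used only to show the calling convention."""
    def __init__(self):
        self._volumes = {}
    def create_volume(self, name, size_gib, params):
        vol_id = f"vol-{len(self._volumes) + 1}"
        self._volumes[vol_id] = (name, size_gib, params)
        return vol_id
    def attach_volume(self, volume_id, node_id):
        return f"/dev/virt/{volume_id}-{node_id}"
    def delete_volume(self, volume_id):
        self._volumes.pop(volume_id, None)

if __name__ == "__main__":
    driver = InMemoryDriver()
    vid = driver.create_volume("pgdata", 100, {"tier": "ssd"})
    print(driver.attach_volume(vid, "node-7"))
```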

Learning Objectives

  • Storage integration in container frameworks
  • Unified storage management interface
  • Open source community work

Overview of Swordfish: Scalable Storage Management

Richelle Ahlvers, Principal Storage Management Architect, Broadcom Limited

Abstract

The SNIA’s Scalable Storage Management Technical Work Group (SSM TWG) is working to create and publish an open industry standard specification for storage management that defines a customer-centric interface for the purpose of managing storage and related data services. This specification builds on the DMTF’s Redfish specification, using RESTful methods and JSON formatting. This session will present an overview of the specification being developed by the SSM TWG, including the scope targeted in the initial (V1) release in 2016 versus later releases (2017). This session will also position the specification developed by the SSM TWG relative to SMI-S as well as the base Redfish specification.


Swordfish Deep-dive: Scalable Storage Management

Richelle Ahlvers, Principal Storage Management Architect, Broadcom Limited

Abstract

Building on the concepts presented in the Introduction to Swordfish (and Redfish) sessions, this session will go into more detail on the new Swordfish API specification.

Learning Objectives

  • Introduction to the specifics of the Swordfish API
  • Working with the Swordfish Schema

Storage Networking

SNIA Tutorial:
SAS: Today’s Fast and Flexible Storage Fabric

Authors: Rick Kutcipal, President, SCSI Trade Association; Product Planning, Data Center Storage, Broadcom
Cameron Brett, Director SSD Product Marketing, Toshiba America Electronic Components’ SSD Business Unit

Abstract

For storage professionals seeking fast, flexible and reliable data access, Serial Attached SCSI (SAS) is the proven platform for innovation. With a robust roadmap, SAS provides superior enterprise-class system performance, connectivity and scalability. This presentation will discuss why SCSI continues to be the backbone of enterprise storage deployments and how it continues to rapidly evolve by adding new features, capabilities, and performance enhancements. It will include an up-to-the-minute recap of the latest additions to the SAS standard and roadmaps, the status of 12Gb/s SAS deployment, advanced connectivity solutions, MultiLink SAS™, and 24Gb/s SAS development. Presenters will also provide updates on new SCSI features such as Storage Intelligence and Zoned Block Commands (ZBC) for shingled magnetic recording.

Learning Objectives

  • Understand the basics of SAS architecture and deployment, including its compatibility with SATA, which makes SAS the best device-level interface for storage devices.
  • Hear the latest updates on the market adoption of 12Gb/s SAS and why it is significant. See high-performance use case examples in real-world environments such as distributed databases.
  • See examples of how SAS is a potent connectivity solution, especially when coupled with SAS switching solutions. These innovative SAS configurations become a vehicle for low-cost storage expansion.

Performance Implications of User-space iSCSI Extensions for RDMA Initiator in Libiscsi

Roy Shterman, Software Engineer, Mellanox
Sagi Grimberg, Co-founder, Principal Architect, LightBits Labs
Shlomo Greenberg, PhD, Electrical & Computer Engineering Department, Ben-Gurion University of the Negev, Beer-Sheva, Israel

Abstract

Special recognition and appreciation for the contributors of this project: Roy Shterman, Idan Burstein, and Dr. Shlomo Greenberg, Ben Gurion University.

Storage virtualization is gaining popularity as a way to increase the flexibility and consolidation of data centers. The performance and reliability of the virtual storage device are important factors in meeting customers’ demand for minimum latency in cloud storage. We will focus on virtual storage block devices based on the iSCSI protocol, and introduce a new approach to creating a virtual block device that connects the QEMU Quick Emulator, the user-space iSCSI initiator library libiscsi, and Remote Direct Memory Access (RDMA) technology using the iSCSI Extensions for RDMA (iSER) protocol. The implementation presented here demonstrates the benefits of using the iSCSI/iSER protocol over the preceding iSCSI/TCP. Experiments with the Flexible I/O (fio) storage benchmark show that our implementation achieves an average improvement of 200 percent in I/Os per second over the iSCSI/TCP implementation. We also compared our results to two other iSER implementations for virtual block devices: the first used a virtual function, and the second created an iSER disk in the hypervisor that was passed through to the virtual machine. Both methods use the common iSER kernel modules. Further experiments showed that some I/O workloads (depending on block size and I/O depth) achieved better results when running over the proposed user-space iSER initiator. The results will also be applicable to upcoming protocols such as NVMe over Fabrics.

Learning Objectives

  • Storage virtualization
  • RDMA
  • iSER
  • QEMU
  • Performance

Testing

Uncovering Distributed Storage System Bugs in Testing (not in Production!)

Shaz Qadeer, Principal Researcher, Microsoft
Cheng Huang, Principal Researcher, Microsoft

Abstract

Testing distributed systems is challenging due to multiple sources of nondeterminism. Conventional testing techniques, such as unit, integration and stress testing, are ineffective in preventing serious but subtle bugs from reaching production. Formal techniques, such as TLA+, can only verify high-level specifications of systems at the level of logic-based models, and fall short of checking the actual executable code. In this talk, we present a new methodology for testing distributed systems. Our approach applies advanced systematic testing techniques to thoroughly check that the executable code adheres to its high-level specifications, which significantly improves coverage of important system behaviors.

Our methodology has been applied to three distributed storage systems in the Microsoft Azure cloud computing platform. In the process, numerous bugs were identified, reproduced, confirmed and fixed. These bugs required a subtle combination of concurrency and failures, making them extremely difficult to find with conventional testing techniques. An important advantage of our approach is that a bug is uncovered in a small setting and witnessed by a full system trace, which dramatically increases the productivity of debugging.
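
The core idea can be illustrated in a few lines: systematically enumerate every program-order-preserving interleaving of concurrent operations against a small model, and check a specification on each one. The sketch below is only an illustration of that idea, not Microsoft's actual tooling; the "bug" it finds is a stale replication message applied after a newer one.

```python
# Minimal sketch of systematic concurrency testing: enumerate every
# client-order-preserving interleaving of two clients' operations against a
# tiny primary/backup model and check a simple specification at the end.
from itertools import permutations

def run_schedule(schedule):
    """Replay one interleaving. Replication messages carry the writer's value."""
    primary = backup = None
    for _, op, value in schedule:
        if op == "write_primary":
            primary = value
        elif op == "replicate":
            backup = value            # a stale message may overwrite a newer one
    return primary, backup

def program_order_ok(schedule, per_client):
    """Keep only interleavings that preserve each client's own program order."""
    return all([o for o in schedule if o[0] == c] == ops
               for c, ops in per_client.items())

def explore():
    per_client = {
        "A": [("A", "write_primary", 1), ("A", "replicate", 1)],
        "B": [("B", "write_primary", 2), ("B", "replicate", 2)],
    }
    all_ops = per_client["A"] + per_client["B"]
    for schedule in set(permutations(all_ops)):
        if not program_order_ok(schedule, per_client):
            continue
        primary, backup = run_schedule(schedule)
        if backup != primary:         # spec: the backup converges to the primary
            return f"bug found, witness schedule: {schedule}"
    return "no violation in any interleaving"

if __name__ == "__main__":
    print(explore())
```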

Learning Objectives

  • Specifying distributed storage systems
  • Testing distributed storage systems
  • Experience with advanced testing techniques on distributed storage systems