2016 Tutorials - Storage Developer Conference Abstracts

The Abstracts

Optimize Storage Efficiency & Performance with Erasure Coding Hardware Offload
Dror Goldenberg

Nearly all object storage systems, including Ceph and Swift, support erasure coding because it is a more efficient data protection method than simple replication or traditional RAID. However, erasure coding is very CPU intensive and typically slows down storage performance significantly. Ethernet network cards are now available that offload erasure coding calculations to hardware for both writing and reconstructing data. This offload technology has the potential to change the storage market by allowing customers to deploy more efficient storage without sacrificing performance. Attend this presentation to learn how erasure coding hardware offloads work and how they can integrate with products such as Ceph.

Learning Objectives

  • Learn the benefits and costs of erasure coding
  • Understand how erasure coding works in products such as Ceph
  • See how erasure coding hardware offloads accelerate storage performance
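
As a rough illustration of the efficiency trade-off described above, the following Python sketch compares raw storage overhead and rebuilds one lost block. It uses a single XOR parity block, the simplest possible erasure code, which is an assumption made for brevity and not the scheme covered in the talk; production object stores generally use codes that tolerate several lost blocks. All function names are hypothetical.

```python
from functools import reduce


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a k+m erasure code."""
    return (k + m) / k


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Append one XOR parity block (a k+1 code that tolerates one lost block)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = encode([b"abcd", b"efgh", b"ijkl"])
    print(reconstruct(stripe, 1))   # rebuilds the second data block
    print(storage_overhead(10, 4))  # a 10+4 code
    print(storage_overhead(1, 2))   # three-way replication, viewed as 1+2
```

The per-byte XOR loop in `xor_blocks` stands in for the encode and rebuild arithmetic that is CPU intensive in software and that an offload would move to hardware.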

Object Drives: Simplifying the Storage Stack
Mark Carlson

A number of scale-out storage solutions, from open source and other projects, are architected to scale out by incrementally adding and removing storage nodes. Example projects include:

Hadoop’s HDFS
Swift (OpenStack object storage)

The typical storage node architecture includes inexpensive enclosures with IP networking, CPU, memory and Direct Attached Storage (DAS). While inexpensive to deploy, these solutions become harder to manage over time. The power and space requirements of data centers are difficult to meet with this type of solution. Object drives further partition these object systems, allowing storage to scale up and down in single-drive increments.

This talk will discuss the current state and future prospects for object drives. Use cases and requirements will be examined and best practices will be described.

Learning Objectives

  • What are object drives?
  • What value do they provide?
  • Where are they best deployed?

Implementing Stored-Data Encryption
Michael Willett

Data security is top of mind for most businesses trying to respond to the constant barrage of news highlighting data theft, security breaches, and the resulting punitive costs.  Combined with litigation risks, compliance issues and pending legislation, companies face a myriad of technologies and products that all claim to protect data-at-rest on storage devices. What is the right approach to encrypting stored data?

The Trusted Computing Group, with the active participation of the drive industry, has standardized the technology for self-encrypting drives (SEDs): the encryption is implemented directly in the drive hardware and electronics. Mature SED products are now available from all the major drive companies, for both HDD (rotating media) and SSD (solid state), and for both laptops and data centers. SEDs provide a low-cost, transparent, performance-optimized solution for stored-data encryption. SEDs do not protect data in transit, upstream of the storage system.

For overall data protection, a layered encryption approach is advised. Sensitive data (e.g., as identified by specific regulations such as HIPAA and PCI DSS) may require encryption outside and upstream from storage, such as in selected applications or in association with database operations.
This tutorial will examine a ‘pyramid’ approach to encryption: selected, sensitive data encrypted at the higher logical levels, with full data encryption for all stored data provided by SEDs.

Learning Objectives

  • The mechanics of SEDs, as well as application and database-level encryption   
  • The pros and cons of each encryption subsystem   
  • The overall design of a layered encryption approach

Your Cache is Overdue a Revolution: MRCs for Cache Performance and Isolation
Irfan Ahmad

It is well-known that cache performance is non-linear in cache size and the benefit of caches varies widely by workload. Irrespective of whether the cache is in a storage system, database or application tier, no two real workload mixes have the same cache behavior! Existing techniques for profiling workloads don’t measure data reuse, nor do they predict changes in performance as cache allocations are varied.

Recently, a new, revolutionary set of techniques has been discovered for online cache optimization. Based on work published at top academic venues (FAST '15 and OSDI '14), we will discuss how to 1) perform online selection of cache parameters, including cache block size and read-ahead strategies, to tune the cache to actual customer workloads, 2) partition caches dynamically to improve cache hit ratios without adding hardware, and finally, 3) size caches and troubleshoot field performance problems in a data-driven manner. With average performance improvements of 40% across a large number of real, multi-tenant workloads, the new analytical techniques are worth learning more about.

Learning Objectives

  • Storage cache performance is non-linear; Benefit of caches varies widely by workload mix
  • Working set size estimates don't work for caching
  • Miss ratio curves for online cache analysis and optimization
  • How to dramatically improve your cache using online MRC, partitioning, parameter tuning
  • How to implement QoS, performance SLAs/SLOs in caching and tiering systems using MRCs
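
To make the idea of a miss ratio curve (MRC) concrete, here is a minimal Python sketch that computes an exact LRU MRC from a reference trace using classic stack (reuse) distances. This is an assumption chosen for clarity, not the online techniques from the cited papers, and it is far too slow for production traces; the function name and the sample trace are hypothetical.

```python
from collections import Counter


def lru_miss_ratio_curve(trace, max_size):
    """Return the LRU miss ratio for each cache size 1..max_size."""
    if not trace:
        return []
    stack = []        # LRU stack: most recently used key is last
    hist = Counter()  # stack distance -> number of references
    for key in trace:
        if key in stack:
            pos = stack.index(key)
            # Distance 1 means the key was the most recently used one.
            hist[len(stack) - pos] += 1
            stack.pop(pos)
        # First-time references are misses at every cache size.
        stack.append(key)
    total = len(trace)
    curve, hits = [], 0
    for size in range(1, max_size + 1):
        # A reference hits in an LRU cache of this size
        # exactly when its stack distance is <= size.
        hits += hist[size]
        curve.append((total - hits) / total)
    return curve


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "b", "a"]
    for size, ratio in enumerate(lru_miss_ratio_curve(trace, 3), start=1):
        print(size, round(ratio, 3))
```

Reading the curve shows how many misses each additional unit of cache would remove for this workload, which is the information that cache sizing and partitioning decisions rely on.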

SAS: Today’s Fast and Flexible Storage Fabric
Rick Kutcipal, Cameron Brett

For storage professionals seeking fast, flexible and reliable data access, Serial Attached SCSI (SAS) is the proven platform for innovation. With a robust roadmap, SAS provides superior enterprise-class system performance, connectivity and scalability. This presentation will discuss why SCSI continues to be the backbone of enterprise storage deployments and how it continues to rapidly evolve by adding new features, capabilities, and performance enhancements. It will include an up-to-the-minute recap of the latest additions to the SAS standard and roadmaps, the status of 12Gb/s SAS deployment, advanced connectivity solutions, MultiLink SAS™, and 24Gb/s SAS development. Presenters will also provide updates on new SCSI features such as Storage Intelligence and Zoned Block Commands (ZBC) for shingled magnetic recording.

Learning Objectives

  • Understand the basics of SAS architecture and deployment, including its compatibility with SATA, which makes SAS the best device-level interface for storage devices.
  • Hear the latest updates on the market adoption of 12Gb/s SAS and why it is significant. See high performance use case examples in a real-world environment such as distributed databases.
  • See examples of how SAS is a potent connectivity solution, especially when coupled with SAS switching solutions. These innovative SAS configurations become a vehicle for low-cost storage expansion.

Snapshotting Scale-out Storage: Pitfalls and Solutions
Alex Aizman

The most difficult snapshot requirement to support is consistency, which often implies not only the storage system’s own internal consistency (which is mandatory) but application-level consistency as well. Distributed clocks are never perfectly synchronized: temporal inversions are inevitable. While most I/O sequences are order indifferent, we cannot allow inconsistent snapshots that reference user data contradicting the user’s perception of ordering. Further, just as photo snapshots do not require their subjects to freeze, a distributed snapshot operation must not require cluster-wide freezes. It must execute concurrently with updates and eventually (but not immediately) result in a persistent snapshot accessible for reading and cloning. This presentation will survey distributed snapshotting, and explain and illustrate the ACID properties of the operation along with their concrete interpretations and implementations. We will describe MapReduce algorithms to snapshot a versioned, eventually consistent object cluster. Lastly, the benefits of atomic cluster-wide snapshots for distributed storage clusters will be reviewed. True cluster-wide snapshots enable capabilities and storage services that had been feared lost when storage systems scaled out beyond the transactional consistency of medium-size clusters.

Learning Objectives

  • Distributed copy-on-write snapshots - a lost art?
  • Usage of client-defined timestamps to support causal consistency
  • MapReduce programming model vs. cluster-wide snapshotting – a perfect fit