Can Storage fix Hadoop?
John Webster, Senior Partner, Evaluator Group
Survey data shows that at least half of all enterprise data center Hadoop projects are stalled and that only 20% are actually making it into production. This presentation looks at the problems enterprise data center administrators encounter with Hadoop and how the storage environment can be used to fix at least some of them.
- Understand the issues with current enterprise data center Hadoop implementations
- Learn what the open source community and vendors are doing to fix the problems
- Learn how storage devices and platforms can be used to address the problems and move Hadoop implementations forward
Hadoop: Embracing Future Hardware
Sanjay Radia, Co-founder, Hortonworks
Suresh Srinivas, Hortonworks
This talk looks at the implications of future server hardware for Hadoop, and how to start preparing for them. What would a pure-SSD Hadoop filesystem look like, and how do we get there via a mixed SSD/HDD storage hierarchy? What impact would that have on ingress, analysis, and HBase? What could we do better if network bandwidth and latency became less of a bottleneck, and how should interprocess communication change? Would it make the graph layer more viable? What would massive arrays of wimpy cores mean, or a GPU in every server? Will we need to schedule work differently? Will it make per-core RAM a bigger issue? Finally: will this let us scale Hadoop down?
BIRDS OF A FEATHER
NVM Programming Model - Next Steps
The NVM Programming TWG recently celebrated its first birthday and is finalizing its first publication. We are looking for suggestions from TWG members and non-members on NVM software extensions for future publications. The BOF includes a short overview of the TWG and the NVM Programming Model specification, followed by a round-table discussion of future work items.
Licensing Microsoft File Protocols
Abstract coming soon
Green Storage – The Big Picture
"The most expensive storage purchased is that which causes the deployment of another data center." (George Crump, President & Founder, Storage Switzerland) In a world of more, more, more, using less to store all of it is a crucial skill, one that translates into a real competitive advantage for an organization.
Join us on September 17th at the MESS meetup as our panel of experts discuss the key techniques for reducing the power, cooling, space, and networking impact of storage, using new paradigms like:
- IO density metrics
- Geo-dispersal of data
- Next-generation storage pods
- Self-healing protection algorithms
...all of which contribute to a simple goal: a lower cost footprint.
Improve your company's bottom line by attending the September MESS meet up!
Building a Linux Storage Appliance with Data Optimization
Data deduplication and compression are no longer storage optimizations relegated to backup. They have become mainstream in primary and high performance (flash) storage. In this BOF session, we will discuss how to build a Linux storage appliance using standard Linux components (XFS, LVM2, and Linux iSCSI) and Permabit Albireo Virtual Data Optimizer (VDO). Whether you are designing cloud storage, backup solutions, or high performance flash arrays, this discussion will show you how to build a storage-optimized product in a matter of hours.
Cloud Application Management for Platforms (CAMP)
There are multiple commercial PaaS offerings in existence using languages such as Java, Python and Ruby and frameworks such as Spring and Rails. Although these offerings differ in such aspects as programming languages, application frameworks, etc., there are inherent similarities in the way they manage the applications that are deployed upon them. Cloud Application Management for Platforms (CAMP) specifies deployment artifacts and a RESTful API designed to both ease the task of moving applications between PaaS platforms as well as provide an interoperable mechanism for managing PaaS-based applications in a way that is language, framework, and platform neutral.
pNFS Open Discussion
The last two years have been busy ones for pNFS. This BOF will provide an opportunity for NFSv4 and pNFS implementors, users, and interested parties to come together for open discussion. Potential topics for discussion include details of current pNFS implementations, pNFS scalability, and future directions for NFSv4 and pNFS.
FCoE Direct End-Node to End-Node (aka FCoE VN2VN)
John Hufferd, Hufferd Enterprises
A new concept has just been accepted for standardization in the Fibre Channel (T11) standards committee; it is called FCoE VN2VN (aka Direct End-Node to End-Node). The FCoE standard, which specifies the encapsulation of Fibre Channel frames into Ethernet frames, is being extended to permit FCoE connections directly between FC/FCoE End-Nodes. The tutorial will show the fundamentals of the extended FCoE concept that permits it to operate without FC switches or FCoE switches (aka FCFs), and will describe how it might be exploited in small, medium, or enterprise data center environments, including "Cloud" IaaS (Infrastructure as a Service) provider environments.
- The audience will gain a general understanding of the concept of using a Data Center type Ethernet for the transmission of Fibre Channel protocols without the need for an FCoE Forwarder (FCF).
- The audience will gain an understanding of the benefits of converged I/O and how a Fibre Channel protocol can share an Ethernet network with other Ethernet-based protocols and establish a virtual FCoE link directly between the End-Nodes.
- The audience will gain an understanding of the potential business value and the configurations that will be appropriate for gaining maximum value from this Direct End-Node to End-Node approach, including the value of this protocol to the "Cloud" IaaS (Infrastructure as a Service) provider.
SCSI Standards and Technology Update
Marty Czekalski, President, SCSI Trade Association
SCSI continues to be the backbone of enterprise storage deployments and has rapidly evolved by adding new features, capabilities, and performance enhancements. This talk will include an up-to-the-minute recap of the latest additions to the SAS standard and roadmaps. It will focus on the status of 12Gb/s SAS staging and advanced connectivity solutions such as MultiLink SAS™, and cover SCSI Express, a new transport for SOP (SCSI over PCIe). Presenters will also provide updates on new SCSI features such as atomic writes and remote copy.
- Attendees will learn how SAS will grow and thrive, in part, because of the Advanced Connectivity Roadmap, which offers a solid connectivity scheme based on the versatile Mini-SAS HD connector in addition to SAS Connectivity Management support.
- Attendees will learn how Express Bay improves the way slot-oriented Solid State Drive (SSD) devices can be configured to boost I/O performance.
- The latest development status and design guidelines for 12Gb/s SAS will be discussed as well as plans for extending SAS to 24Gb/s.
- Attendees will learn the details of the standardization activity and architecture for SCSI over PCIe (SOP and PQI).
Extending SAS Connectivity in the Data Center
Bob Hansen, Storage Architecture Consultant, LHP Consulting Group
Serial Attached SCSI (SAS) is the connectivity solution of choice for disk drives and JBODs in the data center today. SAS connections are getting faster while storage solutions are getting larger and more complex. Data center configurations and disaster recovery solutions are demanding longer cable distances. This is making it more and more difficult or impossible to configure systems using passive copper cables. This presentation discusses the application, limitations and performance of passive copper, active copper and optical SAS cabling options available today and those likely to be available in the next few years.
- Review SAS network topologies for data center applications
- Understand SAS connectivity options, limitations and performance
- Looking to the future – discuss possible networking/connectivity changes for SAS 4
SCSI and FC Standards Update
Fred Knight, Standards Technologist, NetApp
Fred Knight is a Principal Engineer in the CTO Office at NetApp. Fred has over 35 years of experience in the computer and storage industry. He currently represents NetApp in several National and International Storage Standards bodies and industry associations, including T10 (SCSI), T11 (Fibre Channel), T13 (ATA), IETF (iSCSI), SNIA, and FCIA. He is the chair of the SNIA Hypervisor Storage Interfaces working group, the primary author of the SNIA HSI White Paper, the author of the new IETF iSCSI update RFC, and the editor for the T10 SES-3 standard. Fred has received the INCITS Technical Excellence Award for his contributions to both T10 and T11. He is also the developer of the first native FCoE target device in the industry. At NetApp, he contributes to technology and product strategy and serves as a consulting engineer to product groups across the company. Prior to joining NetApp, Fred was a Consulting Engineer with Digital Equipment Corporation, Compaq, and HP where he worked on clustered operating system and I/O subsystem design.
- This session will provide basic information on new capabilities being proposed for SCSI, iSCSI, SAS, and Fibre Channel. Attendees will be able to evaluate these new capabilities for possible use in their application environments, and to engage in an informed discussion with vendors about their use cases for these new capabilities.
CDMI and Scale Out File System for Hadoop
Philippe Nicolas, Director of Product Strategy, Scality
Scality leverages its own file system for Hadoop, replacing HDFS while maintaining the HDFS API. The Scality Scale Out File System, aka SOFS, is a POSIX parallel file system based on a symmetric architecture. This implementation addresses the NameNode's limitations, in terms of both availability and bottleneck, through the absence of a metadata server in SOFS. Scality also leverages CDMI and continues its effort to promote the standard as the key element for data access. Scality capitalizes on two data protection techniques, Replication and Erasure Coding with Scality ARC, to boost data access, improve data durability, and reduce hardware footprint and costs.
- Realize the potential of CDMI
- Illustrate a new usage of CDMI
- Address Hadoop limitations
- Introduce a method to improve data durability
Lessons Learned Implementing Cross-protocol Compatibility Layer
Scott Horan, Integration Engineer, Cleversafe, Inc
Over the past year, we have integrated our storage solution with a number of cloud and object storage APIs, including Amazon S3, WebDAV, OpenStack, and HDFS. While these protocols share much commonality, they also differ in meaningful ways which complicates the design of a cross-protocol compatibility layer. In this presentation, we detail how the various storage protocols are the same, how they differ, and what design decisions were necessary to build an underlying storage API that meets the requirements to support all of them. Further, we consider the lessons learned and provide recommendations for developing cloud storage APIs such as CDMI.
- Features of the various popularly used cloud storage protocols
- Differences between the protocols, how they authenticate, how they handle multi-part objects, how they guarantee integrity
- Overview of the design decisions we made to support all these protocols on top of the same underlying storage
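As a rough illustration of the kind of layering the talk describes, the sketch below (all names are hypothetical; this is not Cleversafe's actual API) fronts one common store with an S3-style adapter. Other protocol adapters (WebDAV, Swift, HDFS) would translate their own verbs and integrity mechanisms onto the same underlying interface:

```python
import hashlib
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Hypothetical common storage API sitting beneath several protocol front ends."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> str:
        """Store an object; return an integrity tag."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Fetch an object's bytes."""

class InMemoryStore(ObjectStore):
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        etag = hashlib.md5(data).hexdigest()  # S3-style ETag; other protocols map their own checksums
        self._objects[key] = (data, etag)
        return etag

    def get(self, key):
        return self._objects[key][0]

class S3Adapter:
    """Translates S3-style calls onto the common API (illustrative only)."""

    def __init__(self, store):
        self.store = store

    def put_object(self, bucket, key, body):
        return {"ETag": self.store.put(f"{bucket}/{key}", body)}

    def get_object(self, bucket, key):
        return {"Body": self.store.get(f"{bucket}/{key}")}
```

The design question the talk raises is exactly where this common interface should sit: too thin and every adapter reimplements multi-part and integrity logic; too thick and protocol differences (e.g. eventual vs. strong consistency semantics) leak through.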
Profile Based Compliance Testing of CDMI: Approach, Challenges & Best Practices
Sachin Goswami, Solution Architect and Storage COE Head, Hi Tech TATA Consultancy Services
Cloud Data Management Interface specifications are now moving towards profile-based categories, driven by increased focus from organizations planning to adopt profile-based CDMI in their products, for example the Service, ID, and Self Storage Management profiles. TCS has been implementing a ‘CDMI Automated Test Suite’ and, with new developments, is working to incorporate profiling into it. In this proposal we will share the approach to, and challenges of, testing profile-based scenarios for CDMI compliance of cloud products. We will also share additional challenges and lessons learned from testing CDMI products for compliance. These lessons will serve as best practices and a ready reference for other organizations developing their own CDMI product suites.
- Understanding the CDMI profile-based specification
- Understanding the profile-based testing approach
- Understanding existing gaps in the CDMI profile-based specification
Transforming Cloud Infrastructure to Support Big Data
Dr. Ying Xu, R&D Lead, Aspera Inc
Cloud systems promise virtually unlimited, on-demand increases in storage, computing, and bandwidth. As companies have turned to cloud-based services to store, manage and access big data, it has become clear that this promise is tempered by a series of technical bottlenecks: transfer performance over the WAN, HTTP throughput within remote infrastructures, and size limitations of the cloud object stores. This session will discuss principles of cloud object stores, using examples of Amazon S3, Microsoft Azure, and OpenStack Swift, and performance benchmarks of their native HTTP I/O. It will share best practices in orchestration of complex, large-scale big data workflows. It will also examine the requirements and challenges of such IT infrastructure designs (on-premise, in the cloud or hybrid), including integration of necessary high-speed transport technologies to power ultra-high speed data movement, and adoption of appropriate high-performance network-attached storage systems.
- Attendees will learn methods to overcome the technical bottlenecks associated with using cloud-based services. Attendees will also gain insight into how to take advantage of the cloud for storing, managing and accessing big data using high-speed transport technologies and high-performance NAS systems.
- Attendees will learn what it takes to plan and implement complex, large-scale data workflows. This includes the requirements and challenges of designing IT infrastructure, how to ensure secure file transfers and storage, and real-world examples from industry leaders.
- Attendees will gain a better understanding of the differing requirements and challenges of on-premise, cloud and hybrid infrastructure designs.
Resilience at Scale in the Distributed Storage Cloud
Alma Riska, Consulting Software Engineer, EMC
The cloud is a diffuse and dynamic place to store both data and applications, unbounded by data centers and traditional IT constraints. However, adequate protection of all this information still requires consideration of fault domains, failure rates and repair times that are rooted in the same data centers and hardware we attempt to meld into the cloud. This talk will address the key challenges to a truly global data store, using examples from the Atmos cloud-optimized object store. We discuss how flexible replication and coding allow data objects to be distributed and where automatic decisions are necessary to ensure resiliency at multiple levels. Automatic placement of data and redundancy across a distributed storage cloud must ensure resiliency at multiple levels, i.e., from a single node to an entire site. System expansion must occur seamlessly without affecting data reliability and availability. All these features together ensure data protection while fully exploiting the geographic dispersion and platform adaptability promised by the cloud.
- Learn how to build truly large distributed storage systems
- Understand fault domains, failure considerations
- Understand how to reason about data resilience at large scale
Open-source CDMI-compliant Proxy: "Stoxy"
Ilja Livenson, KTH Royal Institute of Technology
We present our ongoing effort to develop an open-source server (Stoxy - STOrage proXY), which exposes a CDMI-compliant interface on the frontend and allows users to store and manage data on several public and private cloud backends, including AWS and MS Azure. The work is based on a CDMIProxy prototype developed in the EU VENUS-C project, with a continuous development effort sponsored by the EGI-InSPIRE project as well as commercial companies. The presentation will cover the architecture of Stoxy and highlight certain key components, especially those related to data streaming. In addition, we will share initial experience from integrating Stoxy with other products (cloud managers, storage servers) inside the EGI Federated Cloud task force.
- Introduction of the open-source product and its use cases
- Experience from implementing the streaming proxy server to public cloud backends
- Experience from integration of CDMI with a federated cloud test-bed
- Interoperability testing with OpenStack Swift
LTFS and CDMI - Tape for the Cloud
David Slik, Technical Director, NetApp
LTFS tape technology provides compelling economics for bulk cloud storage and transportation of data. This session provides an overview of the use cases identified by the joint LTFS and Cloud Technical Working Group, including when tape provides a lower-cost alternative to network and disk-based transportation, and when tape provides a lower-cost alternative to disk-based storage and archiving. This session will introduce standardization efforts underway to allow for simple tape-based bulk data transport to, from, and between clouds, and the standardization of how to store rich object data on standard LTFS tapes.
- Learn about the advantages of tape for bulk data movement to, from and between clouds
- Learn about the advantages of tape for bulk data storage within clouds
- Learn about standardization activities around tape-based bulk data transport
- Learn about how CDMI objects can be stored on LTFS tapes
CDMI Federations, Year 4
David Slik, Technical Director, NetApp
In addition to standardizing client-to-cloud interactions, the SNIA Cloud Data Management Interface (CDMI) standard enables a powerful set of cloud-to-cloud interactions. Federations, the mechanism by which CDMI clouds establish cloud-to-cloud relationships, provide a powerful multi-vendor and interoperable approach to peering, merging, splitting, migrating, delegating, sharing, and exchanging stored objects. In last year's SDC presentation, bi-directional federation between two CDMI-enabled clouds was discussed and demonstrated. For year four, we will discuss how CDMI federation, when combined with CDMI versioning, enables mobile and web-based applications to synchronize data with clouds, effectively allowing clients to create "mini-clouds" local to the client. This architectural approach allows clients to easily store, cache, and merge cloud-resident content, enables disconnected operation, and provides a foundation for application-specific conflict resolution.
- Learn how clients can use CDMI federation to synchronize local and cloud-resident data stores
- Learn how versioning and globally unique identifiers enable multiple concurrent writers without synchronization
- Learn techniques that enable automated conflict detection and application-specific conflict resolution
- See a multi-client demonstration of how CDMI Federation simplifies application data management
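The versioning-plus-unique-identifier idea behind lock-free conflict detection can be sketched as follows. This is a toy model with invented names, not the CDMI wire protocol: every write creates an immutable version with a globally unique ID and records which version it was derived from.

```python
import uuid

class VersionedObject:
    """Toy model: every write creates an immutable version with a globally
    unique ID and a pointer to the version it was derived from."""

    def __init__(self):
        self.versions = {}   # version_id -> (parent_id, value)
        self.head = None     # latest accepted version

    def write(self, value, parent):
        vid = str(uuid.uuid4())
        self.versions[vid] = (parent, value)
        # Conflict: the writer derived from something other than the current
        # head, i.e. another writer got there first. No locking was needed
        # to detect this; both versions are retained.
        conflict = parent != self.head
        if not conflict:
            self.head = vid
        return vid, conflict
```

Two clients that both read the same head and write concurrently each get their own version; the second write is flagged as a conflict, and the application, not the store, decides how to merge.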
Windows Azure Storage - Speed and Scale in the Cloud
Joe Giardino, Senior Development Lead, Microsoft
In today’s world, increasingly dominated by mobile and cloud computing, application developers require durable, scalable, reliable, and fast storage solutions like Windows Azure Storage. This talk will cover the internal design of the Windows Azure Storage system and how it is engineered to meet these ever-growing demands, with a particular focus on performance, scale, and reliability. In addition, we will cover patterns and best practices for developing performant solutions on storage that optimize for cost, latency, and throughput. Windows Azure Storage is currently leveraged by clients to build big data and web-scale services such as Bing, Xbox Music, SkyDrive, Halo 4, Hadoop, and Skype.
- Windows Azure Storage Fundamentals
- Designing an application that scales
CDMI, The Key Component of Scality Open Cloud Access
Giorgio Regni, CTO, Scality
Announced during SNIA SDC 2012, Scality Open Cloud Access, aka OCA, was the first converged access method between file and object modes, from local or remote sites. Dewpoint, the Scality CDMI server, demonstrated during the previous three annual CDMI plugfests at SDC, continues to be the pivotal component of this strategy. Over the last year, Scality leveraged CDMI to build the Scality solution for Hadoop and plans to announce a few other innovations.
- Illustrate utilization of CDMI
- How to build convergent data access methods
Architecting An Enterprise Storage Platform Using Object Stores
Niraj Tolia, Chief Architect, Maginatics
While object storage systems such as S3 and Swift are exhibiting rapid growth, there is still an impedance mismatch between their feature set and enterprise requirements. This talk dives into the design and architecture of MagFS: a strongly consistent and multi-platform distributed file system that layers itself on top of multiple object storage systems. In particular, it covers the challenges of using eventually consistent object stores, optimizing both data and metadata traffic for wide-area network communication and mobile devices, and how MagFS delivers an on-premises security model while still being able to leverage off-premises storage. This talk also discusses how specific enterprise requirements have influenced the technical design of MagFS and some of the surprises we encountered during our design and implementation.
COSBench: A Benchmark tool for Cloud Storage
Yaguang Wang, Sr. Software Engineer, Intel
With object storage services becoming increasingly accepted as a new offering compared to traditional file or block systems, it is important to effectively measure the performance of these services, so that users can compare different solutions or tune their systems for better performance. However, little has been reported on this specific topic as yet. To address this problem, we developed COSBench (Cloud Object Storage Benchmark), a benchmark tool for cloud object storage services that we are currently working on inside Intel. In addition, we will share the status of CDMI support and the results of the experiments we have performed so far.
- Learn what COSBench is and how it works
- Learn what kinds of information COSBench can provide
- Learn the key differences between CDMI and other object store interfaces such as S3
Virtual Machine Archival System
Parag Kulkarni, VP Engineering, Calsoft Inc.
Dr. Anupam Bhide, CEO Co-Founder, Calsoft Inc.
Popular server virtualization vendors have enabled integration with backup and recovery solutions, but not with virtual machine archival systems. A server virtualization system should have knowledge of the various storage systems attached to it, such as SSD, HDD, object storage, tape libraries, and cloud; consider, for instance, the VMware ecosystem.
We propose a ‘Virtual Machine Archival System’ with the following functionality:
- Deciding which type of storage should be used as the destination
- Labeling the locations of VM data
- A discovery interface and VM archival policies
The server virtualization system facilitates the following functionality:
- Discovering the storage types available
- Creating archival links in the file system containing VM data
- Passing archival and restore requests on to the archival system
- GUI integration for the archival system
- Understand the virtual machine archival process
- Understand the importance of server virtualization
Data Deduplication as a Platform for Virtualization and High Scale Storage
Adi Oltean, Principal Software Design Engineer, Microsoft
The primary data deduplication system in Windows Server 2012 is designed to achieve high deduplication savings at low computational overhead on commodity storage platforms. In this talk, we will build upon that foundational work and present new techniques to scale primary data deduplication on both the primary data serving and optimization pathways. This will include hardware accelerated performance improvements for hashing and compression, better file system integration to reduce write path overheads and optimize live files, and deduplication aware caching to mitigate disk bottlenecks. We will show how this enables deduplication to be leveraged as a platform for storage virtualization.
- Fundamental building blocks for a primary data deduplication system.
- Deduplication data serving for “live data” as a storage layer for virtualization workloads.
- Optimization of data at high scale with little to no impact on the compute resources of the virtualization platform.
- Utilizing data deduplication as a means to implement an efficient system cache.
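The foundational dedup building block, identifying chunks by a strong content hash and storing each unique chunk only once, can be sketched as below. This is a simplified illustration with fixed-size chunks; the Windows Server 2012 deduplication engine uses content-defined chunking and many optimizations (hash index design, compression, caching) not shown here.

```python
import hashlib

CHUNK = 4096  # fixed-size chunks for simplicity; production systems chunk by content

class DedupStore:
    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes (each unique chunk stored once)

    def write(self, data: bytes):
        recipe = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # duplicate chunks cost nothing extra
            recipe.append(digest)
        return recipe  # the "file" is just a list of chunk hashes

    def read(self, recipe):
        return b"".join(self.chunks[d] for d in recipe)
```

Writing three identical 4 KB chunks stores one physical chunk and a three-entry recipe, which is the savings mechanism the talk builds on.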
Using Big Data Analytic for Agile Software Development
Ashish Goyal, Principal Software Engineer, EMC
Reporting is a critical element in agile software development in large projects with distributed teams and strict governance requirements. Reporting covers the software quality and integration effort of the most recent build. Developers provide information about code changes, source-code analysis, and unit test results. The testing team provides information about test runs. Release engineers provide information on the quality of the current build, uptime, and defects logged. Historical data can also be used to compare quality and testing trends. All this information comes from independent sources. To give customers, management, and team members better insight into the state of software development, Big Data analytics can be used to analyze the data. Analytics can help identify the test cases that break most frequently, the source code changes most likely to cause unit and functional tests to break, and the areas of the code base that require additional tests or redesign. Historical and current defect reports can be used to estimate the number of defects customers might report for a new release.
- Issues affecting agile software development for distributed teams
- How Big Data analytics provides insight into software quality
- How to give management visibility into whether product development is on track
A Method to Establish a Concurrent Rapid Development Cycle and High Quality in a Storage Array System Environment
M. K. Jibbe, Director of Quality Architect Team for NetApp APG Products, NetApp
Kuok Hoe Tan, QA Architect, NetApp
Shift Left is a combination of changes in development and validation approaches and key engineering framework improvements to ensure that each phase of the release process provides a solid foundation for the subsequent phase, until final product release. As the name implies, the goal is to move development and validation earlier in the release cycle to ensure that content design, development, validation, and bug fixes occur when the bulk of the engineering resources are engaged and available. Each release phase is focused on delivering the building blocks for a successful and high quality release. We have adopted industry-standard best practices and methodologies to bolster our engineering framework and process to support this transformation. Agile forms the cornerstone of our new content development scrum teams, with the goal of maintaining a potentially shippable product early and consistently.
- Learn about early validation from early development to full system validation
- Learn about the focused targeted outcome for each phase of the release process
- Learn about Key release metrics to track progress to key outcome for each phase of the release process
- Learn about an internalized industry standard best practices and methodologies to form the framework as a foundation to drive continuous improvements for the release process
Code Coverage as a Process
Aruna Prabakar, Software Engineer, EMC
Niranjan Page, Engineering Manager, EMC
There are many tools for collecting code coverage in different languages, but the data is of no use if it is not applied to improving product quality through testing. In this presentation we will share the successful process and infrastructure we built at EMC/Data Domain. After this presentation, the audience will be able to start thinking about code coverage from both the development and QA perspectives, if they do not already.
Distributed Storage on Limping Hardware
Andrew Baptist, Lead Architect, Cleversafe, Inc.
It is easy to design storage systems that assume nothing bad ever happens. It is marginally harder to design one that assumes nodes are either available or not. What is difficult is designing storage systems that handle how nodes fail in the real world. Such "limping nodes" may respond slowly, occasionally, or unpredictably; they are neither entirely failed nor entirely healthy. This presentation covers the mechanisms we developed for dealing with limping nodes in a distributed storage system. These techniques allow limping nodes to be tolerated with negligible impact on performance, latency, or reliability. We introduce some of the intelligent writing techniques we created for this purpose, which include: write thresholds, impatient writes, optimistic writes, real-time writes, and lock-stealing writes.
- How nodes fail in the real world
- What can happen if a distributed storage system doesn't handle limping nodes well
- Techniques we have developed for better handling of limping nodes, and the results we have obtained
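The write-threshold idea named above, declaring a write successful once enough of the issued node writes have acknowledged so that one limping node cannot stall the operation, can be sketched as a simplified model (not Cleversafe's implementation):

```python
def threshold_write(ack_latencies_ms, width, threshold):
    """Issue a write to `width` nodes and declare success once the `threshold`
    fastest acknowledgements arrive; completion time is the threshold-th ack,
    so one limping node's slow response no longer gates the operation."""
    assert threshold <= width <= len(ack_latencies_ms)
    acks = sorted(ack_latencies_ms[:width])
    return acks[threshold - 1]
```

With simulated ack latencies of [5, 7, 900, 6, 8] ms and a threshold of 3 out of 5, the write completes in 7 ms; waiting for all five nodes would take 900 ms. The cost is that the slow node must later be repaired in the background, which is where the other listed techniques come in.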
Getting the Most out of Erasure Codes
Jason Resch, Lead Software Engineer, Cleversafe, Inc.
Erasure codes are a recent addition to many storage technologies. They provide increased reliability with less overhead, yet they are not without downsides. Selecting the best parameters for erasure coding is a complex optimization problem. As one varies the threshold, write threshold, number of pieces, system capacity, and site count, there may be drastic effects on reliability, availability, storage overhead, rebuilding cost, and CPU expense. Selecting erasure code parameters without careful consideration may have catastrophic results. In this presentation we present the techniques we developed and use for designing erasure-coded systems with the best combination of storage overhead, computational efficiency, reliability, availability, and rebuilding cost for any given system constraints. Finally, we introduce some advanced techniques for reducing rebuilding cost.
- What the various parameters are in an erasure coded system
- The interrelationships between the parameters and how they affect the system's properties
- Our methods for selecting the parameters to optimally achieve the goals for the system
- Advanced techniques for mitigating rebuilding cost in an erasure coded system
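Two of the quantities being traded off, storage overhead and the probability of unrecoverable loss for a k-of-n code under independent node failures, can be computed with a short sketch. This is an illustrative model only; real parameter selection also accounts for rebuild windows, correlated failures, and site counts:

```python
from math import comb

def storage_overhead(n, k):
    """Raw bytes stored per byte of user data for a k-of-n erasure code."""
    return n / k

def loss_probability(n, k, p):
    """Probability that more than n - k of n nodes fail independently
    (each with probability p), making the data unrecoverable."""
    return sum(comb(n, f) * p**f * (1 - p)**(n - f)
               for f in range(n - k + 1, n + 1))
```

At identical 2x overhead, a 2-of-4 code already loses data far less often than simple mirroring (1-of-2); automating that kind of comparison across all candidate parameter sets is what the presented techniques do.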
Method to Establish a High Availability and High Performance Storage Array in a Green Environment
M. K. Jibbe, Director of Quality Architect Team for All APG Product, NetApp
Marlin Gwaltney, Quality Architect, NetApp
The method of utilizing Dynamic Disk Pools, using the Controlled Replication Under Scalable Hashing data mapping algorithm integrated with Solid-State Drives (SSDs) in a storage array, provides not only high availability and high performance but also an environmentally friendly "green" system. SSDs are efficient and require less system cooling in the same footprint as a system with mechanical HDDs. Dynamic Disk Pools use an algorithm that distributes data, parity information, and spare capacity across a pool of drives instead of using the standard RAID sequential data striping algorithms, so they are able to use every drive in the pool for the intensive process of rebuilding data in the event of drive failure(s). A key benefit of Dynamic Disk Pools is faster rebuild times (up to 8 times shorter than standard RAID algorithms), which leads to improved data protection and a lower I/O performance penalty, since the system is in a vulnerable, degraded state for a much shorter period of time. Dynamic Disk Pools also include flexible configuration options (from a single pool containing all drives to multiple pools in a system) to optimize the system for end customer requirements. The shorter rebuild times of Disk Pools are further reduced by the use of higher-performing and more reliable SSD drives. The individual advantages of utilizing SSD drives and/or Dynamic Disk Pools in a storage array are further magnified when they are integrated together to provide a higher-availability, higher-performance, and more environmentally friendly system. These three technologies can be leveraged together to build an extremely flexible, cost-effective, scalable storage solution.
- Learn about the Dynamic Disk Pool feature (design and benefits)
- Learn how the Flash Read Cache method optimizes the investment in solid-state drives
- Learn about solid-state drive performance and power requirements
- Learn how the above three technologies can be leveraged together to build an extremely flexible, cost-effective, scalable storage solution
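The pool-wide rebuild behavior described above can be sketched with a toy placement function. This is a hedged illustration of the general idea behind pseudo-random, CRUSH-style data mapping, not the presenters' actual algorithm; `place_stripe` and the drive names are invented for the example.

```python
import hashlib

def place_stripe(stripe_id, drives, pieces):
    """Pick `pieces` distinct drives for one stripe by ranking every
    drive with a stable per-(stripe, drive) hash, CRUSH-style."""
    ranked = sorted(
        drives,
        key=lambda d: hashlib.sha256(f"{stripe_id}:{d}".encode()).digest(),
    )
    return ranked[:pieces]

# With many stripes, every surviving drive holds pieces of the stripes
# that lived on a failed drive, so rebuild reads and writes fan out
# across the whole pool instead of hammering one dedicated hot spare.
drives = [f"drive{i}" for i in range(24)]
affected = [s for s in range(10_000) if "drive7" in place_stripe(s, drives, 10)]
peers = set()
for s in affected[:200]:
    peers.update(place_stripe(s, drives, 10))
```

Because placement is deterministic per stripe yet pseudo-random across stripes, the rebuild work after a failure is spread over nearly all remaining drives, which is the source of the shorter rebuild times claimed above.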
LRC Erasure Coding in Windows Storage Spaces
Cheng Huang, Researcher, Microsoft
RAID is the standard approach for fault tolerance across multiple disk drives and has been around for decades. However, new hardware trends, including the advent of hard disk drives (HDDs) with huge capacity and the wide adoption of solid-state drives (SSDs) with fast I/O, have created new opportunities to optimize fault tolerance schemes. Windows now introduces a new fault tolerance scheme in its Storage Spaces technology. The new scheme is based on a novel erasure coding technology called Local Reconstruction Code (LRC). Compared to RAID at the same durability level, LRC significantly reduces rebuild time while still keeping storage overhead very low. In addition, LRC offers much more flexibility in balancing rebuild time against storage overhead. The presentation will provide an overview of the Windows Storage Spaces technology, cover the design of its fault tolerance mechanism, discuss the implementation of LRC in detail, and share experiences learned from real-world workloads.
- Refresh my knowledge of erasure coding. (Some knowledge is assumed – this is *not* a tutorial on Erasure Coding. For basics, refer to the tutorial at USENIX FAST 2013 – “Erasure Coding for Storage Applications”, by Plank and Huang. )
- Get an overview of Windows Storage Spaces technology and its fault tolerance mechanism.
- Understand the implementation of LRC and its benefits in clustered storage systems.
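To make the rebuild-traffic argument concrete, here is a minimal sketch of the local-parity half of an LRC layout: six data blocks in two groups of three, each group with a local XOR parity. The global Reed-Solomon parities that give LRC its multi-failure durability are omitted, and the group sizes are illustrative rather than the actual Storage Spaces configuration.

```python
from functools import reduce

def xor(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Two local groups of three data blocks, each with a local XOR parity.
data = [bytes([i] * 4) for i in range(6)]
groups = [data[0:3], data[3:6]]
local_parity = [xor(g) for g in groups]

# A single lost block is rebuilt from its two group peers plus the
# local parity: 3 reads instead of reading all 6+ survivors, which is
# the rebuild-cost reduction relative to a global-only code.
survivors = [data[0], data[2], local_parity[0]]
rebuilt = xor(survivors)
```

The same three-read reconstruction works symmetrically in the second group, while multi-failure cases fall back to the (omitted) global parities.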
Snapshot Cauterization and MetaSnaps
Sandeep Joshi, Manager, EMC
Narain C. Ramdass, Manager, EMC
When a user takes a snapshot of a filesystem, it captures all the data, some of which the user may not want to retain. Presently there is no mechanism for a user to delete this data without deleting the whole snapshot. We present methods to cauterize such unwanted data from a snapshot to reclaim space. This technique can also be used to build further features useful for file system analytics.
- Need for Snapshot cauterization
- Maintaining Snapshot consistency
- Future enhancements
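A hedged toy model of the idea (the names `BlockStore` and `cauterize` are invented for illustration and are not the presenters' API): blocks are refcounted across the live filesystem and snapshots, and cauterizing a file from one snapshot drops only that file's references, reclaiming any blocks nothing else holds.

```python
class BlockStore:
    """Refcounted block pool shared by the live fs and snapshots."""

    def __init__(self):
        self.refcount = {}

    def ref(self, block):
        self.refcount[block] = self.refcount.get(block, 0) + 1

    def unref(self, block):
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            del self.refcount[block]          # space reclaimed

def cauterize(snapshot, store, filename):
    """Remove one file's data from a snapshot without deleting it."""
    for block in snapshot.pop(filename, ()):
        store.unref(block)

store = BlockStore()
live = {"keep.txt": ["b1", "b2"], "scratch.tmp": ["b3"]}
snap = {f: list(blocks) for f, blocks in live.items()}   # snapshot shares blocks
for tree in (live, snap):
    for blocks in tree.values():
        for b in blocks:
            store.ref(b)

del live["scratch.tmp"]
store.unref("b3")                     # file deleted from the live fs...
cauterize(snap, store, "scratch.tmp")  # ...and cauterized from the snapshot
```

After cauterization the snapshot still protects `keep.txt`, but the unwanted file's blocks are actually freed rather than pinned until the whole snapshot is deleted.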
Multiprotocol Locking and Lock Failover in OneFS
Aravind Velamur Srinivasan, Senior Software Engineer, EMC, Isilon Systems
This talk will examine how multiprotocol locking is implemented in a distributed clustered file system such as Isilon’s OneFS, and will also look at the existing lock failover implementation in OneFS for NFS and how it can be extended to implement lock failover for SMB3. A clustered file system such as Isilon’s OneFS can have multiple clients accessing the server using different protocols such as SMB and NFS. A robust and efficient distributed lock manager is necessary to achieve both protocol correctness and data consistency in the presence of multiprotocol access to data/files. We also need a failover mechanism to implement the failover semantics of these protocols so that locks are not lost even when a node in the cluster goes down. This talk will examine the details of such a locking mechanism in OneFS.
- Details of the distributed lock manager in OneFS
- Challenges in implementing multiprotocol locking on a clustered file system and how it is made possible in OneFS
- Details of the design and implementation of lock failover for NFS
- Challenges in extending the lock failover for SMB
HDFS - What is New and Future
Sanjay Radia, Co-founder, Hortonworks
Suresh Srinivas, Hortonworks
Hadoop 2.0 offers significant HDFS improvements: a new append pipeline, federation, wire compatibility, NameNode HA, performance improvements, etc. We describe these features and their benefits. We also discuss development underway for the next HDFS release, including much-needed data management features such as snapshots and disaster recovery. We are adding support for different classes of storage devices, such as SSDs, and open interfaces such as NFS; together these extend HDFS into a more general storage system. As with every release, we will continue to improve the performance, diagnosability, and manageability of HDFS.
Snapshots for Ibrix - Highly Distributed Segmented Parallel FS
Boris Zuckerman, Distinguished Technologist, HP
This presentation explores the design of ‘native snapshots’ for a scale-out segmented parallel file system (Ibrix). An appropriate snapshot model requires flexibility and fluidity, to allow easy selection of objects, and reliability, to assure the logical unity of such subsets. Ibrix scales linearly as servers and segments are added, fundamentally by limiting the number of objects participating in any operation and by decentralizing control over metadata. With snapshots, the associated state transition has to affect not only directly referenced objects but must be immediately propagated to all descendant nodes controlled by a large number of other servers. We also look into recovery: achieving quick rollback that logically resets the state of the subspace to a desired point in time, while allowing the corresponding longer-running cleanup processes to finish in the background.
- Expose fundamentals of highly distributed segmented parallel file system architecture
- Review the challenges of implementing snapshots in such an environment
- Define Snap Identities as dynamically inheritable attributes
- Logical preservation of name components in snapshots, avoiding large-scale data flushes at snap time
A Brief History of the BSD Fast Filesystem
Dr. Marshall Kirk McKusick, Computer Scientist Author and Consultant
This talk provides a taxonomy of filesystem and storage development from 1979 to the present with the BSD Fast Filesystem as its focus. It describes the early performance work done by increasing the disk block size and by being aware of the disk geometry, using that knowledge to optimize rotational layout. With the abstraction of the geometry in the late 1980s and the ability of the hardware to cache and handle multiple requests, filesystems ceased trying to track geometry and instead sought to maximize performance through contiguous file layout. Small-file performance was optimized through techniques such as journaling and soft updates. By the late 1990s, filesystems had to be redesigned to handle ever-growing disk capacities. The addition of snapshots allowed for faster and more frequent backups. The increasingly harsh environment of the Internet required greater data protection, provided by access-control lists and mandatory-access controls. The talk concludes with a discussion of the addition of symmetric multiprocessing support needed to utilize all the CPUs found in the increasingly ubiquitous multi-core processors.
Scale-out Storage Solution
Mahadev Gaonkar, Technical Architect, iGATE
Today, data is growing at an exponential rate, and the need for an efficient storage mechanism has become more critical than ever. In this presentation, we will discuss a scale-out storage solution intended to address small and medium businesses cost-effectively. It is a Linux-based, software-only solution that runs on commodity hardware. It is POSIX-compliant and provides file storage through CIFS/NFS interfaces. The entire solution is designed for a small footprint and easy installation on available Linux machines. This paper presents technical details of the solution and its implementation challenges. In addition, the paper discusses the tools and techniques used to test the scale-out storage product.
- Overview of the Scale-out storage solution
- Solution details - architecture of the Distributed File system, Storage workload distribution mechanism, Key optimizations to achieve higher performance
- Testing Scale-out Storage - Challenges in testing Scale out products, Stress, scalability and performance testing and Various open source tools
Advancements in Windows File Systems
Andy Herron, Principal Software Developer, Microsoft
There are some advances, refinements, and improvements in Windows File Systems coming, which we'll be able to talk about at SDC-2013. Stay tuned for more details....
Cluster Shared Volumes
Vladimir Petter, Principal Software Design Engineer, Microsoft
Cluster Shared Volumes is a cluster file system for the Windows Hyper-V and File Server workloads. It enables concurrent access to volumes and files from any node in a Windows Server Failover Cluster. In this session, we will describe how Cluster Shared Volumes leverages and extends existing Windows technology, such as NTFS for metadata and storage allocation, SMB 3.0 for high-speed interconnect, Volume Snapshot Service for distributed backups, oplocks for cache coherency, and failover clusters for multi-node coordination.
- Review the scenarios targeted by Cluster Shared Volumes.
- Explain how Cluster Shared Volumes is layered between client applications and NTFS.
- Understand the conditions under which multiple cluster nodes can concurrently access NTFS volumes at block level.
- Describe how SMB 3.0 and failover clusters are used to efficiently solve multi-node problems, such as metadata updates and snapshot coordination.
Balancing Storage Utilization Across a Global Namespace
Manish Motwani, Lead Software Developer, Cleversafe, Inc.
Global namespaces represent the pinnacle of scalability as no central authority need be consulted to locate or update a resource. Just as DNS has enabled the Internet to scale to billions of hosts, global namespaces have much utility for scaling storage systems to Petabytes and beyond. Yet there are trade-offs to be made. The less dynamic the namespace the greater the scalability, but a more rigid namespace restricts data migration and rebalancing choices. We describe the trade-offs we made in designing a namespace that scales to Exabytes and how we deal with storage imbalance and expansion.
- What a global namespace is, and the benefits it provides over traditional metadata or lookup services.
- Limitations imposed by a rigid namespace, in terms of where data can be migrated or moved to without causing the namespace mapping to change or expand in size.
- The design of our global namespace and algorithms employed to balance utilization across a storage system of thousands of nodes.
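A minimal sketch of the trade-off described above (illustrative only, not Cleversafe's actual design): an object's name alone determines its home, so any client can locate it without consulting a central authority. The namespace is split into fixed slices assigned to nodes; expansion reassigns whole slices, which bounds where data may move.

```python
import hashlib

SLICES = 64

def slice_of(name):
    """Stable hash of an object name into one of SLICES namespace slices."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big") % SLICES

def locate(name, slice_map):
    """Any client can compute an object's home with no lookup service."""
    return slice_map[slice_of(name)]

slice_map = {s: f"node{s % 4}" for s in range(SLICES)}    # four storage nodes
before = locate("photos/cat.jpg", slice_map)

# Expansion: hand some slices to a new node.  Only objects in those
# slices migrate; everything else stays put and remains locatable,
# but data can never move at finer granularity than a slice -- the
# rigidity that trades against rebalancing flexibility.
for s in range(0, SLICES, 8):
    slice_map[s] = "node4"
```

Making slices finer-grained eases rebalancing but enlarges the mapping every client must know, which is exactly the scalability trade-off the abstract describes.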
PCI Express and Its Interface to Storage Architectures
Ron Emerick, Principal HW Engineer, Oracle
PCI Express Gen2 and Gen3, IO Virtualization, FCoE, SSD, and PCI Express storage devices are here. What are PCIe storage devices, and why should you care? This session describes PCI Express, Single Root IO Virtualization, and their implications for FCoE, SSDs, and PCIe storage devices, along with the impact of all these changes on storage connectivity and storage transfer rates. The potential implications for the storage industry and data center infrastructures will also be discussed.
- Knowledge of PCI Express Architecture, PCI Express Roadmap, System Root Complexes and IO Virtualization.
- Expected Industry Roll Out of latest IO Technologies and required Root Complex capabilities.
- Implications and Impacts of FCoE, SSD and PCIe Storage Devices to Storage Connectivity. What does this look like to the Data Center?
- IO Virtualization connectivity possibilities in the Data Center (via PCI Express).
PCI Express IO Virtualization Overview
Ron Emerick, Principal HW Engineer, Oracle
The PCI Express IO Virtualization specifications, working with system virtualization, allow multiple operating systems running simultaneously within a single computer system to natively share PCI Express devices. This session describes PCI Express and both Single Root and Multi Root IO Virtualization. The potential implications for the storage industry and data center infrastructures will also be discussed.
- Knowledge of PCI Express Architecture and Performance Capabilities, System Root Complexes and IO Virtualization
- The ability of IO Virtualization to change the use of IO options in systems.
- How PCIe-based storage devices play in IO Virtualization.
- IO Virtualization connectivity possibilities in the Data Center (via PCI Express).
Addressing Shingled Magnetic Recording drives with Linear Tape File System
Albert Chen, Western Digital
Jim Malina, Technologist, Western Digital
Shingled Magnetic Recording is a disruptive technology. It increases the capacity of the drive at the expense of not supporting random writes. This limits the adoption of SMR devices in traditional systems with write in place file systems. We can address the write expectations of SMR through various layers of abstraction, from application to firmware. A high abstraction layer provides more room for innovation and a more consistent performance guarantee. Thus, one potential implementation is through the familiar POSIX/Unix file system interface which provides a stable and familiar abstraction for both the storage vendor and user. In this presentation we would like to share some of the thoughts, lessons and experiences that we went through in making Linear Tape File System work with WD SMR drives.
- What is Shingled Magnetic Recording and what are its benefits and requirements.
- What are some potential approaches to address SMR requirements in software.
- What WD learned in utilizing Linear Tape File System for SMR drives.
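The write constraint that motivates the LTFS pairing can be modeled in a few lines. This is a toy host-managed zone, not WD's firmware interface; `SMRZone` is an invented name. Each zone keeps a write pointer and accepts only appends, much like a tape, which is why a tape filesystem maps onto shingled media naturally.

```python
class SMRZone:
    """Toy host-managed SMR zone: sequential writes only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = []                 # write pointer = len(self.blocks)

    def append(self, block):
        if len(self.blocks) >= self.capacity:
            raise IOError("zone full")
        self.blocks.append(block)

    def write_at(self, lba, block):
        if lba != len(self.blocks):      # only writes at the pointer succeed
            raise IOError("non-sequential write rejected")
        self.append(block)

    def reset(self):                     # space reclaim = rewind the whole zone
        self.blocks = []

zone = SMRZone(capacity=4)
zone.write_at(0, b"a")
zone.write_at(1, b"b")
rejected = False
try:
    zone.write_at(0, b"overwrite")       # random write-in-place: rejected
except IOError:
    rejected = True
```

A write-in-place filesystem would trip over `write_at` constantly; an append-oriented layout like LTFS only ever writes at the pointer, so the abstraction above costs it nothing.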
InfiniBand Architectural Overview
David Deming, President, Solution Technology
InfiniBand Architecture Overview.
This session will provide an overview of the entire InfiniBand Architecture including application, transport, network, link, and physical layers. This session is meant to update the student on current and future enhancements to the IB architecture including 8 Gbps links and RoCE.
Infiniband Verbs and Memory Management - RDMA
David Deming, President, Solution Technology
InfiniBand RDMA Protocol and Memory Management.
This session overviews the verb interface and RDMA protocol, including how memory regions and windows are used for inter-processor communication. NVM Express and SCSI Express both utilize similar programming interfaces (queue pairs) to communicate between host RAM and either another host's RAM or a non-volatile storage device.
Introduction to HP Moonshot
Tracy Shintaku, Distinguished Technologist, HP Server Engineering R&D, HP
HP’s Moonshot represents a series of products designed to ease and expedite the onramp of emerging low-power, low-cost, high-density, high-volume technologies in the data center. HP’s first Moonshot System breaks new ground in terms of power efficiency and compute density with a flexible cartridge-based form factor.
Learn about the capabilities of HP Moonshot and emerging technologies as we explore the genesis of the platform, where it could go and what it could mean for storage in the low power, highly efficient data center.
- Learn about HP Moonshot architecture and catalyzing Industry trends. Discuss what Moonshot could mean for storage and storage related applications.
Can Your Storage Infrastructure Handle the Coming Data Storm?
Amritam Putatunda, Technical Marketing Engineer, Ixia
In day-to-day operations, a storage infrastructure must effectively perform unique tasks: data storage, backups, access validations, edits, deletes, analysis, and more. Any delay introduced at the storage level impacts user quality of experience (QoE). To ensure an effective storage infrastructure, you must evaluate and optimize the system's ability to perform under extreme conditions. Strong and resilient storage must not only handle today's data storm (business-critical financial transactions, the fire hose of big data, on-demand video and gaming, and the like) but also store and protect the most precious artifacts of modern-world data.
OpenStack Cloud Storage
Dr. Sam Fineberg, Distinguished Technologist, Hewlett-Packard Company
OpenStack is an open source cloud operating system that controls pools of compute, storage, and networking. It is currently being developed by thousands of developers from hundreds of companies across the globe, and is the basis of multiple public and private cloud offerings. In this presentation I will outline the storage aspects of OpenStack, including the core projects for block storage (Cinder) and object storage (Swift), as well as the emerging shared file service. It will cover some common configurations and use cases for these technologies, and how they interact with the other parts of OpenStack. The talk will also cover new developments in Cinder that enable a variety of storage devices and storage fabrics to be used.
KEY NOTE AND FEATURED SPEAKERS
The Impact of the NVM Programming Model
Andy Rudoff, SNIA NVM Programming TWG
As exciting new Non-Volatile Memory (NVM) technologies emerge, the SNIA NVM Programming Technical Workgroup (TWG) has been working through the applicable programming models. Andy will talk about the impact these programming models will have on the industry, focusing especially on the more disruptive areas of NVM like Persistent Memory.
Windows Azure Storage – Scaling Cloud Storage
Andrew Edwards, Principal Architect, Windows Azure Storage, Microsoft
In today’s world, increasingly dominated by mobile and cloud computing, application developers require durable, scalable, reliable, and fast storage solutions like Windows Azure Storage. This talk will cover the internal design of the Windows Azure Storage system, how it is engineered to meet these ever-growing demands, and lessons learned from operating at scale.
Optical Storage Technologies: The Revival of Optical Storage
Ken Wood, CTO – Technology & Strategy Office of Technology and Planning, Hitachi Data Systems
Optical storage is seeing a resurgence in new industry verticals for its improved and unique preservation and environmental qualities. Recent developments have increased capacities and functionality while maintaining decades of backwards compatibility, thanks to the wide range of industries and markets that support this medium.
Hypervisors and Server Flash
Satyam Vaghani, CTO, PernixData
Hypervisors and server flash make an important but inconvenient marriage. Server flash has profound technology and programming implications on hypervisors. Conversely, various hypervisor functions make it challenging for server flash to be adopted in virtualized environments. In this talk, we will present specific hypervisor design areas that are challenged by the new physics of storage presented by server flash, and possible solutions. We will discuss the motivation and use cases around a software layer to virtualize server flash and make it compatible with clustered hypervisor features like VM mobility, high availability, distributed VM scheduling, data protection, and disaster recovery. Finally, we will present some empirical results from one such flash hypervisor (FVP) implemented at PernixData, and its potential long term impact on data center storage design.
Migrating to Cassandra in the Cloud, the Netflix Way
Jason Brown, Senior Software Engineer, Netflix
Netflix grew up using the traditional enterprise model for scaling: monolithic web application on top of a monolithic database in a single datacenter, buying bigger boxes, stuffing more user data into session memory. It all worked great when Netflix had fewer than 1 million customers (and rapidly growing). Then one day that model failed us, miserably. The single-point-of-failure bug hit us hard, and we were hobbled for days. Since then, Netflix has reinvented its technology stack from top to bottom, abandoning the single, monolithic web application for tiered distributed services, as well as moving beyond our SPOF database to more resilient architectures.
In this talk I'll be discussing my involvement with Cassandra at Netflix, first as a user of this new system, then as a developer of it. I'll discuss how we migrated from our traditional datacenter to the cloud, how we store and backup data, and the problems of rapidly scaling out a persistence layer under a burgeoning distributed architecture.
Platform as a Service and the Newton: One of These Things is Just Like the Other
Gerald Carter, Senior Consulting Engineer, EMC, Isilon Storage Division
Platform as a Service (PaaS) offerings, like the Apple Newton, launched at a time when technology had not matured to the point necessary to cross the chasm to the early majority and into mass markets. Successes did exist, but were limited to specialized applications already targeted at a vendor’s existing platform. Infrastructure as a Service (IaaS), and its successor of software-defined entities, are necessary intermediate steps toward decoupling application development from operational overhead. This talk will explore what the future will look like when developers can once again focus solely on applications and interfaces and turn a blind eye to operations.
Storage Infrastructure Performance Validation at Go Daddy – Best Practices from the World’s #1 Web Hosting Provider
Julia Palmer, Storage Protection and Data Manager, Go Daddy
Justin Richardson, Senior Storage Engineer, Go Daddy
Infrastructure is evolving rapidly these days, especially for storage professionals. A flurry of new technologies, such as SSDs and tiering, promise faster and more cost-effective storage solutions. Storage-as-a-service offers a new blueprint for flexible, optimized storage operations. Go Daddy is taking full advantage of these opportunities with continual innovation.
Attend this presentation to hear how Go Daddy utilized an innovative new approach to storage infrastructure validation that enabled them to accelerate the adoption of new technologies and reduce costs by nearly 50% while maintaining 99.999% uptime for their 28 PB of data. The new process empowers Go Daddy with the insight they need to optimize both service delivery and vendor selection. Audience members will also learn how to evaluate storage workloads and identify potential performance and availability problems before they are experienced by end users.
Worlds Colliding: Why Big Data Changes How to Think about Enterprise Storage
Addison Snell, CEO, Intersect360 Research
Addison Snell of Intersect360 Research will present an overview of how Big Data trends have changed some fundamental drivers in acquiring, architecting, and administering enterprise storage. With the majority of Big Data implementations coming from in-house development — Hadoop is just the tip of the iceberg — storage developers will find themselves taking on new roles that are defined by performance and scalability as much as reliability and uptime. Learn why high performance computing technologies like parallel file systems and InfiniBand could cross the Rubicon into enterprise, while an IT darling like Cloud might not play.
LONG TERM RETENTION
Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data
Sam Fineberg, Distinguished Technologist, HP
Simona Rabinovici-Cohen, Research Staff Member, IBM
Generating and collecting very large data sets is becoming a necessity in many domains that also need to keep that data for long periods. Examples include astronomy, atmospheric science, genomics, medical records, photographic archives, video archives, and large-scale e-commerce. While this presents significant opportunities, a key challenge is providing economically scalable storage systems to efficiently store and preserve the data, as well as to enable search, access, and analytics on that data in the far future. Both cloud and tape technologies are viable alternatives for storage of big data and SNIA supports their standardization. The SNIA Cloud Data Management Interface (CDMI) provides a standardized interface to create, retrieve, update, and delete objects in a cloud. The SNIA Linear Tape File System (LTFS) takes advantage of a new generation of tape hardware to provide efficient access to tape using standard, familiar system tools and interfaces. In addition, the SNIA Self-contained Information Retention Format (SIRF) defines a storage container for long term retention that will enable future applications to interpret stored data regardless of the application that originally produced it. This tutorial will present advantages and challenges in long term retention of big data, as well as initial work on how to combine SIRF with LTFS and SIRF with CDMI to address some of those challenges. SIRF for the cloud will also be examined in the European Union integrated research project ForgetIT – Concise Preservation by combining Managed Forgetting and Contextualized Remembering.
- Importance of long term retention
- Challenges in long term retention
- Learn about SIRF
- Learn how SIRF works with tape and in the cloud
Best Practices, Optimized Interfaces, and APIs Designed for Storing Massive Quantities of Long-Term Retention Data
Stacy Schwarz-Gardner, Strategic Technical Architect, Spectra Logic
The growth, access requirements, and retention needs for data in mass storage infrastructures for HPC, life sciences, media and entertainment, higher education, and research are becoming unmanageable. Organizations continue to apply legacy methodologies to today's big data growth, and it is not working. Traditional storage tiering and backups do not solve the problem and create additional cost and overhead. Redefining the term “Archive” as an online, accessible, affordable data management platform decoupled from infrastructure will be required to solve data growth and retention challenges going forward. Leveraging new optimized interfaces and APIs for disk, tape, and cloud will be required to fully enable the Active Archive experience.
- Understand how active archive technologies work and how companies are using them to gain data assurance and cost effective scalability for archived data.
- Learn the implications of data longevity and planning considerations for long-term retention and accessibility.
- An overview of new, innovative interfaces and APIs designed to better optimize disk, tape, and cloud storage media for archive purposes.
Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions
Ethan Miller, Director of the NSF Industry/University Cooperative Research Center, Associate Director of the Storage Systems Research Center (SSRC), University of California
Galois Field arithmetic forms the basis of Reed-Solomon and other erasure coding techniques to protect storage systems from failures. Most implementations of Galois Field arithmetic rely on multiplication tables or discrete logarithms to perform this operation. However, the advent of 128-bit instructions, such as Intel’s Streaming SIMD Extensions, allows us to perform Galois Field arithmetic much faster. This talk outlines how to leverage these instructions for various field sizes, and demonstrates the significant performance improvements on commodity microprocessors. The techniques that we describe are available as open source software.
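For readers who want the baseline the SIMD version improves on, here is a scalar GF(2^8) multiply over the common Reed-Solomon polynomial 0x11d. The SIMD technique the talk describes replaces the big lookup table with two 16-entry tables, one per 4-bit half of each byte, so a single PSHUFB shuffle multiplies 16 bytes at once; the sketch below shows only the scalar arithmetic it accelerates, not the intrinsics themselves.

```python
def gf_mul(a, b, poly=0x11d):
    """Multiply in GF(2^8) bit by bit ("Russian peasant" style)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:                # reduce modulo the field polynomial
            a ^= poly & 0xFF
    return p

# Multiplying a whole buffer by one constant is the erasure-coding hot
# loop; the classic scalar speedup precomputes a 256-entry table.
def mul_table(c):
    return bytes(gf_mul(c, x) for x in range(256))

table = mul_table(0x1d)
encoded = bytes(table[x] for x in b"data block")
```

Because GF(2^8) multiplication distributes over XOR, these per-byte products combine directly into Reed-Solomon parity computations.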
NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation Non-Volatile Memories
Joel Coburn, Software Engineer, Google/UCSD
Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent structures in non-volatile storage that is almost as fast as DRAM. However, existing persistent object systems are ill-suited to these memories because the assumption that storage is slow drives many aspects of their design. Creating structures that are flexible and robust in the face of application and system failure, while minimizing software overheads, is challenging. The system must be lightweight enough to expose the performance of the underlying memories, but it also must avoid familiar bugs such as dangling pointers, multiple free()s, and locking errors in addition to unique types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent.
We have implemented a lightweight, high-performance persistent object system called NV-heaps that prevents these errors and provides a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps out-perform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, when running on the same memory technology. We also quantify the cost of enforcing the safety guarantees that NV-heaps provides and measure the costs for NV-heap primitive operations.
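One hazard the abstract describes, a crash mid-update leaving permanent corruption, motivates the classic "payload first, flag last" ordering. The sketch below imitates that idiom with a file-backed mmap standing in for persistent memory; it illustrates the ordering discipline only and is not the NV-heaps implementation or API.

```python
import mmap
import os
import struct
import tempfile

# Create a 16-byte "persistent region": byte 0 is a validity flag,
# bytes 8..15 hold a little-endian payload.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 16)
os.close(fd)

with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 16)
    mem[8:16] = struct.pack("<q", 42)   # 1) write the payload
    mem.flush()                         # 2) force it out before the flag
    mem[0:1] = b"\x01"                  # 3) flip the validity flag last
    mem.flush()
    mem.close()

# A reader (or recovery code after a crash) trusts the payload only if
# the flag is set; a torn update is never observed as valid.
with open(path, "rb") as f:
    raw = f.read()
os.unlink(path)
valid = raw[0] == 1
value = struct.unpack("<q", raw[8:16])[0]
```

Real persistent memory needs cache-line flush and fence instructions rather than `mmap.flush`, and NV-heaps layers pointer-safety and transactional guarantees on top of this bare ordering trick.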
LazyBase: Trading Freshness for Performance in a Scalable Database
Brad Morrey, Senior Research Scientist, HP Labs
The LazyBase scalable database system is specialized for the growing class of data analysis applications that extract knowledge from large, rapidly changing data sets. It provides the scalability of popular NoSQL systems without the query-time complexity associated with their eventual consistency models, offering a clear consistency model and explicit per-query control over the trade-off between latency and result freshness. With an architecture designed around batching and pipelining of updates, LazyBase simultaneously ingests atomic batches of updates at a very high throughput and offers quick read queries to a stale-but-consistent version of the data. Although slightly stale results are sufficient for many analysis queries, fully up-to-date results can be obtained when necessary by also scanning updates still in the pipeline. Compared to the Cassandra NoSQL system, LazyBase provides 4x to 5x faster update throughput and 4x faster read query throughput for range queries while remaining competitive for point queries. We demonstrate LazyBase's trade-off between query latency and result freshness as well as the benefits of its consistency model. We also demonstrate specific cases where Cassandra's consistency model is weaker than LazyBase's.
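The freshness-versus-latency control can be sketched as a tiny model (invented names, not the LazyBase API): updates land in a pipeline as atomic batches, fast queries read the stale-but-consistent authority table, and fresh queries additionally scan the not-yet-applied batches.

```python
from collections import deque

class LazyTable:
    """Toy batched/pipelined table with per-query freshness control."""

    def __init__(self):
        self.authority = {}          # consistent, possibly stale
        self.pipeline = deque()      # ingested but unapplied batches

    def ingest(self, batch):         # an atomic batch of (key, value) pairs
        self.pipeline.append(dict(batch))

    def apply_one(self):             # background pipeline stage
        if self.pipeline:
            self.authority.update(self.pipeline.popleft())

    def query(self, key, fresh=False):
        if fresh:                    # pay latency to scan the pipeline
            for batch in reversed(self.pipeline):   # newest batch wins
                if key in batch:
                    return batch[key]
        return self.authority.get(key)

t = LazyTable()
t.ingest([("x", 1)])
stale = t.query("x")               # None: the batch is not applied yet
fresh = t.query("x", fresh=True)   # 1: found by scanning the pipeline
t.apply_one()
applied = t.query("x")             # 1: now visible to cheap queries too
```

Because each batch is applied atomically, even the stale view is always a consistent point in the update history, which is the consistency property the abstract contrasts with eventual consistency.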
GraphChi: Large-Scale Graph Computation on Just a PC
Aapo Kyrola, Ph.D. student, CMU Computer Science Department
In "GraphChi: Large-Scale Graph Computation on Just a PC" at OSDI '12, we proposed Parallel Sliding Windows (PSW), a novel method for efficiently processing large graphs from external memory (disk). Based on PSW, we designed and implemented a complete system, GraphChi, for vertex-centric graph computation. We demonstrated that GraphChi is capable of solving even the biggest graph computation problems on just a single PC, with performance often matching distributed computation frameworks.
In this talk I discuss the motivations for single-computer computation, present the GraphChi system and its design and talk about some recent work for extending and improving GraphChi, including a novel random walk engine DrunkardMob (to be presented in ACM RecSys'13). I will also talk about challenges of graph computation on general level and discuss future directions of my research.
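A miniature of the PSW idea (illustrative, not GraphChi's implementation): edges are partitioned into shards by destination interval and kept sorted by source, so each interval's subgraph can be loaded with sequential reads only. Here the vertex-centric update is min-label propagation, one building block of connected components.

```python
# A tiny graph: two components, {0, 1, 2} and {3, 4}.
edges = [(0, 1), (1, 2), (3, 4), (2, 0)]
num_vertices, num_intervals = 5, 2
interval_of = lambda v: v * num_intervals // num_vertices

# "Shards": edges grouped by destination interval, sorted by source,
# standing in for the on-disk shard files PSW streams sequentially.
shards = {i: sorted(e for e in edges if interval_of(e[1]) == i)
          for i in range(num_intervals)}

labels = list(range(num_vertices))
for _ in range(num_vertices):                 # iterate to a fixpoint
    for interval in range(num_intervals):     # one interval in memory at a time
        for src, dst in shards[interval]:
            m = min(labels[src], labels[dst])
            labels[src] = labels[dst] = m     # vertex-centric min-propagation
```

The point of the layout is that each pass touches one shard (plus sliding windows over the others, elided here) with purely sequential I/O, which is how a single PC with one disk can process graphs that don't fit in RAM.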
NFS on Steroids: Building Worldwide Distributed File System
Gregory Touretsky, Solutions Architect, Intel
Intel's R&D environment spans dozens of locations across the globe. It includes over 50,000 compute servers running Linux and tens of petabytes of centralized NFS storage. NFS provides a good solution for data sharing within a data center; however, it doesn't necessarily answer the need for cross-site access over high-latency links. The presentation will share Intel IT's experience with the development, implementation, and adoption of an NFS-based, federated, distributed, secure storage infrastructure in which every file is accessible from any client worldwide. It will describe the multiple steps required to enable a global, multi-level, on-demand caching infrastructure, including new caching and manageability solutions, environment standardization, and more. Improvements made to data transfer and storage synchronization protocols will also be covered.
- Global data sharing challenges in the large Enterprise
- Implications of RPCSEC-GSS implementation in the large enterprise
- Self-service manageability solutions required for end users - internal development and call for action
Implementing NFSv3 in Userspace: Design and Challenges
Tai Horgan, Software Engineer, EMC Isilon Storage Division
NFS in usermode is an uncommon challenge, one which necessitates unique design features not present in most UNIX implementations. EMC Isilon's NFS team has been tasked with replacing the kernel-based NFS server previously implemented in OneFS with one in userspace, in order to take advantage of a new protocol-agnostic access auditing framework. This talk will serve as a postmortem discussion and case study of EMC Isilon's new usermode NFSv3 server. We will discuss some of the challenges we encountered while redesigning the server for its new environment, future-proofing our design, and maintaining common code with SMB, NFSv4, and other protocols.
- Advantages of user mode over kernel mode for Isilon in particular, and clustered storage in general
- Userland challenges, including operation transactions, filehandle translation, and the RPC model
- Design considerations inherent in sharing code with SMB and NFSv4
Matt Benjamin, Founder, Cohort LLC
Adam Emerson, Developer, CohortFS, LLC
This session features leading pNFS developers describing ways in which pNFS is adapting to meet new challenges in distributed storage. New features are under discussion by the IETF NFS Working Group, and new implementation platforms are emerging as pNFS adapts to market adoption. Some of these include:
- Metadata scaling proposals
- New front- and back-ends, including Ceph and Ganesha
- Defining a software-defined storage layer
- Integration with cloud storage APIs
- Defining workloads for pNFS performance and stability measurement
- Future requirements for pNFS and how storage architects and developers intend to meet them
- How pNFS (and NFSv4) are responding to cloud storage opportunities
- Consensus views on the challenges for pNFS in the near-, intermediate-, and long-term
pNFS, NFSv4.1, FedFS and Future NFS Developments
Tom Haynes, Ph.D., Sr. Engineer, NetApp
The NFSv4 protocol undergoes a repeated life cycle of definition and implementation. The presentation will be based on years of experience implementing server-side NFS solutions up to NFSv4.1, with specific examples from NetApp and others. We'll examine the life cycle from a commercial implementation perspective: what goes into the selection of new features (including FedFS, NFSv4.2, and NFSv4.3), the development process and how these features are delivered, and the impact these features have on end users. We'll also cover the work of Linux NFS developers and provide suggestions for file system developers based on these and vendor experiences; finally, we'll discuss how implementation and end-user experience feed back into the protocol definition, along with an overview of expected NFSv4.2 features.
- Understand the NFS protocol & its application to modern workloads
- How NFSv4.1 is being implemented by vendors and end users
- The differences between NFSv3 and NFSv4.1, pNFS, FedFS
- An overview of proposed features in NFSv4.2 and NFSv4.3
Architecting Block and Object Geo-replication Solutions with Ceph
Sage Weil, Founder & CTO, Inktank
As the size and performance requirements of storage systems have increased, file system designers have looked to new architectures to facilitate system scalability. Ceph is a fully open source distributed object store, network block device, and file system designed for reliability, performance, and scalability from terabytes to exabytes. The Ceph architecture was initially designed to accommodate single data center deployments, where low latency links and synchronous replication were an easy fit for a strongly consistent data store. For many organizations, however, storage systems that span multiple data centers and geographies for disaster recovery or follow-the-sun purposes are an important requirement. This talk will give a brief overview of the Ceph architecture, and then focus on the design and implementation of asynchronous geo-replication and disaster recovery features for the RESTful object storage layer, the RBD block service, and Ceph's underlying distributed object store, RADOS. We will discuss the fundamental requirements for a robust geo-replication solution (such as point-in-time consistency), the differing requirements of each storage use case and API, and their implications for the asynchronous replication strategy.
- An overview of the Ceph architecture
- Info on the design and implementation of asynchronous geo-replication and disaster recovery features for the RESTful object storage layer, the RBD block service, and Ceph's underlying distributed object store, RADOS.
- The fundamental requirements for a robust geo-replication solution (such as point-in-time consistency), the differing requirements of each storage use case and API, and their implications for the asynchronous replication strategy
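The point-in-time consistency requirement above can be made concrete with a small sketch. This is illustrative only, not Ceph's implementation: the primary records every write in an ordered journal, and the remote site replays that journal strictly in order, so its state always corresponds to some consistent past moment at the primary.

```python
# Hedged sketch of journal-based asynchronous geo-replication
# (illustrative, not Ceph's code): ordered log at the primary,
# in-order replay at the replica.
class Primary:
    def __init__(self):
        self.data, self.journal = {}, []

    def write(self, key, value):
        self.journal.append((key, value))  # ordered, replayable log entry
        self.data[key] = value

class Replica:
    def __init__(self):
        self.data, self.applied = {}, 0

    def sync(self, journal, upto=None):
        """Asynchronously replay journal entries [applied, upto) in order."""
        upto = len(journal) if upto is None else upto
        for key, value in journal[self.applied:upto]:
            self.data[key] = value
        self.applied = upto
```

Because replay never reorders entries, a partially synced replica is stale but never inconsistent.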
Transforming PCIe-SSDs and HDDs with Infiniband into Scalable Enterprise Storage
Dieter Kasper, Principal Architect, Fujitsu
Developers and technologists are fascinated by the low latency and high IOPS of PCIe SSDs. Customers, however, expect a healthy balance between performance and enterprise features such as high availability, scalability, elasticity, and data management. The open source distributed storage solution Ceph, designed for reliability, performance, and scalability, and commodity hardware such as Infiniband, PCIe SSDs, and HDDs will merge into a perfect team, if the best I/O parameters and the right interconnect protocols are used and tuned.
- Get an overview on scale-out flash storage
- Learn how to transform Open Source Software and commodity hardware into Scalable Enterprise Storage
- Learn about the right I/O subsystem parameters to access SSDs
- Lessons learned choosing the right interconnect protocol for low latency and high bandwidth
Huawei SmartDisk Based Object Storage UDS
Qingchao Luo, Cloud Storage Architect, Huawei
HUAWEI UDS (Universal Distributed Storage) is a massive object-storage system that offers enterprises and service providers a comprehensive solution to data explosion difficulties. Through its proprietary technology, a Huawei UDS system in one data center is able to scale out to 25,000 SmartDisks, each composed of one ARM chip and one hard drive. As a result, it provides competitive scalability, reliability, and cost.
Improvements in Storage Energy Efficiency via Storage Subsystem Cache and Tiering
Chuck Paridon, Storage Performance Architect, HP
Herb Tanzer, Storage Hardware Architect, HP
The energy efficiency of storage subsystems in terms of Idle Capacity/Watt, IOPS/Watt, and MB/s/Watt can be significantly improved through the deployment of Capacity Optimization Methods (COMs). These features affect the apparent capacity, IO rate, and throughput (MB/s), and therefore also the target "green" metrics cited above. This paper describes a case study of the compound effect of two such features, storage subsystem cache and tiered storage, on the primary metrics of the SNIA Emerald Power Efficiency Specification, using both the former random workloads and the recently adopted "Hot Band" workload as the comparative test stimuli. Also described is the potential energy efficiency benefit of several additional COM types.
- Compare the characteristics of the Hot Band, cache friendly workload with that of a completely random stimulus
- Quantify the performance benefits of adequate storage subsystem cache when exposed to a cache friendly workload
- Describe the effect of cache assistance on the workload as seen by the “back end” or traditional spinning media portion of the storage
- Quantify the subsequent deployment of storage tiering on the overall performance measurements and associated Emerald Power Efficiency Metrics
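The three "green" metrics the abstract names are simple ratios, so the arithmetic behind the comparisons above can be made explicit. The figures in the test are invented for illustration, not Emerald measurements.

```python
# Helper making the SNIA Emerald-style efficiency ratios explicit.
# Inputs are whatever the benchmark measured for a configuration.
def power_efficiency(capacity_gb, iops, mbps, watts):
    return {
        "idle_capacity_per_watt": capacity_gb / watts,  # Capacity/Watt
        "iops_per_watt": iops / watts,                  # IOPS/Watt
        "mbps_per_watt": mbps / watts,                  # MB/s/Watt
    }
```

A COM such as cache or tiering improves these metrics by raising the numerators (apparent capacity, IO rate, throughput) for roughly the same power draw.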
SPEC SFS Benchmark - The Next Generation
Spencer Shepler, Architect, Microsoft
The SPEC SFS Benchmark has long been the standard in the file sharing industry for characterizing the performance capabilities of file servers: first with the NFSv2 protocol, then NFSv3, and eventually NFSv3 and CIFS. Even with its success, the benchmark has long needed an update to address new protocols such as SMB3 and NFSv4, and to include the client in the measurement of the system's overall capabilities. The SPEC SFS committee has been working on an update to address these and other areas of needed improvement. The basic structure and capability of the in-development benchmark will be presented, along with a description of the workloads as of fall 2013. The latest status of the benchmark development will be provided, along with some potential uses outside of the traditional vendor usage for performance measurement.
- Performance measurement with SPEC SFS
- Capacity planning with SPEC SFS
Lessons Learned Tuning HP's SMB Server with the FSCT Benchmark
Bret McKee, Distinguished Technologist, Hewlett Packard
Vinod Eswaraprasad, Lead Architect, WiPro Technologies
As part of an ongoing effort to increase performance of our NAS system, we have been tuning the system’s performance on the FSCT benchmark. As part of this effort, we have learned a number of things about the benchmark which might be of general interest. Areas which will be discussed include an overview of the benchmark, setup and configuration steps required to run the benchmark and insight into understanding the results and errors generated by running FSCT. The intention is to discuss “what we learned about the benchmark while using it to make our system faster”, including what SMB protocol elements it uses, what impact newer SMB features like leases have, etc.
- Understand what the FSCT benchmark is, and what it measures
- Understand the roles of each system involved in the FSCT benchmark
- Be aware of the various FSCT scenario types and the requests FSCT makes
- Be able to interpret the data output by the FSCT benchmark
Forget IOPS: A Proper Way to Characterize & Test Storage Performance
Peter Murry, Senior Product Specialist, SwiftTest
Storage workloads are changing. Applications stress storage infrastructure in different ways – like workloads generated by virtualized applications or ones containing high amounts of meta-data. Such dynamics make it difficult to confidently predict how storage systems will behave in the real world for both end users and vendors. How can storage performance be better understood and defined?
- Why IOPs testing alone is an inadequate method for characterizing storage performance, and how the inclusion of metadata traffic is a critical component of characterizing storage networking performance
- How to use real-world, end user data to characterize a variety of workloads that stress storage systems in very specific ways. This includes using appropriate levels of Write/Read operations in combination with metadata as well as how to characterize application storage performance in virtualized environments
- A method to test storage systems against such workloads
High-Throughput Cloud Storage Over Faulty Networks
Yogesh Vedpathak, Software Developer, Cleversafe Inc
Storage systems increasingly rely on the Internet as the medium of data transfer. The Internet, as a high-bandwidth, high-latency, high-packet-loss connection, is very different from the clean networks of typical SANs. Under such conditions, TCP's capabilities are often stretched to their breaking point. In this presentation we detail the methods we used to overcome random network slowdowns, packet drops, congestion control, and other challenges. Our result: we achieved storage throughputs on the Internet that were 80% of those of the same test on a low-latency, zero-packet-loss LAN.
- Why the network layer is increasingly important in today's storage systems, as cloud storage takes off, and the NIC and Internet speeds begin to eclipse the speed of a local hard drive.
- Features and limitations of TCP, including congestion control, window scaling, error handling, order preservation and how they pertain to modern cloud based storage systems.
- How to achieve reliable low-latency delivery of messages over networks with unpredictable reliability and performance, in ways that are easy to implement, manageable and widely supported.
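The basic recovery idea behind the last point, sequence numbers plus retransmission layered over an unreliable channel, can be sketched as follows. This is an illustrative toy, not Cleversafe's actual protocol; the lossy channel is simulated with a drop probability.

```python
import random

# Illustrative sketch (not Cleversafe's protocol): application-level
# sequence numbers and retransmission over a simulated lossy channel.
def make_lossy_channel(loss_rate, rng):
    def lossy_send(seq, chunk, received):
        if rng.random() < loss_rate:
            return None                 # packet (or its ack) was lost
        received[seq] = chunk
        return seq                      # acknowledgment
    return lossy_send

def send_reliable(chunks, lossy_send, max_tries=50):
    """Deliver every (seq, chunk) despite drops; returns the ordered payload."""
    received = {}
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            if lossy_send(seq, chunk, received) == seq:
                break                   # acknowledged, move on
        else:
            raise IOError("chunk %d never acknowledged" % seq)
    return [received[i] for i in range(len(chunks))]
```

Real systems pipeline many chunks in flight instead of stop-and-wait, which is where the throughput numbers in the abstract come from.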
Multi-vendor Key Management – Does It Actually Work?
Tim Hudson, Technical Director, Cryptsoft Pty Ltd
A standard for interoperable key management exists, but what actually happens when you try to use products and key management solutions from multiple vendors? Does it work? Are any benefits gained? Practical experience from implementing the OASIS Key Management Interoperability Protocol (KMIP) and from deploying and interoperability-testing multiple vendor implementations of KMIP forms the bulk of the material covered. Guidance will be provided on the key issues to require your vendors to address, and on how to distinguish between simple vendor tick-box approaches to standards conformance and actual interoperable solutions.
- In-depth knowledge of the core of the OASIS KMIP
- Awareness of requirements for practical interoperability
- Guidance on the importance of conformance testing
Matching Security to Data Threats – More is Not Better, but Less Can be Bad
Chris Winter, Director Product Management, SafeNet
A chain is only as strong as its weakest link and adding more links doesn’t make it any stronger. The same is true for securing critical data with encryption – just adding more encryption doesn’t necessarily make critical data more secure. The challenges facing most organizations are twofold: 1) understanding which threats and vulnerabilities apply to them and their data, and 2) knowing when they have sufficient data encryption to protect them from the threats, but not so much that their costs and management resources are strained. It is additionally important to understand that not all threats can be addressed by data encryption and that some threats may have to be rationalized by an organization in terms of the cost of the remedial work.
- As a result of participating in this session, attendees will be able to understand which threats can and should be addressed by data encryption and which threats need other solutions
- As a result of participating in this session, attendees will be able to understand where and when multiple data encryption technologies are complementary, and where they are redundant
- As a result of participating in this session, attendees will be able to understand how to justify the deployment of data encryption based on cost of deployment, management and the cost of failure to do so
Trusted Computing Technologies for Storage
Dr. Michael Willett, Storage Security Strategist, Samsung
The Trusted Computing Group (TCG) has created specifications for trusted computing, with a focus on ease of use, transparency, robust security functions in hardware, integration into the computing infrastructure, and low cost; these include Self-Encrypting Drives (SEDs). TCG technologies will be described, including their application to the design of trusted storage.
- Overview of the security challenges facing storage
- Business and compliance motivations for applying trusted computing to storage, including breach notification legislation
- Introduction to the technical specifications of the Trusted Computing Group, especially as they apply to storage
- Technical details of Self-Encrypting Drives (SED)
SMB3 Meets Linux: The Linux Kernel Client
Steven French, Senior Engineer - SMB3 Protocol Architect, IBM
SMB3 support has been merged into Linux since the 3.8 kernel. What have we learned? And why use SMB3? Learn what SMB3 features are available to Linux users, when you should consider using SMB3 instead of other protocols when running Linux, and how to configure optional features and improve performance. In addition, there will be a demonstration of some of the newer features, and a description of what improvements are expected in the coming months.
- Why use SMB3 on Linux
- Which features of SMB3 are enabled on Linux
- How to better configure the Linux CIFS/SMB2/SMB3 client
- Learn how to use SMB3 on Linux to access resources
Pike - Making SMB Testing Less Torturous
Brian Koropoff, Consulting Software Engineer, EMC Isilon
Pike is a new Python protocol testing library for SMB2 and SMB3. It will be made publicly available along with a collection of tests under an open-source license. Pike has a simple and extensible architecture. It aims to make common case scenarios concise while still allowing deep control over message construction and dispatch when necessary. This talk will explore the core architecture of Pike and how to extend it with new features and tests. Along the way, we'll see how the dynamism and expressiveness of Python make it a great environment for protocol testing.
- How Pike is designed
- How to extend Pike to support new protocol messages (or new protocols)
- How to write tests with Pike
Mapping SMB onto Distributed Storage
Christopher R. Hertel, Senior Principal Software Engineer, Red Hat
José Rivera, Software Engineer, Red Hat
The SMB protocol is, and always has been, an extension of the Operating Systems and File Systems on which it was designed to run. Yes, that means DOS/FAT, OS2/HPFS, and Windows/NTFS. Building an SMB server on any other platform requires a lot of special handling, sort of like the special pounding of a square peg into a round hole with a finely tuned sledgehammer. The design of many newer file systems, particularly object, cluster, and distributed storage systems, complicates matters even more by "relaxing" adherence to standard semantics. This presentation will highlight a number of ways in which these "relaxed fit" file systems clash with SMB expectations, and will provide some examples of ways to bridge the gap.
- Problems to look for when integrating SMB with a Distributed File System.
- SMB integration problem solving approaches.
- The importance of itemizing SMB integration problems.
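One of the semantic clashes the abstract alludes to can be made concrete: SMB expects case-insensitive, case-preserving names, while many POSIX-style distributed file systems are strictly case-sensitive. A server can bridge the gap by folding names for lookup while preserving the original spelling. This is a sketch only; a real server must also handle Unicode case rules, races, and the backing file system's own namespace.

```python
# Sketch of case-insensitive, case-preserving name handling, one of the
# "relaxed fit" gaps between SMB semantics and POSIX-style file systems.
class CaseFoldingDirectory:
    def __init__(self):
        self._by_folded = {}            # folded name -> stored name

    def create(self, name):
        folded = name.lower()
        if folded in self._by_folded:   # "README.txt" collides with "ReadMe.TXT"
            raise FileExistsError(name)
        self._by_folded[folded] = name  # preserve the original spelling

    def lookup(self, name):
        return self._by_folded.get(name.lower())
```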
Samba Scalability and Performance Tuning
Volker Lendecke, Co-Founder of SerNet, Samba Team/SerNet
During the last few months, a lot of work has gone into making Samba scale well for large numbers of clients. In particular, together with ctdb, a few bottlenecks were discovered that surprised the developers. Luckily, most of these bottlenecks could be fixed. This talk will present the details of our improved scalability in Samba.
- Performance testing and tuning of network servers
SMB 3 in Samba 4
Michael Adam, Senior Software Engineer, SerNet GmbH
Samba 4.0 was released in December 2012. It is the first release of Samba that features the Active Directory-compatible Domain Controller, but it is also a very important file server release: Samba 4.0 ships with SMB 3 enabled by default. In my SDC 2012 talk, I described the tasks required for Samba to implement many of the features of SMB 3. In this talk, I will describe the subset of SMB3 that Samba 4.0 already offers, and report on the work in progress implementing more SMB 3 features, giving an outlook on what to expect in upcoming Samba 4.1.
- Learn about Samba 4.0 in general
- Learn about SMB 3 support in Samba 4.0
- Learn about current status of SMB 3 development for Samba 4.1 and later releases
Implementation of SMB3.0 in Scale-Out NAS
Kalyan Das, Chief Architect - Storage Protocols, Huawei Technologies
Jun Liu, Software Architect - Storage and NAS, Huawei Technologies
SMB 3.0 features several improvements over the CIFS protocol on which it is based. Continuous Availability achieves high storage availability through transparent failover. Copy Offload and RDMA boost performance dramatically in Windows Server 2012. These features, along with Multi-channel, are quite attractive to customers. However, together these features pose implementation challenges in a scale-out NAS context. We discuss our experience implementing SMB3 on a clustered system without compromising functionality or performance.
- Huawei’s implementation of Offload Data Transfer (ODX) through SMB Copy Offload
- Huawei’s use of cluster-wide locking through its proprietary Outer Lock (OL) service to achieve SMB3.0 transparent failover
- Test results on multi-channel, ODX, [and maybe SMB over RDMA]
SMB Direct Update
Greg Kramer, Sr. Software Engineer, Microsoft
This talk will explore upcoming changes to the SMB 3 protocol that increase SMB Direct performance for high IOP workloads. The protocol changes will be motivated by performance analyses, including updated SMB Direct performance results for a variety of IO workloads.
David Kruse, Development Lead, Microsoft
The past year has seen multiple companies and teams release SMB3 solutions, and many customers deploy them into production. This talk will look at some upcoming minor adjustments to SMB3 based on lessons learned, and cast forward for what might come next.
- Learn more about SMB3
- Discuss some potentially new and interesting applications for SMB3
Scaled RDMA Performance & Storage Design with Windows Server SMB 3.0
Dan Lovinger, Principal Software Design Engineer, Microsoft
Spencer Shepler, Principal Program Manager, Microsoft
This session will present a summary of the performance of the Windows Server 2012 File Server’s Remote Direct Memory Access capabilities over SMB 3.0. Systems presented will range from production “Windows Server Cluster in a Box” from EchoStreams to rack-scaled storage systems, across multiple RDMA solutions. Design considerations for highly scaled systems and their tradeoffs will be discussed. With RDMA the processor cost of bulk data access on remote file systems has the potential to approach the range of local storage. This provides a novel build option for deploying high speed and highly efficient consolidated storage solutions.
Exploiting the High Availability Features in SMB 3.0 to Support Speed and Scale
James Cain, Principal Software Architect, Quantel Ltd
Microsoft have made massive changes in version 3.0 of the SMB protocol, many of which contribute towards SMB 3.0 offering high availability (HA) for use in the data centre. This talk will present the results of investigations into how these innovations can be exploited for improved I/O speed and architectural scale. The presentation will first look at the needs of HA in a NAS protocol. It will then offer an insight into why features were added to SMB 3.0, providing technical analysis of the protocol itself and live demonstrations of noted features using the author's own implementation of an SMB server.
- Understanding SMB 3.0 at an architectural level
- Practical insight into network topologies for NAS
- Exploring requirements for High Availability and design choices made in SMB 3.0
- Finding the minimal shared state between SMB servers in a cluster
Samba 4.0 released: What now for the Open Source AD Domain Controller?
Andrew Bartlett, Samba Developer, Samba Team
A Status Report on SMB Direct (RDMA) for Samba
Richard Sharpe, Samba Team Member, Panzura
Since Microsoft announced SMB Direct there has been interest in providing support for SMB Direct under Samba. This presentation will describe the current state of the project to provide that support. It will discuss the process that we have undertaken, the players, and what we have working to date.
- Obtain more information about the state of this project
- Understand the alternatives we have looked at
- Understand the technical difficulties involved
SOFTWARE DEFINED X
Hosting Performance-Sensitive Applications in the Cloud with Software-Defined Storage
Felix Xavier, Co-Founder and CTO, CloudByte
Hosting performance-sensitive enterprise applications requires the delivery of guaranteed quality of service (QoS), which has been the Achilles' heel of large cloud service providers. So, what stops legacy solutions from delivering guaranteed QoS? Noisy neighbors! Within a shared storage platform, legacy solutions cannot isolate and dedicate a specific set of resources to any application. As a result, applications are in a constant struggle for the shared storage resources. This session will look at different storage options with a focus on software-defined storage solutions that help solve the noisy neighbor problem and guarantee QoS to every application in a shared storage environment.
- Truly understand the software-defined storage approach and if it is a fit for your environment with a focus on cloud infrastructures
- On-demand performance: learn how to use software-defined storage solutions to automate resources in a shared environment based on QoS demands
- Look at ways to automate and get QoS analytics with unprecedented granularity, right down to the storage volume layer
Software-Defined Network Technology and the Future of Storage
Stuart Berman, Chief Executive Officer, Jeda Networks
Data is increasing exponentially and storage budgets are not keeping pace. IT staffs are increasingly under pressure to squeeze efficiencies at every turn while aligning with strategic business goals. Fortunately, new technologies by way of Software-Defined Networks (SDN) are having a liberating effect on network architectures. SDN abstracts control out of the switch into the software, accelerating server and desktop virtualization. A subset of SDN called Software Defined Storage Networking (SDSN) brings virtualization to storage networking architecture. New, standardized network technologies have for the first time, allowed us to virtualize the SAN using storage-focused SDN. We will examine the basics of SDSN and how it virtualizes the way servers communicate with storage to deliver new levels of agility and scalability to the modern enterprise.
Tunneling SCSI over SMB: Shared VHDX files for Guest Clustering in Windows Server 2012 R2
Jose Barreto, Principal Program Manager, Microsoft
Windows Server Failover Clustering is a well-established technology for increasing application availability in the Microsoft platform. For Hyper-V virtualized workloads, you can also create a Failover Cluster comprised of a set of virtual machines and some shared storage in the form of iSCSI LUNs, virtual Fibre Channel LUNs or SMB file shares. In this session, we’ll start by describing the overall Guest Clustering scenario and then dive into the new Windows Server 2012 R2 option to use Shared VHDX files. We’ll then introduce the “Remote Shared Virtual Hard Disk Protocol” (the new protocol behind the Shared VHDX feature) and explain how this protocol allows SCSI commands to be tunneled over the SMB protocol. We’ll also cover how non-Windows SMB implementations can offer the Shared VHDX capability via this new protocol.
- Review the scenarios and requirements for guest high availability using Windows Server Failover Clustering inside a virtual machine
- Describe the new Shared VHDX feature in Windows Server 2012 R2 for Guest Clustering
- Explain how the Remote Shared Virtual Hard Disk Protocol allows SCSI commands to be tunneled over the SMB protocol
Defining Software Defined Storage
Lazarus Vekiarides, Entrepreneur and Technology Executive
The notion of data services being composed of software has natural appeal, but what exactly does it mean? Given the huge portfolio of software and hardware available for a datacenter today, it is difficult to make sense of what "software defined storage" truly is and what benefits it could provide. While there is some truth to the idea that it is about reducing reliance on costly hardware, many see it as a way to bring new flexibility to datacenter operations. In this discussion, we will propose a set of requirements and benefits, while walking through some examples of various software technologies with the goal of producing a crisp definition.
- Learn to better define software defined storage and separate the reality from the marketing buzz
- Learn about the benefits of software defined storage, and how it can bring new flexibility to your datacenter
- Take a look at the future of software defined storage and how this movement is changing the game
SOLID STATE STORAGE
Delivering Nanosecond-Class Persistent Storage
Steffen Hellmold, Vice President of Marketing, Everspin Technologies
NAND flash solves many problems in storage with its non-volatility and high IOPS performance. Designers can always deliver more IOPS with NAND assuming unlimited power and space. However, designers can’t deliver nanosecond-class response times with NAND because the medium isn’t fast enough. Spin Torque MRAM complements NAND flash and forms a persistent replacement for battery or capacitor-backed DRAM, delivering higher IOPS/$ and IOPS/W than NAND flash with nanosecond-class response times. In this presentation, the speaker will discuss how ST-MRAM is enabling a latency revolution in storage, just as NAND flash delivered an IOPS revolution.
- ST-MRAM, introducing high performance persistent storage
- How ST-MRAM complements NAND flash and replaces BB-DRAM
- Applications for ST-MRAM, exploring the possibilities with prototypes
NVMe Based PCIe SSD Validation – Challenges and Solutions
Apurva Vaidya, Technical Architect, iGATE
PCI Express (PCIe) solid state drives (SSDs) provide significant performance benefits in enterprise applications compared to traditional HDDs and SSDs with a legacy interface. The existing protocols (SAS, SATA) pose architectural limitations that prevent them from delivering the desired throughput for SSDs. The ideal solution to this problem is to move these devices into PCIe space, which provides optimum speed without the overheads posed by protocols like SAS and SATA. The emergence of non-volatile memory express (NVMe), a scalable host controller interface developed specifically for PCIe SSDs, and a supporting ecosystem will allow SSD suppliers to transition to NVMe-based PCIe SSD products. It is essential to understand the product validation challenges to reduce time to market for PCIe SSD vendors. This paper highlights the challenges in validating queue configuration, handling of outstanding IOs, queue arbitration, interrupt coalescing, etc., and provides solutions to address these challenges.
- NVMe based PCIe SSD subsystem validation setup and procedure
- Challenges faced during validation
- Solutions to address those challenges
TBF: A Memory-Efficient Replacement Policy for Flash-based Caches
Biplob Debnath, Research Staff Member, NEC Laboratories America
TBF presents a RAM-frugal cache replacement policy that approximates the least-recently-used (LRU) policy. It uses two in-RAM Bloom filters to maintain recency information and leverages an on-flash key-value store to cache objects. TBF can easily be integrated with any key-value store to provide caching functionality. TBF requires only one additional byte of RAM per cached object while providing performance similar to LRU and its variants, which makes it suitable for implementing a very large flash-based cache. Full paper: http://www.nec-labs.com/~biplob/Papers/TBF.pdf
- Understanding in-RAM metadata bottleneck for a huge flash-based second-level cache
- Understanding the memory overhead of an LRU-like algorithm
- Understanding how Bloom filter works
- Understanding how to integrate in-RAM Bloom filter with an on-flash key-value store to reduce RAM consumption
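The two-Bloom-filter recency trick described above can be sketched as follows. This is a simplified illustration, not the paper's exact design (parameters here are invented; see the linked full paper): accesses set bits in the current filter, the filters rotate periodically so the oldest is discarded, and membership in either filter approximates "recently used".

```python
import hashlib

# Simplified sketch of two rotating Bloom filters approximating LRU
# recency (illustrative; parameters are made up, see the TBF paper).
class TwoBloomRecency:
    def __init__(self, bits=4096, hashes=4, rotate_every=100):
        self.bits, self.hashes = bits, hashes
        self.filters = [0, 0]               # two Bloom filters as int bitmasks
        self.count, self.rotate_every = 0, rotate_every

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(("%d:%s" % (i, key)).encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def touch(self, key):
        for pos in self._positions(key):
            self.filters[0] |= 1 << pos     # record access in current filter
        self.count += 1
        if self.count >= self.rotate_every: # rotate: oldest filter is dropped
            self.filters = [0, self.filters[0]]
            self.count = 0

    def recently_used(self, key):
        # Present in either filter => touched within the last two windows.
        return any(all((f >> pos) & 1 for pos in self._positions(key))
                   for f in self.filters)
```

An object absent from both filters has not been touched for roughly two rotation windows and is a candidate for eviction; the bitmask state is what keeps the per-object RAM cost so low.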
SNIA NVM Programming Model
Paul von Behren, Software Architect, Intel Corporation
Upcoming advances in Non-Volatile Memory (NVM) technologies will blur the line between storage and memory, creating a disruptive change to the way software is written. The new SNIA NVM Programming Model describes behavior provided by operating systems that enables applications, file systems, and other software to take advantage of new NVM capabilities. This tutorial describes four programming modes. Two modes address NVM extensions for NVM emulating hard disks: block mode (as used by file systems) and file mode (as used by most applications). There are also two modes for Persistent Memory (PM): kernel extensions (as used by PM-aware file systems) and PM file mode (as used by PM-aware applications). The tutorial also addresses some broader NVM software issues, such as strategies for storing pointers in persistent memory.
- Awareness of the new SNIA NVM Programming Model
- Advantages to software utilizing NVM features
- Motivations for NVM device vendors to support the model
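The PM file mode described above can be illustrated with an ordinary memory-mapped file standing in for persistent memory: the application updates data with plain loads and stores instead of read()/write() calls. This is only a sketch of the access pattern; a real PM-aware application would use the model's optimized flush/sync actions rather than msync, and the file name is made up:

```python
import mmap
import os
import struct

PATH = "counter.pm"   # stand-in for a file on a PM-aware file system
SIZE = 4096

# Create/size the backing file (on real PM this lives on a PM-aware FS)
fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
os.ftruncate(fd, SIZE)

# Map it: subsequent updates are plain memory stores, no I/O syscalls
m = mmap.mmap(fd, SIZE)

# Treat the first 8 bytes as a persistent little-endian counter
count = struct.unpack_from("<Q", m, 0)[0]
struct.pack_into("<Q", m, 0, count + 1)

# A PM-aware runtime would use an optimized flush here; msync-style
# flush is the conventional stand-in for a regular mapped file
m.flush()
m.close()
os.close(fd)
```

Run repeatedly, the counter survives process restarts, which is the essence of the PM file mode: persistence with memory semantics.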
Demand for Storage Systems from a Customer Viewpoint in Japan
Satoshi Uda, Assistant Professor, Japan Advanced Institute of Science and Technology (JAIST)
We have been providing storage services for more than 20 years with large-scale NAS systems at JAIST, centrally managing data across our entire institute. Recently, the construction of a private cloud environment based on virtualization technology has complicated the dependency structure of our systems and made it increasingly difficult to operate our storage systems. In this presentation, we present a case study and the demands placed on storage systems from a customer viewpoint, based on our experience deploying and operating them. Non-technical matters are also important in operating storage systems; for example, getting good vendor support for troubleshooting. We discuss this point with examples from Japan. Note: This presentation will be jointly conducted with SNIA-J (SNIA Japan Branch).
- Case study
- Customers' voices
Prequel: Distributed Caching and WAN Acceleration
Jose Rivera, Software Engineer, Red Hat
Christopher R. Hertel, Senior Principal Software Engineer, Red Hat
Prequel is an open-source implementation of PeerDist, a protocol for wide-area distributed caching developed by Microsoft. PeerDist is more commonly known as Microsoft's BranchCache feature. Prequel seeks to bring PeerDist into the open source world and integrate its capabilities with data access projects like Samba. This talk will cover the basics of the PeerDist protocol, currently deployed scenarios, and illustrate scenarios which Prequel can serve. We'll demonstrate a working Prequel installation on Linux serving Windows peers and walk through the protocol step by step. Questions and heckling will be taken throughout the presentation.
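PeerDist identifies content by cryptographic hashes of segments and blocks, so a client can fetch blocks from any nearby cache and verify each one independently. The sketch below shows that idea only in simplified form; the block size and hash choice are illustrative, not the values the PeerDist (MS-PCCRC) specification mandates:

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # illustrative; PeerDist defines its own sizes

def content_ids(data: bytes):
    """Split data into blocks and return each block's hash.
    A client holding these IDs can retrieve blocks from any peer
    cache and verify them without trusting the peer."""
    ids = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        ids.append(hashlib.sha256(block).hexdigest())
    return ids

def verify_block(block: bytes, expected_id: str) -> bool:
    """Integrity check on a block fetched from an untrusted peer."""
    return hashlib.sha256(block).hexdigest() == expected_id
```

Because verification needs only the IDs (delivered over the trusted WAN link), the bulk data can come from an untrusted local peer, which is what makes the WAN-acceleration scenario work.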
A Method to Back up and Restore Configuration Settings for each and every Component in the SAN Environment using SMI-S and also Replicate the Configuration on Clean Setups with the Same or Similar Components
Dhishankar Sengupta, Test Architect, NetApp
Today, all shipping disaster recovery solutions communicate with APIs from different vendors' devices to retrieve information and perform management operations. To perform these operations, it is very important that the solution be interoperable with the APIs provided by the vendors. The method proposed in this paper overcomes the issues arising from dependency on vendor APIs. The solution is a software stack that backs up the configuration of devices participating in a SAN; upon failure of a site, or of a device in a site, it can replicate the configuration that existed on the previous site/device onto the new site/device. The solution is based on SNIA standards, using the SMI providers developed by each device vendor. Since SMI providers are built on a standard CIM model to which all devices seek compliance, the interoperability issues between individual devices are overcome, and developing a solution comprising different vendors' products becomes much easier, faster, and more cost effective.
- How to easily migrate logical configuration/data from any Storage Vendor to any storage platforms of a different vendor
- How to enable replication of the configuration settings from a NAS system to a SAN system thereby allowing the customer to migrate from a NAS setup to a BLOCK array based on any changes in the application environment, without much fuss
- How administrators can use the SMI-S standard to create back up objects or restoration objects irrespective of Vendor mismatch or any other mismatches on both the setups to be backed up and the setup on which the restoration is to happen
Building the Public Storage Utility
Wesley Leggette, Software Architect, Cleversafe, Inc.
The holy grail of cloud storage is to make storage into a utility: a ubiquitous, standard, public resource, in the same sense that electricity and tap water are today. What makes utility storage difficult is that, unlike water or electricity, each user's data is unique and private. In this presentation we propose a solution to this problem. Our proposed solution enables a global, anonymous public storage service where the storage system has no knowledge about users or the data or metadata they store, yet each user has their own private and secure storage space. Further, we consider some of the payment options that exist within a fully anonymous storage utility.
- Requirements of utility storage
- How to secure data without knowledge of user identities
- Methods for anonymous purchase of storage resources
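One way to read "no knowledge about users" is that the service stores only client-encrypted blobs under unlinkable random tokens, with no accounts at all. The sketch below illustrates that shape only; it is not Cleversafe's design, and the SHA-256 counter-mode keystream is a demonstration stand-in, not production cryptography:

```python
import hashlib
import os

def keystream(key: bytes, n: int) -> bytes:
    """Demo-only keystream (NOT secure crypto): SHA-256 in counter mode."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

class AnonymousStore:
    """Server side: a map from opaque tokens to opaque ciphertext.
    No accounts, no identities, no plaintext metadata."""
    def __init__(self):
        self.blobs = {}
    def put(self, token, blob):
        self.blobs[token] = blob
    def get(self, token):
        return self.blobs[token]

def client_put(store, key: bytes, data: bytes) -> bytes:
    token = os.urandom(16)  # unlinkable object handle
    ct = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    store.put(token, ct)
    return token            # the user keeps (token, key) privately

def client_get(store, key: bytes, token: bytes) -> bytes:
    ct = store.get(token)
    return bytes(a ^ b for a, b in zip(ct, keystream(key, len(ct))))
```

All user knowledge (the token and the key) stays on the client, so the operator can bill and serve blobs without ever learning who owns what.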
Testing iSCSI / SCSI Protocol Compliance Using Libiscsi
Ronnie Sahlberg, Google
Presenting the libiscsi iSCSI/SCSI test suite. There are many independent iSCSI/SCSI software targets on the market today but no real iSCSI/SCSI test suites. My experience from writing tests and testing many popular iSCSI/SCSI implementations has shown me there is a real need for a good test suite. The libiscsi userspace initiator library comes with a comprehensive iSCSI/SCSI test suite - the most complete and comprehensive open test suite available to SCSI target implementors today. This presentation will cover the structure of libiscsi and the test suite, and how to add more tests. We will look at source code, run a short demo against a software target, and talk about how this test suite can help make your target better. This presentation targets iSCSI target developers. It aims to show a test suite that can be easily extended and applied in an automated regression test framework, and the benefits that brings to target quality.
- SCSI/iSCSI testing and test suites
- Automated protocol compliance testing for your development workflow
- Creating SCSI protocol compliance tests.
Message Analysis and Visualization in Heterogeneous Environments
Paul Long, Senior Program Manager, Microsoft
Microsoft Message Analyzer is the next generation tool for analyzing messages from almost any source. Diagnosis of heterogeneous systems has continued to evolve as we explore new ways to visualize information for any type of trace data, be it a text log file, comma or tab separated data, network capture, or ETW component. Discover how to import Samba debug logs directly or define Text Log adapters, then inspect, filter, and organize as structured data. Learn how to analyze your file systems interoperability with Windows without having to read documentation. Expand your understanding of the interactions by including Windows component-specific information to gain insight into deep protocol and system behaviors.
- Diagnosing any type of log data
- Discuss how visualization can aid troubleshooting
- How to discover the inner workings of Windows
- Finding interoperability issues automatically
A Software Based Fault Injection Framework for Storage Server
Vinod Eswaraprasad, Lead Architect, Wipro Technologies
Smitha Jayaram, Wipro Technologies
With the increasing complexity of storage systems, the ability to gracefully handle errors at all layers of a storage server (array firmware, driver, file system, protocols) has become a key challenge for developers. This is crucial in scalable storage environments, where error handling has to be synchronized across multiple nodes. This makes software fault injection at various layers of the stack increasingly important in storage development and testing. Currently there is no single infrastructure that allows selective injection of faults in a typical storage server implementation. While investigating this problem, we studied the available options and designed a framework that uses a combination of Kprobe and frysk at the system and protocol layers, plus a custom firmware fault injection mechanism, to simulate transient and hard errors at various layers.
- Typical storage failure events
- Simulating errors that cause Data Unavailability, Data Loss
- Usage of open source tools in implementing fault injection framework in storage development
- Robust and Fault tolerant storage stack design aspects
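At the user-space level, the selective-injection idea can be sketched as a wrapper that fails calls according to a policy. This does not reproduce the kernel-level Kprobe/frysk hooks the talk describes; the class and parameter names are illustrative:

```python
import errno
import random

class FaultInjector:
    """Wrap a callable and inject transient or hard faults by policy."""
    def __init__(self, fn,
                 error=OSError(errno.EIO, "injected I/O error"),
                 probability=0.0,   # transient: fail randomly at this rate
                 hard_after=None):  # hard: fail every call after N calls
        self.fn = fn
        self.error = error
        self.probability = probability
        self.hard_after = hard_after
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.hard_after is not None and self.calls > self.hard_after:
            raise self.error      # simulated hard (permanent) failure
        if random.random() < self.probability:
            raise self.error      # simulated transient failure
        return self.fn(*args, **kwargs)

# Example: a "disk read" that fails permanently after two successes
read_block = FaultInjector(lambda: b"data", hard_after=2)
```

Pointing such wrappers at different layers (protocol handler, file system call, firmware command path) lets a test drive the same error-handling code that real hardware faults would exercise.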
An SMB3 Engineer’s View of Windows Server 2012 Hyper-V Workloads
Gerald Carter, Sr. Consulting Software Engineer, EMC/Isilon
Many traditional physical storage workloads are well understood. Quantifying how access patterns change when a hypervisor is inserted between server applications and physical storage requires rethinking what is optimal for a NAS configuration. This presentation will examine several Hyper-V workloads from the perspective of an SMB3 implementer.
- What is the required set of SMB 3.0 features to support a Hyper-V workload? How should I prioritize my SMB3 feature development schedule for Hyper-V?
- How does Hyper-V map guest I/O requests into SMB2/3 operations? What is the resulting distribution and size of requests?
- What is the performance impact of SMB2/3 features such as LargeMTU, SMBDirect, and the “File Level Trim” IoFsControl?
Direct NFS – Design Considerations for Next-gen NAS Appliances Optimized for Database Workloads
Gurmeet Goindi, Principal Product Manager, Oracle
Akshay Shah, Principal Software Engineer, Oracle
NAS appliances have traditionally been a popular choice for shared storage, as they support the standardized and mature NFS protocol and leverage inexpensive Ethernet networking. However, the NFS protocol and traditional NAS appliances are designed for general-purpose file system storage. Database workloads are unique in the requirements they place on a storage system: different database workloads can have very different response time or bandwidth requirements, and along with the traditional database requirements of atomicity and consistency, critical database systems also have strict uptime and high availability requirements. Database workloads can convey this information to the storage system. This session will explore some novel ideas to help design the next generation of NAS appliances and integrate newer NFS protocols and breakthrough technologies such as flash storage to optimize database performance.
- Database Workloads and storage performance