Abstracts
Keynotes
Google File System
The Google File System (GFS) is a petabyte-scale distributed file system designed for large data-intensive applications. It runs on inexpensive commodity hardware, relying on fault-tolerant software to provide reliability and availability. Client applications access the file system in parallel, achieving tens of gigabytes per second in aggregate performance. The file system is widely deployed within Google. This talk will describe the architecture of GFS and highlight some of its strengths and weaknesses. The system is composed of a replicated master, which maintains the file system metadata, and a large collection of chunkservers, which provide access to file data in 64MB chunks. Clients query the master to perform name lookup and then transfer data directly with the appropriate chunkservers; this separation of file system control from data transfer enables high aggregate throughput. The use of a centralized master has the advantage of simplicity, but it has become a bottleneck as cluster sizes have grown to thousands of machines and petabytes of storage. The talk will conclude with some thoughts on how the system will evolve to overcome this limit of scalability.
Taming the Yahoo Storage Beast: How Yahoo is developing their own framework to manage storage and data protection
Yahoo's storage growth poses an abundance of management challenges. When storage capacity grows by dozens of terabytes every day, staying ahead of utilization while maintaining lofty standards of data protection requires vast amounts of bookkeeping and coordination. To facilitate this, Yahoo developed its own tool that unifies storage capacity with data protection, providing a comprehensive global view of the storage environment.
Revolution in Data Storage Technology
Since the introduction of the disk drive in 1956, the areal density of recording on magnetic disk drives has increased 65 million times. Throughout that time, there were many advances to individual components in the drive, such as the change to thin film disks and the change to first MR, then GMR and, most recently, TMR heads, but the basic technology (longitudinal recording) remained the same. Now the industry is changing the basic recording mechanism to perpendicular recording, and this is being enabled by a change in nearly every component in the drive simultaneously. This exciting revolution in technology is enabling higher capacity and higher performance drives ranging from 12 Gbyte one-inch drives for handheld devices such as cell phones to 750 Gbyte 3.5-inch drives for high performance computing applications. Perpendicular recording is expected to enable roughly a factor of 5 additional gains in capacity from today’s technology. At the expected 40% annual areal density growth rate, this would take on the order of 5 years. At that point yet another change in technology will be required. Candidates for that are Heat Assisted Magnetic Recording (HAMR) and Bit Patterned Media. These changes in technology are enabling new products that were unforeseen only a few years ago, ranging from MP3 players, to automobile GPS and entertainment systems, to personal video recorders, while also enabling larger and higher performance devices for computer storage. In addition, intelligence that is enabling previously unforeseen performance is being built into drives. Full disk encryption (FDE), for example, has been introduced to thwart thieves of laptop computers, not only making the data on the disk unavailable to anyone but the owner, but also making the stolen disk unusable by the thief.
Enterprise Grid Requirements and Architecture at eBay
Enterprise Grids are still a mystery to many, although the term captures a set of concepts and trends with which we are all familiar:
It is clear that we are all heading in this direction (at varying rates). It is also clear that scalable and highly mutable Enterprise Grids, and the disaggregated, distributed applications which run on them, place new requirements, or extend existing ones, on the management of storage, data and information. This presentation uses the real world, and perhaps somewhat extreme, example of eBay’s infrastructure to illustrate the nature of Enterprise Grids today, to show where they are heading in the near and long terms, and to provide a context for articulating requirements for the management of storage, data and information.
Featured Speakers
Storage Networking Standards: Future Directions
Interoperability standards play a vital role in customer adoption and advancement of storage networking technologies and systems. Storage networking is based on a broad spectrum of standards (developed by multiple standards organizations) in areas such as Fibre Channel (INCITS T11), SCSI (INCITS T10), iSCSI (IETF), and storage management (SNIA, DMTF, IETF). The current state and future direction of standards development can provide useful insights into technology developments. This tutorial covers storage networking standards and the role that the resulting standardized interfaces/functionality play in networked storage infrastructure.
A Different Perspective on Storage Resource Management
Trends of data center integration have led to an environment where concerns are shifting from management of individual, hierarchical storage systems toward the end-to-end management of data across data center, line of business and organizational boundaries. These issues can be addressed by integrating storage into a distributed management environment in which all aspects of an enterprise storage environment can be provisioned, reserved, scheduled and coordinated with demand in response to business objectives. The benefits of taking this approach include predictable service levels (increased user trust), more efficient utilization of existing storage infrastructure (operating cost savings), more opportunities to leverage commodity storage hardware for mission critical applications (capital and operating cost savings), and improved alignment of data requirements and business needs (improved IT responsiveness). In this presentation Dr. Kesselman will describe how this new perspective can be achieved in practice and provide examples showing its benefits.
More Storage on the Lunatic Fringe
This presentation will provide you with some insight into data storage problems faced by various government and commercial organizations that live on the extreme edge of requirements, and into the technologies to meet those requirements. Managing trillions of files and petabytes of metadata. Achieving millions of I/Os per second or terabytes per second to disk. It is a look into your future. If you don't believe me, then come sit through the talk.
Requirements
Tutorial Track A
The Backbone of SMI-S: CIM and WBEM
Jim Davis
The SNIA's Storage Management Initiative Specification (SMI-S) is based on the DMTF's Common Information Model (CIM) and Web-Based Enterprise Management (WBEM). This session will provide the CIM and WBEM overview developers need to effectively understand the foundation architecture of SMI-S.
iSCSI: Protocol and Functionality
David Black
The iSCSI protocol (SCSI over TCP/IP) is the foundation for many IP Storage solutions. This talk covers what the iSCSI protocol does and how it works, including the ways in which iSCSI fits into storage and networking infrastructure. Closely related topics such as iSCSI security, boot and multipathing are also addressed.
Tutorial Track B
ZFS: The Last Word in File Systems
Jeff Bonwick, Bill Moore
ZFS is a new kind of filesystem that provides simple administration, transactional semantics, provable end-to-end data integrity, and immense scalability using cheap, commodity hardware. ZFS presents a pooled storage model that eliminates the concept of volumes and the attendant problems of provisioning, stranded storage, and wasted bandwidth. ZFS introduces a new data replication scheme, RAID-Z, that provides RAID-5/6 semantics with no partial write penalty, no write hole and no need for NVRAM. ZFS also provides pipelined I/O; unlimited, constant-time snapshots and clones; extremely fast incrementals (snapshot deltas); disk scrubbing; and built-in compression.
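For readers unfamiliar with the "RAID-5 semantics" mentioned in the ZFS abstract, the following is a minimal sketch of generic single-parity protection: one XOR parity block per stripe allows any one lost block to be rebuilt from the survivors. It is not the RAID-Z on-disk layout or algorithm, which the abstract does not describe; the function names `xor_blocks`, `make_stripe` and `rebuild` are hypothetical and exist only for this illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block.

    Generic single-parity stripe (hypothetical helper, not RAID-Z itself).
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from every other block in the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


data = [b"abcd", b"efgh", b"ijkl"]
stripe = make_stripe(data)

# Pretend the device holding block 1 failed and rebuild its contents.
recovered = rebuild(stripe, 1)
```

Because XOR is its own inverse, the same `rebuild` call works whether the lost block held data or parity.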
iSCSI Track
iSCSI Boot Support in Windows
Shiv Rajpal, Alan Warwick
Windows iSCSI boot support allows computers without a local physical hard disk drive to boot Windows 2003 Server. Instead of the local physical hard disk, Windows uses an iSCSI disk in the SAN. This talk describes the requirements, functionality and technical details of the implementation of Windows iSCSI boot support on Server 2003.
Managing iSCSI to Fibre Channel Bridge with SMI-S
John Crandall
Technologies have recently been implemented to allow iSCSI hosts to access storage in the Fibre Channel fabric. This tutorial examines how the bridge creates 'virtual' iSCSI targets and 'virtual' Fibre Channel initiators, what a client will discover when this technology is enabled, and how this profile leverages existing profiles/subprofiles in SMI-S to discover the bridge and expose Fibre Channel targets to the iSCSI initiators.
iSCSI Testing: How does an iSCSI interface impact the testing of a RAID Storage System?
M. K. Jibbe
Testing an iSCSI host interface of a RAID Storage System at the development level in a RAID Storage environment raises a lot of challenges. The amount of testing should not be limited to the unit module tests using basic implementation verification and standard RAID testing or the iSCSI plug fest, because the iSCSI interface introduces into development module testing time windows that are related to login, discovery, multiple sessions, CHAP, and others. There are specific features and application compatibility that are supported by the RAID product and must be verified during the testing of the RAID controller host interface. Learn the following:
iSCSI Management API version 2
Neeraj Kuppam
As iSCSI deployments become more widespread, various initiator implementations have arisen, each with its own proprietary management scheme and interface. Managing iSCSI in such an environment is technically challenging, if not impossible. The iSCSI Management API (IMA) specifies a common interface that can be used by management applications to enumerate and control iSCSI objects in a common way, avoiding proprietary API calls and vastly simplifying the general management problem. Based on the notion of managed objects, similar to the Fibre Channel HBA API, and mapping into the SMI-S iSCSI profiles, IMA is implemented with a hardware independent common library and one or more hardware dependent plugins. This talk gives an overview of IMA version 1 and discusses the updates for IMA version 2 in detail. Included in the talk are details about the objects, API calls, and high-level management functions that it does and does not provide, as well as how IMA relates to other management standards such as SMI-S. The talk concludes with the current status and future direction of the IMA. Learn the following:
Storage Security Track
Trusted Storage
Michael Willett
Storage Systems, such as disk drives, and other computing-system peripherals are critical components of a security, privacy, and trust configuration of a computing platform. This session provides a framework with which to understand why and how peripheral devices should be secured as independent roots of trust. The framework provides a generic security model for all peripheral devices, and shows how peripherals can be configured as roots of trust, each playing a complementary role in establishing the overall security and privacy goals of platform-based and networked computing. The session begins with security measures for storage systems that exist today and their relative effectiveness. It will then go into where and how to secure access control of the storage system, discussing in detail what needs to be controlled and how to grant control in a secure manner. The Trusted Computing Group's Trusted Storage Use Cases will be reviewed in depth, highlighting the technical requirements being solved by the formal specifications. Relationships and cooperation with other industry storage standards (e.g. SCSI and ATA) will be discussed, and the TCG's specification for secure and trusted storage will be outlined (anticipated publication: June 2006). Representative Use Cases for trusted storage include:
All major hard drive manufacturers are participating in the development of these specifications, as well as flash, optical, and tape memory representatives. The result will be that storage systems, where sensitive data spends most of its lifetime, will be a source of trust for multi-component trusted platforms of the future.
New Storage Security Initiatives
Roger Cummings
This presentation will review the storage security initiatives that have been undertaken in various standards organizations subsequent to the first SNIA Security Summit held in 2002. It will present an overview of each activity, and compare and contrast the characteristics of the definitions being produced. It will also attempt to identify and prioritize any "gaps" that may exist, and propose future SNIA activities to address those issues. Learning Objectives:
Data Disposal – Gone for Good
Eric Schafer
This tutorial will educate users about the methods and challenges associated with disposing of data on magnetic storage media. This tutorial will cover data disposal requirements per the Health Insurance Portability and Accountability Act (HIPAA), the Gramm-Leach-Bliley Act (GLB Act) and the Department of Defense. The data disposal methods covered will range from basic methods to more advanced techniques, including deleting files, storing data, overwriting, destruction and degaussing. This tutorial will discuss the challenges of each data disposal method to help users define and manage security risks. The data disposal challenges discussed will be based on scientific evidence produced by research universities sponsored by the National Security Agency (NSA), followed by current NSA guidelines for proper disposal of data on magnetic storage media.
Emerging Technologies Track
NFS v4.1: pNFS: Meeting Scalable Storage Requirements of Grid Computing
Brent Welch
This presentation describes the pNFS extension to NFSv4, and discusses its application to the storage requirements of grid computing. pNFS eliminates the performance bottleneck in NFS servers by providing direct, parallel I/O paths between clients and storage devices, including block devices, object storage devices, and other file servers. pNFS benefits local grid computations of hundreds and thousands of compute nodes that share storage, and has wide area applications for high speed transfer of large datasets.
NFS/RDMA
Tom Talpey
NFS/RDMA now runs at high performance on multiple industry-standard platforms, including Linux, OpenSolaris, NFSv3, NFSv4, InfiniBand and iWARP. We will explore the NFS/RDMA protocol landscape, and make a detailed exploration of its performance results and benefits on these operating systems with representative hardware. The data are compelling, and will enable NFS to address new horizons in performance-critical application areas such as grid computing.
Architecting Storage in a Virtual Infrastructure
Bob Slovick
The increasing adoption of Virtual Machine Infrastructure opens up new possibilities for data protection, but it also poses unique challenges in architecting an optimal solution. This session will describe the many characteristics unique to architecting storage in a 'Virtual Infrastructure', as well as design considerations and trade-offs. The talk will also discuss lifecycle management and Disaster Planning/Disaster Recovery strategies unique to this environment.
Technical Overview of SNIA RAID Disk Data Format and RAID-6 Extensions
Ramamurthy Krithivas
This session shall address the following:
Management Framework
Mark Carlson
SMI-S has laid the foundation for management software to be written to one interface for each type of device instead of a separate software adapter for each vendor’s device. However, this has not resulted in sufficient acceleration of value for the end user customers. There is a significant effort in creating management software around common components such as persistence, data collectors, and event management. All management software needs to create these common components before it can start adding value in the actual management applications. This talk will discuss some of those components and the author's thoughts on how to standardize them. Objectives:
APERI Technical Overview
Todd Singleton
The mission of the Aperi project is to create a vendor-neutral, open, storage management framework and to cultivate both an open-source community and an ecosystem for complementary products, capabilities, and services around the framework. Goals of the Aperi project include promoting interoperability, reducing complexity and incompatibility, fostering greater opportunity for innovation, and providing a greater choice of value-added functionality for end-users. Presentation Takeaways:
Lesser Bits: Experimental and Non-Standard Block I/O Systems on the Cheap
Fibre Channel and iSCSI are the standards in network block storage, but experimental protocols such as NBD, DRBD, and AoE are finding their way out of test SANs and into niche markets. These systems are typically specialized in one way or another, allowing them to meet needs that the heavy-duty, general purpose SAN protocols typically don't. This presentation will look at the strengths and weaknesses of these upstart block protocols and the future of standard technologies.
The Role of the DMTF's SMASH in a Storage Environment
The DMTF's Systems Management Architecture for Server Hardware (SMASH) initiative is a suite of specifications that deliver architectural semantics, industry standard protocols and profiles to help unify the management of the data center. The SMASH Server Management Command Line Protocol specification enables simple and intuitive management of heterogeneous servers, which play a critical role in today's storage solutions, independent of machine state, operating system state, server system topology or access method, facilitating local and remote management of server hardware in both Out-of-Service and Out-of-Band management environments. This session will explain how SMASH can deliver renewed simplicity when managing storage as part of a distributed environment.
Storage Grids: Relevance and Overview
Abbott Schindler
Storage infrastructures based on new architectures (storage grids) are emerging in the marketplace. Storage grids promise to change both storage and overall IT deployment landscapes. This session will explore what storage grids are, their basic elements and capabilities, and how the industry is implementing them. Also, in an emerging world of grids, the tutorial will aid in understanding how storage, compute, and application grids can work together. This includes a differentiation of "storage for grid computing" and storage grids.
Finally, standards envisioned to facilitate storage grid integration will be discussed, and storage grids will be compared to conventional storage in terms of business benefits and compatibility with existing storage environments, as well as possible grid integration and application task distribution in future data centers.
CIFS Track
Exploring the SMB2 Protocol
Andrew Tridgell
The recent Microsoft Vista test releases have introduced a new protocol variant of the SMB/CIFS protocol called 'SMB2'. The Samba Team, in conjunction with Ronnie Sahlberg from the Ethereal team, has been conducting experiments to map out this protocol and produce an initial implementation. The results show that while the structure of protocol requests has changed dramatically between SMB and SMB2, most of the underlying functions will be very familiar to anyone used to SMB, and mapping between the two protocols is straightforward in most cases. In this presentation I will describe what has been discovered about the structure of the SMB2 protocol, and what we have completed in terms of an initial implementation.
Multihead Samba Export using GPFS
Sven Oehme
This talk will cover IBM's implementation experience in creating a multinode CIFS exporter on top of the IBM GPFS file system to create a Linux-based, large-scale CIFS server.
Update on CIFS Unix Extensions
Jeremy Allison, Steve French
Although "Unix Extensions" to the CIFS Protocol were implemented in multiple clients and servers and allowed better file system semantics for applications running on network mounts, their local/remote transparency was imperfect. To provide better support for applications that require near identical behavior and function over local and remote mounts, additional extensions to the CIFS protocol have been developed and implemented in the Linux client and Samba server, among others. In some cases this included features that were not available with other network file systems such as NFS. This presentation will describe the current extensions to the CIFS protocol, which include support for POSIX ACLs, POSIX advisory locks, and statfs, as well as a negotiation mechanism for POSIX features including case sensitivity.
We will also discuss the newest features of the protocol which are being added for better application support, and some of the challenges, including support for system xattrs, inotify and more efficient file/directory change notification, lease/caching/oplock management, NFSv4 ACL mapping and direct I/O.
ILM and Content Aware Storage Track
Status of ILM Standards Development
Edgar St.Pierre
This presentation will review the latest status of ILM standards development, work in progress, and a roadmap of future development. This will include a look at the architecture and evolving object model. It will also examine the application and use cases for these standards as they relate to data center resource management, service level management, and evolving automated computing solutions. Key learning objectives from this talk are:
Why Content Aware Storage?
Zoran Cakeljic
CAS is a new category of automated networked storage established to store large volumes of fixed content over extended periods of time. Unlike NAS, which is designed to facilitate collaboration and file sharing, or SAN, which focuses on performance, CAS is specifically designed for fixed content which might have a significantly extended life-cycle compared to transactional data. The essential characteristics of CAS are:
By the end of this talk we hope you will not only be intrigued with a new way of thinking about storage, but also challenged to think of new ways of applying a new paradigm, CAS, understand the properties that constitute CAS, and be able to differentiate this technology from SAN and NAS.
XAM - The Next Interface Standard
Zoran Cakeljic, Christina Casten
The Fixed Content Technical Work Group (FCAS TWG) within the Storage Networking Industry Association (SNIA) is developing the XAM (eXtensible Access Method) specification for a standard interface between applications and storage providers. XAM will impact legacy storage practices and change current business practices. Real business requirements from business units will demand that the data they create be managed and meet various retention requirements in a cost effective manner. XAM compliant devices will be one of the available resources for maintaining the longevity of business data. By abstracting the physical assets of storage with a data persistence perspective, the underlying storage can age and run the course of its product lifecycles while the XAM architecture provides the consistent view of the managed content and all of its attributes. This capability is extremely important as the adaptive datacenter will also be able to age and retire physical assets while preserving the various services through other abstractions for the different types of resources. The key learning objectives for this talk are:
Object Based Storage Track
Object-based Storage Devices (OSD) - Architecture and Systems
Rich Ramos, Sami Iren
The Object-based Storage Device (OSD) interface standard is focused on moving chosen low-level storage, space management, and security functions into storage devices (disks, subsystems, appliances) to enable the creation of scalable, self-managed, protected and heterogeneous shared storage for storage networks. The SCSI Object-based Storage Device Command Set (OSD-1) was ratified by ANSI in September 2004, after more than four years of joint work in the SNIA OSD Technical Work Group, including storage device, storage subsystem, and software companies with help from several universities and research groups. Work on command set extensions to be standardized as OSD-2 is currently underway.
Experiences in Building an Object-Based Storage System based on the OSD T-10 Standard
Aravindan Raghuveer
With ever increasing storage demands and management costs, object based storage is on the verge of becoming the next standard storage interface. The American National Standards Institute (ANSI) ratified the object based storage interface standard (also referred to as OSD T-10) in January 2005. In this paper we present our experiences building a reference implementation of the T10 standard based on an initial implementation done at Intel Corporation. Our implementation consists of a file system, object based target and a security manager. Efforts are underway to open source our implementation very soon. We also present performance analysis of our implementation and compare it with an iSCSI-based SAN and a NAS storage configuration. In the future, we intend to use this implementation as a platform to explore different forms of storage intelligence.
Data Path Track
Technical Overview of CDP
Michael Rowan
Continuous Data Protection (CDP) has become one of the most exciting developments in data protection over the last 5 years, with both end user experience and analysts' opinions converging on the radical improvements CDP can make in an organization's Service Level Agreements (SLAs) around application availability, backup and recovery. In an environment where data corruption and disaster are becoming ever more important problems to solve, combined with data center environments that are becoming ever more complex and difficult to run and protect, CDP is a breath of fresh air, delivering an order-of-magnitude improvement in availability and recovery as well as massively simplifying the data center's runbook around backup and restoration. CDP has some incredibly powerful implications for today's application and business environment, and this talk will introduce you to the concept and give you a solid overview of why everyone is so excited about it.
Storage Management Track
Overview of SMI-S v1.1.0 and v1.2.0
Michael Walker
This presentation will provide an overview of the contents of SMI-S 1.1.0, with an emphasis on what is new in SMI-S 1.1.0. In addition, the talk will provide an overview of what is planned for SMI-S 1.2.0. This will include new Profiles, enhanced Profiles and other changes planned for SMI-S 1.2.0. This includes the break-up of SMI-S into 9 books based on topic area. Attendees will walk away with the following key knowledge:
Multi-tenant CIMOM
Steve Peters
This presentation will cover problems encountered when software from multiple companies tries to share the SMI-S/CIM/WBEM infrastructure. The solutions will include information about SNIA SMI-S enhancements to solve this problem as well as changes to the Pegasus open source CIMOM.
Developing CIM Indication Providers and Listeners Using Pegasus
An Lam
Developing CIM Indication providers and listeners is considered a challenging task for many new developers due to the complex nature of indications and the lack of comprehensive tutorials on this topic. This presentation discusses both concepts and implementations. Some key concepts such as the CIM Indication class hierarchy, indication filter, indication handler, subscription and delivery process are covered. A live demo and detailed coding examples demonstrate step-by-step how to write indication providers and listeners. Pegasus is used as a CIM Server for this demo. However, the concepts and implementations can be applied to any other CIM Server. After attending this tutorial, the audience will clearly understand how the CIM event model works and how to develop supporting code.
"Best of FAST" Track
The following talks were selected as the two best papers at the last File and Storage Technology Conference. The FAST Conference is an academic and industry research conference put on by the USENIX organization.
Ursa Minor: Versatile Cluster-based Storage
John Strunk
No single encoding scheme or fault model is optimal for all data. A versatile storage system allows them to be matched to access patterns, reliability requirements, and cost goals on a per-data item basis. Ursa Minor is a cluster-based storage system that allows data-specific selection of, and on-line changes to, encoding schemes and fault models. Thus, different data types can share a scalable storage infrastructure and still enjoy specialized choices, rather than suffering from "one size fits all." Experiments with Ursa Minor show performance benefits of 2–3x when using specialized choices as opposed to a single, more general, configuration. Experiments also show that a single cluster supporting multiple workloads simultaneously is much more efficient when the choices are specialized for each distribution rather than forced to use a "one size fits all" configuration. When using the specialized distributions, aggregate cluster throughput nearly doubled.
On Multidimensional Data and Modern Disks
Steven Schlosser
With the deeply-ingrained notion that disks can efficiently access only one dimensional data, current approaches for mapping multidimensional data to disk blocks either allow efficient accesses in only one dimension, trading off the efficiency of accesses in other dimensions, or equally penalize access to all dimensions. Yet, existing technology and functions readily available inside disk firmware can identify non-contiguous logical blocks that preserve spatial locality of multidimensional datasets. These blocks, which span on the order of a hundred adjacent tracks, can be accessed with minimal positioning cost.
This paper details these technologies, analyzes their trends, and shows how they can be exposed to applications while maintaining existing abstractions. The described approach can achieve the best possible access efficiency afforded by the disk technologies: sequential access along primary dimension and access with minimal positioning cost for all other dimensions. Experimental evaluation of a prototype implementation demonstrates a reduction of overall I/O time for multidimensional data queries between 30% and 50% when compared to existing approaches.
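To make the problem statement concrete, here is a minimal sketch of the conventional row-major linearization that the abstract contrasts itself with, not the paper's own mapping. It assumes one dataset cell per logical block, and the names `row_major_lbn` and `neighbor_distance` are hypothetical. It shows that cells adjacent along the last (primary) dimension land on consecutive blocks, while cells adjacent along the other dimensions are separated by large strides.

```python
def row_major_lbn(coord, shape):
    """Map an N-dimensional cell coordinate to a linear block number.

    Plain row-major order: the last dimension varies fastest.
    Assumes one cell per logical block (an illustrative simplification).
    """
    lbn = 0
    for c, extent in zip(coord, shape):
        if not 0 <= c < extent:
            raise IndexError("coordinate outside dataset bounds")
        lbn = lbn * extent + c
    return lbn


def neighbor_distance(shape, dim):
    """Distance in blocks between two cells that are adjacent along `dim`."""
    origin = [0] * len(shape)
    step = list(origin)
    step[dim] = 1
    return row_major_lbn(step, shape) - row_major_lbn(origin, shape)


shape = (64, 64, 64)
# One entry per dimension: how far apart neighboring cells end up on disk.
strides = [neighbor_distance(shape, d) for d in range(len(shape))]
```

For a 64 x 64 x 64 dataset the distances are 4096, 64 and 1 blocks, so only a scan along the last dimension is sequential under this layout.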