Big Data



The Abstracts

Introduction to Analytics and Big Data - Hadoop
Rob Peglar
Download

This tutorial serves as a foundation for the field of analytics and Big Data, with an emphasis on Hadoop. An overview of current data analysis techniques, the emerging science around Big Data, and an introduction to Hadoop will be presented. Storage techniques and file system design for the Hadoop File System (HDFS), along with implementation tradeoffs, will be discussed in detail. This tutorial is a blend of non-technical and introductory-level technical detail, ideal for the novice.
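
As a minimal illustration of what working with HDFS looks like in practice, the sketch below uses Hadoop's Java FileSystem API to write and then read back a small file. It is only a sketch, not material from the tutorial: it assumes a reachable HDFS cluster whose fs.defaultFS is configured in core-site.xml on the classpath, and the /tmp path is purely hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS (e.g. hdfs://namenode:8020) from core-site.xml.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Write a small file: the client streams data blocks to DataNodes,
    // while the NameNode tracks only the file's metadata.
    Path file = new Path("/tmp/hdfs-demo.txt");  // hypothetical path
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeUTF("hello hdfs");
    }

    // Read the file back.
    try (FSDataInputStream in = fs.open(file)) {
      System.out.println(in.readUTF());
    }

    fs.close();
  }
}
```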

Learning Objectives

  • Gain a further understanding of the field and science of data analytics 
  • Comprehend the essential differences surrounding Big Data and why it represents a change in traditional IT thinking
  • Understand introductory-level background detail around Hadoop and the Hadoop File System (HDFS)

Protecting Data in the "Big Data" World
Thomas Rivera
Download

Data growth is explosive, and these "Big Data" repositories need to be protected. In addition, new regulations are mandating longer data retention, and the job of protecting these ever-growing data repositories is becoming even more daunting. This presentation will outline the challenges and the methods that can be used to protect "Big Data" repositories.

Learning Objectives

  • Understand the unique challenges of managing and protecting "Big Data" repositories.
  • Understand the various technologies available for protecting "Big Data" repositories.
  • Understand the data protection considerations for "Big Data" repositories in various environments, including disaster recovery/replication, capacity optimization, etc.

How to Cost Effectively Retain Reference Data for Analytics and Big Data
Molly Rector
Download

The need to store critical digital content at the petabyte level and beyond is quickly moving beyond the capabilities of traditional storage solutions. In addition, the complexity of many of today’s storage solutions can prevent companies from migrating to a solution that best fits their needs. Many companies are turning to active archives, an approach that is quickly gaining traction among organizations that regularly manage high-volume reference data or face the exponential data growth associated with Data Analytics and Big Data. Software, tape, and disk vendors alike are combining the best of these technologies to provide more efficient and functional solutions. According to InterSect360, 35% of Big Data users are already using tape to help access and retain the data necessary for long-term analytics.

Learning Objectives

  • What to consider when implementing an active archive for storage. 
  • How to store your data for maximum impact and analytic reference.
  • How to obtain reliable access to data storage and analytics in a cost-effective manner. 

Big Data Storage Options for Hadoop
Dr. Sam Fineberg
Download

Companies are generating more and more information today, filling an ever-growing collection of storage devices. The information may be about customers, web sites, security, or company logistics. As this information grows, so does the need for businesses to sift through it for insights that will lead to increased sales, better security, lower costs, etc. The Hadoop system was developed to enable the transformation and analysis of vast amounts of structured and unstructured information. It does this by implementing an algorithm called MapReduce across compute clusters that may consist of hundreds or even thousands of nodes. This presentation will look at Hadoop from a storage perspective. It will describe the key aspects of Hadoop storage, the built-in Hadoop file system (HDFS), and other options for Hadoop storage that exist in the commercial, academic, and open source communities.
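
To make the MapReduce model concrete, below is a minimal sketch (in Java, not drawn from the presentation itself) of the canonical Hadoop word-count job: each map task tokenizes its input split and emits (word, 1) pairs, and the reducer sums the counts for every word across the cluster. Input and output paths are supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel across the cluster, one task per input split,
  // emitting (word, 1) for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework groups values by key, so each call receives
  // every count emitted anywhere in the cluster for one word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this would typically be launched with `hadoop jar` against directories in HDFS; the scheduler tries to place each map task on a node holding a replica of its input block, which is why Hadoop's performance is so tightly coupled to the underlying storage layout.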

Learning Objectives

  • Understand the basics of Hadoop, both from a compute and storage perspective.  Understand how Hadoop uses storage, and how this is strongly tied to its native filesystem, HDFS. 
  • Understand what other storage options have been adapted to work with Hadoop and how they differ from HDFS.
  • Understand the key tradeoffs between storage options including performance, reliability, efficiency, flexibility, and manageability.

Massively Scalable File Storage
Philippe Nicolas
Download

The Internet changed the world and continues to revolutionize how people connect, exchange data, and do business. This radical change is one of the causes of the rapid explosion in data volume, which requires a new approach to data storage design. One common element is that unstructured data now rules the IT world. How do the famous Internet services we all use every day scale with thousands of new users added daily while continuing to deliver an enterprise-class SLA? What technologies sit behind a Cloud Storage service that supports hundreds of millions of users? This tutorial covers technologies introduced in the well-known papers on the Google File System, BigTable, Amazon Dynamo, and Apache Hadoop. In addition, parallel, scale-out, distributed, and P2P approaches such as Lustre, PVFS, and pNFS, along with several proprietary ones, are presented. The tutorial also examines key features essential at large scale, to help attendees understand and differentiate industry vendors' offerings.

Learning Objectives

  • Understand the needs and challenges of large storage environments
  • Learn the network, parallel, distributed, and P2P technologies used in industry solutions
  • Examine the design, advantages, and drawbacks of the various approaches

Can You Manage at Petabyte Scale?
John Webster
Download

Stored data growth within the data center now averages 50-80% on a compounded annual growth rate (CAGR) basis, and storage administrators can expect growth rates to accelerate over the next three to five years. Acceleration will be driven by a number of factors we can identify now, including mobile devices, analytics applications, and a growing list of data sources external to the data center. The cost of acquiring, managing, and maintaining data center storage is already the largest line item in the IT hardware budget. With demand for added capacity accelerating, IT management must now develop new and more sustainable practices and processes to adequately manage data growth. This presentation will review the storage-based technologies and best practices for managing at petabyte scale.

Learning Objectives

  • Understand the factors that will drive accelerating demand for additional storage capacity over the next three to five years.
  • Learn which storage system-based technologies can be used to manage storage at petabyte scale.
  • Learn best practices for managing accelerating storage growth.
