Big Data
Material on this page is intended solely for the purpose of content review by SNIA members. Tutorial material may be read and commented upon by any SNIA member, but may not be saved, printed, or otherwise copied, nor may it be shared with non-members of the SNIA. Tutorial managers are responsible for responding to all comments made during the open review period. No responses will be given to comments made outside the open review period.
The Abstracts
Introduction to Analytics and Big Data - Hadoop
This tutorial serves as a foundation for the field of analytics and Big Data, with an emphasis on Hadoop. It presents an overview of current data analysis techniques, the emerging science around Big Data, and Hadoop itself. Storage techniques and file system design for the Hadoop Distributed File System (HDFS), along with implementation tradeoffs, will be discussed in detail. The tutorial blends non-technical material with introductory-level technical detail and is ideal for the novice.
Protecting Data in the "Big Data" World
Data is growing explosively, and the resulting "Big Data" repositories need to be protected. In addition, new regulations mandate longer data retention, making the job of protecting these ever-growing repositories even more daunting. This presentation will outline the challenges and the methods that can be used for protecting "Big Data" repositories.
How to Cost Effectively Retain Reference Data for Analytics and Big Data
The need to store critical digital content at the petabyte level and beyond is quickly outgrowing the capabilities of traditional storage solutions. In addition, the complexity of many of today’s storage solutions can prevent companies from migrating to a solution that best fits their needs. Many companies are turning to active archives, an approach that is quickly gaining traction among companies that regularly manage high-volume reference data or face exponential data growth associated with Data Analytics and Big Data. Software, tape and disk vendors alike are combining the best of these technologies to provide more efficient and functional solutions. According to InterSect360, 35% of Big Data users are already using tape to help access and retain the data necessary for long-term analytics.
Big Data Storage Options for Hadoop
Companies are generating more and more information today, filling an ever-growing collection of storage devices. The information may be about customers, web sites, security, or company logistics. As this information grows, so does the need for businesses to sift through it for insights that will lead to increased sales, better security, and lower costs. The Hadoop system was developed to enable the transformation and analysis of vast amounts of structured and unstructured information. It does this by implementing an algorithm called MapReduce across compute clusters that may consist of hundreds or even thousands of nodes. This presentation looks at Hadoop from a storage perspective. It will describe the key aspects of Hadoop storage, the built-in Hadoop file system (HDFS), and other options for Hadoop storage that exist in the commercial, academic, and open source communities.
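As a rough illustration of the MapReduce idea mentioned in this abstract, the sketch below counts words in a single Python process. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample records are hypothetical, and a real cluster would run the map and reduce steps in parallel on many nodes, reading input from a distributed file system.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for each word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: two short text records.
counts = word_count(["big data", "Big storage"])
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle be spread across many machines, which is the property the abstract refers to.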
Massively Scalable File Storage
The Internet changed the world and continues to revolutionize how people are connected, exchange data and do business. This radical change is one of the causes of the rapid explosion in data volume, which requires a new approach to data storage and design. One common element is that unstructured data rules the IT world. How can the well-known Internet services we all use every day support and scale with thousands of new users added daily while continuing to deliver an enterprise-class SLA? What are the technologies behind a Cloud Storage service that supports hundreds of millions of users? This tutorial covers technologies introduced by the well-known papers on the Google File System and BigTable, Amazon Dynamo, and Apache Hadoop. In addition, Parallel, Scale-out, Distributed and P2P approaches are presented, including Lustre, PVFS and pNFS along with several proprietary ones. The tutorial also covers some key features that are essential at large scale, to help understand and differentiate vendors' offerings.
Can You Manage at Petabyte Scale?
Stored data growth within the data center now averages 50-80% on a compounded annual growth rate (CAGR) basis. However, it is anticipated that storage administrators will experience accelerating growth rates over the next three to five years. Acceleration will be driven by a number of factors we can already identify, including mobile devices, analytics applications, and a growing list of data sources that are external to the data center. The cost of acquiring, managing and maintaining data center storage is already the largest line item in the IT hardware budget. With demand for added capacity accelerating, IT management must now develop new and more sustainable practices and processes to adequately manage data growth. This presentation will review the storage-based technologies and best practices for managing at petabyte scale.
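To make the quoted 50-80% CAGR range concrete, the following sketch compounds a starting capacity over several years. The 1 PB starting point, the five-year horizon and the function name projected_capacity are assumptions chosen for illustration; they do not come from the presentation.

```python
def projected_capacity(start_pb, annual_growth, years):
    """Compound a starting capacity (in PB) at a fixed annual growth rate."""
    return start_pb * (1 + annual_growth) ** years


# Hypothetical scenario: 1 PB today, projected five years out.
low = projected_capacity(1.0, 0.50, 5)   # growth at the 50% end of the range
high = projected_capacity(1.0, 0.80, 5)  # growth at the 80% end of the range
```

Under these assumptions, 1 PB grows to roughly 7.6 PB at 50% and roughly 18.9 PB at 80% after five years, which gives a sense of why sustained growth at these rates strains existing management practices.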

