Analytics and Big Data Summit Abstracts



The Evolving Apache Hadoop Ecosystem – What It Means for Big Data Analytics and Storage Developers

Sanjay Radia, Co-Founder, Hortonworks

Abstract

This talk will cover the evolving Apache Hadoop ecosystem, its impact on the storage industry, and what Hadoop means for big data analytics.


Hadoop and Relational Database – The Best of Both Worlds for Analytics

Greg Battas, Chief Technologist, Data Management, HP

Abstract

As users embrace Hadoop for analytics, it is becoming clear that relational DBMSs are a complementary tool for solving most deep analytic problems. This presentation will present a six-step framework for analytics that starts with understanding the business problem and ends with deployment and tuning of analytic models, then use that framework as a basis to discuss how best to leverage Hadoop and relational databases throughout the process. The recommended approach is based on research and performance testing that will be reviewed in this presentation. HP has tested various portions of the analytic process in HP Labs and in its Performance Engineering Labs, using Cloudera and MapR as well as various relational database products, to understand where these technologies work best.


Top 10 Things We Learned About Hadoop

Val Bercovici, Cloud Czar, Office of the CTO, NetApp

Abstract

Hadoop continues to climb the IT hype cycle.  Along the way, plenty of truth, myth, and folklore has been created around Hadoop's business capabilities and technical infrastructure requirements.  NetApp has been actively developing, selling, and supporting Hadoop solutions for over a year now and has accumulated tangible knowledge about what it takes to build a sustainable Hadoop cluster, as well as the new business insights it enables.  Come hear Val review NetApp's real-world discoveries about Hadoop, and find out which myths need retiring and which truths need uncovering.


Big Data: What's in it for me?

Benjamin Woo, Managing Director, Neuralytix

Abstract

Big Data to date has been all about the technologies - NoSQL databases, Hadoop, in-memory processing, etc. However, at the end of the day, Big Data is about how to create value from data. This session will discuss what end-users are looking for from "Big Data" and help them move from buzz to buy.


Big Data and the Analysis Conundrum – Challenges and Opportunities

Rob Peglar, Chief Technology Officer, Americas, EMC Isilon

Abstract

This talk will cover several current topics in big data and specific analytic use cases, outlining the challenges and opportunities in the field, as well as the ethics of big data analytics. The use of Hadoop and associated toolsets, along with optimal HDFS architecture for analysis problems at scale, will be discussed and best practices outlined.

Learning Objectives

  • Gain a further understanding of the field and science of data analytics
  • Comprehend the essential differences surrounding Big Data and why it represents a change in traditional IT thinking
  • Understand introductory-level background detail around Hadoop and the Hadoop Distributed File System (HDFS)

Planning, Implementing, and Going Live with Big Data for the Enterprise

Moderator: Wayne Adams, Chairman, SNIA Board of Directors

Panelists: Benjamin Woo, Managing Director, Neuralytix; John Webster, Senior Partner, Evaluator Group; Addison Snell, CEO, Intersect360 Research; Ashish Nadkarni, IDC

Abstract

Panelists will offer their insights, recommendations, and best practices for making the journey to Enterprise Big Data.  Topics will cover IT infrastructure for big data and leveraging the cloud, IT and data analyst skill sets, big data platforms, sourcing public data from the web, data protection and security, dashboards, business alignment, and ROI.

The audience will be encouraged to get answers to their pressing questions.


Different Perspectives on Big Data: A Broad-Based, End-User View

Addison Snell, CEO, Intersect360 Research

Abstract

Addison Snell of Intersect360 Research will share a selection of research and insights from the recent study, "The Big Data Opportunity," which surveyed over 300 Big Data application users and captured differing viewpoints on Big Data in enterprise, small business, and the public sector. As part of this presentation, Addison will discuss the types and characteristics of Big Data applications and the different dimensions they take. The full study includes purchase criteria, application environments, and deep-dive technology modules on storage, compute, interconnect, and cloud technologies, with insights into the adoption of high-performance technologies for Big Data applications. This overview will present highlights from the study.


Long-Term Retention of Big Data

Dr. David Pease, Senior Technical Staff Member, Manager of Exploratory Storage Systems

Simona Rabinovici-Cohen, Research Staff Member, IBM Research - Haifa

Abstract

Generating and collecting large data sets is becoming a necessity in domains that also need to keep data for long periods. A challenge is providing economically scalable storage systems that efficiently store and preserve the data while also enabling search and analytics. There are many reasons to prefer tape storage for such archives; the Linear Tape File System (LTFS) provides efficient access to tape through familiar interfaces. To enable long-term preservation and support for the Open Archival Information System (OAIS) model, we are developing the Self-contained Information Retention Format (SIRF). SIRF consists of preservation objects and a catalog containing metadata relating to the container and to the preservation objects. We present the challenges in long-term retention of big data, and our work on combining LTFS and SIRF to address some of those challenges.
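
The abstract describes SIRF only at a high level. As a purely hypothetical sketch (not the actual SIRF specification), a self-contained retention container of this kind could pair preservation objects with a catalog along the following lines; every field name below is invented for illustration.

```python
# Hypothetical sketch of a self-contained retention container in the spirit
# of SIRF: preservation objects plus a catalog describing the container and
# each object. Field names are invented; this is NOT the SIRF specification.
import json

catalog = {
    "container": {
        "id": "container-0001",
        "format": "example-retention-format/1.0",
        "created": "2012-09-17T00:00:00Z",
    },
    "objects": [
        {
            "name": "dataset-2012-q3.tar",
            "checksum": "sha256:...",          # fixity metadata for audits
            "retention_until": "2042-01-01",   # long-term retention period
            "provenance": "ingested from lab archive",
        },
    ],
}

# Storing the catalog inside the container itself (e.g., on an LTFS tape)
# lets a future reader interpret the contents without any external system.
print(json.dumps(catalog, indent=2))
```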


Introduction to Analytics and Big Data - Hadoop

Rob Peglar, Chief Technology Officer, Americas, EMC Isilon

Abstract

This tutorial serves as a foundation for the field of analytics and Big Data, with an emphasis on Hadoop.  An overview of current data analysis techniques, the emerging science around Big Data, and Hadoop itself will be presented.  Storage techniques and file system design for the Hadoop Distributed File System (HDFS), along with implementation tradeoffs, will be discussed in detail.  This tutorial is a blend of non-technical and introductory-level technical detail, ideal for the novice.
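
To make one HDFS design tradeoff mentioned above concrete: HDFS stores files as large blocks (64 MB by default in the Hadoop 1.x era) and replicates each block, three copies by default, trading raw capacity for fault tolerance and data locality. A minimal back-of-the-envelope sketch:

```python
# Back-of-the-envelope HDFS capacity sketch: large blocks, replicated 3x.
# 64 MB blocks and replication factor 3 were the Hadoop 1.x-era defaults;
# both are configurable per cluster and per file.

def hdfs_footprint(file_size_gb, block_mb=64, replication=3):
    """Return (block count, raw storage consumed in GB) for one file."""
    blocks = -(-(file_size_gb * 1024) // block_mb)  # ceiling division
    raw_gb = file_size_gb * replication             # every block stored 3x
    return int(blocks), raw_gb

blocks, raw = hdfs_footprint(1000)  # a 1 TB file
print(f"{blocks} blocks, ~{raw} GB of raw disk")  # 16000 blocks, ~3000 GB
```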


Setting the Direction for Big Data Benchmark Standards

Chaitan Baru, Director, Center for Large-scale Data Systems Research, San Diego Supercomputer Center, UC San Diego

Abstract

This presentation will provide a summary of the outcomes from the Workshop on Big Data Benchmarking held May 8-9, 2012 in San Jose, CA. The 60 invited workshop attendees represented 45 different organizations, including industry and academia, with backgrounds in the areas of big data management, database systems, performance benchmarking, and big data applications. The workshop discussed definitional and process-based issues related to big data benchmarking and concluded that there was both a need and an opportunity for benchmarks that capture the end-to-end aspects of big data applications. Benchmark metrics would include performance and price/performance; costs would encompass total system cost, setup cost, and energy costs. The next two workshops are scheduled for December 2012 in Pune, India and June 2013 in Xi'an, China.
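
As a concrete illustration of the metrics above, price/performance can be computed by dividing the total cost of the benchmarked system (system, setup, and energy costs) by its measured performance. A minimal sketch with invented placeholder numbers:

```python
# Illustrative price/performance calculation using the cost components the
# workshop identified. All numbers are invented placeholders.

system_cost = 250_000.0   # total system cost, USD
setup_cost  = 20_000.0    # setup / installation cost, USD
energy_cost = 30_000.0    # energy cost over the ownership window, USD
throughput  = 5_000.0     # measured performance, e.g., jobs per hour

total_cost = system_cost + setup_cost + energy_cost
price_performance = total_cost / throughput  # dollars per (job/hour)
print(f"${price_performance:.2f} per unit of throughput")  # $60.00
```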


Shared Storage for Shared Nothing

John Webster, Senior Partner, Evaluator Group

Abstract

Storage within the distributed computing environments typically used for Big Data analytics platforms is not NAS or SAN; direct-attached storage (DAS) is preferred in “shared nothing” clusters such as Hadoop. This presentation makes the case for using shared storage for Big Data analytics. It examines the pros and cons of a number of storage deployment models that use shared storage for data analytics processes, including Hadoop. It also examines the pros and cons of an emerging deployment model: using a distributed storage cluster to host Hadoop, and using Hadoop itself as a shared storage platform.


How to Cost-effectively Retain Reference Data for Analytics and Big Data

Molly Rector, Executive Vice President of Product Management and Worldwide Marketing, Spectra Logic 

Abstract

The need to store critical digital content at the petabyte level and beyond is quickly moving outside the capabilities of traditional storage solutions. In addition, the complexity of many of today’s storage solutions can prohibit companies from migrating to a solution that best fits their needs.

Many companies are turning to active archives, an approach that is quickly gaining traction among companies that regularly manage high-volume reference data or face the exponential data growth associated with Data Analytics and Big Data. Software, tape, and disk vendors alike are combining the best of these technologies to provide more efficient and functional solutions. And, according to Intersect360 Research, 35% of Big Data users are already using tape to help access and retain the data necessary for long-term analytics.


What is this Next Generation Object Storage and Can It Really Deliver Global Access?

Claire Giordano, Sr. Director of Product Management, Quantum Corporation

Abstract

Big Data and the Cloud create opportunities for IT, but they also expose more complexities than most may have bargained for.  Traditional storage architectures do not address the requirements for lower cost, greater reliability, and performance that does not impede the ultimate goal: data access.  The reality is that RAID is not Cloud-ready, nor does it provide the bandwidth or reliability to support Big Data needs.  Large volumes of unstructured data in a file-based architecture are not optimized to support the Cloud’s promise of ubiquitous access.  Will the next generation of object storage rise to the challenge of supporting these new requirements and help solve the Big Data, global-access conundrum?


A Big Data Storage Architecture for the Second Wave

David "Sunny" Sundstrom, Principal Product Director, Oracle

Abstract

The first “wave” of big data was all about Hadoop, by definition its own storage silo. The second wave of big data recognizes the value of multiple data types (not just unstructured), different processing engines, and the ingest and archival requirements that enable business analytics to produce the greatest value, both short and long term. A storage implementation for big data is not just dropping in a Hadoop cluster; it means designing to take these requirements into account, along with considerations for existing datacenters versus greenfield implementations. A big data storage architecture needs to be integrated, optimizing data usage and movement and aligning the cost of storage with data value. It’s all about the data. This big data storage architecture, and the considerations behind it, will be discussed.


Protecting Data in the "Big Data" World

Thomas Rivera, Senior Technical Associate, Hitachi Data Systems

Abstract

Data growth is in an explosive state, and these "Big Data" repositories need to be protected. In addition, new regulations are mandating longer data retention, and the job of protecting these ever-growing data repositories is becoming even more daunting. This presentation will outline the challenges and the methods that can be used for protecting "Big Data" repositories.

Learning Objectives

  • Participants will get to understand the unique challenges of managing and protecting "Big Data" repositories
  • Participants will be able to understand the various technologies available for protecting "Big Data" repositories.
  • Participants will get to understand the various data protection considerations for "Big Data" repositories, for various environments, including Disaster Recovery/Replication, Capacity Optimization, etc.

Three Proven New Ideas That Will Completely Change Your Cost Model for Big Data Storage

Mark Seamans, CTO, FileTek

Abstract

Face it – your storage costs are going through the roof faster than you can write requests for additional funding, and you’ve got your disk vendor set up on speed dial. You need some new ideas and a new way to slay the dragon while still assuring online access to all the data that people and applications need. This session will unveil new ideas that have changed the game for some big data pioneers. We’ll “name the names” of organizations that have made the switch, and we’ll give details on a new-generation cost model that will get your name on the invite list for your CFO’s next cocktail party.


Creating an Enterprise-Class Hadoop Platform

Joey Jablonski, Global Analytics Practice Lead, DataDirect Networks

Abstract

Apache Hadoop sits at the intersection of data storage and information understanding.  It combines two powerful technologies to create a platform for the analysis of very large data sets, while providing flexibility in how information is presented to users and consumers.  Traditional enterprise computing created silos for these functions, requiring time-consuming processes to move data between them.  Hadoop enables a single platform for both storage and analysis of growing data sets.

Today, Hadoop is commonly used within the largest Web & Cloud companies on the Internet.  Hadoop is beginning to emerge within traditional enterprise computing environments because it provides powerful technology that enables businesses to better understand their data.  Hadoop adoption is slowed by the perceived difficulty of managing and integrating it within traditional enterprise computing environments.

This talk will focus on the real-world application of Hadoop within traditional enterprise environments.  The focus will be on the architectural, operational, and integration considerations involved in adding Hadoop to your computing environment.  The speaker will provide background on the tools and processes that enable Hadoop adoption while minimizing the operational impact to existing IT organizations.  Participants will leave with a greater understanding of how they can identify and deploy enterprise-class Hadoop platforms for enabling the business.

Learning Objectives

  • Participants will gain an understanding of the architectural considerations when planning an enterprise-class Hadoop deployment
  • Participants will see how to plan and manage growth in Hadoop environments
  • Participants will see how to develop reliable and repeatable models for SLAs through QoS
  • Participants will see how to integrate Hadoop with existing IT infrastructure to ensure that it does not become yet another silo within IT.


Big Data Storage Options for Hadoop

Dr. Sam Fineberg, Distinguished Technologist, Hewlett-Packard Company 

Abstract

Companies are generating more and more information today, filling an ever-growing collection of storage devices.  The information may be about customers, web sites, security, or company logistics.  As this information grows, so does the need for businesses to sift through it for insights that will lead to increased sales, better security, lower costs, etc.  The Hadoop system was developed to enable the transformation and analysis of vast amounts of structured and unstructured information.  It does this by implementing a programming model called MapReduce across compute clusters that may consist of hundreds or even thousands of nodes.  This presentation looks at Hadoop from a storage perspective.  It will describe the key aspects of Hadoop storage, the built-in Hadoop Distributed File System (HDFS), and other options for Hadoop storage that exist in the commercial, academic, and open source communities.
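
For readers new to the model, a minimal word-count sketch illustrates the MapReduce idea: a map step emits key/value pairs, a shuffle groups values by key, and a reduce step aggregates each group. This is plain Python for illustration, not the Hadoop API:

```python
# Minimal word-count sketch of the MapReduce model (plain Python, not the
# Hadoop API): map emits (key, value) pairs; reduce aggregates per key.
from collections import defaultdict

def map_phase(line):
    for word in line.split():
        yield (word, 1)            # emit one count per word occurrence

def reduce_phase(word, counts):
    return (word, sum(counts))     # aggregate all counts for one key

lines = ["big data big insights", "big clusters"]

# "Shuffle": group intermediate values by key. In Hadoop this grouping
# happens across the nodes of the cluster rather than in one process.
grouped = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        grouped[word].append(count)

print([reduce_phase(w, c) for w, c in grouped.items()])
# [('big', 3), ('data', 1), ('insights', 1), ('clusters', 1)]
```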

Learning Objectives

  • Understand the basics of Hadoop, both from a compute and storage perspective.  Understand how Hadoop uses storage, and how this is strongly tied to its native filesystem, HDFS.
  • Understand what other storage options have been adapted to work with Hadoop and how they differ from HDFS.
  • Understand the key tradeoffs between storage options including performance, reliability, efficiency, flexibility, and manageability.
