Farsighted News SNIA

February 2007 Archives - Only in FarSighted

The views expressed in FarSighted are not necessarily the views of the SNIA. The SNIA strives to be vendor neutral and technology agnostic.

2007 Outlook: A Collection of Views from the Industry's Thought Leaders

With 2007 underway, the industry's leading analysts are on record with predictions and prognostications about the coming year and beyond. FarSighted takes a look at some of the most relevant and interesting forecasts, as well as key recommendations for the months ahead.

Among them, John Mahoney, vice president and distinguished analyst at Gartner, has stated that "2007 will see mounting demand for business growth and agility, rapid development of consumer technology and increasing availability of new infrastructure tools, at the same time as the evolution of the IT organization continues to pick up pace. This will require CIOs to have a fairly ambitious list of 'new year resolutions' for 2007, in addition to the big main agenda projects that other people depend on them to deliver in a timely way." Mahoney has warned that CIOs' focus on managing and reducing IT costs can be damaging to longer term strategies for growth, leading to under-investment in infrastructure.

Mahoney recommends that CIOs "tag savings from one area, for direct application in another. Be specific; 'savings made from server consolidation will be used to upgrade sales force laptops in quarter three.' Start reporting regularly on business value delivered for completed projects and make simplification a mantra. This isn't the same as cutting IT costs, it means redesigning business processes for less complex and expensive systems."

Gartner also predicts that by 2010, the average total cost of ownership (TCO) of new PCs will fall by 50%. The firm states that "the growing importance and focus on manageability, automation and reliability will provide a welcome means of differentiating PCs in a market that is increasingly commoditized. Many of the manageability and support tools will be broadly available across multiple vendors. However, vendors that can leverage these tools further and can graduate from claims of 'goodness' to concrete examples of cost savings will have a market advantage."

Another analyst, Robin Harris of StorageMojo.com, recently entered a "time machine" to review 2007 - a year in advance. This "look back" at 2007 includes the observation that "The 1.5 TB disk drive got off to a slow start late in the year, but the 240 GB 2.5″ notebook drives took off fast as notebooks took 60% of the PC market by year end and the Internet Data Centers started using notebook drives in the millions to reduce power, space and cooling needs."

For 2007, analyst firm IDC expects that there will be heightened competition in the information access and management space as customers "demand 'rapid access to relevant information' as a top business requirement for IT." In addition, the firm predicts that virtualization and software appliances will reshape the infrastructure landscape, stating "Already one of the most disruptive forces in the infrastructure marketplace, virtualization will shift into a new phase, delivering higher IT service levels and creating new opportunities for software vendors to create products that manage an increasingly virtualized IT environment. A key innovation here will be 'software appliances' - limited function, self-contained products that are easily and inexpensively acquired and replaced."

2007 promises to be an exciting year for the storage industry. For more information on the latest news from the top industry analysts, be sure to read FarSighted's Analyst Watch (http://www.snia.org/about/news/farsighted/analyst_watch) section.




ILM and Intelligent Storage Architecture

By Fred Moore, President, Horison Information Strategies

Introduction to the Life Cycle of Data
Understanding what happens to data throughout its lifetime is becoming an increasingly important aspect of effective data management. What happens to data as it ages? Does usage decline as data ages? Does the value of data increase or decrease as it ages? Why are we keeping more data longer than ever before? What conditions indicate when data should be retired? Do storage management requirements change as data goes through its life-cycle? If data is the most valuable asset of so many businesses, why do we know so little about it? Now commonly referred to as ILM (Information Lifecycle Management), managing data throughout its lifetime has become a critical storage management discipline.

In recent years, the overall effectiveness of ILM applications has been limited primarily by the one-piece-at-a-time approach needed to build them. Designing an ILM system normally required combining a data migration technology, a reporting technology and a backup/recovery technology from several vendors. Since these components ignored the call for interoperability, they haven't always been easy to implement. This approach requires purchasing components from several different companies, which involves separate acquisition, implementation and maintenance costs, as well as resources and training of personnel to manage them.

Implementing ILM at the application level would seem ideal. However, since many applications have multiple files with different life-cycle and data protection requirements, building an ILM strategy today is usually done at the individual file level. In particular, the probability of reuse of data has historically been one of the most meaningful metrics for understanding optimal data placement and remains a key metric for effective HSM (Hierarchical Storage Management) implementation. HSM has been an integral storage management function for the mainframe for more than 20 years, and is just now gaining popularity for Unix and NT systems. For the majority of data types (excluding system files, indices, directories, etc.), the number of references to data significantly declines as the data ages. This basic observation serves as the basis for more cost-effective storage management, as it enables the movement of less active data to lower-cost levels of storage in terms of capital expense and operating expense.

For the first time, the amount of data is increasing as it ages. Unlike in the past, fixed content including entertainment data, compliance and archival storage have now become the fastest growing areas of the storage industry. Storage demand grew at over 100 percent per year during the dot-com boom of the late 1990s. Today, the storage industry is generating new data at a rate of approximately 35-50% per year, depending on the application.
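To put these growth rates in perspective, the short sketch below converts a compound annual growth rate into a doubling time. It assumes the rate stays constant from year to year, which is a simplification; the function names are illustrative and do not come from any product.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for stored data to double at a constant compound rate.

    annual_growth_rate is a fraction, e.g. 0.35 for 35% per year.
    """
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_capacity(initial_tb: float, annual_growth_rate: float, years: float) -> float:
    """Capacity after `years` of compound growth, in the same unit as initial_tb."""
    return initial_tb * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    # 35% and 50% are the current range cited in the text; 100% is the dot-com era figure.
    for rate in (0.35, 0.50, 1.00):
        print(f"{rate:.0%} per year -> doubles every {doubling_time_years(rate):.2f} years")
```

Under this assumption, demand growing at 35-50% per year doubles roughly every 1.7 to 2.3 years, compared with every year at the 100 percent rate of the late 1990s.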

The percent of all digital data that has lost its value and should be deleted is quickly declining, as obsolete data is often "just kept around forever." In many cases, this approach is perceived to be easier than managing the data throughout its life-cycle. The probability of reusing most data typically falls by 50 percent once the data is three days old. Thirty days after creation, the probability of reuse normally falls below a few percentage points. E-mail and medical imaging applications represent good examples for the data aging profile described here. Keeping very low activity, archival and inactive data on continually spinning disks for long periods of time is not economical, as energy consumption alone can drive up operating expense. The following reference table provides additional insight into numerous storage consumption and usage patterns that will place even more pressure on data life cycle management.

Storage Facts, Figures, Estimates and Rules of Thumb

Average annual digital storage demand rate (primary occurrence of data, all platforms): 35-40% (2006-2008)
Amount of disk data stored on Unix, Windows and Linux systems WW (est.): >90%
Average disk allocation levels for z/OS (zSeries mainframes using DFSMS suite): 60-80%
Average disk allocation levels for iSeries (AS/400 servers with single-level storage): 60-80%
Average disk allocation levels for Unix/Linux systems: 30-45%
Average disk allocation levels for Windows systems: 25-40%
Average annual disk drive demand increase: 35-50% (downward trend expected as recording limits begin to appear)
Average annual disk drive performance improvement (seek, latency and data rate): <5% (mainly with data rate, as seek time improvement is minimal)
Increase in disk drive capacity per actuator since the first disk drive in 1956: 150,000x (5 MB to 750 GB)
Increase in native tape cartridge capacity since the first tape cartridge in 1984: 4,000x (200 MB to 800 GB)
Average data center power consumption: 40 watts per square foot
Average power used per blade server rack: 15-30 kilowatts
Average cost to build a data center: $400 per square foot
Average non-mainframe server busy (% processor busy): 25-40%
Average tape cartridge utilization levels for virtual tape systems: 60-80%
Typical range of disk data managed per administrator (for non-mainframe systems: Windows, Unix, Linux): 500GB - 10TB
Typical amount of disk data managed per administrator (z/OS, mainframe): >50TB
Estimated range of automated tape data managed per administrator (all platforms): 40TB to >1EB (varies widely based on library size)
Annual growth rate of e-mail spam message traffic: ~350%
Estimated percentage of SANs that are homogeneous (the same operating system): 60-70% (Mainframes, Unix and Windows systems only)
Percent of NAS deployed databases greater than 500GB (est.): ~10%
Average number of spam e-mails delivered every 30 days: 3.65 billion
Average size of e-mail in 2007 (est.): 650kb
Number of e-mails sent daily in 2006 (est.): >35,000,000,000 (35 billion)
Percentage of all e-mail traffic that is unwanted: ~84%
Percent of companies citing employees as the most likely source of hacking: 77%
Percentage of US adults with more than 200GB of storage capacity: 10% (approximately 28 million)
Percentage of digital data stored on removable media (primarily magnetic tape): ~80%
Number of new small businesses created in the US in 2005: 550,000
Average revenues of InformationWeek 500 companies in 2005: $9.776B (was $12.47B in 2001)
Average IT budget of InformationWeek 500 as a percentage of revenue: 3% (was 3.88% in 2001)
Average percentage of IT budget in the US spent on IT salaries and benefits: 32%
Average percentage of IT budget in the US spent on compliance: 5%
Percentage of businesses who take backup tapes offsite daily, weekly, and monthly: Daily - 56%. Weekly - 32%. Monthly - 4%.
Data selected from a variety of industry sources.
Compiled by: Horison Information Strategies. http://www.horison.com
June 2006
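As a quick check on two of the capacity figures in the table, the sketch below recomputes the growth multiples and derives the compound annual rate they imply. The end year of 2006 is an assumption taken from the table's compilation date, and decimal units (1 GB = 1,000 MB) are assumed; neither is stated explicitly in the table.

```python
MB = 10**6  # decimal megabyte, an assumption about the table's units
GB = 10**9  # decimal gigabyte


def growth_factor(start_bytes: float, end_bytes: float) -> float:
    """How many times larger the end capacity is than the start capacity."""
    return end_bytes / start_bytes


def implied_annual_rate(factor: float, years: int) -> float:
    """Constant compound annual growth rate that yields `factor` over `years`."""
    return factor ** (1 / years) - 1


# Disk: 5 MB in 1956 to 750 GB, assumed to be as of 2006.
disk_factor = growth_factor(5 * MB, 750 * GB)
disk_rate = implied_annual_rate(disk_factor, 2006 - 1956)

# Tape cartridge: 200 MB in 1984 to 800 GB, assumed to be as of 2006.
tape_factor = growth_factor(200 * MB, 800 * GB)
tape_rate = implied_annual_rate(tape_factor, 2006 - 1984)

if __name__ == "__main__":
    print(f"disk: {disk_factor:,.0f}x, about {disk_rate:.1%} per year")
    print(f"tape: {tape_factor:,.0f}x, about {tape_rate:.1%} per year")
```

The multiples match the table (150,000x and 4,000x). Under the stated assumptions they correspond to roughly 27 percent per year for disk and 46 percent per year for tape, averaged over each technology's full history.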

Data Retention Requirements Change
When the Nearline™ concept was becoming widely accepted in the 1990s, the common belief was that archival status was the last stop for the data before deletion or end-of-life. Then, one-to-two-year data retention periods were viewed as a reasonable amount of time to keep most digital data accessible. Fifteen years later, the game and rules are different. Increasing legal claims and new government compliance regulations for transmission and retention of data have made us change the way we look at data as it ages. Several major health care providers are faced with managing and storing petabytes of digital data requiring storage management services for a person's lifetime plus twenty years. This could easily exceed 100 years for many people. All data is not created equal, and the value of data can change throughout its lifetime.

Implementing Life Cycle Management and Policies
Can someone actually implement an information life-cycle strategy? Is managing data for its lifetime realistically possible given today's technologies? If you are using a mainframe for your storage services, deploying an ILM strategy is possible with well proven and integrated capabilities like DFSMS™ (Data Facility Storage Management Subsystem) and DFHSM (Data Facility Hierarchical Storage Manager). DFSMS is a policy-based software suite that automatically manages data from creation to expiration. DFSMS provides allocation control for availability and performance, backup/recovery and disaster recovery services, security, space management, removable media management, and reporting and simulation for performance and configuration tuning. These systems provide the policy engine and data mover which represent two of the three minimum components necessary for an ILM implementation. The third component is a tiered storage hierarchy, optimizing infrastructure and operating expenses as data advances throughout its life-cycle.

The catch here is, of course, that mainframes only account for about 5% of the world's digital storage. Over 90% of the world's digital data resides on the increasingly popular Unix, Windows, and Linux operating systems, and these systems don't have an equivalent to DFSMS. Therefore, the vast majority of businesses wanting to implement a real ILM solution have to build it - not buy it.

Data Life Cycle Management Needs a Solution
To implement a data lifecycle management strategy from a technology perspective, the de-facto standard three-tiered storage hierarchy model has emerged as the optimal choice. These tiers include primary storage, which is always disk-based for highly active, mission critical or customer-facing revenue generating applications. Secondary storage includes a variety of virtual tape implementations for enterprise systems, and low cost SATA disk systems for data that has a lower activity level but hasn't yet reached archival status. The third tier is long-term archival storage, which remains the realm of magnetic tape and automated tape libraries (though the new MAID [Massive Arrays of Idle Disks] concept and the arrival of "disk in a cartridge" have significant promise). The issue of moving large amounts of data from one level of the hierarchy to another, passing in and out of a server, is a growing performance problem. A device-to-device data-transfer capability between the tiers appears to be the best solution - but a solution that remains in the future.

Intelligent Storage Architecture
As intelligent storage solutions continue to evolve, policy-based data placement and movement between various levels of the storage hierarchy will eventually occur automatically, without human involvement. As more storage management functions gradually move outboard of the application servers, they will be implemented as either an in-band or out-of-band function in the storage fabric itself. Advanced policy-driven SRM (Storage Resource Management) software will add intelligence to measure reference patterns and trigger management policies that result in moving data, in conjunction with an HSM-like function, to the optimal storage location throughout its lifetime.
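A minimal sketch of such a policy follows. It models the aging profile described earlier in this article (reuse probability halving by day three and falling to a few percent by day thirty) and maps the result onto the three-tier hierarchy. The 3 percent figure standing in for "a few percentage points", the log-linear interpolation between the stated points, and both tier thresholds are assumptions chosen for illustration; they are not taken from DFSMS or any SRM product.

```python
import math
from bisect import bisect_right

# (age in days, probability of reuse). The day-3 and day-30 points follow the
# article's rule of thumb; 0.03 is an assumed value for "a few percentage points".
ANCHORS = [(0.0, 1.0), (3.0, 0.5), (30.0, 0.03)]

PRIMARY = "primary disk"
SECONDARY = "secondary (SATA disk or virtual tape)"
ARCHIVE = "archive (tape library)"


def reuse_probability(age_days: float) -> float:
    """Estimate reuse probability by log-linear interpolation between anchors."""
    if age_days < 0:
        raise ValueError("age must be non-negative")
    days = [d for d, _ in ANCHORS]
    if age_days >= days[-1]:
        return ANCHORS[-1][1]  # hold the last value for older data
    i = bisect_right(days, age_days) - 1
    (d0, p0), (d1, p1) = ANCHORS[i], ANCHORS[i + 1]
    frac = (age_days - d0) / (d1 - d0)
    return math.exp(math.log(p0) + frac * (math.log(p1) - math.log(p0)))


def choose_tier(age_days: float, primary_min: float = 0.25,
                secondary_min: float = 0.05) -> str:
    """Pick a tier from the estimated reuse probability (thresholds are hypothetical)."""
    p = reuse_probability(age_days)
    if p >= primary_min:
        return PRIMARY
    if p >= secondary_min:
        return SECONDARY
    return ARCHIVE


if __name__ == "__main__":
    for age in (1, 3, 10, 30, 365):
        print(f"day {age:>3}: p={reuse_probability(age):.3f} -> {choose_tier(age)}")
```

A real policy engine would measure reference patterns per file rather than assume one profile for all data, and would weigh retention and protection requirements alongside activity.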

Moving select storage management functions off the server and into the storage fabric or subsystems is slowly emerging from select suppliers in the storage industry, and eventually will enable direct device-to-device data transfer - avoiding significant server I/O overhead. The initial application expected to move outboard is traditional backup and recovery. Representing a fundamental change in the traditional backup and recovery process, server-less backup will allow businesses to perform a variety of operations such as full backup, snapshots, and CDP (Continuous Data Protection) without consuming compute and I/O bandwidth resources from application servers. The advent of de-duplication offers a means to change the rules of backup and recovery. With outboard backup, the server initiates the backup or recovery function, but doesn't sit in the data movement path. For recovery, the data moves directly from tape or disk storage back to disk. This capability further leverages the SAN infrastructure by providing significant management enhancements for storage administrators.

Conclusion
As storage becomes cheaper to buy, it grows faster and becomes harder to manage. In parallel, the value of data is increasing irrespective of economic and other pressing global issues. As the value of data can change significantly as it ages, storage management has now become a lifetime activity. The place where data is initially stored is not necessarily the same place where it will finally be stored. Everyone can state the problem of data life cycle management. Building and delivering a solution to this growing problem will take the best minds in the industry. Given the anticipated growth rates for digital storage, the time to begin this process has already passed.

Nearline is a trademark of StorageTek (now Sun)
DFSMS is a trademark of IBM Corp.

About the Author
In 1998, Fred Moore founded Horison Information Strategies (www.horison.com), an information strategies consulting firm in Boulder, Colorado that specializes in marketing strategy, industry analysis and business development for the IT industry.



Meet the New Boss, Same as the Old Boss - What to expect in 2007

By Greg Schulz, Author "Resilient Storage Networks" and founder of the StorageIO group.

I'm not sure if it's from the concert by The Who I attended in December and the song "Won't Get Fooled Again," with its classic phrase "…Meet the new boss, same as the old boss…", but I have a sense of deja vu when it comes to the annual rite of commenting on what's going to be hot or popular for the coming year. Part of the deja vu comes from the fact that what was to be hot last year, or the year before will be hot again this year, or next year, or better yet, actually adopted and deployed. Consequently, what comes to mind is the phrase "meet the new trend, same as the old trend," and of course you could substitute "technology," "buzzword," "hype," or "initiative," among other things, for "trend."

Don't get me wrong, there are plenty of new and interesting evolutionary technologies and developments taking place on the server, storage, networking and software side of things, along with creative services and marketing activities. Is 2007 the year for iSCSI? Well, like 2006 and 2005 and before that, go ahead, you can call 2007 the year of iSCSI, along with the year of the grid, and CDP, and deduplication, and encryption, and switch-based storage virtualization, among others. Some may even re-forecast that 2007 may be the year of the final demise of magnetic tape, traditional backup or the mainframe, among other technologies that have been previously decried as dead.

If you picked up on a bit of cynicism, you are correct. The IBM z/OS mainframe, which also happens to run Linux and includes support for native Fibre Channel and other open technologies, continues to do well in its target marketplace, as does magnetic tape. Granted, neither magnetic tape nor the mainframe is on a significant growth curve, but you could probably take some wild west speculative market growth numbers for hyped-up emerging technologies, apply those to the perceived demise of real technologies (like, say, the mainframe and magnetic tape, among others), and see a more realistic picture. That is, legacy technologies are still shipping and generating revenue and profits for their manufacturers. Meanwhile, technologies like iSCSI, virtualization, deduplication, CDP and others continue to gain in popularity and, more importantly, market adoption and deployment.

What else do we have to look forward to seeing and hearing about in 2007 and beyond? Again to the theme of "…meet the new boss, same as the old boss…" a lot of what you will hear will include power and cooling of individual components and systems. Ironically, with the race to consolidate remote office branch office (ROBO) environments with WAFS and other technologies to central locations, more pressure may be put on already stressed data centers with limited electrical power and cooling budgets.

For some, the silver bullet to solve all data center woes - from power consumption to archiving and data backup - is data de-duplication. So in 2007, rest assured, there will be more debates on the various performance, scalability, effectiveness and data integrity capabilities around data deduplication, a.k.a. single instancing, differencing, compaction, commonality factoring, and normalization.

There continues to be a focus on increasing resource utilization via consolidation and virtualization; however, there is also a growing awareness of the role and importance of performance to meet service objectives. Consequently, there are many new tools for collecting data, cross-event correlation and analysis and reporting on capacity utilization, as well as performance activity. Additional solutions and approaches to addressing more effective use of storage, in terms of performance to balance with improved utilization, will be heard about in 2007.

Tape continues to be used for archiving and large scale backup operations, due to continued cost advantages on large scale deployments. CDP technologies migrate and converge with traditional data protection techniques, with a real benefit being to reduce, if not eliminate, the overhead of having to scan file systems and volumes to see what data needs to be backed up, and a side benefit being variable and dynamic recovery point objectives. eDiscovery, search, indexing and related technologies (http://www.snia.org/about/news/farsighted/archives_2006/aug06_ts_considerations) continue to get attention, and in some cases actual adoption, beyond pilot and proof of concept scenarios.

There is continued focus and growing awareness of the broad and diverse SMB market, as well as consolidation of channel vendors, emergence of new solutions and services targeted for SMB. In addition, there is the realization that there is a very large market that sits above consumer and low end of the SMB space called small office home office (SOHO), which has growing storage needs.

The re-emergence of network and remote storage-related services, ranging from managed backup providers to vaulting services to storage for rent, among others, continues in 2007. The good news is that networks and bandwidth are more plentiful, and perhaps more affordable, than during the last coming of the network and managed storage services era in the late 90s. The bad news, however, is that more data needs to be moved in less time and kept for longer periods, and not all bandwidth services are available in all locations.

Clustered storage discussions continue around clustered NAS, clustered block using Fibre Channel and iSCSI, and clustered file systems among other clustered technologies to address scaling of performance, capacity, availability and management on a local and wide area basis. Grid gets a bit of a break after putting in a couple of busy years on the hype circuit, while a refreshed and re-invigorated "virtualization" discussion resurfaces (or, at least products that implement virtualization continue to be deployed, however they may not be referred to as virtualization products or technology per se).

There will be new enlightenment around encryption - from desktop to portable media to enterprise storage to removable media, including RHDD and tape - combined and enabled with key management to address complexities associated with or perceived with data security. Holographic hype shifts into reality as the technology moves from beta to GA later in 2007. There will also be more talk about Wireless USB, SAS, 4x InfiniBand, 8Gb Fibre Channel, 10Gb iSCSI and NAS over Ethernet, as well as 100Gb Ethernet for those with a need for speed. Market consolidation, mergers and acquisitions (M&A), IPOs, launch of new companies, disappearance of old companies will continue.

Needless to say, there are plenty of things to stay current on and learn about during 2007. For now, have a safe and successful 2007, and keep in mind, if you have a deja vu moment with storage related technology, it could be that you are looking into a mirror, in that a great crystal ball for the future is to look into the past.

About the Author
Greg Schulz is founder and senior analyst of the StorageIO group (www.storageio.com) and author of the book (SNIA recommended reading) "Resilient Storage Networks" (Elsevier).




Why Choose Fibre Channel

By Mike McNamara, Marketing Committee Chair of the Fibre Channel Industry Association

Today's data explosion presents unprecedented challenges incorporating a wide range of application requirements such as database, transaction processing, data warehousing, imaging, integrated audio/video, real-time computing, and collaborative projects. Fibre Channel (FC) is an ideal solution for IT professionals who need reliable, cost-effective information storage and delivery at fast speeds. With development starting in 1988 and ANSI standard approval in 1994, FC is a mature, safe solution for 1Gb, 2Gb, and 4Gb communications, providing an ideal solution for fast, reliable mission-critical information storage and retrieval for today's data centers.

FC has been the major storage system interconnect since the mid 1990s, and dominates the storage area network (SAN) and external storage marketplace (see Figure 1). FC SANs offer a range of benefits, such as improved backup and restore, enhanced business continuance, and simplified consolidation. This article addresses some of the common misconceptions about FC, and discusses some of the latest developments.

Figure 1) Block external controller-based disk storage by host interface revenue

FC Misconceptions
FC is a mature interface, interoperability is well understood, and management is becoming more standardized. FC is designed to scale from simple to the most complex topologies, and the FC Simple Configuration and Management (FC-SCM) initiative will help to streamline and cost-optimize system configurations and simplify installation and interoperability for smaller FC SANs. Although FC is sometimes considered too expensive, it offers very strong price/performance, and new bandwidth options with 4Gb offer increased performance with no increase in price versus 2Gb. In certain cases, such as high-performance tapes, FC is the optimal solution to achieve the required performance. FC offers access to cost-effective tiered storage, and its backward and forward compatibility ensures investment protection while allowing for seamless migration to higher bandwidths and capacity.

FC is not just for large companies and enterprise data centers, but for companies and data centers of all sizes that need performance, tiered storage solutions and scalability, and that hold mission-critical data for which business downtime is unaffordable. FC is designed for "bet your business" storage applications, and it runs a separate, secure network.

New FC Capabilities and Roadmap
Although FC is a mature technology, it is by no means a dead-end technology. It has a vibrant evolution and growth track in the following areas: performance (see Figure 2), security, distance, lower costs, EMI management and disk and tape devices. It is designed to allow incremental growth so that both the costs and the risks can be absorbed gradually, without exposing the user's business to excessive risk, and it does not need to employ radical new technologies to move with the demands of new applications and solutions.

FCIA Fibre Channel Speed Roadmap (Base2)

Product Naming   Throughput (MBps)   Line Rate (GBaud)*   T11 Spec Technically Completed (Year)**   Market Availability (Year)**
1GFC             200                 1.0625               1996                                      1997
2GFC             400                 2.125                2000                                      2001
4GFC             800                 4.25                 2003                                      2005
8GFC             1600                8.5                  2006                                      2008
16GFC            3200                17                   2009                                      2011
32GFC            6400                34                   2012                                      Market Demand
64GFC            12800               68                   2016                                      Market Demand
128GFC           25600               136                  2020                                      Market Demand

Base2 is used throughout all applications for Fibre Channel infrastructure and devices. Each speed maintains backward compatibility with at least two previous generations (i.e., 4GFC is backward compatible with 2GFC and 1GFC).
* Line Rate: All Base2 speeds are single-lane serial stream
** Dates: Future dates estimated
Source: FCIA

Figure 2) FCIA FC Speed Roadmap (base2)
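The roadmap follows a simple doubling pattern, which the sketch below reproduces from the 1GFC base values in the figure. It also encodes the compatibility note as a conservative check: two speeds are treated as guaranteed to interoperate only when they are within two generations of each other. The function names are illustrative, and the check reflects only the guarantee stated in the roadmap, not the behavior of any particular product.

```python
# Base values for 1GFC, taken from the roadmap figure.
BASE_THROUGHPUT_MBPS = 200
BASE_LINE_RATE_GBAUD = 1.0625

# Base2 generations listed in the roadmap, slowest to fastest.
GENERATIONS = [1, 2, 4, 8, 16, 32, 64, 128]


def roadmap_row(speed: int) -> tuple:
    """Return (product name, throughput in MBps, line rate in GBaud) for a speed."""
    if speed not in GENERATIONS:
        raise ValueError(f"{speed} is not a Base2 roadmap speed")
    return (f"{speed}GFC", BASE_THROUGHPUT_MBPS * speed, BASE_LINE_RATE_GBAUD * speed)


def guaranteed_compatible(speed_a: int, speed_b: int) -> bool:
    """True when the two speeds are at most two generations apart.

    The roadmap promises backward compatibility with "at least" two previous
    generations, so a False result means "not guaranteed", not "incompatible".
    """
    gap = abs(GENERATIONS.index(speed_a) - GENERATIONS.index(speed_b))
    return gap <= 2


if __name__ == "__main__":
    for speed in GENERATIONS:
        name, mbps, gbaud = roadmap_row(speed)
        print(f"{name:>6}  {mbps:>6} MBps  {gbaud:>8.4f} GBaud")
    print(guaranteed_compatible(8, 2))  # 8GFC with 2GFC: two generations apart
    print(guaranteed_compatible(8, 1))  # 8GFC with 1GFC: three generations apart
```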

While most back-end Fibre Channel is based on an arbitrated loop, the loop is generally implemented as a switched loop architecture. This has the benefit of not requiring fabric services, while offering the benefits of a switched topology. This architecture takes advantage of high-performance disk drives while providing the performance of a switched architecture.

FC distances have not been getting shorter as the speed increases. FC is designed to offer the distances needed for Fibre Channel applications without increasing the complexity and cost of the interconnect as the speed grows. FC has offered distances much longer than needed for SAN applications for several generations, without increasing complexity, power use, or costs and while allowing reuse of the same cable plant in many cases.

In our data-intensive world, faster is better: 4Gb FC in the fabric provides support for more servers with fewer connections for less expensive fabrics. The larger connection through the SAN enables bandwidth-intensive transfers to happen faster. Applications such as modeling, video, data analysis, and medical imaging require more speed to support data-intensive streaming applications, as do tapes for backup and archiving, where sustained streaming is vital.

FC speeds will double again to 8Gb FC, which is under development and will be backward compatible to 4Gb FC and 2Gb FC. The backward compatibility of 8Gb is important because it assures users that their 2Gb and 4Gb investments will be protected and preserved going forward. Expected market availability for 8Gb is 2008. 10Gb FC is deployed today for inter-switch links (ISL) providing 2.5x - 3x ISL core bandwidth for 4Gb edge links, and 16Gb FC is on the horizon.

FC guarantees at least two generations of forward and backward compatibility, future-proofing storage and providing the best backward and forward compatibility of any data transport. FC is also very secure: it has fewer entry points than other protocols, and the FC-SP protocol (authentication using DH-CHAP) has recently been released.

Innovations such as SATA Tunneling Over FC (FC-SATA), Inter-Fabric Routing (IFR), N-Port ID Virtualization (NPIV) and Fabric Application Interface Specification (FAIS) will improve interoperability and reduce costs. All are expected to be supported in products within 12 months.

SATA Tunneling over FC provides native connectivity of low cost Serial ATA (SATA) disk drives into existing enterprise storage systems that use FC embedded infrastructures, eliminating the layer of costly protocol bridging or discrete components used today to connect SATA disk drives into FC infrastructures. Inter-Fabric Routing is for heterogeneous fabric routing and improves scalability and interoperability. N-Port ID Virtualization makes the port ID autonomous from the server improving the sharing of HBAs, and Fabric Application Interface Specification will speed up the deployment of storage applications in the fabric.

Summary
For nearly a decade, FC has been the mainstay for companies looking to increase storage resiliency and bandwidth performance while maintaining backward compatibility. FC has a huge installed base, a dominant presence in the data center and a strong roadmap. FC will continue to be a leading storage interface for many years to come.

For more information on FC and the FCIA, visit www.fibrechannel.org

About the Author
Mike McNamara is the Chair of the FCIA Marketing Committee and the SAN Product Marketing Manager at NetApp, a world leader in unified storage solutions and the industry's fastest growing FC SAN vendor. Mike has over 17 years of marketing experience in the computer industry, with over 11 years in the storage industry. Before joining NetApp, Mike spent 5 years at Adaptec/Eurologic Systems where he had product management and marketing responsibility for Adaptec's block and file-based external storage products. Prior to Adaptec, Mike worked for EMC CLARiiON where he spent 5 years in various marketing roles of increasing responsibility helping CLARiiON grow to be an industry leader. Prior to joining EMC CLARiiON, Mike held several marketing positions of succeeding levels at Racal-Datacom and Digital Equipment Corporation. Mike is also a regular contributor to industry journals and a speaker and panelist at industry events.




The Right "Fit" for CDP

By Dan Tanner, ProgresSmart

What will it take to make Continuous Data Protection (CDP) catch on in the marketplace? That is a pressing question for vendors, investors, and enterprise users alike. Start-up vendors and their backing investors are weighing technological approaches and potential for market acceptance, and considering whether to hang on and ultimately go public, or set up to be acquired. Established "household name" vendors are considering CDP's market value, the price of an acquisition versus in-house development and/or time to revenue. Enterprise users are hoping to become better informed about CDP, and perhaps some are biding time until they can become comfortable not only with CDP itself, but also with the vendor source.

An Issue of Usage
When CDP entered the marketplace it was perceived as additive to data protection. Thus, it often wasn't purchased unless it could be seen as: (A) doing something extra, which could be extra protection or business continuance for what is most valuable, or (B) better protection/continuance for everything. In clearer language, (A) amounts to business continuity (BC) and (B) emphasizes data protection (DP). Continuous data protection does not necessarily assure continuous data availability. Thus, DP is necessary but not sufficient for BC, and BC is a subset of DP.

What Is Available?
There are CDP products that provide value for both BC and DP, and there are also more specialized offerings available where there is the possibility for different approaches: CDP products oriented toward BC will be closely linked to critical apps, providing great continuance for them. That is, they would be oriented toward not only recovery to any point (meeting the finest-grained Recovery Point Objective - RPO - which any and all CDP products must do), but also offering the shortest recovery time (meeting the most demanding Recovery Time Objective - RTO - demanded for business continuity). For reasons explained below, this type of CDP product could be highly specialized, but ultimately permanently bonded to an application, becoming legacy when the application does.

CDP products aimed primarily at RPO, but not emphasizing RTO, may cover more of the protection/continuance landscape. They would be more generalized, even capable of replacing "mainstream" backup tools, and perhaps be longer-lived.

"Fit" and "the Stack"
The fit of a CDP product may come to depend on its vertical capture point position in the stack (figure below).

The further down the stack CDP is implemented, the more general and DP-oriented it is, but with less consistency for BC usage. This type of CDP is coming to be known as block-oriented. It captures changes to storage, but without file context. However, it may be well-suited as a general backup replacement.
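
To make the block-oriented idea concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's implementation: the BlockJournal class, its method names, and the in-memory storage are all hypothetical. It assumes writes are logged in time order, and it shows how a journal of block changes allows recovery to any point in time while knowing nothing about the files those blocks belong to.

```python
from dataclasses import dataclass, field


@dataclass
class BlockJournal:
    """Hypothetical block-level CDP journal (illustrative only).

    It logs every write as (timestamp, block number, data) and has
    no notion of files or applications.
    """
    baseline: dict = field(default_factory=dict)   # block number -> bytes
    entries: list = field(default_factory=list)    # (timestamp, block, data)

    def record_write(self, timestamp, block, data):
        # Assumption: timestamps arrive in non-decreasing order.
        self.entries.append((timestamp, block, data))

    def image_at(self, timestamp):
        """Rebuild the volume as it looked at the given time."""
        image = dict(self.baseline)
        for ts, block, data in self.entries:
            if ts > timestamp:
                break
            image[block] = data
        return image


journal = BlockJournal(baseline={0: b"AAAA", 1: b"BBBB"})
journal.record_write(10.0, 0, b"CCCC")
journal.record_write(20.0, 1, b"DDDD")
journal.record_write(30.0, 0, b"EEEE")

# Recover the volume as of time 15.0: only the first write has been applied.
print(journal.image_at(15.0))
```

Note that the journal can return a block image for any timestamp, but it cannot tell whether that image catches a file or database in the middle of an update. That is the consistency limitation for BC usage described above.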

The higher in the stack CDP is implemented, the more application-specific and BC-oriented with consistency it is, and the less general and DP-suited. This type of CDP captures changes to files, with context, but usually with the trade-off of application-specificity (i.e., it is not suitable for use as a replacement for general backup).

Figure courtesy of TimeSpring Software Corporation. For simplicity, it does not show fidelity, consistency and capture point relationships.

The horizontal capture points in the diagram relate to another implementation decision. CDP can be implemented on all hosts, or on selected hosts for selected applications; requiring code for each and every host environment or being limited to certain operating environments. Alternatively, CDP can be implemented in the network - productized as software on a white box or as a packaged hardware/software appliance, with the possibility of someday being implemented on an intelligent network switch - which has the advantage of seeing both of the end points, the hosts and the storage. CDP could be implemented directly on storage arrays, but no vendor is doing that, because it could then only apply to the new storage (and "fork-lift" upgrades, thankfully, are becoming a thing of the past).

You can probably glean from the figure that CDP for BC must be able to preserve true file changes with coherence among all of a file's open applications. That is what fidelity and consistency are all about.

Are We Stuck with CDP Tradeoffs?
At this point you may be saying, "OK, so now I must choose whether I want CDP for BC on critical applications, meaning I will continue using normal backup regimens for all else, or to use CDP for DP replacing general backup, but realizing that my RTO for critical applications may not be optimal." However, it is not necessarily an either/or purchase decision - there are solutions available that deliver both BC and DP, though in varying degrees through different approaches.

Examples include offerings that use software to protect data (and virtualize storage) in-band at the block level, while also handling out-of-band application-level change logging, using a copy-on-write technique. Another approach uses software to cover the full spectrum (at block level) from scheduled replication to event-driven CDP, which with configuration tweaking could come very close to the RTO of application-specific BC-oriented CDP, while leaving file fidelity and consistency assurance responsibility with the user.
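
For readers unfamiliar with copy-on-write, the following Python sketch shows the general idea under simplifying assumptions: before a block is overwritten for the first time after a snapshot, its original contents are set aside so the snapshot can still be read. The CowVolume class and its methods are hypothetical names chosen for illustration and do not describe any particular product.

```python
class CowVolume:
    """Hypothetical in-memory volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # live data: block number -> bytes
        self.snapshots = []          # one dict of preserved blocks per snapshot

    def take_snapshot(self):
        # A snapshot starts empty; nothing is copied until a block changes.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block, data):
        # Copy the old contents aside for each snapshot that has not yet
        # preserved this block, then overwrite the live block.
        for preserved in self.snapshots:
            if block not in preserved:
                preserved[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, snap_id, block):
        preserved = self.snapshots[snap_id]
        if block in preserved:
            return preserved[block]
        # Unchanged since the snapshot: the live block is still valid.
        return self.blocks.get(block)


volume = CowVolume({0: b"old0", 1: b"old1"})
snap = volume.take_snapshot()
volume.write(0, b"new0")
volume.write(0, b"newer0")

print(volume.read_snapshot(snap, 0))  # contents as of the snapshot
print(volume.blocks[0])               # current live contents
```

The design choice worth noting is that the cost of a snapshot is paid only when data changes: unchanged blocks are never copied, which is why the technique suits continuous or frequent capture.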

The CDP Decision
When deciding how to implement CDP, you should identify your priorities and look at where in the stack you will do your protection. Regardless of the approach, the promise of CDP means faster data retrieval, enhanced data protection and increased business continuity. The potential for lowered costs and complexity means this will continue to be a focus for organizations in 2007 - and beyond.

About the Author
Dan Tanner is an industry veteran and Founder and Sole Proprietor of ProgresSmart (www.progressmart.com). He is an Individual Contributor to the Storage Networking Industry Association (SNIA), President of the Association of Storage Networking Professionals (ASNP) New England Chapter, and a member of the Storage Network User Group of New England (SNUG/NE).









Training at the SNIA Tech Center