Pervasive data protection in the corporate cloud

Welcome to the inaugural Computerworld blog on intelligent storage networking! In this space, I will be covering everything related to storage networks. I have been doing this stuff for quite some time, so hopefully we can learn from each other's real-world experiences. I will endeavor to help you wade through all the marketing hype, so you can focus on what has real value to you as an end user, or as a storage networking professional looking to better inform your customers.

Since it's a scary time of year with Halloween and all, let's begin by tackling one of the scarier functions of IT: Backup and Disaster Recovery.

Let's face it, most backup administrators are underpaid, underappreciated, and only become important in the eyes of upper management when things go wrong.

As an example, I had a recent conversation over lunch with a friend in Denver who is a backup admin, and the story he told me is one I have heard in different variations many times.

"Chris, man, you have no idea how bad my job is sometimes. We had yet another stupid DBA accidentally delete data, and I had to scramble to recover the database. The recovery job was almost complete, but right at the end of the second to last tape, we had a media error! I had to restart the entire job over again, but the same error occurred yet again! The end result was that we had to recover the data from the previous backup, then re-enter 24 hours worth of transactions, and I was the guy who was blamed!"

Sound familiar? This scenario is a fairly typical result of what I call a "Traditional Method" backup process. The good news for my buddy and other backup administrators is your job is about to become much more important, and you may end up with the respect you deserve, and hopefully a raise!

The advent of cloud computing and service-oriented data protection has begun to mutate the traditional role of backup administrator. The entire backup process will become a service offering by the IT department as part of the internal cloud's application service level agreement.

That's a mouthful. In essence, the backup administrator's role is transforming from the traditional "tape jockey" into a "data protection policy manager". An example of this is Symantec's push to make NetBackup more of a policy engine for backup and disaster recovery.

This is actually great news for the folks in charge of backup. If you have been paying attention to where the industry is going, three recent advancements in technology are beginning to transform data center operations and the role of the IT Administrator:

  • Virtualization (Server and Storage)
  • Disk-based continuous and snapshot data protection
  • Data Deduplication

Let’s cover each of these in more detail.

Virtualization (Server and Storage): The role of server virtualization is to provide an abstraction layer between the server hardware and the applications, so applications can be moved between servers at will. The role of storage virtualization is to provide the same abstraction between the servers and the storage.

The ability to abstract applications and storage from the actual hardware makes the hardware a commodity. It enables applications to be moved from one server to another at any time, without downtime, and allows storage to be purchased based on price and reliability rather than on functionality in the firmware.

Storage virtualization also facilitates the movement of data. Application data can be moved anywhere, anytime, based on performance or other requirements via a policy created by the IT admin.
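To make the abstraction a bit more concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption for illustration: the `VirtualVolume` class, the backend names, and the `migrate` method are hypothetical and do not represent any vendor's API. Real storage virtualization happens at the block level in the fabric or the array, not in a Python dictionary.

```python
class VirtualVolume:
    """A volume name the application sees, mapped onto a physical backend."""

    def __init__(self, name, backends, current):
        self.name = name
        self._backends = backends      # backend name -> {(volume, block): data}
        self.current = current         # backend holding this volume right now

    def write(self, block_no, data):
        self._backends[self.current][(self.name, block_no)] = data

    def read(self, block_no):
        return self._backends[self.current][(self.name, block_no)]

    def migrate(self, target):
        """Move this volume's blocks to another backend; the name is unchanged."""
        source = self._backends[self.current]
        mine = [key for key in source if key[0] == self.name]
        for key in mine:
            self._backends[target][key] = source.pop(key)
        self.current = target


# Hypothetical backends standing in for two physical arrays.
backends = {"fast_array": {}, "cheap_array": {}}
vol = VirtualVolume("erp_data", backends, current="fast_array")
vol.write(0, b"ledger")
vol.migrate("cheap_array")             # e.g. triggered by an admin policy
print(vol.current, vol.read(0))        # the application still reads block 0
```

The point of the sketch is that the application only ever asks for `erp_data`, block 0. Where that block physically lives is a detail the virtualization layer is free to change.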

Step one in creating an internal cloud is the introduction of intelligent virtualization for both servers and storage. You can find more about storage virtualization in the vendor-neutral tutorial on the Storage Networking Industry Association (SNIA) site.

Continuous and periodic protection (CDP and Snapshots): Adding continuous protection and snapshots to the mix eliminates the need to do bulk transfers of data over the network to make actual backup copies. The definition of a backup is a copy of the data, and it has to be a full copy to actually be a backup.

The backup copy must be separate from the production copy, and must be stored on physically separate hardware or storage media. Once the base copy is available, that copy can be used as the source for snapshots so that the primary copy is unaffected.

In order to accomplish real-time, non-disruptive snapshots, the copy must be continually updated via CDP technology to capture any new information between snapshots. Instead of the traditional method of backing the data up with a bulk copy operation, data is simply always protected: continually through CDP, and periodically via the snapshots.

Consider the benefits! No backup window, no bulk copies over the network, and everything is stored on disk so recovery can occur rapidly by either mounting a snapshot or rolling back the CDP journal. The SNIA site has a good CDP tutorial.
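For readers who think better in code, here is a rough Python sketch of how a CDP journal and snapshots fit together. It is a toy under stated assumptions: the `ProtectedVolume` class and its method names are hypothetical, and a simple sequence number stands in for the timestamps a real CDP product would record at the block level.

```python
class ProtectedVolume:
    """Backup copy kept current by a write journal, plus periodic snapshots."""

    def __init__(self):
        self.journal = []      # ordered (sequence, block_no, data) records
        self.snapshots = {}    # label -> journal sequence at snapshot time
        self._seq = 0

    def write(self, block_no, data):
        """Every write is journaled as it happens (the 'continuous' part)."""
        self._seq += 1
        self.journal.append((self._seq, block_no, data))

    def snapshot(self, label):
        """A snapshot is just a named marker in the journal, not a bulk copy."""
        self.snapshots[label] = self._seq

    def state_at(self, seq):
        """Roll back: replay the journal up to and including sequence seq."""
        blocks = {}
        for s, block_no, data in self.journal:
            if s > seq:
                break
            blocks[block_no] = data
        return blocks

    def mount_snapshot(self, label):
        return self.state_at(self.snapshots[label])


vol = ProtectedVolume()
vol.write(0, "orders v1")        # sequence 1
vol.write(1, "customers v1")     # sequence 2
vol.snapshot("nightly")
vol.write(0, "orders v2")        # sequence 3
vol.write(0, "CORRUPTED")        # sequence 4: the accidental damage

print(vol.mount_snapshot("nightly"))   # state as of the snapshot
print(vol.state_at(3))                 # state just before the bad write
```

Mounting the snapshot loses the "orders v2" update, while rolling the journal back to sequence 3 keeps it. That difference is why the two techniques are used together.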

This leaves data deduplication for last.

Data Deduplication: So far, we have virtualized everything, implemented continuous protection for our critical data, and started making periodic snapshots of everything else. The last step to an optimized IT infrastructure for our internal cloud is to add dedupe to the mix, so we can store and replicate data as efficiently as possible.

Backup is the killer app for dedupe, but dedupe also helps make DR much more efficient. Backup is the killer app because a full backup copies the same files over and over again. As an example, let's take a law firm with 500 desktops running Microsoft Word that are backed up using weekly fulls with a 30-day retention.

How many copies of Winword.exe do you need to store? Without dedupe, the first week there are 500 copies of it on tape, the next week there are 1000, the week after that there are 1500 copies, and the last week there are 2000 copies of that one file before the tapes are overwritten.
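The Winword.exe arithmetic is easy to check with a few lines of Python. This is only a sketch: it assumes four weekly fulls fit inside the 30-day retention, and it uses a placeholder byte string in place of the real file. The function names are made up for this example. The second function shows the basic dedupe idea of identifying content by its hash, here SHA-256 from Python's standard `hashlib` module.

```python
import hashlib


def copies_on_tape(desktops, weekly_fulls_retained):
    """Cumulative copies of one file after each weekly full backup."""
    return [desktops * week for week in range(1, weekly_fulls_retained + 1)]


def unique_copies(files):
    """Count distinct contents by SHA-256 digest, as a dedupe store might."""
    return len({hashlib.sha256(content).hexdigest() for content in files})


weekly_totals = copies_on_tape(desktops=500, weekly_fulls_retained=4)
print(weekly_totals)                       # [500, 1000, 1500, 2000]

winword = b"placeholder bytes standing in for Winword.exe"
all_backed_up_copies = [winword] * weekly_totals[-1]
print(unique_copies(all_backed_up_copies))  # 1
```

Two thousand copies go in, and one unique piece of content needs to be stored. Real dedupe products typically work on chunks of files rather than whole files, but the principle is the same.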

Now extrapolate that out to every file in the organization. You can see how it adds up real fast. If you do the math, using typical backup operations and retention requirements, 20TB worth of data with a 2% change rate and 3% growth rate will require over 101TB of media storage if retained over 5 weeks. Now let's add dedupe.

The same 20TB with the same growth and change rate at a 7:1 dedupe ratio could be stored in about 24TB. (101TB - 24TB = a savings of 77TB worth of space!) You can begin to see how much money you can save over time here. But that's not the main benefit of dedupe.
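If you want to plug in your own numbers, the savings calculation is a one-liner. The sketch below simply takes the two capacity figures quoted above as inputs; it does not attempt to derive them from the change and growth rates. The function name `dedupe_savings` is made up for this example.

```python
def dedupe_savings(raw_tb, deduped_tb):
    """Return (TB saved, percent saved) for a given before/after capacity."""
    saved_tb = raw_tb - deduped_tb
    percent_saved = round(100 * saved_tb / raw_tb, 1)
    return saved_tb, percent_saved


# Figures quoted in the text: 101TB without dedupe, about 24TB with it.
saved, percent = dedupe_savings(raw_tb=101, deduped_tb=24)
print(saved, percent)   # 77 76.2
```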

The main financial benefit of dedupe (besides less media and less storage) is how it saves WAN bandwidth for data replication. WAN bandwidth is typically a recurring monthly cost, and although the cost has been going down, it's still a major part of most IT budgets, which is why many companies are still shipping backup tapes offsite for disaster recovery. Imagine being able to get data replicated offsite electronically more efficiently and at a lower cost than shipping and storing tapes!
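Here is a small Python sketch of why dedupe helps on the WAN. It is an illustration under simplifying assumptions, not how any particular product works: the function names are hypothetical, the chunks are a tiny fixed size, and it ignores the metadata (the list of chunk hashes) that a real replication protocol would also have to send. Many real products use variable-size chunking instead.

```python
import hashlib


def chunk_ids(data, chunk_size=4):
    """Split data into fixed-size chunks keyed by their SHA-256 digest."""
    chunks = {}
    for i in range(0, len(data), chunk_size):
        piece = data[i:i + chunk_size]
        chunks[hashlib.sha256(piece).hexdigest()] = piece
    return chunks


def replicate(local_data, remote_store, chunk_size=4):
    """Send only the chunks the remote site lacks; return bytes sent."""
    sent = 0
    for digest, piece in chunk_ids(local_data, chunk_size).items():
        if digest not in remote_store:
            remote_store[digest] = piece
            sent += len(piece)
    return sent


remote = {}                                    # the DR site's chunk store
first = replicate(b"AAAABBBBCCCC", remote)     # everything is new
second = replicate(b"AAAABBBBDDDD", remote)    # only the last chunk changed
print(first, second)                           # 12 4
```

The second replication moves a third of the bytes because the remote site already holds two of the three chunks. Scale that up to nightly replication of mostly unchanged data and the bandwidth savings follow.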

Let's tie all this together now into the steps to create an internal corporate cloud.

  1. Virtualize everything so application and data location are irrelevant
  2. Continually protect, rather than use a bulk copy backup for data protection, which will change the physics of backup by removing the need to move large amounts of data at the same time.
  3. Dedupe everything so it can be stored and moved efficiently
  4. Lastly, create policies for storage tiers and data life-cycle, and apply those policies on the objects being stored (files, blocks, and tapes) so that the entire data life-cycle is automated, and everything moves to where it belongs based on that policy.
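
Step 4 is the one that tends to sound abstract, so here is a minimal Python sketch of what "apply policies to the objects being stored" could look like. The tier names, the thresholds, and the `StoredObject` fields are all hypothetical choices made for this example; a real policy engine would take them from the business's service level agreements.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    name: str
    days_since_access: int
    business_critical: bool


# Hypothetical policy: ordered rules, first match wins.
POLICY = [
    (lambda o: o.business_critical, "tier1_cdp"),
    (lambda o: o.days_since_access <= 30, "tier2_snapshot"),
    (lambda o: o.days_since_access <= 365, "tier3_dedupe"),
]
DEFAULT_TIER = "archive"


def assign_tier(obj):
    """Return the storage tier the policy places this object in."""
    for rule, tier in POLICY:
        if rule(obj):
            return tier
    return DEFAULT_TIER


objects = [
    StoredObject("billing.db", days_since_access=0, business_critical=True),
    StoredObject("q3_report.doc", days_since_access=12, business_critical=False),
    StoredObject("old_project.zip", days_since_access=200, business_critical=False),
    StoredObject("2001_scans.tif", days_since_access=2000, business_critical=False),
]
placement = {o.name: assign_tier(o) for o in objects}
print(placement)
```

Once placement is a function of policy rather than a manual decision, re-running it on a schedule is what automates the data life-cycle.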

The diagram below is an example of what current data center administrators are contending with. Each layer in the diagram is a separate management point. Each physical server needs to be tended to. Each element is discrete, and is usually managed via a different interface. Backup is "bolted on" as an afterthought, and most data movement for backup is over the LAN, which impacts client access to the applications. Therefore, all backup operations must occur during off hours, creating the dreaded backup window.

[Figure: A Typical Data Center]

As you can see, managing this conglomerate of components and processes can be complex. Your goal is to move to more of an internal cloud-based methodology as depicted in the next graphic.

[Figure: The Optimized Data Center]

Here, since everything is virtualized, continually protected and deduplicated, there is a single management point at the fabric layer (I call this policy management component the data services engine) where all policies are created and enforced.

Instead of managing discrete elements, everything is controlled and automated based on business policy. All data elements are placed in pools of tiered storage based on the inherent performance and reliability aspects of the underlying storage. Data movement is transparent, and data protection is continuous and provided as a service to the application based on a specific service level agreement.

Policy creation, and understanding how important particular data types and applications are to the business, are the areas where smart backup and IT administrators will focus their education efforts, so they do not get left behind in this new age of cloud computing.

Let me know if you want to hear more on these topics, and if so, we can delve more deeply into each subject. Next time I’ll provide an overview of how global file systems and object-based storage are changing how data is accessed, stored, and protected, and how they affect intelligent storage area networks.

In the meantime, if you want to peruse a few chapters of my book for free, you can search Google Books for "Storage Area Networks for Dummies".

No trick, and hopefully a treat!


Christopher Poelker is the author of Storage Area Networks for Dummies, and he is currently the vice president of Enterprise Solutions at FalconStor Software.

Copyright © 2009 IDG Communications, Inc.
