Controlling the enterprise information life cycle

There's a stage in the life of a new technology in which half the world thinks it's a whole new paradigm and the other half thinks it's all hype. Half says it will never happen whereas the other half says, "We're doing it now." And even the most improbable vendor claims to have strategies and products to support it. So it is with ILM (information life-cycle management).

The current darling of the storage industry, ILM is based on two simple concepts. First, not all information has the same value to the organization. Second, whatever value information has tends to change over time.

If these assumptions are true, then why apply the same level of expensive storage, management, and protection to all information in an enterprise? By moving less-valuable information to less-expensive storage and applying appropriate levels of protection to each storage tier, companies save money and reserve high-end resources for the information that demands them.

The result: Mission-critical systems are less bloated, more stable, and better performing. Backup windows shrink, storage runs out less often, upgrades are less frequent, and the overall cost of storage and storage management drops.

That's the idea, anyway. Given such a tall order, it makes sense to be skeptical about what, if anything, ILM can do for you on an enterprise scale. But once we stopped worrying about the "grand vision" of ILM and focused on the reality, we found that a number of nascent, policy-based point solutions are already providing real benefit to organizations challenged by exploding storage and complex compliance requirements.

What it is and is not

Superficially, at least, the ILM concept resembles earlier storage technologies, including HSM (hierarchical storage management) and DLM (data life-cycle management). But DLM focused on data as the unit of storage, and HSM tended to associate data with applications and to move that data based on a single criterion: time. ILM, by contrast, sets policies based on the value of the information that data carries, regardless of the application or the data's age.

"In terms of information, HSM is brain-dead," says Jeremy Burton, executive vice president of Veritas' Data Management Group. For example, he says, one e-mail might require a different storage policy from the next, depending on its subject, sender, or relationship to a particular lawsuit. Similarly, health records don't always decrease in value; they may have to be quickly accessed if a patient has a recurrence. In these cases, it's the information contained in each parcel of data that's important, not the data itself.

The other difference is protection. "In some ways ILM is like HSM, but you protect each tier differently," says Nancy Hurley, a senior analyst at Enterprise Strategy Group. "So you may snap tier 1 every few hours and do incremental backups every day. Tier 2 only gets backed up once a week. Tier 3 never gets backed up; you replicate it and that becomes your store."
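
Hurley's three-tier example can be written down as a small policy table. The following is only a sketch in Python: the class and field names are hypothetical, and "every few hours" is assumed to mean every four hours purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProtectionPolicy:
    snapshot_every_hours: Optional[int]  # None means no snapshots are taken
    backup_every_days: Optional[int]     # None means the tier is never backed up
    replicated: bool                     # True if a replica serves as the store


# Values follow Hurley's example; the 4-hour snapshot interval is an assumption.
PROTECTION_BY_TIER = {
    1: ProtectionPolicy(snapshot_every_hours=4, backup_every_days=1, replicated=False),
    2: ProtectionPolicy(snapshot_every_hours=None, backup_every_days=7, replicated=False),
    3: ProtectionPolicy(snapshot_every_hours=None, backup_every_days=None, replicated=True),
}


def backups_per_year(tier: int) -> int:
    """Count the backup jobs a tier generates in a 365-day year."""
    policy = PROTECTION_BY_TIER[tier]
    if policy.backup_every_days is None:
        return 0
    return 365 // policy.backup_every_days
```

Under these assumptions, tier 1 generates 365 backup jobs a year, tier 2 generates 52, and tier 3 generates none, which is where the shrinking backup windows come from.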

Finally, in most cases ILM assumes that despite migration or archiving, data will continue to be accessible for a long time, either as an identical archive instance or as a searchable repository.

Smarter storage now

The two principal drivers behind ILM are exploding storage management costs and compliance. Which one is more important depends on whom you talk to.

"Many people assume it's compliance that's driving ILM," Hurley says, "but only two out of 10 users I interviewed cite compliance as the main reason they are interested. Most of the rest cite cost savings."

Take the North Bronx Healthcare Network, which oversees several New York City public health facilities. "We did some analysis and found that 84% of our data is stagnant," CIO Daniel Morreale says. "So using EMC's DiskXtender software, we applied some business rules to move the data automatically from our EMC Symmetrix DMX [Direct Matrix Architecture] storage to a less expensive NAS, if [the data] isn't used for six months, and then to our EMC Centera CAS [Content Addressed Storage] fixed content storage six months after that."

All of the files, however, are easily accessible to users. "The difference between accessing files on the SAN and NAS is imperceptible," Morreale says, "and getting files off the CAS takes maybe an extra one and a half seconds." Morreale says this tiered storage model lowered his storage and staffing costs significantly and enhanced business continuity, in addition to aiding HIPAA compliance.
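
The business rule Morreale describes is simple enough to sketch in a few lines of Python. This is not DiskXtender's actual rule syntax, which the article does not show; the function name is hypothetical, six months is approximated as 182 days, and the second move is assumed to trigger after roughly twelve months without access.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 182 days.
SIX_MONTHS = timedelta(days=182)


def target_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from how long a file has gone unused."""
    idle = today - last_accessed
    if idle >= 2 * SIX_MONTHS:
        return "CAS"  # fixed-content store, about a year without access
    if idle >= SIX_MONTHS:
        return "NAS"  # less expensive tier after six idle months
    return "SAN"      # recently used data stays on high-end storage
```

A file last opened a month ago stays on the SAN, one untouched for seven months maps to the NAS, and one idle for two years maps to the CAS.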

Michael Howard, CEO of ILM vendor OuterBay, says compliance issues account for much of his business. For example, when Tektronix consolidated its Oracle systems from 27 countries to two locations in Beaverton, Ore., its storage requirements exploded and compliance issues became much more complex.

"In the U.S., a customer invoice has to be retained for five years," says Lois Hughes, senior manager of business application systems at Tektronix. "But in Italy it's 12 years and in China, 15."

Tektronix deployed OuterBay's Live Archive to move transactions from its production environment to a less expensive read-only archive storage tier after two years. Different levels of protection are applied to each tier, because stable data doesn't need to be replicated or backed up as often as live data. And data on the archive tier is readily available to users. "It looks just like the production environment; no user training required," Hughes says.

The next step will be to move data after six years to a third tier: OuterBay's Encapsulated Archive, a self-describing XML archive store. "This brings us from the huge, demanding Oracle application environment to compact, Oracle-release-independent XML storage," Hughes says. "We can still run queries and reports. SQL code identifies the owner of the transaction, so the system will know that if the legal owner is Tektronix Germany, it should purge after 10 years and one day."
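
The owner-based purge logic Hughes describes is done in SQL at Tektronix, and the article does not show that code. As a rough illustration only, here is the same idea in Python. The table holds the retention periods quoted above; the country codes and function names are hypothetical, and treating every country's records as purgeable on the first day after retention ends is an assumption generalized from the German example.

```python
from datetime import date

# Retention periods in years, as quoted in the article.
RETENTION_YEARS = {"US": 5, "DE": 10, "IT": 12, "CN": 15}


def retention_ends(invoice_date: date, owner_country: str) -> date:
    """Return the last day on which the invoice must still be retained."""
    years = RETENTION_YEARS[owner_country]
    try:
        return invoice_date.replace(year=invoice_date.year + years)
    except ValueError:
        # A Feb. 29 invoice whose target year is not a leap year.
        return invoice_date.replace(year=invoice_date.year + years, day=28)


def is_purgeable(invoice_date: date, owner_country: str, today: date) -> bool:
    """True once at least one day has passed since retention ended."""
    return today > retention_ends(invoice_date, owner_country)
```

With this rule, a German invoice dated Jan. 1, 1995, is still retained on Jan. 1, 2005, and becomes purgeable the next day, matching the "10 years and one day" in the quote.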

Frank Harbist, vice president and general manager of storage software and ILM at Hewlett-Packard Co., sees yet a third driver: information leverage. "We see more and more companies wanting to use information as a way to help run their business more effectively," he says. "They want more of it accessible so they can take advantage of data mining, business decision support, and analytics tools to gain competitive advantage."

Vision vs. reality

How long will it take to achieve the full ILM vision? Experts agree only that it is at least a few years away. Missing from today's ILM offerings is the enterprisewide, single-console ideal -- tools that would allow an enterprise to classify all its information according to value, set up a single system of storage tiers, and apply migration and protection policies across it all using a single management tool. Much more common are point solutions, each with different emphases and capabilities.

For example, companies such as OuterBay and Princeton Softech concentrate on structured data found in Oracle databases, as well as CRM, ERP, and supply-chain-management systems. Other solutions from EMC/Documentum, HP, and Ixos target unstructured data such as files and images. E-mail archiving solutions from iLumin, Ixos, Veritas, and Zantaz focus almost solely on messaging. StorageTek has separate point solutions for structured data and e-mail. Various other point solutions are available from such vendors as Hitachi Data Systems, IBM, Network Appliance, and Sun.

Second, most current solutions calling themselves ILM only move and archive data. Protection at each tier -- in the form of mirroring, replication, and backup -- is generally left to the storage manager to implement using other solutions. The long-term vision of ILM assumes that a management architecture will tie the two together and manage them as one, possibly as just another feature of an overall storage management platform.

For information awareness to truly become a reality, the applications that capture and use information will inevitably have to be involved as well. Currently, products such as EMC's Xtender line and OuterBay's Application Data Management suite act as a kind of application-aware middleware sitting between individual applications and storage, providing information awareness and policy-based data movement and disposal. Also, Oracle Database 10g provides some of its own data partitioning and storage tiering and management capabilities.

"There has to be a marriage between the storage and applications," Hurley says. "Once I've [assigned a value to] my information, all the applications that will use, move, migrate, protect, and retain it should understand that valuation from the moment it's created. It's going to come from the vendors providing open APIs to work with each other and a level of integration such that policy engines understand each other."

Legacy applications are often critical roadblocks to enterprisewide ILM. "We thought we could use our ILM solution to move information from our 12-year-old clinical information system to [EMC's Centera] CAS," Morreale says, "but in fact we can't because the application doesn't allow the freedom to move data dynamically."

Metadata will also be a key provider of information awareness. "There are all sorts of things people will want to trap," says Ken Steinhardt, director of technology analysis at EMC. "They'll need multiple metadata views -- not just how frequently something was accessed, but also tagging things that might be associated with a sensitive project with a 20-year retention requirement."

Eventually, fulfilling the ILM vision may require standards as well. SNIA's Data Management Forum is in the early stages of crafting an ILM model, but most experts agree that ILM standards are many years away.

Forget the vision

The reality may be that enterprise ILM is too huge a project for many companies to take on. "ILM has expanded to mean everything in storage hardware, software, and services," Jeremy Burton of Veritas says. "Customers don't know where to start because ILM sounds like some kind of ERP project that will grow out of control and take 10 years."

One way companies can cope is to stop worrying about the vision. Instead, start with the areas that are giving you the most pain. For many organizations, e-mail is a major source of pain and a great place to start, particularly with its compliance challenges. Others may find that ERP or CRM data hurt the most.

Wherever the pain is, the first step is a careful process of information discovery, analysis, and classification. Hurley recommends taking advantage of SRM (storage resource management) applications. "They'll tell you quickly what you can get rid of and what's taking up the most data, and you'll be able to see access patterns clearly," she says. "You'll probably be surprised at what you find."

The result of this analysis should be a system that puts your information into categories based on performance, protection, and retention requirements during its life cycle. Then, based on the storage needs identified by each classification, decide on a series of storage tiers, each with its own appropriate performance, availability, and protection service levels. Finally, investigate policy-based automated data-moving solutions, such as those from EMC, HP, Veritas, and others, which address your requirements.
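
To make the matching step concrete, here is one way the classify-then-tier exercise might be sketched in Python. Everything in it is hypothetical: the tier names, the service levels, the cost ranking, and the information classes are invented for illustration, and retention requirements are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    access_seconds: float  # typical retrieval time
    backups_per_week: int  # protection service level
    cost_rank: int         # 1 is the cheapest


@dataclass(frozen=True)
class InfoClass:
    name: str
    max_access_seconds: float  # slowest acceptable retrieval
    min_backups_per_week: int  # weakest acceptable protection


# Hypothetical service levels, not vendor figures.
TIERS = [
    Tier("tier 1", access_seconds=0.01, backups_per_week=7, cost_rank=3),
    Tier("tier 2", access_seconds=0.05, backups_per_week=1, cost_rank=2),
    Tier("tier 3", access_seconds=2.0, backups_per_week=0, cost_rank=1),
]


def cheapest_tier(info: InfoClass) -> Tier:
    """Return the least expensive tier that meets the class's requirements."""
    candidates = [
        t for t in TIERS
        if t.access_seconds <= info.max_access_seconds
        and t.backups_per_week >= info.min_backups_per_week
    ]
    if not candidates:
        raise ValueError(f"no tier satisfies {info.name}")
    return min(candidates, key=lambda t: t.cost_rank)
```

For example, a class such as `InfoClass("closed invoices", max_access_seconds=5.0, min_backups_per_week=0)` lands on tier 3, while one that needs daily backups and near-instant access stays on tier 1. The point of the exercise is that each class ends up on the cheapest storage that still meets its service levels.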

Many companies start ILM with one application or department, or to solve a particular problem, such as compliance. The key is to get familiar with the process and see what it can do for your organization. Then you can argue about the ILM vision over lunch.

This story, "Controlling the enterprise information life cycle," was originally published by InfoWorld.

Copyright © 2005 IDG Communications, Inc.
