Archive and backup: What's the difference?

Companies of all sizes have one thing in common: They create data, and lots of it, including customer information, product specifications and accounting files. In fact, many corporations double their volume of internal data each year. With this level of growth come the challenges of protecting that data from accidental deletions and disasters, and of complying with regulatory requirements for long-term retention.

In the past, protection and retention were handled by copying or moving data to tape. But the improving economics of disk storage and the emergence of archiving solutions create new options.

Backup

Backup technologies have long provided effective recovery options for systems subject to data loss from human error, hardware failure or major natural disasters. They are ideally suited for quick restoration of large amounts of lost information and can return complete systems to full operational capacity in a short period of time. However, backup is also a major pain point for storage administrators. Massive amounts of data can strain the ability of backup infrastructures to keep up. According to Gartner Inc., the average data center has a backup success rate of only 87%. Many administrators would also tell you that the rate of successful data recovery is even lower.

The time required to back up data is shrinking, and the ability to quickly restore information has improved significantly. By effectively leveraging backup across both tape and disk, companies can increase the throughput and reliability of their disaster recovery infrastructure at a reasonable cost. Augmenting traditional backup with replication will further help meet the most rigorous data protection requirements.

However, these technologies will be only stopgap measures if the uncontrolled growth in the amount of data requiring backup isn't curtailed. This becomes a real danger when a company treats backup as a single solution for both data protection and data retention, resulting in highly ineffective and inefficient data management.

For example, most organizations perform nightly incremental and weekly full backups, and retain backup data for three months to protect against accidental deletions. A second copy of the data might be replicated (or shipped via tape) to an off-site location to protect against disaster. Add the requirement to retain backup data for a period of years to meet data retention rules, and backup overhead increases significantly. An increase in data equates to an increase in costs, particularly in terms of time, money and personnel.

Let's take a hypothetical company like ABC Corp., which has 10TB of data on production file servers. Company policy is to create daily incremental backups onto disk storage and weekly full backups onto tape. These tapes have historically been cataloged and retained for three months before being cycled back through the process. However, new corporate governance rules coincide with governmental regulations, so the new policy states that all data related to quarterly financial results must be retained for five years. Because ABC doesn't differentiate among the different types of data on its network, everything is backed up and retained for five years. With 10TB in production and roughly 260 weekly full backups kept over five years, ABC's retained backup data adds up to about 2.6 petabytes, not counting growth in the amount of data. By keeping every weekly set of tapes for five years, ABC is devoting ever-increasing time and resources to data backup.
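
A quick back-of-the-envelope calculation makes the problem concrete. This is a minimal sketch; the 10TB data set, the weekly full backups and the five-year retention period are taken from the scenario above, and the units are decimal terabytes:

    # Retention math for the hypothetical ABC Corp. scenario.
    production_tb = 10            # TB of data on production file servers
    full_backups_kept = 52 * 5    # one weekly full backup, retained for five years

    retained_tb = production_tb * full_backups_kept
    print(f"{full_backups_kept} retained full backups = {retained_tb} TB "
          f"(~{retained_tb / 1000:.1f}PB), before any data growth")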

File Archiving

By introducing file archiving, corporations can improve their service levels for backup and recovery while reducing backup costs. File archiving can also meet regulatory requirements for data retention, managing files with full knowledge of the file system and document metadata, as well as of the files' content. A file archiving system moves or copies files according to the value of their actual content. It can also find and retrieve individual files based on that content, using any number of parameters such as author, date and customized tags like "audit" or "Sarbanes-Oxley."

To effectively manage data, file archiving systems discover all files on a network and provide an inventory of unstructured data. During the discovery process, the systems collect file system metadata and extract file contents, building a foundation for data classification and application of information governance policies.
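
As a rough illustration of that discovery step (a minimal sketch, not any vendor's actual implementation; the directory root and the choice of metadata fields are assumptions), an inventory pass over a file share might look like this:

    import os

    def inventory(root):
        """Walk a file share and record basic file system metadata for each file."""
        records = []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # skip files that vanish or are unreadable mid-scan
                records.append({
                    "path": path,
                    "size_bytes": st.st_size,
                    "modified": st.st_mtime,   # epoch seconds
                    "accessed": st.st_atime,   # epoch seconds
                })
        return records

    # A real archiving product would also extract document content and custom
    # tags at this stage to support content-based classification.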

A file archiving system must provide the following capabilities (a simplified policy sketch follows the list):

  • Be content-aware. For example, it should index the content of documents, not just the file system metadata.
  • Populate customized metadata tags by extracting information from content.
  • Prune production storage by using policies to archive information to the appropriate tiered storage level.
  • Archive a subset of data (defined by archival policies) selectively to meet regulatory compliance and corporate information governance rules.
  • Provide quick access to archived data.
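
To make the list concrete, here is the simplified policy sketch referred to above: keyword rules scan a document's text and assign a custom tag, a target storage tier and a retention period. The keywords, tag names and tier labels are illustrative assumptions, not features of any particular product.

    # Hypothetical content-aware classification: keyword rules map document
    # text to custom metadata tags, storage tiers and retention periods.
    RULES = [
        ("sarbanes-oxley", {"tag": "Sarbanes-Oxley", "tier": "compliance-archive", "retain_years": 5}),
        ("quarterly results", {"tag": "audit", "tier": "compliance-archive", "retain_years": 5}),
    ]

    def classify(text):
        """Return the archival policy implied by a document's content."""
        lowered = text.lower()
        for keyword, policy in RULES:
            if keyword in lowered:
                return policy
        # Default: no special retention; eligible for ordinary tiering by age.
        return {"tag": None, "tier": "general-archive", "retain_years": 1}

    print(classify("Q3 quarterly results attached for audit review."))
    # -> {'tag': 'audit', 'tier': 'compliance-archive', 'retain_years': 5}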

Let's take another look at ABC Corp. Recognizing a potential data crisis, ABC deployed a file archiving product to complement its existing backup. The product crawled the entire network and created a metadata abstract of each file, containing file system metadata, document metadata and automatically generated custom metadata. It determined that 70% of the unstructured production data was stale (not accessed in the past 90 days) and that 5% was related to quarterly financial reporting.
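
A simplified version of that analysis over the inventory records gathered earlier might look like the following sketch. The 90-day staleness threshold comes from the scenario; treating an "audit" tag as the marker for quarterly-financial material is an assumption carried over from the hypothetical classification rules above.

    import time

    STALE_SECONDS = 90 * 24 * 3600  # "stale" = not accessed in the past 90 days

    def summarize(records):
        """Report what share of inventoried capacity is stale or finance-related."""
        now = time.time()
        total = stale = financial = 0
        for rec in records:
            total += rec["size_bytes"]
            if now - rec["accessed"] > STALE_SECONDS:
                stale += rec["size_bytes"]
            if rec.get("tag") == "audit":
                financial += rec["size_bytes"]
        return {"stale_pct": 100 * stale / total, "financial_pct": 100 * financial / total}

    # Example: one file untouched for a year, one tagged as quarterly-financial.
    sample = [
        {"size_bytes": 500, "accessed": time.time() - 365 * 24 * 3600, "tag": None},
        {"size_bytes": 300, "accessed": time.time() - 10 * 24 * 3600, "tag": "audit"},
        {"size_bytes": 200, "accessed": time.time() - 24 * 3600, "tag": None},
    ]
    print(summarize(sample))  # -> {'stale_pct': 50.0, 'financial_pct': 30.0}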

Because file archiving applications can classify, manage and retrieve files according to business value using set policies, they can be used in any number of real-world settings that require intelligent, often rapid access to information stored across a diverse and distributed set of storage platforms.

Some of those application areas include regulatory compliance, legal discovery, corporate governance and tiered storage.

Conclusion

File archiving and backup systems have two distinct and complementary functions within an enterprise: backup provides high-speed copy and restore to minimize the impact of failures, human error or disaster, while file archiving manages data for retention and long-term access and retrieval. The two capabilities can be applied together to optimize the cost and improve the overall effectiveness of any storage infrastructure. Backup is more efficient in an environment with an effective archiving solution, and an archive still leverages the backup infrastructure for its own data protection needs. Both applications are important to an effective data management strategy.

Sudhakar Muddu is CEO and founder of Kazeon Systems Inc., a data management vendor in Mountain View, Calif.
