Petascale storage may trickle down to you
As storage moves to the forefront of supercomputer research, advanced technologies from national labs and universities are expected to trickle down to commercial users.
Computerworld - Discussions about supercomputer performance almost always center on processing speed: how many gazillion operations per second the giant machines can perform. Makers and users of supercomputers also like to brag about things like the number of processors, the amount of memory and the bandwidth available for moving data about.
Such metrics are important determinants of how much work the machines can do. Less often discussed, but becoming critically important, are questions of storage: How much disk capacity do the computers have? How fast can data be written to and read from storage? How easily and quickly can an application be restarted when a disk fails? How can file systems be scaled up to efficiently handle petabytes of information? How the heck can you find something when your system has 30,000 disks?

Ethan Miller, professor at University of California, Santa Cruz
That system may not much resemble the one used by your accounting department, but the computer scientists at the institute say — and the vendor sponsors are hoping — that new technologies from petascale storage research will trickle down to commercial users.
“The use of high-performance computer clusters in many commercial applications, [such as] oil and gas, semiconductors and biotechnology, is growing substantially,” says Garth Gibson, a principal investigator for the PDSI and a professor at Carnegie Mellon University. He adds that companies are increasingly using supercomputers to boost revenues. “High-performance computing is not so much about cost reduction as it is about improving the quality of products,” Gibson says.
Disk Dilemmas
Storage systems have the unfortunate quality of not scaling well. Here are some of the problems that PDSI researchers will try to solve:
- Disk access times have not kept pace with disk capacity. In 1990, a computer could read an entire hard drive in under a minute. Now it takes three hours or so to read the largest disks. “It’s only going to get worse, and it will take longer and longer to recover from a disk failure,” Miller says.
- As the number of disks in a system increases, so does the probability that one will fail in any given period of time. Right now, big systems at the national laboratories experience a failure once or twice a day, but with multipetabyte systems, that rate could increase to a failure every few minutes.
- When a disk does fail, the surviving disks that must restore the affected data to another disk have to work even harder, increasing the chances that one of them will fail too.
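
The first two problems above come down to simple arithmetic, and a rough sketch in Python shows how the numbers behave. The disk capacities, transfer rates and the 3% annual failure rate used here are illustrative assumptions, not figures from the PDSI or Panasas; only the 30,000-disk system size comes from the article. The function names are hypothetical.

```python
# Back-of-the-envelope storage scaling arithmetic.
# All input figures below are illustrative assumptions.

def full_read_seconds(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to read an entire disk sequentially at a sustained transfer rate."""
    return capacity_bytes / throughput_bytes_per_s


def failures_per_day(num_disks: int, annual_failure_rate: float) -> float:
    """Expected disk failures per day, assuming independent failures
    at a constant annual failure rate per disk."""
    return num_disks * annual_failure_rate / 365.0


def hours_between_failures(num_disks: int, annual_failure_rate: float) -> float:
    """Average gap between failures across the whole system, in hours."""
    return 24.0 / failures_per_day(num_disks, annual_failure_rate)


if __name__ == "__main__":
    # Assumed 1990-era disk: 40 MB at 1 MB/s.
    print("1990 disk, seconds to read:", full_read_seconds(40e6, 1e6))
    # Assumed large modern disk: 750 GB at 70 MB/s.
    print("Large disk, hours to read:", full_read_seconds(750e9, 70e6) / 3600)
    # 30,000 disks at an assumed 3% annual failure rate.
    print("Failures per day:", failures_per_day(30_000, 0.03))
    print("Hours between failures:", hours_between_failures(30_000, 0.03))
```

Under these assumptions, capacity has grown far faster than transfer rate, so the full-read time stretches from seconds to hours, and the failure interval shrinks in direct proportion to the number of disks.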

Garth Gibson, professor at Carnegie Mellon University
One promising approach that’s now coming into use at the national labs is a technology called object storage, by which clients can access storage devices directly without going through a central file server. Object storage devices have processors attached to them so that lower-level functions, such as space management, can be handled by the devices themselves. And because data objects contain both data and metadata, it’s possible to apply fine-grained, highly intelligent controls for security and other purposes. What’s more, object-based storage systems tend to be much more scalable than traditional ones.
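
The idea can be illustrated with a toy model in Python. This is a minimal in-memory sketch, not the interface of any real object storage product or standard: the class names, method names and metadata fields are all hypothetical. It shows the three properties described above: the client talks to the device directly, the device does its own space management, and each object carries metadata that the device uses for a per-object access check.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    """An object bundles the data with its own metadata."""
    data: bytes
    metadata: dict


class ObjectStorageDevice:
    """Toy object storage device (hypothetical API, in-memory only)."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._objects = {}
        self._next_id = 0

    def create(self, data: bytes, owner: str, readers=()) -> int:
        # Space management happens on the device, not on a central server.
        if self.used_bytes + len(data) > self.capacity_bytes:
            raise OSError("device full")
        object_id = self._next_id
        self._next_id += 1
        metadata = {
            "owner": owner,
            "readers": set(readers) | {owner},
            "size": len(data),
        }
        self._objects[object_id] = StoredObject(data, metadata)
        self.used_bytes += len(data)
        return object_id

    def read(self, object_id: int, user: str) -> bytes:
        # Fine-grained control: the check uses this object's own metadata.
        obj = self._objects[object_id]
        if user not in obj.metadata["readers"]:
            raise PermissionError(f"{user} may not read object {object_id}")
        return obj.data


if __name__ == "__main__":
    osd = ObjectStorageDevice(capacity_bytes=1024)
    oid = osd.create(b"seismic survey block", owner="alice", readers=["bob"])
    print(osd.read(oid, "bob"))
```

Because each device handles allocation and access checks itself, adding devices adds processing capacity along with storage, which is one reason such designs are expected to scale better than a single central file server.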
Researchers will also work on protocols and APIs, especially those related to Linux. They will help develop extensions to Posix, the portable operating system interface for Unix, to enable more effective use of file systems in highly parallel computer clusters. Researchers will also work with The Open Group and the Internet Engineering Task Force to make the Network File System protocols for file access more capable in highly parallel systems.
The PDSI will explore a number of emerging technologies, such as phase-change RAM, Miller says. PRAM, recently announced by Samsung Electronics Co., offers the speed of dynamic RAM with the nonvolatility of flash memory. Miller says it’s the perfect place to put metadata because it can be accessed much more quickly than if it were on disk, thereby making object storage systems much faster.
Miller says PRAM might also be used to store indexes used by search engines, greatly accelerating them as well. That increased speed may prove to be of interest to businesses such as oil companies that have huge stores of private data but lack the enormous resources of a company like Google Inc.
Few corporations will ever have systems the size of those at the national labs, with tens of thousands of disks, says Miller. But even desktop systems, which will have more and more disk drives over time, will experience some of the challenges the PDSI will address.
“I can’t tell you yet which ones they will be,” he says. “But problems at the high end have a nasty habit of trickling down to the low end.”

Source: Panasas Inc., Fremont, Calif.