IBM's 120 petabyte drive could help better predict weather
Massive drive would store up to 1 trillion files
Computerworld - The development of the world's largest single-namespace data repository could help forecasters better predict weather and avoid overhyping hurricanes such as Irene.
Forecasters had predicted that Irene could devastate cities such as Washington and New York, but instead some of the most severe damage occurred much farther inland, in states such as Vermont, which was inundated by tropical-storm downpours.
Several post-Irene reports pointed to inaccurate forecasts as a problem. As the UK newspaper The Guardian wrote, the "storm surge that could have swamped [Manhattan] failed to materialize." Many New Yorkers were unhappy at having prepared for the worst only to see little or no damage.
Enter IBM's Data Storage Group at Almaden, Calif., which is building a 120PB data system from 200,000 SAS (Serial Attached SCSI) drives, all configured to act as a single drive under one name. That's roughly 30 times larger than the biggest single data repository on record, according to IBM. The system could store up to 1 trillion files. Even the Wayback Machine, a massive data time capsule created by the Internet Archive to store everything on the Web since 1996, holds only 2PB of data.
IBM said it chose high-performance SAS drives over high-capacity SATA drives because the system has high bandwidth requirements. The drives are linked by a backbone that uses the SAS protocol, while the storage connects to compute nodes through a proprietary fabric whose details IBM would not disclose.
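As a rough sanity check, the short Python sketch below works through the arithmetic these figures imply. It assumes decimal units (1PB = 1,000,000GB) and treats the 120PB as raw capacity spread evenly across all 200,000 drives; it ignores any redundancy or formatting overhead, which the article does not describe. The function names are illustrative only.

```python
# Back-of-envelope checks on the figures IBM reported for its 120PB system.
# Assumption: decimal (SI) units, so 1 PB = 1,000,000 GB = 1,000,000,000,000 KB.
# Assumption: 120PB is raw capacity spread evenly over every drive (no RAID or
# formatting overhead is modeled, since the article does not describe any).

TOTAL_PB = 120                 # reported system capacity
DRIVES = 200_000               # reported number of SAS drives
MAX_FILES = 1_000_000_000_000  # reported maximum file count (1 trillion)
WAYBACK_PB = 2                 # reported size of the Wayback Machine
RECORD_MULTIPLE = 30           # "roughly 30 times larger" than the prior record

GB_PER_PB = 1_000_000
KB_PER_PB = 1_000_000_000_000


def capacity_per_drive_gb() -> float:
    """Raw capacity each drive contributes if capacity is split evenly."""
    return TOTAL_PB * GB_PER_PB / DRIVES


def average_file_kb() -> float:
    """Average file size if the system were filled with the maximum file count."""
    return TOTAL_PB * KB_PER_PB / MAX_FILES


def implied_prior_record_pb() -> float:
    """Size of the previous record implied by the 'roughly 30 times' claim."""
    return TOTAL_PB / RECORD_MULTIPLE


def wayback_multiple() -> float:
    """How many 2PB Wayback Machines would fit in 120PB."""
    return TOTAL_PB / WAYBACK_PB


if __name__ == "__main__":
    print(f"Per-drive capacity:    {capacity_per_drive_gb():,.0f} GB")
    print(f"Average file size:     {average_file_kb():,.0f} KB")
    print(f"Implied prior record:  {implied_prior_record_pb():,.0f} PB")
    print(f"Multiple of Wayback:   {wayback_multiple():,.0f}x")
```

Under those assumptions, each drive would contribute about 600GB, a full trillion files would average about 120KB apiece, the previous record implied by "roughly 30 times" is about 4PB, and 120PB is 60 times the Wayback Machine's 2PB.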
The technology behind IBM's massive data store, which the company plans to begin installing at several customer sites later this year, would be well suited to building more powerful high-performance computing systems for tasks such as climate modeling.
To be sure, Hurricane Irene packed plenty of punch. At least 21 people in eight states died as a result of the storm, and early damage estimates top $7 billion. But most models showed the storm hitting the East Coast with far more force than it actually did.
"As with any of these high-performance computing simulations ... the more variables you can look at, the more granular you can be, the better the models. Hopefully, the better the model, the better the prediction," said Bruce Hillsberg, director of Storage Systems Research at IBM. While IBM used weather simulation as an example, it would not name the customers for the data store.
IBM's 120PB data store has yet to be built. The company will assemble it in the data centers of several customers over the next year, but the base technology for such systems has been around for many years. That technology, IBM's General Parallel File System (GPFS), is already used in a number of IBM products, including its scale-out NAS (SONAS) array, which IBM brought to market last year and which can scale to 14PB of capacity. IBM also uses GPFS in its strategic archiving product, IBM Information Archive, and in its cloud storage service offerings.
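GPFS presents many physical disks as one file system in part by striping each file's data, block by block, across those disks. The sketch below is a deliberately simplified, hypothetical model of that idea: it maps a file's logical blocks round-robin onto a set of drives. The names `stripe`, `BlockLocation` and `blocks_per_drive` are illustrative only, and GPFS's real allocation also handles metadata, replication and failure recovery, none of which is modeled here.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    """Where one logical block of a file lives (hypothetical toy model)."""
    drive: int  # index of the physical drive holding the block
    slot: int   # this file's n-th block on that drive (0 = first)


def stripe(file_size: int, block_size: int, num_drives: int,
           first_drive: int = 0) -> list[BlockLocation]:
    """Spread a file's blocks round-robin across drives, starting at first_drive."""
    if block_size <= 0 or num_drives <= 0:
        raise ValueError("block_size and num_drives must be positive")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Ceiling division: a partial final block still occupies a whole block.
    num_blocks = -(-file_size // block_size)
    return [
        BlockLocation(drive=(first_drive + i) % num_drives, slot=i // num_drives)
        for i in range(num_blocks)
    ]


def blocks_per_drive(locations: list[BlockLocation]) -> Counter:
    """Count how many of a file's blocks land on each drive."""
    return Counter(loc.drive for loc in locations)


if __name__ == "__main__":
    MB = 1024 * 1024
    layout = stripe(file_size=10 * MB, block_size=1 * MB, num_drives=4)
    for i, loc in enumerate(layout):
        print(f"block {i}: drive {loc.drive}, slot {loc.slot}")
    print(dict(blocks_per_drive(layout)))
```

In this toy model, striping a 10MB file in 1MB blocks over four drives puts three blocks each on drives 0 and 1 and two each on drives 2 and 3, so all four drives can serve reads and writes for the file in parallel.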
IBM has been using GPFS to build massive data stores since 1998. Back then, the largest single virtual drive was 43TB, a capacity that's easily achieved in a single data center rack today.
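To put that growth in perspective, the sketch below compares the 1998 figure with the planned 120PB system and computes the compound annual growth rate the ratio would imply. It assumes decimal units and uses 2011 as the end year, even though the 120PB system has yet to be built; the result is a rough illustration, not an IBM figure.

```python
# Rough growth arithmetic for GPFS's largest single virtual drive.
# Assumptions (not from IBM): decimal units (1PB = 1,000TB) and 2011 as the end
# year, treating the planned 120PB system as if it already existed.

TB_PER_PB = 1_000


def growth_factor(old_tb: float, new_pb: float) -> float:
    """How many times larger new_pb (in PB) is than old_tb (in TB)."""
    return new_pb * TB_PER_PB / old_tb


def annual_growth_rate(factor: float, years: float) -> float:
    """Compound annual growth rate that yields `factor` over `years` years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return factor ** (1 / years) - 1


if __name__ == "__main__":
    factor = growth_factor(old_tb=43, new_pb=120)
    years = 2011 - 1998
    rate = annual_growth_rate(factor, years)
    print(f"120PB is about {factor:,.0f} times 43TB")
    print(f"Implied growth: about {rate:.0%} per year over {years} years")
```

With those inputs, 120PB is roughly 2,800 times 43TB, which works out to growth of a little over 80% a year across 13 years.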
GPFS was also the file system behind IBM's Watson supercomputer, which earlier this year demonstrated its processing prowess by handily beating champions of the game show Jeopardy. That system had a 21.6TB data store.
IBM built its latest GPFS storage system to address this massive growth in customer data storage requirements.