IDG News Service - IBM researchers have developed an algorithm that could analyze terabytes of raw data in minutes, speeding up predictions of weather and electricity usage, the company said today.
The mathematical algorithm, developed by IBM's laboratories in Zurich, can sort, correlate and analyze millions of random data sets, a task that could otherwise take days for supercomputers to process, said Costas Bekas, an IBM researcher.
The algorithm is just under a thousand lines of code and will be instrumental in establishing usage patterns or trends based on data gathered from sources such as sensors or smart meters, he said. The algorithm could be used to analyze a growing mass of data measuring electricity usage trends as well as air or water pollution levels. The algorithm could also break down data from global financial markets and assess individual and collective exposure to risk, Bekas said.
"We are interested in measuring the quality of data," Bekas said. Efficient analysis of large data sets requires new mathematical techniques that reduce computational complexity, Bekas said.
The algorithm combines models of data calibration and statistical analysis that can assess measurement models and hidden relationships between data sets. IBM has been working on the research for two years, Bekas said.
The algorithm can also reduce the cost burden on companies by analyzing data in a more energy-efficient way, Bekas said. The lab used a Blue Gene/P Solution system at the Forschungszentrum Jülich research center in Germany to validate 9TB of data in less than 20 minutes. Analyzing the same amount of data without the algorithm would have taken a day with the supercomputer operating at peak speed, which would have meant higher electricity bills, Bekas said.
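As a rough check on those figures, the short Python sketch below works out the implied speedup and data rate. It treats "less than 20 minutes" as exactly 20 minutes and "a day" as 24 hours, and it assumes decimal terabytes (1TB = 1,000GB), so the results are bounds rather than measured values. The function names are illustrative and do not come from IBM.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


def throughput_gb_per_s(terabytes: float, seconds: float) -> float:
    """Return the average data rate in GB/s, assuming 1 TB = 1,000 GB."""
    return terabytes * 1000 / seconds


# Figures reported in the article, rounded to their stated bounds.
ONE_DAY_S = 24 * 60 * 60      # baseline: "a day"
TWENTY_MIN_S = 20 * 60        # with the algorithm: "less than 20 minutes"
DATA_TB = 9

print(speedup(ONE_DAY_S, TWENTY_MIN_S))            # at least this many times faster
print(throughput_gb_per_s(DATA_TB, TWENTY_MIN_S))  # at least this many GB/s
```

Under those assumptions, the run is at least 72 times faster than the one-day baseline and validates data at 7.5GB per second or better.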
According to Top500.org, the Blue Gene/P is the fourth-fastest supercomputer in the world as of last November, with 294,912 IBM Power processing cores that can provide peak performance of up to 1 petaflop.
The traditional approach to data analysis is to take multiple data sets and look at them individually, said Eleni Pratsini, manager of mathematical and computational sciences at the IBM research labs. However, the algorithm compares data sets against each other, which could help enterprises point toward larger trends in particular areas, such as risk reduction in financial portfolios.
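To illustrate the difference between examining data sets one at a time and comparing them against each other, here is a minimal Python sketch that computes a correlation for every pair of data sets. It is not IBM's algorithm, whose details the company did not describe; plain Pearson correlation stands in for it, and the meter names and readings are made up for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(datasets):
    """Compare every data set against every other one, by name."""
    return {
        (a, b): pearson(datasets[a], datasets[b])
        for a, b in combinations(sorted(datasets), 2)
    }


# Hypothetical smart-meter readings, invented for illustration only.
readings = {
    "meter_a": [1.0, 2.0, 3.0, 4.0],
    "meter_b": [2.0, 4.0, 6.0, 8.0],
    "meter_c": [4.0, 3.0, 2.0, 1.0],
}

for pair, r in pairwise_correlations(readings).items():
    print(pair, round(r, 3))
```

The number of pairs grows with the square of the number of data sets, which is one reason cross-comparison at the scale of millions of data sets calls for techniques that reduce computational complexity.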
Enterprises will want faster ways of generating business intelligence as masses of data flood servers with the expansion of computing to new devices, he said.
Now that the algorithm has been proven to work scientifically, the research lab is collaborating with IBM's Global Services unit to use it for specific services, Pratsini said. Ultimately, the algorithm could make its way to IBM applications such as the SPSS statistical analysis software, but the company didn't provide a specific time frame for that.