Data+ Awards: Harvard's Clean Energy Project gets a massive speed boost

Computing grid accelerates data sharing among scientists

Harvard University professor Alan Aspuru-Guzik and his team are supporting the search for organic compounds that could be used in the next generation of solar power cells.

To date, Harvard's Clean Energy Project has studied 2.3 million compounds and accumulated 500 terabytes of molecular data.

A massive undertaking, for sure, but Aspuru-Guzik's team has it covered. They tapped into the IBM World Community Grid -- a distributed platform that uses the spare processing power of about 6,000 computers made available by volunteers around the world -- to perform quantum chemical calculations on millions of organic materials.

This approach allowed the researchers to complete, in three years (2010 to 2013), calculations that would have taken 17,000 years on a single computer.
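A quick back-of-the-envelope check shows how these figures fit together. The sketch below assumes, for illustration only, that the work splits evenly across machines and that each volunteer computer is about as fast as the "single computer" in the comparison. Neither assumption comes from the project.

```python
# Back-of-the-envelope check of the grid speedup figures quoted above.
# Assumption (not from the project): the workload splits evenly and each
# volunteer machine matches the reference single computer in speed.

SINGLE_COMPUTER_YEARS = 17_000  # serial estimate quoted in the article
WALL_CLOCK_YEARS = 3            # 2010 to 2013
VOLUNTEER_COMPUTERS = 6_000     # "about 6,000 computers"


def effective_speedup(serial_years: float, wall_clock_years: float) -> float:
    """Return how many times faster the grid ran than one computer."""
    return serial_years / wall_clock_years


speedup = effective_speedup(SINGLE_COMPUTER_YEARS, WALL_CLOCK_YEARS)
print(f"Effective speedup: about {speedup:,.0f}x")  # about 5,667x
print(f"Speedup per volunteer computer: {speedup / VOLUNTEER_COMPUTERS:.2f}")  # 0.94
```

The effective speedup of roughly 5,700 times is close to the quoted number of machines. That is plausible because each molecule's calculation can run independently of the others.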

The calculations determine which compounds are most promising for use in solar power cells. All 2.3 million compounds are ranked from most to least promising based on those computations, Aspuru-Guzik explains.

Results are kept in a system of large data storage arrays known as "Jabba." Based on a design by Backblaze, each array in Jabba holds 45 3TB hard drives from HGST, a Western Digital company.
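To put the drive figures in perspective, the sketch below works out the raw capacity of one array and the minimum number of arrays needed to hold the 500 terabytes mentioned earlier. It counts raw capacity only and ignores redundancy overhead, so the result is a lower bound, not the actual number of arrays in Jabba, which the article does not give.

```python
import math

# Rough capacity arithmetic for one Jabba array, using the figures above.
# Assumption: raw capacity only; RAID/redundancy overhead and the
# difference between decimal TB and binary TiB are ignored.

DRIVES_PER_ARRAY = 45
TB_PER_DRIVE = 3
DATA_SET_TB = 500  # molecular data accumulated so far


def raw_capacity_tb(drives: int, tb_per_drive: int) -> int:
    """Raw (non-redundant) capacity of one array in TB."""
    return drives * tb_per_drive


def arrays_needed(data_tb: float, capacity_tb: float) -> int:
    """Minimum whole number of arrays whose raw capacity covers the data."""
    return math.ceil(data_tb / capacity_tb)


per_array = raw_capacity_tb(DRIVES_PER_ARRAY, TB_PER_DRIVE)
print(f"Raw capacity per array: {per_array} TB")  # 135 TB
print(f"Arrays for {DATA_SET_TB} TB (raw, no redundancy): "
      f"{arrays_needed(DATA_SET_TB, per_array)}")  # 4
```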

The technology allows the team to share data on materials with other researchers on an unprecedented scale. The sharing occurs through a portal called Molecular Space.

Aspuru-Guzik says his team can provide raw data to researchers who request it. Those researchers can then search for compounds based on, for example, specific properties, or they can plug in specific data to make predictions about particular molecules.
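As a rough illustration of searching a ranked data set by property, the sketch below filters a list of compound records and returns the matches from most to least promising. The record layout, the field names (`score`, `homo_ev`) and the values are hypothetical placeholders, not the project's actual schema or data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical record layout for a downloaded slice of the ranked data.
# Field names are illustrative assumptions, not the project's schema.


@dataclass(frozen=True)
class Compound:
    identifier: str
    score: float     # hypothetical "promise" score from the calculations
    homo_ev: float   # hypothetical computed property (HOMO energy, eV)


def search(compounds: Iterable[Compound],
           predicate: Callable[[Compound], bool]) -> List[Compound]:
    """Keep compounds matching a property filter, most promising first."""
    matches = [c for c in compounds if predicate(c)]
    return sorted(matches, key=lambda c: c.score, reverse=True)


# Placeholder values for demonstration only; not Clean Energy Project data.
sample = [
    Compound("mol-A", score=0.62, homo_ev=-5.4),
    Compound("mol-B", score=0.91, homo_ev=-5.1),
    Compound("mol-C", score=0.75, homo_ev=-6.0),
]

hits = search(sample, lambda c: c.homo_ev > -5.5)
print([c.identifier for c in hits])  # ['mol-B', 'mol-A']
```

Passing the filter in as a function keeps the search generic: a researcher could swap in any property test without changing the ranking step.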

"It's an extremely valuable data set, and it has the potential to push us years ahead," says Marcus Hanwell, technical leader at Kitware, a Clifton Park, N.Y.-based company that makes and supports open-source software frequently used by researchers.

Hanwell says the Clean Energy Project's analysis helps other researchers avoid going down "blind alleys" and spending months or even years researching compounds that ultimately won't yield results.

Kitware is collaborating with the Clean Energy Project to develop software tools that help researchers analyze the data.

At the same time, Kitware is using the data set -- one of the largest that is open for such use -- to develop open-source software and simulation code that could benefit researchers in general.

"There are all these secondary benefits you can get with a large data set that's open for the community," Hanwell says.


Pratt is a Computerworld contributing writer in Waltham, Mass.

Copyright © 2013 IDG Communications, Inc.
