Honors Program: Iowa State University

Project Name: Cracking the Corn Genome Code

Deciphering the genome of corn, or maize, is a complex scientific challenge that the research community is now addressing. The maize genome will be the foundation for decades of future research on higher yields, improved nutrition and biorenewable fuels. Preliminary efforts to crack the maize genome code started a decade ago and culminated in a three-year, $32 million maize genome sequencing project announced in November 2005 as a joint initiative of the National Science Foundation (NSF), the U.S. Department of Agriculture and the U.S. Department of Energy. Sequencing a genome requires computationally assembling tens of millions of short DNA pieces, like a giant jigsaw puzzle. Through their efforts since 2003, and now as members of this project, Iowa State University researchers have developed a unique set of supercomputing solutions to assemble genomes and analyze them by comparison with other sequenced genomes. Their solution is available as the parallel software framework PaCE, which can run on thousands of processors. The software has reduced the time for a complex genome assembly from the three months it typically takes with current technology to between several hours and a few days.

An optimized version of this software, running on a 1,024-node IBM Blue Gene/L supercomputer, has been used to generate draft assemblies of maize and sorghum. These assemblies have allowed hundreds of maize geneticists and plant scientists to begin their research months in advance. The software is in use by more than 50 research groups from 11 countries to solve a variety of problems involving large-scale sequence analysis, especially the study of large collections of DNA pieces derived from genes. It is also being used by Pioneer Hi-Bred International Inc., now a subsidiary of DuPont. Iowa State researchers provided early maize genome assemblies using this software, which has enabled the work of hundreds of scientists and fostered innovation.

Introductory Overview

CONTEXT

Assembling genomes from short DNA fragments, discovering genes by processing DNA fragments derived from genes, and identifying subtle differences in genomes that cause genetically inherited diseases all require processing large collections of DNA fragments. The present technology for solving these problems involves discovering pairwise relationships between DNA fragments and putting the results together. Sometimes this effort is manually parallelized by partitioning the data into pieces and running different programs on different computers. These approaches slow important scientific projects significantly or, in some cases, make it very difficult to analyze important data. For example, assembling complex genomes takes more than three months with existing technology. When genomes have many repeats, as in maize, conventional programs require exorbitant amounts of memory. New high-throughput sequencing technologies, such as those from 454 Life Sciences Corp., make it even more important to improve data analysis capabilities. With current technology, the rate at which sequences can be analyzed is far behind the rate at which they can be generated.
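
To illustrate why discovering pairwise relationships between fragments becomes a bottleneck, here is a minimal Python sketch of a naive all-pairs overlap search. It is not PaCE and not any specific existing assembler; the function names, the suffix-prefix overlap criterion, the `min_len` threshold and the toy fragments are all assumptions chosen for illustration.

```python
from itertools import combinations


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a
    prefix of `b`, or 0 if no such overlap reaches `min_len`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def all_pairs_overlaps(fragments, min_len: int = 3):
    """Naive approach (illustrative assumption): test every ordered pair
    of fragments. The work grows quadratically with the fragment count."""
    overlaps = {}
    for i, j in combinations(range(len(fragments)), 2):
        for x, y in ((i, j), (j, i)):
            k = suffix_prefix_overlap(fragments[x], fragments[y], min_len)
            if k:
                overlaps[(x, y)] = k
    return overlaps


def pair_count(n: int) -> int:
    """Number of unordered fragment pairs an all-pairs search must consider."""
    return n * (n - 1) // 2


# Hypothetical toy fragments; real fragments are far longer and contain errors.
fragments = ["ATGCGT", "CGTTAC", "TACGGA"]
overlaps = all_pairs_overlaps(fragments)
print(overlaps)                # {(0, 1): 3, (1, 2): 3}
print(pair_count(10_000_000))  # 49999995000000
```

With ten million fragments, the low end of the "tens of millions" mentioned earlier, an all-pairs search would have to consider roughly 5 × 10^13 unordered pairs. That arithmetic suggests why a naive pairwise approach struggles to keep up with the rate at which sequences are generated.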
