IBM and Google float in parallel clouds (and make a Theremin)

Woof! It's Tuesday's IT Blogwatch: in which IBM and Google help teach students how to use cloud computing paradigms. Not to mention making a Theremin...

Grant Gross reports:

Google Inc. and IBM have teamed up to offer a curriculum and support for software development on large-scale distributed computing systems, with six universities signing up so far. The program is designed to help students and researchers get experience working on Internet-scale applications ... [using] the relatively new form of parallel computing, sometimes called cloud computing, [which] hasn't yet caught on in university settings ... techniques that take computational tasks and break them into hundreds or thousands of smaller pieces to run across many servers at the same time [which] allow Web applications such as search, social networking and mobile commerce to run quickly ... A cloud is a collection of machines that can serve as a host for a variety of applications, including interactive Web 2.0 applications. Clouds support a broader set of applications than do traditional computing grids, because they allow various kinds of middleware to be hosted on virtual machines distributed across the cloud.
...
IBM and Google are providing hardware, software and services to add to university resources ... IBM and Google have dedicated a cluster of several hundred computers -- including PCs donated by Google and IBM BladeCenter servers -- and the companies expect the cluster to grow to more than 1,600 processors ... IBM and Google have created several resources for the program, including the following:
  • A cluster of processors running an open-source version of Google's published computing infrastructure, including MapReduce and GFS from Apache's Hadoop project, a software platform that lets one easily write and run applications that process vast amounts of data.
  • A Creative Commons-licensed curriculum on parallel computing developed by Google and the University of Washington.
  • Open-source software designed by IBM to help students develop programs for clusters running Hadoop. The software works with Eclipse, an open-source development platform ...
The University of Washington signed up with the program late last year. This year, five more schools, including MIT, Stanford University and the University of Maryland, have joined the program. The two companies expect to expand the program to other universities in the future. [more]
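
If "MapReduce" means nothing to you, here's the flavor of it: the canonical word-count job, sketched against Hadoop's Java MapReduce API. (A sketch only -- exact class names and method signatures vary between Hadoop releases, and this is illustrative code, not anything from the program itself.)

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each mapper gets a slice of the input and emits (word, 1)
  // for every word it sees; Hadoop runs many mappers in parallel.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework groups all counts for the same word
  // and hands them to one reduce() call, which sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Hadoop splits the input across the cluster, shuffles each word's counts to a single reducer, and transparently re-runs tasks when a node dies -- which is exactly the "break it into thousands of pieces" property the curriculum is built around.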

Jacqui Cheng adds:

The idea for the program came from Google senior software engineer Christophe Bisciglia, who said that while interviewing close to a hundred college students during his time at Google, he had noticed a consistent pattern. The pattern was that, despite the extreme talent of these potential job candidates, they "sort of stalled" when asked to think about algorithms the way that Google does ... Bisciglia then began working with his alma mater, the University of Washington, to develop the curriculum in his 20-percent time (the paid time that Google allows employees to work on their own projects) to better prepare students for a changing industry.
...
The two companies [will] offer millions of dollars in resources to universities in order to promote cloud computing projects ... The companies hope to expand the program in the future and grow the cluster to over 1,600 processors. As an example of one of the projects that has already been performed on the cluster, Google says that University of Washington students were able to use the cluster to scan the millions of edits made to Wikipedia in order to identify spam and organize news by geographic location.
...
Of course, the cloud computing initiative isn't designed just to offer resources to students—Google and IBM have a vested interest in making sure that students at these top universities keep coming to their companies after graduating ... The pairing may seem odd upon first blush, but both Google and IBM recognize that they bring two sets of expertise to the table that can make the project succeed. IBM's experience in running data centers, combined with Google's obvious experience in running web apps on giant clusters, complement each other. [more]
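
How would a job like that Wikipedia spam scan actually look? Here's a purely hypothetical mapper -- the class name, the tab-separated input format, and the spam heuristic are all invented for illustration, not the students' actual code. Paired with a summing reducer like the word-count one, it would tally suspected spam edits per page:

```java
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: assumes each input line is a tab-separated record
// of (pageTitle, editorId, addedText) extracted from a Wikipedia dump.
public class SpamEditMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  // Crude stand-in heuristic: external links plus shouting.
  private static final Pattern SPAMMY =
      Pattern.compile("http://|FREE|CHEAP|!!!+", Pattern.CASE_INSENSITIVE);

  private static final IntWritable ONE = new IntWritable(1);
  private final Text page = new Text();

  @Override
  public void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] fields = line.toString().split("\t");
    if (fields.length < 3) {
      return; // skip malformed records
    }
    if (SPAMMY.matcher(fields[2]).find()) {
      page.set(fields[0]);
      context.write(page, ONE); // one suspected spam edit for this page
    }
  }
}
// A summing reducer (as in word count) then yields, per page, the number
// of edits the heuristic flagged -- a starting point for human review.
```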

And here's the mastermind, Christophe Bisciglia:

Just as people are social animals, computers are social machines—the more, the merrier. Twenty or thirty years ago, large, centralized mainframes sat alone in sheltered bunkers in computer science departments and government offices alike, choking for hours on mere megabytes of data ... One computer just won’t hack it; these days, to support a new paradigm of massively parallel systems architecture, we need to break the machine out of its bunker and give it some friends.
...
This is how I, along with my good friend and mentor Ed Lazowska of the University of Washington’s CSE department, started to think about CS curricula and the obstacles to teaching a practical and authentic approach to massively parallel computing. It's no easy feat. Teaching these methods effectively requires access to huge clusters and innovative new approaches to curricula ... All of the course material developed by UW as well as other tools and resources to facilitate teaching this cutting-edge technology is available at http://code.google.com/edu. [more]

Rich Miller tilts at nothing: [Eh? Oh, ha ha -Ed.]

The initiative highlights the growing importance of the data center as a development platform for scalable web-based software. While the universities will see the immediate benefits, the project also figures to advance the business interests of Google and IBM, both prominent backers of open source software. Cloud computing services are a major focus for Microsoft, and the new research consortium will train developers to build similar apps on an open source platform. As more computing functions shift to cloud-based software as a service (SaaS), so will the competitive battle between Microsoft and the open source community.

For IBM, Google and other companies developing for open source platforms, the availability of developers with expertise in SaaS apps is crucial. [more]

Josh Catone meows alone: [Now you're just being silly -Ed.]

The SETI@Home project is probably the most famous distributed computing project, but many of the online services we use today utilize data centers with thousands of commodity servers operating in tandem using the same basic concept ... Amazon already offers access to their compute cloud as a service.

Google and IBM are betting that cloud computing will continue to be important on the web and by training future engineers on their tools they can ensure themselves access to the top minds in the field. [more]

But Ethan Stock stirs the soup: [Just cut it out -Ed.]

Tech journalism stinks ... None of the commentaries or source articles mention Amazon, who's done more in this area with EC2 and S3 than anyone.
...
The Google press release has key details ... Key questions answered: No Google code open-sourced ... No advanced functionality (BigTable) -- just MapReduce/GFS as implemented in Hadoop.
...
This all fits quite nicely -- IBM gets a great new Open Source Java/Eclipse program to promote (Hadoop is all written in Java), and Google gets to promote its world-view without going through the hassle of open-sourcing any of its own code. [more]

Brian Harris is taking the UW pilot class:

[Knowing] how to write the highly parallelizable computer programs necessary to solve all sorts of Google-scale problems ... is becoming increasingly important for many types of problems; Moore's Law is failing (has failed) and the current workaround is to tie many computers together and spread out your workload across them.
...
Google donated a ~40-node cluster to UWCSE which we've been running Hadoop on and using for the class; it has something like 20 terabytes of usable (i.e. not counting redundancy) disk capacity. Hadoop is an open source implementation of MapReduce and GFS written in Java. I think their idea for the future is to make this a 300-level course which would be a prerequisite for some of the 400-level courses, thereby allowing the instructors of these other courses to create assignments which could utilize a cluster. This is an ambitious goal for many reasons, and I don't see it being accomplished as stated; a more reasonable goal would be to make it a 400-level class structured like other 400-levels without it being a prerequisite for other classes. Also someone needs to add permissions to Hadoop's DFS.

We've had a few class projects: some simple introductory stuff, computing PageRank on a corpus of Wikipedia pages, and k-means clustering the Netflix prize data using canopies. Now everyone is doing a final project of their choice; I'm working on the video-mosaic project I talked about in the last post. [more]
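
Harris's PageRank assignment shows how an iterative algorithm rides on MapReduce: in each pass, every page splits its current rank among its outlinks, and a reducer sums the incoming contributions. Here's a simplified single-iteration sketch -- the tab-separated input format is assumed, dangling pages and rank normalization are ignored, and the job driver (wired up like the word-count one) is omitted:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// One PageRank iteration over lines assumed to look like:
//   pageId<TAB>rank<TAB>out1,out2,...
// The job is run repeatedly, each pass's output feeding the next,
// until the ranks stop changing much.
public class PageRankIteration {

  public static class RankMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");
      String page = parts[0];
      double rank = Double.parseDouble(parts[1]);
      String outlinks = parts.length > 2 ? parts[2] : "";

      // Pass the adjacency list through so the reducer can rebuild the graph.
      context.write(new Text(page), new Text("LINKS:" + outlinks));

      // Spread this page's rank evenly across its outlinks.
      if (!outlinks.isEmpty()) {
        String[] links = outlinks.split(",");
        for (String link : links) {
          context.write(new Text(link),
                        new Text(Double.toString(rank / links.length)));
        }
      }
    }
  }

  public static class RankReducer
      extends Reducer<Text, Text, Text, Text> {
    private static final double DAMPING = 0.85;

    @Override
    public void reduce(Text page, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      double sum = 0.0;
      String outlinks = "";
      for (Text v : values) {
        String s = v.toString();
        if (s.startsWith("LINKS:")) {
          outlinks = s.substring("LINKS:".length());
        } else {
          sum += Double.parseDouble(s);
        }
      }
      // Simplified update rule; real implementations also handle
      // dangling pages and normalize over the total page count.
      double newRank = (1.0 - DAMPING) + DAMPING * sum;
      context.write(page, new Text(newRank + "\t" + outlinks));
    }
  }
}
```

Because the default text output writes key, tab, value, each pass emits lines in the same assumed format it consumes, so chaining iterations is just a matter of pointing the next job at the previous job's output directory.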


And finally... Did you learn how to make a Theremin this weekend?

Richi Jennings is an independent analyst/adviser/consultant, specializing in blogging, email, and spam. A 20-year cross-functional IT veteran, he is also an analyst at Ferris Research. You too can pretend to be Richi's friend on Facebook, or just use boring old email: blogwatch@richi.co.uk. Happy birthday, Oscar!
