David DeWitt's journey to becoming one of the world's leading academic experts on databases started off almost by accident.
"I had taken one database class in graduate school," DeWitt recalled. "That was enough that when I showed up as a new faculty member at the University of Wisconsin, Madison (in the mid-1970s), the chairman said, 'You're the new database guy.'"
DeWitt took the ball and ran with it. After three decades in the field, DeWitt's resume includes the co-invention of three parallel databases (including one that was sold to NCR Inc.), the publication of more than 100 technical papers, and numerous awards and honors from his database peers.
DeWitt retired from the University of Wisconsin last year. But he has already returned, this time as a Microsoft Technical Fellow and head of a new database research center located on the Madison campus and funded primarily by his new employer.
DeWitt will talk about the center during a keynote speech Friday at the Professional Association for SQL Server's annual conference, which is taking place this week in Seattle.
The confab has 2,500 attendees, many coming to learn about Microsoft's recently released SQL Server 2008 or hear about Microsoft's road map as it attempts to move into the high-end business intelligence arena dominated by Teradata Inc. and small data warehousing appliance vendors.
On Wednesday, Microsoft demonstrated a feature that will let database administrators manage pools of hundreds of SQL Server databases at a time.
For DeWitt, the lab is an opportunity to do the same sort of research he has done for the past 32 years, but also to see those results make their way into products, namely SQL Server, in a much shorter time frame.
It also gives DeWitt the financial backing that computer science academics, especially those in the database field, have lost in recent years.
"Researching query optimization on parallel systems -- this is not something you can go to [the National Science Foundation] or DARPA and get money for anymore," DeWitt said. But he added that cutting-edge database research was already shifting away from academia to industry.
"In the old days, you could take a small group of grad students and build a state-of-the-art prototype of a database system," he said. "Systems are so complex these days, it's hard to make headway with only five grad students."
Also, "the smartest students from abroad don't come for their [computer science] Ph.D.s anymore, they go and join investment banks," DeWitt continued. "So industry has really taken over a leadership role. It's one reason I left academia."
DeWitt would also love to taste some of the "success after success" of a good friend of his, database industry legend Michael Stonebraker.
A professor at both MIT and the University of California, Berkeley, Stonebraker is generally credited with helping invent two seminal databases, Ingres and Postgres. The former underlies popular products such as Microsoft Corp.'s SQL Server, Sybase Inc.'s Adaptive Server Enterprise, Ingres Corp.'s eponymous product, IBM's Informix and others, while the latter is an emerging open-source database.
Just as important, Stonebraker started companies that helped bring those databases -- and lesser-known ones -- to market. His current venture, for example, is column-based data warehousing vendor Vertica Systems Inc.
"My goal is to short-circuit the process from research to product line," said DeWitt, who noted that he works directly for the Microsoft data and storage division that produces SQL Server, not Microsoft Research. "We absolutely want to be more market-responsive and nimble."
The University of Wisconsin lab is named after Microsoft database researcher Jim Gray, who was lost at sea last year. Gray not only helped build products such as SQL Server but also collaborated with many in academia, including DeWitt, who considered Gray a close friend and mentor.
The Microsoft Jim Gray Systems Lab has three researchers today. "It will top out at between 10 to 15 people," DeWitt said. In general, research produced by the lab will be owned by the university, though Microsoft gets non-exclusive royalty-free access to the patents. However, research by grad students that doesn't draw upon Microsoft confidential materials will be owned by the students themselves, DeWitt said.
DeWitt plans to initially focus on query optimization.
"It's one area in which there's been very little progress" in the past three decades, DeWitt said. He added that he plans to test an approach that "does a little optimization, a little execution, and so forth."
If successful, this approach could make its way into upcoming Microsoft data warehousing appliances code-named Madison, a name that DeWitt swears was not his doing. Those appliances are due in the first half of 2010.
DeWitt's other big interest is in very large database clusters. For example, he has strong opinions about MapReduce, the parallel data processing framework used by Google Inc. to index the Web.
In blog postings that DeWitt and Stonebraker co-wrote this spring, the two called MapReduce a "sub-optimal ... not novel" type of database that lacks the features modern DBAs and developers take for granted and was not worthy of the hype it had received.
The blog posts drew heavy criticism, with most critics arguing that MapReduce isn't comparable to a standard database because it is optimized for a single task -- quickly sifting through huge amounts of messy, unstructured data -- which even the largest databases today are poor at doing.
As one commentator snarkily wrote: "I tried to have MapReduce babysit my kids, and I came back half an hour later to find that it was just sitting there crunching data, and wasn't watching them at all. This thing can't do anything at all.... Also, compared to a standard hammer, this MapReduce thing is really crappy at pounding nails into things."
DeWitt is thick-skinned. He claims Oracle Corp. CEO Larry Ellison tried to have him fired in the 1980s after database performance benchmarks created by DeWitt showed Oracle lagging in key areas. "I don't think he quite understood the concept of tenure," he joked.
Responding to the critics, DeWitt and Stonebraker blogged: "Just to let you know, we don't hold a personal grudge against MapReduce. MapReduce didn't kill our dog, steal our car, or try and date our daughters."
DeWitt concedes today that MapReduce "does scale pretty well." He hails its ability to continue queries without interruption if a particular server fails, which most clustered databases cannot do.
But he stands by his argument, which is that true relational databases "give you a lot more leverage and good features." And DeWitt said he will soon release research to back that up.