PostgreSQL creator on the art of open source

Bruce Momjian may not be the most famous free software figure, but as a founder and lead architect of the PostgreSQL relational database management system, he is an ardent believer in the correctness and beauty of open source development.

In Australia to speak at Sydney's PostgreSQL user group, Momjian discussed the early days of the database's development and its transition from academia to the Internet.

"In 1996 the database had left Berkeley and we started an Internet development team," Momjian said. "I realized the code needed some organization behind it as it didn't have the focus and management behind it. There wasn't a unified effort so everyone could work efficiently."

When Momjian started on the project there was "one guy from Berkeley" maintaining the software, and it was not uncommon for bug fixes to be released as a patch but never make it into the mainstream release.

"Then you get into some funny e-mails where people have their wish list of what they want to get PostgreSQL to do and they didn't get that for seven years," he said.

The original Ingres database was developed at Berkeley from 1972 to 1984, and in the mid-eighties an academic by the name of Stonebraker, who worked on Ingres, said it was time to go beyond relational to extended data types and plug-in languages.

"He developed this post-Ingres and that's how it became Postgres," Momjian said, adding the DB2 database may have also come out of Berkeley.

The modest, but charismatic, Momjian used a moon landing analogy to describe PostgreSQL development.

"There is the famous line 'one small step for man, one giant leap for mankind' and in a way the PostgreSQL history is like that," he said. "You look at the Windows port now and say it's great, but at the time you are doing it it's like this is never going to work. At the time you are doing it, it looks like you are making sausage. You really think I am never going to finish this."

Momjian said open source development deals with "little things", but once they are done people are wowed that there aren't any other problems.

"The fixes are so disjointed - they are coming from different problems you are having. You look at the end product and it is an engineering feat, but the beginning doesn't seem that way."

Momjian said when people look at PostgreSQL now "and the companies behind it", like Fujitsu and his employer EnterpriseDB, they say it was always that way, but he admits it came from Berkeley "pretty messed up".

"Slowly chipping away over time you end up with something that doesn't look like it used to," he said. "If you sculpt something you start with a block and when you finish you end up with something that looks nothing like it."

It is this level of focus on the code that Momjian says differentiates commercial and open source software.

"The motivation of the open source community is dramatically different to that of a company," he said. "If you look at a typical tool a database vendor makes it's little programs written by the company and they put a team on it and ship it in the product."

"What happens to the code after that? Nothing. Nobody is going to buy any more copies of a database if you add another flag. For us everything is incremental. What happens is all of our users will see it [so] the features and the way things work are much more holistic. We have a much tighter cycle of how we interact with our users. Code quality is much more important to us because if the code is much more difficult to understand people are not going to work with it."

Momjian expressed pride in the reach of PostgreSQL code, saying during its development it solved a lot of problems no other project has.

"I will Google for something and the first Google reply is my PostgreSQL posting - what, are we the only people that care about this? In many cases it's true - what we accept is so stringent we won't accept anything else. The PHP guys needed spinlock code and they took it from us. We ended up not getting a lot of help from people we thought we could get help from."

According to Momjian, a lot of open source projects can learn from PostgreSQL's stringent quality assurance: the project does not have the luxury of putting out a bad release, because there is zero tolerance for things that don't work.

"Look at Mozilla where they tried to make a platform that no one wanted and Firefox just wanted to make a browser," he said. "People are more confident with us than some of the commercial databases."

When asked about the future, Momjian said he thinks development is always going to look this way.

"It's like we don't know what we are doing but we are determined to get there," he said. "I've got so many patches and some are too hard and complex so we need to move forward. If you look at the hackers list it looks completely chaotic because we are in trouble, but a month from now it is going to look fine."

With some patches expected to take three days each to work through, Momjian is confident the upcoming 8.3 release "should be" more powerful than 8.2, which focused on feature completion rather than new functionality.

"If you look beyond 8.3 that's when the wheels start to fall off the car," he said. "Even in 8.3 we will have the ability to delay the data to disk for buffered transactions. I put it on the list because Informix had it and we are now getting stuff I thought I would never get in really trend-setting ways."

While PostgreSQL runs "really well" on 16- and 32-way systems, users are going to see it "rocket off" because there is so much activity happening that "we can hardly keep up with it".

"In the early years it was about stopping PostgreSQL from crashing, then it was performance, and the third phase is enterprise features and a lot of things enterprises need to do," Momjian said. "So I would say we kind of ended the enterprise features phase of things we are doing. So if we look at 8.3 and forward what you have are revolutionary features that go beyond things you can't do with other databases."

Such revolutionary functionality is "completely beyond" what Momjian thought a database could do in terms of performance and transaction processing.

"If you look in the next five years, PostgreSQL will be a poster child for databases period," he said. "There is not really another database that's enhancing at the speed of PostgreSQL, so what that would look like is hard to say."

There are grand plans for PostgreSQL, but just don't ask the development team about a road map.

"When I get asked why don't you have a road map, I say we can't have a road map because we can't tell developers what to do and we don't know what we are going to be able to do," Momjian said. "We have a lot of growth at the bottom in terms of new developers but not at the top in terms of people that understand what is going on. In terms of management we do a lot of things by the shoestring. A lot of the formal methods tend to obscure what is happening."

One contentious aspect of database technology is replication, where there has always been one camp of people who say there is not one suitable replication solution, but Momjian believes "that's just nineties thinking".

"The closest we have come now is about high availability and load balancing with Slony, pg-cluster, and data partitioning, and at least we got something down and we have a best of breed for each category," he said, adding there may be a solution in version 8.3 involving frequently updated rows.

"It seems as though replication adds so much complexity that if we can do it outside it would be better. Having it separate makes it possible to do version upgrades without any downtime. We won't put things out there that don't make sense."

Momjian, originally from Pennsylvania, said Sydney is a special place for him because it has the second largest number of PostgreSQL people per capita.

"Open source itself is obviously very strong here," he said.

Copyright © 2007 IDG Communications, Inc.
