Tuesday, August 24, 2010

Version Numbering

Over the last few days, there's been a debate raging on pgsql-hackers on the subject of version numbering. There are many thoughtful (and some less-thoughtful) opinions on the thread that you may wish to read, but I thought the most interesting was a link posted by Thom Brown to a blog post called The Golden Rules of Version Naming. If you haven't seen it, it's definitely worth a read.

Monday, August 16, 2010

Why We're Conservative With PostgreSQL Minor Releases

Last week, a PostgreSQL user filed bug #5611, complaining about a performance regression in PostgreSQL 8.4 as compared with PostgreSQL 8.2. The regression occurred because PostgreSQL 8.4 is capable of inlining SQL functions, while PostgreSQL 8.2 is not. The bug report was also surprising to me, because in my experience, inlining SQL queries has always improved performance, often dramatically. But this user managed unluckily hit a case where the opposite is true: inlining caused a function which had previously been evaluated just once to be evaluated multiple times. Fortunately, there is an easy workaround: writing the function using the "plpgsql" language rather than the "sql" language defeats inlining.

Although the bug itself is interesting (let's face it, I'm a geek), what I found even more interesting was that I totally failed to appreciate the possibility that inlining an SQL function could ever fail to be a performance win. Prior to last week, if someone had asked me whether that was possible, I would have said that I didn't think so, but you never know...

And that is why the PostgreSQL project maintains stable branches for each of our major releases for about five years. Stable branches don't get new features; they don't get performance enhancements; they don't even get tweaks for things we wish we'd done differently or corrections to behavior of doubtful utility. What they do get is fixes for bugs (like: without this fix, your data might get corrupted; or, without this fix, the database might crash), security issues, and a smattering of documentation and translation updates. When we release a new major release (or actually about six months prior to when we actually release), development on that major release is over. Any further changes go into the next release.

On the other hand, we don't abandon our releases once they're out the door, either. We are just now in the process of ceasing to support PostgreSQL 7.4, which was released in November 2003. For nearly seven years, any serious bugs or security vulnerabilities which we have discovered either in that version or any newer version have been addressed by releasing a new version of PostgreSQL 7.4; the current release is 7.4.29. Absent a change in project policy, 7.4.30 will be the last 7.4.x release.

If you're running PostgreSQL 8.3 or older, and particularly if you're running PostgreSQL 8.2 or older, you should consider an upgrade, especially once PostgreSQL 9.0 comes out. Each release of PostgreSQL includes many exciting new features: new SQL constructions, sometimes new data types or built-in functions, and performance and manageability enhancements. Of course, before you upgrade to PostgreSQL 8.4 (or 9.0), you should carefully test your application to make sure that everything still works as you expect. For the most part, things tend to go pretty smoothly, but as bug #5611 demonstrates, not always.

Of course, this upgrade path is not for everyone. Application retesting can be difficult and time-consuming, especially for large installations. There is nothing wrong with staying on the major release of PostgreSQL that you are currently using. But it is very wise to upgrade regularly to the latest minor version available for that release. The upgrade process is generally as simple as installing the new binaries and restarting the server (but see the release notes for your version for details), and the PostgreSQL community is firmly committed to making sure that each of these releases represents an improvement to performance and stability rather than a step backwards.

Friday, August 13, 2010

How I Hack on PostgreSQL

Today's post by Dimitri Fontaine gave me the idea of writing a blog posting about the tools I use for PostgreSQL development. I'm not saying that what I do is the best way of doing it (and it's certainly not the only way of doing it), but it's one way of doing it, and I've had good luck with it.

What commands do I use? The following list shows the ten commands that occur most frequently in my shell history.

[rhaas pgsql]$ history  | awk '{print $2}' | sort | uniq -c | sort -rn | head
250 git
57 vi
31 %%
25 cd
24 less
20 up
18 make
13 pg_ctl
10 ls
8 psql

Wow, that's a lot of git. I didn't realize that approximately half of all the commands I type are git commands. Let's see some more details.

[rhaas pgsql]$ history  | awk '$2 == "git" { print $3}' | sort | uniq -c | sort -rn | head
93 diff
91 grep
15 log
10 commit
8 checkout
7 add
6 reset
5 clean
4 pull
3 branch

As you can see, I use git diff and git grep far more often than any other commands. The most common things I do with git diff are just plain git diff, which displays the unstaged changes in my working tree (so I can see what I've changed, or what a patch I've just applied has changed) and git diff master (which shows all the differences between my working tree and the master branch; this is because I frequently use git branches to hack on a patch I'm working on). A great deal of the work of writing a good patch - or reviewing one - consists in looking at the code over and over again and thinking about whether every change can be justified and proven correct.

git grep does a recursive grep starting at the current directory, but only examines files checked into git (not build products, for example). I use this as a way to find where a certain function is defined (by grepping for the name of the function at the start of a line) and as a way to find all occurrences of an identifier in the code (which is an absolutely essential step in verifying the correctness of your own patch, or someone else's).

As you can also see, my preferred editor is vi (really vim). This might not be the best choice for everyone, but I've been using it for close to 20 years, so it's probably too late to learn something else now. I think Dimitri Fontaine said it well in the post linked above: the best editor you can find is the one you master. Having said that, if you do even a small amount of programming, you're likely to spend a lot of time in whatever editor you pick, so it's probably worth the time it takes to learn a reasonably powerful one.

Saturday, August 07, 2010

Git is Coming to PostgreSQL

As discussed at the PGCon 2010 Developer Meeting, PostgreSQL is scheduled to adopt git as its version control system some time in the next few weeks. Andrew Dunstan, who maintains the PostgreSQL build farm, has adapted the build farm code to work with either CVS or git; meanwhile, Magnus Hagander has done a trial conversion so that we can all see what the new repository will look like. My small contribution was to write some documentation for the PostgreSQL committers, which has subsequently been further edited by Heikki Linnakangas (the link here is to his personal web page, whose one complete sentence is one of the funnier things I've read on the Internet).

I don't think the move to git is going to be radical change; indeed, we're taking some pains to make sure that it isn't. But it will make my life easier in several small ways. First, the existing git clone of the PostgreSQL CVS repository is flaky and unreliable. The back-branches have had severe problems in this area for some time (some don't build), and the master branch (aka CVS HEAD) has periodic issues as well. At present, for example, the regression tests for contrib/dblink fail on a build from git, but pass on a build from CVS. While we might be able to fix (or minimize) these issues by fixing bugs in the conversion code, switching to git should eliminate them. Also, since I do my day-to-day PostgreSQL work using git, it will be nice to be able to commit that way also - it should be both faster (CVS is very slow by comparison) and less error-prone (no cutting and pasting the commit message, no forgetting to add a file in CVS that you already added in git).