Saturday, February 26, 2011

The Case For Logical Replication

Replication, as it exists in PostgreSQL today, is physical replication.  That is, the disk files as they exist on the standby are essentially byte-for-byte identical to the ones on the master (with the exception that the same hint bits may not be set).  When a change is made on the master, the write-ahead log is streamed to the standby, which makes the same change to the corresponding disk block on the standby.  The alternative is logical replication, in which what is transferred is not an instruction to write a certain sequence of bytes at a certain location, but the information that a certain tuple was inserted, or that a table with a given schema was created.

Thursday, February 17, 2011

Working in the Community

The PostgreSQL community makes decisions by consensus.  The craziness of this approach should be readily apparent.  The United States Senate has come under withering criticism for rules which permit one or a small number of Senators to hold up the business of the chamber to such an extent that it's difficult to get anything done.  Forty-one senators can filibuster just about anything, to the frustration of the fifty-nine who want to have a vote and get on with it; and sometimes just one Senator can make himself or herself a near-complete obstacle to progress.

Keeping Local Git Branches Up To Date

Because I spend most of my time working on the master branch of the PostgreSQL git repository, I prefer to work with just a single clone.  The PostgreSQL wiki page Committing with Git describes several ways of using multiple clones and/or git-new-workdir, but my personal preference is to just use one clone.  Most of the time, I keep the main branch checked out, but every once in a while I check out one of the back-branches to look at something, or to back-patch.  (If you're unfamiliar with the PostgreSQL workflow, note that we do not merge into our official branches; we always rebase, so that there are no merge commits in our official repository.  You may or may not like this workflow, but it works for us.)

One small annoyance is that "git pull" doesn't leave my clone in the state I want.  Say I have the master branch checked out.  "git pull" will update all of my remote tracking branches, but it will only update the local branch that I currently have checked out.  This is annoying, first of all because if I later type "git log REL9_0_STABLE" I'll only get the commits since the last time I checked out and pulled that branch, rather than as I intended the latest state of the upstream, and secondly because it leads to spurious griping when I later do "git push": it complains that the old branches can't be pushed because it wouldn't be a fast-forward merge.  This is of course a little silly: since my branch tip is an ancestor of the tracking branch, it would be more reasonable to conclude that I haven't updated it than to imagine I meant to clobber the origin.

Tuesday, February 01, 2011

MySQL vs. PostgreSQL, Part 2: VACUUM vs. Purge

Almost two months ago, I wrote part one of what I indicated would be an occasional series of blog posts comparing the architecture of PostgreSQL to that of MySQL.  Here's part two.  Please note that the caveats set forth in part one apply to this and all future installments as well, so if you haven't read part one already, please click on the link above and read at least the first two paragraphs before reading this post.