Thursday, April 29, 2010

Write-Ahead Logging in PostgreSQL 9.0

PostgreSQL 9.0 will feature a new setting, currently called wal_level (we still have a ways to go before the final release, so that could change, but probably not), which will control the amount of information written to the system's write-ahead log. There are three levels:

- "minimal" will write just enough information to the log to ensure proper crash recovery.
- "archive" will write enough extra information to support a standby server.
- "hot_standby" will write the same information as "archive", plus enough additional information to allow the standby server to run queries (the feature popularly known as "Hot Standby").

If you're familiar with setting up a standby server in releases prior to 9.0, you may know that this behavior was controlled in 8.4 and prior by setting the configuration variable called "archive_mode". In 8.4 and prior, archive_mode controls not only the amount of information written to the write-ahead log (WAL), but also whether the archiver process is started and whether unarchived segments are retained on the master. While this design worked OK for earlier releases, we outgrew it in 9.0 due to the addition of two new features: streaming replication and hot standby.

In 8.4, archive_mode controls three separate behaviors: what information is written to WAL, whether or not to start the archiver process, and whether or not to retain unarchived WAL segments indefinitely. In 9.0, it's possible to set up a system where streaming replication is enabled but archiving is not, and streaming replication has the same WAL requirements as archiving. Rather than writing the necessary information to WAL if either archiving or streaming replication is enabled, it seemed cleaner to have a separate setting to explicitly control what got written to WAL.

While streaming replication and archiving have essentially the same WAL requirements, Hot Standby requires all of those same things to be logged plus a few more. Initially, we just had a setting for whether or not to log those few extra things, and it simply didn't do anything unless either archiving or streaming replication was also enabled. But that led to some confusing error conditions. Streaming replication was enabled by setting a variable called max_wal_senders to a value greater than 0, so the WAL needed for Hot Standby would get written if:

(archive_mode = 'on' OR max_wal_senders>0) AND recovery_connections='on'

Again, it seemed cleaner to have a separate setting. When Hot Standby failed to start on the standby because of incorrect settings on the master, the error message needed to say something like this:
We can't start Hot Standby because the master server either has max_wal_senders=0 and archive_mode=off, or else it has recovery_connections=off. So please either go enable recovery_connections, or if it's already enabled, then either increase max_wal_senders or turn on archive_mode.
With the new wal_level setting, we can just say (in effect) this:
We can't start Hot Standby because you didn't set wal_level='hot_standby' on the master.
That's not the exact text of the error message. We actually refer to the thing that we can't start on the standby as "recovery_connections", and there is a "recovery_connections" setting on the standby that controls whether or not we try to start it. I'm not sure whether this is all good terminology - these are all new features so we're hammering it out as we go along! Hopefully we'll get some further feedback as we move into beta. Beta1 should be wrapped later today!

1 comment:

  1. Hi Robert Hass,
    Thanks for posting.

    In streaming replication, we use wal_sender to sent "info" from primary to standby, but is it synchronous or asynchronous ? Is it transaction in master commited or not before sending to slave?
    I dont understand that "writing to WAL is success if transaction have commited" ( , it is not detail)

    And I read in "PG admin cookbook/ chapter 12" ebook is: "if data changes are acknowledged as sent from Master to Sandby before transaction commit is acknowledged, we refer to that as SYNCHRONOUS replication ... ELSE after transaction commit -> ASYNCHRONOUS replication"

    Luan Huynh