Tuesday, July 02, 2013

MVCC Catalog Access

Some wag, riffing on Rudyard Kipling, once wrote that "if you can keep your head when all around you are losing theirs, maybe you just don't understand the situation".  I thought of that line this morning, while committing a patch to use MVCC snapshots for system catalog access.

If you're not the sort of person who spends a lot of time reading pgsql-hackers, it might not be obvious why this is important, or even a good thing.  Here's the short version: it's necessary infrastructure for allowing concurrent DDL.

In existing releases of PostgreSQL (including the upcoming 9.3 release), system catalogs are read using a funny set of rules which we call SnapshotNow.  If a row has been inserted and the inserting transaction has committed, we consider that row good; unless it's also been deleted and the deleting transaction has also committed, in which case we consider it dead.  Although this seems superficially reasonable, it leads to very surprising results when transactions commit in mid-scan.
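The rule can be written as a tiny predicate. This is a simplified sketch, not PostgreSQL's actual code. `TupleVersion` and `visible_snapshot_now` are hypothetical names. The model ignores our own transaction's changes, aborted or in-progress subtleties, and hint bits. The key point is that the check consults whatever set of transactions has committed *at the moment the row is examined*.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class TupleVersion:
    """Hypothetical, simplified row version."""
    xmin: int                   # ID of the inserting transaction
    xmax: Optional[int] = None  # ID of the deleting/updating transaction, if any


def visible_snapshot_now(tup: TupleVersion, committed: Set[int]) -> bool:
    """Simplified SnapshotNow rule: 'committed' is the set of transactions
    that have committed as of right now, not as of some earlier point."""
    if tup.xmin not in committed:
        return False  # inserter hasn't committed yet: row not visible
    if tup.xmax is not None and tup.xmax in committed:
        return False  # deleter has committed: row is dead
    return True       # inserted and not (committedly) deleted: row is good
```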

Consider, for example, the case where a row is updated.  Depending on the placement of the old and new tuple versions in the table and its indexes, our scan might see either the old or the new version first.  If the updating transaction commits after we see the old version and before we see the new version, we'll see both of them; if the updating transaction commits after we see the new version and before we see the old version, we'll see neither of them.  Both of these results are quite surprising.
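A toy simulation makes the two orderings concrete. It is a sketch under simplifying assumptions, not PostgreSQL code. All names here are hypothetical: `TupleVersion`, `MVCCSnapshot`, `scan`, and the transaction IDs. The "MVCC snapshot" is reduced to a frozen copy of the committed-transaction set, taken once when the scan starts. The updating transaction commits after the first tuple has been examined.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

UPDATER = 200  # hypothetical ID of the transaction that updates the row


@dataclass
class TupleVersion:
    name: str
    xmin: int                   # inserting transaction
    xmax: Optional[int] = None  # deleting/updating transaction


def visible(tup: TupleVersion, committed: Set[int]) -> bool:
    """Inserted by a committed txn and not deleted by a committed txn."""
    if tup.xmin not in committed:
        return False
    return tup.xmax is None or tup.xmax not in committed


def scan(order: List[TupleVersion], commit_after: int, use_mvcc: bool) -> List[str]:
    """Examine tuple versions in physical order. UPDATER commits after
    'commit_after' tuples have been examined (i.e., in mid-scan)."""
    committed = {100}             # the original row's inserter committed long ago
    snapshot = frozenset(committed)  # simplified MVCC snapshot, fixed at scan start
    seen = []
    for i, tup in enumerate(order):
        if i == commit_after:
            committed.add(UPDATER)   # concurrent commit mid-scan
        # SnapshotNow re-reads the live commit state; MVCC uses the frozen copy.
        if visible(tup, snapshot if use_mvcc else committed):
            seen.append(tup.name)
    return seen


old = TupleVersion("old", xmin=100, xmax=UPDATER)
new = TupleVersion("new", xmin=UPDATER)

print(scan([old, new], 1, use_mvcc=False))  # ['old', 'new']  (both)
print(scan([new, old], 1, use_mvcc=False))  # []  (neither)
print(scan([old, new], 1, use_mvcc=True))   # ['old']
print(scan([new, old], 1, use_mvcc=True))   # ['old']
```

Under this model, the snapshot-based scan sees exactly one version of the row regardless of physical order. That consistency is what switching catalog scans to MVCC snapshots is meant to buy.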

In practice, the consequence is that we can never safely allow a row in a system catalog to be updated without first taking a lock strong enough to keep other backends from searching for that row - or at least, not without taking extreme precautions.  And sometimes we forget to do everything exactly right, resulting in strange bugs.

The patch I committed today doesn't change any locking rules, and it may still have bugs or performance consequences that have to be fixed, or perhaps even problems so bad that we end up reverting the whole patch.  But preliminary results look good; and if it stands up in wider testing, I think it's a good bet we'll be seeing patches in the near future to weaken the full table locks we must now hold.  I'm pretty excited about that.


  1. Nice blog. Do we see this issue even for isolation levels higher than read committed?

    1. Yes. Isolation levels don't affect SnapshotNow.