Monday, June 13, 2011

False Contention

For the last few weeks, I've been buried in PostgreSQL performance analysis, mostly focusing on the workload generated by "pgbench -S" under high concurrency.  In other words, lots and lots of very simple SELECT statements on a single table.  Such workloads can generate serious internal contention within PostgreSQL on systems with many CPU cores.

But it's interesting to note that most of the contention points I've so far identified are what might be called "false contention".  The transactions generated by this workload need to perform operations such as:

- acquire an AccessShareLock on the target relation and its index to guard against a concurrent drop or schema change
- acquire an ExclusiveLock on their VXID in case another transaction wishes to wait for transaction end
- read the list of pending "invalidation" events, which are generally created by DDL
- read the list of in-progress XIDs, to generate a snapshot
- find the root index block for the table's primary key index in the shared buffer pool
- momentarily pin the block containing the root index page into the buffer pool, so that it can't be evicted while we're examining it
- momentarily lock the root index page, so that no new items can be added until we've decided which downlink to follow

Now, in fact, in this workload, there is no concurrent DDL against the relevant table and index, or any other one; no one is attempting to wait for the completion of any VXID; the list of in-progress XIDs can easily be simultaneously read by many backends; and the root index page is in no danger either of being modified or of being evicted.  In short, the problem is not that there are resource conflicts, but that verifying that no resource conflicts exist is itself eating up too many resources.
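
To make the point concrete, here is a toy Python sketch. It is not PostgreSQL's lock manager, and every name in it (`SharedLock`, `run_select_only`, the two counters) is made up for illustration. It assumes a shared/exclusive lock whose bookkeeping is protected by a single mutex, so even readers that never conflict with anyone must take that mutex twice per query.

```python
import threading
import time


class SharedLock:
    """Toy shared/exclusive lock (hypothetical, for illustration only)."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects all fields below
        self._readers = 0               # number of current shared holders
        self._writer = False            # True while an exclusive holder exists
        self.bookkeeping_ops = 0        # how many times the mutex was taken
        self.real_conflicts = 0         # how many times a reader met a writer

    def acquire_shared(self):
        while True:
            with self._mutex:
                self.bookkeeping_ops += 1
                if not self._writer:
                    self._readers += 1
                    return
                self.real_conflicts += 1
            time.sleep(0)  # yield and retry; a real lock would sleep properly

    def release_shared(self):
        with self._mutex:
            self.bookkeeping_ops += 1
            self._readers -= 1

    def acquire_exclusive(self):
        while True:
            with self._mutex:
                self.bookkeeping_ops += 1
                if not self._writer and self._readers == 0:
                    self._writer = True
                    return
            time.sleep(0)

    def release_exclusive(self):
        with self._mutex:
            self.bookkeeping_ops += 1
            self._writer = False


def run_select_only(lock, n_backends=8, n_queries=1000):
    """Simulate read-only backends; nobody ever takes the exclusive lock."""

    def backend():
        for _ in range(n_queries):
            lock.acquire_shared()
            lock.release_shared()

    threads = [threading.Thread(target=backend) for _ in range(n_backends)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock.bookkeeping_ops, lock.real_conflicts


if __name__ == "__main__":
    ops, conflicts = run_select_only(SharedLock())
    print("mutex acquisitions:", ops, "real conflicts:", conflicts)
```

In this sketch the number of real conflicts stays at zero, yet every simulated query still serializes twice on the same mutex. That serialization is the kind of cost I mean by "false contention", though the actual PostgreSQL code paths are considerably more involved than this.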


  1. Great stuff. Quick Q.

    Is it possible to set a flag when DDL is going on that triggers these checks? DDL + whatever else needs this? Then each trx just checks the flag and if no DDL is waiting skips the unneeded contention. If the flag is set the trx participates in the locking. The DDL in turn would obviously have to wait until all existing transactions (that hadn't read the flag) have exited before progressing, so there would obviously be a delay on the DDL side. With some thought, reasonable admins would schedule big DDL changes on off hours or during maintenance periods.

  2. In fact, several of the ideas I'm working on are along the lines you suggest, but I'm implementing something more sophisticated than a single global flag.

  3. I was thinking of this global flag too while reading the article, just like M. Anonymous (sic). Can you give the name of the approach you are developing, please? Just out of curiosity.