Monday, June 06, 2011

Reducing Lock Contention

In a recent blog post on Performance Optimization, I mentioned that Noah Misch and I had discussed some methods of reducing the overhead of frequent relation locks.  Every transaction that touches a given table or index locks it and, at commit time, unlocks it.  This adds up to a lot of locking and unlocking, which ends up being very significant on machines with many CPU cores.  I ended up spending a good chunk of last week hacking on this problem, with very promising results: I have a prototype patch that improves throughput on a SELECT-only pgbench test by about 3.5x on a system with 24 cores.  Not bad for a couple days work.

And it looks like we haven't exhausted the possibilities of this line of attack.  Some further investigation (inspired by a note from Stefan Kaltenbrunner) revealed that even with the patch applied, locking contention is still a severe problem on this workload.  Each transaction takes and releases a lock on it's virtual transaction ID, or VXID.  This scheme allows other processes to wait for the completion of a particular transaction.  However, we don't actually use that mechanism very much: it appears that it is only needed for CREATE INDEX CONCURRENTLY, and for Hot Standby.  I think it may be possible to arrange things so that the VXID locks are only taken when someone wants to wait for them, which would probably lead to a significant further boost in throughput.

These optimizations will primarily benefit read-only workloads.  On at least some read-write workloads, throughput is limited by an LWLock known as WALInsertLock, which serializes insertions into the write-ahead log stream.  I don't have a clear idea how to solve that problem at the moment, but intend to look into it further.


  1. Nice work! Curious what the improvement is on an 8 way machine?

  2. Yes! It's a really important optimization. We encounter heavy locking on large SELECT's.

  3. Andreas KarlssonJune 07, 2011 5:34 AM

    @WildRAiD: I do not think PostgreSQL has any problems with locks on large SELECTs, assuming you mean selects which touch many rows. I believe it is when you have many short read-only transactions with a high level of concurrency you will hit this bottleneck.