Thursday, August 18, 2011

Index-Only Scans: Now There's a Patch

In November of 2010, I blogged about a much-requested PostgreSQL feature: index-only scans.  We've made some progress!  In June of this year, I committed a patch (and then, after Heikki found some bugs, another patch) to make the visibility map crash-safe.  In previous releases, it was possible for the visibility map to become incorrect after a system crash, which meant that it could not be relied on for anything very critical.  That should be fixed now.  Last week, I posted a patch for the main feature: index-only scans.
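For context, the visibility map is a bitmap with one bit per heap page, where a set bit asserts that every tuple on that page is visible to all transactions.  Here's a minimal sketch of the idea - the names are invented and the real implementation differs; this is just to make the concept concrete:

```c
/*
 * Illustrative sketch of a visibility map: one bit per heap page,
 * set only when every tuple on that page is visible to all
 * transactions.  Names are invented; this is not PostgreSQL code.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint8_t *bits;      /* one bit per heap page */
    size_t   npages;
} VisibilityMap;

static VisibilityMap *vm_create(size_t npages)
{
    VisibilityMap *vm = malloc(sizeof(VisibilityMap));
    vm->bits = calloc((npages + 7) / 8, 1);
    vm->npages = npages;
    return vm;
}

/* Mark a heap page all-visible.  For crash safety, this change must be
 * WAL-logged before the page hits disk; otherwise a crash could leave
 * the bit set when it shouldn't be. */
static void vm_set(VisibilityMap *vm, size_t page)
{
    vm->bits[page / 8] |= (uint8_t) (1 << (page % 8));
}

/* An index-only scan may skip visiting the heap page only if this
 * returns true; otherwise it must check tuple visibility the hard way. */
static bool vm_all_visible(const VisibilityMap *vm, size_t page)
{
    return (vm->bits[page / 8] >> (page % 8)) & 1;
}

int main(void)
{
    VisibilityMap *vm = vm_create(1000);
    vm_set(vm, 42);
    printf("page 42 all-visible? %d\n", vm_all_visible(vm, 42));
    return 0;
}
```

This is also why crash safety had to come first: a wrongly set bit that survives a crash would let an index-only scan return wrong answers.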

Thursday, August 04, 2011

Linux and glibc Scalability

As some of you probably already know from following the traffic on pgsql-hackers, I've been continuing to beat away at the scalability issues around PostgreSQL.  Interestingly, the last two problems I found turned out not to be internal bottlenecks in PostgreSQL at all.  Instead, they were bottlenecks in other software with which PostgreSQL was interacting during the test runs.

Monday, July 25, 2011

More On Read Scaling

In my previous blog post on read scaling out to 32 cores, I wrote about a patch I recently committed to improve read scalability in PostgreSQL.  It turns out, however, that there's a downside to that patch, which was uncovered in testing by Stefan Kaltenbrunner.  (Fortunately, I have a fix.)  And there's an opportunity for further performance improvement by applying a similar technique to an additional type of lock.  For full details, read on.

Thursday, July 21, 2011

Read Scaling Out to 32 Cores

With the exception of a week's vacation, my last month has been mostly absorbed by PostgreSQL performance work.  Specifically, I've been looking at the workload generated by "pgbench -S", which essentially fires off lots and lots of SELECT queries that all do primary key lookups against a single table (each one of the form SELECT abalance FROM pgbench_accounts WHERE aid = :aid).  Even more specifically, I've been looking at the way this workload performs on systems with many CPU cores, where (I found) PostgreSQL was not able to use all of the available CPU time to answer queries.  While single-core performance was around 4,300 tps, performance with 36 clients (on a 24-core server) was only about 36,000 tps - far short of the roughly 100,000 tps that linear scaling across 24 cores would suggest.

Research revealed that performance was being limited mostly by PostgreSQL's lock manager.  Each SELECT query needed to lock the table being queried - and its index - against a concurrent DROP operation.   Since PostgreSQL 8.2, the lock manager has been partitioned: a lock request against a database object will be assigned to one of 16 "partitions", and lock requests against objects that fall into different partitions can proceed in parallel.  Unfortunately, that doesn't help much in this case, because only two objects are being locked: the table, and its index.  Most of the traffic therefore targets just two of the sixteen lock manager partitions.  Furthermore, because the query itself is so trivial, the rate of lock and unlock requests is extremely high - on a more complex query, the bottleneck wouldn't be as severe.
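To make the partitioning concrete, here's a simplified sketch of how a lock request might be routed to one of the sixteen partitions.  The struct, the toy hash, and the names are invented for illustration; this is not PostgreSQL's actual code:

```c
/*
 * Simplified sketch of a partitioned lock manager: a lock tag hashes
 * to one of 16 partitions, each guarded by its own lock, so requests
 * against *different* objects can often proceed in parallel.  Names
 * and hash are invented; this is not PostgreSQL's actual code.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LOCK_PARTITIONS 16

typedef struct {
    uint32_t dbid;      /* which database */
    uint32_t relid;     /* which table or index */
} LockTag;

static pthread_mutex_t partition_lock[NUM_LOCK_PARTITIONS];

static uint32_t locktag_hash(const LockTag *tag)
{
    /* toy hash; PostgreSQL uses a real hash function here */
    return tag->dbid * 2654435761u ^ tag->relid * 40503u;
}

static int lock_partition(const LockTag *tag)
{
    return locktag_hash(tag) % NUM_LOCK_PARTITIONS;
}

int main(void)
{
    LockTag table = { 1, 16384 };   /* the table */
    LockTag index = { 1, 16385 };   /* its primary key index */

    for (int i = 0; i < NUM_LOCK_PARTITIONS; i++)
        pthread_mutex_init(&partition_lock[i], NULL);

    /* A "pgbench -S" workload hammers just these two tags, so nearly
     * all requests funnel into (at most) two of the 16 partitions. */
    printf("table -> partition %d\n", lock_partition(&table));
    printf("index -> partition %d\n", lock_partition(&index));

    pthread_mutex_lock(&partition_lock[lock_partition(&table)]);
    /* ... record the AccessShareLock in that partition's hash table ... */
    pthread_mutex_unlock(&partition_lock[lock_partition(&table)]);
    return 0;
}
```

With only two hot lock tags, nearly every request funnels through at most two of the sixteen partition locks, which is why simply adding more partitions wouldn't help this workload.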

Monday, June 13, 2011

False Contention

For the last few weeks, I've been buried in PostgreSQL performance analysis, mostly focusing on the workload generated by "pgbench -S" under high concurrency.  In other words, lots and lots of very simple SELECT statements on a single table.  Such workloads can generate serious internal contention within PostgreSQL on systems with many CPU cores.

But it's interesting to note that most of the contention points I've so far identified are what might be called "false contention".  The transactions generated by this workload need to perform operations such as:

- acquire an AccessShareLock on the target relation and its index to guard against a concurrent drop or schema change
- acquire an ExclusiveLock on their VXID in case another transaction wishes to wait for transaction end
- read the list of pending "invalidation" events, which are generally created by DDL
- read the list of in-progress XIDs, to generate a snapshot
- find the root index block for the table's primary key index in the shared buffer pool
- momentarily pin the buffer containing the root index page, so that it can't be evicted from the buffer pool while we're examining it
- momentarily lock the root index page, so that no new items can be added until we've decided which downlink to follow

Now, in fact, in this workload, there is no concurrent DDL against the relevant table and index, or any other one; no one is attempting to wait for the completion of any VXID; the list of in-progress XIDs can easily be simultaneously read by many backends; and the root index page is in no danger either of being modified or of being evicted.  In short, the problem is not that there are resource conflicts, but that verifying that no resource conflicts exist is itself eating up too many resources.
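To see why that's expensive even without real conflicts, here's a toy model - plain pthreads, invented names, not PostgreSQL code - in which every "query" locks a shared structure purely to confirm that a DROP which never happens isn't in progress:

```c
/*
 * Toy model of "false contention": many threads repeatedly acquire a
 * shared lock just to confirm that nobody is doing DDL -- and nobody
 * ever is.  The check, not any real conflict, is what serializes them.
 * This is an illustration, not PostgreSQL code.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define N_READERS 8
#define N_QUERIES 100000

static pthread_mutex_t lock_table = PTHREAD_MUTEX_INITIALIZER;
static bool drop_in_progress = false;   /* never becomes true */

static void *reader(void *arg)
{
    (void) arg;
    for (int i = 0; i < N_QUERIES; i++)
    {
        /* "acquire AccessShareLock": serialize on the shared mutex
         * merely to observe that no conflicting operation exists */
        pthread_mutex_lock(&lock_table);
        bool blocked = drop_in_progress;
        pthread_mutex_unlock(&lock_table);

        if (!blocked)
        {
            /* run the (trivial) query */
        }

        /* "release the lock at commit": another trip through the mutex */
        pthread_mutex_lock(&lock_table);
        pthread_mutex_unlock(&lock_table);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[N_READERS];
    for (int i = 0; i < N_READERS; i++)
        pthread_create(&threads[i], NULL, reader, NULL);
    for (int i = 0; i < N_READERS; i++)
        pthread_join(threads[i], NULL);
    printf("done; every lock acquisition guarded against a conflict that never happened\n");
    return 0;
}
```

All eight threads serialize on the mutex, yet no thread ever actually conflicts with another: the verification itself is the bottleneck.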

Monday, June 06, 2011

Reducing Lock Contention

In a recent blog post on Performance Optimization, I mentioned that Noah Misch and I had discussed some methods of reducing the overhead of frequent relation locks.  Every transaction that touches a given table or index locks it and, at commit time, unlocks it.  This adds up to a lot of locking and unlocking, which ends up being very significant on machines with many CPU cores.  I ended up spending a good chunk of last week hacking on this problem, with very promising results: I have a prototype patch that improves throughput on a SELECT-only pgbench test by about 3.5x on a system with 24 cores.  Not bad for a couple of days' work.
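The post doesn't spell out the design, but one natural shape for this kind of optimization is a "fast path": let each backend record weak, almost-never-conflicting relation locks in its own small table, and make the rare strong locker (say, a DROP TABLE) pay the cost of sweeping them up.  Here's a toy sketch of that general idea, with invented names - a sketch of the concept, not the actual patch:

```c
/*
 * Toy sketch of "fast-path" relation locking: weak locks (like the
 * AccessShareLock every SELECT takes) are recorded in backend-local
 * state, and only rare strong locks (like a DROP TABLE) touch shared
 * structures.  A sketch of the general idea, not the actual patch.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define FP_SLOTS   16
#define N_BACKENDS 4

typedef struct {
    pthread_mutex_t mutex;          /* protects relids[] and nlocks */
    int relids[FP_SLOTS];
    int nlocks;
} Backend;

static Backend backends[N_BACKENDS];
static atomic_int strong_lock_count;     /* strong locks currently held */
static pthread_mutex_t shared_lock_table = PTHREAD_MUTEX_INITIALIZER;

/* Weak lock: in the common case, touch only our own slots. */
static void lock_relation_weak(Backend *me, int relid)
{
    pthread_mutex_lock(&me->mutex);
    if (atomic_load(&strong_lock_count) == 0 && me->nlocks < FP_SLOTS)
    {
        me->relids[me->nlocks++] = relid;     /* fast path */
        pthread_mutex_unlock(&me->mutex);
        return;
    }
    pthread_mutex_unlock(&me->mutex);

    pthread_mutex_lock(&shared_lock_table);   /* slow path */
    /* ... record the lock in the shared table instead ... */
    pthread_mutex_unlock(&shared_lock_table);
}

/* Strong lock: rare, so it can afford to be expensive. */
static void lock_relation_strong(int relid)
{
    atomic_fetch_add(&strong_lock_count, 1);  /* disable the fast path */

    /* Sweep every backend's slots: any fast-path lock on relid must be
     * moved into the shared table before we can check for conflicts. */
    for (int i = 0; i < N_BACKENDS; i++)
    {
        pthread_mutex_lock(&backends[i].mutex);
        /* ... transfer matching relids[] entries ... */
        pthread_mutex_unlock(&backends[i].mutex);
    }

    pthread_mutex_lock(&shared_lock_table);
    /* ... acquire the strong lock, waiting out any conflicts ... */
    pthread_mutex_unlock(&shared_lock_table);
}

int main(void)
{
    for (int i = 0; i < N_BACKENDS; i++)
        pthread_mutex_init(&backends[i].mutex, NULL);

    lock_relation_weak(&backends[0], 16384);  /* cheap and uncontended */
    lock_relation_strong(16384);              /* rare and expensive */
    printf("strong locks held: %d\n", atomic_load(&strong_lock_count));
    return 0;
}
```

Since a SELECT-only workload takes weak locks almost exclusively, a scheme along these lines takes the shared lock table out of the hot path entirely - the kind of change that could plausibly account for a several-fold throughput improvement.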

Friday, May 27, 2011

Open Source Licensing

I can't resist the opportunity to comment on the FSF's guidelines - apparently just published - for how to choose a license for your work.  A story on this was posted on Slashdot this morning.  The FSF's guidelines are a little more nuanced than one might expect; for example, they recommend contributing code to existing projects under the licenses used by those projects, rather than starting a giant war with the maintainers of those projects.  And if you're including trivial test programs in your work, or your work is shorter than the text of the GPL itself, you might just want to put the work in the public domain.  Very sensible!

But that's about as far as they're willing to go.  For example, the section on contributing to existing projects suggests that, if you're making major enhancements to a non-GPL program, you might want to fork it and release your changes under the GPL.  In other words, in the FSF's opinion, sometimes you should start a giant war with the maintainers.  In the open source world, forks are a part of life, and people are free to choose the licenses they want, but this seems awfully cavalier to me.  Forks are really damaging.  Maintaining a large and unified developer community has value; having those people spend their time developing, rather than arguing about licensing, is a good thing.