Monday, March 12, 2012

First Results for Write Performance on IBM POWER7

In a previous blog post, I posted some SELECT-only pgbench results on IBM POWER7, and promised to post read-write results when I had them.  That took a little longer than expected due to periodic lock-ups on the machine, which seem to have been resolved by a kernel update (thanks to Brent Baude at IBM for some timely help with this issue).  But now I have some.

Since write performance is more variable than read performance, I ran each test for 30 minutes rather than 5 minutes.  Since that slows things down quite a bit, I cut down the number of combinations I tested - just two scale factors (300 and 3000) rather than the five I tested previously, and I tested only multiples of 8 clients up through 80 rather than multiples of 4 clients.  The test configuration is otherwise similar: scale factor 300 fits into PostgreSQL's buffer cache, while scale factor 3000 does not.
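For anyone who wants to reproduce something similar, the test matrix described above can be sketched as follows. This is a hypothetical reconstruction, not my actual harness: the database name and the thread count (-j) are assumptions, and the commands are echoed as a dry run; drop the "echo" to execute them against a running server.

```shell
# Dry-run sketch of the test matrix: one initialization per scale
# factor, then 30-minute (-T 1800) read-write runs at 8, 16, ..., 80
# clients.  Database name "pgbench" and -j values are assumptions.
echo pgbench -i -s 300 pgbench
for c in $(seq 8 8 80); do
    echo pgbench -c "$c" -j "$c" -T 1800 pgbench
done
```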

Although these results - at least at scale factor 300 - are probably a significant improvement over PostgreSQL 9.1, the scaling is clearly not as good here as it is in the read-only case.  In the scale factor 3000 case, increasing the number of clients basically doesn't improve performance at all.  Lock contention is doubtless a factor here: Heikki Linnakangas has been working on a patch to allow parallel WAL insertion, but it still has bugs.  For the scale factor 3000 case, the scalability limitations around buffer eviction are probably a factor as well, and there may be others.

It also occurs to me - with the benefit of hindsight - that I haven't done a particularly good job tuning this machine for this workload.  I haven't, for example, played around with the background writer settings at all; nor have I made any serious attempt to optimize the disk configuration - the entire data directory is on one big LVM partition, which is probably not a configuration anyone would recommend for a high-write system.  So there's probably a bit more that can be squeezed out here by more careful tuning.  It's also the case that this machine doesn't have a particularly powerful disk subsystem (6 x 600GB 10K RPM SAS SFF disk drives), and there is a substantial amount of iowait time during the tests, so it's possible that the disks are just running out of juice.  I think that most real-world workloads are not this write-intensive: here, each transaction writes a row to each of four tables, and the only read is of one of the just-updated rows.
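To make the background writer point concrete, a tuning pass might start from something like the fragment below.  These values are illustrative guesses for a write-heavy workload, not settings I've tested on this machine or am recommending:

```
# Hypothetical postgresql.conf starting point - values are guesses,
# not tested or endorsed settings.
bgwriter_delay = 100ms             # wake the background writer more often
bgwriter_lru_maxpages = 500        # allow more buffer writes per round
bgwriter_lru_multiplier = 4.0      # write ahead of recent buffer demand
checkpoint_segments = 64           # spread out checkpoint I/O
checkpoint_completion_target = 0.9
```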

Other thoughts on tuning this are welcome, though if they involve buying different hardware that may not be something I can manage, as this is all made possible by the generosity of IBM.


  1. If you plotted TPS every few seconds for a given number of clients, is there significant variance in the result?

  2. Yes, there are serious latency spikes. There's an active discussion about this on pgsql-hackers; I'm trying to hunt it down.
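In the meantime, anyone curious about per-second variance can recover it from pgbench's per-transaction logs (written with the -l option).  The fifth field of each log line is the transaction's completion time in epoch seconds, as of the 9.x log format; the sample lines below are synthetic, for illustration only.

```shell
# Synthetic pgbench -l log lines (client xact latency_us file epoch usec):
printf '0 1 1200 0 100 1\n1 1 1500 0 100 2\n0 2 900 0 101 3\n' > pgbench_log.0

# Count transactions completed in each epoch second - i.e. per-second TPS:
awk '{ c[$5]++ } END { for (s in c) print s, c[s] }' pgbench_log.* | sort -n
# prints:
# 100 2
# 101 1
```

Plotting that second column over a full run makes latency spikes and checkpoint-related stalls easy to spot.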