I recently had a chance to run some benchmarks of PostgreSQL 9.2devel on an IBM POWER7 machine provided by IBM and hosted by Oregon State University's Open Source Lab. I'd like to run more tests, but for a first pass I decided to run a SELECT-only pgbench test at various client counts and scale factors. This basically measures how quickly we can do primary key lookups in a completely read-only environment. This generated a pretty graph. Here it is.
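For anyone who wants to try a similar sweep, here is a minimal sketch of how the test matrix could be generated. It is not the script used for these runs. The client counts, the run duration, and the database name are hypothetical placeholders, and the only pgbench options it relies on are the standard ones: `-i` and `-s` to initialize at a given scale factor, `-S` for the SELECT-only script, `-c` for clients, `-j` for worker threads, and `-T` for duration.

```python
# Sketch: build the pgbench command lines for a SELECT-only sweep.
# Scale factors come from the post; everything else below is an assumption.
SCALE_FACTORS = [100, 300, 1000, 3000]
CLIENT_COUNTS = [1, 2, 4, 8, 16, 32, 64, 80]   # hypothetical choice
DURATION_SECONDS = 300                         # hypothetical choice
DB_NAME = "pgbench"                            # hypothetical database name


def init_command(scale):
    """Command that (re)creates the pgbench tables at the given scale factor."""
    return ["pgbench", "-i", "-s", str(scale), DB_NAME]


def run_command(clients):
    """Command for one SELECT-only run with one worker thread per client."""
    return [
        "pgbench", "-S",
        "-c", str(clients),
        "-j", str(clients),
        "-T", str(DURATION_SECONDS),
        DB_NAME,
    ]


def plan():
    """Return every command in order: initialize a scale, then sweep clients."""
    steps = []
    for scale in SCALE_FACTORS:
        steps.append(init_command(scale))
        for clients in CLIENT_COUNTS:
            steps.append(run_command(clients))
    return steps


if __name__ == "__main__":
    # Print the plan rather than executing it, so it can be reviewed first.
    for cmd in plan():
        print(" ".join(cmd))
```

Printing the commands instead of running them keeps the sketch harmless; piping the output to a shell, or passing each list to `subprocess.run`, would execute the sweep.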
There are a couple of interesting things about this graph. First, this machine has 16 cores, but 64 hardware threads. A hardware thread, clearly, is not quite as good as a full core. So, unsurprisingly, all four curves have an inflection point right around 16 clients. After that, performance keeps going up - in fact, in the scale factor 100 and 300 cases, it increases slightly even beyond the 64-client mark, at which point all hardware threads are presumably saturated. pgbench was running on the same machine as the database server, which may or may not be related.
Second, the absolute performance is quite good. Here's an older graph, taken on a 32-core AMD 6128 machine. That was generated using an older code base, but I don't think too much has changed in the interim, at least not that's relevant to this test. So we can see that this POWER7 machine is pretty fast - we get significantly more transactions per second, despite having fewer real cores. Also note that this machine is running Linux 3.2.x, which should mean that the Linux lseek scalability problem I complained about previously is no longer an issue.
Third, it's interesting to note what happens as we increase the size of the data set. Scale factor 100 generates a database of about 1502 MB, so both that one and the scale factor 300 run are operating on databases that fit entirely inside the database's page cache, shared_buffers, which I had set to 8GB for these tests. On the other hand, the databases at scale factor 1000 and scale factor 3000 are larger than shared_buffers, so we've got to copy pages in from the operating system as they are used, evicting other pages to make room. PostgreSQL doesn't use direct I/O, and these data sets still fit in RAM, so we're just copying from the operating system's page cache, not the disk.
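The arithmetic behind that split is easy to check. The sketch below assumes the database size grows linearly with the scale factor, extrapolating from the roughly 1502 MB observed at scale factor 100, and compares the estimate against the 8GB of shared_buffers and the 64GB of RAM in this server. The linear extrapolation and the function names are my own simplification; the real on-disk size will differ somewhat, and not all of RAM is available for caching.

```python
# Sketch: estimate where each pgbench data set lives, assuming size scales
# linearly with the scale factor (an approximation, not a measurement).
MB_PER_SCALE_UNIT = 1502 / 100      # from ~1502 MB at scale factor 100
SHARED_BUFFERS_MB = 8 * 1024        # shared_buffers = 8GB
RAM_MB = 64 * 1024                  # the server has 64GB of RAM


def estimated_size_mb(scale):
    """Estimated database size in MB for a given scale factor."""
    return scale * MB_PER_SCALE_UNIT


def where_it_fits(scale):
    """Smallest tier that can hold the whole estimated data set."""
    size = estimated_size_mb(scale)
    if size <= SHARED_BUFFERS_MB:
        return "shared_buffers"
    if size <= RAM_MB:
        return "OS page cache"
    return "disk"


if __name__ == "__main__":
    for scale in (100, 300, 1000, 3000, 10000):
        print(scale, round(estimated_size_mb(scale)), "MB", where_it_fits(scale))
```

Under these assumptions, scale factors 100 and 300 land in shared_buffers, 1000 and 3000 in the operating system's page cache, and 10000 spills to disk.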
Still, data copying is expensive, so we'd expect some performance degradation. At the lower client counts, it's not too bad: at 16 clients, scale factor 1000 is just 6% slower than scale factor 100. As we ramp up the number of clients, though, things get quite a bit worse. At 32 clients, the regression has increased to 13%, and at 64 clients, it's 41%. There's a known concurrency problem with buffer allocation (a lock called BufFreelistLock), so these results aren't entirely surprising, but they do illustrate that, at least on this problem, the issue isn't so much performance as scalability. The extra data copying does hurt, but the lock contention hurts more.
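To make the regression figures concrete, here is a small sketch of how the slowdown is computed. The throughput values are not measured tps numbers: they are normalized so that scale factor 100 is 100 at each client count, with the scale factor 1000 values chosen to reproduce the percentages quoted above. The function and variable names are hypothetical.

```python
# Sketch: percentage slowdown of one run relative to a baseline run.
# Values are normalized placeholders derived from the quoted percentages,
# not raw tps measurements.
BASELINE = {16: 100.0, 32: 100.0, 64: 100.0}   # scale factor 100, normalized
LARGER = {16: 94.0, 32: 87.0, 64: 59.0}        # scale factor 1000, normalized


def slowdown_pct(baseline_tps, tps):
    """How much slower tps is than baseline_tps, as a percentage."""
    return 100.0 * (baseline_tps - tps) / baseline_tps


def slowdown_by_clients(baseline, other):
    """Slowdown at every client count present in both result sets."""
    return {
        clients: slowdown_pct(baseline[clients], other[clients])
        for clients in sorted(baseline)
        if clients in other
    }


if __name__ == "__main__":
    for clients, pct in slowdown_by_clients(BASELINE, LARGER).items():
        print(f"{clients} clients: {pct:.0f}% slower")
```

If copying were the only cost, the slowdown would stay roughly constant as clients are added; a gap that widens with concurrency is what points at contention instead.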
I did one more set of test runs using scale factor 10000. This data set was so large that it didn't even fit in memory - the server has 64GB of RAM. Of course, this led to a huge drop-off in performance, so it didn't make sense to put those results on the same graph. But I made a separate graph with just those results.
I don't think this server has a particularly powerful I/O subsystem, but even if it did, disks are a lot slower than memory, and this benchmark is generating completely random I/O, which is not something disks are very good at. Nevertheless we seem to do a pretty good job saturating the available I/O capacity.
When I have the opportunity, I'd like to run some read-write tests on this machine as well; I'll post those results when I have them.