Tuesday, April 19, 2011

PostgreSQL East, and The MySQL Conference and Expo

Last month, I attended (and spoke at) PostgreSQL East in New York City, which this year featured a MongoDB track.  This past week, I was in Santa Clara at the O'Reilly MySQL Conference & Expo, which had a substantial PostgreSQL track this year, where I also spoke.

Both conferences had some very good talks.  The first talk I attended at the MySQL conference turned out to be one of the best - it was entitled Linux and H/W optimizations for MySQL.  I had a little difficulty understanding Yoshinori Matsunobu's accent at times, but the slides were excellent, and very detailed.  Some of his more interesting findings: (1) SSDs speed things up both on the master and on replication slaves, but the speedup is larger on the slaves; so it's useful to put hard disks on the master and SSDs on the slave to make it possible for single-threaded recovery there to keep up with the master; (2) while SSDs are much faster for random access, they are actually slower for sequential access and fsync, so a RAID array with a battery-backed or flash-backed write cache may still be a better option in those cases, (3) Fusion I/O drives were FAR faster than Intel drives, (4) the Intel Nehalem architecture was much faster than the AMD Opteron architecture when used in combination with SSDs, and (5) HyperThreading helps more in SSD environments than it does otherwise, because the system, overall, becomes more heavily CPU-bound; for the same reasons, mutex contention hurts more.

Another very good talk was Peter Zaitsev's discussion of Innodb and XtraDB Architecture and Performance Optimization, which gave me the feeling of looking into a sort of carnival mirror, where you recognize yourself, but it's all distorted.  Two of the problems that give PostgreSQL DBAs heartburn - bloat, and checkpoint I/O spikes (and less frequently, purge not keeping up a la vacuum not keeping up) - are apparently problems for MySQL as well, though with significantly different details.  I'm not even going to attempt to summarize the differences, or say which problem is worse or occurs more often, because I honestly have no idea.  I was a bit surprised to hear dump-and-reload recommended to recover from certain worst-case scenarios, though.

There were other good talks, too, which helped me understand what's going on in the world of MySQL forks.  Apparently, the Drizzle team is busy removing features that they consider half-baked and modularizing the code so that it is easier to understand and improve, while the MariaDB team is busy adding optimizer features, including support for hash joins and persistent statistics.  From what I understand, the MySQL optimizer has typically worked by gathering statistics through on-the-fly index probes, which can be a problem in some situations.  It's not so easy to categorize the work that Oracle is doing, but it seems to involve a fair amount of filing down of rough edges, and various improvements to replication, including, perhaps most significantly, parallel replication apply.

At PostgreSQL East, I think my favorite talk was Ken Rosensteel's talk, somewhat misleadingly titled Large Customers Want PostgreSQL, Too.  This talk turned to be about migrating a large Oracle mainframe application to use PostgreSQL, and the challenges faced during that migration.  He, or his team, built an Oracle-to-PostgreSQL converter for stored procedures; it was interesting to see that they got bitten by our bizarre casting rules around the smallint data type.  They also ended up doing some very interesting work optimizing the performance of ECPG for small FETCH statements; these are areas of the code that I think don't normally get a lot of attention, and it was great to hear about the optimization work that got done.

I was disappointed that Jon Hoffman's talk on Experiences with Postgres and MongoDB at foursquare.com got cancelled; I think that would have been an interesting talk.  I did have an opportunity to attend Jake Luciani's talk Comparing the Apache Cassandra Architecutre to PostgreSQL, which turned out to be more about Cassandra than PostgreSQL, but was nevertheless interesting.  I would have been interested to hear a more technical talk, though, about how problems like distributed serialization anomalies and distributed checkpointing are handled.

Next month, I'll be speaking at PGCon 2011 on Using The PostgreSQL System Catalogs and How To Get Your PostgreSQL Patch Accepted.  And after that, unlike Bruce, I'm going to stay home for a few months!


  1. At the MySQL conference I also heard a lot about MySQL trying to keep up with a replication stream. I think MySQL has that problem worse than Postgres because their transaction log is in a more logical format than PG. The PG log is more like "go write this data into a file at this offset", while the MySQL log is more like, "here is a query/row; go process it".

  2. MySQL can suffer replication lag because replication apply is read-modify-write and single threaded. AFAIK, PG replication doesn't ship page images for most changes (that would send too much across a WAN) so it is also read-modify-write and single threaded. Isn't it also subject to replication lag for IO bound workloads?

    Many of us have mostly simple SQL in our replication logs so the difference between replaying an SQL statement and replaying a log record doesn't save us. The problem for us is that read page, modify page, write page is bound by the latency of "read page". So SSD makes a big deal here.

    The MySQL community has a few projects in progress to improve on this. Tungsten Replicator is now open source and does parallel replication apply. MySQL 5.6 (still in beta) also has parallel replication apply.

    Has there been any discussion about modifying PG to support concurrent read requests while replaying the WAL?

  3. @rhaas: At GoDaddy I dealt with a bunch (dozens) of different workloads for several years and had a total of one instance of a problem with Purge.

  4. @Mark: Yeah, replication lag is a problem for us, too. I wouldn't like to speculate on whose problem is more severe. It is difficult because there are few people equally familiar with both systems, and I don't think that is really what's driving people toward one system vs. the other.

    @Rob: As I say, I am NOT attempting to quantify which problem is worse. I certainly think there's room for improvement in our VACUUM implementation, and that's one of the things I'd like to work on. But I did hear some discussion of improvements in progress on the MySQL side as well.

  5. Deployments likely to encounter replication lag from being IO bound are also likely to hit purge lag. Fortunately, multi-threaded purge is around the corner for those of us with extreme workloads for InnoDB.