Tuesday, August 02, 2016

Uber's move away from PostgreSQL

Last week, a blog post by an Uber engineer explained why Uber chose to move from PostgreSQL to MySQL. This article was widely reported and discussed within the PostgreSQL community, with many users and developers expressing the clear sentiment that Uber had indeed touched on some areas where PostgreSQL has room for improvement. I share that sentiment. I believe that PostgreSQL is a very good database, but I also believe there are plenty of things about it that can be improved. When users - especially well-known names like Uber - explain what did and did not work in their environment, that helps the PostgreSQL community, and the companies which employ many of its active developers, figure out what things are most in need of improvement.  I'm happy to see the PostgreSQL community, of which I am a member, reacting to this in such a thoughtful and considered way.

Having read the blog post a few times now, I keep finding myself wanting just a bit more detail. I've come to the conclusion that one of the big problems is that Uber really wanted logical replication rather than physical replication, but logical replication is not yet available in PostgreSQL core. Uber mentions that they considered pglogical, but it was not available on the server version they were using, which is somewhat older. They do not mention whether they tried any of the out-of-core replication solutions which are available, such as Slony, Bucardo, or Londiste; or, on the proprietary side, EnterpriseDB's xDB Replication Server. It would be great, if they did try any of those, to hear about their experiences. Regardless, I think that virtually the entire PostgreSQL community is in agreement that an in-core logical replication solution is important; I first blogged about this back in 2011, and we made great progress with PostgreSQL 9.4, which introduced logical decoding, but the full solution is still not there. Logical replication would, I believe, address Uber's concerns about the size of the change stream which they would need to send over their wide area network, cross-version replication, propagation of physical corruption, and query cancellation on replicas.

Some of these issues can be addressed in other ways. For example, query cancellation on replicas can be prevented by configuring hot_standby_feedback=on, and I can't tell from the blog post whether Uber's engineers were aware of this feature. Write-ahead log files can be compressed before transmitting them over a wide area network, either by using SSL (which can do compression as part of the protocol) or by using archive_command to apply an arbitrary compression command (e.g. gzip) to each 16MB segment before transmitting it. Again, this may have been tried and been found ineffective or insufficient, but it would be nice to hear more of the details. For major version upgrades, they mention that pg_upgrade took many hours to complete in their environment, which is not supposed to happen. A number of performance problems with pg_upgrade related to the handling of databases containing large number of objects, have been fixed in newer releases, so things might have been better for them on 9.4 or 9.5, but without more details it's hard to know - and it would be good to know, because problems that can be understood can be fixed.

Perhaps the thorniest problem which Uber raises is that of write amplification caused by secondary index updates.  As the Uber post explains, there are some basic differences between the way PostgreSQL organizes data and the way MySQL organizes data, which I've written about before.  PostgreSQL performs some update as "HOT" updates and the rest as "non-HOT". HOT updates touch no indexes, while a non-HOT update must touch every index. Therefore, the percentage of updates which are HOT has a critical impact on update performance, and any update that touches an indexed column is always non-HOT. For this reason, many experienced PostgreSQL users are quite cautious about adding indexes, since it is quite easy to create a great deal of additional index maintenance - both index insertions and later VACUUM activity - if the percentage of HOT updates falls. For most users, being conservative in adding indexes is sufficient to avoid this problem, but Uber isn't the only company to have problems of this type. In InnoDB, or any other system where the primary key index is a clustered index and secondary indexes reference the primary key rather than the physical tuple position, secondary index updates are less frequent than they are in PostgreSQL.  This approach is not without disadvantages - for example, it may be space-inefficient if the primary key is very wide and every secondary index access must also traverse the clustered index - but it's unquestionably got some benefits, especially for tables that are both heavily indexed and heavily updated.

What makes this problem thorny is that the PostgreSQL tuple format is pretty deeply embedded throughout the system, and it presupposes the current model for handling updates. We might be able to do better - HOT, itself, was an optimization as compared with our old strategy of ALWAYS updating every index - but I believe, and have believed for a while, that PostgreSQL also needs to be open to the idea of alternate on-disk formats, including but not limited to index-organized formats. In the past, there has been considerable skepticism within the PostgreSQL community about this kind of development direction, not only because developing a new on-disk format is hard, but also because of fears that the alternate formats would proliferate, dividing the community's development resources and perhaps even leading to multiple forks. While I'd like to hear more about Uber's experience - exactly how much secondary index traffic did they experience and were all of those secondary indexes truly necessary in the first place? - I view this overall as another piece of evidence suggesting that we need to look very hard at making our storage layer pluggable so that it's easier to experiment with new designs in this area. We may be able to get a bit more mileage out of extending the HOT optimization, but I think to really solve the problem is going to require something altogether new.

Even after as many years as I've been working on PostgreSQL, and currently that's about 8, it's always amazing how much more work there is to be done. We've come a very long way in that time, adding innumerable performance features, replication features, and others. But every time I think that we've just about overcome all of the really serious problems, a new generation of problems shows up, desperately needing developer attention. Once more unto the breach!

[After writing this blog post but before publishing it, I became aware that Markus Winand had already written an excellent post on this topic, which is very much worth reading.]


  1. "if the percentage of non-HOT updates falls": I believe you meant the percentage of HOT updates here, not non-HOT.

  2. This comment has been removed by the author.

  3. I'm more interested in the future rather than the past and the future looks a lot brighter for Postgres vs MySQL. Why would Oracle improve it for Amazon to sell it on RDS and take away their customers

    Depending on your workload sometimes a heap, in place updates and rollback segments work best. Sometimes clustered indexed tables with read locks. sometimes an immutable heap. The use case for Uber seemed to make sense but it would not for us where we like our analytic functions, hash joins and future parallel query.

  4. Well said, I personally think that not everybody has to say PostgreSQL is lousy, it's just one case where it doesn't mean the expectation.

  5. https://github.com/posix4e/jsoncdc

  6. And then there's this: http://blog.2ndquadrant.com/thoughts-on-ubers-list-of-postgres-limitations/