Last week, a blog post by an Uber engineer explained why Uber chose to move from PostgreSQL to MySQL. This article was widely reported and discussed within the PostgreSQL community, with many users and
developers expressing the clear sentiment that Uber had indeed touched
on some areas where PostgreSQL has room for improvement. I share that
sentiment. I believe that PostgreSQL is a very good database, but I
also believe there are plenty of things about it that can be improved.
When users - especially well-known names like Uber - explain what did
and did not work in their environment, that helps the PostgreSQL
community, and the companies which employ many of its active
developers, figure out what things are most in need of improvement. I'm happy to see the PostgreSQL community, of which I am a member, reacting to this in such a thoughtful and considered way.
Having read the blog post a few times now, I keep finding myself
wanting just a bit more detail. I've come to the conclusion that one
of the big problems is that Uber really wanted logical replication
rather than physical replication, but logical replication is not yet
available in PostgreSQL core. Uber mentions that they considered
pglogical, but it was not available on the server version they were
using, which is somewhat older. They do not mention whether they
tried any of the out-of-core replication solutions which are
available, such as Slony, Bucardo, or Londiste; or, on the proprietary side, EnterpriseDB's xDB Replication Server. If they did try any of those, it would be great to hear about their experiences.
Regardless, I think that virtually the entire PostgreSQL community is
in agreement that an in-core logical replication solution is
important; I first blogged about this back in 2011, and we made great
progress with PostgreSQL 9.4, which introduced logical decoding, but
the full solution is still not there. Logical replication would, I
believe, address Uber's concerns about the size of the change stream
which they would need to send over their wide area network,
cross-version replication, propagation of physical corruption, and
query cancellation on replicas.
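As a rough illustration of the logical decoding infrastructure that 9.4 introduced, one can create a logical replication slot with the test_decoding demo plugin and read the change stream as row-level changes; the slot name and table below are invented for the example:

```sql
-- Create a logical replication slot using the test_decoding demo plugin
-- (requires wal_level = logical and an available replication slot).
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Make a change on a hypothetical table...
UPDATE trips SET status = 'completed' WHERE id = 42;

-- ...then consume the decoded change stream: only the logical, row-level
-- change travels, not the full physical WAL with its page images.
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

-- Clean up the slot when done.
SELECT pg_drop_replication_slot('demo_slot');
```

A full in-core solution would build replication apply on top of this decoding layer, which is exactly the piece that is still missing.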
Some of these issues can be addressed in other ways. For example,
query cancellation on replicas can be prevented by configuring
hot_standby_feedback=on, and I can't tell from the blog post whether
Uber's engineers were aware of this feature. Write-ahead log files
can be compressed before transmitting them over a wide area network,
either by using SSL (which can do compression as part of the protocol)
or by using archive_command to apply an arbitrary compression command
(e.g. gzip) to each 16MB segment before transmitting it. Again, this
may have been tried and been found ineffective or insufficient, but it
would be nice to hear more of the details. For major version
upgrades, they mention that pg_upgrade took many hours to complete in
their environment, which is not supposed to happen. A number of
performance problems with pg_upgrade, related to the handling of
databases containing a large number of objects, have been fixed in newer
releases, so things might have been better for them on 9.4 or 9.5, but
without more details it's hard to know - and it would be good to know,
because problems that can be understood can be fixed.
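For what it's worth, the two mitigations mentioned above boil down to a couple of configuration settings. A minimal sketch (the archive path and choice of gzip are illustrative, not a recommendation):

```
# On the standby: tell the primary which tuples replica queries still
# need, so VACUUM does not remove them and force query cancellations.
hot_standby_feedback = on

# On the primary: compress each completed 16MB WAL segment before it
# is shipped to the archive or across the wide area network.
archive_mode = on
archive_command = 'gzip < %p > /wal_archive/%f.gz'
```

Note that hot_standby_feedback trades replica query cancellations for additional bloat on the primary, so it is a mitigation with a cost, not a free fix.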
Perhaps the thorniest problem which Uber raises is that of write
amplification caused by secondary index updates. As the Uber post explains, there are some basic differences between the way PostgreSQL organizes data and the way MySQL organizes data, which I've written about before. PostgreSQL performs some updates as "HOT" updates and the rest as "non-HOT" updates. HOT updates
touch no indexes, while a non-HOT update must touch every index.
Therefore, the percentage of updates which are HOT has a critical
impact on update performance, and any update that touches an indexed
column is always non-HOT. For this reason, many experienced
PostgreSQL users are quite cautious about adding indexes, since it is
quite easy to create a great deal of additional index maintenance -
both index insertions and later VACUUM activity - if the percentage of HOT updates falls. For most users, being conservative in adding
indexes is sufficient to avoid this problem, but Uber isn't the only
company to have problems of this type. In InnoDB, or any other system
where the primary key index is a clustered index and secondary indexes
reference the primary key rather than the physical tuple position,
secondary index updates are less frequent than they are in PostgreSQL. This approach is not without disadvantages - for example, it may be
space-inefficient if the primary key is very wide and every secondary
index access must also traverse the clustered index - but it's
unquestionably got some benefits, especially for tables that are both
heavily indexed and heavily updated.
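To make the distinction concrete, here is a small sketch (the table and column names are invented) showing which updates can qualify as HOT, and how one can observe the HOT ratio for a table:

```sql
CREATE TABLE rides (
    id        bigint PRIMARY KEY,
    driver_id bigint,
    note      text
);
CREATE INDEX rides_driver_id_idx ON rides (driver_id);

-- Updating only an unindexed column can be a HOT update: no index
-- entries are written, provided the new tuple fits on the same heap page.
UPDATE rides SET note = 'rerouted' WHERE id = 1;

-- Updating an indexed column is always non-HOT: every index on the
-- table, not just rides_driver_id_idx, gets an entry for the new tuple.
UPDATE rides SET driver_id = 7 WHERE id = 1;

-- Compare total updates against HOT updates to see how a workload behaves.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'rides';
```

A workload whose n_tup_hot_upd stays close to n_tup_upd is largely avoiding the write amplification Uber describes; a large gap between the two is the warning sign.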
What makes this problem thorny is that the PostgreSQL tuple format is
pretty deeply embedded throughout the system, and it presupposes the
current model for handling updates. We might be able to do better -
HOT, itself, was an optimization as compared with our old strategy of
ALWAYS updating every index - but I believe, and have believed for a
while, that PostgreSQL also needs to be open to the idea of alternate
on-disk formats, including but not limited to index-organized formats.
In the past, there has been considerable skepticism within the
PostgreSQL community about this kind of development direction, not
only because developing a new on-disk format is hard, but also because
of fears that the alternate formats would proliferate, dividing the
community's development resources and perhaps even leading to multiple
forks. While I'd like to hear more about Uber's experience - exactly
how much secondary index traffic did they experience and were all of
those secondary indexes truly necessary in the first place? - I view
this overall as another piece of evidence suggesting that we need to
look very hard at making our storage layer pluggable so that it's
easier to experiment with new designs in this area. We may be able to
get a bit more mileage out of extending the HOT optimization, but I
think to really solve the problem is going to require something
altogether new.
Even after the years I've spent working on PostgreSQL - about eight, at
this point - it's always amazing how much more work there
is to be done. We've come a very long way in that time, adding
innumerable performance features, replication features, and others.
But every time I think that we've just about overcome all of the
really serious problems, a new generation of problems shows up,
desperately needing developer attention. Once more unto the breach!
[After writing this blog post but before publishing it, I became aware that Markus Winand had already written an excellent post on this topic, which is very much worth reading.]
"if the percentage of non-HOT updates falls": I believe you meant the percentage of HOT updates here, not non-HOT.
ReplyDeleteYes, you are right.
DeleteI have corrected the text. Thanks.
DeleteThis comment has been removed by the author.
ReplyDeleteI'm more interested in the future rather than the past and the future looks a lot brighter for Postgres vs MySQL. Why would Oracle improve it for Amazon to sell it on RDS and take away their customers
ReplyDeleteDepending on your workload sometimes a heap, in place updates and rollback segments work best. Sometimes clustered indexed tables with read locks. sometimes an immutable heap. The use case for Uber seemed to make sense but it would not for us where we like our analytic functions, hash joins and future parallel query.
Well said, I personally think that not everybody has to say PostgreSQL is lousy, it's just one case where it doesn't mean the expectation.
ReplyDeletehttps://github.com/posix4e/jsoncdc
ReplyDeleteAnd then there's this: http://blog.2ndquadrant.com/thoughts-on-ubers-list-of-postgres-limitations/
ReplyDeleteon sept 9 there was this audio explaining more from him
ReplyDeletehttp://softwareengineeringdaily.com/2016/09/09/ubers-postgres-problems-with-evan-klitzke/?utm_source=postgresweekly&utm_medium=email