Thursday, March 10, 2016

No More Full-Table Vacuums

I just committed a very important patch to PostgreSQL.  The short summary for the patch is "Don't vacuum all-frozen pages." and it follows up on a patch I committed last week, whose short summary was "Change the format of the VM fork to add a second bit per page."  This led Andres Freund to respond with a one word email: "Yeha!"

That's a pretty understandable reaction.  Current releases of PostgreSQL need to read every page in the database at least once every 2 billion write transactions (less, with default settings) to verify that there are no old transaction IDs on that page which require "freezing".  For small databases, this is unnoticeable, but for large databases, especially large databases where the data is concentrated in a single giant table, the impact can be quite severe.  All of a sudden, when the number of transaction IDs that have been consumed crosses some threshold, autovacuum begins processing one or more tables, reading every page.  This consumes much more I/O bandwidth, and exerts much more cache pressure on the system, than a standard vacuum, which reads only recently-modified page.

Well, this commit fixes it.  Instead of whole-table vacuums, we now have aggressive vacuums, which will read every page in the table that isn't already known to be entirely frozen.  If you have a large database which is mostly static but which has a comparatively small active portion, this change will massively reduce the impact of wraparound vacuuming.  Users with multi-terabyte databases having non-trivial write activity will be especially happy about this change.  An aggressive vacuum still figures to read more data than a regular vacuum, possibly a lot more.  But at least it won't read the data that hasn't been touched since the last aggressive vacuum, and that's a big improvement.

I'd like to thank Masahiko Sawada for carrying this work forward - he has put an enormous amount of work into this, and deserves most of the credit for the fact that this feature will (barring discovery of any critical flaws) appear in PostgreSQL 9.6.

7 comments:

  1. Great news, thanks for the work. Looking forward to forgeting about wraparound.

    ReplyDelete
  2. > wraparound

    64bit per xid -- too many bytes

    ReplyDelete
  3. Great work Robert!

    ReplyDelete
  4. The target audience for this enhancement is...ME! Amazing, timely work, you continue to hit homeruns. I am in the process of designing/building/loading a large (20TB) genomics database, currently in v9.5 and one of my biggest concerns was the vacuum. I'm currently building the ETL to bring the data over from dozens of smaller MySQL databases.

    Our data is the exact model you described - huge areas of static data with smaller areas of very hot activity (as we write new genomic data or update a small percentage of existing rows). Thank you! I look forward to trying this out as soon as available...the timing looks to be very good.

    ReplyDelete
  5. The sheer fucking hubris of this post

    ReplyDelete
  6. Great work, thanks for contributing

    ReplyDelete
  7. This will help us tremendously. Those VACUUM (to prevent wraparound) processes are a big problem for us, not so much for any performance impact, but they do constantly get in the way of automated software upgrades that need to do DDL.

    ReplyDelete