Robert Haas: 2014

Monday, December 22, 2014

Parallelism Update

It's been over a year since I last blogged about parallelism, so I think I'm past due for an update, especially because some exciting things are happening.

First, Amit Kapila has published a draft patch for parallel sequential scan. Many things remain to be improved about this patch, which is neither as robust as it needs to be nor as performant as we'd like it to be nor as well-modularized as it really should be. But it exists, and it passes simple tests, and that is a big step forward. Even better, on most of Amit's tests, it shows a very substantial speed-up over a non-parallel sequential scan.

Memory Matters

Database performance and hardware selection are complicated topics, and a great deal has been written on that topic over the years by many very smart people, like Greg Smith, who wrote a whole book about PostgreSQL performance. In many cases, the answers to performance questions require deep understanding of software and hardware characteristics and careful study and planning.

But sometimes the explanation is something very simple, such as "you don't have enough memory".

Linux disables vm.zone_reclaim_mode by default

Last week, Linus Torvalds merged a Linux kernel commit from Mel Gorman disabling vm.zone_reclaim_mode by default. I mentioned that this change might be in the works when I blogged about attending LSF/MM and again when I blogged about how the page cache may not behave quite the way we want even with vm.zone_reclaim_mode disabled.

For those who haven't read previous discussion on this topic, either on my blog, on pgsql-performance, or elsewhere around the Internet, enabling vm.zone_reclaim_mode can cause a lot of problems for applications, such as PostgreSQL, that make use of more page cache than will fit on a single NUMA node. Pages may get evicted from memory in preference to using memory on other nodes, effectively resulting in a page cache that is much smaller than available free memory. See the second of the two blog posts linked above for more details.

PostgreSQL isn't the only application that suffers from non-zero values of this setting, so I think a lot of people will be happy to see this change merged (like the guy who said that this setting is the essence of all evil). It will doubtless take some time for this to make its way into mainstream Linux distributions, but getting the upstream change made is the first step. Thanks to Mel Gorman for pursuing this.

Tuesday, May 13, 2014

Troubleshooting Database Corruption

When your database gets corrupted, one of the most important things to do is figure out why that happened, so that you can try to ensure that it doesn't happen again. After all, there's little point in going to a lot of trouble to restore a corrupt database from backup, or in attempting to repair the damage, if it's just going to get corrupted again. However, there are times when root cause analysis must take a back seat to getting your database back on line.

Why The Clock is Ticking for MongoDB

Last month, ZDNet published an interview with MongoDB CEO Max Schireson which took the position that the document databases, such as MongoDB, are better-suited to today's applications than traditional relational databases; the title of the article implies that the days of relational databases are numbered. But it is not, as Schireson would have us believe, that the relational database community is ignorant of or has not tried the design paradigms which he advocates, but that they have been tried and found, in many cases, to be anti-patterns. Certainly, there are some cases in which the schemaless design pattern that is perhaps MongoDB's most distinctive feature is just the right tool for the job, but it is also misleading to think that such designs must use a document store. Relational databases can also handle such workloads, and their capabilities in this area are improving rapidly.

Subtly Bad Things Linux May Be Doing To PostgreSQL

In addition to talking about PostgreSQL at LSF/MM and Collab, I also learned a few things about the Linux kernel that I had not known before, some of which could have implications for PostgreSQL performance. These are issues which I haven't heard discussed before in the PostgreSQL community, and they are somewhat subtle, so I thought it would be worth writing about them.

Back from LSF/MM and Collab

Last week, I attended the Linux Storage, Filesystems, and Memory Management summit (LSF/MM) on Monday and Tuesday, and the Linux Collaboration Summit (aka Collab) from Wednesday through Friday. Both events were held at the Meritage Resort in Napa, CA. This was by invitation of some Linux developers who wanted to find out more about what PostgreSQL needs from the Linux kernel. Andres Freund and I attended on behalf of the PostgreSQL community; Josh Berkus was present for part of the time as well.

My overall impression is that it was a good week, except that by Thursday the combination of 14 hour days and jet lag were catching up with me in a big way. However, from the point of view of the PostgreSQL project, I think it was very positive. On Monday, Andres and I had an hour-and-a-half slot; we used about an hour and fifteen minutes of that time. Our big complaint was with the Linux kernel's fsync behavior, but we talked about some other issues as well, including double buffering, transparent huge pages, and zone reclaim mode.

PostgreSQL Now Has Logical Decoding

A few weeks ago, Josh Berkus wrote a blog post on Why HStore2/jsonb is the most important patch of 9.4. Everybody's going to have their own opinion on these kinds of questions, but for what it's worth, I think the most important new feature in 9.4 is going to turn out to be logical decoding, the result of a lot of hard work mostly by Andres Freund.

write scalability for UPDATE operations

Yesterday, Heikki Linnakangas committed this patch:

commit a3115f0d9ec1ac93b82156535dc00b10172a4fe7
Author: Heikki Linnakangas
Date:   Wed Mar 12 22:46:04 2014 +0200

    Only WAL-log the modified portion in an UPDATE, if possible.

    When a row is updated, and the new tuple version is put on the same page as
    the old one, only WAL-log the part of the new tuple that's not identical to
    the old. This saves significantly on the amount of WAL that needs to be
    written, in the common case that most fields are not modified.

    Amit Kapila, with a lot of back and forth with me, Robert Haas, and others.

This patch is the result of a lot of work, and a lot of testing, principally by Amit Kapila, but as the commit message says, also by Heikki, myself, and others. So, how much does it help?

Linux's fsync() woes are getting some attention

In two weeks, I'm headed to LSF/MM and the Linux Collaboration Summit, by invitation of some Linux kernel hackers, to discuss how the Linux kernel can better interoperate with PostgreSQL. This is good news for PostgreSQL, and hopefully for Linux as well. A post from Mel Gorman indicates that this topic is attracting a lot of interest, and that MariaDB and MySQL developers have now been invited to participate as well. His summary of the discussion so far quotes some blunt words from one of my posts:

 IMHO, the problem is simpler than that: no single process should
 be allowed to completely screw over every other process on the
 system.  When the checkpointer process starts calling fsync(), the
 system begins writing out the data that needs to be fsync()'d so
 aggressively that service times for I/O requests from other process
 go through the roof.  It's difficult for me to imagine that any
 application on any I/O scheduler is ever happy with that behavior.
 We shouldn't need to sprinkle of fsync() calls with special magic
 juju sauce that says "hey, when you do this, could you try to avoid
 causing the rest of the system to COMPLETELY GRIND TO A HALT?".
 That should be the *default* behavior, if not the *only* behavior.

VACUUM FULL doesn't mean "VACUUM, but better"

There's a persistent belief among some users of PostgreSQL that VACUUM and VACUUM FULL do the same thing, but that VACUUM FULL does it better. If VACUUM is the moral equivalent of running the Dust Buster across the room a few times, VACUUM FULL must be the equivalent of hiring a professional cleaning crew to shampoo the carpets, and maybe repaint the walls as well. Unfortunately, this mental model is not accurate.

Robert Haas

Monday, December 22, 2014

Parallelism Update

Wednesday, August 06, 2014

Memory Matters

Tuesday, June 10, 2014

Linux disables vm.zone_reclaim_mode by default

Tuesday, May 13, 2014

Troubleshooting Database Corruption

Wednesday, April 16, 2014

Why The Clock is Ticking for MongoDB

Wednesday, April 02, 2014

Subtly Bad Things Linux May Be Doing To PostgreSQL

Monday, March 31, 2014

Back from LSF/MM and Collab

Tuesday, March 18, 2014

PostgreSQL Now Has Logical Decoding

Thursday, March 13, 2014

write scalability for UPDATE operations

Monday, March 10, 2014

Linux's fsync() woes are getting some attention

Tuesday, March 04, 2014

VACUUM FULL doesn't mean "VACUUM, but better"