Last week, I attended the Linux Storage, Filesystems, and Memory Management summit (LSF/MM) on Monday and Tuesday, and the Linux Collaboration Summit (aka Collab) from Wednesday through Friday. Both events were held at the Meritage Resort in Napa, CA. This was by invitation of some Linux developers who wanted to find out more about what PostgreSQL needs from the Linux kernel. Andres Freund and I attended on behalf of the PostgreSQL community; Josh Berkus was present for part of the time as well.
My overall impression is that it was a good week, except that by Thursday the combination of 14 hour days and jet lag were catching up with me in a big way. However, from the point of view of the PostgreSQL project, I think it was very positive. On Monday, Andres and I had an hour-and-a-half slot; we used about an hour and fifteen minutes of that time. Our big complaint was with the Linux kernel's fsync behavior, but we talked about some other issues as well, including double buffering, transparent huge pages, and zone reclaim mode.
Monday, March 31, 2014
Tuesday, March 18, 2014
PostgreSQL Now Has Logical Decoding
A few weeks ago, Josh Berkus wrote a blog post on Why HStore2/jsonb is the most important patch of 9.4. Everybody's going to have their own opinion on these kinds of questions, but for what it's worth, I think the most important new feature in 9.4 is going to turn out to be logical decoding, the result of a lot of hard work mostly by Andres Freund.
Thursday, March 13, 2014
write scalability for UPDATE operations
Yesterday, Heikki Linnakangas committed this patch:
commit a3115f0d9ec1ac93b82156535dc00b10172a4fe7
Author: Heikki Linnakangas
Date: Wed Mar 12 22:46:04 2014 +0200
Only WAL-log the modified portion in an UPDATE, if possible.
When a row is updated, and the new tuple version is put on the same page as
the old one, only WAL-log the part of the new tuple that's not identical to
the old. This saves significantly on the amount of WAL that needs to be
written, in the common case that most fields are not modified.
Amit Kapila, with a lot of back and forth with me, Robert Haas, and others.
This patch is the result of a lot of work, and a lot of testing, principally by Amit Kapila, but as the commit message says, also by Heikki, myself, and others. So, how much does it help?
commit a3115f0d9ec1ac93b82156535dc00b10172a4fe7
Author: Heikki Linnakangas
Date: Wed Mar 12 22:46:04 2014 +0200
Only WAL-log the modified portion in an UPDATE, if possible.
When a row is updated, and the new tuple version is put on the same page as
the old one, only WAL-log the part of the new tuple that's not identical to
the old. This saves significantly on the amount of WAL that needs to be
written, in the common case that most fields are not modified.
Amit Kapila, with a lot of back and forth with me, Robert Haas, and others.
This patch is the result of a lot of work, and a lot of testing, principally by Amit Kapila, but as the commit message says, also by Heikki, myself, and others. So, how much does it help?
Monday, March 10, 2014
Linux's fsync() woes are getting some attention
In two weeks, I'm headed to LSF/MM and the Linux Collaboration Summit, by invitation of some Linux kernel hackers, to discuss how the Linux kernel can better interoperate with PostgreSQL. This is good news for PostgreSQL, and hopefully for Linux as well. A post from Mel Gorman indicates that this topic is attracting a lot of interest, and that MariaDB and MySQL developers have now been invited to participate as well. His summary of the discussion so far quotes some blunt words from one of my posts:
IMHO, the problem is simpler than that: no single process should be allowed to completely screw over every other process on the system. When the checkpointer process starts calling fsync(), the system begins writing out the data that needs to be fsync()'d so aggressively that service times for I/O requests from other process go through the roof. It's difficult for me to imagine that any application on any I/O scheduler is ever happy with that behavior. We shouldn't need to sprinkle of fsync() calls with special magic juju sauce that says "hey, when you do this, could you try to avoid causing the rest of the system to COMPLETELY GRIND TO A HALT?". That should be the *default* behavior, if not the *only* behavior.
Tuesday, March 04, 2014
VACUUM FULL doesn't mean "VACUUM, but better"
There's a persistent belief among some users of PostgreSQL that VACUUM and VACUUM FULL do the same thing, but that VACUUM FULL does it better. If VACUUM is the moral equivalent of running the Dust Buster across the room a few times, VACUUM FULL must be the equivalent of hiring a professional cleaning crew to shampoo the carpets, and maybe repaint the walls as well. Unfortunately, this mental model is not accurate.