Tuesday, June 10, 2014

Linux disables vm.zone_reclaim_mode by default

Last week, Linus Torvalds merged a Linux kernel commit from Mel Gorman disabling vm.zone_reclaim_mode by default.   I mentioned that this change might be in the works when I blogged about attending LSF/MM and again when I blogged about how the page cache may not behave quite the way we want even with vm.zone_reclaim_mode disabled.

For those who haven't read previous discussion on this topic, either on my blog, on pgsql-performance, or elsewhere around the Internet, enabling vm.zone_reclaim_mode can cause a lot of problems for applications, such as PostgreSQL, that make use of more page cache than will fit on a single NUMA node.  Pages may get evicted from memory in preference to using memory on other nodes, effectively resulting in a page cache that is much smaller than available free memory.  See the second of the two blog posts linked above for more details.

PostgreSQL isn't the only application that suffers from non-zero values of this setting, so I think a lot of people will be happy to see this change merged (like the guy who said that this setting is the essence of all evil).  It will doubtless take some time for this to make its way into mainstream Linux distributions, but getting the upstream change made is the first step.  Thanks to Mel Gorman for pursuing this.