Monday, November 14, 2011

Linux lseek scalability

I don't normally follow Linux kernel development, but I was pleased to hear (via Andres Freund) that the Linux kernel developers have committed a series of patches by Andi Kleen to reduce locking around the lseek() system call.  As I blogged about back in August, PostgreSQL calls lseek quite frequently (to determine the file length, not to actually move the file pointer), and due to the performance enhancements in 9.2devel, it's now much easier to hit the contention problems that can be caused by frequently acquiring and releasing the inode mutex.  But it looks like this should be fixed in Linux 3.2, which is now at rc1, and therefore on track to be released well before PostgreSQL 9.2.
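As a rough illustration of the technique mentioned above, here is a minimal sketch of using lseek to find a file's length. It is written in Python rather than PostgreSQL's C, and the helper name file_length_via_lseek and the 8192-byte test file are assumptions made up for this example, not anything taken from the PostgreSQL source. Seeking to offset 0 relative to the end of the file returns the resulting offset, which equals the file size; the file position does move as a side effect, but a caller that only wants the length never relies on it.

```python
import os
import tempfile


def file_length_via_lseek(fd: int) -> int:
    """Return the length in bytes of the file open on descriptor fd.

    Hypothetical helper for illustration only. lseek() returns the new
    offset, and an offset of 0 from SEEK_END is the end of the file.
    """
    return os.lseek(fd, 0, os.SEEK_END)


if __name__ == "__main__":
    # Create a scratch file of a known size (8192 bytes, chosen arbitrarily).
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 8192)
        print(file_length_via_lseek(fd))  # prints 8192
    finally:
        os.close(fd)
        os.remove(path)
```

Every such call is a system call, and on kernels that take the inode mutex inside lseek, many processes asking for the length of the same file at once will contend on that mutex.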

Meanwhile, we're gearing up for CommitFest #3.  Interesting stuff in this CommitFest includes Álvaro Herrera's work on reducing foreign key lock strength and a PostgreSQL foreign data wrapper (pgsql_fdw) by Hanada Shigeru.  Reviewers are needed for those and many other patches!

7 comments:

  1. I do have one question not related to this post: Will PostgreSQL 9.2 be ready before 2012-05-08, which is the official release date of Fedora 17? Just asking. I didn't want to ask on the mailing lists.

  2. There's another very interesting change in Linux kernel 3.2 that affects PostgreSQL. The I/O-less dirty throttling patches went in: https://lwn.net/Articles/456904/

    This complements the dynamic writeback throttling patches that were merged in 3.1: https://lwn.net/Articles/405076/

    As far as I can tell, these together should fix the thrashing behavior when the kernel dirty memory limit is exceeded, which is experienced on write-bound servers, particularly during checkpoints.

  3. @maxmim: Probably not. If we look at the length of the development cycles of 9.0 and 9.1, it will be out sometime in late summer or early autumn.

  4. I'm sure the reasoning is sound, but is there any reason why stat, fstat and/or lstat aren't being used to get file length information? Do they carry extra overhead that the seek calls don't?

  5. I did test fstat(). At very high client counts, fstat() was a huge win because it doesn't lock the inode mutex, even in existing Linux releases. However, under ordinary circumstances, it's noticeably slower than lseek, probably because it copies more data from kernel space to user space. So if we went with fstat(), it would really be just a hack to work around a kernel issue that exists only on Linux and that only people with very large machines will notice, at the expense of everyone else.

  6. Robert

    Any updates on your testing this on FreeBSD?

  7. Why don't you set a flag in shared memory when a file is extended that the other backends can check to see if they need to call lseek? Firing an "extra" system call on every query is lunacy.

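For readers following the lseek-versus-fstat exchange in comments 4 and 5, the two ways of asking for a file's length can be sketched side by side. This is only an illustration in Python, not PostgreSQL code; the function names length_via_lseek and length_via_fstat and the 4096-byte scratch file are invented for the example. It shows that both calls report the same size, and it makes no claim about which is faster on any particular kernel.

```python
import os
import tempfile


def length_via_lseek(fd: int) -> int:
    """File length obtained by seeking to the end (hypothetical helper)."""
    return os.lseek(fd, 0, os.SEEK_END)


def length_via_fstat(fd: int) -> int:
    """File length read from the st_size field of fstat() (hypothetical helper).

    fstat() fills in a whole stat structure (mode, owner, timestamps, ...),
    of which only the size is used here.
    """
    return os.fstat(fd).st_size


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 4096)  # arbitrary size for the demonstration
        # Both approaches should agree on the length of a regular file.
        print(length_via_lseek(fd) == length_via_fstat(fd))  # prints True
    finally:
        os.close(fd)
        os.remove(path)
```

Anyone curious about the relative cost on their own system could wrap each helper in a timing loop, bearing in mind that single-process timings say nothing about the many-client contention described in the post.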