Monday, November 14, 2011

Linux lseek scalability

I don't normally follow Linux kernel development, but I was pleased to hear (via Andres Freund) that the Linux kernel developers have committed a series of patches by Andi Kleen to reduce locking around the lseek() system call.  As I blogged about back in August, PostgreSQL calls lseek quite frequently (to determine the file length, not to actually move the file pointer), and due to the performance enhancements in 9.2devel, it's now much easier to hit the contention problems that can be caused by frequently acquiring and releasing the inode mutex.  But it looks like this should be fixed in Linux 3.2, which is now at rc1, and therefore on track to be released well before PostgreSQL 9.2.
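As a rough illustration of the length lookup described above, here is a minimal Python sketch; it is not PostgreSQL's actual C code. The helper names `file_length_via_lseek` and `demo_lseek_length` are hypothetical. Seeking zero bytes relative to the end of the file returns the end offset, which is the file's length in bytes.

```python
import os
import tempfile


def file_length_via_lseek(fd: int) -> int:
    """Return the size in bytes of the file behind fd (hypothetical helper)."""
    # Seeking 0 bytes from SEEK_END returns the end offset, i.e. the length.
    # This leaves the descriptor's offset at end of file; the caller here
    # only cares about the return value.
    return os.lseek(fd, 0, os.SEEK_END)


def demo_lseek_length() -> int:
    """Write 8192 bytes to a temporary file and report its length."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 8192)
        f.flush()  # push Python's buffer to the OS before asking for the size
        return file_length_via_lseek(f.fileno())
```

The sketch shows only the call pattern; it says nothing about the kernel-side locking, which depends on the kernel version.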

Meanwhile, we're gearing up for CommitFest #3.  Interesting stuff in this CommitFest includes Álvaro Herrera's work on reducing foreign key lock strength and a PostgreSQL foreign data wrapper (pgsql_fdw) by Hanada Shigeru.  Reviewers are needed for those and many other patches!

4 comments:

maxim said...

I do have one question not related to this post: will PostgreSQL 9.2 be ready before 2012-05-08, which is the official release date of Fedora 17? Just asking. I didn't want to ask on the mailing lists.

Andreas Karlsson said...

@maxim: Probably not. If we look at the length of the development cycles of 9.0 and 9.1, it will be out some time in late summer or early autumn.

Chris Mellish said...

I'm sure the reasoning is sound, but is there any reason why stat, fstat and/or lstat aren't being used to get file length information? Do they carry extra overhead that the seek calls don't?

Robert Haas said...

I did test fstat(). At very high client counts, fstat() was a huge win because it doesn't lock the inode mutex, even in existing Linux releases. However, under ordinary circumstances, it's noticeably slower than lseek, probably because it copies more data from kernel space to user space. So if we went with fstat(), it would really just be a hack to work around a kernel issue that exists only on Linux and that only people with very large machines will notice, at the expense of everyone else.
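
For readers comparing the two calls discussed in this thread, the following Python sketch shows that both report the same length for the same descriptor. It is only an illustration with hypothetical helper names (`file_length_via_fstat`, `demo_fstat_matches_lseek`); it does not measure speed or locking behavior, so it neither confirms nor refutes the performance observations above.

```python
import os
import tempfile


def file_length_via_fstat(fd: int) -> int:
    """Return the file size using fstat (hypothetical helper)."""
    # fstat fills in a whole stat structure; only st_size is used here.
    return os.fstat(fd).st_size


def demo_fstat_matches_lseek() -> tuple:
    """Return (length from fstat, length from lseek) for one temp file."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 8192)
        f.flush()  # make sure the bytes have reached the OS
        fd = f.fileno()
        return file_length_via_fstat(fd), os.lseek(fd, 0, os.SEEK_END)
```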