Monday, December 20, 2010

PostgreSQL Performance vs. Microsoft SQL Server

A recent poster to the pgsql-performance mailing list enquired as to the relative performance of Microsoft SQL Server vs. PostgreSQL.  It's a reasonable question.  Switching databases can be a major project, and you certainly wouldn't want to do it and then find out at the end that you'd taken a huge performance hit and had to throw all your work away and switch back.  The good news is that this scenario is fairly unlikely.


The original poster found, in his initial testing, that creating a table with 1000 rows took 9.85 seconds with Microsoft SQL Server, and 0.65 seconds with PostgreSQL.  Then he ran a query, joining two tables, which took 0.76 seconds in Microsoft SQL Server, and 7.5 seconds with PostgreSQL.  What can we infer from this test?


Probably nothing.  In a previous job, I had a PostgreSQL query running in production that joined 34 tables and ran in under a second, and my general experience has been that the PostgreSQL query planner is extremely good.  It is certainly not impossible that on a particular test case some other product's query planner might arrive at a better answer, but 7.5 seconds is a really long time.  In fact, even the 0.76 seconds that Microsoft SQL Server took to execute that query seems pretty slow for a join involving only 1000 rows.  I suspect that both products could be made to give better performance.

And on the flip side, the fact that PostgreSQL loaded 1000 rows into a table more than ten times faster than Microsoft SQL Server is probably also proof of nothing.  It's inconceivable that Microsoft SQL Server takes 10 seconds to load 1000 rows of data.   Whatever you think of Microsoft, there are way too many people using Microsoft SQL Server to develop large and complex applications to believe for a minute that its performance could be that bad.  And here again, the 0.65 seconds PostgreSQL took to load the data doesn't seem so hot either: surely it's possible to do better on this side, too.

Just to be clear, I'm not denying that the poster accurately reported how long those tests took to run.  However, I don't believe that they represent the typical experience of users of either product.  Nor am I denying that there are real performance differences between PostgreSQL and Microsoft SQL Server.  There certainly are.  But those differences are due to architecture, not inefficiency.  PostgreSQL has reached the point where it's worthwhile to think about optimizations that remove single machine instructions from inner loops, and that tells you that it's pretty well optimized.  I haven't seen the Microsoft SQL Server commit logs, but it wouldn't surprise me a bit to find that the situation there is similar.  I'm not sure what accounted for the observed differences in runtime on these tests, but it wasn't just that either product is "faster" than the other one.

A few other comments on the thread seem worth mentioning here:

- Kevin Grittner noted that, in moving from Sybase on Windows to PostgreSQL on Linux, he got a major performance improvement.  Microsoft SQL Server was originally based on Sybase.

- Richard Broesma commented that, in his experience, Microsoft SQL Server was generally comparable, but could be faster when it could leverage parallel query, a capability that PostgreSQL does not have (yet, anyway!).

- Justin Pitts made similar observations, and also commented on the fact that PostgreSQL could be hurt on certain workloads by its lack of index-only scans.


I think the important thing to realize here is that both PostgreSQL and Microsoft SQL Server are complex, robust products with many features and generally good performance.  People can and do use both products to manage large amounts of data in critical production environments.  It would be a serious mistake to believe any categorical statement about which system is faster and by how much.  There are going to be cases - in either system - where things just don't perform well.  It might be the case, in your particular environment, that one of those cases causes such a serious problem that you're forced into the use of the other product.

It's much more likely, though, that you can adjust your configuration or create an index or rewrite the problem query or take some other action to remedy the problem.  Steve Tibbett wrote a blog post a few years ago on this from the SQL server side, entitled Why Is SQL Server So Slow?.  He gives some tips on improving Microsoft SQL Server performance, but his basic point is that blaming the database for bad performance is usually pointing the finger in the wrong direction: the database isn't slow, but neither can it be expected to handle every query perfectly with no work on your part.  The same point applies, with exactly equal force, to PostgreSQL.

3 comments:

  1. SQL Server has a really nice hash join including support for async IO. When will Postgres get async IO to make large external sorts and joins faster? By "async IO" I mean real (use OS facilities) or simulated (use background threads/processes). Either is good enough to reduce the time that the join/sort stalls for IO when the server has many disks and a smaller number of concurrent queries. Parallel query would be even better, but that is much more work.

    ReplyDelete
  2. Over many years of using both, here is what I have found:

    For applications with plenty of hardware and few users, the two are very comparable for performance. As concurrency grows, PostgreSQL seems to handle it better, likely due to its MVCC model using disk in the existing data files. However, if the data file run out of room, it can be slower expanding used storage than SQL server. Keep your free space maps big enough, and your vacuums frequent enough to deal with it, and it is great.

    The other time when it really seems to shine is when starved for hardware. PostgreSQL seems to deal much better with all of the CPU's being at 100% and swap going like mad trying to keep up. Remember also that I have always used PostgreSQL on Linux or BSD, so some of this may be how the operating systems handle different loads.

    ReplyDelete
  3. @Mark Callaghan - I'm not aware of any async I/O work under way at the moment, but I agree it's an interesting idea.

    @Grant Johnson - In PostgreSQL 8.4+, the free space map is sized automatically, so there is no need to tune that parameter.

    ReplyDelete