Wednesday, August 06, 2014

Memory Matters

Database performance and hardware selection are complicated topics, and a great deal has been written about them over the years by many very smart people, such as Greg Smith, who wrote a whole book about PostgreSQL performance.  In many cases, answering performance questions requires a deep understanding of software and hardware characteristics, along with careful study and planning.

But sometimes the explanation is something very simple, such as "you don't have enough memory".

There's a very old joke I must have read for the first time at least ten years ago, which you can still find circulating around the Internet.  It goes approximately like this. Abraham (the biblical patriarch) intends to install Windows XP on his 386 PC.  Isaac points out that the machine doesn't have enough memory to run Windows XP.  Abraham replies: "Do not worry, my son; God will provide the RAM."

I'm not sure how well this joke has aged.  I remember a time when discussion of buying a new computer tended, at least in my family, to involve a lot of discussion of how big the hard disk was and how much RAM it had.  These days, most people I know assume that if they buy a new computer, the memory and storage will be sufficient for their needs.  In many cases, that assumption is valid.  If you're anything like me, you probably know how much flash storage your smartphone has, but not how much actual RAM it has.  And that's OK, because there's probably no reason you should care.

But you should care a lot about how much RAM your database server has.  Over and over, I've seen people running very large databases on systems with very small amounts of RAM.  Because their access is concentrated in a small percentage of the database, they experience very high cache hit ratios (well over 99%), little I/O, and generally good performance, despite the fact that the database is anywhere between 4 and 40 or more times the size of physical memory.

Then something happens.  It could be a pg_dump or a hot backup, which involves reading the whole database.  It could be an anti-wraparound vacuum on a large table.  It could be a bulk load.  It could be an additional task running on the machine.  Whatever it is, it pushes some database pages previously cached by the operating system out of the operating system page cache.  The next access to those pages requires reading them from disk.  The cache hit ratio drops, and system performance drops dramatically.  On Linux, where processes that are waiting for I/O count towards the system load average, an enormous spike in the system load average often results.

Under any circumstances, reading from disk is vastly slower than reading from memory, but reading data from disk sequentially is 10 to 100 times faster than random I/O.  Unfortunately, it's often the case that the task that is evicting data from memory is reading or writing data sequentially, while the underlying database workload is typically accessing some working set of pages in a more-or-less random fashion.  The result is that data is removed from the cache at a vastly higher rate than it can be read back in.  Even after the bulk operation terminates and the cache-purging ceases, it can take a painfully long time - sometimes many hours - for random I/O to bring all of the hot data back into memory.
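To make that asymmetry concrete, here is a rough back-of-envelope sketch.  All of the throughput and IOPS figures are assumptions chosen only to illustrate the order of magnitude, not measurements of any particular system:

```python
# Back-of-envelope comparison of cache eviction vs. rewarm rates.
# All figures below are illustrative assumptions, not measurements.
SEQ_MBPS = 200        # assumed sequential read rate of the bulk job (MB/s)
RANDOM_IOPS = 150     # assumed random IOPS of a single spinning disk
PAGE_KB = 8           # PostgreSQL's default page size

evict_rate = SEQ_MBPS                        # bulk job evicts cached pages this fast
rewarm_rate = RANDOM_IOPS * PAGE_KB / 1024   # random reads refill the cache this fast

hot_data_gb = 32                             # assumed size of the hot working set
rewarm_hours = hot_data_gb * 1024 / rewarm_rate / 3600

print(f"eviction outpaces rewarming by ~{evict_rate / rewarm_rate:.0f}x")
print(f"rewarming {hot_data_gb}GB of hot data takes ~{rewarm_hours:.1f} hours")
```

Even with generous assumptions, sequential eviction outruns random refill by two orders of magnitude, which is why the slowdown lingers long after the bulk operation finishes.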

All of this pain can be avoided by buying enough memory.  It's true that database sizes are growing larger and larger, and more and more people have terabyte- or even petabyte-scale databases that can't ever be cached fully in RAM.  But it's also true, I think, that the number of people who can fit their database in memory today, for a reasonable price, is larger than ever before.  I've seen systems recently that support up to 192GB of memory per CPU at very reasonable prices, and I've run across or been told about systems with 1TB or even 2TB of RAM that are described as commodity hardware.  If your PostgreSQL database is smaller than that, and you don't have enough RAM to keep all of it in memory at all times, why not?

There are a number of good answers to that question.  For example, maybe your database is very write-heavy, so that performance is limited by the rate at which data can be flushed to disk.  In that case, adding RAM might not help very much.  Or, maybe your database executes mostly batch jobs, so that you can control the number of tasks running at one time, and aren't sensitive to query latency.  Or, maybe your workload is heavily CPU-bound, so that even if the operating system cache is wiped out completely, the impact is not too severe.  But if you don't have an answer to that question, you probably ought to just buy more RAM.

Two final points to consider:

1. There is often a dramatic performance difference between having enough RAM and not quite enough RAM.  For example, consider repeatedly reading, in a sequential fashion, 16GB of data.  If you have 15GB of RAM available to cache that data, the operating system will apply an LRU policy, and every read will require a physical I/O.  Your cache hit ratio will be 0%.  But if you have 17GB of RAM, all of that data will stay resident, and your cache hit ratio will be 100%.  Although the analysis is more complicated, essentially the same logic applies to data that you access randomly rather than sequentially.  If the access is relatively uniform, cache hit ratios will surge, and performance will increase dramatically, only when you have enough memory to fit all of the frequently-accessed data.  If adding memory doesn't seem to help, it's possible that you just haven't added enough.
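The sequential case can be illustrated with a toy simulation of a strict-LRU cache, scaled down to abstract page counts (the specific sizes are illustrative, not a model of any real operating system's cache):

```python
from collections import OrderedDict

def hit_ratio(cache_pages, data_pages, passes=3):
    """Hit ratio of repeated sequential scans against a strict-LRU cache."""
    cache = OrderedDict()
    hits = misses = 0
    for _ in range(passes):
        for page in range(data_pages):
            if page in cache:
                hits += 1
                cache.move_to_end(page)        # mark as most recently used
            else:
                misses += 1
                cache[page] = True
                if len(cache) > cache_pages:
                    cache.popitem(last=False)  # evict least recently used
    return hits / (hits + misses)

# Cache slightly smaller than the data: every access is a miss.
print(hit_ratio(cache_pages=15_000, data_pages=16_000))  # 0.0
# Cache slightly larger than the data: only the first pass misses.
print(hit_ratio(cache_pages=17_000, data_pages=16_000))  # ~0.67
```

The cliff is total: with the cache one page too small, each sequential pass evicts exactly the pages the next pass is about to need.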

2. It is often hard to tell how much RAM you really need for good performance, which means you might not realize that you're running low until things really take a turn for the worse.  Disk space utilization grows gradually, so it's easy to monitor.  CPU utilization fluctuates more, but you can take a measurement during a busy period to get an idea.  Memory is different: because of the way operating system and PostgreSQL caching works, it's likely that substantially all of your memory will be in use all the time, but it's very hard to tell how much of it is being used for data that is accessed only occasionally and how much for data that is accessed frequently.  Because RAM is now relatively cheap, it's a good idea to err on the high side.
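One number worth watching is the buffer-cache hit ratio derived from the blks_hit and blks_read counters in PostgreSQL's pg_stat_database view.  A minimal sketch (the sample counter values are invented for illustration):

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Buffer-cache hit ratio from PostgreSQL's pg_stat_database counters.

    The counters come from, e.g.:
        SELECT blks_hit, blks_read FROM pg_stat_database
        WHERE datname = current_database();
    Note this reflects only PostgreSQL's own buffer cache: a "miss" here
    may still be served cheaply from the OS page cache rather than disk.
    """
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0

# Invented sample counters for illustration:
print(f"{cache_hit_ratio(9_950_000, 50_000):.1%}")  # 99.5%
```

A sudden drop in this ratio after a bulk operation is exactly the cache-purging symptom described above, though a high ratio alone can't tell you how close you are to the edge.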


  1. While the buffer cache is very important, the performance penalty for fetching pages from the OS page cache is much smaller than for getting them from disk. Assuming RAM is large enough, setting the buffer cache too small doesn't hurt that much. On the other hand, it's vital to give PostgreSQL enough per-process memory (work_mem, maintenance_work_mem, and other settings) so that temporary results, merges, and hash joins don't spill to disk.

    Recently I worked on a very complex query (about 20 joins, as ugly as a Hibernate-generated query can be) which was very slow - 40 seconds - under the default work_mem, and quite fast - 0.3 seconds - after changing it to 4MB.


  2. Why do you (and many other pg hackers) still care about HDD I/O performance?
    Please don't waste time and energy on ancient, outdated, irrelevant tech; please optimize for 2015, for PCIe NAND, not for the 1960s.
    Any user with a brain, who cares about performance now, would not dream of running a transactional RDBMS on HDD.
    Of course RAM is still beneficial with SSD, so you should also assume that serious users will have reasonable amounts of RAM, at least 32GB.
    It is, I believe, very important that Postgres be developed to take as much advantage of this exciting new paradigm as possible, especially with HP's amazing memristor coming shortly.
    This new paradigm will, I think, result in a renaissance for the relational model, even for many of the dim developers who currently think that a document structure is sensible for a database; they will realize their nonsense, especially when they see that Postgres on SSD can query graph relations as fast as Mongo or even Neo4j - using a closure table - when auto-updating materialized views finally appear!
    Simple options on install would help, such as: dedicated server? virtual or physical? RAM small/medium/large? all SSD? some SSD? PCIe SSD? 10k rust? 15k rust? block size? - with sensible defaults assigned based on the answers.

    1. I applaud you for using your name, but I must take strong exception to your repeated, derogatory, and offensive characterizations of "pg hackers", "users with a brain", etc. Whatever the technical merits of your comments, your highly negative and derogatory tone is a real turn-off.
      We don't all operate in an environment that we can control. Any number of factors such as budget, legacy support, licensing, development costs, regulatory compliance, industry standards, and a dozen others may limit our options.
      I now work with DBs where 32GB of RAM is not sufficient for a dev server, let alone production. That does not mean we don't have applications for much smaller servers.

  3. We solved this problem with NVRAM-based storage. Indeed, the random-read performance of SSDs and PCIe solutions can drastically reduce memory pressure in the long term. But even NVRAM and tons of RAM don't solve every issue. I believe it was you who contributed pg_prewarm to address those times when cold memory is just as bad as insufficient memory. I'm surprised you didn't talk about it here at all.

    1. pg_prewarm is definitely useful for warming the cache, but if your cache isn't big enough to hold all of your data, or if you don't know precisely which data you want to prewarm, it won't help. Your point about SSDs is a good one, but most systems that I've encountered are still using HDDs.