Thursday, June 24, 2010

PostgreSQL as an In-Memory Only Database

There's been some recent, interesting discussion on the pgsql-performance mailing list on using PostgreSQL as an in-memory only database. In other words, you basically want to use it as a cache, similar to the way that you would use memcached or a NoSQL solution, but with a lot more features.

If you're interested in doing this, you'll need to configure the system so that you have a convenient, automatic way erase the database cluster and reinitialize it (using initdb) after an operating system crash. Per discussion on the mailing list, for best performance, it seems best to set up the data directory on a tmpfs and configure the following parameters in postgresql.conf:


With fsync=off, and most likely also with full_page_writes=off, your database will not be crash-safe - but you don't care, because you're planning to start from scratch after a crash anyway. If you're familiar with postgresql.conf parameters, setting synchronous_commit=off might seem redundant if you've already set fsync=off, but testing reveals that it still boosts performance. Turning off full_page_writes and bgwriter_lru_maxpages eliminates I/O that isn't needed for this use case.

On a related note, Gavin Roy gave a talk at PGCon comparing the performance of PostgreSQL with fsync=off with a number of NoSQL databases. The results were pretty good, but there might even be room for improvement with some additional tuning.

If you end up testing a configuration along these lines, please post a comment here or on the pgsql-performance mailing list with your experiences.

Wednesday, June 23, 2010

Working Toward PostgreSQL 9.0 Beta3

We are gradually creeping toward the release of PostgreSQL 9.0, but there's still a ways to go. We're continuing to get several bug reports per week about problems in 9.0 beta 2 - many thanks to all those who have tested and reported bugs. Here are some of the issues we've resolved in CVS since beta2:

- Fix a problem that could cause checkpoints to happen too infrequently when using streaming replication, with certain combinations of settings (Fujii Masao).
- Fix quoting problems in EXPLAIN (FORMAT YAML) output (Dean Rasheed).
- Fix a problem that could cause a "cache lookup failed for type %d" error when using PL/python (Tom Lane).
- Change pg_last_xlog_receive_location() and pg_last_xlog_replay_location() to return NULL instead of 0/0 when they do not apply (Heikki Linnakangas).
- Rename restartpoint_command to archive_cleanup_command, to clarify what it's for (Itagaki Takahiro).
- Allow replication connections to use a "replication" entry in .pgpass (Fujii Masao).
- Fix the newly added vacuumdb -Z option (Josh Berkus).
- Have pg_upgrade create its output files in a less surprising location (Bruce Momjian).
- Fix ALTER LARGE OBJECT and GRANT ... ON LARGE OBJECT to not break when an OID too large to be represented by a signed integer is used (Robert Haas).
- Avoid entering a tight loop when streaming replication hits a corrupt WAL record (Heikki Linnakangas).
- New contrib module for use as an archive_cleanup_command (Simon Riggs).
- Adjust GUC categories to match the docs (Itagaki Takahiro).
- Deprecate the use of => as an operator name, and remove or rename the new => operators in 9.0, so we can eventually use this for the purpose the SQL standards committee has in mind (Robert Haas).
- Revert apparently-useless code to add symbol table entries for PL/perl functions (Andrew Dunstan).
- Avoid calling malloc(0) in pg_upgrade (Bruce Momjian).
- Don't allow WAL to be streamed to the standby until it has been fsync'd on the master - otherwise, a master crash can effectively corrupt the standby database (Fujii Masao).
- Fix pg_upgrade problems on Win32 (Bruce Momjian).
- Various documentation improvements.

If you haven't yet beta-tested PostgreSQL 9.0, there's no time like the present! Ideally, load up your production data, run your production application against it, and let us know whether anything breaks. Or, pull the plug a few times and see if anything goes wrong; or try to break Streaming Replication or Hot Standby. This is shaping up to be a great release - but it's not done yet.

Sunday, June 13, 2010

Making PostgreSQL Faster

Although we've made great progress in speeding up PostgreSQL over the last few years, there's always more to be done. Performance, with PostgreSQL as with any other database, is largely determined by the availability of three resources: CPU, memory, and disk. What could we do to use each of these resources most efficiently?

PostgreSQL is already pretty efficient at using the CPU. For high-concurrency databases, I don't anticipate that things will get much better than they already are. For low-concurrency databases, we need parallel query - that is, the ability to use more than one CPU to process the same query.

Memory is a little bit more of a problem. We do a good job keeping our memory footprint small, but we don't manage it terribly well. work_mem limits the maximum size of a sort or hash, but takes no account of current conditions: if the system is swapping due to memory pressure, you get the same plan as if the system has 40GB of free memory. And all the memory allocated to shared_buffers remains allocated even when it isn't truly needed.

I/O is perhaps the biggest problem. I don't think this problem is unique to PostgreSQL - I believe all databases probably share this pain point to some degree. Disks are slow. With respect to PostgreSQL specifically, there are a number of things we need to do to minimize our I/O bandwidth, including index-only scans and further improvements to VACUUM. Partial vacuum (implemented in 8.4) is a pretty big deal, but there's more that needs to be done.

We also need to put more effort into minimizing our on-disk format and WAL volume. The actual disk space is cheap, but the time needed to read and write a larger volume of data hurts performance.

Tuesday, June 08, 2010

Why Join Removal Is Cool

When people talk to me about the (limited implementation of) join removal that will be part of PostgreSQL 9.0, the conversation usually goes in two ways. Some people ask how the feature works and then say something like "oh, I guess that could be useful every once in a while". Other people already know exactly how the feature works and usually say some variant of "this is an amazingly wonderful feature that I am looking forward to with great enthusiasm".

The difference between these two groups of people (I think) is not so much their level of technical knowledge or how closely they've been following pgsql-hackers, but their use case. If your database is primarily a data warehouse, my guess is that you won't have many occasions to benefit from join removal. Where this feature really comes in handy is in OLTP workloads with highly normalized data, in situations where users are generating queries against views (perhaps through some sort of reporting interface) and expecting to get results back immediately.

Let's take an example. Suppose you're writing a bug-tracking system. Each bug has a number of properties associated with it: who reported it, who's working on it, current status, priority, date opened, release for which it's slated to be fixed, date of last status change, date resolved, people who want to get an email when it's updated, comments, etc. If like me you're a big fan of database normalization, you'll not want to store all of these as text fields. So you might end up with a table like this:

id serial,
reporter_id integer not null references person (id),
assigned_to_id integer references person (id),
status_id integer not null references bug_status (id),
priority_id integer not null references priority (id),
target_release_id integer references release (id),
open_date date not null,
primary key (id)

You'll probably also end up with some supplementary tables for the items that can exist multiple times, like bug_comment and bug_watchers. Now, to make reporting easier, you'll probably want to define a view over the bug table that joins to all the other tables, so that it's easy to get the text values for the reporter, status, etc.

b.reporter_id, AS reporter,
b.assigned_to_id, AS assigned_to,
b.status_id, AS status,
b.priority_id, AS priority,
b.target_release_id, AS target_release,
bug b
JOIN person r ON b.reporter_id =
JOIN bug_status s ON b.status_id =
JOIN priority p ON b.priority_id =
LEFT JOIN person at ON b.assigned_to_id =
LEFT JOIN release tr ON b.target_release_id =;

And now you can pretty easily write an engine that will let users select the columns they'd like to see from bug_view and the filter conditions they'd like to apply (only open bugs, only bugs slated to be resolved in release X, etc.) via a spiffy web interface. Note that the reporter, bug status, and priority fields can't be null, so we can use a plain old JOIN, but the bug might be assigned to no one or have no target release, so we use LEFT JOIN in those cases. (Otherwise, rows where those fields were NULL would not appear in the output.)

Over time, you'll tend to add more fields. Scalar fields like open_date don't cause much heartache, but as you add more fields that require joins, your view will tend to slow down. Some people might say that the answer is simply to denormalize - use natural keys in the bug table, and don't join. While that solution may be appropriate for some people, it is not without its downsides: database normalization was invented for a reason. The good news is that PostgreSQL is fast and has an excellent query planner, so even fairly complex queries run quite quickly. The bad news is that every query against the view is going to hit every table that's part of the view definition, so if you add enough of them, it's eventually going to be slow.

And, realistically, most of the time, users aren't going to want all the columns anyway. In a web application, 8-12 columns of output in an HTML table is typically about as much as you can squeeze in without starting to have a lot of line wrapping issues. This leads pretty naturally to the following question: if you don't need all of the columns, can you skip some of those joins and speed up the query?

Yes. In PostgreSQL 9.0, we can drop a join against a base table if (1) it's a left join, (2) there is a unique index on all or a subset of the join columns, and (3) none of the attributes from the nullable side of the join are used elsewhere in the query. So, in the above example, we could skip the joins to person at or release tr if the assigned_to or target_release columns, respectively, are not selected, assuming those tables have unique indexes on their id columns (if they don't, the join might change the number of rows in the output, so we must perform it).

We can't skip joining to any of the other tables, because those are inner joins. That's an implementation restriction which I hope will be lifted in PostgreSQL 9.1, but some more logic is needed to make that safe. In the meantime, a useful workaround may be to write those joins as LEFT JOINs rather the INNER JOINs, in cases where either join type will produce the same results.