Josh Berkus' recent blog posting on What We Should Be Learning from MySQL, part 2 includes the following quote: "We tend to treat all data in Postgres as if it were made of gold, and not all data is equally valuable." He goes on to wonder what we can do to better provide for the case where your data isn't made of gold.
As it happens, I have an answer to that question - one answer, anyway. I just recently submitted a patch implementing unlogged tables, which I hope will become part of PostgreSQL 9.1. Unlogged tables are tables for which data changes are not written to the write-ahead log, meaning that they are not replicated and cannot survive a crash; instead, an unlogged table is automatically truncated at database startup. They therefore are appropriate for non-critical data which you can afford to lose in the event of unexpected downtime, such as user session information. What you get in exchange for giving up durability is better performance (disclaimer: not all workloads will show as much benefit as Andy Colson found here).
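With the patch applied, creating such a table is a one-word change to ordinary DDL. (The syntax below is taken from the submitted patch and could change before 9.1 is released; the table and column names are just an illustration.)

```sql
-- Session data we can afford to lose in a crash: changes to this table
-- generate no WAL, and the table is truncated at database startup.
CREATE UNLOGGED TABLE user_sessions (
    session_id  text PRIMARY KEY,
    user_id     integer NOT NULL,
    last_seen   timestamptz NOT NULL DEFAULT now(),
    payload     text
);
```

Everything else about the table - indexes, constraints, queries - works as usual; only the durability guarantee changes.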
Of course, in existing versions of PostgreSQL, it is already possible to configure your database for fast, unreliable operation, as I've blogged about in the past. Run at top speed and, if the database crashes, just blow it away and make a new one. Unlogged tables, however, are in many ways even better suited to this use case. First, they should be even faster than running with fsync turned off. Instead of writing WAL but not making sure it hits the disk, we simply don't write WAL at all. Second, you can mix more and less important data in the same database and get the durability guarantee that you want for each type. As far as I know, no other database product provides such a feature. And third, even if you aren't mixing unlogged tables with ordinary tables, it's nicer to have your tables get truncated (and you can just start loading data into them again) than to have your whole database be corrupted and need to be blown away and recreated from scratch (postgresql.conf, user provisioning, table schemas, etc.).
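For comparison, the whole-cluster version of "run at top speed" looks roughly like this in postgresql.conf. (These settings all exist in released versions; the particular combination shown is a sketch of the fast-but-unsafe setup, not a recommendation.)

```ini
# postgresql.conf -- run fast, and accept that a crash may leave the
# entire cluster corrupt and in need of being recreated from scratch
fsync = off                # don't force WAL (or data) to stable storage
full_page_writes = off     # skip full-page images in WAL
synchronous_commit = off   # don't wait for commit records at COMMIT time
```

The point of unlogged tables is that you no longer have to make this trade-off for the whole cluster at once.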
If unlogged tables aren't right for your use case, another important knob to consider - which already exists in released versions - is synchronous_commit. When synchronous_commit is disabled, which can be done either globally or for individual transactions, the write-ahead log entry for the transaction's commit record is written to disk asynchronously. If your system is properly configured, and synchronous_commit=on, then when you get that COMMIT message, your transaction is absolutely guaranteed to be durably committed. With synchronous_commit=off, there is a very short window of time after you've received the COMMIT during which your transaction could still be lost in the event of a system crash. Typically, synchronous_commit=off provides a significant performance boost - if it doesn't, chances are your system has a WAL reliability problem. For many applications, especially web applications, a short window during which a committed transaction can be lost is an acceptable price to pay for a large performance increase.
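Because synchronous_commit can be set per transaction, durable and non-durable commits can coexist in the same database. (The page_views table here is hypothetical; SET LOCAL is the existing mechanism for scoping a setting to one transaction.)

```sql
-- Durable commits remain the default; relax the guarantee only where a
-- brief post-COMMIT loss window is acceptable, e.g. recording a page view.
BEGIN;
SET LOCAL synchronous_commit = off;   -- applies to this transaction only
INSERT INTO page_views (user_id, url, viewed_at)
    VALUES (42, '/pricing', now());
COMMIT;   -- returns without waiting for the WAL flush to complete
```

Note that, unlike fsync=off, this never risks corruption: a crash can cost you the last few asynchronously-committed transactions, but the database itself remains consistent.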
I think that, in the future, we may be able to provide more options to allow people to relax the data integrity guarantees that PostgreSQL provides in controlled ways. For example, I can imagine a "dirty read" table, where transactions are not used; instead, rows become visible as soon as they're inserted, and disappear as soon as they're deleted. Such a table would be unsuitable for many business applications, but if your application only does single-row operations indexed by primary key, it might work just fine; and it would open up a number of interesting optimization opportunities that aren't available for ordinary tables. Or, you might have a "no snapshot" table, where rows don't become visible until the inserting transaction commits, but we make no attempt to guarantee serializability: rows pop into existence the instant they're committed, and disappear out from under you if a deleting transaction commits.
Of course, it also bears mentioning that there are many cases where treating your data as if it were made of gold is a good thing. Unlogged tables provide an option to weaken ACID semantics to improve performance; there is simultaneous work going on to allow trade-offs in the other direction: less performance for even better ACID semantics. There is also general performance and reliability work going on where the community is simply making the product better, without giving up anything. Ultimately, the goal is to develop a product that can be useful to many different people with many different use cases. Ideas for further improvement are always welcome.