Robert Haas: March 2012

Monday, March 26, 2012

Why Is My Database Corrupted?

Working in the support department (as I do) is a good way to hear lots of horror stories. From time to time, we get reports of what can only be described as database corruption: errors reading tables (or even trying to connect to the database) due to missing files, corrupted data blocks or tuples, or rows that show up in the table but not its indexes. Attempting to decode a corrupted tuple can cause PostgreSQL to attempt to allocate a gigantic amount of memory (and fail with an ERROR) or even to die with a segmentation fault, aborting all in-progress transactions and restarting the server. Such occurrences are painful, to say the least, and users are often left wondering how it happens. How exactly does the database get corrupted?

Tuning shared_buffers and wal_buffers

I spend a lot of time answering questions about PostgreSQL, and one of the questions I get asked frequently is: how should I set shared_buffers? And, a bit less often, how should I set wal_buffers? I've got canned answers that I can rattle off so fast it'll make your head spin. Exceptions to my canned answers keep popping up, and it's starting to get hard to give an answer that actually captures all the complexity in this area, so here's a longer explanation.

Security Barrier Views

People sometimes want to use PostgreSQL to implement row-level security, and historically it has not been very easy to do that securely. You can try to do it by creating a view which exposes just some of the rows in the underlying table, and grant access to the view but not the underlying table to the user to whom you wish to expose the data; but this turns out not to be secure. Consider the following example.

First Results for Write Performance on IBM POWER7

In a previous blog post, I posted some SELECT-only pgbench results on IBM POWER7, and promised to post read-write results when I had them. That took a little longer than expected due to periodic lock-ups on the machine, which seem to have been resolved by a kernel update (thanks to Brent Baude at IBM for some timely help with this issue). But now I have some.

The Perils of Collation-Aware Comparisons

If you use psql's \l command to list all the databases in your PostgreSQL cluster, you'll notice a column in the output labelled "Collate"; on my system, that column has the value "en_US.UTF-8". This means that when, for example, you sort strings, you'll use the "English" sort order rather than the traditional "C" sort order. You might ask: what's the difference?

In the "C" sort order, all capital letters come before all lower-case letters, whereas in en_US.UTF8, a comes before A which comes before b which comes before B, and so on. In other words, every collation can have its own rules for sorting strings, consistent with the way that the people who speak that language like to alphabetize things; or at least with the way that the people who wrote the locale definitions for that language think that they like to alphabetize things. Collations are OS-dependent: some operating systems don't support them at all, while Windows has a completely different naming convention from every other operating system, and probably different behaviors as well.

The Git Workflow

When the PostgreSQL project decided to migrate to git, we decided not to allow merge commits. A number of people made comments, in a number of different fora, to the effect that we weren't following "the git workflow". While a few commentators seemed to think that was reasonable, many thought that it demonstrated our ignorance and stupidity, and some felt it outright heresy.

So I noted with some interest Julio Hamano's blog post about the forthcoming release of git 1.7.10, which is slated to include a change to the way that merge commits work: users will now be prompted to edit the commit message, rather than just accepting a default one. Actually, what I found most interesting where Linus Torvalds' comments on this change, particularly where he says this: "This change hopefully makes people write merge messages to explain their merges, and maybe even decide not to merge at all when it's not necessary." His comments are quoted more fully in the above-linked blog article; unfortunately I don't know how to link directly to his Google+ post. And Julio Hamano makes this remark: "Merging updated upstream into your work-in-progress topic without having a good reason is generally considered a bad practice. [...] Otherwise, your topic branch will stop being about any particular topic but just a garbage heap that absorbs commits from many sources, both from you to work on a specific goal, and also from the upstream that contains work by others made for random other unfocused purposes."

Performance and Scalability on IBM POWER7

I recently had a chance to run some benchmarks of PostgreSQL 9.2devel on an IBM POWER7 machine provided by IBM and hosted by Oregon State University's Open Source Lab. I'd like to run more tests, but for a first pass I decided to run a SELECT-only pgbench test at various client counts and scale factors. This basically measures how quickly we can do primary key lookups in a completely read-only environment. This generated a pretty graph. Here it is.

Robert Haas

Monday, March 26, 2012

Why Is My Database Corrupted?

Thursday, March 15, 2012

Tuning shared_buffers and wal_buffers

Wednesday, March 14, 2012

Security Barrier Views

Monday, March 12, 2012

First Results for Write Performance on IBM POWER7

Monday, March 05, 2012

The Perils of Collation-Aware Comparisons

Friday, March 02, 2012

The Git Workflow

Thursday, March 01, 2012

Performance and Scalability on IBM POWER7