Robert Haas: My Patches Are Breeding

One of the great things about being a long-term contributor to an open source project like PostgreSQL is that you get to see other people take the stuff you've done and use it as a stepping stone to bigger and better things. One of my early PostgreSQL hacking projects was a patch to extend the syntax of EXPLAIN. Prior to 9.0, the grammar looked like this:

EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

Now, there's nothing particularly wrong with that grammar from a usability perspective, but it turns out to be pretty terrible for extensibility. Let's suppose we want to add a new EXPLAIN option that does something new and different - say, omit the costing information from the output. Then we have to change the grammar to something like this:

EXPLAIN [ ANALYZE ] [ VERBOSE ] [ NOCOSTS ] statement

There are a couple of problems with this. One is that, as the number of options increases, it gets hard to remember the order in which they must be specified. You might think that it would be easy enough to recode the grammar to look like this:

EXPLAIN [ ANALYZE | VERBOSE | NOCOSTS ]... statement

...and it might be, but it's surprisingly easy, when using bison, to create situations that bison finds ambiguous, even though a human being might not. Another problem is that NOCOSTS has to become what's called a keyword, which has a very small but nonzero distributed cost across our entire grammar. Rightly or wrongly, a number of patches that proposed to enhance EXPLAIN in various ways got shot down because of these issues.

Now, I will come clean and admit that I absolutely love EXPLAIN. It's one of my favorite things about PostgreSQL. There are things about it I don't like, but overall, it's a fantastic tool, and the number of problems I've solved over the years by staring at EXPLAIN output is enormous. So I was disappointed to see what I considered to be worthwhile improvements to EXPLAIN get rejected due to grammar issues. I spent some time studying the grammar and eventually figured out a way to support arbitrary options, specified in arbitrary order, without adding keywords, using syntax similar to what we already allowed elsewhere. The new syntax looks like this:

EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

where option can be one of:

    ANALYZE [ boolean ]
    VERBOSE [ boolean ]
    COSTS [ boolean ]
    BUFFERS [ boolean ]
    FORMAT { TEXT | XML | JSON | YAML }

So in other words, you can do stuff like this:

EXPLAIN (COSTS OFF) statement
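
To make the "no new keywords" point concrete, here is a minimal sketch in Python. It is not PostgreSQL's actual implementation (that lives in C and bison), and the function name parse_explain and the option tables are hypothetical names invented for this illustration. The idea it tries to show is that the parser accepts any word as an option name and the names are only checked afterwards, so adding an option means adding a string to a table rather than changing the grammar.

```python
import re

# Hypothetical option tables for this sketch. Option names are plain strings
# checked after parsing, so adding one never touches the "grammar" (the regex).
BOOLEAN_OPTIONS = {"analyze", "verbose", "costs", "buffers"}
FORMATS = {"text", "xml", "json", "yaml"}
TRUE_WORDS = {"true", "on", "1"}
FALSE_WORDS = {"false", "off", "0"}


def parse_explain(sql):
    """Split 'EXPLAIN ( option [, ...] ) statement' into (options, statement)."""
    match = re.match(r"\s*EXPLAIN\s*\(([^)]*)\)\s*(.+)", sql,
                     re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("expected EXPLAIN ( option [, ...] ) statement")

    # Defaults assumed here: COSTS on, everything else off, TEXT output.
    options = {"analyze": False, "verbose": False, "costs": True,
               "buffers": False, "format": "text"}

    for item in match.group(1).split(","):
        words = item.split()
        if not words or len(words) > 2:
            raise ValueError(f"malformed option: {item.strip()!r}")
        name = words[0].lower()
        value = words[1].lower() if len(words) == 2 else None

        if name in BOOLEAN_OPTIONS:
            # A boolean option with no value means "on".
            if value is None or value in TRUE_WORDS:
                options[name] = True
            elif value in FALSE_WORDS:
                options[name] = False
            else:
                raise ValueError(f"{name} requires a boolean value")
        elif name == "format":
            if value not in FORMATS:
                raise ValueError(f"unrecognized format: {value!r}")
            options["format"] = value
        else:
            # Unknown names are rejected here, not by the parser.
            raise ValueError(f"unrecognized EXPLAIN option: {name!r}")

    return options, match.group(2).strip()


if __name__ == "__main__":
    print(parse_explain("EXPLAIN (COSTS OFF, FORMAT JSON, ANALYZE) SELECT 1"))
```

Because each option is looked up by name, the order in which options are written does not matter, and a hypothetical new option would only need another entry in one of the tables.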

The initial patch added support for the parenthesized-options syntax and added the COSTS option. A subsequent patch, also written by me, added support for the FORMAT option, which initially allowed TEXT, XML, or JSON. Then, Greg Sabino Mullane wrote a patch to add YAML output. Itagaki Takahiro wrote a patch to add the BUFFERS option. All these were available in 9.0. Just recently, Tomas Vondra wrote a patch to add a new TIMING option, which will be in 9.2. Greg Stark proposed a RESOURCE option a while back, which didn't get finished, but perhaps will be someday. And I suspect there will be others as well. Meanwhile, we've also adopted similar syntax for COPY and VACUUM to solve similar problems there, in each case referring back to the new EXPLAIN syntax as precedent.

Poking at the bison grammar, or refactoring more generally, is typically not very glamorous work: the amount of effort that is required is usually pretty large, relative to the amount of immediate benefit you get out of it. But it's kind of fun to see people take that work and think up new and interesting uses for it, even years later, that I would never have thought of myself.

Robert Haas

Wednesday, February 08, 2012
