It's been over a year since I last blogged about parallelism, so I think I'm past due for an update, especially because some exciting things are happening.
First, Amit Kapila has published a draft patch for parallel sequential scan. Much about this patch still needs to improve: it is not yet as robust as it needs to be, as fast as we'd like it to be, or as well-modularized as it really should be. But it exists, and it passes simple tests, and that is a big step forward. Even better, on most of Amit's tests, it shows a very substantial speed-up over a non-parallel sequential scan.
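As a rough mental model of what a parallel sequential scan does, here is a toy Python sketch. It is not Amit's patch and does not use any PostgreSQL API. The way work is divided is an assumption made for illustration: workers share a counter from which each one claims the next unscanned block, filters that block's rows, and the leader gathers everything the workers return. PostgreSQL uses separate worker processes rather than threads; threads stand in for them here, and because of Python's GIL this toy won't actually run faster than a serial loop.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedBlockCounter:
    """Hands out block numbers one at a time (stands in for shared memory)."""

    def __init__(self, nblocks):
        self._next = 0
        self._nblocks = nblocks
        self._lock = threading.Lock()

    def claim(self):
        # Return the next unscanned block number, or None once every block is taken.
        with self._lock:
            if self._next >= self._nblocks:
                return None
            blkno = self._next
            self._next += 1
            return blkno


def scan_worker(table, counter, predicate):
    # Each worker keeps claiming blocks until none remain, keeping matching rows.
    matches = []
    while (blkno := counter.claim()) is not None:
        matches.extend(row for row in table[blkno] if predicate(row))
    return matches


def parallel_seq_scan(table, predicate, nworkers=4):
    counter = SharedBlockCounter(len(table))
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        futures = [pool.submit(scan_worker, table, counter, predicate)
                   for _ in range(nworkers)]
        # The leader gathers every worker's output; row order is not guaranteed.
        return [row for f in futures for row in f.result()]


table = [list(range(b * 10, b * 10 + 10)) for b in range(8)]  # 8 blocks of 10 rows
rows = parallel_seq_scan(table, lambda r: r % 7 == 0)
print(sorted(rows))  # [0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77]
```

Handing out blocks one at a time means a worker that finishes early simply claims more blocks, so an uneven mix of cheap and expensive blocks doesn't leave workers idle.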
Second, I have published a draft patch for what I am calling parallel mode and parallel contexts. Apart from some demonstration code that shows how the infrastructure is meant to be used, this code doesn't do any particular thing in parallel; rather, it is a toolkit for writing parallel algorithms in PostgreSQL. The demonstration code includes about 100 lines that count the rows in a table in parallel. Being able to do something useful in parallel in 100 lines of code is pretty neat, even if (as is certainly the case) more work is needed before this infrastructure is production-ready.
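The general shape of such a parallel row count (the leader sets up shared state, launches workers, waits for them, then combines their partial results) can be sketched in Python. This is only a toy model: the `ParallelContext` class and its method names are hypothetical and do not reproduce the patch's C API, threads stand in for PostgreSQL's worker processes, and the static split (worker N takes every block whose number is N modulo the worker count) is an assumption chosen for simplicity.

```python
import threading


class ParallelContext:
    """Toy stand-in for a parallel context (hypothetical, not PostgreSQL's API).

    It owns a "shared memory" dict that the leader fills in before launching
    workers, and that workers write their results back into.
    """

    def __init__(self, worker_main, nworkers):
        self.worker_main = worker_main
        self.nworkers = nworkers
        self.shared = {}
        self._threads = []

    def launch_workers(self):
        for worker_id in range(self.nworkers):
            t = threading.Thread(target=self.worker_main,
                                 args=(self.shared, worker_id, self.nworkers))
            t.start()
            self._threads.append(t)

    def wait_for_workers(self):
        for t in self._threads:
            t.join()
        self._threads.clear()


def count_worker(shared, worker_id, nworkers):
    # Worker N counts the rows in every block whose number is N mod nworkers.
    table = shared["table"]
    count = sum(len(table[blkno]) for blkno in range(worker_id, len(table), nworkers))
    shared["counts"][worker_id] = count  # each worker writes only its own slot


def parallel_count(table, nworkers=3):
    pcxt = ParallelContext(count_worker, nworkers)
    pcxt.shared["table"] = table            # set up shared state before launch
    pcxt.shared["counts"] = [0] * nworkers
    pcxt.launch_workers()
    pcxt.wait_for_workers()
    return sum(pcxt.shared["counts"])       # leader combines the partial counts


table = [[None] * n for n in (5, 0, 12, 7, 3)]  # 5 blocks with uneven row counts
print(parallel_count(table))  # 27
```

Giving each worker its own result slot means workers never write to the same location, so no locking is needed; the leader reads the slots only after every worker has finished.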
Third, I have published a patch introducing a contrib module called pg_background. pg_background lets you run a command in a background worker process and retrieve the results. The main reason for proposing this patch was to get a bunch of infrastructure that will be needed for parallelism committed, and those preparatory patches are now all committed. But the patch itself is also useful: it can serve as a low-performance substitute for autonomous transactions, or as a way to run utility commands such as VACUUM from inside a function, which normally isn't possible. The patch needs some more work to be ready for commit, but it does basically work; if you're interested, you can download the latest version.
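The launch-now, fetch-results-later pattern that pg_background provides can be modeled in a few lines of Python. This sketch does not use pg_background's actual SQL interface: `BackgroundRunner`, `launch`, `result`, and `fake_vacuum` are hypothetical names, and a thread stands in for the background worker process. It only shows the control flow: `launch` returns a handle immediately, the caller is free to continue, and `result` waits for the worker and hands back its output (or re-raises its error).

```python
import itertools
import threading


class BackgroundRunner:
    """Toy model of "run a command in a background worker, fetch results later".

    All names here are hypothetical; a thread stands in for a worker process.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def launch(self, command, *args):
        handle = next(self._ids)
        box = {}

        def run():
            try:
                box["rows"] = command(*args)
            except Exception as exc:  # pass errors back to whoever fetches results
                box["error"] = exc

        worker = threading.Thread(target=run)
        worker.start()
        self._jobs[handle] = (worker, box)
        return handle  # returns at once; the command keeps running

    def result(self, handle):
        worker, box = self._jobs.pop(handle)
        worker.join()  # wait for the background worker to finish
        if "error" in box:
            raise box["error"]
        return box["rows"]


def fake_vacuum(table_name):
    # Placeholder for a utility command such as VACUUM.
    return [f"vacuumed {table_name}"]


runner = BackgroundRunner()
h = runner.launch(fake_vacuum, "foo")
# ... the caller can do other work here ...
print(runner.result(h))  # ['vacuumed foo']
```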
So we're not there yet, but we're getting there.
This is really great. I can see how it improves the performance of analytical queries, especially when the I/O subsystem can handle more load.