Robert Haas: parallelism

Showing posts with label parallelism.

Thursday, June 21, 2018

Using force_parallel_mode Correctly

I admit it: I invented force_parallel_mode. I believed then, and still believe now, that it is valuable for testing purposes. Certainly, testing using force_parallel_mode=on or force_parallel_mode=regress has uncovered many bugs in PostgreSQL's parallel query support that would otherwise have been very difficult to find. At the same time, it's pretty clear that this setting has caused enormous confusion, even among PostgreSQL experts. In fact, in my experience, almost everyone who sets force_parallel_mode is doing so for the wrong reasons.
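As a rough illustration of the testing use case, the sketch below turns the setting on for one session and inspects a plan. It assumes a PostgreSQL release from 9.6 through 15 (the setting was renamed debug_parallel_query in PostgreSQL 16), and test_table is a hypothetical table used only for illustration.

```sql
-- Testing aid only: asks the planner to put a Gather node on top of any
-- plan that appears parallel-safe, so parallel-query code paths get exercised.
SET force_parallel_mode = on;

-- test_table is a hypothetical table, not something from the original post.
EXPLAIN (COSTS OFF) SELECT count(*) FROM test_table;

-- Return to the default once the test is done.
RESET force_parallel_mode;
```

The setting is scoped to the session here so that a test run does not change behavior for other connections.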

Tuesday, March 14, 2017

Parallel Query v2

A recent Twitter poll asked, "What is your favorite upcoming feature of PostgreSQL V10?" In this admittedly unscientific survey, "better parallelism" (37%) beat out "logical replication" (32%) and "native partitioning" (31%). I think it's fruitless to argue about which of those features is actually most important; the real point is that all of those are amazing features, and PostgreSQL 10 is on track to be an amazing release. There are a number of already-committed or likely-to-be-committed features which in any other release would qualify as headline features, but in this release they'll have to fight it out with the ones mentioned above.

Thursday, April 21, 2016

PostgreSQL 9.6 with Parallel Query vs. TPC-H

I decided to try out parallel query, as implemented in PostgreSQL 9.6devel, on the TPC-H queries. To do this, I followed the directions at https://github.com/tvondra/pg_tpch - thanks to Tomas Vondra for those instructions. I did the test on an IBM POWER7 server provided to the PostgreSQL community by IBM. I scaled the database to use 10GB of input data; the resulting database size was 22GB, of which 8GB was indexes. I tried out each query just once without really tuning the database at all, except for increasing shared_buffers to 8GB. Then I tested them again after enabling parallel query by configuring max_parallel_degree = 4.
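The two settings mentioned above might be applied roughly as follows. This is a sketch under assumptions, not the exact commands used for the test: ALTER SYSTEM is only one of several ways to change shared_buffers, and max_parallel_degree is the name the setting had in 9.6devel at the time (it shipped in PostgreSQL 9.6 as max_parallel_workers_per_gather).

```sql
-- shared_buffers only takes effect after a server restart.
ALTER SYSTEM SET shared_buffers = '8GB';

-- After restarting, enable parallel query for the current session.
-- On released 9.6 and later, the equivalent is:
--   SET max_parallel_workers_per_gather = 4;
SET max_parallel_degree = 4;
```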

Monday, March 21, 2016

Parallel Query Is Getting Better And Better

Back in early November, I reported that the first version of parallel sequential scan had been committed to PostgreSQL 9.6. I'm pleased to report that a number of significant enhancements have been made since then. Of those, the two that are by far the most important are that we now support parallel joins and parallel aggregation, which means that the range of queries that can benefit from parallelism is now far broader than just sequential scans.

Wednesday, November 11, 2015

Parallel Sequential Scan is Committed!

I previously suggested that we might be able to get parallel sequential scan committed to PostgreSQL 9.5. That did not happen. However, I'm pleased to report that I've just committed the first version of parallel sequential scan to PostgreSQL's master branch, with a view toward having it included in the upcoming PostgreSQL 9.6 release.

Friday, October 30, 2015

Planning Parallel and Distributed Queries

I have been somewhat lax about blogging for the last six months or so due to having been even busier than usual with various projects, and I think that's likely to continue for at least the next month or two as I work to finish the first version of parallel query for PostgreSQL. If you have been following the PostgreSQL commit log recently, you will have noticed many new commits building up towards that goal.

However, I wanted to take a minute to point out the presentation that I did yesterday at 2015.pgconf.eu, which I have now uploaded to my presentations web site. The title of the presentation is "Planning Parallel and Distributed Queries". If you have not closely followed the development of parallel query, you might find this presentation interesting to review, because it gives examples of the types of query plans I hope that PostgreSQL will be able to generate in the future.

(Everything in the talk represents future work ... and not all of it will be in 9.6!)

Wednesday, March 18, 2015

Parallel Sequential Scan for PostgreSQL 9.5

Amit Kapila and I have been working very hard to make parallel sequential scan ready to commit to PostgreSQL 9.5. It is not all there yet, but we are making very good progress. I'm very grateful to everyone in the PostgreSQL community who has helped us with review and testing, and I hope that more people will join the effort. Getting a feature of this size and complexity completed is obviously a huge undertaking, and a significant amount of work remains to be done. Not a whole lot of brand-new code remains to be written, I hope, but there are known issues with the existing patches where we need to improve the code, and I'm sure there are also bugs we haven't found yet.

Monday, December 22, 2014

Parallelism Update

It's been over a year since I last blogged about parallelism, so I think I'm past due for an update, especially because some exciting things are happening.

First, Amit Kapila has published a draft patch for parallel sequential scan. Many things remain to be improved about this patch, which is neither as robust as it needs to be nor as performant as we'd like it to be nor as well-modularized as it really should be. But it exists, and it passes simple tests, and that is a big step forward. Even better, on most of Amit's tests, it shows a very substantial speed-up over a non-parallel sequential scan.

Thursday, October 17, 2013

Parallelism Progress

For the last several months, I have been spending a large percentage of my time trying to bring parallelism to PostgreSQL. Previous blog posts on the future direction of PostgreSQL development have often mentioned this as a priority, although the top spot has usually been reserved for materialized views, a feature which now exists in PostgreSQL 9.3 and which has been improved in PostgreSQL 9.4. My colleague at EnterpriseDB, Kevin Grittner, is continuing to work on further improvements in that area. But my focus is on parallelism. So, how's that going?

Robert Haas

Thursday, June 21, 2018

Using force_parallel_mode Correctly

Tuesday, March 14, 2017

Parallel Query v2

Thursday, April 21, 2016

PostgreSQL 9.6 with Parallel Query vs. TPC-H

Monday, March 21, 2016

Parallel Query Is Getting Better And Better

Wednesday, November 11, 2015

Parallel Sequential Scan is Committed!

Friday, October 30, 2015

Planning Parallel and Distributed Queries

Wednesday, March 18, 2015

Parallel Sequential Scan for PostgreSQL 9.5

Monday, December 22, 2014

Parallelism Update

Thursday, October 17, 2013

Parallelism Progress