Tuesday, March 14, 2017

Parallel Query v2

A recent Twitter poll asked What is your favorite upcoming feature of PostgreSQL V10?  In this admittedly unscientific survey, "better parallelism" (37%) beat out "logical replication" (32%) and "native partitioning" (31%).  I think it's fruitless to argue about which of those features is actually most important; the real point is that all of those are amazing features, and PostgreSQL 10 is on track to be an amazing release.  There are a number of already-committed or likely-to-be-committed features which in any other release would qualify as headline features, but in this release they'll have to fight it out with the ones mentioned above.

Still, I can't say that I'm unhappy that better parallelism ended up being the most popular choice in the poll mentioned above, and I'd just like to highlight a few of the current and proposed improvements that will or may appear in this release.

Already Committed

These improvements have been committed, and will appear in PostgreSQL 10 unless, due to some issue, it becomes necessary to revert them.

Parallel Bitmap Heap Scan (Dilip Kumar).  The driving table of a parallel query must be scanned with a parallel-aware scan type capable of partitioning the scan among the available workers; in PostgreSQL 9.6, the only type of parallel-aware scan was a parallel sequential scan, which means that you had to choose between using parallelism and using your indexes.  With a parallel bitmap heap scan, one process scans the index and builds a data structure in shared memory indicating all of the heap pages that need to be scanned, and then all cooperating processes can perform the heap scan in parallel.  It would be nice to be able to perform the index scan in parallel, too, but that will have to wait for a future release.

Parallel Index Scan (Rahila Syed, Amit Kapila, Robert Haas), Parallel Index-Only Scan (Rafia Sabih).  Infrastructure has been added to allow index access methods to support parallel scans, the btree access method has been modified to use this infrastructure, and the optimizer and executor have been taught to use these capabilities for index scans and index only scans.  Therefore, it's now possible for the driving table to be scanned using an index-scan or an index-only scan.  I suspect that, because of the way this is implemented, the maximum degree of useful parallelism will be less in this case than for a parallel bitmap heap scan or parallel sequential scan.  It's also true that index scans tend to lose to bitmap scans when the number of rows being returned is large, and if the number of rows being returned from the driving table isn't large, you probably aren't getting a lot of benefit out of parallel query.  Still, these changes have led to large improvements on certain queries.

Gather Merge (Rushabh Lathia).  The Gather node introduced in PostgreSQL 9.6 gathers results from all workers in an arbitrary order.  That's fine if the data that the workers were producing had no particular ordering anyway, but if each worker is producing sorted output, then it would be nice to gather those results in a way that preserves the sort order.  This is what Gather Merge does.  It can speed up queries where it's useful for the results of the parallel portion of the plan to have a particular sort order, and where the parallel portion of the plan produces enough rows that performing an ordinary Gather followed by a Sort would be expensive.

Parallel Merge Join (Dilip Kumar).  In PostgreSQL 9.6, only hash joins and nested loops can be performed in the parallel portion of a plan.  In PostgreSQL 10, merge joins can also be performed in the parallel portion of the plan.  In some sense, these are not truly parallel hash joins, because while each participant sees only a subset of the rows on the outer side of the join, each participant must visit every row on the inner side of the join; otherwise, the join results might be incomplete.  This means that the work on the outer side of the join is divided among the participants, but the work on the inner side of the join is duplicated by every participant.  Better strategies are possible, and have been documented in the academic literature on parallel query, but this still sometimes improves on what was possible in PostgreSQL 9.6, especially because parallel index scan provides a way of getting result in sorted order without needing an explicit sort.

Subplan-Related Improvements (Amit Kapila).  In PostgreSQL 9.6, subplans were not passed from the leader to the workers, and therefore any table which appeared in the query plan with an associated subquery was not a candidate for parallelism.  In PostgreSQL 10, this restriction has been slightly relaxed.  Now, a table with an uncorrelated subplan can appear in the parallel portion of the plan.  Unfortunately, a table with a correlated subplans or a reference to an InitPlan is still not able to be considered for parallelism; there is substantial further work to be done in this area.

Pass Query Text To Workers (Rafia Sabih).  In PostgreSQL 9.6, the query text associated with a parallel worker does not show up in pg_stat_activity; in PostgreSQL 10, it will.  Also, if you're unlucky enough to have a parallel worker crash, the query text for the query which it was running will now be reported, just as for crashes in user-connected backends.  These improvements are helpful for monitoring and troubleshooting.

Still Pending

These improvements have been submitted for possible inclusion in PostgreSQL 10, but are neither committed nor deferred to a future release as of this writing.

Parallel CREATE INDEX (Peter Geoghegan).  The server-side infrastructure for parallelism was built in such a way as to allow it to be applied to utility commands as well as queries, but we haven't got any examples yet.  This proposed work speeds up btree index creation by about a factor of three.

Better Parallel Hash Join (Thomas Munro).  In PostgreSQL 9.6, the only way to perform a hash join in parallel is for each worker to build a private copy of the hash table.  Then, all workers scan the probe table in parallel and each probes its own hash table.  This is fine, and perhaps optimal, when the build table is small, because each worker has entirely contention-free access to its own hash table.  However, it's a poor strategy when the hash table is large or expensive to build.  The proposed patch fixes this by allowing workers to build a single, shared hash table, optionally in parallel.   For the most part, the advantage of this patch is not that it makes the join run any faster, but that it allows both inputs to the join to be constructed in parallel rather than only one of them.

Pass InitPlan Values to Workers (Amit Kapila).  If an InitPlan is attached to a query above a Gather node and used below the Gather node, we could compute the value in the leader and disseminate it to all the workers.  Currently, plans of this type are disallowed because we have no mechanism for passing the value down to the workers; this patch adds such a mechanism, and allows the corresponding plans.

Parallel Append (Amit Khandekar).  Currently, Append can be used in parallel queries only in very limited ways.  A partitioned table can be scanned in parallel, but every partition must be scanned using some type of parallel plan.  While the changes mentioned above make this much less of a restriction in PostgreSQL 10 than it was in PostgreSQL 9.6, it's still possible for the fastest method of scanning a table to be some kind of non-parallel plan, such as an index scan on a non-btree index.  Furthermore, when an Append does appear in the parallel portion of a plan, the first child plan will be scanned to completion before beginning the second child plan, which will be completed before beginning the third child plan, and so forth.  It will often be better to spread out the workers so that, for example, I/O from different disks can happen in parallel, and to reduce CPU contention.  This patch aims to address these problems, as well as to allow UNION ALL queries to be parallelized in the obvious way.

Improve Access to Parallel Query from Procedural Languages (Rafia Sabih).  In PostgreSQL 9.6, certain queries in PL/pgsql procedures can access parallel query, but other procedural languages cannot.  Work is underway to improve access to parallel queries across all built-in procedural languages, both by adding support for languages other than PL/pgsql, and also by expanding the range of situations in which parallel query can be used from within PL/pgsql.

Allow Parallel Query at SERIALIZABLE (Thomas Munro).   The serializable isolation level does not current support parallel query.  Patches are proposed to lift this restriction.


With the exception of Peter Geoghegan, all of the developers mentioned above are employed by EnterpriseDB.  I'd like to take this opportunity to thank EnterpriseDB management for continuing to support this work in the PostgreSQL community; 100% of the work that EnterpriseDB has done in this area has been submitted to the PostgreSQL community using the PostgreSQL community process and under the PostgreSQL license.   We expect to continue making such contributions in future releases.

In the process of developing, testing, and benchmarking these improvements, my colleagues and I have learned quite a bit about what distinguishes a good parallel query idea from a bad one, which I think will be useful in setting direction for future parallel query work.  I'll share some insights about that topic in a future blog post, as this one has already grown quite long.

Please feel free to add comments below!  I'm interested to hear what you like and dislike about the new parallel query improvements, and about what work you would like to see done in the future.

10 comments:

  1. Interesting read, thank you.

    ReplyDelete
  2. Nice write up Robert!

    Parallel create index concurrently sure would have been useful to me last week when I dropped an index that was unexpectedly still in use, and mostly took down Bitbucket for the 15 minutes it took to rebuild it. (facepalm)

    ReplyDelete
  3. Looking forward to all this, but I fear that understanding explain statements and predicting a query's performance profile has become much more complicated ? Hopefully only humans will get confused, and the query planner will do a good job.

    ReplyDelete
    Replies
    1. It's certainly true that every time we add complexity to the executor, the plans get more complicated. Worse, every new possible plan type we add increases the chances that the query planner will get it wrong, because the total number of options has increased and there's still only one of them that is the fastest. On top of that, the way we're doing planning right now is fairly basic, and no doubt could stand to be improved.

      Despite all that, I think we're fairly clearly going in the right direction. The new plan types are fairly straightforward generalizations of things we already have, and I think people will get used to them fairly quickly. Even if they don't, the only way to avoid adding the complexity is to skip adding the features, and I don't think anybody wants that.

      Delete
  4. Thanks for this read. Very inspiring. Kudos, EnterpriseDB & PostgreSQL - among the most substantial open-source projects.

    ReplyDelete
  5. Nice work. Are there plans to support parallel query with Inserts. For example create tables as select where the select is run in parallel.

    ReplyDelete
    Replies
    1. Actually, there's a proposed patch for that which I thought was dead for this release when I wrote the above, but the patch author wrote back some more things about it, so maybe it's not so dead. I have to look at it again. If it doesn't happen in v10, I'm guessing we'll try to make something happen for v11.

      Delete
    2. Thanks for the response. For ELT workloads a serial insert after a parallel select would be very handy. In an explain plan most of the time is spent in the query part.

      Delete
    3. I see that the plan is for Parallel Query with updates planned for Postgres 11.

      https://wiki.postgresql.org/wiki/Fujitsu_roadmap

      Delete
  6. Great post Robert, keep up the great work

    ReplyDelete