I'm really pretty fired up about the results of our PGCon 2010 Developer Meeting. Of course, the list of what everyone plans to work on is pretty interesting, and if we got even a fraction of those features we'd have a pretty awesome release. But that's not really what got me fired up. What I'm excited about is some of the new and innovative thinking on replication and clustering - or, at any rate, it was new to me.
Two concepts in particular stand out for me. First, we discussed the ability to give replication solutions a crash-recoverable view into transaction commit order, a point which Jan Wieck has since expanded into a full-blown design proposal. Jan believes that this infrastructure will be useful not only for his own project, Slony, but also for other replication solutions such as Londiste which also operate by replaying transactions in commit order. As I understand it, one of the major advantages of this approach is that it eliminates the need for a global shared counter to track the order of writes (which becomes a bottleneck). Instead, they can be tagged with their order within the top-level transaction, and then the transactions as a whole can be ordered using the transaction commit ordering information.
Second, I was very interested in our discussion of a global transaction manager, for which I unfortunately do not have a good link for further reading. One possible way of avoiding cross-node serialization anomalies in a distributed database environment is to have a single node which knows about all in-flight transactions and hands out snapshots that are coherent across the entire cluster. Postgres-XC takes this approach, but there might be value in integrating something like this into core PostgreSQL. We might imagine allowing one PostgreSQL instance to be configured as a "snapshot provider" and another instance to subscribe to it. Right now, it's not clear that there's enough benefit to core PostgreSQL from accepting a patch along these lines, but there are several ways that might change as our distributed computing capabilities improve. For example, if we had a significant SQL/MED implementation, we'd need to think about how to do serialization correctly across multiple nodes; there might also be applications as we work to expand the capabilities of Hot Standby.
If your eyes are glazing over at this point, you're probably not alone. These features are fairly esoteric. Still, I think the fact that we're starting to seriously talk about this topics and consider integrating some of them into core shows that we're starting to understand better what the real needs are for replication and clustering. As our understanding of those needs continues to improve, I expect to see more capabilities in core PostgreSQL, but perhaps even more importantly, an even stronger set of tools around core PostgreSQL that will make it progressively easier to scale horizontally. I don't expect this to happen overnight, but I feel like we're moving in the right direction.
Monday, May 24, 2010
Thursday, May 20, 2010
Global Temporary and Unlogged Tables
From a technical standpoint, temporary tables in PostgreSQL have three properties that distinguish them from ordinary tables:
1. They're stored in a special schema, so that they are normally visible only to the creating backend.
2. They are managed by the local buffer manager rather than the shared buffer manager.
3. They are not WAL-logged.
It makes sense to think about removing these properties one by one, in the order listed above. Removing just the first property, without doing anything else, doesn't quite make sense, because a table which is managed by the local buffer manager can't be simultaneously accessed by multiple backends. We could work around this by having each backend access a separate set of files. This would give us a global temporary table - that is, a table which is visible to everyone, but each backend sees its own contents. (There is some debate about whether this is the right name, or what the right name for this concept might be - but that's what I'm calling it for now.)
Removing both of the first two properties also makes sense. It gives us an unlogged table - that is, a basically ordinary table for which no WAL is written. (Again, the naming is debatable.) Such tables are not crash-safe: an unexpected system crash could leave the table hopelessly corrupted. The only obvious workaround for this problem is to truncate the table on every system restart.
Why might someone want these new table types? Global temporary tables are appealing for users who need temporary tables with a relatively fixed structure, and don't want to recreate them in every new session. In addition to administrative convenience, this avoids the overhead of repeatedly creating and vacuuming the system catalog entries associated with the temporary tables, which may be a performance benefit for some users.
Unlogged tables are appealing for data that needs to be shared across backends, but which we're willing to lose in the case of a server restart. For example, consider a web application maintaining a table of active user sessions. If the server restarts, we may be willing to lose this data. Everyone will need to log in again, but considering that database crashes are rare, that may not be such a big deal. Unlogged tables also won't be replicated to standby servers, since replication relies on WAL. But, on the plus side, skipping WAL-logging should hopefully yield a significant performance benefit.
I'm going to be working on implementing both of these table types for PostgreSQL 9.1. In each case, the hardest part seems to be making sure that we clean up properly after a crash or server restart.
1. They're stored in a special schema, so that they are normally visible only to the creating backend.
2. They are managed by the local buffer manager rather than the shared buffer manager.
3. They are not WAL-logged.
It makes sense to think about removing these properties one by one, in the order listed above. Removing just the first property, without doing anything else, doesn't quite make sense, because a table which is managed by the local buffer manager can't be simultaneously accessed by multiple backends. We could work around this by having each backend access a separate set of files. This would give us a global temporary table - that is, a table which is visible to everyone, but each backend sees its own contents. (There is some debate about whether this is the right name, or what the right name for this concept might be - but that's what I'm calling it for now.)
Removing both of the first two properties also makes sense. It gives us an unlogged table - that is, a basically ordinary table for which no WAL is written. (Again, the naming is debatable.) Such tables are not crash-safe: an unexpected system crash could leave the table hopelessly corrupted. The only obvious workaround for this problem is to truncate the table on every system restart.
Why might someone want these new table types? Global temporary tables are appealing for users who need temporary tables with a relatively fixed structure, and don't want to recreate them in every new session. In addition to administrative convenience, this avoids the overhead of repeatedly creating and vacuuming the system catalog entries associated with the temporary tables, which may be a performance benefit for some users.
Unlogged tables are appealing for data that needs to be shared across backends, but which we're willing to lose in the case of a server restart. For example, consider a web application maintaining a table of active user sessions. If the server restarts, we may be willing to lose this data. Everyone will need to log in again, but considering that database crashes are rare, that may not be such a big deal. Unlogged tables also won't be replicated to standby servers, since replication relies on WAL. But, on the plus side, skipping WAL-logging should hopefully yield a significant performance benefit.
I'm going to be working on implementing both of these table types for PostgreSQL 9.1. In each case, the hardest part seems to be making sure that we clean up properly after a crash or server restart.
Monday, May 10, 2010
Lots and Lots of PostgreSQL Feature Requests
I was surprised and pleased to see that my last blog post, concerning possible directions for future PostgreSQL development, got about five times as many page views as my previous five posts put together, and a total of 70 comments (to date). This may be partly because it got syndicated on LWN, where a few more comments were also posted. I've gone through the comments posted on the blog itself and on the LWN article and counted up the number of times each feature was mentioned. Of course, this is totally unscientific, but it matches up fairly well to the results of previous surveys and gives me an excuse to talk about a bunch of interesting features.
1. Materialized Views (12). See my previous post on Materialized Views in PostgreSQL. Long story short, we may quite possibly get a simple version of this in PostgreSQL 9.1, but I suspect a lot more work will be needed to meet some of the use cases people have in mind.
2. Multi-master replication (6). This is more or less the holy grail of database geeks; it's a pretty hard problem. As it turns out, there is a project in the works called Postgres-XC which does just this. I hope to learn more about this project next week at PGcon. My understanding is that it currently supports only a subset of the SQL query types supported by PostgreSQL, but that work is underway to remove these limitations. Currently, none of the Postgres-XC code can be considered for including in core PostgreSQL because it uses a different license (LGPL), but it's still very interesting as an independent project.
3. Index-organized tables and/or index-only scans and/or automatic maintenance of CLUSTER order (6). I've grouped all of these features together because they're really driving toward the same underlying goal: reducing the I/O cost of an index scan. PostgreSQL will most likely not implement index-organized tables in the sense that Oracle has them, wherein, as I understand it, the table data is stored in the leaf pages of an index. However, we probably will implement index-only scans, which will allow us to gain some of the same performance benefits. Automatic maintenance of CLUSTER order would help, too, but I am not aware that anyone is currently working on that project.
4. MERGE (6). There is a (very ambitious) Google Summer of Code project to implement this feature. Stay tuned. If you're curious about what this feature actually does, I think Simon Riggs has written the best description of MERGE I've seen so far, together with some discussion of the implementation issues. A few weeks later he discussed he followed up with some further notes on the design of the feature.
5. Partitioning syntax (5). Itagaki Takahiro proposed a patch to implement this for PostgreSQL 9.0, but we simply ran out of time. I am hopeful that this will come back to life and be improved and eventually committed for PostgreSQL 9.1. The patch as proposed would have allowed the use of dedicated syntax to specify partitions and would have automatically created appropriate CHECK constraints for each partition, but would not have automatically routed INSERTs to the proper partition, which I think is necessary to make this really useful. Of course, others may have different opinions. :-)
6. Parallel query execution (5). This is a very good idea and yet also a very hard problem; I am not aware that anyone has even proposed a possible design for this, yet alone attempted to implement it. If we implement better SQL/MED support, it might be possible to get some of the benefits of this feature by spreading out data across multiple nodes but making it all appear local by creating remote tables. Or, it might be possible to leverage some of the I/O bandwidth of remote nodes by adding a feature to support non-local tablespaces (with some kind of specialized daemon process reading and writing remote pages on request). But neither of these are spot-on: what we really want is the ability to parallelize a query on a single node.
7. Procedures that can BEGIN, COMMIT, or ABORT transactions (4). This is another feature that would be great to have, but I am not aware that anyone is currently working on it.
8. Granular collation support (4). There are really two halves to this project. The SQL standard specifies a set of complex rules for determining which collation should be used for a particular comparison or ORDER BY operation. In PostgreSQL, you could imagine setting the collation for a particular setting using the "SET" command; associating a collation with a particular column; or overriding the collation for a particular instance of ORDER BY or a particular use of the < operator. So, one half of this problem is simply being able to recognize which collation applies in a particular context and doing the sort or comparison under that collation. The other half of the problem is extending our indexing system to handle multiple collations - either the ability to create an index with a particular collation (which can then be used to satisfy queries that pertain to that collation) or even the ability to create a single index which can somehow answer queries pertaining to multiple collations.
9. Better drivers (4). JDBC was mentioned twice, and asynchronous drivers were mentioned twice. It was also suggested that we should have an "official" Python driver. As a project, we've generally been wary about endorsing other projects, perhaps to our detriment. But good things seem to be happening with the psycopg2 project, especially the recent license change.
10. Graph database features (3). At least one person observed that you can do some of what is wanted here using common table expressions, also known as WITH queries, which are supported beginning in PostgreSQL 8.4. But several people seem to feel that we should have more graph support; one poster mentioned algorithms such as all-pairs shortest paths. I don't have a clear idea of what is needed here, but I suspect that some of the things people are looking for here could be implemented as PostgreSQL extensions. It would be interesting to see some more detailed requirements.
11. Documentation (3). I was surprised to see several requests for documentation among the comments, since I have generally found the PostgreSQL documentation to be superb and one of the great strengths of the project. But one poster did hit on an issue which I think is entirely legitimate: if you Google something like "PostgreSQL documentation", you get a link to our main documentation. But if you Google "PostgreSQL ALTER TABLE", you get a link to the documentation for ALTER TABLE in PostgreSQL 8.1, whereas you might hope to get a link to the 8.4 version of the documentation for that command. And if you Google "PostgreSQL setting", well, let's just say you don't get a link to a page that tells you how to change PostgreSQL settings. If you actually go to the documentation page and navigate through it manually, it's quite easy to find what you're looking for, but there must be something about our site that make Google fail to grok it properly.
Still another poster was looking for documentation in Chinese. Currently, I believe that we maintain documentation only in English due to the rather large translation effort that would be involved in keeping documentation up to date in multiple languages. In fact, we currently don't even ship Chinese translations of our error messages, due to the fact that our existing set of translations is too incomplete. If you would like to help localize PostgreSQL for your native language, please see our wiki page on NLS. Volunteers are needed!
1. Materialized Views (12). See my previous post on Materialized Views in PostgreSQL. Long story short, we may quite possibly get a simple version of this in PostgreSQL 9.1, but I suspect a lot more work will be needed to meet some of the use cases people have in mind.
2. Multi-master replication (6). This is more or less the holy grail of database geeks; it's a pretty hard problem. As it turns out, there is a project in the works called Postgres-XC which does just this. I hope to learn more about this project next week at PGcon. My understanding is that it currently supports only a subset of the SQL query types supported by PostgreSQL, but that work is underway to remove these limitations. Currently, none of the Postgres-XC code can be considered for including in core PostgreSQL because it uses a different license (LGPL), but it's still very interesting as an independent project.
3. Index-organized tables and/or index-only scans and/or automatic maintenance of CLUSTER order (6). I've grouped all of these features together because they're really driving toward the same underlying goal: reducing the I/O cost of an index scan. PostgreSQL will most likely not implement index-organized tables in the sense that Oracle has them, wherein, as I understand it, the table data is stored in the leaf pages of an index. However, we probably will implement index-only scans, which will allow us to gain some of the same performance benefits. Automatic maintenance of CLUSTER order would help, too, but I am not aware that anyone is currently working on that project.
4. MERGE (6). There is a (very ambitious) Google Summer of Code project to implement this feature. Stay tuned. If you're curious about what this feature actually does, I think Simon Riggs has written the best description of MERGE I've seen so far, together with some discussion of the implementation issues. A few weeks later he discussed he followed up with some further notes on the design of the feature.
5. Partitioning syntax (5). Itagaki Takahiro proposed a patch to implement this for PostgreSQL 9.0, but we simply ran out of time. I am hopeful that this will come back to life and be improved and eventually committed for PostgreSQL 9.1. The patch as proposed would have allowed the use of dedicated syntax to specify partitions and would have automatically created appropriate CHECK constraints for each partition, but would not have automatically routed INSERTs to the proper partition, which I think is necessary to make this really useful. Of course, others may have different opinions. :-)
6. Parallel query execution (5). This is a very good idea and yet also a very hard problem; I am not aware that anyone has even proposed a possible design for this, yet alone attempted to implement it. If we implement better SQL/MED support, it might be possible to get some of the benefits of this feature by spreading out data across multiple nodes but making it all appear local by creating remote tables. Or, it might be possible to leverage some of the I/O bandwidth of remote nodes by adding a feature to support non-local tablespaces (with some kind of specialized daemon process reading and writing remote pages on request). But neither of these are spot-on: what we really want is the ability to parallelize a query on a single node.
7. Procedures that can BEGIN, COMMIT, or ABORT transactions (4). This is another feature that would be great to have, but I am not aware that anyone is currently working on it.
8. Granular collation support (4). There are really two halves to this project. The SQL standard specifies a set of complex rules for determining which collation should be used for a particular comparison or ORDER BY operation. In PostgreSQL, you could imagine setting the collation for a particular setting using the "SET" command; associating a collation with a particular column; or overriding the collation for a particular instance of ORDER BY or a particular use of the < operator. So, one half of this problem is simply being able to recognize which collation applies in a particular context and doing the sort or comparison under that collation. The other half of the problem is extending our indexing system to handle multiple collations - either the ability to create an index with a particular collation (which can then be used to satisfy queries that pertain to that collation) or even the ability to create a single index which can somehow answer queries pertaining to multiple collations.
9. Better drivers (4). JDBC was mentioned twice, and asynchronous drivers were mentioned twice. It was also suggested that we should have an "official" Python driver. As a project, we've generally been wary about endorsing other projects, perhaps to our detriment. But good things seem to be happening with the psycopg2 project, especially the recent license change.
10. Graph database features (3). At least one person observed that you can do some of what is wanted here using common table expressions, also known as WITH queries, which are supported beginning in PostgreSQL 8.4. But several people seem to feel that we should have more graph support; one poster mentioned algorithms such as all-pairs shortest paths. I don't have a clear idea of what is needed here, but I suspect that some of the things people are looking for here could be implemented as PostgreSQL extensions. It would be interesting to see some more detailed requirements.
11. Documentation (3). I was surprised to see several requests for documentation among the comments, since I have generally found the PostgreSQL documentation to be superb and one of the great strengths of the project. But one poster did hit on an issue which I think is entirely legitimate: if you Google something like "PostgreSQL documentation", you get a link to our main documentation. But if you Google "PostgreSQL ALTER TABLE", you get a link to the documentation for ALTER TABLE in PostgreSQL 8.1, whereas you might hope to get a link to the 8.4 version of the documentation for that command. And if you Google "PostgreSQL setting", well, let's just say you don't get a link to a page that tells you how to change PostgreSQL settings. If you actually go to the documentation page and navigate through it manually, it's quite easy to find what you're looking for, but there must be something about our site that make Google fail to grok it properly.
Still another poster was looking for documentation in Chinese. Currently, I believe that we maintain documentation only in English due to the rather large translation effort that would be involved in keeping documentation up to date in multiple languages. In fact, we currently don't even ship Chinese translations of our error messages, due to the fact that our existing set of translations is too incomplete. If you would like to help localize PostgreSQL for your native language, please see our wiki page on NLS. Volunteers are needed!
Wednesday, May 05, 2010
Big Ideas
Each release of PostgreSQL features dozens, if not hundreds, of small improvements over the previous release; and a few really major new features. Of course, not everyone will agree on what the best new features are. In the forthcoming 9.0 release, pretty much everyone seems to agree that Streaming Replication (standby servers can receive WAL incrementally rather than in 16MB chunks) and Hot Standby (standby servers can execute read-only queries) are the big new features. In 8.4, I think the biggest improvements were the addition of window functions and common table expressions and the self-tuning free space map, which eliminated a major potential source of hassle for database administrators. In 8.3, we got several dramatic performance improvements - HOT, checkpoint spreading, and improvements to the background writer.
So, what's next? What should our community be focusing on for PostgreSQL 9.1? If you've been following developments on pgsql-hackers, you might be tempted to pick out some of the big patches that weren't completed in time for 9.1, like KNNGIST (use indices to accelerate queries that do ORDER BY , as when you want to find, say, the nearest points to a given circle), improved partitioning support (built-in syntax to reduce manual setup), and index-only scans (reduce I/O for index scans by opportunistically skipping or postponing tuple visibility checks). Another possible source of ideas is to look at the features that will be part of 9.0 and think about ways in which they could be further improved: streaming replication that is synchronous rather than asynchronous, or allowing Hot Standby to pause and resume WAL replay, or narrowing the set of circumstances under which Hot Standby needs to cancel a query in order to proceed with WAL replay. You could also look at our TODO list, or previous surveys and blog postings on this topic.
But I think it might be good to step back and look at the problem a bit more broadly. Ignoring for a minute what people actually are working on, what should they be working on? Where is PostgreSQL strong as a product and where does it need improvement? Here are a few things to think about - please leave a comment with your thoughts.
1. Performance and Scalability. When I first started using PostgreSQL, the product had a reputation for being slow, principally because of issues relating to VACUUM. That reputation wasn't entirely justified even back then, and I think we've made enormous progress here in 8.3 and 8.4, but there might be more improvements we can make. Where are the remaining bottlenecks? What features can we provide to make better use of machines with lots and lots of RAM, CPU, and/or I/O bandwidth? Or what if you have more than one machine available? Are there classes of queries that we don't optimize well, and could do better with?
2. SQL Features. We're pretty close to having all of the SQL features required by the standard - and many of the ones that are left are not terribly interesting, which may be why no one has gone to the trouble of implementing them. Still, there are still a few more things that would be nice to have, including updateable views, materialized views, LATERAL(), and global temporary tables Do you need these features? Are there other things we're missing?
3. Indexes and Data Types. We currently support btree, hash, gin, and gist indices; hash indices are somewhat limited at present because they are not WAL-logged, and therefore can become corrupted after a system crash. We also support a broad array of datatypes, including all of the usual base types plus user-defined record and enum types, an XML datatype, and the ability to create new datatypes using PostgreSQL's powerful extensibility model. What else do we need, or what that we already have needs improvement?
4. Procedural Languages. Our core distribution includes support for SQL functions and four procedural languages: PL/pgsql, PL/perl, PL/python, and PL/tcl. Additional languages such as PL/R and the ever-popular PL/LOLCODE are available as extensions. It seems unlikely to me that we need more procedural lanaguages, but the existing ones might need improvement.
5. Security. Security can be further subdivided into connection security (preventing the bad guys from connecting to your database) and database privileges (allowing access to some of the data in the database, but not all of it). Two major features that we don't have in this are row-level security and SE-Linux integration, but there may be other things as well. What are they and which ones are important?
6. Administration. I think that the simplicity of administering PostgreSQL is one of its greatest strengths: installing PostgreSQL doesn't mean that you need to hire a dedicated DBA. Still, there's always something that's hard to do. What is it?
7. Replication and High Availability. I alluded to some possible projects in this area near the top of this post, and of course some of our needs in this area will continue to be met by third-party projects, such as Slony, Bucardo, and Londiste, but there may be other enhancements needed in the core product, also.
8. Other Stuff. What else is there?
So, what's next? What should our community be focusing on for PostgreSQL 9.1? If you've been following developments on pgsql-hackers, you might be tempted to pick out some of the big patches that weren't completed in time for 9.1, like KNNGIST (use indices to accelerate queries that do ORDER BY
But I think it might be good to step back and look at the problem a bit more broadly. Ignoring for a minute what people actually are working on, what should they be working on? Where is PostgreSQL strong as a product and where does it need improvement? Here are a few things to think about - please leave a comment with your thoughts.
1. Performance and Scalability. When I first started using PostgreSQL, the product had a reputation for being slow, principally because of issues relating to VACUUM. That reputation wasn't entirely justified even back then, and I think we've made enormous progress here in 8.3 and 8.4, but there might be more improvements we can make. Where are the remaining bottlenecks? What features can we provide to make better use of machines with lots and lots of RAM, CPU, and/or I/O bandwidth? Or what if you have more than one machine available? Are there classes of queries that we don't optimize well, and could do better with?
2. SQL Features. We're pretty close to having all of the SQL features required by the standard - and many of the ones that are left are not terribly interesting, which may be why no one has gone to the trouble of implementing them. Still, there are still a few more things that would be nice to have, including updateable views, materialized views, LATERAL(), and global temporary tables Do you need these features? Are there other things we're missing?
3. Indexes and Data Types. We currently support btree, hash, gin, and gist indices; hash indices are somewhat limited at present because they are not WAL-logged, and therefore can become corrupted after a system crash. We also support a broad array of datatypes, including all of the usual base types plus user-defined record and enum types, an XML datatype, and the ability to create new datatypes using PostgreSQL's powerful extensibility model. What else do we need, or what that we already have needs improvement?
4. Procedural Languages. Our core distribution includes support for SQL functions and four procedural languages: PL/pgsql, PL/perl, PL/python, and PL/tcl. Additional languages such as PL/R and the ever-popular PL/LOLCODE are available as extensions. It seems unlikely to me that we need more procedural lanaguages, but the existing ones might need improvement.
5. Security. Security can be further subdivided into connection security (preventing the bad guys from connecting to your database) and database privileges (allowing access to some of the data in the database, but not all of it). Two major features that we don't have in this are row-level security and SE-Linux integration, but there may be other things as well. What are they and which ones are important?
6. Administration. I think that the simplicity of administering PostgreSQL is one of its greatest strengths: installing PostgreSQL doesn't mean that you need to hire a dedicated DBA. Still, there's always something that's hard to do. What is it?
7. Replication and High Availability. I alluded to some possible projects in this area near the top of this post, and of course some of our needs in this area will continue to be met by third-party projects, such as Slony, Bucardo, and Londiste, but there may be other enhancements needed in the core product, also.
8. Other Stuff. What else is there?