Robert Haas: Big Ideas

Wednesday, May 05, 2010

Big Ideas

Each release of PostgreSQL features dozens, if not hundreds, of small improvements over the previous release; and a few really major new features. Of course, not everyone will agree on what the best new features are. In the forthcoming 9.0 release, pretty much everyone seems to agree that Streaming Replication (standby servers can receive WAL incrementally rather than in 16MB chunks) and Hot Standby (standby servers can execute read-only queries) are the big new features. In 8.4, I think the biggest improvements were the addition of window functions and common table expressions and the self-tuning free space map, which eliminated a major potential source of hassle for database administrators. In 8.3, we got several dramatic performance improvements - HOT, checkpoint spreading, and improvements to the background writer.

So, what's next? What should our community be focusing on for PostgreSQL 9.1? If you've been following developments on pgsql-hackers, you might be tempted to pick out some of the big patches that weren't completed in time for 9.1, like KNNGIST (use indices to accelerate queries that do ORDER BY , as when you want to find, say, the nearest points to a given circle), improved partitioning support (built-in syntax to reduce manual setup), and index-only scans (reduce I/O for index scans by opportunistically skipping or postponing tuple visibility checks). Another possible source of ideas is to look at the features that will be part of 9.0 and think about ways in which they could be further improved: streaming replication that is synchronous rather than asynchronous, or allowing Hot Standby to pause and resume WAL replay, or narrowing the set of circumstances under which Hot Standby needs to cancel a query in order to proceed with WAL replay. You could also look at our TODO list, or previous surveys and blog postings on this topic.

But I think it might be good to step back and look at the problem a bit more broadly. Ignoring for a minute what people actually are working on, what should they be working on? Where is PostgreSQL strong as a product and where does it need improvement? Here are a few things to think about - please leave a comment with your thoughts.

1. Performance and Scalability. When I first started using PostgreSQL, the product had a reputation for being slow, principally because of issues relating to VACUUM. That reputation wasn't entirely justified even back then, and I think we've made enormous progress here in 8.3 and 8.4, but there might be more improvements we can make. Where are the remaining bottlenecks? What features can we provide to make better use of machines with lots and lots of RAM, CPU, and/or I/O bandwidth? Or what if you have more than one machine available? Are there classes of queries that we don't optimize well, and could do better with?

2. SQL Features. We're pretty close to having all of the SQL features required by the standard - and many of the ones that are left are not terribly interesting, which may be why no one has gone to the trouble of implementing them. Still, there are still a few more things that would be nice to have, including updateable views, materialized views, LATERAL(), and global temporary tables Do you need these features? Are there other things we're missing?

3. Indexes and Data Types. We currently support btree, hash, gin, and gist indices; hash indices are somewhat limited at present because they are not WAL-logged, and therefore can become corrupted after a system crash. We also support a broad array of datatypes, including all of the usual base types plus user-defined record and enum types, an XML datatype, and the ability to create new datatypes using PostgreSQL's powerful extensibility model. What else do we need, or what that we already have needs improvement?

4. Procedural Languages. Our core distribution includes support for SQL functions and four procedural languages: PL/pgsql, PL/perl, PL/python, and PL/tcl. Additional languages such as PL/R and the ever-popular PL/LOLCODE are available as extensions. It seems unlikely to me that we need more procedural lanaguages, but the existing ones might need improvement.

5. Security. Security can be further subdivided into connection security (preventing the bad guys from connecting to your database) and database privileges (allowing access to some of the data in the database, but not all of it). Two major features that we don't have in this are row-level security and SE-Linux integration, but there may be other things as well. What are they and which ones are important?

6. Administration. I think that the simplicity of administering PostgreSQL is one of its greatest strengths: installing PostgreSQL doesn't mean that you need to hire a dedicated DBA. Still, there's always something that's hard to do. What is it?

7. Replication and High Availability. I alluded to some possible projects in this area near the top of this post, and of course some of our needs in this area will continue to be met by third-party projects, such as Slony, Bucardo, and Londiste, but there may be other enhancements needed in the core product, also.

8. Other Stuff. What else is there?

77 comments:

Alex ClayMay 05, 2010 1:35 PM
I would love to see some native ability to view past versions of a record or resurrect a deleted row without the need to build a soft delete system. I've read some blog postings about how such a feature could be built. See Scott Bailey's slides from PG West 2009 on future enhancements.

http://scottrbailey.files.wordpress.com/2009/10/temporal_data.pdf
ReplyDelete
Replies
UnknownMay 05, 2010 1:52 PM
I would love to see better documentation around configuration settings as well as the ability to change more from within PostgreSQL.
ReplyDelete
Replies
AnonymousMay 05, 2010 1:54 PM
I'd love to see PostgreSQL provide some of the functionality of a graph database.
ReplyDelete
Replies
AnonymousMay 05, 2010 1:57 PM
Built-in materialized views would be *awesome*.
ReplyDelete
Replies
AnonymousMay 05, 2010 2:12 PM
How about making multiple instances able to operate from separate machines on shared files. In other words, would it be possible to make several machines run the PG daemon but all using the same files on an nfs/SAN/ceph based share safely. Maybe even directly on RADOS files?
ReplyDelete
Replies
AnonymousMay 05, 2010 2:17 PM
I'd vote for better JDBC drivers - one may argue that's not core issue - look, if you can't use Postgres's object system in Java - is that a small issue?
ReplyDelete
Replies
UnknownMay 05, 2010 2:27 PM
I'd like to see:
- automatically maintained clustered indexes (pg clustered indexes are awesome but can be painful to keep up-to-date in terms of load)
- materialized views could be really cool for caching in the DB tho they're easily worked around
- better performance monitoring tools
ReplyDelete
Replies
AnonymousMay 05, 2010 2:31 PM
To my way of thinking, documentation is key at this point and time. People know mysql is the default and use that instead of looking at PG to see if this DB might be a better solution. We used that at DirtyPhoneBook without even taking a look at PG because of that I'm afraid. Now we're scaling well and all, but missing several features that would make life a little easier.
ReplyDelete
Replies
Tore HalvorsenMay 05, 2010 3:07 PM
Automatically maintained clustered index would be my favorite.
ReplyDelete
Replies
AsherMay 05, 2010 3:07 PM
The feature I would love most is the ability to modify a VIEW, or COMPOSITE TYPE that functions are dependent on. In this way it's not necessary to DROP .. CASCADE, re-create the object, and then re-create the functions.
ReplyDelete
Replies
AnonymousMay 05, 2010 3:18 PM
Real procedures that can start and commit transactions would also be a benefit.
ReplyDelete
Replies
Beaud76May 05, 2010 3:44 PM
I would also appreciate to be able to better integrate PostgreSQL databases in information systems, by being able to EASILY and EFFICIENTLY join for instance 2 tables, one belonging to a pg database and the other managed by another RDBMS that respects SQL/MED standart.
ReplyDelete
Replies
pilifMay 05, 2010 4:15 PM
what I personally would love to see is the ability to do locale based sorting on a query-by-query basis instead of only being able to set LC_COLLATE on database creation time.

Even just being able to define different indexes with varying LC_COLLATE and then choosing which to use would be perfect.
ReplyDelete
Replies
AnonymousMay 05, 2010 4:22 PM
CUBE() is my #1 feature
ReplyDelete
Replies
Christopher BrowneMay 05, 2010 4:29 PM
1. Extending partitioning.

What's wanted hasn't entirely solidified yet; there have been proposals to this end.

2. Autonomous stored procedures that can manage transactions.

I suspect that the Right Idea here is to have an approach that is conspicuously different from current Stored Functions, so that people don't waffle about which to use in any given case.
ReplyDelete
Replies
Adam NelsonMay 05, 2010 4:44 PM
Please focus on these:

1. The website itself
2. Documentation on the website itself (as tested by finding the correct documentation for the current version via Google)
3. East of administration 'out of the box' (i.e. backup scenario that works with minimal understanding of underlying issues for simple deployments).
4. Cloud tools (i.e. backups going to S3)
5. Deprecating some of the arcane syntax like \dt instead of 'show tables' or something simple and descriptive.

Those are the biggest obstacles for me.
ReplyDelete
Replies
Tomasz SkutnikMay 05, 2010 5:22 PM
L10N/I18N sorting in tables.
ReplyDelete
Replies
tcMay 05, 2010 5:36 PM
For my Reporting/BI--MERGE statement and Materialized Views
ReplyDelete
Replies
UnknownMay 05, 2010 5:56 PM
Two things I have hoped to see a long time are:

Proper collation support since I am a bit of a language nerd and hate the way collation works right now. A global locale is never right.

MERGE to be implemented since it is so common to want to either insert or update. Almost every project includes that.

Adam, I disagree with almost all of your idea. I prefer \dt over SHOW and the documentation and the website of PostgreSQL are some of the best on the Internet. Sure they can always be improved but you make it sound like there would be a problem now. And backing up databases is trivial in simple cases for Postgres. It first gets hard when you try to do replication. Cloud tools also seem out of scope of the Postgres project and instead part of another project.
ReplyDelete
Replies
Malte CornilsMay 05, 2010 6:00 PM
Only feature we missed so far: pg/sql ability to control transactions!
ReplyDelete
Replies
gcMay 05, 2010 6:11 PM
How about have a Hot Standby that you can pause and resume WAL replay, but also rewind.

I could imagine situations where I want to see what my live server looked like at some point in history. So I pop over to the Hot Standby, hit 'rewind' and go back a few days, check the data and then hit 'play' again.
ReplyDelete
Replies
UnknownMay 05, 2010 7:07 PM
It's probably more an issue for associated projects, but more flexible asynchronous replication is very important for us. In particular:
- Multi-master, so we can fail over to DR without having to get replication up-to-date first
- Some ability to transform replication events, so we can replicate to a different (incompatible) schema version, and thus perform upgrades without outages.

..... John
ReplyDelete
Replies
Scott BaileyMay 05, 2010 7:48 PM
1) MERGE statement or some form of upsert would top my list.

2) CUBE and ROLLUP

3) Dependency invalidation or something to help the view/function dependency hell.

4) XMLTABLE and binary XML data type

5) Better parameter support in plpython. For instance returning record.

@Alex - Our proposal for a period data type was shot down on hackers. Instead they wanted a more generic mechanism to create ranges from many scalar data types not just timestamps. Jeff Davis is working on this for 9.1.

I'm not sure what to say to the folks asking for better documentation. Postgres' docs are top notch. Go look at the website/docs for Firebird and then we'll talk. Perhaps we can do something about getting Google to prefer "current" instead of a particular version.
ReplyDelete
Replies
UnknownMay 05, 2010 7:59 PM
This request might be more of a 10.0 feature than a 9.1 feature... I would like to have support for historical queries, similar to Oracle's Query Flashback:
http://wiki.oracle.com/page/Query+Flashback

If PostgreSQL supported SELECT ... AS OF TIMESTAMP queries, that would make it much easier to develop applications that require an audit trail. Without that feature today, you could only develop such an application via an awkward convention of INSERTing log entries with timestamps — never UPDATEing or DELETEing — and querying views that "replay" the log. It would be much easier if the database supported it natively.
ReplyDelete
Replies
Sean HsienMay 05, 2010 8:13 PM
I would second better drivers, especially asynchronous drivers (see ADBCJ). That would vastly improve scalability and make PG an attractive option for large scale applications.
ReplyDelete
Replies
MatinMay 05, 2010 8:37 PM
A stronger focus on performance on SSDs. RethinkDB is building a new storage engine for MySQL that's optimized around the performance characteristics of SSDs. It would be great to see a similar initiative in the PG community.

Checkout what the guys at RethinkDB are doing for more detail
ReplyDelete
Replies
Witek BarylukMay 05, 2010 9:21 PM
Comparing to MySQL you definietly needs simpler installation, adding users and creating databases. It is very simple in mysql, can be also done from phpmyadmin easly.

Beyond that I think psql is in very good shape.

For me most important is data safety. Maybe you should check on start check for example if all writing barriers and sync, fsync are all properly working. some disks, filesystems, volume manager, crypto layers, just cheats, and do ignore this requests and sometimes doesn't even return proper error codes. :(
ReplyDelete
Replies
傑夫的BlogMay 05, 2010 10:26 PM
Document !!

I am a traditional chinese version user. I want to do the chinese version SGML document.

That would create many user in china area with a chinese version document.
ReplyDelete
Replies
AnonymousMay 06, 2010 1:54 AM
I'd like to see better support for databases with thousands or tens of thousands of schemas, especially when most are almost identical.
ReplyDelete
Replies
serverhorrorMay 06, 2010 3:24 AM
Driver Support.

I mainly do Python and http://wiki.python.org/moin/PostgreSQL shows that there's no real consensus on what driver to use (I think there also was a recent discussion on the mailing list).

For the main langauges present I'd love to see "official" drivers from PostgreSQL in a sync and async Variant.

Also Multi-Master without shared storage
ReplyDelete
Replies
Daniel LyonsMay 06, 2010 3:52 AM
MERGE is at the top of my list. After that, probably rounding out the windowing functions, and then materialized views.

I don't personally care much for updateable views; we can do that now with RULEs and it's not much work. Materialized views are more problematic IMO to do by hand.
ReplyDelete
Replies
jMay 06, 2010 4:06 AM
I TOTALLY agree with pilif, this would be a very important feature!

You could create indexes also, on a locale-basis so it's always fast.

"what I personally would love to see is the ability to do locale based sorting on a query-by-query basis instead of only being able to set LC_COLLATE on database creation time.

Even just being able to define different indexes with varying LC_COLLATE and then choosing which to use would be perfect."
ReplyDelete
Replies
splixMay 06, 2010 4:15 AM
Storage optimized for SSD drives
ReplyDelete
Replies
AnonymousMay 06, 2010 4:24 AM
Nested Transactions
ReplyDelete
Replies
UnknownMay 06, 2010 4:29 AM
Hehe, there are 34 comments before mine, listing about 25 different, significant and nontrivial features. Seems like we're not going to run out of things to do anytime soon. ;-)
ReplyDelete
Replies
AnonymousMay 06, 2010 4:55 AM
1) Manage transaction inside functions (allow commit)

2) autonomous transactions

3) altering composite types and enums without recrate it.
ReplyDelete
Replies
AnonymousMay 06, 2010 5:49 AM
Much better XPath support.
XQuery support.
better features for key/value pairs.
SSD tuning.
ReplyDelete
Replies
Karel GardasMay 06, 2010 5:53 AM
I'd like to see more development involved around Postgres-XC: multi-master synchronous pgsql cluster...
ReplyDelete
Replies
Matthew WakelingMay 06, 2010 7:47 AM
I have three performance suggestions:

1. Implement partial sorts. This is useful when you have a fast source of tuples that are sorted by field a, but you need tuples "ORDER BY a, b". Currently, Postgres loads the whole lot into memory and sorts it, whereas it only needs to sort each group of similar values of a.

2. A generalised multi-column R-tree index. What I mean is, in a similar way that you can create a multi-column B-tree index, which can answer queries like "a = 1 AND b > 1", it would be great to have a multi-column R-tree index, which can answer queries like "a < 1 AND b > 1". This may be what Jeff Davis is doing, but I haven't looked.

3. Planner risk analysis. There are queries where picking the lowest cost plan is not the sensible choice, if that plan is more risky than other plans. An example is a query with EXISTS. If matching rows are common in the table, then Postgres will use a sequential scan to answer EXISTS, assuming that it will find a matching row very quickly. However, if the table is ordered in some way, and all the matching rows are at the end of the table, a full sequential scan is performed, which can cause bafflingly slow queries, where an index would be very quick. We see this crop up on the performance mailing list quite frequently.
ReplyDelete
Replies
Craig RingerMay 06, 2010 8:00 AM
1) The to use multiple cores to execute a single CPU-heavy transaction.

2) Concurrent partition reading+processing for a single query

3) Clean fkey/pkey handling in partitions

I don't ask much, do I ;-) . I'm well aware of how absurdly improbable the first two are, and I'm not actually asking anyone to do anything. It's nothing more than a "gee, it'd be nice if..." .

I do think Pg's single-thread-per-backend, single-backend-per-connection design will begin to seriously hurt it as machines stop increasing in single-core performance as much and start sprouting more cores than you can count. At some point the horrid task of getting multi-threaded backends going is going to have to be tackled.

For that matter, separating query executor from connection state may become necessary, too. You can get around the very expensive connections (because each connection has a private executor that's idle most of the time) in Pg using an external pool, but it'd be really nice if people didn't have to worry about poolers to get the most out of Pg in situations with high client counts.
ReplyDelete
Replies
AnonymousMay 06, 2010 8:10 AM
@Peter Eisentraut: it's a basic fact of economics that humans always want (demand) more but resources (people, time, etc.) are scarce.

For those that have lots of resources (CPU, I/O bandwidth, etc.), some form of parallelization of queries would be interesting, e.g., breaking down a query over a partitioned table so that each partition is processed by a separate process (or thread).
ReplyDelete
Replies
AnonymousMay 06, 2010 8:23 AM
Based on past data warehouse experiences:

1) Materialized views.
2) Merge Statement
3) Multiple storage-engine architecture
4) Simpler Partitioning

I put number 3 in there as being a MySQL fan (though not as much anymore), I love the fact you can change the underlying storage engine and still retain the same interface. It has lead to some very cool projects like RethinkDB, Infobright, PBXT, etc. What I'm looking at specifically is the ability to plug in a column-oriented storage engine as they provide significant benefits for OLAP systems.
ReplyDelete
Replies
Merlin MoncureMay 06, 2010 8:24 AM
Writable CTE. This will change postgres. There are a couple of other things that would be really nice may not ever happen, like an in-process postmaster.
ReplyDelete
Replies
Del MachinaMay 06, 2010 8:24 AM
I think updateable and materialized views would be useful.

Ability to maximize use of multi-core CPUs and I/O partitions in single queries.

More cowbell. :)
ReplyDelete
Replies
Laurence RoweMay 06, 2010 8:25 AM
@MartinMarcher

Both SQLAlchemy and Django use psycopg2 by default. While there is no absolute consensus (choice is good!) there is, I would say, 90% consensus.
ReplyDelete
Replies
UnknownMay 06, 2010 9:34 AM
I'd love to see PostgreSQL provide some of the functionality of a graph database.

This. Having CTEs makes it easier to do "graph stuff" in PostgreSQL, but graph operations done this way don't scale particularly well from what I've seen. Putting some more thought into ways to make PostgreSQL excel at storing and querying graphs would be a wonderful thing.
ReplyDelete
Replies
RyanMay 06, 2010 9:50 AM
Clustered indexes. Not clustering a table using an index (seems to be some confusion here), but actual clustered indexes aka index organized tables. That is to store the entire table inside an index. The result is that when we're looking up something by primary key we only need to do the index scan to find the data we're interested in. Today we need to scan the index and then actually find the row on the heap.

On a similar (very) note, index only queries. With this we can select data directly from an index without needing to lookup the data from the underlying table.
ReplyDelete
Replies
robwalker01May 06, 2010 10:34 AM
I agree with many of the suggestions:

- Materialized/indexed views (e.g. useful for calculated fields in views that don't exist in underlying tables, etc.)
- Built-in partitioning syntax
- Index-only scans (essential for people who like to use natural keys for the primary key)

...But there is another feature that is related to the replication features that I don't hear much about:

- Built-in "database resynchronization"

Not everyone replicates for performance. I need replication for fault-tolerance/fail-over. It's useful to have hot-standby, but what do I do when the "down" server comes back up again? How do I re-sync the databases in a way that will minimize downtime, so that the data in both is the most current?
ReplyDelete
Replies
UnknownMay 06, 2010 10:40 AM
Materialized views and partitioning in the base syntax without the over-head required to do it today would be positively amazing. Index Organized tables and further scalability/reliability improvements as discussed in the post would also be very very very welcome.
ReplyDelete
Replies
UnknownMay 06, 2010 10:53 AM
Concentrate entirely on user friendliness. Postgres is so much less user friendly than MySQL. People like me are just running simple web sites, and the awesome database stuff beneath the covers in Postgres doesn't make any difference to us. Thus, we end up using MySQL just because it's so much easier. We even use *gasp* the dreaded MyISAM, because it doesn't make any difference for our applications.

Make Postgres way way way more user friendly in pretty much every aspect. Now that replication is there, this should be priority number one.
ReplyDelete
Replies
ca_broomeMay 06, 2010 11:05 AM
After hearing the effusive praise for Postgres I've been giving it a shot. If I could make a few requests: strengthen the UPDATE capabilities. For instance, allow group-by syntax in a update with joins. Also, I hear there's no "replace into" support. And this: "" doesn't equate into an empty string? Really? It's 2010 not 1980.
ReplyDelete
Replies
UnknownMay 06, 2010 11:32 AM
My vote: materialized views.

Currently we maintain our own "materialized views" as separate tables that we update via Java code. It works, but it's a pain, and it's easy to mess up. I would love it if the DB handled all that work for us.

Thanks!
ReplyDelete
Replies
AnonymousMay 06, 2010 11:46 AM
I'm no database guy, but my business associate won't use postgres because it doesn't have true record level locking for serial transactions. I know for most uses mvcc and concurrent transactions are faster, but some people have to use the serial transaction features and would really like true record level locking.

Parallel queries would also be nice.
ReplyDelete
Replies
AnonymousMay 06, 2010 12:03 PM
I would love to see SSI (serializable snapshot isolation) for truly serializable transactions.
Then you would know for sure there's no isolation anomalies despite having very complex transactions.
ReplyDelete
Replies
UnknownMay 06, 2010 12:04 PM
Materialized Views.

They're the only thing I really miss from my days using commercial database software.
ReplyDelete
Replies
AdamMay 06, 2010 12:07 PM
live materialized views would be massively useful for web apps.

at the moment a postgres user is forced to make tradeoffs between good design (normalization, referential integrity) and performance - basically it's too slow to do all the joins, and you're better off duplicating the data.

a live materialized view would give the best of both worlds - create a well structured DB, but then use materialized views to cache the overhead of flattening the tables.

web sites frequently run at 10,000-1 or higher read/write ratio, so it really doesn't matter a bit if there is significant cost to maintaining the view (as long as it's not a complete rebuild on every change..)
ReplyDelete
Replies
AnonymousMay 06, 2010 12:08 PM
1. autoconfiguration: I would love postgres to suggest or set its best guess at the optimal settings for shared_buffers, work_mem etc given my hardware.

2. query optimization with stored procedures: while the query optimizer is great for plain SQL queries it is not able to dig down and take into account the queries within stored procedures. That is, I have a SQL query that has a subquery with a call to PL/PGSQL stored procedure. Expand the ultimate query from all the calls and optimize that.

3. documentation: yes, the technical docs are good but what about books for newbs: "postgres for dummies", "postgres in 24 hours" etc. These are important in lowering the barrier to entry to non-DBAs. That your average Barnes & Noble / Borders etc. has 10+ books on MySQL, none on postgres I am sure drives traffic to mysql.

4. schema support: schema are not yet first class concepts with full support. E.g. in 8.3 there is not schema option for log_line_prefix and I cannot specify a schema in psql -c
ReplyDelete
Replies
jmcentireMay 06, 2010 12:22 PM
Can you tackle a big problem?

Distributed database with automated load-balancing, optimized querying, data replication, et cetera.

While temporal databases, graph-database-like interaction, data-cubes, and the multitude of RDBMS standards are all very nice to have and support...

RDBMSes are inherently based on relational algebra. Relational algebra ought to be distributable. I would like to be able to access my database as though it were a single server. But, when I need additional performance, add a few systems to a cloud and call it good. I don't want to worry that if one of those systems goes down, the data is lost. I don't want to be limited in my data capacity to that amount of storage available on the smallest machine. This might require that I have a separate disk performing asynchronous saves like git change-sets to a central data store. Each system would have to be ACID compliant with its operations including a commit to the queue on the persistent data store manager.

That's a big task, eh?
ReplyDelete
Replies
Hugo R. L.May 06, 2010 1:52 PM
I Vote for:

Parallel Query Optimizer (From Greemplum)
Converting SQL or MapReduce into a physical execution plan.
Using a cost-based optimization algorithm in which it evaluates a vast number of potential plans
and selects the one that it believes will lead to the most efficient query execution.
Take a global view of execution across the cluster, and factors in the cost of moving data between nodes
in any candidate plan.

Polymorphic Data Storage (From Greemplum)
Customers can tune the storage types and compression settings of different partitions within the same table.
A single partitioned table could (for example)
have older data stored as 'column-oriented with deep/archival compression',
more recent data as 'column-oriented with fast/light compression',
and the most recent data as 'read/write optimized' to support fast updates and deletes.

BITMAP INDEX like Oracle 9i Does

create bitmap index
idx_product_local
on
pos_product_local(ag_prod_cod_barra, ag_gdet_local)
from
pos_product_local ag ,pos_product a, pos_local pg
where
ag.ag_prod_cod_barra = a.prod_cod_barra and ag.ag_gdet_local = pg.gon_codigo;

MATERIALIZED VIEWS
Need of Built-in materialized views and updateables.

CONFIGURATION PARAMETER
The use multiple cores to execute a single CPU-heavy transaction.
ReplyDelete
Replies
Paul RamseyMay 06, 2010 2:23 PM
Fun post Robert! Here's a PostGIS take!
ReplyDelete
Replies
Douglas McCleanMay 06, 2010 3:05 PM
The killer database feature nobody seems to have been able to implement yet is good temporal support. Read "Temporal Data and the Relational Model by Date (no flame wars ;)) and Darwen, the stuff in there would save *tons* of application development time in lots of areas (accounting, contracts, scheduling) and improve application correctness and performance as a bonus.

Server-side support for graph algorithms would also be great. DFS, BFS, all-pairs shortest paths, single source shortest paths, connected components, minimum spanning trees, etc all have lots of applications but sometimes require access to huge amounts of data to produce quite small result sets.
ReplyDelete
Replies
AnonymousMay 06, 2010 4:15 PM
I would love to see better integration in .NET. A good LINQ to Postgres implementation would be really useful.
ReplyDelete
Replies
UnknownMay 06, 2010 4:15 PM
Being able to alter column position in a way that is better than what is listed at http://wiki.postgresql.org/wiki/Alter_column_position would be appreciated.
ReplyDelete
Replies
fdrMay 06, 2010 6:31 PM
MERGE or a simpler non-standard upsert-y statement.
ReplyDelete
Replies
UnknownMay 07, 2010 12:24 AM
+1 for better JDBC drivers. The ability to use postgresql to its full potential with Hibernate and JPA in general would be huge. It's my understanding that it doesn't yet support fully JDBC 4.0.
ReplyDelete
Replies
UnknownMay 07, 2010 1:16 AM
Postgres has been kind to me, but I still have two large problems with Postgres:

1) Partitioning

2) Clustered Indexes

The data sets I deal with are time ordered sets of continuous data 24/7. This naturally leads me to want to both cluster on timestamp and partition by time.

After realizing how much server side code had to be written to support partitioning (building up the next partition at the correct time, dropping old partitions, replacing triggers etc) I was actually a bit shocked. I suspect that many users besides myself are looking to parition by time ranges, so I think that support for automated paritioning based on some simple rules (per day, per 1,000,000 records) would be very useful.

Additionally, the lack of automatic clustering significantly hurts reporting on my latest partition that is still receiving incoming data. It is relatively easy for me to get our "historical" partitions clustered, but the current partition is not (and can't reasonably be) routinely clustered. Right now there is no real solution besides making the clustered range a bit smaller to limit the size of the non clustered data set, but making too many paritions seems to have a significant impact on insert performance. I suspect that this has improved in 9.0, but I haven't yet had time to test.
ReplyDelete
Replies
DSquiresMay 07, 2010 2:47 AM
First of all - shame on me for not having sent you guys a contribution yet. I will remedy this.

I've been using postgres for a small personal project: web-based (but socket driven) mmo space strategy game - infinite time / space, no "winner". the client is in silverlight of all things.

anyway. this is my first experience with postgres and i like it! since i'm in a continuous, heavy dev cycle and i'm still in early beta - i haven't learned much about the diagnostics and monitoring...

one thing i have used a lot is pg_dump... trying to automate beta deployments. it would be nice if -c supported "if exists, drop" and if -T also filtered out sequences. other than that...

YAY.

(oh also - if you want to play... i'll make you a beta account)
ReplyDelete
Replies
awe_czMay 07, 2010 5:13 AM
Although I'm happy with WAL shipping changes in 9.0 I'd still like to see full-blown multi-master replication.
ReplyDelete
Replies
UnknownMay 07, 2010 8:07 AM
Top 10 mentioned features so far:

1. Materialized views
2. Multiple CPUs/parallel query
3. MERGE
4. Automatically clustered indexes (index-only scans?)
5. Improved partitioning
6. Multimaster replication
7. Easier configuration/administration
8. Temporal features
9. "Real" procedures with transaction handling
10. Locale per column or query

Which is up to small variations almost exactly what http://postgresql.uservoice.com/forums/21853-general says.
ReplyDelete
Replies
Uncommon SenseMay 07, 2010 11:08 AM
Personally I'd like to see more "enterprisey" stuff:

1. Auditing capabilities similar to oracle. PG is getting pushed aside at my shop because of lack of SOX compliance auditing.

2. Triggers on "create" and "alter" SQL commands. I think this could potentially make replication much much easier to administer (you could trigger the replication system to replicate the change). Triggers on login/logout, startup/shutdown, and on errors (especially privilege errors for auditing) would be extremely useful.

3. More monitoring metrics to measure and tune PG parameters, such as autovac and bgwriter.

4. Read/Write capability while re-indexing. I have a 24x7x365 system and reindexing never happens.

5. Separate logs for connection logs, SQL logs, and system notice/warning/error logs.
ReplyDelete
Replies
pabloaMay 11, 2010 1:24 AM
I would like to see new features in table partitioning:

a) Horizontal partitioning. Right now there is some very useful Horizontal partitioning. It would be nice to have support at GUI level.
It would be nice to improve the current Horizontal partitioning. Version 8.2.x needed some triggers and some manual stuff.
I think it would be nice to create a table and define the Horizontal partition with DDL and let postgresql to solve the implementation details.

b) Alter a Horizontal partitioned table. Meaning alter using DDL the partition scheme.

c) Vertical Partitined Table. Same thing: It would be nice to have support at GUI and DDL level.
Perhaps assigned columns or groups of columns to different tablespaces. Same indexes.

d) Alter a vertical partitioned table.

:D
ReplyDelete
Replies
BerndMay 12, 2010 5:51 AM
More monitoring features. We want to know which component does a query realy slow down: parser? executer? IO-Time (read/write)? For us log_statement_stats, log_parser_stats, log_planner_stats, log_executor_stats are a little bit too much, but log_min_duration_statement(0) with the duration-time is not enough to drill down the what's realy going on inside of PG.
ReplyDelete
Replies
GregoryMay 24, 2010 8:14 PM
I'll add another one for "more information from the database". This could be:

* More built in queries/views for determining index/table sizes, which indexes are getting used and which aren't, when things are getting bloated, etc. I've been devouring some of the pgCon talks about these things but the information sometimes seems so difficult to get to unless someone posts some helpful system catalog queries to build that info.
* More options for logging useful information (into separate files that can be rotated/handled differently) that can later be parsed and analyzed in useful ways. Especially if some contrib modules were added to do that analysis.
* Anything to help me make more sense out of EXPLAIN ANALYZE. My company's main product is maturing to the point where I can go back to some of our problem queries to optimize, but I often don't know where to start. Identifying problems and hinting at potential solutions would help me immensely. Documentation, tools, GUIs, I don't care, I just need some help.
* Some way to sanity check various configuration/tuning options. So many of the optimization recommendations and best practices are outdated and confusing. I'd like a simple way to determine either what my best settings are or be able to create test suites to benchmark various settings. Really, even an updated best practices documentation with each release would be awesome.

Simple and easy things that would make my life easier:

* Being able to modify ENUMs without having to rebuild my table would be super dandy.
* Some kind of an "If 0 results on UPDATE, do an INSERT instead" functionality. Currently I implement this in application logic (which makes sense), but being able to fire a single statement at the DB instead would be faster and more efficient code-wise.

Beyond that, next-gen pie-in-the-sky type features I'd like to see include:
* Materialized views would be helpful in some circumstances. I'd actually been investigating rolling my own here without knowing about this generalized functionality.
* Better partitioning support. Being able to split my data up without all the caveats and manual setup would be dandy.
* Multi-master clusters, when combined with that partitioning support would really enhance scalability.

I've been using PostgreSQL since 7.0/7.1 and I have to say the progress that's been made has been incredible. That so many things have dropped off my wishlist since those days is very encouraging and I really look forward to seeing what the future holds.
ReplyDelete
Replies
AnonymousSeptember 20, 2010 9:10 AM
Thanks for sharing the link, but unfortunately it seems to be offline... Does anybody have a mirror or another source? Please answer to my post if you do!

I would appreciate if a staff member here at rhaas.blogspot.com could post it.

Thanks,
William
ReplyDelete
Replies
AnonymousSeptember 25, 2010 1:29 PM
Thanks for sharing this link, but unfortunately it seems to be offline... Does anybody have a mirror or another source? Please reply to my post if you do!

I would appreciate if a staff member here at rhaas.blogspot.com could post it.

Thanks,
Peter
ReplyDelete
Replies
Robert HaasSeptember 28, 2010 10:58 PM
@Anonymous - Which link is not working for you?
ReplyDelete
Replies
AnonymousNovember 14, 2010 7:45 AM
Hello,

I have a message for the webmaster/admin here at rhaas.blogspot.com.

Can I use part of the information from this blog post above if I provide a link back to this website?

Thanks,
Peter
ReplyDelete
Replies

Add comment