Comments on Robert Haas: Big Ideas

2010-11-14T07:45:17.573-05:00

Hello,

I have a message for the webmaster/admin here at rhaas.blogspot.com.

Can I use part of the information from this blog post above if I provide a link back to this website?

Thanks,
Peter

2010-09-28T22:58:55.575-04:00

@Anonymous - Which link is not working for you?

2010-09-25T13:29:19.420-04:00

Thanks for sharing this link, but unfortunately it seems to be offline... Does anybody have a mirror or another source? Please reply to my post if you do!

I would appreciate it if a staff member here at rhaas.blogspot.com could post it.

Thanks,
Peter

2010-09-20T09:10:50.545-04:00

Thanks for sharing the link, but unfortunately it seems to be offline... Does anybody have a mirror or another source? Please reply to my post if you do!

I would appreciate it if a staff member here at rhaas.blogspot.com could post it.

Thanks,
William

2010-05-24T20:14:22.038-04:00

I'll add another one for "more information from the database". This could be:

* More built-in queries/views for determining index/table sizes, which indexes are getting used and which aren't, when things are getting bloated, etc. I've been devouring some of the PGCon talks about these things, but the information often seems difficult to get at unless someone posts helpful system catalog queries to build it.
* More options for logging useful information (into separate files that can be rotated/handled differently) that can later be parsed and analyzed in useful ways. Especially if some contrib modules were added to do that analysis.
* Anything to help me make more sense out of EXPLAIN ANALYZE. My company's main product is maturing to the point where I can go back to some of our problem queries to optimize, but I often don't know where to start. Identifying problems and hinting at potential solutions would help me immensely. Documentation, tools, GUIs, I don't care, I just need some help.
* Some way to sanity check various configuration/tuning options. So many of the optimization recommendations and best practices are outdated and confusing. I'd like a simple way to determine either what my best settings are or be able to create test suites to benchmark various settings. Really, even an updated best practices documentation with each release would be awesome.

Simple and easy things that would make my life easier:

* Being able to modify ENUMs without having to rebuild my table would be super dandy.
* Some kind of an "If 0 results on UPDATE, do an INSERT instead" functionality. Currently I implement this in application logic (which makes sense), but being able to fire a single statement at the DB instead would be faster and more efficient code-wise.

Beyond that, next-gen pie-in-the-sky type features I'd like to see include:
* Materialized views would be helpful in some circumstances. I'd actually been investigating rolling my own here without knowing about this generalized functionality.
* Better partitioning support. Being able to split my data up without all the caveats and manual setup would be dandy.
* Multi-master clusters, when combined with that partitioning support would really enhance scalability.

I've been using PostgreSQL since 7.0/7.1 and I have to say the progress that's been made has been incredible. That so many things have dropped off my wishlist since those days is very encouraging and I really look forward to seeing what the future holds.
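The "if 0 results on UPDATE, do an INSERT instead" pattern from the comment above, done in application logic, might look roughly like the sketch below. It is a minimal illustration, not anyone's actual code: it uses Python's standard sqlite3 module so it runs standalone, and the counters table and the update_or_insert helper are hypothetical. It also shows why a single server-side statement is attractive: two concurrent sessions can both see 0 updated rows and both try to INSERT, so the unique key rejects one of them and the application has to retry.

```python
import sqlite3


def update_or_insert(conn: sqlite3.Connection, key: str, delta: int) -> None:
    """Hypothetical helper: add `delta` to an existing counter row, or create it."""
    cur = conn.execute(
        "UPDATE counters SET value = value + ? WHERE key = ?", (delta, key)
    )
    if cur.rowcount == 0:
        # The UPDATE matched nothing, so fall back to an INSERT.
        # Caveat: under concurrency two sessions can both reach this branch;
        # the PRIMARY KEY then rejects one INSERT and the caller must retry.
        conn.execute(
            "INSERT INTO counters (key, value) VALUES (?, ?)", (key, delta)
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER NOT NULL)")
    update_or_insert(conn, "hits", 1)  # no row yet: INSERT
    update_or_insert(conn, "hits", 2)  # row exists: UPDATE
    print(conn.execute("SELECT value FROM counters WHERE key = 'hits'").fetchone()[0])  # prints 3
```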

2010-05-12T05:51:31.596-04:00

More monitoring features. We want to know which component is really slowing a query down: the parser? The executor? I/O time (read/write)? For us, log_statement_stats, log_parser_stats, log_planner_stats and log_executor_stats are a little too much, but log_min_duration_statement = 0, with its duration output, is not enough to drill down into what's really going on inside PG.

2010-05-11T01:24:19.142-04:00

I would like to see new features in table partitioning:

a) Horizontal partitioning. Right now there is some very useful horizontal partitioning. It would be nice to have support for it at the GUI level.
It would be nice to improve the current horizontal partitioning. Version 8.2.x needed some triggers and some manual work.
I think it would be nice to create a table, define the horizontal partitioning with DDL, and let PostgreSQL solve the implementation details.

b) Altering a horizontally partitioned table, i.e. changing the partition scheme using DDL.

c) Vertically partitioned tables. Same thing: it would be nice to have support at the GUI and DDL level.
Perhaps assign columns or groups of columns to different tablespaces. Same for indexes.

d) Altering a vertically partitioned table.

:D

2010-05-07T11:08:39.714-04:00

Personally I'd like to see more "enterprisey" stuff:

1. Auditing capabilities similar to Oracle's. PG is getting pushed aside at my shop because of a lack of SOX compliance auditing.

2. Triggers on "create" and "alter" SQL commands. I think this could potentially make replication much, much easier to administer (you could trigger the replication system to replicate the change). Triggers on login/logout, startup/shutdown, and on errors (especially privilege errors for auditing) would be extremely useful.

3. More monitoring metrics to measure and tune PG parameters, such as autovac and bgwriter.

4. Read/Write capability while re-indexing. I have a 24x7x365 system and reindexing never happens.

5. Separate logs for connection logs, SQL logs, and system notice/warning/error logs.

2010-05-07T08:07:22.284-04:00

Top 10 mentioned features so far:

1. Materialized views
2. Multiple CPUs/parallel query
3. MERGE
4. Automatically clustered indexes (index-only scans?)
5. Improved partitioning
6. Multimaster replication
7. Easier configuration/administration
8. Temporal features
9. "Real" procedures with transaction handling
10. Locale per column or query

Up to small variations, this is almost exactly what http://postgresql.uservoice.com/forums/21853-general says.

2010-05-07T05:13:42.558-04:00

Although I'm happy with WAL shipping changes in 9.0 I'd still like to see full-blown multi-master replication.

2010-05-07T02:47:27.632-04:00

First of all - shame on me for not having sent you guys a contribution yet. I will remedy this.

I've been using Postgres for a small personal project: a web-based (but socket-driven) MMO space strategy game with infinite time/space and no "winner". The client is in Silverlight, of all things.

Anyway, this is my first experience with Postgres and I like it! Since I'm in a continuous, heavy dev cycle and still in early beta, I haven't learned much about the diagnostics and monitoring...

One thing I have used a lot is pg_dump, trying to automate beta deployments. It would be nice if -c supported "if exists, drop" and if -T also filtered out sequences. Other than that...

YAY.

(Oh, also: if you want to play, I'll make you a beta account.)

2010-05-07T01:16:23.383-04:00

Postgres has been kind to me, but I still have two large problems with Postgres:

1) Partitioning

2) Clustered Indexes

The data sets I deal with are time ordered sets of continuous data 24/7. This naturally leads me to want to both cluster on timestamp and partition by time.

After realizing how much server-side code had to be written to support partitioning (building the next partition at the correct time, dropping old partitions, replacing triggers, etc.), I was actually a bit shocked. I suspect that many users besides myself want to partition by time ranges, so I think that support for automated partitioning based on some simple rules (per day, per 1,000,000 records) would be very useful.

Additionally, the lack of automatic clustering significantly hurts reporting on my latest partition, which is still receiving incoming data. It is relatively easy for me to get our "historical" partitions clustered, but the current partition is not (and can't reasonably be) routinely clustered. Right now there is no real solution besides making the clustered range a bit smaller to limit the size of the non-clustered data set, but making too many partitions seems to have a significant impact on insert performance. I suspect that this has improved in 9.0, but I haven't yet had time to test.
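The rule-based partitioning asked for here boils down to simple arithmetic plus generated DDL. The sketch below, in Python and assuming inheritance-based partitioning (child tables with CHECK constraints, as used with constraint_exclusion in 8.x/9.0), computes the partition for a per-day rule and a per-1,000,000-rows rule and emits the CREATE TABLE ... INHERITS statement for a daily child. The parent table readings and the timestamp column ts are hypothetical names; the insert-routing trigger, indexes, and dropping of old partitions that the comment mentions are not shown.

```python
from datetime import date, datetime, timedelta


def daily_partition(ts: datetime, parent: str = "readings") -> tuple[str, date, date]:
    """Per-day rule: partition name plus [lower, upper) bounds for the day holding ts."""
    day = ts.date()
    return f"{parent}_{day:%Y%m%d}", day, day + timedelta(days=1)


def count_partition(row_number: int, rows_per_partition: int = 1_000_000,
                    parent: str = "readings") -> tuple[str, int, int]:
    """Per-N-records rule: partition name plus [lower, upper) row-number bounds."""
    index = row_number // rows_per_partition
    lower = index * rows_per_partition
    return f"{parent}_p{index}", lower, lower + rows_per_partition


def child_table_ddl(ts: datetime, parent: str = "readings", column: str = "ts") -> str:
    """DDL for an inheritance-based child table whose CHECK constraint covers one day."""
    name, lower, upper = daily_partition(ts, parent)
    return (
        f"CREATE TABLE {name} ("
        f"CHECK ({column} >= DATE '{lower.isoformat()}' AND {column} < DATE '{upper.isoformat()}')"
        f") INHERITS ({parent});"
    )


if __name__ == "__main__":
    # Build tomorrow's partition ahead of time, as a scheduled job might.
    print(child_table_ddl(datetime.now() + timedelta(days=1)))
```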

2010-05-07T00:24:13.998-04:00

+1 for better JDBC drivers. The ability to use PostgreSQL to its full potential with Hibernate and JPA in general would be huge. It's my understanding that the driver doesn't yet fully support JDBC 4.0.

2010-05-06T18:31:43.913-04:00

MERGE or a simpler non-standard upsert-y statement.

2010-05-06T16:15:58.548-04:00

Being able to alter column position in a way that is better than what is listed at http://wiki.postgresql.org/wiki/Alter_column_position would be appreciated.

2010-05-06T16:15:04.824-04:00

I would love to see better integration in .NET. A good LINQ to Postgres implementation would be really useful.

2010-05-06T15:05:50.985-04:00

The killer database feature nobody seems to have been able to implement yet is good temporal support. Read "Temporal Data and the Relational Model" by Date (no flame wars ;)) and Darwen; the stuff in there would save *tons* of application development time in lots of areas (accounting, contracts, scheduling) and improve application correctness and performance as a bonus.

Server-side support for graph algorithms would also be great. DFS, BFS, all-pairs shortest paths, single-source shortest paths, connected components, minimum spanning trees, etc. all have lots of applications, but they sometimes require access to huge amounts of data to produce quite small result sets.
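Until something like this exists natively, one hop-count traversal can already be pushed to the server with a recursive CTE (PostgreSQL has supported WITH RECURSIVE since 8.4), so only the small result set leaves the database. The sketch below runs the query against SQLite through Python's sqlite3 module purely so it is self-contained; the edges(src, dst) table and the hop_distances helper are hypothetical. For every node reachable from start within max_depth hops, it returns the same shortest hop distance a breadth-first search would; the depth cap is what guarantees termination on cyclic graphs.

```python
import sqlite3

# Shortest hop distances from one start node, computed inside the database
# so only the (small) result set is returned to the client.
BFS_SQL = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, r.depth + 1
      FROM edges AS e
      JOIN reach AS r ON e.src = r.node
     WHERE r.depth < ?          -- depth cap guarantees termination on cycles
)
SELECT node, MIN(depth) FROM reach GROUP BY node
"""


def hop_distances(conn: sqlite3.Connection, start: int, max_depth: int) -> dict[int, int]:
    """Map each node reachable from `start` (within max_depth hops) to its hop count."""
    return dict(conn.execute(BFS_SQL, (start, max_depth)).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)", [(1, 2), (2, 3), (1, 3), (3, 1), (4, 5)]
    )
    print(hop_distances(conn, 1, max_depth=10))
```

Because UNION discards duplicate (node, depth) rows, the recursion produces at most one row per node per depth level, but it is still a level-by-level expansion rather than a true visited-set BFS, so a tight max_depth matters on large graphs.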

2010-05-06T14:23:51.776-04:00

Fun post Robert! Here's a PostGIS take!

2010-05-06T13:52:05.383-04:00

I Vote for:

Parallel Query Optimizer (from Greenplum)
Converts SQL or MapReduce into a physical execution plan.
Uses a cost-based optimization algorithm that evaluates a vast number of potential plans and selects the one it believes will lead to the most efficient query execution.
Takes a global view of execution across the cluster, and factors in the cost of moving data between nodes in any candidate plan.

Polymorphic Data Storage (from Greenplum)
Customers can tune the storage types and compression settings of different partitions within the same table.
A single partitioned table could, for example, have older data stored as 'column-oriented with deep/archival compression', more recent data as 'column-oriented with fast/light compression', and the most recent data as 'read/write optimized' to support fast updates and deletes.

BITMAP INDEX like Oracle 9i does

create bitmap index
idx_product_local
on
pos_product_local(ag_prod_cod_barra, ag_gdet_local)
from
pos_product_local ag ,pos_product a, pos_local pg
where
ag.ag_prod_cod_barra = a.prod_cod_barra and ag.ag_gdet_local = pg.gon_codigo;

MATERIALIZED VIEWS
Need for built-in, updatable materialized views.

CONFIGURATION PARAMETER
The ability to use multiple cores to execute a single CPU-heavy transaction.

2010-05-06T12:22:10.160-04:00

Can you tackle a big problem?

Distributed database with automated load-balancing, optimized querying, data replication, et cetera.

While temporal databases, graph-database-like interaction, data-cubes, and the multitude of RDBMS standards are all very nice to have and support...

RDBMSes are inherently based on relational algebra, and relational algebra ought to be distributable. I would like to be able to access my database as though it were a single server, but when I need additional performance, add a few systems to a cloud and call it good. I don't want to worry that if one of those systems goes down, the data is lost. I don't want my data capacity to be limited to the amount of storage available on the smallest machine. This might require a separate disk performing asynchronous saves, like Git changesets, to a central data store. Each system would have to be ACID-compliant in its operations, including a commit to the queue on the persistent data store manager.

That's a big task, eh?

2010-05-06T12:08:41.235-04:00

1. Autoconfiguration: I would love Postgres to suggest or set its best guess at the optimal settings for shared_buffers, work_mem, etc. given my hardware.

2. Query optimization with stored procedures: while the query optimizer is great for plain SQL queries, it is not able to dig down and take into account the queries within stored procedures. For example, I have a SQL query with a subquery that calls a PL/pgSQL stored procedure. I'd like the planner to expand the ultimate query from all the calls and optimize that.

3. Documentation: yes, the technical docs are good, but what about books for newbies: "Postgres for Dummies", "Postgres in 24 Hours", etc.? These are important in lowering the barrier to entry for non-DBAs. I am sure the fact that your average Barnes & Noble or Borders has 10+ books on MySQL and none on Postgres drives traffic to MySQL.

4. Schema support: schemas are not yet first-class concepts with full support. E.g. in 8.3 there is no schema option for log_line_prefix, and I cannot specify a schema in psql -c.

2010-05-06T12:07:06.187-04:00

Live materialized views would be massively useful for web apps.

At the moment a Postgres user is forced to make tradeoffs between good design (normalization, referential integrity) and performance: basically, it's too slow to do all the joins, and you're better off duplicating the data.

A live materialized view would give the best of both worlds: create a well-structured DB, but then use materialized views to cache the overhead of flattening the tables.

Web sites frequently run at a 10,000:1 or higher read/write ratio, so it really doesn't matter a bit if there is significant cost to maintaining the view (as long as it's not a complete rebuild on every change...).
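A hand-rolled approximation of the "live" materialized view described here (the kind of thing an earlier commenter mentions rolling their own) is a flattened table kept current by triggers that apply each change incrementally rather than rebuilding. The sketch uses SQLite via Python's sqlite3 module only so it runs standalone; the customers, orders, and customer_totals tables are hypothetical, and only INSERTs are handled, so UPDATE and DELETE on either base table would need matching triggers.

```python
import sqlite3

# Hypothetical schema: two normalized tables plus a flattened, pre-aggregated
# table (customer_totals) that plays the role of a "live" materialized view.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount INTEGER NOT NULL
);
CREATE TABLE customer_totals (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    order_count INTEGER NOT NULL,
    total INTEGER NOT NULL
);
-- Each new customer gets an empty row in the flattened table.
CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO customer_totals VALUES (NEW.id, NEW.name, 0, 0);
END;
-- Each new order adjusts exactly one row incrementally; no full rebuild.
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    UPDATE customer_totals
       SET order_count = order_count + 1,
           total = total + NEW.amount
     WHERE customer_id = NEW.customer_id;
END;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def read_totals(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    # Readers hit the flattened table directly instead of joining and aggregating.
    return conn.execute(
        "SELECT name, order_count, total FROM customer_totals ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)", [(1, 10), (1, 5)]
    )
    print(read_totals(conn))
```

Writes pay a small extra cost per order, while reads hit customer_totals directly with no join or aggregation, which is the trade-off the comment describes for read-heavy web sites.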

2010-05-06T12:04:25.214-04:00

Materialized Views.

They're the only thing I really miss from my days using commercial database software.

2010-05-06T12:03:19.501-04:00

I would love to see SSI (serializable snapshot isolation) for truly serializable transactions.
Then you would know for sure there are no isolation anomalies, even in very complex transactions.

2010-05-06T11:46:00.746-04:00

I'm no database guy, but my business associate won't use Postgres because it doesn't have true record-level locking for serial transactions. I know that for most uses MVCC and concurrent transactions are faster, but some people have to use the serial transaction features and would really like true record-level locking.

Parallel queries would also be nice.