When people talk to me about the (limited implementation of) join removal that will be part of PostgreSQL 9.0, the conversation usually goes in two ways. Some people ask how the feature works and then say something like "oh, I guess that could be useful every once in a while". Other people already know exactly how the feature works and usually say some variant of "this is an amazingly wonderful feature that I am looking forward to with great enthusiasm".
The difference between these two groups of people (I think) is not so much their level of technical knowledge or how closely they've been following pgsql-hackers, but their use case. If your database is primarily a data warehouse, my guess is that you won't have many occasions to benefit from join removal. Where this feature really comes in handy is in OLTP workloads with highly normalized data, in situations where users are generating queries against views (perhaps through some sort of reporting interface) and expecting to get results back immediately.
Let's take an example. Suppose you're writing a bug-tracking system. Each bug has a number of properties associated with it: who reported it, who's working on it, current status, priority, date opened, release for which it's slated to be fixed, date of last status change, date resolved, people who want to get an email when it's updated, comments, etc. If like me you're a big fan of database normalization, you'll not want to store all of these as text fields. So you might end up with a table like this:
CREATE TABLE bug (
reporter_id integer not null references person (id),
assigned_to_id integer references person (id),
status_id integer not null references bug_status (id),
priority_id integer not null references priority (id),
target_release_id integer references release (id),
open_date date not null,
primary key (id)
You'll probably also end up with some supplementary tables for the items that can exist multiple times, like bug_comment and bug_watchers. Now, to make reporting easier, you'll probably want to define a view over the bug table that joins to all the other tables, so that it's easy to get the text values for the reporter, status, etc.
CREATE VIEW bug_view AS
b.reporter_id, r.name AS reporter,
b.assigned_to_id, at.name AS assigned_to,
b.status_id, s.name AS status,
b.priority_id, p.name AS priority,
b.target_release_id, tr.name AS target_release,
JOIN person r ON b.reporter_id = r.id
JOIN bug_status s ON b.status_id = s.id
JOIN priority p ON b.priority_id = p.id
LEFT JOIN person at ON b.assigned_to_id = at.id
LEFT JOIN release tr ON b.target_release_id = tr.id;
And now you can pretty easily write an engine that will let users select the columns they'd like to see from bug_view and the filter conditions they'd like to apply (only open bugs, only bugs slated to be resolved in release X, etc.) via a spiffy web interface. Note that the reporter, bug status, and priority fields can't be null, so we can use a plain old JOIN, but the bug might be assigned to no one or have no target release, so we use LEFT JOIN in those cases. (Otherwise, rows where those fields were NULL would not appear in the output.)
Over time, you'll tend to add more fields. Scalar fields like open_date don't cause much heartache, but as you add more fields that require joins, your view will tend to slow down. Some people might say that the answer is simply to denormalize - use natural keys in the bug table, and don't join. While that solution may be appropriate for some people, it is not without its downsides: database normalization was invented for a reason. The good news is that PostgreSQL is fast and has an excellent query planner, so even fairly complex queries run quite quickly. The bad news is that every query against the view is going to hit every table that's part of the view definition, so if you add enough of them, it's eventually going to be slow.
And, realistically, most of the time, users aren't going to want all the columns anyway. In a web application, 8-12 columns of output in an HTML table is typically about as much as you can squeeze in without starting to have a lot of line wrapping issues. This leads pretty naturally to the following question: if you don't need all of the columns, can you skip some of those joins and speed up the query?
Yes. In PostgreSQL 9.0, we can drop a join against a base table if (1) it's a left join, (2) there is a unique index on all or a subset of the join columns, and (3) none of the attributes from the nullable side of the join are used elsewhere in the query. So, in the above example, we could skip the joins to person at or release tr if the assigned_to or target_release columns, respectively, are not selected, assuming those tables have unique indexes on their id columns (if they don't, the join might change the number of rows in the output, so we must perform it).
We can't skip joining to any of the other tables, because those are inner joins. That's an implementation restriction which I hope will be lifted in PostgreSQL 9.1, but some more logic is needed to make that safe. In the meantime, a useful workaround may be to write those joins as LEFT JOINs rather the INNER JOINs, in cases where either join type will produce the same results.