Replication, as it exists in PostgreSQL today, is physical replication. That is, the disk files as they exist on the standby are essentially byte-for-byte identical to the ones on the master (with the exception that the same hint bits may not be set). When a change is made on the master, the write-ahead log is streamed to the standby, which makes the same change to the corresponding disk block on the standby. The alternative is logical replication, in which what is transferred is not an instruction to write a certain sequence of bytes at a certain location, but the information that a certain tuple was inserted, or that a table with a given schema was created.
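To make the distinction concrete, here is a minimal toy sketch of the two kinds of change record. It is not PostgreSQL's actual WAL format: the names `PhysicalRecord`, `LogicalRecord`, `apply_physical`, and `apply_logical`, along with the 16-byte block size, are all invented for illustration.

```python
from dataclasses import dataclass

BLOCK_SIZE = 16  # toy value for illustration; real PostgreSQL pages are 8 kB by default


@dataclass
class PhysicalRecord:
    """Hypothetical record: 'write these bytes at this offset of this block'."""
    block: int
    offset: int
    data: bytes


@dataclass
class LogicalRecord:
    """Hypothetical record: 'this tuple was inserted into this table'."""
    table: str
    row: dict


def apply_physical(disk: bytearray, rec: PhysicalRecord) -> None:
    # The receiver needs no knowledge of tables or tuples; it just patches bytes.
    start = rec.block * BLOCK_SIZE + rec.offset
    disk[start:start + len(rec.data)] = rec.data


def apply_logical(tables: dict, rec: LogicalRecord) -> None:
    # The receiver decides for itself how and where to store the row.
    tables.setdefault(rec.table, []).append(dict(rec.row))


# Physical: master and standby apply the same byte-level change and stay identical.
master_disk = bytearray(BLOCK_SIZE * 4)
standby_disk = bytearray(BLOCK_SIZE * 4)
change = PhysicalRecord(block=2, offset=3, data=b"hello")
apply_physical(master_disk, change)
apply_physical(standby_disk, change)
physical_identical = master_disk == standby_disk

# Logical: the replica only learns what happened, not where the bytes live.
replica_tables = {}
apply_logical(replica_tables, LogicalRecord("accounts", {"id": 1, "balance": 100}))
```

In the physical case the standby has no choice about its on-disk layout, which is where most of the restrictions discussed below come from. In the logical case the receiver could be a different version, a different database, or a node that holds only some of the tables.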
The big advantage of physical replication is that it has very, very low overhead. Most of the write-ahead log records that are needed to make physical replication work are needed anyway for crash recovery. There is certainly some overhead, but it is quite low, making it the best form of replication for disaster recovery scenarios. Having said that, physical replication has a number of serious disadvantages:
1. You can't replicate to a different major version of PostgreSQL.
2. You certainly can't replicate to a database other than PostgreSQL.
3. You can't replicate part of the database.
4. You can't write any data at all on the standby.
5. You certainly can't do multi-master replication.
6. MVCC bloat on the master propagates to the standby.
7. Anything that would bloat the standby causes query cancellations instead, or delays in recovery (or in 9.1, you'll be able to avoid the query cancellation by bloating the master).
Logical replication, in theory, can work around all of these problems, and in fact there are already a number of existing projects which aim to provide logical replication for PostgreSQL, including Slony, Bucardo, and Londiste. All of these projects, however, have been hindered by lack of core support (as discussed at the last developer meeting, which I blogged a little bit about), and I think also by the fact that replication is just a really hard problem. There are also forks of PostgreSQL that aim to add replication functionality, such as Postgres-R and the now-defunct Mammoth Replicator, but it seems we don't yet have a solution that everyone is 100% happy with. Rather than a whole series of separate projects, I think we need one high-quality solution in core, or at the very least a set of powerful tools in core upon which solutions can be built more easily than what our current architecture allows. I think that trigger-based solutions are always going to carry an uncomfortably high performance cost.
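As a rough illustration of where the cost of trigger-based capture comes from, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in database. It is not how Slony, Bucardo, or Londiste are actually implemented; the `accounts` and `change_log` tables and the trigger names are hypothetical. The point is only that every replicated write turns into a second write, to a change log, performed inside the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);

-- Hypothetical queue table that a replication daemon would later drain.
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    tbl TEXT, op TEXT, id INTEGER, balance INTEGER
);

-- Each trigger performs an extra INSERT for every row changed.
CREATE TRIGGER accounts_ins AFTER INSERT ON accounts
BEGIN
    INSERT INTO change_log (tbl, op, id, balance)
    VALUES ('accounts', 'INSERT', NEW.id, NEW.balance);
END;

CREATE TRIGGER accounts_upd AFTER UPDATE ON accounts
BEGIN
    INSERT INTO change_log (tbl, op, id, balance)
    VALUES ('accounts', 'UPDATE', NEW.id, NEW.balance);
END;
""")

conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 150 WHERE id = 1")

# The logical change stream, as captured by the triggers.
captured = conn.execute(
    "SELECT op, id, balance FROM change_log ORDER BY seq"
).fetchall()
```

Two application-level writes produced four row writes in total, plus the trigger execution itself. Core support could, in principle, avoid this doubling by deriving the same logical stream from information the server already records.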
Given the low overhead of physical replication and the fact that we already have it, one might ask whether it would make more sense to continue refining that technology, rather than invent something new. Perhaps, but I'm skeptical. Logical replication would be a very hard project, but it would also cut a pretty big hole through the solution space. Even the most ambitious plans for improving our current hot-standby-based replication system don't involve relaxing more than one or two of the seven restrictions listed above, and the difficulties in getting even that far seem quite formidable. I'm glad we have a really high-quality physical replication solution for high availability and disaster recovery, but I fear that it's never going to really meet the full range of needs in this area.