Almost two months ago, I wrote part one of what I indicated would be an occasional series of blog posts comparing the architecture of PostgreSQL to that of MySQL. Here's part two. Please note that the caveats set forth in part one apply to this and all future installments as well, so if you haven't read part one already, please click on the link above and read at least the first two paragraphs before reading this post.
Both InnoDB and PostgreSQL - as well as many other databases - use a technique called multi-version concurrency control (MVCC) to provide transaction isolation: transactions should not see the work of other, uncommitted transactions. MVCC means that, when a row is updated, the database stores both the old and new versions of the row. The alternative would be to lock the updated row until the inserting transaction committed, so that no one else could see the uncommitted update - but this is an unpopular approach, because a long-running transaction can acquire many locks and bring the system grinding to a halt. Even after a transaction commits, the old version of the row may continue to be seen by commands or transactions that began before the commit, so that their view of the database doesn't suddenly change under them. Abrupt changes in the way the client sees the database can lead to very surprising results, so both databases try to avoid letting that happen.
There's a fairly obvious problem with this scheme, though: if an update preserves the old version of the row while also storing the new one, then something's going to have to eventually go back and get rid of the old row, or the database will just keep growing and growing as rows are updated. That something, in the case of InnoDB, is purge, and in the case of PostgreSQL, is VACUUM. In both cases, the old version of a row that has been updated can be removed once all transactions or commands that could possibly wish to refer to the old version have completed. In both cases, also, deleted rows are kept around until any commands or transactions that might wish to see them have completed, and those must be removed as well. However, the implementations are quite different in detail.
In InnoDB, only the most recent version of an updated row is retained in the table itself. Old versions of updated rows are moved to the rollback segment, while deleted row versions are left in place and marked for future cleanup. Thus, purge must get rid of any deleted rows from the table itself, and clear out any old versions of updated rows from the rollback segment. All the information necessary to find the deleted records that might need to be purged is also written to the rollback segment, so it's quite easy to find the rows that need to be cleaned out; and the old versions of the updated records are all in the rollback segment itself, so those are easy to find, too. One small downside of this approach is that performing an update means writing two tuples - the old one must be copied to the undo tablespace, and the new one must be written in its place.
PostgreSQL takes a completely different approach. There is no rollback tablespace, or anything similar. When a row is updated, the old version is left in place; the new version is simply written into the table along with it. As in InnoDB, records deleted are marked for future cleanup, but without also writing a record to the rollback tablespace. Both of these differences result in slightly less work when the operation is initially performed, but the payback is the eventual cleanup is more expensive. Lacking a centralized record of what must be purged, PostgreSQL's VACUUM has historically needed to scan the entire table to look for records that might require cleanup. Beginning in PostgreSQL 8.3, there is an optimization called HOT (for "heap only tuple") which allows some vacuuming to be done on the fly in single-page increments; beginning in PostgreSQL 8.4 and higher, the system maintains a bitmap, called the visibility map, which indicates which pages of the table might possibly contain tuples in need of cleanup, and VACUUM can scan only those pages. However, a full scan of each index is still required during each VACUUM, make it still a somewhat expensive operation for large tables.
Since both systems can potentially store multiple versions of any given tuple, both can lead to "bloat", where the size of a table grows far beyond the amount of data actually stored in it, the bloat shows up in different places. Under InnoDB, most of the bloat (with the exception of any not-yet-removable deleted rows) is in the rollback tablespace, whereas in PostgreSQL it's mixed in with the actual table data. In either case, the problem occurs mostly (a) in the presence of long-running transactions or (b) when VACUUM and/or purge can't keep up with the rate at which old row versions are accumulating and must be removed. Beginning in PostgreSQL 8.3, VACUUM is typically performed automatically in the background, using multiple (three by default) concurrent worker processes. MySQL performs purges in the background, but it is single-threaded. Percona Server, and possibly other MySQL forks, offer multiple purge threads.
As in my previous post, I want to avoid making statements about which system is better. It's probably fairly clear from the previous paragraphs that I am much more knowledgeable about both the improvements in recent PostgreSQL releases and the limitations that remain than I am about anything on the MySQL side. Please feel free to offer comments, corrections, or additional details below.