I'm going to be starting an occasional series of blog postings comparing MySQL's architecture to PostgreSQL's architecture. Regular readers of this blog will already be aware that I know PostgreSQL far better than MySQL, having last used MySQL a very long time ago when both products were far less mature than they are today. So, my discussion of how PostgreSQL works will be based on first-hand knowledge, but discussion of how MySQL works will be based on research and - insofar as I'm can make it happen - discussion with people who know it better than I do. (Note: If you're a person who knows MySQL better than I do and would like to help me avoid making stupid mistakes, drop me an email.)
In writing these posts, I'm going to try to avoid making value judgments about which system is "better", and instead focus on describing how the architecture differs, and maybe a bit about the advantages of each architecture. I can't promise that it will be entirely unbiased (after all, I am a PostgreSQL committer, not a MySQL committer!) but I'm going to try to make it as unbiased as I can. Also, bearing in mind what I've recently been told by Baron Schwartz and Rob Wultsch, I'm going to focus completely on InnoDB and ignore MyISAM and all other storage engines. Finally, I'm going to focus on architectural differences. People might choose to use PostgreSQL because they hate Oracle, or MySQL because it's easier to find hosting, or either product because they know it better, and that's totally legitimate and perhaps worth talking about, but - partly in the interests of harmony among communities that ought to be allies - it's not what I'm going to talk about here.
So, all that having been said, what I'd like to talk about in this post is the way that MySQL and PostgreSQL store tables and indexes on disk. In PostgreSQL, table data and index data are stored in completely separate structures. When a new row is inserted, or when an existing row is updated, the new row is stored in any convenient place in the table. In the case of an update, we try to store the new row on the same page as the old row if there's room; if there isn't room or if it's an insert, we pick a page that has adequate free space and use that, or failing all else extend the table by one page and add the new row there. Once the table row is added, we cycle through all the indexes defined for the table and add an index entry to each one pointing at the physical position of the table row. One index may happen to be the primary key, but that's a fairly nominal distinction - all indexes are basically the same.
Under MySQL's InnoDB, the table data and the primary key index are stored in the same data structure. As I understand it, this is what Oracle calls an index-organized table. Any additional ("secondary") indexes refer to the primary key value of the tuple to which they point, not the physical position, which can change as leaf pages in the primary key index are split. Since this architecture requires every table to have a primary key, an internal row ID field is used as the primary key if no explicit primary key is specified.
Since Oracle supports both options, they are probably both useful. An index-organized table seems particularly likely to be useful when most lookups are by primary key, and most of the data in each row is part of the primary key anyway, either because the primary key columns are long compared with the remaining columns, or because the rows, overall, are short. Storing the whole row in the index avoids storing the same data twice (once in the index and once in the table), and the gain will be larger when the primary key is a substantial percentage of the total data. Furthermore, in this situation, the index page still holds as many, or almost as many, keys as it would if only a pointer were stored in lieu of the whole row, so one fewer random I/Os will be needed to access a given row.
When accessing an index-organized table via a secondary index, it may be necessary to traverse both the B-tree in the secondary-index, and the B-tree in the primary index. As a result, queries involving secondary indexes might be slower. However, since MySQL has index-only scans (PostgreSQL does not), it can sometimes avoid traversing the secondary index. So in MySQL, adding additional columns to an index might very well make it run faster, if it causes the index to function as a covering index for the query being executed. But in PostgreSQL, we frequently find ourselves telling users to pare down the columns in the index to the minimum set that is absolutely necessary, often resulting in dramatic performance gains. This is an interesting example of how the tuning that is right for one database may be completely wrong for another database.
I've recently learned that neither InnoDB nor PostgreSQL supports traversing an index in physical order, only in key order. For InnoDB, this means that ALL scans are performed in key order, since the table itself is, in essence, also an index. As I understand it, this can make a large sequential scan quite slow, by defeating the operating system's prefetch logic. In PostgreSQL, however, because tables are not index-organized, sequential scans are always performed in physical order, and don't require looking at the indexes at all; this also means we can skip any I/O or CPU cost associated with examining non-leaf index pages. Traversing in physical order is apparently difficult from a locking perspective, although it must be possible, because Oracle supports it. It would be very useful to see this support in MySQL, and once PostgreSQL has index-only scans, it would be a useful improvement for PostgreSQL, too.
One final difficulty with an index-organized table is that you can't add, drop, or change the primary key definition without a full-table rewrite. In PostgreSQL, on the other hand, this can be done - even while allowing concurrent read and write activity. This is a fairly nominal advantage for most use cases since the primary key of a table rarely changes - I think it's happened to me only once or twice in the last ten years - but it is useful when it does comes up.
I hope that the above is a fair and accurate summary of the topic, but I'm sure I've missed a few things and covered others incompletely or in less detail than might be helpful. Please feel free to respond with a comment below or a blog post of your own if I've missed something.
[Update: See also part 2 of this series, discussing VACUUM vs. Purge]