Thursday, May 20, 2010

Global Temporary and Unlogged Tables

From a technical standpoint, temporary tables in PostgreSQL have three properties that distinguish them from ordinary tables:

1. They're stored in a special schema, so that they are normally visible only to the creating backend.
2. They are managed by the local buffer manager rather than the shared buffer manager.
3. They are not WAL-logged.

It makes sense to think about removing these properties one by one, in the order listed above. Removing just the first property, without doing anything else, doesn't quite make sense, because a table which is managed by the local buffer manager can't be simultaneously accessed by multiple backends. We could work around this by having each backend access a separate set of files. This would give us a global temporary table - that is, a table which is visible to everyone, but each backend sees its own contents. (There is some debate about whether this is the right name, or what the right name for this concept might be - but that's what I'm calling it for now.)

Removing both of the first two properties also makes sense. It gives us an unlogged table - that is, a basically ordinary table for which no WAL is written. (Again, the naming is debatable.) Such tables are not crash-safe: an unexpected system crash could leave the table hopelessly corrupted. The only obvious workaround for this problem is to truncate the table on every system restart.

Why might someone want these new table types? Global temporary tables are appealing for users who need temporary tables with a relatively fixed structure, and don't want to recreate them in every new session. In addition to administrative convenience, this avoids the overhead of repeatedly creating and vacuuming the system catalog entries associated with the temporary tables, which may be a performance benefit for some users.

Unlogged tables are appealing for data that needs to be shared across backends, but which we're willing to lose in the case of a server restart. For example, consider a web application maintaining a table of active user sessions. If the server restarts, we may be willing to lose this data. Everyone will need to log in again, but considering that database crashes are rare, that may not be such a big deal. Unlogged tables also won't be replicated to standby servers, since replication relies on WAL. But, on the plus side, skipping WAL-logging should hopefully yield a significant performance benefit.

I'm going to be working on implementing both of these table types for PostgreSQL 9.1. In each case, the hardest part seems to be making sure that we clean up properly after a crash or server restart.

7 comments:

  1. Very useful features. If half the ideas floating around for 9.1 make it in that release I'll be more excited about it than the upcoming one!

    ReplyDelete
  2. PG 9.1 was released and it doesn't have temp tables wich can survie session destruction. Do You know if these kind of table will be ever implemented?

    ReplyDelete
  3. @MK: I'm not quite sure what you mean by temporary tables which can survive session destruction. 9.1 does have unlogged tables, which might do what you need. Global temporary tables are not implemented yet, but might be someday (probably not for 9.2, though).

    ReplyDelete
  4. I'm looking forward to seeing this! Maybe in 9.3?

    ReplyDelete
  5. I'm looking forward to seeing the global temporary tables. Maybe in 9.3?

    My use case: In a web app, at the beginning of the request, I'd like to set a variable that represents the current user's id. I'd like to be able to access that user's id in subsequent queries, views, etc.

    ReplyDelete
  6. Working on a PeopleSoft on Oracle project, just started wishing that Oracle was as cool as PostgreSQL and had unlogged tables. There is a table that is daily truncated and loaded (summary data to improve reporting performance). Not logging that monster would make the logs take a lot less disk, and would make it require a lot less bandwidth to a hot fail over server.

    New use case: Reduce bandwidth to a hot fail over server when the data can be reproduced on the target system.

    ReplyDelete
  7. A useful thing would be to let some tables be stored on a ramdisk only, while others on normal hard disks. AFAIK this is not possible today.

    Not all tables have the same "importance". Some may be used for temporary data, even heavily used, but non critical at all. Doing so would improve performance, and reduce the stress on hard disks and, even more, on solid state drives.

    ReplyDelete