Wednesday, December 08, 2021

Surviving Without A Superuser - Part One

PostgreSQL users and developers are generally aware that it is best to minimize the number of tasks performed as superuser, just as at the operating system level most Linux and UNIX users are aware that it's best not to do too many things as root. For that reason, PostgreSQL has over the last few years introduced a number of predefined roles that have special privileges and which in some case can be used in place of the superuser role. For instance, the pg_read_all_data role, new in version 14, has the ability to read all data in every table in the database - not only the tables that currently exist, but any that are created in the future. In earlier versions, you could achieve this effect only by handing out superuser permissions, which is not great, because the superuser role can do much more than just read all the data in the database. The new predefined role allows for a very desirable application of the principle of least privilege.

Monday, November 29, 2021

Collation Stability

When PostgreSQL needs to sort strings, it relies on either the operating system (by default) or the ICU collation library (if your PostgreSQL has been built with support for ICU and you have chosen to use an ICU-based collation) to tell it in what order the strings ought to be sorted. Unfortunately, operating system behaviors are confusing and inconsistent, and they change relatively frequently for reasons that most people can't understand. That's a problem for PostgreSQL users, especially PostgreSQL users who create indexes on text columns. The first step in building a btree index to sort the data, and if this sort order differs from the one used for later index lookups, data that is actually present in the index may not be found, and your queries may return wrong answers.

Monday, June 21, 2021

Talking about the PostgreSQL Optimizer at CMU

Professor Andy Pavlo, at CMU, seems to be a regular organizer of technical talks about databases; this year, he organized the vaccination database tech talks, and invited me to give one about the PostgreSQL query optimizer. So I did. It was great. There were a few PostgreSQL community members present, but more importantly, a bunch of smart people who know a lot about other database systems showed up to the talk, including Andy Pavlo himself, and I got some feedback on where PostgreSQL could perhaps be improved.  Here are the highlights, with links to the relevant portion of the YouTube video.