Tuesday, April 04, 2023

Assigning Blame For PostgreSQL Errors

If an error message shows up in the PostgreSQL log, what program is malfunctioning? It's easy to conclude that the answer is PostgreSQL, but that's too simplistic.

For example, suppose these messages show up in my log file:

ERROR:  new row for relation "orders" violates check constraint "orders_d_check"
DETAIL:  Failing row contains (1975-03-16, 42).
STATEMENT:  insert into orders values ('1975-03-16', 42);

Looking at these messages, it should be fairly clear that the problem is not with the database, but with the application connecting to the database, or with the schema that the user chose to create. Someone created a CHECK constraint on the "orders" table, and the row that the user inserted does not satisfy that CHECK constraint. The database is reporting an ERROR in this situation because it was told to do exactly that. We cannot fault PostgreSQL for doing what the user told it to do.

Here's a slightly different kind of example:

ERROR:  could not open file "base/16384/16385": Permission denied
STATEMENT:  select * from orders;

In this case, the problem isn't with the application, which has issued a perfectly reasonable query to retrieve the contents of the "orders" table. Rather, the problem is with the operating system. PostgreSQL isn't able to access the file base/16384/16385. When it attempted to do so, the operating system refused it permission. So the problem here is at the operating system level, not within the database.

In this particular case, the problem occurred because I used chmod to change the permissions on that file to 000. Naturally, that resulted in PostgreSQL being unable to read the file. PostgreSQL is not the problem: I'm the problem.

This kind of issue seems to come up pretty frequently on production systems, and users don't always realize what the issue is. As a long time Linux and UNIX user, I instantly recognize Permission denied as a standard operating system error message, corresponding to EACCES. That tells me that this is an operating system error, so I know that PostgreSQL tried to do something, unexpectedly got an error from the OS, and wrote that error to the log.

But if, for some reason, you haven't memorized the entire list of UNIX error strings - there are about 130 of them - you might not immediately understand what this error message is telling you. But if you see that the error message consists of a general error message (here could not open file "base/16384/16385"), then a colon, and then a more specific error message (here Permission denied) then you should consider the possibility that the second message is coming from the operating system or some other component outside of PostgreSQL itself.

Here are a few other operating system errors that are reasonably likely to occur: Operation not permitted, Input/output error, Device out of space, Read-only file system, Connection reset by peer. For a full list, on non-Windows systems, try man errno. If you're seeing one of these errors, chances are very good that you have an operating system problem that PostgreSQL wasn't expecting and doesn't know how to address. For instance, Input/output error means you have a bad disk, or at least an inaccessible one.

The Operation not permitted and Permission denied errors are worthy of special mention. Typically, if there is a file permissions problem, you get Permission denied. Most likely, something has changed the permissions on some files that PostgreSQL uses, and they need to be changed back. Operation not permitted is a more unusual error. If it occurs during file access, it indicates that access to the files was refused for some reason other than file or directory permissions. The culprit is often something like SE-Linux, or anti-virus software, or some other kind of security software that has erroneously concluded that PostgreSQL shouldn't be permitted to access its own data files.

No comments:

Post a Comment