As much as I like Linux (and, really, I do: I ran Linux 0.99.something on my desktop in college, and wrote my class papers using vim and LaTeX), there are certain things about it that drive me crazy, and the way it reports memory usage is definitely on the list. It should be possible for a reasonably intelligent human being (in which category I place myself) to answer simple questions about system memory usage, such as "How much memory is my database using?" or "How much memory is my web server using?" relatively simply.
Unfortunately, I can't, and I don't think I'm alone. The first thing I typically do is run "free -m" to get a picture of overall system memory utilization. This is generally pretty accurate, and if there's only one process on your machine that's using any significant amount of memory, that may be all you need, in which case you're lucky. Next, I run "top", hit capital "M" to sort by memory usage, and start looking at the individual processes. That's where the wheels come off.
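For what it's worth, free isn't doing anything magical: it's just formatting the counters the kernel exposes in /proc/meminfo. Here's a minimal sketch of reading them directly (exact field availability varies a bit by kernel version, so treat this as illustrative rather than definitive):

```python
# A minimal sketch of where "free" gets its numbers: the fields in
# /proc/meminfo (values are in kB). Which fields exist varies slightly
# by kernel version.
def read_meminfo(path="/proc/meminfo"):
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])  # drop the trailing "kB"
    return info

mem = read_meminfo()
# "Free" memory in the loose sense free(1) reports: truly free pages
# plus buffers and page cache, which the kernel can reclaim on demand.
reclaimable = mem["MemFree"] + mem.get("Buffers", 0) + mem.get("Cached", 0)
print(f"total: {mem['MemTotal']} kB, free-or-reclaimable: {reclaimable} kB")
```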
I just did this on an otherwise idle system, and 10 httpd processes popped up near the top of the list, each reporting VIRT as 186MB, RES as 3304, and SHR as 768. Which number, or combination of numbers, represents my true memory usage? According to the top man page, the definition of VIRT is: "The total amount of virtual memory used by the task. It includes all code, data, and shared libraries plus pages that have been swapped out." It makes sense, therefore, that this number is much higher than the actual memory usage of the process: code, data, and shared library pages will only be faulted into memory as needed, but they're still counted against VIRT. Furthermore, those 10 httpd processes consist of one parent process and nine children, and some of the parent's data or stack pages may be shared with its children via copy-on-write. VIRT, therefore, is a gross overestimate of the real memory impact of running an idle httpd on this machine. Just to be clear, I think VIRT is a useful piece of information for the system to report, but it doesn't tell me what I want to know right now, which is how much memory httpd is using.
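Incidentally, top gets these numbers from /proc. A quick sketch using the VmSize and VmRSS fields of /proc/&lt;pid&gt;/status, which correspond to VIRT and RES (I'm assuming a Linux /proc here; reading another user's process may require privileges):

```python
# A sketch of pulling VIRT (VmSize) and RES (VmRSS) for one process
# straight out of /proc, which is where top reads them. Values in kB.
def vm_stats(pid="self"):
    stats = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmSize", "VmRSS")):
                key, _, rest = line.partition(":")
                stats[key] = int(rest.split()[0])
    return stats

s = vm_stats()
# VmSize counts every mapping in the address space, including library
# pages that have never been faulted in, so it is always >= VmRSS.
print(f"VIRT = {s['VmSize']} kB, RES = {s['VmRSS']} kB")
```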
The top man page defines RES as "The non-swapped physical memory a task has used" and adds the note that "RES = CODE + DATA". That sounds more like a measure of current (rather than theoretical) memory usage. Right now, the parent httpd process is reporting RES = 6892, and all of the children are reporting RES = 3304, except for one, which is reporting 2988. That adds up to about 35MB, which sounds plausible, but it turns out not to be right, because when I run free, stop httpd, and run free again, memory usage only drops by about 11MB, presumably because the RES number doesn't account (or doesn't fully account?) for sharing between the parent process and its children. The equation "RES = CODE + DATA" is also evidently false, because if I enable those additional fields inside top (f-r-s-enter), I find out that the processes with RES = 3304 have CODE = 332 and DATA = 2280, a discrepancy of almost 700kB.
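The kernel does offer a sharing-aware alternative to RES, though top doesn't show it: the Pss ("proportional set size") lines in /proc/&lt;pid&gt;/smaps charge each shared page 1/Nth to each of the N processes mapping it, so summing Pss over a parent and its children doesn't double-count. A sketch (reading another user's smaps requires privileges, and newer kernels also provide a cheaper smaps_rollup file):

```python
# Sum the Pss ("proportional set size") lines from /proc/<pid>/smaps.
# Each shared page is charged 1/Nth to each of the N processes mapping
# it, so summing Pss across a process family avoids the double-counting
# that makes RES totals overshoot what "free" reports.
def pss_kb(pid="self"):
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if line.startswith("Pss:"):
                total += int(line.split()[1])
    return total

print(f"Pss for this process: {pss_kb()} kB")
```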
Although the example above talks about httpd, the same problems all apply to PostgreSQL. When I configure shared_buffers = 200MB and start up the server, it reports VIRT = 369MB and RES = 16MB. The shared memory segment, including the 200MB of shared_buffers, is reflected in the virtual size but not the resident size. This is understandable: starting the server doesn't actually access the memory allocated for shared_buffers, and Linux won't allocate it until it's really used. But, as it turns out, the postmaster (i.e. the parent postgres process) will never report a resident size higher than 16MB, even after every block in the shared memory segment has been used. Instead, each individual backend that touches any part of the shared memory segment will count just the portions it has touched in its own resident size; thus, new backends will appear to start out using a small amount of memory and then grow (sometimes quite precipitously) as they begin to do actual work. If a backend eventually touches all of shared_buffers, its resident size should level off at approximately the total size of PostgreSQL's shared memory segment plus whatever private memory it is using.
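You don't need PostgreSQL to see this behavior; a toy demonstration with an anonymous shared mapping (standing in for the shared memory segment) shows RES growing only as pages are actually touched:

```python
# A toy demonstration (not PostgreSQL itself) of why a backend's RES
# grows as it touches shared memory: pages of a shared mapping only
# count against VmRSS once they've been faulted into the process's
# address space.
import mmap

def rss_kb():
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

SIZE = 32 * 1024 * 1024       # a stand-in for shared_buffers
seg = mmap.mmap(-1, SIZE)     # anonymous shared mapping
before = rss_kb()
seg.write(b"\0" * SIZE)       # "use" every page, as a busy backend would
after = rss_kb()
print(f"RES before: {before} kB, after touching {SIZE >> 10} kB: {after} kB")
```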
What this means in practice is that it's just about impossible to look at the output of top and have any idea how much memory PostgreSQL is using, or even whether its memory usage is growing over time. In fact, memory leaks in PostgreSQL are fairly rare, because we use a system of memory contexts to track allocations. There are per-tuple and per-query contexts where many allocations are done; once we're done processing a given tuple or query we eradicate the entire contents of the memory context in a bulk operation. This is a very effective way of preventing leaks; only when we're allocating memory in a session-lifespan memory context do we need to worry about a long-term leak. However, if we do spring a leak, it's hard to spot it from looking at the top (or ps) output unless it's pretty egregious. If you have a postgres process which is using much more memory than the other postgres processes, and you can correlate that with an overall decrease in system free memory, then you've got a leak. A small leak, however, is likely to go unobserved, because there's so much noise in the reported numbers that a real problem looks just like an artifact. Even a 10MB leak (which is pretty significant) would blend right in unless you happened to be running with a very small value for shared_buffers.
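For readers unfamiliar with the memory-context idea, here's a toy sketch of the pattern in Python (PostgreSQL's real implementation is in C, built around palloc and MemoryContextReset; this just illustrates the shape of it):

```python
# A toy sketch (not PostgreSQL's actual C implementation) of the
# memory-context idea: every allocation is tagged to a context, and
# resetting the context releases everything in one bulk operation, so
# per-tuple and per-query allocations can't leak across queries.
class MemoryContext:
    def __init__(self, name):
        self.name = name
        self.chunks = []          # everything allocated in this context

    def alloc(self, size):
        chunk = bytearray(size)   # stands in for palloc()
        self.chunks.append(chunk)
        return chunk

    def reset(self):
        self.chunks.clear()       # one bulk release of the whole context

per_query = MemoryContext("per-query")
for _ in range(1000):
    per_query.alloc(100)          # scratch space while processing a query
per_query.reset()                 # query done: everything freed at once
print(f"chunks live after reset: {len(per_query.chunks)}")
```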
On the flip side, the tendency of each new postgres process to start out with a small resident size and then grow as it begins touching shared_buffers can easily create the perception of a real leak where none exists. In fact, no new memory is being allocated at all: the apparent growth in resident size is really just the result of faulting already-allocated pages into the address space of a process where the kernel hadn't previously chosen to map them. But seeing a process start out small and then within a minute balloon to multiple gigabytes can be alarming to system administrators, to say the least.
top advertises a few other potentially interesting values as well, but they don't really add much to the total picture. According to the man page, the SHR number "simply reflects memory that could be potentially shared with other processes". In theory, that ought to help clarify things, but since memory that is opportunistically shared (like copy-on-write pages) is not distinguished from actual shared memory, it's not that helpful. There's also an optional column for SWAP, but it's not the number of pages the process has pushed out to the swapfile; it's just the portion of VIRT that isn't currently resident. So a demand-paged shared library that has never been fully loaded (because it hasn't been accessed) counts the same as an unshared stack page that's been evicted due to extreme memory pressure, which is bizarre.
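To see the difference concretely, compare top's notion of SWAP (VIRT minus RES) with the kernel's own VmSwap counter in /proc/&lt;pid&gt;/status, which counts only pages actually written out to swap (VmSwap appears on reasonably modern kernels; I treat it as zero if absent):

```python
# Contrast top's SWAP column (just VIRT - RES, i.e. any page not
# currently resident, whether swapped out or simply never loaded) with
# the kernel's VmSwap counter, which counts only pages actually in swap.
def status_kb(field):
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return 0  # e.g. VmSwap is absent on very old kernels

top_swap = status_kb("VmSize") - status_kb("VmRSS")
real_swap = status_kb("VmSwap")
print(f"top's SWAP: {top_swap} kB, actually in swap: {real_swap} kB")
```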