Comments on Robert Haas: Absurd Shared Memory Limits (Blogger comment feed, last updated 2024-03-12)

Anonymous, 2013-02-18 03:24 -05:00:
I'm kinda late to the party; I stumbled across this while researching PostgreSQL optimizations. I use the sysctl shm_use_phys to tell the FreeBSD kernel to wire down, and not swap, the pages allocated to SysV shared segments. As sjg said, this patch will affect performance on BSD systems. The only way around that would be to use the mlock system call on mmap'd anonymous memory, but that requires the process to have root privileges.

Anyway, that being said, can't this be a compile-time option? I understand the need for this patch, but those SysV memory limits no longer apply on most BSD kernels, last I checked.

Jacob, 2012-11-05 09:17 -05:00:
I forgot to link to the DragonFlyBSD / Postgres benchmark.

http://lists.dragonflybsd.org/pipermail/users/attachments/20121010/7996ff88/attachment-0002.pdf
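A minimal C sketch of the workaround this comment describes: map anonymous shared memory with mmap() and then ask the kernel to wire it with mlock(). This is an illustration, not PostgreSQL's code, and the names map_shared_and_try_lock and unmap_shared are hypothetical. Whether mlock() succeeds depends on the platform: per the comments in this thread, FreeBSD at the time required root, while Linux lets an unprivileged process lock up to its RLIMIT_MEMLOCK limit. The sketch only addresses wiring pages so they are not swapped; it does not show whether this avoids the pv-entry overhead sjg describes.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS with glibc */
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map `size` bytes of anonymous shared memory and try to wire it into RAM.
 * Returns the mapping, or NULL if mmap() failed. On return, *locked is 1 if
 * mlock() succeeded and 0 if it did not (the mapping is then still usable,
 * it just remains pageable).
 */
void *map_shared_and_try_lock(size_t size, int *locked)
{
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    *locked = 0;
    if (addr == MAP_FAILED)
        return NULL;

    if (mlock(addr, size) == 0) {
        *locked = 1;
    } else {
        /* Usually EPERM (no privilege) or ENOMEM (over RLIMIT_MEMLOCK). */
        fprintf(stderr, "mlock(%zu bytes) failed: %s\n", size, strerror(errno));
    }
    return addr;
}

/* Undo map_shared_and_try_lock(): unlock if needed, then unmap. */
int unmap_shared(void *addr, size_t size, int locked)
{
    if (locked)
        munlock(addr, size);
    return munmap(addr, size);
}
```

Because the lock can fail for reasons outside the program's control, the sketch treats the `locked` flag as informational: the memory works either way.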
Jacob, 2012-11-05 09:06 -05:00:
Robert,

What are your thoughts on Postgres 9.3-dev performance on DragonFlyBSD?

Also, is there any chance we could get Postgres 9.3-dev tested with the lseek fix, as a comparison to Scientific Linux 6.2?

Anonymous, 2012-11-04 06:51 -05:00:
For those coming here because someone recently linked here (*cough*Slashdot*cough*): the issue seems to have been resolved (as mentioned on that site), at least for DragonFly: http://lists.dragonflybsd.org/pipermail/users/2012-October/017536.html

Anonymous, 2012-11-04 03:47 -05:00:
Given the description, this is essentially a "political" patch, and we all know that politics are finicky and have little to do with technical merit. On the technical side, this means pgsql now depends on two of the three available ways to do the same thing, which actually enlarges the political attack surface (maybe some wit will insist on disabling mmap for yet another political stance), so it would be nice to be able to switch mechanisms, should the resident master tuner want to.

Given that the goal is to not require arcane tuning knowledge for non-tuners, there shouldn't be anything against the ability to jump back to the pre-mmap behavior using some tunable, along with the dubious joy of tuning SysV shmem parameters.

Of course there's technical elegance in the One True Solution, but that's not a valid argument for patches that are essentially political in nature. Politics are messy, so don't try to impose neatness where flexibility is worth that much more, should you suddenly and unexpectedly need it.

Xin Li, 2012-09-17 19:14 -04:00:
I think on FreeBSD you could use mlock() to wire memory pages into main memory. Note that at this time it's not available to non-root processes (by the way, we are working on changing this to an administrator-adjustable amount instead).

Speaking of PV entries, this can be mitigated on modern hardware by using so-called "superpages" [1], which are transparent to applications on FreeBSD: the system promotes smaller contiguous mappings to a superpage mapping when needed, if the mapping is aligned and sufficiently large.

Hope this helps...

Bit Ripper, 2012-09-17 19:09 -04:00:
On FreeBSD (I believe this is also available on other *BSDs) you could use mlock() and munlock(), with the starting virtual address and length of the memory chunk, to wire the pages, but this currently requires root privileges (in the meantime, we are working on changing this to allow an unprivileged process to request up to an administrator-adjustable amount). Would that be helpful?

Speaking of PV entries, the FreeBSD VM subsystem now has the capability to promote contiguous normal pages to "superpages" transparently to the application, if the mappings are aligned and sufficiently large. [1]

[1] http://www.youtube.com/watch?v=0wIxny-n_Mg

Anonymous, 2012-09-14 10:53 -04:00:
I am hoping this will help alleviate the other long-standing problem: the OOM killer's badness() choosing Postgres unwisely [1]. Or am I hoping for too much?

[1] http://thoughts.j-davis.com/2009/11/29/linux-oom-killer/

sjg, 2012-09-12 02:00 -04:00:
Robert, there is nothing analogous to this for mmap, nor is there a workaround. Basically, if this change sticks (and I am not necessarily saying it shouldn't), the BSDs will need to do major work on their VMs to reduce the pv_entry overhead in order to stay relevant as a platform for PostgreSQL. Matt Dillon has seen this and is presently trying to come up with a solution for the DragonFly VM, which may or may not be portable to the other BSDs. See: http://developers.slashdot.org/comments.pl?sid=3107463&cid=41308595 (and further up that thread)
Anonymous, 2012-09-11 12:28 -04:00:
Finally. Well done, Robert.

Robert Haas, 2012-09-11 10:17 -04:00:
No, it doesn't eliminate System V shared memory usage completely; it only reduces it to a few bytes. We might eliminate it completely eventually, but there isn't a consensus on what the best approach is.

Robert Haas, 2012-09-11 10:16 -04:00:
Sorry, nope. Some people have back-ported the patch, though.

Robert Haas, 2012-09-11 10:16 -04:00:
We don't want the memory segment to be backed by a file; that would lead to extra I/O.

Robert Haas, 2012-09-11 10:15 -04:00:
Is there some kind of workaround for this? It would seem pretty odd if this shm_use_phys optimization were only available to processes using System V shared memory. If there's a way we can request that same optimization for an mmap'd segment, I certainly think we'd do that.

camh, 2012-09-10 21:26 -04:00:
"flock() doesn't work on NFS." Are you saying that this SysV shm locking scheme does work on NFS? Or are the downsides of the other locking mechanisms in that "etc."?

I'm not necessarily disagreeing with what you've written, but as glyph said: "I'm not saying I'm sure there's no good reason, I've just never seen one mentioned in the previous mailing list threads." This answer still does not address that. Can you point us to a comparison of locking methods showing that SysV shm is the most reliable and robust method for locking?

sjg, 2012-09-10 21:16 -04:00:
There is a very good reason we OS vendors do not ship with SysV default limits high enough to run a serious PostgreSQL database: very little software other than PostgreSQL uses SysV in any serious way, and there is a fixed overhead to increasing those limits. You end up wasting RAM for all the users who do not need the limits to be that high. That said, you are late to the party here; vendors have finally decided that the fixed overheads are low enough relative to modern RAM sizes that the defaults can be raised quite high. DragonFly BSD has shipped with greatly increased limits for a year or so, and I believe FreeBSD has too.

There is a serious problem with this patch on BSD kernels. All of the BSD SysV implementations have a shm_use_phys optimization, which forces the kernel to wire the memory pages used to back SysV segments. This increases performance by not requiring the allocation of pv entries for these pages, and it also reduces memory pressure. Most serious users of PostgreSQL on BSD platforms use this well-documented optimization. After switching to 9.3, large, well-optimized Pg installations that previously ran well in memory will be forced into swap because of the pv entry overhead.

feld, 2012-09-10 21:11 -04:00:
On FreeBSD you have to enable shared memory for jails if you want to jail your postgres process, and you can't have more than one postgres instance in a jail. It's always been considered insecure because the System V shared memory data is readable by all jails. Is this going to solve the FreeBSD jails problem?

http://lists.freebsd.org/pipermail/freebsd-jail/2008-January/000149.html

sjg, 2012-09-10 21:08 -04:00:
SysV shared memory has a fixed overhead, as do many other things in the kernel. The higher you raise those limits, the higher your fixed overhead becomes. The reason we as OS vendors have historically not shipped with the ability to use multi-gigabyte SysV shared memory segments by default is that few people use it, and we do not want to put the burden of that fixed overhead on everyone who does not need it.

This patch will reduce performance outright on BSD kernels for users who previously leveraged the shm_use_phys optimization (pretty much everyone who runs a serious database), because the kernel will have to manage pv entries for all of those mmap'd pages. It will also create additional memory pressure on those systems, because more pv entries will need to be allocated.
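To make Robert's replies concrete (the patch reduces System V usage to a few bytes, the bulk memory is not file-backed, and the rest becomes the mmap'd pages sjg is concerned about), here is a minimal C sketch of that two-part layout. It is not PostgreSQL's implementation: TinyHeader, SharedLayout and the three functions are hypothetical, and IPC_PRIVATE is used only for simplicity, since a real interlock would need a key that another server process could look up. The tiny segment's attach count (shm_nattch, read via shmctl() with IPC_STAT) is the kind of value such an interlock can inspect, while the bulk memory is a MAP_SHARED | MAP_ANONYMOUS mapping that is not backed by a file and is shared with children created by fork().

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS with glibc */
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical contents of the tiny System V segment. */
typedef struct {
    pid_t creator_pid;
    size_t bulk_size;
} TinyHeader;

typedef struct {
    int shmid;          /* id of the tiny SysV segment */
    TinyHeader *header; /* tiny segment, attached */
    void *bulk;         /* large anonymous shared mapping */
    size_t bulk_size;
} SharedLayout;

/* Returns 0 on success, -1 on failure (nothing is left allocated on failure). */
int create_shared_layout(size_t bulk_size, SharedLayout *out)
{
    void *hdr;

    out->shmid = shmget(IPC_PRIVATE, sizeof(TinyHeader), IPC_CREAT | 0600);
    if (out->shmid < 0)
        return -1;

    hdr = shmat(out->shmid, NULL, 0);
    if (hdr == (void *) -1) {
        shmctl(out->shmid, IPC_RMID, NULL);
        return -1;
    }
    out->header = hdr;

    /* Not backed by a file, so no extra I/O; children from fork() share it. */
    out->bulk = mmap(NULL, bulk_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (out->bulk == MAP_FAILED) {
        shmdt(hdr);
        shmctl(out->shmid, IPC_RMID, NULL);
        return -1;
    }
    out->bulk_size = bulk_size;
    out->header->creator_pid = getpid();
    out->header->bulk_size = bulk_size;
    return 0;
}

/* Number of processes attached to the tiny segment, or -1 on error. */
long tiny_segment_attach_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) != 0)
        return -1;
    return (long) ds.shm_nattch;
}

/* Unmap the bulk memory, detach the tiny segment, and remove it. */
void destroy_shared_layout(SharedLayout *layout)
{
    munmap(layout->bulk, layout->bulk_size);
    shmdt(layout->header);
    shmctl(layout->shmid, IPC_RMID, NULL);
}
```

In this sketch a parent process would call create_shared_layout() once before forking workers; each child inherits both the attachment to the tiny segment and the anonymous mapping.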
Anonymous, 2012-09-01 22:54 -04:00:
Hmm. Any chance of a fast 9.3 release? =)

Christopher Cashell, 2012-07-10 03:41 -04:00:
Safety and reliability.

For most applications/daemons, there's a reason you don't want multiple instances running concurrently, but if it somehow happens, it isn't the end of the world. For example, if you have a mail server running and you try to start a second one, it isn't going to work well (only one can bind to TCP port 25). But other than that, you aren't likely to experience pain.

With a database, there's a much greater risk. Having two PostgreSQL servers running concurrently and accessing the same database files will basically guarantee corrupted data. As Postgres goes to extremes to protect your data (as it should), this is an unacceptable situation. To ensure it doesn't happen, Postgres uses the most reliable and robust method possible to make sure there is only one Postgres instance using a given data directory.

Other methods tend to have drawbacks (flock() doesn't work on NFS, etc.). This one works, and works well.

Anonymous, 2012-07-06 23:29 -04:00:
Given that SysV and POSIX shared memory chunks have named identifiers, I'm not sure why you'd compare them to mmap() with MAP_ANONYMOUS. The logical equivalent is an mmap()'d file on the actual file system, not an anonymous chunk of mmap()'d memory.

glyph, 2012-07-06 20:12 -04:00:
I have seen these complaints before (about locking the data directory), but I have never really understood them.

Why bother with a tiny SysV shared memory chunk at all? Why not just use a filesystem lock with either flock() or symlink(), like... pretty much every other daemon process in the universe?

I'm not saying I'm sure there's no good reason, I've just never seen one mentioned in the previous mailing list threads.

sickpig, 2012-07-02 07:43 -04:00:
From the sysctl(8) man page on my system (Ubuntu 11.10), it seems that "-p" is used to load from a file:

"Load in sysctl settings from the file specified or /etc/sysctl.conf if none given. Specifying - as filename means reading data from standard input"

Davi Arnaut, 2012-07-02 02:09 -04:00:
On error, mmap() returns MAP_FAILED. Also, mmap() can potentially return NULL, so AnonymousShmem should be initialized with, and checked against, MAP_FAILED.
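Davi's point can be shown in a few lines of C. This is a minimal sketch, not the actual patch: only the name AnonymousShmem comes from the comment, while AnonymousShmemSize, attach_anonymous_shmem() and release_anonymous_shmem() are hypothetical. The pointer starts out as MAP_FAILED, mmap()'s documented error value, so "no mapping yet" and "mmap() failed" are recognized by the same comparison, and a NULL return is never mistaken for an error.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS with glibc */
#include <stddef.h>
#include <sys/mman.h>

/* MAP_FAILED, not NULL, means "no mapping": NULL can in principle be valid. */
void *AnonymousShmem = MAP_FAILED;
size_t AnonymousShmemSize = 0;

/* Returns 0 on success, -1 if mmap() failed. */
int attach_anonymous_shmem(size_t size)
{
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (ptr == MAP_FAILED) /* mmap() signals errors with MAP_FAILED, not NULL */
        return -1;
    AnonymousShmem = ptr;
    AnonymousShmemSize = size;
    return 0;
}

/* Safe to call whether or not a mapping currently exists. */
void release_anonymous_shmem(void)
{
    if (AnonymousShmem != MAP_FAILED) {
        munmap(AnonymousShmem, AnonymousShmemSize);
        AnonymousShmem = MAP_FAILED;
        AnonymousShmemSize = 0;
    }
}
```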