* pidsig 0.11 - a fghack like de-daemonisation tool
From: Janos Farkas @ 2010-06-02 6:08 UTC
To: supervision

Hi,

I've been using a tool to replace fghack in Bernstein chains. It overcomes at least one limitation of fghack, namely that fghack doesn't have a way to pass on signals to daemons that it started.

pidsig works very similarly to fghack, will create (at least) one pipe in the newly started daemon, but in addition to that, it will also keep track of the pid for the child it started. With the recorded pid, pidsig is able to pass on (some) signals that it receives itself. In the absence of (or in addition to) the child pid, it can send signals to processes that have recorded pid files.

Although it's still best to modify daemons so that they don't put themselves in the background, there are occasionally exceptions that are difficult to handle otherwise. One such example is nginx, which has its own "worker" scheme and supports online code replacement; while a replacement is in progress, it can have several main threads, each keeping its own pid file.

pidsig is able to work with two kinds of daemons:
- those that don't go out of their way to close all open fd's (just like fghack)
- those that don't go out of their way to disconnect from their parent pids by forking too many times

Furthermore:
- it can chroot (although if still running as root, it may not be much more "secure")
- it can run as a specific user (but then it may be limited in what processes it can kill)
- it can read several(!) pid files to pass signals on to - to signify quit, configuration reload, etc.

I've been using it successfully to manage a few nginx configurations with daemontools; it also works well with the atd from the Linux at package.
Sample usage for nginx:

  pidsig -d/var/run -pnginx.pid.oldbin -pnginx.pid /usr/sbin/nginx

There are some obvious, and some quirky features that I have plans to implement - it all depends on the interest. Please drop me a note if any of the above sounds interesting and/or check out its project page at github:

  http://github.com/chexum/pidsig

Janos
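The pid-file side of the announcement above (reading several pid files and passing a signal on to each recorded process) might be sketched roughly as follows. This is not pidsig's code, which I have not consulted; the function names `read_pid` and `forward_signal` are hypothetical, and the sketch only assumes that each pid file holds a single decimal pid.

```python
import os


def read_pid(path):
    """Return the pid stored in a pid file, or None if missing or malformed."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def forward_signal(signum, piddir, pidfiles):
    """Send signum to every process named by the given pid files.

    Returns the list of pids that were signalled successfully.
    """
    delivered = []
    for name in pidfiles:
        pid = read_pid(os.path.join(piddir, name))
        if pid is None:
            continue  # e.g. a pid file that only exists during an upgrade
        try:
            os.kill(pid, signum)
        except (ProcessLookupError, PermissionError):
            continue  # stale pid file, or not allowed to signal that process
        delivered.append(pid)
    return delivered
```

With the sample invocation above in mind, a call such as `forward_signal(signal.SIGHUP, "/var/run", ["nginx.pid.oldbin", "nginx.pid"])` would deliver SIGHUP to whichever of the two files currently names a live process. Note that a pid file can be stale, so the signal may reach an unrelated process that has reused the pid; the sketch does nothing to guard against that.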
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Laurent Bercot @ 2010-06-02 18:46 UTC
To: supervision

> pidsig works very similarly to fghack, will create (at least) one pipe
> in the newly started daemon, but in addition to that, it will also
> keep track of the pid for the child it started. With the recorded
> pid, pidsig is able to pass on (some) signals that it receives itself.

Nice !

> There are some obvious, and some quirky features that I have plans to
> implement - it all depends on the interest.

I would advise to implement just the features you need. Such a tool, like fghack, is very handy to have when there's no other choice; however, we don't want people to rely on them, to use them as excuses for bad coding practices. And if you make pidsig too powerful, you *know* it's going to happen.

Working in professional environments has made me very disillusioned about the way people design software (when they design it) and understand Unix (when they understand it). Having workarounds is good, but if you make it too easy for people to misbehave, *they will*.

So... scratch your itch, but don't go out of your way to accommodate all kinds of ugly practices. :)

--
Laurent
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Janos Farkas @ 2010-06-03 16:53 UTC
To: supervision

On Wed, Jun 2, 2010 at 19:46, Laurent Bercot <ska-supervision@skarnet.org> wrote:
> [kind words]

Thanks!

> I would advise to implement just the features you need. Such a tool,
> like fghack, is very handy to have when there's no other choice; however,
> we don't want people to rely on them, to use them as excuses for bad
> coding practices. And if you make pidsig too powerful, you *know* it's
> going to happen.

In principle, I agree - it does what I originally wanted. I have some corner cases in mind that can be difficult to tackle (what if pidsig itself crashes, but the daemon below it doesn't - shall it try to do something with the pid file?).

These kinds of problems are not that theoretical - just recently I saw svscan/svscanboot crashing on a >1y uptime box, taking many of the processes with it, including most of the supervise infrastructure, very likely not due to any fault in them (could be oom gone wild, cosmic rays hitting svscan memory, whatever).

The host stayed up and admin access was possible (thanks mainly to sshd being a long running daemon), but the state the processes were left in was nothing to be glad about.

Another question would be if there are more ways to reliably connect to any given process detecting it being gone - but all the current daemons that I run can be handled now :)

> Working in professional environments has made me very disillusioned
> about the way people design software (when they design it) and understand
> Unix (when they understand it). Having workarounds is good, but if you
> make it too easy for people to misbehave, *they will*.
>
> So... scratch your itch, but don't go out of your way to accommodate
> all kinds of ugly practices.
> :)

I hear you, points taken :) Please return to your regularly scheduled supervision :)

Janos
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Laurent Bercot @ 2010-06-03 19:25 UTC
To: supervision

> These kinds of problems are not that theoretical - just recently I saw
> svscan/svscanboot crashing on a >1y uptime box, taking many of the
> processes with it, including most of the supervise infrastructure,
> very likely not due to any fault in them (could be oom gone wild,
> cosmic rays hitting svscan memory, whatever).

That's a typical case of "weak" supervision, as opposed to a "strong" supervision chain. "Strong" supervision makes sure that all the infrastructure is connected to init.

* svscan achieves strong supervision *if* svscanboot is flagged as "respawn" in /etc/inittab on System V-style inits, in /etc/event.d/ with Upstart, or in /etc/ttys on BSD. It does *not* achieve it if svscanboot is started via some rc.local script (as the stock daemontools instructions tell you to do, shame on DJB! :))
* perp is in the same boat, depending on how you start perpboot.
* Paul Jarc has instructions on how to directly run svscan as process 1.
* runit achieves strong supervision if you're using runit-init.
* s4, my own supervision suite (to be released next summer) can be run from a respawner, but was also designed so s4-svscan can run as process 1.

Strong supervision makes sure that your supervisor process tree is *always* alive and complete, unless process 1 itself crashes, in which case you're doomed to reboot anyway.

> The host stayed up and admin access was possible (thanks mainly to
> sshd being a long running daemon), but the state the processes were
> left in was nothing to be glad about.

This will, unfortunately, be the case even with a strong supervision chain: if a branch of the supervision tree is broken, the whole subtree is reconstructed from the breaking point, but the old subtree might still be alive and locking resources, preventing the new subtree from being fully functional (and filling your logs with warning messages). An intervention from the administrator is always necessary. The advantage is, all the admin has to do is kill the old subtree (including services) and everything will be working perfectly again.

I'm not sure it's possible to design a supervision suite that addresses this problem cleanly without endangering service reliability.

> Another question would be if there are more ways to reliably connect
> to any given process detecting it being gone - but all the current
> daemons that I run can be handled now :)

Unfortunately, no; not without support from the process you want to monitor. There are only two ways of being notified of a process' death:
- getting a SIGCHLD if you're the process' parent. That's what a supervisor uses (supervise, runsv, perpetrate, s4-supervise all work on this model).
- getting an EOF on a pipe or socket you're listening to, when the monitored process is the only writer on the other side. That's what fghack uses (and pidsig too, I presume).

The EOF method does not require the monitorer to be the monitoree's parent, so it's more flexible; but it does require the monitoree to not go out of its way to arbitrarily close fds. If some daemon forks itself and resists fghack and pidsig, you're out of luck: it definitely won't be supervised.

If you have the process' pid, you can poll the process table, but polling - as opposed to notification - is evil. Also, a fundamental problem is that pids do NOT uniquely identify a process (which is the main flaw with .pid files), unless you're blessed with an OS where pids are 64 bits long.
If you're willing to be non-portable, Linux should allow you to set an inotify fd listening to /proc/$pid's disappearance (if the /proc filesystem supports inotify, but there's no reason why it shouldn't). BSD might have a similar mechanism with kevent/kqueue.

And all of this is definitely too much work for a lousy daemon.

--
Laurent
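The EOF method described in the message above can be illustrated with a small sketch. It is not taken from fghack or pidsig; `spawn_and_wait_for_eof` and `daemon_main` are hypothetical names, and the "daemon" is simulated with a double fork instead of an exec. A real wrapper that execs a program would additionally have to mark the write end inheritable (for example with `os.set_inheritable`), since Python closes pipe fds across exec by default.

```python
import os


def spawn_and_wait_for_eof(daemon_main):
    """Start a self-backgrounding 'daemon' and block until it is gone.

    The watcher is not the daemon's parent, so it cannot rely on SIGCHLD;
    it waits for EOF on a pipe whose write end the daemon inherited.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            if os.fork() == 0:
                # Grandchild: the real "daemon", still holding write_fd.
                daemon_main()
            # The intermediate parent falls through and exits at once.
        finally:
            os._exit(0)
    # Watcher: close our own copy of the write end, or EOF never arrives.
    os.close(write_fd)
    os.waitpid(pid, 0)  # reaps only the intermediate parent
    while os.read(read_fd, 4096):
        pass  # discard any data; an empty read means every writer has exited
    os.close(read_fd)
```

The call returns only once the last process holding the write end has exited, which is also why a daemon that closes all its fds on startup defeats the method: the watcher sees EOF immediately and wrongly concludes the daemon has died.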
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Wayne Marshall @ 2010-06-04 16:26 UTC
To: Laurent Bercot, supervision

On Thu, 3 Jun 2010 21:25:30 +0200
Laurent Bercot <ska-supervision@skarnet.org> wrote:

> > These kinds of problems are not that theoretical - just
> > recently I saw svscan/svscanboot crashing on a >1y uptime
> > box, taking many of the processes with it, including most of
> > the supervise infrastructure, very likely not due to any
> > fault in them (could be oom gone wild, cosmic rays hitting
> > svscan memory, whatever).
>
> That's a typical case of "weak" supervision, as opposed to a
> "strong" supervision chain. "Strong" supervision makes sure
> that all the infrastructure is connected to init.
>
> * svscan achieves strong supervision *if* svscanboot is
> flagged as "respawn" in /etc/inittab on System V-style inits,
> in /etc/event.d/ with Upstart, or in /etc/ttys on BSD. It
> does *not* achieve it if svscanboot is started via some
> rc.local script (as the stock daemontools instructions tell
> you to do, shame on DJB! :))
> * perp is in the same boat, depending on how you start
> perpboot.
> ...
> Strong supervision makes sure that your supervisor process
> tree is *always* alive and complete, unless process 1 itself
> crashes, in which case you're doomed to reboot anyway.
>

FWIW, the perp-setup(8)/perpboot(8) utilities do indeed enable such "strong supervision" in the default configurations on both BSD and Linux systems. Let me know if you have any questions.

> > Another question would be if there are more ways to reliably
> > connect to any given process detecting it being gone - but
> > all the current daemons that I run can be handled now :)
>
> Unfortunately, no; not without support from the process you
> want to monitor.
> There are only two ways of being notified of
> a process' death:
> - getting a SIGCHLD if you're the process' parent. That's
> what a supervisor uses (supervise, runsv, perpetrate,
> s4-supervise all work on this model).
> - getting an EOF on a pipe or socket you're listening to,
> when the monitored process is the only writer on the other
> side. That's what fghack uses (and pidsig too, I presume).
>

Also FWIW, the minit/ninit suites offer a "pidfilehack" utility that enables the supervisor to watch for SIGCHLD from non-progeny processes. It is clever and effective, but only works as intended if running minit/ninit as process 1. (The trick is based on the fact that process 1 inherits processes without parents.)

Cheers,

Wayne
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Charlie Brady @ 2010-06-04 16:54 UTC
To: Wayne Marshall
Cc: supervision

On Fri, 4 Jun 2010, Wayne Marshall wrote:

> On Thu, 3 Jun 2010 21:25:30 +0200
> Laurent Bercot <ska-supervision@skarnet.org> wrote:
>
> > > These kinds of problems are not that theoretical - just
> > > recently I saw svscan/svscanboot crashing on a >1y uptime
> > > box, taking many of the processes with it, including most of
> > > the supervise infrastructure, very likely not due to any
> > > fault in them (could be oom gone wild, cosmic rays hitting
> > > svscan memory, whatever).
> >
> > That's a typical case of "weak" supervision, as opposed to a
> > "strong" supervision chain. "Strong" supervision makes sure
> > that all the infrastructure is connected to init.
> >
> > * svscan achieves strong supervision *if* svscanboot is
> > flagged as "respawn" in /etc/inittab on System V-style inits,
> > in /etc/event.d/ with Upstart, or in /etc/ttys on BSD. It
> > does *not* achieve it if svscanboot is started via some
> > rc.local script (as the stock daemontools instructions tell
> > you to do, shame on DJB! :))
> > * perp is in the same boat, depending on how you start
> > perpboot.
> > ...
> > Strong supervision makes sure that your supervisor process
> > tree is *always* alive and complete, unless process 1 itself
> > crashes, in which case you're doomed to reboot anyway.

There is a weakness in this "strong supervision" model. Any service with a 'down' file will not be restarted if its supervise/runsv or svscan/runsvdir is replaced.
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Wayne Marshall @ 2010-06-04 17:17 UTC
To: Charlie Brady
Cc: supervision

On Fri, 4 Jun 2010 12:54:46 -0400 (EDT)
Charlie Brady <charlieb-supervision@budge.apana.org.au> wrote:

> > On Thu, 3 Jun 2010 21:25:30 +0200
> > Laurent Bercot <ska-supervision@skarnet.org> wrote:
> >
> > > Strong supervision makes sure that your supervisor process
> > > tree is *always* alive and complete, unless process 1
> > > itself crashes, in which case you're doomed to reboot
> > > anyway.
>
> There is a weakness in this "strong supervision" model. Any
> service with a 'down' file will not be restarted if its
> supervise/runsv or svscan/runsvdir is replaced.
>

Why do you describe this as a "weakness"? The down flagfile is consulted only on startup of the supervisor. If the administrator has configured the service to be down on startup, presumably she wants it to be down on startup.

Cheers,

Wayne
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Charlie Brady @ 2010-06-04 17:21 UTC
To: Wayne Marshall
Cc: supervision

On Fri, 4 Jun 2010, Wayne Marshall wrote:

> On Fri, 4 Jun 2010 12:54:46 -0400 (EDT)
> Charlie Brady <charlieb-supervision@budge.apana.org.au> wrote:
>
> > > On Thu, 3 Jun 2010 21:25:30 +0200
> > > Laurent Bercot <ska-supervision@skarnet.org> wrote:
> > >
> > > > Strong supervision makes sure that your supervisor process
> > > > tree is *always* alive and complete, unless process 1
> > > > itself crashes, in which case you're doomed to reboot
> > > > anyway.
> >
> > There is a weakness in this "strong supervision" model. Any
> > service with a 'down' file will not be restarted if its
> > supervise/runsv or svscan/runsvdir is replaced.
>
> Why do you describe this as a "weakness"? The down flagfile is
> consulted only on startup of the supervisor. If the
> administrator has configured the service to be down on startup,
> presumably she wants it to be down on startup.

I thought that we were discussing here the situation where the supervisor dies and is automatically restarted. That is not the 'on startup' where the administrator intends the service to be down. "on startup" is long gone, and the administrator has started the service, and wants it to continue running. The automated restart of the supervisor shouldn't change that running state.
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Wayne Marshall @ 2010-06-04 20:00 UTC
To: supervision

On Fri, 4 Jun 2010 13:21:18 -0400 (EDT)
Charlie Brady <charlieb-supervision@budge.apana.org.au> wrote:

> On Fri, 4 Jun 2010, Wayne Marshall wrote:
>
> > On Fri, 4 Jun 2010 12:54:46 -0400 (EDT)
> > Charlie Brady <charlieb-supervision@budge.apana.org.au>
> > wrote:
> >
> > > > On Thu, 3 Jun 2010 21:25:30 +0200
> > > > Laurent Bercot <ska-supervision@skarnet.org> wrote:
> > > >
> > > > > Strong supervision makes sure that your supervisor
> > > > > process tree is *always* alive and complete, unless
> > > > > process 1 itself crashes, in which case you're doomed
> > > > > to reboot anyway.
> > >
> > > There is a weakness in this "strong supervision" model. Any
> > > service with a 'down' file will not be restarted if its
> > > supervise/runsv or svscan/runsvdir is replaced.
> >
> > Why do you describe this as a "weakness"? The down flagfile
> > is consulted only on startup of the supervisor. If the
> > administrator has configured the service to be down on
> > startup, presumably she wants it to be down on startup.
>
> I thought that we were discussing here the situation where the
> supervisor dies and is automatically restarted. That is not
> the 'on startup' where the administrator intends the service to
> be down. "on startup" is long gone, and the administrator has
> started the service, and wants it to continue running. The
> automated restart of the supervisor shouldn't change that
> running state.
>

Well, the down flagfile is relevant on startup of the supervisor. Starting/restarting the supervisor may occur at system boot, and/or any number of times thereafter. Supervisors do not normally need to know or care about the difference between system boot and "thereafter".
If the administrator needs to differentiate between system boot and "thereafter", she will probably need to effect that differentiation through her system boot/shutdown scripts. But that is her problem, rather than a weakness in the supervisory model.

Cheers,

Wayne
* Re: pidsig 0.11 - a fghack like de-daemonisation tool
From: Laurent Bercot @ 2010-06-04 18:43 UTC
To: supervision

> There is a weakness in this "strong supervision" model. Any service with a
> 'down' file will not be restarted if its supervise/runsv or
> svscan/runsvdir is replaced.

If a branch of the supervision tree dies, the old subtree, including leaves (i.e. services) is still alive. Manual admin intervention is necessary to kill it off and recreate a new subtree, connected to init. If there are any services with down files, but that need to be alive, the admin can take care of them at that time.

Now, if a service has a down file, and its supervisor dies, *and then* the service dies too, then the service won't be restarted indeed; but we're talking about a double failure, which should be uncommon.

Nevertheless, down files are a decrease in reliability. They're practical for test and manual intervention purposes, but I've never met a real-life case where they are necessary. It's always possible to boot the machine with a nearly-empty svscan directory and populate it during the later initialization phases.

--
Laurent
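The down-file behaviour debated in the messages above can be made concrete with a deliberately simplified sketch. It is not code from daemontools, runit or perp; `services_to_start` is a hypothetical helper that only models the one rule under discussion, namely that a supervisor consults the `down` file whenever it starts, with no notion of whether that start happens at boot or after a respawn.

```python
import os


def services_to_start(scan_dir):
    """Return the service names a freshly started supervisor would bring up.

    A service directory containing a file named "down" is left stopped,
    whether the supervisor is starting at boot or being respawned later.
    """
    started = []
    for name in sorted(os.listdir(scan_dir)):
        service_dir = os.path.join(scan_dir, name)
        if name.startswith(".") or not os.path.isdir(service_dir):
            continue  # skip dot-directories and plain files
        if os.path.exists(os.path.join(service_dir, "down")):
            continue  # flagged down: the supervisor runs, the service does not
        started.append(name)
    return started
```

Because the function takes no argument describing the service's previous state, a service that the administrator started by hand despite its `down` file is simply absent from the result after a supervisor respawn, which is the case Charlie Brady raises.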