From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: Vision for new platform
Date: Sun, 10 Jun 2012 12:33:59 -0400
Message-ID: <20120610163359.GJ163@brightrain.aerifal.cx>
In-Reply-To: <20120610235125.31f38cd7@sibserver.ru>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Sun, Jun 10, 2012 at 11:51:25PM +0800, orc wrote:
> > I don't think you're getting the issue at hand.
> > Suppose you want to be able to automatically bring down a
> > particular daemon -- perhaps to restart it with completely new
> > configuration or to switch to a new version of it. This could
> > happen as part of an automated upgrade process or under manual
> > admin control.
>
> 'Automated' often becomes the source of problems, if this automated
> subsystem is not engineered properly. If we want daemon that will be
> responsible for other's daemons status and it will start and stop them
> automatically based on the admin's decision than it must be
> well-engineered and tested in many types of situations first.

Without "automated", how do you intend for non-technical users to
upgrade important system components when their old version has a
critical vulnerability? Even if the system has a technically qualified
admin, nobody wants to go manually upgrading/restarting daemons on
tens, hundreds, or thousands of boxes...

I agree automation is a huge source of problems, but I think they're
fundamental problems you can't just pretend don't exist.

> > (even one run by a user as opposed to by root with a
> > separate config file and running on a separate port)
>
> Killing processes based on uid/gid and cmdline can be achieved with
> pkill already,

No, it cannot. Before you can solve any of these problems you must
understand that you can't use resource handles that belong to another
process which could invalidate them behind your back. This is a core
principle of concurrent programming, and pids are such a resource.

As an aside, I used to really dislike the push towards multi-threaded
programming because concurrency is error-prone and hard to get right.
Then I realized that basically all unix systems programming is
concurrent programming, just disguised to look safe...

> > to killing
> > unrelated processes (by scanning /proc or reading a pid file, then
> > subsequently killing the pid which might now belong to a different
> > process).
>
> Again, pkill much better than "traditional"
> "kill $(cat /var/run/daemon.pid)" that most of init script use today
> (Am I right?)

No. The pkill approach is the "doing things as stupid as killing any
instance of the daemon" in my text you quoted. At least with pid
files, you know the pid you kill _at one time_ belonged to the daemon
you wanted to kill. With pkill, you'll pick up completely independent
instances of the same program binary.

> > If daemons really didn't exit unexpectedly, the only race condition in
> > pid-based approaches to lifetime management would be races between
> > multiple scripted administrative actions (e.g. 2 admins trying to down
> > the daemon at the same time) which could be fixed by locking at the
> > script level.
>
> Hm, for me that situation sounds a bit strange: even script will exit
> with 'daemon already stopped' or script will send an additional signal

No. It will send a TERM/KILL signal to a new process that happens to
have the same PID as the already-killed daemon. If you get lucky, no
such new process exists, but that's called "getting lucky" which has
no place in robust systems.

> I partially agree with approach that such daemon for monitoring status
> of other daemons should be developed, but I think this daemon should
> control only critical processes for admin, such as:

My view is this:

1. On a hobbyist or fully self-maintained system where you're willing
   to manually do all the work of upgrading/restarting things, or on
   certain embedded systems where reboot-on-upgrade is acceptable or
   where you're sure you won't need security updates (because the
   system does not interact with potentially-dangerous inputs), just
   start all the daemons from your init script with no management and
   be done with it. Components should not be designed in ways that
   _preclude_ this ultra-simple setup.

2. On everything else, use your choice of robust daemon management
   tool that starts daemons as direct children and therefore can
   observe their death and/or intentionally kill them without any race
   conditions.

Rich
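
To make point 2 concrete, here is a minimal sketch of the direct-child
approach. It assumes Python's standard subprocess module as a stand-in
for a real daemon management tool; start_daemon and stop_daemon are
hypothetical names chosen for illustration, not part of any existing
tool. The point it tries to show: since the supervisor is the parent,
the child's pid stays reserved (as a zombie, if the child has already
died) until the supervisor itself reaps it, so the signal cannot land
on an unrelated process that reused the number.

```python
import signal
import subprocess
import sys


def start_daemon(argv):
    """Start the daemon as a direct child and keep the handle.

    Hypothetical helper: a real supervisor would also set up logging,
    the environment, restart policy, and so on.
    """
    return subprocess.Popen(argv)


def stop_daemon(child, timeout=5.0):
    """Terminate a direct child, reap it, and return its exit status.

    Only this process can reap the child, so its pid cannot be reused
    by another process before we signal it.
    """
    if child.poll() is not None:
        # Already exited and reaped by us: nothing left to signal.
        return child.returncode
    child.terminate()  # sends SIGTERM
    try:
        return child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        child.kill()  # escalate to SIGKILL
        return child.wait()


if __name__ == "__main__":
    # Stand-in "daemon": a child process that just sleeps.
    daemon = start_daemon([sys.executable, "-c", "import time; time.sleep(60)"])
    status = stop_daemon(daemon)
    # A negative status means the child was killed by that signal number.
    print("daemon exited with status", status)
    print("killed by SIGTERM:", status == -signal.SIGTERM)
```

Calling stop_daemon a second time on the same handle sends nothing,
because the exit status has already been collected. That is the case
where a pid file or pkill would go looking for a pid that may by then
belong to someone else.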