From: Lee Hambley
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Race Condition?
Date: Thu, 14 Mar 2013 08:03:30 +0100
To: supervision@list.skarnet.org
I'm using runit together with Monit. We still use the init.d scripts that shipped with Ubuntu 12.04 LTS for system services, but runit manages all application-level processes. Monit also watches a couple of the system-level scripts.

We're seeing the following, and I believe Monit is to blame:

> root@runitvm:~# ps aux | grep runsvdir | grep -v grep
> root 1079 0.0 0.0 188 32 ? Ss 15:52 0:00 runsvdir -P /etc/service log: .........................................................................................................................................................................................................................runsv apache2: fatal: unable to setup filedescriptor for ./run: file descriptor not open?runsv apache2: fatal: unable to setup filedescriptor for ./run: file descriptor not open?runsv apache2: fatal: unable to setup filedescriptor for ./run: file descriptor not open?

I haven't been able to debug this, and the box only recovers after a reboot.

There is a possible explanation, describing a kind of race condition, in this thread from August 2010:

http://blog.gmane.org/gmane.comp.sysutils.supervision.general/month=20100801

An extract from that thread:

> ...is that at some point, your runsv ran through that code, but
> somehow managed to live and the services didn't die, i.e. another control
> message was sent and processed before the exit condition was reached, and
> runsv is still trying to supervise things - but runs into trouble with the
> closed logpipe...

My guess is that Monit is signalling the runsv process too often and leaving it in a broken state, but I haven't been able to verify this.
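For reference, one way to stop Monit from signalling runsv directly is to have it read the PID that runsv records in the service's supervise/pid file and request state changes only through sv. This is a minimal, untested sketch, assuming the service directory is /etc/service/apache2 and that sv is installed as /usr/bin/sv (the Debian/Ubuntu location); the restart threshold is an arbitrary illustrative value, not a recommendation:

```
# Hypothetical Monit stanza: Monit never signals runsv itself;
# it only asks runit to start/stop the service via sv.
check process apache2 with pidfile /etc/service/apache2/supervise/pid
  start program = "/usr/bin/sv start /etc/service/apache2"
  stop program  = "/usr/bin/sv stop /etc/service/apache2"
  # Give up (stop monitoring) instead of restarting in a tight loop;
  # the 5/5 thresholds are illustrative only.
  if 5 restarts within 5 cycles then timeout
```

With this arrangement runsv stays the only process that actually starts, stops, and signals apache2; Monit just writes commands into runsv's control pipe through sv.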
I wanted to run this by the mailing list before I pour too much time into debugging something that may already be a known problem with an obvious (to those wiser than I) workaround.

I'm running on:

$ dpkg -s runit
Architecture: amd64
Version: 2.1.1-6.2ubuntu2

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
Codename: precise

Thanks in advance for any assistance. In the meantime, I'm trying to make Monit less aggressive.

Lee Hambley
--
http://lee.hambley.name/