From: Alex Efros
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: runsvdir killed
Date: Sat, 6 Nov 2004 20:42:16 +0200
Organization: asdfGroup Inc., http://www.asdfGroup.com/
Message-ID: <20041106184216.GB4568@home.power>
Original-To: supervision@list.skarnet.org

Hi!

Sometimes when I check `ps axf` I see no runsvdir process, and all the `runsv` processes have no parent (or rather, their parent is process 1: runit-init).

I think I know what happens: the kernel has killed runsvdir because of an 'out of memory' error (a lot of complex Perl scripts eat up all the memory). Of course, the kernel has killed not only runsvdir; it also tries to kill those Perl scripts, mysql, etc.
But this isn't a problem: the Perl scripts will be restarted by cron, mysql will be restarted by runsv, etc. But who will restart runsv if runsvdir is killed and the runsv processes are reparented (I'm not sure this is the correct English term) to runit-init?

So, the question is: how do I restore a killed runsvdir without a reboot?

And the second question: I suppose killing runsvdir means exiting stage 2 and entering stage 3 for reboot/halt. Is this correct? And if it is correct, why doesn't this happen in my case?

P.S. Yeah, I know, Perl scripts eating all the memory and the kernel starting to kill processes isn't correct behaviour for a server. But for now I have no idea why this happens, so I can't fix it. On that server I get a kernel oops/panic every 12-72 hours, and I haven't found any information about these oopses on Google. I run a huge number of simultaneous downloads in those Perl scripts (non-blocking sockets) 24/7/365, and I suppose I've hit some unknown race condition bug in the kernel, because the same mysterious oops/panic happens on different servers with different kernels.

The 'out of memory' errors, for example, usually happen after dnscachex or mysql stops accepting new connections for some unknown reason. A Perl script loads into memory (about 35-50 MB used), tries to connect to the database and hangs, because mysql doesn't accept the connection and doesn't return any error. After 1 minute the next Perl script is started by cron and hangs too, and so on.

Of course I can add alarm() around the connect to mysql, or refuse to start a Perl script if 2/3 of memory is already used, but these are super-ugly workarounds and don't solve anything.

--
WBR, Alex.
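
For concreteness, the second workaround mentioned in the P.S. (refusing to start a job when about 2/3 of memory is already in use) could be sketched roughly as below. This is only an illustration under assumptions: it is written in Python rather than Perl, it reads Linux's /proc/meminfo, and the names `used_fraction`, `should_start` and `check_proc_meminfo` are made up for this example. It does not address the underlying hang or the question about runsvdir.

```python
MAX_USED_FRACTION = 2 / 3  # threshold taken from the post; adjust as needed


def used_fraction(meminfo_text):
    """Return the fraction of RAM in use, from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> ..." lines; skip anything else
        # (for example the column-header line printed by old kernels).
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields["MemTotal"]
    # MemAvailable only exists on newer kernels (Linux 3.14 and later).
    # Falling back to MemFree is an assumption and is pessimistic,
    # because MemFree does not count reclaimable page cache.
    free = fields.get("MemAvailable", fields["MemFree"])
    return (total - free) / total


def should_start(meminfo_text, limit=MAX_USED_FRACTION):
    """True if memory use is below the limit, so the job may start."""
    return used_fraction(meminfo_text) < limit


def check_proc_meminfo(path="/proc/meminfo"):
    """Return 0 if the job may start, 1 otherwise (usable as an exit code)."""
    with open(path) as f:
        return 0 if should_start(f.read()) else 1
```

A small wrapper script could call `sys.exit(check_proc_meminfo())`, and the cron entry could then run the real job only when that wrapper exits with 0. This limits how many hung scripts pile up, but it remains a workaround in the sense described above.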