supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* runsvdir killed
From: Alex Efros @ 2004-11-06 18:42 UTC


Hi!

Sometimes when I check `ps axf` I see no runsvdir process, and all `runsv`
processes have no parent (or their parent is process 1: runit-init).

I think I know what happens: the kernel has killed runsvdir because of an
'out of memory' error (a lot of complex perl scripts eat all the memory). Of
course, the kernel kills not only runsvdir; it also tries to kill those perl
scripts, mysql, etc. But this isn't a problem: the perl scripts will be
restarted by cron, mysql will be restarted by runsv, etc... but who will
restart runsv if runsvdir is killed and runsv is reparented (I'm not sure this
is the correct English term) to runit-init?

So, the question is: how do I restore a killed runsvdir without a reboot?
And the second question: I suppose killing runsvdir means exiting stage 2 and
entering stage 3 for reboot/halt... is this correct? And if it is correct, why
doesn't this happen in my case?


P.S. Yeah, I know, perl scripts eating all the memory and the kernel starting
to kill processes isn't correct behaviour for a server. But for now I have no
idea why this happens, so I can't fix it. On that server I get a kernel
oops/panic every 12-72 hours, and I haven't found any information about these
oopses on Google. I run a huge number of simultaneous downloads in those perl
scripts (non-blocking sockets) 24/7/365, and I suppose I've hit some unknown
race condition bug in the kernel, because the same mysterious oops/panic
happens on different servers with different kernels. The 'out of memory'
errors, for example, usually happen after dnscachex or mysql stops accepting
new connections for an unknown reason. So a perl script loads into memory
(using about 35-50 MB), tries to connect to the database and hangs because
mysql doesn't accept the connection and doesn't return any error... after 1
minute the next perl script is started by cron and hangs too... etc. Of course
I can add alarm() around the connect to mysql, or refuse to start a perl
script if 2/3 of memory is already used, but these are super-ugly workarounds
and don't solve anything.

-- 
			WBR, Alex.



* Re: runsvdir killed
From: Gerrit Pape @ 2004-11-07 13:54 UTC


On Sat, Nov 06, 2004 at 08:42:16PM +0200, Alex Efros wrote:
> Sometimes when I check `ps axf` I see no runsvdir process, and all `runsv`
> processes have no parent (or their parent is process 1: runit-init).
> 
> I think I know what happens: the kernel has killed runsvdir because of an
> 'out of memory' error (a lot of complex perl scripts eat all the memory). Of
> course, the kernel kills not only runsvdir; it also tries to kill those perl
> scripts, mysql, etc. But this isn't a problem: the perl scripts will be
> restarted by cron, mysql will be restarted by runsv, etc... but who will
> restart runsv if runsvdir is killed and runsv is reparented (I'm not sure
> this is the correct English term) to runit-init?

The runit program running as process 1 monitors the stage 2 which by
default is the runsvdir process.  If runsvdir, and so /etc/runit/2,
crashes or exits 111, runit restarts /etc/runit/2.  If it exits 0, runit
enters stage 3 and runs /etc/runit/3; see the runit(8) man page.  Either
of them should happen on your system if /etc/runit/2 is terminated.
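
The rule described above can be sketched as a small shell function. This is
only an illustration of the description in this message, not runit's own
code: the name stage2_next_action is hypothetical, and treating a status
above 128 as a "crash" is an assumption based on the shell convention for a
process killed by a signal. Statuses not mentioned in the message are
deliberately left to the man page.

```shell
#!/bin/sh
# stage2_next_action STATUS
# Hypothetical helper: prints what runit does after /etc/runit/2 ends,
# based only on the description in this message.
stage2_next_action() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "enter stage 3: run /etc/runit/3"
    elif [ "$status" -eq 111 ] || [ "$status" -gt 128 ]; then
        # 111 is the explicit "restart me" status; a status above 128 is
        # how a shell reports a process killed by a signal (assumed crash).
        echo "restart /etc/runit/2"
    else
        echo "not covered here: see runit(8)"
    fi
}

stage2_next_action 0
stage2_next_action 111
stage2_next_action 137   # 128 + 9, i.e. killed by SIGKILL
```

Neither branch is reached while /etc/runit/2 is still running, so none of
this applies until the stage 2 script itself has ended.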

> So, the question is: how do I restore a killed runsvdir without a reboot?
> And the second question: I suppose killing runsvdir means exiting stage 2
> and entering stage 3 for reboot/halt... is this correct? And if it is
> correct, why doesn't this happen in my case?

You can send the runsvdir process a HUP signal to have stage 2
restarted, but this should almost never be needed.

> P.S. Yeah, I know, perl scripts eating all the memory and the kernel
> starting to kill processes isn't correct behaviour for a server. But for now
> I have no idea why this happens, so I can't fix it. On that server I get a
> kernel oops/panic every 12-72 hours, and I haven't found any information
> about these
[...]
This sounds really broken.

Regards, Gerrit.



* Re: runsvdir killed
From: Alex Efros @ 2004-11-07 19:40 UTC


Hi!

On Sun, Nov 07, 2004 at 01:54:44PM +0000, Gerrit Pape wrote:
> The runit program running as process 1 monitors the stage 2 which by
> default is the runsvdir process.  If runsvdir, and so /etc/runit/2,
> crashes or exits 111, runit restarts /etc/runit/2.  If it exits 0, runit
> enters stage 3 and runs /etc/runit/3; see the runit(8) man page.  Either
> of them should happen on your system if /etc/runit/2 is terminated.

I've configured a "catch-all" log in /var/log/everything/ using a pipe from
runsvdir to svlogd in /etc/runit/2:

    #!/bin/sh
    PATH=...[cut]...
    exec env - PATH=$PATH runsvdir /var/service 'log: ...[cut]...' |
        svlogd /var/log/everything

(this works because I've added "e*" to /var/log/*/config for most services
which I want to see in the everything log, plus I've added 2>&1 to most
/service/*/log/run scripts).

So, if runsvdir is killed, /etc/runit/2 probably doesn't exit because svlogd
is still running, and so runit-init doesn't restart /etc/runit/2.
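
A minimal sketch of how a pipeline can keep a script alive after its first
stage is gone. It uses stand-ins only, not runit itself. It also assumes one
possible mechanism that this thread does not confirm: that the orphaned
children inherited the pipe as their stdout, so the reader never sees
end-of-file. The function name is hypothetical, and `date +%s` is assumed to
be available.

```shell
#!/bin/sh
# Stand-ins (assumptions, not the real programs):
#   inner sh -c  = runsvdir, which goes away immediately
#   sleep 2 &    = an orphaned child that inherited the pipe as stdout
#   cat          = svlogd, reading its stdin until end-of-file
seconds_until_pipeline_returns() {
    start=$(date +%s)
    # The inner shell exits at once, but the background sleep still holds
    # the write end of the pipe, so cat keeps waiting and so does this shell.
    sh -c 'sleep 2 & exit 0' | cat
    end=$(date +%s)
    echo $((end - start))
}

seconds_until_pipeline_returns
```

The pipeline returns only once the last holder of the pipe has exited, even
though the first stage ended right away. A script built around such a
pipeline would stay alive in the same way.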

> You can send the runsvdir process a HUP signal to have stage 2
> restarted, but this should almost never be needed.

I can't send HUP because it has been killed. Probably the right answer to my
question is: kill the svlogd started by /etc/runit/2, so that /etc/runit/2
exits and runit-init restarts it.

> This sounds really broken.

I know. :(

-- 
			WBR, Alex.

