supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* runit small fix
@ 2003-07-01  0:30 Hleil Liu
  2003-07-03  7:26 ` Gerrit Pape
  0 siblings, 1 reply; 11+ messages in thread
From: Hleil Liu @ 2003-07-01  0:30 UTC (permalink / raw)


hello list,

After an unclean shutdown, the system may run fsck at boot, which creates a
/service/lost+found directory; this makes runsvdir break.

We usually symlink service directories into /service, so removing everything
under /service that is not a symlink before stage 2 may fix this. That way we
do not have to edit other system scripts.

Add this at the top of /etc/runit/2 or at the bottom of /etc/runit/1:

for i in /service/*; do [ -h "$i" ] || rm -rf "$i"; done
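
A slightly more defensive form of the one-liner above, as a sketch only. It
quotes every path so that names containing spaces survive, and it takes the
directory as an argument so that it can be tried on a scratch directory before
being pointed at /service. The name prune_non_symlinks is hypothetical and is
not part of runit.

```shell
#!/bin/sh
# prune_non_symlinks DIR
# Hypothetical helper (not a runit tool): remove every entry directly
# under DIR that is not a symbolic link, and report what was removed.
prune_non_symlinks() {
    dir=$1
    for entry in "$dir"/*; do
        # An unmatched glob stays literal; skip it.
        if [ ! -e "$entry" ] && [ ! -h "$entry" ]; then
            continue
        fi
        # Symlinks (even dangling ones) are kept.
        if [ -h "$entry" ]; then
            continue
        fi
        rm -rf -- "$entry"
        printf 'removed %s\n' "$entry"
    done
    return 0
}
```

Calling prune_non_symlinks /service from stage 1 or stage 2 would then have the
same effect as the loop above, assuming only symlinks are ever meant to live in
/service.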

Regards!




* Re: runit small fix
  2003-07-01  0:30 runit small fix Hleil Liu
@ 2003-07-03  7:26 ` Gerrit Pape
  2003-07-03  8:09   ` Hleil Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Gerrit Pape @ 2003-07-03  7:26 UTC (permalink / raw)


On Tue, Jul 01, 2003 at 08:30:24AM +0800, Hleil Liu wrote:
> After an unclean shutdown, the system may run fsck at boot, which creates a
> /service/lost+found directory; this makes runsvdir break.

I can't see how this should break runsvdir.  What's the impact on your
system besides one more runsv process that tries to start lost+found/run?

It's one of the advantages of the runit and daemontools design that one badly
configured service doesn't affect other services.

> Add this at the top of /etc/runit/2 or at the bottom of /etc/runit/1:
> 
> for i in /service/*; do [ -h "$i" ] || rm -rf "$i"; done

If you only put symlinks into /service, this should work fine for you.
But no part of the documentation states that only symlinks are allowed.

Regards, Gerrit.



* Re: runit small fix
  2003-07-03  7:26 ` Gerrit Pape
@ 2003-07-03  8:09   ` Hleil Liu
  2003-07-03  8:34     ` Gerrit Pape
  0 siblings, 1 reply; 11+ messages in thread
From: Hleil Liu @ 2003-07-03  8:09 UTC (permalink / raw)


On Thu, 3 Jul 2003 09:26:40 +0200
Gerrit Pape <pape@smarden.org> wrote:

> On Tue, Jul 01, 2003 at 08:30:24AM +0800, Hleil Liu wrote:
> > After an unclean shutdown, the system may run fsck at boot, which creates a
> > /service/lost+found directory; this makes runsvdir break.
> 
> I can't see how this should break runsvdir.  What's the impact on your
> system besides one more runsv process that tries to start lost+found/run?
> 

The directory /service/lost+found is *empty*, so when you want to shut down your
system,

svwaitdown -xk -t350 /service/*

will not finish, so stage 3 fails.

You can test this with

mkdir /service/test

and then shutting down: the system will not shut down.


> It's one of the advantages of the runit and daemontools design that one badly
> configured service doesn't affect other services.
> 

Sure, but it cannot handle an empty /service/$dir.


> > Add this at the top of /etc/runit/2 or at the bottom of /etc/runit/1:
> > 
> > for i in /service/*; do [ -h "$i" ] || rm -rf "$i"; done
> 
> If you only put symlinks into /service, this should work fine for you.
> But no part of the documentation states that only symlinks are allowed.

Sure. So we must either check that the whole directory structure under /service
is correct before stage 2, or put only symlinks there.




* Re: runit small fix
  2003-07-03  8:09   ` Hleil Liu
@ 2003-07-03  8:34     ` Gerrit Pape
  2003-07-03  9:11       ` Hleil Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Gerrit Pape @ 2003-07-03  8:34 UTC (permalink / raw)


On Thu, Jul 03, 2003 at 04:09:58PM +0800, Hleil Liu wrote:
> On Thu, 3 Jul 2003 09:26:40 +0200
> Gerrit Pape <pape@smarden.org> wrote:
> > On Tue, Jul 01, 2003 at 08:30:24AM +0800, Hleil Liu wrote:
> > > After an unclean shutdown, the system may run fsck at boot, which creates a
> > > /service/lost+found directory; this makes runsvdir break.
> > I can't see how this should break runsvdir.  What's the impact on your
> > system besides one more runsv process that tries to start lost+found/run?
> 
> The directory /service/lost+found is *empty*, so when you want to shut down your
> system,
> 
> svwaitdown -xk -t350 /service/*
> 
> will not finish, so stage 3 fails.
> 
> You can test this with
> 
> mkdir /service/test
> 
> and then shutting down: the system will not shut down.

Works fine for me:

# mkdir /service/test
# runsvstat /service/test/
/service/test/: run (pid 7461) 0 seconds
# time svwaitdown -xk -t350 /service/test

real    0m1.005s
user    0m0.000s
sys     0m0.000s
#

svwaitdown with the -k -t350 option will _always_ return, at the worst
after 350 seconds.  This is another advantage of runit's design, a
broken service will not break system shutdown.

> > It's one of the advantages of the runit and daemontools design that one badly
> > configured service doesn't affect other services.

> Sure, but it cannot handle an empty /service/$dir.

Sure it can.  A runsv process is started for this directory, which tries
to start $dir/run once a second.  It can be taken down perfectly well with
svwaitdown.

Of course this service is broken, because there's no run file, but it
doesn't break other services, or runsvdir, or system shutdown.
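
The broken case described above (a service directory with no run file) can be
spotted ahead of time with a small check. This is a sketch under assumptions:
list_services_without_run is a hypothetical helper, not a runit tool, and it
only tests whether each directory contains an executable regular file named
run.

```shell
#!/bin/sh
# list_services_without_run DIR
# Hypothetical helper (not part of runit): print every directory under
# DIR, including symlinked ones, that has no executable regular file
# called "run".
list_services_without_run() {
    dir=$1
    for svc in "$dir"/*; do
        # -d follows symlinks, so linked service directories are checked too.
        if [ ! -d "$svc" ]; then
            continue
        fi
        if [ ! -f "$svc/run" ] || [ ! -x "$svc/run" ]; then
            printf '%s\n' "$svc"
        fi
    done
    return 0
}
```

Running list_services_without_run /service on a box with a stray
/service/lost+found or /service/test directory should name that directory,
which makes it easy to see what the extra runsv process is retrying.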

Regards, Gerrit.



* Re: runit small fix
  2003-07-03  8:34     ` Gerrit Pape
@ 2003-07-03  9:11       ` Hleil Liu
  2003-07-03 10:09         ` Gerrit Pape
  0 siblings, 1 reply; 11+ messages in thread
From: Hleil Liu @ 2003-07-03  9:11 UTC (permalink / raw)


On Thu, 3 Jul 2003 10:34:43 +0200
Gerrit Pape <pape@smarden.org> wrote:


> Of course this service is broken, because there's no run file, but it
> doesn't break other services, or runsvdir, or system shutdown.
> 

Would you like to make runit work well with such an empty /service/$dir?
Waiting 350 seconds seems too long, so I just power off :)

Regards!




* Re: runit small fix
  2003-07-03  9:11       ` Hleil Liu
@ 2003-07-03 10:09         ` Gerrit Pape
  2003-07-03 12:55           ` Hleil Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Gerrit Pape @ 2003-07-03 10:09 UTC (permalink / raw)


On Thu, Jul 03, 2003 at 05:11:51PM +0800, Hleil Liu wrote:
> On Thu, 3 Jul 2003 10:34:43 +0200
> Gerrit Pape <pape@smarden.org> wrote:
> > Of course this service is broken, because there's no run file, but it
> > doesn't break other services, or runsvdir, or system shutdown.
> 
> Would you like to make runit work well with such an empty /service/$dir?
> Waiting 350 seconds seems too long, so I just power off :)

As I showed in my last mail, it already does.  It takes _one_ second for
svwaitdown to take this broken service down, not 350.  There's another
service holding up the shutdown.  It must be a service that takes some
time to terminate after receiving a TERM signal, most probably this is a
console login.  Did you log out of all consoles before you switched off the
machine?

If 350 seconds is generally too long for you, adapt the timeout in
/etc/runit/3; that's why this is a command line option.  Note that services
still running after this timeout will receive a KILL signal.

I still don't see anything that needs to be fixed in runit for you.

Regards, Gerrit.



* Re: runit small fix
  2003-07-03 10:09         ` Gerrit Pape
@ 2003-07-03 12:55           ` Hleil Liu
  2003-07-03 14:35             ` Laurent Bercot
  0 siblings, 1 reply; 11+ messages in thread
From: Hleil Liu @ 2003-07-03 12:55 UTC (permalink / raw)


On Thu, 3 Jul 2003 12:09:45 +0200
Gerrit Pape <pape@smarden.org> wrote:

> As I showed in my last mail, it already does.  It takes _one_ second for
> svwaitdown to take this broken service down, not 350.  There's another
> service holding up the shutdown.  It must be a service that takes some
> time to terminate after receiving a TERM signal, most probably this is a
> console login.  Did you log out of all consoles before you switched off the
> machine?
> 

In fact, I use sshd. After I tested that runit was OK, I used your sshd run
script, with no local console login. After I type reboot through sshd, the
monitor shows this message again and again:

Waiting for services to stop...

So the console login stopped correctly. I think the service being waited for is
that empty /service/$dir, which has no run script.

I also want to say:
In my original RedHat configuration, stage 1 ran the following:

/etc/rc.d/rc.sysinit
/etc/rc.d/rc 3

and now I have removed

/etc/rc.d/rc 3

from it to get the full benefit. In stage 2 I manually start the network service
to bring all network interfaces up. It works fine.

> If 350 seconds is generally too long for you, adapt the timeout in
> /etc/runit/3; that's why this is a command line option.  Note that services
> still running after this timeout will receive a KILL signal.
> 

Sorry for my poor English; maybe I did not describe my point correctly.
I think that when a service receives a TERM signal, it should stop by itself
and not depend on runit. It seems we should not KILL a dead service based on
a period of time; there is no reason to do that. I think using a counter, or
relying on signals if possible, would be more reasonable. For example: if a
service has not stopped after we have sent it a TERM signal more than 5 times,
we just KILL it.

And why does runsvdir try to start a runsv process for an empty directory in
/service that does not even have a run script?


Best regards!





* Re: runit small fix
  2003-07-03 12:55           ` Hleil Liu
@ 2003-07-03 14:35             ` Laurent Bercot
  2003-07-03 15:21               ` Hleil Liu
  2003-07-04  0:50               ` Hleil Liu
  0 siblings, 2 replies; 11+ messages in thread
From: Laurent Bercot @ 2003-07-03 14:35 UTC (permalink / raw)


> In fact, I use sshd. After I tested that runit was OK, I used your sshd run
> script, with no local console login. After I type reboot through sshd, the
> monitor shows this message again and again:
> 
> Waiting for services to stop...
> 
> So the console login stopped correctly. I think the service being waited for is
> that empty /service/$dir, which has no run script.

 Profile, don't speculate.
 Interrupt the shutdown process and find yourself a shell. Then, do a
"ps afuxww" (no f if you're not on Linux) to see the list of remaining
processes. You'll see exactly which service is causing you trouble.

 Here is a wild guess: since you're keeping a sshd process alive, your
sshd logger doesn't see EOF when the sshd main process is killed
(because your sshd child instance is keeping the pipe open). Unless I'm
mistaken, runit doesn't violently kill loggers - it waits for them to
exit cleanly after the service has been killed. But your sshd logger
just doesn't exit, and runit waits forever. QED.

 Rebooting through sshd _will_ cause you trouble if you don't know
exactly what you are doing. The next problem you'll face is: if sshd
is not installed on your root partition, you won't be able to unmount
your filesystems.

-- 
 Ska



* Re: runit small fix
  2003-07-03 14:35             ` Laurent Bercot
@ 2003-07-03 15:21               ` Hleil Liu
  2003-07-04  0:50               ` Hleil Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Hleil Liu @ 2003-07-03 15:21 UTC (permalink / raw)


On Thu, 3 Jul 2003 16:35:10 +0200
Laurent Bercot <ska-supervision@skarnet.org> wrote:

>  Interrupt the shutdown process and find yourself a shell. Then, do a
> "ps afuxww" (no f if you're not on Linux) to see the list of remaining
> processes. You'll see exactly which service is causing you trouble.
> 

Stage 3 doesn't accept signals.

>  Here is a wild guess: since you're keeping a sshd process alive, your
> sshd logger doesn't see EOF when the sshd main process is killed
> (because your sshd child instance is keeping the pipe open). Unless I'm
> mistaken, runit doesn't violently kill loggers - it waits for them to
> exit cleanly after the service has been killed. But your sshd logger
> just doesn't exit, and runit waits forever. QED.
> 

My sshd service and its log work correctly. If I open an ssh client, then log in
on the console and type reboot, the system reboots as expected. But the same
error happens when there is an empty directory in /service, even if no ssh
session has been established.

>  Rebooting through sshd _will_ cause you trouble if you don't know
> exactly what you are doing. The next problem you'll face is: if sshd
> is not installed on your root partition, you won't be able to unmount
> your filesystems.

If all services start OK, rebooting through sshd or from the console both work.

And this is a test box, so there is only a / partition and a swap partition.

Regards!




* Re: runit small fix
  2003-07-03 14:35             ` Laurent Bercot
  2003-07-03 15:21               ` Hleil Liu
@ 2003-07-04  0:50               ` Hleil Liu
  2003-07-04 10:03                 ` Gerrit Pape
  1 sibling, 1 reply; 11+ messages in thread
From: Hleil Liu @ 2003-07-04  0:50 UTC (permalink / raw)


On Thu, 3 Jul 2003 16:35:10 +0200
Laurent Bercot <ska-supervision@skarnet.org> wrote:

> 
>  Here is a wild guess: since you're keeping a sshd process alive, your
> sshd logger doesn't see EOF when the sshd main process is killed
> (because your sshd child instance is keeping the pipe open). Unless I'm
> mistaken, runit doesn't violently kill loggers - it waits for them to
> exit cleanly after the service has been killed. But your sshd logger
> just doesn't exit, and runit waits forever. QED.
> 

You are right. I found the problem. After I type reboot through ssh, I type exit,
and then the system reboots as expected. Is this the right way?

Thanks for your advice!

Regards!



* Re: runit small fix
  2003-07-04  0:50               ` Hleil Liu
@ 2003-07-04 10:03                 ` Gerrit Pape
  0 siblings, 0 replies; 11+ messages in thread
From: Gerrit Pape @ 2003-07-04 10:03 UTC (permalink / raw)


On Fri, Jul 04, 2003 at 08:50:09AM +0800, Hleil Liu wrote:
> On Thu, 3 Jul 2003 16:35:10 +0200 Laurent Bercot
> <ska-supervision@skarnet.org> wrote:
> >  Here is a wild guess: since you're keeping a sshd process alive,
> >  your sshd logger doesn't see EOF when the sshd main process is
> >  killed (because your sshd child instance is keeping the pipe open).
> >  Unless I'm mistaken, runit doesn't violently kill loggers - it
> >  waits for them to exit cleanly after the service has been killed.
> >  But your sshd logger just doesn't exit, and runit waits forever.
> >  QED.
> 
> You are right. I found the problem. After I type reboot through ssh, I
> type exit, and then the system reboots as expected. Is this the right way?

Yes.  This is how runsv takes down services with log services through
svwaitdown in stage 3:

runsv sends the service a TERM signal, and waits for the service to
terminate.  If the service exits, the log pipe provided by runsv is
closed, the log service sees EOF and exits cleanly.  Both services now
are down, all logs are written, and runsv exits.

If the service doesn't exit after receiving a TERM signal within a given
timeout, runsv sends it a KILL signal, and closes the log pipe itself,
so that the log service exits cleanly.  Both services now are down (but
you may lose logs) and runsv exits.  There is no point in sending a
service multiple TERM signals.
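
The TERM, wait, KILL sequence described above can be sketched in plain shell.
This only illustrates the idea: it is not how runsv or svwaitdown are
implemented, stop_with_timeout is a hypothetical name, and the one-second
polling interval is an arbitrary choice.

```shell
#!/bin/sh
# stop_with_timeout PID SECONDS
# Hypothetical helper: send TERM once, poll once a second, and send KILL
# if the process is still there after SECONDS.
# Returns 0 if the process went away after TERM, 1 if KILL was needed.
stop_with_timeout() {
    pid=$1
    timeout=$2
    # If TERM cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # kill -0 only tests whether the process still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    # Timeout reached: a single KILL, no repeated TERM signals.
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

A process that honours TERM is gone after the first signal, so repeating TERM
adds nothing; one that ignores TERM is only stopped by the KILL at the end of
the timeout.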

This is what happens with the ssh service:  On login, sshd forks a new
sshd process to handle this ssh connection.  On shutdown, runsv is told
to take down the service through svwaitdown, sends the ssh service a
TERM signal and waits for the ssh service and the log service to exit.
The sshd process indeed terminates, but the sshd child process handling
the connection is still running, and possibly writing to the log pipe.
So the log service keeps running to collect log messages from the ssh
service.  If you terminate the ssh connection, the sshd child process
exits and the log pipe is closed.  All logs now are written, the log
service exits cleanly, and runsv finally exits.  If you don't terminate
your ssh session until the timeout is reached, runsv closes the log pipe
itself, and the log service and runsv exit.
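
The pipe behaviour behind this can be reproduced without runit or sshd. In the
sketch below, cat stands in for the log service and a background sleep stands
in for the sshd child that still holds the write end of the log pipe. The
function names are made up for the demonstration, and date +%s is assumed to
be available (it is on GNU, BSD and busybox systems).

```shell
#!/bin/sh
# hold_pipe: the "main process" (the subshell) exits right after echo,
# but the background sleep inherited the pipe as its stdout, so the
# reader (cat) does not see EOF until sleep exits.
hold_pipe() {
    ( sleep 2 & echo "main process exits" ) | cat >/dev/null
}

# release_pipe: same, except the background process has its stdout
# pointed away from the pipe, so cat sees EOF as soon as the subshell exits.
release_pipe() {
    ( sleep 2 >/dev/null & echo "main process exits" ) | cat >/dev/null
}

# elapsed CMD...: run CMD and print how many whole seconds it took.
elapsed() {
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo $((end - start))
}
```

elapsed hold_pipe should report about two seconds, while elapsed release_pipe
should return almost immediately. The first case corresponds to leaving the ssh
session open during shutdown, the second to typing exit first.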

Regards, Gerrit.


