Should svwaitup/down be built again, or how to make sv do this?

supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit

* Should svwaitup/down be built again, or how to make sv do this?
From: Kevin @ 2006-07-12  1:59 UTC (permalink / raw)



We have some services that have come to depend on behavior of 
svwaitup/svwaitdown to work.  As these are no longer built since 
runit-1.4.0, we're starting to face a problem, since we use the Debian 
packaged version:

sv -w 1000000 -v check <service> seems to return immediately.  We have no 
way to indefinitely block a service from starting without replacing this 
functionality with extra code that used to already exist, to our 
knowledge.

Are we missing something?

Thanks for any help you can provide.


* Re: Should svwaitup/down be built again, or how to make sv do this?
From: Gerrit Pape @ 2006-07-12 15:20 UTC (permalink / raw)

On Tue, Jul 11, 2006 at 08:59:45PM -0500, Kevin wrote:
> We have some services that have come to depend on behavior of 
> svwaitup/svwaitdown to work.  As these are no longer built since 
> runit-1.4.0, we're starting to face a problem, since we use the Debian 
> packaged version:
> 
> sv -w 1000000 -v check <service> seems to return immediately.  We have no 
> way to indefinitely block a service from starting without replacing this 
> functionality with extra code that used to already exist, to our 
> knowledge.
> 
> Are we missing something?

I'm not sure what exactly you used the svwait* programs for, so I cannot
say whether or how it works with 'sv check/start/stop'.

Regards, Gerrit.


* Re: Should svwaitup/down be built again, or how to make sv do this?
From: Kevin @ 2006-07-12 16:27 UTC (permalink / raw)



On Wednesday 12 July 2006 10:20, Gerrit Pape wrote:
> On Tue, Jul 11, 2006 at 08:59:45PM -0500, Kevin wrote:
> > We have some services that have come to depend on behavior of
> > svwaitup/svwaitdown to work.  As these are no longer built since
> > runit-1.4.0, we're starting to face a problem, since we use the
> > Debian packaged version:
> >
> > sv -w 1000000 -v check <service> seems to return immediately.  We
> > have no way to indefinitely block a service from starting without
> > replacing this functionality with extra code that used to already
> > exist, to our knowledge.
> >
> > Are we missing something?
>
> I'm not sure what exactly you used the svwait* programs for, so I
> cannot say whether or how it works with 'sv check/start/stop'.
>
> Regards, Gerrit.

We cannot make sv block like svwaitup did.  Either we're misunderstanding 
how to use it, or something.

An example service script is included below:

#!/bin/sh
svwaitup ~/service/sfsagent
exec chpst -e env runthis.sh


* Re: Should svwaitup/down be built again, or how to make sv do this?
From: Gerrit Pape @ 2006-07-13  8:44 UTC (permalink / raw)

On Wed, Jul 12, 2006 at 11:27:21AM -0500, Kevin wrote:
> On Wednesday 12 July 2006 10:20, Gerrit Pape wrote:
> > I'm not sure what exactly you used the svwait* programs for, so I
> > cannot say whether or how it works with 'sv check/start/stop'.

> We cannot make sv block like svwaitup did.  Either we're misunderstanding 
> how to use it, or something.
> 
> An example service script is included below:
> 
> #!/bin/sh
> svwaitup ~/service/sfsagent
> exec chpst -e env runthis.sh

svwaitup did wait two seconds by default after the sfsagent service
daemon has been started, this seems to be what you relied on.  Indeed,
sv doesn't do that by default, but it supports the ./check script
instead.

If you want 'sv start ~/service/sfsagent' or 'sv check
~/service/sfsagent' to wait for two seconds before returning, simply do
'sleep 2' in ~/service/sfsagent/check.

Better yet, find out how to check for what the sfsagent service daemon
needs to provide before runthis.sh can be started, and check for that in
~/service/sfsagent/check.

E.g., the ./check script from the socklog-unix service runs a tiny
program that tries to connect to the /dev/log socket, without writing
anything to it, and only exits zero if the connect succeeds.  This makes
'sv start socklog-unix' wait until the socklog service daemon is
listening on /dev/log, and accepts local log messages.
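
As a rough illustration of the ./check idea, here is a minimal sketch in
POSIX sh. It assumes, purely for illustration, that the sfsagent service
exposes a Unix socket at a known path; the agent_ready function and the
placeholder path are hypothetical and are not taken from sfs or runit.
Testing that the socket exists is weaker than the connect test that the
socklog-unix check performs, so treat this as a starting point only.

```shell
#!/bin/sh
# Hypothetical sketch of ~/service/sfsagent/check.
# sv runs ./check and treats exit status 0 as "up and available".

# agent_ready PATH: succeed only if PATH exists and is a Unix socket.
# This tests existence only; it does not try to connect.
agent_ready() {
    [ -S "$1" ]
}

# A real ./check would end by calling the function with the actual
# socket path, for example (placeholder path, left commented out):
#   agent_ready "/path/to/agent.sock"
```

With such a script in place, 'sv start' and 'sv check' would report
success only once the socket is present, not merely once the process
has been started.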

Regards, Gerrit.


* Re: Should svwaitup/down be built again, or how to make sv do this?
From: Kevin @ 2006-07-17 17:56 UTC (permalink / raw)



On Thursday 13 July 2006 03:44, Gerrit Pape wrote:
> > We cannot make sv block like svwaitup did.  Either we're
> > misunderstanding how to use it, or something.
> >
> > An example service script is included below:
> >
> > #!/bin/sh
> > svwaitup ~/service/sfsagent
> > exec chpst -e env runthis.sh
>
> svwaitup did wait two seconds by default after the sfsagent service
> daemon has been started, this seems to be what you relied on.  Indeed,
> sv doesn't do that by default, but it supports the ./check script
> instead.

svwaitup waited until the service had been running for at least 2 seconds 
before exiting, no?


> If you want 'sv start ~/service/sfsagent' or 'sv check
> ~/service/sfsagent' to wait for two seconds before returning, simply do
> 'sleep 2' in ~/service/sfsagent/check.

That would be a useless check.  What we want is for the sfsagent service 
to be running, not just wait 2 seconds before starting.

> Better yet, find out how to check for what the sfsagent service daemon
> needs to provide before runthis.sh can be started, and check for that
> in ~/service/sfsagent/check.

So, it sounds like you're saying that the functionality of checking for a 
service to be running, and waiting until that service is running before 
continuing no longer exists in runit as it is presently built.  Is this 
correct?


* Re: Should svwaitup/down be built again, or how to make sv do this?
From: Stefan Karrmann @ 2006-07-17 19:32 UTC (permalink / raw)


Dear all,

Due to race conditions, you cannot guarantee that a service is active.

E.g., consider:
#! /bin/sh
svwaitup /service/foo
exec baz

After svwaitup returns successfully, the service foo can crash before the
shell runs the exec command.

If foo crashes, you can hope that runsv restarts it. It would be worse if
foo were in a deadlock.

Even if baz itself controls foo, or foo offers a control socket that
reports its activity, the race can still happen inside baz.

Most of the time, race conditions are simply ignored in Un*x.
The only formal solution is to use timeouts in baz together with
error handlers.
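
One way to picture the timeout-plus-error-handler idea is a small retry
helper in POSIX sh. This is only a sketch: try_until is a hypothetical
name, and the one-second polling interval is an arbitrary choice.

```shell
#!/bin/sh
# try_until LIMIT COMMAND [ARGS...]
# Run COMMAND, sleeping one second between attempts, until it succeeds
# or LIMIT sleeps have been used up.
# Returns 0 on success and 1 on timeout, so the caller can handle the error.
try_until() {
    limit=$1
    shift
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller would wrap each operation that depends on foo, for example
'try_until 10 some_probe || exit 1', where some_probe stands for whatever
command baz uses to talk to foo.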

> So, it sounds like you're saying that the functionality of checking for a 
> service to be running, and waiting until that service is running before 
> continuing no longer exists in runit as it is presently built.  Is this 
> correct?

Kind regards,
-- 
Stefan Karrmann

Computer programmers never die, they just get lost in the processing.



* Re: Should svwaitup/down be built again, or how to make sv do this?
From: Uffe Jakobsen @ 2006-07-18  8:38 UTC (permalink / raw)

Kevin wrote:
> On Thursday 13 July 2006 03:44, Gerrit Pape wrote:
>>> We cannot make sv block like svwaitup did.  Either we're
>>> misunderstanding how to use it, or something.
>>>
>>> An example service script is included below:
>>>
>>> #!/bin/sh
>>> svwaitup ~/service/sfsagent
>>> exec chpst -e env runthis.sh
>> svwaitup did wait two seconds by default after the sfsagent service
>> daemon has been started, this seems to be what you relied on.  Indeed,
>> sv doesn't do that by default, but it supports the ./check script
>> instead.
> 
> svwaitup waited until the service had been running for at least 2 seconds 
> before exiting, no?
> 
> 
>> If you want 'sv start ~/service/sfsagent' or 'sv check
>> ~/service/sfsagent' to wait for two seconds before returning, simply do
>> 'sleep 2' in ~/service/sfsagent/check.
> 
> That would be a useless check.  What we want is for the sfsagent service 
> to be running, not just wait 2 seconds before starting.
> 
>> Better yet, find out how to check for what the sfsagent service daemon
>> needs to provide before runthis.sh can be started, and check for that
>> in ~/service/sfsagent/check.
> 
> So, it sounds like you're saying that the functionality of checking for a 
> service to be running, and waiting until that service is running before 
> continuing no longer exists in runit as it is presently built.  Is this 
> correct?

I may be wrong here but there seems to be some sort of confusion about the internal state of a service versus the external runit state running/not running.

The runit supervisor framework can only see if a service (process) is running - it has no idea if that service is yet functional or not.

Runit assumes that if the process is running, then it is ok. It knows nothing about how long it will take for the process to become operational (as a service) from the point where it is initialized.
You could have a service that performs 'deep and long' integrity checks during initialization before it starts, e.g., listening and providing its service. How long will such 'deep and long' integrity checks take? Seconds, minutes, hours, or even days? Runit has no way of knowing that: as long as the supervised process is running, everything is ok from runit's perspective.

My guess is that you've until now just been lucky that your service typically takes just a little less than 2 seconds to initialize and become operational.

That is why Gerrit suggests that you should implement a test in your check script that can determine if your service actually has become operational (functional) yet or not. Typical checks could be to test if the service is listening on a socket/port.
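
To show how a check script ties into the dependent service, here is a
minimal sketch of a run-script helper in POSIX sh. It relies on
'sv start', which waits (7 seconds by default, adjustable with -w) for
the service to be up and, if ./check exists, for that script to exit 0.
The helper name run_after is hypothetical.

```shell
#!/bin/sh
# run_after SERVICE_DIR COMMAND [ARGS...]
# Ask sv to start the dependency and wait for it (and its ./check, if
# present) to report success, then replace this shell with COMMAND.
# Returns 1 without running COMMAND if the dependency is not available.
run_after() {
    dep=$1
    shift
    sv start "$dep" || return 1
    exec "$@"
}

# A run script built on this would end with a line such as (commented
# out here so the file has no side effects when sourced):
#   run_after "$HOME/service/sfsagent" chpst -e env runthis.sh || exit 1
```

If the dependency is not yet available and the run script exits, runsv
restarts the run script, so the dependent service keeps retrying until
the check succeeds.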

Let's hope I'm not all wrong :-)

Kind regards Uffe


* Re: Should svwaitup/down be built again, or how to make sv do this?
From: Kevin @ 2006-07-19 18:46 UTC (permalink / raw)



On Tuesday 18 July 2006 03:38, Uffe Jakobsen wrote:

> I may be wrong here but there seems to be some sort of confusion about
> the internal state of a service versus the external runit state
> running/not running.
>
> The runit supervisor framework can only see if a service (process) is
> running - it has no idea if that service is yet functional or not.

Correct.  You did identify the point of confusion for us though, I think.  
I was only interested in the service's state.  We don't have any services 
that take any time to initialize like you spoke of later.

> My guess is that you've until now just been lucky that your service
> typically takes just a little less than 2 seconds to initialize and
> become operational.

None of our services take longer than that to become operational.  
Anything that would be so big typically gets broken down into smaller 
services.

> That is why Gerrit suggests that you should implement a test in your
> check script that can determine if your service actually has become
> operational (functional) yet or not. Typical checks could be to test if
> the service is listening on a socket/port.
>
> Let's hope I'm not all wrong :-)

You were not wrong, actually.  And I thank you and Gerrit both for your 
assistance.
