supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* runit - runsv thinking a dead process is up
@ 2008-10-17 20:29 Charles Duffy
  2008-10-21 10:00 ` Gerrit Pape
From: Charles Duffy @ 2008-10-17 20:29 UTC
  To: supervision

Per subject; I'm being told that the child is up (and with a given pid), 
but that pid doesn't exist:

# sv status postgresql
run: postgresql: (pid 3023) 1229s; run: log: (pid 2519) 4430s
# pstree -a -p 3023
# ls -l /proc/3023
ls: /proc/3023: No such file or directory
# sv status postgresql
run: postgresql: (pid 3023) 1241s; run: log: (pid 2519) 4442s
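
[Illustrative sketch, not part of runit or of the original report: the same
"does this pid exist?" check can be done without /proc by sending signal 0
with kill(2). The helper name pid_exists is hypothetical. Note that a zombie
still counts as existing here, whereas the /proc listing above shows that pid
3023 was gone entirely.]

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if a process with this pid exists
 * (including zombies), 0 if it does not, and -1 for invalid input or
 * unexpected errors.
 */
static int pid_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;              /* 0 and negatives address process groups */
    if (kill(pid, 0) == 0)
        return 1;               /* signal 0: existence and permission check only */
    if (errno == EPERM)
        return 1;               /* process exists, but we may not signal it */
    if (errno == ESRCH)
        return 0;               /* no such process */
    return -1;
}
```

[For the case above, pid_exists(3023) would be expected to return 0 while
sv status still reports the service as running.]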

This is not easily reproducible -- presumably an odd race condition or 
somesuch.

Observed with runit 1.8.0; please let me know if newer releases contain 
germane fixes.




* Re: runit - runsv thinking a dead process is up
  2008-10-17 20:29 runit - runsv thinking a dead process is up Charles Duffy
@ 2008-10-21 10:00 ` Gerrit Pape
  2008-11-10 18:47   ` Charles Duffy
From: Gerrit Pape @ 2008-10-21 10:00 UTC
  To: supervision

Charles Duffy <Charles_Duffy <at> messageone.com> writes:
> Per subject; I'm being told that the child is up (and with a given pid), 
> but that pid doesn't exist:
> 
> # sv status postgresql
> run: postgresql: (pid 3023) 1229s; run: log: (pid 2519) 4430s
> # pstree -a -p 3023
> # ls -l /proc/3023
> ls: /proc/3023: No such file or directory
> # sv status postgresql
> run: postgresql: (pid 3023) 1241s; run: log: (pid 2519) 4442s
> 
> This is not easily reproducible -- presumably an odd race condition or 
> somesuch.

Hmm, I just re-read the code a bit, and cannot spot an error.  runsv
uses the selfpipe trick for SIGCHLD, and enters a loop to wait_nohang()
for the zombies, comparing their pids with the pids in its status.

> Observed with runit 1.8.0; please let me know if newer releases contain 
> germane fixes.

There were some changes to runsv since version 1.8.0 that affect the status
files handling, but while reviewing I can't spot an error either.

Any special libc or kernel version you're using?

Regards, Gerrit.





* Re: runit - runsv thinking a dead process is up
  2008-10-21 10:00 ` Gerrit Pape
@ 2008-11-10 18:47   ` Charles Duffy
From: Charles Duffy @ 2008-11-10 18:47 UTC
  To: supervision

Gerrit Pape wrote:
>> Observed with runit 1.8.0; please let me know if newer releases contain 
>> germane fixes.
> 
> There were some changes to runsv since version 1.8.0 that affect the status
> files handling, but while reviewing I can't spot an error either.

Hmm. I'll report back if we manage to reproduce the issue -- 
particularly with a different release of runit, or in a different 
environment.

> Any special libc or kernel version you're using?

Kernel and libc are from CentOS 5 -- kernel-2.6.18-53.el5, 
glibc-2.5-18.el5_1.1, on an amd64 system.



