supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* Suddenly sv does not start, gives a timeout
From: Peter Hickman @ 2013-05-22  9:30 UTC
  To: <supervision@list.skarnet.org>

One of our servers has started to have a problem with runit. Even after a
reboot we get this:

$ sv start ./service/unicorn/
timeout: down: ./service/unicorn/: 1s, normally up, want up

This has just started without (as far as we can tell) there being any
change to the server. I've even nuked the ./service/* directory so that it
will get rebuilt when the application is deployed (via capistrano - this is
a Rails app) but that does not seem to help.

The other 23 servers which are set up in the same way have no problem so I
am at a loss as to where to start looking.

Any idea of where I should look for clues?

* Re: Suddenly sv does not start, gives a timeout
From: Robin Bowes @ 2013-05-22 10:16 UTC
  To: Peter Hickman; +Cc: <supervision@list.skarnet.org>

On Wed, 2013-05-22 at 10:30 +0100, Peter Hickman wrote:

> 
> Any idea of where I should look for clues?

In the logs. What do the logs say?

Or try stopping the service and running it manually from the command
line so you can see the output from the run script.

R.



* Re: Suddenly sv does not start, gives a timeout
From: Peter Hickman @ 2013-05-22 13:32 UTC
  Cc: <supervision@list.skarnet.org>

Well, this is what we have. Firstly, we manually started it, so let's kill it:

$ ps ax | grep scorecard
  731 ?        S      0:11 runsv scorecard_cricket_scores_importer
 2980 ?        Sl     0:34 services/scorecard_cricket_scores_importer.rb
16599 pts/0    S+     0:00 grep scorecard
$ kill -9 2980
$ ps ax | grep scorecard
  731 ?        S      0:11 runsv scorecard_cricket_scores_importer
16671 pts/0    S+     0:00 grep scorecard

The process has gone and will not be restarted no matter how long you wait.
So we try and start it with sv:

$ sv start ./service/scorecard_cricket_scores_importer/
timeout: down: ./service/scorecard_cricket_scores_importer/: 1s, normally up, want up
$ ps ax | grep scorecard
  731 ?        S      0:11 runsv scorecard_cricket_scores_importer
16868 pts/0    S+     0:00 grep scorecard

Still not started. So we try it manually:

$ ./service/scorecard_cricket_scores_importer/run &
[1] 16929
$ ps ax | grep scorecard
  731 ?        S      0:12 runsv scorecard_cricket_scores_importer
16929 pts/0    Sl     0:10 services/scorecard_cricket_scores_importer.rb
18896 pts/0    R+     0:00 grep scorecard
$

And it keeps running without any problems for as long as you let it.

There are no errors in the logs and nothing reported in:

runsvdir -P /etc/service log:
..................................................................................................................................................................................................................................................................

Is there some other runit log that I should look into?

* Re: Suddenly sv does not start, gives a timeout
From: Charlie Brady @ 2013-05-22 13:40 UTC
  To: Peter Hickman; +Cc: <supervision@list.skarnet.org>




> Well, this is what we have. Firstly, we manually started it, so let's kill it:
> 
> $ ps ax | grep scorecard
>   731 ?        S      0:11 runsv scorecard_cricket_scores_importer
>  2980 ?        Sl     0:34 services/scorecard_cricket_scores_importer.rb
> 16599 pts/0    S+     0:00 grep scorecard
> $ kill -9 2980

You have a race condition here - process 2980 may have already died. Use 
"sv d ./service/scorecard_cricket_scores_importer/" (the service directory, 
not the script path) to stop the process.

You also should not be using -9 unless you have exhausted other options. 
Use -TERM or -QUIT. Using -9 is a bad habit to have.
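
A minimal sketch of that escalation: send TERM first, wait a while, and
fall back to KILL only if the process is still there. The helper name
stop_gracefully and the 5-second default are assumptions made up for this
illustration; they are not part of runit, where "sv down" on the service
directory is the usual route.

```shell
#!/bin/sh
# stop_gracefully PID [TIMEOUT]: hypothetical helper, not part of runit.
# Sends TERM, polls for up to TIMEOUT seconds, and only then sends KILL.
stop_gracefully() {
    pid=$1
    timeout=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # nothing to stop
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # gone after TERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null                # last resort
    return 1
}

# Demo: a child that cleans up on TERM, which it could not do on KILL.
tmp=$(mktemp -d)
sh -c 'trap "echo cleaned > \"$1/flag\"; exit 0" TERM
       while :; do sleep 1; done' sh "$tmp" &
child=$!
sleep 1                       # give the child time to install its trap
stop_gracefully "$child" 5 || echo "had to fall back to KILL"
wait "$child" 2>/dev/null || true
result=$(cat "$tmp/flag" 2>/dev/null)
echo "cleanup marker: $result"
rm -rf "$tmp"
```

The point of the demo is the marker file: a TERM handler gets the chance to
write it, whereas a process hit with -9 never runs its handler at all.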

> $ ps ax | grep scorecard
>   731 ?        S      0:11 runsv scorecard_cricket_scores_importer
> 16671 pts/0    S+     0:00 grep scorecard
> 
> The process has gone and will not be restarted no matter how long you wait.
> So we try and start it with sv:
> 
> $ sv start ./service/scorecard_cricket_scores_importer/
> timeout: down: ./service/scorecard_cricket_scores_importer/: 1s, normally up, want up
> $ ps ax | grep scorecard
>   731 ?        S      0:11 runsv scorecard_cricket_scores_importer
> 16868 pts/0    S+     0:00 grep scorecard
> 
> Still not started. So we try it manually:
> 
> $ ./service/scorecard_cricket_scores_importer/run &
> [1] 16929

Why start it in the background?

> $ ps ax | grep scorecard
>   731 ?        S      0:12 runsv scorecard_cricket_scores_importer
> 16929 pts/0    Sl     0:10 services/scorecard_cricket_scores_importer.rb
> 18896 pts/0    R+     0:00 grep scorecard
> $
> 
> And it keeps running without any problems for as long as you let it
> 
> There are no errors in the logs and nothing reported in:

Then your service is faulty. Failing silently is not satisfactory.

Use strace to see what your process is doing, and when and why it is 
exiting.
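
One thing worth checking when a supervised service fails silently is
whether the run script's stderr reaches the service log at all: runsv
pipes the script's stdout to the log service, so errors written to stderr
can end up elsewhere. A minimal sketch of the usual remedy, "exec 2>&1" at
the top of the run script, using a throwaway script in a temporary
directory (the script contents and the error message are invented for
illustration):

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Hypothetical run script: merges stderr into stdout before doing anything.
cat > "$tmp/run" <<'EOF'
#!/bin/sh
exec 2>&1
echo "starting importer"
echo "fatal: interpreter not found" >&2
exit 1
EOF
chmod +x "$tmp/run"

# Capture stdout only, as a logger reading from a pipe would; the caller's
# stderr is discarded to show that the message travels via stdout.
status=0
captured=$("$tmp/run" 2>/dev/null) || status=$?
echo "exit status: $status"
echo "$captured"
rm -rf "$tmp"
```

Without the "exec 2>&1" line the captured text would contain only the
"starting importer" message, and the actual reason for the exit would be
missing from the log.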

> runsvdir -P /etc/service log:
> ..................................................................................................................................................................................................................................................................
> 
> Is there some other runit log that I should look into?
> 


* Re: Suddenly sv does not start, gives a timeout
From: Peter Hickman @ 2013-05-22 14:22 UTC
  To: Charlie Brady; +Cc: <supervision@list.skarnet.org>

Aaargh found the cause and it was not sv :)

The way ruby was installed on the machine had changed but the change was
not visible if you logged on as the user and ran the commands manually.
However when sv did its magic it didn't have the same PATH values and
failed.
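
A minimal sketch of how this kind of PATH mismatch can be reproduced
without the supervisor: run the same script once with a login-style PATH
and once under "env -i" with a bare PATH. The interpreter name myruby, the
directory layout, and the bare PATH value are stand-ins invented for the
demonstration; the real environment a service gets depends on how runsvdir
itself was started.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/bin"

# Stand-in for an interpreter installed outside the system directories.
cat > "$tmp/bin/myruby" <<'EOF'
#!/bin/sh
echo "interpreter ok"
EOF

# Stand-in for a service run script that relies on PATH lookup.
cat > "$tmp/run" <<'EOF'
#!/bin/sh
exec myruby
EOF
chmod +x "$tmp/bin/myruby" "$tmp/run"

# Login-style environment: the custom directory is on PATH.
login_rc=0
login_out=$(PATH="$tmp/bin:$PATH" "$tmp/run" 2>&1) || login_rc=$?

# Supervisor-style environment: everything cleared, bare PATH only.
bare_rc=0
bare_out=$(env -i PATH=/usr/bin:/bin "$tmp/run" 2>&1) || bare_rc=$?

echo "login: rc=$login_rc output=$login_out"
echo "bare:  rc=$bare_rc output=$bare_out"
rm -rf "$tmp"
```

Running the real run script in the foreground under "env -i", with
whatever PATH the supervisor actually has, should surface the same sort of
"not found" error directly on the terminal.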

I am off to view the log files and see who I should castigate >_<

Thank you for your time and sorry for wasting it
