* Suddenly sv does not start, gives a timeout
@ 2013-05-22 9:30 Peter Hickman
2013-05-22 10:16 ` Robin Bowes
From: Peter Hickman @ 2013-05-22 9:30 UTC (permalink / raw)
To: <supervision@list.skarnet.org>
One of our servers has started to have a problem with runit. Even after a
reboot we get this:
$ sv start ./service/unicorn/
timeout: down: ./service/unicorn/: 1s, normally up, want up
This has just started without (as far as we can tell) there being any
change to the server. I've even nuked the ./service/* directory so that it
will get rebuilt when the application is deployed (via capistrano - this is
a Rails app) but that does not seem to help.
The other 23 servers which are set up in the same way have no problem so I
am at a loss as to where to start looking.
Any idea of where I should look for clues?
* Re: Suddenly sv does not start, gives a timeout
2013-05-22 9:30 Suddenly sv does not start, gives a timeout Peter Hickman
@ 2013-05-22 10:16 ` Robin Bowes
2013-05-22 13:32 ` Peter Hickman
From: Robin Bowes @ 2013-05-22 10:16 UTC (permalink / raw)
To: Peter Hickman; +Cc: <supervision@list.skarnet.org>
On Wed, 2013-05-22 at 10:30 +0100, Peter Hickman wrote:
>
> Any idea of where I should look for clues?
In the logs. What do the logs say?
Or try stopping the service and running it manually from the command
line so you can see the output from the run script.
R.
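A minimal sketch of the second suggestion (stop the service, then run it by hand so the run script's output is visible). `run_in_foreground` is a hypothetical helper name, not part of runit, and the service path in the usage note is taken from the original post.

```shell
#!/bin/sh
# Hypothetical helper (not part of runit): run a service's run script in the
# foreground and report how it exited.
run_in_foreground() {
    svc_dir=$1
    # Ask runsv to keep the service down so two copies do not compete.
    sv down "$svc_dir" || echo "warning: sv down failed for $svc_dir" >&2
    # Run ./run from inside the service directory, which is how runsv starts it.
    # stdout and stderr stay on the terminal so any error message is visible.
    status=0
    ( cd "$svc_dir" && ./run ) || status=$?
    echo "run exited with status $status"
    return "$status"
}
```

Usage would look like `run_in_foreground ./service/unicorn/`. A non-zero status or an error printed before the script exits is usually the first clue.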
* Re: Suddenly sv does not start, gives a timeout
2013-05-22 10:16 ` Robin Bowes
@ 2013-05-22 13:32 ` Peter Hickman
2013-05-22 13:40 ` Charlie Brady
From: Peter Hickman @ 2013-05-22 13:32 UTC (permalink / raw)
Cc: <supervision@list.skarnet.org>
Well, this is what we have. First, we had started it manually, so let's kill it:
$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
2980 ? Sl 0:34 services/scorecard_cricket_scores_importer.rb
16599 pts/0 S+ 0:00 grep scorecard
$ kill -9 2980
$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
16671 pts/0 S+ 0:00 grep scorecard
The process has gone and will not be restarted no matter how long you wait.
So we try and start it with sv:
$ sv start ./service/scorecard_cricket_scores_importer/
timeout: down: ./service/scorecard_cricket_scores_importer/: 1s, normally up, want up
$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
16868 pts/0 S+ 0:00 grep scorecard
Still not started. So we try it manually:
$ ./service/scorecard_cricket_scores_importer/run &
[1] 16929
$ ps ax | grep scorecard
731 ? S 0:12 runsv scorecard_cricket_scores_importer
16929 pts/0 Sl 0:10 services/scorecard_cricket_scores_importer.rb
18896 pts/0 R+ 0:00 grep scorecard
$
And it keeps running without any problems for as long as you let it.
There are no errors in the logs and nothing reported in:
runsvdir -P /etc/service log:
..................................................................................................................................................................................................................................................................
Is there some other runit log that I should look into?
* Re: Suddenly sv does not start, gives a timeout
2013-05-22 13:32 ` Peter Hickman
@ 2013-05-22 13:40 ` Charlie Brady
2013-05-22 14:22 ` Peter Hickman
From: Charlie Brady @ 2013-05-22 13:40 UTC (permalink / raw)
To: Peter Hickman; +Cc: <supervision@list.skarnet.org>
> Well, this is what we have. First, we had started it manually, so let's kill it:
>
> $ ps ax | grep scorecard
> 731 ? S 0:11 runsv scorecard_cricket_scores_importer
> 2980 ? Sl 0:34 services/scorecard_cricket_scores_importer.rb
> 16599 pts/0 S+ 0:00 grep scorecard
> $ kill -9 2980
You have a race condition here: process 2980 may have already died. Use
"sv d ./service/scorecard_cricket_scores_importer/" to stop the process
(sv takes the service directory, not the script name).
You also should not be using -9 unless you have exhausted other options.
Use -TERM or -QUIT. Using -9 is a bad habit to have.
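For a process that was started by hand rather than by runsv, the "TERM first, KILL only as a last resort" advice can be sketched as below. `stop_gracefully` is a hypothetical helper name and the 5 second grace period is an assumed default. For a supervised service, sv remains the right tool.

```shell
#!/bin/sh
# Hypothetical helper: stop a process politely, escalating only if needed.
# Intended for a pid that is not a child of the calling shell; an unreaped
# child still answers "kill -0" until its parent waits for it.
stop_gracefully() {
    pid=$1
    grace=${2:-5}                                # seconds to wait after TERM (assumed default)
    kill -TERM "$pid" 2>/dev/null || return 0    # already gone: nothing to do
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after TERM
        sleep 1
        grace=$((grace - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    echo "pid $pid still present after TERM, sending KILL" >&2
    kill -KILL "$pid"
}
```

Because the helper returns quietly when the pid no longer exists, it also tolerates the race mentioned above, where the process dies between the ps and the kill.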
> $ ps ax | grep scorecard
> 731 ? S 0:11 runsv scorecard_cricket_scores_importer
> 16671 pts/0 S+ 0:00 grep scorecard
>
> The process has gone and will not be restarted no matter how long you wait.
> So we try and start it with sv:
>
> $ sv start ./service/scorecard_cricket_scores_importer/
> timeout: down: ./service/scorecard_cricket_scores_importer/: 1s, normally up, want up
> $ ps ax | grep scorecard
> 731 ? S 0:11 runsv scorecard_cricket_scores_importer
> 16868 pts/0 S+ 0:00 grep scorecard
>
> Still not started. So we try it manually:
>
> $ ./service/scorecard_cricket_scores_importer/run &
> [1] 16929
Why start it in the background?
> $ ps ax | grep scorecard
> 731 ? S 0:12 runsv scorecard_cricket_scores_importer
> 16929 pts/0 Sl 0:10 services/scorecard_cricket_scores_importer.rb
> 18896 pts/0 R+ 0:00 grep scorecard
> $
>
> And it keeps running without any problems for as long as you let it.
>
> There are no errors in the logs and nothing reported in:
Then your service is faulty. Failing silently is not satisfactory.
Use strace to see what your process is doing, and when and why it is
exiting.
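One way to apply the strace suggestion, as a sketch only: attach to the runsv process (pid 731 in the ps output above) and follow its children, so that whatever happens when runsv tries to start ./run is recorded. `trace_supervisor` and the default output path are hypothetical; attaching to another process usually needs root or the same user.

```shell
#!/bin/sh
# Hypothetical helper: attach strace to a running supervisor and follow the
# processes it forks.
trace_supervisor() {
    runsv_pid=$1
    out_file=${2:-/tmp/runsv.strace}   # assumed output path
    # -f  follow forked children (the run script and whatever it execs)
    # -tt prefix each call with a timestamp
    # -o  write the trace to a file instead of stderr
    # -p  attach to an already running pid
    strace -f -tt -o "$out_file" -p "$runsv_pid"
}
```

With `trace_supervisor 731` running in one terminal, issue the `sv start` command in another, then look in the output file for the fork/clone and execve calls and for any exit_group line that shows the child's exit status.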
> runsvdir -P /etc/service log:
> ..................................................................................................................................................................................................................................................................
>
> Is there some other runit log that I should look into?
>
Thread overview: 5+ messages
2013-05-22 9:30 Suddenly sv does not start, gives a timeout Peter Hickman
2013-05-22 10:16 ` Robin Bowes
2013-05-22 13:32 ` Peter Hickman
2013-05-22 13:40 ` Charlie Brady
2013-05-22 14:22 ` Peter Hickman