I'm writing an application in which several threads do their work and communicate through the Concurrent Cell library. In this context I decided to avoid 1) the multiprocess model, for the sake of simplicity, and 2) libraries like Lwt, because the threads use many external libraries that are not Lwt-aware.
When the process receives a SIGTERM, I want to shut it down gracefully. To do this I install a signal handler in the main thread (Sys.set_signal). The handler sends a message to all the threads, the main thread waits (Thread.join) for the other workers to finish, and then the main thread exits.
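Roughly, the shutdown scheme looks like the sketch below. It is only an illustration under simplifying assumptions: a shared flag stands in for the real messages, and the names stop_requested, worker and run are hypothetical, not taken from my application.

```ocaml
(* Hypothetical sketch: a shared flag replaces the real message passing. *)
let stop_requested = ref false

(* Each worker checks the flag between short units of work. *)
let worker _id () =
  while not !stop_requested do
    Thread.delay 0.01
  done

(* Start n workers, request the stop, then wait for every worker to finish. *)
let run n =
  let threads = List.init n (fun id -> Thread.create (worker id) ()) in
  Thread.delay 0.05;
  (* In the real program this assignment would happen in the SIGTERM handler. *)
  stop_requested := true;
  List.iter Thread.join threads
```

Calling run 3 returns only once all three workers have left their loops, which is the behavior I expect from the real program after a SIGTERM.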
I found a strange behavior: when the process receives the SIGTERM, the program terminates with the exception Unix.Unix_error(11, "select", "").
I was able to remove all the noise and obtain a minimal program exposing the behavior. Here it is:
open Printf;;
Random.self_init ();;
let do_something id () =
  (* Block SIGTERM (signal 15) in this worker thread. *)
  let _ = Thread.sigmask Unix.SIG_SETMASK [15] in
  while true do
    Thread.delay (1.0 +. (Random.float 1.0));
    printf "Thread %d\n%!" id
  done
;;
(* MAIN THREAD FROM HERE *)
let kill_handler _ = printf "Signal caught\n%!";;
Sys.set_signal 15 (Sys.Signal_handle kill_handler);;
ignore (Thread.create (do_something 1) ());;
ignore (Thread.create (do_something 2) ());;
ignore (Thread.create (do_something 3) ());;
while true do
  Thread.delay 5.0; (* Unix.sleep 5 WORKS! *)
  printf "Main Thread\n%!";
done;;
The process should (at least I think) catch the SIGTERM signal and continue without problems. Instead the result is this:
$ ocamlfind ocamlopt -linkpkg -thread -package threads thread_test.ml -o thread_test
$ ./thread_test
Thread 1
Thread 2
Thread 3
Thread 1
Thread 2
Thread 3
Thread 1
Thread 2
Main Thread
Thread 1
Thread 3
Thread 2
Thread 3
Thread 1
Thread 2
Thread 1
Thread 3
Signal caught <--- in another shell: $ kill -15 PID
Fatal error: exception Unix.Unix_error(11, "select", "")
The exception is raised by the Thread.delay call in the main thread's loop. Note that if I use Unix.sleep 5 instead of Thread.delay 5.0, the program runs as expected.
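For what it's worth, error code 11 appears to correspond to Unix.EINTR (the twelfth constant constructor of Unix.error), which would suggest that the select call inside Thread.delay was interrupted by the signal. Assuming that reading is right, one possible workaround is a wrapper that resumes the delay for the remaining time. This is only a sketch: safe_delay is a hypothetical name, not part of the Thread module.

```ocaml
(* Hypothetical wrapper: restart Thread.delay when a signal interrupts it.
   Assumption: the interruption surfaces as Unix.Unix_error (Unix.EINTR, _, _). *)
let rec safe_delay seconds =
  if seconds > 0.0 then begin
    let start = Unix.gettimeofday () in
    try Thread.delay seconds
    with Unix.Unix_error (Unix.EINTR, _, _) ->
      (* Sleep again only for the time that is still left. *)
      let elapsed = Unix.gettimeofday () -. start in
      safe_delay (seconds -. elapsed)
  end
```

Replacing Thread.delay 5.0 with safe_delay 5.0 in the main loop would be the intended use, but it does not explain why Thread.delay and Unix.sleep behave differently here.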
Any ideas?