I'm writing an application in which several threads do their work and communicate through the Concurrent Cell library. I decided to avoid 1) the multiprocess model, for the sake of simplicity, and 2) libraries like Lwt, because the threads use many external libraries that are not Lwt-aware.

When the process receives a SIGTERM, I want it to shut down gracefully. To do this, I install a signal handler in the main thread (Sys.set_signal). The handler sends a message to all the threads, and the main thread waits (Thread.join) for the other workers to finish, then exits.
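
To make the intended shutdown path concrete, here is a rough sketch of it. It is not the real application code: an atomic flag (Atomic, so OCaml 4.12 or later is assumed) stands in for the messages sent through the library, and the names stop_requested, on_sigterm, install_handler, worker and run_workers are made up for illustration.

```ocaml
(* Sketch only: build with the unix and threads libraries.
   An atomic flag stands in for the real inter-thread messages;
   every name below is hypothetical. *)
let stop_requested = Atomic.make false

(* Handler body: only record that a shutdown was requested. *)
let on_sigterm (_ : int) = Atomic.set stop_requested true

(* Install the handler for SIGTERM (done once, in the main thread). *)
let install_handler () =
  Sys.set_signal Sys.sigterm (Sys.Signal_handle on_sigterm)

(* Worker: do one unit of placeholder work, then check the flag again. *)
let worker counter () =
  while not (Atomic.get stop_requested) do
    incr counter;
    Thread.yield ()
  done

(* Start n workers and wait (Thread.join) for all of them to finish. *)
let run_workers n =
  let counters = List.init n (fun _ -> ref 0) in
  let threads = List.map (fun c -> Thread.create (worker c) ()) counters in
  List.iter Thread.join threads;
  counters
```

In the real program, install_handler () would be called before the workers are started, and run_workers would return only after a SIGTERM has been handled.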

I found a strange behavior: when the process receives the SIGTERM, the program terminates with the exception Unix.Unix_error(11, "select", "").

I was able to remove all the noise and obtain a minimal program exposing the behavior. Here it is:

$ cat thread_test.ml

open Printf;;

Random.self_init ();;

let do_something id () =
  let _ = Thread.sigmask Unix.SIG_SETMASK [15] in
  while true
  do
    Thread.delay (1.0 +. (Random.float 1.0));
    printf "Thread %d\n%!" id
  done
;;

(* MAIN THREAD FROM HERE *)
let kill_handler _ = printf "Signal caught\n%!";;

Sys.set_signal 15 (Sys.Signal_handle kill_handler);;

ignore (Thread.create (do_something 1) ());;
ignore (Thread.create (do_something 2) ());;
ignore (Thread.create (do_something 3) ());;

while true
do
  Thread.delay 5.0; (* Unix.sleep 5 WORKS! *)
  printf "Main Thread\n%!";
done;;


I would expect the process to catch the SIGTERM signal, print the message, and keep running without problems. Instead the result is this:

$ ocamlfind ocamlopt -linkpkg -thread -package threads thread_test.ml -o thread_test
$ ./thread_test 
Thread 1
Thread 2
Thread 3
Thread 1
Thread 2
Thread 3
Thread 1
Thread 2
Main Thread
Thread 1
Thread 3
Thread 2
Thread 3
Thread 1
Thread 2
Thread 1
Thread 3
Signal caught  <--- in another shell: $ kill -15 PID
Fatal error: exception Unix.Unix_error(11, "select", "")

The exception is raised by the Thread.delay call in the main thread's loop. If I use Unix.sleep 5 instead of Thread.delay 5.0, the program runs as expected.
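
One guess, which I have not confirmed: if the 11 printed in the error is the position of EINTR among the constructors of Unix.error (the default printer seems to show the constructor as a number), then the select call behind Thread.delay is being interrupted by the signal. Under that assumption, a wrapper like the following sketch would resume the delay instead of letting the exception escape. The name delay_retrying_eintr is made up; it is not part of any library.

```ocaml
(* Sketch only: build with the unix and threads libraries.
   Hypothetical helper that sleeps for [seconds], restarting the delay
   for the remaining time if it is interrupted by a signal (EINTR). *)
let rec delay_retrying_eintr seconds =
  let start = Unix.gettimeofday () in
  try Thread.delay seconds
  with Unix.Unix_error (Unix.EINTR, _, _) ->
    (* Interrupted: compute how much of the delay is still left. *)
    let remaining = seconds -. (Unix.gettimeofday () -. start) in
    if remaining > 0.0 then delay_retrying_eintr remaining
```

In the test program, this would replace Thread.delay 5.0 in the main loop with delay_retrying_eintr 5.0.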

Any ideas?

Best regards,


-- 
Paolo ⠠⠵