caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Race conditions with asynchronous exceptions in standard library
@ 2011-07-06  3:26 Khoo Yit Phang
  2011-07-06  8:17 ` Mark Shinwell
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Khoo Yit Phang @ 2011-07-06  3:26 UTC (permalink / raw)
  To: caml-list; +Cc: Khoo Yit Phang

[-- Attachment #1: Type: text/plain, Size: 3038 bytes --]

Hi all,

I recently discovered a race condition when asynchronous exceptions, i.e., exceptions raised by signal handlers, are used with the DelimCC library. After consulting with Oleg Kiselyov, he realized that the problem with asynchronous exceptions is far more pervasive than I had originally thought. For example, the following program, using only the standard OCaml library, leads to a segfault:

> let q = Queue.create ();;
> 
> let () =
>     let en = ref false in
>     Sys.set_signal Sys.sigalrm (Sys.Signal_handle (fun _ -> if !en then raise Exit));
>     ignore (Unix.setitimer Unix.ITIMER_REAL { Unix.it_interval=1e-6; Unix.it_value=1e-3 });
>     while true do
>         try
>             en := true;
>             Queue.add "a" q;
>             en := false;
>             ignore (Queue.pop q)
>         with Exit ->
>             en := false;
>             if Queue.length q > 0 then prerr_endline "Non-empty";
>             assert (let len = Queue.length q in len = 0 || len = 1);
>             Queue.iter print_string q;
>             Queue.clear q;
>     done

The problem occurs in Queue.ml:

> let create () = {
>   length = 0;
>   tail = Obj.magic None
> }
> 
> let add x q =
>   q.length <- q.length + 1;
>   (* asynchronous exception occurs here *)
>   ...

When Queue.add is interrupted by an exception immediately after the length is updated, the length becomes inconsistent with the tail, and subsequent operations such as Queue.iter will attempt to operate on Obj.magic None, leading to the segfault.

Probably much of the standard library is similarly susceptible to such races, particularly the parts that operate on mutable data. These mostly lead to less dramatic consequences than segfaults, but are still seemingly random errors such as corrupted data or violated invariants.

For DelimCC, my solution was to provide a pair of C functions, mask_signals and unmask_signals, to bracket operations that are unsafe under asynchronous exceptions, using sigprocmask to suppress signal handling between them. However, since this requires access to OCaml's signal handling internals (caml_process_pending_signals and caml_signals_are_pending) to flush pending signals, it would be great to have mask_signals and unmask_signals or something similar in the standard library, so as to make it easier to develop libraries that are safe under asynchronous exceptions. (For reference, Haskell has seen this and other issues with asynchronous exception and implemented a similar solution, see http://hackage.haskell.org/trac/ghc/ticket/1036.)

Another option would be to simply warn against or disallow signal handlers that raise exceptions, but that seems less useful, e.g., it would make it hard to interrupt a long-running library function with a timeout.

We look forward to hearing what the official fix or guideline for handling asynchronous exception will be, for the standard library, third-party libraries, as well as applications.


Thank you,

Yit
July 5, 2011

[-- Attachment #2: Type: text/html, Size: 7208 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* [Caml-list] Re: Race conditions with asynchronous exceptions in standard library
@ 2011-07-06  9:06 oleg
  0 siblings, 0 replies; 9+ messages in thread
From: oleg @ 2011-07-06  9:06 UTC (permalink / raw)
  To: caml-list


Just a few clarifications. The posted Queue code gives the
segmentation fault when compiled with ocamlopt without threads. When
bytecode-compiled, the code is accidentally safe, at least on my
platform. The bytecode interpreter checks for signals when executing
any of the `application' instructions or popping the exception
handler. It just so happens that no function application instructions
were executed in the critical section of the standard library Queue
code. That is a mere accident however: a new version of the standard
library (or Batteries) might add some sort of debugging printing; or
the code generation will change to emit calls to auxiliary, debugging
or tracing facilities.

Incidentally, comments in the bytecode interpreter justify a particular
piece of code by saying that a signal handler may raise an
exception. Thus Xavier Leroy certainly did allow for such signal
handlers, at least as a possibility.

Mark Shinwell wrote:
> Specifically in the case of signal handlers, I would recommend
> restricting processing in them to an absolute minimum, and in
> particular not throwing exceptions.

If this is the consensus, to which INRIA assents, it ought to be
written in the user documentation, alongside of other warnings (like
living in harmony with GC). At the same time, one should describe the
recommended solution to Yit's original problem, to interrupt a
long-running library function with a timeout.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-07-09 13:02 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-06  3:26 [Caml-list] Race conditions with asynchronous exceptions in standard library Khoo Yit Phang
2011-07-06  8:17 ` Mark Shinwell
2011-07-06  9:04   ` Fabrice Le Fessant
2011-07-06 18:06 ` Gerd Stolpmann
2011-07-06 21:13 ` Richard W.M. Jones
2011-07-06 23:40 ` [Caml-list] " Khoo Yit Phang
2011-07-07  0:07   ` Gerd Stolpmann
2011-07-09 13:02 ` [Caml-list] " ygrek
2011-07-06  9:06 [Caml-list] " oleg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).