On Thursday, 20.03.2014, at 13:29 +0800, ygrek wrote:
> On Thu, 20 Mar 2014 00:21:36 +0100
> Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
> 
> > OCaml printing isn't printf in C - it calls directly write() and is
> > always possible. OCaml signal handlers aren't signal handlers from the C
> > viewpoint: When the signal is caught, a flag in the OCaml runtime is
> > set, and this flag is regularly checked by the running code. (I.e. what
> > you suggest is already done in the runtime.)
> > 
> > So, I'd say this is a bug in the OCaml runtime. The bug goes away when
> > you print to a different channel from the signal handler, so it looks
> > like channels and signal handlers have some unwanted effect on each
> > other.
> 
> stdlib channels are protected with non-recursive mutex, so the deadlock on re-entrant invocation is guaranteed. 

No, there aren't any mutexes involved here - the failing program is
single-threaded.

> AFAICS runtime system tries to execute signal immediately (see signal_handle in asmrun/signals_asm.c)
> and if that is not possible - records signal for later execution.

Right, this handles the case where the current thread is in a blocking
system call (sorry, I forgot this case). Apparently, the OCaml handler
code is then run directly from the C signal handler.

> Anyway doing complex stuff in signal handler is a bad idea, because even with delayed processing (when
> things are safe from the libc point of view) the points of invocation of OCaml signal handler are scattered
> all around the program (allocation sites) and any OCaml resource that doesn't support reentrant usage will break
> the program.

Anyway, I don't think this has anything to do with calling
non-signal-safe libc functions (as far as I can see, the only function
called is write()). It is most likely because flush isn't reentrant:

caml_flush() calls caml_flush_partial(), which in turn calls do_write().
The signal arrives during the write() syscall, and in the signal handler
another flush is invoked for the same channel. The effect is that (as
far as I can see) channel->offset and channel->curr are set to invalid
values.
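
To make the suspected sequence concrete, here is a toy model in OCaml.
It is not the runtime's code: the record type, the function names and
the update order (read the amount to write, do the write, then adjust
the bookkeeping) are assumptions made for illustration only.

```ocaml
(* Toy model of a buffered channel; NOT the actual runtime source. *)
type chan = {
  mutable curr : int;    (* bytes currently buffered; must never go negative *)
  mutable offset : int;  (* total bytes handed to the "file" so far *)
}

(* One flush step. [write] stands in for the write() syscall: it takes the
   number of bytes to write and returns how many were written.
   Returns true when the buffer is empty afterwards. *)
let flush_partial ch ~write =
  let towrite = ch.curr in
  if towrite > 0 then begin
    let written = write towrite in
    ch.offset <- ch.offset + written;
    ch.curr <- ch.curr - written
  end;
  ch.curr = 0

(* A write that accepts everything it is given. *)
let plain_write n = n

(* A write during which a "signal handler" flushes the same channel. *)
let interrupted_write ch n =
  ignore (flush_partial ch ~write:plain_write);  (* nested flush *)
  n

let () =
  let ch = { curr = 10; offset = 0 } in
  let finished = flush_partial ch ~write:(interrupted_write ch) in
  Printf.printf "curr=%d offset=%d finished=%b\n" ch.curr ch.offset finished
  (* prints: curr=-10 offset=20 finished=false *)
```

In this model the outer call subtracts the ten bytes a second time, so
curr ends up negative and offset is counted twice. Any later call sees
nothing to write but still reports "not finished", so a caller that
loops until the flush is finished never terminates.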

I don't see how this can be fixed properly. You can probably avoid the
livelock by doing nothing when flush is invoked a second time, but this
(a) changes the semantics of flushing, and (b) doesn't fix the other
potential problems (when a flush is interrupted and one of the other
channel functions is called from the signal handler).
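
This is not a fix for the runtime, but a program can sidestep the
problem by keeping channel operations out of the handler altogether. A
minimal sketch, assuming the program has a main loop that can poll a
flag; the names pending, on_signal, install and poll are made up for
this example:

```ocaml
(* Set by the handler, consumed by the main loop. *)
let pending = ref false

(* The handler touches no channel; it only records that the signal arrived. *)
let on_signal (_ : int) = pending := true

(* Install the handler for SIGINT (any other signal works the same way). *)
let install () = Sys.set_signal Sys.sigint (Sys.Signal_handle on_signal)

(* Called from the main loop, outside any handler, so printing cannot
   interrupt a flush of the same channel. Returns true if a signal had
   been recorded since the last call. *)
let poll () =
  if !pending then begin
    pending := false;
    print_endline "signal received";
    true
  end else
    false

let () =
  install ();
  on_signal Sys.sigint;  (* simulate delivery without sending a real signal *)
  ignore (poll ())
```

The price is latency: the message only appears when the main loop next
calls poll, and several signals arriving in between are reported once.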

Nevertheless, it's a bit surprising that innocent-looking OCaml code
turns out to be unsafe. At the very least, the current state is a bit
unsatisfactory.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------