Hi again,

A colleague suggested doing the following experiment: call List.map on a large list and throw an exception from deep down in the call chain.

Now the backtrace I get contains 1022 entries for map, one entry for the raise site and one other entry, 1024 in total. This matches the 1024 limit of BACKTRACE_BUFFER_SIZE. Since the limit has been reached, the frames that lead up to List.map are missing, so such a backtrace is useless for diagnosing a stack overflow. It is also consistent with my understanding of caml_stash_backtrace: all stack frames are inspected and reported as long as there is space in the trace buffer.
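
For readers without the attachment, the experiment can be sketched roughly as follows. This is a hypothetical reconstruction, not the attached test program: the names Deep, count_lines and backtrace_of_deep_raise and the list length are assumptions chosen for illustration.

```ocaml
(* Hypothetical reconstruction of the experiment; not the original test program. *)
exception Deep

(* Count newline characters, i.e. the number of printed backtrace lines. *)
let count_lines s =
  let n = ref 0 in
  String.iter (fun c -> if c = '\n' then incr n) s;
  !n

(* Map over a list of length n and raise on the last element, so that the
   exception is raised with roughly n List.map frames on the stack. *)
let backtrace_of_deep_raise n =
  Printexc.record_backtrace true;
  let xs = Array.to_list (Array.init n (fun i -> i)) in
  try
    ignore (List.map (fun i -> if i = n - 1 then raise Deep else i) xs);
    ""
  with Deep -> Printexc.get_backtrace ()

let () =
  let bt = backtrace_of_deep_raise 100_000 in
  Printf.printf "%d backtrace lines\n" (count_lines bt)
```

The program has to be compiled with -g for the backtrace to contain source locations. The number of lines printed depends on the compiler version and on whether bytecode or native code is used.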

So something odd seems to happen when a stack overflow is detected in the SIGSEGV handler: there are only 3 trace entries, whereas the stack contains over a hundred thousand frames. Is this intended behavior?

If it is of any help I am including the test program. I am using OCaml 3.12.0 on an x86-64 platform.

Cheers,

Alexey

On Mon, Jul 16, 2012 at 3:51 PM, Alexey Rodriguez <mrchebas@gmail.com> wrote:
Hi,

I am having trouble understanding exception backtraces for stack overflows.

Sometimes the backtrace only contains entries for the function that filled the stack with frames (you would see many backtrace entries pointing to List.map if you were trying to map a very long list). Such traces are useless for fixing the stack overflow, since you cannot use them to find the code path that leads to List.map.

In other situations, the backtrace contains the full path from the OCaml entry point to the recursive function that is blowing up the stack. In these situations the backtrace appears to have "compressed" the hundreds of thousands of frames that the recursive calls generated, since there is only one entry for List.map.

Is there documentation that explains when you get one backtrace or the other? I tried to understand the source code of caml_stash_backtrace and there it seems that all the stack frames are captured (if the backtrace buffer size allows). Casual inspection with gdb shows that caml_stash_backtrace does not get the full stack at the moment of the fault. Maybe the signal handler is skipping over the hundreds of thousands of frames somehow? If someone can elucidate this mystery for me I'll be very grateful!

I can provide more details if needed, but probably someone on the list can already help with this short description.

Oh, one more question on backtraces. I see that when tracing is enabled, caml_stash_backtrace is called whenever an exception is raised. This might be expensive, as Not_found is raised by many functions in the standard library. Is there a high overhead in leaving tracing enabled? Leaving it enabled would be useful in production systems, as very often the original inputs are not available to reproduce the bug in a debug build.
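
One way to get a rough feel for the overhead is to time a loop of failing lookups with backtrace recording switched off and then on. The following is only a sketch under stated assumptions: time_raises and the iteration count are hypothetical, Sys.time measures processor time coarsely, and the result will vary with the platform, the compiler version and the depth of the stack at the raise site.

```ocaml
(* Hypothetical micro-benchmark: raise and catch Not_found n times and
   return the number of misses together with the processor time used. *)
let time_raises n =
  let tbl : (int, int) Hashtbl.t = Hashtbl.create 16 in
  let misses = ref 0 in
  let t0 = Sys.time () in
  for i = 1 to n do
    (* The table is empty, so every lookup raises Not_found. *)
    try ignore (Hashtbl.find tbl i) with Not_found -> incr misses
  done;
  (!misses, Sys.time () -. t0)

let () =
  let n = 1_000_000 in
  Printexc.record_backtrace false;
  let (_, without_recording) = time_raises n in
  Printexc.record_backtrace true;
  let (misses, with_recording) = time_raises n in
  Printf.printf "%d raises: %.3fs without recording, %.3fs with recording\n"
    misses without_recording with_recording
```

Recording can also be switched on without recompiling by setting OCAMLRUNPARAM=b in the environment, which makes it practical to compare both settings on the same binary.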

Thanks!

Alexey