From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id IAA00271; Mon, 23 Feb 2004 08:49:46 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id IAA01313 for ; Mon, 23 Feb 2004 08:49:43 +0100 (MET) Received: from maury.inria.fr (maury.inria.fr [192.93.2.7]) by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i1N7ngae005077 for ; Mon, 23 Feb 2004 08:49:42 +0100 Received: from hosting.commandprompt.com (215.commandprompt.com [207.173.200.215]) by maury.inria.fr (8.12.10/8.12.10) with ESMTP id i1N7ndYT009706 for ; Mon, 23 Feb 2004 08:49:40 +0100 Received: from hosting.commandprompt.com (65-102-113-91.tukw.qwest.net [65.102.113.91]) (authenticated) by hosting.commandprompt.com (8.11.6/8.11.6) with ESMTP id i1MN17L11694; Sun, 22 Feb 2004 15:01:07 -0800 Received: by hosting.commandprompt.com (sSMTP sendmail emulation); Sun, 22 Feb 2004 15:04:21 -0800 Date: Sun, 22 Feb 2004 15:04:21 -0800 From: Jeff Henrikson To: caml-list@inria.fr Cc: jefhen@safeco.com Subject: [Caml-list] mingw dll hell Message-ID: <20040222230420.GA10833@respighi.local.> Mail-Followup-To: caml-list@inria.fr, jefhen@safeco.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4i X-OS: Darwin respighi.local. 6.0 Power Macintosh X-Miltered: at concorde by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")! X-Miltered: at maury by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; henrikson:01 jehenrik:01 mingw:01 caml-list:01 debugging:01 henrikson:01 mingw:01 exes:01 exes:01 silently:01 backtracing:01 distro:01 uninstalled:01 3.07:01 installer:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk -- Hello, during the caml-list outage of the last couple of weeks, it appears that I've been the false positive of some spam filter (two possible reasons being that I have an annoying company not-reconfigurable all-caps email address, and that there were some weird characters in my debugging output below), or else gotten lost in the mail interruption. Here are two messages I sent last week. Please cc replys to both this yahoo address and my work address jefhen@safeco.com. Thank you. Jeff Henrikson --------------- message #1 Subject: mingw dll hell Hello, I had a several module mixed ocaml / C code working on a windows machine. It mysteriously stopped working today. In fact, every module which does any C DLL linking stopped working! If I link anything against C code now I get a nonfunctioning exe. I havenr modules, and I changed only my PATH variable to contain and then to not contain the current directory. The behavior to be precise is that native EXEs seg fault before running any code that I can tell about, and bytecode EXEs fail silently. I admit that I had been playing around with the ocaml compiler and building it with some modifications, trying to get the backtracing behavior to improve on my example. I installed it over my binary ocaml distro a couple of times. But I was modifying my main project code at the same time. Now when stuff started breaking, I completely uninstalled ocaml (mingw-3.07p2), reinstalled from the binary installer, made clean all the projects, and everything is still broken. Even all the subprojects break now. I did not change the mingw tools at all to my knowledge. I can however link with the dll files in the ocaml binary distribution. Hmm. I can't think of any other dll hell possibilities floating around. /usr/bin/cygwin1.dll is the only copy on the system. Is this obvious to anyone? Sorry for being such an idiot. Some experiments follow. Thanks, Jeff Henrikson Here's the output of gdb on an example native EXE linked to a dll, again the corresponding bytecode program fails silently. The behavior of a "hello world" pcre program is the same. Program received signal SIGSEGV, Segmentation fault. 0x00335f9e in alloc_small () from /cygdrive/c/ocaml/bin/ocamlrun.dll (gdb) bt #0 0x00335f9e in alloc_small () from /cygdrive/c/ocaml/bin/ocamlrun.dll #1 0x100014d4 in _fu5__local_roots () at ocamole.cpp:229 #2 0x10002e16 in guid_null (_=1) at ocamole.cpp:838 #3 0x00407023 in Ocamole__entry () #4 0x00000001 in ?? () #5 0x00401355 in startup__code_begin () #6 0x00413dd2 in caml_start_program () (gdb) The build command of this example: ocamlopt -ccopt dllocamole.dll -ccopt -g -o olegen.exe ocamole.ml olegen.ml The build commands of the dll: gcc -I /cygdrive/c/ocaml/lib -I /usr/include/w32api -c ocamole.cpp gcc -shared -L /cygdrive/c/ocaml/lib -o dllocamole.dll -Wl,--out-implib,libocamo le.a ocamole.o -lole32 -loleaut32 -luuid -lkernel32 /cygdrive/c/ocaml/lib/ocamlr un.a -lstdc++ chmod 777 dllocamole.dll --------------- message #2 Subject: mingw dll hell Per my post of yesterday, I'm having strange dynamic linking problems on mingw. I'm getting into weird territory here. Current behavior summary: - Code which worked two days ago no longer works. I am not been able to relate this to any source code / compiler / system changes. Breakage includes compilation of some off the shelf ocaml modules. (eg, pcre-ocaml) - Upon further inspection, the code which crashes is almost everything containing a ffi dll load. Linking C functions with -custom works fine, but dynlinking does not, in standalone programs or in the toplevel. In most cases, the result of running such a program is silent failure. Printfs at the top of the program sometimes execute but do not print. "flush stdout" typically crashes the VM. - code which does not involve dynlinking does not crash at all, near as I can tell. The toplevel does not crash. Even doing #load "unix.cma";; ond running open_process doesn't crash. - I compiled ocamlrun.exe and ocamlrun.dll with debugging information so that I could attach gdb. Tracing through interp.c on test input indicates that a fatal exception is thrown within 10-20 VM calls to C functions with 1 parameter. Typically, the crash is raised by a function having nothing to do with the ffi, like Pervasives.flush. And typically it is raised after the FFI primitive table gets loaded but before any of the FFI primitives get called. I include an example in gory detail below. I am currently stuck thinking it must be something to do with the way I build dlls, but nothing has changed about that. All the same build scripts worked perfectly on Tuesday. No build tool changes, nothing. Here's the example. Man it's been a long day. Anybody see anything? This feels like fantasy land. If anybody can say hey stupid and wake me out of my bad dream I would appreciate it. Jeff Henrikson ************* msgbox.c #include #include value c_msgbox(value i) { MessageBox(NULL,"yay","hello",MB_OK); return Val_int(0); } ************* msgbox.ml external msgbox : int -> unit = "c_msgbox";; ************* msgtest.ml open Msgbox;; sin 0.0;; output_string stdout "std awake\n"; flush stdout;; sin 0.0;; msgbox 17;; cos 0.0;; ************* build.sh gcc -I /cygdrive/c/ocaml/lib -I /usr/include/w32api -c msgbox.c gcc -shared -Wl,--out-implib,libocamole.a /cygdrive/c/ocaml/lib/ocamlrun.a msgbox.o -o dllmsgbox.dll ocamlc -c msgbox.ml ocamlc -a -o msgbox.cma -dllib -lmsgbox ocamlc -o msgtest.exe msgbox.cma msgtest.ml ************* gdb session - need to -g compile ocamlrun.exe and ocamlrun.dll $ gdb ocamlrun.exe GNU gdb 2003-09-20-cvs (cygwin-special) Copyright 2003 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i686-pc-cygwin"... (gdb) set args msgtest.exe (gdb) break main Breakpoint 1 at 0x4012d8: file main.c, line 38. (gdb) run Starting program: /cygdrive/c/src/dlltest/ocamlrun.exe msgtest.exe Breakpoint 1, main (argc=2, argv=0x3f40c0) at main.c:38 38 expand_command_line(&argc, &argv); (gdb) break interprete Breakpoint 2 at 0x1000119c: file interp.c, line 225. (gdb) break interp.c:850 Breakpoint 3 at 0x100022c0: file interp.c, line 850. (gdb) break startup.c:405 Breakpoint 4 at 0x10003d95: file startup.c, line 405. (gdb) continue Continuing. Breakpoint 2, interprete (prog=0x0, prog_size=0) at interp.c:225 225 if (prog == NULL) { /* Interpreter is initializing */ (gdb) Continuing. Breakpoint 2, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:225 225 if (prog == NULL) { /* Interpreter is initializing */ (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) Continuing. Breakpoint 3, interprete (prog=0x3fcb20, prog_size=6184) at interp.c:850 850 Setup_for_c_call; (gdb) step 851 accu = Primitive(*pc)(accu); (gdb) step caml_flush (vchannel=4836348) at io.c:553 553 struct channel * channel = Channel(vchannel); (gdb) step 555 if (channel->fd == -1) return Val_unit; (gdb) 556 Lock(channel); (gdb) 557 flush(channel); (gdb) flush (channel=0x4a0f18) at io.c:197 197 while (! flush_partial(channel)) /*nothing*/; (gdb) flush_partial (channel=0x4a0f18) at io.c:182 182 towrite = channel->curr - channel->buff; (gdb) 183 if (towrite > 0) { (gdb) 184 written = do_write(channel->fd, channel->buff, towrite); (gdb) do_write (fd=1, p=0x4a0f40 "std awake\n-----\rd-\rd-\rd54 enter_blocking_section(); (gdb) enter_blocking_section () at signals.c:125 125 temp = pending_signal; pending_signal = 0; (gdb) 126 if (temp) execute_signal(temp, 0); (gdb) 127 async_signal_mode = 1; (gdb) 128 if (!pending_signal) break; (gdb) 131 if (enter_blocking_section_hook != NULL) enter_blocking_section_hook() ; (gdb) 132 } (gdb) do_write (fd=1, p=0x4a0f40 "std awake\n-bbbd-d-d- rd-+"..., n=10) at io.c:155 155 retcode = write(fd, p, n); (gdb) 156 leave_blocking_section(); (gdb) leave_blocking_section () at signals.c:140 140 if (leave_blocking_section_hook != NULL) leave_blocking_section_hook() ; (gdb) 147 signal_number = pending_signal; (gdb) 148 pending_signal = 0; (gdb) 149 if (signal_number) execute_signal(signal_number, 1); (gdb) 152 async_signal_mode = 0; (gdb) 153 } (gdb) do_write (fd=1, p=0x4a0f40 "std awake\n-\rd-rd-f (retcode == -1) { (gdb) 158 if (errno == EINTR) goto again; (gdb) 159 if ((errno == EAGAIN || errno == EWOULDBLOCK) && n > 1) { (gdb) 169 if (retcode == -1) sys_error(NO_ARG); (gdb) sys_error (arg=1) at sys.c:97 97 CAMLparam1 (arg); (gdb) 99 CAMLlocal1 (str); (gdb) 101 if (errno == EAGAIN || errno == EWOULDBLOCK) { (gdb) 104 err = error_message(); (gdb) error_message () at sys.c:70 70 return strerror(errno); (gdb) 71 } (gdb) sys_error (arg=1) at sys.c:105 105 if (arg == NO_ARG) { (gdb) 106 str = copy_string(err); (gdb) copy_string (s=0x3fe9d0 "Bad file descriptor") at alloc.c:100 100 len = strlen(s); (gdb) 101 res = alloc_string(len); (gdb) alloc_string (len=19) at alloc.c:74 74 mlsize_t wosize = (len + sizeof (value)) / sizeof (value); (gdb) 76 if (wosize <= Max_young_wosize) { (gdb) 77 Alloc_small (result, wosize, String_tag); (gdb) 82 Field (result, wosize - 1) = 0; (gdb) 83 offset_index = Bsize_wsize (wosize) - 1; (gdb) 84 Byte (result, offset_index) = offset_index - len; (gdb) 86 } (gdb) copy_string (s=0x3fe9d0 "Bad file descriptor") at alloc.c:102 102 memmove(String_val(res), s, len); (gdb) 104 } (gdb) sys_error (arg=1) at sys.c:115 115 raise_sys_error(str); (gdb) raise_sys_error (msg=4585348) at fail.c:107 107 raise_with_arg(Field(global_data, SYS_ERROR_EXN), msg); (gdb) raise_with_arg (tag=4837168, arg=4585348) at fail.c:56 56 CAMLparam2 (tag, arg); (gdb) 57 CAMLlocal1 (bucket); (gdb) 59 bucket = alloc_small (2, 0); (gdb) alloc_small (wosize=2, tag=0) at alloc.c:61 61 Alloc_small (result, wosize, tag); (gdb) 63 } (gdb) raise_with_arg (tag=4837168, arg=4585348) at fail.c:60 60 Field(bucket, 0) = tag; (gdb) 61 Field(bucket, 1) = arg; (gdb) 62 mlraise(bucket); (gdb) mlraise (v=4585336) at fail.c:38 38 Unlock_exn(); (gdb) 39 exn_bucket = v; (gdb) 40 if (external_raise == NULL) fatal_uncaught_exception(v); (gdb) 41 siglongjmp(external_raise->buf, 1); (gdb) Breakpoint 4, caml_main (argv=0x3f40e8) at startup.c:405 405 if (Is_exception_result(res)) { (gdb) quit ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners