caml-list - the Caml user's mailing list
* Dynamically loaded BSS not initialised to 0.
@ 2010-01-03 11:37 Guillaume Yziquel
  2010-01-03 23:21 ` [Caml-list] " Guillaume Yziquel
From: Guillaume Yziquel @ 2010-01-03 11:37 UTC
  To: OCaml List; +Cc: ygrek

Hello.

I encountered a rather weird issue. A binding of mine works fine when 
bundled as a .cmxa, but fails when bundled as a .cma. I'm running 
Debian Linux on amd64.

I've tracked down the issue to the following point: it seems that when 
the BSS (uninitialised data section) of libmonetdb5.so is dynamically 
loaded, it doesn't get initialised to 0. And the code in libmonetdb5.so 
relies on the fact that BSS gets initialised to 0 when dynamically loaded.

To reproduce my problem, do the following:

You need to have the following Debian packages installed for MonetDB.

	libmonetdb-client-dev
	libmonetdb-client1
	libmonetdb-dev-dbg
	libmonetdb1-dbg
	libmonetdb5-server-dev-dbg
	libmonetdb5-server5-dbg
	libmonetdb5-sql-dev
	libmonetdb5-sql2
	monetdb-client
	monetdb5-server-dbg

The *-dbg packages are packages I've changed and recompiled with the -g 
option. They are available from my website:

	http://yziquel.homelinux.org/debian/pool/main/m/

The key signing the repo is located at

http://yziquel.homelinux.org/debian/yziquel-debian-packages.asc

and you just have to do

	cat yziquel-debian-packages.asc | sudo apt-key add -

and add the following lines to /etc/apt/sources.list:

> deb     http://yziquel.homelinux.org/debian stable   main
> deb-src http://yziquel.homelinux.org/debian stable   main
> deb     http://yziquel.homelinux.org/debian testing  main
> deb-src http://yziquel.homelinux.org/debian testing  main
> deb     http://yziquel.homelinux.org/debian unstable main
> deb-src http://yziquel.homelinux.org/debian unstable main

The rest of the MonetDB packages can be found here:

	http://monetdb.cwi.nl/downloads/Debian/

and the monetdb5 binding is here:

	http://yziquel.homelinux.org/gitweb/?p=ocaml-monetdb5.git;a=tree

(click on snapshot to download one).

Now here is why I believe that the BSS is not properly initialised. The 
segfault occurs in the function findBox, at line 330 of:

http://monetdb.cvs.sourceforge.net/viewvc/monetdb/MonetDB5/src/mal/mal_box.mx?revision=1.100&view=markup

There is this line:

> if (box[i] != NULL && idcmp(name, box[i]->name) == 0) {

I've followed machine code instructions step by step there, with ddd.

In native code, box[i] == NULL. Evaluation stops there (i.e. box[i] != 
NULL is false). Everything is perfect.

In bytecode, box[i] != NULL because BSS is not initialised to 0... And 
it then tries to access box[i]->name, and segfaults.

For the record, you have:

> typedef struct BOX {
> 	MT_Lock lock;		/* provide exclusive access */
> 	str name;
> 	MalBlkPtr sym;
> 	MalStkPtr val;
> 	int dirty;		/* don't save if it hasn't been changed */
> } *Box, BoxRecord;

and

> #define MAXSPACES 64		/* >MAXCLIENTS+ max modules !! */
> Box box[MAXSPACES];
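
To make the assumption explicit, here is a minimal, self-contained sketch of the pattern that findBox relies on. It is not the MonetDB source: the record is reduced to its name field, strcmp stands in for idcmp, and find_box_index is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

#define MAXSPACES 64

/* Reduced stand-in for the BOX record quoted above: only the name field. */
typedef struct BOX {
    const char *name;
} *Box, BoxRecord;

/* File-scope array with no initialiser: it has static storage duration,
   so C guarantees that every element starts out as a null pointer. */
Box box[MAXSPACES];

/* Hypothetical helper: index of the box called `name`, or -1 if absent. */
int find_box_index(const char *name)
{
    for (int i = 0; i < MAXSPACES; i++) {
        /* && short-circuits: box[i]->name is read only if box[i] != NULL. */
        if (box[i] != NULL && strcmp(name, box[i]->name) == 0)
            return i;
    }
    return -1;
}
```

The NULL test protects the dereference only as long as every slot of the array really holds either NULL or a valid pointer, which is exactly what seems to be violated in the bytecode case.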

For the disassembled code, you can have a look at:

http://sourceforge.net/mailarchive/message.php?msg_name=4B3ED073.3050203%40citycable.ch

I've also tried running ltrace to see how dynamic loading happens for 
the bytecode monetdb5.cma:

	http://yziquel.homelinux.org/monetdb_sql.byte.ltrace

But 95% of its lines are OCaml-related, and the end is concerned 
only with ml_monetdb_sql. I'd like to see how the 'box' symbol gets 
loaded into the BSS, but I do not know how to do that.

So: is OCaml failing to initialise memory to 0 when libmonetdb5.so is 
dynamically loaded?

-- 
      Guillaume Yziquel
http://yziquel.homelinux.org/



* Re: [Caml-list] Dynamically loaded BSS not initialised to 0.
  2010-01-03 11:37 Dynamically loaded BSS not initialised to 0 Guillaume Yziquel
@ 2010-01-03 23:21 ` Guillaume Yziquel
  2010-01-04 14:10   ` Richard Jones
From: Guillaume Yziquel @ 2010-01-03 23:21 UTC
  To: OCaml List

Guillaume Yziquel wrote:
> Hello.
> 
> I encountered a rather weird issue. A binding of mine works fine when 
> bundled as a .cmxa, but fails when bundled as a .cma. I'm running a 
> Linux Debian amd64.
> 
> I've tracked down the issue to the following point: it seems that when 
> the BSS (uninitialised data section) of libmonetdb5.so is dynamically 
> loaded, it doesn't get initialised to 0. And the code in libmonetdb5.so 
> relies on the fact that BSS gets initialised to 0 when dynamically loaded.
> 
> So: is OCaml failing to initialise memory to 0 when libmonetdb5.so is 
> dynamically loaded?

Problem solved: This is in fact a symbol collision problem on the symbol 
'box'. There's one in libncurses, which is loaded by ocamlrun.

Thanks to Csaba Halasz (Jester01 on ##asm) for help with binary debugging.

All the best,

-- 
      Guillaume Yziquel
http://yziquel.homelinux.org/



* Re: [Caml-list] Dynamically loaded BSS not initialised to 0.
  2010-01-03 23:21 ` [Caml-list] " Guillaume Yziquel
@ 2010-01-04 14:10   ` Richard Jones
  2010-01-08 20:12     ` Guillaume Yziquel
From: Richard Jones @ 2010-01-04 14:10 UTC
  To: Guillaume Yziquel; +Cc: OCaml List

On Mon, Jan 04, 2010 at 12:21:36AM +0100, Guillaume Yziquel wrote:
> Guillaume Yziquel wrote:
> >Hello.
> >
> >I encountered a rather weird issue. A binding of mine works fine when 
> >bundled as a .cmxa, but fails when bundled as a .cma. I'm running a 
> >Linux Debian amd64.
> >
> >I've tracked down the issue to the following point: it seems that when 
> >the BSS (uninitialised data section) of libmonetdb5.so is dynamically 
> >loaded, it doesn't get initialised to 0. And the code in libmonetdb5.so 
> >relies on the fact that BSS gets initialised to 0 when dynamically loaded.
> >
> >So: is OCaml failing to initialise memory to 0 when libmonetdb5.so is 
> >dynamically loaded?
> 
> Problem solved: This is in fact a symbol collision problem on the symbol 
> 'box'. There's one in libncurses, which is loaded by ocamlrun.

Good ol' ELF loading model ...  Uli wrote a really good introduction
to writing DSOs which everyone should read:

http://people.redhat.com/drepper/dsohowto.pdf

The issue of symbol scope is covered there too, although I don't think
it can help in this case.  One or other of the libraries is just going
to have to change the visibility of that symbol.  In ncurses it's a
public symbol, but if I understand the code correctly, in MonetDB it's
just an accidentally leaked global variable (not part of the API).  So
MonetDB could control the visibility of that symbol using a linker
script.  We use linker scripts extensively in libvirt to control which
clients can see which sets of symbols, eg:

http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/libvirt_public.syms;hb=HEAD
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/libvirt_private.syms;hb=HEAD
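
As an alternative to a linker script, GCC-compatible compilers also accept a per-symbol visibility attribute in the C source. The following is only a sketch of that technique, not MonetDB or libvirt code; LIB_LOCAL, next_handle and api_new_handle are hypothetical names.

```c
/* Assumption: a GCC-compatible compiler; elsewhere the macro is a no-op. */
#if defined(__GNUC__)
#define LIB_LOCAL __attribute__((visibility("hidden")))
#else
#define LIB_LOCAL
#endif

/* Shared between the library's own source files, but left out of the
   dynamic symbol table, so it cannot collide with another library's
   symbol of the same name at load time. */
LIB_LOCAL int next_handle;

/* Public entry point: default visibility, part of the exported API. */
int api_new_handle(void)
{
    return ++next_handle;
}
```

Unlike static, a hidden symbol can still be referenced from other translation units of the same library, which matters when the variable is used in more than one source file.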

In answer to your original question, initialization of the BSS is the
job of the loader (ld-linux.so(8)).  OCaml just calls dlopen(3), which
calls into some extremely well-tested code, so it was always going to
be unlikely that BSS initialization was the problem.
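
One way to check for this kind of collision before loading a library is to ask the loader whether a name already resolves in the running process. This is a minimal sketch using only dlopen(3) and dlsym(3); symbol_already_defined is a hypothetical helper, and on older glibc versions the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `symbol` already resolves in the
   global scope of the running process, 0 otherwise.
   Caveat: a symbol whose value is genuinely NULL is reported as absent. */
int symbol_already_defined(const char *symbol)
{
    /* A NULL path yields a handle for the main program; lookups through
       it also search the libraries loaded at program startup. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return 0;

    void *addr = dlsym(self, symbol);
    dlclose(self);
    return addr != NULL;
}
```

In a process that already has libncurses loaded, symbol_already_defined("box") would be expected to return 1, which is a hint that a later library defining its own global 'box' may end up bound to the wrong object.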

Rich.

-- 
Richard Jones
Red Hat



* Re: [Caml-list] Dynamically loaded BSS not initialised to 0.
  2010-01-04 14:10   ` Richard Jones
@ 2010-01-08 20:12     ` Guillaume Yziquel
From: Guillaume Yziquel @ 2010-01-08 20:12 UTC
  To: Richard Jones; +Cc: OCaml List

Richard Jones wrote:
>
>> Problem solved: This is in fact a symbol collision problem on the symbol 
>> 'box'. There's one in libncurses, which is loaded by ocamlrun.
> 
> Good ol' ELF loading model ...  Uli wrote a really good introduction
> to writing DSOs which everyone should read:
> 
> http://people.redhat.com/drepper/dsohowto.pdf

Indeed, it's very very good. Thanks a lot for this pointer.

> The issue of symbol scope is covered there too, although I don't think
> it can help in this case.  One or other of the libraries is just going
> to have to change the visibility of that symbol.

Yes. This is being handled on the MonetDB side: they're going to make 
'box' static and rename it.
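
A minimal sketch of what such a change could look like, assuming the table is only needed within one source file. This is not the actual MonetDB patch; mal_boxes, mal_box_register and mal_box_at are hypothetical names, and the record is reduced to its name field.

```c
#include <stddef.h>

#define MAXSPACES 64

/* Reduced stand-in for the BOX record: only the name field. */
typedef struct BOX {
    const char *name;
} *Box, BoxRecord;

/* static gives the array internal linkage: the name never reaches the
   dynamic symbol table, so it cannot be confused with ncurses' box. */
static Box mal_boxes[MAXSPACES];

/* Store b in the first free slot; return its index, or -1 on failure. */
int mal_box_register(Box b)
{
    if (b == NULL)
        return -1;
    for (int i = 0; i < MAXSPACES; i++) {
        if (mal_boxes[i] == NULL) {
            mal_boxes[i] = b;
            return i;
        }
    }
    return -1;
}

/* Bounds-checked accessor used instead of touching the array directly. */
Box mal_box_at(int i)
{
    return (i >= 0 && i < MAXSPACES) ? mal_boxes[i] : NULL;
}
```

Going through accessor functions keeps the table private to its file while still letting the rest of the library reach its contents.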

> In ncurses it's a
> public symbol, but if I understand the code correctly, in MonetDB it's
> just an accidentally leaked global variable (not part of the API).  So
> MonetDB could control the visibility of that symbol using a linker
> script.

Yes, they probably could, but it seems to me that they have other 
priorities for now.

> We use linker scripts extensively in libvirt to control which
> clients can see which sets of symbols, eg:
> 
> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/libvirt_public.syms;hb=HEAD
> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/libvirt_private.syms;hb=HEAD
> 
> In answer to your original question, initialization of the BSS is the
> job of the loader (ld-linux.so(8)).  OCaml just calls dlopen(3), which
> calls into some extremely well-tested code, so it was always going to
> be unlikely that BSS initialization was the problem.
> 
> Rich.

Thanks. I was quite sure that the loader was doing its job properly. I 
wasn't sure, however, that OCaml was calling dlopen, and I was wondering 
at the time whether the linking scheme used by OCaml depended on 
whether we're dealing with OCaml bytecode or OCaml native code. In this 
context I was wondering if the BSS was initialised to 0, since on some 
hardware that is not necessarily the case (it seems... I would not bet my 
hand on this).

I now know better.

Anyway, it was an interesting bug: I'm growing fond of assembly.

All the best,

-- 
      Guillaume Yziquel
http://yziquel.homelinux.org/

