caml-list - the Caml user's mailing list
* [Caml-list] segfault in simple program with 4.02 native
@ 2014-09-05 21:33 Ashish Agarwal
  2014-09-05 21:50 ` Andy Ray
  2014-09-05 21:56 ` Richard W.M. Jones
  0 siblings, 2 replies; 25+ messages in thread
From: Ashish Agarwal @ 2014-09-05 21:33 UTC (permalink / raw)
  To: Caml List

https://github.com/agarwal/ocaml402_error

The above link contains a simple program that segfaults when compiled with
ocamlopt 4.02. Bytecode doesn't segfault, nor does native code with 4.01.

This is the minimal example I could come up with. It uses atdgen and is
sensitive to the exact fields in the .atd file. Removing any of the fields
leads to correctly functioning code.

I've only tested on Mac OS X so far.

In the context of my full code, I observed another behavior also. Instead
of a segfault, I would get random non-ascii characters printed for the
value of postgres.host, and it would be different on repeated runs of the
program (I didn't even recompile in between). I can't seem to reproduce
this behavior, only getting the segfault now.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 21:33 [Caml-list] segfault in simple program with 4.02 native Ashish Agarwal
@ 2014-09-05 21:50 ` Andy Ray
  2014-09-05 21:56 ` Richard W.M. Jones
  1 sibling, 0 replies; 25+ messages in thread
From: Andy Ray @ 2014-09-05 21:50 UTC (permalink / raw)
  To: Ashish Agarwal; +Cc: Caml List

Hi,

A similar issue here on an Ubuntu 14.04 64-bit opam install, also using atdgen.

https://github.com/avsm/ocaml-github/issues/35

It works with ocaml 4.01 but not 4.02 and should be repeatable with
the ocaml-github 'git-jar' tool installed with opam.

Regards,
Andy



* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 21:33 [Caml-list] segfault in simple program with 4.02 native Ashish Agarwal
  2014-09-05 21:50 ` Andy Ray
@ 2014-09-05 21:56 ` Richard W.M. Jones
  2014-09-05 22:01   ` Sebastien Mondet
  2014-09-05 22:06   ` Ashish Agarwal
  1 sibling, 2 replies; 25+ messages in thread
From: Richard W.M. Jones @ 2014-09-05 21:56 UTC (permalink / raw)
  To: Ashish Agarwal; +Cc: Caml List

On Fri, Sep 05, 2014 at 05:33:11PM -0400, Ashish Agarwal wrote:
> https://github.com/agarwal/ocaml402_error
> 
> The above link contains a simple program that segfaults when compiling with
> ocamlopt 4.02. Byte code doesn't segfault, nor does native code with 4.01.
> 
> This is the minimal example I could come up with. It uses atdgen and is
> sensitive to the exact fields in the .atd file. Removing any of the fields
> leads to correctly functioning code.
> 
> I've only tested on Mac OS X so far.
> 
> In the context of my full code, I observed another behavior also. Instead
> of a segfault, I would get random non-ascii characters printed for the
> value of postgres.host, and it would be different on repeated runs of the
> program (I didn't even recompile in between). I can't seem to reproduce
> this behavior, only getting the segfault now.

I made a best effort to compile that, but I don't have some of
the gazillion dependencies.  So instead I have a suggestion:

What happens if you increase the stack limit?

Rich.

-- 
Richard Jones
Red Hat


* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 21:56 ` Richard W.M. Jones
@ 2014-09-05 22:01   ` Sebastien Mondet
  2014-09-05 22:06   ` Ashish Agarwal
  1 sibling, 0 replies; 25+ messages in thread
From: Sebastien Mondet @ 2014-09-05 22:01 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Ashish Agarwal, Caml List

Rich, please just `git pull`; Core was removed from the deps a few
seconds ago.





* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 21:56 ` Richard W.M. Jones
  2014-09-05 22:01   ` Sebastien Mondet
@ 2014-09-05 22:06   ` Ashish Agarwal
  2014-09-05 22:13     ` Richard W.M. Jones
  1 sibling, 1 reply; 25+ messages in thread
From: Ashish Agarwal @ 2014-09-05 22:06 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Caml List

I increased the stack size to 65532, which is apparently the max allowed on
a Mac, and it doesn't change the behavior.



* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 22:06   ` Ashish Agarwal
@ 2014-09-05 22:13     ` Richard W.M. Jones
  2014-09-05 22:18       ` Richard W.M. Jones
  2014-09-05 22:18       ` Christoph Höger
  0 siblings, 2 replies; 25+ messages in thread
From: Richard W.M. Jones @ 2014-09-05 22:13 UTC (permalink / raw)
  To: Ashish Agarwal; +Cc: Caml List

On Fri, Sep 05, 2014 at 06:06:55PM -0400, Ashish Agarwal wrote:
> I increased the stack size to 65532, which is apparently the max allowed on
> a Mac, and it doesn't change the behavior.

Yup.  I was able to reproduce this on the non-core version, and indeed
increasing the stack to unlimited on Linux does not help.

The stack trace is simple:

#0  0x00000000004543f4 in camlPervasives__output_string_1198 ()
#1  0x0000000000472093 in camlCamlinternalFormat__output_acc_60624 ()
#2  0x0000000000473a32 in camlPrintf__fun_1062 ()
#3  0x000000000041e776 in camlApp__entry ()
#4  0x000000000041c5f9 in caml_program ()
#5  0x0000000000497f7e in caml_start_program ()
#6  0x000000000049813d in __libc_csu_init ()
#7  0x00007ffff7317d65 in __libc_start_main () from /lib64/libc.so.6
#8  0x000000000041c2e9 in _start ()

I'm just installing debuginfo so I can get more symbols ..

Rich.

-- 
Richard Jones
Red Hat


* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 22:13     ` Richard W.M. Jones
@ 2014-09-05 22:18       ` Richard W.M. Jones
  2014-09-05 22:36         ` Török Edwin
  2014-09-05 22:39         ` Martin Jambon
  2014-09-05 22:18       ` Christoph Höger
  1 sibling, 2 replies; 25+ messages in thread
From: Richard W.M. Jones @ 2014-09-05 22:18 UTC (permalink / raw)
  To: Ashish Agarwal; +Cc: Caml List

On Fri, Sep 05, 2014 at 11:13:02PM +0100, Richard W.M. Jones wrote:
> On Fri, Sep 05, 2014 at 06:06:55PM -0400, Ashish Agarwal wrote:
> > I increased the stack size to 65532, which is apparently the max allowed on
> > a Mac, and it doesn't change the behavior.
> 
> Yup.  I was able to reproduce this on the non-core version, and indeed
> increasing the stack to unlimited on Linux does not help.
> 
> The stack trace is simple:
> 
> #0  0x00000000004543f4 in camlPervasives__output_string_1198 ()
> #1  0x0000000000472093 in camlCamlinternalFormat__output_acc_60624 ()
> #2  0x0000000000473a32 in camlPrintf__fun_1062 ()
> #3  0x000000000041e776 in camlApp__entry ()
> #4  0x000000000041c5f9 in caml_program ()
> #5  0x0000000000497f7e in caml_start_program ()
> #6  0x000000000049813d in __libc_csu_init ()
> #7  0x00007ffff7317d65 in __libc_start_main () from /lib64/libc.so.6
> #8  0x000000000041c2e9 in _start ()
> 
> I'm just installing debuginfo so I can get more symbols ..

.. although I guess the fact that the generated code in config_j.ml is
doing a lot of Obj.magic would be the first place to be suspicious.

eg:

    let (x : postgres) =
      {
        host = Obj.magic 0.0;
...

where the host field has declared type string.  Really?

Rich.

-- 
Richard Jones
Red Hat


* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 22:13     ` Richard W.M. Jones
  2014-09-05 22:18       ` Richard W.M. Jones
@ 2014-09-05 22:18       ` Christoph Höger
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Höger @ 2014-09-05 22:18 UTC (permalink / raw)
  To: caml-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

output_string calls unsafe_output_string after getting the string
length. Both are external. Could you attempt to print the string
length before you print the string (don't forget to flush the buffer).
My take is, the input argument is not a string at all.
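
The check suggested above might look like the following sketch. The helper
names `describe` and `checked_print` are made up for illustration and are
not part of the reported program. If the value is not really a string, the
reported length is likely to be nonsense, or the length call itself may
crash, which is what the check is meant to reveal.

```ocaml
(* Hypothetical helper: report the length the runtime sees for [s]. *)
let describe (s : string) : string =
  Printf.sprintf "length=%d" (String.length s)

(* Print the length to stderr and flush before touching the contents,
   so the diagnostic survives even if printing [s] itself crashes. *)
let checked_print (s : string) : unit =
  prerr_endline (describe s);
  flush stderr;
  print_string s;
  flush stdout
```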

Am 06.09.2014 00:13, schrieb Richard W.M. Jones:
> On Fri, Sep 05, 2014 at 06:06:55PM -0400, Ashish Agarwal wrote:
>> I increased the stack size to 65532, which is apparently the max
>> allowed on a Mac, and it doesn't change the behavior.
> 
> Yup.  I was able to reproduce this on the non-core version, and
> indeed increasing the stack to unlimited on Linux does not help.
> 
> The stack trace is simple:
> 
> #0  0x00000000004543f4 in camlPervasives__output_string_1198 ()
> #1  0x0000000000472093 in camlCamlinternalFormat__output_acc_60624 ()
> #2  0x0000000000473a32 in camlPrintf__fun_1062 ()
> #3  0x000000000041e776 in camlApp__entry ()
> #4  0x000000000041c5f9 in caml_program ()
> #5  0x0000000000497f7e in caml_start_program ()
> #6  0x000000000049813d in __libc_csu_init ()
> #7  0x00007ffff7317d65 in __libc_start_main () from /lib64/libc.so.6
> #8  0x000000000041c2e9 in _start ()
> 
> I'm just installing debuginfo so I can get more symbols ..
> 
> Rich.
> 


- -- 
Christoph Höger

Technische Universität Berlin
Fakultät IV - Elektrotechnik und Informatik
Übersetzerbau und Programmiersprachen

Sekr. TEL12-2, Ernst-Reuter-Platz 7, 10587 Berlin

Tel.: +49 (30) 314-24890
E-Mail: christoph.hoeger@tu-berlin.de
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlQKNqYACgkQhMBO4cVSGS8/egCglmUwz2XBaRkkCFwvSVaIlEgN
HrgAnRRN0mzuu1GlYgclglAK7961sGXp
=1tGf
-----END PGP SIGNATURE-----


* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 22:18       ` Richard W.M. Jones
@ 2014-09-05 22:36         ` Török Edwin
  2014-09-05 22:39         ` Martin Jambon
  1 sibling, 0 replies; 25+ messages in thread
From: Török Edwin @ 2014-09-05 22:36 UTC (permalink / raw)
  To: caml-list

On Fri, 2014-09-05 at 23:18 +0100, Richard W.M. Jones wrote:
> On Fri, Sep 05, 2014 at 11:13:02PM +0100, Richard W.M. Jones wrote:
> > On Fri, Sep 05, 2014 at 06:06:55PM -0400, Ashish Agarwal wrote:
> > > I increased the stack size to 65532, which is apparently the max allowed on
> > > a Mac, and it doesn't change the behavior.
> > 
> > Yup.  I was able to reproduce this on the non-core version, and indeed
> > increasing the stack to unlimited on Linux does not help.
> > 
> > The stack trace is simple:
> > 
> > #0  0x00000000004543f4 in camlPervasives__output_string_1198 ()
> > #1  0x0000000000472093 in camlCamlinternalFormat__output_acc_60624 ()
> > #2  0x0000000000473a32 in camlPrintf__fun_1062 ()
> > #3  0x000000000041e776 in camlApp__entry ()
> > #4  0x000000000041c5f9 in caml_program ()
> > #5  0x0000000000497f7e in caml_start_program ()
> > #6  0x000000000049813d in __libc_csu_init ()
> > #7  0x00007ffff7317d65 in __libc_start_main () from /lib64/libc.so.6
> > #8  0x000000000041c2e9 in _start ()
> > 
> > I'm just installing debuginfo so I can get more symbols ..
> 
> .. although I guess the fact that the generated code in config_j.ml is
> doing a lot of Obj.magic would be the first place to be suspicious.
> 
> eg:
> 
>     let (x : postgres) =
>       {
>         host = Obj.magic 0.0;
> ...
> 
> where the host field has declared type string.  Really?

There's also this, and a lot of Obj.set_field too
  postgres = Obj.magic 0.0;

FWIW it triggers an assertion failure with -runtime-variant d even
before it starts parsing:

./app.native 
### OCaml runtime: debug mode ###
Initial minor heap size: 2048k bytes
Initial major heap size: 3840k bytes
Initial space overhead: 80%
Initial max overhead: 500%
Initial heap increment: 15%
Initial allocation policy: 0
file memory.d.c; line 548 ### Assertion failed: Is_in_heap(fp)

Best regards,
--Edwin



* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 22:18       ` Richard W.M. Jones
  2014-09-05 22:36         ` Török Edwin
@ 2014-09-05 22:39         ` Martin Jambon
  2014-09-05 23:39           ` Ashish Agarwal
  2014-09-06  0:12           ` Jeremy Yallop
  1 sibling, 2 replies; 25+ messages in thread
From: Martin Jambon @ 2014-09-05 22:39 UTC (permalink / raw)
  To: caml-list

On 09/05/2014 03:18 PM, Richard W.M. Jones wrote:
> On Fri, Sep 05, 2014 at 11:13:02PM +0100, Richard W.M. Jones wrote:
>> On Fri, Sep 05, 2014 at 06:06:55PM -0400, Ashish Agarwal wrote:
>>> I increased the stack size to 65532, which is apparently the max allowed on
>>> a Mac, and it doesn't change the behavior.
>>
>> Yup.  I was able to reproduce this on the non-core version, and indeed
>> increasing the stack to unlimited on Linux does not help.
>>
>> The stack trace is simple:
>>
>> #0  0x00000000004543f4 in camlPervasives__output_string_1198 ()
>> #1  0x0000000000472093 in camlCamlinternalFormat__output_acc_60624 ()
>> #2  0x0000000000473a32 in camlPrintf__fun_1062 ()
>> #3  0x000000000041e776 in camlApp__entry ()
>> #4  0x000000000041c5f9 in caml_program ()
>> #5  0x0000000000497f7e in caml_start_program ()
>> #6  0x000000000049813d in __libc_csu_init ()
>> #7  0x00007ffff7317d65 in __libc_start_main () from /lib64/libc.so.6
>> #8  0x000000000041c2e9 in _start ()
>>
>> I'm just installing debuginfo so I can get more symbols ..
>
> .. although I guess the fact that the generated code in config_j.ml is
> doing a lot of Obj.magic would be the first place to be suspicious.
>
> eg:
>
>      let (x : postgres) =
>        {
>          host = Obj.magic 0.0;
> ...
>
> where the host field has declared type string.  Really?

That code is generated by atdgen. What happens is that we have to either
create an empty record when starting to parse a list of unordered JSON
fields, or use a bunch of `let <field name> = ref None in` bindings, one
for each field, and create the record at the end. While the latter
approach is not much more work to implement, the resulting code was found
to be significantly slower.

The reason why it's using `Obj.magic 0.0` is that it worked in all cases
(and has for the past 4 years). Obtaining a well-formed constant
value for any type is not trivial, so this is what we have.

It's very possible that it's now broken with OCaml 4.02. First try a 
'make test' from atdgen's source directory 
(https://github.com/mjambon/atdgen) and see if it passes.


Martin



* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 22:39         ` Martin Jambon
@ 2014-09-05 23:39           ` Ashish Agarwal
  2014-09-05 23:59             ` Martin Jambon
  2014-09-06  0:12           ` Jeremy Yallop
  1 sibling, 1 reply; 25+ messages in thread
From: Ashish Agarwal @ 2014-09-05 23:39 UTC (permalink / raw)
  To: Martin Jambon; +Cc: Caml List

>  First try a 'make test' from atdgen's source
...
The following tests failed:
ocaml internals
biniou correctness
json correctness
*** FAILURE ***

Thanks to everyone for all the replies. My type defs are small, so at least
I can easily bypass the issue by using Yojson directly.





* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 23:39           ` Ashish Agarwal
@ 2014-09-05 23:59             ` Martin Jambon
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Jambon @ 2014-09-05 23:59 UTC (permalink / raw)
  To: Ashish Agarwal; +Cc: Caml List

On Fri 05 Sep 2014 04:39:54 PM PDT, Ashish Agarwal wrote:
> >  First try a 'make test' from atdgen's source
> ...
> The following tests failed:
> ocaml internals
> biniou correctness
> json correctness
> *** FAILURE ***

The "ocaml internals" test fails on this:

type internals1 = { int1 : bool }

...

  let f () = { int1 = Obj.magic false } in
  assert (f () != f ());

So don't use atdgen 1.3.1 or below with ocaml 4.02.0 or above.
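
For contrast, a minimal sketch of the same check on a record with a mutable
field (the type `internals_mut` and the function `fresh` are hypothetical,
not taken from atdgen's test suite). A record whose fields are all immutable
constants may be shared by the compiler as a single block, which is
presumably why `f () != f ()` no longer holds; a record with a mutable field
has to be allocated afresh on every call.

```ocaml
(* Hypothetical variant of the record above with a mutable field. *)
type internals_mut = { mutable flag : bool }

(* Each call must return a distinct block, since callers may mutate it. *)
let fresh () = { flag = false }

let () =
  (* [!=] is physical inequality: the two calls return distinct blocks. *)
  assert (fresh () != fresh ())
```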


No promise as to when this will be fixed. Follow the activity on GitHub if
you're interested in following or contributing.


> Thanks to everyone for all the replies. My type defs are small, so at
> least I can easily bypass the issue by using Yojson directly.

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-05 22:39         ` Martin Jambon
  2014-09-05 23:39           ` Ashish Agarwal
@ 2014-09-06  0:12           ` Jeremy Yallop
  2014-09-06  5:51             ` Martin Jambon
  1 sibling, 1 reply; 25+ messages in thread
From: Jeremy Yallop @ 2014-09-06  0:12 UTC (permalink / raw)
  To: Martin Jambon; +Cc: Caml List

On 6 September 2014 00:39, Martin Jambon <martin.jambon@ens-lyon.org> wrote:
> That code is generated by atdgen. What happens is that we have to either
> create an empty record when starting to parse a list of unordered JSON
> fields, or use a bunch `let <field name> = ref None in` for each field and
> create the record in the end. While the latter approach is not much more
> work to implement, the resulting code was found to be significantly slower.
>
> The reason why it's using `Obj.magic 0.0` is that it worked in all cases
> (and has been for the past 4 years). Obtaining a well-formed constant value
> for any type is not trivial, so this what we have.
>
> It's very possible that it's now broken with OCaml 4.02. First try a 'make
> test' from atdgen's source directory (https://github.com/mjambon/atdgen) and
> see if it passes.

It does seem to be broken, and the change in behaviour with 4.02 is
apparently due to improved constant propagation
(http://caml.inria.fr/mantis/view.php?id=5779).

The compiler now takes more advantage of immutability to improve the
memory usage and performance of programs.  It's safe (or ought to be
safe) to assume that immutable record fields are never updated, so the
values used to initialize the fields can be propagated to other parts
of the program.  Here's a small example that shows the change in
behaviour between 4.01 and 4.02.

   type t = { s : string }
   let x = { s =  "one" }
   let () = Obj.(set_field (repr x) 0 (repr "two"))
   let () = print_endline x.s

Using OCaml 4.01 the third line overwrites the field 's'  and the
fourth line reads the updated field and prints "two".  Using OCaml
4.02 the initial value of the field is propagated past the write to
the code in the fourth line, so the program prints "one".

The code currently generated by atdgen assumes that it's safe to treat
fields as if they were mutable -- that is, it assumes that it's safe
to initialize a field with a value of the wrong type, so long as the
value is overwritten before the field is first read.  I don't think
such tricks were ever explicitly guaranteed to work, but they're now
much more likely to fail, leading to the dummy initial value being
accessed at an inappropriate type.
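
For comparison, a small sketch of the well-defined version of the example
above: when the field is declared `mutable`, the compiler cannot assume
its initial value is still current, so the update is visible to later
reads. This is only an illustration of the contrast, not a proposed fix for
atdgen.

```ocaml
(* Same record as in the example, but with the field declared mutable. *)
type t = { mutable s : string }

let x = { s = "one" }

(* An ordinary, type-safe field update; no Obj functions are involved. *)
let () = x.s <- "two"

(* Reads the updated field and prints "two". *)
let () = print_endline x.s
```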


* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06  0:12           ` Jeremy Yallop
@ 2014-09-06  5:51             ` Martin Jambon
  2014-09-06  6:00               ` Milan Stanojević
  2014-09-07 18:47               ` Alain Frisch
  0 siblings, 2 replies; 25+ messages in thread
From: Martin Jambon @ 2014-09-06  5:51 UTC (permalink / raw)
  To: Jeremy Yallop; +Cc: Caml List

On Fri 05 Sep 2014 05:12:44 PM PDT, Jeremy Yallop wrote:
> On 6 September 2014 00:39, Martin Jambon <martin.jambon@ens-lyon.org> wrote:
>> That code is generated by atdgen. What happens is that we have to either
>> create an empty record when starting to parse a list of unordered JSON
>> fields, or use a bunch `let <field name> = ref None in` for each field and
>> create the record in the end. While the latter approach is not much more
>> work to implement, the resulting code was found to be significantly slower.
>>
>> The reason why it's using `Obj.magic 0.0` is that it worked in all cases
>> (and has been for the past 4 years). Obtaining a well-formed constant value
>> for any type is not trivial, so this what we have.
>>
>> It's very possible that it's now broken with OCaml 4.02. First try a 'make
>> test' from atdgen's source directory (https://github.com/mjambon/atdgen) and
>> see if it passes.
>
> It does seem to be broken, and the change in behaviour with 4.02 is
> apparently due to improved constant propagation
> (http://caml.inria.fr/mantis/view.php?id=5779).
>
> The compiler now takes more advantage of immutability to improve the
> memory usage and performance of programs.  It's safe (or ought to be
> safe) to assume that immutable record fields are never updated, so the
> values used to initialize the fields can be propagated to other parts
> of the program.  Here's a small example that shows the change in
> behaviour between 4.01 and 4.02.
>
>     type t = { s : string }
>     let x = { s =  "one" }
>     let () = Obj.(set_field (repr x) 0 (repr "two"))
>     let () = print_endline x.s
>
> Using OCaml 4.01 the third line overwrites the field 's'  and the
> fourth line reads the updated field and prints "two".  Using OCaml
> 4.02 the initial value of the field is propagated past the write to
> the code in the fourth line, so the program prints "one".
>
> The code currently generated by atdgen assumes that it's safe to treat
> fields as if they were mutable -- that is, it assumes that it's safe
> to initialize a field with a value of the wrong type, so long as the
> value is overwritten before the field is first read.  I don't think
> such tricks were ever explicitly guaranteed to work, but they're now
> much more likely to fail, leading to the dummy initial value being
> accessed at an inappropriate type.

Thanks for the explanation, Jeremy. I guess atdgen will have to use 
"option refs" after all unless someone has a better idea.

ATD definition:

type t = {
  ?field0: foo option;
  ~field1: string;
  field2: int;
}

Generated OCaml code:

let field0 = ref None in
let field1 = ref "" in
let field2 = ref None in
...
(* parse json fields coming in an unknown order *)
...
{
  field0 = !field0;
  field1 = !field1;
  field2 = (match !field2 with None -> error ... | Some x -> x);
}
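
A self-contained sketch of the option-ref approach outlined above. It is
not atdgen's actual output: the `value` type, the `Missing_field` exception,
the function `of_fields`, and the use of `int option` for `field0` are all
assumptions made so the example runs on its own, with a list of
(name, value) pairs standing in for the JSON lexer.

```ocaml
(* Hypothetical record standing in for an atdgen-generated type. *)
type t = { field0 : int option; field1 : string; field2 : int }

(* Hypothetical simplified input: already-tokenized field values. *)
type value = Int of int | String of string

exception Missing_field of string

let of_fields (fields : (string * value) list) : t =
  (* One ref per field: optional and defaulted fields start at their
     default, required fields start at None. *)
  let field0 = ref None in
  let field1 = ref "" in
  let field2 = ref None in
  (* Fields may arrive in any order. *)
  List.iter
    (fun (name, v) ->
      match name, v with
      | "field0", Int i -> field0 := Some i
      | "field1", String s -> field1 := s
      | "field2", Int i -> field2 := Some i
      | _ -> () (* unknown or ill-typed fields are ignored in this sketch *))
    fields;
  (* The record is built once, at the end, with well-typed values only. *)
  {
    field0 = !field0;
    field1 = !field1;
    field2 =
      (match !field2 with
       | None -> raise (Missing_field "field2")
       | Some x -> x);
  }
```

The record is never observed in a partially initialized state, at the cost
of one ref per field and an extra match for each required field.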



* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06  5:51             ` Martin Jambon
@ 2014-09-06  6:00               ` Milan Stanojević
  2014-09-06  7:46                 ` Frédéric Bour
  2014-09-06 19:08                 ` Martin Jambon
  2014-09-07 18:47               ` Alain Frisch
  1 sibling, 2 replies; 25+ messages in thread
From: Milan Stanojević @ 2014-09-06  6:00 UTC (permalink / raw)
  To: Martin Jambon; +Cc: Jeremy Yallop, Caml List

Could you do a dirty trick where you define a record that is the same
as the one you have now except that it has mutable fields, then you do
your parsing like now and then at the end return a record with
immutable fields (using (Obj.magic mutable : immutable))? You just need
to make sure that your mutable record doesn't escape your code.
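
A rough sketch of that idea, under some assumptions: the type names, the field names and the string-pair input are invented for illustration, and the two record types must declare the same fields in the same order so that they share a runtime layout.

```ocaml
type t = { host : string; port : int }

(* Private twin of [t]: same fields, same order, all mutable. *)
type t_mutable = { mutable m_host : string; mutable m_port : int }

let read_t (fields : (string * string) list) : t =
  (* Well-typed placeholders are used here; tracking of missing
     required fields is left out of this sketch. *)
  let r = { m_host = ""; m_port = 0 } in
  List.iter
    (fun (name, v) ->
       match name with
       | "host" -> r.m_host <- v
       | "port" -> r.m_port <- int_of_string v
       | _ -> ())
    fields;
  (* Unchecked cast: [r] is not used again after this point. *)
  (Obj.magic r : t)
```

All writes go through fields the compiler knows to be mutable, and the immutable view only appears once the record is complete.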

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06  6:00               ` Milan Stanojević
@ 2014-09-06  7:46                 ` Frédéric Bour
  2014-09-06 19:15                   ` Martin Jambon
  2014-09-06 19:08                 ` Martin Jambon
  1 sibling, 1 reply; 25+ messages in thread
From: Frédéric Bour @ 2014-09-06  7:46 UTC (permalink / raw)
  To: caml-list

Since it is generated code and offsets of the fields are already known,
Obj.new_block / Obj.set_field / Obj.repr could do it too.
The final cast back (Obj.obj) will be the only point where the typed
version is introduced and it won't receive any mutation after that, so
this should be enough to make it work.

Of course, this might not be robust against future changes.
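
To make the suggestion concrete, a minimal sketch follows. The record type, the string-pair input and the `seen` bitmask are assumptions for illustration; the sketch also assumes the record is not an all-float record, so it is an ordinary block with tag 0.

```ocaml
type t = { host : string; port : int } (* not an all-float record *)

let read_t (fields : (string * string) list) : t =
  (* Tag 0, two fields; offsets follow declaration order. *)
  let block = Obj.new_block 0 2 in
  let seen = ref 0 in
  List.iter
    (fun (name, v) ->
       match name with
       | "host" ->
         Obj.set_field block 0 (Obj.repr v);
         seen := !seen lor 1
       | "port" ->
         Obj.set_field block 1 (Obj.repr (int_of_string v));
         seen := !seen lor 2
       | _ -> ())
    fields;
  (* Refuse to hand out a block with unset fields. *)
  if !seen <> 3 then failwith "missing field";
  (* The only point where the typed view of the block is introduced. *)
  (Obj.obj block : t)
```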

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06  6:00               ` Milan Stanojević
  2014-09-06  7:46                 ` Frédéric Bour
@ 2014-09-06 19:08                 ` Martin Jambon
  2014-09-06 20:31                   ` David MENTRÉ
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Jambon @ 2014-09-06 19:08 UTC (permalink / raw)
  To: Milan Stanojević; +Cc: Jeremy Yallop, Caml List

On 09/05/2014 11:00 PM, Milan Stanojević wrote:
> Could you do a dirty trick where you define a record that is the same
> as the one you have now except that it has mutable fields, then you do
> your parsing like now and then at the end return a record with
> immutable fields (using (Obj.magic mutable : immutable))? You just need
> to make sure that your mutable record doesn't escape your code.

I remember considering this solution in the past. It seems it would work.

I must say there's a strange feeling about creating more type 
definitions while breaking the type system.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06  7:46                 ` Frédéric Bour
@ 2014-09-06 19:15                   ` Martin Jambon
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Jambon @ 2014-09-06 19:15 UTC (permalink / raw)
  To: caml-list

On 09/06/2014 12:46 AM, Frédéric Bour wrote:
> Since it is generated code and offsets of the fields are already known,
> Obj.new_block / Obj.set_field / Obj.repr could do it too.
> The final cast back (Obj.obj) will be the only point where the typed
> version is introduced and it won't receive any mutation after that, so
> this should be enough to make it work.

I'd love to do that. The reason why it's not implemented this way right 
now is that all-float records use a different representation than other 
records. Atdgen would need to know when such a representation is chosen 
by the OCaml compilers. This may be a problem with field types that are 
abstract to atdgen but not to OCaml. I have to think about it.
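
The representation difference can be observed directly. In the small sketch below (type names invented for illustration), a record whose fields are all floats gets the flat `Obj.double_array_tag` representation, and this still happens when the float type hides behind an alias that a code generator might not see through.

```ocaml
type mixed = { name : string; weight : float }
type all_float = { x : float; y : float }

type meters = float (* an alias that OCaml expands *)
type aliased = { a : meters; b : meters }

let mixed_tag = Obj.tag (Obj.repr { name = "a"; weight = 1.0 })
let float_tag = Obj.tag (Obj.repr { x = 1.0; y = 2.0 })
let aliased_tag = Obj.tag (Obj.repr { a = 1.0; b = 2.0 })

let () =
  Printf.printf "mixed: %d, all-float: %d, aliased: %d (double_array_tag: %d)\n"
    mixed_tag float_tag aliased_tag Obj.double_array_tag
```

A block built by hand with tag 0 would therefore have the wrong layout for `all_float` and `aliased`.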


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06 19:08                 ` Martin Jambon
@ 2014-09-06 20:31                   ` David MENTRÉ
  2014-09-06 21:57                     ` Martin Jambon
  0 siblings, 1 reply; 25+ messages in thread
From: David MENTRÉ @ 2014-09-06 20:31 UTC (permalink / raw)
  To: caml-list

Hello Martin,

[ Disclaimer: I know nothing about atdgen. ]

2014-09-06 21:08, Martin Jambon:
> I must say there's a strange feeling about creating more type
> definitions while breaking the type system.

Apparently, such tricks are used for speed reasons. Why not provide two 
ways to generate code: one safe using only regular OCaml and one using 
Obj tricks? That way, your users could check in case they have an issue 
and fall back to safer code until a proper fix is found.

Nonetheless, I still do think that having correct code is more important 
than having fast code, whatever the speed ratio between the two.

Best regards,
david


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06 20:31                   ` David MENTRÉ
@ 2014-09-06 21:57                     ` Martin Jambon
  2014-09-07  7:34                       ` David MENTRÉ
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Jambon @ 2014-09-06 21:57 UTC (permalink / raw)
  To: David MENTRÉ; +Cc: caml-list

On Sat 06 Sep 2014 01:31:53 PM PDT, David MENTRÉ wrote:
> Hello Martin,
>
> [ Disclaimer: I know nothing about atdgen. ]
>
> 2014-09-06 21:08, Martin Jambon:
>> I must say there's a strange feeling about creating more type
>> definitions while breaking the type system.
>
> Apparently, such tricks are used for speed reasons. Why not provide
> two ways to generate code: one safe using only regular OCaml and one
> using Obj tricks? That way, your users could check in case they have
> an issue and fall back to safer code until a proper fix is found.
>
> Nonetheless, I still do think that having correct code is more
> important than having fast code, whatever the speed ratio between
> the two.

I like the idea. I'm a bit concerned about maintainability and 
continuous testing, though, since this would effectively split the user 
base between those using the safe option and those using the fast 
option.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06 21:57                     ` Martin Jambon
@ 2014-09-07  7:34                       ` David MENTRÉ
  0 siblings, 0 replies; 25+ messages in thread
From: David MENTRÉ @ 2014-09-07  7:34 UTC (permalink / raw)
  To: caml-list

Hello Martin,

2014-09-06 23:57, Martin Jambon:
> I'm a bit concerned about maintainability and continuous testing,
> though, since this would effectively split the user base between those
> using the safe option and those using the fast option.

Having both options would help continuous testing as you can compare 
the results of both versions and flag any difference between them. They 
should have the same functional behavior; only speed should differ. 
Factorizing tests for the two versions is left as an exercise for the 
reader. ;-)
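
As a starting point, a comparison harness could look like the sketch below. The two readers are trivial hypothetical stand-ins (they are not atdgen output); only the `outcome` / `disagreements` pair is the point.

```ocaml
(* Run a reader and capture its result, mapping any exception to None. *)
let outcome reader input = try Some (reader input) with _ -> None

(* Return the inputs on which the two readers disagree. *)
let disagreements ~safe ~fast inputs =
  List.filter (fun input -> outcome safe input <> outcome fast input) inputs

(* Hypothetical stand-ins for the two generated readers. *)
let safe_reader s = int_of_string s
let fast_reader s = int_of_string (String.trim s) (* differs on padded input *)

let () =
  disagreements ~safe:safe_reader ~fast:fast_reader [ "42"; " 42"; "x" ]
  |> List.iter (Printf.printf "readers disagree on %S\n")
```

Any input reported by `disagreements` points at a behavioral difference between the safe and the fast code paths.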

Best regards,
david


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-06  5:51             ` Martin Jambon
  2014-09-06  6:00               ` Milan Stanojević
@ 2014-09-07 18:47               ` Alain Frisch
  2014-09-08  1:28                 ` Martin Jambon
  1 sibling, 1 reply; 25+ messages in thread
From: Alain Frisch @ 2014-09-07 18:47 UTC (permalink / raw)
  To: Martin Jambon, Jeremy Yallop; +Cc: Caml List

On 9/6/2014 7:51 AM, Martin Jambon wrote:
> Thanks for the explanation, Jeremy. I guess atdgen will have to use
> "option refs" after all unless someone has a better idea.

I might be missing some context, but the current code seems to be playing 
two different tricks with the type system:  using (Obj.magic 0.) as a 
dummy initial default value (to avoid references) and mutating normally 
immutable fields with Obj.set_field.  Is that right?  You might be able 
to keep the first trick while storing the values in local references 
instead of fields of the target record (if those references don't 
escape from the function, they will be represented as local mutable 
variables, whose mutation might actually be more efficient than that of 
the record fields), and building the target record at the end by reading 
from those references.
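
A minimal sketch of this scheme, with several assumptions: the `json_value` type, the field names and the `seen` bitmask are invented for illustration, and `Sys.opaque_identity` (available since OCaml 4.03, so not an option at the time of this thread) is added only to keep the optimizer from reasoning about the dummy value. The loop is written with `while` so that the references are not captured by a closure.

```ocaml
(* Hypothetical stand-in for already-lexed JSON values. *)
type json_value = Int of int | String of string

type t = { host : string; port : int }

let read_t (fields : (string * json_value) list) : t =
  (* Dummies of the wrong type: never read unless the matching bit is set. *)
  let host : string ref = ref (Obj.magic (Sys.opaque_identity 0.0)) in
  let port : int ref = ref (Obj.magic (Sys.opaque_identity 0.0)) in
  let seen = ref 0 in
  let rest = ref fields in
  let continue = ref true in
  while !continue do
    match !rest with
    | [] -> continue := false
    | (name, v) :: tl ->
      rest := tl;
      (match name, v with
       | "host", String s -> host := s; seen := !seen lor 1
       | "port", Int i -> port := i; seen := !seen lor 2
       | _ -> ())
  done;
  (* Fail before any dummy value could be read. *)
  if !seen <> 3 then failwith "missing field";
  { host = !host; port = !port }
```

The target record is allocated once, at the end, from values that are all known to be properly initialized.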


-- Alain

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-07 18:47               ` Alain Frisch
@ 2014-09-08  1:28                 ` Martin Jambon
  2014-09-13 10:26                   ` Martin Jambon
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Jambon @ 2014-09-08  1:28 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Caml List, Jeremy Yallop

On Sun 07 Sep 2014 11:47:43 AM PDT, Alain Frisch wrote:
> On 9/6/2014 7:51 AM, Martin Jambon wrote:
>> Thanks for the explanation, Jeremy. I guess atdgen will have to use
>> "option refs" after all unless someone has a better idea.
>
> I might be missing some context, but the current code seems to be playing
> two different tricks with the type system:  using (Obj.magic 0.) as a
> dummy initial default value (to avoid references) and mutating
> normally immutable fields with Obj.set_field.  Is that right?

Yes, exactly.

>  You
> might be able to keep the first trick while storing the values in local
> references instead of fields of the target record (if those
> references don't escape from the function, they will be represented as
> local mutable variables, whose mutation might actually be more
> efficient than that of the record fields), and building the target
> record at the end by reading from those references.

Christophe Troestler also suggested this solution in a private reply. I 
was afraid of the cost of the refs, so it's great to know that they're 
optimized away.

I want to thank everyone for their suggestions. It's very helpful.

Martin


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-08  1:28                 ` Martin Jambon
@ 2014-09-13 10:26                   ` Martin Jambon
  2014-09-14  7:41                     ` Martin Jambon
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Jambon @ 2014-09-13 10:26 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Caml List, Jeremy Yallop

I think we now have a working implementation for json readers in atdgen 
(https://github.com/mjambon/atdgen/tree/fix-magic-ocaml402).

However, atdgen also supports a binary format called biniou (Intro: 
http://mjambon.com/biniou.html Spec: 
https://github.com/mjambon/biniou/blob/master/biniou-format.txt). It is 
broken for the same reason as the JSON readers, but it is trickier to 
fix. Biniou allows sharing records and cyclic data. So far readers have 
been working by first creating a record with uninitialized fields, and 
then by proceeding with reading the field values. If a reference to the 
record is found, we just use the record that we already created, even 
if some of its fields are still unset.

The problem here is that we need to create the record before reading 
its fields. Perhaps we could use the solution proposed by Milan 
Stanojević earlier in this thread: create a private record type with 
mutable fields and, at the end, convert it to the normal type using 
Obj.magic.
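
To illustrate why the record has to exist first, here is a small sketch of a self-referencing value built with a mutable twin type. Everything in it (the `node` types, `make_cycle`) is hypothetical and unrelated to the real biniou reader; it only shows the allocate, alias, then fill order.

```ocaml
type node = { label : string; next : node option }

(* Private twin of [node]: same fields, same order, all mutable. *)
type node_m = { mutable m_label : string; mutable m_next : node option }

(* Build a one-element cycle. The node is allocated before its fields
   are known so that its own [next] field can point back at it. *)
let make_cycle (label : string) : node =
  let r = { m_label = ""; m_next = None } in
  (* What a back-reference to this record would resolve to. *)
  let shared : node = Obj.magic r in
  r.m_label <- label;
  r.m_next <- Some shared;
  shared
```

Note that `shared` aliases the record while it is still being filled, which is exactly the situation where the "doesn't escape" condition is hard to guarantee.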

... Or we could just drop support for some features, hence the 
following questions:

1. Does anyone use biniou at all?
2. Does anyone use biniou with cyclic data?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] segfault in simple program with 4.02 native
  2014-09-13 10:26                   ` Martin Jambon
@ 2014-09-14  7:41                     ` Martin Jambon
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Jambon @ 2014-09-14  7:41 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Caml List, Jeremy Yallop

atdgen 1.4.0 is out (soon in opam). It should work with OCaml 4.02 for 
both json and biniou.

I dropped support for sharing, so no more support for serialization of 
cyclic data using biniou.


Martin


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread

Thread overview: 25+ messages
2014-09-05 21:33 [Caml-list] segfault in simple program with 4.02 native Ashish Agarwal
2014-09-05 21:50 ` Andy Ray
2014-09-05 21:56 ` Richard W.M. Jones
2014-09-05 22:01   ` Sebastien Mondet
2014-09-05 22:06   ` Ashish Agarwal
2014-09-05 22:13     ` Richard W.M. Jones
2014-09-05 22:18       ` Richard W.M. Jones
2014-09-05 22:36         ` Török Edwin
2014-09-05 22:39         ` Martin Jambon
2014-09-05 23:39           ` Ashish Agarwal
2014-09-05 23:59             ` Martin Jambon
2014-09-06  0:12           ` Jeremy Yallop
2014-09-06  5:51             ` Martin Jambon
2014-09-06  6:00               ` Milan Stanojević
2014-09-06  7:46                 ` Frédéric Bour
2014-09-06 19:15                   ` Martin Jambon
2014-09-06 19:08                 ` Martin Jambon
2014-09-06 20:31                   ` David MENTRÉ
2014-09-06 21:57                     ` Martin Jambon
2014-09-07  7:34                       ` David MENTRÉ
2014-09-07 18:47               ` Alain Frisch
2014-09-08  1:28                 ` Martin Jambon
2014-09-13 10:26                   ` Martin Jambon
2014-09-14  7:41                     ` Martin Jambon
2014-09-05 22:18       ` Christoph Höger
