caml-list - the Caml user's mailing list
* ocaml+twt v0.90
@ 2007-01-16 20:48 Mike Lin
From: Mike Lin @ 2007-01-16 20:48 UTC
  To: caml-list

I just posted a new version of ocaml+twt, a preprocessor that lets you use
indentation to avoid multi-line parenthesization (like Python or Haskell).

http://people.csail.mit.edu/mikelin/ocaml+twt

This version introduces a major backwards-incompatible change: the
eradication of "in" from let expressions, and the need to indent the let
body (as suggested by the F# lightweight syntax). This reduces the
familiar phenomenon of long function bodies getting progressively more
indented as they go along. That is, where before you had:

let x = 5 in
  printf "%d\n" x
  let y = x+1 in
    printf  "%d\n" y

You'd now just write:

let x = 5
printf "%d\n" x
let y = x+1
printf "%d\n" y
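For comparison, here is a minimal sketch of the plain OCaml that the twt snippet above should correspond to, assuming the preprocessor restores each "in" and joins the body lines with ";". It prints into a Buffer with Printf.bprintf instead of printf only so the result can be checked; the name example is hypothetical.

```ocaml
(* Hypothetical plain-OCaml reading of the twt snippet: each "let" line
   gets an explicit "in", and the lines after it become its body,
   joined by ";".  Output goes to a Buffer instead of stdout. *)
let example (out : Buffer.t) : unit =
  let x = 5 in
  Printf.bprintf out "%d\n" x;
  let y = x + 1 in
  Printf.bprintf out "%d\n" y

let () =
  let buf = Buffer.create 16 in
  example buf;
  print_string (Buffer.contents buf)
```

Note that the scope of y extends to the end of the enclosing body, which is why twt cannot express a statement that follows the let but sits outside its binding.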

I was hesitant to introduce this feature because it's extra hackish in
implementation (even more so than the rest of this house of cards). It also
removes some programmer freedom, because you cannot have the let body on the
same line as the let, and you cannot have a statement sequentially following
the let, outside the scope of the binding. But after playing with it, I
think it is worthwhile. Please let me know what you think. I am still not
completely sure that I haven't broken something profound that will force me
to totally backtrack this change, but let's give it a try. I will obviously
keep the 0.8x versions around for those who prefer them and for existing code
(including a lot of my own).

Standard disclaimer: ocaml+twt is a flagrant, stupendous,
borderline-ridiculous hack, but it works quite well, I write all my new code
using it, and I recommend it if you like this style. On the other hand, if
someone with more free time and knowledge of camlp4 wants to step up, I have
a couple of ideas about how you might do it right...

Mike


* marshaling limits
@ 2007-01-17  9:14 ` Sebastien Ferre
From: Sebastien Ferre @ 2007-01-17  9:14 UTC
  To: caml-list

Hi,

I get a segmentation fault when marshalling
a large data structure. I could produce a file
of ~30MB, but for a larger data structure of
the same kind, I get a seg fault.

Do you know of any limit in the marshalling
functions w.r.t. size ?

Some parts of my data structure are big doubly linked
graphs.

---
Sébastien Ferré


* Re: [Caml-list] marshaling limits
@ 2007-01-17  9:36   ` Olivier Andrieu
From: Olivier Andrieu @ 2007-01-17  9:36 UTC
  To: Sébastien Ferre; +Cc: caml-list

On 1/17/07, Sebastien Ferre <ferre@irisa.fr> wrote:
> Hi,
>
> I get a segmentation fault when marshalling
> a large data structure. I could produce a file
> of ~30MB, but for a larger data structure of
> the same kind, I get a seg fault.
>
> Do you know of any limit in the marshalling
> functions w.r.t. size ?

Indeed, the marshalling/unmarshalling functions can overflow the
execution stack. You could try increasing the maximum stack size for your
process (ulimit -s in a Unix shell).

-- 
  Olivier


* Re: [Caml-list] marshaling limits
@ 2007-01-17 15:33   ` Frédéric Gava
From: Frédéric Gava @ 2007-01-17 15:33 UTC
  To: Sébastien Ferre, caml-list

Hi,

This happens because you go through marshaling, i.e. you convert your
data into a string. But strings have a maximum size (see the Sys module
for the exact value), hence the seg fault.
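The limit referred to here is Sys.max_string_length. A small sketch for inspecting it on a given platform; fits_in_string is a hypothetical helper, and on 64-bit systems the bound is far beyond any realistic file size.

```ocaml
(* Report the largest string the running runtime can allocate.  On 32-bit
   platforms this is just under 16 MB; on 64-bit platforms it is huge. *)
let describe_string_limit () : string =
  Printf.sprintf "word size: %d bits, Sys.max_string_length: %d bytes"
    Sys.word_size Sys.max_string_length

(* Hypothetical helper: would a payload of [n] bytes fit in one string? *)
let fits_in_string (n : int) : bool = n >= 0 && n <= Sys.max_string_length

let () =
  print_endline (describe_string_limit ());
  Printf.printf "30 MB fits in a string: %b\n"
    (fits_in_string (30 * 1024 * 1024))
```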

In my opinion, you should try writing your value directly to the file
with output_value, or else use "ocaml xml" to read/write your data in
XML format (it is much slower, but it will certainly get past the 30 MB
limit).

Regards,
Frédéric Gava

Sebastien Ferre wrote:
> Hi,
> 
> I get a segmentation fault when marshalling
> a large data structure. I could produce a file
> of ~30MB, but for a larger data structure of
> the same kind, I get a seg fault.
> 
> Do you know of any limit in the marshalling
> functions w.r.t. size ?
> 
> Some part of my data structure are big doubly linked
> graphs.
> 
> ---
> Sébastien Ferré
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
> 


* Re: [Caml-list] marshaling limits
@ 2007-01-17 15:41     ` Sebastien Ferre
From: Sebastien Ferre @ 2007-01-17 15:41 UTC
  To: caml-list; +Cc: Frédéric Gava


But I am already calling output_value on a file,
without going through an intermediate string.

Regards,
Sebastien

Frédéric Gava wrote:
> Hi,
> 
> This happens because you go through marshaling, i.e. you convert your
> data into a string. But strings have a maximum size (see the Sys module
> for the exact value), hence the seg fault.
> 
> In my opinion, you should try writing your value directly to the file
> with output_value, or else use "ocaml xml" to read/write your data in
> XML format (it is much slower, but it will certainly get past the 30 MB
> limit).
> 
> Regards,
> Frédéric Gava
> 
> Sebastien Ferre wrote:
> 
>> Hi,
>>
>> I get a segmentation fault when marshalling
>> a large data structure. I could produce a file
>> of ~30MB, but for a larger data structure of
>> the same kind, I get a seg fault.
>>
>> Do you know of any limit in the marshalling
>> functions w.r.t. size ?
>>
>> Some part of my data structure are big doubly linked
>> graphs.
>>
>> ---
>> Sébastien Ferré

* Re: [Caml-list] marshaling limits
@ 2007-01-17 16:12       ` Daniel Bünzli
From: Daniel Bünzli @ 2007-01-17 16:12 UTC
  To: Sebastien Ferre; +Cc: caml-list


On 17 Jan 07, at 16:41, Sebastien Ferre wrote:

> But I am already calling output_value on a file,
> without going through an intermediate string.

Maybe output_value uses a string internally. Try with a bytecode
version of your executable; an exception should be raised (or have a
look at the implementation of output_value).

Best,

Daniel


* Re: [Caml-list] marshaling limits
@ 2007-01-17 16:32         ` Olivier Andrieu
From: Olivier Andrieu @ 2007-01-17 16:32 UTC
  To: Daniel Bünzli; +Cc: Sebastien Ferre, caml-list

On 1/17/07, Daniel Bünzli <daniel.buenzli@epfl.ch> wrote:
>
> On 17 Jan 07, at 16:41, Sebastien Ferre wrote:
>
> > But I am already calling output_value on a file,
> > without going through an intermediate string.
>
> Maybe output_value uses a string internally. Try with a bytecode
> version of your executable, an exception should be raised (or have a
> look at the implementation of output_value).

output_value doesn't use a string internally; it uses malloc. Anyway,
if the marshalling function runs out of memory (whether because malloc
returns NULL or because the caml string is too large), an
Out_of_memory exception is raised.

If it segfaults, that's most probably because the marshalling runs out
of execution stack (because of too much recursion). I've seen it do
this before. The "fix" is to increase the maximum size of the
execution stack.

The behavior is the same with bytecode or native code since it's not
the interpreter's stack that overflows, it's the C one.


* Re: [Caml-list] marshaling limits
@ 2007-01-17 16:34         ` Sebastien Ferre
From: Sebastien Ferre @ 2007-01-17 16:34 UTC
  To: caml-list; +Cc: Daniel Bünzli


Daniel Bünzli wrote:

>> But I am already calling output_value on a file,
>> without going through an intermediate string.
> 
> Maybe output_value uses a string internally. Try with a bytecode  
> version of your executable, an exception should be raised (or have a  
> look at the implementation of output_value).

I used a bytecode version.

I checked the code of output_value, and it uses an internal
string. So it won't work.

Anyway, I knew I would have to go for a more serious
solution as soon as the data got really large. I am thinking of
using something like GDBM.

Thanks for the help.
Sebastien


* Re: [Caml-list] marshaling limits
@ 2007-01-17 19:37           ` Jonathan Roewen
From: Jonathan Roewen @ 2007-01-17 19:37 UTC
  To: Sebastien Ferre; +Cc: caml-list, Daniel Bünzli

I'm sure one of the marshalling functions uses malloc internally. Have
you tried Marshal.to_channel? That _should_ avoid using ocaml strings.
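A minimal sketch of that approach: write directly to a file with Marshal.to_channel and read the data back with Marshal.from_channel. It uses Fun.protect, so it assumes OCaml 4.08 or later. The names save and load are illustrative. Writing to a channel avoids the intermediate string, but, per Olivier's diagnosis, it does not necessarily avoid a C-stack overflow on very deep structures.

```ocaml
(* Write [v] to [path] with Marshal.to_channel.  The empty flag list keeps
   sharing detection on, so cyclic values are handled.  Binary mode
   matters on Windows. *)
let save (path : string) (v : 'a) : unit =
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> Marshal.to_channel oc v [])

(* The result type is not checked at runtime: annotate it at the call
   site and only read files written by a matching [save]. *)
let load (path : string) : 'a =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic)
    (fun () -> Marshal.from_channel ic)

let () =
  let path = Filename.temp_file "marshal_demo" ".bin" in
  let data = List.init 1000 (fun i -> (i, string_of_int i)) in
  save path data;
  let (back : (int * string) list) = load path in
  Printf.printf "round trip ok: %b\n" (back = data);
  Sys.remove path
```

Sharing detection matters for cyclic data such as the doubly linked graphs from the original question. Passing Marshal.No_sharing instead can make marshaling fail to terminate on cycles.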

Jonathan

On 1/18/07, Sebastien Ferre <ferre@irisa.fr> wrote:
>
> Daniel Bünzli wrote:
>
> >> But I am already calling output_value on a file,
> >> without going through an intermediate string.
> >
> > Maybe output_value uses a string internally. Try with a bytecode
> > version of your executable, an exception should be raised (or have a
> > look at the implementation of output_value).
>
> I used a bytecode version.
>
> I checked the code of output_value, and it uses an internal
> string. So it won't work.
>
> Anyway, I knew I would have to go for a more serious
> solution as soon as the data got really large. I am thinking of
> using something like GDBM.
>
> Thanks for the help.
> Sebastien

* Re: [Caml-list] marshaling limits
@ 2007-01-17 19:50           ` Yaron Minsky
From: Yaron Minsky @ 2007-01-17 19:50 UTC
  To: Sebastien Ferre; +Cc: caml-list, Daniel Bünzli

Don't quote me on this, but I believe that marshal uses a string in bytecode
with threads, uses straight malloc with bytecode and no threads, and never
uses strings in native code.  I'm /very/ unsure about that last one, but I
am pretty confident that in some cases, whether it uses strings depends on
whether threads are involved.

y

On 1/17/07, Sebastien Ferre <ferre@irisa.fr> wrote:
>
>
> Daniel Bünzli wrote:
>
> >> But I am already calling output_value on a file,
> >> without going through an intermediate string.
> >
> > Maybe output_value uses a string internally. Try with a bytecode
> > version of your executable, an exception should be raised (or have a
> > look at the implementation of output_value).
>
> I used a bytecode version.
>
> I checked the code of output_value, and it uses an internal
> string. So it won't work.
>
> Anyway, I knew I would have to go for a more serious
> solution as soon as the data got really large. I am thinking of
> using something like GDBM.
>
> Thanks for the help.
> Sebastien

* Re: [Caml-list] marshaling limits
@ 2007-01-17 22:51             ` Markus Mottl
From: Markus Mottl @ 2007-01-17 22:51 UTC
  To: Yaron Minsky; +Cc: Sebastien Ferre, caml-list, Daniel Bünzli

On 1/17/07, Yaron Minsky <yminsky@cs.cornell.edu> wrote:
>
> Don't quote me on this, but I believe that marshal uses a string in
> bytecode with threads, uses straight malloc with bytecode and no threads,
> and never uses strings in native code.  I'm /very/ unsure about that last
> one, but I am pretty confident that in some cases, whether it uses strings
> depends on whether threads are involved.
>

I think the question is more along the lines of "bytecode threads vs.
native (e.g. POSIX) threads" rather than "bytecode vs. native code".
It's true that bytecode threads, which can naturally only be used with
bytecode, require an intermediate copy step to OCaml strings if you
want to write to channels. That's bad on 32-bit platforms due to the
size limitations on strings (< 16MB).

I'd recommend using Bigarrays of characters to marshal out data in cases
where OCaml-strings don't suffice.  The code for this is extremely simple:

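  #include <caml/mlvalues.h>
  #include <caml/intext.h>    /* caml_output_value_to_malloc */
  #include <caml/bigarray.h>  /* caml_ba_alloc, CAML_BA_* flags */

  /* Marshal v into a malloc'ed buffer and wrap it in a 1-D bigarray of
     bytes; CAML_BA_MANAGED lets the GC free the buffer.  These are the
     current runtime names (older releases: alloc_bigarray, BIGARRAY_*). */
  CAMLprim value bigstring_marshal_stub(value v, value v_flags)
  {
    char *buf;
    intnat len;
    int alloc_flags = CAML_BA_UINT8 | CAML_BA_C_LAYOUT | CAML_BA_MANAGED;
    caml_output_value_to_malloc(v, v_flags, &buf, &len);
    return caml_ba_alloc(alloc_flags, 1, buf, &len);
  }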

The signature of the OCaml function is:

  type t = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

  external marshal : 'a -> Marshal.extern_flags list -> t =
    "bigstring_marshal_stub"

where type "t" is a bigarray of characters with C layout.

You can even do without the intermediate copying if you know the maximum
size of the marshalled data and preallocate a bigarray for that.  Use
"caml_output_value_to_block" for that purpose.  It's defined in
"byterun/extern.c" of the OCaml-distribution.
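Without writing any C, the standard library's Marshal.to_buffer follows the same preallocate-and-fill pattern. It fills a bytes value rather than a bigarray, however, so on 32-bit platforms the string size limit still applies. A minimal sketch; marshal_into is a hypothetical helper name.

```ocaml
(* Marshal [v] into the caller-provided [buf].  Marshal.to_buffer raises
   Failure when the data does not fit, so that case is mapped to None;
   otherwise the number of bytes written is returned. *)
let marshal_into (buf : bytes) (v : 'a) : int option =
  try Some (Marshal.to_buffer buf 0 (Bytes.length buf) v [])
  with Failure _ -> None

let () =
  let buf = Bytes.create 4096 in
  match marshal_into buf [ 1; 2; 3 ] with
  | None -> print_endline "buffer too small"
  | Some n ->
    let (back : int list) = Marshal.from_bytes buf 0 in
    Printf.printf "wrote %d bytes, read back %d elements\n" n
      (List.length back)
```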

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


* Re: [Caml-list] marshaling limits
@ 2007-01-18  8:14           ` Sebastien Ferre
From: Sebastien Ferre @ 2007-01-18  8:14 UTC
  To: caml-list; +Cc: Olivier Andrieu, Daniel Bünzli


Olivier Andrieu wrote:
> On 1/17/07, Daniel Bünzli <daniel.buenzli@epfl.ch> wrote:
> 
>>
>> On 17 Jan 07, at 16:41, Sebastien Ferre wrote:
>>
>> > But I am already calling output_value on a file,
>> > without going through an intermediate string.
>>
>> Maybe output_value uses a string internally. Try with a bytecode
>> version of your executable, an exception should be raised (or have a
>> look at the implementation of output_value).

> If it segfaults, that's most probably because the marshalling runs out
> of execution stack (because of too much recursion). I've seen it do
> this before. The "fix" is to increase the maximum size of the
> execution stack.

Indeed, you're right.
I was able to solve the problem with the 'ulimit -s' command.

> The behavior is the same with bytecode or native code since it's not
> the interpreter's stack that overflows, it's the C one.

I didn't know about this C stack.
How can I estimate the size it needs?
Is it related to the depth of the data structures
being marshaled?

Thanks!

Sébastien


* Re: [Caml-list] ocaml+twt v0.90
@ 2007-01-23 20:43 ` Ingo Bormuth
From: Ingo Bormuth @ 2007-01-23 20:43 UTC
  To: caml-list, mikelin

On 2007-01-16 15:48, Mike Lin wrote:
> This version introduces a major backwards-incompatible change: the
> eradication of "in" from let expressions, and the need to indent the let
> body (as suggested by the F# lightweight syntax). 

I downloaded the new version a few days ago and immediately fell in love
with the compact syntax. In my opinion it feels much more natural.
I especially noticed that it took me more effort to convert old
ocaml+twt code (lots of semantically relevant indentation changes) than
it did to convert vanilla OCaml code (essentially s/ *\( in\|;\)$//g
plus some optional parentheses removal).

> I was hesitant to introduce this feature because it's extra hackish in
> implementation (even moreso than the rest of this house of cards). It also
> removes some programmer freedom, because you cannot have the let body on the
> same line as the let, and you cannot have a statement sequentially following
> the let, outside the scope of the binding. 

A let body beginning in the first line is no problem if you add an
additional semicolon:

let print x y = print_string x ;   (* <-- note the semicolon *)
  print_string " "
  print_string y
print "Hello" "World"
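A sketch of the same definition in standard OCaml, assuming twt reads the indented lines after the trailing semicolon as the rest of the body. The extra buf parameter is not part of the example above. It is added only so the output can be checked.

```ocaml
(* Plain-OCaml reading of the twt "print" example: the body is a
   ';'-separated sequence, and the call sits in a top-level let (). *)
let print (buf : Buffer.t) (x : string) (y : string) : unit =
  Buffer.add_string buf x;
  Buffer.add_string buf " ";
  Buffer.add_string buf y

let () =
  let buf = Buffer.create 16 in
  print buf "Hello" "World";
  print_endline (Buffer.contents buf)
```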


If you need a binding in a private scope, you can easily declare and use
it inside a 'let _ =' block:

let x = 5
printf "%d\n" x
let _ =
  let y = x+1
  printf "%d\n" y
printf "no y here"
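Here is the equivalent scoping in standard OCaml, as a hedged sketch. Like the earlier examples, it writes into a Buffer rather than stdout, and scoped is a hypothetical name. The binding y exists only inside the inner let _ = ... in block.

```ocaml
(* Plain-OCaml reading of the twt scoping example: y is bound only
   inside the inner "let _ = ... in" block. *)
let scoped (out : Buffer.t) : unit =
  let x = 5 in
  Printf.bprintf out "%d\n" x;
  let _ =
    let y = x + 1 in
    Printf.bprintf out "%d\n" y
  in
  (* y is no longer in scope here *)
  Printf.bprintf out "no y here"

let () =
  let buf = Buffer.create 32 in
  scoped buf;
  print_endline (Buffer.contents buf)
```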


I ran into some minor problems due to ocaml+twt not recognizing the
object-related syntax. As I personally use objects only rarely, I
ended up just putting the critical section on one long line.

I suggest implementing the '#light' pragma (as in F#), which would allow
switching indentation awareness on and off on the fly. This would also
enable me to replace all OCaml compilers with wrappers that call ocaml+twt
implicitly. If you want, I can prepare a little patch.

Thanks for your effort -- keep it up

  Ingo



-- 
Ingo Bormuth, voicebox & fax: +49-(0)-12125-10226517
public key 86326EC9, http://ibormuth.efil.de/contact


* Re: [Caml-list] ocaml+twt v0.90
@ 2007-01-24 16:09     ` Ingo Bormuth
From: Ingo Bormuth @ 2007-01-24 16:09 UTC
  To: Mike Lin; +Cc: caml-list

On 2007-01-23 16:22, Mike Lin wrote:
> Do you have any examples of this lying around? Objects are "supposed" to
> work, although I have not tested it in any project of appreciable size. I
> definitely want to fix it where it is broken.

You're right. I isolated the problem to the following piece of code:

let _x = ref 0
_x := 1

ocaml+twt complains: 'syntax error at line 2'

I think you should add a '_' to the regular expression for identifiers 
in line 218 of ocaml+twt.ml.

Sorry for the false alarm about object orientation (in my code I
had 'val __dbg' inside a class definition).

Anyway I'd regard the #light pragma as very desirable.


Ingo


-- 
Ingo Bormuth, voicebox & fax: +49-(0)-12125-10226517
public key 86326EC9, http://ibormuth.efil.de/contact


Thread overview: 14+ messages
2007-01-16 20:48 ocaml+twt v0.90 Mike Lin
2007-01-17  9:14 ` marshaling limits Sebastien Ferre
2007-01-17  9:36   ` [Caml-list] " Olivier Andrieu
2007-01-17 15:33   ` Frédéric Gava
2007-01-17 15:41     ` Sebastien Ferre
2007-01-17 16:12       ` Daniel Bünzli
2007-01-17 16:32         ` Olivier Andrieu
2007-01-18  8:14           ` Sebastien Ferre
2007-01-17 16:34         ` Sebastien Ferre
2007-01-17 19:37           ` Jonathan Roewen
2007-01-17 19:50           ` Yaron Minsky
2007-01-17 22:51             ` Markus Mottl
2007-01-23 20:43 ` [Caml-list] ocaml+twt v0.90 Ingo Bormuth
     [not found]   ` <2a1a1a0c0701231322h48e3af00m9f07371f236fe7c@mail.gmail.com>
2007-01-24 16:09     ` Ingo Bormuth