caml-list - the Caml user's mailing list
From: "Basile Starynkevitch [local]" <basile.starynkevitch@inria.fr>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Gripes with array
Date: Thu, 9 Sep 2004 14:08:45 +0200
Message-ID: <20040909120845.GA26938@bourg.inria.fr>
In-Reply-To: <20040909.110850.59463842.oandrieu@nerim.net>

On Thu, Sep 09, 2004 at 11:08:50AM +0200, Olivier Andrieu wrote:
>  Richard Jones [Thu, 9 Sep 2004]:
>  > On Thu, Sep 09, 2004 at 09:17:25AM +0200, Jean-Christophe Filliatre wrote:
>  > > 
>  > > Jon Harrop writes:
>  > >  > 
>  > >  > Does anyone have any pointers to information about the origin of the size 
>  > >  > limit for arrays? [....]
>  > > 
>  > > In ocaml sources, the  file byterun/mlvalues.h gives all details about
>  > > the  block  header structure.  [....]
>  > > 
>  > > But I must  agree with you: this is definitely too  small and we could
>  > > imagine  that, when the  tag says  a block  is an  array, the  size is
>  > > stored within the first (or the last) field instead.

I do agree that using Bigarrays is the way to go, until everyone has a
64-bit machine. For completeness, we could also consider the
following scheme (which I am NOT volunteering to implement!):

   The header layout remains the same (so only 22 bits for the size
   on 32-bit machines), but if the size field is all one bits, the
   block is actually a fixed block, and the real array size is stored
   in the word before the header.
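
To make the scheme concrete, here is a minimal sketch. It assumes the
32-bit header layout of byterun/mlvalues.h (22 bits of size in words,
2 bits of GC color, 8 bits of tag). The names make_header, real_wosize
and the int array standing in for memory are hypothetical, and the
"escape" decoding is only the idea described above, not something the
runtime does. It is meant to be run on a 64-bit OCaml, where an int is
wide enough to hold a 32-bit header.

```ocaml
(* Sketch of the 32-bit block header: size (22 bits) | color (2) | tag (8).
   Assumes a 64-bit OCaml so that a 32-bit header fits in an int. *)
let wosize_bits = 22
let max_wosize = (1 lsl wosize_bits) - 1        (* 4_194_303 words *)

(* Pack the three fields into one header word. *)
let make_header ~wosize ~color ~tag =
  (wosize lsl 10) lor (color lsl 8) lor tag

(* Unpack each field. *)
let tag_of hd = hd land 0xff
let color_of hd = (hd lsr 8) land 0x3
let wosize_field hd = (hd lsr 10) land max_wosize

(* Hypothetical escape: [mem] models memory as an int array and
   [hd_index] is the position of the header word. An all-ones size
   field means "the real size is in the word just before the header". *)
let real_wosize mem hd_index =
  let field = wosize_field mem.(hd_index) in
  if field = max_wosize then mem.(hd_index - 1) else field

(* Limits that follow from the 22-bit field on a 32-bit machine. *)
let max_array_length_32 = max_wosize              (* elements *)
let max_string_length_32 = (4 * max_wosize) - 1   (* 16_777_211 bytes *)
```

The last two values are where the roughly 4 million element array
limit and the 16 MB string limit on 32-bit machines come from.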

However, I see another potential problem (which could already
appear today, with standard OCaml 3.08, on 64-bit machines with more
than 1 GB of RAM). If you have a huge array of pointers, the garbage
collector (even the minor one) has to scan the full array, and this
scan is "atomic" in the sense that it is not interruptible (and I
believe that designing a GC which incrementally scans big values by
chunks is not trivial, given OCaml's GC performance needs). So for an
array of, say, 300 million pointers, the GC has to scan it, which
takes a significant time (I would guess several tenths of a second at
least, just for scanning this single array).

So I am asking: have lucky people with 64-bit machines and plenty of
RAM experienced any bizarre GC behavior when handling such monster
pointer arrays in OCaml? For example:
   let monster =  Array.init 300_000_000 (fun i -> ((Printf.sprintf "int%d" i), i))

More practically, I would be curious to hear from people who have run
OCaml programs (on rather big 64-bit hardware, with multi-gigabyte
RAM) in processes with more than a gigabyte of OCaml heap! Does the GC
work well in its default setting, or do they have to tune it?
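
For anyone who wants to try, a rough way to measure this is sketched
below. It builds a scaled-down version of the monster array, times one
full major collection, and shows one tuning knob from the standard Gc
module. The size n and the value 200 for space_overhead are arbitrary
choices for illustration, not recommendations, and the timing will of
course depend on the machine.

```ocaml
(* Time one full major collection. *)
let time_full_major () =
  let t0 = Sys.time () in
  Gc.full_major ();
  Sys.time () -. t0

(* Same shape as the monster array, but with a caller-chosen size. *)
let build n = Array.init n (fun i -> (Printf.sprintf "int%d" i, i))

(* Assumption: a small illustrative size; raise it only on a machine
   with enough RAM. *)
let n = 100_000
let monster = build n

let () =
  let dt = time_full_major () in
  let st = Gc.quick_stat () in
  Printf.printf "n=%d full_major=%.4fs heap_words=%d major_collections=%d\n"
    n dt st.Gc.heap_words st.Gc.major_collections

(* One possible knob: a larger space_overhead makes the major GC less
   eager, trading memory for less collection work. *)
let () = Gc.set { (Gc.get ()) with Gc.space_overhead = 200 }
```

Since monster is a toplevel binding, it stays alive during the
collection, so the GC has to traverse it.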

>  > I have a similar problem with the maximum size of strings.  In
>  > practical terms, it limits the size of file uploads to COCANWIKI to
>  > around 6 MB (ie., not very much) [not the full 16 MB because of
>  > character escaping, but even 16 MB would be far too small]. [....]

FWIW, I had similar limitations in Poesia more than a year ago. I
solved it by specifying that Poesia (a web filter) won't handle web
content of more than ten million bytes. (Maybe an enhanced buffer
package, representing the buffer contents as an array of strings,
could help.)
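
A minimal sketch of what such a buffer could look like follows. It is
not an existing package: the type chunked and the functions create,
add_string, to_chunks and get are hypothetical names. Contents are
kept as a sequence of strings of at most chunk_size bytes each, so no
single string has to exceed the string size limit.

```ocaml
(* Hypothetical chunked buffer: names and layout are illustrative. *)
type chunked = {
  chunk_size : int;
  mutable full : string list;   (* completed chunks, most recent first *)
  current : Buffer.t;           (* chunk being filled *)
  mutable length : int;         (* total number of bytes stored *)
}

let create chunk_size =
  if chunk_size <= 0 then invalid_arg "create";
  { chunk_size; full = []; current = Buffer.create chunk_size; length = 0 }

(* Append [s], cutting it at chunk boundaries. *)
let add_string b s =
  let pos = ref 0 and len = String.length s in
  while !pos < len do
    let room = b.chunk_size - Buffer.length b.current in
    let n = min room (len - !pos) in
    Buffer.add_substring b.current s !pos n;
    pos := !pos + n;
    if Buffer.length b.current = b.chunk_size then begin
      b.full <- Buffer.contents b.current :: b.full;
      Buffer.clear b.current
    end
  done;
  b.length <- b.length + len

(* Snapshot of the contents as an array of strings, in order. *)
let to_chunks b =
  let tail =
    if Buffer.length b.current = 0 then [] else [ Buffer.contents b.current ]
  in
  Array.of_list (List.rev_append b.full tail)

(* Byte at absolute position [i]; every chunk but the last is full. *)
let get chunks chunk_size i = chunks.(i / chunk_size).[i mod chunk_size]
```

For example, with a chunk size of 4, adding "hello, " and then
"world" yields the chunks "hell", "o, w" and "orld".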

> 
>  > Does the tag field need to be so wide?  What does the tag mean if it
>  > has different values < No_scan_tag (251)?
> 
> it's for variants (with or without arguments)

Apparently, people are less bitten by the maximal number of
variants. I guess that most applications don't have big sum (i.e.
variant) types with more than a hundred non-empty choices (i.e.
constructors of the form Choice of ...).
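
To see how the tag field is used for variants, one can peek at the
representation with the Obj module, as in the sketch below. Obj is
unsafe and meant for looking at internals, not for normal code; the
type shape and the helper tag_of_shape are made up for illustration.
Constructors with arguments are numbered from 0 in declaration order,
and constructors without arguments are not blocks at all.

```ocaml
(* Illustrative type: one constant and three non-constant constructors. *)
type shape =
  | Empty                     (* constant: an immediate integer, no block *)
  | Circle of float           (* first constructor with arguments: tag 0 *)
  | Rect of float * float     (* second: tag 1 *)
  | Label of string           (* third: tag 2 *)

(* Return the block tag, or None for a constant constructor. *)
let tag_of_shape (s : shape) =
  let r = Obj.repr s in
  if Obj.is_int r then None else Some (Obj.tag r)

let () =
  List.iter
    (fun (name, s) ->
      match tag_of_shape s with
      | None -> Printf.printf "%s: immediate\n" name
      | Some t -> Printf.printf "%s: tag %d\n" name t)
    [ ("Empty", Empty); ("Circle", Circle 1.0);
      ("Rect", Rect (1.0, 2.0)); ("Label", Label "x") ]
```

So the tag values below No_scan_tag are what distinguish the
constructors that carry arguments.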

Regarding very big data structures, I tend to believe that they should
be more organized than just a single monster array (hence the current
array limit on 32-bit machines is a sensible tradeoff), even on
64-bit iron. But I have no practical experience with these.
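
As one illustration of "more organized", here is a sketch of a
two-level array: a spine of fixed-size chunks, each of which is an
ordinary array well under the size limit. The type big and the
functions make, get and set are hypothetical names, and the chunk
size is a free parameter. Whether this layout actually helps the GC
is something I have not measured.

```ocaml
(* Hypothetical two-level array: a spine of fixed-size chunks. *)
type 'a big = { chunk : int; length : int; spine : 'a array array }

let make ~chunk length x =
  if chunk <= 0 || length < 0 then invalid_arg "make";
  let nchunks = (length + chunk - 1) / chunk in
  let spine =
    Array.init nchunks (fun c ->
      (* the last chunk may be shorter than the others *)
      Array.make (min chunk (length - (c * chunk))) x)
  in
  { chunk; length; spine }

(* Element access: pick the chunk, then the slot inside it. *)
let get b i = b.spine.(i / b.chunk).(i mod b.chunk)
let set b i x = b.spine.(i / b.chunk).(i mod b.chunk) <- x
```

With chunk = 4 and length = 10, the spine holds three chunks of
sizes 4, 4 and 2.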

-- 
Basile STARYNKEVITCH -- basile dot starynkevitch at inria dot fr
Project cristal.inria.fr -  temporarily
http://cristal.inria.fr/~starynke --- all opinions are only mine 


