caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Gripes with array
@ 2004-09-09  2:10 Jon Harrop
  2004-09-09  5:08 ` Ville-Pertti Keinonen
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Jon Harrop @ 2004-09-09  2:10 UTC (permalink / raw)
  To: caml-list


I'm increasingly finding the outrageously finite size limit of arrays to be a 
pain. In particular, I'm peeved that the size limit is itself a function of 
the type which, therefore, makes writing polymorphic functions over arrays 
nay-on impossible (e.g. to make an array of maximum-sized arrays). Can 
anything be done about this? Am I right in thinking that the maximum 
non-float array size on a 64-bit machine is 18,014,398,509,481,983?

Also, can Array.init be made to fill the elements only once? This would make 
quite a few things twice as fast (Indeed, I'd always assumed that this was 
the point of having Array.init, having read some of Skaller's previous 
ramblings ;-). Array.copy could then be written more succinctly and 
efficiently in terms of Array.init as:

let copy a = init (length a) (fun i -> a.(i))

Does anyone have any pointers to information about the origin of the size 
limit for arrays? I assume it is something to do with the garbage collector 
using a fixed-size tag instead of a variable-size one but I'd be interested 
in the details.

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  2:10 [Caml-list] Gripes with array Jon Harrop
@ 2004-09-09  5:08 ` Ville-Pertti Keinonen
  2004-09-09  7:17 ` Jean-Christophe Filliatre
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Ville-Pertti Keinonen @ 2004-09-09  5:08 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:

>Does anyone have any pointers to information about the origin of the size 
>limit for arrays? I assume it is something to do with the garbage collector 
>using a fixed-size tag instead of a variable-size one but I'd be interested 
>in the details.
>  
>
You're correct, see byterun/mlvalues.h for details.

I suspect variable-length tags would probably increase complexity and 
decrease performance considerably...and personally I hope we'll all be 
migrating to 64-bit architectures soon, anyhow.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  2:10 [Caml-list] Gripes with array Jon Harrop
  2004-09-09  5:08 ` Ville-Pertti Keinonen
@ 2004-09-09  7:17 ` Jean-Christophe Filliatre
  2004-09-09  8:23   ` Richard Jones
  2004-09-09  9:37 ` Damien Doligez
  2004-09-10 23:48 ` brogoff
  3 siblings, 1 reply; 31+ messages in thread
From: Jean-Christophe Filliatre @ 2004-09-09  7:17 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list


Jon Harrop writes:
 > 
 > Does anyone have any pointers to information about the origin of the size 
 > limit for arrays? I assume it is something to do with the garbage collector 
 > using a fixed-size tag instead of a variable-size one but I'd be interested 
 > in the details.

In ocaml sources, the  file byterun/mlvalues.h gives all details about
the  block  header structure.  There  you can  see  that,  on 32  bits
architecture,  the block size  (in words)  is stored  on 22  bits. And
indeed Sys.max_array_length is equal to 2^22-1.

But I must  agree with you: this is definitely too  small and we could
imagine  that, when the  tag says  a block  is an  array, the  size is
stored within the first (or the last) field instead.

-- 
Jean-Christophe Filliâtre


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  7:17 ` Jean-Christophe Filliatre
@ 2004-09-09  8:23   ` Richard Jones
  2004-09-09  9:08     ` Olivier Andrieu
  2004-09-09 10:42     ` Gerd Stolpmann
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Jones @ 2004-09-09  8:23 UTC (permalink / raw)
  Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

On Thu, Sep 09, 2004 at 09:17:25AM +0200, Jean-Christophe Filliatre wrote:
> 
> Jon Harrop writes:
>  > 
>  > Does anyone have any pointers to information about the origin of the size 
>  > limit for arrays? I assume it is something to do with the garbage collector 
>  > using a fixed-size tag instead of a variable-size one but I'd be interested 
>  > in the details.
> 
> In ocaml sources, the  file byterun/mlvalues.h gives all details about
> the  block  header structure.  There  you can  see  that,  on 32  bits
> architecture,  the block size  (in words)  is stored  on 22  bits. And
> indeed Sys.max_array_length is equal to 2^22-1.
> 
> But I must  agree with you: this is definitely too  small and we could
> imagine  that, when the  tag says  a block  is an  array, the  size is
> stored within the first (or the last) field instead.

I have a similar problem with the maximum size of strings.  In
practical terms, it limits the size of file uploads to COCANWIKI to
around 6 MB (ie., not very much) [not the full 16 MB because of
character escaping, but even 16 MB would be far too small].

Does the tag field need to be so wide?  What does the tag mean if it
has different values < No_scan_tag (251)?

Agree with the comment about us all migrating to 64 bit architectures
very soon, although I've been waiting to do this since around '92.

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
MOD_CAML lets you run type-safe Objective CAML programs inside the Apache
webserver. http://www.merjis.com/developers/mod_caml/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  8:23   ` Richard Jones
@ 2004-09-09  9:08     ` Olivier Andrieu
  2004-09-09 12:08       ` Basile Starynkevitch [local]
  2004-09-09 10:42     ` Gerd Stolpmann
  1 sibling, 1 reply; 31+ messages in thread
From: Olivier Andrieu @ 2004-09-09  9:08 UTC (permalink / raw)
  To: rich; +Cc: caml-list

 Richard Jones [Thu, 9 Sep 2004]:
 > On Thu, Sep 09, 2004 at 09:17:25AM +0200, Jean-Christophe Filliatre wrote:
 > > 
 > > Jon Harrop writes:
 > >  > 
 > >  > Does anyone have any pointers to information about the origin of the size 
 > >  > limit for arrays? I assume it is something to do with the garbage collector 
 > >  > using a fixed-size tag instead of a variable-size one but I'd be interested 
 > >  > in the details.
 > > 
 > > In ocaml sources, the  file byterun/mlvalues.h gives all details about
 > > the  block  header structure.  There  you can  see  that,  on 32  bits
 > > architecture,  the block size  (in words)  is stored  on 22  bits. And
 > > indeed Sys.max_array_length is equal to 2^22-1.
 > > 
 > > But I must  agree with you: this is definitely too  small and we could
 > > imagine  that, when the  tag says  a block  is an  array, the  size is
 > > stored within the first (or the last) field instead.
 > 
 > I have a similar problem with the maximum size of strings.  In
 > practical terms, it limits the size of file uploads to COCANWIKI to
 > around 6 MB (ie., not very much) [not the full 16 MB because of
 > character escaping, but even 16 MB would be far too small].

You can use Bigarrays:

open Bigarray
type bigstring = (char, int8_unsigned_elt, c_layout) Array1.t

all you need is to write some blitting functions for conversions to
and from regular strings.

 > Does the tag field need to be so wide?  What does the tag mean if it
 > has different values < No_scan_tag (251)?

it's for variants (with or without arguments)

-- 
   Olivier

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  2:10 [Caml-list] Gripes with array Jon Harrop
  2004-09-09  5:08 ` Ville-Pertti Keinonen
  2004-09-09  7:17 ` Jean-Christophe Filliatre
@ 2004-09-09  9:37 ` Damien Doligez
  2004-09-09 10:34   ` Jean-Christophe Filliatre
                     ` (2 more replies)
  2004-09-10 23:48 ` brogoff
  3 siblings, 3 replies; 31+ messages in thread
From: Damien Doligez @ 2004-09-09  9:37 UTC (permalink / raw)
  To: caml users

On Sep 9, 2004, at 04:10, Jon Harrop wrote:

> I'm increasingly finding the outrageously finite size limit of arrays 
> to be a
> pain.

How I would like to get rid of that limit!

> Am I right in thinking that the maximum
> non-float array size on a 64-bit machine is 18,014,398,509,481,983?

That's correct.  It's a good thing that 32-bitters are on their way out.

> Also, can Array.init be made to fill the elements only once?

No, that's impossible without breaking the GC invariants.

>  This would make quite a few things twice as fast

Twice?  I doubt it very much.

> Array.copy could then be written more succinctly and
> efficiently in terms of Array.init as:
>
> let copy a = init (length a) (fun i -> a.(i))

Exactly how it's written now, except that it's inlined by hand for
performance reasons.

> Does anyone have any pointers to information about the origin of the 
> size
> limit for arrays? I assume it is something to do with the garbage 
> collector
> using a fixed-size tag instead of a variable-size one but I'd be 
> interested
> in the details.

Yes, but the main use of the tag is not garbage collection.


On Sep 9, 2004, at 09:17, Jean-Christophe Filliatre wrote:

> But I must  agree with you: this is definitely too  small and we could
> imagine  that, when the  tag says  a block  is an  array, the  size is
> stored within the first (or the last) field instead.

The last, really?


On Sep 9, 2004, at 10:23, Richard Jones wrote:

> I have a similar problem with the maximum size of strings.  In
> practical terms, it limits the size of file uploads to COCANWIKI to
> around 6 MB (ie., not very much) [not the full 16 MB because of
> character escaping, but even 16 MB would be far too small].

Maybe you should use bigarrays instead of strings.


A general remark: for more details on the internal representation of
O'Caml values, you can read < 
http://caml.inria.fr/ocaml/htmlman/manual032.html >.

-- Damien

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  9:37 ` Damien Doligez
@ 2004-09-09 10:34   ` Jean-Christophe Filliatre
  2004-09-09 12:15     ` Igor Pechtchanski
  2004-09-09 13:01   ` Brian Hurt
  2004-09-09 16:58   ` [Caml-list] Gripes with array Jon Harrop
  2 siblings, 1 reply; 31+ messages in thread
From: Jean-Christophe Filliatre @ 2004-09-09 10:34 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml users


Damien Doligez writes:
 > 
 > > But I must  agree with you: this is definitely too  small and we could
 > > imagine  that, when the  tag says  a block  is an  array, the  size is
 > > stored within the first (or the last) field instead.
 > 
 > The last, really?

How stupid I am! I was thinking of not adding an extra addition to the
array access, keeping the first array  element at field 0 but it is of
course ridiculous.

-- 
Jean-Christophe

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  8:23   ` Richard Jones
  2004-09-09  9:08     ` Olivier Andrieu
@ 2004-09-09 10:42     ` Gerd Stolpmann
  1 sibling, 0 replies; 31+ messages in thread
From: Gerd Stolpmann @ 2004-09-09 10:42 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

Am Don, 2004-09-09 um 10.23 schrieb Richard Jones:
> On Thu, Sep 09, 2004 at 09:17:25AM +0200, Jean-Christophe Filliatre wrote:
> > 
> > Jon Harrop writes:
> >  > 
> >  > Does anyone have any pointers to information about the origin of the size 
> >  > limit for arrays? I assume it is something to do with the garbage collector 
> >  > using a fixed-size tag instead of a variable-size one but I'd be interested 
> >  > in the details.
> > 
> > In ocaml sources, the  file byterun/mlvalues.h gives all details about
> > the  block  header structure.  There  you can  see  that,  on 32  bits
> > architecture,  the block size  (in words)  is stored  on 22  bits. And
> > indeed Sys.max_array_length is equal to 2^22-1.
> > 
> > But I must  agree with you: this is definitely too  small and we could
> > imagine  that, when the  tag says  a block  is an  array, the  size is
> > stored within the first (or the last) field instead.
> 
> I have a similar problem with the maximum size of strings.  In
> practical terms, it limits the size of file uploads to COCANWIKI to
> around 6 MB (ie., not very much) [not the full 16 MB because of
> character escaping, but even 16 MB would be far too small].

You can switch to OCamlnet as base library which does not have this
limitation. One can directly write the uploaded data to disk.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  9:08     ` Olivier Andrieu
@ 2004-09-09 12:08       ` Basile Starynkevitch [local]
  2004-09-09 12:31         ` Damien Doligez
  0 siblings, 1 reply; 31+ messages in thread
From: Basile Starynkevitch [local] @ 2004-09-09 12:08 UTC (permalink / raw)
  To: caml-list

On Thu, Sep 09, 2004 at 11:08:50AM +0200, Olivier Andrieu wrote:
>  Richard Jones [Thu, 9 Sep 2004]:
>  > On Thu, Sep 09, 2004 at 09:17:25AM +0200, Jean-Christophe Filliatre wrote:
>  > > 
>  > > Jon Harrop writes:
>  > >  > 
>  > >  > Does anyone have any pointers to information about the origin of the size 
>  > >  > limit for arrays? [....]
>  > > 
>  > > In ocaml sources, the  file byterun/mlvalues.h gives all details about
>  > > the  block  header structure.  [....]
>  > > 
>  > > But I must  agree with you: this is definitely too  small and we could
>  > > imagine  that, when the  tag says  a block  is an  array, the  size is
>  > > stored within the first (or the last) field instead.

I do agree that using Bigarrays is the way to go, until everyone has a
64 bits machine. For completeness, we could also consider the
following scheme (which I am NOT volunteering to implement!)

   The header layout remains the same (so only 22 bits for size on 32
   bits machine), but if the size is all bit ones, the block is
   actually a fixed block, and the real array size is the word before
   the header.

However, I see another potential problem (which could already
potentially appear today, with standard 0caml 3.08, on 64 bits
machines with more than 1Gbyte of RAM). If you have a huge array of
pointers, the garbage collector (even the minor one) has to scan the
full array - and this scan is "atomic" in the sense that it is not
interruptible (and I believe that designing a GC which incrementally
scans big values by chunks is not trivial, given Ocaml GC performance
needs). So for an array of say 300 million pointers, the GC has to
scan it, which takes a significant time (I would guess several tenths
of seconds at least, just scanning this single array).

I am asking, do lucky people with a 64 bits machines and plenty of RAM
did experiment some bizarre GC behavior when handling such monster
pointer arrays in Ocaml - for example,
   let monster =  Array.init 300_000_000 (fun i -> ((Printf.sprintf "int%d" i), i))

More practically, I would be curious to hear from people having run
Ocaml programs (on rather big 64 bits hardware, with multigigabyte
RAM) in processes of more than a gigabyte of Ocaml heap! Does the GC
works well in its default setting, or have they to tune it?

>  > I have a similar problem with the maximum size of strings.  In
>  > practical terms, it limits the size of file uploads to COCANWIKI to
>  > around 6 MB (ie., not very much) [not the full 16 MB because of
>  > character escaping, but even 16 MB would be far too small]. [....]

FWIW, I had similar limitations in Poesia more than a year ago. I
solved it by specifying that Poesia (a web filter) won't handle web
content of more than ten million bytes. (Maybe an enhanced buffer
package, representing buffer contents in array of strings, could
help).

> 
>  > Does the tag field need to be so wide?  What does the tag mean if it
>  > has different values < No_scan_tag (251)?
> 
> it's for variants (with or without arguments)

Apparently, people are less bitten by the maximal number of
variants. I guess that most applications don't have big sum (ie
variant) types with more than a hundred of non-empty choices (ie
Choice of ... construct).

Regarding very big data structures, I tend to believe that they should
be more organized than just a single monster array (hence the current
array limits on 32 bits machine is a sensible tradeoff), even on 64
bits irons. But I have no practical experience on these.

-- 
Basile STARYNKEVITCH -- basile dot starynkevitch at inria dot fr
Project cristal.inria.fr -  temporarily
http://cristal.inria.fr/~starynke --- all opinions are only mine 

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09 10:34   ` Jean-Christophe Filliatre
@ 2004-09-09 12:15     ` Igor Pechtchanski
  0 siblings, 0 replies; 31+ messages in thread
From: Igor Pechtchanski @ 2004-09-09 12:15 UTC (permalink / raw)
  To: Jean-Christophe Filliatre; +Cc: caml-list

On Thu, 9 Sep 2004, Jean-Christophe Filliatre wrote:

> Damien Doligez writes:
>  >
>  > > But I must  agree with you: this is definitely too  small and we could
>  > > imagine  that, when the  tag says  a block  is an  array, the  size is
>  > > stored within the first (or the last) field instead.
>  >
>  > The last, really?
>
> How stupid I am! I was thinking of not adding an extra addition to the
> array access, keeping the first array  element at field 0 but it is of
> course ridiculous.

I believe the usual solution for this is to store the array size at
negative offset from the array header, but that changes the heap layout
slightly, and affects the GC logic, among other things...
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha@cs.nyu.edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor@watson.ibm.com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"Happiness lies in being privileged to work hard for long hours in doing
whatever you think is worth doing."  -- Dr. Jubal Harshaw

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09 12:08       ` Basile Starynkevitch [local]
@ 2004-09-09 12:31         ` Damien Doligez
  0 siblings, 0 replies; 31+ messages in thread
From: Damien Doligez @ 2004-09-09 12:31 UTC (permalink / raw)
  To: caml users

On Sep 9, 2004, at 14:08, Basile Starynkevitch [local] wrote:

>    The header layout remains the same (so only 22 bits for size on 32
>    bits machine), but if the size is all bit ones, the block is
>    actually a fixed block, and the real array size is the word before
>    the header.

The real size would have to be after the header, not before it, because
you cannot store anything before the header.

> If you have a huge array of
> pointers, the garbage collector (even the minor one) has to scan the
> full array - and this scan is "atomic" in the sense that it is not
> interruptible (and I believe that designing a GC which incrementally
> scans big values by chunks is not trivial, given Ocaml GC performance
> needs).

The minor GC will never scan such a huge array because you won't find
it in the minor heap.  What you say is still true of the major GC.

> Regarding very big data structures, I tend to believe that they should
> be more organized than just a single monster array (hence the current
> array limits on 32 bits machine is a sensible tradeoff), even on 64
> bits irons. But I have no practical experience on these.

I fixed a bug in a camlp4 lexer recently, where it needed an
(extensible) array larger than 16 million entries.  Implementing an
array of arrays increased the code size by as much as 6 lines...

-- Damien

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  9:37 ` Damien Doligez
  2004-09-09 10:34   ` Jean-Christophe Filliatre
@ 2004-09-09 13:01   ` Brian Hurt
  2004-09-09 20:08     ` [Caml-list] 32-bit is sticking around Brandon J. Van Every
  2004-09-09 16:58   ` [Caml-list] Gripes with array Jon Harrop
  2 siblings, 1 reply; 31+ messages in thread
From: Brian Hurt @ 2004-09-09 13:01 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml users

On Thu, 9 Sep 2004, Damien Doligez wrote:

> That's correct.  It's a good thing that 32-bitters are on their way out.

Absent the x86, 32-bitters are already out on servers.  They're on their 
way out on desktops.  But they are still huge in the embedded world, and 
likely to remain so for quite some time.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  9:37 ` Damien Doligez
  2004-09-09 10:34   ` Jean-Christophe Filliatre
  2004-09-09 13:01   ` Brian Hurt
@ 2004-09-09 16:58   ` Jon Harrop
  2004-09-10  5:56     ` Array.init (was [Caml-list] Gripes with array) Christophe Raffalli
  2004-09-10 13:45     ` [Caml-list] Gripes with array Damien Doligez
  2 siblings, 2 replies; 31+ messages in thread
From: Jon Harrop @ 2004-09-09 16:58 UTC (permalink / raw)
  To: caml users; +Cc: Damien Doligez

On Thursday 09 September 2004 10:37, Damien wrote:
> ...
> > Am I right in thinking that the maximum
> > non-float array size on a 64-bit machine is 18,014,398,509,481,983?
>
> That's correct.  It's a good thing that 32-bitters are on their way out.

If I upgrade I'll surely end up playing Doom 3 and not get any work 
done... :-)

> > Also, can Array.init be made to fill the elements only once?
>
> No, that's impossible without breaking the GC invariants.

Could it not even be done by dragging Array.init inside the compiler, giving 
it the same status as Array.make?

> >  This would make quite a few things twice as fast
>
> Twice?  I doubt it very much.

That was an estimate based upon the assumption that, on large arrays (well, 
~4M elements ;-), the time taken is limited by the filling of elements which 
is currently done twice but which only needs to be done once. I believe this 
is justified because of the high-cost of memory writes (to main memory for 
out-of-cache sized arrays), even sequential ones, compared to (trivial, 
inlineable) function calls, a single heap allocation etc.

What is the bottleneck in the asymptotic limit?

For measurements on 4,000,000 element int arrays (using the code at the end of 
this mail) I get:

Array.make took 0.131528823272 secs.
Array.init took 0.311059344899 secs.
array_init took 0.179279577732 secs.

Measuring memset from C gives me 0.0311secs. So element-setting must be at 
least 10% of Array.init. Also, the array_init function is surprisingly fast, 
presumably due to "f" not being inlined into Array.init but being inlined 
into array_init.

This came up because my wavelet transform code in OCaml is within 15% of the 
performance of my equivalent C version excluding the cost of creating the 
array. Including that cost (even with calloc), the C version is twice as 
fast. Admittedly calloc will use memset, and not set the elements "properly", 
but even so...

> > let copy a = init (length a) (fun i -> a.(i))
>
> Exactly how it's written now, except that it's inlined by hand for
> performance reasons.

Array.copy took 0.298557505888 secs.
array_copy took 0.315200943696 secs.

This optimisation gives a <6% performance improvement (and this is really 
best-case for large arrays because the filling-function is trivial in this 
case). I'd have gone for five times less code in the array module and more 
code in the compiler... ;-)

Perhaps the current versions are significantly faster on smaller data 
structures...

Cheers,
Jon.

-----

let f i = 1+i

let array_init l =
  if l = 0 then [||] else
  let res = Array.make l (f 0) in
  for i = 1 to pred l do
    Array.unsafe_set res i (f i)
  done;
  res 

let array_copy a = Array.init (Array.length a) (fun i -> Array.unsafe_get a i)

let time f =
  let time = Unix.gettimeofday in
  let t = time () in
  ignore (f ());
  (time ()) -. t

let _ =
  let timings = Array.make 5 (0., 0) in
  let l = 4000000 in
  let a = Array.make l 0 in
  for i=0 to 100 do
    let entry = Random.int 5 in
    let t =
      time (match entry with
	0 -> fun () -> Array.make l 0
      | 1 -> fun () -> Array.init l f
      | 2 -> fun () -> array_init l
      | 3 -> fun () -> Array.copy a
      | 4 -> fun () -> array_copy a)
    in
    timings.(entry) <-
      let (ot, n) = timings.(entry) in
      (ot +. t, n+1);
  done;
  let entry = [| "Array.make";
		 "Array.init";
		 "array_init";
		 "Array.copy";
		 "array_copy" |] in
  for i=0 to 4 do
    print_endline (entry.(i)^": "^(string_of_int (snd timings.(i))))
  done;
  let timings =
    Array.map (fun (t, n) -> string_of_float (t /. float_of_int n)) timings in
  for i=0 to 4 do
    print_endline (entry.(i)^" took "^timings.(i)^" secs.")
  done

-----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Caml-list] 32-bit is sticking around
  2004-09-09 13:01   ` Brian Hurt
@ 2004-09-09 20:08     ` Brandon J. Van Every
  2004-09-09 21:04       ` Jon Harrop
  0 siblings, 1 reply; 31+ messages in thread
From: Brandon J. Van Every @ 2004-09-09 20:08 UTC (permalink / raw)
  To: caml-list

Brian Hurt wrote:
>
> Absent the x86, 32-bitters are already out on servers.
> They're on their
> way out on desktops.  But they are still huge in the embedded
> world, and likely to remain so for quite some time.

Saying that 32-bit is "on the way out" on the desktop is tremendous
hyperbole.  What's really happening is soon we'll have a forward path on
the x86 desktop *to* 64-bit.  That ain't here yet in volume, and the
whole point of the AMD architecture is assuring backwards compatibility
for 32-bit.  It's going to be 5 years before 64-bit is happening on the
desktop in any dominant quantity, and 10 years before 32-bit actually
goes away.

To think otherwise is just moooing about what you want to happen rather
than what is actually going to happen and has always happened.  Consider
that only in Longhorn will Microsoft finally kill *16*-bit!

You guys can all dream on about your 64-bit machines.  I mastered 64-bit
on the DEC Alpha in 1996.  Then Compaq and Intel killed it.  I'm still
sad about that, and I hate Intel ASM.  The register poverty *sucks* !
The x87 FPU stack *sucks* !  SSE *sucks* !  Intel has always won by
getting shoddy products to market quickly, never by producing anything
that's any good.  They don't call it "the Wintel hegemony" for nuthin',
it's the hardware equivalent of Microsoft's business model.


Cheers,                         www.indiegamedesign.com
Brand*n Van Every               S*attle, WA

Praise Be to the caml-list Bayesian filter! It blesseth
my postings, it is evil crap!  evil crap!  Bigarray!
Unboxed overhead group!  Wondering!  chant chant chant...

Is my technical content showing?

// return an array of 100 packed tuples
temps
  int $[tvar0][2*100]; // what the c function needs
  value $[tvar1]; // one int
  value $[tvar2]; // one tuple
  int $[tvar3] // loop control var
oncePre
eachPre
  $[cvar0]=&($[tvar0][0]);
eachPost
  $[lvar0] = alloc(2*100, 0 /*NB: zero-tagged block*/ );
  for(int $[tvar3]=0;$[tvar3]<100;$[tvar3]++) {
    $[tvar2] = alloc_tuple(2);
    $[tvar1] = Val_int($[cvar0][0+2*$[tvar3]]);
    Store_field($[tvar2],0,$[tvar1]);
    $[tvar1] = Val_int($[cvar0][1]);
    Store_field($[tvar2],1,$[tvar1+2*$[tvar3]]);
    Array_store($[lvar0],$[tvar3],$[tvar0]);
  }
oncePost

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] 32-bit is sticking around
  2004-09-09 20:08     ` [Caml-list] 32-bit is sticking around Brandon J. Van Every
@ 2004-09-09 21:04       ` Jon Harrop
  2004-09-11 15:30         ` Lars Nilsson
  0 siblings, 1 reply; 31+ messages in thread
From: Jon Harrop @ 2004-09-09 21:04 UTC (permalink / raw)
  To: Brandon J. Van Every; +Cc: caml-list

On Thursday 09 September 2004 21:08, Brandon J. Van Every wrote:
> Intel has always won by getting shoddy products to market quickly, never by 
> producing anything that's any good.

StrongARM?

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Array.init (was [Caml-list] Gripes with array)
  2004-09-09 16:58   ` [Caml-list] Gripes with array Jon Harrop
@ 2004-09-10  5:56     ` Christophe Raffalli
  2004-09-10  8:53       ` Richard Jones
  2004-09-13  7:02       ` Christophe Raffalli
  2004-09-10 13:45     ` [Caml-list] Gripes with array Damien Doligez
  1 sibling, 2 replies; 31+ messages in thread
From: Christophe Raffalli @ 2004-09-10  5:56 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml users, Damien Doligez

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]


> 
>>>Also, can Array.init be made to fill the elements only once?
>>
>>No, that's impossible without breaking the GC invariants.
> 

This is not true if the runtime maintains a list of array in 
construction with the current position of the index. This list will stay
rather small, but each addresse read by the GC will have to be checked 
for membership in the list. If the list is rather small it will be in 
the cache, but still I am afraid it will slow down noticably the GC.

Did someone tried to implement such a list of partially initialized 
objects in the GC ?

You could also lie in the tag about the size of array (if the way the 
runtime finds free block of memory does not use it). It will cost an 
increment of integer at each step in the initialisation process which 
should not be much since the beginning of array may stay in the cache if 
the initialisation function is simple and this will be neggligeable if not.


-- 
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Array.init (was [Caml-list] Gripes with array)
  2004-09-10  5:56     ` Array.init (was [Caml-list] Gripes with array) Christophe Raffalli
@ 2004-09-10  8:53       ` Richard Jones
  2004-09-10 14:50         ` Damien Doligez
  2004-09-13  7:02       ` Christophe Raffalli
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Jones @ 2004-09-10  8:53 UTC (permalink / raw)
  Cc: caml users

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

On Fri, Sep 10, 2004 at 07:56:16AM +0200, Christophe Raffalli wrote:
> 
> >
> >>>Also, can Array.init be made to fill the elements only once?
> >>
> >>No, that's impossible without breaking the GC invariants.
>
> You could also lie in the tag about the size of array (if the way the 
> runtime finds free block of memory does not use it). It will cost an 
> increment of integer at each step in the initialisation process which 
> should not be much since the beginning of array may stay in the cache if 
> the initialisation function is simple and this will be neggligeable if not.

Can one set the tag first to Custom_tag, then once array
initialization is complete, set the tag again to 0 (or
Array_double_tag as appropriate)?

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
http://www.winwinsales.co.uk/ - CRM improvement consultancy

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09 16:58   ` [Caml-list] Gripes with array Jon Harrop
  2004-09-10  5:56     ` Array.init (was [Caml-list] Gripes with array) Christophe Raffalli
@ 2004-09-10 13:45     ` Damien Doligez
  2004-09-11  1:43       ` skaller
  2004-09-11 14:36       ` Jon Harrop
  1 sibling, 2 replies; 31+ messages in thread
From: Damien Doligez @ 2004-09-10 13:45 UTC (permalink / raw)
  To: caml users

On Sep 9, 2004, at 18:58, Jon Harrop wrote:

>>> Also, can Array.init be made to fill the elements only once?
>> No, that's impossible without breaking the GC invariants.
>
> Could it not even be done by dragging Array.init inside the compiler, 
> giving
> it the same status as Array.make?

Not without implementing such horrors as Christophe described.  I don't 
much
like the idea of introducing lots of bugs while slowing down the whole 
GC,
even for programs that don't use arrays.  An intermediate solution would
be to make a "Array.unsafe_make" primitive, which would use memset 
instead
of initialising the array properly.  If you enter this as a feature wish
in the BTS, I'll look into it.

> Measuring memset from C gives me 0.0311secs. So element-setting must 
> be at
> least 10% of Array.init. Also, the array_init function is surprisingly 
> fast,
> presumably due to "f" not being inlined into Array.init but being 
> inlined
> into array_init.

That would confirm my intuition that the calls to f dominates the
initialisation time.

-- Damien

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Array.init (was [Caml-list] Gripes with array)
  2004-09-10  8:53       ` Richard Jones
@ 2004-09-10 14:50         ` Damien Doligez
  0 siblings, 0 replies; 31+ messages in thread
From: Damien Doligez @ 2004-09-10 14:50 UTC (permalink / raw)
  To: caml users

On Sep 10, 2004, at 10:53, Richard Jones wrote:

> Can one set the tag first to Custom_tag, then once array
> initialization is complete, set the tag again to 0 (or
> Array_double_tag as appropriate)?

No.  If you do that, the pointers that you have already set
will not be traced by the GC -> Bus Error.

-- Damien

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-09  2:10 [Caml-list] Gripes with array Jon Harrop
                   ` (2 preceding siblings ...)
  2004-09-09  9:37 ` Damien Doligez
@ 2004-09-10 23:48 ` brogoff
  3 siblings, 0 replies; 31+ messages in thread
From: brogoff @ 2004-09-10 23:48 UTC (permalink / raw)
  To: caml-list

On Thu, 9 Sep 2004, Jon Harrop wrote:
> I'm increasingly finding the outrageously finite size limit of arrays to be a
> pain. In particular, I'm peeved that the size limit is itself a function of
> the type which, therefore, makes writing polymorphic functions over arrays
> nay-on impossible (e.g. to make an array of maximum-sized arrays).

I'm surprised that there were no comments on this. Since the distinction for
floats is a performance hack, wouldn't it make more sense to just have a
separate module for fast float arrays?

-- Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-10 13:45     ` [Caml-list] Gripes with array Damien Doligez
@ 2004-09-11  1:43       ` skaller
  2004-09-11  3:16         ` skaller
  2004-09-11 14:36       ` Jon Harrop
  1 sibling, 1 reply; 31+ messages in thread
From: skaller @ 2004-09-11  1:43 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml users

On Fri, 2004-09-10 at 23:45, Damien Doligez wrote:

>  An intermediate solution would
> be to make a "Array.unsafe_make" primitive, which would use memset 
> instead of initialising the array properly.  

Yeah but that doesn't solve the problem of filling
the array initially with some 'non-binary-zero' value.
You'd still need two passes: the memset and the
proper fill (hence paging all the memory in twice).

AFAICS tracking how much of an array is initialised
with an index the GC can use costs a single
comparison when you're not initialising arrays.
EG: the GC has a list of blocks undergoing
initialisation, determining the list is empty
should be a single machine instruction.

If you're initialising an array without any
recursion, that's two comparisons.

So the only time there would be a serious impact
on the GC would be if you're initialising many
big arrays all at once (EG if you're making a matrix
as an array of arrays), and that cost would be
small compared to scanning twice.

I can't predict the performance of the one
extra comparison needed when no arrays are
being initialised because it probably depends
on the processor cache design -- how much
can one machine instruction + one memory
reference cost?

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-11  1:43       ` skaller
@ 2004-09-11  3:16         ` skaller
  0 siblings, 0 replies; 31+ messages in thread
From: skaller @ 2004-09-11  3:16 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml users

On Sat, 2004-09-11 at 11:43, skaller wrote:
> On Fri, 2004-09-10 at 23:45, Damien Doligez wrote:
> 
> >  An intermediate solution would
> > be to make a "Array.unsafe_make" primitive, which would use memset 
> > instead of initialising the array properly.  

> 
> AFAICS tracking how much of an array is initialised
> with an index the GC can use costs a single
> comparison when you're not initialising arrays.

Specifically: I assume there is some GC 'object'
containing the state data for the collector.

Suppose we add a doubly linked list
of the type

struct pia { 
  pia *next; 
  pia *prev; 
  caml_heap_block *block; 
  unsigned long index; 
} *pial;

to the GC state data. pial means
'pointer to initialising array list'

In order to sweep a block *p for pointers,
you'd need to do this:

if (pial == NULL) usual_scan(p)
else {
  unsigned long index = find_index(pias,p);
  if (index) scan_array(p,index));
  else usual_scan(p);
} 

usual_scan() is the usual scanning algorithm
for a heap block, array_scan is a special
one where the initialised block length is
tracked. The index is incremented when filling
in the array. If the fill is using a fixed value,
every 4K elements, if a function, each element.

All the scan calls are tail recurisve.
I imagine usual_scan and array_scan would
be the different entry points to the same
routine. Clearly the array initialiser
has to be able to add and remove a the pia descriptor
from the list. In addition, the compactor would
need to adjust the heap pointers in the list.

So in the case we're not currently initialising
an array, we require one memory fetch, a load
of the value 'pial'.

Perhaps 'pial' would reside in the cache,
even if it does it is displacing one other value.
So there is a definite cost for all code with
this technique.

BTW: the Felix collector has an array index count
in every heap header block. The actual block
size is independent of this count. As I understand
it the block size indicator in Ocaml is *also* used
to determine string and array sizes which prevents
it being adjusted dynamically during initialisation?

This would be in case there is an exception thrown 
and the array become prematurely unreachable,
since otherwise the array can't be disposed until
it is initialised, and the disparity between
the dynamic and static length would not matter.
The static length is required to free the memory.

This suggests an alternate solution with zero
impact on the sweep: swap the meaning of the
length count in the heap block and pial list.

This means the 'free' routine that disposes
of unreachable heap memory would have to check
the list to find the real memory length.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-10 13:45     ` [Caml-list] Gripes with array Damien Doligez
  2004-09-11  1:43       ` skaller
@ 2004-09-11 14:36       ` Jon Harrop
  2004-09-11 20:53         ` Damien Doligez
  1 sibling, 1 reply; 31+ messages in thread
From: Jon Harrop @ 2004-09-11 14:36 UTC (permalink / raw)
  To: caml users

On Friday 10 September 2004 14:45, Damien Doligez wrote:
> Not without implementing such horrors as Christophe described.  I don't
> much
> like the idea of introducing lots of bugs while slowing down the whole
> GC,
> even for programs that don't use arrays.

Yes indeed. I like Christophe's last idea though:

On Friday 10 September 2004 06:56, Christophe Raffalli wrote:
> You could also lie in the tag about the size of array (if the way the
> runtime finds free block of memory does not use it). It will cost an
> increment of integer at each step in the initialisation process which
> should not be much since the beginning of array may stay in the cache if
> the initialisation function is simple and this will be neggligeable if not.

What is the trickiest/most-error-prone part of doing this and could this be 
used to implement amortised extensible arrays within the compiler? I would 
like to see such a thing in the compiler (not that I distrust Markus ;-).

On Friday 10 September 2004 14:45, Damien Doligez wrote:
> An intermediate solution would 
> be to make a "Array.unsafe_make" primitive, which would use memset
> instead
> of initialising the array properly.  If you enter this as a feature wish
> in the BTS, I'll look into it.

No, I don't think the performance improvement would justify your time or the 
loss of safety. Could you add a memset to String.create though? :-)

> That would confirm my intuition that the calls to f dominates the
> initialisation time.

In the case of Array.init, yes. From my array_init, inlining and 
type-specialisation decrease the time taken by ~43% and, hence, these would 
be the most productive optimisations.

An array_init2 function which specialises the array type but uses an external 
"f" function betters the time taken by Array.init by ~36% (perhaps not 
significantly different):

let array_init2 l f =
  if l = 0 then [||] else
  let x : int = f 0 in
  let res = Array.make l x in
  for i = 1 to pred l do
    Array.unsafe_set res i (f i)
  done;
  res

I'm not sure what the impact of the type-specialisation on inlining is though. 
I'd imagine that doing such partial specialisations in ocamlopt is 
arbitrarily dodgy because you don't get any feedback on the benfits of your 
results. Perhaps this is a challenge for Basile's JIT? :-)

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] 32-bit is sticking around
  2004-09-09 21:04       ` Jon Harrop
@ 2004-09-11 15:30         ` Lars Nilsson
  2004-09-11 16:24           ` [off topic] " David MENTRE
       [not found]           ` <200409111656.11952.jon@jdh30.plus.com>
  0 siblings, 2 replies; 31+ messages in thread
From: Lars Nilsson @ 2004-09-11 15:30 UTC (permalink / raw)
  To: caml-list

On Thu, 9 Sep 2004 22:04:02 +0100, Jon Harrop <jon@jdh30.plus.com> wrote:
> On Thursday 09 September 2004 21:08, Brandon J. Van Every wrote:
> > Intel has always won by getting shoddy products to market quickly, never by
> > producing anything that's any good.
> 
> StrongARM?
> 
> Cheers,
> Jon.

You mean the technology they bought instead of develop it?

http://en.wikipedia.org/wiki/StrongARM

Lars

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [off topic] Re: [Caml-list] 32-bit is sticking around
  2004-09-11 15:30         ` Lars Nilsson
@ 2004-09-11 16:24           ` David MENTRE
  2004-09-11 17:52             ` Lars Nilsson
       [not found]           ` <200409111656.11952.jon@jdh30.plus.com>
  1 sibling, 1 reply; 31+ messages in thread
From: David MENTRE @ 2004-09-11 16:24 UTC (permalink / raw)
  To: Lars Nilsson; +Cc: caml-list

Lars Nilsson <chamaeleon@gmail.com> writes:

> You mean the technology they bought instead of develop it?
>
> http://en.wikipedia.org/wiki/StrongARM

And you haven't seen all the gory details (e.g. doubles are stored in
little endian but long long in big endian). :)

Yours,
d.
-- 
 David Mentré <dmentre@linux-france.org>

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] 32-bit is sticking around
       [not found]           ` <200409111656.11952.jon@jdh30.plus.com>
@ 2004-09-11 17:47             ` Lars Nilsson
  0 siblings, 0 replies; 31+ messages in thread
From: Lars Nilsson @ 2004-09-11 17:47 UTC (permalink / raw)
  To: caml-list

On Sat, 11 Sep 2004 16:56:11 +0100, Jon Harrop <jon@jdh30.plus.com> wrote:
> On Saturday 11 September 2004 16:30, Lars Nilsson wrote:
> > > StrongARM?
> >
> > You mean the technology they bought instead of develop it?
> 
> Intel have developed it quite substantially, IMHO. AFAIK, nobody else has made
> such a superscalar/deeply-pipelined CPU with such a conditional instruction
> set before, and that's not easy.
> 
> Cheers,
> Jon.

Sure, they have probably done quite a bit of innovation with it by
now. All I know is that back in 1987-88 I was happily doing assembly
GUI programming just because it (ARM) was so darn easy to work with,
while x86 made my stomach turn.

Lars

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [off topic] Re: [Caml-list] 32-bit is sticking around
  2004-09-11 16:24           ` [off topic] " David MENTRE
@ 2004-09-11 17:52             ` Lars Nilsson
  0 siblings, 0 replies; 31+ messages in thread
From: Lars Nilsson @ 2004-09-11 17:52 UTC (permalink / raw)
  To: caml-list

On Sat, 11 Sep 2004 18:24:37 +0200, David MENTRE
<dmentre@linux-france.org> wrote:
> Lars Nilsson <chamaeleon@gmail.com> writes:
> 
> > You mean the technology they bought instead of develop it?
> >
> > http://en.wikipedia.org/wiki/StrongARM
> 
> And you haven't seen all the gory details (e.g. doubles are stored in
> little endian but long long in big endian). :)
> 
> Yours,
> d.

Was it necessary to bring up the achilles heal of the ARM family
(floating point)? :)

Lars

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-11 14:36       ` Jon Harrop
@ 2004-09-11 20:53         ` Damien Doligez
  2004-09-12 15:33           ` Jon Harrop
  0 siblings, 1 reply; 31+ messages in thread
From: Damien Doligez @ 2004-09-11 20:53 UTC (permalink / raw)
  To: caml users

On Sep 11, 2004, at 16:36, Jon Harrop wrote:

> On Friday 10 September 2004 06:56, Christophe Raffalli wrote:
>> You could also lie in the tag about the size of array (if the way the
>> runtime finds free block of memory does not use it). It will cost an
>> increment of integer at each step in the initialisation process which
>> should not be much since the beginning of array may stay in the cache 
>> if
>> the initialisation function is simple and this will be neggligeable 
>> if not.
>
> What is the trickiest/most-error-prone part of doing this and could 
> this be
> used to implement amortised extensible arrays within the compiler? I 
> would
> like to see such a thing in the compiler (not that I distrust Markus 
> ;-).

The "if" part is false...  I suppose it might be possible to change
representations (true length in header and initialized part stored 
off-line /
initialized part in header / true length off-line) when the GC switches
between sweeping and marking.  We would also need several additional
primitives to create, update, and destroy the "off-line" part.  Frankly,
I find it hard to imagine that it would give a noticeable run-time
improvement on any program.  Most likely, it would make them all a 
fraction
of a percent slower, except for the most synthetic of benchmarks.

> On Friday 10 September 2004 14:45, Damien Doligez wrote:
>> An intermediate solution would
>> be to make a "Array.unsafe_make" primitive, which would use memset
>> instead of initialising the array properly.  If you enter this as
>> a feature wish in the BTS, I'll look into it.
>
> No, I don't think the performance improvement would justify your time 
> or the
> loss of safety.

No loss of safety if we don't export that primitive and use it only in
Array.init.  After all, it only breaks type safety, and only if you use
it wrong.

>  Could you add a memset to String.create though? :-)

String.make

> An array_init2 function which specialises the array type but uses an 
> external
> "f" function betters the time taken by Array.init by ~36% (perhaps not
> significantly different):

That's from getting rid of the float/non-float test.  Nothing to do with
the cost of initializing twice.

-- Damien

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-11 20:53         ` Damien Doligez
@ 2004-09-12 15:33           ` Jon Harrop
  2004-09-12 16:07             ` Basile Starynkevitch [local]
  0 siblings, 1 reply; 31+ messages in thread
From: Jon Harrop @ 2004-09-12 15:33 UTC (permalink / raw)
  To: caml users

On Saturday 11 September 2004 21:53, Damien Doligez wrote:
> ... Frankly,
> I find it hard to imagine that it would give a noticeable run-time
> improvement on any program.  Most likely, it would make them all a
> fraction
> of a percent slower, except for the most synthetic of benchmarks.

If it let you do extensible arrays safely then it could be used to reduce the 
asymptotic complexity of array append- or prepend-element from O(n) to 
amortized O(1) and "i"th insert from O(n) to amortized O(min{i, n-i}).

> > No, I don't think the performance improvement would justify your time
> > or the
> > loss of safety.
>
> No loss of safety if we don't export that primitive and use it only in
> Array.init.  After all, it only breaks type safety, and only if you use
> it wrong.

I was assuming that you would export it. My discrete wavelet transform (DWT) 
code, for example, is easily proven to fill all array elements but it fills 
them out of order (I've considered ways to make it fill in-order but they all 
incur significant performance penalties elsewhere, e.g. lots of swapping). So 
I was thinking of using unsafe_make for that. Still, I don't think it would 
be worth your writing it.

Having said that, it might be possible to abstract the array lookup and 
replace it with a clever function to work out where the "i"th element really 
is. Then you could give Array.init a continuation. Hmm...

> >  Could you add a memset to String.create though? :-)
>
> String.make

I meant for safety reasons - either don't export String.create or make it 
safe, e.g. "let create n = make n '\000'".

> > An array_init2 function which specialises the array type but uses an
> > external
> > "f" function betters the time taken by Array.init by ~36% (perhaps not
> > significantly different):
>
> That's from getting rid of the float/non-float test.  Nothing to do with
> the cost of initializing twice.

Yes. Optimising that would be much more productive. Am I right in thinking 
that ocamlopt doesn't hoist the test out of the loop?

Such optimisations would be best done at run-time, e.g. by Basile's JIT. Where 
is he? ;-)

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Gripes with array
  2004-09-12 15:33           ` Jon Harrop
@ 2004-09-12 16:07             ` Basile Starynkevitch [local]
  0 siblings, 0 replies; 31+ messages in thread
From: Basile Starynkevitch [local] @ 2004-09-12 16:07 UTC (permalink / raw)
  To: caml-list

On Sun, Sep 12, 2004 at 04:33:38PM +0100, Jon Harrop wrote:
> 
> Having said that, it might be possible to abstract the array lookup and 
> replace it with a clever function to work out where the "i"th element really 
> is. Then you could give Array.init a continuation. Hmm...
> 
> > >  Could you add a memset to String.create though? :-)
> >
> > String.make
> 
> I meant for safety reasons - either don't export String.create or make it 
> safe, e.g. "let create n = make n '\000'".
> 
> > > An array_init2 function which specialises the array type but uses an
> > > external
> > > "f" function betters the time taken by Array.init by ~36% (perhaps not
> > > significantly different):
> >
> > That's from getting rid of the float/non-float test.  Nothing to do with
> > the cost of initializing twice.
> 
> Yes. Optimising that would be much more productive. Am I right in thinking 
> that ocamlopt doesn't hoist the test out of the loop?

> Such optimisations would be best done at run-time, e.g. by Basile's
> JIT. Where is he? ;-)

I'm here (not for very long, I'm going back in a few days to CEA
working on other things, probably unrelated to Ocaml).

However, this optimisation is not for OcamlJIT, which is an
unoptimizing JIT. The main requirement of OcamlJIT was full
compatibility with Ocamlrun without touching a single line in the
ocaml/**/*.ml* sources (the ** is zsh notation) of Ocaml (actually, I
had to cheat for the toplevel where I added a single line, now
incorporated in 3.08).

Of course, one could dream of substantially change the bytecode
instruction set (for better JIT translation). But it won't be
compatible with Ocaml.

If you ask me, the one feature I would have liked in the bytecode
instruction set, is to add a LABEL bytecode, followed by a useless
word (used by ocamljit to store a pointer to machine code), and to
change the bytecode generator so that every closure had its code
pointer pointing to this LABEL bytecode.

But the rule of the OcamlJIT game was to leave Ocaml as much as
possible unchanged.

BTW, I would be delighted to have feedback of people using OcamlJit
with Ocaml 3.08.

Regards

-- 
Basile STARYNKEVITCH -- basile dot starynkevitch at inria dot fr
other email : basile at starynkevitch dot net
Project cristal.inria.fr - for a few days
http://cristal.inria.fr/~starynke --- all opinions are only mine 

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Array.init (was [Caml-list] Gripes with array)
  2004-09-10  5:56     ` Array.init (was [Caml-list] Gripes with array) Christophe Raffalli
  2004-09-10  8:53       ` Richard Jones
@ 2004-09-13  7:02       ` Christophe Raffalli
  1 sibling, 0 replies; 31+ messages in thread
From: Christophe Raffalli @ 2004-09-13  7:02 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: Jon Harrop, caml users, Damien Doligez

[-- Attachment #1: Type: text/plain, Size: 2454 bytes --]

Christophe Raffalli wrote:
> 
>>
>>>> Also, can Array.init be made to fill the elements only once?
>>>
>>>
>>> No, that's impossible without breaking the GC invariants.
>>
>>
> 
> This is not true if the runtime maintains a list of array in 
> construction with the current position of the index. This list will stay
> rather small, but each addresse read by the GC will have to be checked 
> for membership in the list. If the list is rather small it will be in 
> the cache, but still I am afraid it will slow down noticably the GC.
> 
> Did someone tried to implement such a list of partially initialized 
> objects in the GC ? (I do not think it is worth it ?)

> You could also lie in the tag about the size of array (if the way the 
> runtime finds free block of memory does not use it). It will cost an 
> increment of integer at each step in the initialisation process which 
> should not be much since the beginning of array may stay in the cache if 
> the initialisation function is simple and this will be neggligeable if not.
> 
> 
This second solution is only for non copying GC, so not for OCaml

And yet another solution would have to have a tag with the total size + 
the number of non pointers at the end of the block. This would be 
usefull not only for Arrays. It just makes again a (too) strong limit on 
the size of arrays.

I am now wondering why the structure of a block does not allow a second 
tag word (and even a third tag word) for the tag in case of big objects 
? A cons cell keeps a tag of only one word (that's still too much :-), 
but an array (or even only an array of size > 2^n) uses a bigger tag. 
Moreover this extra tag word could be at address (p-1) if p is the 
address of the object so that some code like unsafe.get do not need to 
access it.

The Gc and few other function like Array.length, Array.get in the bound 
check,.. would have an extra test to do, but the GC would have less 
memory to scan (because it would have to scan not all word in a block).


-- 
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2004-09-13  7:02 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-09  2:10 [Caml-list] Gripes with array Jon Harrop
2004-09-09  5:08 ` Ville-Pertti Keinonen
2004-09-09  7:17 ` Jean-Christophe Filliatre
2004-09-09  8:23   ` Richard Jones
2004-09-09  9:08     ` Olivier Andrieu
2004-09-09 12:08       ` Basile Starynkevitch [local]
2004-09-09 12:31         ` Damien Doligez
2004-09-09 10:42     ` Gerd Stolpmann
2004-09-09  9:37 ` Damien Doligez
2004-09-09 10:34   ` Jean-Christophe Filliatre
2004-09-09 12:15     ` Igor Pechtchanski
2004-09-09 13:01   ` Brian Hurt
2004-09-09 20:08     ` [Caml-list] 32-bit is sticking around Brandon J. Van Every
2004-09-09 21:04       ` Jon Harrop
2004-09-11 15:30         ` Lars Nilsson
2004-09-11 16:24           ` [off topic] " David MENTRE
2004-09-11 17:52             ` Lars Nilsson
     [not found]           ` <200409111656.11952.jon@jdh30.plus.com>
2004-09-11 17:47             ` Lars Nilsson
2004-09-09 16:58   ` [Caml-list] Gripes with array Jon Harrop
2004-09-10  5:56     ` Array.init (was [Caml-list] Gripes with array) Christophe Raffalli
2004-09-10  8:53       ` Richard Jones
2004-09-10 14:50         ` Damien Doligez
2004-09-13  7:02       ` Christophe Raffalli
2004-09-10 13:45     ` [Caml-list] Gripes with array Damien Doligez
2004-09-11  1:43       ` skaller
2004-09-11  3:16         ` skaller
2004-09-11 14:36       ` Jon Harrop
2004-09-11 20:53         ` Damien Doligez
2004-09-12 15:33           ` Jon Harrop
2004-09-12 16:07             ` Basile Starynkevitch [local]
2004-09-10 23:48 ` brogoff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).