caml-list - the Caml user's mailing list
* [Caml-list] Wish List for Large Mutable Objects
@ 2004-07-31 18:29 David McClain
  2004-08-01  3:36 ` Brandon J. Van Every
  2004-08-01  4:06 ` Brandon J. Van Every
  0 siblings, 2 replies; 11+ messages in thread
From: David McClain @ 2004-07-31 18:29 UTC (permalink / raw)
  To: caml

Something I would like to see appear in the OCaml libraries, and I don't
have it yet myself, is the use of Copy-on-Write and Scatter-Gather applied
to large mutable objects such as BigArrays. When a request to copy the
object arrives, it is immediately satisfied in mere nanoseconds, delaying
the actual copying operation until (if ever) some code attempts to mutate
one of the cells. Actual copying would frequently be a huge undertaking that
costs a great deal in runtime performance.
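
A minimal sketch of the copy-on-write idea described above, assuming a
one-dimensional float64 Bigarray. The `cow` type and the `make`, `copy`,
`get` and `set` functions are hypothetical names, not part of the Bigarray
library. There is no reference count here, so the original handle may make
one unnecessary copy after its sibling has already detached.

```ocaml
open Bigarray

(* Hypothetical copy-on-write handle around a float64 Bigarray. *)
type cow = {
  mutable data : (float, float64_elt, c_layout) Array1.t;
  mutable shared : bool;  (* true while [data] may be aliased by another handle *)
}

let make n x =
  let a = Array1.create float64 c_layout n in
  Array1.fill a x;
  { data = a; shared = false }

(* "Copying" is O(1): both handles point at the same storage. *)
let copy t =
  t.shared <- true;
  { data = t.data; shared = true }

let get t i = t.data.{i}

(* The real copy is deferred until the first write through a shared handle. *)
let set t i x =
  if t.shared then begin
    let fresh = Array1.create float64 c_layout (Array1.dim t.data) in
    Array1.blit t.data fresh;
    t.data <- fresh;
    t.shared <- false
  end;
  t.data.{i} <- x
```

Reads through either handle never trigger a copy; only `set` does.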

The Scatter-Gather would be useful in managing arrays where only a small
part of the array has actually been mutated. Perhaps some kind of frame
paging applied to the array proper. That way the COW only has to replicate
small portions of the array for the user who requested a copy.
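
One way to picture the paged variant is sketched below. It uses ordinary
float arrays as pages rather than Bigarrays, and the page size, the `paged`
type and every function name are assumptions made for illustration only.
Indices are not checked against `length`.

```ocaml
(* Hypothetical paged copy-on-write array of floats.  Pages are shared
   between copies until one side writes to them. *)
let page_size = 1024  (* elements per page; an arbitrary choice *)

type paged = {
  pages : float array array;  (* page table; entries may be shared *)
  owned : bool array;         (* owned.(p): this handle may mutate page p *)
  length : int;
}

let make n x =
  let npages = (n + page_size - 1) / page_size in
  { pages = Array.init npages (fun _ -> Array.make page_size x);
    owned = Array.make npages true;
    length = n }

(* Copying duplicates only the page table, not the pages themselves. *)
let copy t =
  Array.fill t.owned 0 (Array.length t.owned) false;
  { pages = Array.copy t.pages;
    owned = Array.make (Array.length t.pages) false;
    length = t.length }

let get t i = t.pages.(i / page_size).(i mod page_size)

(* Only the page being written is replicated. *)
let set t i x =
  let p = i / page_size in
  if not t.owned.(p) then begin
    t.pages.(p) <- Array.copy t.pages.(p);
    t.owned.(p) <- true
  end;
  t.pages.(p).(i mod page_size) <- x
```

After a single write, the two handles still share every page except the
one that was touched.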

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Caml-list] Wish List for Large Mutable Objects
  2004-07-31 18:29 [Caml-list] Wish List for Large Mutable Objects David McClain
@ 2004-08-01  3:36 ` Brandon J. Van Every
  2004-08-02  5:28   ` Brandon J. Van Every
  2004-08-01  4:06 ` Brandon J. Van Every
  1 sibling, 1 reply; 11+ messages in thread
From: Brandon J. Van Every @ 2004-08-01  3:36 UTC (permalink / raw)
  To: caml

David McClain wrote:
>
> Something I would like to see appear in the OCaml libraries,
> and I don't have it yet myself, is the use of Copy-on-Write
> and Scatter-Gather applied to large mutable objects such as
> BigArrays.
>
> [and some other things in other posts]

OCaml is not a lazy language.  Is it reasonable to expect Bigarray to
perform lazy copies, under any permutation of complexity you have in
mind?

It seems like what you really want is to design the memory management of
an Operating System.  If your files are so huge that they don't fit in
main memory, why aren't you willing to use virtual memory?
Scatter-Gather DMA is a device driver level capability that's hardware
dependent.  I don't see why a high level language like OCaml should be
exposing that kind of functionality, and I'm not entirely sure if it
should be doing it under the hood either.  If your OS doesn't have the
kind of memory management you want, maybe you should modify an open
source OS, like Linux or BSD Unix, to do what you want?

You ask why Array1, Array2, Array3 should be special cases.  Well,
clearly because they're the most common, and you can perform access
optimizations for each of these common cases.  It seems that you are
only thinking of ***BIG*** arrays, i.e. your problems and nobody else's.
Lotsa people don't have your notion of 'big'.  Indeed, I don't
personally care about arrays being particularly big.  100MB would be
pretty darn big for what I do in game development right now.  I do care
about their contents being unboxed.  If someone wanted to rename
Bigarray to UnboxedArray, that would suit my own priorities just fine.

I don't understand the "starting from zero" complaint, with respect to
arbitrary file formats.  If the file format is arbitrarily structured,
it is not an array.  You will have to read it some other way.  Arrays
are, generally speaking, composed of uniform elements.  At least, that's
how all of us pedal-to-the-metal guys view them.  I suppose high level
language guys often define the word 'array' to mean anything they want,
like a list or a hash table or a map or whatever, but I don't think they
should.

I don't see why your Scientific notion of an 'infinite array' should be
a basic language interface.  What would be so difficult about building
your favorite array windowing scheme on top of the basic fixed length
components, and calling that a library?  Like 'InfiniteArray' or
something.  Then you'd write some access functions in some syntax you
like, it would behave the way you like, and for your problems you'd be
good.  I've done similar things to perform addressing on icosahedrons,
to try to regularize the mathematics of a tiling of it.  I don't bother
the user about it, my functions just do some computing to make it all
work under the hood.
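
As a rough illustration of such a layer, here is a sketch of a window over
a fixed-length Bigarray whose user-visible indices start at an arbitrary
origin. This is only one possible reading of a "windowing scheme"; the
`window` type and its functions are hypothetical.

```ocaml
open Bigarray

(* Hypothetical window onto a fixed-length Bigarray: user indices run
   from [lo] to [lo + dim - 1] instead of starting at zero. *)
type window = {
  store : (float, float64_elt, c_layout) Array1.t;
  lo : int;  (* user-visible index of store.{0} *)
}

let window_of store ~lo = { store; lo }

(* Access functions translate user indices; bounds checking is left to
   the underlying Bigarray. *)
let get w i = w.store.{i - w.lo}
let set w i x = w.store.{i - w.lo} <- x
```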

I do wish Bigarray handled heterogeneous C structures.  Homogeneous
arrays impose some design and interop constraints.

Finally, I'm told that the "%" in the names of called functions in the
sources means that ocamlopt generates different, better code.  The C
routines are ignored; they're only used for ocamlc.
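
For readers who have not seen the notation, this is roughly what such a
declaration looks like. The example uses the array-length primitive rather
than a Bigarray one, and `my_length` is a made-up name. How each compiler
treats a given "%" primitive is best checked against the compiler sources.

```ocaml
(* A name starting with "%" refers to a primitive known to the compiler,
   not to a C symbol resolved at link time. *)
external my_length : 'a array -> int = "%array_length"

let n = my_length [| 1; 2; 3 |]
```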


Cheers,                         www.indiegamedesign.com
Brandon Van Every               Seattle, WA

20% of the world is real.
80% is gobbledygook we make up inside our own heads.



* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-01  4:06 ` Brandon J. Van Every
@ 2004-08-02  2:38   ` David McClain
  2004-08-02  3:20     ` Brandon J. Van Every
  2004-08-04  7:24     ` Alex Baretta
  0 siblings, 2 replies; 11+ messages in thread
From: David McClain @ 2004-08-02  2:38 UTC (permalink / raw)
  To: Brandon J. Van Every, caml

Okay now... not trying to start any flame wars, but you "guys in the
trenches" so to speak seem a bit short on real life experience outside of
your own field.

I have a perfectly good running VM as user process library running right now
in C++ that allows for mixed array files, arbitrary offsets into the file
for various array pointers, and this is all transparent to the user just as
I indicated in my wish list for OCaml.

In more than 20 years of scientific data access and analysis I have only
seen uniform arrays, one per file, generated by neophytes. In just about
every case I can remember (NetCDF, HDF, FITS, RIFF Wave files, MPEG, etc.),
these are all compound object files. The trouble with the simple-minded
approach of one array per file is that most data acquisitions will then end
up with dozens of component data files and it becomes a tracking nightmare
to keep them all coordinated. Not so if you permit compound document files.

With a language as rich and wonderful as OCaml, I really can't understand
your hostility to useful additions to the language. If you don't want to
play, you don't have to join my sandbox -- find another.

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com



* RE: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  2:38   ` David McClain
@ 2004-08-02  3:20     ` Brandon J. Van Every
  2004-08-02  3:32       ` David McClain
  2004-08-04  7:24     ` Alex Baretta
  1 sibling, 1 reply; 11+ messages in thread
From: Brandon J. Van Every @ 2004-08-02  3:20 UTC (permalink / raw)
  To: caml

David McClain wrote:
>
> I have a perfectly good running VM as user process library
> running right now
> in C++ that allows for mixed array files, arbitrary offsets
> into the file
> for various array pointers, and this is all transparent to
> the user just as I indicated in my wish list for OCaml.

But it doesn't do scatter-gather DMA.  A user process only grants so
much control, and you seem to want an awful lot of control.  Hence my
suggestion that you tweak an OS.

> In more than 20 years of scientific data access and analysis
> I have only
> seen uniform arrays, one per file, generated by neophytes. In
> just about
> every case I can remember (NetCDF, HDF, FITS, RIFF Wave files,
> MPEG, etc.), these are all compound object files.

Us neophytes call them 'file formats'.  They aren't arrays.  I think
we'll be at loggerheads until we agree what an 'array' is.

> The trouble with the simple minded
> approach of one array per file is that most data acquisitions
> will then end
> up with dozens of component data files and it becomes a
> tracking nightmare
> to keep them all coordinated. Not so if you permit compound
> document files.

What does this have to do with Bigarray?  Bigarray provides uniform
basic types in unboxed consecutive memory locations, ala C or Fortran.
That's the entire point, to communicate with arrays as C and Fortran do
them.  Why are you expecting it to be something exceedingly different?
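
A small sketch of what "as C and Fortran do them" means in practice, using
only calls from the standard Bigarray module. The variable names are
arbitrary.

```ocaml
open Bigarray

(* Two 2x3 arrays of unboxed doubles, one per layout. *)
let c = Array2.create float64 c_layout 2 3
let f = Array2.create float64 fortran_layout 2 3

let () =
  Array2.fill c 0.0;
  Array2.fill f 0.0;
  c.{0, 1} <- 2.0;  (* C layout: indices start at 0, rows are contiguous *)
  f.{2, 1} <- 3.0   (* Fortran layout: indices start at 1, columns are contiguous *)

(* Flat views over the same storage make the memory order visible. *)
let flat_c = reshape_1 (genarray_of_array2 c) 6
let flat_f = reshape_1 (genarray_of_array2 f) 6
```

In the flat views, the C element written above lands at index 1 (0-based)
and the Fortran element at index 2 (1-based), i.e. the second slot in
memory in both cases.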

> With a language as rich and wonderful as OCaml, I really
> can't understand your hostility

I haven't spoken with hostility.  I gather you're somewhat attached to
your problems, to view my comments as hostility.

> to useful additions to the language.

Clearly, you think your ideas are useful to you.  Whether others think
they're useful to them, remains to be seen.

> If you don't want to
> play, you don't have to join my sandbox -- find another.

You've lost me here.  Are you saying that if you hear feedback you don't
like, that those giving the feedback should leave caml-list or just be
quiet?


Cheers,                         www.indiegamedesign.com
Brand*n Van Every               S*attle, WA

Praise Be to the caml-list Bayesian filter! It blesseth
my postings, it is evil crap!  evil crap!  Bigarray!
Unboxed overhead group!  Wondering!  chant chant chant...



* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  3:20     ` Brandon J. Van Every
@ 2004-08-02  3:32       ` David McClain
  2004-08-02  5:14         ` Brandon J. Van Every
  0 siblings, 1 reply; 11+ messages in thread
From: David McClain @ 2004-08-02  3:32 UTC (permalink / raw)
  To: Brandon J. Van Every, caml

... I'm open to reasoned feedback, of course. Yours seemed overtly hostile
to me, as though you were somehow protecting the virtue of your young
sister, OCaml, against inferred accostations.

How is it you claim to speak for my C++ manager about scatter gather? It
appears that you have some real boundary issues here, and this probably
needs to be taken offline...

I was actually addressing most of my comments to the language designers
themselves, without referring to them by name. I am perfectly capable of
adding such primitives to the core language myself. But I was offering some
useful insight into the way that scientists view the universe, as contrasted
with conventional programming language design. If the language would choose
to implement some of these additions it could become more immediately
attractive to the audience in my corner of the universe. That's all....

I see, after thinking some time about the Array1, Array2, etc., versus
Generic Arrays, that Xavier et al needed to protect the typability of their
language, and so they made a concession to the masses in restricting the
convenient x.{ix1, ix2}, etc. syntax to the more common uses. In any event,
handling arbitrary arrays, I'm unlikely to use this syntax anyway,
preferring the more general Get/Set primitives on computed index lists. So,
in this case, I have answered my own question, and I'm not really losing
anything by their choice.
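
The two access styles being compared look roughly like this. The array
sizes and names are arbitrary; the calls are from the standard Bigarray
module.

```ocaml
open Bigarray

let a2 = Array2.create float64 c_layout 3 4
let () = Array2.fill a2 0.0

(* Fixed-rank syntax: the number of indices is known statically. *)
let () = a2.{1, 2} <- 42.0

(* Generic access: the index list is an ordinary int array, so it can be
   computed at run time. *)
let g = genarray_of_array2 a2
let idx = [| 1; 2 |]
let v = Genarray.get g idx
let () = Genarray.set g [| 2; 3 |] 7.0
```

Both views share the same storage, so a write through one is visible
through the other.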

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com




* RE: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  3:32       ` David McClain
@ 2004-08-02  5:14         ` Brandon J. Van Every
  2004-08-02  8:00           ` Ville-Pertti Keinonen
  0 siblings, 1 reply; 11+ messages in thread
From: Brandon J. Van Every @ 2004-08-02  5:14 UTC (permalink / raw)
  To: caml

David McClain wrote:
>
> How is it you claim to speak for my C++ manager about scatter
> gather?

I used to write 3d device drivers for a living.  I was never much for
the 'nasty innards' of OS internals, preferring to concentrate on ASM
loop optimizations for 3d graphics.  That said, Scatter-Gather DMA is
generally a property of a memory controller, i.e. a chipset on a
motherboard.  I seriously doubt you have user mode access to such memory
controllers.  If you do, point me at the API for it.  I'm happy to stand
corrected, but as far as I know, scatter-gather DMA is kernel mode stuff
on all common architectures.

There is the outside possibility that you mean something different by
'scatter-gather' than a device driver writer means by 'scatter-gather'.
A similar loggerhead to what an 'array' is.

A third possibility is you have written a library that assumes
scatter-gather DMA is happening under the hood somehow, but doesn't
explicitly control it in any way.  To which I say, memory controllers
are different.  In the absence of a query interface to determine their
capabilities, I don't see how you'd rigorously control algorithmic
performance.  Maybe you do not regard rigor as so important - memory
cache hierarchies sorta work without anyone doing anything explicit,
after all.  But I would say, without rigor, you probably won't end up
with anything.  Just an idea that something should be fast under some
circumstances, rather than any proven, repeatable reality.

For clarity, these aren't personal comments.  This is just my
understanding of scatter-gather DMA vs. whatever your understanding is.


Cheers,                         www.indiegamedesign.com
Brand*n Van Every               S*attle, WA

Praise Be to the caml-list Bayesian filter! It blesseth
my postings, it is evil crap!  evil crap!  Bigarray!
Unboxed overhead group!  Wondering!  chant chant chant...


* RE: [Caml-list] Wish List for Large Mutable Objects
  2004-08-01  3:36 ` Brandon J. Van Every
@ 2004-08-02  5:28   ` Brandon J. Van Every
  0 siblings, 0 replies; 11+ messages in thread
From: Brandon J. Van Every @ 2004-08-02  5:28 UTC (permalink / raw)
  To: caml

Brandon J. Van Every wrote:
>
> I do wish Bigarray handled heterogeneous C structures.  Homogeneous
> arrays impose some design and interop constraints.

I meant, I wish any one Bigarray could handle any one type of C
structure.  I will be looking at how to fake this behavior.
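
One way to fake this, sketched under the assumption that every field of the
C structure has the same type (here a hypothetical `struct { double x, y,
z; }`), is to store the fields back to back in a one-dimensional Bigarray
and address records by stride. All names below are made up for the example.
Structures with mixed field types would need a byte-level layout instead.

```ocaml
open Bigarray

(* Hypothetical record layout: struct { double x, y, z; } *)
let stride = 3

type vec3_array = (float, float64_elt, c_layout) Array1.t

let create n : vec3_array =
  let a = Array1.create float64 c_layout (n * stride) in
  Array1.fill a 0.0;
  a

(* Record i occupies elements [i * stride, i * stride + 2]. *)
let set_vec (a : vec3_array) i (x, y, z) =
  let base = i * stride in
  a.{base} <- x;
  a.{base + 1} <- y;
  a.{base + 2} <- z

let get_vec (a : vec3_array) i =
  let base = i * stride in
  (a.{base}, a.{base + 1}, a.{base + 2})
```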


Cheers,                         www.indiegamedesign.com
Brand*n Van Every               S*attle, WA

Praise Be to the caml-list Bayesian filter! It blesseth
my postings, it is evil crap!  evil crap!  Bigarray!
Unboxed overhead group!  Wondering!  chant chant chant...


* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  5:14         ` Brandon J. Van Every
@ 2004-08-02  8:00           ` Ville-Pertti Keinonen
  2004-08-02  9:12             ` David McClain
  0 siblings, 1 reply; 11+ messages in thread
From: Ville-Pertti Keinonen @ 2004-08-02  8:00 UTC (permalink / raw)
  To: Brandon J. Van Every; +Cc: caml

Brandon J. Van Every wrote:

>loop optimizations for 3d graphics.  That said, Scatter-Gather DMA is
>generally a property of a memory controller, i.e. a chipset on a
>motherboard.  I seriously doubt you have user mode access to such memory
>controllers.  If you do, point me at the API for it.  I'm happy to stand
>corrected, but as far as I know, scatter-gather DMA is kernel mode stuff
>on all common architectures.
>  
>
Often things like the readv(2)/writev(2) interface are referred to as
"scatter-gather".  It just means performing I/O, in a single operation,
on regions of memory that aren't contiguous.

I'm not sure what David McClain is referring to - but I think it's the 
ability for an "array" to provide another level of virtualization so 
that the underlying data needn't be contiguous in the address space of 
the process.

That seems a bit excessive - a more limited part of what he's suggesting 
could be more reasonable - being able to map a part of a file, from an 
arbitrary offset, as a Bigarray could be useful.  Even this would 
require some separation of storage management from the actual "Bigarray 
header", since operating systems require the underlying mappings to be 
page aligned.  I suspect this could be as simple as passing a 
page-truncated offset to mmap(2), adding the remaining offset to the 
returned address and page-truncating the address passed to munmap(2).
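
The offset arithmetic described above can be sketched as follows. This is
plain integer arithmetic with no system calls; `split_offset` and
`mapping_extent` are hypothetical helper names, and offsets are assumed to
be non-negative. Current OCaml's `Unix.map_file` accepts a `?pos` argument,
so it is worth checking its documentation before doing this by hand.

```ocaml
(* Split a file offset into a page-aligned base (what mmap would be given)
   and the remainder to add to the returned address. *)
let split_offset ~page_size off =
  let base = off - (off mod page_size) in
  (base, off - base)

(* Extent of the mapping needed to expose [len] bytes starting at [off]:
   returns (aligned base, offset within the mapping, bytes to map). *)
let mapping_extent ~page_size ~off ~len =
  let (base, delta) = split_offset ~page_size off in
  (base, delta, delta + len)
```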

This doesn't address the problem with most CPUs requiring the actual 
objects to be aligned, for which adding an "offset" between the 
beginning of the mapping and the beginning of the visible array isn't 
sufficient if the mapped file doesn't align things appropriately for the 
CPU (for arbitrary file formats, there are also endianness issues to
consider).

Using subarrays instead of offsetted memory mappings protects against 
this, and makes offsetted mappings "unnecessary" altogether, but 
obviously if the file is big enough, the waste of address space due to 
the extra mappings can be significant on 32-bit systems...which as I 
understand was part of the original problem.
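
For reference, a subarray in the existing Bigarray interface is a view onto
the parent's storage, not a copy. A short sketch with arbitrary sizes:

```ocaml
open Bigarray

let whole = Array1.create int32 c_layout 10
let () = Array1.fill whole 0l

(* [part] covers elements 4, 5 and 6 of [whole] and shares their storage. *)
let part = Array1.sub whole 4 3

let () = part.{0} <- 99l
```

The write through `part` is visible as element 4 of `whole`.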

Typing this on an Athlon 64 and sitting next to an Alpha, such things 
seem like legacy issues to me, especially since OCaml supports both 
architectures natively.

The main issue I have with the suggestions regarding turning Bigarrays 
into higher-level abstractions altogether is that it would make them 
considerably less efficient.  The higher-level abstractions can always 
be implemented as layers on top of the current abstractions, which in my 
opinion is the right approach.

Of course it isn't my decision in any way.


* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  8:00           ` Ville-Pertti Keinonen
@ 2004-08-02  9:12             ` David McClain
  0 siblings, 0 replies; 11+ messages in thread
From: David McClain @ 2004-08-02  9:12 UTC (permalink / raw)
  To: caml

Some good points here by Ville-Pertti,

Indeed the Scientific mode requires a balanced modulus operation on each
array index, not the one presently offered by OCaml's Pervasives. But this is
used in lieu of bounds checking anyway, and the world has come to accept the
slight cost of array bounds checking.
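
The exact definition of the "balanced modulus" is not given here. As an
assumption, the sketch below uses a modulus whose result is never negative,
which is the usual way to make indices wrap around; OCaml's built-in `mod`
returns a negative result for a negative dividend. The function names are
hypothetical.

```ocaml
(* Wrap an arbitrary integer index into the range [0, n).  OCaml's
   built-in [mod] would give (-1) mod 5 = -1, which is not a valid index. *)
let wrap_index n i =
  let r = i mod n in
  if r < 0 then r + n else r

(* Array access with wrap-around in place of a bounds failure. *)
let get_wrapped a i = a.(wrap_index (Array.length a) i)
```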

There are really two issues that sort of got mixed together here, only
because BigArray mixed them up... One is the use of Scientific mode for some
arrays. The other is memory mapped arrays. These are really two separate
issues, and the extra cost on accessing mmapped arrays is worth the price
over the cost of slower buffered file I/O. It wouldn't be an acceptable cost
for normal memory bound arrays.

Some processors do have alignment requirements, but every file system I was
referring to always guarantees a minimal alignment based on the underlying
array element type. These alignments generally coincide with the most
stringent alignment requirements in use today. Some processors like the G4
appear to be more lax on alignment requirements, but my bet is that
misaligned data cause some slowdown. I think the X86 architectures operate
this way too.

However, you do raise an interesting point about endianness. The more
portable file formats have generally accepted network byte ordering,
generally by incorporating old Sun-XDR data representations. And indeed for
memory mapped arrays, this would be an extra cost. But still, in this case,
it would be far faster than buffered file I/O. My own tests show that a more
or less random access pattern in the mmapped array is 200 times faster than
fread/fwrite-style data access. So any additional machine cycles can
easily be hidden in that performance difference.
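
To give a feel for how small the per-access cost is, here is a minimal sketch
of the byte swap needed to read a 32-bit big-endian (network order) value on a
little-endian machine. The function name `swap32` is hypothetical; it uses
only the standard `Int32` module and amounts to a handful of shifts and masks
per element.

```ocaml
(* Reverse the four bytes of a 32-bit integer (big-endian <-> little-endian).
   Hypothetical helper using only Stdlib.Int32. *)
let swap32 (x : int32) : int32 =
  let open Int32 in
  logor (shift_left (logand x 0xFFl) 24)
    (logor (shift_left (logand x 0xFF00l) 8)
       (logor (logand (shift_right_logical x 8) 0xFF00l)
          (logand (shift_right_logical x 24) 0xFFl)))
```

Applying it twice returns the original value, so the same function serves for
both reading and writing.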

But again, let's separate these two issues. I generally know when I'm
accessing a mmapped array and when I'm not. I had to offer up a filename in
order to do mmapping... The only reason these two conversation threads
merged is because when I read the BigArray documentation, I found out that
these offer a primitive form of mmapped access in addition to normal memory
bound array accessing.

Not sure what multiple mappings you were referring to... I meant to allow a
kind of scatter-gather COW on normal memory bound arrays. Mmapped arrays
are a problem apart from this. Despite what might appear as a cost overhead,
the savings can be quite significant when combined with smart array slicing
and sectioning.
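
The scatter-gather COW idea can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not an existing library: the
type `cow_array`, the functions `create`, `copy`, `get` and `set`, and the
default page size are all hypothetical, and plain float arrays stand in for
BigArrays. A copy shares every page with the original; a page is duplicated
only when one side first writes to it.

```ocaml
(* A paged float array with copy-on-write pages (hypothetical sketch). *)
type cow_array = {
  page_size : int;
  length : int;
  pages : float array array;  (* pages may be shared between copies *)
  owned : bool array;         (* owned.(p): page p is private to this handle *)
}

let create ?(page_size = 1024) length init =
  let n_pages = (length + page_size - 1) / page_size in
  { page_size; length;
    pages = Array.init n_pages (fun _ -> Array.make page_size init);
    owned = Array.make n_pages true }

(* O(number of pages), not O(length): only the page table is duplicated. *)
let copy a =
  (* After a copy, neither handle may write to a shared page in place. *)
  Array.fill a.owned 0 (Array.length a.owned) false;
  { a with pages = Array.copy a.pages;
           owned = Array.make (Array.length a.owned) false }

let get a i =
  if i < 0 || i >= a.length then invalid_arg "get";
  a.pages.(i / a.page_size).(i mod a.page_size)

let set a i v =
  if i < 0 || i >= a.length then invalid_arg "set";
  let p = i / a.page_size in
  (* Duplicate the page on first write if it might still be shared. *)
  if not a.owned.(p) then begin
    a.pages.(p) <- Array.copy a.pages.(p);
    a.owned.(p) <- true
  end;
  a.pages.(p).(i mod a.page_size) <- v
```

The ownership flag is deliberately conservative: a page may be copied even
when the other handle has already made its own private copy. A reference
count per page would avoid that at the cost of more bookkeeping.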

For example, in my NML, whenever I do an array slice (more complex
operations than supported by BigArray), what I actually do is pay the price
of all the if-then-else branching on only the first descent, generating a
tree of lambda closures on the way back out, so that all the actual copying
operation occurs without any more testing along the way. Sort of like
reaching down your throat and pulling yourself inside-out... heh!
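
A minimal sketch of that staging idea, for a one-dimensional slice only. The
type `spec` and the functions `compile` and `slice` are hypothetical names
chosen for illustration and are not taken from NML. The case analysis on the
slice description happens once, in `compile`; the copying loop then calls the
resulting closure without testing the description again.

```ocaml
(* Hypothetical slice descriptions. *)
type spec =
  | All
  | Range of int * int          (* first, count *)
  | Stride of int * int * int   (* first, step, count *)

(* Branch once on the spec; return the slice length and a closure that
   maps a destination index to a source index. *)
let compile spec ~len =
  match spec with
  | All -> (len, fun i -> i)
  | Range (first, count) -> (count, fun i -> first + i)
  | Stride (first, step, count) -> (count, fun i -> first + i * step)

(* The copy loop itself contains no case analysis on the spec. *)
let slice spec (src : float array) =
  let count, index = compile spec ~len:(Array.length src) in
  Array.init count (fun i -> src.(index i))
```

For example, `slice (Stride (1, 2, 3)) [| 0.; 1.; 2.; 3.; 4.; 5.; 6. |]`
yields `[| 1.; 3.; 5. |]`. For multi-dimensional slices, one such closure per
axis could be composed into a tree in the same way.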

The speed of these compound slicings is enormously faster than conventional
imperative logic. So while some operations are more costly, others benefit
greatly from higher order logic. In fact, a simple-minded analysis shows
that if you ever intend to read or write a mutating representation array
then it pays to simply create a native double array once, and pay the cost
of representation mutation just once, and then allow repeated non-mutating,
faster, accesses to the underlying data. Keeping the array around in a
foreign format just adds incremental costs that will exceed this copying
cost, if you hit every element several times.
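
As a sketch of the convert-once approach, assume the foreign format is a
buffer of big-endian IEEE doubles (as in XDR). The function name `to_native`
is hypothetical, and `Bytes.get_int64_be` requires OCaml 4.08 or later, which
postdates this thread. The conversion cost is paid once per element, after
which every access is an ordinary float array read.

```ocaml
(* Decode a buffer of big-endian 64-bit IEEE doubles into a native
   float array in a single pass (hypothetical helper). Any trailing
   bytes that do not form a whole element are ignored. *)
let to_native (buf : Bytes.t) : float array =
  let n = Bytes.length buf / 8 in
  Array.init n (fun i ->
      Int64.float_of_bits (Bytes.get_int64_be buf (i * 8)))
```

If each element is read k times, decoding on every access costs k conversions
per element, while this approach costs one conversion plus one store, so it
wins as soon as k exceeds a small constant.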

But as often as not, we do slice interesting sections from the data. Not
sure if this ever happens without first hitting every element on a
vectorized math op... My guess is no... and so the cost of copying must
occur no matter what.

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com




* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  2:38   ` David McClain
  2004-08-02  3:20     ` Brandon J. Van Every
@ 2004-08-04  7:24     ` Alex Baretta
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Baretta @ 2004-08-04  7:24 UTC (permalink / raw)
  To: David McClain, Ocaml

David McClain wrote:

> With a language as rich and wonderful as OCaml, I really can't understand
> your hostility to useful additions to the language. If you don't want to
> play, you don't have to join my sandbox -- find another.
> 

The language is truly great, but that doesn't mean that all libraries 
we've ever dreamt of are actually available. I think your idea of 
windowed mmapped access to files would make an excellent library. Only, 
don't ask Xavier to do it.

Alex


end of thread, other threads:[~2004-08-04  7:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-07-31 18:29 [Caml-list] Wish List for Large Mutable Objects David McClain
2004-08-01  3:36 ` Brandon J. Van Every
2004-08-02  5:28   ` Brandon J. Van Every
2004-08-01  4:06 ` Brandon J. Van Every
2004-08-02  2:38   ` David McClain
2004-08-02  3:20     ` Brandon J. Van Every
2004-08-02  3:32       ` David McClain
2004-08-02  5:14         ` Brandon J. Van Every
2004-08-02  8:00           ` Ville-Pertti Keinonen
2004-08-02  9:12             ` David McClain
2004-08-04  7:24     ` Alex Baretta
