caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Array 4 MB size limit
@ 2006-05-15 18:12 akalin
  2006-05-15 18:22 ` [Caml-list] " Nicolas Cannasse
                   ` (4 more replies)
  0 siblings, 5 replies; 67+ messages in thread
From: akalin @ 2006-05-15 18:12 UTC (permalink / raw)
  To: caml-list

I'm running into cases where the 4 MB limit on arrays is starting to 
become a problem.  Lists are much slower and cause seg faults for me on 
the same data set, and Bigarrays are a partial solution because I'd 
like to be able to store arbitrary types in arrays (not just numbers).

I was greatly surprised when I found out there was such a low limit on 
arrays.  Is there a reason for this?  Will this limit ever be increased?

Is the limit a limit on the number of elements or the total size?  The 
language in Sys.max_array_size implies the former, but the fact the 
limit is halved for floats implies the latter.  If I had a record type 
with 5 floats, will the limit then by Sys.max_array_size / 10? Is there 
some sort of existing ArrayList module that works around this problem?  
Ideally, I'd like to have something like C++'s std::vector<> type, 
which can be dynamically resized.  Do I have to write my own? :(

Also, the fact that using lists crashes for the same data set is 
surprising.  Is there a similar hard limit for lists, or would this be 
a bug?  Should I post a test case?

Thanks!

Frederick Akalin


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-15 18:12 Array 4 MB size limit akalin
@ 2006-05-15 18:22 ` Nicolas Cannasse
  2006-05-15 20:32 ` Damien Doligez
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 67+ messages in thread
From: Nicolas Cannasse @ 2006-05-15 18:22 UTC (permalink / raw)
  To: akalin; +Cc: caml-list

> Also, the fact that using lists crashes for the same data set is 
> surprising.  Is there a similar hard limit for lists, or would this be a 
> bug?  Should I post a test case?
> 
> Thanks!
> 
> Frederick Akalin

This is most probably because some list operation is causing a Stack 
Overflow (List.map if I remember correctly). You can use ExtLib 
(http://ocaml-libs.sf.net) for example that provides tail-recursive List 
operations.

Nicolas


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-15 18:12 Array 4 MB size limit akalin
  2006-05-15 18:22 ` [Caml-list] " Nicolas Cannasse
@ 2006-05-15 20:32 ` Damien Doligez
  2006-05-15 21:27   ` akalin
  2006-05-15 22:51 ` Oliver Bandel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 67+ messages in thread
From: Damien Doligez @ 2006-05-15 20:32 UTC (permalink / raw)
  To: caml users

On 2006-05-15, at 20:12, akalin@akalin.cx wrote:

> I'm running into cases where the 4 MB limit on arrays is starting  
> to become a problem.  Lists are much slower and cause seg faults  
> for me on the same data set, and Bigarrays are a partial solution  
> because I'd like to be able to store arbitrary types in arrays (not  
> just numbers).

You could move to a two-level solution (an array of arrays).  The  
code is not
hard to write, but it does entail a performance hit, which may or may  
not matter
for your application.

> I was greatly surprised when I found out there was such a low limit  
> on arrays.  Is there a reason for this?  Will this limit ever be  
> increased?

It will be increased when you get a 64-bit machine ;-)

> Also, the fact that using lists crashes for the same data set is  
> surprising.  Is there a similar hard limit for lists, or would this  
> be a bug?  Should I post a test case?

I bet my mustache these crashes are stack overflows.


-- Damien


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-15 20:32 ` Damien Doligez
@ 2006-05-15 21:27   ` akalin
  0 siblings, 0 replies; 67+ messages in thread
From: akalin @ 2006-05-15 21:27 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml users

Quoting Damien Doligez <damien.doligez@inria.fr>:

> On 2006-05-15, at 20:12, akalin@akalin.cx wrote:
>
>> I was greatly surprised when I found out there was such a low limit  
>> on arrays.  Is there a reason for this?  Will this limit ever be  
>> increased?
>
> It will be increased when you get a 64-bit machine ;-)

Of course it will. :) But what I meant was that the limit is nowhere 
near 32 bits (4 MB is only 22 bits), which leads me to believe that the 
limit is due to the implementation of arrays.  I was asking if there 
would be an improved implemetation of arrays which would increase this 
limit (up to 32 bits).

>> Also, the fact that using lists crashes for the same data set is  
>> surprising.  Is there a similar hard limit for lists, or would this  
>> be a bug?  Should I post a test case?
>
> I bet my mustache these crashes are stack overflows.

I was thinking otherwise, as it seemed to be crashing only when 
building the list.  The list is built by a parser rule which is reading 
from a line of floats.  The rule itself is recursive, which does imply 
a stack overflow, (or does it? ocamlyacc generates a LALR or some 
variant, right?) but changing that rule to use a Queue worked fine. 
(Until I tried to convert it into an array, which led me to discover 
the 4 MB limit)

Incidentally, what data structure does the Queue use internally? I 
didn't run into any of the problems I did with lists or arrays.

Frederick Akalin


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-15 18:12 Array 4 MB size limit akalin
  2006-05-15 18:22 ` [Caml-list] " Nicolas Cannasse
  2006-05-15 20:32 ` Damien Doligez
@ 2006-05-15 22:51 ` Oliver Bandel
  2006-05-16  0:48 ` Brian Hurt
  2006-05-16  8:01 ` Xavier Leroy
  4 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-15 22:51 UTC (permalink / raw)
  To: caml-list

On Mon, May 15, 2006 at 02:12:30PM -0400, akalin@akalin.cx wrote:
> I'm running into cases where the 4 MB limit on arrays is starting to 
> become a problem.

ooops :(

> Lists are much slower and cause seg faults for me on 
> the same data set,

well.... exceptions can be acceptable, but segfaults?
Very unnice :(
Shouldn't be...


[...]
> Also, the fact that using lists crashes for the same data set is 
> surprising.  Is there a similar hard limit for lists, or would this be 
> a bug?  Should I post a test case?
[...]

Do you really need an array or other linear data structures?
Couldn't the problem be re-formulated to use trees instead (Map/Set)?
(hoping these limitations are not there...)


Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-15 18:12 Array 4 MB size limit akalin
                   ` (2 preceding siblings ...)
  2006-05-15 22:51 ` Oliver Bandel
@ 2006-05-16  0:48 ` Brian Hurt
  2006-05-16  9:57   ` Damien Doligez
  2006-05-16  8:01 ` Xavier Leroy
  4 siblings, 1 reply; 67+ messages in thread
From: Brian Hurt @ 2006-05-16  0:48 UTC (permalink / raw)
  To: akalin; +Cc: caml-list



On Mon, 15 May 2006, akalin@akalin.cx wrote:

> I'm running into cases where the 4 MB limit on arrays is starting to become a 
> problem.  Lists are much slower and cause seg faults for me on the same data 
> set, and Bigarrays are a partial solution because I'd like to be able to 
> store arbitrary types in arrays (not just numbers).
>
> I was greatly surprised when I found out there was such a low limit on 
> arrays.  Is there a reason for this?  Will this limit ever be increased?

It comes from how arrays are stored in memory.  Every boxed value has a 
"tag word", which, among other uses, is used by the GC to figure out how 
big the object is (i.e. where the next object starts).  Now, the encoding 
scheme of this tag word is fairly complicated, but it works out that for 
arrays, 10 bits of the 32 bits is used for other stuff, leaving 22 bits 
for the size.  This limits the size of an array to 4M-1 elements 
(actually, it limits the size of the array to 16M-4 bytes, as size is 
measured in words- which is why arrays of floats are limited to 2M-1 
entries- each float takes up 8 bytes aka 2 words).

When you move to 64 bits, the tag word doubles in size, but the amount of 
"other" information in the tag word doesn't- this means that suddenly you 
have 52 bits of size, or 4T arrays.  And now floats only take up one word, 
not two, so they too can have 4T arrays.

The array of array idea someone brought up is a good one.  There's a 
slight performance hit, but not much.  I'd keep the top level array short- 
1K entries is enough.  The advantage of this is that if you're heavily 
using the array, the top level array will stay in cache (possibly even L1 
cache), meaning the main cost of an array access only goes up by about 10% 
or so, maybe less (maybe 1% if it stays in L1 cache).  The reasoning goes 
like this: the largest cost by far of an array access on a large array is 
the cache miss.  Doing a memory reference that is satisified out of L1 
cache generally takes 1-4 clock cycles.  If you have to go to L2 cache, 
that's going to take 10-40 clock cycles (approximately).  Going to main 
memory?  100-350+ clock cycles.  With a large array, you're likely going 
to take the 100-350+ clock cycle hit anyways. If the top level array is in 
L1 cache, you're only adding 1-4 clockcycles to the 100-350+ clock cycle 
hit you're going to take anyways.

Personally, I think PCs should have gone 64 bit circa 1997 or so.  Once we 
started getting hard disks large enough that we couldn't mmap the entire 
hard disk, we started hitting problems.  Long long and the large file 
interfaces are examples of these problems.  The longer we go- and the 
larger memories and hard disks get, the bigger the problems get. 
Microsoft seems to be the main impediment, now that AMD has forced Intel 
to get off the stick.

> If I had a record type with 5 floats, 
> will the limit then by Sys.max_array_size / 10?

Within the record, the floats are unboxed (assuming you didn't have any 
non-float elements).  This means the floats are stored directly in the 
record, and that the record doesn't hold pointers to the floats.  But the 
record itself is boxed- which means that an array of these records is 
really an array of pointers to these records, meaning that you're back at 
the 4M-1 limit.

Note that a record of 5 floats costs 44 bytes (40 bytes for the 5 floats + 
4 bytes for the tag word).  Assuming no records are stored in more than 
one location, the total memory cost of an array of 4M-1 of them is 16M for 
the array, plus (4M-1)*44 for the records, for a total of 192M-44 bytes. 
This is almost 10% of your available memory space right there.

> Is there some sort of 
> existing ArrayList module that works around this problem?  Ideally, I'd like 
> to have something like C++'s std::vector<> type, which can be dynamically 
> resized.

Ocaml can't dynamically resize arrays.  This gets tricky to do when arrays 
get large.  At the extreme, if the array takes up 50% + 1 byte of the 
total address space, you can't resize it to be any larger- to resize 
requires you to copy it, which takes more memory than you have.  I've 
seend problems well before then.

> Also, the fact that using lists crashes for the same data set is surprising. 
> Is there a similar hard limit for lists, or would this be a bug?  Should I 
> post a test case?

I'm willing to bet dollars to donuts that the problem you hit with lists 
was stack overflow.  Without signifigantly impacting performance there 
isn't a whole lot Ocaml can do about detecting stack overflow before the 
segfault hits.  My general rule is, if it's going to contain more than a 
few dozen items, it probably shouldn't be a list.  Extlib's library is 
less prone to this error, but you can still run into serious problems with 
long lists.

Brian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-15 18:12 Array 4 MB size limit akalin
                   ` (3 preceding siblings ...)
  2006-05-16  0:48 ` Brian Hurt
@ 2006-05-16  8:01 ` Xavier Leroy
  2006-05-16  8:20   ` Nicolas Cannasse
  2006-05-19  5:57   ` Frederick Akalin
  4 siblings, 2 replies; 67+ messages in thread
From: Xavier Leroy @ 2006-05-16  8:01 UTC (permalink / raw)
  To: akalin; +Cc: caml-list

> I was greatly surprised when I found out there was such a low limit on
> arrays.  Is there a reason for this?  Will this limit ever be increased?

As Brian Hurt explained, this limit comes from the fact that heap
object sizes are stored in N-10 bits, where N is the bit width of the
processor (32 or 64).

Historical digression: this representation decision was initially
taken when designing Caml Light in 1989-1990.  At that time, even
professional workstations had 16 M of RAM at best, so limiting arrays to
4M elements was reasonable.  The decision was then reconsidered in
1995 during the redesign that led to OCaml.  At that time,
64-bit architectures were all the rage: OCaml was actually implemented
on a 64-bit Alpha, and only then backported to 32-bit machines.
So, the original header format was kept, since it makes complete sense
on a 64-bit architecture.  Little did I know that the 32-bitters would
survive so long.  Now, it's 2006, and 64-bit processors are becoming
universally available, in desktop machines at least.  (I've been
running an AMD64 PC at home since january 2005 with absolutely zero
problems.)  So, no the data representations of OCaml are not going to
change to lift the array size limit on 32-bit machines.

> Is the limit a limit on the number of elements or the total size?  The
> language in Sys.max_array_size implies the former, but the fact the
> limit is halved for floats implies the latter.  If I had a record type
> with 5 floats, will the limit then by Sys.max_array_size / 10?

No.  In general, Caml arrays are not unboxed, meaning that your array
of 5-float records is actually an array of pointers to individual
blocks holding 5 floats.  The only exception is for arrays of floats,
which are unboxed.

> Is there
> some sort of existing ArrayList module that works around this problem?
> Ideally, I'd like to have something like C++'s std::vector<> type, which
> can be dynamically resized.  Do I have to write my own? :(

A better idea would be to determine exactly what data structure you need:
which abstract operations, what performance requirements, etc.  C++
and Lisp programmers tend to encode everything as arrays or lists,
respectively, but quite often these are not the best data structure
for the application of interest.

> Also, the fact that using lists crashes for the same data set is
> surprising.  Is there a similar hard limit for lists, or would this be a
> bug?  Should I post a test case?

Depends on the platform you use.  In principle, Caml should report
stack overflows cleanly, by throwing a Stack_overflow exception.
However, this is hard to do in native code as it depends a lot on the
processor and OS used.  So, some combinations (e.g. x86/Linux) will
report stack overflows via an exception, and others will let the
kernel generate a segfault.

If you're getting the segfault under x86/Linux for instance, please
post a test case on the bug tracking system.  It's high time that
Damien shaves :-)

- Xavier Leroy


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-16  8:01 ` Xavier Leroy
@ 2006-05-16  8:20   ` Nicolas Cannasse
  2006-05-19 17:13     ` Xavier Leroy
  2006-05-19  5:57   ` Frederick Akalin
  1 sibling, 1 reply; 67+ messages in thread
From: Nicolas Cannasse @ 2006-05-16  8:20 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: akalin, caml-list

>>Also, the fact that using lists crashes for the same data set is
>>surprising.  Is there a similar hard limit for lists, or would this be a
>>bug?  Should I post a test case?
> 
> 
> Depends on the platform you use.  In principle, Caml should report
> stack overflows cleanly, by throwing a Stack_overflow exception.
> However, this is hard to do in native code as it depends a lot on the
> processor and OS used.  So, some combinations (e.g. x86/Linux) will
> report stack overflows via an exception, and others will let the
> kernel generate a segfault.

A segfault will happen on Windows/MSVC port. I also found some cases
where the commandline program (the haXe compiler in that case) just
silently exited on Stack Overflow (exit code was not 0 but no error or
"program error" infamous message box was displayed).

I think that there is some MSVC specific C extension for catching such
stack overflows (__try / __except*). It would be nice to have such a
handling at the ocaml toplevel that would at least exit with a
meaningful error message. I don't care so much in my case since I'm not
using C code a lot, I know for sure that any crash/early abort is
indeeed a stack overflow. That might not be the case for all Win32 users.

Best,

Nicolas
*
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/key_s-z_4.asp


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-16  0:48 ` Brian Hurt
@ 2006-05-16  9:57   ` Damien Doligez
  2006-05-16 15:10     ` Markus Mottl
  0 siblings, 1 reply; 67+ messages in thread
From: Damien Doligez @ 2006-05-16  9:57 UTC (permalink / raw)
  To: caml users


On 2006-05-16, at 02:48, Brian Hurt wrote:

> When you move to 64 bits, the tag word doubles in size, but the  
> amount of "other" information in the tag word doesn't- this means  
> that suddenly you have 52 bits of size, or 4T arrays.

This is not quite correct, but a mere factor of 1000 doesn't matter  
much at
this point :-)

-- Damien


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-16  9:57   ` Damien Doligez
@ 2006-05-16 15:10     ` Markus Mottl
  0 siblings, 0 replies; 67+ messages in thread
From: Markus Mottl @ 2006-05-16 15:10 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml users

[-- Attachment #1: Type: text/plain, Size: 516 bytes --]

On 5/16/06, Damien Doligez <damien.doligez@inria.fr> wrote:
>
> > When you move to 64 bits, the tag word doubles in size, but the
> > amount of "other" information in the tag word doesn't- this means
> > that suddenly you have 52 bits of size, or 4T arrays.
>
> This is not quite correct, but a mere factor of 1000 doesn't matter
> much at
> this point :-)
>

And 640KB ought to be enough for anyone ;-)

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com

[-- Attachment #2: Type: text/html, Size: 1012 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-16  8:01 ` Xavier Leroy
  2006-05-16  8:20   ` Nicolas Cannasse
@ 2006-05-19  5:57   ` Frederick Akalin
  2006-05-19  6:21     ` Oliver Bandel
                       ` (2 more replies)
  1 sibling, 3 replies; 67+ messages in thread
From: Frederick Akalin @ 2006-05-19  5:57 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

Xavier Leroy wrote:
>> I was greatly surprised when I found out there was such a low limit on
>> arrays.  Is there a reason for this?  Will this limit ever be increased?
>>     
>
> As Brian Hurt explained, this limit comes from the fact that heap
> object sizes are stored in N-10 bits, where N is the bit width of the
> processor (32 or 64).
>
> Historical digression: this representation decision was initially
> taken when designing Caml Light in 1989-1990.  At that time, even
> professional workstations ... Little did I know that the 32-bitters would
> survive so long.  Now, it's 2006, and 64-bit processors are becoming
> universally available, in desktop machines at least.  (I've been
> running an AMD64 PC at home since january 2005 with absolutely zero
> problems.)  So, no the data representations of OCaml are not going to
> change to lift the array size limit on 32-bit machines.
>
>   
I see.  This is a perfectly reasonable explanation.  I ended up just 
using Bigarrays of floats for now and converting from those to 3-d 
vectors on demand (what I need), but it would be nice to have a type 
that wraps around Array that can get around the 4M limit (using an array 
of arrays like someone suggested earlier).  It's sad, but I think 32-bit 
is going to be around for a while, and surely I can't be the only person 
to run up against this. :) Not that I mind writing such a library and 
releasing it.  I wonder if the Extlib guys would be interested...
> A better idea would be to determine exactly what data structure you need:
> which abstract operations, what performance requirements, etc.  C++
> and Lisp programmers tend to encode everything as arrays or lists,
> respectively, but quite often these are not the best data structure
> for the application of interest.
>   
I find your assertion surprising.  C++ and Common LISP are by no means 
lacking in standard data structures (and using bare arrays is 
discouraged in C++, as far as I know) and in my experience I haven't 
much seen C++ code that used arrays/vectors when not appropriate.

In any case, in my application (a raytracer) I am reading in lists of 
numbers from a file representing the points of a mesh and the triangles 
that make up the mesh (represented by a list of indices into the mesh 
list).  A dynamically resizable, reasonable scalable array seems like 
the best data structure to use.
>> Also, the fact that using lists crashes for the same data set is
>> surprising.  Is there a similar hard limit for lists, or would this be a
>> bug?  Should I post a test case?
>>     
>
> Depends on the platform you use.  In principle, Caml should report
> stack overflows cleanly, by throwing a Stack_overflow exception.
> However, this is hard to do in native code as it depends a lot on the
> processor and OS used.  So, some combinations (e.g. x86/Linux) will
> report stack overflows via an exception, and others will let the
> kernel generate a segfault.
>
> If you're getting the segfault under x86/Linux for instance, please
> post a test case on the bug tracking system.  It's high time that
> Damien shaves :-)
>
>   
Due to some confusion on my part (I didn't realize stderr was buffered 
in OCaml and needed to be flushed), I erroneously thought my program was 
crashing on building the list.  It was, in fact, a List.map lurking 
further along in my code.  I'm running PPC/OS X, which I'm guessing is 
one of those platforms that lets the kernel generate a segfault?  I 
tried it on Linux and it threw an exception as expected.  Damien's 
facial hair is safe. :)
> - Xavier Leroy
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>   
Frederick Akalin


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19  5:57   ` Frederick Akalin
@ 2006-05-19  6:21     ` Oliver Bandel
  2006-05-19 12:15     ` Jon Harrop
  2006-05-19 16:28     ` Jozef Kosoru
  2 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-19  6:21 UTC (permalink / raw)
  To: caml-list

On Thu, May 18, 2006 at 10:57:30PM -0700, Frederick Akalin wrote:
[...]
> I see.  This is a perfectly reasonable explanation.  I ended up just 
> using Bigarrays of floats for now and converting from those to 3-d 
> vectors on demand (what I need), but it would be nice to have a type 
> that wraps around Array that can get around the 4M limit (using an array 
> of arrays like someone suggested earlier).  It's sad, but I think 32-bit 
> is going to be around for a while, and surely I can't be the only person 
> to run up against this. :) Not that I mind writing such a library and 
> releasing it.  I wonder if the Extlib guys would be interested...
> >A better idea would be to determine exactly what data structure you need:
> >which abstract operations, what performance requirements, etc.  C++
> >and Lisp programmers tend to encode everything as arrays or lists,
> >respectively, but quite often these are not the best data structure
> >for the application of interest.
[...]

You have > 4 Million points?

What datastructure are you using right now?

Some code to show?

Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19  5:57   ` Frederick Akalin
  2006-05-19  6:21     ` Oliver Bandel
@ 2006-05-19 12:15     ` Jon Harrop
  2006-05-19 19:36       ` akalin
  2006-05-19 16:28     ` Jozef Kosoru
  2 siblings, 1 reply; 67+ messages in thread
From: Jon Harrop @ 2006-05-19 12:15 UTC (permalink / raw)
  To: caml-list

On Friday 19 May 2006 06:57, Frederick Akalin wrote:
> I see.  This is a perfectly reasonable explanation.  I ended up just
> using Bigarrays of floats for now and converting from those to 3-d
> vectors on demand (what I need), but it would be nice to have a type
> that wraps around Array that can get around the 4M limit (using an array
> of arrays like someone suggested earlier).  It's sad, but I think 32-bit
> is going to be around for a while, and surely I can't be the only person
> to run up against this. :) Not that I mind writing such a library and
> releasing it.  I wonder if the Extlib guys would be interested...

If you want your code to be memory efficient then using 32-bit floats via big 
arrays would seem like a good idea. If you want your code to be fast then 
you'll want a monomorphic data structure to avoid the overhead of polymorphic 
function calls (which is often very significant on simple, numerical 
container types).

> > A better idea would be to determine exactly what data structure you need:
> > which abstract operations, what performance requirements, etc.  C++
> > and Lisp programmers tend to encode everything as arrays or lists,
> > respectively, but quite often these are not the best data structure
> > for the application of interest.
>
> I find your assertion surprising.  C++ and Common LISP are by no means
> lacking in standard data structures (and using bare arrays is
> discouraged in C++, as far as I know) and in my experience I haven't
> much seen C++ code that used arrays/vectors when not appropriate.

Yes, I would say that C and Fortran programmers overuse arrays because other 
data structures are prohibitively difficult to implement and reuse.

> In any case, in my application (a raytracer) I am reading in lists of
> numbers from a file representing the points of a mesh and the triangles
> that make up the mesh (represented by a list of indices into the mesh
> list).  A dynamically resizable, reasonable scalable array seems like
> the best data structure to use.

Why not use a list and then apply Array.of_list?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19  5:57   ` Frederick Akalin
  2006-05-19  6:21     ` Oliver Bandel
  2006-05-19 12:15     ` Jon Harrop
@ 2006-05-19 16:28     ` Jozef Kosoru
  2006-05-19 20:08       ` Oliver Bandel
                         ` (2 more replies)
  2 siblings, 3 replies; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-19 16:28 UTC (permalink / raw)
  To: caml-list

On Thu, May 18, 2006 at 22:57:30 -0700, Frederick Akalin wrote:
> >Historical digression: this representation decision was initially
> >taken when designing Caml Light in 1989-1990.  At that time, even
> >professional workstations ... Little did I know that the 32-bitters
> >would survive so long.  Now, it's 2006, and 64-bit processors are
> >becoming universally available, in desktop machines at least.  (I've
> >been running an AMD64 PC at home since january 2005 with absolutely
> >zero problems.)  So, no the data representations of OCaml are not
> >going to change to lift the array size limit on 32-bit machines.
>  
> I see. This is a perfectly reasonable explanation.  I ended up just
> using Bigarrays of floats for now and converting from those to 3-d
> vectors on demand (what I need), but it would be nice to have a type
> that wraps around Array that can get around the 4M limit (using an
> array of arrays like someone suggested earlier).  It's sad, but I
> think 32-bit is going to be around for a while, and surely I can't be
> the only person to run up against this. :) Not that I mind writing
> such a library and releasing it.  I wonder if the Extlib guys would be
> interested...

Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
RAM is now standard and 1G RAM is very common for an average PC - having
a programming language with 4M limit for the array size is like to have
an 8+3 characters filename limitation on a filesystem using a mainstream
300G disk in 2006. However I understand the techical difficulty to solve
this issue - it's pretty annoying anyhow :)

Jozef

-- 
jozef kosoru
http://zyzstar.kosoru.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-16  8:20   ` Nicolas Cannasse
@ 2006-05-19 17:13     ` Xavier Leroy
  0 siblings, 0 replies; 67+ messages in thread
From: Xavier Leroy @ 2006-05-19 17:13 UTC (permalink / raw)
  To: Nicolas Cannasse; +Cc: akalin, caml-list

>> [Stack overflow in ocamlopt-generated code]

> A segfault will happen on Windows/MSVC port. I also found some cases
> where the commandline program (the haXe compiler in that case) just
> silently exited on Stack Overflow (exit code was not 0 but no error or
> "program error" infamous message box was displayed).
>
> I think that there is some MSVC specific C extension for catching such
> stack overflows (__try / __except*). It would be nice to have such a
> handling at the ocaml toplevel that would at least exit with a
> meaningful error message.

I'm aware of __try/__finally, but a bit of experimentation suggested
that it doesn't play nice with OCaml's exception handling.  There is a
lower-level API whose name I cannot remember that might be usable
within Caml, but I didn't pursue this.  Windows users are welcome to
contribute suggestions or code.

- Xavier Leroy



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19 12:15     ` Jon Harrop
@ 2006-05-19 19:36       ` akalin
  2006-05-19 20:17         ` Oliver Bandel
  0 siblings, 1 reply; 67+ messages in thread
From: akalin @ 2006-05-19 19:36 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

> On Friday 19 May 2006 06:57, Frederick Akalin wrote:
>> > A better idea would be to determine exactly what data structure you
>> need:
>> > which abstract operations, what performance requirements, etc.  C++
>> > and Lisp programmers tend to encode everything as arrays or lists,
>> > respectively, but quite often these are not the best data structure
>> > for the application of interest.
>>
>> I find your assertion surprising.  C++ and Common LISP are by no means
>> lacking in standard data structures (and using bare arrays is
>> discouraged in C++, as far as I know) and in my experience I haven't
>> much seen C++ code that used arrays/vectors when not appropriate.
>
> Yes, I would say that C and Fortran programmers overuse arrays because
> other
> data structures are prohibitively difficult to implement and reuse.

I would agree with you on your statement about C and Fortran.  However, I
was talking about C++, which has now grown far beyond its "C with Classes"
legacy.  C++'s standard library includes all of the most common data
structures (maps, vectors, lists), so you'd have to go out of your way to
use arrays for everything.

>> In any case, in my application (a raytracer) I am reading in lists of
>> numbers from a file representing the points of a mesh and the triangles
>> that make up the mesh (represented by a list of indices into the mesh
>> list).  A dynamically resizable, reasonable scalable array seems like
>> the best data structure to use.
>
> Why not use a list and then apply Array.of_list?

I did try that.  Then I ran up against the 4 MB limit.  Which led to my
original e-mail complaining about it here... :)

Frederick Akalin


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19 16:28     ` Jozef Kosoru
@ 2006-05-19 20:08       ` Oliver Bandel
  2006-05-19 21:26       ` Jon Harrop
  2006-05-20  0:57       ` [Caml-list] Array 4 MB size limit Brian Hurt
  2 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-19 20:08 UTC (permalink / raw)
  To: caml-list

On Fri, May 19, 2006 at 06:28:44PM +0200, Jozef Kosoru wrote:
> On Thu, May 18, 2006 at 22:57:30 -0700, Frederick Akalin wrote:
> > >Historical digression: this representation decision was initially
> > >taken when designing Caml Light in 1989-1990.  At that time, even
> > >professional workstations ... Little did I know that the 32-bitters
> > >would survive so long.  Now, it's 2006, and 64-bit processors are
> > >becoming universally available, in desktop machines at least.  (I've
> > >been running an AMD64 PC at home since january 2005 with absolutely
> > >zero problems.)  So, no the data representations of OCaml are not
> > >going to change to lift the array size limit on 32-bit machines.
> >  
> > I see. This is a perfectly reasonable explanation.  I ended up just
> > using Bigarrays of floats for now and converting from those to 3-d
> > vectors on demand (what I need), but it would be nice to have a type
> > that wraps around Array that can get around the 4M limit (using an
> > array of arrays like someone suggested earlier).  It's sad, but I
> > think 32-bit is going to be around for a while, and surely I can't be
> > the only person to run up against this. :) Not that I mind writing
> > such a library and releasing it.  I wonder if the Extlib guys would be
> > interested...
> 
> Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
> RAM is now standard and 1G RAM is very common for an average PC - having
> a programming language with 4M limit for the array size is like to have
> an 8+3 characters filename limitation on a filesystem using a mainstream
> 300G disk in 2006. However I understand the techical difficulty to solve
> this issue - it's pretty annoying anyhow :)
[...]

Well, my first computer (VC 20/VIC 20) had about 3,5 kB RAM, was an 8-Bit
processor based computer (6502 CPU :)) and had 1 MHz clock frequency :)


Some years later HP came up with their HP 9000 workstation,
32 Bit processors they had! :)

I thought that some years later I also would have such a beast!

But: It took  a too long time.... I was a computer-"hater" for many
years, had jumped into analog technics (electrical engineering)
and only came back to programming at the end of the study, when
I merged eletrotechnics and programming (noise simulation and
analog-digital converting in C under DOS or windows, don't know,
but I think it was a x386 processor machine with maybe 100 MHz clock frequency?
 looking for quantization errors, yielded by noise - all simulated in C)


What I wanted to say: yes, the technological "revolution" we so often
hear in the media, is much slower than it could be...

...the fastest machines are created for game playing.

So, when the next playstation comes out, I maybe buy one and install
Linux on it. I hope, that there will be a native OCaml-port for it then :)


Best Regards,
       Oliver



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19 19:36       ` akalin
@ 2006-05-19 20:17         ` Oliver Bandel
  0 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-19 20:17 UTC (permalink / raw)
  To: caml-list

On Fri, May 19, 2006 at 03:36:10PM -0400, akalin@akalin.cx wrote:
> > On Friday 19 May 2006 06:57, Frederick Akalin wrote:
> >> > A better idea would be to determine exactly what data structure you
> >> need:
> >> > which abstract operations, what performance requirements, etc.  C++
> >> > and Lisp programmers tend to encode everything as arrays or lists,
> >> > respectively, but quite often these are not the best data structure
> >> > for the application of interest.
> >>
> >> I find your assertion surprising.  C++ and Common LISP are by no means
> >> lacking in standard data structures (and using bare arrays is
> >> discouraged in C++, as far as I know) and in my experience I haven't
> >> much seen C++ code that used arrays/vectors when not appropriate.
> >
> > Yes, I would say that C and Fortran programmers overuse arrays because
> > other
> > data structures are prohibitively difficult to implement and reuse.
> 
> I would agree with you on your statement about C and Fortran.  However, I
> was talking about C++, which has now grown far beyond its "C with Classes"
> legacy.  C++'s standard library includes all of the most common data
> structures (maps, vectors, lists), so you'd have to go out of your way to
> use arrays for everything.

But IMHO even today these lists are not type-polymorphc like in
OCaml.


==================
first:~ oliver$ ocaml
        Objective Caml version 3.09.0

# List.map;;
- : ('a -> 'b) -> 'a list -> 'b list = <fun>
# List.map (fun x -> x * 3 ) [2; 5;1;22;4];;
- : int list = [6; 15; 3; 66; 12]
# List.map (fun  str -> str ^ str ) [ "Hi"; "yes"; "hello"; "world" ];;
- : string list = ["HiHi"; "yesyes"; "hellohello"; "worldworld"]
# 
==================

Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19 16:28     ` Jozef Kosoru
  2006-05-19 20:08       ` Oliver Bandel
@ 2006-05-19 21:26       ` Jon Harrop
  2006-05-20  1:06         ` Brian Hurt
  2006-05-20  0:57       ` [Caml-list] Array 4 MB size limit Brian Hurt
  2 siblings, 1 reply; 67+ messages in thread
From: Jon Harrop @ 2006-05-19 21:26 UTC (permalink / raw)
  To: caml-list

On Friday 19 May 2006 17:28, Jozef Kosoru wrote:
> Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
> RAM is now standard and 1G RAM is very common for an average PC - having
> a programming language with 4M limit for the array size is like to have
> an 8+3 characters filename limitation on a filesystem using a mainstream
> 300G disk in 2006.

I find it more concerning that array length is a function of the type, i.e. 
different for float array.

> However I understand the techical difficulty to solve  
> this issue - it's pretty annoying anyhow :)

Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths 
and strings as char arrays?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19 16:28     ` Jozef Kosoru
  2006-05-19 20:08       ` Oliver Bandel
  2006-05-19 21:26       ` Jon Harrop
@ 2006-05-20  0:57       ` Brian Hurt
  2006-05-20  1:17         ` Frederick Akalin
                           ` (2 more replies)
  2 siblings, 3 replies; 67+ messages in thread
From: Brian Hurt @ 2006-05-20  0:57 UTC (permalink / raw)
  To: Jozef Kosoru; +Cc: caml-list



On Fri, 19 May 2006, Jozef Kosoru wrote:

> Yes. 32-bit x86 platform is not going away anytime soon. Given that 512M
> RAM is now standard and 1G RAM is very common for an average PC

This is what boggles my imagination.  The combination of obstinate 
adherence to a limited platform (x86-32) in the same breath as a 
recognition that we are approaching the limitations of that architecture. 
It's the 640K limit all over again.

And no, segments will not help the situation.  Every single process is 
limited to 4G of address space, period.  Read the Intel CPU docs.  With 
reasonable amounts of virtual memory we're well above that already- and 
we're approaching that with real memory.

Now, I realize the core reasons for the delay in moving to 64-bits are 
industry wide.  Intel's Itanium fiasco delayed Intel introducing a 64-bit 
chip at least 7 years.  And Microsoft seems to be incapable of releasing a 
new OS- 32-bit or 64-bit.  But that, I think, is the core of the problem- 
Ocaml's array limit is just one of many symptoms.

And that's my point- we should be looking to fix the underlying problem, 
not looking to patch the symptoms.  Because often times patching the 
symptoms and not addressing the core problem simply makes the whole 
situation worse- the underlying problem simply shows up in new ways, and 
the fix for the specific symptom often causes new problems.

Brian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-19 21:26       ` Jon Harrop
@ 2006-05-20  1:06         ` Brian Hurt
  2006-05-20 18:32           ` brogoff
  2006-05-20 21:11           ` immutable strings (Re: [Caml-list] " Oliver Bandel
  0 siblings, 2 replies; 67+ messages in thread
From: Brian Hurt @ 2006-05-20  1:06 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list



On Fri, 19 May 2006, Jon Harrop wrote:

> Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> and strings as char arrays?

Why not just run Ocaml as a 64-bit app on a 64-bit OS?

We I designing a language today, I'd have 63-bit array lengths- of course, 
I'd do it by not bothering to support 32-bit systems...

As for strings, I'd be inclined to make them immutable- the correct way to 
manipulate strings is with regular expressions.  But I'm widely 
acknowledged to be an extremist.

Brian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20  0:57       ` [Caml-list] Array 4 MB size limit Brian Hurt
@ 2006-05-20  1:17         ` Frederick Akalin
  2006-05-20  1:52           ` Brian Hurt
  2006-05-21  9:26           ` Richard Jones
  2006-05-20 10:51         ` Jozef Kosoru
  2006-05-20 15:15         ` Don Syme
  2 siblings, 2 replies; 67+ messages in thread
From: Frederick Akalin @ 2006-05-20  1:17 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list


On May 19, 2006, at 5:57 PM, Brian Hurt wrote:

> Now, I realize the core reasons for the delay in moving to 64-bits  
> are industry wide.  Intel's Itanium fiasco delayed Intel  
> introducing a 64-bit chip at least 7 years.  And Microsoft seems to  
> be incapable of releasing a new OS- 32-bit or 64-bit.  But that, I  
> think, is the core of the problem- Ocaml's array limit is just one  
> of many symptoms.
>
> And that's my point- we should be looking to fix the underlying  
> problem, not looking to patch the symptoms.  Because often times  
> patching the symptoms and not addressing the core problem simply  
> makes the whole situation worse- the underlying problem simply  
> shows up in new ways, and the fix for the specific symptom often  
> causes new problems.
>
> Brian

I think that's an awfully simplistic point of view.  My problem is  
that I want to store more than 4 million items in an array.  You're  
suggesting moving to 64 bits?  And even if everyone magically moves  
to 64 bits, I can imagine that in the future someone will want to  
read more than 2^(64 - 10) items from a file into memory, as 64 bits  
isn't quite "number of atoms in the universe" big yet.

Fred


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20  1:17         ` Frederick Akalin
@ 2006-05-20  1:52           ` Brian Hurt
  2006-05-20  9:08             ` Jozef Kosoru
  2006-05-21  9:26           ` Richard Jones
  1 sibling, 1 reply; 67+ messages in thread
From: Brian Hurt @ 2006-05-20  1:52 UTC (permalink / raw)
  To: Frederick Akalin; +Cc: caml-list



On Fri, 19 May 2006, Frederick Akalin wrote:

> I think that's an awfully simplistic point of view.  My problem is that I 
> want to store more than 4 million items in an array.  You're suggesting 
> moving to 64 bits?  And even if everyone magically moves to 64 bits, I can 
> imagine that in the future someone will want to read more than 2^(64 - 10) 
> items from a file into memory, as 64 bits isn't quite "number of atoms in the 
> universe" big yet.

And in 20 years or so, assuming Moore's law holds, we'll need to 
transition to 128-bit architectures.  Servers should probably transistion 
a little sooner, maybe 15 years out.

Of course, a large hunk of my annoyance here is comming from C-99 and long 
long- breaking conformant code in a desperate (and- might I add- futile) 
attempt to patch broken and non-conformant code around what will be a 
temporary problem (the only question left is the definition of temporary). 
Eventually, just going to 64 bits will be the solution- just like just 
going to 32 bits was the solution to the 640K limit.

Brian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20  1:52           ` Brian Hurt
@ 2006-05-20  9:08             ` Jozef Kosoru
  2006-05-20 10:12               ` skaller
       [not found]               ` <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>
  0 siblings, 2 replies; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20  9:08 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

On Fri, May 19, 2006 at 21:52:26 -0400, Brian Hurt wrote:
> >I think that's an awfully simplistic point of view.  My problem is
> >that I want to store more than 4 million items in an array.  You're
> >suggesting moving to 64 bits?  And even if everyone magically moves
> >to 64 bits, I can imagine that in the future someone will want to
> >read more than 2^(64 - 10) items from a file into memory, as 64 bits
> >isn't quite "number of atoms in the universe" big yet.
> 
> And in 20 years or so, assuming Moore's law holds, we'll need to
> transition to 128-bit architectures.  Servers should probably
> transistion a little sooner, maybe 15 years out.

Sure. Average consumers in 20 years, servers in 15 years and OCaml users
in 5 years (remember N-10!).
 
jozef
-- 
jozef kosoru
http://zyzstar.kosoru.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20  9:08             ` Jozef Kosoru
@ 2006-05-20 10:12               ` skaller
  2006-05-20 11:06                 ` Jozef Kosoru
  2006-05-20 21:42                 ` Oliver Bandel
       [not found]               ` <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>
  1 sibling, 2 replies; 67+ messages in thread
From: skaller @ 2006-05-20 10:12 UTC (permalink / raw)
  To: Jozef Kosoru; +Cc: Brian Hurt, caml-list

On Sat, 2006-05-20 at 11:08 +0200, Jozef Kosoru wrote:

> Sure. Average consumers in 20 years, servers in 15 years and OCaml users
> in 5 years (remember N-10!).

I'm not so sure it will happen like that. Look more at:

Today, newer desktops are all dual core. We've gone from
1 processor to 2. How long will it take to get 4?

It surely will not be 20 years.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20  0:57       ` [Caml-list] Array 4 MB size limit Brian Hurt
  2006-05-20  1:17         ` Frederick Akalin
@ 2006-05-20 10:51         ` Jozef Kosoru
  2006-05-20 14:22           ` Brian Hurt
  2006-05-20 22:07           ` Oliver Bandel
  2006-05-20 15:15         ` Don Syme
  2 siblings, 2 replies; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 10:51 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

On Fri, May 19, 2006 at 20:57:35 -0400, Brian Hurt wrote:
> >Yes. 32-bit x86 platform is not going away anytime soon. Given that
> >512M RAM is now standard and 1G RAM is very common for an average PC
> 
> This is what boggles my imagination.  The combination of obstinate
> adherence to a limited platform (x86-32) in the same breath as a
> recognition that we are approaching the limitations of that
> architecture.  It's the 640K limit all over again.
> 
> And no, segments will not help the situation.  Every single process is
> limited to 4G of address space, period.  Read the Intel CPU docs.
> With reasonable amounts of virtual memory we're well above that
> already- and we're approaching that with real memory.

4G address space limit is off-topic here. Please note that the OCaml
array size limit is 4M not 4G.
 
> Now, I realize the core reasons for the delay in moving to 64-bits are
> industry wide.  Intel's Itanium fiasco delayed Intel introducing a
> 64-bit chip at least 7 years.  And Microsoft seems to be incapable of
> releasing a new OS- 32-bit or 64-bit.  But that, I think, is the core
> of the problem- Ocaml's array limit is just one of many symptoms.
> 
> And that's my point- we should be looking to fix the underlying
> problem, not looking to patch the symptoms.  Because often times
> patching the symptoms and not addressing the core problem simply makes
> the whole situation worse- the underlying problem simply shows up in
> new ways, and the fix for the specific symptom often causes new
> problems.

I disagree. Actually I think an opposite. A suggestion to move to 64-bit
is just patching the symptoms of the real problem - a design flaw in the
OCaml programming language. And this issue will not go away on 64-bits.
Just like 22 bits for the array size is a completely artificial limit on
32-bit platforms, (N - 10) imposes once again such an unlogical 54 bits
limit on 64-bit computers. And I don't think "it makes complete sense on
a 64-bit architecture".

If you create OCaml programs for yourself then maybe you can switch to
64-bits overnight (and perpahs you already did so). But if you need to
distribute your application to customers/users then it's much more
complicated. And explaining them that they should upgrade all
workstations because your software has various 4M limitations here and
there sounds like a bad joke, doesn't it?

32-bit x86 platform is still good enough for many purposes and its 2^32
limit is acceptable in most cases while 2^22 is not.

Jozef

-- 
jozef kosoru
http://zyzstar.kosoru.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 10:12               ` skaller
@ 2006-05-20 11:06                 ` Jozef Kosoru
  2006-05-20 12:02                   ` skaller
  2006-05-20 21:42                 ` Oliver Bandel
  1 sibling, 1 reply; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 11:06 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

On Sat, May 20, 2006 at 20:12:22 +1000, skaller wrote:
> > Sure. Average consumers in 20 years, servers in 15 years and OCaml
> > users in 5 years (remember N-10!).
> 
> I'm not so sure it will happen like that. Look more at:
> 
> Today, newer desktops are all dual core. We've gone from
> 1 processor to 2. How long will it take to get 4?

Yes, but that doesn't change the fact 99% of PCs out there still have a
single core CPU. Besides, adding more cores on the chip is backward
compatible, it isn't the quite same thing as changing the CPU
architecture.
 
> It surely will not be 20 years.

I think the migration from 64-bit to 128-bit architecture will take more
than 20 years. But such things are generally very hard to predict.

-- 
jozef kosoru
http://zyzstar.kosoru.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 11:06                 ` Jozef Kosoru
@ 2006-05-20 12:02                   ` skaller
  0 siblings, 0 replies; 67+ messages in thread
From: skaller @ 2006-05-20 12:02 UTC (permalink / raw)
  To: Jozef Kosoru; +Cc: caml-list

On Sat, 2006-05-20 at 13:06 +0200, Jozef Kosoru wrote:
>  
> > It surely will not be 20 years.
> 
> I think the migration from 64-bit to 128-bit architecture will take more
> than 20 years. But such things are generally very hard to predict.

I'm not disputing that -- I was asking how long it will
take until we have 4-core CPU chips in commercial quantities
for desktops.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 10:51         ` Jozef Kosoru
@ 2006-05-20 14:22           ` Brian Hurt
  2006-05-20 18:41             ` j h woodyatt
                               ` (2 more replies)
  2006-05-20 22:07           ` Oliver Bandel
  1 sibling, 3 replies; 67+ messages in thread
From: Brian Hurt @ 2006-05-20 14:22 UTC (permalink / raw)
  To: Jozef Kosoru; +Cc: caml-list



On Sat, 20 May 2006, Jozef Kosoru wrote:

> 4G address space limit is off-topic here. Please note that the OCaml
> array size limit is 4M not 4G.

It is entirely on-topic.  How much headroom do you think you have?

Let's perform an experiment here.  Let's assume that Xavier decides you're 
right, and releases a new version of Ocaml with removes the 4M array 
limit.  How big of an array will you be able to create then?  More than 
4M, yes- but how much more?

The hardware imposes a hard and fast 4G limit.  On 32-bit x86 you can get 
a process larger than this.  But note, this includes everything- not just 
4G of data, but also all code, stack, shared libraries, address space set 
aside for the OS, everything.  The OS alone wants to grab at least 1G, if 
not 2G of address space (remember that it needs to mmap that hot new 512M 
graphics card you just bought- so claiming 1G as a minimum isn't extreme, 
as 512M will go to the graphics card).  To make life simple, let's say 
that everything except the huge array will fit in the upper 2G, leaving 2G 
for the huge array.

Now if it's a huge array of integers, each element only takes up 4 bytes, 
which means we can fit 512M integers in this array.  This sounds nice- on 
paper.  Wow, we could make an array 128 times as big as we can now! 
Except the proper way to look at this is that it's a mean 7 bits larger. 
And that's best case.

If we changed it to an array of floats, each float now takes 8 bytes, and 
that 2G can only hold 256M floats.  Our 128x improvement has now dropped 
to a 64x improvement- and we're starting to get a bad feeling about where 
this is going.

If we start trying to hold 3-dimensional pointers, that's 3 floats- and 
our 64x improvement has now dropped to about a 21x improvement.  Skaller's 
about to pipe up right about here and proclaim the advantages of 
single-precision floats, which would get you back up to about a 42x 
improvement.  But also note that I'm ignoring the boxing costs here- 
adding the tag word for the structure plus the pointer in the array (a 
structure is boxed) ups us from a size of 24 bytes to 32 bytes (for double 
precision- 12 bytes to 20 bytes for single precision), making our 21x 
improvement (42x improvement) a mere 16x (25x) improvement.

Now, what happens if we try to hold a triangle of three points (i.e. for 
3D polygon mapping)?  How many polygons can we hold?  Well, a polygon 
takes 9 floats (3 floats each for 3 points), so that's 72 bytes there. 
Assume the polygon is a structure and each point is a structure, and that 
means 4 tag words (1 for each of the 3 points + 1 for the polygon 
structure) and 4 pointers of 4 bytes each, for a grand total of 104 bytes 
per polygon.  2G can now hold about 19M unique polygons- a mere 5x 
advantage.

The 4M limit is "artificial" and "low"- but not by as much as you might 
think.

And the 4G limit starts hitting you in other ways.  Consider the classic 
programming pattern of reading the entire file into a list (or array) of 
strings.  Considering that I can get a 300G hard disk for about $110:
http://froogle.google.com/froogle_cluster?q=SATA+hard+disk&pid=4829187676318423498&oid=15465117802862862360&btnG=Search+Froogle&scoring=mrd&hl=en

It costs about $0.73USD to buy 2G of hard disk space.  I try to slurp that 
73 cent file into memory, and boom- welcome to the 4G limit.  Heck, you 
have to do special incantations just to open and read a file that big, let 
only trying to store the whole thing in memory.  This is why I claim that 
once hard disks start getting larger than the address space, the address 
space needs to increase.

And I'm not even going to go into "subtle" bugs, like ptrdiff_t overflows, 
or the fact that as the adress space fills, it's more likely a wild 
pointer will point to valid memory, and thus cause harder to debug 
problems than simple segfaults, here.

On the other side, increasing the array size has a cost.  Among other 
things, it slows down garbage collection.  If done badly, it could break 
existing code.  The C-99 standard did this- by introducing long long, it 
*silently* broke conformant code.  I've yet to forgive them for doing 
this.  More to the point, it silently broke *my* code.  Which is why I'm 
violently allergic to suggestions that we "patch around" the 32-bit 
limitation.  When people suggest this, I tend to hear "I want to break 
your code".  Because that's how it worked last time.

> I disagree. Actually I think an opposite. A suggestion to move to 64-bit
> is just patching the symptoms of the real problem - a design flaw in the
> OCaml programming language. And this issue will not go away on 64-bits.
> Just like 22 bits for the array size is a completely artificial limit on
> 32-bit platforms, (N - 10) imposes once again such an unlogical 54 bits
> limit on 64-bit computers. And I don't think "it makes complete sense on
> a 64-bit architecture".

By this logic, Ocaml should be able to support array sizes of 10^100th on 
16-bit platforms.  Why is Ocaml imposing stupid artificial limits?

I note with humor that the Java programming language a) defined ints to be 
32 bits, and b) made the index type for arrays, vectors, and many other 
data structures ints and not longs.  When you move to 64 bits, it's Java, 
much more so than Ocaml, that hits artificial and low limits on array 
size.  Ocaml will be able to have arrays millions of times larger than the 
largest possible array in Java.

> If you create OCaml programs for yourself then maybe you can switch to
> 64-bits overnight (and perpahs you already did so). But if you need to
> distribute your application to customers/users then it's much more
> complicated. And explaining them that they should upgrade all
> workstations because your software has various 4M limitations here and
> there sounds like a bad joke, doesn't it?

You call them 32-bit limits, which they are.  I admit that this is a 
problem- but it's not one created by Ocaml.

>
> 32-bit x86 platform is still good enough for many purposes and its 2^32
> limit is acceptable in most cases while 2^22 is not.

Except that you're here comparing apples to oranges again.  The 2^22 limit 
is in *words*, while the 2^32 limit is in *bytes*.  So it's really more of 
a 2^24 vr.s 2^31 comparison.

Brian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [Caml-list] Array 4 MB size limit
  2006-05-20  0:57       ` [Caml-list] Array 4 MB size limit Brian Hurt
  2006-05-20  1:17         ` Frederick Akalin
  2006-05-20 10:51         ` Jozef Kosoru
@ 2006-05-20 15:15         ` Don Syme
  2006-05-20 22:15           ` Oliver Bandel
  2 siblings, 1 reply; 67+ messages in thread
From: Don Syme @ 2006-05-20 15:15 UTC (permalink / raw)
  To: Brian Hurt, Jozef Kosoru; +Cc: caml-list


> And Microsoft seems to be incapable of releasing a 
> new OS- 32-bit or 64-bit.  

64-bit editions of Windows XP and Windows 2003 Server have been
available for quite a while, e.g.

http://www.microsoft.com/windowsxp/64bit/default.mspx

The maximum memory sizes supported are listed here:
http://www.microsoft.com/windowsxp/64bit/facts/top10.mspx

Cheers!
Don



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20  1:06         ` Brian Hurt
@ 2006-05-20 18:32           ` brogoff
  2006-05-20 21:29             ` immutable strings II ([Caml-list] Array 4 MB size limit) Oliver Bandel
  2006-05-20 21:11           ` immutable strings (Re: [Caml-list] " Oliver Bandel
  1 sibling, 1 reply; 67+ messages in thread
From: brogoff @ 2006-05-20 18:32 UTC (permalink / raw)
  To: caml-list

On Fri, 19 May 2006, Brian Hurt wrote:
> On Fri, 19 May 2006, Jon Harrop wrote:
>
> > Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> > and strings as char arrays?

Strings are random access character collections, so under the hood they might
be arrays, but I think mutable strings as the default string are a mistake.
One can imagine other representations for strings, like ropes. I'd like a
language to allow many different underlying string representations.

Extensible arrays are useful, but they're slower than fixed sized arrays, so
it would be best to provide both.

It would be better if there were no special behavior for float arrays. For
performance, monomorphic  FloatArray or even Float32Array, Float64Array,
etc., would be fine with me. And so would the same approach for strings
with different character types. It would get mildly annoying without
overloading, but maybe that would prompt the language designers to address
that issue.

> We I designing a language today, I'd have 63-bit array lengths- of course,
> I'd do it by not bothering to support 32-bit systems...

That would, even today, dramatically reduce the number of potential users
of your language.

> As for strings, I'd be inclined to make them immutable-

I strongly agree with that.

> the correct way to manipulate strings is with regular expressions.

But definitely not with that. REs have their place in the toolbox though.

-- Brian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 14:22           ` Brian Hurt
@ 2006-05-20 18:41             ` j h woodyatt
  2006-05-20 19:37               ` Jon Harrop
  2006-05-20 20:47             ` Jozef Kosoru
  2006-05-26 18:34             ` Ken Rose
  2 siblings, 1 reply; 67+ messages in thread
From: j h woodyatt @ 2006-05-20 18:41 UTC (permalink / raw)
  To: The Caml Trade

On May 20, 2006, at 7:22 AM, Brian Hurt wrote:
>
> Consider the classic programming pattern of reading the entire file  
> into a list (or array) of strings.  [...] boom- welcome to the 4G  
> limit.  Heck, you have to do special incantations just to open and  
> read a file that big, let only trying to store the whole thing in  
> memory.  This is why I claim that once hard disks start getting  
> larger than the address space, the address space needs to increase.

Sounds like it's an *anti*-pattern to me.  The offset parameter to  
mmap(2) is there for a reason.

There is, however, another interesting point I hope the Caml team at  
INRIA keeps in mind.  I hear rumors that 64-bit processes on  
[redacted] may not have access to all the application support  
libraries.  Some of them, e.g. [redacted] and [redacted], may only be  
available to 32-bit processes-- for technical reasons, not business  
reasons.  The engineering resources to make the libraries available  
to 64-bit processes are there, but the technical case for making 64- 
bit versions of the libraries is apparently losing.  At least for the  
time being.  I would be unsurprised if technical decisions like this  
are happening all over the application services programming world.

What this tells me is that 64-bit hardware and 64-bit operating  
systems will very likely continue to have 32-bit processes running on  
them for some time to come.  We will be living with 32-bit address  
spaces for many years after everyone gets 64-bit operating systems  
and 64-bit desktop machines.

All that said, I've got little sympathy for the critics on this  
issue.  The 4 MB array size limit is a fact of OCaml life, but  
there's a reasonable workaround that will get most application  
programmers around it, i.e. careful use of Bigarray.  The 4G size  
limit is another story altogether, and it will plague everyone  
equally-- not just the OCaml world.


—
j h woodyatt <jhw@conjury.org>




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 18:41             ` j h woodyatt
@ 2006-05-20 19:37               ` Jon Harrop
  0 siblings, 0 replies; 67+ messages in thread
From: Jon Harrop @ 2006-05-20 19:37 UTC (permalink / raw)
  To: caml-list

On Saturday 20 May 2006 19:41, j h woodyatt wrote:
> The 4G size
> limit is another story altogether, and it will plague everyone
> equally-- not just the OCaml world.

When I've ported C++ programs to OCaml they often use ~2x as much memory, so 
you could argue that a memory limit is more limiting for OCaml users. 
However, my personal impression is that memory has gone from the scarce 
resource that it was 20 years ago to a resource that is now commonly wasted 
and moving from C++ to OCaml certainly fits in with that trend.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
       [not found]               ` <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>
@ 2006-05-20 19:52                 ` Jozef Kosoru
  2006-05-20 21:45                   ` Oliver Bandel
  0 siblings, 1 reply; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 19:52 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

On Sat, May 20, 2006 at 09:02:56 -0400, Brian Hurt wrote:
> >Sure. Average consumers in 20 years, servers in 15 years and OCaml
> >users in 5 years (remember N-10!).
> 
> No.  Ocaml users in about 20 years.
> 
> The math:
>
> [...]
> 
> Ocaml users actually won't start hitting problems until after that-
> probably more like 2050 or so.  It's N-10, remember?  It's not until
> memory sizes start hitting the peta-byte sizes- 2^51 elements being
> held in 2^54 bytes of memory, that 64-bit Ocaml starts hitting
> problems.  10 bits out of 32 is a much larger percentage than 10 bits
> out of 64.
> 
> [...]
> 
> And remember that this is assuming that Moore's law continues in it's
> exponential pace for the next 50-odd years or more.  I'd have to rate
> this as *highly* unlikely.  If we fall off of the exponential growth
> curve at any point much before 2040 or so, the switch to 128-bit
> architectures may not happen (or may not happen for centuries).
> 
> Sorry, the alarmist "if 32 bits isn't enough, than neither is 64!"
> argument doesn't hold water.

Ok, perhaps you are correct. But that doesn't help programmers facing
this problem on 32-bit x86 architecture now.

-- 
jozef kosoru
http://zyzstar.kosoru.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 14:22           ` Brian Hurt
  2006-05-20 18:41             ` j h woodyatt
@ 2006-05-20 20:47             ` Jozef Kosoru
  2006-05-26 18:34             ` Ken Rose
  2 siblings, 0 replies; 67+ messages in thread
From: Jozef Kosoru @ 2006-05-20 20:47 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

On Sat, May 20, 2006 at 10:22:42 -0400, Brian Hurt wrote:
> >4G address space limit is off-topic here. Please note that the OCaml
> >array size limit is 4M not 4G.
> 
> It is entirely on-topic.  How much headroom do you think you have?
> 
> Let's perform an experiment here.  Let's assume that Xavier decides you're 
> right, and releases a new version of Ocaml with removes the 4M array 
> limit.  How big of an array will you be able to create then?  More than 
> 4M, yes- but how much more?
> 
> The hardware imposes a hard and fast 4G limit. [...] To make life
> simple, let's say that everything except the huge array will fit in
> the upper 2G, leaving 2G for the huge array.
> 
> Now if it's a huge array of integers, each element only takes up 4
> bytes, which means we can fit 512M integers in this array. [...] Wow,
> we could make an array 128 times as big as we can now! Except the
> proper way to look at this is that it's a mean 7 bits larger.  And
> that's best case.
> 
> If we changed it to an array of floats, each float now takes 8 bytes, and 
> that 2G can only hold 256M floats.  Our 128x improvement has now dropped 
> to a 64x improvement- and we're starting to get a bad feeling about where 
> this is going.
> 
> If we start trying to hold 3-dimensional pointers, that's 3 floats-
> and our 64x improvement has now dropped to about a 21x improvement.
> [...] making our 21x improvement (42x improvement) a mere 16x (25x)
> improvement.
> 
> Now, what happens if we try to hold a triangle of three points (i.e.
> for 3D polygon mapping)? [...] 2G can now hold about 19M unique
> polygons- a mere 5x advantage.

Thank you for elaborating on this issue and providing us some
interesting numbers. I think your results are valid. So it is perfectly
possible to find examples in a practice where there is a need to create
arrays from 5x to 128x bigger than OCaml allows. And that is the
problem.
 
> [...]
> 
> >I disagree. Actually I think an opposite. A suggestion to move to
> >64-bit is just patching the symptoms of the real problem - a design
> >flaw in the OCaml programming language. And this issue will not go
> >away on 64-bits.  Just like 22 bits for the array size is a
> >completely artificial limit on 32-bit platforms, (N - 10) imposes
> >once again such an unlogical 54 bits limit on 64-bit computers. And I
> >don't think "it makes complete sense on a 64-bit architecture".
> 
> By this logic, Ocaml should be able to support array sizes of 10^100th
> on 16-bit platforms.  Why is Ocaml imposing stupid artificial limits?

No. An above mentioned logic doesn't imply that. It should be 2^16 on
16-bits of course.
 
> I note with humor that the Java programming language a) defined ints
> to be 32 bits, and b) made the index type for arrays, vectors, and
> many other data structures ints and not longs.  When you move to 64
> bits, it's Java, much more so than Ocaml, that hits artificial and low
> limits on array size.  Ocaml will be able to have arrays millions of
> times larger than the largest possible array in Java.

Yeah, but we don't expect a lot of Java fans on this mailing list anyhow
;)
 
> >If you create OCaml programs for yourself then maybe you can switch
> >to 64-bits overnight (and perpahs you already did so). But if you
> >need to distribute your application to customers/users then it's much
> >more complicated. And explaining them that they should upgrade all
> >workstations because your software has various 4M limitations here
> >and there sounds like a bad joke, doesn't it?
> 
> You call them 32-bit limits, which they are.  I admit that this is a
> problem- but it's not one created by Ocaml.

It's been created by OCaml. CPU architecture is not obliged to have word
wide enough to allow programming languages to store some meta data along
with array indexes. That was OCaml "invention". An argument that 32-bit
is a crappy platform doesn't really hold. As Xavier acknowledged - OCaml
was not designed with 32-bit CPUs in mind; and that's the thing.
 
> >32-bit x86 platform is still good enough for many purposes and its
> >2^32 limit is acceptable in most cases while 2^22 is not.
> 
> Except that you're here comparing apples to oranges again.  The 2^22
> limit is in *words*, while the 2^32 limit is in *bytes*.  So it's
> really more of a 2^24 vr.s 2^31 comparison.

My point was that 2^32 (or 2^31 if you want) is usually enough to count
both apples and oranges whereas 2^22 is barely enough to count apple
trees.

But let's close this thread - I think we have discussed the problem into
great details and the final conclusion might be that consensus is not
possible. :)

Jozef
 
-- 
jozef kosoru
http://zyzstar.kosoru.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* immutable strings (Re: [Caml-list] Array 4 MB size limit)
  2006-05-20  1:06         ` Brian Hurt
  2006-05-20 18:32           ` brogoff
@ 2006-05-20 21:11           ` Oliver Bandel
  2006-05-25  4:32             ` immutable strings (Re: " Stefan Monnier
  1 sibling, 1 reply; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 21:11 UTC (permalink / raw)
  To: caml-list

On Fri, May 19, 2006 at 09:06:27PM -0400, Brian Hurt wrote:
> 
> 
> On Fri, 19 May 2006, Jon Harrop wrote:
> 
> >Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> >and strings as char arrays?
> 
> Why not just run Ocaml as a 64-bit app on a 64-bit OS?
> 
> We I designing a language today, I'd have 63-bit array lengths- of course, 
> I'd do it by not bothering to support 32-bit systems...
> 
> As for strings, I'd be inclined to make them immutable- the correct way to 
> manipulate strings is with regular expressions.  But I'm widely 
> acknowledged to be an extremist.

IMHO strings should be possibly made immutable by using the "immutable"
keyword, which is the opposite of the "mutable" keyword as it is used for
records. So the user/programmer can chose the kind of strings he7she wants.

Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: immutable strings II ([Caml-list] Array 4 MB size limit)
  2006-05-20 18:32           ` brogoff
@ 2006-05-20 21:29             ` Oliver Bandel
  2006-05-22 22:09               ` Aleksey Nogin
  0 siblings, 1 reply; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 21:29 UTC (permalink / raw)
  To: caml-list

On Sat, May 20, 2006 at 11:32:15AM -0700, brogoff wrote:
> On Fri, 19 May 2006, Brian Hurt wrote:
> > On Fri, 19 May 2006, Jon Harrop wrote:
> >
> > > Agreed. Should OCaml's successor have extensible arrays with 64-bit lengths
> > > and strings as char arrays?
> 
> Strings are random access character collections, so under the hood they might
> be arrays, but I think mutable strings as the default string are a mistake.
> One can imagine other representations for strings, like ropes. I'd like a
> language to allow many different underlying string representations.

I think it would break too much code when changing the string behaviour now.

There should be a keyword "immutable" to make strings immutable.
Your idea would be to make it like records immutable as default
and then the "mutable" word must be used for mutable strings.

This would also work, but then, as mentioned above, this would berak
a lot of aleady existing code.

A compiler switch, that per default uses mutable strings,
so as to provide backwards compatibility, but can be changed
to immutable strings as default.
This default string representation then could be changed with indivdual
used keywords "mutable" and "immutable".

I don't know how much development effort this would be, but it would be a  fine thing.


============++=====================++=======================++
\ default   ||   mutable strings   || immutable  strings    ||
 \-------+  ||   as default        || as default            ||
  keyword \ || (compiler switch)   || (compiler switch)     ||
===========+++=====================++=======================++
    ./.     ||  all strings are    || all strings are       ||
            ||  mutable            || immutable             ||
============++=====================++=======================++
  mutable   || all strings are     || all strings that      ||
            || mutable             || are not declared as   ||
            ||                     || mutable are immutable ||
============++=====================++=======================++
 immutable  || all strings that    || all strings are       ||
            || are not declared as || immutable             ||
            || immutable are       ||                       ||
            || mutable             ||                       ||
============++=====================++=======================++

In short words: The compiler flag would give the default string representation
in use, and the keywords "mutable" and "immutable" used for individual strings
would overwrite that default individally.
And as default for the compiler - for backward compatibility reasons -
would then be used the mutable string representation.

Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 10:12               ` skaller
  2006-05-20 11:06                 ` Jozef Kosoru
@ 2006-05-20 21:42                 ` Oliver Bandel
  2006-05-21  1:24                   ` skaller
  1 sibling, 1 reply; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 21:42 UTC (permalink / raw)
  To: caml-list

On Sat, May 20, 2006 at 08:12:22PM +1000, skaller wrote:
> On Sat, 2006-05-20 at 11:08 +0200, Jozef Kosoru wrote:
> 
> > Sure. Average consumers in 20 years, servers in 15 years and OCaml users
> > in 5 years (remember N-10!).
> 
> I'm not so sure it will happen like that. Look more at:
> 
> Today, newer desktops are all dual core. We've gone from
> 1 processor to 2. How long will it take to get 4?
> 
> It surely will not be 20 years.
[...]


Two dual core G5:

  http://www.apple.com/powermac/



Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 19:52                 ` Jozef Kosoru
@ 2006-05-20 21:45                   ` Oliver Bandel
  0 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 21:45 UTC (permalink / raw)
  To: caml-list

On Sat, May 20, 2006 at 09:52:46PM +0200, Jozef Kosoru wrote:
[...]
> Ok, perhaps you are correct. But that doesn't help programmers facing
> this problem on 32-bit x86 architecture now.
[...]

The answer to the solution already was mentioned on that list:
array of arrays is the answer.

Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 10:51         ` Jozef Kosoru
  2006-05-20 14:22           ` Brian Hurt
@ 2006-05-20 22:07           ` Oliver Bandel
  1 sibling, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 22:07 UTC (permalink / raw)
  To: caml-list

On Sat, May 20, 2006 at 12:51:08PM +0200, Jozef Kosoru wrote:
> On Fri, May 19, 2006 at 20:57:35 -0400, Brian Hurt wrote:
> > >Yes. 32-bit x86 platform is not going away anytime soon. Given that
> > >512M RAM is now standard and 1G RAM is very common for an average PC
> > 
> > This is what boggles my imagination.  The combination of obstinate
> > adherence to a limited platform (x86-32) in the same breath as a
> > recognition that we are approaching the limitations of that
> > architecture.  It's the 640K limit all over again.
> > 
> > And no, segments will not help the situation.  Every single process is
> > limited to 4G of address space, period.  Read the Intel CPU docs.
> > With reasonable amounts of virtual memory we're well above that
> > already- and we're approaching that with real memory.
> 
> 4G address space limit is off-topic here. Please note that the OCaml
> array size limit is 4M not 4G.
>  
> > Now, I realize the core reasons for the delay in moving to 64-bits are
> > industry wide.  Intel's Itanium fiasco delayed Intel introducing a
> > 64-bit chip at least 7 years.  And Microsoft seems to be incapable of
> > releasing a new OS- 32-bit or 64-bit.  But that, I think, is the core
> > of the problem- Ocaml's array limit is just one of many symptoms.
> > 
> > And that's my point- we should be looking to fix the underlying
> > problem, not looking to patch the symptoms.  Because often times
> > patching the symptoms and not addressing the core problem simply makes
> > the whole situation worse- the underlying problem simply shows up in
> > new ways, and the fix for the specific symptom often causes new
> > problems.
> 
> I disagree. Actually I think an opposite. A suggestion to move to 64-bit
> is just patching the symptoms of the real problem - a design flaw in the
> OCaml programming language.
[...]

No.

It would be a design flaw when future processors would make bigger problems.
You may have missed the historical note from Xavier: OCaml is a BACK-port.

What kind of problems would you run into, when backporting your current
applications to a 16 or 8 Bit processor?
Would you call that breaking code degned by flaw?

The year 2k problem we have left behind, but the POSIX timer (derived
from the Unix time) will overflow in 2038.

So, he have some years to switch to 64 Bit platforms,
before the year 2038 problem will make trouble ;-)


Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 15:15         ` Don Syme
@ 2006-05-20 22:15           ` Oliver Bandel
  2006-05-21  1:25             ` skaller
  0 siblings, 1 reply; 67+ messages in thread
From: Oliver Bandel @ 2006-05-20 22:15 UTC (permalink / raw)
  To: caml-list

On Sat, May 20, 2006 at 04:15:52PM +0100, Don Syme wrote:
> 
> > And Microsoft seems to be incapable of releasing a 
> > new OS- 32-bit or 64-bit.  
> 
> 64-bit editions of Windows XP and Windows 2003 Server have been
> available for quite a while, e.g.

What is Windows? :->

Linux also is available since a long time on 64-Bit machines. :)


Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 21:42                 ` Oliver Bandel
@ 2006-05-21  1:24                   ` skaller
  2006-05-21 14:10                     ` Oliver Bandel
  0 siblings, 1 reply; 67+ messages in thread
From: skaller @ 2006-05-21  1:24 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Sat, 2006-05-20 at 23:42 +0200, Oliver Bandel wrote:
> On Sat, May 20, 2006 at 08:12:22PM +1000, skaller wrote:

> Two dual core G5:
> 
>   http://www.apple.com/powermac/

We're looking at how N will increase in N-core CPU chips. 

There are plenty of boards that support
multiple CPUs (you can get a 8x Opteron board, that's 16 cores,
and not really that expensive -- the CPUs are though :).

'In theory' CISC should die, RISC should support higher N
on the same wafer.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 22:15           ` Oliver Bandel
@ 2006-05-21  1:25             ` skaller
  0 siblings, 0 replies; 67+ messages in thread
From: skaller @ 2006-05-21  1:25 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Sun, 2006-05-21 at 00:15 +0200, Oliver Bandel wrote:
> On Sat, May 20, 2006 at 04:15:52PM +0100, Don Syme wrote:
> > 
> > > And Microsoft seems to be incapable of releasing a 
> > > new OS- 32-bit or 64-bit.  
> > 
> > 64-bit editions of Windows XP and Windows 2003 Server have been
> > available for quite a while, e.g.
> 
> What is Windows? :->

The dominant desktop platform. Linux is unlikely to challenge it.
Apple might.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20  1:17         ` Frederick Akalin
  2006-05-20  1:52           ` Brian Hurt
@ 2006-05-21  9:26           ` Richard Jones
       [not found]             ` <5CE30707-5DCE-4A22-970E-A49C36F9C901@akalin.cx>
  1 sibling, 1 reply; 67+ messages in thread
From: Richard Jones @ 2006-05-21  9:26 UTC (permalink / raw)
  To: caml-list

On Fri, May 19, 2006 at 06:17:38PM -0700, Frederick Akalin wrote:
> I think that's an awfully simplistic point of view.  My problem is  
> that I want to store more than 4 million items in an array.  You're  
> suggesting moving to 64 bits?

We moved our servers and development machines to 64 bits (AMD64
specifically) a while back and haven't regretted the decision at all.
You can buy low-end 64 bit processors and motherboards for peanuts
these days.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-21  1:24                   ` skaller
@ 2006-05-21 14:10                     ` Oliver Bandel
  0 siblings, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-21 14:10 UTC (permalink / raw)
  To: caml-list

On Sun, May 21, 2006 at 11:24:34AM +1000, skaller wrote:
> On Sat, 2006-05-20 at 23:42 +0200, Oliver Bandel wrote:
> > On Sat, May 20, 2006 at 08:12:22PM +1000, skaller wrote:
> 
> > Two dual core G5:
> > 
> >   http://www.apple.com/powermac/
> 
> We're looking at how N will increase in N-core CPU chips. 
> 
> There are plenty of boards that support
> multiple CPUs (you can get a 8x Opteron board, that's 16 cores,
> and not really that expensive -- the CPUs are though :).
> 
> 'In theory' CISC should die, RISC should support higher N
> on the same wafer.
[...]

Please do not forget that the G5 is a 64-Bit processor!

http://www.apple.com/g5processor/


So if you have a 2 x DualCore G5 then you have
four 64-Bit Cores on your computer.

Even when the Processors then switch to 64 Bits,
the next problem would be to have RAMs available
that are big enough to use that.


Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
       [not found]             ` <5CE30707-5DCE-4A22-970E-A49C36F9C901@akalin.cx>
@ 2006-05-22 10:40               ` Richard Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Jones @ 2006-05-22 10:40 UTC (permalink / raw)
  To: Frederick Akalin; +Cc: caml-list

On Sun, May 21, 2006 at 02:48:30AM -0700, Frederick Akalin wrote:
> I agree that 64-bit machines are cheap these days, but what I meant  
> was that forcing the users of my software, if I were to release it,  
> to move to 64 bit just to run my program is ludicrous.  What's even  
> more ludicrous is I would have to explain that the hard limit in 32  
> bits isn't 2^31 or 2^32, but 2^22.  If I had total control over the  
> machines that would run my software, mandating 64 bits would be  
> viable, but otherwise it's not, as 32 bit machines will be around for  
> a while yet.

Well, the solution - which has been pointed out to you at least twice
- is to use a two-level array when compiling on 32 bit machines.  (On
64 bit machines, just use the normal array mechanism - you can hide
the difference by defining a pair of operators).  The slow down should
be negligible.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: immutable strings II ([Caml-list] Array 4 MB size limit)
  2006-05-20 21:29             ` immutable strings II ([Caml-list] Array 4 MB size limit) Oliver Bandel
@ 2006-05-22 22:09               ` Aleksey Nogin
  0 siblings, 0 replies; 67+ messages in thread
From: Aleksey Nogin @ 2006-05-22 22:09 UTC (permalink / raw)
  To: Caml List

On 20.05.2006 14:29, Oliver Bandel wrote:

> On Sat, May 20, 2006 at 11:32:15AM -0700, brogoff wrote:
> 
>>Strings are random access character collections, so under the hood they might
>>be arrays, but I think mutable strings as the default string are a mistake.
>>One can imagine other representations for strings, like ropes. I'd like a
>>language to allow many different underlying string representations.
> 
> I think it would break too much code when changing the string behaviour now.
> 
> There should be a keyword "immutable" to make strings immutable.
> Your idea would be to make it like records immutable as default
> and then the "mutable" word must be used for mutable strings.
> 
> This would also work, but then, as mentioned above, this would berak
> a lot of aleady existing code.
> 
> A compiler switch, that per default uses mutable strings,
> so as to provide backwards compatibility, but can be changed
> to immutable strings as default.
> This default string representation then could be changed with indivdual
> used keywords "mutable" and "immutable".

I would really love to have such a compiler switch (even without the 
keywords). We currently have a huge amount of code that assumes that all 
the strings are immutable. This is "sort of OK" as long as we do not 
interface with anything that would mutate strings, but, of course, I 
would love for the compiler to actually enforce this with some sort of 
"immutable strings" type.

-- 
Aleksey Nogin

Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-20 21:11           ` immutable strings (Re: [Caml-list] " Oliver Bandel
@ 2006-05-25  4:32             ` Stefan Monnier
  2006-05-25  5:56               ` [Caml-list] " Martin Jambon
  2006-05-25 11:18               ` Brian Hurt
  0 siblings, 2 replies; 67+ messages in thread
From: Stefan Monnier @ 2006-05-25  4:32 UTC (permalink / raw)
  To: caml-list

> IMHO strings should be possibly made immutable by using the "immutable"
> keyword, which is the opposite of the "mutable" keyword as it is used for
> records. So the user/programmer can chose the kind of strings he7she wants.

I think it's OK to have (mutable) byte arrays, but strings should simply
always be immutable.


        Stefan



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25  4:32             ` immutable strings (Re: " Stefan Monnier
@ 2006-05-25  5:56               ` Martin Jambon
  2006-05-25  7:23                 ` j h woodyatt
                                   ` (2 more replies)
  2006-05-25 11:18               ` Brian Hurt
  1 sibling, 3 replies; 67+ messages in thread
From: Martin Jambon @ 2006-05-25  5:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: caml-list

On Thu, 25 May 2006, Stefan Monnier wrote:

>> IMHO strings should be possibly made immutable by using the "immutable"
>> keyword, which is the opposite of the "mutable" keyword as it is used for
>> records. So the user/programmer can chose the kind of strings he7she wants.
>
> I think it's OK to have (mutable) byte arrays, but strings should simply
> always be immutable.

OCaml strings are compact byte arrays which serve their purpose well.
Having a whole different type for immutable strings is in my opinion a 
waste of energy. The problem is that freezing or unfreezing a string 
safely involves a copy of the whole string. And obviously it's not 
possible to handle only immutable strings since somehow you have to create 
them, and unlike record fields, they won't be set in one operation but in 
n operations, n being the length of the string.

So I'd really love to see actual examples where using immutable strings 
would be such an improvement over mutable strings.
If the problem is just to ensure that string data won't be changed by the 
user of a library, then it is trivial using module signatures and 
String.copy for the conversions.


Martin

--
Martin Jambon, PhD
http://martin.jambon.free.fr

Edit http://wikiomics.org, bioinformatics wiki


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25  5:56               ` [Caml-list] " Martin Jambon
@ 2006-05-25  7:23                 ` j h woodyatt
  2006-05-25 10:22                   ` Jon Harrop
  2006-05-25 19:28                   ` Oliver Bandel
  2006-05-25 11:14                 ` Brian Hurt
  2006-05-25 17:31                 ` Aleksey Nogin
  2 siblings, 2 replies; 67+ messages in thread
From: j h woodyatt @ 2006-05-25  7:23 UTC (permalink / raw)
  To: The Caml Trade

On May 24, 2006, at 10:56 PM, Martin Jambon wrote:
>
> So I'd really love to see actual examples where using immutable  
> strings would be such an improvement over mutable strings.
> If the problem is just to ensure that string data won't be changed  
> by the user of a library, then it is trivial using module  
> signatures and String.copy for the conversions.

I have no dog in this fight, but I can imagine a pragmatic approach  
that might satisfy some of these concerns without introducing much in  
the way of extra runtime complexity costs.

Change the way strings are boxed so that there is an additional tag  
for immutable strings as opposed to the normal mutable ones.  In all  
respects, allow string objects with either tag to be treated the same  
by functions that do not modify the content of the string.  When the  
"safe" variants of the string mutation functions are called on a  
string object with the immutable tag, throw a runtime exception.   
Finally, provide a function that changes the tag of a string from  
mutable to immutable, i.e. like a special blessed case of  
Obj.set_tag.  Not from immutable to mutable, though-- that should be  
disallowed.  If the -unsafe is not set, then allow a new compiler  
option -constliteral (or something) that would make all string  
literals immutable.  You'd need to use String.copy to get a mutable  
string with contents copied from an immutable one.

Of course, I can't offer a convincing reason to believe this would  
bring an improvement over what we have today.  My proposal would not  
give the programmer any assurance that contents of string values  
passed as parameters to library functions are not modified by the  
library, i.e. the "unsafe" functions would still work.  It *might*  
give a pragmatic benefit of surfacing unintentional programming  
errors earlier than they might otherwise be found, but I'm not  
prepared to argue in defense of that.

If you did immutable strings this way, it would be nice for the sake  
of consistency to have a comparable feature for arrays.


—
j h woodyatt <jhw@conjury.org>




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25  7:23                 ` j h woodyatt
@ 2006-05-25 10:22                   ` Jon Harrop
  2006-05-25 19:28                   ` Oliver Bandel
  1 sibling, 0 replies; 67+ messages in thread
From: Jon Harrop @ 2006-05-25 10:22 UTC (permalink / raw)
  To: caml-list

On Thursday 25 May 2006 08:23, j h woodyatt wrote:
> Change the way strings are boxed so that there is an additional tag
> for immutable strings as opposed to the normal mutable ones.  In all
> respects, allow string objects with either tag to be treated the same
> by functions that do not modify the content of the string.  When the
> "safe" variants of the string mutation functions are called on a
> string object with the immutable tag, throw a runtime exception.

Blech. I didn't come all this way for dynamic typing... ;-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25  5:56               ` [Caml-list] " Martin Jambon
  2006-05-25  7:23                 ` j h woodyatt
@ 2006-05-25 11:14                 ` Brian Hurt
  2006-05-25 19:42                   ` Oliver Bandel
  2006-05-26  6:51                   ` Alain Frisch
  2006-05-25 17:31                 ` Aleksey Nogin
  2 siblings, 2 replies; 67+ messages in thread
From: Brian Hurt @ 2006-05-25 11:14 UTC (permalink / raw)
  To: Martin Jambon; +Cc: Stefan Monnier, caml-list



On Wed, 24 May 2006, Martin Jambon wrote:

> And obviously it's not possible to handle only immutable strings since 
> somehow you have to create them, and unlike record fields, they won't be 
> set in one operation but in n operations, n being the length of the 
> string.

This is the advantage of regular expressions- they can be implemented on 
immutable strings (regular expressions don't modify the old strings, they 
produce new strings), and can change large chunks of the string in one 
"operation", limiting the number of unnecessary string copies you need to 
do.  Yes, the underlying implementation would be mutable- but so is they 
underlying implementation of everything else.  The semantics remain 
applicative.

Note that this also eliminates an inefficiency of ocaml.  When you use a 
constant string, it has to allocate a new string for you.  Consider 
Ocaml's response to the expression:

"foo" == "foo";;

The reason ocaml returns false in the above expression is that when you 
give a constant string in an expression, Ocaml first has to make a copy of 
that string (in case you modify it), and then it returns the copy.  So it 
makes two copies of the string "foo" above, and then compares them for 
referential equality- which, since they're different copies, returns 
false.

Note that even pure-imperitive languages, like C/C++, hit this problem. 
What is the output of the following peice of C code:
     for (i = 0; i < 10; ++i) {
         char * p = "foo";
         p[0] += 1;
         printf("%s\n", p);
     }

C/C++ itself wants *some* strings to be immutable.  Fortran used to let 
you change the value of 2, which made for some interesting (in the bad 
sense) code tricks.  I comment that they've since fixed this.  But mutable 
strings allow you to change the value of "two", which is almost as bad.

Brian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25  4:32             ` immutable strings (Re: " Stefan Monnier
  2006-05-25  5:56               ` [Caml-list] " Martin Jambon
@ 2006-05-25 11:18               ` Brian Hurt
  2006-05-25 17:34                 ` Aleksey Nogin
  1 sibling, 1 reply; 67+ messages in thread
From: Brian Hurt @ 2006-05-25 11:18 UTC (permalink / raw)
  To: caml-list


I wish to reiterate that my comments on immutable strings are solely in 
the context of designing a new language, and should not be construed to 
indicate a desire to change Ocaml.  Changing this behavior in Ocaml would 
break huge amounts of code (including some of my code).

Brian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25  5:56               ` [Caml-list] " Martin Jambon
  2006-05-25  7:23                 ` j h woodyatt
  2006-05-25 11:14                 ` Brian Hurt
@ 2006-05-25 17:31                 ` Aleksey Nogin
  2006-05-25 19:54                   ` Martin Jambon
  2 siblings, 1 reply; 67+ messages in thread
From: Aleksey Nogin @ 2006-05-25 17:31 UTC (permalink / raw)
  To: Caml List; +Cc: Martin Jambon

On 24.05.2006 22:56, Martin Jambon wrote:

>> I think it's OK to have (mutable) byte arrays, but strings should simply
>> always be immutable.
>  
> OCaml strings are compact byte arrays which serve their purpose well.

Yes, however immutable strings are also very useful and that 
functionality is simply missing in OCaml. The usage I am very interested 
in is essentially using strings as "printable tokens". In other words, a 
data type that is easy to compare and has an obvious I/O representation.

> Having a whole different type for immutable strings is in my opinion a 
> waste of energy. The problem is that freezing or unfreezing a string 
> safely involves a copy of the whole string. And obviously it's not 
> possible to handle only immutable strings since somehow you have to 
> create them, and unlike record fields, they won't be set in one 
> operation but in n operations, n being the length of the string.

This is not true. All I want is having a purely functional interface with:
- Constants (a compiler flag for turning "..." constants into immutable 
strings instead of mutable ones).
- Inputing from a channel
- Concatenation
- Things like string_of_int for immutable string.

Of course, it might be the case that the standard library might have to 
use some sort of "unsafe" operations that would "inappropriately" mutate 
the newly created immutable string buffer, but this is IMHO no different 
than how the unsafe operations are already used in standard library for 
arrays and strings.

> So I'd really love to see actual examples where using immutable strings 
> would be such an improvement over mutable strings.
> If the problem is just to ensure that string data won't be changed by 
> the user of a library, then it is trivial using module signatures and 
> String.copy for the conversions.

Such a copy operation can be extremely prohibitive in a setting that 
assumes that a data structure is immutable and tries really hard to 
preserve sharing (including using functions like a sharing-preserving 
version of map (*), etc). In such a setting, these extra copies can 
potentially have a devastating effect on memory usage, cache 
performance, etc. And this situation is exactly what we have in our 
MetaPRL project - there we have resorted to simply using strings and 
pretending they are immutable, but this is clearly suboptimal.

----
(*)
let rec smap f = function
    [] -> []
  | (hd :: tl) as l ->
       let hd' = f hd in
       let tl' = smap f tl in
          if hd == hd' && tl == tl' then l else hd' :: tl'

-- 
Aleksey Nogin

Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25 11:18               ` Brian Hurt
@ 2006-05-25 17:34                 ` Aleksey Nogin
  2006-05-25 18:44                   ` Tom
  0 siblings, 1 reply; 67+ messages in thread
From: Aleksey Nogin @ 2006-05-25 17:34 UTC (permalink / raw)
  To: Caml List

On 25.05.2006 04:18, Brian Hurt wrote:

> I wish to reiterate that my comments on immutable strings are solely in 
> the context of designing a new language, and should not be construed to 
> indicate a desire to change Ocaml. 

Well, I am advocating a change. IMHO the immutable strings is a very 
useful concept and I would really like to see it in OCaml.

> Changing this behavior in Ocaml 
> would break huge amounts of code (including some of my code).

It is obvious that at this point changing the default behavior would not 
be possible. However, adding the new "immutable strings" type would 
definitely be great and adding a compiler flag that would turn all 
string constants, "^", etc in a given module into immutable string 
operations would be nice as well and would not break any existing code.

-- 
Aleksey Nogin

Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Moore 04, tel: (626) 395-2200


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25 17:34                 ` Aleksey Nogin
@ 2006-05-25 18:44                   ` Tom
  2006-05-25 23:00                     ` Jon Harrop
  0 siblings, 1 reply; 67+ messages in thread
From: Tom @ 2006-05-25 18:44 UTC (permalink / raw)
  To: Caml List

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

On 25/05/06, Aleksey Nogin <nogin@cs.caltech.edu> wrote:
>
> On 25.05.2006 04:18, Brian Hurt wrote:
>
> > Changing this behavior in Ocaml
> > would break huge amounts of code (including some of my code).
>
> It is obvious that at this point changing the default behavior would not
> be possible. However, adding the new "immutable strings" type would
> definitely be great and adding a compiler flag that would turn all
> string constants, "^", etc in a given module into immutable string
> operations would be nice as well and would not break any existing code.


But why don't you add such a type yourself?

[-- Attachment #2: Type: text/html, Size: 943 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25  7:23                 ` j h woodyatt
  2006-05-25 10:22                   ` Jon Harrop
@ 2006-05-25 19:28                   ` Oliver Bandel
  1 sibling, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-25 19:28 UTC (permalink / raw)
  To: caml-list

On Thu, May 25, 2006 at 12:23:51AM -0700, j h woodyatt wrote:
> On May 24, 2006, at 10:56 PM, Martin Jambon wrote:
> >
> >So I'd really love to see actual examples where using immutable  
> >strings would be such an improvement over mutable strings.
> >If the problem is just to ensure that string data won't be changed  
> >by the user of a library, then it is trivial using module  
> >signatures and String.copy for the conversions.
> 
> I have no dog in this fight, but I can imagine a pragmatic approach  
> that might satisfy some of these concerns without introducing much in  
> the way of extra runtime complexity costs.
> 
> Change the way strings are boxed so that there is an additional tag  
> for immutable strings as opposed to the normal mutable ones.  In all  
> respects, allow string objects with either tag to be treated the same  
> by functions that do not modify the content of the string.  When the  
> "safe" variants of the string mutation functions are called on a  
> string object with the immutable tag, throw a runtime exception.   
[...]

Compiletime error would be what I would prefer.


[...]
> It *might*  
> give a pragmatic benefit of surfacing unintentional programming  
> errors earlier than they might otherwise be found,
[...]

The earlier the better it is.
It's like non-matching types found by the compiler: it helps
a lot in development. Debugging normallay takes the most time
in development; and with OCaml debugging time decreases much - that
is what I experienced. And this comes from early getting a hint
to possible problems (possible with compilers that would accept
such code; im possible when the compiler says: "no! I don't accept
these sources!").

Ciao,
  Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25 11:14                 ` Brian Hurt
@ 2006-05-25 19:42                   ` Oliver Bandel
  2006-05-26  6:51                   ` Alain Frisch
  1 sibling, 0 replies; 67+ messages in thread
From: Oliver Bandel @ 2006-05-25 19:42 UTC (permalink / raw)
  To: caml-list

On Thu, May 25, 2006 at 07:14:55AM -0400, Brian Hurt wrote:
[...]
> Note that even pure-imperitive languages, like C/C++, hit this problem. 
> What is the output of the following peice of C code:
>     for (i = 0; i < 10; ++i) {
>         char * p = "foo";
>         p[0] += 1;
>         printf("%s\n", p);
>     }
[...]

Heheh, this brings a "Bus Error" (which one might call an "exception" ;-)
 (an exception th C'ish way)).

Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25 17:31                 ` Aleksey Nogin
@ 2006-05-25 19:54                   ` Martin Jambon
  0 siblings, 0 replies; 67+ messages in thread
From: Martin Jambon @ 2006-05-25 19:54 UTC (permalink / raw)
  To: Caml List; +Cc: Martin Jambon

On Thu, 25 May 2006, Aleksey Nogin wrote:

> On 24.05.2006 22:56, Martin Jambon wrote:
>
>>> I think it's OK to have (mutable) byte arrays, but strings should simply
>>> always be immutable.
>>  OCaml strings are compact byte arrays which serve their purpose well.
>
> Yes, however immutable strings are also very useful and that functionality is 
> simply missing in OCaml. The usage I am very interested in is essentially 
> using strings as "printable tokens". In other words, a data type that is easy 
> to compare and has an obvious I/O representation.
>
>> Having a whole different type for immutable strings is in my opinion a 
>> waste of energy. The problem is that freezing or unfreezing a string safely 
>> involves a copy of the whole string. And obviously it's not possible to 
>> handle only immutable strings since somehow you have to create them, and 
>> unlike record fields, they won't be set in one operation but in n 
>> operations, n being the length of the string.
>
> This is not true. All I want is having a purely functional interface with:
> - Constants (a compiler flag for turning "..." constants into immutable 
> strings instead of mutable ones).
> - Inputing from a channel
> - Concatenation
> - Things like string_of_int for immutable string.

Isn't it a bit limited? What if I want other functions?

But if it satisfies you, you can do the syntax part with an unsafe_freeze 
function and a bit of camlp4. The rest is just plain old OCaml.

> Of course, it might be the case that the standard library might have to use 
> some sort of "unsafe" operations that would "inappropriately" mutate the 
> newly created immutable string buffer, but this is IMHO no different than how 
> the unsafe operations are already used in standard library for arrays and 
> strings.

I disagree: has it ever happened to you to mutate a string by accident?
I never met this situation and this is mostly why I don't see the point of 
your suggestions. This strongly constrasts with mistakes in array/string 
indices which happen all the time.


>> So I'd really love to see actual examples where using immutable strings 
>> would be such an improvement over mutable strings.
>> If the problem is just to ensure that string data won't be changed by the 
>> user of a library, then it is trivial using module signatures and 
>> String.copy for the conversions.
>
> Such a copy operation can be extremely prohibitive in a setting that assumes 
> that a data structure is immutable and tries really hard to preserve sharing 
> (including using functions like a sharing-preserving version of map (*), 
> etc). In such a setting, these extra copies can potentially have a 
> devastating effect on memory usage, cache performance, etc. And this 
> situation is exactly what we have in our MetaPRL project - there we have 
> resorted to simply using strings and pretending they are immutable, but this 
> is clearly suboptimal.

Yes, so how do you avoid copies without using the "unsafe" conversions all 
over the place?


> ----
> (*)
> let rec smap f = function
>   [] -> []
> | (hd :: tl) as l ->
>      let hd' = f hd in
>      let tl' = smap f tl in
>         if hd == hd' && tl == tl' then l else hd' :: tl'

In order to maximize sharing, I'd rather use a global weak hash table.
In your context, it seems that you could afford String.copy, as long as it 
doesn't break sharing:

let freeze s =
   let s' = make_constant s (* using a copy! *) in
   if s' is in the table then return the element from the table
   else add s' and return s'



Martin

--
Martin Jambon, PhD
http://martin.jambon.free.fr

Edit http://wikiomics.org, bioinformatics wiki


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25 18:44                   ` Tom
@ 2006-05-25 23:00                     ` Jon Harrop
  2006-05-25 23:15                       ` Martin Jambon
  0 siblings, 1 reply; 67+ messages in thread
From: Jon Harrop @ 2006-05-25 23:00 UTC (permalink / raw)
  To: caml-list

On Thursday 25 May 2006 19:44, Tom wrote:
[immutable strings]
> But why don't you add such a type yourself?

You would have to make your new type abstract in order to distinguish it from 
the build-in mutable string type. Then you lose the ability to do pattern 
matching and s.[i].

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25 23:00                     ` Jon Harrop
@ 2006-05-25 23:15                       ` Martin Jambon
  0 siblings, 0 replies; 67+ messages in thread
From: Martin Jambon @ 2006-05-25 23:15 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Fri, 26 May 2006, Jon Harrop wrote:

> On Thursday 25 May 2006 19:44, Tom wrote:
> [immutable strings]
>> But why don't you add such a type yourself?
>
> You would have to make your new type abstract in order to distinguish it from
> the build-in mutable string type. Then you lose the ability to do pattern
> matching and s.[i].

The former problem can be solved with camlp4 (which would use a "unsafe" 
function to do the type conversion). The latter can be solved with camlp4 
too, or simply by opening a module which defines a String module which 
provides a "get" function.

Martin

--
Martin Jambon, PhD
http://martin.jambon.free.fr

Edit http://wikiomics.org, bioinformatics wiki


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-25 11:14                 ` Brian Hurt
  2006-05-25 19:42                   ` Oliver Bandel
@ 2006-05-26  6:51                   ` Alain Frisch
  1 sibling, 0 replies; 67+ messages in thread
From: Alain Frisch @ 2006-05-26  6:51 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

Brian Hurt wrote:
> Note that this also eliminates an inefficiency of ocaml.  When you use a
> constant string, it has to allocate a new string for you.  Consider
> Ocaml's response to the expression:
> 
> "foo" == "foo";;
> 
> The reason ocaml returns false in the above expression is that when you
> give a constant string in an expression, Ocaml first has to make a copy
> of that string (in case you modify it), and then it returns the copy. 

This is not true. The two strings above are physically different because
the two constants appear at different location of the source.

The expression

let f () = "foo" in f () == f ()

evaluates to true. The block for "foo" is allocated only once; the
string is not copied not each time the body of f is evaluated.
As a consequence,

let f () = "foo" in (f ()).[0] <- 'F'; f ()

returns "Foo".

-- Alain


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Caml-list] Array 4 MB size limit
  2006-05-20 14:22           ` Brian Hurt
  2006-05-20 18:41             ` j h woodyatt
  2006-05-20 20:47             ` Jozef Kosoru
@ 2006-05-26 18:34             ` Ken Rose
  2 siblings, 0 replies; 67+ messages in thread
From: Ken Rose @ 2006-05-26 18:34 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Jozef Kosoru, caml-list

Brian Hurt wrote:

> On the other side, increasing the array size has a cost.  Among other
> things, it slows down garbage collection.  If done badly, it could break
> existing code.  The C-99 standard did this- by introducing long long, it
> *silently* broke conformant code.  I've yet to forgive them for doing
> this.  More to the point, it silently broke *my* code.  Which is why I'm
> violently allergic to suggestions that we "patch around" the 32-bit
> limitation.  When people suggest this, I tend to hear "I want to break
> your code".  Because that's how it worked last time.

Maybe I'm being dense here, and maybe this is off-topic, but how did
long long break things in C99?

Thanks

 - ken


^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
  2006-05-29  2:36 ` Martin Jambon
@ 2006-05-31 12:53 ` Jean-Christophe Filliatre
  1 sibling, 0 replies; 67+ messages in thread
From: Jean-Christophe Filliatre @ 2006-05-31 12:53 UTC (permalink / raw)
  To: Harrison, John R; +Cc: Martin Jambon, Caml List


Harrison, John R writes:
 > The point is not that I will mutate a string by accident.

I once discovered a bug in  the Coq proof assistant that was precisely
due  to  a  string (an  identifier)  mutated  by  accident (may  be  I
shouldn't say  it :-) A name  was capitalized in-place  somewhere in a
piece  of code  unrelated with  the Coq  kernel but  of course  it had
consequences all over the system (including the kernel).

So I'm definitely in favor of immutable strings, for the exact reasons
mentioned by John.

But I  think an abstract data type  is not really an  issue, since one
does little pattern-matching on  strings in practice.  And having your
own  abstract data type  for immutable  strings has  other advantages,
such as  the ability  to share equal  strings (using  hash-consing) to
speedup  names comparisons. Even  printing is  not painful  provided a
suitable formatter-based printing function and %a.

-- 
Jean-Christophe Filliâtre (http://www.lri.fr/~filliatr)


^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
@ 2006-05-29 20:52 Harrison, John R
  0 siblings, 0 replies; 67+ messages in thread
From: Harrison, John R @ 2006-05-29 20:52 UTC (permalink / raw)
  To: Martin Jambon, Caml List; +Cc: Harrison, John R

Hi Martin,

| OK, but let's be pragmatic: what kind of interface and implementation
do
| you have in mind?

I did indeed have a very specific example in mind, my theorem prover HOL
Light. I have an OCaml type of typed lambda-terms:

  type term =
      Var of string * hol_type
    | Const of string * hol_type
    | Comb of term * term
    | Abs of term * term

The type "term" is private, and the abstract type interface only permits
you to construct well-typed terms, via interface functions like "mk_var"
and "mk_comb". For example, the call "mk_comb(s,t)" gives "Comb(s,t)"
provided the types agree, and fails otherwise.

I would like the user to be able to write "mk_var(x,ty)" and the net
result to be just one cons "Var(x,ty)" with "x" and "ty" identical to
the
input arguments. But with mutable strings, it is possible in principle
for the string "x" inside that object to get modified by other code.
Of course, it's a bit artificial, but I would like it to be impossible,
given that the principle of LCF provers is that the user should be able
to use arbitrary programs while having soundness enforced by the ML
type system.

Of course, I can use my own private type of strings, but then I need
to convert every time I use standard library functions, pattern matching
is a bit less convenient, etc.

John.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
  2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
@ 2006-05-29  2:36 ` Martin Jambon
  2006-05-31 12:53 ` Jean-Christophe Filliatre
  1 sibling, 0 replies; 67+ messages in thread
From: Martin Jambon @ 2006-05-29  2:36 UTC (permalink / raw)
  To: Harrison, John R; +Cc: Caml List

Hi John,

On Sun, 28 May 2006, Harrison, John R wrote:

> With immutable strings, you'd never need to do conversions at the module
> interfaces. As with any other functional data structure, you only copy
> when you want to change part of it.

OK, but let's be pragmatic: what kind of interface and implementation do 
you have in mind?

(and then: isn't it possible to implement in OCaml?)


If anyone is interested:

Before posting I tried a polymorphic (wrt mutability) string type.
It was fun enough, but it doesn't scale very well. I put it there:

   http://martin.jambon.free.fr/ocaml.html#gstring



Martin

--
Martin Jambon, PhD
http://martin.jambon.free.fr

Edit http://wikiomics.org, bioinformatics wiki


^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
@ 2006-05-28 23:20 Harrison, John R
  2006-05-29  2:36 ` Martin Jambon
  2006-05-31 12:53 ` Jean-Christophe Filliatre
  0 siblings, 2 replies; 67+ messages in thread
From: Harrison, John R @ 2006-05-28 23:20 UTC (permalink / raw)
  To: Martin Jambon, Caml List; +Cc: Harrison, John R

Hi Martin,

| I disagree: has it ever happened to you to mutate a string by
accident?

The point is not that I will mutate a string by accident. I've never
done
it by accident or by design. The point is that I can't depend on code
that I call, or code that calls mine, not to subsequently modify strings
that are passed as arguments. So if I really need to reliably fix them I
am forced into expensive copy operations.

In practice, the obvious library calls are safe, so like Aleksey, I use
the built-in strings for the sake of convenience and compatibility. But
it's unsatisfactory intellectually. Some of us want to program in a
primarily functional style, yet the implementation of one of the most
basic and useful datatypes is not functional.

| Yes, so how do you avoid copies without using the "unsafe" conversions
all
| over the place?

With immutable strings, you'd never need to do conversions at the module
interfaces. As with any other functional data structure, you only copy
when you want to change part of it.

John.


^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2006-05-31 12:53 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-15 18:12 Array 4 MB size limit akalin
2006-05-15 18:22 ` [Caml-list] " Nicolas Cannasse
2006-05-15 20:32 ` Damien Doligez
2006-05-15 21:27   ` akalin
2006-05-15 22:51 ` Oliver Bandel
2006-05-16  0:48 ` Brian Hurt
2006-05-16  9:57   ` Damien Doligez
2006-05-16 15:10     ` Markus Mottl
2006-05-16  8:01 ` Xavier Leroy
2006-05-16  8:20   ` Nicolas Cannasse
2006-05-19 17:13     ` Xavier Leroy
2006-05-19  5:57   ` Frederick Akalin
2006-05-19  6:21     ` Oliver Bandel
2006-05-19 12:15     ` Jon Harrop
2006-05-19 19:36       ` akalin
2006-05-19 20:17         ` Oliver Bandel
2006-05-19 16:28     ` Jozef Kosoru
2006-05-19 20:08       ` Oliver Bandel
2006-05-19 21:26       ` Jon Harrop
2006-05-20  1:06         ` Brian Hurt
2006-05-20 18:32           ` brogoff
2006-05-20 21:29             ` immutable strings II ([Caml-list] Array 4 MB size limit) Oliver Bandel
2006-05-22 22:09               ` Aleksey Nogin
2006-05-20 21:11           ` immutable strings (Re: [Caml-list] " Oliver Bandel
2006-05-25  4:32             ` immutable strings (Re: " Stefan Monnier
2006-05-25  5:56               ` [Caml-list] " Martin Jambon
2006-05-25  7:23                 ` j h woodyatt
2006-05-25 10:22                   ` Jon Harrop
2006-05-25 19:28                   ` Oliver Bandel
2006-05-25 11:14                 ` Brian Hurt
2006-05-25 19:42                   ` Oliver Bandel
2006-05-26  6:51                   ` Alain Frisch
2006-05-25 17:31                 ` Aleksey Nogin
2006-05-25 19:54                   ` Martin Jambon
2006-05-25 11:18               ` Brian Hurt
2006-05-25 17:34                 ` Aleksey Nogin
2006-05-25 18:44                   ` Tom
2006-05-25 23:00                     ` Jon Harrop
2006-05-25 23:15                       ` Martin Jambon
2006-05-20  0:57       ` [Caml-list] Array 4 MB size limit Brian Hurt
2006-05-20  1:17         ` Frederick Akalin
2006-05-20  1:52           ` Brian Hurt
2006-05-20  9:08             ` Jozef Kosoru
2006-05-20 10:12               ` skaller
2006-05-20 11:06                 ` Jozef Kosoru
2006-05-20 12:02                   ` skaller
2006-05-20 21:42                 ` Oliver Bandel
2006-05-21  1:24                   ` skaller
2006-05-21 14:10                     ` Oliver Bandel
     [not found]               ` <Pine.LNX.4.63.0605200847530.10710@localhost.localdomain>
2006-05-20 19:52                 ` Jozef Kosoru
2006-05-20 21:45                   ` Oliver Bandel
2006-05-21  9:26           ` Richard Jones
     [not found]             ` <5CE30707-5DCE-4A22-970E-A49C36F9C901@akalin.cx>
2006-05-22 10:40               ` Richard Jones
2006-05-20 10:51         ` Jozef Kosoru
2006-05-20 14:22           ` Brian Hurt
2006-05-20 18:41             ` j h woodyatt
2006-05-20 19:37               ` Jon Harrop
2006-05-20 20:47             ` Jozef Kosoru
2006-05-26 18:34             ` Ken Rose
2006-05-20 22:07           ` Oliver Bandel
2006-05-20 15:15         ` Don Syme
2006-05-20 22:15           ` Oliver Bandel
2006-05-21  1:25             ` skaller
2006-05-28 23:20 [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) Harrison, John R
2006-05-29  2:36 ` Martin Jambon
2006-05-31 12:53 ` Jean-Christophe Filliatre
2006-05-29 20:52 Harrison, John R

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).