From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 07ACDBB84 for ; Tue, 16 May 2006 10:01:38 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k4G81be5020260 for ; Tue, 16 May 2006 10:01:37 +0200 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA22708 for ; Tue, 16 May 2006 10:01:36 +0200 (MET DST) Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k4G81a9Y020252 for ; Tue, 16 May 2006 10:01:36 +0200 Received: from [192.168.1.2] (che78-2-82-237-71-191.fbx.proxad.net [82.237.71.191]) by smtp5-g19.free.fr (Postfix) with ESMTP id 3EC2827505; Tue, 16 May 2006 10:01:36 +0200 (CEST) Message-ID: <446986DF.1070308@inria.fr> Date: Tue, 16 May 2006 10:01:35 +0200 From: Xavier Leroy User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317) X-Accept-Language: en-us, en MIME-Version: 1.0 To: akalin@akalin.cx Cc: caml-list@inria.fr Subject: Re: [Caml-list] Array 4 MB size limit References: <20060515141230.ajyupn2z28k0484s@horde.akalin.cx> In-Reply-To: <20060515141230.ajyupn2z28k0484s@horde.akalin.cx> X-Enigmail-Version: 0.91.0.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 446986E1.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 446986E0.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; arrays:01 arrays:01 ocaml:01 ocaml:01 unboxed:01 pointers:01 unboxed:01 surprising:01 bug:01 stack:01 stack:01 segfault:01 segfault:01 bug:01 damien:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 > I was greatly surprised when I found out there was such a low limit on > arrays. Is there a reason for this? Will this limit ever be increased? As Brian Hurt explained, this limit comes from the fact that heap object sizes are stored in N-10 bits, where N is the bit width of the processor (32 or 64). Historical digression: this representation decision was initially taken when designing Caml Light in 1989-1990. At that time, even professional workstations had 16 M of RAM at best, so limiting arrays to 4M elements was reasonable. The decision was then reconsidered in 1995 during the redesign that led to OCaml. At that time, 64-bit architectures were all the rage: OCaml was actually implemented on a 64-bit Alpha, and only then backported to 32-bit machines. So, the original header format was kept, since it makes complete sense on a 64-bit architecture. Little did I know that the 32-bitters would survive so long. Now, it's 2006, and 64-bit processors are becoming universally available, in desktop machines at least. (I've been running an AMD64 PC at home since january 2005 with absolutely zero problems.) So, no the data representations of OCaml are not going to change to lift the array size limit on 32-bit machines. > Is the limit a limit on the number of elements or the total size? The > language in Sys.max_array_size implies the former, but the fact the > limit is halved for floats implies the latter. If I had a record type > with 5 floats, will the limit then by Sys.max_array_size / 10? No. In general, Caml arrays are not unboxed, meaning that your array of 5-float records is actually an array of pointers to individual blocks holding 5 floats. The only exception is for arrays of floats, which are unboxed. > Is there > some sort of existing ArrayList module that works around this problem? > Ideally, I'd like to have something like C++'s std::vector<> type, which > can be dynamically resized. Do I have to write my own? :( A better idea would be to determine exactly what data structure you need: which abstract operations, what performance requirements, etc. C++ and Lisp programmers tend to encode everything as arrays or lists, respectively, but quite often these are not the best data structure for the application of interest. > Also, the fact that using lists crashes for the same data set is > surprising. Is there a similar hard limit for lists, or would this be a > bug? Should I post a test case? Depends on the platform you use. In principle, Caml should report stack overflows cleanly, by throwing a Stack_overflow exception. However, this is hard to do in native code as it depends a lot on the processor and OS used. So, some combinations (e.g. x86/Linux) will report stack overflows via an exception, and others will let the kernel generate a segfault. If you're getting the segfault under x86/Linux for instance, please post a test case on the bug tracking system. It's high time that Damien shaves :-) - Xavier Leroy