From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 3ECB6BB84 for ; Tue, 16 May 2006 02:48:06 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k4G0m5gS010385 for ; Tue, 16 May 2006 02:48:05 +0200 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id CAA18071 for ; Tue, 16 May 2006 02:48:04 +0200 (MET DST) Received: from luna.vie.lunde.net (at.lunde.net [62.116.13.60]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id k4G0m3vr027035 for ; Tue, 16 May 2006 02:48:04 +0200 Received: from h-68-166-100-33.nycmny83.covad.net ([68.166.100.33] helo=localhost.localdomain) by luna.vie.lunde.net with asmtp (Exim 3.36 #1 (Debian)) id 1Ffnj9-0000yv-00; Tue, 16 May 2006 02:47:56 +0200 Received: from localhost (localhost.localdomain [127.0.0.1]) by localhost.localdomain (Postfix) with ESMTP id 6F335C3064; Mon, 15 May 2006 20:48:16 -0400 (EDT) Date: Mon, 15 May 2006 20:48:16 -0400 (EDT) From: Brian Hurt X-X-Sender: bhurt@localhost.localdomain To: akalin@akalin.cx Cc: caml-list@inria.fr Subject: Re: [Caml-list] Array 4 MB size limit In-Reply-To: <20060515141230.ajyupn2z28k0484s@horde.akalin.cx> Message-ID: References: <20060515141230.ajyupn2z28k0484s@horde.akalin.cx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Miltered: at concorde with ID 44692145.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 44692144.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; arrays:01 bigarrays:01 arrays:01 doubles:01 unboxed:01 pointers:01 pointers:01 ocaml:01 resize:01 byte:01 resize:01 surprising:01 bug:01 stack:01 ocaml:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.1 required=5.0 tests=FORGED_RCVD_HELO autolearn=disabled version=3.0.3 On Mon, 15 May 2006, akalin@akalin.cx wrote: > I'm running into cases where the 4 MB limit on arrays is starting to become a > problem. Lists are much slower and cause seg faults for me on the same data > set, and Bigarrays are a partial solution because I'd like to be able to > store arbitrary types in arrays (not just numbers). > > I was greatly surprised when I found out there was such a low limit on > arrays. Is there a reason for this? Will this limit ever be increased? It comes from how arrays are stored in memory. Every boxed value has a "tag word", which, among other uses, is used by the GC to figure out how big the object is (i.e. where the next object starts). Now, the encoding scheme of this tag word is fairly complicated, but it works out that for arrays, 10 bits of the 32 bits is used for other stuff, leaving 22 bits for the size. This limits the size of an array to 4M-1 elements (actually, it limits the size of the array to 16M-4 bytes, as size is measured in words- which is why arrays of floats are limited to 2M-1 entries- each float takes up 8 bytes aka 2 words). When you move to 64 bits, the tag word doubles in size, but the amount of "other" information in the tag word doesn't- this means that suddenly you have 52 bits of size, or 4T arrays. And now floats only take up one word, not two, so they too can have 4T arrays. The array of array idea someone brought up is a good one. There's a slight performance hit, but not much. I'd keep the top level array short- 1K entries is enough. The advantage of this is that if you're heavily using the array, the top level array will stay in cache (possibly even L1 cache), meaning the main cost of an array access only goes up by about 10% or so, maybe less (maybe 1% if it stays in L1 cache). The reasoning goes like this: the largest cost by far of an array access on a large array is the cache miss. Doing a memory reference that is satisified out of L1 cache generally takes 1-4 clock cycles. If you have to go to L2 cache, that's going to take 10-40 clock cycles (approximately). Going to main memory? 100-350+ clock cycles. With a large array, you're likely going to take the 100-350+ clock cycle hit anyways. If the top level array is in L1 cache, you're only adding 1-4 clockcycles to the 100-350+ clock cycle hit you're going to take anyways. Personally, I think PCs should have gone 64 bit circa 1997 or so. Once we started getting hard disks large enough that we couldn't mmap the entire hard disk, we started hitting problems. Long long and the large file interfaces are examples of these problems. The longer we go- and the larger memories and hard disks get, the bigger the problems get. Microsoft seems to be the main impediment, now that AMD has forced Intel to get off the stick. > If I had a record type with 5 floats, > will the limit then by Sys.max_array_size / 10? Within the record, the floats are unboxed (assuming you didn't have any non-float elements). This means the floats are stored directly in the record, and that the record doesn't hold pointers to the floats. But the record itself is boxed- which means that an array of these records is really an array of pointers to these records, meaning that you're back at the 4M-1 limit. Note that a record of 5 floats costs 44 bytes (40 bytes for the 5 floats + 4 bytes for the tag word). Assuming no records are stored in more than one location, the total memory cost of an array of 4M-1 of them is 16M for the array, plus (4M-1)*44 for the records, for a total of 192M-44 bytes. This is almost 10% of your available memory space right there. > Is there some sort of > existing ArrayList module that works around this problem? Ideally, I'd like > to have something like C++'s std::vector<> type, which can be dynamically > resized. Ocaml can't dynamically resize arrays. This gets tricky to do when arrays get large. At the extreme, if the array takes up 50% + 1 byte of the total address space, you can't resize it to be any larger- to resize requires you to copy it, which takes more memory than you have. I've seend problems well before then. > Also, the fact that using lists crashes for the same data set is surprising. > Is there a similar hard limit for lists, or would this be a bug? Should I > post a test case? I'm willing to bet dollars to donuts that the problem you hit with lists was stack overflow. Without signifigantly impacting performance there isn't a whole lot Ocaml can do about detecting stack overflow before the segfault hits. My general rule is, if it's going to contain more than a few dozen items, it probably shouldn't be a list. Extlib's library is less prone to this error, but you can still run into serious problems with long lists. Brian