From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA18398; Sat, 31 Jul 2004 18:51:28 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA18097 for ; Sat, 31 Jul 2004 18:51:27 +0200 (MET DST) Received: from asmtp-a063f35.pas.sa.earthlink.net (asmtp-a063f35.pas.sa.earthlink.net [207.217.120.220]) by nez-perce.inria.fr (8.12.10/8.12.10) with ESMTP id i6VGpQEV031132 for ; Sat, 31 Jul 2004 18:51:26 +0200 Received: from 168-103-58-16.tcsn.qwest.net ([168.103.58.16] helo=dylan) by asmtp-a063f35.pas.sa.earthlink.net with asmtp (Exim 4.34) id 1Bqx4v-0006Ku-BY for caml-list@inria.fr; Sat, 31 Jul 2004 09:51:25 -0700 Message-ID: <004501c4771e$c7971cb0$0201a8c0@dylan> From: "David McClain" To: "caml" Subject: [Caml-list] BigArrays Date: Sat, 31 Jul 2004 09:52:29 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1437 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441 X-ELNK-Trace: 7a0ab3eafc8cf994b22988ad1c62733440683398e744b8a474caab5465425905fd7f64f4a83b6c56a8438e0f32a48e08350badd9bab72f9c350badd9bab72f9c X-Originating-IP: 168.103.58.16 X-Miltered: at nez-perce with ID 410BCE0E.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; mcclain:01 dmcclain:01 bigarrays:01 bigarrays:01 settable:01 wordsize:01 mcclain:01 arrays:01 arrays:01 ocaml:01 images:98 float:02 arbitrary:02 exceeded:96 unsigned:03 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk I'm finally ready to consider moving up to the OCaml BigArrays in my NML. It looks like a pretty good implementation. I still need to add special slicing code that can extract arbitrary regions from arrays with strides different from the parent array. But one thing that stands out as a major weakness is the implementation of memory mapped arrays. These currently do two objectionable things: 1. the entire file is mapped into memory -- these are potentially huge files, bumping up to and sometimes exceeding 4 GB. Memory mapping needs to use a sliding window of access in a manner transparent to the user. I often default to 4 MB windows, but I do offer the possibility of a user settable window size. 2. the file is mapped into memory starting at file offset zero. Most datafiles use some amount of header information to describe the internal layout of data. Data are mixed arrays of various machine word sizes, some are double, others are float, and still others are unsigned character arrays (images), all combined into the same large datafile. A file that contains only one array of fixed wordsize is rather unusual. Also curious about the limitation (??) to a maximum of 16 dimensions. While I have never exceeded that limit, why place any limit? David McClain Senior Corporate Scientist Avisere, Inc. +1.520.390.7738 (USA) david.mcclain@avisere.com ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners