From: Bruno.Verlyck@inria.fr
To: kirillkh
Cc: bhurt@janestcapital.com, caml-list@yquem.inria.fr
Subject: Re: [Caml-list] best and fastest way to read lines from a file?
Date: Tue, 2 Oct 2007 19:10:30 +0200 (CEST)
Message-Id: <20071002171030.97D161FA302@macchabee.inria.fr>

On Tue, 2 Oct 2007 18:15:57 +0200, kirillkh wrote:

> Hi,
>
> > This should be a FAQ.
>
> Since we're talking about 10+ lines of code, and only one case among
> many possible ones (you might also want to do something fairly
> similar, but not quite the same, such as iterating over all words or
> characters in a file, doing something other than counting, etc.), I
> would rather see it implemented in a library as a combinator.
> What I have in mind is a function that goes over a file and invokes
> some user code on each block of bytes/characters/lines/words/...
> The points of customization would be:
>
> * how to detect the start and end of a block
> * the routine to pass the blocks to
>
> Then, on top of this combinator, build block-specific ones: for
> byte, char, line, and word blocks. Also make it possible to customize
> buffering behavior.
>
> Being new to OCaml, I'm interested in comments; is what I suggest a
> good idea?

Yes, why not?

> If yes, why hasn't anyone implemented it yet?

I believe Cash (in the Hump:
http://caml.inria.fr/cgi-bin/hump.fr.cgi?contrib=86) has some of the
things you ask for: look around fold_in_channel (as a combinator; yes,
it *is* 5 lines of code), and for what you call blocks, see chapters
6 & 7 of the documentation (Reading delimited strings & Record I/O and
field parsing). Buffering is also parameterizable (between 1 and 4Kb;
no line buffering, sorry, that would mean modifying too much C code in
the OCaml runtime).

It may not suit your taste, but when generalizing, everybody tends to
have their own very specific idea of how to do it. Human nature... At
least Cash can give you some ideas.

Of course, the OP was asking for the fastest way... OK, we aren't
anymore.

HTH,

Bruno.

Disclaimer: Cash is still not ported to OCaml 3.10, but 3.09 is fine.
I have to choose: camlp4 or camlp5...?
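
For illustration, here is a minimal sketch of the kind of combinator
discussed in this thread. It is not Cash's fold_in_channel, whose
actual signature is not reproduced here; the names fold_blocks,
next_line, next_char, next_word and with_file are hypothetical. The
block reader decides where a block starts and ends, and the folded
function is the routine the blocks are passed to. The sketch assumes
OCaml 4.02 or later (for `match ... with exception`), relies on the
standard channel buffering, and makes no claim to being the fastest
way to read a file.

```ocaml
(* Hypothetical generic combinator: fold [f] over the blocks that [read]
   extracts from [ic]. [read] returns [None] at end of input. *)
let fold_blocks (read : in_channel -> 'a option) (f : 'acc -> 'a -> 'acc)
    (init : 'acc) (ic : in_channel) : 'acc =
  let rec loop acc =
    match read ic with
    | Some block -> loop (f acc block)   (* tail call: constant stack *)
    | None -> acc
  in
  loop init

(* Block readers: each one decides where a block starts and ends. *)
let next_line ic = try Some (input_line ic) with End_of_file -> None
let next_char ic = try Some (input_char ic) with End_of_file -> None

(* Assumption: a word is a maximal run of non-blank characters. *)
let next_word ic =
  let buf = Buffer.create 16 in
  let is_blank c = c = ' ' || c = '\t' || c = '\n' || c = '\r' in
  (* Skip leading blanks; return the first non-blank char, if any. *)
  let rec skip () =
    match next_char ic with
    | Some c when is_blank c -> skip ()
    | other -> other
  in
  (* Accumulate characters until a blank or end of input. *)
  let rec collect () =
    match next_char ic with
    | Some c when not (is_blank c) -> Buffer.add_char buf c; collect ()
    | _ -> ()
  in
  match skip () with
  | None -> None
  | Some c -> Buffer.add_char buf c; collect (); Some (Buffer.contents buf)

(* Block-specific combinators built on top of the generic one. *)
let fold_lines f init ic = fold_blocks next_line f init ic
let fold_chars f init ic = fold_blocks next_char f init ic
let fold_words f init ic = fold_blocks next_word f init ic

(* Open [path], run [k] on the channel, and close it in all cases. *)
let with_file path k =
  let ic = open_in path in
  match k ic with
  | result -> close_in ic; result
  | exception e -> close_in_noerr ic; raise e

(* The original question, counting lines, becomes a one-liner. *)
let count_lines path = with_file path (fold_lines (fun n _ -> n + 1) 0)
```

Counting words instead of lines only swaps the block reader:
`with_file path (fold_words (fun n _ -> n + 1) 0)`. Doing something
other than counting only swaps the folded function.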