From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.2 required=5.0 tests=AWL,HTML_MESSAGE, MISSING_HEADERS,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id B30B4BC69 for ; Tue, 2 Oct 2007 22:49:44 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAACtLAkfRVZKymGdsb2JhbACCcYtFAQEBAQcEBBEY X-IronPort-AV: E=Sophos;i="4.21,221,1188770400"; d="scan'208";a="17209391" Received: from wa-out-1112.google.com ([209.85.146.178]) by mail4-smtp-sop.national.inria.fr with ESMTP; 02 Oct 2007 22:49:43 +0200 Received: by wa-out-1112.google.com with SMTP id k17so5299800waf for ; Tue, 02 Oct 2007 13:49:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:subject:cc:in-reply-to:mime-version:content-type:references; bh=e7akvfdvqRMQbd4zzSDmJos8arWqQ1ftq2CadL0YfqI=; b=QdYn+wYGAa4fytRtm2kncOW+psOcMl0fG5n42r4XOn3BRJdROBPeHI76tLxL9yMnRuc70/jTDFriC7vpNPri9KBdD3OHd7kB9/v3hkJKT+bigsI75cFVFpCNNF/Q4jwZs6wV3AWkNleEZmbEEPb/bT96fkbQeOJW3DbJglmqmxM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:subject:cc:in-reply-to:mime-version:content-type:references; b=SfY2hzcKX9IqB9KOdgPkxkueonb7l5b0hygJehexXm4cbhlcebk40aao9TpqBXb4ZTfWxO8mm7/0qNdSiJXjIl3lHNthFhnL+3YeBQRGPkN9zqcasjZIkREt1qoZEoMHI24GzVMvkrl66lAVPoNdkcu89ohAfTQEUVbvyrECtP4= Received: by 10.114.127.1 with SMTP id z1mr2830026wac.1191358181709; Tue, 02 Oct 2007 13:49:41 -0700 (PDT) Received: by 10.114.25.8 with HTTP; Tue, 2 Oct 2007 13:49:41 -0700 (PDT) Message-ID: Date: Tue, 2 Oct 2007 22:49:41 +0200 From: kirillkh Subject: Re: [Caml-list] best and fastest way to read lines from a file? Cc: caml-list@yquem.inria.fr In-Reply-To: <95513600710021323u762efd5k5ee6bdd03d7cc37@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_20349_10208929.1191358181699" References: <779bf2730710011427g5983da4cw6ad8b715a9e38771@mail.gmail.com> <47016CEE.8010704@crans.org> <200710021239.l92CdwZ15641@virtutech.se> <47024002.2080206@janestcapital.com> <95513600710021323u762efd5k5ee6bdd03d7cc37@mail.gmail.com> X-Spam: no; 0.00; andrieu:01 oandrieu:01 pitfalls:01 mattias:01 channel-:01 'a-:01 'b-:01 prev:01 val:01 prev:01 val:01 readline:01 tolerate:01 recursion:01 ocaml:01 ------=_Part_20349_10208929.1191358181699 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline 2007/10/2, Olivier Andrieu : > > On 10/2/07, kirillkh wrote: > > OK, so I'll give up the parsing/buffering part and only leave efficient > > exception handling. This should leave the user free to do anything with > it, > > but prevent performance pitfalls. The following is based on Mattias > > Engdegard's code: > > > > (* I couldn't figure out, how to declare a polymorphic exception > properly *) > > exception Done of 'a > > > > let fold_file (file: in_channel) > > (read_func: in_channel->'a) > > (elem_func: 'a->'b->'b) > > (seed: 'b) = > > let rec loop prev_val = > > let input = > > try read_func file > > with End_of_file -> raise (Done prev_val) > > in > > let combined_val = elem_func input prev_val in > > loop combined_val > > in > > try loop seed with Done x -> x > > > > And the usage for line counting: > > > > let line_count filename = > > let f = open_in filename in > > let counter _ count = count + 1 in > > fold_file f readline counter 0 > > > > Since it's library code, we can tolerate the little annoyance of the > second > > try-catch. As far as I can tell, this code has the same performance > > characteristics as yours: no consing + tail recursion. Any other > problems > > with it? > > well apart from the fact that you cannot have "polymorphic exceptions" > in OCaml, this kind of code is IMHO much more natural with an > imperative loop instead of a functional one: > > > let fold_file read chan f init = > let acc = ref init in > begin > try while true do > let d = read chan in > acc := f d !acc > done > with End_of_file -> () > end ; > !acc > > -- > Olivier > A little weird to see such inherently functional construct as fold implemented imperatively. But it's fine with me, as long as it does the job. I wonder, though, how would the performance of a line counter based on your code compare with the one suggested by Brian. ------=_Part_20349_10208929.1191358181699 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline

2007/10/2, Olivier Andrieu <oandrieu@gmail.com>:
On 10/2/07, kirillkh <kirillkh@gmail.com> wrote:
> OK, so I'll give up the parsing/buffering part and only leave efficient
> exception handling. This should leave the user free to do anything with it,
> but prevent performance pitfalls. The following is based on Mattias
> Engdegard's code:
>
> (* I couldn't figure out, how to declare a polymorphic exception properly *)
> exception Done of 'a
>
>  let fold_file (file: in_channel)
>               (read_func: in_channel->'a)
>               (elem_func: 'a->'b->'b)
>               (seed: 'b) =
>   let rec loop prev_val =
>     let input =
>       try read_func file
>       with End_of_file -> raise (Done prev_val)
>     in
>       let combined_val = elem_func input prev_val in
>       loop combined_val
>   in
>     try loop seed with Done x -> x
>
> And the usage for line counting:
>
> let line_count filename =
>    let f = open_in filename in
>    let counter _ count = count + 1 in
>    fold_file f readline counter 0
>
> Since it's library code, we can tolerate the little annoyance of the second
> try-catch. As far as I can tell, this code has the same performance
> characteristics as yours: no consing + tail recursion. Any other problems
> with it?

well apart from the fact that you cannot have "polymorphic exceptions"
in OCaml, this kind of code is IMHO much more natural with an
imperative loop instead of a functional one:


let fold_file read chan f init =
  let acc = ref init in
  begin
    try while true do
      let d = read chan in
      acc := f d !acc
    done
    with End_of_file -> ()
  end ;
  !acc

--
  Olivier

A little weird to see such inherently functional construct as fold implemented imperatively. But it's fine with me, as long as it does the job. I wonder, though, how would the performance of a line counter based on your code compare with the one suggested by Brian. ------=_Part_20349_10208929.1191358181699--