caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* best and fastest way to read lines from a file?
@ 2007-10-01 21:27 YC
  2007-10-01 21:55 ` [Caml-list] " Daniel Bünzli
  2007-10-01 21:55 ` Olivier Roussel
  0 siblings, 2 replies; 17+ messages in thread
From: YC @ 2007-10-01 21:27 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

Hi all -

Newbie question: I'm wondering what's the most efficient way to read in a
file line by line?  I wrote a routine in both python and ocaml to read in a
file with 345K lines to do line count and was surprised that python's code
run roughly 3x faster.

I thought the speed should be equivalent and/or somewhat in ocaml favor,
given this is an IO-bound comparison, but perhaps Python's simplistic for
loop have a read-ahead buffer built-in, and perhaps ocaml's input channel is
unbuffered, but I'm not sure how to write a buffered code that can do a line
by line read-in.

Any insight is appreciated, thanks ;)

yc

Python code:
# test.py
#!/usr/bin/python

file = <345k-line.txt>
count = 0
for line in open (file, "r"):
    count = count + 1
print "Done: ", count

OCaml code:
(* test.ml *)
let rec line_count filename =
  let f = open_in filename in
  let rec loop file count =
    try
      ignore (input_line file);
      loop file (count + 1)
    with
      End_of_file -> count
  in
    loop f 0;;

let count = line_count <345k-line.txt> in
    Printf.printf "Done: %d" count;;

Test
$ time ./test.py
Done: 345001

real    0m0.416s
user   0m0.101s
sys    0m0.247s

$ ocamlopt -o test test.ml
$ time ./test
Done: 345001
real    0m1.483s
user   0m0.631s
sys    0m0.685s

[-- Attachment #2: Type: text/html, Size: 1844 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-01 21:27 best and fastest way to read lines from a file? YC
@ 2007-10-01 21:55 ` Daniel Bünzli
  2007-10-01 22:29   ` YC
  2007-10-01 21:55 ` Olivier Roussel
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Bünzli @ 2007-10-01 21:55 UTC (permalink / raw)
  To: caml-list List


Le 1 oct. 07 à 23:27, YC a écrit :

> OCaml code:
> (* test.ml *)
> let rec line_count filename =
>   let f = open_in filename in
>   let rec loop file count =
>     try
>       ignore (input_line file);
>       loop file (count + 1)
>     with
>       End_of_file -> count
>   in
>     loop f 0;;

Your function is not tail recursive. A function is tail-recursive if  
there is nothing to do after the recursive call. You might believe  
this is the case in your function, but it is not the case because of  
the exception handler. Try this instead :

let rec line_count filename =
   let ic = open_in filename in
   let rec loop ic count =
    let line = try Some (input_line ic) with End_of_file -> None in
    match line with
    | Some _ -> loop ic (count+1)
    | None -> count
   in
   loop ic 0

It should run faster.

Daniel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-01 21:27 best and fastest way to read lines from a file? YC
  2007-10-01 21:55 ` [Caml-list] " Daniel Bünzli
@ 2007-10-01 21:55 ` Olivier Roussel
  2007-10-02 12:39   ` Mattias Engdegård
  1 sibling, 1 reply; 17+ messages in thread
From: Olivier Roussel @ 2007-10-01 21:55 UTC (permalink / raw)
  To: YC; +Cc: caml-list

YC a écrit :
> Hi all -
Hi!
> OCaml code:
> (* test.ml <http://test.ml> *)
> let rec line_count filename =
>   let f = open_in filename in
>   let rec loop file count =
>     try
>       ignore (input_line file);
>       loop file (count + 1)
>     with
>       End_of_file -> count
>   in
>     loop f 0;;
> 
> let count = line_count <345k-line.txt> in
>     Printf.printf "Done: %d" count;;
The following solution is ~2.5x faster than the Python implementation on 
my computer. Because there is no more exceptions in recursive calls, and 
thanks to tail-recursion.

let readline f =
   try Some (input_line f)
   with End_of_file -> None;;

let line_count filename =
   let f = open_in filename in
   let rec loop count =
     match (readline f) with
       | Some(_) -> loop (count+1)
       | None -> count in
   loop 0;;

let count = line_count <345k-line.txt> in
Printf.printf "Done: %d\n" count;;


-- 
Olivier Roussel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-01 21:55 ` [Caml-list] " Daniel Bünzli
@ 2007-10-01 22:29   ` YC
  0 siblings, 0 replies; 17+ messages in thread
From: YC @ 2007-10-01 22:29 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: caml-list List

[-- Attachment #1: Type: text/plain, Size: 573 bytes --]

Thanks to everyone replied - your answers are extremely helpful.

> Your function is not tail recursive. A function is tail-recursive if
> there is nothing to do after the recursive call. You might believe
> this is the case in your function, but it is not the case because of
> the exception handler. Try this instead :

I did wonder whether my function was tail recursive but thought maybe it
is.  Thanks for the correction from everyone on this point.  And the pattern
of returning Some or None makes sense to me now.  Rerun the test makes me
feel better ;)

Thanks,
yc

[-- Attachment #2: Type: text/html, Size: 663 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-01 21:55 ` Olivier Roussel
@ 2007-10-02 12:39   ` Mattias Engdegård
  2007-10-02 12:56     ` Brian Hurt
  0 siblings, 1 reply; 17+ messages in thread
From: Mattias Engdegård @ 2007-10-02 12:39 UTC (permalink / raw)
  To: olivier.roussel; +Cc: yinso.chen, caml-list

>let line_count filename =
>   let f = open_in filename in
>   let rec loop count =
>     match (readline f) with
>       | Some(_) -> loop (count+1)
>       | None -> count in
>   loop 0;;

Something like this should be even faster:

exception Done of int

let line_count file =
  let rec loop count =
    let _ =
      try
	input_line f
      with End_of_file -> raise (Done count)
    in
      loop (count + 1)
  in
    try loop 0 with Done x -> x

as it avoids unnecessary consing in the inner loop.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 12:39   ` Mattias Engdegård
@ 2007-10-02 12:56     ` Brian Hurt
  2007-10-02 16:15       ` kirillkh
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Hurt @ 2007-10-02 12:56 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 939 bytes --]

Mattias Engdegård wrote:

>>let line_count filename =
>>  let f = open_in filename in
>>  let rec loop count =
>>    match (readline f) with
>>      | Some(_) -> loop (count+1)
>>      | None -> count in
>>  loop 0;;
>>    
>>
>
>Something like this should be even faster:
>
>exception Done of int
>
>let line_count file =
>  let rec loop count =
>    let _ =
>      try
>	input_line f
>      with End_of_file -> raise (Done count)
>    in
>      loop (count + 1)
>  in
>    try loop 0 with Done x -> x
>
>as it avoids unnecessary consing in the inner loop.
>
>__
>

This should be a FAQ.

let line_count file =
    let rec loop count =
       let test =
          try
             let _ = input_line f in
             true
          with
          | Not_found -> false
       in
       if test then
          loop (count + 1)
       else
          count
    in
    loop 0
;;

No consing, no unnecessary try/catch, tail recursive.

Brian


[-- Attachment #2: Type: text/html, Size: 1845 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 12:56     ` Brian Hurt
@ 2007-10-02 16:15       ` kirillkh
  2007-10-02 17:10         ` verlyck, Bruno.Verlyck
  2007-10-02 18:02         ` kirillkh
  0 siblings, 2 replies; 17+ messages in thread
From: kirillkh @ 2007-10-02 16:15 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1155 bytes --]

> This should be a FAQ.
>

Since we're talking of 10+ lines of code and only one case among many
possible (you might also want to do something fairly similar, but not quite
the same, as iterating over all words or characters in a file, doing
something else than counting, etc.), I would rather see it implemented in a
library as combinator. What I have in mind is a function that goes over a
file and invokes some user code on each block of
bytes/characters/lines/words/... The points of customization would be:
* how to detect the start and end of block
* routine to pass the blocks to

Then, on top of this combinator, build block-specific ones: for byte, char,
line, word blocks. Also make it possible to customize buffering behavior.

Being new to OCaml, I'm interested in comments - is what I suggest a good
idea? If yes, why hasn't anyone implemented it yet? One could argue that
it's trivial and can be implemented in each use case anew, but among 5
different solutions posted so far, each has its own problems, besides the
supposedly ideal 5-th -- I'd take this as an indication that, although
simple, it's not really trivial to write this thing.

[-- Attachment #2: Type: text/html, Size: 1469 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 16:15       ` kirillkh
@ 2007-10-02 17:10         ` verlyck, Bruno.Verlyck
  2007-10-02 18:02         ` kirillkh
  1 sibling, 0 replies; 17+ messages in thread
From: verlyck, Bruno.Verlyck @ 2007-10-02 17:10 UTC (permalink / raw)
  To: kirillkh; +Cc: bhurt, caml-list

   Date: Tue, 2 Oct 2007 18:15:57 +0200
   From: kirillkh <kirillkh@gmail.com>
Hi,
   > This should be a FAQ.
   Since we're talking of 10+ lines of code and only one case among
   many possible (you might also want to do something fairly similar,
   but not quite the same, as iterating over all words or characters
   in a file, doing something else than counting, etc.), I would
   rather see it implemented in a library as combinator.  What I have
   in mind is a function that goes over a file and invokes some user
   code on each block of bytes/characters/lines/words/... The points
   of customization would be:
   * how to detect the start and end of block
   * routine to pass the blocks to

   Then, on top of this combinator, build block-specific ones: for
   byte, char, line, word blocks.  Also make it possible to customize
   buffering behavior.

   Being new to OCaml, I'm interested in comments; is what I suggest a
   good idea?
Yes, why not ?

   If yes, why hasn't anyone implemented it yet?  
I believe Cash (in the Hump:
 http://caml.inria.fr/cgi-bin/hump.fr.cgi?contrib=86) 
has some of the things you ask for: look around fold_in_channel (as a
combinator; yes, it *is* 5 lines of code), and for what you call
blocks, chapters 6 & 7 of the documentation (Reading delimited strings
& Record I/O and field parsing).  Buffering is also parameterizable
(between 1 and 4Kb, no line buffering, sorry, too much C code to
modify in the Ocaml runtime).

It may not suit your taste, but when generalizing, everybody tends to
have one's own very specific idea of how to do it.  Human nature...
At least Cash can give you some ideas.

Of course, the OP was asking for the fastest way...  OK, we aren't
anymore.

HTH,
Bruno.

Disclaimer: Cash is still not ported to Ocaml 3.10; but 3.09 is fine.
Have to choose: camlp4 or 5... ?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 16:15       ` kirillkh
  2007-10-02 17:10         ` verlyck, Bruno.Verlyck
@ 2007-10-02 18:02         ` kirillkh
  2007-10-02 19:35           ` skaller
  2007-10-02 20:23           ` Olivier Andrieu
  1 sibling, 2 replies; 17+ messages in thread
From: kirillkh @ 2007-10-02 18:02 UTC (permalink / raw)
  Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 3083 bytes --]

Replying to a private mail from Brian:

2007/10/2, Brian Hurt <bhurt@janestcapital.com>:
>
> kirillkh wrote:
>
>
>  This should be a FAQ.
> >
>
> Since we're talking of 10+ lines of code and only one case among many
> possible (you might also want to do something fairly similar, but not quite
> the same, as iterating over all words or characters in a file, doing
> something else than counting, etc.), I would rather see it implemented in a
> library as combinator. What I have in mind is a function that goes over a
> file and invokes some user code on each block of
> bytes/characters/lines/words/... The points of customization would be:
> * how to detect the start and end of block
> * routine to pass the blocks to
>
> Then, on top of this combinator, build block-specific ones: for byte,
> char, line, word blocks. Also make it possible to customize buffering
> behavior.
>
> Being new to OCaml, I'm interested in comments - is what I suggest a good
> idea? If yes, why hasn't anyone implemented it yet? One could argue that
> it's trivial and can be implemented in each use case anew, but among 5
> different solutions posted so far, each has its own problems, besides the
> supposedly ideal 5-th -- I'd take this as an indication that, although
> simple, it's not really trivial to write this thing.
>
> Note that the fifth version is only perfect because the author knew that
> he didn't need to keep the line around.  Doing something even moderately
> more complicated would almost certainly require an allocation in the main
> loop- not that this is that big of a problem (Ocaml allocation is lightning
> fast).
>
> The problem with this library is when to stop.  A real simple "fold over
> the lines of a file" interface might be nice, but there'll always be
> pressure to add just one more feature, deal with just one more slightly more
> complex case- at the end of which you get a badly specified and badly
> implemented parser combinator library.
>
> Brian
>

OK, so I'll give up the parsing/buffering part and only leave efficient
exception handling. This should leave the user free to do anything with it,
but prevent performance pitfalls. The following is based on Mattias
Engdegard's code:

(* I couldn't figure out, how to declare a polymorphic exception properly *)
exception Done of 'a

let fold_file (file: in_channel)
              (read_func: in_channel->'a)
              (elem_func: 'a->'b->'b)
              (seed: 'b) =
  let rec loop prev_val =
    let input =
      try read_func file
      with End_of_file -> raise (Done prev_val)
    in
      let combined_val = elem_func input prev_val in
      loop combined_val
  in
    try loop seed with Done x -> x

And the usage for line counting:

let line_count filename =
   let f = open_in filename in
   let counter _ count = count + 1 in
   fold_file f readline counter 0

Since it's library code, we can tolerate the little annoyance of the second
try-catch. As far as I can tell, this code has the same performance
characteristics as yours: no consing + tail recursion. Any other problems
with it?

[-- Attachment #2: Type: text/html, Size: 4600 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 18:02         ` kirillkh
@ 2007-10-02 19:35           ` skaller
  2007-10-02 21:05             ` kirillkh
  2007-10-02 20:23           ` Olivier Andrieu
  1 sibling, 1 reply; 17+ messages in thread
From: skaller @ 2007-10-02 19:35 UTC (permalink / raw)
  To: kirillkh; +Cc: caml-list

On Tue, 2007-10-02 at 20:02 +0200, kirillkh wrote:
> Replying to a private mail from Brian:

> (* I couldn't figure out, how to declare a polymorphic exception
> properly *)
> exception Done of 'a

That's easy -- you can't: even if you could, how could
you possibly use it?

This compiles fine:

type t = { field : 'a. 'a }
exception Done of t

but 'field' is useless. This is not at all the same as

	let f (x:'a) (g:'a -> int) =
	match g x with
	| 0 -> ..
	| ..

because *inside* the function, 'a is not a type variable,
and the code is not polymorphic, it is simply a sole 
unknown type, sometimes said to be monomorphised.

The problem with exceptions is that they're not captured,
so they cannot be polymorphic. Exceptions SUCK because
their context is not delimited -- you can throw all the way
out of the mainline .. :)

[This happens to me regularly and it can takes days to figure
out what is Not_found ..]

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 18:02         ` kirillkh
  2007-10-02 19:35           ` skaller
@ 2007-10-02 20:23           ` Olivier Andrieu
  2007-10-02 20:49             ` kirillkh
  1 sibling, 1 reply; 17+ messages in thread
From: Olivier Andrieu @ 2007-10-02 20:23 UTC (permalink / raw)
  To: kirillkh; +Cc: caml-list

On 10/2/07, kirillkh <kirillkh@gmail.com> wrote:
> OK, so I'll give up the parsing/buffering part and only leave efficient
> exception handling. This should leave the user free to do anything with it,
> but prevent performance pitfalls. The following is based on Mattias
> Engdegard's code:
>
> (* I couldn't figure out, how to declare a polymorphic exception properly *)
> exception Done of 'a
>
>  let fold_file (file: in_channel)
>               (read_func: in_channel->'a)
>               (elem_func: 'a->'b->'b)
>               (seed: 'b) =
>   let rec loop prev_val =
>     let input =
>       try read_func file
>       with End_of_file -> raise (Done prev_val)
>     in
>       let combined_val = elem_func input prev_val in
>       loop combined_val
>   in
>     try loop seed with Done x -> x
>
> And the usage for line counting:
>
> let line_count filename =
>    let f = open_in filename in
>    let counter _ count = count + 1 in
>    fold_file f readline counter 0
>
> Since it's library code, we can tolerate the little annoyance of the second
> try-catch. As far as I can tell, this code has the same performance
> characteristics as yours: no consing + tail recursion. Any other problems
> with it?

well apart from the fact that you cannot have "polymorphic exceptions"
in OCaml, this kind of code is IMHO much more natural with an
imperative loop instead of a functional one:


let fold_file read chan f init =
  let acc = ref init in
  begin
    try while true do
      let d = read chan in
      acc := f d !acc
    done
    with End_of_file -> ()
  end ;
  !acc

-- 
  Olivier


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 20:23           ` Olivier Andrieu
@ 2007-10-02 20:49             ` kirillkh
  2007-10-02 21:10               ` Jon Harrop
  2007-10-02 21:15               ` David Allsopp
  0 siblings, 2 replies; 17+ messages in thread
From: kirillkh @ 2007-10-02 20:49 UTC (permalink / raw)
  Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 2028 bytes --]

2007/10/2, Olivier Andrieu <oandrieu@gmail.com>:
>
> On 10/2/07, kirillkh <kirillkh@gmail.com> wrote:
> > OK, so I'll give up the parsing/buffering part and only leave efficient
> > exception handling. This should leave the user free to do anything with
> it,
> > but prevent performance pitfalls. The following is based on Mattias
> > Engdegard's code:
> >
> > (* I couldn't figure out, how to declare a polymorphic exception
> properly *)
> > exception Done of 'a
> >
> >  let fold_file (file: in_channel)
> >               (read_func: in_channel->'a)
> >               (elem_func: 'a->'b->'b)
> >               (seed: 'b) =
> >   let rec loop prev_val =
> >     let input =
> >       try read_func file
> >       with End_of_file -> raise (Done prev_val)
> >     in
> >       let combined_val = elem_func input prev_val in
> >       loop combined_val
> >   in
> >     try loop seed with Done x -> x
> >
> > And the usage for line counting:
> >
> > let line_count filename =
> >    let f = open_in filename in
> >    let counter _ count = count + 1 in
> >    fold_file f readline counter 0
> >
> > Since it's library code, we can tolerate the little annoyance of the
> second
> > try-catch. As far as I can tell, this code has the same performance
> > characteristics as yours: no consing + tail recursion. Any other
> problems
> > with it?
>
> well apart from the fact that you cannot have "polymorphic exceptions"
> in OCaml, this kind of code is IMHO much more natural with an
> imperative loop instead of a functional one:
>
>
> let fold_file read chan f init =
>   let acc = ref init in
>   begin
>     try while true do
>       let d = read chan in
>       acc := f d !acc
>     done
>     with End_of_file -> ()
>   end ;
>   !acc
>
> --
>   Olivier
>

A little weird to see such inherently functional construct as fold
implemented imperatively. But it's fine with me, as long as it does the job.
I wonder, though, how would the performance of a line counter based on your
code compare with the one suggested by Brian.

[-- Attachment #2: Type: text/html, Size: 3191 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 19:35           ` skaller
@ 2007-10-02 21:05             ` kirillkh
  2007-10-02 21:07               ` Jon Harrop
  0 siblings, 1 reply; 17+ messages in thread
From: kirillkh @ 2007-10-02 21:05 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1384 bytes --]

2007/10/2, skaller <skaller@users.sourceforge.net>:
>
> On Tue, 2007-10-02 at 20:02 +0200, kirillkh wrote:
> > Replying to a private mail from Brian:
>
> > (* I couldn't figure out, how to declare a polymorphic exception
> > properly *)
> > exception Done of 'a
>
> That's easy -- you can't: even if you could, how could
> you possibly use it?
>
> This compiles fine:
>
> type t = { field : 'a. 'a }
> exception Done of t
>
> but 'field' is useless. This is not at all the same as
>
>         let f (x:'a) (g:'a -> int) =
>         match g x with
>         | 0 -> ..
>         | ..
>
> because *inside* the function, 'a is not a type variable,
> and the code is not polymorphic, it is simply a sole
> unknown type, sometimes said to be monomorphised.
>
> The problem with exceptions is that they're not captured,
> so they cannot be polymorphic. Exceptions SUCK because
> their context is not delimited -- you can throw all the way
> out of the mainline .. :)
>
> [This happens to me regularly and it can takes days to figure
> out what is Not_found ..]


Is there a way to instantiate an exception with a value of unspecified type
and then do explicit casting on catch?

Is it a deficiency in the language? I suppose OCaml could support
polymorphic exceptions, if they were checked, like in Java, and appeared in
function signatures in a similar way to parameters and return values.

[-- Attachment #2: Type: text/html, Size: 1922 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 21:05             ` kirillkh
@ 2007-10-02 21:07               ` Jon Harrop
  0 siblings, 0 replies; 17+ messages in thread
From: Jon Harrop @ 2007-10-02 21:07 UTC (permalink / raw)
  To: caml-list

On Tuesday 02 October 2007 22:05:07 kirillkh wrote:
> Is it a deficiency in the language?

A constraint of the type system by design.

> I suppose OCaml could support 
> polymorphic exceptions, if they were checked, like in Java, and appeared in
> function signatures in a similar way to parameters and return values.

The exn type would need an undefined number of type parameters.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 20:49             ` kirillkh
@ 2007-10-02 21:10               ` Jon Harrop
  2007-10-02 21:15               ` David Allsopp
  1 sibling, 0 replies; 17+ messages in thread
From: Jon Harrop @ 2007-10-02 21:10 UTC (permalink / raw)
  To: caml-list

On Tuesday 02 October 2007 21:49:41 kirillkh wrote:
> A little weird to see such inherently functional construct as fold
> implemented imperatively.

On the contrary, this is the core functionality of ML that makes it so 
practically useful: combining functional and imperative programming neatly 
and efficiently.

Look at the Array and Hashtbl modules of the stdlib for more examples of code 
that is both imperative and functional.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 20:49             ` kirillkh
  2007-10-02 21:10               ` Jon Harrop
@ 2007-10-02 21:15               ` David Allsopp
  2007-10-02 22:23                 ` skaller
  1 sibling, 1 reply; 17+ messages in thread
From: David Allsopp @ 2007-10-02 21:15 UTC (permalink / raw)
  To: 'kirillkh'; +Cc: caml-list

> A little weird to see such inherently functional construct as fold
implemented imperatively. But it's fine with me, as long as it does the job.
I
> wonder, though, how would the performance of a line counter based on your
code compare with the one suggested by Brian. 

It's faster, though only slightly - I was going to post an imperative
solution earlier, too. Even in Ocaml, a recursive call (tail or otherwise)
has some costs not present in a simple loop. However, I don't agree that
it's more natural - "ugly" is more the word I'd use!


David


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [Caml-list] best and fastest way to read lines from a file?
  2007-10-02 21:15               ` David Allsopp
@ 2007-10-02 22:23                 ` skaller
  0 siblings, 0 replies; 17+ messages in thread
From: skaller @ 2007-10-02 22:23 UTC (permalink / raw)
  To: David Allsopp; +Cc: 'kirillkh', caml-list

On Tue, 2007-10-02 at 22:15 +0100, David Allsopp wrote:
> > A little weird to see such inherently functional construct as fold
> implemented imperatively. But it's fine with me, as long as it does the job.
> I
> > wonder, though, how would the performance of a line counter based on your
> code compare with the one suggested by Brian. 
> 
> It's faster, though only slightly - I was going to post an imperative
> solution earlier, too. Even in Ocaml, a recursive call (tail or otherwise)
> has some costs not present in a simple loop. However, I don't agree that
> it's more natural - "ugly" is more the word I'd use!

In Felix, tail-rec calls are generally faster than loops.
The reason is that the compiler is better able to reason
about the semantics and so it can do better optimisations.
OTOH loops degenerate to conditional goto-spaghetti and
would require SSA analysis to recover.

For example, given:

	fun f(x,y) ... f (a,b)

the implementation generates

	var x,y; 
	set_initial_values(x,y);
	start:
		...
		paropt x,y = a,b;
		goto start;

where 'paropt' serialises the parallel assignment
using an optimisation algorithm which tries to minimise
the number of assignments and temporaries used.

this has a significant impact on performance .. Felix version
of Ackermann's function is more then 2.5x faster than Ocamlopt,
and almost 2x faster than C. Ackermann has 2 tail-rec calls.

Trying to do this with loops is much harder because, unlike
the tail-rec formulation, the 'inputs' and 'outputs' connected
by the loops are not explicit, as they are when a function
is used and parameters and arguments define the input/output
relationship (and everything else local is a temporary with no
persistence: Felix doesn't not allow functions to have side
effects, so only local data is mutable).

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2007-10-02 22:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-01 21:27 best and fastest way to read lines from a file? YC
2007-10-01 21:55 ` [Caml-list] " Daniel Bünzli
2007-10-01 22:29   ` YC
2007-10-01 21:55 ` Olivier Roussel
2007-10-02 12:39   ` Mattias Engdegård
2007-10-02 12:56     ` Brian Hurt
2007-10-02 16:15       ` kirillkh
2007-10-02 17:10         ` verlyck, Bruno.Verlyck
2007-10-02 18:02         ` kirillkh
2007-10-02 19:35           ` skaller
2007-10-02 21:05             ` kirillkh
2007-10-02 21:07               ` Jon Harrop
2007-10-02 20:23           ` Olivier Andrieu
2007-10-02 20:49             ` kirillkh
2007-10-02 21:10               ` Jon Harrop
2007-10-02 21:15               ` David Allsopp
2007-10-02 22:23                 ` skaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).