Hi all -
Newbie question: what is the most efficient way to read a file line by line? I wrote a routine in both Python and OCaml that reads a file with 345K lines and counts them, and I was surprised that the Python version ran roughly 3x faster.
I thought the speed would be equivalent, or somewhat in OCaml's favor, given that this is an IO-bound comparison. Perhaps Python's simple for loop has a built-in read-ahead buffer and OCaml's input channel is unbuffered, but I'm not sure how to write buffered code that still reads line by line.
Any insight is appreciated, thanks ;)
yc
Python code:
#!/usr/bin/python
# test.py
# "345k-line.txt" is a placeholder for the path to the 345K-line file
file = "345k-line.txt"
count = 0
for line in open(file, "r"):
    count = count + 1
print "Done: ", count
OCaml code:
(* test.ml *)
let line_count filename =
  let f = open_in filename in
  let rec loop file count =
    try
      ignore (input_line file);
      loop file (count + 1)
    with
      End_of_file -> count
  in
  loop f 0;;

(* "345k-line.txt" is a placeholder for the path to the 345K-line file *)
let count = line_count "345k-line.txt" in
Printf.printf "Done: %d" count;;
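
One thing that may matter in the listing above: the recursive call to loop sits inside the try ... with, so it is not a tail call, and every line read leaves an exception handler on the stack. A minimal sketch of a variant that keeps the handler around input_line only is below. The name line_count_tail is made up for illustration, and this is not the code that was timed, so I can't say how much of the gap it would close.

```ocaml
(* Sketch: same line count, but the exception handler wraps only
   input_line, so the recursive call to loop is a true tail call.
   line_count_tail is a hypothetical name, not part of test.ml. *)
let line_count_tail filename =
  let ic = open_in filename in
  let rec loop count =
    (* Turn End_of_file into None before recursing. *)
    match (try Some (input_line ic) with End_of_file -> None) with
    | Some _ -> loop (count + 1)
    | None -> count
  in
  let n = loop 0 in
  (* Close the channel once the whole file has been read. *)
  close_in ic;
  n
```

It is called the same way as line_count, with the file path as its only argument.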
Test:
$ time ./test.py
Done: 345001
real 0m0.416s
user 0m0.101s
sys 0m0.247s
$ ocamlopt -o test test.ml
$ time ./test
Done: 345001
real 0m1.483s
user 0m0.631s
sys 0m0.685s
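
On the buffering guess: as far as I know, OCaml's in_channel already keeps an internal buffer, so the difference may come from somewhere else. If only the count is needed, one option is to skip building a string per line and count newline bytes in fixed-size chunks instead. A rough sketch, assuming OCaml 4.02 or later for the Bytes module; count_newlines and chunk_size are names I made up, and I have not timed this against the runs above.

```ocaml
(* Sketch: count '\n' bytes by reading the file in fixed-size chunks.
   count_newlines and chunk_size are hypothetical names. *)
let count_newlines ?(chunk_size = 65536) filename =
  let ic = open_in_bin filename in
  let buf = Bytes.create chunk_size in
  let rec loop count =
    (* input returns the number of bytes actually read, 0 at end of file. *)
    let n = input ic buf 0 chunk_size in
    if n = 0 then count
    else begin
      let c = ref count in
      for i = 0 to n - 1 do
        if Bytes.get buf i = '\n' then incr c
      done;
      loop !c
    end
  in
  let total = loop 0 in
  close_in ic;
  total
```

Note that this counts newline characters, so a final line without a trailing newline is not counted, whereas the input_line loop would count it.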