From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <yinso.chen@gmail.com>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: *
X-Spam-Status: No, score=1.4 required=5.0 tests=HTML_MESSAGE,SPF_NEUTRAL 
	autolearn=disabled version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by yquem.inria.fr (Postfix) with ESMTP id A24ECBC69
	for <caml-list@yquem.inria.fr>; Mon,  1 Oct 2007 23:27:57 +0200 (CEST)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AgAAAMkCAUfRVZKzmGdsb2JhbACCPDWLQQICBwQEERg
X-IronPort-AV: E=Sophos;i="4.21,218,1188770400"; 
   d="scan'208";a="17150099"
Received: from wa-out-1112.google.com ([209.85.146.179])
  by mail4-smtp-sop.national.inria.fr with ESMTP; 01 Oct 2007 23:27:56 +0200
Received: by wa-out-1112.google.com with SMTP id k17so4863733waf
        for <caml-list@yquem.inria.fr>; Mon, 01 Oct 2007 14:27:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=beta;
        h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type;
        bh=FnQMAO3TI/U4a8CKHTI74ceHWNZzAlCBxIhRaaTaEew=;
        b=J/+uODxyBMwVdA4AoPAXl4SIBf22vCXxc/tVSCutZpoJJtirGLvhJjkyEnvfT3ilA/1gZu4QeEVh15hXotN8ohiVakfwWduJ3GlfvEi2gPT3uvSjAuaZqlc4vtgtaxrIL1z52zk9ODv5b03sh9DbiQG73C9CYcb6AFYUXwnWsYQ=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=beta;
        h=received:message-id:date:from:to:subject:mime-version:content-type;
        b=LMV8Gc4VSqetxXUK0O3bpA2LI68X4zBvqVKw62RIIRmin5voVu/fCHLDgzggnZTm3MxYhqwhGwEmshfeh4BvEJ3otRkLK+SVxV84ndBMnf5hqyO21aUVQtUQsg27SSww8JJ0dDyXLuBGws9trtv+koTn/BkSY/AEQlo7GjdMv0M=
Received: by 10.114.178.1 with SMTP id a1mr1226282waf.1191274074512;
        Mon, 01 Oct 2007 14:27:54 -0700 (PDT)
Received: by 10.115.54.5 with HTTP; Mon, 1 Oct 2007 14:27:54 -0700 (PDT)
Message-ID: <779bf2730710011427g5983da4cw6ad8b715a9e38771@mail.gmail.com>
Date: Mon, 1 Oct 2007 14:27:54 -0700
From: YC <yinso.chen@gmail.com>
To: caml-list@yquem.inria.fr
Subject: best and fastest way to read lines from a file?
MIME-Version: 1.0
Content-Type: multipart/alternative; 
	boundary="----=_Part_10645_10573288.1191274074511"
X-Spam: no; 0.00; ocaml:01 python's:01 ocaml:01 python's:01 buffer:01 ocaml's:01 buffered:01 usr:01 printf:01 printf:01 ocamlopt:01 buffer:01 ocaml's:01 buffered:01 usr:01 

------=_Part_10645_10573288.1191274074511
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hi all -

Newbie question: I'm wondering what's the most efficient way to read in a
file line by line?  I wrote a routine in both python and ocaml to read in a
file with 345K lines to do line count and was surprised that python's code
run roughly 3x faster.

I thought the speed should be equivalent and/or somewhat in ocaml favor,
given this is an IO-bound comparison, but perhaps Python's simplistic for
loop have a read-ahead buffer built-in, and perhaps ocaml's input channel is
unbuffered, but I'm not sure how to write a buffered code that can do a line
by line read-in.

Any insight is appreciated, thanks ;)

yc

Python code:
# test.py
#!/usr/bin/python

file = <345k-line.txt>
count = 0
for line in open (file, "r"):
    count = count + 1
print "Done: ", count

OCaml code:
(* test.ml *)
let rec line_count filename =
  let f = open_in filename in
  let rec loop file count =
    try
      ignore (input_line file);
      loop file (count + 1)
    with
      End_of_file -> count
  in
    loop f 0;;

let count = line_count <345k-line.txt> in
    Printf.printf "Done: %d" count;;

Test
$ time ./test.py
Done: 345001

real    0m0.416s
user   0m0.101s
sys    0m0.247s

$ ocamlopt -o test test.ml
$ time ./test
Done: 345001
real    0m1.483s
user   0m0.631s
sys    0m0.685s

------=_Part_10645_10573288.1191274074511
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hi all - <br><br>Newbie question: I&#39;m wondering what&#39;s the most efficient way to read in a file line by line?&nbsp; I wrote a routine in both python and ocaml to read in a file with 345K lines to do line count and was surprised that python&#39;s code run roughly 3x faster.
<br><br>I thought the speed should be equivalent and/or somewhat in ocaml favor, given this is an IO-bound comparison, but perhaps Python&#39;s simplistic for loop have a read-ahead buffer built-in, and perhaps ocaml&#39;s input channel is unbuffered, but I&#39;m not sure how to write a buffered code that can do a line by line read-in.&nbsp; 
<br><br>Any insight is appreciated, thanks ;)<br><br>yc<br><br>Python code: <br># test.py <br>#!/usr/bin/python<br><br>file = &lt;345k-line.txt&gt;<br>count = 0 <br>for line in open (file, &quot;r&quot;):<br>&nbsp;&nbsp;&nbsp; count = count + 1 
<br>print &quot;Done: &quot;, count <br><br>OCaml code:<br>(* <a href="http://test.ml">test.ml</a> *)<br>let rec line_count filename = <br>&nbsp; let f = open_in filename in <br>&nbsp; let rec loop file count = <br>&nbsp;&nbsp;&nbsp; try <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ignore (input_line file);
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; loop file (count + 1)<br>&nbsp;&nbsp;&nbsp; with <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; End_of_file -&gt; count <br>&nbsp; in <br>&nbsp;&nbsp;&nbsp; loop f 0;;<br><br>let count = line_count &lt;345k-line.txt&gt; in <br>&nbsp;&nbsp;&nbsp; Printf.printf &quot;Done: %d&quot; count;;<br><br>Test
<br>$ time ./test.py <br>Done: 345001 <br><br>real&nbsp;&nbsp;&nbsp; 0m0.416s<br>user&nbsp;&nbsp; 0m0.101s<br>sys&nbsp;&nbsp;&nbsp; 0m0.247s <br><br>$ ocamlopt -o test <a href="http://test.ml">test.ml</a> <br>$ time ./test <br>Done: 345001 <br>real&nbsp;&nbsp;&nbsp; 0m1.483s<br>
user&nbsp;&nbsp; 0m0.631s<br>sys&nbsp;&nbsp;&nbsp; 0m0.685s<br><br>

------=_Part_10645_10573288.1191274074511--