From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <779bf2730710011427g5983da4cw6ad8b715a9e38771@mail.gmail.com>
Date: Mon, 1 Oct 2007 14:27:54 -0700
From: YC
To: caml-list@yquem.inria.fr
Subject: best and fastest way to read lines from a file?
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_10645_10573288.1191274074511"

------=_Part_10645_10573288.1191274074511
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hi all -

Newbie question: what is the most efficient way to read a file line by line? I wrote a line-count routine in both Python and OCaml, ran each on a file with 345K lines, and was surprised that the Python version ran roughly 3.5x faster in wall-clock time (0.416s versus 1.483s).

I expected the speed to be equivalent or somewhat in OCaml's favor, given that this is an I/O-bound comparison. Perhaps Python's simple for loop has a read-ahead buffer built in and OCaml's input channel is unbuffered, but I'm not sure how to write buffered code that still reads line by line.

Any insight is appreciated, thanks ;)

yc

Python code:
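#!/usr/bin/python
# test.py (Python 2)

file = "345k-line.txt"  # stands in for the path to the 345K-line input file
count = 0
# Iterating over the file object yields one line at a time.
for line in open(file, "r"):
    count = count + 1
print "Done: ", count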

OCaml code:
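(* test.ml *)
let line_count filename =
  let f = open_in filename in
  (* Note: the recursive call sits inside the try block, so it is not a
     tail call; each line read leaves a handler on the stack. *)
  let rec loop file count =
    try
      ignore (input_line file);
      loop file (count + 1)
    with
      End_of_file -> count
  in
    loop f 0;;

(* "345k-line.txt" stands in for the path to the 345K-line input file. *)
let count = line_count "345k-line.txt" in
    Printf.printf "Done: %d" count;;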
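
The `loop` in the OCaml code above makes its recursive call inside `try`, which keeps that call from being a tail call. A minimal sketch of a tail-recursive variant follows. It assumes OCaml 4.02 or later for the `match ... with exception` syntax, and the names `count_lines` and `line_count` are only illustrative. It is not the program that was timed in this message, so the numbers reported here do not apply to it.

```ocaml
(* Tail-recursive line counter: a sketch, not the benchmarked program. *)
let count_lines ic =
  let rec loop count =
    (* The exception case covers only the call to input_line, so the
       recursive call below is a genuine tail call and the stack stays flat. *)
    match input_line ic with
    | _ -> loop (count + 1)
    | exception End_of_file -> count
  in
  loop 0

(* Hypothetical wrapper: opens the file, counts its lines, and closes the
   channel whether or not counting raised an exception. *)
let line_count filename =
  let ic = open_in filename in
  match count_lines ic with
  | n -> close_in ic; n
  | exception e -> close_in_noerr ic; raise e
```

Calling `line_count "345k-line.txt"` would then play the same role as in the original script; whether it changes the timing has not been measured here.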

Test:
$ time ./test.py
Done: 345001

real    0m0.416s
user   0m0.101s
sys    0m0.247s

$ ocamlopt -o test test.ml
$ time ./test
Done: 345001
real    0m1.483s
user   0m0.631s
sys    0m0.685s
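
On the question of buffered reading: OCaml's standard `in_channel` does keep an internal buffer, so the sketch below is only one possible way to take explicit control of the reads, not a confirmed fix for the gap measured above. It reads fixed-size blocks with `input` and counts newline characters instead of building a string per line. The function name `count_lines_blockwise` and the 64 KiB block size are assumptions made for illustration, and `Bytes` requires OCaml 4.02 or later.

```ocaml
(* Count lines by reading fixed-size blocks and scanning for '\n'.
   The 64 KiB block size is an arbitrary choice for this sketch. *)
let count_lines_blockwise ic =
  let len = 65536 in
  let buf = Bytes.create len in
  (* [last] is the final character of the previous block; it starts as
     '\n' so that an empty file counts as zero lines. *)
  let rec loop count last =
    let n = input ic buf 0 len in
    if n = 0 then
      (* End of file: count a final line that lacks a trailing newline. *)
      (if last <> '\n' then count + 1 else count)
    else begin
      let c = ref count in
      for i = 0 to n - 1 do
        if Bytes.get buf i = '\n' then incr c
      done;
      loop !c (Bytes.get buf (n - 1))
    end
  in
  loop 0 '\n'
```

A caller would open the file with `open_in_bin`, pass the channel to `count_lines_blockwise`, and close it afterwards. This trades the convenience of `input_line` for avoiding one string allocation per line; it would need to be timed on the same 345K-line file before drawing any conclusion.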

------=_Part_10645_10573288.1191274074511--