From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.2 required=5.0 tests=AWL,MAILTO_TO_REMOVE,SPF_FAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 1ED0CBC29 for ; Tue, 29 Aug 2006 20:55:12 +0200 (CEST) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id k7TItBqi023718 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Tue, 29 Aug 2006 20:55:11 +0200 Received: from list by ciao.gmane.org with local (Exim 4.43) id 1GI8jS-0000tk-MW for caml-list@inria.fr; Tue, 29 Aug 2006 20:54:42 +0200 Received: from 0x5731d08c.bynxx18.adsl-dhcp.tele.dk ([87.49.208.140]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 29 Aug 2006 20:54:42 +0200 Received: from spam by 0x5731d08c.bynxx18.adsl-dhcp.tele.dk with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 29 Aug 2006 20:54:42 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: caml-list@inria.fr From: Bardur Arantsson Subject: Re: zcat vs CamlZip Date: Tue, 29 Aug 2006 20:54:17 +0200 Message-ID: References: <44F48A17.5080005@podval.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@sea.gmane.org X-Gmane-NNTP-Posting-Host: 0x5731d08c.bynxx18.adsl-dhcp.tele.dk User-Agent: Thunderbird 1.5.0.5 (X11/20060729) In-Reply-To: <44F48A17.5080005@podval.org> Sender: news X-Miltered: at concorde with ID 44F48D8F.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; camlzip:01 buf:01 buffer:01 buffer:01 buf:01 inlined:01 wrote:01 rec:01 char:01 char:01 let:03 let:03 character:04 gzip:04 incr:05 Sam Steingold wrote: > I read through a huge *.gz file. > I have two versions of the code: [--snip--] > > let buf = Buffer.create 1024 > let gz_input_line gz_in char_counter line_counter = > Buffer.clear buf; > let finish () = incr line_counter; Buffer.contents buf in > let rec loop () = > let ch = Gzip.input_char gz_in in This is your most likely culprit. Any kind of "do this for every character" is usually insanely expensive when you can do it in bulk. (This is especially true when needing to do system calls, or if the called function cannot be inlined.) -- Bardur Arantsson If you can't join 'em, beat 'em. Preferably with a big stick.