From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: *** X-Spam-Status: No, score=3.7 required=5.0 tests=DNS_FROM_RFC_ABUSE, DNS_FROM_RFC_POST,DNS_FROM_RFC_WHOIS,FORGED_YAHOO_RCVD autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 0F10DBB83 for ; Wed, 30 Aug 2006 08:03:20 +0200 (CEST) Received: from out1.smtp.messagingengine.com (out1.smtp.messagingengine.com [66.111.4.25]) by nez-perce.inria.fr (8.13.6/8.13.6) with ESMTP id k7U63JQS008645 for ; Wed, 30 Aug 2006 08:03:19 +0200 Received: from frontend3.internal (frontend3.internal [10.202.2.152]) by frontend1.messagingengine.com (Postfix) with ESMTP id 1FBE7DA09A8; Wed, 30 Aug 2006 02:03:12 -0400 (EDT) Received: from heartbeat2.internal ([10.202.2.161]) by frontend3.internal (MEProxy); Wed, 30 Aug 2006 02:03:12 -0400 X-Sasl-enc: ttbw/96b1cHtBKaVKjRFgPCM1f3jtFuwdvpuc4ulDqkZ 1156917792 Received: from [192.168.0.2] (71-35-111-218.tukw.qwest.net [71.35.111.218]) by mail.messagingengine.com (Postfix) with ESMTP id 6C08F2129; Wed, 30 Aug 2006 02:03:11 -0400 (EDT) Message-ID: <44F52C40.3070702@yahoo.com> Date: Tue, 29 Aug 2006 23:12:16 -0700 From: Jeff Henrikson User-Agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Sam Steingold Cc: caml-list@inria.fr Subject: Re: [Caml-list] zcat vs CamlZip References: <44F48A17.5080005@podval.org> In-Reply-To: <44F48A17.5080005@podval.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-j-chkmail-Score: MSGID : 44F52A27.000 on nez-perce : j-chkmail score : X : 0/20 1 X-Miltered: at nez-perce with ID 44F52A27.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; henrikson:01 jehenrik:01 camlzip:01 ocaml:01 zlib:01 gunzip:01 zlib:01 ocamlplot:01 henrikson:01 unix:01 foo:01 mli:01 mli:01 compactions:01 compactions:01 I was planning on using the library "ocaml gz" in my application, which is a binding to zlib. I haven't done any detailed benchmarking, but I presume its speed is comparable to gzip/gunzip since they just call out to zlib. http://ocamlplot.sourceforge.net/ Jeff Henrikson Sam Steingold wrote: > I read through a huge *.gz file. > I have two versions of the code: > > 1. use Unix.open_process_in "zcat foo.gz". > > 2. use gzip.mli (1.2 2002/02/18) as comes with godi 3.09. > > it turns out that the zcat version is 3(!) times as fast as the > gzip.mli one: > > Run time: 189.435840 sec > Self: 189.435840 sec > sys: 183.447465 sec > user: 5.988375 sec > Children: 0.000000 sec > sys: 0.000000 sec > user: 0.000000 sec > GC: minor: 169778 > major: 478 > compactions: 3 > Allocated: 5510457762.0 words > Wall clock: 206 sec (00:03:26) > > vs > > Run time: 58.471655 sec > Self: 54.855429 sec > sys: 48.527033 sec > user: 6.328396 sec > Children: 3.616226 sec > sys: 3.168198 sec > user: 0.448028 sec > GC: minor: 43174 > major: 229 > compactions: 5 > Allocated: 1401290543.0 words > Wall clock: 78 sec (00:01:18) > > since gzip.mli lacks input_line function, I had to roll my own: > > let buf = Buffer.create 1024 > let gz_input_line gz_in char_counter line_counter = > Buffer.clear buf; > let finish () = incr line_counter; Buffer.contents buf in > let rec loop () = > let ch = Gzip.input_char gz_in in > char_counter := Int64.succ !char_counter; > if ch = '\n' then finish () else ( Buffer.add_char buf ch; loop > (); ) in > try loop () > with End_of_file -> > if Buffer.length buf = 0 then raise End_of_file else finish () > > is there something wrong with my gz_input_line? > is this a know performance issue with the CamlZip library? > > thanks. > Sam. > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs