From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: ** X-Spam-Status: No, score=2.3 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE, MISSING_HEADERS,RCVD_IN_BL_SPAMCOP_NET,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 31B12BC29 for ; Wed, 30 Aug 2006 02:53:39 +0200 (CEST) Received: from wx-out-0506.google.com (wx-out-0506.google.com [66.249.82.226]) by nez-perce.inria.fr (8.13.6/8.13.6) with ESMTP id k7U0rcjK000982 for ; Wed, 30 Aug 2006 02:53:38 +0200 Received: by wx-out-0506.google.com with SMTP id s7so21521wxc for ; Tue, 29 Aug 2006 17:53:37 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=CbxraeKWcQCf6Lk0w1h2wog5r7PuScipj5jAUHXVQTfNXCLsc5Ftyf40VyJjhKW0beR/mNUod0vh+of0tUL4n/igWEdmCn/1mTkgKXseqGNbBHi3KuH1rKVRmC8qkO8L64GhEYIycsNkIbjwCyC2jIRPZQePnckmgIxzPhysiFM= Received: by 10.70.90.18 with SMTP id n18mr373755wxb; Tue, 29 Aug 2006 17:53:37 -0700 (PDT) Received: by 10.70.84.13 with HTTP; Tue, 29 Aug 2006 17:53:37 -0700 (PDT) Message-ID: Date: Wed, 30 Aug 2006 12:53:37 +1200 From: "Jonathan Roewen" Subject: Re: [Caml-list] Re: zcat vs CamlZip Cc: caml-list@inria.fr In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <44F48A17.5080005@podval.org> <44F49267.9080904@podval.org> <1156881856.21883.124.camel@localhost.localdomain> X-j-chkmail-Score: MSGID : 44F4E192.000 on nez-perce : j-chkmail score : X : 0/20 1 X-Miltered: at nez-perce with ID 44F4E192.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; camlzip:01 malc:01 malc:01 pulsesoft:01 gerd:01 stolpmann:01 ocamlnet:01 netchannels:01 admittedly:01 implements:01 netchannels:01 buffered:01 buf:01 buffer:01 buffer:01 Have you tried Unzip module from Extlib? Haven't tried it, but plan on using it later on. Jonathan On 8/30/06, malc wrote: > On Tue, 29 Aug 2006, Gerd Stolpmann wrote: > > > Am Dienstag, den 29.08.2006, 15:15 -0400 schrieb Sam Steingold: > >> at any rate, do you really expect that using Gzip.input and then > >> searching the result for a newline, slicing and dicing to get the > >> individual input lines, &c &c would be faster? > > > > Ah yes, and there is an easy solution with ocamlnet: > > [..snip..] > > > This adds a buffering layer. > > The Netchannels buffering looks very elegant, but my (admittedly rather > cursory) testing shows that it's also rather slow. > > Following code implements 4 line readers: > Sam's original [char] > Netchannels [net] > open_process_in [zcat] > and buffered (trying to stay compatible with original interface) [block] > > While Netchannels do win over original implementation it looses to all > other methods (on my machine). > > let buf = Buffer.create 1024 > let gz_input_line gz_in char_counter line_counter = > Buffer.clear buf; > let finish () = incr line_counter; Buffer.contents buf in > let rec loop () = > let ch = Gzip.input_char gz_in in > char_counter := Int64.succ !char_counter; > if ch = '\n' then finish () else ( Buffer.add_char buf ch; loop (); ) in > try loop () > with End_of_file -> > if Buffer.length buf = 0 then raise End_of_file else finish () > > class input_gzip_rec gzip_ch : Netchannels.rec_in_channel = > object(self) > method input s p l = > let n = Gzip.input gzip_ch s p l in > if n = 0 then raise End_of_file; > n > method close_in() = > Gzip.close_in gzip_ch > end > > let wrap_gz gz_in = > let s = String.create 4096 in > let b = Buffer.create 1024 in > let r = ref (fun _ _ -> assert false) in > let findlf s start finish = > let rec loop pos = if pos >= finish then None > else if String.unsafe_get s pos = '\n' then Some pos else loop (succ pos) > in loop start > in > let rec cont pos char_counter line_counter = > let n = Gzip.input gz_in s pos (String.length s - pos) in > let rec subcont pos len char_counter line_counter = > let finish = pos + len in > match findlf s pos finish with > | None -> > Buffer.add_substring b s pos len; > cont 0 char_counter line_counter > > | Some lfpos -> > let runlen = lfpos - pos in > incr line_counter; > Buffer.add_substring b s pos runlen; > let s = Buffer.contents b in > Buffer.clear b; > r := subcont (succ lfpos) (len - succ runlen); > s > in > if n = 0 > then raise End_of_file > else ( > char_counter := Int64.add (Int64.of_int n) !char_counter; > subcont pos n char_counter line_counter > ) > in > let exec c l = !r c l in > r := cont 0; > exec > > let char () = > let gz = Gzip.open_in_chan stdin in > let cc = ref 0L in > let lc = ref 0 in > try > while true > do > let _line = gz_input_line gz cc lc in > () > done > with End_of_file -> > Format.printf "cc=%Ld lc=%d@." !cc !lc > > let block () = > let gz = Gzip.open_in_chan stdin in > let cc = ref 0L in > let lc = ref 0 in > let lg = wrap_gz gz in > try > while true > do > let _line = lg cc lc in > () > done > with End_of_file -> > Format.printf "cc=%Ld lc=%d@." !cc !lc > > let zcat () = > let ic = Unix.open_process_in "zcat" in > let cc = ref 0L in > let lc = ref 0 in > try > while true > do > let _line = input_line ic in > cc := Int64.add (Int64.of_int (String.length _line + 1)) !cc; > incr lc > done > with End_of_file -> > Format.printf "cc=%Ld lc=%d@." !cc !lc > > let net () = > let gz_in = Gzip.open_in_chan stdin in > let gz_ch = Netchannels.lift_in (`Rec (new input_gzip_rec gz_in)) in > let cc = ref 0L in > let lc = ref 0 in > try > while true > do > let _line = gz_ch#input_line () in > cc := Int64.add (Int64.of_int (String.length _line + 1)) !cc; > incr lc > done > with End_of_file -> > Format.printf "cc=%Ld lc=%d@." !cc !lc > > let _ = > match Sys.argv with > | [| _; "char" |] -> char () > | [| _; "zcat" |] -> zcat () > | [| _; "block" |] -> block () > | [| _; "net" |] -> net () > | _ -> prerr_endline (Sys.argv.(0) ^ ": [char|zcat|block|net]") > > -- > mailto:malc@pulsesoft.com > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs >