caml-list - the Caml user's mailing list
* zcat vs CamlZip
@ 2006-08-29 18:40 Sam Steingold
  2006-08-29 18:54 ` Bardur Arantsson
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Sam Steingold @ 2006-08-29 18:40 UTC (permalink / raw)
  To: caml-list

I read through a huge *.gz file.
I have two versions of the code:

1. use Unix.open_process_in "zcat foo.gz".

2. use gzip.mli (1.2 2002/02/18) as comes with godi 3.09.
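
For reference, version 1 can be sketched roughly as follows. This is only an illustration, not the code that produced the timings below: the function name count_lines_via and the choice to count lines and characters are assumptions, and it needs the unix library.

```ocaml
(* Sketch only: read the output of an external command line by line,
   counting lines and characters.  Link with the unix library. *)
let count_lines_via command =
  let ic = Unix.open_process_in command in
  let lines = ref 0 and chars = ref 0L in
  (try
     while true do
       let line = input_line ic in
       incr lines;
       (* +1 for the newline that input_line strips; this overcounts by
          one if the last line has no trailing newline *)
       chars := Int64.add !chars (Int64.of_int (String.length line + 1))
     done
   with End_of_file -> ());
  (* Wait for the child process and discard its exit status *)
  ignore (Unix.close_process_in ic);
  (!lines, !chars)
```

Usage would be along the lines of count_lines_via "zcat foo.gz". Decompression then happens in the child process, and the OCaml side only sees an ordinary buffered in_channel.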

It turns out that the zcat version is 3(!) times as fast as the gzip.mli
one (gzip.mli timings first, then zcat):

Run time: 189.435840 sec
Self:     189.435840 sec
      sys: 183.447465 sec
     user: 5.988375 sec
Children: 0.000000 sec
      sys: 0.000000 sec
     user: 0.000000 sec
GC:     minor: 169778
         major: 478
   compactions: 3
Allocated:  5510457762.0 words
Wall clock:  206 sec (00:03:26)

vs

Run time: 58.471655 sec
Self:     54.855429 sec
      sys: 48.527033 sec
     user: 6.328396 sec
Children: 3.616226 sec
      sys: 3.168198 sec
     user: 0.448028 sec
GC:     minor: 43174
         major: 229
   compactions: 5
Allocated:  1401290543.0 words
Wall clock:  78 sec (00:01:18)

Since gzip.mli lacks an input_line function, I had to roll my own:

let buf = Buffer.create 1024
let gz_input_line gz_in char_counter line_counter =
   Buffer.clear buf;
   let finish () = incr line_counter; Buffer.contents buf in
   let rec loop () =
     let ch = Gzip.input_char gz_in in
     char_counter := Int64.succ !char_counter;
     if ch = '\n' then finish ()
     else (Buffer.add_char buf ch; loop ()) in
   try loop ()
   with End_of_file ->
     if Buffer.length buf = 0 then raise End_of_file else finish ()
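
For comparison, here is a minimal sketch of a variant that pulls data in blocks and splits lines in OCaml, instead of calling an input function once per character. It is not presented as the explanation for the timings above, only as something that could be measured against gz_input_line. All names (reader, make_reader, next_line) are hypothetical. The code is written against a generic block-read function; the assumption is that Gzip.input gz_in has this shape (bytes -> int -> int -> int, returning 0 at end of input) in the installed CamlZip version, which should be checked against its gzip.mli.

```ocaml
(* Sketch only: a line reader over any block-read function
   [read buf pos len] that returns the number of bytes read,
   or 0 at end of input. *)
type reader = {
  read : bytes -> int -> int -> int;
  chunk : bytes;        (* block buffer *)
  mutable pos : int;    (* next unread byte in chunk *)
  mutable len : int;    (* number of valid bytes in chunk *)
  line : Buffer.t;      (* accumulates the current line *)
}

let make_reader ?(size = 65536) read =
  { read; chunk = Bytes.create size; pos = 0; len = 0;
    line = Buffer.create 1024 }

(* Append bytes to r.line up to (not including) the next newline,
   refilling the chunk as needed. *)
let rec fill_line r =
  if r.pos >= r.len then begin
    r.len <- r.read r.chunk 0 (Bytes.length r.chunk);
    r.pos <- 0
  end;
  if r.len = 0 then
    (* End of input: fail only if no partial line was collected *)
    (if Buffer.length r.line = 0 then raise End_of_file)
  else
    match Bytes.index_from_opt r.chunk r.pos '\n' with
    | Some i when i < r.len ->
      Buffer.add_subbytes r.line r.chunk r.pos (i - r.pos);
      r.pos <- i + 1
    | _ ->
      (* No newline in the valid part of the chunk: take it all, go on *)
      Buffer.add_subbytes r.line r.chunk r.pos (r.len - r.pos);
      r.pos <- r.len;
      fill_line r

let next_line r =
  Buffer.clear r.line;
  fill_line r;
  Buffer.contents r.line
```

Under the assumption above, a reader would be created with make_reader (Gzip.input gz_in) and next_line called until End_of_file. Like gz_input_line, it returns a final line that has no trailing newline and raises End_of_file only when nothing is left.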

Is there something wrong with my gz_input_line?
Is this a known performance issue with the CamlZip library?

Thanks.
Sam.



Thread overview: 12+ messages
2006-08-29 18:40 zcat vs CamlZip Sam Steingold
2006-08-29 18:54 ` Bardur Arantsson
2006-08-29 19:01   ` [Caml-list] " Florian Hars
2006-08-29 19:15   ` Sam Steingold
2006-08-29 19:48     ` Bárður Árantsson
2006-08-29 19:54     ` [Caml-list] " Gerd Stolpmann
2006-08-29 20:04     ` Gerd Stolpmann
2006-08-30  0:44       ` malc
2006-08-30  0:53         ` Jonathan Roewen
2006-08-29 19:37   ` John Carr
2006-08-29 19:11 ` [Caml-list] " Eric Cooper
2006-08-30  6:12 ` Jeff Henrikson
