From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.5 required=5.0 tests=SPF_SOFTFAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id B0163BC29 for ; Tue, 29 Aug 2006 21:37:47 +0200 (CEST) Received: from biscayne-one-station.mit.edu (BISCAYNE-ONE-STATION.MIT.EDU [18.7.7.80]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id k7TJbkTm001795 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 29 Aug 2006 21:37:47 +0200 Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by biscayne-one-station.mit.edu (8.13.6/8.9.2) with ESMTP id k7TJbjsW004720 for ; Tue, 29 Aug 2006 15:37:45 -0400 (EDT) Received: from contents-vnder-pressvre.mit.edu (CONTENTS-VNDER-PRESSVRE.MIT.EDU [18.7.16.67]) (authenticated bits=56) (User authenticated as jfc@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id k7TJbfbj012529 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Tue, 29 Aug 2006 15:37:42 -0400 (EDT) Received: (from jfc@localhost) by contents-vnder-pressvre.mit.edu (8.12.9.20060308) id k7TJbfWP016413; Tue, 29 Aug 2006 15:37:41 -0400 (EDT) Message-Id: <200608291937.k7TJbfWP016413@contents-vnder-pressvre.mit.edu> To: caml-list@inria.fr Subject: Re: [Caml-list] Re: zcat vs CamlZip In-Reply-To: Your message of "Tue, 29 Aug 2006 20:54:17 +0200." Date: Tue, 29 Aug 2006 15:37:41 -0400 From: John Carr X-Scanned-By: MIMEDefang 2.42 X-Miltered: at concorde with ID 44F4978A.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; camlzip:01 optionally:01 buf:01 buf:01 failwith:01 failwith:01 wrote:01 char:01 char:01 caml-list:01 unsafe:01 unsafe:01 data:02 data:02 defined:02 > This is your most likely culprit. Any kind of "do this for every > character" is usually insanely expensive when you can do it in bulk. I wrote a program that read data from a text file, which could optionally be compressed. I defined my text file format to have nearly-fixed length lines so I could call Gzip.really_input. My program doesn't spend much of its time reading the text file so I didn't spend much time making input fast. I just did what I thought the obvious optimization of reading a block of characters in the normal case. let input_line = begin function Uncompressed c -> input_line c | Compressed c -> begin match Gzip.input_char c with '#' -> while Gzip.input_char c <> '\n' do () done; "#" | 'S' -> let buf = String.make 11 'S' in Gzip.really_input c buf 1 10; if String.unsafe_get buf 10 = '\n' then String.unsafe_set buf 10 ' ' else begin if Gzip.input_char c <> '\n' then failwith "bad override file" end; buf | _ -> failwith "bad override file" end end (Lines are variable-length comments beginning '#' or data lines beginning with 'S' followed by 9 or 10 characters.)