Date: Tue, 4 Nov 2008 07:50:20 -0500
From: Eric Cooper
To: caml-list@inria.fr
Subject: integration of compression with channels
Message-ID: <20081104125019.GA5817@localhost>

I was interested to see Zack's work on integrating gzip and bzip2 with
I/O channels:
    http://upsilon.cc/~zack/blog/posts/2008/11/ocaml_batteries_gzip/

I initially tried something like this in the approx proxy server, but
found out the hard way that it was difficult to deal with corrupt .gz
files. You might only discover the corruption after reading garbage
for a while, and an exception at that point would be unexpected.

Eventually I switched to spawning a "gunzip" process to a temporary
file, and then reading that. In addition to detecting corruption
early, it was also significantly faster than CamlZip.

I suppose one could argue that you can get an I/O error even from
reading an uncompressed file (bad disk block, or whatever), and that a
robust program should be equally prepared to deal with that. But I
think there's a real difference in practice.

The integrated approach is definitely more elegant, and perhaps the
performance will be competitive someday. So I'd be interested if
anyone has a better way of handling potentially corrupt files.

-- 
Eric Cooper             e c c @ c m u . e d u
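
For illustration, here is a minimal sketch of the temporary-file approach
described above. It is not the code used in approx. The names
Gunzip_failed, gunzip_to_temp and with_gunzipped are hypothetical, and the
sketch assumes a "gunzip" program on the PATH and OCaml 4.08 or later (for
Fun.protect). The point it shows is that the whole file is decompressed
before the caller reads a single byte, so a corrupt archive is reported up
front rather than partway through reading.

```ocaml
(* Hypothetical exception: raised when gunzip exits with a non-zero status.
   That covers a corrupt archive, but also a missing gunzip binary. *)
exception Gunzip_failed of string

(* Decompress [path] into a fresh temporary file by running the external
   gunzip program through the shell. Returns the temporary file name. *)
let gunzip_to_temp path =
  let tmp = Filename.temp_file "gunzip" ".out" in
  let cmd =
    Printf.sprintf "gunzip -c %s > %s"
      (Filename.quote path) (Filename.quote tmp)
  in
  if Sys.command cmd = 0 then tmp
  else begin
    (* Discard any partial output before reporting the failure. *)
    Sys.remove tmp;
    raise (Gunzip_failed path)
  end

(* Apply [f] to an input channel on the decompressed data. The channel is
   closed and the temporary file removed afterwards, even if [f] raises. *)
let with_gunzipped path f =
  let tmp = gunzip_to_temp path in
  let ic = open_in_bin tmp in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic; Sys.remove tmp)
    (fun () -> f ic)
```

A caller would write something like with_gunzipped "Packages.gz" input_line
and handle Gunzip_failed in one place, before any data has been consumed.
The cost is the extra disk space and the process spawn.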