From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.1 required=5.0 tests=AWL,DNS_FROM_SECURITYSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 22B4DBBAF for ; Tue, 4 Nov 2008 14:35:07 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArMCAFTeD0nAXQImgWdsb2JhbACUFwEBFiK2L4NT X-IronPort-AV: E=Sophos;i="4.33,543,1220220000"; d="scan'208";a="19551634" Received: from discorde.inria.fr ([192.93.2.38]) by mail1-smtp-roc.national.inria.fr with ESMTP; 04 Nov 2008 14:35:06 +0100 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id mA4DZ6C0019133 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Tue, 4 Nov 2008 14:35:06 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArMCAFTeD0lDz4HegWdsb2JhbACUFwEBFiK2L4NT X-IronPort-AV: E=Sophos;i="4.33,543,1220220000"; d="scan'208";a="16843694" Received: from fettunta.fettunta.org ([67.207.129.222]) by mail2-smtp-roc.national.inria.fr with ESMTP; 04 Nov 2008 14:35:05 +0100 Received: from usha.takhisis.invalid (unknown [10.17.0.18]) by fettunta.fettunta.org (Postfix) with ESMTP id 656FA180E8 for ; Tue, 4 Nov 2008 13:35:04 +0000 (UTC) Received: by usha.takhisis.invalid (Postfix, from userid 1000) id A4BD860E2; Tue, 4 Nov 2008 14:26:43 +0100 (CET) Date: Tue, 4 Nov 2008 14:26:43 +0100 From: Stefano Zacchiroli To: caml-list@inria.fr Subject: Re: [Caml-list] integration of compression with channels Message-ID: <20081104132643.GA19888@usha.takhisis.invalid> Mail-Followup-To: caml-list@inria.fr References: <20081104125019.GA5817@localhost> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20081104125019.GA5817@localhost> User-Agent: Mutt/1.5.18 (2008-05-17) X-Miltered: at discorde with ID 49104F8A.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; zacchiroli:01 zack:01 trade-off:01 trade-off:01 camlzip:01 camlzip:01 cheers:01 zacchiroli:01 postdoc:01 zack:01 dietro:98 c'e:98 sempre:98 garbage:01 wrote:01 On Tue, Nov 04, 2008 at 07:50:20AM -0500, Eric Cooper wrote: > I initially tried something like this in the approx proxy server, but > found out the hard way that it was difficult to deal with corrupt .gz > files. You might only discover the corruption after reading garbage > for a while, and an exception at that point would be unexpected. I think you are trying to fight with an intrinsic underlying problem. Let's take the extreme end of integrity checks: checksum on the whole file. To be able to check that you need to see all the file in advance, compute its checksum, and compare with the expected checksum. On the other hand, abstractions like channels are precisely meant to read files in a streaming fashion, rather than all together. Bottom-line: there is a trade-off among "streamability" and integrity checks, it is up to you to choose where to put yourself in the trade-off. Actually, often it is not even up to you, but rather up to the file format you are reading. I don't know the gory details of the GZip format, but Camlzip does some sanity checks on GZip headers, spotting *some* of the possible header corruptions. It might be that you hit some corruption cases not implemented by Camlzip, in that case the proper solution is to add those checks to Camlzip. On the other hand, if you want to spot in advance corruptions which occur later on in the compressed file (and I don't know if GZip supports that or not) you have no choice beside buffering. Cheers. -- Stefano Zacchiroli -*- PhD in Computer Science \ PostDoc @ Univ. Paris 7 zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/ Dietro un grande uomo c'è sempre /oo\ All one has to do is hit the right uno zaino -- A.Bergonzoni \__/ keys at the right time -- J.S.Bach