caml-list - the Caml user's mailing list
From: Stefano Zacchiroli <zack@upsilon.cc>
To: caml-list@inria.fr
Subject: Re: [Caml-list] integration of compression with channels
Date: Tue, 4 Nov 2008 14:26:43 +0100
Message-ID: <20081104132643.GA19888@usha.takhisis.invalid>
In-Reply-To: <20081104125019.GA5817@localhost>

On Tue, Nov 04, 2008 at 07:50:20AM -0500, Eric Cooper wrote:
> I initially tried something like this in the approx proxy server, but
> found out the hard way that it was difficult to deal with corrupt .gz
> files.  You might only discover the corruption after reading garbage
> for a while, and an exception at that point would be unexpected.

I think you are fighting an intrinsic, underlying problem.

Let's take the extreme end of integrity checks: a checksum on the
whole file. To check it, you need to read the whole file in advance,
compute its checksum, and compare it with the expected checksum.
On the other hand, abstractions like channels are meant precisely to
read files in a streaming fashion, rather than all at once.

Bottom line: there is a trade-off between "streamability" and
integrity checks, and it is up to you to choose where to put yourself
in that trade-off.
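
To make the streaming end of the trade-off concrete, here is a minimal
sketch using only the standard library. The names stream_checked,
update_sum and Corrupt are hypothetical, and the checksum is a toy
(sum of bytes modulo 65536), not what GZip or Camlzip actually use.
The point is only that the consumer has already seen every chunk by
the time the mismatch is detected.

```ocaml
(* Toy running checksum, for illustration only: sum of bytes mod 65536. *)
let update_sum acc bytes len =
  let acc = ref acc in
  for i = 0 to len - 1 do
    acc := (!acc + Char.code (Bytes.get bytes i)) land 0xFFFF
  done;
  !acc

exception Corrupt of string

(* Pass the contents of [ic] to [consume] chunk by chunk. The checksum
   can only be compared once end of input is reached, so [Corrupt] is
   raised after [consume] has already processed all the data. *)
let stream_checked ~expected ~consume ic =
  let buf = Bytes.create 4096 in
  let rec loop sum =
    let n = input ic buf 0 (Bytes.length buf) in
    if n = 0 then begin
      if sum <> expected then raise (Corrupt "checksum mismatch")
    end else begin
      consume (Bytes.sub_string buf 0 n);
      loop (update_sum sum buf n)
    end
  in
  loop 0
```

A caller using this style has to be ready for Corrupt to arrive late
and to discard or roll back whatever consume has produced so far.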

Actually, often it is not even up to you, but rather up to the file
format you are reading.  I don't know the gory details of the GZip
format, but Camlzip does some sanity checks on GZip headers, spotting
*some* of the possible header corruptions. It might be that you hit
some corruption cases that Camlzip does not check for; in that case
the proper solution is to add those checks to Camlzip.  On the other
hand, if you want to spot in advance corruption that occurs later on
in the compressed file (and I don't know whether GZip supports that or
not), you have no choice but to buffer.
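
As an illustration of the buffering end, here is a rough sketch, again
with the standard library only. read_all, read_verified and checksum
are hypothetical names, and the checksum is the same kind of toy (sum
of bytes modulo 65536), standing in for whatever integrity check the
format provides. Nothing is handed to the caller until the whole input
has been read and verified, at the cost of holding it all in memory.

```ocaml
(* Toy checksum, for illustration only: sum of bytes mod 65536. *)
let checksum s =
  let acc = ref 0 in
  String.iter (fun c -> acc := (!acc + Char.code c) land 0xFFFF) s;
  !acc

(* Read everything up to end of input into memory. *)
let read_all ic =
  let buf = Buffer.create 65536 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Return the data only if it passes the check; the caller never sees
   any part of a corrupt input. *)
let read_verified ~expected ic =
  let data = read_all ic in
  if checksum data = expected then Ok data
  else Error "checksum mismatch"
```

For large files the in-memory buffer could be replaced by a temporary
file, but the principle is the same: verify first, consume afterwards.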

Cheers.

-- 
Stefano Zacchiroli -*- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è sempre /oo\ All one has to do is hit the right
uno zaino        -- A.Bergonzoni \__/ keys at the right time -- J.S.Bach

