caml-list - the Caml user's mailing list
From: Martin Jambon <ocaml@martin_jambon.emailuser.net>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Marshalling data format deteriorates compressibility
Date: Tue, 27 Jun 2006 16:31:03 -0700 (PDT)
Message-ID: <Pine.LNX.4.63.0606271629090.8465@munge>

[to the list admin: why can I send messages with this subscription and not
  with martin_jambon@emailuser.net?]

On Tue, 27 Jun 2006, Markus Mottl wrote:

> We finally found out what causes the problem: OCaml represents
> pointers to shared data values using relative offsets instead of
> absolute positions within the marshalled data.  This means that e.g.
> an array containing pointers to these values will be represented by a
> sequence of increasing relative offsets, which essentially renders it
> almost incompressible to the usual compression algorithms.
> 
> As it seems, the current marshalling algorithm uses this relative
> addressing approach to save space: relative offsets are encoded with
> variable length (this assumes some degree of locality), which is not
> possible with absolute addressing.  Unfortunately, this does not take
> compression algorithms into account, which may greatly benefit from
> repeating patterns of pointers.
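Here is a minimal OCaml sketch of the effect described above. The first part checks that `Marshal` preserves physical sharing by default. The second part is a toy model, not OCaml's actual wire format. It compares relative and absolute encodings of several back-references to the same object. Positions are abstract here (think of them as counts of objects emitted so far), and all names are illustrative.

```ocaml
(* Part 1: Marshal preserves physical sharing by default, so both slots
   of [arr] come back pointing to one single string. *)
let shared = String.make 3 'x'
let arr = [| shared; shared |]

let sharing_preserved =
  let copy : string array = Marshal.from_string (Marshal.to_string arr []) 0 in
  copy.(0) == copy.(1)

(* Part 2 (toy model): a back-reference is a pair (pos, target). It is
   emitted at position [pos] and points to an object first written at
   position [target]. *)
let relative_offsets refs = List.map (fun (pos, target) -> pos - target) refs
let absolute_offsets refs = List.map (fun (_, target) -> target) refs

(* Four consecutive references (e.g. array slots) to the object at 0. *)
let refs = [ (10, 0); (11, 0); (12, 0); (13, 0) ]

let () =
  (* Relative encoding: 10 11 12 13, so no value repeats. *)
  List.iter (Printf.printf "%d ") (relative_offsets refs);
  print_newline ();
  (* Absolute encoding: 0 0 0 0, a run a compressor can exploit. *)
  List.iter (Printf.printf "%d ") (absolute_offsets refs);
  print_newline ()
```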

Maybe you could convert the data into a marshal-friendly format before
marshalling: put your shared values into an array and replace the pointers
with array indices. For example,
   type t = string list
becomes:
   type marshalled_t = (string array * int list)

["a"; "b"; "a"; "a"] -> ([| "a"; "b" |], [0; 1; 0; 0])

That seems like a lot of work, but it shouldn't be too hard to maintain.
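Here is a minimal sketch of that conversion for the `string list` case, assuming the `t` and `marshalled_t` types above. The function names `encode`, `decode`, `to_string` and `of_string` are hypothetical. Deduplication uses a `Hashtbl`, which relies on structural equality. As a result, equal strings that were not physically shared in the original list also get merged.

```ocaml
type t = string list
type marshalled_t = string array * int list

(* Replace every string by the index of the first equal string in a
   table of distinct values (structural equality via Hashtbl). *)
let encode (l : t) : marshalled_t =
  let tbl : (string, int) Hashtbl.t = Hashtbl.create 16 in
  let rev_values = ref [] in
  let next = ref 0 in
  let index s =
    match Hashtbl.find_opt tbl s with
    | Some i -> i
    | None ->
        let i = !next in
        Hashtbl.add tbl s i;
        rev_values := s :: !rev_values;
        incr next;
        i
  in
  (* fold_left walks the list left to right, so indices are assigned
     in order of first occurrence. *)
  let rev_indices = List.fold_left (fun acc s -> index s :: acc) [] l in
  (Array.of_list (List.rev !rev_values), List.rev rev_indices)

(* Inverse of [encode]: look each index up in the value table. *)
let decode ((values, indices) : marshalled_t) : t =
  List.map (fun i -> values.(i)) indices

(* Marshal the index-based form instead of the original list. *)
let to_string (l : t) : string = Marshal.to_string (encode l) []
let of_string (s : string) : t = decode (Marshal.from_string s 0 : marshalled_t)
```

With this conversion, `encode ["a"; "b"; "a"; "a"]` yields `([| "a"; "b" |], [0; 1; 0; 0])`. The repeated references are now small immediate integers rather than back-pointers, so they no longer depend on their position in the stream.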

By the way, floats don't compress very well, since their low-order mantissa
bits look close to random. Rounding them as much as possible used to save me
about 50% of the space.
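One possible reading of "rounding" is reducing binary precision. The sketch below is an assumption, not necessarily the method used above. Decimal rounding (e.g. to three digits) does not by itself produce zero bits, because most decimal fractions are not exactly representable in binary. Truncating the mantissa does, and as far as I know `Marshal` stores each float as its raw 8-byte IEEE encoding, so the zeroed low-order bytes should show up as compressible runs. The function `truncate_mantissa` is hypothetical and intended for finite values only.

```ocaml
(* Zero all but the [keep] most significant bits of the 52-bit mantissa
   of a finite float [x]. This truncates toward zero in magnitude; the
   sign and exponent bits are left untouched. *)
let truncate_mantissa ~keep x =
  if keep < 0 || keep > 52 then invalid_arg "truncate_mantissa";
  let mask = Int64.shift_left (-1L) (52 - keep) in
  Int64.float_of_bits (Int64.logand (Int64.bits_of_float x) mask)

let () =
  (* Keeping 20 mantissa bits clears the low 32 bits of the encoding,
     i.e. four zero bytes per float in the marshalled output. *)
  let y = truncate_mantissa ~keep:20 3.14159 in
  Printf.printf "%.10f -> %.10f (low word %Lx)\n" 3.14159 y
    (Int64.logand (Int64.bits_of_float y) 0xFFFFFFFFL)
```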


Martin

--
Martin Jambon, PhD
http://martin.jambon.free.fr

