caml-list - the Caml user's mailing list
From: Mauricio Fernandez <mfp@acm.org>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Re: Serialisation of PXP DTDs
Date: Sat, 25 Oct 2008 20:58:24 +0200
Message-ID: <20081025185824.GG32611@NANA.localdomain>
In-Reply-To: <f8560b80810240703k560941e3xa5ff89e5aa0ad8f0@mail.gmail.com>

On Fri, Oct 24, 2008 at 10:03:47AM -0400, Markus Mottl wrote:
> On Fri, Oct 24, 2008 at 5:11 AM, Mikkel Fahnøe Jørgensen
> <mikkel@dvide.com> wrote:
> > I guess this discussion is an overkill for the problem at hand, but
> > speaking of binary extensible protocols, have you looked at ASN.1? It
> > is an abstraction over any number of encodings. At least one binary
> > encoding has extension bits to allow future growth of object
> > collections and similar.
> 
> Note that it is perfectly safe to grow sum types with bin-prot.  It
> was designed that way intentionally.  It's just not safe to reorder or
> remove elements.  Nobody needs to reorder elements, because it doesn't
> make any operational difference in the program.  Backward
> compatibility of protocols you define necessarily requires the
> presence of old constructors in sum types anyway so you may not want
> to remove those in any case.  There is hardly any harm from the
> protocol perspective in leaving old constructors in there.
> 
> Note, too, that polymorphic variants even allow reordering with
> bin-prot. (...)
> 
> Except for human-readability, I think bin-prot should scale very well
> on the other requirements of serialization protocols once it has been
> ported to architectures with unusual endianness (almost all machines
> are little endian nowadays so hardly anybody on this list should be
> affected).

Unfortunately, growing sum types is far from the only protocol extension of
interest. There's a trivial extension which, I suspect, will be at least as
common in practice: adding new fields to a record (or new elements to a
tuple). bin-prot is unable to handle it adequately; a self-describing format
like the one I'm working on is required.

You might argue that this extension is subsumed by the ability to grow sum types,
since you can go from

    type record = { a : int } with bin_io
    type msg = A of record

to 

    type record1 = { a : int } with bin_io
    type record2 = { a' : int; b : int } with bin_io
    type msg = A of record1 | B of record2

(Note how special care has to be taken to tag the record --- "explicit
tagging" in ASN.1 parlance.)

However, this solves only part of the problem: it guarantees that every
serialization of the old type is also a valid serialization of the updated
type, or, in other words, that new consumers can read data written by old
producers. Even with the above encoding (which is not an arbitrary type
definition but a carefully constructed one), bin-prot still requires that
consumers be updated before producers.
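
To make the asymmetry concrete, here is a minimal sketch using a toy wire
format. It is an assumption made for illustration and is not bin-prot's actual
encoding: one constructor tag byte followed by one byte per int field. The
names msg_v1, msg_v2, pack, read_v1 and read_v2 are hypothetical; they stand
in for the old and new versions of the msg type above.

```ocaml
(* Toy wire format, NOT bin-prot's: one constructor tag byte followed by
   one byte per int field, with no lengths or field names. *)
type msg_v1 = A1 of int                      (* old type: A of { a } *)
type msg_v2 = A2 of int | B2 of int * int    (* new type adds B of { a'; b } *)

(* Pack a list of small ints (0..255) into a string, one byte each. *)
let pack ints =
  String.concat "" (List.map (fun i -> String.make 1 (Char.chr i)) ints)

let write_v1 (A1 a) = pack [0; a]

let write_v2 = function
  | A2 a -> pack [0; a]
  | B2 (a, b) -> pack [1; a; b]

(* Old consumer: only knows constructor tag 0. *)
let read_v1 s =
  match Char.code s.[0] with
  | 0 -> Ok (A1 (Char.code s.[1]))
  | t -> Error (Printf.sprintf "unknown constructor tag %d" t)

(* New consumer: knows tags 0 and 1. *)
let read_v2 s =
  match Char.code s.[0] with
  | 0 -> Ok (A2 (Char.code s.[1]))
  | 1 -> Ok (B2 (Char.code s.[1], Char.code s.[2]))
  | t -> Error (Printf.sprintf "unknown constructor tag %d" t)

(* Old producer, new consumer: works. *)
let old_to_new = read_v2 (write_v1 (A1 7))

(* New producer, old consumer: rejected. *)
let new_to_old = read_v1 (write_v2 (B2 (7, 9)))
```

In this sketch old_to_new is Ok (A2 7), while new_to_old is an Error: the old
reader has no way to recover the field it does understand, because nothing on
the wire tells it where the unknown part ends.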

My design lifts that restriction and allows an old consumer to read the data
from a new producer when new fields have been added to a record or a tuple. 
It even allows a node to operate on data it doesn't understand completely
(e.g., when a new constructor is used): it can for instance update one
field it does know while leaving those it is unable to interpret (or doesn't
even know about!) unmodified. I think this is very important in many of the
scenarios where one would need an extensible binary protocol. Google's
Protocol Buffers support this; I'm not sure this is explicitly supported by
Facebook's Thrift compiler, but IIRC the protocol should allow it.

AFAICS the ability to process data not understood in full requires the use of
a self-describing format like the one I'm working on.
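
As a rough illustration of why a self-describing layout makes this possible,
here is a minimal tag-length-value sketch. It is not the format I'm working
on, nor the Protocol Buffers wire format; the field type, the one-byte tag and
length, the decimal-string payload for field a, and the function increment_a
are all assumptions chosen to keep the example short. Truncated input is not
handled.

```ocaml
(* Hypothetical TLV layout: tag byte, length byte, payload bytes.
   The explicit length lets a reader step over fields it does not know. *)
type field = { tag : int; payload : string }

(* Assumes tag and payload length both fit in one byte. *)
let encode_field buf { tag; payload } =
  Buffer.add_char buf (Char.chr tag);
  Buffer.add_char buf (Char.chr (String.length payload));
  Buffer.add_string buf payload

let encode fields =
  let buf = Buffer.create 16 in
  List.iter (encode_field buf) fields;
  Buffer.contents buf

(* Decode every field, known or not, without interpreting payloads. *)
let decode s =
  let rec go pos acc =
    if pos >= String.length s then List.rev acc
    else
      let tag = Char.code s.[pos] in
      let len = Char.code s.[pos + 1] in
      let payload = String.sub s (pos + 2) len in
      go (pos + 2 + len) ({ tag; payload } :: acc)
  in
  go 0 []

(* An "old" node that only understands tag 1 (field a, stored here as a
   decimal string). It updates a and passes every other field through. *)
let increment_a s =
  decode s
  |> List.map (fun f ->
         if f.tag = 1
         then { f with payload = string_of_int (int_of_string f.payload + 1) }
         else f)
  |> encode

(* A "new" producer emits field a (tag 1) plus a field b (tag 2) that the
   old node has never heard of. *)
let new_message =
  encode [ { tag = 1; payload = "41" }; { tag = 2; payload = "hello" } ]

let updated = increment_a new_message
```

Decoding updated yields field 1 with payload "42" and field 2 still carrying
"hello": the old node changed the field it knows and left the one it cannot
interpret untouched.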

-- 
Mauricio Fernandez  -   http://eigenclass.org

