From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: *** X-Spam-Status: No, score=3.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE, DNS_FROM_SECURITYSAGE,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id AA5D7BBAF for ; Sat, 25 Oct 2008 20:58:26 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjwEANoIA0nVBJXyY2dsb2JhbACBdZF7FwkKCBAFrESDTw X-IronPort-AV: E=Sophos;i="4.33,484,1220220000"; d="scan'208";a="19191010" Received: from discorde.inria.fr ([192.93.2.38]) by mail1-smtp-roc.national.inria.fr with ESMTP; 25 Oct 2008 20:58:26 +0200 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id m9PIwQNi029217 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Sat, 25 Oct 2008 20:58:26 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjwEANoIA0nVBJXyY2dsb2JhbACBdZF7FwkKCBAFrESDTw X-IronPort-AV: E=Sophos;i="4.33,484,1220220000"; d="scan'208";a="19191009" Received: from outmailhost.telefonica.net (HELO ctsmtpout3.frontal.correo) ([213.4.149.242]) by mail1-smtp-roc.national.inria.fr with ESMTP; 25 Oct 2008 20:58:25 +0200 Received: from NANA.localdomain (83.59.9.219) by ctsmtpout3.frontal.correo (7.3.135) (authenticated as ferferse$telefonica.net) id 490319AC00042903 for caml-list@inria.fr; Sat, 25 Oct 2008 20:58:25 +0200 Received: from mfp by NANA.localdomain with local (Exim 4.69) (envelope-from ) id 1KtoLA-0000UA-Be for caml-list@inria.fr; Sat, 25 Oct 2008 20:58:24 +0200 Date: Sat, 25 Oct 2008 20:58:24 +0200 From: Mauricio Fernandez To: caml-list@inria.fr Subject: Re: [Caml-list] Re: Serialisation of PXP DTDs Message-ID: <20081025185824.GG32611@NANA.localdomain> Mail-Followup-To: caml-list@inria.fr References: <20081023210527.GB32611@NANA.localdomain> <278256.11946.qm@web54605.mail.re2.yahoo.com> <20081023233657.GE32611@NANA.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.17+20080114 (2008-01-14) X-Miltered: at discorde with ID 49036C52.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; pxp:01 dtds:01 markus:01 mottl:01 mikkel:01 rgensen:01 mikkel:01 overkill:01 abstraction:01 encodings:01 constructors:01 constructors:01 variants:01 endian:01 trivial:01 On Fri, Oct 24, 2008 at 10:03:47AM -0400, Markus Mottl wrote: > On Fri, Oct 24, 2008 at 5:11 AM, Mikkel Fahnøe Jørgensen > wrote: > > I guess this discussion is an overkill for the problem at hand, but > > speaking of binary extensible protocols, have you looked at ASN.1? It > > is an abstraction over any number of encodings. At least one binary > > encoding has extension bits to allow future growth of object > > collections and similar. > > Note that it is perfectly safe to grow sum types with bin-prot. It > was designed that way intentionally. It's just not safe to reorder or > remove elements. Nobody needs to reorder elements, because it doesn't > make any operational difference in the program. Backward > compatibility of protocols you define necessarily requires the > presence of old constructors in sum types anyway so you may not want > to remove those in any case. There is hardly any harm from the > protocol perspective in leaving old constructors in there. > > Note, too, that polymorphic variants even allow reordering with > bin-prot. (...) > > Except for human-readability, I think bin-prot should scale very well > on the other requirements of serialization protocols once it has been > ported to architectures with unusual endianness (almost all machines > are little endian nowadays so hardly anybody on this list should be > affected). Unfortunately, growing sum types is far from being the only protocol extension of interest. There's a trivial extension which, I suspect, will be at least as common in practice, namely adding new fields to a record (or new elements to a tuple). bin-prot is unable to handle it adequately --- a self-describing format like the one I'm working on is required. You might argue that this extension is subsumed by the ability to grow sum types, since you can go from type record = { a : int } with bin_io type msg = A of record to type record1 = { a : int } with bin_io type record2 = { a' : int; b : int } with bin_io type msg = A of record1 | B of record2 (Note how special care has to be taken to tag the record --- "explicit tagging" in ASN.1 parlance.) However, this merely solves a part of a problem: that all serializations according to an old type belong to the possible serializations for an updated type, or, in other words, that new consumers be able to read data written by old producers. Even with the above encoding (not with any arbitrary type definition, but with a carefully constructed one), with bin-prot, this implies that producers not be updated before consumers. My design lifts that restriction and allows an old consumer to read the data from a new producer when new fields have been added to a record or a tuple. It even allows a node to operate on data it doesn't understand completely (e.g., when a new constructor is used): it can for instance update one field it does know while leaving those it is unable to interpret (or doesn't even know about!) unmodified. I think this is very important in many of the scenarios where one would need an extensible binary protocol. Google's Protocol Buffers support this; I'm not sure this is explicitly supported by Facebook's Thrift compiler, but IIRC the protocol should allow it. AFAICS the ability to process data not understood in full requires the use of a self-describing format like the one I'm working on. -- Mauricio Fernandez - http://eigenclass.org