From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: *** X-Spam-Status: No, score=3.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE, DNS_FROM_SECURITYSAGE,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 7A840BB84 for ; Sun, 26 Oct 2008 20:47:46 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArgDABNmBEnVBJXyXmdsb2JhbACBdZF7FwkKBhIFqx2DTw X-IronPort-AV: E=Sophos;i="4.33,489,1220220000"; d="scan'208";a="19217526" Received: from discorde.inria.fr ([192.93.2.38]) by mail1-smtp-roc.national.inria.fr with ESMTP; 26 Oct 2008 20:47:46 +0100 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id m9QJljQF021413 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Sun, 26 Oct 2008 20:47:46 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArgDABNmBEnVBJXyXmdsb2JhbACBdZF7FwkKBhIFqx2DTw X-IronPort-AV: E=Sophos;i="4.33,489,1220220000"; d="scan'208";a="19217525" Received: from outmailhost.telefonica.net (HELO ctsmtpout2.frontal.correo) ([213.4.149.242]) by mail1-smtp-roc.national.inria.fr with ESMTP; 26 Oct 2008 20:47:45 +0100 Received: from NANA.localdomain (83.59.9.219) by ctsmtpout2.frontal.correo (7.2.056.6) (authenticated as ferferse$telefonica.net) id 4901A98A0007B50B for caml-list@inria.fr; Sun, 26 Oct 2008 20:47:44 +0100 Received: from mfp by NANA.localdomain with local (Exim 4.69) (envelope-from ) id 1KuBaR-0004gF-Ne for caml-list@inria.fr; Sun, 26 Oct 2008 20:47:43 +0100 Date: Sun, 26 Oct 2008 20:47:43 +0100 From: Mauricio Fernandez To: caml-list@inria.fr Subject: Re: [Caml-list] Re: Serialisation of PXP DTDs Message-ID: <20081026194743.GK32611@NANA.localdomain> Mail-Followup-To: caml-list@inria.fr References: <20081023210527.GB32611@NANA.localdomain> <278256.11946.qm@web54605.mail.re2.yahoo.com> <20081023233657.GE32611@NANA.localdomain> <20081025185824.GG32611@NANA.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.17+20080114 (2008-01-14) X-Miltered: at discorde with ID 4904C961.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; pxp:01 dtds:01 markus:01 mottl:01 trivial:01 prepend:01 runtime:01 decoding:01 subsumed:01 foo:01 baz:01 subset:01 recompile:01 recompiled:01 nodes:01 On Sun, Oct 26, 2008 at 02:15:18PM -0400, Markus Mottl wrote: > On Sat, Oct 25, 2008 at 2:58 PM, Mauricio Fernandez wrote: > > Unfortunately, growing sum types is far from being the only protocol extension > > of interest. There's a trivial extension which, I suspect, will be at > > least as common in practice, namely adding new fields to a record (or new > > elements to a tuple). bin-prot is unable to handle it adequately --- a > > self-describing format like the one I'm working on is required. (...) > With records / tuples it is exactly the other way round: you could, in > principle, read both in the old implementation, which just needs to drop > new, unknown fields, whereas the new implementation requires these fields > and hence cannot parse old protocols. This (having old consumers ignore extra fields) is what bin-prot doesn't support because records/tuples aren't self-delimited. It can only be done at the outermost level if you prepend the length of the message, and breaks as soon as you have a nested record or tuple type. In my format, records and tuples are self-delimited, so this is supported trivially. Note that it is possible for a new implementation to read old protocols by specifying default values for missing fields. This basically amounts to turning newly added fields into generalized option types (the only diff being whether the 'a option -> 'a conversion is controlled at the level of the type definition or distributed throughout the code). New code has to cope with the possibility that the fields might be None, that's all. Old code never sees those fields and works unmodified. > I don't see how any approach could "hande" the respective unsolvable > case. If a receiver doesn't know how to handle a tag, or if it > requires data that is not there, you'll be stuck. The former case is indeed unsolvable if the reader is to operate with that field in a specific (not polymorphic) way. (It can still do things involving only other fields, though.) In the second case, however, the receiver has got the advantage of hindsight: it knows that the extra data might not be present, and the code can cope with that. > Note, too, that even if you created an implementation which allows > handling extended records in old protocols, this would undoubtly come > at a pretty hefty cost. The only efficient way to do that would be to > exchange protocols and generate code at runtime to translate quickly > between protocols. I don't think it's worth it. ? I haven't optimized the generated code yet, but I'm seeing only a 25% drop in decoding speed compared to Marshal in my preliminary tests. Extra fields aren't even decoded, just saved in encoded form and appended to the output when serializing again. > > You might argue that this extension is subsumed by the ability to grow sum types, > > since you can go from > > > > type record = { a : int } with bin_io > > type msg = A of record > > > > to > > > > type record1 = { a : int } with bin_io > > type record2 = { a' : int; b : int } with bin_io > > type msg = A of record1 | B of record2 > > > > (Note how special care has to be taken to tag the record --- "explicit > > tagging" in ASN.1 parlance.) > > This is surely a clean way to extend protocols without losing backward > compatibility. It's bothersome for the programmer (picture type msg = ... | F of record6 and record6 = { a''''' : int; b'''': int; c''': float; d'': foo; e': bar; f : baz), and arguably worse than extending the record directly, because, as you said above, the receiver will not know how to handle the "B" tag, even though it would be perfectly able to decode the subset of the record it understands. It's safe only in one direction (new code can read old data). > > My design lifts that restriction and allows an old consumer to read the data > > from a new producer when new fields have been added to a record or a tuple. > > I'd probably bet that simply putting a protocol translator in front of > some old application you don't want to / cannot recompile would be > about as efficient. It's not always a matter of not recompiling the application, but rather of not having recompiled it *yet*: in a system with multiple nodes, it is hard to migrate them all to the updated code atomically... Putting a protocol translator in front of the old code is just as hard as updating it: it also means that all exchanges have to stop while the protocol translators are put in place --- hardly any advantage over just migrating to updated code. > > AFAICS the ability to process data not understood in full requires the use of > > a self-describing format like the one I'm working on. > > I'd go for the protocol translator. Especially if two protocols share > a lot of structure, it should be trivial to define translations. > Another very reasonable approach, which does not diminish performance, > would be to exchange protocol versions. Assuming that one side is > always more recent than the other, they should be able to support old > protocols directly. Protocol negotiation is not always possible. Consider the case of data stored on disk (or on any dummy server that only knows about files, not protocols) and accessed directly without an intermediate translation layer. -- Mauricio Fernandez - http://eigenclass.org