Date: Thu, 23 Oct 2008 23:05:27 +0200
From: Mauricio Fernandez
To: caml-list@inria.fr
Subject: Re: [Caml-list] Re: Serialisation of PXP DTDs

On Thu, Oct 23, 2008 at 12:26:54PM -0700, Dario Teixeira wrote:
> > I mean, as long as types are as simples are pairs we will
> > probably write down the very same S-expression, but for more
> > complex types you hand up having to choose how to encode them
> > in S-expressions. Such design choices can need to be changed
> > in the future as more types will be supported. I fail to see
> > why the future-proofness of such choices
> > should be better than that of bin-prot.
>
> Hi,
>
> Well, there's several types of "future-proofness". If in the far-future I
> was faced with the task of reverse-engineering and deserialising a structure
> about whose contents I only had a rough idea, then a human-readable
> text-format like that of S-expressions would simplify things enormously. On
> a more down-to-earth scenario, bear in mind that S-expressions offer
> forward-compatibility as long as you are only adding to a structure.
>
> For example, suppose I have a type foobar_t with two
> constructors:
>
> type foobar_t = One | Two
>
> If later on I add a third constructor "Three" to this type,
> the deserialiser for the new version can still read S-expressions
> written with the serialiser for the old version.

I have been working for a while on a self-describing, compact, extensible
binary protocol, along with an OCaml implementation which I intend to release
before too long.

It differs from sexplib and bin-prot in two main ways:
* the data model is deliberately more limited, as the format is meant to be
  de/encodable in multiple languages.
* it is extensible at several levels, achieving both forward and backward
  compatibility across changes in the data type

You can think of it as an extensible Protocol Buffers[1] with a richer data
model (albeit not in 1:1 accordance with OCaml's, for the above-mentioned
reason).

On the criteria you gave in another message, namely
(1) ease of use (2) "future-proofness" (3) portability (4) human-readability,
it does fairly well on the first three, especially (2) and (3), which were
poorly supported by existing solutions (I looked into bin-prot, sexplib,
Google's Protocol Buffers, Thrift and XDR; I also referred to IIOP and ITU-T
X.690 DER during the design).

Being a binary format, it obviously doesn't do that well at (4), but it is
possible to get a human-readable dump of the binary data even in the absence
of the interface definition, making reverse-engineering no harder than with
sexplib (and arguably easier in some ways).

For example, here's a bogus message definition to illustrate (2) and (4).
This protocol definition is fed to the compiler, which generates the OCaml
type definitions, as well as the encoders/decoders and pretty-printers (as you
can see, the specification uses a mix of OCaml, Haskell and C++ syntax, but
it's pretty clear IMO):

    type sum_type 'a 'b 'c = A 'a | B 'b | C 'c

    message complex_rtt =
        A { a1 : [(int * [|bool|])]; a2 : [ sum_type ] }
      | B { b1 : bool; b2 : (string * [int]) }

The protocol is extensible in the sense that you can add new constructors to a
sum or message type, add new elements to a tuple, and replace any primitive
type by a sum type including the original type.
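
To illustrate the tuple case, here is a minimal OCaml sketch of one way a
reader built for the original b2 : (string * [int]) could keep working after a
newer writer appends an element to the tuple. It assumes a much simplified
self-describing value type; value, Vint, Vstring, Vlist, Vtuple, decode_int and
decode_b2 are made-up names for illustration only, not the actual
implementation or wire format.

```ocaml
(* Hypothetical, simplified model of a self-describing value.
   These constructors are assumptions made for this sketch. *)
type value =
  | Vint of int
  | Vstring of string
  | Vlist of value list    (* variable-length sequence *)
  | Vtuple of value list   (* fixed-position fields *)

let decode_int = function
  | Vint n -> n
  | _ -> failwith "decode_int: expected an integer"

(* Reader written against the original b2 : (string * [int]).
   It consumes the two fields it knows about and ignores any
   trailing elements appended by a newer writer. *)
let decode_b2 = function
  | Vtuple (Vstring s :: Vlist xs :: _newer_fields) ->
      (s, List.map decode_int xs)
  | _ -> failwith "decode_b2: malformed tuple"

(* Data as an old writer and as a newer writer might produce it. *)
let old_data = Vtuple [ Vstring "x"; Vlist [ Vint 1; Vint 2 ] ]
let new_data = Vtuple [ Vstring "x"; Vlist [ Vint 1; Vint 2 ]; Vint 7 ]
```

Under these assumptions decode_b2 gives the same result for old_data and
new_data, because the self-describing encoding lets the old reader step over
the element it does not know about.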
For instance, if at some point in time we find that the b1 field should have a
different type, we can do

    type bool_or_something 'a = Orig unboxed_bool | New_constructor 'a

and then

    ... | B { b1 : bool_or_something; ... }

This, along with a way to specify default values, allows both forward and
backward compatibility.

The compiler generates a pretty printer for these structures, useful for
debugging. Here's a message generated randomly:

    {
      Complex_rtt.a1 =
        [ ((-5378), [| false; false; false; true; true |]);
          (3942717140522000971, [| false; true; true; true; false |]);
          ((-6535386320450295), [| false |]);
          ((-238860767206), [| |]);
          (1810196202, [| false; false; true; true |]) ];
      Complex_rtt.a2 =
        [ Sum_type.A (-13830); Sum_type.A 369334576; Sum_type.A 83;
          Sum_type.A (-3746796577167465774); Sum_type.A (-1602586945) ]
    }

Now, this is the information decoded in the absence of the above definitions
(iow., what you'd have to work with if you were reverse-engineering the
protocol):

    T0 {
      T0 [
        T0 { Vint_t0 (-5378);
             T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1)]};
        T0 { Vint_t0 3942717140522000971;
             T0 [ Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1); Vint_t0 (-1); Vint_t0 0]};
        T0 { Vint_t0 (-6535386320450295); T0 [ Vint_t0 0]};
        T0 { Vint_t0 (-238860767206); T0 [ ]};
        T0 { Vint_t0 1810196202;
             T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1)]}];
      T0 [
        T0 { Vint_t0 (-13830)};
        T0 { Vint_t0 369334576};
        T0 { Vint_t0 83};
        T0 { Vint_t0 (-3746796577167465774)};
        T0 { Vint_t0 (-1602586945)}]}

(I'm still changing some details so it might look better than this shortly.)

It's not a drop-in solution like sexplib's "with sexp", by design (since it is
meant to allow interoperability between different languages), but it's still
fairly easy to use. If you're interested in this, tell me and I'll let you
know when it's ready for serious usage.

[1] http://code.google.com/p/protobuf/

-- 
Mauricio Fernandez - http://eigenclass.org