From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,DNS_FROM_SECURITYSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id 10BAEBBAF for ; Fri, 24 Oct 2008 00:17:59 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvwAADSVAEnU436xkWdsb2JhbACBdpF5AQEBAQkLCgcRA65gAYNN X-IronPort-AV: E=Sophos;i="4.33,473,1220220000"; d="scan'208";a="16419239" Received: from discorde.inria.fr ([192.93.2.38]) by mail2-smtp-roc.national.inria.fr with ESMTP; 24 Oct 2008 00:17:58 +0200 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id m9NMHw1G009669 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Fri, 24 Oct 2008 00:17:58 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvwAADSVAEnU436xkWdsb2JhbACBdpF5AQEBAQkLCgcRA65gAYNN X-IronPort-AV: E=Sophos;i="4.33,473,1220220000"; d="scan'208";a="16419238" Received: from moutng.kundenserver.de ([212.227.126.177]) by mail2-smtp-roc.national.inria.fr with ESMTP; 24 Oct 2008 00:17:58 +0200 Received: from gate.lan.gerd-stolpmann.de (dslb-088-068-193-126.pools.arcor-ip.net [88.68.193.126]) by mrelayeu.kundenserver.de (node=mrelayeu3) with ESMTP (Nemesis) id 0MKxQS-1Kt8VB2Hat-0007kX; Fri, 24 Oct 2008 00:17:57 +0200 Received: from [192.168.0.32] (fw.lan.gerd-stolpmann.de [192.168.1.1]) by gate.lan.gerd-stolpmann.de (Postfix) with ESMTP id 2A8D1C077; Fri, 24 Oct 2008 00:17:57 +0200 (CEST) Subject: Re: [Caml-list] Re: Serialisation of PXP DTDs From: Gerd Stolpmann To: Mauricio Fernandez Cc: caml-list@inria.fr In-Reply-To: <20081023210527.GB32611@NANA.localdomain> References: <20081023163738.GA27707@usha.takhisis.invalid> <561320.93277.qm@web54606.mail.re2.yahoo.com> <20081023210527.GB32611@NANA.localdomain> Content-Type: text/plain Date: Fri, 24 Oct 2008 00:18:50 +0200 Message-Id: <1224800330.7340.65.camel@flake.lan.gerd-stolpmann.de> Mime-Version: 1.0 X-Mailer: Evolution 2.12.1 Content-Transfer-Encoding: 7bit X-Provags-ID: V01U2FsdGVkX18D6adwb0F0slOqQrRBbB825Zz5YC1qgWtbsqN oT9uwrX0cPO0N/GYg69icNvBH0ExunsSiC4G8VtQcvnkZbbM4R BPuCfUWwCj1T9J1Qk9EYVexbnyc1y9a X-Miltered: at discorde with ID 4900F816.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; pxp:01 dtds:01 gerd:01 stolpmann:01 donnerstag:01 ocaml:01 model:01 model:01 ocaml's:01 bindings:01 ocaml:01 gerd:01 compiler:01 haskell:01 syntax:01 Am Donnerstag, den 23.10.2008, 23:05 +0200 schrieb Mauricio Fernandez: > I have been working for a while on a self-describing, compact, extensible > binary protocol, along with an OCaml implementation which I intent to release > in not too long. > > It differs from sexplib and that bin-prot in two main ways: > * the data model is deliberately more limited, as the format is meant to be > de/encodable in multiple languages. > * it is extensible at several levels, achieving both forward and backward > compatibility across changes in the data type > > You can think of it as an extensible Protocol Buffers[1] with a richer data > model (albeit not in 1:1 accordance with OCaml's for the above mentioned > reason). Have you looked at ICEP (see zeroc.com)? It has bindings for many languages, even for Ocaml (http://oss.wink.com/hydro/). It is, however, not self-describing. Anyway, you may find there ideas for portability. Gerd > In the criteria you gave in another message, namely > (1) ease of use > (2) "future-proofness" > (3) portability > (4) human-readability, > > it does fairly well at the 3 first ones --- especially at (2) and (3), which > were poorly supported by existing solutions (I looked into bin-prot, sexplib, > Google's Protocol Buffers, Thrift and XDR; I also referred to IIOP and ITU-T > X.690 DER during the design). Being a binary format, it obviously doesn't do > that well at (4), but it is possible to get a human-readable dump of the > binary data even in the absence of the interface definition, making > reverse-engineering no harder than sexplib (and arguably easier in some ways). > > For example, here's a bogus message definition to illustrate (2) and (4). > This protocol definition is fed to the compiler, which generates the OCaml > type definitions, as well as the encoders/decoders and pretty-printers (as you > can see, the specification uses a mix of OCaml, Haskell and C++ syntax, but > it's pretty clear IMO) > > type sum_type 'a 'b 'c = A 'a | B 'b | C 'c > > message complex_rtt = > A { > a1 : [(int * [|bool|])]; > a2 : [ sum_type ] > } > | B { > b1 : bool; > b2 : (string * [int]) > } > > The protocol is extensible in the sense that you can add new constructors to a > sum or message type, add new elements to a tuple, and replace any primitive > type by a sum type including the original type. For instance, if at some point > in time we find that the b1 field should have a different type, we can do > > type bool_or_something 'a = Orig unboxed_bool | New_constructor 'a > > and then > ... > | B { b1 : bool_or_something; ... } > > This, along with a way to specify default values, allows both forward and > backward compatibility. > > The compiler generates a pretty printer for these structures, useful for > debugging. Here's a message generated randomly: > > { > Complex_rtt.a1 = > [ ((-5378), [| false; false; false; true; true |]); > (3942717140522000971, [| false; true; true; true; false |]); > ((-6535386320450295), [| false |]); ((-238860767206), [| |]); > (1810196202, [| false; false; true; true |]) ]; > Complex_rtt.a2 = > [ Sum_type.A (-13830); Sum_type.A 369334576; Sum_type.A 83; > Sum_type.A (-3746796577167465774); Sum_type.A (-1602586945) ] } > > Now, this is the information decoded in the absence of the above definitions > (iow., what you'd have to work with if you were reverse-engineering the > protocol): > > T0 { > T0 [ > T0 { Vint_t0 (-5378); > T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 0; Vint_t0 (-1); > Vint_t0 (-1)]}; > T0 { Vint_t0 3942717140522000971; > T0 [ Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1); Vint_t0 (-1); > Vint_t0 0]}; > T0 { Vint_t0 (-6535386320450295); T0 [ Vint_t0 0]}; > T0 { Vint_t0 (-238860767206); T0 [ ]}; > T0 { Vint_t0 1810196202; > T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1)]}]; > T0 [ T0 { Vint_t0 (-13830)}; T0 { Vint_t0 369334576}; T0 { Vint_t0 83}; > T0 { Vint_t0 (-3746796577167465774)}; T0 { Vint_t0 (-1602586945)}]} > > (I'm still changing some details so it might look better than this shortly.) > > It's not a drop-in solution like sexplib's "with sexp", by design (since it is > meant to allow interoperability between different languages), but it's still > fairly easy to use. > > If you're interested in this, tell me and I'll let you know when it's ready for > serious usage. > > [1] http://code.google.com/p/protobuf/ > -- ------------------------------------------------------------ Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------