Long-term storage of values

caml-list - the Caml user's mailing list

* Long-term storage of values
@ 2008-02-28 18:41 Dario Teixeira
  2008-02-28 20:01 ` [Caml-list] " David MENTRE
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Dario Teixeira @ 2008-02-28 18:41 UTC (permalink / raw)
  To: caml-list

Hi,

Suppose I have a value of type Story.t, fairly complex in its definition.
I wish to store this value in a DB (like Postgresql) for posterity.
At the moment, I am storing in the DB the marshalled representation
of the data; whenever I need to use it again in the Ocaml programme
I simply fetch it from the DB and unmarshal it.

This works fine; there is however one nagging problem: the marshalled
representation is brittle.  If Story.t changes even slightly, I will
no longer be able to retrieve values marshalled with the old version.
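
A minimal sketch of the setup described above, assuming a hypothetical
record type `story` in place of Story.t (whose real definition is not shown
in this thread).  It uses only the standard Marshal module and shows why
the format is brittle: nothing in the stored bytes records the type.

```ocaml
(* Hypothetical stand-in for Story.t; the real definition is not shown. *)
type story = {
  title : string;
  paragraphs : string list;
}

(* Serialize to a byte string suitable for a binary column in the DB. *)
let to_blob (s : story) : string = Marshal.to_string s []

(* Marshal.from_string is not type-checked: the annotation below is a
   promise made by the programmer, not something the runtime verifies.
   If the definition of [story] changes, old blobs are silently
   misinterpreted (or crash the program). *)
let of_blob (blob : string) : story = Marshal.from_string blob 0

let () =
  let original = { title = "Draft"; paragraphs = [ "one"; "two" ] } in
  let restored = of_blob (to_blob original) in
  assert (restored = original)
```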

Note that changes to Story.t are likely to be small and will not
invalidate backwards compatibility (e.g., here or there another
variant might be added, but not removed or renamed).

I am therefore looking for an alternative to marshalling that a) does
not suffer from the brittleness problem, and b) is fast.  At the
moment, the best thing that occurs to me is to convert the data into
XML and to store it as such in the DB (the data is easily converted
into XML).

This strikes me as a problem bound to be common.  How do you guys typically
solve this sort of situation?  In addition, if the XML route is indeed
the best one, what Ocaml tools would you recommend?  (Again, this is
for an application where speed is of the essence).

Thanks in advance for your input!
Cheers,
Dario Teixeira



* Re: [Caml-list] Long-term storage of values
  2008-02-28 18:41 Long-term storage of values Dario Teixeira
@ 2008-02-28 20:01 ` David MENTRE
  2008-02-28 20:01 ` Thomas Fischbacher
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: David MENTRE @ 2008-02-28 20:01 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

Hello Dario,

2008/2/28 Dario Teixeira <darioteixeira@yahoo.com>:
>  I am therefore looking for an alternative to marshalling that a) does
>  not suffer from the brittleness problem, and b) is fast.  At the
>  moment, the best thing that occurs to me is to convert the data into
>  XML and to store it as such in the DB (the data is easily converted
>  into XML).

I asked a similar question once. One of the answers was to use Sexplib,
to store your values as Lisp S-expressions:
http://www.ocaml.info/home/ocaml_sources.html#toc11

You could also use Config_file: http://home.gna.org/cameleon/configfile.en.html
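
To illustrate why an S-expression encoding is less brittle than raw
marshalling, here is a small hand-written sketch.  It does not use
Sexplib's API; the `sexp` type, the `block` type and all function names
are assumptions made up for the example.

```ocaml
(* A tiny S-expression type written by hand; this is NOT Sexplib's API. *)
type sexp = Atom of string | List of sexp list

(* Hypothetical variant type standing in for part of Story.t. *)
type block = Text of string | Quote of string

let sexp_of_block = function
  | Text s -> List [ Atom "Text"; Atom s ]
  | Quote s -> List [ Atom "Quote"; Atom s ]

(* Decoding matches on constructor names, so data written before a new
   variant was added still decodes after the type grows. *)
let block_of_sexp = function
  | List [ Atom "Text"; Atom s ] -> Some (Text s)
  | List [ Atom "Quote"; Atom s ] -> Some (Quote s)
  | _ -> None

(* Display only: atoms are printed without quoting or escaping. *)
let rec to_string = function
  | Atom a -> a
  | List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"
```

Because the stored form names each constructor, adding a third variant to
`block` later leaves previously written values readable.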

Yours,
d.



* Re: [Caml-list] Long-term storage of values
  2008-02-28 18:41 Long-term storage of values Dario Teixeira
  2008-02-28 20:01 ` [Caml-list] " David MENTRE
@ 2008-02-28 20:01 ` Thomas Fischbacher
  2008-02-28 20:05 ` Mathias Kende
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Thomas Fischbacher @ 2008-02-28 20:01 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

Dario Teixeira wrote:

> I am therefore looking for an alternative to marshalling that a) does
> not suffer from the brittleness problem, and b) is fast.  At the
> moment, the best thing that occurs to me is to convert the data into
> XML and to store it as such in the DB (the data is easily converted
> into XML).
> 
> This strikes me as a problem bound to be common.  How do you guys typically
> solve this sort of situation?

By using lisp.

-- 
best regards,
Thomas Fischbacher
tf@functionality.de



* Re: [Caml-list] Long-term storage of values
  2008-02-28 18:41 Long-term storage of values Dario Teixeira
  2008-02-28 20:01 ` [Caml-list] " David MENTRE
  2008-02-28 20:01 ` Thomas Fischbacher
@ 2008-02-28 20:05 ` Mathias Kende
  2008-02-28 22:09 ` Basile STARYNKEVITCH
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Mathias Kende @ 2008-02-28 20:05 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list


On Thursday 28 February 2008 at 18:41 +0000, Dario Teixeira wrote:
> This strikes me as a problem bound to be common.  How do you guys typically
> solve this sort of situation?

I use a "magic" value at the beginning of the marshalled DB to
distinguish between different versions of the type, and I write the code
to convert from one version of the type to the next.  The result is as fast
as simple marshalling except in the few cases where all the data must be
converted, but I have to write the conversion code by hand for each
new version (probably no worse than what is needed to convert it to
XML).
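
A minimal sketch of this scheme, under assumptions: the two record types,
the one-byte version tag and the function names are all hypothetical, and
only the standard Marshal module is used.

```ocaml
(* Hypothetical old and new versions of the stored type. *)
type story_v1 = { title_v1 : string }
type story_v2 = { title : string; tags : string list }

(* Hand-written conversion from the previous version. *)
let v2_of_v1 (old : story_v1) : story_v2 =
  { title = old.title_v1; tags = [] }

(* The first byte is the "magic" version tag; the payload follows it. *)
let save (s : story_v2) : string = "\002" ^ Marshal.to_string s []

(* Dispatch on the tag, converting old data on the fly.
   Raises Invalid_argument on an empty blob. *)
let load (blob : string) : story_v2 =
  match blob.[0] with
  | '\002' -> (Marshal.from_string blob 1 : story_v2)
  | '\001' -> v2_of_v1 (Marshal.from_string blob 1 : story_v1)
  | c -> failwith (Printf.sprintf "unknown format version %d" (Char.code c))
```

Current-version data pays only for a one-byte check; only old blobs go
through the conversion function.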

Mathias


* Re: [Caml-list] Long-term storage of values
  2008-02-28 18:41 Long-term storage of values Dario Teixeira
                   ` (2 preceding siblings ...)
  2008-02-28 20:05 ` Mathias Kende
@ 2008-02-28 22:09 ` Basile STARYNKEVITCH
  2008-02-29 14:45   ` Martin Jambon
  2008-02-28 23:42 ` Erik de Castro Lopo
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Basile STARYNKEVITCH @ 2008-02-28 22:09 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

Dario Teixeira wrote:
> Hi,
> 
> Suppose I have a value of type Story.t, fairly complex in its definition.
> I wish to store this value in a DB (like Postgresql) for posterity.
> At the moment, I am storing in the DB the marshalled representation
> of the data; whenever I need to use it again in the Ocaml programme
> I simply fetch it from the DB and unmarshal it.
> 
> This works fine; there is however one nagging problem: the marshalled
> representation is brittle.  If Story.t changes even slightly, I will
> no longer be able to retrieve values marshalled with the old version.

It is even theoretically a very difficult problem. There have been some 
publications by Cristal, Moscova, Gallium people at INRIA.

Assuming you have no abstract types, no objects, and no closures, and no 
polymorphisms i.e. that there is a *.ml source file containing all the 
type definitions. Then the types are composed by base types (int, 
string, ...), sums, records, and perhaps arrays.

Then you could consider what are the deltas on the type definition.

In a sum like type sumt = A of t1 | B of t2 | C
you might consider what happens when you remove, let's say, B, or add, let's
say, a choice | D of t3

In a record, consider likewise what happens when adding or removing a field.

Etc....

The details are very complex (at least to me, who tried unsuccessfully to
work on this during my year at INRIA).

Maybe a semi-hand-crafted generator could help. Adding polymorphisms, 
closures, objects, abstract types is a big mess.

Good luck to you. Try to publish something.

Regards

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


* Re: [Caml-list] Long-term storage of values
  2008-02-28 18:41 Long-term storage of values Dario Teixeira
                   ` (3 preceding siblings ...)
  2008-02-28 22:09 ` Basile STARYNKEVITCH
@ 2008-02-28 23:42 ` Erik de Castro Lopo
  2008-02-29  1:14 ` Brian Hurt
  2008-03-20 21:03 ` Dario Teixeira
  6 siblings, 0 replies; 30+ messages in thread
From: Erik de Castro Lopo @ 2008-02-28 23:42 UTC (permalink / raw)
  To: caml-list

Dario Teixeira wrote:

> This works fine; there is however one nagging problem: the marshalled
> representation is brittle.  If Story.t changes even slightly, I will
> no longer be able to retrieve values marshalled with the old version.

Is XDR a solution? It's part of libocamlnet-ocaml.

It's come up on this mailing list a couple of times and I have a little
snippet of code if you want it.

Cheers,
Erik
-- 
-----------------------------------------------------------------
Erik de Castro Lopo
-----------------------------------------------------------------
"The RIAA is obsessed to the point of comedy with the frustration
of having its rules broken, without considering whether such rules
might be standing in the way of increased revenues. Indeed,
Napster and Gnutella may turn out to be the two best music-marketing
gimmicks yet devised, if only the RIAA would take its head out of
its ass long enough to realise it."
-- Thomas C Greene on www.theregister.co.uk



* Re: [Caml-list] Long-term storage of values
  2008-02-28 18:41 Long-term storage of values Dario Teixeira
                   ` (4 preceding siblings ...)
  2008-02-28 23:42 ` Erik de Castro Lopo
@ 2008-02-29  1:14 ` Brian Hurt
  2008-02-29  7:40   ` Gabriel Kerneis
  2008-03-01 14:15   ` Dario Teixeira
  2008-03-20 21:03 ` Dario Teixeira
  6 siblings, 2 replies; 30+ messages in thread
From: Brian Hurt @ 2008-02-29  1:14 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

On Thu, 28 Feb 2008, Dario Teixeira wrote:

> Hi,
>
> Suppose I have a value of type Story.t, fairly complex in its definition.
> I wish to store this value in a DB (like Postgresql) for posterity.
> At the moment, I am storing in the DB the marshalled representation
> of the data; whenever I need to use it again in the Ocaml programme
> I simply fetch it from the DB and unmarshal it.

The following is just my opinion, not that of my employer.

You're making two mistakes.

Mistake #1: treating a database as a dumb object store.  This is a really
popular idea right now: Hibernate does this, as does Ruby on Rails, and a
number of other ORM packages effectively take this approach.  On the other
hand, dynamically typed languages are also really popular.

A database is an incredibly powerful tool, used correctly.  Used
correctly, databases let you handle huge amounts of data shared between
multiple different clients with great flexibility and good performance.
Used incorrectly, they tend to be bloated, slow pigs.  There are a lot of
things databases aren't good at: multidimensional data, for example, or
recursive ("tree-structured") data.  Databases have some significant
limitations.  Every single element in a relation (aka table) has to be
exactly the same type: no superclasses, no variant types.  Worse yet, SQL
isn't even Turing-complete.  It's the world's oldest, most popular DSL.

So "used correctly" is tricky to define, because relational databases are
a paradigm, not unlike functional programming or object-oriented
programming.  But the trick is that you're designing, and coding, to the
database, and you can't hide that or ignore it.  Some things are easy:
databases are really good at filtering, joining, some simple mapping and
aggregation.  The first few "levels" of data handling should be done in
SQL in the database; you should never be sucking whole tables down.  If
you do try to hide the essential nature of the database, you'll run right
into the meat grinder of its limitations.  Used correctly, you get the
advantages and avoid the disadvantages.

So, mistake number one: either use the database, and structure your data (at
that layer) to take advantage of it, or don't use a database.

Mistake number two: file formats (and this includes marshalled data
structures) are wire protocols, and need to be designed to be as abstract
as possible, to reveal as little about the internal structure of the
program as possible (preferably none at all).

This is an idea that gets reinvented time after time, and it always ends
in tears and recriminations: have some magic protocol that allows programs
to communicate directly.  Just have program X call a function or pass an
object to program Y directly, and have the protocol handle all the mucking
about with serializing/deserializing data, converting function calls into
request/response messages, etc.  Sun RPC, COM, CORBA, OLE, XML-RPC,
and SOAP are the implementations that spring to mind.  Object
serialization hits the exact same problem: it doesn't matter whether
program X and Y are communicating via TCP/IP sockets, files, or
quantum-tachyon entanglement.

Sooner or later (and generally sooner), it'll happen: program X will ask
for some function, or pass some type of data, that program Y doesn't have
any knowledge of.  It may be because X is a newer version of the
program/protocol, and the function/data type has been added.  It may be
because X is an older version, and the function/data type has since been
removed.  In any case, the first time this happens is when the tears and
recriminations start.

Versioning simply makes it more painfully obvious that you're shackled to
the past.  You want to get rid of that pesky function?  You can't, because
older versions of the protocol require it to be there.  Don't need a piece
of data anymore?  Tough, older versions of the protocol still require it.
The best thing versioning gives you is the ability to error out early, and
give a more sensible error message ("Sorry, but protocol support >= 2.14
is required!"), but it doesn't solve the problem.

The best solution I've found is to be aware that, when you're 
communicating with the outside world, you're implementing a *protocol*. 
And that protocol should be, as I said, as abstract as possible and reveal 
as little about the structure of the program as possible.  So I can change 
the program enormously, even reimplement it from scratch in a different 
language, without great difficulty.  Consider SMTP, HTTP, and YAML as 
examples of protocols or generic file formats done right.

Note that you can do protocol design, and then implement it in CORBA or
XML.  A sure sign that you've done this is the existence of a "translation
layer": comments like "OK, now we translate the XML data structure into
our internal data structure" and suchlike.  The successful projects I've
seen that used these technologies did this (or got lucky and grew into
this).
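
As an illustration of such a translation layer, here is a small sketch.
The line-oriented "key: value" text format, the `story` record and the
function names are assumptions chosen for the example, not anything
proposed in this thread.

```ocaml
(* Hypothetical internal representation; free to change between releases. *)
type story = { title : string; author : string option }

(* Internal -> external.  Values containing newlines are not supported. *)
let to_wire (s : story) : string =
  let author =
    match s.author with
    | Some a -> [ "author: " ^ a ]
    | None -> []
  in
  String.concat "\n" (("title: " ^ s.title) :: author)

(* External -> internal.  Unknown keys are skipped and a missing optional
   key becomes None, so old and new writers can both be read. *)
let of_wire (text : string) : story option =
  let fields =
    String.split_on_char '\n' text
    |> List.filter_map (fun line ->
           match String.index_opt line ':' with
           | Some i ->
               let key = String.trim (String.sub line 0 i) in
               let rest = String.length line - i - 1 in
               Some (key, String.trim (String.sub line (i + 1) rest))
           | None -> None)
  in
  match List.assoc_opt "title" fields with
  | Some title -> Some { title; author = List.assoc_opt "author" fields }
  | None -> None
```

The stored text says nothing about how `story` is laid out in memory, so
the record can be reorganized as long as the two translation functions are
kept up to date.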

So that's mistake number two: you're communicating between different 
versions of the program with an ill-defined (at best) and not 
generic protocol/file format.

Fix these two problems, and I'm willing to bet most of the rest of the 
problems go away too.

Brian


* Re: [Caml-list] Long-term storage of values
  2008-02-29  1:14 ` Brian Hurt
@ 2008-02-29  7:40   ` Gabriel Kerneis
  2008-02-29 10:19     ` Berke Durak
  2008-02-29 11:44     ` Richard Jones
  2008-03-01 14:15   ` Dario Teixeira
  1 sibling, 2 replies; 30+ messages in thread
From: Gabriel Kerneis @ 2008-02-29  7:40 UTC (permalink / raw)
  To: caml-list


Hello,

On Thu, 28 Feb 2008 20:14:27 -0500 (EST), Brian Hurt <bhurt@spnz.org>
wrote:
> So, mistake number one: either use the database, and structure your data
> (at that layer) to take advantage of it, or don't use a database.
> [...]
> So that's mistake number two: you're communicating between different 
> versions of the program with an ill-defined (at best) and not 
> generic protocol/file format.

Right. But imagine you're communicating with yourself (saving and
restoring data). And you need to retrieve the data *efficiently*.
Converting from a generic file format is not efficient - not if you have
to retrieve the data 100 times per second (imagine a CMS on a popular
website). It would be faster to use an internal file format. And, of
course, keeping a backup in a generic file format.

But now, here is the big deal: when your internal data structure
changes (and it might not even be under your control, imagine you're
using a third-party library), you have to convert your generic backup
to the newer internal format. And if you have, say, an awful lot of
backups, it might take soooo long... Of course, it's only once in a
while, but when this happens, how do you deal with it? Remember the
efficiency is the key point, here.

I don't think there is a generic solution to this problem. But I'm just
pointing out the underlying requirements, in case someone has one.

Regards,
-- 
Gabriel Kerneis


* Re: [Caml-list] Long-term storage of values
  2008-02-29  7:40   ` Gabriel Kerneis
@ 2008-02-29 10:19     ` Berke Durak
  2008-02-29 18:05       ` Markus Mottl
  2008-02-29 11:44     ` Richard Jones
  1 sibling, 1 reply; 30+ messages in thread
From: Berke Durak @ 2008-02-29 10:19 UTC (permalink / raw)
  To: Gabriel Kerneis, Caml-list List

Gabriel Kerneis wrote:
> Hello,
> 
> On Thu, 28 Feb 2008 20:14:27 -0500 (EST), Brian Hurt <bhurt@spnz.org>
> wrote:
>> So, mistake number one: either use the database, and structure your data
>> (at that layer) to take advantage of it, or don't use a database.
>> [...]
>> So that's mistake number two: you're communicating between different 
>> versions of the program with an ill-defined (at best) and not 
>> generic protocol/file format.
> 
> Right. But imagine you're communicating with yourself (saving and
> restoring data). And you need to retrieve the data *efficiently*.
> Converting from a generic file format is not efficient - not if you have
> to retrieve the data 100 times per second (imagine a CMS on a popular
> website). It would be faster to use an internal file format. And, of
> course, keeping a backup in a generic file format.
> 
> But now, here is the big deal: when your internal data structure
> changes (and it might not even be under your control, imagine you're
> using a third-party library), you have to convert your generic backup
> to the newer internal format. And if you have, say, an awful lot of
> backups, it might take soooo long... Of course, it's only once in a
> while, but when this happens, how do you deal with it? Remember the
> efficiency is the key point, here.
> 
> I don't think there is a generic solution to this problem. But I'm just
> pointing out the underlying requirements, in case someone has one.
> 
> Regards,

I had exactly those kinds of issues during my work at the EDOS project.
We were basically processing the metadata for every day of every component
of every architecture of every Debian distribution.

I started using SQL.  I first tried MySQL, then PostgreSQL.
Fill performance was bad.  (And it didn't get fast enough when Jaap added
prepared statements for PostgreSQL.)
Query performance was worse, because we were interested in things like
the transitive dependency closure, and SQL is well known to be unsuitable
for such transitive properties.  Maybe I should have tried SQLite, but I
didn't know it at the time.

I then switched to marshalling.  Boy, that was fast!  Of course I got
bitten very hard by segmentation faults...  sorry, Xavier, for bothering
you about such stupidity.  And yes, I had a version number in my datastructures
but, no, it was not automatically updated.

Also, adding a field was really, really painful.  Marshalling could work when
your data structures are stable, but during development, it is painful,
especially if you can't use a toy data set for development.

I then built an I/O combinator library.  Of course, combinators already
exist for defining parsers and printers, but I thought that it would be
great to combine them both.  (Such an idea has been presented at POPL
under the term of "picklers" if I recall correctly.)

Basically you describe your types using "literates", which can then be used
both for reading and for writing, as in

   let my_literate = io_pair (io_list io_int) (io_hashtbl io_int io_string)
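
The combinator idea can be sketched in a few lines.  This is a
hand-written illustration and not the API of the IO module: the record
type `'a io`, the text encoding and the helpers `encode`/`decode` are
assumptions, and only a few combinators are shown.

```ocaml
(* A pickler pairs a writer with a reader for the same type. *)
type 'a io = {
  write : Buffer.t -> 'a -> unit;
  read : Scanf.Scanning.in_channel -> 'a;
}

let io_int = {
  write = (fun b n -> Printf.bprintf b "%d " n);
  read = (fun ic -> Scanf.bscanf ic " %d" (fun n -> n));
}

(* %S writes and reads an OCaml-style quoted, escaped string. *)
let io_string = {
  write = (fun b s -> Printf.bprintf b "%S " s);
  read = (fun ic -> Scanf.bscanf ic " %S" (fun s -> s));
}

let io_pair a b = {
  write = (fun buf (x, y) -> a.write buf x; b.write buf y);
  read = (fun ic -> let x = a.read ic in let y = b.read ic in (x, y));
}

(* A list is stored as its length followed by its elements. *)
let io_list a = {
  write = (fun buf l ->
    io_int.write buf (List.length l);
    List.iter (a.write buf) l);
  read = (fun ic ->
    let rec loop i acc =
      if i = 0 then List.rev acc else loop (i - 1) (a.read ic :: acc)
    in
    loop (io_int.read ic) []);
}

let encode io v =
  let b = Buffer.create 64 in
  io.write b v;
  Buffer.contents b

let decode io s = io.read (Scanf.Scanning.from_string s)

(* One description serves for both directions. *)
let my_io = io_pair (io_list io_int) io_string
```

Since reader and writer are built together, they cannot drift apart, which
is the main attraction over separate hand-written printers and parsers.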

Anyway, that led to the "IO" module available here

   http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=537

and a better version still lives in the EDOS codebase, which is now
maintained by Jaap Boender.  I think Jaap made a GODI package for it.

IO can use multiple backends (binary, ASCII) and that was quite useful.
Performance is reasonably good.  One problem is with records and sum types -
you have to give pattern-matching functions to define readers/writers for them.
However, you can decide how you handle missing values or extra fields.

Martin Jambon's JSON wheel could also be used here, but I'm waiting for
the preprocessor situation to stabilize a little bit...

One major drawback of IO is that it does not handle sharing or recursive
structures.  And I know of no efficient, type-safe way of handling those,
especially if you don't want the serializer to add extra sharing (for instance
if you have distinct mutable records with the same contents)

I am still thinking about these problems.  For instance, I have been recently
working on Wikipedia revision history (only the Turkish (15 GB) and French ones
(200 GB), the English one is too big).  I needed maximal efficiency so I used
Marshal, which was pretty fast.  Of course I added a small version header.

I noticed that when you Marshal a closure, the module puts a MD5 of
presumably the whole program in the output (which costs some bytes but that's
another problem.)

It would be nice to have a mechanism for marshalling a value with an MD5 of its
types, and unmarshalling only when the MD5 matches.  In non-malicious
environments, that is, in environments where no-one fakes MD5s, that would be
quite safe.

Of course it won't be possible to marshal polymorphic types.
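
A rough approximation of this idea is possible today if the type
description is supplied by hand.  In the sketch below the description
string, the `story` type and the function names are all assumptions; the
proposal above asks the compiler to derive the digest, which this code
does not do.

```ocaml
(* Hypothetical type and a hand-maintained description of it.  Nothing
   forces the string to stay in sync with the type definition. *)
type story = { title : string; words : int }
let story_type_desc = "story = { title : string; words : int }"

(* Prefix the marshalled payload with the MD5 of the type description. *)
let tagged_marshal (desc : string) (v : 'a) : string =
  Digest.string desc ^ Marshal.to_string v []

(* Unmarshal only when the stored digest matches the expected one. *)
let tagged_unmarshal (desc : string) (blob : string) : 'a option =
  let n = 16 in (* Digest.string returns a 16-byte MD5 digest *)
  if String.length blob >= n && String.sub blob 0 n = Digest.string desc
  then Some (Marshal.from_string blob n)
  else None
```

A mismatch is reported as `None` instead of a crash, but the safety is only
as good as the discipline of updating the description with the type.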

So, I'm asking the experts: is it possible to have a very small extension that would allow this?
We could have the following restrictions and it still would be very useful:

   - the type t must be defined in a module M, which must live in a separate compilation unit
     (i.e. have an available cmi)
   - the type t must be ground
   - the type t must be named in an interface
   - it must be explicitly named when marshalling/unmarshalling
   - this is an ugly hack that only works in Marshal

Something along the lines of :

   safer_output_to_channel stdout (z : Interface.t) []

Trying

   safer_output_to_channel stdout z []

Would yield

   safer_output_to_channel stdout z []
                                  ^
Error: value must be explicitly annotated with a type

   safer_output_to_channel stdout (33 : int) []
Error: Type needs to be defined in an external interface to be Marshallable.

   safer_output_to_channel stdout (bar : Foo.t) []
Error: Module Foo.t has no interface

   safer_output_to_channel stdout (bar : int Foo.t) []
Error: Type Foo.t is not ground

-- 
Berke DURAK


* Re: [Caml-list] Long-term storage of values
  2008-02-29  7:40   ` Gabriel Kerneis
  2008-02-29 10:19     ` Berke Durak
@ 2008-02-29 11:44     ` Richard Jones
  2008-02-29 14:09       ` Brian Hurt
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Jones @ 2008-02-29 11:44 UTC (permalink / raw)
  To: Gabriel Kerneis; +Cc: caml-list

On Fri, Feb 29, 2008 at 08:40:00AM +0100, Gabriel Kerneis wrote:
> Hello,
> 
> On Thu, 28 Feb 2008 20:14:27 -0500 (EST), Brian Hurt <bhurt@spnz.org>
> wrote:
> > So, mistake number one: either use the database, and structure your data
> > (at that layer) to take advantage of it, or don't use a database.
> > [...]
> > So that's mistake number two: you're communicating between different 
> > versions of the program with an ill-defined (at best) and not 
> > generic protocol/file format.
> 
> Right. But imagine you're communicating with yourself (saving and
> restoring data). And you need to retrieve the data *efficiently*.
> Converting from a generic file format is not efficient - not if you have
> to retrieve the data 100 times per second (imagine a CMS on a popular
> website). It would be faster to use an internal file format. And, of
> course, keeping a backup in a generic file format.

But there are plenty of database-backed websites out there which
perform well[1].

Of course you have to understand databases, optimization of databases,
connection pooling, appropriate use of stored procedures,
denormalizing and occasionally even caching data outside the database.
So as with many things in life you need to know what you're doing.

A good first step for PostgreSQL is to use and understand the "VACUUM"
and "EXPLAIN" commands, and to turn on statement logging (with
timestamps) so you can see which queries are consuming the most time.

OCaml even has type-safe bindings to PostgreSQL ...

Rich.

[1] Some good books on the general issues involved:
http://www.amazon.co.uk/Building-Scalable-Web-Sites-Henderson/dp/0596102356
http://www.amazon.co.uk/Speed-Up-Your-Site-Optimization/dp/0735713243

-- 
Richard Jones
Red Hat



* Re: [Caml-list] Long-term storage of values
  2008-02-29 11:44     ` Richard Jones
@ 2008-02-29 14:09       ` Brian Hurt
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Hurt @ 2008-02-29 14:09 UTC (permalink / raw)
  To: caml-list

Richard Jones wrote:

>A good first step for PostgreSQL is to use and understand the "VACUUM"
>and "EXPLAIN" commands, and to turn on statement logging (with
>timestamps) so you can see which queries are consuming the most time.
>
>  
>
I'd add the "COPY" command to that list, with the attendant understanding
of why INSERTs are slow on PostgreSQL.  But this is off-topic for this list.

Brian



* Re: [Caml-list] Long-term storage of values
  2008-02-28 22:09 ` Basile STARYNKEVITCH
@ 2008-02-29 14:45   ` Martin Jambon
  2008-02-29 19:09     ` Jake Donham
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Jambon @ 2008-02-29 14:45 UTC (permalink / raw)
  To: Basile STARYNKEVITCH; +Cc: Dario Teixeira, caml-list

On Thu, 28 Feb 2008, Basile STARYNKEVITCH wrote:

> Dario Teixeira wrote:
>>  Hi,
>>
>>  Suppose I have a value of type Story.t, fairly complex in its definition.
>>  I wish to store this value in a DB (like Postgresql) for posterity.
>>  At the moment, I am storing in the DB the marshalled representation
>>  of the data; whenever I need to use it again in the Ocaml programme
>>  I simply fetch it from the DB and unmarshal it.
>>
>>  This works fine; there is however one nagging problem: the marshalled
>>  representation is brittle.  If Story.t changes even slightly, I will
>>  no longer be able to retrieve values marshalled with the old version.
>
>
> It is even theoretically a very difficult problem. There have been some 
> publications by Cristal, Moscova, Gallium people at INRIA.
>
> Assuming you have no abstract types, no objects, and no closures, and no 
> polymorphisms i.e. that there is a *.ml source file containing all the type 
> definitions. Then the types are composed by base types (int, string, ...), 
> sums, records, and perhaps arrays.
>
> Then you could consider what are the deltas on the type definition.
>
> In a sum like type sumt = A of t1 | B of t2 | C
> you might consider what happens when you remove, let's say, B, or add, let's
> say, a choice | D of t3
>
> In a record, consider likewise what happens when adding or removing a field.
>
> Etc....
>
>
> The details are very complex (at least to me, who tried unsuccessfully to work 
> on this during my year at INRIA).
>
> Maybe a semi-hand-crafted generator could help. Adding polymorphisms, 
> closures, objects, abstract types is a big mess.

I sure do believe you.
However, speaking with no experience in this domain, it seems to me that 
if we restrict the transition to a certain subset of operations, it can be 
possible to define a mapping using some Camlp4 tool such as Deriving 
(well, that's what I was told, assuming I interpreted correctly).

For instance:
- adding a record field: a default value is injected
- removing a record field: just remove it
- adding a variant: do nothing since it doesn't exist in the old data
- changing a type arbitrarily (such as changing a type foo1 into foo2
   everywhere): provide a map function that would override the default map
   function for such nodes. 
- functional and abstract values: left as-is
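
The first three rules in the list above can be illustrated with a
hand-written upgrade function.  This sketch does not use Camlp4 or
Deriving; it only shows the kind of code such a tool would have to
generate, and the modules `V1` and `V2` and their fields are hypothetical.

```ocaml
(* Hypothetical old definition of the stored type. *)
module V1 = struct
  type status = Draft | Published
  type story = {
    title : string;
    body : string;
    legacy_id : int;
    status : status;
  }
end

(* Hypothetical new definition: a variant and a field were added,
   and the legacy_id field was removed. *)
module V2 = struct
  type status = Draft | Published | Archived
  type story = {
    title : string;
    body : string;
    status : status;
    tags : string list;
  }
end

(* Adding a variant: nothing to do, old data never contains Archived. *)
let upgrade_status : V1.status -> V2.status = function
  | V1.Draft -> V2.Draft
  | V1.Published -> V2.Published

let upgrade (old : V1.story) : V2.story =
  { V2.title = old.V1.title;
    body = old.V1.body;
    status = upgrade_status old.V1.status; (* per-type map function *)
    tags = [] }                            (* added field: default value *)
    (* removed field legacy_id is simply dropped *)
```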

I think I'll soon have to deal with such a problem, so any further 
suggestions are welcome.


Martin

--
http://wink.com/profile/mjambon
http://martin.jambon.free.fr



* Re: [Caml-list] Long-term storage of values
  2008-02-29 10:19     ` Berke Durak
@ 2008-02-29 18:05       ` Markus Mottl
  0 siblings, 0 replies; 30+ messages in thread
From: Markus Mottl @ 2008-02-29 18:05 UTC (permalink / raw)
  To: Berke Durak; +Cc: Gabriel Kerneis, Caml-list List

On Fri, Feb 29, 2008 at 5:19 AM, Berke Durak <berke.durak@exalead.com> wrote:
>  One major drawback of IO is that it does not handle sharing or recursive
>  structures.  And I know of no efficient, type-safe way of handling those,
>  especially if you don't want the serializer to add extra sharing (for instance
>  if you have distinct mutable records with the same contents)

It shouldn't be too hard to write a macro that generates code for
checking whether data marshalled out using the standard marshalling
function conforms to a type specification even in the presence of
shared or cyclic values.  You just need to store a type identifier in
an array for every structured value you encounter.  Whenever there is
a backreference in the marshalled data, you check whether the type
denoted by the identifier for the previously seen value unifies with
what you expect at the current position.  This should be sufficiently
efficient.  If this type checking succeeds, it should be safe to
unmarshal the data using the standard functions.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


* Re: [Caml-list] Long-term storage of values
  2008-02-29 14:45   ` Martin Jambon
@ 2008-02-29 19:09     ` Jake Donham
  0 siblings, 0 replies; 30+ messages in thread
From: Jake Donham @ 2008-02-29 19:09 UTC (permalink / raw)
  To: caml-list


On Fri, Feb 29, 2008 at 6:45 AM, Martin Jambon <martin.jambon@ens-lyon.org>
wrote:

> if we restrict the transition to a certain subset of operations, it can be
> possible to define a mapping using some Camlp4 tool such as Deriving
> (well, that's what I was told, assuming I interpreted correctly).
>
> For instance:
> - adding a record field: a default value is injected
> - removing a record field: just remove it
> - adding a variant: do nothing since it doesn't exist in the old data
> - changing a type arbitrarily (such as changing a type foo1 into foo2
>   everywhere): provide a map function that would override the default map
>   function for such nodes.
> - functional and abstract values: left as-is
>

We do this at Skydeck. We modified Deriving's Typeable module to expose the
structure of types, and we marshal values along with a version number (for
upgrades) and a type description from Typeable (so you get an error if you
change the type and forget to change the version number).

Our modified Typeable also has reflection functions so if you have a dynamic
(a value along with a Typeable.TypeRep) you can examine its parts
dynamically (implemented with Obj.magic). On top of that we have a generic
function for changing a value of one type into a value of another, as Martin
describes--we try to do the translation automatically as far as we can, and
provide a way to pass in mapping functions for particular types.

For the most part this works well. Compared to a SQL database it is very
nice to have the OCaml type system for data representation. The upgrade
mechanism is much safer than hand-coding SQL schema upgrades. On the other
hand, there are many things you get for free with a SQL db that you have to
do yourself: e.g. putting IDs on objects so you can refer to them
externally, easy hand-inspection of the data (it's annoying navigating big
structures in the top-level), to name a couple of small ones.
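
The version-plus-type-description scheme described above can be sketched roughly as follows.  This is only an illustration, not the Skydeck code: the modules V1 and V2, the functions store, load and upgrade_v1, and the hand-written desc strings are all hypothetical, and the strings merely stand in for a description that a tool such as Typeable would derive from the type.

```ocaml
(* Sketch only: V1, V2, desc, store, load and upgrade_v1 are
   hypothetical names.  The hand-written desc strings stand in for a
   type description that would normally be derived automatically. *)

module V1 = struct
  type story = { title : string; body : string }
  let desc = "{title:string;body:string}"
end

module V2 = struct
  (* The current type: one record field was added since V1. *)
  type story = { title : string; body : string; tags : string list }
  let desc = "{title:string;body:string;tags:string list}"
end

(* Adding a record field: a default value is injected. *)
let upgrade_v1 (s : V1.story) : V2.story =
  { V2.title = s.V1.title; body = s.V1.body; tags = [] }

(* The header is marshalled separately from the payload, so it can be
   checked before the payload is read back at any particular type. *)
let store (version : int) (desc : string) v : string =
  Marshal.to_string (version, desc) [] ^ Marshal.to_string v []

let load (s : string) : V2.story =
  let (version, desc) : int * string = Marshal.from_string s 0 in
  (* Offset of the payload: the total size of the marshalled header. *)
  let off = Marshal.total_size (Bytes.of_string s) 0 in
  match version with
  | 1 when desc = V1.desc ->
      upgrade_v1 (Marshal.from_string s off : V1.story)
  | 2 when desc = V2.desc -> (Marshal.from_string s off : V2.story)
  | _ -> failwith "unknown version or mismatched type description"
```

Because the description is compared as well as the version number, data written after the type changed but without a version bump is rejected instead of being unmarshalled at the wrong type.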

Jake


* Re: [Caml-list] Long-term storage of values
  2008-02-29  1:14 ` Brian Hurt
  2008-02-29  7:40   ` Gabriel Kerneis
@ 2008-03-01 14:15   ` Dario Teixeira
  1 sibling, 0 replies; 30+ messages in thread
From: Dario Teixeira @ 2008-03-01 14:15 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

Hi,

> You're making two mistakes.
> 
> Mistake #1: treating a database as a dumb object store.  This is a really 
> popular idea right now- Hibernate does this, as does Ruby on Rails, and a 
> number of other ORM packages take this effective approach.  On the other 
> hand, dynamically typed languages are also really popular.

Thanks for your reply.  It was quite interesting, though I get the feeling
you used my question solely as a trigger to share with us a long-held
dissatisfaction with the current state of affairs concerning the use of
databases, regardless of whether it actually applies to my particular problem.
That's fair, and I do agree with practically all your points.  However,
if I were you I would refrain from starting such missives with statements
as blunt and uncompromising as "You're making two mistakes".  As it turns
out, this is one of those cases where the data (tree-like, with recursive
structures) does not map well at all onto a relational database.

Moreover, I am far from treating the database as a dumb object storage.
In fact, a significant portion of the code for this particular application
(a web app using Ocsigen; you can download a preliminary version of the app
from http://dario.dse.nl/projects/lambdium-light/) lies on the Postgresql side,
with the Ocaml code serving as little more than delivery boy.  The exception
is of course those portions where it is far more natural and performant to let
the Ocaml code take the reins and treat the DB as dumb storage.  The complex
data structure holding the markup of stories/comments is one such example.

> So, mistake number one: either use the data, and structure your data (at 
> that layer) to take advantage of it, or don't use a database.

Unfortunately, this is an overly general statement.  I have no doubt that
if I were to present you the data I have, your reply would be "in this case,
just use the DB as dumb storage".

> Mistake number two: file formats (and this includes marshalled data 
> structures), are wire protocols, and need to be designed to be as abstract 
> as possible- to reveal as little about the internal structure of the 
> program as possible (preferably none at all).

On the other hand, one of the advantages of using a language with such a
rich type-system as Ocaml's, is that the application-independent description
of the data can be translated on practically a 1:1 basis to native language
constructs!  Trust me, I didn't need to bend the data definition to suit
the internal structure of the programme.

Kind regards,
Dario Teixeira


* Re: [Caml-list] Long-term storage of values
  2008-02-28 18:41 Long-term storage of values Dario Teixeira
                   ` (5 preceding siblings ...)
  2008-02-29  1:14 ` Brian Hurt
@ 2008-03-20 21:03 ` Dario Teixeira
  2008-03-20 21:32   ` Martin Jambon
                     ` (3 more replies)
  6 siblings, 4 replies; 30+ messages in thread
From: Dario Teixeira @ 2008-03-20 21:03 UTC (permalink / raw)
  To: caml-list

Hi,

Some weeks ago I asked in the list about solutions for the long-term
storage of values in Ocaml.  I was looking for a solution that was
not quite as brittle as the Marshal module, while still maintaining
reasonable performance and being able to deal with Ocaml types in
all their glory (recursive structures, etc).

Thanks for all your replies -- I got plenty of ideas and found out
about new libraries I didn't know existed.  After some experimenting,
I settled for Sexplib; the main reason being the syntax extension that
automatically takes care of writing the (de)serialising functions.

However, if your requirements are different, you may find that another
solution might be more suitable.  Here's a brief summary of the main
options I looked into:  (let me know if there are others!)


- S-expressions using Sexplib.  It has the advantage of being
  quite fast, compact, and still human-readable-ish (if you like
  parentheses).  Moreover, the Camlp4 syntax extension makes the
  creation of (de)serialising functions as straightforward as
  appending "with sexp" to type definitions.

- Config_file.  A strong contender if proper human-readability
  is a requirement (it is far more readable than Sexplib, for
  example).  As far as I can tell, you still need to manually
  create the (de)serialising functions for complex types, though.
  Also, my guess is that the slightly more complex syntax will
  make it slower than Sexplib.

- JSON.  Possibly close to Config_file in terms of effort and
  speed.  Has the same advantage of being very human-friendly,
  and the additional advantage of having quickly become a popular
  interchange format (you'll find libraries for almost any language
  out there).

- XML.  Though Ocamlduce makes XML easier to deal with, the
  only advantage I see in XML is its ubiquity.  The format is
  verbose, not all that human-friendly, and you still need to
  manually create the (de)serialising functions.

- XDR.  Though I have fond memories of using XDR with C (in a
  different life), I have no experience with it in Ocaml.
  Though it should be very fast, you also need to manually
  create the (de)serialising functions.

- The I/O combinator library by Berke Durak.  Seems similar
  to XDR in terms of effort required.  Does it still not
  support recursive structures?  That is a common requirement.


Cheers,
Dario Teixeira




* Re: [Caml-list] Long-term storage of values
  2008-03-20 21:03 ` Dario Teixeira
@ 2008-03-20 21:32   ` Martin Jambon
  2008-03-20 22:41     ` Dario Teixeira
  2008-03-20 21:42   ` Daniel Bünzli
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Martin Jambon @ 2008-03-20 21:32 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

On Thu, 20 Mar 2008, Dario Teixeira wrote:

[...]
> - JSON.  Possibly close to Config_file in terms of effort and
>  speed.  Has the same advantage of being very human-friendly,
>  and the additional advantage of having quickly become a popular
>  interchange format (you'll find libraries for almost any language
>  out there).

There's json-static for automatic marshalling:

   http://martin.jambon.free.fr/json-static.html


Martin

--
http://wink.com/profile/mjambon
http://martin.jambon.free.fr



* Re: [Caml-list] Long-term storage of values
  2008-03-20 21:03 ` Dario Teixeira
  2008-03-20 21:32   ` Martin Jambon
@ 2008-03-20 21:42   ` Daniel Bünzli
  2008-03-20 22:33     ` Dario Teixeira
  2008-03-20 21:43   ` Gerd Stolpmann
  2008-03-21 10:32   ` Berke Durak
  3 siblings, 1 reply; 30+ messages in thread
From: Daniel Bünzli @ 2008-03-20 21:42 UTC (permalink / raw)
  To: caml-list caml-list


On 20 March 08 at 22:03, Dario Teixeira wrote:

[...]

In this blog post [1] you wrote :

> And judging from comments by Xavier Leroy (the primary developer of  
> the Ocaml language) at the Ocaml users meeting in Paris this last  
> January, there's a good chance that type-safe marshalling will make  
> it into the core language in the near future.

What are the comments in question?  Is this real, or wishful
interpretation?

Best,

Daniel

[1] http://nleyten.com/2008/03/10/sexpressions-for-longterm-storage-of-ocaml-values.aspx



* Re: [Caml-list] Long-term storage of values
  2008-03-20 21:03 ` Dario Teixeira
  2008-03-20 21:32   ` Martin Jambon
  2008-03-20 21:42   ` Daniel Bünzli
@ 2008-03-20 21:43   ` Gerd Stolpmann
  2008-03-21 14:37     ` Dario Teixeira
  2008-03-21 10:32   ` Berke Durak
  3 siblings, 1 reply; 30+ messages in thread
From: Gerd Stolpmann @ 2008-03-20 21:43 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list


On Thursday, 20.03.2008, at 21:03 +0000, Dario Teixeira wrote:
> Hi,
> 
> Some weeks ago I asked in the list about solutions for the long-term
> storage of values in Ocaml.  
> - XDR.  Though I have fond memories of using XDR with C (in a
>   different life), I have no experience with it in Ocaml.
>   Though it should be very fast, you also need to manually
>   create the (de)serialising functions.

No, there is a generator for that in Ocamlnet. ocamlrpcgen can be used
to generate these functions.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------




* Re: [Caml-list] Long-term storage of values
  2008-03-20 21:42   ` Daniel Bünzli
@ 2008-03-20 22:33     ` Dario Teixeira
  0 siblings, 0 replies; 30+ messages in thread
From: Dario Teixeira @ 2008-03-20 22:33 UTC (permalink / raw)
  To: Daniel Bünzli, caml-list caml-list

> > And judging from comments by Xavier Leroy (the primary developer of  
> > the Ocaml language) at the Ocaml users meeting in Paris this last  
> > January, there's a good chance that type-safe marshalling will make  
> > it into the core language in the near future.
> 
> What are the comments in question?  Is this real, or wishful
> interpretation?
> 

Hi,

As far as I understood it (I wasn't in Paris; I only listened to the audio
recordings), the research has been done (there are actually several different
projects that focused on this subject), and there are no major technical
impediments to the inclusion of type-safe marshalling into the core language.
However, given the present man-power constraints of the Ocaml team at INRIA,
I may have used the expression "near future" in a non-conventional way...
In any case, only the Ocaml team can enlighten us on that matter.

(Anyway, Xavier's talk was very interesting, as were all the others;
I recommend anyone interested in the future of the Ocaml language to
watch the video or to listen to the recordings)

Cheers,
Dario


* Re: [Caml-list] Long-term storage of values
  2008-03-20 21:32   ` Martin Jambon
@ 2008-03-20 22:41     ` Dario Teixeira
  2008-03-20 23:00       ` Martin Jambon
  0 siblings, 1 reply; 30+ messages in thread
From: Dario Teixeira @ 2008-03-20 22:41 UTC (permalink / raw)
  To: Martin Jambon; +Cc: caml-list

> 
> There's json-static for automatic marshalling:
> 
>    http://martin.jambon.free.fr/json-static.html

Hi,

Thanks for your reply, Martin.  Looking at the examples, it seems that
json-static is JSON-centric, in the sense that the type definitions for
the automatic (de)serialisers are to be written in JSON.  Does it support
the reverse, namely "here's an Ocaml type: please write Ocaml functions
that (de)serialise into JSON"?

Cheers,
Dario


* Re: [Caml-list] Long-term storage of values
  2008-03-20 22:41     ` Dario Teixeira
@ 2008-03-20 23:00       ` Martin Jambon
  2008-03-21 14:01         ` Dario Teixeira
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Jambon @ 2008-03-20 23:00 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

On Thu, 20 Mar 2008, Dario Teixeira wrote:

>>
>> There's json-static for automatic marshalling:
>>
>>    http://martin.jambon.free.fr/json-static.html
>
> Hi,
>
> Thanks for your reply, Martin.  Looking at the examples, it seems that
> json-static is JSON-centric, in the sense that the type definitions for
> the automatic (de)serialisers are to be written in JSON.  Does it support
> the reverse, namely "here's an Ocaml type: please write Ocaml functions
> that (de)serialise into JSON"?

Dario,

No JSON needs to be written by hand.
Here's a simple example:

type json point = { x : int; y : int }  (* an OCaml record *)

It creates the functions with the following signature:

val json_of_point : point -> Json_type.t
val point_of_json : Json_type.t -> point

Json_type.t is the JSON syntax tree that you can serialize using 
Json_io.string_of_json.


# let j = json_of_point { x = 12; y = 34 };;
val j : Json_type.t =
   Json_type.Object [("x", Json_type.Int 12); ("y", Json_type.Int 34)]

# Json_io.string_of_json j;;
- : string = "{ \"x\": 12, \"y\": 34 }"



Martin

--
http://wink.com/profile/mjambon
http://martin.jambon.free.fr



* Re: [Caml-list] Long-term storage of values
  2008-03-20 21:03 ` Dario Teixeira
                     ` (2 preceding siblings ...)
  2008-03-20 21:43   ` Gerd Stolpmann
@ 2008-03-21 10:32   ` Berke Durak
  3 siblings, 0 replies; 30+ messages in thread
From: Berke Durak @ 2008-03-21 10:32 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

Dario Teixeira wrote:
> Hi,
> 
> Some weeks ago I asked in the list about solutions for the long-term
> storage of values in Ocaml.  I was looking for a solution that was
> not quite as brittle as the Marshal module, while still maintaining
> reasonable performance and being able to deal with Ocaml types in
> all their glory (recursive structures, etc).
> 
> Thanks for all your replies -- I got plenty of ideas and found out
> about new libraries I didn't know existed.  After some experimenting,
> I settled for Sexplib; the main reason being the syntax extension that
> automatically takes care of writing the (de)serialising functions.
> 
> However, if your requirements are different, you may find that another
> solution might be more suitable.  Here's a brief summary of the main
> options I looked into:  (let me know if there are others!)
> 
> 
> - S-expressions using Sexplib.  It has the advantage of being
>   quite fast, compact, and still human-readable-ish (if you like
>   parentheses).  Moreover, the Camlp4 syntax extension makes the
>   creation of (de)serialising functions as straightforward as
>   appending "with sexp" to type definitions.
> 
> - Config_file.  A strong contender if proper human-readability
>   is a requirement (it is far more readable than Sexplib, for
>   example).  As far as I can tell, you still need to manually
>   create the (de)serialising functions for complex types, though.
>   Also, my guess is that the slightly more complex syntax will
>   make it slower than Sexplib.
> 
> - JSON.  Possibly close to Config_file in terms of effort and
>   speed.  Has the same advantage of being very human-friendly,
>   and the additional advantage of having quickly become a popular
>   interchange format (you'll find libraries for almost any language
>   out there).
> 
> - XML.  Though Ocamlduce makes XML easier to deal with, the
>   only advantage I see in XML is its ubiquity.  The format is
>   verbose, not all that human-friendly, and you still need to
>   manually create the (de)serialising functions.
> 
> - XDR.  Though I have fond memories of using XDR with C (in a
>   different life), I have no experience with it in Ocaml.
>   Though it should be very fast, you also need to manually
>   create the (de)serialising functions.
> 
> - The I/O combinator library by Berke Durak.  Seems similar
>   to XDR in terms of effort required.  Does it still not
>   support recursive structures?  That is a common requirement.
> 
> 

Hello,

I have been testing Sexplib for the last few weeks and I am very
satisfied with it.  The code is very high quality and the automatic
generation of converters (to and from S-expressions) is extremely
pleasant to work with.  It works very well with modules, functors
and polymorphic types.  Congratulations to Markus Mottl.
I certainly do not miss writing I/O combinators by hand for record
or sum types.

The use of uniform S-expressions has the considerable advantage of
simplicity.  You will never spend more than half an hour
writing a parser/printer for S-expressions in any language.

Simplicity and uniform representation are also very important for data
longevity.  The type is just

    type t = Atom of string | List of t list

Values of this type can very easily be processed in Ocaml.  This way,
you can migrate your old data quite easily by loading it, writing a few
recursive functions or using the Path module of Sexp, and then saving
it again.  For simple cases, you could even use sed, sgrep or even Scheme.
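
A migration of that kind might look like the sketch below.  It uses only the two-constructor type quoted above rather than Sexplib itself, and the helper names (is_field, add_default_field, to_string) and the "tags" field are made up for the illustration.

```ocaml
(* The generic S-expression type quoted above. *)
type t = Atom of string | List of t list

(* Sketch only: these helper names are hypothetical, not Sexplib's API. *)

(* Does this node look like a record field (name value)? *)
let is_field name = function
  | List [Atom n; _] -> n = name
  | _ -> false

(* Migration step: add (name default) to a record that lacks the
   field.  Atoms and records that already have it are left untouched. *)
let add_default_field name default = function
  | List fields when not (List.exists (is_field name) fields) ->
      List (fields @ [List [Atom name; default]])
  | other -> other

(* Minimal printer, with no quoting or escaping of atoms. *)
let rec to_string = function
  | Atom s -> s
  | List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"
```

Loading each stored value, applying add_default_field "tags" (List []) and saving it again upgrades old records in place; running the step twice is harmless, since records that already carry the field are returned unchanged.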

For instance, I have twice migrated a (small) user base for a personal
web site, when I had to add some record fields or change one key from
ints to int lists and then to strings, and it works well.

We are also using it now for debugging output of complex types (ASTs,
class hierarchies, etc.) and the output is much more readable than what
you would usually crank out by hand for debugging purposes.

Note however that Sexplib doesn't handle recursive data by default, but as
you can override printers/parsers for any type, you could easily use special
"references" through which recursion could go and handle it manually.
As you all know, there is no good way in Ocaml for maintaining a physical
equality-based set.  (The I/O combinator library doesn't handle recursive
data either.)

So I strongly recommend the use of Sexplib and its inclusion in
standard Ocaml distributions.
-- 
Berke DURAK


* Re: [Caml-list] Long-term storage of values
  2008-03-20 23:00       ` Martin Jambon
@ 2008-03-21 14:01         ` Dario Teixeira
  2008-03-21 14:28           ` Martin Jambon
  0 siblings, 1 reply; 30+ messages in thread
From: Dario Teixeira @ 2008-03-21 14:01 UTC (permalink / raw)
  To: Martin Jambon; +Cc: caml-list

> No JSON needs to be written by hand.
> Here's a simple example:
> 
> type json point = { x : int; y : int }  (* an OCaml record *)
> 
> It creates the functions with the following signature:
> 
> val json_of_point : point -> Json_type.t
> val point_of_json : Json_type.t -> point

Hi,

And thanks for the clarification, Martin.  Okay then, this makes
json-static as convenient as Sexplib, and the strongest candidate
if human-readability is a requirement.  It should also come in
very handy for browser/server communication (AJAX) using the
Ocsigen platform, for example.

Cheers,
Dario




* Re: [Caml-list] Long-term storage of values
  2008-03-21 14:01         ` Dario Teixeira
@ 2008-03-21 14:28           ` Martin Jambon
  2008-03-21 14:34             ` Martin Jambon
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Jambon @ 2008-03-21 14:28 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

On Fri, 21 Mar 2008, Dario Teixeira wrote:

>> No JSON needs to be written by hand.
>> Here's a simple example:
>>
>> type json point = { x : int; y : int }  (* an OCaml record *)
>>
>> It creates the functions with the following signature:
>>
>> val json_of_point : point -> Json_type.t
>> val point_of_json : Json_type.t -> point
>
> Hi,
>
> And thanks for the clarification, Martin.  Okay then, this makes
> json-static as convenient as Sexplib, and the strongest candidate
> if human-readability is a requirement.  It should also come in
> very handy for browser/server communication (AJAX) using the
> Ocsigen platform, for example.

I have to say that json-static does not support parametrized types other 
than a few pervasive ones (lists, arrays, hash tables, options, ...).

Like the other syntax extensions that deal with types it uses the type 
names to determine the JSON type to use, 
i.e. if 2 names are used to refer to the same OCaml type, 
they can use 2 different JSON representations.
A common, predefined example is the "assoc" type, which is defined as 
follows:

type 'a assoc = (string * 'a) assoc

and would use a JSON object rather than a JSON array of arrays, which is
common usage.
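
As a rough illustration of how the type name, and not the underlying OCaml type, selects the representation, consider the sketch below.  The json type is a simplified stand-in rather than json-static's Json_type.t, and pairs, json_of_pairs and json_of_assoc are hypothetical names written by hand here, not generated code.

```ocaml
(* Sketch only: a simplified stand-in for a JSON syntax tree. *)
type json =
  | Int of int
  | String of string
  | Array of json list
  | Object of (string * json) list

(* Two names for the same underlying OCaml type. *)
type 'a pairs = (string * 'a) list
type 'a assoc = (string * 'a) list

(* Under the name "pairs": a JSON array of two-element arrays. *)
let json_of_pairs (f : 'a -> json) (l : 'a pairs) : json =
  Array (List.map (fun (k, v) -> Array [String k; f v]) l)

(* Under the name "assoc": a JSON object keyed by the strings. *)
let json_of_assoc (f : 'a -> json) (l : 'a assoc) : json =
  Object (List.map (fun (k, v) -> (k, f v)) l)
```

The same value [("a", 1)] thus becomes [["a", 1]] under one name and {"a": 1} under the other.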

Martin
--
http://wink.com/profile/mjambon
http://martin.jambon.free.fr


* Re: [Caml-list] Long-term storage of values
  2008-03-21 14:28           ` Martin Jambon
@ 2008-03-21 14:34             ` Martin Jambon
  0 siblings, 0 replies; 30+ messages in thread
From: Martin Jambon @ 2008-03-21 14:34 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

On Fri, 21 Mar 2008, Martin Jambon wrote:

> On Fri, 21 Mar 2008, Dario Teixeira wrote:
>
>> >  No JSON needs to be written by hand.
>> >  Here's a simple example:
>> > 
>> >  type json point = { x : int; y : int }  (* an OCaml record *)
>> > 
>> >  It creates the functions with the following signature:
>> > 
>> >  val json_of_point : point -> Json_type.t
>> >  val point_of_json : Json_type.t -> point
>>
>>  Hi,
>>
>>  And thanks for the clarification, Martin.  Okay then, this makes
>>  json-static as convenient as Sexplib, and the strongest candidate
>>  if human-readability is a requirement.  It should also come in
>>  very handy for browser/server communication (AJAX) using the
>>  Ocsigen platform, for example.
>
> I have to say that json-static does not support parametrized types other than 
> a few pervasive ones (lists, arrays, hash tables, options, ...).
>
>
> Like the other syntax extensions that deal with types it uses the type names 
> to determine the JSON type to use, i.e. if 2 names are used to refer to the 
> same OCaml type, they can use 2 different JSON representations.
> A common, predefined example is the "assoc" type, which is defined as 
> follows:
>
> type 'a assoc = (string * 'a) assoc

erratum:

type 'a assoc = (string * 'a) list

> and would use a JSON object rather than a JSON array of arrays, which is
> common usage.
>
>
> Martin
> --
> http://wink.com/profile/mjambon
> http://martin.jambon.free.fr



* Re: [Caml-list] Long-term storage of values
  2008-03-20 21:43   ` Gerd Stolpmann
@ 2008-03-21 14:37     ` Dario Teixeira
  2008-03-21 15:24       ` Richard Jones
  2008-03-21 16:04       ` Gerd Stolpmann
  0 siblings, 2 replies; 30+ messages in thread
From: Dario Teixeira @ 2008-03-21 14:37 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: caml-list

> No, there is a generator for that in Ocamlnet. ocamlrpcgen can be used
> to generate these functions.

Hi,

If I remember correctly, the model with XDR+rpcgen is that the data type
is defined in a special XDR notation, which ocamlrpcgen will then use to
generate the Ocaml type and the (de)serialisation functions.  Though XDR
offers a fairly rich type set, it's not quite as versatile as Ocaml's.
I just wonder if this will lead to situations where one would rather
write the (de)serialisation functions by hand instead of relying on
the poorer expressiveness of the automatic generators.

Btw, do you have any numbers concerning XDR performance?  My guess
is that this would be the fastest method after Marshalling.

Cheers,
Dario


* Re: [Caml-list] Long-term storage of values
  2008-03-21 14:37     ` Dario Teixeira
@ 2008-03-21 15:24       ` Richard Jones
  2008-03-22 12:14         ` David MENTRE
  2008-03-21 16:04       ` Gerd Stolpmann
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Jones @ 2008-03-21 15:24 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

On Fri, Mar 21, 2008 at 02:37:28PM +0000, Dario Teixeira wrote:
> If I remember correctly, the model with XDR+rpcgen is that the data type
> is defined in a special XDR notation, which ocamlrpcgen will then use to
> generate the Ocaml type and the (de)serialisation functions.

That's right.  You write a '*.x' file and it gets converted to C by
rpcgen or to OCaml by ocamlrpcgen.  There's a very lengthy example I
wrote below.  XDR is regarded as a rather "old" protocol and support
is somewhat limited (basically, C, Java and OCaml).  On the other hand
it is well-understood and miles faster than anything else, since it's
a simple marshalling format just like OCaml's Marshal.

http://git.et.redhat.com/?p=libvirt.git;a=blob_plain;f=qemud/remote_protocol.x;hb=HEAD

> Though XDR
> offers a fairly rich type set, it's not quite as versatile as Ocaml's.
> I just wonder if this will lead to situations where one would rather
> write the (de)serialisation functions by hand instead of relying on
> the poorer expressiveness of the automatic generators.

The limited type set is an advantage if you're sharing data with other
languages (or if you're using C), but a disadvantage otherwise.

> Btw, do you have any numbers concerning XDR performance?  My guess
> is that this would be the fastest method after Marshalling.

There's a really tiny table at the end of this document, comparing it
to XML so not really any competition:

http://et.redhat.com/~rjones/secure_rpc/

Rich.

-- 
Richard Jones
Red Hat


* Re: [Caml-list] Long-term storage of values
  2008-03-21 14:37     ` Dario Teixeira
  2008-03-21 15:24       ` Richard Jones
@ 2008-03-21 16:04       ` Gerd Stolpmann
  1 sibling, 0 replies; 30+ messages in thread
From: Gerd Stolpmann @ 2008-03-21 16:04 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

On Friday, 21.03.2008, at 14:37 +0000, Dario Teixeira wrote:
> > No, there is a generator for that in Ocamlnet. ocamlrpcgen can be used
> > to generate these functions.
> 
> Hi,
> 
> If I remember correctly, the model with XDR+rpcgen is that the data type
> is defined in a special XDR notation, which ocamlrpcgen will then use to
> generate the Ocaml type and the (de)serialisation functions.  Though XDR
> offers a fairly rich type set, it's not quite as versatile as Ocaml's.

No, but it's ok. There are products, sums, sequences, and options. What
you cannot do is to marshal cyclic data - this is a limitation XDR
shares with most other external representations. There's also no notion
of objects.
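
To give an idea of what those building blocks look like on the wire, here is a hand-written sketch of the basic XDR encoding rules (big-endian 4-byte integers, length-prefixed strings padded to a multiple of four bytes, a 0/1 flag in front of optional data).  It is not the code ocamlrpcgen generates: the put_* functions, encode and the point type are hypothetical, and Buffer.add_int32_be assumes OCaml 4.08 or later.

```ocaml
(* Sketch only: the put_* names are hypothetical, not Ocamlnet's API.
   Requires OCaml >= 4.08 for Buffer.add_int32_be. *)

(* An XDR int is four bytes, big-endian. *)
let put_int buf (n : int32) = Buffer.add_int32_be buf n

(* A string is its length, its bytes, then zero padding up to a
   multiple of four bytes. *)
let put_string buf s =
  let n = String.length s in
  put_int buf (Int32.of_int n);
  Buffer.add_string buf s;
  Buffer.add_string buf (String.make ((4 - n mod 4) mod 4) '\000')

(* An option is a 0/1 flag, followed by the value if present. *)
let put_option put buf = function
  | None -> put_int buf 0l
  | Some v -> put_int buf 1l; put buf v

(* A sequence is its length followed by its elements. *)
let put_list put buf l =
  put_int buf (Int32.of_int (List.length l));
  List.iter (put buf) l

(* A product is simply its fields, in order. *)
type point = { x : int32; y : int32; label : string option }

let put_point buf p =
  put_int buf p.x;
  put_int buf p.y;
  put_option put_string buf p.label

(* Run an encoder and return the bytes it produced. *)
let encode put v =
  let buf = Buffer.create 16 in
  put buf v;
  Buffer.contents buf
```

Nothing in this encoding records sharing between values, which is why cyclic data cannot be represented: an encoder like put_list would simply never terminate on a cyclic list.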

> I just wonder if this will lead to situations where one would rather
> write the (de)serialisation functions by hand instead of relying on
> the poorer expressiveness of the automatic generators.

This may be an issue. Currently, ocamlrpcgen understands only a few
annotations that modify the O'Caml type the XDR type is mapped to. 

In the past months, I wrote Hydro, which is a library for another RPC
protocol called ICE. Hydro builds on my SunRPC efforts, and improves on a
number of their limitations. In Hydro, it is possible to annotate an ICE
type with an O'Caml function that converts it into a more pleasant
representation. This makes it possible to fix most shortcomings of the
built-in mapping to O'Caml types. Something similar could be done for XDR.

I wouldn't recommend Hydro for storing values, because its model is
OO-centric, and there is some impedance mismatch between the OO approach
and O'Caml's type system. The exception is if you have cyclic values,
because ICE can represent those. (If you are curious: http://oss.wink.com)

> Btw, do you have any numbers concerning XDR performance?  My guess
> is that this would be the fastest method after Marshalling.

I don't have numbers. The O'Caml implementation is definitely slower
than the C code generated by rpcgen. However, the company I'm currently
working for uses this implementation for a high-performance cluster of
servers, and we never even thought about the XDR speed. It never
mattered.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


* Re: [Caml-list] Long-term storage of values
  2008-03-21 15:24       ` Richard Jones
@ 2008-03-22 12:14         ` David MENTRE
  0 siblings, 0 replies; 30+ messages in thread
From: David MENTRE @ 2008-03-22 12:14 UTC (permalink / raw)
  To: Richard Jones; +Cc: Dario Teixeira, caml-list

Richard Jones <rich@annexia.org> writes:

> XDR is regarded as a rather "old" protocol and support
> is somewhat limited (basically, C, Java and OCaml).

And Python: http://thomas.enix.org/pub/pyrpc/ We used it to talk using
SUN's RPC (ONC RPC) to a server written in OCaml.

> On the other hand it is well-understood and miles faster than
> anything else, since it's a simple marshalling format just like
> OCaml's Marshal.

And the on-the-wire storage is much more efficient compared to say, XML,
with no issues like decimal-ASCII <-> binary conversions.

Yours,
d.
-- 
GPG/PGP key: A3AD7A2A David MENTRE <dmentre@linux-france.org>
 5996 CC46 4612 9CA4 3562  D7AC 6C67 9E96 A3AD 7A2A



end of thread, other threads:[~2008-03-22 12:14 UTC | newest]

Thread overview: 30+ messages
2008-02-28 18:41 Long-term storage of values Dario Teixeira
2008-02-28 20:01 ` [Caml-list] " David MENTRE
2008-02-28 20:01 ` Thomas Fischbacher
2008-02-28 20:05 ` Mathias Kende
2008-02-28 22:09 ` Basile STARYNKEVITCH
2008-02-29 14:45   ` Martin Jambon
2008-02-29 19:09     ` Jake Donham
2008-02-28 23:42 ` Erik de Castro Lopo
2008-02-29  1:14 ` Brian Hurt
2008-02-29  7:40   ` Gabriel Kerneis
2008-02-29 10:19     ` Berke Durak
2008-02-29 18:05       ` Markus Mottl
2008-02-29 11:44     ` Richard Jones
2008-02-29 14:09       ` Brian Hurt
2008-03-01 14:15   ` Dario Teixeira
2008-03-20 21:03 ` Dario Teixeira
2008-03-20 21:32   ` Martin Jambon
2008-03-20 22:41     ` Dario Teixeira
2008-03-20 23:00       ` Martin Jambon
2008-03-21 14:01         ` Dario Teixeira
2008-03-21 14:28           ` Martin Jambon
2008-03-21 14:34             ` Martin Jambon
2008-03-20 21:42   ` Daniel Bünzli
2008-03-20 22:33     ` Dario Teixeira
2008-03-20 21:43   ` Gerd Stolpmann
2008-03-21 14:37     ` Dario Teixeira
2008-03-21 15:24       ` Richard Jones
2008-03-22 12:14         ` David MENTRE
2008-03-21 16:04       ` Gerd Stolpmann
2008-03-21 10:32   ` Berke Durak
