caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Grégoire Henry" <gregoire.henry@pps.jussieu.fr>
To: Hendrik Tews <tews@cs.ru.nl>
Cc: caml-list@inria.fr, michel@mauny.net
Subject: Re: [Caml-list] SafeUnmarshal: questions/problems/timings
Date: Thu, 31 Aug 2006 12:01:19 +0200	[thread overview]
Message-ID: <DC78C5F1-121B-4E73-82FA-D5FCAA074D6F@pps.jussieu.fr> (raw)
In-Reply-To: <17652.53276.522158.649480@tandem.cs.ru.nl>

[-- Attachment #1: Type: text/plain, Size: 4845 bytes --]

Hello,

> 1. I made some measurements that suggest that the
>    unmarshal has quadratic complexity in the data size, see
>
>    http://www.cs.ru.nl/~tews/marshal-plot.eps
>    http://www.cs.ru.nl/~tews/marshal-plot-detail.eps
>
>    If my (simple-minded) estimations are correct it would take
>    more than 2 hours to check 4 MB of marshaled data (which I
>    generate in less than 3 seconds).
>
>    Is there any hope that the time complexity will improve?

If your value contains no sharing (hence no cycle), the type checking
of safe-unmarshalling should be linear in the size of the data.

If there is some sharing (but still no cycle), and if we assume that
looking for shared type information takes constant time, then type
checking should still be linear.

With cycles, things are a bit worse because of the computation of
strongly connected components.

Therefore, if your data contains shared parts but no cycle, then it
should be close to linear. Currently, mapping shared blocks to their
type is represented by an association list: accessing such type
information could explain this quadratic behaviour.

If you are in this situation, this could be the reason and the
implementation needs a more clever way of retrieving types of shared
blocks. Note that a simple workaround such as using the Map module is
incorrect because a memory compaction may occur and break the order
used for storing blocks.

With cycles, the `height' of cycles (cycles inside cycles) comes as an
extra complexity factor. But in order to produce a quadratic
behaviour, you need data with a very particular shape.

Could you please send us more information about the shape of your data
(sharing or not, cycles or not, and wether sharing increase linearly
or not with the size)?

> 2.	    Objective Caml version 3.09.3+dev0+ty1
>
>     # SafeUnmarshal.copy [^nativeint^] 4;;
>     Segmentation fault

See your fifth question :)

> 3. Would it be possible to put an english version of
>    http://www.pps.jussieu.fr/~henry/marshal/docTy/Ty.html online?

Certainly. I'm joining a translated ty.mli, I will put it online too.

> 4. Instead of type-safe unmarshal functions, I am more interested
>    in checking if some value that has been constructed outside
>    ocaml is type correct. Therefore I would suggest to make
>    Check.check available in come way. I am using now
>
>      let check obj ty = Check.check (Obj.repr obj) (Ty.dump ty)
>
>    with type
>
>      val check : 'a -> 'a tyrepr -> bool
>
>    Am I right that the type parameter of tyrepr is a kind of
>    phantom type that is mainly used to restrict the type of
>    SafeUnmarshal.from_channel? Then I could also use
>
>      val check : 'a -> 'b tyrepr -> bool  ?

Yes and yes.

>    It would be great if (as a debugging feature) this check could
>    produce some sort of trace that helps to locate those parts
>    that violate the given type.

My hope for reducing the complexity is to make some "pre-processing"
on the value when the unmarshaller reallocates it in the memory.
In this case, the Check.check function can't operate on "standard"
values, and should not be exported.

Yet I don't know if the "preprocessor" would be tightly bound to the
unmarshaller or have a chance to be exported too.

We can keep exporting a quadratic check for "standard" values.

> 5. nativeint, int32, and int64 are not supported yet (I would
>    suggest to make the documentation a bit more precise in that
>    point). As fix I use (in Check.check_block):
>
>     | Tnativeint ->
> 	tag = Obj.custom_tag && size = 2 &&
> 	((Obj.field obj 0) == (Obj.field (Obj.repr Nativeint.zero) 0))
>     | Tint32 ->
> 	tag = Obj.custom_tag && size = 2 &&
> 	((Obj.field obj 0) == (Obj.field (Obj.repr Int32.zero) 0))
>     | Tint64 ->
> 	tag = Obj.custom_tag && size = 3 &&
> 	((Obj.field obj 0) == (Obj.field (Obj.repr Int64.zero) 0))
>
>    Any comments? On a 64 bit architecture the int64 size should be
>    required to be 2.

let custom_int64_size = 1 + 64 / Sys.word_size;;

I will add those cases.

>    I would strongly suggest to replace the catch all cases
>
>     | _ -> false
>
>    in check.ml with some more precise code (it took me several
>    hours of bug hunting until I found out that the above line
>    made my unmarshal fail without even looking at the
>    nativeints).

Something like printing the type clash (looking for ty_xx, found ty_yy)?

Or pretty-printing part of the value that have been checked before  
failure?

> 6. Thanks for SafeUnmarshal, it helped me a lot when checking the
>    data created inside C++!

This is quite an original use for unmarshalling functions :-)

Many thanks for your reports and suggestions !

-- Grégoire Henry


[-- Attachment #2: ty.mli --]
[-- Type: application/octet-stream, Size: 4477 bytes --]

(*
 *
 * A run-time representation for OCaml types
 *
 *)

(** Functions for de-constructing values of type ['a tyrepr]. *)

(** {6 Type declaration} *)

(** A value of type [declaration] is produced by the compiler for each
    type declaration in a module. This allows at runtime to ``break
    the abstraction barriers'' imposed by the type-checker... *)
type declaration  

(** [name decl] returns the name of the type constructor defined by [decl]. *)
val name: declaration -> string

(** [fullname decl] returns the name of the type constructor defined by [decl],
    as well as the line and file name of this decleration. *)
val fullname: declaration -> string

(** [arity decl] returns the arity of the type constructor defined by [decl]. *)
val arity: declaration -> int

(** [is_alias decl] returns [true] if [decl] is an alias type declaration.
    When [decl] is abstract both on the structure and the signature, it
    returns [false]. *)
val is_alias: declaration -> bool

(** [is_datatype decl] returns [true] if [decl] is a record or sum type 
    declaration. When [decl] is abstract both on the structure and the 
    signature, it returns [false]. *)
val is_datatype: declaration -> bool

(** {6 Type expression} *)

(** This is the ``untyped'' equivalent of the predefined type ['a tyrepr]. *)
type expression

(** The [dump] function allows to project a type expression  [([^ ty ^] :
    ty tyrepr)] to the type [expression]. *)
val dump: 'a tyrepr -> expression

(** [body decl] returns a type expression describing the body of the 
    type declaration [decl]. *)
val body: declaration -> expression

(** [canonical ty] resolves type aliases and returns a type expression
    that corresponds to a datatype declaration.

    Therefore,  [desc (canonical ty)]
    - is a predefined type ;
    - or is an abstract type ;
    - or has shape [Tconstr(decl,_)] such that  [is_datatype(decl)] is true.
*)
val canonical: expression -> expression

(** Type equality modulo aliases. (It is  useless to call [canonical] on
    [equal] arguments.) *)
val equal: expression -> expression -> bool

(** Pretty-printing, FIXME ... *)
val print: Format.formatter -> expression -> unit

(** {6 Description of type expression } *)

(** Description of type expressions. *)
type description = 
  | Tunit
  | Tbool
  | Tint
  | Tnativeint
  | Tint32
  | Tint64
  | Tchar
  | Tstring
  | Tfloat
  | Texn
  | Tarray of expression
  | Tlist of expression
  | Toption of expression
  | Tlazy of expression
  | Ttyrepr of expression
  | Tformat4 of expression * expression * expression * expression
  | Tconstr of declaration * expression array 
  | Ttuple of expression array
  | Tarrow of string * expression * expression
  | Tvariant of bool * (string * int * expression option) array

  | Tabstract 
  | Tsum of string array * (string * expression array) array
  | Tdoublerecord of (string * bool) array
  | Trecord of (string * bool * expression) array
  | Targ of int
  
  | Tpoly of int * expression
  | Tvar of int 

(** [desc ty] returns a description of the type expression [ty].

    It corresponds to the type expression that we can write using
    OCaml concrete syntax. It therefore excludes constructors such as
    [Tabstract], [Tsum], [Tdoublerecord] or [Trecord]. See [repr]. *)
val desc: expression -> description

(** [repr ty] returns the description of possible memory
    representations for the type expression [ty].

    By contrast with [desc ty], it can be [Tabstract], [Tsum],
    [Tdoublerecord] ou [Trecord], but not [Tconstr]. *)
val repr: expression -> description

(** {6 Safe unmarshalling} *)

(** The following functions are specific to safe unmarshalling. They
    operate on implicitly universally quantified type expression. The
    constant [Tvar i] represents the constant {i Top} of the
    (currently only in French) formalization. *)

(** [cut_head_quantification ty] removes hypothetical headings
    [Tpoly _] from [ty]. *)
val cut_head_quantification: expression -> expression

(** [instance ty1 ty2] returns true when [ty1] is a polymorphic
    instance [ty2]. *) 
val instance: expression -> expression -> bool

(** Memoryless anti-unification (also sometimes called weak anti-unification).

    We use [Tvar (-1)] for resolving conflicts. *)
val antiunif: expression -> expression -> expression

(** [is_monomorphic ty] returns [true] when [ty] contains no
    occurrence of [Tvar (-1)]. *)
val is_monomorphic: expression -> bool

[-- Attachment #3: Type: text/plain, Size: 1 bytes --]



  reply	other threads:[~2006-08-31 10:01 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-29 23:39 Hendrik Tews
2006-08-31 10:01 ` Grégoire Henry [this message]
2006-09-01  9:23   ` [Caml-list] " Hendrik Tews
2006-09-14 13:07 Hendrik Tews

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DC78C5F1-121B-4E73-82FA-D5FCAA074D6F@pps.jussieu.fr \
    --to=gregoire.henry@pps.jussieu.fr \
    --cc=caml-list@inria.fr \
    --cc=michel@mauny.net \
    --cc=tews@cs.ru.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).