Re: Efficency in OCaml

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Re: Efficency in OCaml
       [not found] <7qj9lv$aa9$1@goldenapple.srv.cs.cmu.edu>
@ 1999-09-01 18:40 ` chet
  1999-09-01 23:09   ` Jerome Vouillon
  0 siblings, 1 reply; 7+ messages in thread
From: chet @ 1999-09-01 18:40 UTC (permalink / raw)
  To: Jerome Vouillon; +Cc: caml-list, chet, blo.b

Jerome,

Is there a description of the Ocaml object and
"virtual-function-table" format?

Also, well, I think there's been some recent work on analyzing the
path-length of O-O code, and the conclusion has been that in fact,
methods do just "call one other" a lot.

That is, while C code is characterized by lots of tests and longer
path-lengths per function-body, C++ (and Java, *especially* -- geez,
it seems like Java code is all method-calls!) code tends to be short
code-bodies, with branches implemented effectively by calling virtual
functions.

I've had the opportunity to look at a *lot* of Java code in the past
few years, and it displays this trend to an extreme.  And to my
horror.

--chet--

>>>>> "JV" == Jerome Vouillon <Jerome.Vouillon@inria.fr> writes:

    JV> The type checking is done at compile time, but method dispatch
    JV> is always dynamic : there is indeed something similar to a
    JV> virtual function table. Therefore, a method invocation is
    JV> rather fast but not as fast as a function call. A great part
    JV> of the cost of method invocation also comes from the fact that
    JV> the method which must be called is not know at compile time
    JV> (function calls are must cheaper in Ocaml when the function is
    JV> known).

    JV> This does not really matters, however, as long as your program
    JV> does not spend its time making method invocations. I think
    JV> this is rare, even in a program making heavy use of objects :
    JV> usually, methods do something, and not just call one another.

    JV> -- Jrme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Efficency in OCaml
  1999-09-01 18:40 ` Efficency in OCaml chet
@ 1999-09-01 23:09   ` Jerome Vouillon
  1999-09-04 14:26     ` Nicolas Ollinger
  1999-09-10 15:19     ` Hendrik Tews
  0 siblings, 2 replies; 7+ messages in thread
From: Jerome Vouillon @ 1999-09-01 23:09 UTC (permalink / raw)
  To: chet, Jerome Vouillon; +Cc: caml-list, blo.b

On Wed, Sep 01, 1999 at 02:40:21PM -0400, chet@watson.ibm.com wrote:
> Is there a description of the Ocaml object and
> "virtual-function-table" format?

I have not written any description but I can explain it rapidly here.
Objective Caml uses a scheme which is also used in the Gnu Objective C
compiler.

Each object holds a table containing its methods (the table is shared
among all objets of a same class). A unique integer is assigned at
run-time to each method name of a program.  This integer is used as an
index in this table to find the method. As the table is rather sparse,
it is implemented as a two-level array (an array of arrays of
functions). So, a method call
  "object#m e1 ... en"
is compile in something that looks like
  "object.(0).(idx mod N).(idx / N) objet e1 ... en"
where idx is the integer associated to the method name "m".

This scheme is very flexible and rather efficient. It would be hard to
improve it actually. Indeed, one will always need one indirection to
know what class the object belongs to and another indirection to
retrieve the method. So one could only gain one indirection (the
arithmetics is more or less free, as it can be done in parallel).
Actually, there is yet another indirection, not shown here, to
retrieve a pointer to the code from the method closure. This
indirection could probably be optimize a bit (by doing it in parallel
with the retrieval of the closure).

A great part of the cost of a method call also comes from the fact
that calling an unknown function is more expansive than calling a know
function : there is a check to see if the function is called with less
arguments than it needs, more arguments or exactly the right number of
arguments. The only way to optimize this would be to make a global
analyze of the program during the compilation. This analyze would
determine whether at a given method invocation location "x#m" all
methods that could possibly be called are called with the right number
of arguments, and would also check whether the method invocation could
be replaced by a function call (that is, whether only one method can
be called from this location). In order to provide more opportunities
for optimization, one would also probably need to do some partial
evaluation. Implementing all this would require a lot of work...

> Also, well, I think there's been some recent work on analyzing the
> path-length of O-O code, and the conclusion has been that in fact,
> methods do just "call one other" a lot.
> 
> That is, while C code is characterized by lots of tests and longer
> path-lengths per function-body, C++ (and Java, *especially* -- geez,
> it seems like Java code is all method-calls!) code tends to be short
> code-bodies, with branches implemented effectively by calling virtual
> functions.

OCaml is first a functional langage so I think (I hope) that people do
not use objects as heavily as in Java, and for instance use pattern
matching rather than only method calls for implementing branches.

-- Jérôme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Efficency in OCaml
  1999-09-01 23:09   ` Jerome Vouillon
@ 1999-09-04 14:26     ` Nicolas Ollinger
  1999-09-10 13:14       ` Jerome Vouillon
  1999-09-10 15:19     ` Hendrik Tews
  1 sibling, 1 reply; 7+ messages in thread
From: Nicolas Ollinger @ 1999-09-04 14:26 UTC (permalink / raw)
  To: Jerome Vouillon; +Cc: caml-list

On Thu, 2 Sep 1999, Jerome Vouillon wrote:

> On Wed, Sep 01, 1999 at 02:40:21PM -0400, chet@watson.ibm.com wrote:
> > Is there a description of the Ocaml object and
> > "virtual-function-table" format?

(snip description of buckets use)

I played a little with objects representation in OCaml 2.xx. As far as
I understand, at least in bytecode, class instances are represented as
boxed values tagged object_tag with n+2 fields : then first field is the
method array array described by Jerome in last mail, the second field
seems to be a unique id associate to the object, other fields are used
for instance variables in the order of declaration, inherited variables
first. As the method array array is unique for each class, it can be used
to identify the class (notice that classes are represented as global
variables). I'm intrigued by this second field, what is the use of this
id ? Where is the necessity to identify uniquely every object ?

Concerning marshaling of objects, a simple solution is to use a function
like:

let crunch o =
  let r = Obj.dup (Obj.repr o) in
  let idclass = compute_id (Obj.field r 0) in
  Obj.set_field r 1 idclass;
  Obj.set_tag r Obj.marshaled_object_tag
  r;;

with compute_id a function that deduce a unique class id of the object.

Then unmarshaling is just doing the inverse operation. Of course, if
you want to share objects between different programs then you must add
some informations about the module in which the class is declared, and
so one.

Any comment ?

N.
--

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Efficency in OCaml
  1999-09-04 14:26     ` Nicolas Ollinger
@ 1999-09-10 13:14       ` Jerome Vouillon
  0 siblings, 0 replies; 7+ messages in thread
From: Jerome Vouillon @ 1999-09-10 13:14 UTC (permalink / raw)
  To: Nicolas Ollinger, Jerome Vouillon; +Cc: caml-list

On Sat, Sep 04, 1999 at 05:26:33PM +0300, Nicolas Ollinger wrote:
> I played a little with objects representation in OCaml 2.xx. As far as
> I understand, at least in bytecode, class instances are represented as
> boxed values tagged object_tag with n+2 fields : then first field is the
> method array array described by Jerome in last mail, the second field
> seems to be a unique id associate to the object, other fields are used
> for instance variables in the order of declaration, inherited variables
> first. As the method array array is unique for each class, it can be used
> to identify the class (notice that classes are represented as global
> variables).

This is correct.

> I'm intrigued by this second field, what is the use of this
> id ? Where is the necessity to identify uniquely every object ?

This allows to compare objects using "(<)" or "compare" (and therefore
makes it easy to create sets of objects).

> Concerning marshaling of objects, a simple solution is to use a function
> like:
> 
> let crunch o =
>   let r = Obj.dup (Obj.repr o) in
>   let idclass = compute_id (Obj.field r 0) in
>   Obj.set_field r 1 idclass;
>   Obj.set_tag r Obj.marshaled_object_tag
>   r;;
> 
> with compute_id a function that deduce a unique class id of the object.
> 
> Then unmarshaling is just doing the inverse operation. Of course, if
> you want to share objects between different programs then you must add
> some information about the module in which the class is declared, and
> so one.

This would work. The hard part is to generate this information...

-- Jérôme




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Efficency in OCaml
  1999-09-01 23:09   ` Jerome Vouillon
  1999-09-04 14:26     ` Nicolas Ollinger
@ 1999-09-10 15:19     ` Hendrik Tews
  1999-09-10 19:03       ` chet
  1999-09-15 12:39       ` Jerome Vouillon
  1 sibling, 2 replies; 7+ messages in thread
From: Hendrik Tews @ 1999-09-10 15:19 UTC (permalink / raw)
  To: caml-list



Jerome Vouillon writes:
   From: Jerome Vouillon <Jerome.Vouillon@inria.fr>
   Date: Thu, 2 Sep 1999 01:09:39 +0200
   Subject: Re: Efficency in OCaml
   
   [...]
   
   Each object holds a table containing its methods (the table is shared
   among all objets of a same class). A unique integer is assigned at
   run-time to each method name of a program.  This integer is used as an
   index in this table to find the method. As the table is rather sparse,
   it is implemented as a two-level array (an array of arrays of
   functions). So, a method call
     "object#m e1 ... en"
   is compile in something that looks like
     "object.(0).(idx mod N).(idx / N) objet e1 ... en"
   where idx is the integer associated to the method name "m".
   
Sorry, I don't understand this. How can the compiler know idx, if
it is not known until run-time?

Bye,

Hendrik




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Efficency in OCaml
  1999-09-10 15:19     ` Hendrik Tews
@ 1999-09-10 19:03       ` chet
  1999-09-15 12:39       ` Jerome Vouillon
  1 sibling, 0 replies; 7+ messages in thread
From: chet @ 1999-09-10 19:03 UTC (permalink / raw)
  To: Hendrik Tews; +Cc: caml-list


Take a small program, e.g.

================================================================
class oa n =
  object
	val mutable a = n
	method get () = a
    method set a' = a <- a'
    method print_a () = (Printf.printf "A: %d\n" a; flush stdout)
  end
;;

print_string "foo\n";;
let ai = new oa 37;;

ai#print_a();;
================================================================

and run

% ocamlc -c -drawlambda o1.ml

on it.  Doing so on "oo.ml" (after having compiled "oo.mli" in the
same directory!) will also help illuminate the situation.

--chet--



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Efficency in OCaml
  1999-09-10 15:19     ` Hendrik Tews
  1999-09-10 19:03       ` chet
@ 1999-09-15 12:39       ` Jerome Vouillon
  1 sibling, 0 replies; 7+ messages in thread
From: Jerome Vouillon @ 1999-09-15 12:39 UTC (permalink / raw)
  To: Hendrik Tews, caml-list

On Fri, Sep 10, 1999 at 05:19:39PM +0200, Hendrik Tews wrote:
>    Each object holds a table containing its methods (the table is shared
>    among all objets of a same class). A unique integer is assigned at
>    run-time to each method name of a program.  This integer is used as an
>    index in this table to find the method. As the table is rather sparse,
>    it is implemented as a two-level array (an array of arrays of
>    functions). So, a method call
>      "object#m e1 ... en"
>    is compile in something that looks like
>      "object.(0).(idx mod N).(idx / N) objet e1 ... en"
>    where idx is the integer associated to the method name "m".
>    
> Sorry, I don't understand this. How can the compiler know idx, if
> it is not known until run-time?

idx is a variable which is bound at run-time at the beginning of the
toplevel module containing the method invocation.

-- Jérôme




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~1999-09-17 11:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <7qj9lv$aa9$1@goldenapple.srv.cs.cmu.edu>
1999-09-01 18:40 ` Efficency in OCaml chet
1999-09-01 23:09   ` Jerome Vouillon
1999-09-04 14:26     ` Nicolas Ollinger
1999-09-10 13:14       ` Jerome Vouillon
1999-09-10 15:19     ` Hendrik Tews
1999-09-10 19:03       ` chet
1999-09-15 12:39       ` Jerome Vouillon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).