Re: speed versus C

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Re: speed versus C
       [not found] <Pine.LNX.4.03.9910081713230.31666-100001@post.tepkom.ru>
@ 1999-10-10  4:51 ` skaller
  1999-10-11  9:08   ` Anton Moscal
  0 siblings, 1 reply; 32+ messages in thread
From: skaller @ 1999-10-10  4:51 UTC (permalink / raw)
  To: Anton Moscal; +Cc: Caml list

Anton Moscal wrote:

> Assignment to array element can be very ineffictive in O'Caml due to the
> following reasons:
> 
> 1)O'Caml uses generational GC. 

	Can you explain what a 'generational' collector is?


-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-10  4:51 ` speed versus C skaller
@ 1999-10-11  9:08   ` Anton Moscal
  0 siblings, 0 replies; 32+ messages in thread
From: Anton Moscal @ 1999-10-11  9:08 UTC (permalink / raw)
  To: skaller; +Cc: Caml list

On Sun, 10 Oct 1999, skaller wrote:

> > Assignment to array element can be very ineffictive in O'Caml due to the
> > following reasons:
> > 
> > 1)O'Caml uses generational GC. 
> 
> 	Can you explain what a 'generational' collector is?

http://www.iecc.com/gclist/GC-faq.html

This page contains very good introduction into main garbage collection 
algorithms. From this FAQ:

-------------------------
 Generational collection

Based on the observation that most objects have short lifetimes, it is
useful to restrict garbage collection to the most recently allocated objects. 

A generational collector maintains several ``generations'' of objects.
Newly created objects are all put in the ``youngest'' generation, and
when the space allocated for that generation is full, the collector will
use the root set (and any pointers from older generations to the youngest
generation -- see below) to reclaim dead objects from the youngest
generation only, leaving the ``older'' generations untouched. Objects
that survive after several (perhaps just one) collection of the youngest
generation are ``promoted'' to the next older generation, and when that
generation becomes full, it and all the generations younger than it are
collected. 

The difficulty with generational collectors is the identification of
pointers from older generations into younger ones. An observation is that
such references are uncommon in most programs: new objects typically
point to older objects; this is clearly true in pure functional languages
where assignment is prohibited, but is common for many programs and
languages. Nonetheless, such references do exist, and a collector must
handle them. When a pointer to a younger generation object is written
into an old object, the pointer (or just the old object) is recorded so
it can be scanned when the younger generation is collected.
------------------------

O'Caml uses simple algorithm with two generations: young & old objects. 

Regards,
Anton Moscal

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-12 13:21 Damien Doligez
@ 1999-10-12 20:42 ` skaller
  0 siblings, 0 replies; 32+ messages in thread
From: skaller @ 1999-10-12 20:42 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml-list

Damien Doligez wrote:

> As far as I can tell, you've been asking for three things:
> 
> 1. Efficient finalization, with finalization functions written in O'Caml.
> 2. Correct finalization of circular references.
> 3. Immediate finalization: as soon as a value becomes unreachable, the
>    finalization function is called.
 
> I don't know how to get all three together.  It is likely that we'll
> have 1 and 2 in a future release of O'Caml.

	1 and 2 is good news! Now, Python does (3) by reference counting.
So it is always possible to do reference counting to get immediate
finalisation, there is an overhead in both time and space when
the garbage collector is used as well, and of course it doesn't
finalise unreachable circular references immediately.
But it is a good compromise, I believe Perl does something like this 
(reference count, and clean up with a GC at the end).

-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-11 20:50                 ` Gerd Stolpmann
@ 1999-10-12 20:07                   ` skaller
  0 siblings, 0 replies; 32+ messages in thread
From: skaller @ 1999-10-12 20:07 UTC (permalink / raw)
  To: Gerd.Stolpmann; +Cc: William Chesters, caml-list

Gerd Stolpmann wrote:

> I was surprised by Caml when I tried it because it was not so slow than I
> expected. My point is that there are even some practically useful constructs in
> Caml which are faster than the construct a programmer would typically choose in
> the same situation in the "traditional" language. I can even imagine that
> functional languages are some day faster than traditional ones because medium-
> and high-level optimizations are better applicable. But this is only a dream.

I do not think it is a dream at all, but current reality with ocaml,
using it where it performs well. For example, a naive interpreter
for Python in ocaml is bound to be slower than C. However,
much sophistication is possible, such as infering types,
eliding dynamic lookup by string names, and recognising various
patterns, which, when implemented in C, would likely cost
a similar amount of processor time -- and take 10-100 times
longer to code, and remain unreliable. Perhaps C++ would
reduce the coding time to only 10 times longer, and improve
the reliability. But, for example, to add GC to Python,
is non-trivial in C, whereas in ocaml it is quite easy :-)

Consider also, FISh is an algol like language, like Ocaml
mixing functional and imperative features, and it 
outperforms C and Fortran for the same applications.
[For example, FISh quicksort is faster than the C
standard library qsort, because there is no function
calling overhead]

I think my point is that it isn't 'only a dream',
many people are not only dreaming, but implementing
some of it. [including the ocaml developers, for many years now].

-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
@ 1999-10-12 13:21 Damien Doligez
  1999-10-12 20:42 ` skaller
  0 siblings, 1 reply; 32+ messages in thread
From: Damien Doligez @ 1999-10-12 13:21 UTC (permalink / raw)
  To: caml-list

>From: skaller <skaller@maxtal.com.au>

>        However, C++ allows finalisation and ocaml doesn't,
>which is a serious problem in ocaml when it is needed.
>
>        I would very much like to see some discussion by ocaml experts
>(not me!) leading to a standard solution to this problem, probably

As far as I can tell, you've been asking for three things:

1. Efficient finalization, with finalization functions written in O'Caml.
2. Correct finalization of circular references.
3. Immediate finalization: as soon as a value becomes unreachable, the
   finalization function is called.

I don't know how to get all three together.  It is likely that we'll
have 1 and 2 in a future release of O'Caml.

-- Damien

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-10 20:48               ` William Chesters
  1999-10-10 23:54                 ` Alain Frisch
@ 1999-10-11 20:50                 ` Gerd Stolpmann
  1999-10-12 20:07                   ` skaller
  1 sibling, 1 reply; 32+ messages in thread
From: Gerd Stolpmann @ 1999-10-11 20:50 UTC (permalink / raw)
  To: William Chesters; +Cc: caml-list

On Sun, 10 Oct 1999, William Chesters wrote:

>   My point was simply that nearly every* feature of ocaml, however
>abstract in appearance, compiles directly, and compositionally, onto
>an idiom which one might well use in C or even assembler---give or
>take some amount of sugar.  Looking at this fact one way round, I
>observe that the reason ocaml is so fast is that it mostly* stays
>within the framework of the traditional computer model; looking at it
>from the other direction, I note that the constructs which ocaml maps
>onto the various different C idioms illuminate the "deeper meaning" of
>the latter in terms of a much more abstract semantics.

Perhaps I have a too traditional understanding of C idioms, I think they first
reflect what is simple to write in the context of the language. Of course,
there is the wish of the users of the language who want to express their ideas,
i.e. the conceptual side of abstraction. These wishes are really an interesting
field, they are driven by several motives. On the one hand, there is often a
technical need to use abstraction, e.g. to make the program structure simpler.
I think your Linux example falls into this category. On the other hand,
abstractions are the basis how we think about programs, because they instruct us
which thoughts are considered reasonable and which not. If you, for example,
study functional languages and then read your old C programs again, you might
detect some deeper meaning in it. I do not want to judge if there is really
deepness, but I think the more interesting point is that your way of thinking
about programs has changed in the meantime, and you can interpret the idioms of
the language in a different context. I still do not accept that the Linux
closure example has all aspects I expect from a closure, but I admit that you
can find many aspects of a closure in it (perhaps 90%), i.e. it can be
interpreted on this background.

The point is that although there is a technical (operational) meaning of a
programming construct this is not the whole truth. Since I program in Caml I
have learned to think with closures, i.e. I always consider to create an ad-hoc
function which refers to the current value of the variables. I would never do
so when programming in C unless because of a strong requirement; it would be too
much work. (And the aspect that closures often have an "ad hoc nature" is
completely lost.)

There are much more drastical examples where a computational model is simulated
by a different one. Ever seen a grammar working like a Turing machine? This is
not only of theoretical interest, because both models stress important aspects
of EVERY computational model, and by studying them you can get more insight
into them. For example, a grammar is by definition non-deterministic, and by
studying them you can learn what it really means that there is no specific
order how the instructions are executed. This lesson may be instructive if you
are going to program with lazy evaluation (which has an order, but impossible
to survey).

The relationship between C and Caml is similar because both have a different
"history of semantics", and each may illuminate the other. Of course, both
share that it must be possible to run the programs efficiently on real
hardware, and this makes the discussion so controversial.

>   Compare this with lazy languages, with which the whole discussion
>started: they must necessarily use the traditional CPU in a pretty
>contorted way to implement a basically foreign computational model.
>(Graph reduction, or however you like to present it.)  Compare it too
>with SML/NJ, which supports continuations and therefore has to
>allocate its stack frames on the heap---crazy, because continuations
>aren't all that useful (corresponding most closely to a non-local
>JMP), and noone seems to believe their protestations that this
>implementation carries 0 performance penalty.

I still think that the computational models of the CPU and Caml are very
different, and that it is a lucky result that lambda-calculus with strict
evaluation can be implemented on the CPU very efficiently. This is not a design
decision somewhere in the middle of the development, I think it is a starting
point. Lazy evaluation is a different starting point; you cannot really discuss
whether they are sensible or not, you can only discuss how well the resulting
language is applicable to your problem. I think continuations are different,
because they are not in the center of the language.

>   I contend that on the one hand stepping distinctly outside the
>traditional model means slowness, and on the other that the
>traditional model is not a bad one to think in, as long as your
>understanding of it is enriched by experiencing and preferably using a
>language like ocaml (and/or C++).

I was surprised by Caml when I tried it because it was not so slow than I
expected. My point is that there are even some practically useful constructs in
Caml which are faster than the construct a programmer would typically choose in
the same situation in the "traditional" language. I can even imagine that
functional languages are some day faster than traditional ones because medium-
and high-level optimizations are better applicable. But this is only a dream.

Gerd
--
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany                     
----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-10 23:54                 ` Alain Frisch
  1999-10-11 17:58                   ` William Chesters
@ 1999-10-11 19:32                   ` John Prevost
  1 sibling, 0 replies; 32+ messages in thread
From: John Prevost @ 1999-10-11 19:32 UTC (permalink / raw)
  To: caml-list

Alain Frisch <frisch@clipper.ens.fr> writes:

> Do you think it would be easy to design processors with built-in
> support for boxed values, GC tags, OO, etc ... that is, a concrete
> OCaml machine ?

I'd actually vote more for a system involving type directed GC,
although I don't know that it'd be appropriate to put into a
processor.

John.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-10 23:54                 ` Alain Frisch
@ 1999-10-11 17:58                   ` William Chesters
  1999-10-11 19:32                   ` John Prevost
  1 sibling, 0 replies; 32+ messages in thread
From: William Chesters @ 1999-10-11 17:58 UTC (permalink / raw)
  To: caml-list

Alain Frisch writes:
 > Do you think it would be easy to design processors with built-in support
 > for boxed values, GC tags, OO, etc ... that is, a concrete OCaml machine ?

This was tried in the 80s on quite a large scale by Symbolics, with
the Lisp machine (to which you will find 1000s of references on the
net).  It implemented some of the Lisp runtime, which is pretty
similar in conception to the ocaml runtime, in hardware.  It was all
very nicely done, and apparently the machines and OS were pretty
wonderful in many ways.  Enough examples were sold to upmarket
(e.g. AIish) firms and labs over a period of years to make it into a
bit of legend.

But of course they ran into a problem: the big firms spend billions on
developing sophisticated classical chips with incredibly large numbers
of transistors, to serve their vast market.  So the cost of fixing
your software performance hit just by buying a faster, but entirely
standard, computer is simply not big enough to support the development
of a whole different architecture.

Maybe things have matured and opened up now to the point where one
could take a readily available SPARC or StrongARM core and tack some
GC support onto it, I don't know.  Certainly Sun are hyping their MAJC
Java chip pretty strongly.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-10 20:48               ` William Chesters
@ 1999-10-10 23:54                 ` Alain Frisch
  1999-10-11 17:58                   ` William Chesters
  1999-10-11 19:32                   ` John Prevost
  1999-10-11 20:50                 ` Gerd Stolpmann
  1 sibling, 2 replies; 32+ messages in thread
From: Alain Frisch @ 1999-10-10 23:54 UTC (permalink / raw)
  To: caml-list

On Sun, 10 Oct 1999, William Chesters wrote:
>    My point was simply that nearly every* feature of ocaml, however
> abstract in appearance, compiles directly, and compositionally, onto
> an idiom which one might well use in C or even assembler---give or
> take some amount of sugar.  Looking at this fact one way round, I
<snip>
>    * apart from GC and the ocaml classes (of which I must admit I am
> slightly suspicious, because of the significant overhead in a method
> call---you don't really want to use them in an inner loop)

I would also add boxing/unboxing, and structural comparison to the list of
important features which aren't well implemented in classical
architecture.

Do you think it would be easy to design processors with built-in support
for boxed values, GC tags, OO, etc ... that is, a concrete OCaml machine ?

--
Alain Frisch




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-10 16:27             ` Gerd Stolpmann
@ 1999-10-10 20:48               ` William Chesters
  1999-10-10 23:54                 ` Alain Frisch
  1999-10-11 20:50                 ` Gerd Stolpmann
  0 siblings, 2 replies; 32+ messages in thread
From: William Chesters @ 1999-10-10 20:48 UTC (permalink / raw)
  To: caml-list

Gerd Stolpmann writes:
 > If you wanted to have a fully general substitute of closures in C (or
 > assembler), you could do it as follows: For every function store a function
 > pointer and an array of implicit parameters, e.g.

I'm not sure we are really connecting here.  The fragment I quoted
involved a table of functions which share "implicit parameters" (the
`file' struct)---i.e., a thinly disguised C++ object, implemented in
exactly the way cfront used to do it.

   (I wish I hadn't mentioned objects at all.  The simpler case of a
single function pointer associated with a single implicit parameter is
common in the APIs to numerical library routines.)

 > In object-oriented languages there is another way of paraphrasing
 > closures.

As I said, a closure is an object with only one method.

 > >(Though I'd argue that's because it sticks to
 > >abstractions that "ornament" the low-level computational model without
 > >"obscuring" it :-) .)
 > 
 > I think this is exactly the point where we have different opinions.

More like, we are understanding "the low-level model" to mean
different things.  I am happy to consider a function pointer plus a
persistent data record to "really be" a closure---something which one
might not realise before one was exposed to FLs, so that they enrich
and clarify one's understanding of low-level programming---whereas you
perhaps aren't?

   Give me a little credit and try to understand what I say charitably
:-).  I don't know what your background is, and I don't know how much
patience you have with "impressionistic" ideas.  But I did once study
formal semantics, domain theory and the deep way different
computational models relate to each other in some detail, so I am
perfectly well aware of what constitutes a tight argument in this
context.

   My point was simply that nearly every* feature of ocaml, however
abstract in appearance, compiles directly, and compositionally, onto
an idiom which one might well use in C or even assembler---give or
take some amount of sugar.  Looking at this fact one way round, I
observe that the reason ocaml is so fast is that it mostly* stays
within the framework of the traditional computer model; looking at it
from the other direction, I note that the constructs which ocaml maps
onto the various different C idioms illuminate the "deeper meaning" of
the latter in terms of a much more abstract semantics.

   * apart from GC and the ocaml classes (of which I must admit I am
slightly suspicious, because of the significant overhead in a method
call---you don't really want to use them in an inner loop)

   Compare this with lazy languages, with which the whole discussion
started: they must necessarily use the traditional CPU in a pretty
contorted way to implement a basically foreign computational model.
(Graph reduction, or however you like to present it.)  Compare it too
with SML/NJ, which supports continuations and therefore has to
allocate its stack frames on the heap---crazy, because continuations
aren't all that useful (corresponding most closely to a non-local
JMP), and noone seems to believe their protestations that this
implementation carries 0 performance penalty.

   I contend that on the one hand stepping distinctly outside the
traditional model means slowness, and on the other that the
traditional model is not a bad one to think in, as long as your
understanding of it is enriched by experiencing and preferably using a
language like ocaml (and/or C++).

   Anyway, thanks for the discussion!

Cheers,
William

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-08  0:26           ` William Chesters
@ 1999-10-10 16:27             ` Gerd Stolpmann
  1999-10-10 20:48               ` William Chesters
  0 siblings, 1 reply; 32+ messages in thread
From: Gerd Stolpmann @ 1999-10-10 16:27 UTC (permalink / raw)
  To: caml-list

On Fri, 08 Oct 1999, William Chesters wrote:

>OK, how about this real life example from the Linux kernel:
>
>	error = file->f_op->read(inode,file,buf,count);
>
>Here, `file' is a faked object, with `vtbl' = `f_op' and `this' passed
>in the second argument.  And what is a closure if not an object with
>one method :-) ?  I think this is quite a natural idiom to use, even
>in assembler---especially once one has seen how it can be given a nice
>meaning within a higher level framework like C++ or indeed Caml.

This is exactly what I mean because your example is not as general as a closure
in Caml. As far as I understand, the "f_op" component of "file" contains
pointers to the functions implementing the file operations of the various
filesystems. When the file is opened, these pointers are initialized with the
addresses of the functions of the file system where the file resides. Let me
show the difference to Caml by porting the definitions:

type file_operations =
  { llseek : file -> loff_t -> int -> loff_t;
    read   : file -> buffer -> loff_t -> size_t;
    (* and many others *)
  }
and file =
  { (* among other components: *)
    f_op : file_operations;
  }

When the f_op component is initialized functions are assigned to the "llseek",
"read" and the other components, e.g.

f_op.read <- (fun f b offset -> ... code ...)

Caml is more general because the code defining the function has access to EVERY
variable of the current scope, not only the parameters and global variables.
For example, you could use this to pass additional runtime configurations to
the functions:

let config = ... some value ... in
f_op.read <- (fun f b offset -> ... code accessing 'config' ...)

The value of 'config' is computed at the time the function is assigned, and it
is stored in a private array of implicitly passed parameters such that it is
still available when the function is actually called.

This is a language feature which can always be replaced by simpler
techniques. In this case you can simply extend the "file" struct by an
additional "config" component. 

If you wanted to have a fully general substitute of closures in C (or
assembler), you could do it as follows: For every function store a function
pointer and an array of implicit parameters, e.g.

struct file_operations {
        loff_t (*llseek) (struct file *, loff_t, int);
	void  **llseek_implicit_params;
        ssize_t (*read) (struct file *, char *, size_t, loff_t *);
	void  **read_implicit_params;
	...
}

When the function is assigned, you can now pass implicit parameters by
storing them into the array; in the function definition you can access the
parameters. The point is that every function definition can have its own array
of implicit parameters with its own meaning; to the outer world this is fully
abstract.

There are many reasons not to paraphrase closures as described, most important
you lose all type safety. For almost all situations in which closures would be
adequate it is also possible to make the implicit parameters explicit, and I
think most programmers do so.

In object-oriented languages there is another way of paraphrasing closures.
Define an abstract class with just one method (e.g. "read"); every
implementation of this method is done in subclasses. You can then pass implicit
parameters when the object is created and store them into instance variables.

>(Though I'd argue that's because it sticks to
>abstractions that "ornament" the low-level computational model without
>"obscuring" it :-) .)

I think this is exactly the point where we have different opinions.

Gerd
--
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany                     
----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-07 22:18     ` Gerd Stolpmann
@ 1999-10-08 19:15       ` skaller
  0 siblings, 0 replies; 32+ messages in thread
From: skaller @ 1999-10-08 19:15 UTC (permalink / raw)
  To: Gerd.Stolpmann; +Cc: caml-list

Gerd Stolpmann wrote:
> 
> >Also, where the library lacks certain facilities, it is occasionally
> >harder to provide efficient data structures than in C. These problems
> >are not intrinsic to ocaml, but can be solved by carefully considered
> >extension
> >of the standard library.
> 
> Any ideas?

	Sure: too many, and not enough ocaml experience.
But a good place to start would be to examine the C++ Standard Template
Library.
There are various 'classical' data structures which should be available:
singly linked lists, doubly linked lists, arrays of fixed length,
arrays of variable length, binary trees with various balancing acts,
quad trees, b-trees, hash tables, graphs, directed graphs, DAGS, stacks,
queues,
priority queues, ..

	There are also some more exotic ones: a generic garbage collector for
arbitrary resources, for example, would be interesting to consider.

	There are also a lot of numerical algorithms known :-)

> It's only from experience, and I have often good
> results with lists or trees.

	Yes, but they do not perform at all well when random access is
required.

> The world is full of such 'micky mouse' programs, and they are really used. Not
> freeing memory is very common if the amount of allocated data is not very high
> at all, or if the program runs only for a short time. There is nothing bad to
> say against it.

	I am not saying anything 'bad' against it, just that in such cases,
it is unlikely performance matters either: we should be considering
library code and intensive applications using it, when comparing
say, C/C++ and ocaml, since that is where the benefits of one or the
other really start to count.
 
> >       However, C++ allows finalisation and ocaml doesn't,
> >which is a serious problem in ocaml when it is needed.
> 
> Up to now I did not need finalisation, so I have no experience.

	I believe I would not design a program requiring it,
but my task is to implement an existing specification that
does require it. So I must find a solution, abandon the project,
modify the specifications, or try another language.

-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-05 20:20 ` Gerd Stolpmann
  1999-10-06 15:21   ` William Chesters
  1999-10-07  6:56   ` skaller
@ 1999-10-08 13:40   ` Anton Moscal
  2 siblings, 0 replies; 32+ messages in thread
From: Anton Moscal @ 1999-10-08 13:40 UTC (permalink / raw)
  To: Caml list

On Tue, 5 Oct 1999, Gerd Stolpmann wrote:

> On Sun, 03 Oct 1999, Jan Brosius wrote:
> >Hi, is there anything known about the efficiency of compiled Ocaml code
> >compared with C/C++
> 
> A good comparision of the low-level features of Ocaml and C are my
> implementations of the cryptographic algorithms Blowfish and DES. Such
> algorithms do many integer calculations, bit shifting, and array lookups, i.e.
> tasks where C is perfect, and Ocaml is not designed for. The manually optimized
> algorithms are 8 to 10 times slower than the C counterparts, and a factor of 2
> can be explained with Ocaml's problems when computing with 32 bit numbers. To
> get around this, all 32 bit calculations are simulated by doing them on two 16
> bit halves. If such problems do not play a role, Ocaml is at most 4 to 5 times
> slower than C.

Assignment to array element can be very ineffictive in O'Caml due to the
following reasons:

1)O'Caml uses generational GC. Any pointer storing not to fresh memory
must be protected by write barrier. In O'Caml this means that if
type of array elements isn't unboxed type (int, char, bool, simple
enumeration or float), than a.(i) <- x implementation calls `modify'
function, which takes about 20-30 commands. 

for example:
----------------------
let rec test (a:int array) f = function 
   0 -> a
 | n -> 
   for i = 0 to Array.length a - 1 do a.(i) <- f a.(i) done;
   test a f (n-1);;

test (Array.create 1000 0) succ 20000;;
----------------------
runs 4.5 time faster than
----------------------
let rec test a f = function 
   0 -> a
 | n -> 
   for i = 0 to Array.length a - 1 do a.(i) <- f a.(i) done;
   test a f (n-1);;

test (Array.create 1000 0) succ 20000;;
----------------------
(on native code compiler for x86)

2) access to polymorphic array elements not very effective due
to special representation of the float arrays (polymorphic code 
test type of array and use different code for float/non-float
cases).

Good way to improve efficiency in such cases - suppress needless
polymorphism by type constaints.

Anton

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-07 19:21         ` Gerd Stolpmann
  1999-10-08  0:26           ` William Chesters
@ 1999-10-08  9:56           ` Pierre Weis
  1 sibling, 0 replies; 32+ messages in thread
From: Pierre Weis @ 1999-10-08  9:56 UTC (permalink / raw)
  To: Gerd.Stolpmann; +Cc: caml-list

> in Caml imperative programming should not be preferred because
> the compiled code is better, but because the algorithm demands it, i.e. there
> are many algorithms which work magnitudes better when programmed imperatively.
> Think of the thousands of books explaining array algorithms; there
> are often no functional counterparts that can compete.

 I think your message is right, except may be for this last sentence
that can just be unreliable urbian folklore.

 Let me give you a surprising counter-example: ``once upon a time'',
many people from the Caml team were teaching ``algorithmic and
programmation'' courses, and some were tired of (bubble | heap | merge
| shell | quick | insertion | selection | mecanographe | bogo)-sorts
algorithms written in imperative (C | Pascal | Ada | Caml)-style. So,
those hackers decided to have some fun writing the fastest (in place)
sorting algorithm for Caml vectors. After a while, and a careful
benchmark campaign, it turns out that the fastest program was so
simple and trivial that it was almost unfair: it started by building a
list from the vector's elements, then called the merge sort algorithm
of the standard library and stored back to the vector the elements of
the sorted list! What an interesting imperative programming style!

 This old experimental result may be wrong today, due for instance to
the new optimizations of the current Caml compiler or to new hardware
specificities. However it is interesting to remember this story, just
to know that imperative programming is not uniformely better than
functional programming, even for vector sorting algorithms (at least
in languages that support the two programming paradigms, such as
... euh, I mean in Caml :).

 In conclusion, use whatever style leads to the most readable and
provably correct program for the problem at hand. Then optimize this
correct program, if it is absolutely mandatory and if you have enough
time.

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
@ 1999-10-08  6:57 Pascal Brisset
  0 siblings, 0 replies; 32+ messages in thread
From: Pascal Brisset @ 1999-10-08  6:57 UTC (permalink / raw)
  To: caml-list


Une petit benchmark qui pourrait être à l'avantage de C: crible d'Eratosthène

--8<------------------------------------------
let n = int_of_string Sys.argv.(1) in
let crible = Array.create n true in
for i = 2 to n-1 do
  if crible.(i)
  then begin
    for j = 2 to (n-1)/i do
      crible.(i*j) <- false
    done
  end
done
--8<------------------------------------------


--8<------------------------------------------
main(int argc, char **argv)
{
  int n = atoi(argv[1]);
  int crible[n], i, j;
  for(i=0; i < n; i++) { crible[i] = 1; }
  for(i = 2; i < n; i++) {
    if (crible[i]) {
      for(j = 2; j <= (n-1)/i; j++) {
	crible[i*j] = 0;
      }
    }
  }
}
--8<------------------------------------------


Sur un Pentium 400Mhz sous Linux:

sepia[854]% gcc -O4 premiers.c
sepia[855]% time a.out 2000000
0.860u 0.030s 0:00.91 97.8%     0+0k 0+0io 91pf+0w
sepia[856]% ocamlopt premiers.ml
sepia[857]% time a.out 2000000
0.910u 0.040s 0:00.96 98.9%     0+0k 0+0io 133pf+0w

 Seulement 5% de différence sur des opérations basiques entières, pas de quoi
se priver.
 Pour des benchmarks sophistiqués, méfions-nous; on a trop souvent tendance à
comparer une solution (en C) développée sur plusieurs années par des experts
à une solution (en Caml) écrite sur un coin de table « juste pour voir » ...

--Pascal Brisset




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-07 19:21         ` Gerd Stolpmann
@ 1999-10-08  0:26           ` William Chesters
  1999-10-10 16:27             ` Gerd Stolpmann
  1999-10-08  9:56           ` Pierre Weis
  1 sibling, 1 reply; 32+ messages in thread
From: William Chesters @ 1999-10-08  0:26 UTC (permalink / raw)
  To: Gerd.Stolpmann; +Cc: William Chesters, caml-list

Gerd Stolpmann writes: 
 > >For me, the kind of elegance and beauty you want in a language
 > >comes not from constructing castles in the air, but from using
 > >abstract ideas to understand the real world better.  ocaml says
 > >"look, this is what you really mean when you write machine code".
 > 
 > I agree only partly. [...] For example, I cannot even imagine an
 > assembler program that uses closures (paraphrased by machine
 > instructions); there is always a much simpler way to get the same
 > effect.

OK, how about this real life example from the Linux kernel:

	error = file->f_op->read(inode,file,buf,count);

Here, `file' is a faked object, with `vtbl' = `f_op' and `this' passed
in the second argument.  And what is a closure if not an object with
one method :-) ?  I think this is quite a natural idiom to use, even
in assembler---especially once one has seen how it can be given a nice
meaning within a higher level framework like C++ or indeed Caml.

 > I like Caml because it does not waste resources, and because it
 > shows how cheap abstraction can be.

I can but agree ...  (Though I'd argue that's because it sticks to
abstractions that "ornament" the low-level computational model without
"obscuring" it :-) .)

 > I have done some benchmarks in the meantime:

Thanks, they were interesting (I was wrong about vectors being quicker
to construct).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-07  6:56   ` skaller
  1999-10-07 12:37     ` Xavier Urbain
@ 1999-10-07 22:18     ` Gerd Stolpmann
  1999-10-08 19:15       ` skaller
  1 sibling, 1 reply; 32+ messages in thread
From: Gerd Stolpmann @ 1999-10-07 22:18 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

On Thu, 07 Oct 1999, John Skaller wrote:
>	The factor is likely much worse than two. Doing multiple precision
>arithmetic, the factor is two, but even specially optimised simulations
>of 32 bit ints by 16 bit ones often involve a more than two
>instructions,
>and this impacts cache performance severely in tight loops.

It is very hard to estimate this. I know that there are more instructions
necessary but I do not know how much they weight compared with the whole task.
Cache performance decreases obviously because the double amount of data must be
processed (cryptographic algorithms perform lookups in precalculated tables).
The cache efficiency is hard to predict, I had strange effects while manually
optimizing the code.

>> As already pointed out, Ocaml is not designed for such problems, and this
>> slow-down factor is an upper limit. 
>
>	That depends on what kinds of data structures you are working with.
>Ocaml has some problems requiring many things to be initialised
>unnecessarily.
>Also, where the library lacks certain facilities, it is occasionally
>harder to provide efficient data structures than in C. These problems
>are not intrinsic to ocaml, but can be solved by carefully considered
>extension
>of the standard library.

Any ideas?

>>The typical application will have a speed
>> which is comparable with C/C++ solutions, as there are a number of advantages:
>> 
>> - The recursive data types of Ocaml are often more problem-oriented than
>>   "imperative" data structures. 
>
>	I don't think it is entirely reasonably to claim this; I find that
>the 'imperative' data structures are more generally applicable.
>Singly linked immutable lists are a special case well supported by ocaml
>(and most other functional languages) but their performance is abysmal
>for applications for which they are not suited.

No, not entirely reasonable. It's only from experience, and I have often good
results with lists or trees. "Problem-oriented" simply means that lists and
trees are very frequent in real world problems; I do not want to claim that
they work well for any problem, this is definitely not true.

>> - Memory management is much better; but this counts only for long-running
>>   applications. Many C/C++ programs only "malloc" and do not "free", so the
>>   time consumption of these functions isn't a problem.
>
>	I do not think 'micky mouse' programs that fail to release
>memory are worth considering here. C++ memory management can be
>very difficult to get both correct and efficient, and sometimes
>implementation details invade the program structure.

The world is full of such 'micky mouse' programs, and they are really used. Not
freeing memory is very common if the amount of allocated data is not very high
at all, or if the program runs only for a short time. There is nothing bad to
say against it.

>	However, C++ allows finalisation and ocaml doesn't,
>which is a serious problem in ocaml when it is needed.

Up to now I did not need finalisation, so I have no experience.

Gerd

--
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany                     
----------------------------------------------------------------------------




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-07 10:46       ` William Chesters
  1999-10-07 15:48         ` Pierre Weis
@ 1999-10-07 19:21         ` Gerd Stolpmann
  1999-10-08  0:26           ` William Chesters
  1999-10-08  9:56           ` Pierre Weis
  1 sibling, 2 replies; 32+ messages in thread
From: Gerd Stolpmann @ 1999-10-07 19:21 UTC (permalink / raw)
  To: William Chesters; +Cc: caml-list

On Thu, 07 Oct 1999, William Chesters wrote:
>Gerd Stolpmann writes:
> > The basis of your argumentation is that the array solution works
> > better with current hardware and typical problem sizes. For
> > example, memcpy is very fast because it is specially supported by
> > the hardware, much faster than initializing the same number of
> > elements of a list. This is absolutely true, but I think that it
> > only demonstrates that my example was too simple.
>
>... whereas I believe that it is merely part of a much more general
>point.  You almost sound like you think the efficiency of arrays
>should be discounted because it only applies "with current hardware
>and typical problem sizes".  It's dangerous to get so carried away by
>abstraction that you consider exploiting well-known facts about the
>way the computer really is to be cheating.

No, I do not mean it this way. I only want to point out that there is a
relation between the efficiency of arrays in C and the architecture of
hardware. I do not see any hardware with a different kind of memory that is not
organized in rows and columns. -- C is closer to the structure of the hardware
than Caml; for example you have addresses and you can even calculate with them.
Because of this it is simpler to exploit this specific structure, and get
faster running code.

>   ocaml is a great language precisely because it doesn't disdain to
>get its hands dirty---it knows very well what you have to do to solve
>real problems on real computers.  For me, the kind of elegance and
>beauty you want in a language comes not from constructing castles in
>the air, but from using abstract ideas to understand the real world
>better.  ocaml says "look, this is what you really mean when you write
>machine code".

I agree only partly. Most of the really "dirty" work is hidden, the programmer
does not see it. I like this, and I must praise the Caml developers that they
refuse to break holes into the abstraction (e.g. by allowing uninitialized
arrays). The point where I disagree is that I do not want to write machine
code, and that I do not recognize where Caml explains the semantics of machine
code. For example, I cannot even imagine an assembler program that uses
closures (paraphrased by machine instructions); there is always a much simpler
way to get the same effect. (On the other hand, closures are no castles in
the air, but very useful.) People want to solve real problems on real computers,
but I would stress that they want to solve PROBLEMS. It is often not so
important how much the hardware costs because software is much more expensive;
I'm currently woking for a project with much Perl and Java code, and they
simply bought a server with 12 CPUs and 4 Gig RAM. I do not want to defend
this. I like Caml because it does not waste resources, and because it shows how
cheap abstraction can be.

>   Incidentally the example you give of C strings is another case
>where on reflection you find that the "unsophisticated" way is not so
>bad ...

Because it is simple to handle, and not so error-prone. But it is not the
fastest way to manage strings, and it is simple to program really slow
algorithms which are a linear factor slower than necessary. It depends on the
application which way is better. I only want to point out that the language
means which can be managed conveniently in C do not automatically lead to
efficient code, and that Caml has a chance to compete with C.

>
> > My first example looks now far better, because you have forgotten
> > to count the function calls to hide the array implementation. In
> > Caml, they are not necessary.
>
>Er, what function calls?  In most Cs I can explicitly mark them
>"inline" (which I can't in ocaml (hint!!!)).  Let's try and bear in
>mind that good C can be surprisingly readable.

Inlining can have a contradictory effect because the cache efficiency decreases.
It is difficult for compilers to decide whether better to inline or not. This
means that even if you inline functions there is some additional time you did
not mention.

I have done some benchmarks in the meantime: A C program collecting ten
millions integers in a growing array versus a Caml program collecting the same
amount of integers in a list. The results:

- The unoptimized C program: 1.1 sec
- The optimized C program: 0.5 sec
- The Caml program with standard GC settings: 10.3 sec
- The Caml program with best GC settings (i.e. GC never happens): 0.28 sec

This means that memory management has far more impact on the result than any
other detail discussed so far. And it is very difficult to compare memory
management of a typical libc and Caml's GC because it depends on the whole
application which effects turn out to be important and which effects are
negligible. In my benchmark the C program had an advantage because the simpler 
memory management matches better the simple situation; in a complex real world
situation with fragmented memory the C program would be much slower.

> > Caml is simply not good at arrays.
>
>No, it's fine actually, that's one of the reasons why it's so great.

See the discussion about uninitialized arrays. I think Caml's arrays are ok,
but it is a bad idea to use them in the same universal way as in C. They are
rather slow when you try to implement growing buffers with them; often a
mixture of techniques (e.g. lists of arrays) is best. I like Caml because I can
combine functional and imperative structures.

> > I think it is not a good idea to adopt too much imperative style.
>
>Equally, it's not a good idea to adopt too much ref-trans style, and
>ocaml, thank heavens, supports both very well.  Imperative programming
>is either better from an efficiency point of view, or stylistically
>closer to the best way of thinking about the problem, or both.

I agree, and I often program in an imperative way because of the reasons you
mention. But in Caml imperative programming should not be preferred because
the compiled code is better, but because the algorithm demands it, i.e. there
are many algorithms which work magnitudes better when programmed imperatively.
Think of the thousands of books explaining array algorithms; there are often no
functional counterparts that can compete.

I think that one of the most important advantages of Caml is that there is no
reason to avoid functional programming because of bad translation to machine
code. For many tasks, both styles are equally well; sometimes functional
style is more readable, sometimes imperative style. In some cases you need
imperative style because of theoretical complexity, and (more often) because of
better control of side-effects.

Gerd

----------------------------------------------------------------------------

The Caml program:

let rec collect l n =
  if n < 10000000 then
     collect (n::l) (n+1)
  else
     l
;;

collect [] 0;;

----------------------------------------------------------------------------

The C program:

#define INIT_SIZE 1

#include <stdlib.h>

typedef struct collection {
    int n_used;
    int n_size;
    int *a;
} collection;

void add(collection *c, int x) {
    int used, size;
    int *b;
    if ((used = c->n_used) >= c->n_size) {
	size = c->n_size;
	b = (int *) malloc((size << 1)*sizeof(int));
	if (b == NULL) exit(1);
	memcpy(b, c->a, used*sizeof(int));
	free(c->a);
	c->a = b;
	c->n_size <<= 1; 
    };
    c->a[used] = x;
    c->n_used++;
}

collection *make(void) {
    collection *c;
    c = malloc(sizeof(struct collection));
    if (c == NULL) exit(1);
    c->a = malloc(INIT_SIZE * sizeof(int));
    if (c->a == NULL) exit(1);
    c->n_size = INIT_SIZE;
    c->n_used = 0;
    return c;
}

int main () {
    int i;
    collection *c;
    c = make();
    for (i=0; i<10000000; i++) 
	add(c,i);
    return 0;
}

--
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany                     
----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-07 10:46       ` William Chesters
@ 1999-10-07 15:48         ` Pierre Weis
  1999-10-07 19:21         ` Gerd Stolpmann
  1 sibling, 0 replies; 32+ messages in thread
From: Pierre Weis @ 1999-10-07 15:48 UTC (permalink / raw)
  To: William Chesters; +Cc: caml-list

> Equally, it's not a good idea to adopt too much ref-trans style, and
> ocaml, thank heavens, supports both very well.  Imperative programming
> is either better from an efficiency point of view, or stylistically
> closer to the best way of thinking about the problem, or both.

Yes, Caml was designed to provide as good support as possible for both
functional and imperative programming style, and to encourage
a smooth interaction between the two styles in user's programs.

To be precise, we claim for decades now, that the power of Caml is
its ability to handle a free mixture of various programming styles.

That's why the logo I prefer for Caml is a Yin-Yang sign where the dots
are replaced by a $\lambda$ and a \verb":="...
(See http://cristal.inria.fr/~weis/books-fra.html to get an idea of
the image).

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-06 15:21   ` William Chesters
  1999-10-06 22:49     ` Gerd Stolpmann
@ 1999-10-07 15:25     ` Markus Mottl
  1 sibling, 0 replies; 32+ messages in thread
From: Markus Mottl @ 1999-10-07 15:25 UTC (permalink / raw)
  To: William Chesters; +Cc: OCAML

>    Incidentally, implementing a Vector in ocaml is slightly fiddly,
> because you have to keep valid pointers of the right type in even the
> unused part all the time.  This means o.a. delaying the creation of
> the underlying array until you have at least one element to put in it.

I have just finished implementing a fully featured efficient automatically
resizing array which gets rid of this problem (by using Obj.magic
internally, but the user of this module won't see this outside). Another
version treats integer and float arrays differently to exploit the fact
that values in it are unboxed. Even resizable strings can be used with it
(space and time efficiently).

You can also use customized resizing strategies which you can change
at runtime.

I am nearly ready testing the code now and will write some documentation
on it. I hope that it can be released sometime next week.

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
@ 1999-10-07 13:00 STARYNKEVITCH Basile
  0 siblings, 0 replies; 32+ messages in thread
From: STARYNKEVITCH Basile @ 1999-10-07 13:00 UTC (permalink / raw)
  To: caml-list

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 2856 bytes --]

in september 1998, on a previous job (in the same organization, CEA,
but in a different department), I did work (several months) on
benchmarking various langages and implementations. My former collegue
Emmanuel Dorlet asked me to mail some timing results.

My tiny benchmark was related to computing the sum of inverses of
integers (a serie converging to pi^2/6 as we all know)

let onsquare i j =
  let ii = float i
  and jj = float j
  in ii /. ( jj *. jj)
;;

let test2 n p =
  for i=1 to p do
    let rec loop j s =
      if (j <= 0) then s
      else loop (j-1) (s +. onsquare i j)
    in
    let s = loop n 0.0 in
    Format.printf "n=%d i=%d s=%f\n" n i s
  done
;;

On that time I benched ocaml 1.07 or 2.00. Plateform was a PC/Linux
(Redhat-5.2, Pentium II 300? MHz). Here are the results, mesuring the
internal iteration time (running equivalent of test2 1000000 50, or
test2 10000 50 for slow implementations), which is the total CPU time
divided by n*p

Notice that this benchmark is a numerical test, and numbers is a
domain where Ocaml is rather poor (integers are tagged, and floating
point numbers are often boxed)

I tested ocaml in an imperative (for loop with reference) and
functional (terminal recursion)

langage impl.      time/iteration    slowdown/C
                    CPU microsec

Fortran77 compiled       0.124µs      0.95     
C compiled               0.131µs      1    
Ocaml1.07 compiled       0.408µs      3.11    
Java {JDK1.1.5}          1.31µs       10    
Ocaml {1.07 fonctionnal} 1.92µs       14.66   
Ocaml {1.07 impérative}  2.014µs      15.37   
Ocaml {2.00 fonctionnal} 2.112µs      16.12   
Lua 3.1                  3.43µs       26.18   
Scheme {vscm 0.4}        16.64µs      127   
Python 1.5               17.93µs      137   
Perl 5.003               18.62µs      142   
Scheme {guile 1.2}       34.37µs      262   
Tcl8.0                   158.3µs      1208   
Gibiane                  2656µs       20275  

For your information, Gibiane was an internal scripting langage
(simliar to Tcl, implemented in a Fortran dialect) which should now be
dismissed.

My personal opinion is that there are domain (such as tree
manipulation) where Ocaml outperform C or C++ (but that did not
convince my current boss).

Choosing Ocaml is not a technical issue. It is a managerial and
psychological issue. I am bad on both.

Regards.

N.B. Any opinions expressed here are only mine, and not of my organization.
N.B. Les opinions exprimees ici me sont personnelles et n engagent pas le CEA.

---------------------------------------------------------------------
Basile STARYNKEVITCH   ----  Commissariat à l Energie Atomique 
DTA/LETI/DEIN/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX * France
phone: 1,69.08.60.55; fax: 1.69.08.83.95 home: 1,46.65.45.53
email: Basile point Starynkevitch at cea point fr 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-07  6:56   ` skaller
@ 1999-10-07 12:37     ` Xavier Urbain
  1999-10-07 22:18     ` Gerd Stolpmann
  1 sibling, 0 replies; 32+ messages in thread
From: Xavier Urbain @ 1999-10-07 12:37 UTC (permalink / raw)
  To: caml-list

While talking about efficiency of ocaml versus C, I have to say that
as JC-Filliatre said before concerning gmp and num, we tried several
algorithms in order to compute huge fibonacci numbers. We are as
efficient (and in one case much more) as C with nice readable code as
a bonus. Actually most of the time is spent in gmp (far better than num
in THAT case) so... I should put those files on my web page.

Concerning other problems like "solitaire" solver or emacs' mpuz
solver (without any extenal library) we have quite comparable times.

I strongly agree with Gerd Stolpmann when he write that ocaml offer
the opportunity of coding directly (I should add "naturally") more
sophisticated algorithms.

Finally remember that the ocamlopt compiler makes NO OPTIMIZATION (not
even multiplication by constant).
Try then C code compiled without flags such as -O3
-fomit-frame-pointer and so on.

  Xavier
-- 

Xavier Urbain		
---------------------------------------------------------------
L.R.I., Bât 490	                   mailto: Xavier.Urbain@lri.fr
Université de Paris-Sud	           phoneto:  (33) 1 69 15 42 32
F-91405 Orsay cedex                faxto:    (33) 1 69 15 65 86

http://www.lri.fr/Francais/Recherche/demons/membres/urbain.html
---------------------------------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-06 22:49     ` Gerd Stolpmann
  1999-10-07 10:26       ` Michel Quercia
@ 1999-10-07 10:46       ` William Chesters
  1999-10-07 15:48         ` Pierre Weis
  1999-10-07 19:21         ` Gerd Stolpmann
  1 sibling, 2 replies; 32+ messages in thread
From: William Chesters @ 1999-10-07 10:46 UTC (permalink / raw)
  To: caml-list

Gerd Stolpmann writes:
 > The basis of your argumentation is that the array solution works
 > better with current hardware and typical problem sizes. For
 > example, memcpy is very fast because it is specially supported by
 > the hardware, much faster than initializing the same number of
 > elements of a list. This is absolutely true, but I think that it
 > only demonstrates that my example was too simple.

... whereas I believe that it is merely part of a much more general
point.  You almost sound like you think the efficiency of arrays
should be discounted because it only applies "with current hardware
and typical problem sizes".  It's dangerous to get so carried away by
abstraction that you consider exploiting well-known facts about the
way the computer really is to be cheating.

   ocaml is a great language precisely because it doesn't disdain to
get its hands dirty---it knows very well what you have to do to solve
real problems on real computers.  For me, the kind of elegance and
beauty you want in a language comes not from constructing castles in
the air, but from using abstract ideas to understand the real world
better.  ocaml says "look, this is what you really mean when you write
machine code".

   Incidentally the example you give of C strings is another case
where on reflection you find that the "unsophisticated" way is not so
bad ...

 > My first example looks now far better, because you have forgotten
 > to count the function calls to hide the array implementation. In
 > Caml, they are not necessary.

Er, what function calls?  In most Cs I can explicitly mark them
"inline" (which I can't in ocaml (hint!!!)).  Let's try and bear in
mind that good C can be surprisingly readable.

 > Caml is simply not good at arrays.

No, it's fine actually, that's one of the reasons why it's so great.

 > I think it is not a good idea to adopt too much imperative style.

Equally, it's not a good idea to adopt too much ref-trans style, and
ocaml, thank heavens, supports both very well.  Imperative programming
is either better from an efficiency point of view, or stylistically
closer to the best way of thinking about the problem, or both.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-06 22:49     ` Gerd Stolpmann
@ 1999-10-07 10:26       ` Michel Quercia
  1999-10-07 10:46       ` William Chesters
  1 sibling, 0 replies; 32+ messages in thread
From: Michel Quercia @ 1999-10-07 10:26 UTC (permalink / raw)
  To: caml-list

William Chesters :
:   Incidentally, implementing a Vector in ocaml is slightly fiddly,
: because you have to keep valid pointers of the right type in even the
: unused part all the time.  This means o.a. delaying the creation of
: the underlying array until you have at least one element to put in it.

I disagree, better fill-in the vector with dummy values on creation (fe.
Val_unit), otherwise you may keep pointers to data that can not be reclaimed,
though semantically unreachable.

Gerd.Stolpmann :
: Caml is simply not good at arrays. I think it is not a good idea to adopt too
: much imperative style.

Please, let the user decide by himself if he wants to write imperative or
functionnal code. Depending on the particular problem he his solving, his own
preferences and sometimes also the weather of the day, he may choose one or the
other style or even mix both. Why not ?

--
Michel Quercia
9/11 rue du grand rabbin Haguenauer, 54000 Nancy
http://pauillac.inria.fr/~quercia
mailto:quercia@cal.enst.fr

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-05 20:20 ` Gerd Stolpmann
  1999-10-06 15:21   ` William Chesters
@ 1999-10-07  6:56   ` skaller
  1999-10-07 12:37     ` Xavier Urbain
  1999-10-07 22:18     ` Gerd Stolpmann
  1999-10-08 13:40   ` Anton Moscal
  2 siblings, 2 replies; 32+ messages in thread
From: skaller @ 1999-10-07  6:56 UTC (permalink / raw)
  To: Gerd.Stolpmann; +Cc: caml-list

Gerd Stolpmann wrote:
> 
> On Sun, 03 Oct 1999, Jan Brosius wrote:
> >Hi, is there anything known about the efficiency of compiled Ocaml code
> >compared with C/C++
> 
> A good comparision of the low-level features of Ocaml and C are my
> implementations of the cryptographic algorithms Blowfish and DES. Such
> algorithms do many integer calculations, bit shifting, and array lookups, i.e.
> tasks where C is perfect, and Ocaml is not designed for. The manually optimized
> algorithms are 8 to 10 times slower than the C counterparts, and a factor of 2
> can be explained with Ocaml's problems when computing with 32 bit numbers. 

	The factor is likely much worse than two. Doing multiple precision
arithmetic, the factor is two, but even specially optimised simulations
of 32 bit ints by 16 bit ones often involve a more than two
instructions,
and this impacts cache performance severely in tight loops.


> As already pointed out, Ocaml is not designed for such problems, and this
> slow-down factor is an upper limit. 

	That depends on what kinds of data structures you are working with.
Ocaml has some problems requiring many things to be initialised
unnecessarily.
Also, where the library lacks certain facilities, it is occasionally
harder to provide efficient data structures than in C. These problems
are not intrinsic to ocaml, but can be solved by carefully considered
extension
of the standard library.

	There is also a problem with finalisation, which is not supported
by the garbage collector (at the user level): savings made using GC
to reduce code complexity and (possibly) enhance performance can go down
the
drain compared with manual memory management if finalisation is
required.

>The typical application will have a speed
> which is comparable with C/C++ solutions, as there are a number of advantages:
> 
> - The recursive data types of Ocaml are often more problem-oriented than
>   "imperative" data structures. 

	I don't think it is entirely reasonably to claim this; I find that
the 'imperative' data structures are more generally applicable.
Singly linked immutable lists are a special case well supported by ocaml
(and most other functional languages) but their performance is abysmal
for applications for which they are not suited.

>   Consider you want to fill an (C) array with an
>   unknown number of elements. The typical solution puts the elements into the
>   array until it is full, and then enlarges the array by reallocating new
>   memory. If you use the Ocaml lists instead, you do not have this problem. Of
>   course, you can program linked lists in C, too, but they are not as
>   convenient as arrays are, so many people avoid them. 

	This may be true in C, but it is NOT true in C++, in which the
Standard Template Library supports efficient data structures of various
kinds including lists and arrays (vectors).

> - OCaml code expresses the same with fewer lines of code, so it is possible to
>   write more complicated solutions, and to prefer a more sophisticated approach
>   over the straight-forward solution.

	I agree.  Furthermore, make times are very fast, to the point
that very rapid prototyping is possible, yet such code will perform
well even before tuning,  or native code generation.

	In particular, I think that higher order functions simplify coding
enormously, and reduce program size significantly; but variants and
matching are also likely to be faster than the 'hand coded' C/C++
equivalents,
and far more likely to be correct.

> - Memory management is much better; but this counts only for long-running
>   applications. Many C/C++ programs only "malloc" and do not "free", so the
>   time consumption of these functions isn't a problem.

	I do not think 'micky mouse' programs that fail to release
memory are worth considering here. C++ memory management can be
very difficult to get both correct and efficient, and sometimes
implementation details invade the program structure.

	However, C++ allows finalisation and ocaml doesn't,
which is a serious problem in ocaml when it is needed.

	I would very much like to see some discussion by ocaml experts
(not me!) leading to a standard solution to this problem, probably
a combination of some ocaml library code, some changes to the 
garbage collector to ensure efficiency, and some requirements
on the client. I need to call python __del__ methods on classes 
_immediately_ when they become unreferenced, and _some time or other_
when they become otherwise unreachable. 

	A technique for doing this using a strong array of objects,
a weak array of symbols for them, and a map recording symbolic
dependencies will work, but maintaining these data structures
is likely to be difficult to get right, will
obscure the currently clean code (both these problems would probably be
alievated with Haskell style monads), and will duplicate a lot of the
work
which the existing efficient collector does, and do so inefficiently.
I am not happy with this solution. Python uses reference counting,
which is reasonably fast, provides synchronous finalisation, but fails
to handle circular references: I would like to obtain better performance
_and_ handle circular references. This could be done for C Python
using the Boehm collector.

-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-06 15:21   ` William Chesters
@ 1999-10-06 22:49     ` Gerd Stolpmann
  1999-10-07 10:26       ` Michel Quercia
  1999-10-07 10:46       ` William Chesters
  1999-10-07 15:25     ` Markus Mottl
  1 sibling, 2 replies; 32+ messages in thread
From: Gerd Stolpmann @ 1999-10-06 22:49 UTC (permalink / raw)
  To: William Chesters; +Cc: caml-list

On Wed, 06 Oct 1999, William Chesters wrote:
>Gerd Stolpmann writes:
> > - The recursive data types of Ocaml are often more problem-oriented than
> >   "imperative" data structures. Consider you want to fill an (C) array with an
> >   unknown number of elements. The typical solution puts the elements into the
> >   array until it is full, and then enlarges the array by reallocating new
> >   memory. If you use the Ocaml lists instead, you do not have this problem.
>
>   But you do have N little problems (N memory allocations and
>structure initialisations); you waste large amounts of memory in
>headers and pointers, thereby filling up your caches; and you end up
>with a data structure which takes at just as long to traverse and
>entirely fails to support any other kind of access.  If you do the
>sums, you will find that an array-based data structure (`Vector')
>works out cheaper any way you care to look at it---as long as you
>increase its capacity exponentially, e.g. doubling it each time it
>runs out.  This stipulation is not a problem because the space you
>thereby waste is still less than the amount you would lose to the
>spine of a list (and is less pernicious since it doesn't have to churn
>through the cache).

The basis of your argumentation is that the array solution works better with
current hardware and typical problem sizes. For example, memcpy is very fast
because it is specially supported by the hardware, much faster than
initializing the same number of elements of a list. This is absolutely true,
but I think that it only demonstrates that my example was too simple.

Caml can only be faster than C if you take into account that programmers are not
perfect. In C, they often do not choose the most efficient data structure
because it would be complicated and/or require an enormous programming
discipline. An example are C strings which are far from being optimal, and the
alternative would be to use a library which implements strings with a length
field. In order to be compatible with the existing kind of strings these
improved strings are typically not abstract, and simply define a struct such
that you can anywhere access their components: just another source of errors.
There are a lot more problems with sophisticated data structures, such as
uninitialized pointers, the question who is responsible for memory deallocation,
and indexes out of the bounds of arrays. Some of these problems can be solved
by additional layers, but then C is not faster any longer. This is always the
same bad alternative in C: Write an efficient program and accept the risk of
errors, or write an average program and have some more protection against your
own imperfectness.

In Caml, choosing sophisticated data structures is less error-prone, and
programmers are more willing to implement and use them. Even the basic data
types (i.e. without abstraction layers) provide often a good protection against
stupid faults, and you can use them directly without wrapping them in functions
or modules. This simply means that a comparable programming style which tries
to avoid errors leads to faster code in Caml than in C, because you do not need
to encapsulate the details of accessing the structure. For example, list
construction x::l leads to very efficient code in Caml, and the comparable
implementation in C requires an additional function abstraction which
guarantees that the new list cell is properly initialized.

I know that there are some hackers who can live without abstraction but the
avarage programmer is happy if he/she has a means to avoid unnecessary errors.

My first example looks now far better, because you have forgotten to count the
function calls to hide the array implementation. In Caml, they are not
necessary.

>   Incidentally, implementing a Vector in ocaml is slightly fiddly,
>because you have to keep valid pointers of the right type in even the
>unused part all the time.  This means o.a. delaying the creation of
>the underlying array until you have at least one element to put in it.

Caml is simply not good at arrays. I think it is not a good idea to adopt too
much imperative style.

Gerd

--
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany                     
----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-05 20:20 ` Gerd Stolpmann
@ 1999-10-06 15:21   ` William Chesters
  1999-10-06 22:49     ` Gerd Stolpmann
  1999-10-07 15:25     ` Markus Mottl
  1999-10-07  6:56   ` skaller
  1999-10-08 13:40   ` Anton Moscal
  2 siblings, 2 replies; 32+ messages in thread
From: William Chesters @ 1999-10-06 15:21 UTC (permalink / raw)
  To: Gerd.Stolpmann; +Cc: caml-list

Gerd Stolpmann writes:
 > - The recursive data types of Ocaml are often more problem-oriented than
 >   "imperative" data structures. Consider you want to fill an (C) array with an
 >   unknown number of elements. The typical solution puts the elements into the
 >   array until it is full, and then enlarges the array by reallocating new
 >   memory. If you use the Ocaml lists instead, you do not have this problem.

   But you do have N little problems (N memory allocations and
structure initialisations); you waste large amounts of memory in
headers and pointers, thereby filling up your caches; and you end up
with a data structure which takes at just as long to traverse and
entirely fails to support any other kind of access.  If you do the
sums, you will find that an array-based data structure (`Vector')
works out cheaper any way you care to look at it---as long as you
increase its capacity exponentially, e.g. doubling it each time it
runs out.  This stipulation is not a problem because the space you
thereby waste is still less than the amount you would lose to the
spine of a list (and is less pernicious since it doesn't have to churn
through the cache).

   Incidentally, implementing a Vector in ocaml is slightly fiddly,
because you have to keep valid pointers of the right type in even the
unused part all the time.  This means o.a. delaying the creation of
the underlying array until you have at least one element to put in it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-05 23:22   ` chet
@ 1999-10-06 10:22     ` skaller
  0 siblings, 0 replies; 32+ messages in thread
From: skaller @ 1999-10-06 10:22 UTC (permalink / raw)
  To: chet; +Cc: Jan Brosius, OCAML

chet@watson.ibm.com wrote:
> 
> The Caml XML parser I wrote was competitive with XML4C (slightly faster),
> and blew XML4J out of the water (10x).
> 
> This was using the native-code compiler.
> 
> The Caml XSL processor I wrote handily beat Java XSL processors.
> 
> My guess is that if you're thinking of writing in C, and you don't
> need low-level access to real memory and such, you will find that CAML
> is more than fast enough.

Thanks for the info: I suspected that this was the case,
quite apart from being easier to develop with.

My biggest problem using ocaml is that I'm still a naive user:
I cannot tell easily which operations will be fast and which will not.
In C/C++ I don't have this problem since I understand many of the
ways in which it is implemented.

-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-04 21:59 ` skaller
@ 1999-10-05 23:22   ` chet
  1999-10-06 10:22     ` skaller
  0 siblings, 1 reply; 32+ messages in thread
From: chet @ 1999-10-05 23:22 UTC (permalink / raw)
  To: skaller; +Cc: Jan Brosius, OCAML

The Caml XML parser I wrote was competitive with XML4C (slightly faster),
and blew XML4J out of the water (10x).

This was using the native-code compiler.

The Caml XSL processor I wrote handily beat Java XSL processors.

My guess is that if you're thinking of writing in C, and you don't
need low-level access to real memory and such, you will find that CAML
is more than fast enough.

--chet--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-03 21:35 Jan Brosius
  1999-10-04 21:59 ` skaller
@ 1999-10-05 20:20 ` Gerd Stolpmann
  1999-10-06 15:21   ` William Chesters
                     ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Gerd Stolpmann @ 1999-10-05 20:20 UTC (permalink / raw)
  To: caml-list

On Sun, 03 Oct 1999, Jan Brosius wrote:
>Hi, is there anything known about the efficiency of compiled Ocaml code
>compared with C/C++

A good comparision of the low-level features of Ocaml and C are my
implementations of the cryptographic algorithms Blowfish and DES. Such
algorithms do many integer calculations, bit shifting, and array lookups, i.e.
tasks where C is perfect, and Ocaml is not designed for. The manually optimized
algorithms are 8 to 10 times slower than the C counterparts, and a factor of 2
can be explained with Ocaml's problems when computing with 32 bit numbers. To
get around this, all 32 bit calculations are simulated by doing them on two 16
bit halves. If such problems do not play a role, Ocaml is at most 4 to 5 times
slower than C.

As already pointed out, Ocaml is not designed for such problems, and this
slow-down factor is an upper limit. The typical application will have a speed
which is comparable with C/C++ solutions, as there are a number of advantages:

- The recursive data types of Ocaml are often more problem-oriented than
  "imperative" data structures. Consider you want to fill an (C) array with an
  unknown number of elements. The typical solution puts the elements into the
  array until it is full, and then enlarges the array by reallocating new
  memory. If you use the Ocaml lists instead, you do not have this problem. Of
  course, you can program linked lists in C, too, but they are not as
  convenient as arrays are, so many people avoid them. I think this is a very
  common problem in C where algorithms on static structures are cheap, and
  algorithms working with dynamic structures are either complicated (but fast),
  or calculate the same several times (but are simple to understand), or use
  sophisticated but heavy-weight libraries (with a complex interface). This is
  a bad alternative. Considering all of simplicity, time and space comsumption,
  I would say that Ocaml scales better.

- OCaml code expresses the same with fewer lines of code, so it is possible to
  write more complicated solutions, and to prefer a more sophisticated approach 
  over the straight-forward solution.

- Memory management is much better; but this counts only for long-running
  applications. Many C/C++ programs only "malloc" and do not "free", so the
  time consumption of these functions isn't a problem.

Gerd

--
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany                     
----------------------------------------------------------------------------




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: speed versus C
  1999-10-03 21:35 Jan Brosius
@ 1999-10-04 21:59 ` skaller
  1999-10-05 23:22   ` chet
  1999-10-05 20:20 ` Gerd Stolpmann
  1 sibling, 1 reply; 32+ messages in thread
From: skaller @ 1999-10-04 21:59 UTC (permalink / raw)
  To: Jan Brosius; +Cc: OCAML

Jan Brosius wrote:
> 
> Hi, is there anything known about the efficiency of compiled Ocaml code
> compared with C/C++

For what it is worth, I am writing an extensive test in the form of
a Python interpreter. C Python is reasonably efficient for certain
operations. The current Ocaml implementation is only 10 times slower
than the C version, running on the pystone benchmark,
and I expect that will be reduced considerably
as I improve the algorithms and data structures.

While an interpreter isn't a benchmark, it is a useful performance
measure because it exercises so much of the language, and in ways
determined partly by the code being interpreted.

It is my guess that, partly due to the higher level nature of ocaml
compared with C, it is possible to encode more optimisations
in the ocaml version than the C version, so that the ocaml
version may even run faster than the C one (simply because
encoding the pattern recognition for the cases in C is
hard to program).

BTW: I would like to see performance (Order) data for library functions.
It is difficult to chose a data structure without knowing
how fast various operations can be expected to perform.
For example, how fast are insertions into Set and Map?
In procedural code, these would be amortized constant time,
or at worst O(log n), by using hashtables or some efficient
tree representation, but the ocaml library versions
are FUNCTIONAL so I have no idea how fast the functional updates
are compared, with say, the in place modification of a hashtable.

-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller

^ permalink raw reply	[flat|nested] 32+ messages in thread

* speed versus C
@ 1999-10-03 21:35 Jan Brosius
  1999-10-04 21:59 ` skaller
  1999-10-05 20:20 ` Gerd Stolpmann
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Brosius @ 1999-10-03 21:35 UTC (permalink / raw)
  To: OCAML

Hi, is there anything known about the efficiency of compiled Ocaml code
compared with C/C++

Thanks

Jan




^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~1999-10-13  7:11 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.03.9910081713230.31666-100001@post.tepkom.ru>
1999-10-10  4:51 ` speed versus C skaller
1999-10-11  9:08   ` Anton Moscal
1999-10-12 13:21 Damien Doligez
1999-10-12 20:42 ` skaller
  -- strict thread matches above, loose matches on Subject: below --
1999-10-08  6:57 Pascal Brisset
1999-10-07 13:00 STARYNKEVITCH Basile
1999-10-03 21:35 Jan Brosius
1999-10-04 21:59 ` skaller
1999-10-05 23:22   ` chet
1999-10-06 10:22     ` skaller
1999-10-05 20:20 ` Gerd Stolpmann
1999-10-06 15:21   ` William Chesters
1999-10-06 22:49     ` Gerd Stolpmann
1999-10-07 10:26       ` Michel Quercia
1999-10-07 10:46       ` William Chesters
1999-10-07 15:48         ` Pierre Weis
1999-10-07 19:21         ` Gerd Stolpmann
1999-10-08  0:26           ` William Chesters
1999-10-10 16:27             ` Gerd Stolpmann
1999-10-10 20:48               ` William Chesters
1999-10-10 23:54                 ` Alain Frisch
1999-10-11 17:58                   ` William Chesters
1999-10-11 19:32                   ` John Prevost
1999-10-11 20:50                 ` Gerd Stolpmann
1999-10-12 20:07                   ` skaller
1999-10-08  9:56           ` Pierre Weis
1999-10-07 15:25     ` Markus Mottl
1999-10-07  6:56   ` skaller
1999-10-07 12:37     ` Xavier Urbain
1999-10-07 22:18     ` Gerd Stolpmann
1999-10-08 19:15       ` skaller
1999-10-08 13:40   ` Anton Moscal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).