caml-list - the Caml user's mailing list
* Re: [Caml-list] comparison with C performance
From: Chet Murthy @ 2003-04-27 19:04 UTC
  To: Lex Stein; +Cc: Ocaml Mailing List


Hmmm .. Lex, are you aware of Ensemble?

Mark Hayden basically proved that if you properly manage memory and a
few other things, well, you can be faster than C, unless the C program
is (ahem) trivial.

some more details: Mark showed that for a rather complicated network
protocol stack, a CAML implementation was a *lot* faster than a
highly-optimized C implementation.

The key things he was able to do were:

  (a) since it's in ML, you can be a lot more aggressive about
  optimization

  (b) effective memory-management of buffers in ML -- don't just leave
  it to the GC

  (c) serious reliance on the inliner
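Point (b) can be illustrated with a small buffer pool. This is a minimal sketch, not Ensemble's actual code: the pool type and the names create_pool, acquire, release and with_buffer are hypothetical, and every buffer in a pool is assumed to have one fixed size. Buffers are preallocated and recycled, so a steady stream of messages does not allocate a fresh buffer per message and leave each one for the GC to reclaim.

```ocaml
(* Hypothetical fixed-size buffer pool: a sketch of idea (b), not Ensemble's code. *)
type pool = {
  size : int;                   (* byte size of every buffer in this pool *)
  mutable free : Bytes.t list;  (* buffers currently available for reuse *)
}

(* Preallocate [count] buffers of [size] bytes up front. *)
let create_pool ~size ~count =
  { size; free = List.init count (fun _ -> Bytes.create size) }

(* Hand out a recycled buffer; allocate only when the pool is empty. *)
let acquire p =
  match p.free with
  | b :: rest ->
      p.free <- rest;
      b
  | [] -> Bytes.create p.size

(* Give a buffer back so the next [acquire] reuses it instead of allocating.
   Buffers of the wrong size are not pooled and are left to the GC. *)
let release p b =
  if Bytes.length b = p.size then p.free <- b :: p.free

(* Run [f] with a pooled buffer and always return the buffer afterwards,
   even if [f] raises. *)
let with_buffer p f =
  let b = acquire p in
  Fun.protect ~finally:(fun () -> release p b) (fun () -> f b)

let () =
  let p = create_pool ~size:1500 ~count:4 in
  (* Zero-fill the buffer as a stand-in for marshalling one message. *)
  let n =
    with_buffer p (fun buf ->
        Bytes.fill buf 0 (Bytes.length buf) '\000';
        Bytes.length buf)
  in
  Printf.printf "processed a %d-byte message; %d buffers free\n" n
    (List.length p.free)
```

The design choice here is simply to trade a bounded amount of resident memory for fewer allocations. A real protocol stack would also have to guarantee that a buffer is never touched after it has been released; this sketch does not enforce that.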

There were a few other things, but this is a good start.

Cheers,
--chet--

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* [Caml-list] OT: Java Performance
From: Brian Hurt @ 2003-05-01 15:27 UTC
  To: Ocaml Mailing List


Given the number of performance-related discussions in this maillist of 
late, I thought I'd forward this article:
http://www-106.ibm.com/developerworks/java/library/j-jtp04223.html

It's about Java, but I think it's still worthwhile reading for Ocaml 
programmers.  The lesson to learn here is that performance is tricky: what 
you think will obviously be a problem often isn't, and what you think 
won't be a problem can be.  Make it work correctly first, then measure 
performance, then enhance for performance if necessary.
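The "correct first, then measure" order can be sketched in OCaml with only the standard library. The two summing functions and the time_per_call helper are hypothetical examples, not taken from the article. Sys.time reports the processor time used by the process, so this is a rough microbenchmark under that assumption, not a substitute for profiling the real workload.

```ocaml
(* Two candidate implementations of the same function (hypothetical example). *)
let sum_list (l : int list) = List.fold_left ( + ) 0 l
let sum_array (a : int array) = Array.fold_left ( + ) 0 a

(* Average processor time per call of [f], over [runs] calls.
   Sys.opaque_identity keeps the compiler from discarding the unused result. *)
let time_per_call ~runs f =
  let start = Sys.time () in
  for _i = 1 to runs do
    ignore (Sys.opaque_identity (f ()))
  done;
  (Sys.time () -. start) /. float_of_int runs

let () =
  let n = 100_000 in
  let l = List.init n (fun i -> i) in
  let a = Array.of_list l in
  (* Step 1: make it work correctly; both versions must agree. *)
  assert (sum_list l = sum_array a);
  (* Step 2: measure, and only then decide whether optimizing is worthwhile. *)
  let t_list = time_per_call ~runs:100 (fun () -> sum_list l) in
  let t_array = time_per_call ~runs:100 (fun () -> sum_array a) in
  Printf.printf "list: %.2e s/call, array: %.2e s/call\n" t_list t_array
```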

Another comment that applies to both languages is that if it's a glaringly 
obvious problem, the compiler people are probably already working on it 
(or possibly already solved it).

Brian



* [Caml-list] comparison with C performance
From: Lex Stein @ 2003-05-01 17:29 UTC
  To: Ocaml Mailing List


Hi,

A while ago I built an NFS server in OCaml (BDBFS) and the performance
stunk. It was 10x slower than the BSD in-kernel NFS server for metadata
operations. There was some speculation about what was causing this
slowness. It could have been a number of things. So in order for my
Advisor to let me continue programming in OCaml, I set out to show that it
wasn't due to the choice of OCaml.

The experiment consisted of 10,000 repeated RPC cycles across a 100Mbps
link. An RPC cycle consists of a NULL RPC followed by an RPC carrying a
20-byte string and a 24-byte string, which are written to a Berkeley-DB
database (via DB->put) with the 20 bytes as the key and the 24 bytes as
the value. The C test never leaves C code and calls directly into the
Berkeley-DB C code. The OCaml test leaves C above the RPC layer and enters
the OCaml world, using the OCaml Berkeley-DB interface (the one I wrote; I
know Yaron Minsky has one too) to write to the database. The table below
shows the time taken by a client (the same client across all 3 test
configurations) to execute 100,000 RPC cycles. I ran the experiment 15
times. The square brackets contain the standard deviation. The units are
seconds.

                Test run at 5:00am 04-27-2003
                100,000 RPC cycles
C shunt:          22.87s [1.20s]
OCaml shunts:
  bytecode:       23.87s [0.96s]
  native:         22.20s [0.98s]

The result is that C vs. OCaml native, and C vs. OCaml bytecode, are not
distinguishable given the standard deviations. OCaml bytecode and native
are distinguishable, being more than one standard deviation away from each
other.
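The comparison above can be reproduced from the reported figures. This sketch treats each reported time as a mean and makes one stated assumption: "more than one standard deviation apart" is read as the two means differing by more than the larger of the two standard deviations, which is the reading that matches all three conclusions. mean_stddev (sample standard deviation, n - 1 denominator) and separated are hypothetical helper names; the raw per-run times were not posted, so mean_stddev is only there for computing such figures from one's own runs.

```ocaml
(* Mean and sample standard deviation (n - 1 denominator) of per-run times.
   Assumes at least two samples. *)
let mean_stddev (xs : float list) =
  let n = float_of_int (List.length xs) in
  let mean = List.fold_left ( +. ) 0.0 xs /. n in
  let ss = List.fold_left (fun acc x -> acc +. ((x -. mean) ** 2.0)) 0.0 xs in
  (mean, sqrt (ss /. (n -. 1.0)))

(* Assumed criterion: two (mean, stddev) results are "separated" when their
   means differ by more than the larger of the two standard deviations. *)
let separated (m1, s1) (m2, s2) = abs_float (m1 -. m2) > max s1 s2

let () =
  (* Figures from the table above, in seconds: (time, standard deviation). *)
  let c = (22.87, 1.20) and bytecode = (23.87, 0.96) and native = (22.20, 0.98) in
  Printf.printf "C vs bytecode:      %b\n" (separated c bytecode);   (* false *)
  Printf.printf "C vs native:        %b\n" (separated c native);     (* false *)
  Printf.printf "bytecode vs native: %b\n" (separated bytecode native) (* true *)
```

Note that C vs. bytecode differ by 1.00s, which is more than the bytecode deviation (0.96s) but less than the C deviation (1.20s), so the verdict for that pair depends on which deviation the criterion uses.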

To get back to the original story: this has pointed me in the direction of
improving BDBFS' performance by improving the efficiency of the directory
listing and lookup algorithms rather than changing languages. OCaml seems
to fare just fine against C.

Lex



* Re: [Caml-list] comparison with C performance
From: Miles Egan @ 2003-05-01 17:55 UTC
  To: Lex Stein; +Cc: Ocaml Mailing List


On Thu, 2003-05-01 at 10:29, Lex Stein wrote:
> Hi,
> 
> A while ago I built an NFS server in OCaml (BDBFS) and the performance
> stunk. It was 10x slower than the BSD in-kernel NFS server for metadata
> operations. There was some speculation about what was causing this
> slowness. It could have been a number of things. So in order for my
> Advisor to let me continue programming in OCaml, I set out to show that it
> wasn't due to the choice of OCaml.

Wouldn't you expect any userspace nfs server to be much slower than the
kernel-based implementation due to the overhead of all the extra
context-switching?

-- 
Miles Egan <miles@caddr.com>


* Re: [Caml-list] comparison with C performance
From: Lex Stein @ 2003-05-01 18:24 UTC
  To: Ocaml Mailing List


Yes, there will be additional context switch costs for a user-land
implementation. However, where a disk I/O costs a luxury yacht, a context
switch might cost a used bicycle. So I think filesystem designers are in
the position of not worrying about the old bike, because it's best to focus
negotiating efforts on the yacht. So I guess the question on our minds was:
is OCaml another luxury yacht?

(With the NFS metadata operations in BDBFS there were synchronous I/O
operations on the path. These will make a context switch insignificant.
Consider the milliseconds required for an I/O.)

To narrow the experiment to isolate the language cost, I eliminated the
synchronous I/O by placing the DB->put()s outside of a transaction, with
no commit. As I'm sure you realised, all of the C, OCaml native and OCaml
bytecode experiments were run in user-land, so all had additional context
switches compared with a kernel-level implementation. However, given I/O
costs in filesystems, context switch costs are insignificant.
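The yacht-versus-bicycle argument is a ratio of two costs. The sketch below makes it concrete, but every input is an assumption chosen for illustration, not a measurement from BDBFS: a task switch of 500 cycles (the upper end of the 386 figure Brian Hurt quotes later in this thread, ignoring TLB and cache refill), a hypothetical 100 MHz clock, and roughly 10 ms for one synchronous disk I/O. cycles_to_us is a hypothetical helper name.

```ocaml
(* Convert a cycle count to microseconds at a given clock rate in MHz
   (1 MHz = 1 cycle per microsecond). *)
let cycles_to_us ~cycles ~mhz = float_of_int cycles /. mhz

let () =
  (* Assumed, illustrative inputs, not measurements. *)
  let switch_us = cycles_to_us ~cycles:500 ~mhz:100.0 in (* one task switch *)
  let disk_io_us = 10_000.0 in (* one synchronous disk I/O, ~10 ms *)
  Printf.printf "task switch: %.1f us, disk I/O: %.0f us, ratio: %.0f\n"
    switch_us disk_io_us (disk_io_us /. switch_us)
```

Under these assumptions one I/O costs as much as 2000 switches; even if cache and TLB refill made a switch ten times more expensive, the I/O would still dominate by two orders of magnitude.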

Lex



* Re: [Caml-list] comparison with C performance
From: Lex Stein @ 2003-05-01 18:38 UTC
  To: Ocaml Mailing List


My short answer is: No.

Thanks
Lex

> Wouldn't you expect any userspace nfs server to be much slower than the
> kernel-based implementation due to the overhead of all the extra
> context-switching?
>
> --
> Miles Egan <miles@caddr.com>
>


* Re: [Caml-list] comparison with C performance
From: Miles Egan @ 2003-05-01 18:48 UTC
  To: Lex Stein; +Cc: Ocaml Mailing List


On Thu, 2003-05-01 at 11:24, Lex Stein wrote:
> Yes, there will be additional context switch costs for a user-land
> implementation. However, where a disk I/O costs a luxury yacht a context
> switch might cost a used bicycle. So I think filesystem designers are in
> the position of not worrying about the old bike because it's best to focus
> negotiating efforts on the yacht. So I guess the question on our mind was;
> is OCaml another luxury yacht?

Your basic argument is reasonable, but I seem to remember one of the
main reasons the previously userland Linux nfs server implementation was
rewritten as a kernel-space server was to improve performance.

Perhaps it's because the typical NFS server serves most of its pages out
of its RAM cache, so context switches become more of an issue?

Anyway, drifting off-topic for this list.

-- 
Miles Egan <miles@caddr.com>


* Re: [Caml-list] comparison with C performance
From: Brian Hurt @ 2003-05-01 19:08 UTC
  To: Lex Stein; +Cc: Ocaml Mailing List


From memory, task switches on the 386 were 300-500 clock cycles.  By the
time of the Pentium, the nominal cost of a task switch was ~50 cycles
IIRC, but this did not include the costs of the TLB and cache flushes.
That raised the question of how much the work done after the TLB flush
determined how expensive the task switch really was (and do those costs
count as task switch costs anyway?).  I don't believe it's been
significantly improved since then.

Task switches can be a performance problem.  This is, at heart, the
problem with microkernel operating systems.  Done "canonically", you are
task switching constantly.  Especially back in the day of the
Torvalds-Tanenbaum debate, the task switch cost ate you alive.  The
successful microkernels generally did without memory protection; an
example here is the Amiga kernel: a microkernel, granted, but no memory
protection either.  Several realtime OSs do the same stunt.

Or, in a slightly less extreme way, you can just move more stuff into the
same task, reducing the number of task switches you need to make.  This is
the choice Microsoft made with NT, when they moved the core graphics
routines into the kernel with NT4.  I find it humorous that the
"microkernel" NT has graphics in the kernel, while the "monolithic kernel"
Linux keeps graphics in a user-space application (X).  But by pulling
functions into the same task space, A) you are losing a number of
advantages of microkernels (for example, a misbehaving driver can now
crash the kernel), and B) you are starting to look an awful lot like a
monolithic kernel.  The successful kernels today are, to one extent or
another, hybrids of monolithic kernels and microkernels.

On the other hand, task switching isn't nearly the cost of I/O (disk or
network), which I would expect to dominate.  That being said, limiting task
switches is not the only plausible optimization an in-kernel NFS server
could implement.  I haven't investigated this code, but some plausible
explanations include interrupt/signal latency, scheduling advantages, fewer
address mappings/reverse mappings, etc.

Brian


* Re: [Caml-list] comparison with C performance
From: Eray Ozkural @ 2003-05-01 19:13 UTC
  To: Lex Stein, Ocaml Mailing List

On Thursday 01 May 2003 20:29, Lex Stein wrote:
> Hi,
>
> A while ago I built an NFS server in OCaml (BDBFS) and the performance
> stunk. It was 10x slower than the BSD in-kernel NFS server for metadata
> operations. There was some speculation about what was causing this
> slowness. It could have been a number of things. So in order for my
> Advisor to let me continue programming in OCaml, I set out to show that it
> wasn't due to the choice of OCaml.

Too bad you sucked at writing NFS servers ;) Don't worry, system-level stuff
can get frustrating, and there is always a stack of architectural issues that
one must be wary of.

Happy hacking,

-- 
Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara  KDE Project: http://www.kde.org
www: http://www.cs.bilkent.edu.tr/~erayo  Malfunction: http://mp3.com/ariza
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C


Thread overview: 9+ messages
2003-05-01 15:27 [Caml-list] OT: Java Performance Brian Hurt
2003-05-01 17:29 ` [Caml-list] comparison with C performance Lex Stein
2003-05-01 17:55   ` Miles Egan
2003-05-01 18:24     ` Lex Stein
2003-05-01 18:48       ` Miles Egan
2003-05-01 18:38     ` Lex Stein
2003-04-27 19:04       ` Chet Murthy
2003-05-01 19:08       ` Brian Hurt
2003-05-01 19:13   ` Eray Ozkural
