9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] HBench:OS on Plan9 some results and a comparison with Linux
@ 1999-01-25 22:23 
From: @ 1999-01-25 22:23 UTC


    HBench:OS on Plan9: a comparison with Linux (preliminary report)

HBench:OS (HBench for short) is a suite of portable micro-benchmarks
[Brown et al 97] based on the well-known lmbench benchmark [Mc Voy &
Staelin 96].
The main purpose of these benchmarks is to measure the system's memory
hardware capabilities (bandwidth and latency) and the bandwidth and
latency of some key Operating System primitives and functions. The latter
reflect how well the compiled C implementation of the OS primitives is
tuned to the underlying memory hardware. The benchmarks deliberately
exclude other I/O-dependent factors such as disk and network driver
speed. These will be covered by other benchmarks, which will also be part
of this study.
The benchmarks were run on Plan9 and Linux (Red Hat 5.1) installed on the
same machine, a vanilla 200 MHz Pentium with 64MB of RAM and a 3GB Quantum
IDE disk, using exactly the same source code: a C program driven by a
shell script. This should eliminate any hardware and software
discrepancies. (One memory test, which uses the Unix mmap() function,
nonexistent on Plan9, was discarded from the benchmark; see also the
comments about the APE layer below.)

Contrary to our expectations, the results were (sometimes very)
unfavorable to Plan9 in the great majority of the tests. This is the
main reason for posting a summary of these results to the 9Fans list: we
would like to get some feedback from knowledgeable Plan9 users and
implementors before releasing these results to a wider audience.
The first question a skeptical reader could ask is: how reliable are
these figures? We believe they are quite reliable for the following
reasons:
1) HBench runs each benchmark for at least 1 second, typically looping
tens to hundreds of thousands of times in each measurement.
2) HBench can measure time using the Pentium hardware cycle counter with
one-clock resolution (the RDTSC instruction returns the current 64-bit
clock count); all measurements agreed with the conventional, but less
precise, gettimeofday() function.
3) Each benchmark is run niter times (a user-given parameter), and the
standard deviation of the measurements is computed (usually less than 1%
of the average). We chose niter = 8.
4) HBench makes two useful adjustments to its calculations: it computes
the timing overhead of each loop, and it runs the benchmark once before
entering the measured loop in order to fill the CPU and memory caches.
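
Points 3) and 4) can be illustrated with a minimal C sketch. It assumes
gettimeofday() timing (HBench can also use RDTSC), and all function names
(empty_op, time_loop, per_iter, mean, stddev) are hypothetical, not
HBench's own code:

```c
#include <stddef.h>
#include <sys/time.h>

/* No-op whose loop time is the timing overhead of the loop itself. */
void
empty_op(void)
{
}

/* Elapsed seconds for n calls of op, after one untimed warm-up call. */
double
time_loop(void (*op)(void), long n)
{
	struct timeval t0, t1;
	long i;

	op();	/* warm-up: fill the CPU and memory caches first */
	gettimeofday(&t0, NULL);
	for (i = 0; i < n; i++)
		op();
	gettimeofday(&t1, NULL);
	return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
}

/* Cost of one call: measured total minus empty-loop total, per call. */
double
per_iter(double total, double overhead, long n)
{
	return (total - overhead) / n;
}

/* Square root by Newton's method (x >= 0), avoiding a libm dependency. */
static double
sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;	/* start at or above sqrt(x) */
	int i;

	if (x == 0.0)
		return 0.0;
	for (i = 0; i < 60; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* Mean of the niter per-call costs. */
double
mean(const double *t, size_t niter)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < niter; i++)
		sum += t[i];
	return sum / niter;
}

/* Sample standard deviation of the niter per-call costs (niter >= 2). */
double
stddev(const double *t, size_t niter)
{
	double m = mean(t, niter), ss = 0.0;
	size_t i;

	for (i = 0; i < niter; i++)
		ss += (t[i] - m) * (t[i] - m);
	return sqrt_newton(ss / (niter - 1));
}
```

In this sketch each of the niter runs would call time_loop() once with the
operation under test and once with empty_op, store per_iter() of the two
totals, and finally report mean() and stddev() over the niter results.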
The benchmarks (about 20 different tests) fall into 2 categories:
bandwidth and latency benchmarks. A complete description of each
benchmark can be found in the excellent lmbench paper, and a
summary, taken from [Brown et al 97], at
www.dcc.unicamp.br/~celio/plan9/benchmarks
Five illustrative benchmarks:
. pipe bandwidth: measures the bandwidth attainable when transferring
data through a pipe between two processes, in units of a buffer size
varying from 4KB to 4MB.
. pipe latency: measures the amount of time to ping-pong a one-byte
token between 2 processes.
. process creation latency: measures the time to (i) fork() a null
process, (ii) fork and exec a hello-world program, and (iii) use /bin/sh
to execute hello-world.
. system call latency: measures the latency of a few representative
system calls such as getpid, gettimeofday, sbrk and write to /dev/null.
. file reread bandwidth: rereads an 8MB file already cached in the file
system buffer cache.
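
The pipe latency test can be sketched in POSIX C as follows, assuming
pipe(), fork() and gettimeofday() (available on Linux and, through APE,
on Plan9). The function name pipe_latency_us is hypothetical and this is
not HBench's source; it reports the full round trip, which may differ
from the lmbench/HBench reporting convention.

```c
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Average round-trip time, in microseconds, of bouncing a one-byte token
 * between this process and a forked child over two pipes, rounds times.
 * Returns -1.0 on error.
 */
double
pipe_latency_us(long rounds)
{
	int ping[2], pong[2];	/* ping: parent->child, pong: child->parent */
	char c = 'x';
	struct timeval t0, t1;
	pid_t pid;
	long i;

	if (pipe(ping) < 0)
		return -1.0;
	if (pipe(pong) < 0) {
		close(ping[0]);
		close(ping[1]);
		return -1.0;
	}
	pid = fork();
	if (pid < 0) {
		close(ping[0]);
		close(ping[1]);
		close(pong[0]);
		close(pong[1]);
		return -1.0;
	}
	if (pid == 0) {
		/* child: echo each byte back until the parent closes its end */
		close(ping[1]);
		close(pong[0]);
		while (read(ping[0], &c, 1) == 1)
			if (write(pong[1], &c, 1) != 1)
				break;
		_exit(0);
	}
	close(ping[0]);
	close(pong[1]);
	gettimeofday(&t0, NULL);
	for (i = 0; i < rounds; i++)
		if (write(ping[1], &c, 1) != 1 || read(pong[0], &c, 1) != 1)
			break;
	gettimeofday(&t1, NULL);
	close(ping[1]);	/* child reads EOF and exits */
	close(pong[0]);
	waitpid(pid, NULL, 0);
	if (i < rounds)
		return -1.0;
	return ((t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec)) / rounds;
}
```

Two pipes are used so that each process only ever reads the other's
writes; with a single pipe a process could read back its own token.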

Comparison with other measurements:
Linux measurements can be found in the lmbench paper [Mc Voy & Staelin
96] and in [Kevin Lay & Mary Baker 96].
The only Plan 9 measurements we know of are those in the Performance
table at the end of the introductory paper "Plan 9 from Bell Labs" by
Pike et al., in "Plan 9 the Documents" or in [Pike et al 95]. If we
consider the 100 MHz MIPS R4400 roughly equivalent to our 200 MHz
Pentium, the numbers there for light fork, pipe latency and pipe
bandwidth are quite close to our measurements (the rfork(0) system call
measurement is misleading, in our opinion).
We were worried about the overhead introduced by the APE layer, needed
to emulate the Unix environment used by the shell script, although
Trickey's paper on APE says "using APE will cause slower compilation and
marginally slower execution speeds"; also, a "Real Plan9 Programmer" may
rightfully say: "who cares about APE? I only use native Plan9
applications!".
We hand-coded the system call latency benchmarks and the null process
creation benchmark natively and found a better than 10% improvement of
native Plan9 over APE only in the last one (see table below).
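
For comparison, a minimal POSIX sketch of the null process creation
benchmark (the APE/Linux-style variant, not the authors' hand-coded
native version) is shown below. The function name null_fork_us is
hypothetical and timing uses gettimeofday() for simplicity; a native
Plan9 version would use Plan9's own libc process calls (such as rfork)
instead.

```c
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Average time, in microseconds, to fork() a child that exits at once
 * and to wait for it, over n iterations.  Returns -1.0 on error.
 */
double
null_fork_us(long n)
{
	struct timeval t0, t1;
	pid_t pid;
	long i;

	gettimeofday(&t0, NULL);
	for (i = 0; i < n; i++) {
		pid = fork();
		if (pid < 0)
			return -1.0;
		if (pid == 0)
			_exit(0);	/* the "null" child does nothing */
		if (waitpid(pid, NULL, 0) != pid)
			return -1.0;
	}
	gettimeofday(&t1, NULL);
	return ((t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec)) / n;
}
```

The child uses _exit() rather than exit() so that it does not flush any
stdio buffers inherited from the parent.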
Summary of the results                   Linux   Plan9 APE   Plan9 native

Bandwidth in MBytes/sec
-----------------------
Pipe bandwidth (64KB buffer)                44          17
File reread (8MB file, 64KB buffer)       34.5        0.52

Latency in microseconds
-----------------------
Pipe latency                                22         119
Null process creation                     1283        1482            850
Process creation + exec hello-world       6370       11030
getpid latency                            1.08         123            121
sbrk(1024) latency                        3.26        4.42            5.7
write /dev/null latency                   2.16         8.0            6.9

Three last comments:
(i) getpid seems to be cached by libc in many Unix systems, which may
explain the huge difference between Linux and Plan9.
(ii) It looks like Plan9 has a small buffer cache and/or no sequential
file read-ahead. This was partly confirmed by running the Bonnie
benchmark (www.cs.sunyit.edu/pub/BENCH/bonnie): in our tests, Linux
achieved a block-read rate of 5.0 MB/sec on a 100MB file, while Plan9
achieved 1.3 MB/sec.
(iii) We uncovered a subtle bug in the Plan9 C compiler while testing
the following assembler routine, which returns the Pentium's 64-bit
hardware clock counter:

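#include <unistd.h>
#include "sys9.h"

typedef unsigned long long u_int64_t;
typedef u_int64_t internal_clk_t;

unsigned long long _RDTSC(void);	/* assembler routine below */
unsigned long long cycle(void);

static internal_clk_t start_clk, stop_clk;

/* wrapper around the RDTSC routine */
unsigned long long cycle(void)
{
	return _RDTSC();
}

/* measurement skeleton (function name is illustrative) */
void run_benchmark(void)
{
	start_clk = cycle();	/* start measuring time */
	/* ... benchmark loop goes here ... */
	stop_clk = cycle();	/* stop measuring time */
	/* ... */
}

/* separate assembler (.s) file; RDTSC is opcode 0F 31 */
#define RDTSC BYTE $0x0F; BYTE $0x31
TEXT _RDTSC(SB), $0
	MOVL .ret+0(FP), CX	/* CX = address of the 64-bit result */
	RDTSC			/* EDX:EAX = time-stamp counter */
	MOVL AX, 0(CX)		/* store low 32 bits */
	MOVL DX, 4(CX)		/* store high 32 bits */
	RET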

Symptom: about 5% of the bandwidth measurements returned start_clk >=
stop_clk, which is clearly impossible. The Plan9 print library function
was of no help, since it printed 64-bit values incorrectly with both the
%x and %lld formats. The problem went away when we eliminated the two
typedefs above and declared start_clk and stop_clk as:
static unsigned long long start_clk, stop_clk;
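
As long as the 64-bit print conversions cannot be trusted, one possible
workaround is to print a 64-bit counter as two 32-bit halves. The sketch
below uses standard C snprintf(); the helper name fmt_u64_hex is
hypothetical, and with the native Plan9 print library the conversion
verbs may need adjusting.

```c
#include <stdio.h>

/*
 * Format v as 16 hex digits by printing its high and low 32-bit halves
 * separately, so only 32-bit conversions are needed.  len must be >= 17.
 */
void
fmt_u64_hex(char *buf, size_t len, unsigned long long v)
{
	unsigned long hi = (unsigned long)(v >> 32);
	unsigned long lo = (unsigned long)(v & 0xFFFFFFFFUL);

	snprintf(buf, len, "%08lx%08lx", hi, lo);
}
```

Printing start_clk and stop_clk this way makes a start_clk >= stop_clk
case easy to inspect by eye, since both halves are zero-padded.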

Acknowledgments:
Many thanks to Russ Cox and Forsyth, who helped with sockets, APE and
formatted print.
David Butler also ran Bonnie under Plan9 and passed his results to us.

References:
[Pike et al 95] Rob Pike et al., "Plan 9 from Bell Labs", Computing
Systems, Vol. 8, No. 3, Summer 1995, pp. 221-254.

[Brown et al 97] Aaron B. Brown & Margo I. Seltzer, "Operating System
Benchmarking in the Wake of Lmbench: A Case Study of the Performance of
NetBSD on the Intel x86 Architecture", Proceedings of ACM SIGMETRICS 1997;
also at: www.eecs.harvard.edu/vino/perf/hbench/index.html

[Mc Voy & Staelin 96] Larry McVoy & Carl Staelin, "lmbench: Portable
tools for performance analysis", Proceedings of the 1996 USENIX
Technical Conference, San Diego, CA, Jan 1996, pp. 279-295;
also at: www.bitmover.com/lmbench/lmbench-usenix.ps.gz

[Kevin Lay & Mary Baker 96] Kevin Lai & Mary Baker, "A Performance
Comparison of Unix Operating Systems on the Pentium", Proceedings of the
USENIX Technical Conference, Jan 1996;
also at: gunpowder.Stanford.EDU./~laik/benchmarks/paper/usenix96.bench.ps.gz


Celio Guimaraes & Franklin Franca
Institute of Computing
Unicamp, Campinas, Brazil
celio@dcc.unicamp.br
franklin robert araujo franca <973930@dcc.unicamp.br>

* [9fans] HBench:OS on Plan9 some results and a comparison with Linux
@ 1999-01-27 10:10 Bengt
From: Bengt @ 1999-01-27 10:10 UTC


Work on Plan9 stopped around 1994 (1995?). Linux is still under development.
I think it is natural to find Linux ahead in the area of implementation improvements.

When it comes to concepts and design, my money is still on Plan9.


Best Wishes, Bengt
===============================================================
Everything aforementioned should be regarded as totally private
opinions, and nothing else. bengt@softwell.se
``His great strength is that he is uncompromising. It would make
him physically ill to think of programming in C++.''