9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] p9/linux/fbsd compiler shootout
@ 2002-02-26 15:02 andrey mirtchovski
  2002-02-26 16:02 ` Wilhelm B. Kloke
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: andrey mirtchovski @ 2002-02-26 15:02 UTC (permalink / raw)
  To: 9fans

> The compile time for the BSD/2.95/no test looks *really* low;
> are we sure about that number?  It's a very strange outlier, isn't it?
> I'll mostly ignore that one, because it is *such* a surprise; GCC
> isn't normally thought to be that fast, but hey, maybe it really is. 
>

several more tests yield exactly the same results -- FBSD 4.5 w/
gcc2.95 takes about 18 seconds to compile.

> A total curiosity is that running the Linux binaries under emulation
> in BSD is *faster* than running the native BSD binaries.  It's hard to
> imagine that the BSD team specially optimized that case, does anyone
> have any knowledge or guesses?  
> 

gcc 3.0 on FBSD was locally compiled and installed (as was noted in
the explanations), had it been taken from a binary package it _must_
have been much faster (all we did was 'make; make install')...

something else i forgot to mention: gcc 3.0 on linux (rpm install)
complains of -m386 being deprecated (-m386 is part of the CFLAGS
switches povray uses)...  gcc3.0 on FBSD did not say anything about
-m386...  I find this very strange.




* Re: [9fans] p9/linux/fbsd compiler shootout
  2002-02-26 15:02 [9fans] p9/linux/fbsd compiler shootout andrey mirtchovski
@ 2002-02-26 16:02 ` Wilhelm B. Kloke
  2002-02-26 18:00   ` splite
  2002-02-26 16:04 ` Matt H
  2002-03-04 10:04 ` Gaute B Strokkenes
  2 siblings, 1 reply; 14+ messages in thread
From: Wilhelm B. Kloke @ 2002-02-26 16:02 UTC (permalink / raw)
  To: 9fans

In article <20020226150031.A2A4B19A1C@mail.cse.psu.edu>,
andrey mirtchovski <9fans@cse.psu.edu> wrote:
>> The compile time for the BSD/2.95/no test looks *really* low;
>> are we sure about that number?  It's a very strange outlier, isn't it?
>> I'll mostly ignore that one, because it is *such* a surprise; GCC
>> isn't normally thought to be that fast, but hey, maybe it really is. 
>>
>
>several more tests yield exactly the same results -- FBSD 4.5 w/
>gcc2.95 takes about 18 seconds to compile.

The surprise diminishes when you take the difference between gcc 2.95 and 2.96
into account (Linux was tested with the latter).  2.96 is in effect a 3.0 beta
(in fact, 2.96 was never recommended for use and made it into some
Linux distributions only accidentally.)
-- 
Dipl.-Math. Wilhelm Bernhard Kloke
Institut fuer Arbeitsphysiologie an der Universitaet Dortmund
Ardeystrasse 67, D-44139 Dortmund, Tel. 0231-1084-257



* Re: [9fans] p9/linux/fbsd compiler shootout
  2002-02-26 15:02 [9fans] p9/linux/fbsd compiler shootout andrey mirtchovski
  2002-02-26 16:02 ` Wilhelm B. Kloke
@ 2002-02-26 16:04 ` Matt H
  2002-03-04 10:04 ` Gaute B Strokkenes
  2 siblings, 0 replies; 14+ messages in thread
From: Matt H @ 2002-02-26 16:04 UTC (permalink / raw)
  To: 9fans

On Tue, 26 Feb 2002 08:02:54 -0700
"andrey mirtchovski" <andrey@lanl.gov> wrote:

>  had it been taken from a binary package it _must_
> have been much faster (all we did was 'make; make install')...

I would expect the opposite (depending on the default optimisations you
have set up for your compilation environment)

The binary package may well be targeted to 486 to make sure that when you
run it on your 486 it doesn't try to use Pentium or Pentium Pro
instructions.

I'm not totally sure but make; make install will surely do a configure to see
what processor it's running on etc.

M
(who always compiles ports rather than installing packages)



* Re: [9fans] p9/linux/fbsd compiler shootout
  2002-02-26 16:02 ` Wilhelm B. Kloke
@ 2002-02-26 18:00   ` splite
  0 siblings, 0 replies; 14+ messages in thread
From: splite @ 2002-02-26 18:00 UTC (permalink / raw)
  To: 9fans

On Tue, Feb 26, 2002 at 04:02:05PM +0000, Wilhelm B. Kloke wrote:
> 
> (in fact, [gcc] 2.96 was never recommended for use and made it into some
> Linux distributions only accidentally.)

OT, but this isn't true.  2.96 is a Red Hat creation and they swear by it.
Everyone else swears at it.



* Re: [9fans] p9/linux/fbsd compiler shootout
  2002-02-26 15:02 [9fans] p9/linux/fbsd compiler shootout andrey mirtchovski
  2002-02-26 16:02 ` Wilhelm B. Kloke
  2002-02-26 16:04 ` Matt H
@ 2002-03-04 10:04 ` Gaute B Strokkenes
  2 siblings, 0 replies; 14+ messages in thread
From: Gaute B Strokkenes @ 2002-03-04 10:04 UTC (permalink / raw)
  To: 9fans

On Tue, 26 Feb 2002, andrey@lanl.gov wrote:
>> The compile time for the BSD/2.95/no test looks *really* low;
>> are we sure about that number?  It's a very strange outlier, isn't
>> it?  I'll mostly ignore that one, because it is *such* a surprise;
>> GCC isn't normally thought to be that fast, but hey, maybe it
>> really is.
>>
> 
> several more tests yield exactly the same results -- FBSD 4.5 w/
> gcc2.95 takes about 18 seconds to compile.
> 
>> A total curiosity is that running the Linux binaries under
>> emulation in BSD is *faster* than running the native BSD binaries.
>> It's hard to imagine that the BSD team specially optimized that
>> case, does anyone have any knowledge or guesses?
>> 
> 
> gcc 3.0 on FBSD was locally compiled and installed (as was noted in
> the explanations), had it been taken from a binary package it _must_
> have been much faster (all we did was 'make; make install')...

Read the GCC installation manual: you're supposed to say "make
bootstrap", not just "make".  IIRC if you just use plain "make" what
you get is a compiler that is built with the system default compiler.
If you use "make bootstrap", then GCC is first built with the system
compiler, then that compiler is used to compile GCC again.  Then that
compiler builds GCC a third and final time.  The last two compilers
are compared byte-for-byte; they should be equal since GCC should
produce identical output independent of what compiler it is built
with.
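The comparison step at the end of the bootstrap can be sketched in C.
This only illustrates a byte-for-byte check; it is not the code GCC's
makefiles actually run, and the function name same_bytes and whatever
file paths are passed to it are hypothetical.

```c
#include <stdio.h>

/*
 * Compare two files byte for byte.
 * Returns 1 if identical, 0 if they differ, -1 if either cannot be opened.
 * A read error is treated like end of file in this sketch.
 */
int
same_bytes(const char *path_a, const char *path_b)
{
	FILE *a, *b;
	int ca, cb;

	a = fopen(path_a, "rb");
	if(a == NULL)
		return -1;
	b = fopen(path_b, "rb");
	if(b == NULL){
		fclose(a);
		return -1;
	}
	/* Stop at the first mismatch or when both files end together. */
	do{
		ca = getc(a);
		cb = getc(b);
	}while(ca == cb && ca != EOF);
	fclose(a);
	fclose(b);
	return ca == cb;
}
```

Applied to each object file from the second and third builds, a result
of 1 everywhere is what the bootstrap expects; a 0 means the compiler's
output depended on which compiler built it.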

-- 
Gaute Strokkenes                        http://www.srcf.ucam.org/~gs234/
I joined scientology at a garage sale!!



* Re: [9fans] p9/linux/fbsd compiler shootout
  2002-02-26 14:20 rob pike
@ 2002-02-26 16:07 ` Sean Quinlan
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Quinlan @ 2002-02-26 16:07 UTC (permalink / raw)
  To: 9fans

8c does almost no registerisation for floating point.
The floating point stack is used as a stack; i.e. the code for 
	y = y + x
looks like

	FMOVD	y+-16(SP),F0
	FADDD	x+-8(SP),F0
	FMOVDP	F0,y+-16(SP)


even in the inner loops.  If you care about floating point on the x86,
you have to use a compiler that tries to turn the x86 floating point
stack into a register file... this is not easy and even with a lot of effort
and large amounts of hardware assistance, x86 floating point is still
relatively slow.  As rob said, ken never did this... on machines where
the floating point unit uses a register file, ken does a lot better.
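The difference can be modelled in portable C.  This is a rough sketch,
not 8c output: the volatile qualifier is used here only as a stand-in
that forces a load and a store of y on every iteration, much like the
FMOVD/FADDD/FMOVDP sequence, while the second function leaves the
compiler free to keep y in a register.  Both function names are made up
for the illustration.

```c
#include <stddef.h>

/*
 * Accumulator kept in memory: volatile forces y to be loaded and
 * stored on every iteration (models the unregisterised code).
 */
double
sum_via_memory(const double *x, size_t n)
{
	volatile double y = 0.0;
	size_t i;

	for(i = 0; i < n; i++)
		y = y + x[i];
	return y;
}

/*
 * Ordinary local accumulator: a registerising compiler may keep y
 * in a floating point register for the whole loop.
 */
double
sum_in_register(const double *x, size_t n)
{
	double y = 0.0;
	size_t i;

	for(i = 0; i < n; i++)
		y = y + x[i];
	return y;
}
```

Both functions perform the same additions in the same order; they
differ only in how much memory traffic the inner loop is allowed to
shed.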

seanq

rob pike wrote:
> 
> > Rob, can you say more about what 8c's shortcomings are in
> > floating point?
> 
> Ken just didn't spend much time on it.  A fair bit of effort was spent
> on integer code for most of the architectures, including x86.  When
> you have finite time (unlike the GCC people, who have infinite people
> and forever to work on it, 8c was a one-man operation) you choose
> where to spend it.
> 
> -rob



* Re: [9fans] p9/linux/fbsd compiler shootout
@ 2002-02-26 14:20 rob pike
  2002-02-26 16:07 ` Sean Quinlan
  0 siblings, 1 reply; 14+ messages in thread
From: rob pike @ 2002-02-26 14:20 UTC (permalink / raw)
  To: 9fans

> Rob, can you say more about what 8c's shortcomings are in
> floating point?

Ken just didn't spend much time on it.  A fair bit of effort was spent
on integer code for most of the architectures, including x86.  When
you have finite time (unlike the GCC people, who have infinite people
and forever to work on it, 8c was a one-man operation) you choose
where to spend it.

-rob




* Re: [9fans] p9/linux/fbsd compiler shootout
@ 2002-02-26 14:18 rob pike
  0 siblings, 0 replies; 14+ messages in thread
From: rob pike @ 2002-02-26 14:18 UTC (permalink / raw)
  To: 9fans

Again, and strongly, there was little effort made to make the Plan 9
C compilers generate good floating point code.  I predict much less
difference using normal, integer code - such as the kernel - which
did receive some optimization attention.

Also, the kernel spends most of its time in memmove and memset,
both of which are written in assembler and therefore are not affected
by the compiler.  I doubt the quality of compilation of the kernel has more than a
minor effect on runtime of the compiler itself.

Russ Cox points out that:

	the linux guys
	decided that the default floating-point
	precision should be 80-bit in-register
	precision but with 64-bit in-memory
	precision.  i realize that this is sometimes
	desirable if you know what you're doing,
	but they made it the default.  so code 
	ends up behaving differently (usually
	incorrectly) based on what gcc registerizes
	and what overflows into memory.

	#include <stdio.h>
	#include <stdlib.h>
	#include <math.h>
	
	void
	main(void)
	{
		double x, y;
	
		x = sqrt(3.0);
		y = sqrt(3.0) - x;
	
		if(y)
			printf("buggered\n");
	}
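The effect can be made explicit without relying on what the compiler
happens to registerize.  The sketch below assumes that long double
stands in for the extended in-register format and that a volatile
double stands in for a value spilled to 64-bit memory; on x86 with x87
arithmetic long double is the 80-bit format, on other machines it may
be wider or the same as double.  The function names are hypothetical,
and division is used instead of sqrt only so that no math library is
needed.

```c
#include <float.h>

/*
 * Difference between a quotient held at extended precision and the
 * same quotient after a forced store to a 64-bit double.
 */
long double
spill_residue(double a, double b)
{
	long double in_register = (long double)a / (long double)b;
	volatile double in_memory = (double)in_register;	/* forced 64-bit store */

	return in_register - (long double)in_memory;
}

/* Nonzero if long double carries more mantissa bits than double here. */
int
has_excess_precision(void)
{
	return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```

For a quotient that is exact in both formats, such as 1.0/2.0, the
residue is zero.  For 1.0/3.0 it is nonzero whenever
has_excess_precision() is true, which is the same mismatch that makes
the test above print its message.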



-rob




* Re: [9fans] p9/linux/fbsd compiler shootout
@ 2002-02-26 11:15 forsyth
  0 siblings, 0 replies; 14+ messages in thread
From: forsyth @ 2002-02-26 11:15 UTC (permalink / raw)
  To: 9fans

>>GCC had to develop a whole special extra codegen unit to manage to do
>>fp on a 386 anything approaching well, and without something like
>>that, it's pretty painful.

the code generators are all custom, although the RISC ones are similar
(and closely related), so that aspect of the peculiar floating-point
isn't troublesome (which isn't to say it tries anything too fancy).
the 680x0 and x86 compilers have a lot of custom code.

the plan 9 c compilers aren't like pcc, which did a simple essentially
one-pass translation (to assembler) with no optimisation.
the compiler does function-level register allocation, addressing mode selection
and broad instruction selection, with local code improvements.
unusually, detailed (machine-level) instruction selection is done by
the linker (which is in a good position to do ARM/Thumb linkage, literal pools,
span-dependent instructions, instruction scheduling, etc.).  only the linker knows
the binary formats of instructions of a given processor or processor variant.
some instructions issued by the compiler don't correspond to real instruction variants.
the assembler is just a front-end to the linker.  the compiler doesn't use the assembler.



* Re: [9fans] p9/linux/fbsd compiler shootout
  2002-02-26  3:05 andrey mirtchovski
  2002-02-26 10:27 ` Thomas Bushnell, BSG
@ 2002-02-26 10:27 ` Thomas Bushnell, BSG
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Bushnell, BSG @ 2002-02-26 10:27 UTC (permalink / raw)
  To: 9fans

andrey@lanl.gov (andrey mirtchovski) writes:

> pps: funny fact #430995: selecting gcc for installation on rh7.2
> results in installing 220mb worth of dependencies.  one of the
> more important ones is x-chat (or at least x-chat related)!

Oh, that's bizarre indeed!

(shameless-plug "Yet another reason to prefer Debian to Red Hat!")

Thomas



* Re: [9fans] p9/linux/fbsd compiler shootout
  2002-02-26  3:05 andrey mirtchovski
@ 2002-02-26 10:27 ` Thomas Bushnell, BSG
  2002-02-26 10:27 ` Thomas Bushnell, BSG
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Bushnell, BSG @ 2002-02-26 10:27 UTC (permalink / raw)
  To: 9fans


Interesting data!  Thanks for doing the work, there is a fair bit
there to ponder.

I wish we had a test that controls for OS, but I understand that this
is not available.  My instrumentation showed that a fair bit of
preprocessor time was spent looking for and reading in include files.
(Like I said, that was on an HP-300 running BSD [specifically, More
BSD], and I don't know how much that would be relevant to a current
context.)

Some observations from the data posted, however:

The compile time for the BSD/2.95/no test looks *really* low;
are we sure about that number?  It's a very strange outlier, isn't it?
I'll mostly ignore that one, because it is *such* a surprise; GCC
isn't normally thought to be that fast, but hey, maybe it really is. 

The test confirms that 8c is really fast at doing codegen with a
compile time about one-third that of any of the GCC's.  From the
comments of Plan 9 folks, that was a primary goal in its design, and
it does look (from this brief comparison) like they did a really good
job!

GCC's architecture is strongly geared towards optimization; even if no
optimizing passes are done, a lot of work is set up towards enabling
machine-independent optimization code, which is essentially wasted if
no optimization is done.

However, the speed of the resulting code under GCC seems far
superior.  8c-generated code runs at about half the speed of GCC
optimized code.  
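The two ratios quoted above can be checked against the posted table.
The following is a small sketch of that arithmetic; the struct and
function names are made up for the illustration, the numbers are copied
from the results in the original message, and the choice of the
optimized Linux gcc 3.0.2 row as the baseline is an assumption.

```c
/* One row of the posted table: three render times and the compile time, in seconds. */
struct result {
	const char *label;
	double run[3];
	double compile;
};

const struct result p9_8c = { "P9/8c/yes(?)", { 293, 293, 292 }, 19.24 };
const struct result lnx_gcc3_opt = { "Lnx/3.0.2/yes", { 150, 149, 149 }, 56.541 };

/* Mean of the three reported render times. */
double
mean_run(const struct result *r)
{
	return (r->run[0] + r->run[1] + r->run[2]) / 3.0;
}

/* How many times longer a takes to render than b. */
double
render_ratio(const struct result *a, const struct result *b)
{
	return mean_run(a) / mean_run(b);
}

/* Compile time of a as a fraction of the compile time of b. */
double
compile_ratio(const struct result *a, const struct result *b)
{
	return a->compile / b->compile;
}
```

With these rows, render_ratio(&p9_8c, &lnx_gcc3_opt) comes to about
1.96 and compile_ratio(&p9_8c, &lnx_gcc3_opt) to about 0.34, which
matches "half the speed" and "one-third" for this pair.  Other rows of
the table give different ratios.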

A total curiosity is that running the Linux binaries under emulation
in BSD is *faster* than running the native BSD binaries.  It's hard to
imagine that the BSD team specially optimized that case, does anyone
have any knowledge or guesses?  

A real surprise for me was that non-optimized GCC also outperformed
8c, though less dramatically.  I wonder if that might not be mostly
because of the floating-point intensive nature of the test.  Rob Pike
mentioned that 8c was not so good at fp work, and the 386 especially
requires extra special hair to make fp compilation anything
approaching efficient.  I would expect that on an integer test, we
would not see much difference.

Now, the BSD and Linux kernels that underlay these tests were of
course compiled with optimized GCC, and this also might account for
some of the difference; some of what looks like slowness in the code
8c produced might actually be slowness in the Plan 9 kernel (compiled
by 8c).  I don't know any good way to control for this without porting
GCC to Plan 9 and compiling the kernel with it, or alternatively,
porting 8c to Linux or BSD.

Thomas



* Re: [9fans] p9/linux/fbsd compiler shootout
  2002-02-26  3:10 rob pike
@ 2002-02-26 10:26 ` Thomas Bushnell, BSG
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Bushnell, BSG @ 2002-02-26 10:26 UTC (permalink / raw)
  To: 9fans

rob@plan9.bell-labs.com (rob pike) writes:

> Great, thanks.  I'd like to see a similar shootout using a non-floating-point
> application.

Me too.  Rob, can you say more about what 8c's shortcomings are in
floating point?  I think you alluded to them briefly, but I'm
interested in hearing more.  386 floating point is particularly
horrid; is the problem floating point in general, or is it floating
point on freaky fp stack things like the 386 has?

GCC had to develop a whole special extra codegen unit to manage to do
fp on a 386 anything approaching well, and without something like
that, it's pretty painful.

Thomas



* Re: [9fans] p9/linux/fbsd compiler shootout
@ 2002-02-26  3:10 rob pike
  2002-02-26 10:26 ` Thomas Bushnell, BSG
  0 siblings, 1 reply; 14+ messages in thread
From: rob pike @ 2002-02-26  3:10 UTC (permalink / raw)
  To: 9fans

Great, thanks.  I'd like to see a similar shootout using a non-floating-point
application.

-rob




* [9fans] p9/linux/fbsd compiler shootout
@ 2002-02-26  3:05 andrey mirtchovski
  2002-02-26 10:27 ` Thomas Bushnell, BSG
  2002-02-26 10:27 ` Thomas Bushnell, BSG
  0 siblings, 2 replies; 14+ messages in thread
From: andrey mirtchovski @ 2002-02-26  3:05 UTC (permalink / raw)
  To: 9fans

Hello,

We spent the day here doing a simple compiler/os/p9-vs-the-world
shootout..  Here's what we found...


Machine:
        A single cpu 800mhz Athlon T-Bird system (k7).  Each operating
system was installed on a separate IBM Deskstar 30GB IDE drive.  The
video card used was NVidia TNT2, but that's not important since only
plan9 ran any graphical interface (note: when attempting to test on P9
without rio the machine hung during ape/psh compilation).

OS: 
       Linux/RedHat 7.2, kernel 2.4.7 (ext3) standard install
       FreeBSD 4.5 fresh install (no softupdates turned on)
       Plan9 installed on-line (with kernel modification to
       accommodate NVidia graphics hardware)

Software: 
        The test was done using the POV-Ray graphics/imaging software
(www.povray.org).  We downloaded the latest 3.1 unix sources.  The
makefile for p9 was modified to compile under ape/psh: we removed
most of the CFLAGS switches and set the compiler to pcc.  The compile
time reported under all OSes and compilers does *not* include time to
compile libpng and zlib support.  We used only the POV-Ray-relevant
files.

        Time given is the time it took to render a single frame of a
predefined scene (explanation below).

        We chose to render using radiosity rather than ray-tracing,
since radiosity is more computationally intensive.  The test is
extremely floating-point heavy (or at least it should be -- after all
it's graphics we're dealing with :).  Each binary produced a
ppm image of 1440015 bytes plus a 1910-byte ASCII statistics
file.  POV-Ray keeps rough stats on its own, so the time reported is
the actual time in seconds that POV-Ray thinks it was rendering
(including initial parsing of the scene description file).  All output
was redirected to a file in order to avoid any console IO (this was
not done for the compilation, though).

        The scene description can be found in:
		.../scenes/radios/rad2.pov
	in the POV-Ray distribution.


        Images generated were of size 800x600, in PPM format.  Here is
the string used to render them:
                (plan9 version given, others identical except for
                redirection syntax)
                povray +L/path/to/povray/libs rad2.ini +Irad2.pov +Oscene.ppm +FP +W800 +H600 -GA +GS >[2]out.txt

        For each platform/compiler pair we compiled two binaries -- one using
the standard optimizations found in POV-Ray's makefile, the other
without any optimization switches (e.g.  gcc -c file.c).  Povray uses
the following optimizations:
        -O6
        -finline-functions
        -ffast-math
        -m386 (deprecated in 3.0)
 
	Each compiler was given three runs, each run is reported. 


Some additional notes: 
        FreeBSD 4.5 does not come with gcc 3.0, neither could it be
found as a binary package.  We had to compile it from the ports
collection, but since it was a simple make; make install deal we got
no optimization for it.  It shows (compare the 3.0 results with the 
same fbsd using gcc 2.95 and a linux binary compiled with 3.0
under emulation)...


--- 

Results (time in seconds):

(please excuse bad formatting -- we elaborated on the idea of putting
everything in an excel spreadsheet, but couldn't find any)

OS/compiler/
optimization    run #1  run #2  run #3  Compile time (mm:ss.ms)
---------------------------------------------------------------
P9/8c/yes(?)    293     293     292     0:19.24

Lnx/3.0.2/yes   150     149     149     0:56.541
BSD/3.0.2/yes   170     170     170     0:55.04
BSD/3.0.2/yes   151     151     151     none (running linux binaries under emulation)

Lnx/3.0.2/no    196     196     196     0:30.052
BSD/3.0.2/no    221     221     220     0:26.49
BSD/3.0.2/no    203     203     203     none (running linux binaries under emulation)

Lnx/2.96/yes    147     147     148     0:56.848
BSD/2.95/yes    177     178     178     0:42.03

Lnx/2.96/no     203     204     203     0:55.620
BSD/2.95/no     208     208     208     0:18.76


-----

regards :)

ps: reading the above forces you to agree not to use the information
here in flammable, non-p9-related and in any way unscientific
discussions :)

pps: funny fact #430995: selecting gcc for installation on rh7.2
results in installing 220mb worth of dependencies.  one of the
more important ones is x-chat (or at least x-chat related)!



