> From: owner-caml-list@pauillac.inria.fr
> [mailto:owner-caml-list@pauillac.inria.fr] On Behalf Of Remi VANICAT
> Sent: Monday, June 10, 2002 2:50 PM
> To: caml-list@inria.fr
> Subject: Re: [Caml-list] Timing Ocaml
> 
> Dmitry Bely <dbely@mail.ru> writes:
> 
> > Chris Hecker <checker@d6.com> writes:
> >
> > >>* The GNU C compiler gcc is recommended, as the bytecode
> > >>   interpreter takes advantage of gcc-specific features to enhance
> > >>   performance.
> > >>What is the nature of these optimizations?
> > >
> > > GCC lets you take the address of a label.  You can see in
> > > byterun/interp.c that it uses a jump table instead of a switch when
> > > you're using GCC.
> > >
> > > At least, that's what it looks like.
> >
> > I would rather say that gcc allows forcing register allocation for
> > some specific variables, while MSVC always ignores the "register"
> > specifier.

	No, that's not the problem. MSVC is usually very good at
register allocation.

> >
> > #if defined(__GNUC__) && !defined(DEBUG)
> > [...]
> > #ifdef __i386__
> > #define PC_REG asm("%esi")
> > #define SP_REG asm("%edi")
> > #define ACCU_REG
> > #endif
> > [...]
> > #endif
> 
> Well, it seems that threaded code also depends on being compiled with
> gcc:
> 
> #if defined(__GNUC__) && __GNUC__ >= 2 && !defined(DEBUG) && !defined(SHRINKED_GNUC)
> #define THREADED_CODE
> #endif
> 
> so both register assignment and threaded code can account for a lot
> of the speedup.

	If you look at the generated code, you can see that MSVC uses
registers very efficiently, and that the difference comes only from the
threaded code. Mainly, the switch-based interpreter is forced to do two
nearly successive jumps for each instruction, and I think that this
causes pipeline problems on modern processors.
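
	To make the two dispatch styles concrete, here is a minimal sketch
of a toy interpreter written both ways. The opcodes (OP_INC, OP_DOUBLE,
OP_STOP) and the function names are made up for illustration and are not
the ones in byterun/interp.c. The threaded variant relies on GCC's
"labels as values" extension and falls back to the switch version on
other compilers.

```c
#include <stddef.h>

/* Hypothetical toy opcodes, not those of the OCaml bytecode. */
enum { OP_INC, OP_DOUBLE, OP_STOP };

/* Portable dispatch: after each instruction, control typically jumps
   back to the top of the loop and then takes the indirect jump of the
   switch, i.e. two nearly successive jumps per instruction. */
static int run_switch(const unsigned char *pc)
{
    int accu = 0;
    for (;;) {
        switch (*pc++) {
        case OP_INC:    accu += 1; break;
        case OP_DOUBLE: accu *= 2; break;
        case OP_STOP:   return accu;
        default:        return -1;   /* unknown opcode */
        }
    }
}

#if defined(__GNUC__)
/* Threaded dispatch: each instruction ends with its own indirect jump
   straight to the next handler, so there is no jump back to a loop. */
static int run_threaded(const unsigned char *pc)
{
    /* Table of label addresses, indexed by opcode (GCC extension).
       There is no bounds check: opcodes are assumed to be valid. */
    static void *const jumptable[] = { &&lbl_inc, &&lbl_double, &&lbl_stop };
    int accu = 0;

#define NEXT goto *jumptable[*pc++]
    NEXT;
lbl_inc:    accu += 1; NEXT;
lbl_double: accu *= 2; NEXT;
lbl_stop:   return accu;
#undef NEXT
}
#else
/* Without the extension, only the switch version is available. */
#define run_threaded run_switch
#endif
```

	Both functions compute the same result on the same program; only the
shape of the generated dispatch code differs.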

	If you check that the bytecode ops are valid before execution and
use __assume(0) as the default case of the switch, you can gain about
10% in execution speed, but the two successive jumps are still there.
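
	A rough sketch of that idea, again with made-up opcodes and function
names: the code is validated once up front, and the default case then
tells the compiler it can never be reached, so the range check in front
of the switch table can be dropped. __assume(0) is the MSVC intrinsic;
the UNREACHABLE macro and its __builtin_unreachable() / abort()
fallbacks are my own additions so that the sketch also compiles
elsewhere.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(_MSC_VER)
#define UNREACHABLE() __assume(0)              /* MSVC optimizer hint */
#elif defined(__GNUC__)
#define UNREACHABLE() __builtin_unreachable()  /* assumed GCC equivalent */
#else
#define UNREACHABLE() abort()                  /* safe fallback, no hint */
#endif

/* Hypothetical toy opcodes; OP_COUNT is the number of valid opcodes. */
enum { OP_INC, OP_DOUBLE, OP_STOP, OP_COUNT };

/* Checked once, before execution: every opcode is known and the
   program ends with OP_STOP, so the interpreter cannot run off the end. */
static int validate(const unsigned char *code, size_t len)
{
    size_t i;
    if (len == 0 || code[len - 1] != OP_STOP)
        return 0;
    for (i = 0; i < len; i++)
        if (code[i] >= OP_COUNT)
            return 0;
    return 1;
}

/* Returns 1 and stores the accumulator in *result on success,
   or 0 if the code was rejected by validate(). */
static int run_checked(const unsigned char *code, size_t len, int *result)
{
    const unsigned char *pc = code;
    int accu = 0;

    if (!validate(code, len))
        return 0;
    for (;;) {
        switch (*pc++) {
        case OP_INC:    accu += 1; break;
        case OP_DOUBLE: accu *= 2; break;
        case OP_STOP:   *result = accu; return 1;
        default:        UNREACHABLE();  /* excluded by validate() */
        }
    }
}
```

	The hint is only sound because validate() runs first; on unchecked
input, reaching the default case would be undefined behaviour.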

	I don't know what MSVC 7 does, but I'd be interested.

-- 
  Lionel Fourquaux