caml-list - the Caml user's mailing list
* Ocamlopt code generator question
@ 2009-04-28 19:36 Dmitry Bely
       [not found] ` <m27i13tofi.fsf@Pythagorion.local.i-did-not-set--mail-host-address--so-tickle-me>
  2009-05-05  9:24 ` [Caml-list] " Xavier Leroy
  0 siblings, 2 replies; 24+ messages in thread
From: Dmitry Bely @ 2009-04-28 19:36 UTC (permalink / raw)
  To: Caml List

For amd64 we have in asmcomp/amd64/proc_nt.mlp:

(*  xmm0 - xmm15  100 - 115       xmm0 - xmm9: Caml function arguments
                                xmm0 - xmm3: C function arguments
                                xmm0: Caml and C function results
                                xmm6-xmm15 are preserved by C *)

let loc_arguments arg =
  calling_conventions 0 9 100 109 outgoing arg
let loc_parameters arg =
  let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
let loc_results res =
  let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc

What do these first_float=100 and last_float=109 values for loc_arguments
and loc_parameters affect? My impression is that floats are always passed
boxed, so xmm registers are in fact never used to pass parameters. And
float values are returned as a pointer in eax, not a value in xmm0 as
loc_results would suggest.
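
To make the role of the four bounds concrete, here is a simplified, self-contained model of the kind of assignment loop that calling_conventions performs. It is only a sketch: the types ty and loc, the 8-byte stack slot size, and the omission of the outgoing/incoming argument are assumptions made for illustration, not the compiler's actual definitions.

```ocaml
(* Simplified model of register assignment for arguments.
   Types and names are illustrative, not the compiler's own. *)
type ty = Int | Float
type loc = Reg of int | Stack of int   (* register number or stack offset *)

let calling_conventions first_int last_int first_float last_float args =
  let next_int = ref first_int
  and next_float = ref first_float
  and ofs = ref 0 in
  (* Take the next free register of a class, or fall back to the stack. *)
  let assign next last =
    if !next <= last then begin
      let r = !next in
      incr next;
      Reg r
    end else begin
      let slot = !ofs in
      ofs := !ofs + 8;                 (* assumed 8-byte stack slots *)
      Stack slot
    end
  in
  let locs = Array.make (Array.length args) (Stack 0) in
  Array.iteri
    (fun i t ->
       locs.(i) <-
         (match t with
          | Int -> assign next_int last_int
          | Float -> assign next_float last_float))
    args;
  (locs, !ofs)

(* Same bounds as in the question: ints in 0-9, floats in 100-109. *)
let loc_arguments args = calling_conventions 0 9 100 109 args
```

In this model the float bounds only matter if some argument actually has type Float; if every float reaches the call as a pointer (an Int/address-class value), the 100-109 range is never consulted.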

- Dmitry Bely


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ocamlopt code generator question
       [not found] ` <m27i13tofi.fsf@Pythagorion.local.i-did-not-set--mail-host-address--so-tickle-me>
@ 2009-04-29 16:50   ` Dmitry Bely
  2009-04-29 20:04     ` Jeffrey Scofield
  0 siblings, 1 reply; 24+ messages in thread
From: Dmitry Bely @ 2009-04-29 16:50 UTC (permalink / raw)
  To: Caml List

On Wed, Apr 29, 2009 at 8:28 PM, Jeffrey Scofield
<jeffadm@pythagorion.local> wrote:
> Dmitry Bely <dmitry.bely@gmail.com> writes:
>
>> For amd64 we have in asmcomp/amd64/proc_nt.mlp:
>>
>> (*  xmm0 - xmm15  100 - 115       xmm0 - xmm9: Caml function arguments
>>                                 xmm0 - xmm3: C function arguments
>>                                 xmm0: Caml and C function results
>>                                 xmm6-xmm15 are preserved by C *)
>>
>> let loc_arguments arg =
>>   calling_conventions 0 9 100 109 outgoing arg
>> let loc_parameters arg =
>>   let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
>> let loc_results res =
>>   let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc
>>
>> What these first_float=100 and last_float=109 for loc_arguments and
>> loc_parameters affect? My impression is that floats are always passed
>> boxed, so xmm registers are in fact never used to pass parameters.
>
> I don't have any experience with amd64, but I have looked at the ARM code
> generator of OCaml 3.10.2.  The first_float and last_float values there are used
> for unboxed calls to internal float functions--most notably, the C standard
> functions like floor().

No - for external C functions loc_external_arguments and
loc_external_results are used. And of course unboxed floats can be
acceptable there. But my question was about loc_arguments and
loc_parameters. E.g. what is the reason to have first_float=100 and
last_float=109 for loc_arguments?
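
For comparison, this is roughly what the external-call side looks like at the source level. The sketch assumes the attribute syntax of OCaml 4.03 and later (releases of the 3.x era used a trailing "float" string instead), and the names my_sqrt and norm2 are only illustrative; the pair of C symbols is the one the standard library itself uses for sqrt.

```ocaml
(* An external C function taking and returning an unboxed float.
   Native code calls "sqrt" directly; bytecode uses "caml_sqrt_float". *)
external my_sqrt : float -> float = "caml_sqrt_float" "sqrt"
  [@@unboxed] [@@noalloc]

(* An ordinary OCaml function built on top of it. *)
let norm2 x y = my_sqrt (x *. x +. y *. y)
```

Only the call to my_sqrt goes through the external conventions; the call to norm2 itself is an ordinary OCaml call and uses loc_arguments and loc_parameters.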

- Dmitry Bely


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ocamlopt code generator question
  2009-04-29 16:50   ` Dmitry Bely
@ 2009-04-29 20:04     ` Jeffrey Scofield
  0 siblings, 0 replies; 24+ messages in thread
From: Jeffrey Scofield @ 2009-04-29 20:04 UTC (permalink / raw)
  To: caml-list

Dmitry Bely <dmitry.bely@gmail.com> writes:


> No - for external C functions loc_external_arguments and
> loc_external_results are used. And of course unboxed floats can be
> acceptable there.

Whoops, I was obviously thinking of loc_external_arguments.  I should have
looked at the code again before posting!  Sorry for the confusion, and
thanks for the correction.

Regards,

Jeff Scofield
Seattle


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt code generator question
  2009-04-28 19:36 Ocamlopt code generator question Dmitry Bely
       [not found] ` <m27i13tofi.fsf@Pythagorion.local.i-did-not-set--mail-host-address--so-tickle-me>
@ 2009-05-05  9:24 ` Xavier Leroy
  2009-05-05  9:41   ` Dmitry Bely
  1 sibling, 1 reply; 24+ messages in thread
From: Xavier Leroy @ 2009-05-05  9:24 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: Caml List

> For amd64 we have in asmcomp/amd64/proc_nt.mlp:
> 
> (*  xmm0 - xmm15  100 - 115       xmm0 - xmm9: Caml function arguments
>                                 xmm0 - xmm3: C function arguments
>                                 xmm0: Caml and C function results
>                                 xmm6-xmm15 are preserved by C *)
> 
> let loc_arguments arg =
>   calling_conventions 0 9 100 109 outgoing arg
> let loc_parameters arg =
>   let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
> let loc_results res =
>   let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc
> 
> What these first_float=100 and last_float=109 for loc_arguments and
> loc_parameters affect? My impression is that floats are always passed
> boxed, so xmm registers are in fact never used to pass parameters. And
> float values are returned as a pointer in eax, not a value in xmm0 as
> loc_results would suggest.

The ocamlopt code generators support unboxed floats as function
parameters and results, as well as returning multiple results in
several registers.  (Except for the x86-32 bits port, because of the
weird floating-point model of this architecture.)  You're right that
the ocamlopt "middle-end" does not currently take advantage of this
possibility, since floats are passed between functions in boxed state.
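
The boxed representation can be observed from OCaml itself. The following is a small sketch using the Obj module (the helper name is_boxed_float is made up for illustration): a float seen as a generic value is a heap block tagged Obj.double_tag, whereas an int is an immediate value.

```ocaml
(* A float viewed as a generic OCaml value is a heap block with
   tag Obj.double_tag; an int is an immediate, unboxed value. *)
let is_boxed_float (x : float) =
  let r = Obj.repr x in
  Obj.is_block r && Obj.tag r = Obj.double_tag

let () =
  Printf.printf "float boxed: %b, int immediate: %b\n"
    (is_boxed_float (Sys.opaque_identity 3.14))
    (Obj.is_int (Obj.repr 42))
```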

- Xavier Leroy


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt code generator question
  2009-05-05  9:24 ` [Caml-list] " Xavier Leroy
@ 2009-05-05  9:41   ` Dmitry Bely
  2009-05-05 14:15     ` Jean-Marc Eber
  2009-05-08 10:21     ` [Caml-list] Ocamlopt x86-32 and SSE2 Xavier Leroy
  0 siblings, 2 replies; 24+ messages in thread
From: Dmitry Bely @ 2009-05-05  9:41 UTC (permalink / raw)
  To: Caml List

On Tue, May 5, 2009 at 1:24 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>> For amd64 we have in asmcomp/amd64/proc_nt.mlp:
>>
>> (*  xmm0 - xmm15  100 - 115       xmm0 - xmm9: Caml function arguments
>>                                xmm0 - xmm3: C function arguments
>>                                xmm0: Caml and C function results
>>                                xmm6-xmm15 are preserved by C *)
>>
>> let loc_arguments arg =
>>  calling_conventions 0 9 100 109 outgoing arg
>> let loc_parameters arg =
>>  let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
>> let loc_results res =
>>  let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc
>>
>> What these first_float=100 and last_float=109 for loc_arguments and
>> loc_parameters affect? My impression is that floats are always passed
>> boxed, so xmm registers are in fact never used to pass parameters. And
>> float values are returned as a pointer in eax, not a value in xmm0 as
>> loc_results would suggest.
>
> The ocamlopt code generators support unboxed floats as function
> parameters and results, as well as returning multiple results in
> several registers.  (Except for the x86-32 bits port, because of the
> weird floating-point model of this architecture.)  You're right that
> the ocamlopt "middle-end" does not currently take advantage of this
> possibility, since floats are passed between functions in boxed state.

I see. Here is why I asked: trying to improve floating-point performance
on the 32-bit x86 platform, I have merged the floating-point SSE2 code
generator from the amd64 ocamlopt back end into the i386 one, making an
ia32sse2 architecture. It also inlines sqrt() via the -ffast-math flag and
slightly optimizes emit_float_test (usually eliminating an extra jump) -
features that are missing from the original amd64 code generator. All
this seems to work OK: beyond my own code, all tests found in the OCaml CVS
test directory pass. Of course this idea is not new - you had a
working IA32+SSE2 back end several years ago [1] but unfortunately
never released it to the public.

Is this of any interest to anybody?

- Dmitry Bely

[1] http://caml.inria.fr/pub/ml-archives/caml-list/2003/03/e0db2f3f54ce19e4bad589ffbb082484.fr.html


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt code generator question
  2009-05-05  9:41   ` Dmitry Bely
@ 2009-05-05 14:15     ` Jean-Marc Eber
  2009-05-05 14:58       ` Sylvain Le Gall
  2009-05-05 15:14       ` [Caml-list] " Jon Harrop
  2009-05-08 10:21     ` [Caml-list] Ocamlopt x86-32 and SSE2 Xavier Leroy
  1 sibling, 2 replies; 24+ messages in thread
From: Jean-Marc Eber @ 2009-05-05 14:15 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: Caml List

Hi Dmitry,

LexiFi for instance _is_ clearly interested in an sse2 32bit code generator.

One should probably have the following in mind and/or ask the following questions:

- it is probably not a good idea to support both backends (sse2 and old stack fp 
i386 architecture). It will be necessary to make a choice (especially taking into 
account the limited INRIA resources and the burden of already supporting 
different windows ports).

- would INRIA be ok to switch to a sse2 code generator (based on Dmitry's patch 
- supposing that he is ok to donate it to INRIA - or Xavier's work or whatever)?

- I also guess that a sse2 code generator would be simpler than the current one 
(that has to support this horrible fp stack architecture) and would therefore be 
a better candidate for further enhancements.

- what is the opinion on this list, as a switch to a sse2 backend would exclude 
"old" processors from being OCaml compatible (I don't have a precise list at 
hand for now) ?

My opinion is that this support of legacy hardware is not important, but I guess 
others are arguing in opposite directions... :-)

But again, having better floating point performance (and predictable behaviour, 
compared to the bytecode version) would be a big plus for some applications.

Best regards,

Jean-Marc




Dmitry Bely a écrit :
> 
> I see. Why I asked this: trying to improve floating-point performance
> on 32-bit x86 platform I have merged floating-point SSE2 code
> generator from amd64 ocamlopt back end to i386 one, making ia32sse2
> architecture. It also inlines sqrt() via -ffast-math flag and slightly
> optimizes emit_float_test (usually eliminates an extra jump) -
> features that are missed in the original amd64 code generator. All
> this seems to work OK: beyond my own code all tests found in Ocaml CVS
> test directory are passed. Of course this is idea is not new - you had
> working IA32+SSE2 back end several years ago [1] but unfortunately
> never released it to the public.
> 
> Is this of any interest to anybody?
> 
> - Dmitry Bely
> 
> [1] http://caml.inria.fr/pub/ml-archives/caml-list/2003/03/e0db2f3f54ce19e4bad589ffbb082484.fr.html
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ocamlopt code generator question
  2009-05-05 14:15     ` Jean-Marc Eber
@ 2009-05-05 14:58       ` Sylvain Le Gall
  2009-05-05 15:21         ` [Caml-list] " David Allsopp
  2009-05-05 15:59         ` Dmitry Bely
  2009-05-05 15:14       ` [Caml-list] " Jon Harrop
  1 sibling, 2 replies; 24+ messages in thread
From: Sylvain Le Gall @ 2009-05-05 14:58 UTC (permalink / raw)
  To: caml-list

On 05-05-2009, Jean-Marc Eber <jeanmarc.eber@lexifi.com> wrote:
> Hi Dimitry,
>
> LexiFi for instance _is_ clearly interested by a sse2 32bit code generator.
>
> One should probably have the following in mind and/or ask the following questions:
>
> - it is probably not a good idea to support both backends (sse2 and old stack fp 
> i386 architecture). It will be necessary to make a choice (especially taking in 
> account the limited INRIA resources and the burden of already supporting 
> different windows ports).
>

Maybe this point can be discussed. I think 3 ports for windows is a bit
too much... I don't know Dmitry's point of view, but maybe INRIA could
support just MSVC (or mingw). If this is a way to free INRIA resources, it
is a good option.

> - would INRIA be ok to switch to a sse2 code generator (based on Dimitry's patch 
>   supposing that he is ok to donate it to INRIA - or Xavier's work or whatever)?
>
> - I also guess that a sse2 code generator would be simpler than the current one 
> (that has to support this horrible fp stack architecture) and would therefore be 
> a better candidate for further enhancements.
>
> - what is the opinion on this list, as a switch to a sse2 backend would exclude 
> "old" processors from being OCaml compatible (I don't have a precise list at 
> hand for now) ?

I would like to say "go on", but SSE2 will limit OCaml to P4 on i386.
In Debian, this is the "low limit" of our build daemon. I think it is
quite dangerous not having the option of the older code generator...

If INRIA chooses to switch to SSE2, there should at least still be a way
to compile on older architectures. That doesn't mean INRIA needs to keep
the old code generator, but it should provide a simple emulation for it. In
this case, we will have good float performance on new architectures and
will still be able to compile on old ones.

>
> My opinion is that this support of legacy hardware is not important, but I guess 
> others are arguing in opposite directions... :-)
>

I would say that "the performance of legacy hardware is not important"
-- support is still important. 

> But again, having better floating point performance (and predictable behaviour, 
> compared to the bytecode version) would be a big plus for some applications.
>

Indeed.

Regards
Sylvain Le Gall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt code generator question
  2009-05-05 14:15     ` Jean-Marc Eber
  2009-05-05 14:58       ` Sylvain Le Gall
@ 2009-05-05 15:14       ` Jon Harrop
  1 sibling, 0 replies; 24+ messages in thread
From: Jon Harrop @ 2009-05-05 15:14 UTC (permalink / raw)
  To: caml-list

On Tuesday 05 May 2009 15:15:33 Jean-Marc Eber wrote:
> Hi Dimitry,
>
> LexiFi for instance _is_ clearly interested by a sse2 32bit code generator.
>
> One should probably have the following in mind and/or ask the following
> questions:
>
> - it is probably not a good idea to support both backends (sse2 and old
> stack fp i386 architecture). It will be necessary to make a choice
> (especially taking in account the limited INRIA resources and the burden of
> already supporting different windows ports).
>
> - would INRIA be ok to switch to a sse2 code generator (based on Dimitry's
> patch - supposing that he is ok to donate it to INRIA - or Xavier's work or
> whatever)?
>
> - I also guess that a sse2 code generator would be simpler than the current
> one (that has to support this horrible fp stack architecture) and would
> therefore be a better candidate for further enhancements.
>
> - what is the opinion on this list, as a switch to a sse2 backend would
> exclude "old" processors from being OCaml compatible (I don't have a
> precise list at hand for now) ?
>
> My opinion is that this support of legacy hardware is not important, but I
> guess others are arguing in opposite directions... :-)
>
> But again, having better floating point performance (and predictable
> behaviour, compared to the bytecode version) would be a big plus for some
> applications.

If the idea is to provide better code generation on x86 going forwards with 
minimal effort then I'd have thought an LLVM-based backend would be the 
obvious choice. My tests with HLVM showed that numerical code can be a 
whopping 8x faster than today's ocamlopt on x86 and, of course, LLVM is 
improving much more rapidly.

LLVM can probably replace the x86, x64 and ppc backends. LLVM also seems like 
a sane approach to providing a native-code top level via its existing JIT 
functionality.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [Caml-list] Re: Ocamlopt code generator question
  2009-05-05 14:58       ` Sylvain Le Gall
@ 2009-05-05 15:21         ` David Allsopp
  2009-05-05 15:59         ` Dmitry Bely
  1 sibling, 0 replies; 24+ messages in thread
From: David Allsopp @ 2009-05-05 15:21 UTC (permalink / raw)
  To: 'Sylvain Le Gall', caml-list

Sylvain Le Gall wrote:
> Maybe this point can be discussed. I think 3 ports for windows is a bit
> too much... I don't know Dimitry point of view, but maybe INRIA can
> just consider MSVC (or mingw). If this is a way to free INRIA resources,
> it is a good option.

There are actually 4 Windows ports if you include MSVC64! I'm not sure at
this stage that it's possible to reduce the number - I think you'll find
that there are enough users on both sides with enough
hard/impossible-to-work-around requirements (probably to do with external
libraries) such that you'd never be able to decide between just MinGW or
just MSVC. The Cygwin port, although obviously requiring extra work and
support, is more like supporting a separate flavour of UNIX than a separate
Windows port, I think.

> I would like to say "go on", but SSE2 will limit OCaml to P4 on i386.
> In Debian, this is the "low limit" of our build daemon. I think it is
> quite dangerous not having the option of the older code generator...

+1 I've still got a few quite useable Pentium 3 machines knocking around...
it would seem a shame if a lack of compiler rather than OS support ever
caused them to be retired. That said, the power and noise will probably be
what retires them first...



David


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Re: Ocamlopt code generator question
  2009-05-05 14:58       ` Sylvain Le Gall
  2009-05-05 15:21         ` [Caml-list] " David Allsopp
@ 2009-05-05 15:59         ` Dmitry Bely
       [not found]           ` <4A006410.8000205@lexifi.com>
  1 sibling, 1 reply; 24+ messages in thread
From: Dmitry Bely @ 2009-05-05 15:59 UTC (permalink / raw)
  To: Caml List

On Tue, May 5, 2009 at 6:58 PM, Sylvain Le Gall <sylvain@le-gall.net> wrote:
> On 05-05-2009, Jean-Marc Eber <jeanmarc.eber@lexifi.com> wrote:
>> Hi Dimitry,
>>
>> LexiFi for instance _is_ clearly interested by a sse2 32bit code generator.
>>
>> One should probably have the following in mind and/or ask the following questions:
>>
>> - it is probably not a good idea to support both backends (sse2 and old stack fp
>> i386 architecture). It will be necessary to make a choice (especially taking in
>> account the limited INRIA resources and the burden of already supporting
>> different windows ports).
>>
>
> Maybe this point can be discussed. I think 3 ports for windows is a bit
> too much... I don't know Dimitry point of view, but maybe INRIA can just
> consider MSVC (or mingw). If this is a way to free INRIA resources, it
> is a good option.

You should ask Xavier but I personally don't think that two Windows
ports (Cygwin is quite a different beast) are really the problem for
INRIA. They use (almost) the same C runtime library, the same
makefiles and I don't know a single Ocaml bug that was MSVC or Mingw
specific.

Yes, you have two different emit_nt.mlp and emit.mlp, but the only way
to make things simpler is to abandon MASM syntax completely. In
principle it's possible - GNU as under Windows generates the same COFF
files as MASM, although many Windows people that are not familiar with
AT&T syntax would not be very glad...

>> - would INRIA be ok to switch to a sse2 code generator (based on Dimitry's patch
>>   supposing that he is ok to donate it to INRIA - or Xavier's work or whatever)?
>>
>> - I also guess that a sse2 code generator would be simpler than the current one
>> (that has to support this horrible fp stack architecture) and would therefore be
>> a better candidate for further enhancements.
>>
>> - what is the opinion on this list, as a switch to a sse2 backend would exclude
>> "old" processors from being OCaml compatible (I don't have a precise list at
>> hand for now) ?
>
> I would like to say "go on", but SSE2 will limit OCaml to P4 on i386.
> In Debian, this is the "low limit" of our build daemon. I think it is
> quite dangerous not having the option of the older code generator...

I would also like to retain support for i386. Hopefully, one more code
generator (mostly a copy/paste combination of two already existing
ones) would not require too much effort to support.

> If INRIA choose to switch to SSE2 there should be at least still a way
> to compile on older architecture. Doesn't mean that INRIA need to keep
> the old code generator, but should provide a simple emulation for it. In
> this case, we will have good performance on new arch for float and we
> will still be able to compile on old arch.
>
>>
>> My opinion is that this support of legacy hardware is not important, but I guess
>> others are arguing in opposite directions... :-)
>>
>
> I would say that "the performance of legacy hardware is not important"
> -- support is still important.
>
>> But again, having better floating point performance (and predictable behaviour,
>> compared to the bytecode version) would be a big plus for some applications.
>>
>
> Indeed.

I don't quite understand what "predictable behavior" means - any generator
should conform to specs. In my tests the x87 and SSE2 backends show the
same results (otherwise it would be called a bug).

- Dmitry Bely


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Re: Ocamlopt code generator question
       [not found]           ` <4A006410.8000205@lexifi.com>
@ 2009-05-05 16:26             ` Dmitry Bely
  0 siblings, 0 replies; 24+ messages in thread
From: Dmitry Bely @ 2009-05-05 16:26 UTC (permalink / raw)
  To: Caml List

On Tue, May 5, 2009 at 8:06 PM, Jean-Marc Eber <jeanmarc.eber@lexifi.com> wrote:
> Hi Dimitry,
>
> Firstly thanks for looking again at the sse2 stuff!
>
> A difference may occur, if I'm not wrong, from the intermediate results
> precision:
>
> in sse2, everything is done on 8 bytes (if you "do" doubles at least), while,
> in x87, intermediate results (kept on the stack) are 10 bytes precision.
>
> This may result in differences (and the bytecode runtime never uses x87
> intermediate storage, so it behaves _numerically_ like the sse2 code
> generator, I guess).

I wouldn't be so sure. Bytecode runtime is C compiler-dependent (that
does use x87 for floating-point calculations), so rounding errors can
lead to different results. Floating point is always approximate...

- Dmitry Bely


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-05  9:41   ` Dmitry Bely
  2009-05-05 14:15     ` Jean-Marc Eber
@ 2009-05-08 10:21     ` Xavier Leroy
  2009-05-10 11:04       ` David MENTRE
                         ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Xavier Leroy @ 2009-05-08 10:21 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: Caml List

Dmitry Bely wrote:

> I see. Why I asked this: trying to improve floating-point performance
> on 32-bit x86 platform I have merged floating-point SSE2 code
> generator from amd64 ocamlopt back end to i386 one, making ia32sse2
> architecture. It also inlines sqrt() via -ffast-math flag and slightly
> optimizes emit_float_test (usually eliminates an extra jump) -
> features that are missed in the original amd64 code generator.

You just passed black belt in OCaml compiler hacking :-)

> Is this of any interest to anybody?

I'm definitely interested in the potential improvements to the amd64
code generator.

Concerning the i386 code generator (x86 in 32-bit mode), SSE2 float
arithmetic does improve performance and fit ocamlopt's compilation
model much better than the current x87 float arithmetic, which is a
bit of a hack.  Several options can be considered:

1- Have an additional "ia32sse2" port of ocamlopt in parallel with the
   current "i386" port.

2- Declare pre-SSE2 processors obsolete and convert the current
   "i386" port to always use SSE2 float arithmetic.

3- Support both x87 and SSE2 float arithmetic within the same i386
   port, with a command-line option to activate SSE2, like gcc does.

I'm really not keen on approach 1.  We have too many ports (and
their variants for Windows/MSVC) already.  Moreover, I suspect
packagers would stick to the i386 port for compatibility with old
hardware, and most casual users would, too, out of laziness, so this
hypothetical "ia32sse2" port would receive little testing.

Approach 2 is tempting for me because it would simplify the x86-32
code generator and remove some historical cruft.  The issue is that it
demands a processor that implements SSE2.  For a list of processors, see
  http://en.wikipedia.org/wiki/SSE2
As a rule of thumb, almost all desktop PCs bought since 2004 have SSE2,
as well as almost all notebooks since 2006.  That should be OK for
professional users (it's nearly impossible to purchase maintenance
beyond 3 years, anyway) and serious hobbyists.  However, packagers are
going to be very unhappy: Debian still lists i486 as its bottom line;
for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
processor", meaning Pentium III.  All these processors lack SSE2
support.  Only MacOS X is SSE2-compatible from scratch.

Approach 3 is probably the best from a user's point of view.  But it's
going to complicate the code generator: the x87 cruft would still be
there, and new cruft would need to be added to support SSE2.  Code
compiled with the SSE2 flag could link with code compiled without,
provided the SSE2 registers are not used for parameter and result
passing.  But as Dmitry observed, this is already the case in the
current ocamlopt compiler.
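
As a rough sketch of what approach 3 could look like inside a compiler driver: a single option selects the float mode, and the emitter dispatches on it. Everything named here is hypothetical (the flag -fsse2, the type fp_mode, and the helpers fp_mode_of_argv and emit_float_add); only the Arg module calls and the two instruction mnemonics are real.

```ocaml
type fp_mode = X87 | SSE2

(* Parse a command line and return the selected float mode.
   "-fsse2" is a made-up flag name used only for this sketch. *)
let fp_mode_of_argv argv =
  let mode = ref X87 in            (* default keeps pre-SSE2 CPUs working *)
  let spec =
    [ ("-fsse2",
       Arg.Unit (fun () -> mode := SSE2),
       " use SSE2 float arithmetic (hypothetical flag)") ]
  in
  Arg.parse_argv ~current:(ref 0) argv spec
    (fun _file -> ())              (* ignore source file names here *)
    "usage: compiler [options] files";
  !mode

(* Choose the instruction for a double-precision addition. *)
let emit_float_add = function
  | X87 -> "faddp"                 (* x87 stack-based add *)
  | SSE2 -> "addsd"                (* SSE2 scalar double add *)
```

Since floats cross function boundaries boxed, object files produced under either mode in such a scheme would agree on the calling convention, which is what makes mixing them plausible.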

Jean-Marc Eber:
>> But again, having better floating point performance (and
>> predictable behaviour, compared to the bytecode version) would be a
>> big plus for some applications.

Dmitry Bely:
> Don't quite understand what is "predictable behavior" - any generator
> should conform to specs. In my tests x87 and SSE2 backends show the
> same results (otherwise it would be called a bug).

You haven't tested enough :-).  The x87 backend keeps some intermediate
results in 80-bit float format, while the SSE2 backend (as well as all
other backends and the bytecode interpreter) compute everything in
64-bit format.  See David Monniaux's excellent tutorial:
  http://hal.archives-ouvertes.fr/hal-00128124/en/
Computing intermediate results in extended precision has pros and
cons, but my understanding is that the cons slightly outweigh the pros.
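
A small sketch of how the two models can disagree (the function name scale_and_back is illustrative). When every intermediate is rounded to a 64-bit double, the product below overflows to infinity and stays there; a code path that keeps the product in an 80-bit x87 register has enough exponent range to hold it and could return 1e308 instead. The comment in the code assumes a backend with 64-bit intermediates, such as amd64.

```ocaml
(* The product exceeds the largest 64-bit double (about 1.8e308). *)
let scale_and_back (x : float) = (x *. 10.0) /. 10.0

let () =
  (* Sys.opaque_identity keeps the compiler from folding the constant. *)
  let y = scale_and_back (Sys.opaque_identity 1e308) in
  (* With 64-bit intermediates this is infinity. *)
  Printf.printf "%g\n" y
```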

- Xavier Leroy


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-08 10:21     ` [Caml-list] Ocamlopt x86-32 and SSE2 Xavier Leroy
@ 2009-05-10 11:04       ` David MENTRE
  2009-05-11  2:43         ` Jon Harrop
  2009-05-11  3:43         ` Stefan Monnier
  2009-05-10 23:12       ` [Caml-list] " Matteo Frigo
  2009-05-11  7:55       ` Dmitry Bely
  2 siblings, 2 replies; 24+ messages in thread
From: David MENTRE @ 2009-05-10 11:04 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Dmitry Bely, Caml List

Hello,

Xavier Leroy <Xavier.Leroy@inria.fr> writes:

> 1- Have an additional "ia32sse2" port of ocamlopt in parallel with the
>    current "i386" port.
>
> 2- Declare pre-SSE2 processors obsolete and convert the current
>    "i386" port to always use SSE2 float arithmetic.
>
> 3- Support both x87 and SSE2 float arithmetic within the same i386
>    port, with a command-line option to activate SSE2, like gcc does.

Regarding option 2, I assume that byte-code would still work on i386
pre-SSE2 machines? So OCaml programs would still work on those machines.

As far as I know, one uses ocamlopt to improve performance. I can't
think of any case where one would need native code running on pre-SSE2
machines which are so outdated performance-wise.

So I would vote for option 2: always use SSE2 float arithmetic.

Sincerely yours,
david
-- 
GPG/PGP key: A3AD7A2A David MENTRE <dmentre@linux-france.org>
 5996 CC46 4612 9CA4 3562  D7AC 6C67 9E96 A3AD 7A2A


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-08 10:21     ` [Caml-list] Ocamlopt x86-32 and SSE2 Xavier Leroy
  2009-05-10 11:04       ` David MENTRE
@ 2009-05-10 23:12       ` Matteo Frigo
  2009-05-11  2:45         ` Jon Harrop
  2009-05-11  7:55       ` Dmitry Bely
  2 siblings, 1 reply; 24+ messages in thread
From: Matteo Frigo @ 2009-05-10 23:12 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Dmitry Bely, Caml List

Do you guys have any sort of empirical evidence that scalar SSE2 math is
faster than plain old x87?

I ask because every time I tried compiling FFTW with gcc -m32
-mfpmath=sse, the result has been invariably slower than the vanilla x87
compilation.  (I am talking about scalar arithmetic here.  FFTW also
supports SSE2 2-way vector arithmetic, which is of course faster.)

I also remember trying similar experiments with other numerical code in
the Pentium 4 dark ages, with similar results.  I don't see any reason
why this should be the case, and maybe this is just a problem of gcc,
but I don't think you should automatically assume that SSE2 math is
faster without running a few experiments first.
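
A minimal timing harness for such an experiment might look like the sketch below. The kernel (a dot product), the helper names dot and time, and the problem sizes are all arbitrary choices for illustration; the same program would be built once with each code generator and the timings compared.

```ocaml
(* Scalar floating-point kernel: dot product of two arrays. *)
let dot a b =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i) *. b.(i)
  done;
  !s

(* Run f once and return its result with the CPU time it used. *)
let time f =
  let t0 = Sys.time () in
  let r = f () in
  (r, Sys.time () -. t0)

let () =
  let n = 100_000 in
  let a = Array.init n float_of_int
  and b = Array.make n 0.5 in
  (* Repeat the kernel so the measured interval is not dominated by noise. *)
  let (_, dt) =
    time (fun () -> for _ = 1 to 100 do ignore (dot a b) done) in
  Printf.printf "100 dot products of %d elements: %.3f s\n" n dt
```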

Regards,
Matteo Frigo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-10 11:04       ` David MENTRE
@ 2009-05-11  2:43         ` Jon Harrop
  2009-05-11  3:43         ` Stefan Monnier
  1 sibling, 0 replies; 24+ messages in thread
From: Jon Harrop @ 2009-05-11  2:43 UTC (permalink / raw)
  To: caml-list

On Sunday 10 May 2009 12:04:13 David MENTRE wrote:
> Regarding option 2, I assume that byte-code would still work on i386
> pre-SSE2 machines? So OCaml programs would still work on those machines.
>
> As far as I know, one is using ocamlopt to improve performance. I can't
> think of any case where one would need native code running on pre-SSE2
> machines which are so outdated performance-wise.
>
> So I would vote for option 2: always use SSE2 float arithmetic.

Note that you can use the same argument to justify not optimizing the x86 
backend because power users should be using the (much more performant) x64 
code gen.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-10 23:12       ` [Caml-list] " Matteo Frigo
@ 2009-05-11  2:45         ` Jon Harrop
  0 siblings, 0 replies; 24+ messages in thread
From: Jon Harrop @ 2009-05-11  2:45 UTC (permalink / raw)
  To: caml-list, Matteo Frigo

On Monday 11 May 2009 00:12:49 Matteo Frigo wrote:
> Do you guys have any sort of empirical evidence that scalar SSE2 math is
> faster than plain old x87?

I believe the motivation is to make good performance tractable in ocamlopt, so 
it is more about the ease of code generation than the inherent 
performance characteristics of the two approaches.

> I ask because every time I tried compiling FFTW with gcc -m32
> -mfpmath=sse, the result has been invariably slower than the vanilla x87
> compilation.  (I am talking about scalar arithmetic here.  FFTW also
> supports SSE2 2-way vector arithmetic, which is of course faster.)
>
> I also remember trying similar experiments with other numerical code in
> the Pentium 4 dark ages, with similar results.  I don't see any reason
> why this should be the case, and maybe this is just a problem of gcc,
> but I don't think you should automatically assume that SSE2 math is
> faster without running a few experiments first.

As I understand it, this is very much a problem with ocamlopt and not with 
gcc. Specifically, floating point code compiled by ocamlopt on x86 gives 
mediocre performance for unknown reasons. Hence there is a desire to use more 
modern solutions that simplify the generation of performant code.
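
One way to run the kind of experiment Matteo asks for is a small timing
harness around a scalar float kernel, built once with each code
generator and compared. The following is only a sketch: `dot` and
`time_it` are names invented for this example, and a real comparison
would need representative numerical code rather than a single loop.

```ocaml
(* Scalar floating-point kernel: a dot product over float arrays. *)
let dot a b =
  let acc = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    acc := !acc +. a.(i) *. b.(i)
  done;
  !acc

(* Run [f] [reps] times (at least once); return the CPU seconds used
   and the result of the last run. *)
let time_it reps f =
  let t0 = Sys.time () in
  let r = ref (f ()) in
  for _i = 2 to reps do r := f () done;
  (Sys.time () -. t0, !r)

let () =
  let n = 100_000 in
  let a = Array.init n float_of_int in
  let b = Array.make n 2.0 in
  let secs, r = time_it 100 (fun () -> dot a b) in
  Printf.printf "dot = %.1f, 100 runs in %.3fs CPU\n" r secs
```

Timings from such a loop say something about the code generator only if
the same source is compiled by both back ends on the same machine.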

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ocamlopt x86-32 and SSE2
  2009-05-10 11:04       ` David MENTRE
  2009-05-11  2:43         ` Jon Harrop
@ 2009-05-11  3:43         ` Stefan Monnier
  2009-05-11  5:38           ` [Caml-list] " Jon Harrop
  1 sibling, 1 reply; 24+ messages in thread
From: Stefan Monnier @ 2009-05-11  3:43 UTC (permalink / raw)
  To: caml-list

> As far as I know, one is using ocamlopt to improve performance.
> I can't think of any case where one would need native code running on
> pre-SSE2 machines which are so outdated performance-wise.

You mean we should make slow machines even slower?


        Stefan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Re: Ocamlopt x86-32 and SSE2
  2009-05-11  3:43         ` Stefan Monnier
@ 2009-05-11  5:38           ` Jon Harrop
  0 siblings, 0 replies; 24+ messages in thread
From: Jon Harrop @ 2009-05-11  5:38 UTC (permalink / raw)
  To: caml-list

On Monday 11 May 2009 04:43:21 Stefan Monnier wrote:
> > As far as I know, one is using ocamlopt to improve performance.
> > I can't think of any case where one would need native code running on
> > pre-SSE2 machines which are so outdated performance-wise.
>
> You mean we should make slow machines even slower?

Old machines can still run old versions of OCaml at full speed.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-08 10:21     ` [Caml-list] Ocamlopt x86-32 and SSE2 Xavier Leroy
  2009-05-10 11:04       ` David MENTRE
  2009-05-10 23:12       ` [Caml-list] " Matteo Frigo
@ 2009-05-11  7:55       ` Dmitry Bely
  2 siblings, 0 replies; 24+ messages in thread
From: Dmitry Bely @ 2009-05-11  7:55 UTC (permalink / raw)
  To: Xavier Leroy, Caml List

On Fri, May 8, 2009 at 2:21 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:

>> I see. Why I asked this: trying to improve floating-point performance
>> on 32-bit x86 platform I have merged floating-point SSE2 code
>> generator from amd64 ocamlopt back end to i386 one, making ia32sse2
>> architecture. It also inlines sqrt() via the -ffast-math flag and
>> slightly optimizes emit_float_test (usually eliminating an extra
>> jump), features that are missing from the original amd64 code
>> generator.
>
> You just passed black belt in OCaml compiler hacking :-)

Thank you, sensei :-)

>> Is this of any interest to anybody?
>
> I'm definitely interested in the potential improvements to the amd64
> code generator.
>
> Concerning the i386 code generator (x86 in 32-bit mode), SSE2 float
> arithmetic does improve performance and fit ocamlopt's compilation
> model much better than the current x87 float arithmetic, which is a
> bit of a hack.  Several options can be considered:
>
> 1- Have an additional "ia32sse2" port of ocamlopt in parallel with the
>   current "i386" port.
>
> 2- Declare pre-SSE2 processors obsolete and convert the current
>   "i386" port to always use SSE2 float arithmetic.
>
> 3- Support both x87 and SSE2 float arithmetic within the same i386
>   port, with a command-line option to activate SSE2, like gcc does.
>
> I'm really not keen on approach 1.  We have too many ports (and
> their variants for Windows/MSVC) already.  Moreover, I suspect
> packagers would stick to the i386 port for compatibility with old
> hardware, and most casual users would, too, out of laziness, so this
> hypothetical "ia32sse2" port would receive little testing.
>
> Approach 2 is tempting for me because it would simplify the x86-32
> code generator and remove some historical cruft.  The issue is that it
> demands a processor that implements SSE2.  For a list of processors, see
>  http://en.wikipedia.org/wiki/SSE2
> As a rule of thumb, almost all desktop PCs bought since 2004 have SSE2,
> as do almost all notebooks since 2006.  That should be OK for
> professional users (it's nearly impossible to purchase maintenance
> beyond 3 years, anyway) and serious hobbyists.  However, packagers are
> going to be very unhappy: Debian still lists i486 as its bottom line;
> for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
> processor", meaning Pentium III.  All these processors lack SSE2
> support.  Only MacOS X is SSE2-compatible from scratch.
>
> Approach 3 is probably the best from a user's point of view.  But it's
> going to complicate the code generator: the x87 cruft would still be
> there, and new cruft would need to be added to support SSE2.  Code
> compiled with the SSE2 flag could link with code compiled without,
> provided the SSE2 registers are not used for parameter and result
> passing.  But as Dmitry observed, this is already the case in the
> current ocamlopt compiler.
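
To make approach 3 concrete, here is a minimal sketch of a single
back end that picks its float instructions from a setting. Everything
in it is hypothetical: the `fpu` type, the `mnemonic` function and the
"sse2"/"x87" option values are invented for illustration and are not
taken from ocamlopt. Only the instruction mnemonics themselves are real.

```ocaml
(* Hypothetical: which floating-point unit the i386 port targets. *)
type fpu = X87 | SSE2

type float_op = Add | Sub | Mul | Div

(* Mnemonic for a scalar double-precision operation on each unit.
   SSE2 uses flat xmm registers; x87 uses a register stack, hence
   the "pop" variants. *)
let mnemonic fpu op =
  match fpu, op with
  | SSE2, Add -> "addsd" | SSE2, Sub -> "subsd"
  | SSE2, Mul -> "mulsd" | SSE2, Div -> "divsd"
  | X87, Add -> "faddp"  | X87, Sub -> "fsubp"
  | X87, Mul -> "fmulp"  | X87, Div -> "fdivp"

(* Hypothetical option values, chosen only for this example. *)
let fpu_of_option = function
  | "sse2" -> Some SSE2
  | "x87" -> Some X87
  | _ -> None
```

The sketch also shows where the extra cruft comes from: every float
operation needs two emission paths, and the x87 path still has to
manage the register stack.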

I am curious: is passing unboxed floats possible in the current
OCaml data model?
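
For context, the current representation can be observed from OCaml
itself with the Obj module. This is a small sketch, assuming the
default runtime configuration (flat float arrays); the two helper
names are invented here. A float seen as a generic value is a heap
block tagged Obj.double_tag, while a float array stores its elements
unboxed in a block tagged Obj.double_array_tag.

```ocaml
(* True if [x], viewed as a generic OCaml value, is a heap-allocated
   block carrying the double tag, i.e. a boxed float. *)
let is_boxed_float (x : float) =
  let r = Obj.repr x in
  Obj.is_block r && Obj.tag r = Obj.double_tag

(* True if a non-empty float array uses the flat (unboxed)
   representation. *)
let is_flat_float_array (a : float array) =
  Array.length a > 0 && Obj.tag (Obj.repr a) = Obj.double_array_tag

let () =
  Printf.printf "boxed float: %b, flat float array: %b\n"
    (is_boxed_float 1.5) (is_flat_float_array [| 1.0; 2.0 |])
```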

As for the proposed options, I tend to vote for #3 (and would implement
it if there is a consensus). There is still plenty of low-power/embedded
x86 hardware that does not support SSE2. And one would be able to
compare the performance of the x87 and SSE2 backends to convince oneself
that the game really is worth the candle :-)

- Dmitry Bely


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ocamlopt x86-32 and SSE2
  2009-05-12 10:04     ` Sylvain Le Gall
@ 2009-05-25  8:23       ` Sylvain Le Gall
  0 siblings, 0 replies; 24+ messages in thread
From: Sylvain Le Gall @ 2009-05-25  8:23 UTC (permalink / raw)
  To: caml-list

On 12-05-2009, Sylvain Le Gall <sylvain@le-gall.net> wrote:
> On 12-05-2009, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>>
>> Sylvain Le Gall:
>>
>> To finish: I'm still very interested in hearing from packagers.  Does
>> Debian, for example, already have some packages that are SSE2-only?
>> Are these packages specially tagged so that the installer will refuse
>> to install them on pre-SSE2 hardware?  What's the party line?
>>
>
> In my opinion, Debian will probably refuse to ship a package that is
> available only in an SSE2-only version (but I am speaking only for
> myself).
>

For those who are interested, a discussion just started about dropping
pre-i686 architectures in Debian:
http://permalink.gmane.org/gmane.linux.debian.devel.kernel/47844

The first round of posts seems clearly against this decision. The main
argument is that many schools are using old pre-i686 hardware.

Regards,
Sylvain Le Gall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ocamlopt x86-32 and SSE2
  2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
@ 2009-05-12 10:04     ` Sylvain Le Gall
  2009-05-25  8:23       ` Sylvain Le Gall
  0 siblings, 1 reply; 24+ messages in thread
From: Sylvain Le Gall @ 2009-05-12 10:04 UTC (permalink / raw)
  To: caml-list

On 12-05-2009, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>
> Sylvain Le Gall:
>> If INRIA chooses to switch to SSE2, there should still be at least a
>> way to compile on older architectures. That doesn't mean INRIA needs
>> to keep the old code generator, but it should provide a simple
>> emulation for it. In this case, we will have good float performance
>> on new architectures and will still be able to compile on old ones.
>
> The least complicated way to preserve backward compatibility with
> pre-SSE2 hardware is to keep the existing x87 code generator and bolt
> the SSE2 generator on top of it, Frankenstein-style.  Well, either
> that, or rely on the kernel to trap unimplemented SSE2 instructions
> and emulate them in software.  This is theoretically possible but I'm
> pretty sure neither Linux nor Windows implement it.
>

I was thinking (if it is possible) of using simple "function calls" for
float operations. This would be very inefficient, but would provide a
very simple compatibility layer.

>
> To finish: I'm still very interested in hearing from packagers.  Does
> Debian, for example, already have some packages that are SSE2-only?
> Are these packages specially tagged so that the installer will refuse
> to install them on pre-SSE2 hardware?  What's the party line?
>

The most obvious packages I see are the Linux kernel and libc6:
http://packages.debian.org/lenny/linux-image-2.6.26-2-486
http://packages.debian.org/lenny/linux-image-2.6.26-1-686-bigmem
http://packages.debian.org/lenny/libc6
http://packages.debian.org/lenny/libc6-i686

AFAIK, there is no way for the package manager to make a real
distinction (no tag). However, the installer has some clue about which
one to choose and installs the best one for linux and libc6. Once
installed, it is always updated correctly, because the arch is embedded
in the package name.

I think linux and libc6 should be considered as exceptions, because they
really provide an important benefit for overall optimization.

For other packages, if an optimization is possible, versions with and
without the optimization are embedded in the package and one is chosen
at runtime. For example, libavcodec provides i686 and i486 versions:
http://packages.debian.org/sid/i386/libavcodec52/filelist

So in conclusion, there is always a "default" non-SSE2 alternative for
packages that can provide an optimized version. I don't know of any
package that is SSE2-only.

In my opinion, Debian will probably refuse to ship a package that is
available only in an SSE2-only version (but I am speaking only for
myself).

Regards
Sylvain Le Gall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Ocamlopt x86-32 and SSE2
       [not found] <20090511043120.976EBBC67@yquem.inria.fr>
@ 2009-05-11  7:10 ` Pascal Cuoq
  2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
  0 siblings, 1 reply; 24+ messages in thread
From: Pascal Cuoq @ 2009-05-11  7:10 UTC (permalink / raw)
  To: caml-list

Here's an idea, I don't know if it is relevant, but it looks like
it could be a good compromise (option 2.5, if you will): how about
implementing floating-point operations as function calls
(the functions could be written in C and be part of the runtime library)
when the SSE2 instructions are not available? Is that simpler than
option 3?
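
A rough model of this option 2.5, written in OCaml rather than as
compiler output: float primitives are reached through a table, and the
table is either the inline operations or out-of-line helpers. All names
here (`float_ops`, `inline_ops`, `runtime_ops`, `select`) are
hypothetical, and the call counter exists only to make the extra calls
visible; a real implementation would have the code generator emit calls
to C functions in the runtime library.

```ocaml
(* The float primitives that generated code would need. *)
type float_ops = {
  add : float -> float -> float;
  mul : float -> float -> float;
}

(* Stand-in for SSE2 instructions emitted inline. *)
let inline_ops = { add = ( +. ); mul = ( *. ) }

(* Stand-in for out-of-line runtime helpers: each operation costs a
   function call, counted in [calls]. *)
let calls = ref 0
let runtime_ops = {
  add = (fun a b -> incr calls; a +. b);
  mul = (fun a b -> incr calls; a *. b);
}

(* Pick the implementation; [has_sse2] is an assumed input, not a
   real CPU feature probe. *)
let select ~has_sse2 = if has_sse2 then inline_ops else runtime_ops

(* a*x + y expressed through the selected primitives. *)
let axpy ops a x y = ops.add (ops.mul a x) y
```

With `select ~has_sse2:false`, one `axpy` performs two helper calls,
which is the inefficiency such a fallback accepts in exchange for a
single code generator.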

Matteo Frigo <athena@fftw.org> wrote:
> Do you guys have any sort of empirical evidence that scalar SSE2  
> math is
> faster than plain old x87?

It's not speed I am after personally, but a correct implementation
of IEEE 754's round-to-nearest mode for doubles.
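
A small illustration of why this can matter, assuming the concern is
double rounding when an intermediate result is first rounded to the
x87 64-bit significand and then to a double. The inputs below are
chosen for this example; whether a given x87 build actually shows the
discrepancy depends on the precision-control setting and on where the
compiler spills values.

```ocaml
let x = ldexp 1.0 53             (* 2^53 = 9007199254740992 *)
let y = 1.0 +. ldexp 1.0 (-11)   (* 1 + 2^-11, exactly representable *)

(* The exact sum 2^53 + 1 + 2^-11 lies just above the midpoint of the
   doubles 2^53 and 2^53 + 2, so a single round-to-nearest gives
   2^53 + 2. Rounding first to a 64-bit significand gives 2^53 + 1 (a
   tie, resolved to even), and rounding that to double gives 2^53. *)
let sum = x +. y

let () = Printf.printf "%.0f\n" sum
```

With correctly rounded double arithmetic, as SSE2 scalar instructions
provide, this prints 9007199254740994.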
Also, the satisfying knowledge that the code of the compiler I use
is as tight as it can be and that I could understand it if I had to
some day.

Jon Harrop <jon@ffconsultancy.com> wrote:
> Note that you can use the same argument to justify not optimizing  
> the x86
> backend because power users should be using the (much more  
> performant) x64
> code gen.

I don't know where you get "much more performant" from.
For what I do, speed of floating-point operations is irrelevant, but
not the speed of the whole application. The whole application is
slightly slower (~10%) with the larger data words despite the improved
instruction set. Plus, memory is also a concern, and for users who
have less than 6GiB of memory, there are actually more addressable
data words in x86 mode.
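
One reading that reproduces the 6GiB figure, assuming about 3GiB of
usable address space per process in 32-bit mode (an assumption of this
sketch, not something stated above): count how many machine words fit
in each case. The helper name `mi_words` is invented for the example.

```ocaml
(* Number of data words, in units of 2^20 words, that fit in [mib]
   MiB of memory when each word takes [word_bytes] bytes. *)
let mi_words ~mib ~word_bytes = mib / word_bytes

(* 32-bit mode: 4-byte words, assumed 3 GiB usable per process. *)
let x86_32 = mi_words ~mib:(3 * 1024) ~word_bytes:4

(* 64-bit mode: 8-byte words, 6 GiB of physical memory. *)
let x86_64 = mi_words ~mib:(6 * 1024) ~word_bytes:8

let () = Printf.printf "x86-32: %d Mi words, x86-64: %d Mi words\n"
    x86_32 x86_64
```

Both come to 768 Mi words, so below 6GiB of memory the 64-bit build
addresses fewer words than the 32-bit one under this assumption.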

Pascal


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Ocamlopt x86-32 and SSE2
  2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
@ 2009-05-10  8:56     ` CUOQ Pascal
  0 siblings, 0 replies; 24+ messages in thread
From: CUOQ Pascal @ 2009-05-10  8:56 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: caml-list

>That does not
>really mean i486 at 25MHz will be used but it is the common bottom
>line that can easily be supported.

My point is that you're not looking at the whole set of
requirements for OCaml and other existing Debian packages
when you look only at the processor's instruction set.

The way to keep old hardware running is to keep
it running old software. Or, if you give me a second
to switch to my Bogart voice, "we will always have 3.11".

>Having ocaml require SSE2 is quite unacceptable for someone with a Via
>C7 cpu (they don't have SSE2, right?) 

According to http://en.wikipedia.org/wiki/SSE2, someone using a Via C7
should be fine.

Pascal


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Ocamlopt x86-32 and SSE2
       [not found] <20090509100004.353ADBC5C@yquem.inria.fr>
@ 2009-05-09 11:38 ` CUOQ Pascal
  2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
  0 siblings, 1 reply; 24+ messages in thread
From: CUOQ Pascal @ 2009-05-09 11:38 UTC (permalink / raw)
  To: caml-list, caml-list

Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>2- Declare pre-SSE2 processors obsolete and convert the current
>   "i386" port to always use SSE2 float arithmetic.
>
>3- Support both x87 and SSE2 float arithmetic within the same i386
>   port, with a command-line option to activate SSE2, like gcc does.

As someone with somewhat of an obsession for keeping
obsolete computers in working order as long as they are not broken,
I have to interject something.

I still have a functional Pentium 90 (granted, that's not
the newest computer that does not support SSE2, but
please hear me). I gave up the idea of bootstrapping
OCaml on it years ago because it has 16MB of memory,
and that became insufficient around the time Camlp4 became
part of the distribution. I would have had either to modify
the compilation flow or cross-compile, both of which were
too much work for the meagre resulting cool factor.
Now, both the old and the new Camlp4 are
fine pieces of software that make use of
resources available nowadays to make things possible
that weren't before. I am not complaining. I am saying that
you have to be consistent in your requirements.

My father was using Debian on a 500MHz K6-3D that I had
somehow been able to upgrade with enough memory
to run one of the two popular desktops. He finally
upgraded to a new computer because he could
see the characters being displayed one by one in the
e-mail client. That, or the motherboard died. I can't
remember. It was serendipitous, anyway.

There are plenty of embedded processors with an x86
instruction set and no SSE2 around, but these are not in
the cool toys that we want to run OCaml on. The cool
toys have ARM processors.

My message is: I am one of the people who have the peculiar
mental illness that leads one to suggest a compatible option.

Well, I am not.

Take option 2 and run with it!

>However, packagers are
>going to be very unhappy: Debian still lists i486 as its bottom line;
>for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
>processor", meaning Pentium III.  All these processors lack SSE2
>support.  Only MacOS X is SSE2-compatible from scratch.

Only Linux distributions are a problem, if OCaml packages
are at risk of being rejected.

Just because Windows still works on old computers doesn't force
every program to do the same (flame bait: and I would add that
Windows' support for old computers is mostly unintentional).

In Linux distributions, is it completely forbidden to have packages
that will not work on the bottom line?
This is (I assume) OCaml 3.12 that we are talking about, which
would land sometime in 2010 and arrive in binary distributions
that are scheduled to be released in 2011. Will Debian maintain
its delusion of supporting the i486 by that time?

Pascal


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2009-05-25  8:24 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-28 19:36 Ocamlopt code generator question Dmitry Bely
     [not found] ` <m27i13tofi.fsf@Pythagorion.local.i-did-not-set--mail-host-address--so-tickle-me>
2009-04-29 16:50   ` Dmitry Bely
2009-04-29 20:04     ` Jeffrey Scofield
2009-05-05  9:24 ` [Caml-list] " Xavier Leroy
2009-05-05  9:41   ` Dmitry Bely
2009-05-05 14:15     ` Jean-Marc Eber
2009-05-05 14:58       ` Sylvain Le Gall
2009-05-05 15:21         ` [Caml-list] " David Allsopp
2009-05-05 15:59         ` Dmitry Bely
     [not found]           ` <4A006410.8000205@lexifi.com>
2009-05-05 16:26             ` Dmitry Bely
2009-05-05 15:14       ` [Caml-list] " Jon Harrop
2009-05-08 10:21     ` [Caml-list] Ocamlopt x86-32 and SSE2 Xavier Leroy
2009-05-10 11:04       ` David MENTRE
2009-05-11  2:43         ` Jon Harrop
2009-05-11  3:43         ` Stefan Monnier
2009-05-11  5:38           ` [Caml-list] " Jon Harrop
2009-05-10 23:12       ` [Caml-list] " Matteo Frigo
2009-05-11  2:45         ` Jon Harrop
2009-05-11  7:55       ` Dmitry Bely
     [not found] <20090509100004.353ADBC5C@yquem.inria.fr>
2009-05-09 11:38 ` CUOQ Pascal
2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
2009-05-10  8:56     ` CUOQ Pascal
     [not found] <20090511043120.976EBBC67@yquem.inria.fr>
2009-05-11  7:10 ` Pascal Cuoq
2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
2009-05-12 10:04     ` Sylvain Le Gall
2009-05-25  8:23       ` Sylvain Le Gall
