caml-list - the Caml user's mailing list
* [Caml-list] doing MMX through ocaml
@ 2005-11-17 21:13 Jonathan Roewen
  2005-11-17 21:47 ` Oliver Bandel
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Jonathan Roewen @ 2005-11-17 21:13 UTC (permalink / raw)
  To: caml-list

Hi,

I've received a lot of feedback on an mpeg2 decoder in ocaml, and
about performance woes ;-) So... the next step: building an MMX
library =) I presume it is possible, though I would lose a lot of the
benefit by having to use C wrappers everywhere....

The FFI just requires that the external symbol exists, right? Soooo, I
could theoretically define pure ASM routines that O'Caml could call.

Which raises the question: does OCaml expect the FFI function to have a
particular layout? Obviously, args have to be retrieved from the
stack, right? And there's no way around this?

And since MMX deals solely with integers, the Val_long/Long_val macros
could be implemented by a simple extra MMX instruction or two, right?
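
For reference, a minimal sketch of what those two macros compute,
written as plain OCaml arithmetic rather than C. The names val_long and
long_val are illustrative only (they mirror the C macros and are not
part of any OCaml library); the point is that tagging is one shift plus
setting the low bit, and untagging is one arithmetic shift.

```ocaml
(* Sketch: emulate the runtime's integer tagging on plain OCaml ints.
   val_long and long_val are illustrative names mirroring the C macros
   Val_long and Long_val; they are not provided by any library. *)

(* Val_long(x): shift left by one and set the low tag bit. *)
let val_long (x : int) : int = (x lsl 1) lor 1

(* Long_val(v): an arithmetic shift right drops the tag bit. *)
let long_val (v : int) : int = v asr 1

let () =
  List.iter
    (fun x ->
      Printf.printf "%d -> tagged %d -> untagged %d\n"
        x (val_long x) (long_val (val_long x)))
    [0; 1; -1; 5; -128]
```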

Maybe I could mod the compiler: create a 'naked' version of external,
and define registers to put values in, retrieve from? I've had a look
at the interp.c sources, and they don't look that complicated... just
a matter of defining a new instruction type for a C_CALL, and altering
the code generator and parser... though, that part in itself might be
quite tricky.

Basically, the idea is to create either a generic Math, or a special
MMX module built into the kernel, that apps can utilise if so wanted.
Of course, for the packed data types in MMX, might need some custom
types, but that shouldn't be too much of a problem.

Jonathan



* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 21:13 [Caml-list] doing MMX through ocaml Jonathan Roewen
@ 2005-11-17 21:47 ` Oliver Bandel
  2005-11-17 21:57   ` Jonathan Roewen
  2005-11-17 22:16   ` Damien Bobillot
  2005-11-17 23:01 ` Vincenzo Ciancia
  2005-11-17 23:49 ` [Caml-list] " Erik de Castro Lopo
  2 siblings, 2 replies; 16+ messages in thread
From: Oliver Bandel @ 2005-11-17 21:47 UTC (permalink / raw)
  To: caml-list

On Fri, Nov 18, 2005 at 10:13:00AM +1300, Jonathan Roewen wrote:
[...] 
> The FFI just requires that the external symbol exists, right? Soooo, I
> could theoretically define pure ASM routines that O'Caml could call.
[...] 
> And since MMX deals solely with integers, the Val_long/Long_val macros
> could be implemented by a simple extra MMX instruction or two, right?
[...] 


ASM...MMX.... will the OS be available for more than one platform?

Ciao,
   Oliver



* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 21:47 ` Oliver Bandel
@ 2005-11-17 21:57   ` Jonathan Roewen
  2005-11-17 22:16   ` Damien Bobillot
  1 sibling, 0 replies; 16+ messages in thread
From: Jonathan Roewen @ 2005-11-17 21:57 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

> ASM...MMX.... will the OS be available for more than one platform?

Right now, no. Later, possibly. If I can abstract MMX enough, I could use
cpuid to choose altivec, for example, on PPC. And this would be done at
init time (much like filename.ml handles platform choice automatically
at init time).

Another question =)

I've used -dinstr to observe the bytecode instructions that are
generated (though obviously not yet optimised as it pushes a value,
then pops it back into accumulator straight away).

Now, how can I find out what the actual generated instructions are
that correspond to those in interp.c? I'm trying to get an idea of the
performance of calling single MMX asm instructions from ocaml, rather
than create special functions that do a bunch at a time.

Like, does CHECK_SIGNALS get interleaved between two ccalls, as in
the following -dinstr dump?

  const 5
  push
  acc 0
  ccall neg, 1
  ccall incr, 1

Jonathan



* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 21:47 ` Oliver Bandel
  2005-11-17 21:57   ` Jonathan Roewen
@ 2005-11-17 22:16   ` Damien Bobillot
  2005-11-17 22:43     ` Daniel Bünzli
  1 sibling, 1 reply; 16+ messages in thread
From: Damien Bobillot @ 2005-11-17 22:16 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list


Oliver Bandel wrote :

> On Fri, Nov 18, 2005 at 10:13:00AM +1300, Jonathan Roewen wrote:
> [...]
>> The FFI just requires that the external symbol exists, right?  
>> Soooo, I
>> could theoretically define pure ASM routines that O'Caml could call.
> [...]
>> And since MMX deals solely with integers, the Val_long/Long_val  
>> macros
>> could be implemented by a simple extra MMX instruction or two, right?
>
> ASM...MMX.... will the OS be available for more than one platform?

Yes, MMX is not portable.

But I think it's more a question of creating an abstract interface
for vectorized algorithms like FFT, convolution, and vector operations
(element-by-element addition and multiplication)... SIMD instructions on
x86 and ppc are very similar: not compatible, but they may be abstracted
to the same functions without too much overhead.

As an example, I will cite Apple's Accelerate framework. It's a
library for signal processing (1D and 2D) and linear algebra (BLAS &
LAPACK), optimised to use MMX & SSE instructions when running the x86
version of Mac OS X, and Altivec instructions when running the ppc
version.

It may be a good thing to have such a vectorized library in ocaml, to
open the way to fast signal processing inside ocaml.
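
One way such an abstract interface might look, as a rough sketch only:
a module signature for element-by-element operations, with a portable
plain-OCaml fallback behind it. All names here (VECTOR_OPS, Portable,
Demo, mul_add) are hypothetical, and a real SSE or Altivec backend would
have to implement the same signature through C stubs.

```ocaml
(* Hypothetical interface: none of these names come from an existing
   library. *)
module type VECTOR_OPS = sig
  type t
  val of_array : float array -> t
  val to_array : t -> float array
  val add : t -> t -> t   (* element-by-element addition *)
  val mul : t -> t -> t   (* element-by-element multiplication *)
end

(* Portable fallback in plain OCaml. A backend using SSE or Altivec
   through C stubs could implement the same signature. *)
module Portable : VECTOR_OPS = struct
  type t = float array
  let of_array a = Array.copy a
  let to_array v = Array.copy v
  let map2 f a b =
    if Array.length a <> Array.length b then invalid_arg "length mismatch";
    Array.init (Array.length a) (fun i -> f a.(i) b.(i))
  let add = map2 ( +. )
  let mul = map2 ( *. )
end

(* Client code depends only on the signature, not on the backend. *)
module Demo (V : VECTOR_OPS) = struct
  (* Computes a * b + c element by element. *)
  let mul_add a b c =
    V.to_array (V.add (V.mul (V.of_array a) (V.of_array b)) (V.of_array c))
end

module D = Demo (Portable)

let result = D.mul_add [| 1.; 2. |] [| 3.; 4. |] [| 0.5; 0.5 |]
```

Choosing the backend would then be a matter of which module is passed
to the functor, which could be decided once at init time.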

PS: I think it will perhaps have the same problem as floating-point
computation, which is not really efficient. As far as I know, floats
are not stored as raw floats in memory, but as a generic ocaml value: a
pointer to a block whose header carries a tag indicating that it's a
float, followed by the IEEE float value.

-- 
Damien Bobillot




* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 22:16   ` Damien Bobillot
@ 2005-11-17 22:43     ` Daniel Bünzli
  2005-11-17 22:55       ` Jonathan Roewen
  2005-11-17 22:55       ` Damien Bobillot
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Bünzli @ 2005-11-17 22:43 UTC (permalink / raw)
  To: Damien Bobillot; +Cc: caml-list


On 17 Nov 2005, at 23:16, Damien Bobillot wrote:

> PS: I think it will perhaps have the same problem as floating-point
> computation, which is not really efficient. As far as I know,
> floats are not stored as raw floats in memory, but as a generic ocaml
> value: a pointer to a block whose header carries a tag indicating
> that it's a float, followed by the IEEE float value.

This is not exactly true. As an exception, they are unboxed in records
and arrays made of floats only [1,2].

Note that you can also use bigarrays [3] to have unboxed arrays of  
any scalar datatype.
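
A small sketch of the two unboxed representations mentioned above: an
ordinary float array, and a bigarray of signed 16-bit integers. The
names samples, pixels and sum_pixels are made up for illustration. It
assumes a recent OCaml where Bigarray is part of the standard library;
older releases need the separate bigarray library linked in.

```ocaml
(* In the default configuration a float array is stored flat, with no
   per-element boxing. *)
let samples : float array = [| 0.5; 1.5; 2.5 |]

(* An unboxed array of signed 16-bit integers; the element kind and the
   C memory layout are chosen at creation time. *)
let pixels =
  Bigarray.Array1.create Bigarray.int16_signed Bigarray.c_layout 4

let () =
  Bigarray.Array1.fill pixels 0;
  pixels.{0} <- 1000;
  pixels.{1} <- -1000

(* Elements are read back as ordinary OCaml ints. *)
let sum_pixels () =
  let total = ref 0 in
  for i = 0 to Bigarray.Array1.dim pixels - 1 do
    total := !total + pixels.{i}
  done;
  !total
```

Because a bigarray's data lives outside the OCaml heap, C or assembly
code can work on it in place, which is what a SIMD binding would want.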

This page [4] (unfortunately not available in the faq of the new  
ocaml site) contains interesting information about writing numerical  
code in ocaml.

Best,

Daniel

[1] <http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#htoc218>
[2] <http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#htoc219>
[3] <http://caml.inria.fr/pub/docs/manual-ocaml/manual043.html#htoc261>
[4] <http://caml.inria.fr/pub/old_caml_site/ocaml/numerical.html>


* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 22:43     ` Daniel Bünzli
@ 2005-11-17 22:55       ` Jonathan Roewen
  2005-11-18  1:26         ` Vincenzo Ciancia
  2005-11-18 10:04         ` [Caml-list] " Alessandro Baretta
  2005-11-17 22:55       ` Damien Bobillot
  1 sibling, 2 replies; 16+ messages in thread
From: Jonathan Roewen @ 2005-11-17 22:55 UTC (permalink / raw)
  Cc: caml-list

Another option, which I'm curious to hear opinions on, and which
might alleviate some of the performance woes of making many calls to
much simpler MMX instruction wrappers:

A runtime code generator, that's typed by ocaml. Basically, create a
code block from multiple instructions, which is turned into machine
code, then executed through a C function.

You'd still get type safety, but performance should hopefully be a bit
better. The idea is that you commonly execute a set of MMX
instructions at a time, rather than single instructions.
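
To make the idea concrete, here is a rough sketch of the OCaml-facing
side only. The instruction type and the names reg, instr and run are
hypothetical; the mnemonics loosely follow MMX, and run is a plain
interpreter standing in for the part that would emit and execute machine
code. Each register is modelled as 8 unsigned byte lanes.

```ocaml
(* Hypothetical mini-DSL: an interpreter stands in for a real code
   generator. *)
type reg = MM0 | MM1 | MM2

type instr =
  | Load of reg * int array   (* load 8 unsigned byte lanes *)
  | Paddusb of reg * reg      (* dst <- saturating dst + src, per lane *)
  | Psubusb of reg * reg      (* dst <- saturating dst - src, per lane *)

let index = function MM0 -> 0 | MM1 -> 1 | MM2 -> 2

(* Clamp to the unsigned byte range, as saturating ops do. *)
let clamp x = max 0 (min 255 x)

(* Run a whole block in one call, so the per-call overhead is paid once
   per block rather than once per instruction. *)
let run (block : instr list) : int array array =
  let regs = Array.init 3 (fun _ -> Array.make 8 0) in
  let lanewise f d s =
    let rd = regs.(index d) and rs = regs.(index s) in
    Array.iteri (fun i x -> rd.(i) <- clamp (f x rs.(i))) rd
  in
  List.iter
    (function
      | Load (r, lanes) ->
        if Array.length lanes <> 8 then invalid_arg "Load: need 8 lanes";
        regs.(index r) <- Array.map clamp lanes
      | Paddusb (d, s) -> lanewise ( + ) d s
      | Psubusb (d, s) -> lanewise ( - ) d s)
    block;
  regs

let regs =
  run [ Load (MM0, [| 250; 10; 0; 128; 255; 1; 2; 3 |]);
        Load (MM1, [| 10; 20; 5; 128; 1; 1; 1; 1 |]);
        Paddusb (MM0, MM1) ]
```

The type checker already rejects malformed instructions here; a richer
encoding could also track the lane width of each register in its type.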

Ideas on whether this is a good design, and what would probably be the
best way to do this are welcome =)

Jonathan



* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 22:43     ` Daniel Bünzli
  2005-11-17 22:55       ` Jonathan Roewen
@ 2005-11-17 22:55       ` Damien Bobillot
  1 sibling, 0 replies; 16+ messages in thread
From: Damien Bobillot @ 2005-11-17 22:55 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: caml-list



On 17 Nov 2005, at 23:43, Daniel Bünzli wrote:

>
> On 17 Nov 2005, at 23:16, Damien Bobillot wrote:
>
>> PS: I think it will perhaps have the same problem as floating-point
>> computation, which is not really efficient. As far as I know,
>> floats are not stored as raw floats in memory, but as a generic ocaml
>> value: a pointer to a block whose header carries a tag indicating
>> that it's a float, followed by the IEEE float value.
>
> This is not exactly true. As an exception, they are unboxed in records
> and arrays made of floats only [1,2].
>
> Note that you can also use bigarrays [3] to have unboxed arrays of  
> any scalar datatype.
>
> This page [4] (unfortunately not available in the faq of the new  
> ocaml site) contains interesting information about writing  
> numerical code in ocaml.

Ok, and thank you for the references.

Perhaps, writing a bigarray-like module for vectorized operation may  
be the solution.

> [1] <http://caml.inria.fr/pub/docs/manual-ocaml/ 
> manual032.html#htoc218>
> [2] <http://caml.inria.fr/pub/docs/manual-ocaml/ 
> manual032.html#htoc219>
> [3] <http://caml.inria.fr/pub/docs/manual-ocaml/ 
> manual043.html#htoc261>
> [4] <http://caml.inria.fr/pub/old_caml_site/ocaml/numerical.html>


-- 
Damien alias Schmurtz
aim:goim?screenname=schmuuurtz




* Re: doing MMX through ocaml
  2005-11-17 21:13 [Caml-list] doing MMX through ocaml Jonathan Roewen
  2005-11-17 21:47 ` Oliver Bandel
@ 2005-11-17 23:01 ` Vincenzo Ciancia
  2005-11-17 23:49 ` [Caml-list] " Erik de Castro Lopo
  2 siblings, 0 replies; 16+ messages in thread
From: Vincenzo Ciancia @ 2005-11-17 23:01 UTC (permalink / raw)
  To: caml-list

Jonathan Roewen wrote:

> 
> The FFI just requires that the external symbol exists, right? Soooo, I
> could theoretically define pure ASM routines that O'Caml could call.
> 

Wouldn't that force every single asm instruction to go through the calling
convention? In that case, wouldn't runtime code generation be better?

Bye

Vincenzo

-- 
Please note that I do not read the e-mail address used in the from field but
I read vincenzo_ml at yahoo dot it



* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 21:13 [Caml-list] doing MMX through ocaml Jonathan Roewen
  2005-11-17 21:47 ` Oliver Bandel
  2005-11-17 23:01 ` Vincenzo Ciancia
@ 2005-11-17 23:49 ` Erik de Castro Lopo
  2005-11-18  1:52   ` Grégory Guyomarc'h
  2005-11-18  3:06   ` Brian Hurt
  2 siblings, 2 replies; 16+ messages in thread
From: Erik de Castro Lopo @ 2005-11-17 23:49 UTC (permalink / raw)
  To: caml-list

Jonathan Roewen wrote:

> And since MMX deals solely with integers, the Val_long/Long_val macros
> could be implemented by a simple extra MMX instruction or two, right?

All Pentium III and later processors have the SSE instruction
set, which is like MMX but for 32-bit floats. The Pentium 4 and
later also have SSE2, which adds 64-bit floats.

Personally, I find SSE and SSE2 far more interesting than MMX.

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"The music business is a cruel and shallow money trench, a long
 plastic hallway where thieves and pimps run free, and good men
 die like dogs. There's also a negative side."
   -- Hunter S. Thompson



* Re: doing MMX through ocaml
  2005-11-17 22:55       ` Jonathan Roewen
@ 2005-11-18  1:26         ` Vincenzo Ciancia
  2005-11-18 10:04         ` [Caml-list] " Alessandro Baretta
  1 sibling, 0 replies; 16+ messages in thread
From: Vincenzo Ciancia @ 2005-11-18  1:26 UTC (permalink / raw)
  To: caml-list

Jonathan Roewen wrote:

> 
> Ideas on whether this is a good design, and what would probably be the
> best way to do this are welcome =)

I think, from an "external" point of view in the sense that I never did such
things, that using arrows should be a good idea. Perhaps the following is a
good index/starting point for such concepts, in particular the papers
related to Fran and Yampa:

http://www.cs.uu.nl/wiki/Afp/DomainSpecificLanguages

V.

-- 
Please note that I do not read the e-mail address used in the from field but
I read vincenzo_ml at yahoo dot it



* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 23:49 ` [Caml-list] " Erik de Castro Lopo
@ 2005-11-18  1:52   ` Grégory Guyomarc'h
  2005-11-18  3:06   ` Brian Hurt
  1 sibling, 0 replies; 16+ messages in thread
From: Grégory Guyomarc'h @ 2005-11-18  1:52 UTC (permalink / raw)
  To: Erik de Castro Lopo; +Cc: caml-list


There is already a binding for Altivec instructions you might want to have a
look at:

http://wwwlasmea.univ-bpclermont.fr/Personnel/Jocelyn.Serot/camlg4.html

Cheers,
Gregory.

On 11/18/05, Erik de Castro Lopo <ocaml-erikd@mega-nerd.com> wrote:
>
> Jonathan Roewen wrote:
>
> > And since MMX deals solely with integers, the Val_long/Long_val macros
> > could be implemented by a simple extra MMX instruction or two, right?
>
> All Pentium III and later processors have the SSE instruction
> set, which is like MMX but for 32-bit floats. The Pentium 4 and
> later also have SSE2, which adds 64-bit floats.
>
> Personally, I find SSE and SSE2 far more interesting than MMX.
>
> Erik



* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 23:49 ` [Caml-list] " Erik de Castro Lopo
  2005-11-18  1:52   ` Grégory Guyomarc'h
@ 2005-11-18  3:06   ` Brian Hurt
  2005-11-18  3:29     ` Jonathan Roewen
  2005-11-18 19:22     ` Ken Rose
  1 sibling, 2 replies; 16+ messages in thread
From: Brian Hurt @ 2005-11-18  3:06 UTC (permalink / raw)
  To: Erik de Castro Lopo; +Cc: caml-list



On Fri, 18 Nov 2005, Erik de Castro Lopo wrote:

> Jonathan Roewen wrote:
>
>> And since MMX deals solely with integers, the Val_long/Long_val macros
>> could be implemented by a simple extra MMX instruction or two, right?
>
> All Pentium III and later processors have the SSE instruction
> set, which is like MMX but for 32-bit floats. The Pentium 4 and
> later also have SSE2, which adds 64-bit floats.
>
> Personally, I find SSE and SSE2 far more interesting than MMX.

I'm pretty sure you need at least SSE for MPEG.  The core function is an
8x8 2D FFT.  You *might* be able to do it in fixed point (and thus in MMX),
but the SSE version would be a lot easier to get right.

A run time code generator would be interesting...

Brian



* Re: [Caml-list] doing MMX through ocaml
  2005-11-18  3:06   ` Brian Hurt
@ 2005-11-18  3:29     ` Jonathan Roewen
  2005-11-18 19:22     ` Ken Rose
  1 sibling, 0 replies; 16+ messages in thread
From: Jonathan Roewen @ 2005-11-18  3:29 UTC (permalink / raw)
  Cc: caml-list

> I'm pretty sure you need at least SSE for MPEG.  The core function is an
> 8x8 2D FFT.  You *might* be able to do it in fixed point (and thus in MMX),
> but the SSE version would be a lot easier to get right.

Well, I'd be basing my code on libmpeg2 .. which has an MMX only
optimised version.

Anyway, another question: is int64 boxed? I presume it would be, like
floats. So would an 8-char string be better, with functions to
convert to ints? Since MMX is all 64-bit....
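
A sketch of the 8-byte buffer idea, for comparison. Int64.t values are
boxed in general, whereas an 8-byte buffer is a single flat block. In
current OCaml the mutable buffer type is Bytes rather than string, and
this assumes OCaml 4.08 or later for Bytes.get_int64_le. The helper
names pack, unpack and as_int64 are made up for illustration.

```ocaml
(* Pack 8 lanes into an 8-byte buffer, keeping the low 8 bits of each. *)
let pack (lanes : int array) : Bytes.t =
  if Array.length lanes <> 8 then invalid_arg "pack: need 8 lanes";
  Bytes.init 8 (fun i -> Char.chr (lanes.(i) land 0xff))

(* Read the 8 byte lanes back as ordinary (unboxed) OCaml ints. *)
let unpack (b : Bytes.t) : int array =
  Array.init 8 (fun i -> Char.code (Bytes.get b i))

(* View the same 8 bytes as one little-endian 64-bit quantity. The
   result is an Int64.t, so this conversion may allocate. *)
let as_int64 (b : Bytes.t) : int64 = Bytes.get_int64_le b 0

let v = pack [| 1; 2; 3; 4; 5; 6; 7; 8 |]
let q = as_int64 v
```

Keeping the data in the buffer and only converting single lanes to int
avoids creating boxed int64 values in the inner loop.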

Jonathan



* Re: [Caml-list] doing MMX through ocaml
  2005-11-17 22:55       ` Jonathan Roewen
  2005-11-18  1:26         ` Vincenzo Ciancia
@ 2005-11-18 10:04         ` Alessandro Baretta
  1 sibling, 0 replies; 16+ messages in thread
From: Alessandro Baretta @ 2005-11-18 10:04 UTC (permalink / raw)
  To: Jonathan Roewen; +Cc: caml-list

Jonathan Roewen wrote:
> Another option, that I'm curious what people might think of, that
> might alleviate some performance woes of multiple calls to much
> simpler MMX instruction wrappers:
> 
> A runtime code generator, that's typed by ocaml. Basically, create a
> code block from multiple instructions, which is turned into machine
> code, then executed through a C function.

Wouldn't this be the perfect scenario to leverage the power of multistage
programming? I wonder if MetaOcaml is the right solution. And I wonder how
much work would go into adapting the DST to MetaOcaml...

Alex



* Re: [Caml-list] doing MMX through ocaml
  2005-11-18  3:06   ` Brian Hurt
  2005-11-18  3:29     ` Jonathan Roewen
@ 2005-11-18 19:22     ` Ken Rose
  2005-11-21  9:11       ` Sebastian Egner
  1 sibling, 1 reply; 16+ messages in thread
From: Ken Rose @ 2005-11-18 19:22 UTC (permalink / raw)
  To: caml-list

Brian Hurt wrote:

> I'm pretty sure you need at least SSE for MPEG.  The core function is an
> 8x8 2D FFT.  You *might* be able to do it in fixed point (and thus in MMX),
> but the SSE version would be a lot easier to get right.

It's actually an 8x8 Discrete Cosine Transform.  It can be done in fixed
point.  IIRC, you need 18 bits.
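
To illustrate the fixed-point approach, here is a naive 8-point 1D
inverse DCT with an integer-only inner loop, next to a floating-point
reference. This is a sketch, not libmpeg2's algorithm: the 13 fractional
bits are an arbitrary illustrative choice (not the precision figure
quoted above), and the names frac_bits, basis, table, idct_fixed and
idct_float are made up. An 8x8 transform would apply it to rows and then
columns.

```ocaml
(* Number of fractional bits in the scaled basis table (illustrative). *)
let frac_bits = 13
let pi = 4.0 *. atan 1.0

(* Orthonormal basis value c(k) * cos((2n+1) k pi / 16). *)
let basis k n =
  let c = if k = 0 then sqrt 0.125 else 0.5 in
  c *. cos (float_of_int ((2 * n + 1) * k) *. pi /. 16.0)

(* Basis table scaled to integers once, up front. *)
let table =
  Array.init 8 (fun k ->
    Array.init 8 (fun n ->
      int_of_float
        (Float.round (basis k n *. float_of_int (1 lsl frac_bits)))))

(* Integer-only inner loop: multiply, accumulate, round, shift. *)
let idct_fixed (coeffs : int array) : int array =
  Array.init 8 (fun n ->
    let acc = ref (1 lsl (frac_bits - 1)) in
    for k = 0 to 7 do
      acc := !acc + coeffs.(k) * table.(k).(n)
    done;
    !acc asr frac_bits)

(* Floating-point reference for comparison. *)
let idct_float (coeffs : int array) : float array =
  Array.init 8 (fun n ->
    let acc = ref 0.0 in
    for k = 0 to 7 do
      acc := !acc +. float_of_int coeffs.(k) *. basis k n
    done;
    !acc)
```

The multiply-accumulate in idct_fixed is the shape of loop that packed
integer instructions are designed for.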

 - ken



* Re: [Caml-list] doing MMX through ocaml
  2005-11-18 19:22     ` Ken Rose
@ 2005-11-21  9:11       ` Sebastian Egner
  0 siblings, 0 replies; 16+ messages in thread
From: Sebastian Egner @ 2005-11-21  9:11 UTC (permalink / raw)
  To: caml-list


> Brian Hurt wrote:
> 
> > I'm pretty sure you need at least SSE for MPEG.  The core function is an
> > 8x8 2D FFT.  You *might* be able to do it in fixed point (and thus in MMX),
> > but the SSE version would be a lot easier to get right.
> 
> It's actually an 8x8 Discrete Cosine Transform.  It can be done in fixed
> point.  IIRC, you need 18 bits.
> 
>  - ken

As far as I know, the computationally most intensive parts of an MPEG2
decoder are the 8x8 IDCT, the motion compensation, and the YCrCb -> RGB
color space conversion. To get an impression, I ported the IDCT and MC
from 'libmpeg2' to Ocaml, and optimized it with an eye on the assembly
code. Performance will be fine, as far as one can get without saturated
SIMD ops (e.g. MMX). 'Libmpeg2' is a good open source starting point; it
is clean, reasonably well structured, and relatively small.

The real pain in writing an MPEG2 decoder is probably the complexity
of options (MPEG2 is infamous in that respect), and right now I do not
have a lot of time to spend on such an enterprise.

Some advice on going about an MPEG2 decoder in Ocaml, to whom it may
concern:

1. Keep the standard document (ISO) close.
2. Make a choice which options and configurations to support (and test!).
3. Don't try too hard to convert 'libmpeg2' literally, but design parts
   from scratch. 'Libmpeg2' uses the C preprocessor to do extensive
   strength reduction, i.e. generating specialized functions for the
   various signal representations (RGB|BGR, 444|422|420, etc.). Since the
   representation affects the inner loops, you must do strength reduction
   in Ocaml, too. However, the Ocaml compiler is straightforward, and this
   means you must find other ways to have simple but efficient source code.
4. KISS: Keep it simple and stupid! Forget about the little shop of MPEG2
   horrors (e.g. transport streams without a single I-frame, the hell of
   timecodes etc.).
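
One possible way to get the specialisation described in the third point
without a preprocessor, sketched under assumptions: choose a specialised
pixel writer once, outside the inner loop, so the loop body never tests
the representation. The names order, make_writer and fill are
hypothetical, and Bytes.set_uint8 assumes OCaml 4.08 or later. Functors
or generated source would be other routes to the same goal.

```ocaml
type order = RGB | BGR

(* Pick the writer for the chosen channel order once, up front. *)
let make_writer (order : order) : Bytes.t -> int -> int * int * int -> unit =
  match order with
  | RGB ->
    (fun buf off (r, g, b) ->
       Bytes.set_uint8 buf off r;
       Bytes.set_uint8 buf (off + 1) g;
       Bytes.set_uint8 buf (off + 2) b)
  | BGR ->
    (fun buf off (r, g, b) ->
       Bytes.set_uint8 buf off b;
       Bytes.set_uint8 buf (off + 1) g;
       Bytes.set_uint8 buf (off + 2) r)

(* The inner loop calls the selected writer and never re-tests the
   channel order. *)
let fill (order : order) (count : int) (px : int * int * int) : Bytes.t =
  let buf = Bytes.create (3 * count) in
  let write = make_writer order in
  for i = 0 to count - 1 do
    write buf (3 * i) px
  done;
  buf
```

Whether the call through the closure is cheap enough for a given inner
loop is something to check against the generated assembly.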

http://www.iso.org/iso/en/CombinedQueryResult.CombinedQueryResult?queryString=13818-2

Sebastian.


