caml-list - the Caml user's mailing list
* More registers in modern day CPUs
@ 2007-09-06  6:20 Tom
  2007-09-06  7:17 ` [Caml-list] " skaller
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Tom @ 2007-09-06  6:20 UTC (permalink / raw)
  To: Caml-list List

[-- Attachment #1: Type: text/plain, Size: 622 bytes --]

(This question may not be OCaml specific, but I guess it is not specific at
all, and there are quite some people here that have implemented compilers,
so I post it here...)

I was thinking about compiler implementation recently, and figured that it
is difficult to design a compiler for a variable number of hardware
registers, compared with designing a compiler for a fixed number of
registers.

However, would it be possible to "emulate" cpu registers using software? By
keeping registers in the main memory, but accessing them often enough to
keep them in primary cache? That would be quite fast I believe...
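
As a rough illustration of the idea in the question, here is a minimal sketch of a "register file" kept in ordinary memory. The names instr, Load, Add and run are invented for this example and do not come from any real compiler or VM; whether the array actually stays in L1 cache depends on the hardware and on what else the program touches.

```ocaml
(* Hypothetical sketch: software "registers" stored in a plain array.
   All names here are illustrative assumptions. *)
type instr =
  | Load of int * int          (* Load (r, n): regs.(r) <- n *)
  | Add of int * int * int     (* Add (d, a, b): regs.(d) <- regs.(a) + regs.(b) *)

let run (num_regs : int) (program : instr list) : int array =
  (* A small array that is touched by every instruction would normally
     stay resident in the L1 data cache, which is the effect asked about. *)
  let regs = Array.make num_regs 0 in
  List.iter
    (function
      | Load (r, n) -> regs.(r) <- n
      | Add (d, a, b) -> regs.(d) <- regs.(a) + regs.(b))
    program;
  regs

let () =
  let regs = run 8 [ Load (0, 2); Load (1, 3); Add (2, 0, 1) ] in
  Printf.printf "r2 = %d\n" regs.(2)
```

Every access still goes through a load or store, so this is at best as fast as L1 cache, not as fast as a hardware register.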

 - Tom


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06  6:20 More registers in modern day CPUs Tom
@ 2007-09-06  7:17 ` skaller
  2007-09-06  9:07 ` Richard Jones
  2007-09-06 14:55 ` Chris King
  2 siblings, 0 replies; 28+ messages in thread
From: skaller @ 2007-09-06  7:17 UTC (permalink / raw)
  To: Tom; +Cc: Caml-list List

On Thu, 2007-09-06 at 08:20 +0200, Tom wrote:
> (This question may not be OCaml specific, 

you'd be surprised ..


> However, would it be possible to "emulate" cpu registers using
> software? By keeping registers in the main memory, but accessing them
> often enough to keep them in primary cache? That would be quite fast I
> believe... 

The technique is called 'boxing'. This is one reason why OCaml
is so fast, when you'd expect the extra dereferences required
all the time to be a big penalty. Instead, if the address is
used but not the data (e.g. in a generic operation), cache is saved
compared to an expanded representation. The cache is loaded
when the pointer is dereferenced, and subsequent derefs are
effectively free provided only a small number of boxes
are opened. There is an extra cost of one word for the
address, which is the price of the lazy loading and is
amortised away by generic operations.
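
A small sketch of what "address used but not the data" can look like in OCaml. It relies on the Obj module, which is unsafe and used here only to peek at the runtime representation; describe and length_generic are names made up for this example.

```ocaml
(* Sketch only: Obj is unsafe and is used here purely for illustration. *)
let describe (v : 'a) : string =
  let r = Obj.repr v in
  if Obj.is_int r then "immediate (unboxed)" else "boxed (pointer to heap block)"

(* A generic operation: it walks the list spine but never dereferences
   the elements, so the boxed payloads need not be brought into cache. *)
let rec length_generic (acc : int) (l : 'a list) : int =
  match l with
  | [] -> acc
  | _ :: tl -> length_generic (acc + 1) tl

let () =
  print_endline (describe 42);
  print_endline (describe 3.14);
  print_endline (describe "hello");
  Printf.printf "%d\n" (length_generic 0 [ 1.0; 2.0; 3.0 ])
```

Integers are immediate values, while floats and strings passed through a polymorphic function are pointers to heap blocks; the length function only ever looks at the pointers.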

This is even faster than one might think because the cache
can do speculative preload of the pointed-to data.
[Does OCaml bother to generate those instructions?]

IMHO, the main purpose of registers is to organise
the interleaving of parallel operations (memory reads
mainly) based on dependencies. They differ from main
memory (and cache) in that they're usually thread local
(whereas all the other stuff is shared) so they're
expressing coupling between data and flow of control.

For example, in:

	R1 = a
	R2 = b
	R3 = R1 + R2
	R4 = c 
	R5 = d
	R6 = R4 + R5

you'd be mainly wrong to think of these instructions as operating
on data. No. Not today. These instructions are chopping up the
control flow into parallel threads:

	a b c d
	| | | |
	V V V V
	 +   +
	 |   |

I think that's the main reason for registers, not memory operands.
Registers only need a few bits to name, so the dispatching to
functional units is easier to calculate with less hardware.
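
To make the dependency picture concrete, here is a hedged sketch that computes, for the six-instruction example above, the earliest step at which each instruction could issue. It assumes unlimited functional units and unit latency, which real hardware does not have; the types and the levels function are invented for illustration.

```ocaml
(* Hypothetical model of the example: loads have no register inputs,
   adds depend on two earlier registers. *)
type instr =
  | Load of string * string           (* Load (dst, var) *)
  | Add of string * string * string   (* Add (dst, src1, src2) *)

(* Earliest issue step per destination register: 0 for loads,
   1 + max of the source levels for adds. Assumes sources are defined
   earlier in the program (List.assoc raises Not_found otherwise). *)
let levels (prog : instr list) : (string * int) list =
  List.fold_left
    (fun acc i ->
      match i with
      | Load (dst, _) -> (dst, 0) :: acc
      | Add (dst, s1, s2) ->
          let l1 = List.assoc s1 acc and l2 = List.assoc s2 acc in
          (dst, 1 + max l1 l2) :: acc)
    [] prog
  |> List.rev

let prog =
  [ Load ("R1", "a"); Load ("R2", "b"); Add ("R3", "R1", "R2");
    Load ("R4", "c"); Load ("R5", "d"); Add ("R6", "R4", "R5") ]

let () =
  List.iter (fun (r, l) -> Printf.printf "%s: step %d\n" r l) (levels prog)
```

All four loads land on step 0 and both adds on step 1, which is the two-column diagram above: the register names are what let the hardware see that R3 and R6 do not depend on each other.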

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06  6:20 More registers in modern day CPUs Tom
  2007-09-06  7:17 ` [Caml-list] " skaller
@ 2007-09-06  9:07 ` Richard Jones
  2007-09-06 14:55 ` Chris King
  2 siblings, 0 replies; 28+ messages in thread
From: Richard Jones @ 2007-09-06  9:07 UTC (permalink / raw)
  To: Tom; +Cc: Caml-list List

On Thu, Sep 06, 2007 at 08:20:06AM +0200, Tom wrote:
> I was thinking about compiler implementation recently, and figured that it
> is difficult to design a compiler for a variable number of hardware
> registers, compared with designing a compiler for a fixed number of
> registers.
> 
> However, would it be possible to "emulate" cpu registers using software? By
> keeping registers in the main memory, but accessing them often enough to
> keep them in primary cache? That would be quite fast I believe...

You might want to grab a good book on compilers and read about
register allocation.  Or take a look at this Wikipedia page:

http://en.wikipedia.org/wiki/Register_allocation

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06  6:20 More registers in modern day CPUs Tom
  2007-09-06  7:17 ` [Caml-list] " skaller
  2007-09-06  9:07 ` Richard Jones
@ 2007-09-06 14:55 ` Chris King
  2007-09-06 15:17   ` Brian Hurt
                     ` (2 more replies)
  2 siblings, 3 replies; 28+ messages in thread
From: Chris King @ 2007-09-06 14:55 UTC (permalink / raw)
  To: Tom; +Cc: Caml-list List

On 9/6/07, Tom <tom.primozic@gmail.com> wrote:
> However, would it be possible to "emulate" cpu registers using software? By
> keeping registers in the main memory, but accessing them often enough to
> keep them in primary cache? That would be quite fast I believe...

This makes me wonder... why have registers to begin with?  I wonder
how feasible a chip with a, say, 256-byte "register-level" cache would
be.

- Chris


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06 14:55 ` Chris King
@ 2007-09-06 15:17   ` Brian Hurt
  2007-09-06 15:54     ` Harrison, John R
  2007-09-06 20:48   ` [Caml-list] More registers in modern day CPUs Richard Jones
       [not found]   ` <20070906204524.GB10798@furbychan.cocan.org>
  2 siblings, 1 reply; 28+ messages in thread
From: Brian Hurt @ 2007-09-06 15:17 UTC (permalink / raw)
  To: Chris King; +Cc: Tom, Caml-list List

[-- Attachment #1: Type: text/plain, Size: 1824 bytes --]

Chris King wrote:

>On 9/6/07, Tom <tom.primozic@gmail.com> wrote:
>  
>
>>However, would it be possible to "emulate" cpu registers using software? By
>>keeping registers in the main memory, but accessing them often enough to
>>keep them in primary cache? That would be quite fast I believe...
>>    
>>
>
>This makes me wonder... why have registers to begin with?  I wonder
>how feasible a chip with a, say, 256-byte "register-level" cache would
>be.
>  
>
Such chips exist.  The Itanium is one example.

The problem is gate delays.  The purpose of registers is to be faster 
than L1 cache (which typically has a 2-3 clock delay associated with 
it).  But the more registers you have, the more gate delays you need to 
read or write registers- the naive implementation takes O(log N) gate 
delays to access O(N) registers- reality is more complicated than this.  
But the rule more registers = more gate delays holds true.  And these 
gate delays translate into a slower chip (one way or another- either you 
have to lower your clock rate or add more pipeline stages or both to 
deal with the larger register cache).  Of course, more registers make 
compilers happy, and lower the pressure on cache bandwidth (as the 
compiler doesn't need to spill/refill registers quite so often).  This 
is why the 64-bit x86 is generally faster than the 32-bit x86- going 
from 8 (6 in practice) to 16 (14 in practice) registers was a big step 
up.  The Itanium has a large enough register set that its performance 
is probably getting hurt by it, but it's hard to tell with 
everything else going on.

The sweet spot for register sets seems to be in the 16-64 range- less 
than that, and you're being hurt by the increased memory pressure, more 
than that and you're probably being hurt by the slower register addressing.
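
The O(log N) figure for the naive implementation can be illustrated with a toy model: selecting one of N registers through a tree of 2-input multiplexers takes as many levels as the smallest d with 2^d >= N. This is only the naive model mentioned above, not a description of any real register file; mux_depth is a name made up for this sketch.

```ocaml
(* Depth of a binary multiplexer tree selecting one of n registers:
   the smallest d such that 2^d >= n. A naive model only. *)
let mux_depth (n : int) : int =
  if n < 1 then invalid_arg "mux_depth: n must be positive";
  let rec go d capacity = if capacity >= n then d else go (d + 1) (capacity * 2) in
  go 0 1

let () =
  List.iter
    (fun n -> Printf.printf "%3d registers -> %d mux levels\n" n (mux_depth n))
    [ 8; 16; 64; 128 ]
```

Under this model, going from 8 to 16 registers adds one level, and going from 16 to 128 adds three more.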

Brian


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [Caml-list] More registers in modern day CPUs
  2007-09-06 15:17   ` Brian Hurt
@ 2007-09-06 15:54     ` Harrison, John R
  2007-09-06 17:10       ` David MENTRE
  0 siblings, 1 reply; 28+ messages in thread
From: Harrison, John R @ 2007-09-06 15:54 UTC (permalink / raw)
  To: Caml-list List

[-- Attachment #1: Type: text/plain, Size: 982 bytes --]

Chris King wrote:

| This makes me wonder... why have registers to begin with?  I wonder how
| feasible a chip with a, say, 256-byte "register-level" cache would be.

and Brian Hurt said:

| Such chips exist.  The Itanium is one example.

The Itanium is indeed an example of an architecture with a relatively
large number of registers, and where the register file has certain
memory-like features such as automatic indexing offsets.

But as I understood it, Chris was proposing the opposite: have few or
no registers, and rely on main memory instead, with some extra fast
inner level cache to speed it up.

Both the old Inmos Transputer and the more recent IBM/Sony/Toshiba
Cell processor have/had a dedicated area of fast memory, rather like a
giant memory-based register file. In each case this is explicitly
visible to user-level software rather than being a cache in the usual
sense.

John.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06 15:54     ` Harrison, John R
@ 2007-09-06 17:10       ` David MENTRE
  2007-09-06 18:27         ` Harrison, John R
  2007-09-06 18:28         ` Christophe Raffalli
  0 siblings, 2 replies; 28+ messages in thread
From: David MENTRE @ 2007-09-06 17:10 UTC (permalink / raw)
  To: Harrison, John R; +Cc: Caml-list List

Hello,

"Harrison, John R" <john.r.harrison@intel.com> writes:

> Both the old Inmos Transputer and the more recent IBM/Sony/Toshiba
> Cell processor have/had a dedicated area of fast memory, rather like a
> giant memory-based register file.

The Cell SPE has 128 registers of 128 bits.

http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/FC857AE550F7EB83872571A80061F788/$file/CBE_Tutorial_v2.1_1March2007.pdf

"Synergistic Processor Elements (SPEs) The eight SPEs are SIMD
 processors optimized for data-rich operations allocated to them by the
 PPE. Each of these identical elements contains a RISC core, 256-KB,
 software-controlled local store for instructions and data, and a large
 (128-bit, 128-entry) unified register file."


Yours,
d.
-- 
GPG/PGP key: A3AD7A2A David MENTRE <dmentre@linux-france.org>
 5996 CC46 4612 9CA4 3562  D7AC 6C67 9E96 A3AD 7A2A


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [Caml-list] More registers in modern day CPUs
  2007-09-06 17:10       ` David MENTRE
@ 2007-09-06 18:27         ` Harrison, John R
  2007-09-06 18:28         ` Christophe Raffalli
  1 sibling, 0 replies; 28+ messages in thread
From: Harrison, John R @ 2007-09-06 18:27 UTC (permalink / raw)
  To: David MENTRE; +Cc: Caml-list List

| > Both the old Inmos Transputer and the more recent IBM/Sony/Toshiba
| > Cell processor have/had a dedicated area of fast memory, rather like a
| > giant memory-based register file.
|
| The Cell SPE has 128 registers of 128 bits.

Yes, but I was referring to the "256-KB software controlled local store"
rather than the actual register file. I didn't mean to imply that the
Cell has few actual registers. (Though the transputer does, in fact.)

John.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06 17:10       ` David MENTRE
  2007-09-06 18:27         ` Harrison, John R
@ 2007-09-06 18:28         ` Christophe Raffalli
  2007-09-06 18:48           ` Brian Hurt
  2007-09-06 18:48           ` Pal-Kristian Engstad
  1 sibling, 2 replies; 28+ messages in thread
From: Christophe Raffalli @ 2007-09-06 18:28 UTC (permalink / raw)
  To: David MENTRE; +Cc: Harrison, John R, Caml-list List

[-- Attachment #1: Type: text/plain, Size: 985 bytes --]

David MENTRE wrote:
> Hello,
>
> "Harrison, John R" <john.r.harrison@intel.com> writes:
>
>   
>> Both the old Inmos Transputer and the more recent IBM/Sony/Toshiba
>> Cell processor have/had a dedicated area of fast memory, rather like a
>> giant memory-based register file.
>>     
>
> The Cell SPE has 128 registers of 128 bits.
>
> http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/FC857AE550F7EB83872571A80061F788/$file/CBE_Tutorial_v2.1_1March2007.pdf
>
> "Synergistic Processor Elements (SPEs) The eight SPEs are SIMD
>  processors optimized for data-rich operations allocated to them by the
>  PPE. Each of these identical elements contains a RISC core, 256-KB,
>  software-controlled local store for instructions and data, and a large
>  (128-bit, 128-entry) unified register file."
>
>
> Yours,
> d.
>   
And apart from the PlayStation 3 (under Linux for sure ;-), what kind
of not too expensive computer can we buy with Cell processors inside?

Regards,
C.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06 18:28         ` Christophe Raffalli
@ 2007-09-06 18:48           ` Brian Hurt
  2007-09-06 18:48           ` Pal-Kristian Engstad
  1 sibling, 0 replies; 28+ messages in thread
From: Brian Hurt @ 2007-09-06 18:48 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: Caml-list List

[-- Attachment #1: Type: text/plain, Size: 251 bytes --]

Christophe Raffalli wrote:

>
>And apart from the playstation III (under linux for sure ;-), what kind
>of not too expensive computer
>can we buy with Cell Processors inside ?
>
>  
>

At that, they're cheaper than the Itanics.  Er, Itaniums.

Brian



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06 18:28         ` Christophe Raffalli
  2007-09-06 18:48           ` Brian Hurt
@ 2007-09-06 18:48           ` Pal-Kristian Engstad
  2007-11-20 15:32             ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Mike Hogan
  1 sibling, 1 reply; 28+ messages in thread
From: Pal-Kristian Engstad @ 2007-09-06 18:48 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: David MENTRE, Harrison, John R, Caml-list List

Hi,

IBM sells their IBM BladeCenter QS20 blade for around $20,000, which may 
be a bit much for most people. Instead, why not install Linux on the 
PS3? Or buy 3 or 4, for the price of one "gaming PC"? For instance, 
http://www.youtube.com/watch?v=oLte5f34ya8

Thanks,

PKE.

Christophe Raffalli wrote:
> David MENTRE wrote:
>   
>> Hello,
>>
>> "Harrison, John R" <john.r.harrison@intel.com> writes:
>>
>>   
>>     
>>> Both the old Inmos Transputer and the more recent IBM/Sony/Toshiba
>>> Cell processor have/had a dedicated area of fast memory, rather like a
>>> giant memory-based register file.
>>>     
>>>       
>> The Cell SPE has 128 registers of 128 bits.
>>
>> http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/FC857AE550F7EB83872571A80061F788/$file/CBE_Tutorial_v2.1_1March2007.pdf
>>
>> "Synergistic Processor Elements (SPEs) The eight SPEs are SIMD
>>  processors optimized for data-rich operations allocated to them by the
>>  PPE. Each of these identical elements contains a RISC core, 256-KB,
>>  software-controlled local store for instructions and data, and a large
>>  (128-bit, 128-entry) unified register file."
>>
>>
>> Yours,
>> d.
>>   
>>     
> And apart from the playstation III (under linux for sure ;-), what kind
> of not too expensive computer
> can we buy with Cell Processors inside ?
>
> Regards,
> C.
>   
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>   

-- 
Pål-Kristian Engstad (engstad@naughtydog.com), 
Lead Graphics & Engine Programmer,
Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North,
Santa Monica, CA 90404, USA. Ph.: (310) 633-9112.

"Most of us would do well to remember that there is a reason Carmack
is Carmack, and we are not Carmack.",
                       Jonathan Blow, 2/1/2006, GD Algo Mailing List




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
  2007-09-06 14:55 ` Chris King
  2007-09-06 15:17   ` Brian Hurt
@ 2007-09-06 20:48   ` Richard Jones
       [not found]   ` <20070906204524.GB10798@furbychan.cocan.org>
  2 siblings, 0 replies; 28+ messages in thread
From: Richard Jones @ 2007-09-06 20:48 UTC (permalink / raw)
  To: caml-list

On Thu, Sep 06, 2007 at 10:55:20AM -0400, Chris King wrote:
> On 9/6/07, Tom <tom.primozic@gmail.com> wrote:
> > However, would it be possible to "emulate" cpu registers using software? By
> > keeping registers in the main memory, but accessing them often enough to
> > keep them in primary cache? That would be quite fast I believe...
>
> This makes me wonder... why have registers to begin with?  I wonder
> how feasible a chip with a, say, 256-byte "register-level" cache would

The 6502 was a successful 8-bit processor where the "on chip"
registers were very few, but the first part of RAM acted as memory
mapped registers.

http://en.wikipedia.org/wiki/6502

This is not feasible in current chips for a whole variety of reasons,
starting with the fact that current RAM is hundreds of times slower
than registers (and even L1 cache is 4-8 times slower).

You should read "Computer Architecture: A Quantitative Approach" by
Hennessy & Patterson.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] More registers in modern day CPUs
       [not found]   ` <20070906204524.GB10798@furbychan.cocan.org>
@ 2007-09-06 20:59     ` Chris King
  0 siblings, 0 replies; 28+ messages in thread
From: Chris King @ 2007-09-06 20:59 UTC (permalink / raw)
  To: Richard Jones; +Cc: Caml List

On 9/6/07, Richard Jones <rich@annexia.org> wrote:
> The 6502 was a successful 8-bit processor where the "on chip"
> registers were very few, but the first part of RAM acted as memory
> mapped registers.

I grew up on the 6502... beautiful architecture :)

> This is not feasible in current chips for a whole variety of reasons,
> starting with the fact that current RAM is hundreds of times slower
> than registers (and even L1 cache is 4-8 times slower).

Right, hence my notion of "register-level cache"... something smaller
and faster than L1 that replaces registers entirely.  (John Harrison
got what I was on about.)  I will check out that book though... I know
very little about cache structures.

- Chris


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-09-06 18:48           ` Pal-Kristian Engstad
@ 2007-11-20 15:32             ` Mike Hogan
  2007-11-21 17:20               ` Richard Jones
  2007-12-02 10:14               ` [Caml-list] OCalm " Xavier Leroy
  0 siblings, 2 replies; 28+ messages in thread
From: Mike Hogan @ 2007-11-20 15:32 UTC (permalink / raw)
  To: caml-list


I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. 
Seems to work fine, but I haven't tested it rigorously (and at this point, I
wouldn't even know how to test it ... um ...what's the opposite of
"rigorously"? ... non-rigorously?)

At any rate, I would be interested in learning a little more about how to
build an open source item like this for a particular platform and then
contribute back to the community (i.e. how to test to standards for this
community, how to create RPMs and where to post them etc.).  

I'd also be interested in any ideas for starting to explore whether/how the
Cell BE's power can be exploited using OCaml (hopefully simple ideas at the
outset, I'm a newb on several fronts here).

Thanks,
mike hogan


Pal-Kristian Engstad wrote:
> 
> Hi,
> 
> IBM sells their IBM BladeCenter QS20 blade for around $20,000, which may 
> be a bit much for most people. Instead, why not install Linux on the 
> PS3? Or buy 3 or 4, for the price of one "gaming PC"? For instance, 
> http://www.youtube.com/watch?v=oLte5f34ya8
> 
> Thanks,
> 
> PKE.
> 
> Christophe Raffalli wrote:
>> David MENTRE wrote:
>>   
>>> Hello,
>>>
>>> "Harrison, John R" <john.r.harrison@intel.com> writes:
>>>
>>>   
>>>     
>>>> Both the old Inmos Transputer and the more recent IBM/Sony/Toshiba
>>>> Cell processor have/had a dedicated area of fast memory, rather like a
>>>> giant memory-based register file.
>>>>     
>>>>       
>>> The Cell SPE has 128 registers of 128 bits.
>>>
>>> http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/FC857AE550F7EB83872571A80061F788/$file/CBE_Tutorial_v2.1_1March2007.pdf
>>>
>>> "Synergistic Processor Elements (SPEs) The eight SPEs are SIMD
>>>  processors optimized for data-rich operations allocated to them by the
>>>  PPE. Each of these identical elements contains a RISC core, 256-KB,
>>>  software-controlled local store for instructions and data, and a large
>>>  (128-bit, 128-entry) unified register file."
>>>
>>>
>>> Yours,
>>> d.
>>>   
>>>     
>> And apart from the playstation III (under linux for sure ;-), what kind
>> of not too expensive computer
>> can we buy with Cell Processors inside ?
>>
>> Regards,
>> C.
>>   
>>   
> 
> -- 
> Pål-Kristian Engstad (engstad@naughtydog.com), 
> Lead Graphics & Engine Programmer,
> Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North,
> Santa Monica, CA 90404, USA. Ph.: (310) 633-9112.
> 
> "Most of us would do well to remember that there is a reason Carmack
> is Carmack, and we are not Carmack.",
>                        Jonathan Blow, 2/1/2006, GD Algo Mailing List
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/More-registers-in-modern-day-CPUs-tf4389938.html#a13858952
Sent from the Caml Discuss2 mailing list archive at Nabble.com.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-11-20 15:32             ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Mike Hogan
@ 2007-11-21 17:20               ` Richard Jones
  2007-11-21 19:05                 ` [Caml-list] OCaml " Mike Hogan
  2007-11-23  6:44                 ` Mike Hogan
  2007-12-02 10:14               ` [Caml-list] OCalm " Xavier Leroy
  1 sibling, 2 replies; 28+ messages in thread
From: Richard Jones @ 2007-11-21 17:20 UTC (permalink / raw)
  To: Mike Hogan; +Cc: caml-list

On Tue, Nov 20, 2007 at 07:32:34AM -0800, Mike Hogan wrote:
> I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. 
> Seems to work fine, but I haven't tested it rigorously (and at this point, I
> wouldn't even know how to test it ... um ...what's the opposite of
> "rigorously"? ... non-rigorously?)

Native compiler?  64 bits??  Which version of OCaml???

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] OCaml on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-11-21 17:20               ` Richard Jones
@ 2007-11-21 19:05                 ` Mike Hogan
  2007-11-23  6:44                 ` Mike Hogan
  1 sibling, 0 replies; 28+ messages in thread
From: Mike Hogan @ 2007-11-21 19:05 UTC (permalink / raw)
  To: caml-list


I'll try to check out the details tonight, but I have to confess that I am a
newb's newb -- never compiled a line of open source in my life until about a
week-and-a-half ago.  

Offhand, I'm not sure about the version beyond the fact that it's "3.10"
(something?).  It is some labeled version (not the development trunk) and
I'm pretty sure that I built it as plain PPC and in byte-code interpreted
mode (i.e. there was no ocamlopt after the build). 

I did end up with an ocamlc.opt and actually copied ocamlc.opt to "ocamlopt"
in order to build Coq 8.1pl2 (there seems to be a problem w/ the builds in
Coq under the "opt=byte" option where it insists on using ocamlopt in some
cases, despite "opt=byte" option being asserted).

In light of your question, I'm hoping that I can manage to improve the
builds for the Cell BE w/o too much trouble (maybe by following the pattern
of some other architectures in the build?).  PPC native would be great,
ppc64 would be fantastic.

As an aside, Coqide seems to run proofs noticeably faster on my PS3 than on
my XP laptop (1.86GHz Centrino), even though the PS3 is built in
byte-interpreted mode (I'm presuming that Coq on Windows uses a native
build).  That was a nice surprise.

My larger goal is to try to use camlp4 as a way to generate highly parallel
Cell SPU code -- kind of modeled after CorePy's "synthetic programming"
idea.  Hopefully any lack of a native build for PS3 won't be a roadblock for
this.


Richard Jones-4 wrote:
> 
> On Tue, Nov 20, 2007 at 07:32:34AM -0800, Mike Hogan wrote:
>> I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. 
>> Seems to work fine, but I haven't tested it rigorously (and at this
>> point, I
>> wouldn't even know how to test it ... um ...what's the opposite of
>> "rigorously"? ... non-rigorously?)
> 
> Native compiler?  64 bits??  Which version of OCaml???
> 
> Rich.
> 
> -- 
> Richard Jones
> Red Hat
> 
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] OCaml on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-11-21 17:20               ` Richard Jones
  2007-11-21 19:05                 ` [Caml-list] OCaml " Mike Hogan
@ 2007-11-23  6:44                 ` Mike Hogan
  1 sibling, 0 replies; 28+ messages in thread
From: Mike Hogan @ 2007-11-23  6:44 UTC (permalink / raw)
  To: caml-list


Ok, it's 3.10.0

When I did ./configure -host powerpc-ydl-linux the resulting report seemed
to suggest that it was going to build native 32-bit, but native tools like
ocamlopt were not present post-build.

At this point I'm too inexperienced to unravel the mysteries behind the
less-than-ideal result.


Richard Jones-4 wrote:
> 
> On Tue, Nov 20, 2007 at 07:32:34AM -0800, Mike Hogan wrote:
>> I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. 
>> Seems to work fine, but I haven't tested it rigorously (and at this
>> point, I
>> wouldn't even know how to test it ... um ...what's the opposite of
>> "rigorously"? ... non-rigorously?)
> 
> Native compiler?  64 bits??  Which version of OCaml???
> 
> Rich.
> 
> -- 
> Richard Jones
> Red Hat
> 
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-11-20 15:32             ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Mike Hogan
  2007-11-21 17:20               ` Richard Jones
@ 2007-12-02 10:14               ` Xavier Leroy
  2007-12-02 16:22                 ` Mike Hogan
  2007-12-04  2:29                 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Gordon Henriksen
  1 sibling, 2 replies; 28+ messages in thread
From: Xavier Leroy @ 2007-12-02 10:14 UTC (permalink / raw)
  To: Mike Hogan; +Cc: caml-list

> I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux.
> Seems to work fine, but I haven't tested it rigorously (and at this point, I
> wouldn't even know how to test it ... um ...what's the opposite of
> "rigorously"? ... non-rigorously?)

I confirm that OCaml compiles correctly on the PS/3 with YDL.  The
native-code compiler works fine (in 32-bit mode) provided it's
configured with  -host powerpc-unknown-linux.  (Autodetection reports
powerpc64-unknown-linux, even though the default compilation mode on
this distro is 32-bit; I'll hack the configure script to work around
this issue.)

Of course, the generated code runs on the PPC core of the Cell
processor, not on the SPU cores.  Performance is unimpressive: about
1/5th of that of a recent Intel Core2 processor.

> I'd also be interested in any ideas for starting to explore whether/how the
> Cell BE's power can be exploited using OCaml (hopefully simple ideas at the
> outset, I'm a newb on several fronts here).

The SPU cores only have 256 KB of local memory, so there is no hope of
running a Caml run-time system on them.  For some applications (linear
algebra, bignums), it might be possible to link with C libraries that
offload work to the SPU cores.

A more general but extremely difficult approach is two-level
programming, where the Caml program, running on the PPC core,
generates programs in a simple data-parallel language which is then
compiled on the fly to SPU code.  Such an approach could also target
graphics coprocessors (the "GPGPU" approach).  But I have no idea what
such an intermediate language would look like.
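
Purely as a thought experiment, the first half of the two-level idea (a Caml program building a program in a small data-parallel language) might be sketched as below. The kernel language, its constructors and the emit functions are all invented for this example, and the output is C-like text rather than SPU code; the hard part, compiling such kernels on the fly for the SPUs, is not shown.

```ocaml
(* Hypothetical element-wise kernel language; none of these names come
   from an existing library. *)
type expr =
  | Input of string            (* element of a named input array *)
  | Const of float
  | Add of expr * expr
  | Mul of expr * expr

(* Render an expression as C-like text indexed by i. *)
let rec emit_expr = function
  | Input name -> Printf.sprintf "%s[i]" name
  | Const c -> Printf.sprintf "%g" c
  | Add (a, b) -> Printf.sprintf "(%s + %s)" (emit_expr a) (emit_expr b)
  | Mul (a, b) -> Printf.sprintf "(%s * %s)" (emit_expr a) (emit_expr b)

(* Wrap the expression in a loop over n elements writing to out. *)
let emit_kernel (out : string) (body : expr) : string =
  Printf.sprintf "for (i = 0; i < n; i++) %s[i] = %s;" out (emit_expr body)

let () =
  (* y = 2 * x + y, a saxpy-style kernel *)
  print_endline (emit_kernel "y" (Add (Mul (Const 2.0, Input "x"), Input "y")))
```

Because every element is computed independently, a back end would be free to split the loop across cores or vector lanes, which is what makes this restricted style attractive as an intermediate language.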

- Xavier Leroy


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-12-02 10:14               ` [Caml-list] OCalm " Xavier Leroy
@ 2007-12-02 16:22                 ` Mike Hogan
  2007-12-02 22:19                   ` Konrad Meyer
  2007-12-04  2:29                 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Gordon Henriksen
  1 sibling, 1 reply; 28+ messages in thread
From: Mike Hogan @ 2007-12-02 16:22 UTC (permalink / raw)
  To: caml-list



Xavier Leroy wrote:
> 
> I confirm that OCaml compiles correctly on the PS/3 with YDL.  The
> native-code compiler works fine (in 32-bit mode) provided it's
> configured with  -host powerpc-unknown-linux.  (Autodetection reports
> powerpc64-unknown-linux, even though the default compilation mode on
> this distro is 32-bit; I'll hack the configure script to work around
> this issue.)
> 
Nice -- I'll try the "-host powerpc-unknown-linux" option.


Xavier Leroy wrote:
> 
> A more general but extremely difficult approach is two-level
> programming, where the Caml program, running on the PPC core,
> generates programs in a simple data-parallel language which is then
> compiled on the fly to SPU code.  
> 
This is exactly what I would like to do.  There is a Python extension for
the PS3's SPUs called "CorePy" that can be used to generate assembly
instructions more or less directly for the PPC, its associated AltiVec unit,
and the SPUs.  In essence, CorePy provides a class for each particular
processor on your system, and this class has processor-specific instructions
as methods.  The extension takes care of the details of loading the code,
binding the Python interpreter to the assembly that was generated on the
fly, etc.


Xavier Leroy wrote:
> 
> Such an approach could also target
> graphics coprocessors (the "GPGPU" approach).  But I have no idea what
> such an intermediate language would look like.
> 
This would actually push the system's abilities up by an order of magnitude
in some cases, but unfortunately the "Other OS" hypervisor on the PS3 bars
access to the GPU.  It's a shame, since the PS3 GPU is supposed to be one of
NVIDIA's hottest chips.
-- 
View this message in context: http://www.nabble.com/More-registers-in-modern-day-CPUs-tf4389938.html#a14116972
Sent from the Caml Discuss2 mailing list archive at Nabble.com.



* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-12-02 16:22                 ` Mike Hogan
@ 2007-12-02 22:19                   ` Konrad Meyer
  2007-12-03  0:09                     ` [Caml-list] OCaml " Mike Hogan
  0 siblings, 1 reply; 28+ messages in thread
From: Konrad Meyer @ 2007-12-02 22:19 UTC (permalink / raw)
  To: caml-list

Quoth Mike Hogan:
> This would actually push the system's abilities up by an order of magnitude
> in some cases, but unfortunately the "Other OS" hypervisor on the PS3 bars
> access to the GPU.  It's a shame, since the PS3 GPU is supposed to be one of
> NVIDIA's hottest chips.

Actually, (and I don't know much about it, sorry) there's a group of folks 
over at ps2dev.org trying to get at the GPU. Just thought I'd share.

< http://forums.ps2dev.org/viewtopic.php?t=8364 >

Regards,
-- 
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/


* Re: [Caml-list] OCaml on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-12-02 22:19                   ` Konrad Meyer
@ 2007-12-03  0:09                     ` Mike Hogan
  2007-12-03 20:16                       ` minithread (was OCaml on Sony PS3) Christophe Raffalli
  0 siblings, 1 reply; 28+ messages in thread
From: Mike Hogan @ 2007-12-03  0:09 UTC (permalink / raw)
  To: caml-list


Interesting ... but a little rough for my tastes.  I'm always amazed at how
determined some folks are to hack their way into stuff.  I think that a PC
with a decent GeForce GPU and the CUDA library might be the easier route for
Caml -> GPGPU oriented experiments.

BTW, an OCaml-based DSL for the Cell processor or various GPUs
is a proposed summer intern project on Jane St. Capital's site
(http://osp2007.janestcapital.com/suggested-projects/), so there seems to be
an audience for this kind of stuff.

In fact, GPGPU in general seems like an incredibly hot topic right now and
NVIDIA's support by way of the CUDA architecture is kind of an interesting
development.


-- 
View this message in context: http://www.nabble.com/More-registers-in-modern-day-CPUs-tf4389938.html#a14121946
Sent from the Caml Discuss2 mailing list archive at Nabble.com.



* minithread (was OCaml on Sony PS3)
  2007-12-03  0:09                     ` [Caml-list] OCaml " Mike Hogan
@ 2007-12-03 20:16                       ` Christophe Raffalli
  2007-12-04 14:25                         ` [Caml-list] " David MENTRE
                                           ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Christophe Raffalli @ 2007-12-03 20:16 UTC (permalink / raw)
  Cc: caml-list



I propose the following idea for OCaml on the Cell PowerPC or on multicore
machines (this is just an idea; there may be a lot of things I did not
see ... in other words, there is probably a lot of work to do, but maybe
not too much):

- Create two functions and one data type to start "mini-threads":

type 'a result_channel
launch : int -> ('a -> 'b) -> 'a -> 'b result_channel
get_result : 'b result_channel list -> 'b option
(or several similar functions to wait, with or without blocking, for the
result of one or more mini-threads).

Now the point is this: each mini-thread has its own minor heap, whose
size is given as the first argument, with the following restrictions:

1) The minor heap is used as a cache: accesses to the major heap copy
the data into the minor heap. One needs to combine the copying
minor GC with a standard caching algorithm.

2) To ease task 1), mutation of data in the heaps of the main thread
by a mini-thread is illegal (raise an exception in the main thread?
Static check?). This includes the arguments of the mini-thread.

3) A mini-thread cannot start another mini-thread (raise an exception
in the main thread? Static check?).

4) 2) and 3) imply that a mini-thread cannot access the data of other
mini-threads, and that the only way for the main thread to
get values from a mini-thread is via its 'b result_channel. Thus, if
you have a main thread M and many mini-threads T1 ... TN
running, Ti can only access its own data and the data of M (read-only),
and M cannot access the data of T1 ... TN.

If you launch one mini-thread per SPU or core with a minor heap of the
correct size, and you fine-tune your application so that it does not
produce too many cache misses, then I think this simple model could be
useful.
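
As a minimal sketch of the interface above, the signature can be written
down and given a purely sequential stand-in implementation. This is an
assumption-laden illustration only: the module names `MINITHREAD` and
`Sequential` are hypothetical, the work runs eagerly inside `launch`, the
heap-size argument is ignored, and none of restrictions 1) to 4) is
enforced.

```ocaml
(* The proposed interface, written as a module signature. *)
module type MINITHREAD = sig
  type 'a result_channel
  (* [launch words f x] starts [f x] with a minor heap of [words] words. *)
  val launch : int -> ('a -> 'b) -> 'a -> 'b result_channel
  (* Returns a result from one of the channels if one is ready. *)
  val get_result : 'b result_channel list -> 'b option
end

(* Sequential stand-in: no parallelism and no private heap.  It only
   shows how client code would be written against the signature. *)
module Sequential : MINITHREAD = struct
  type 'a result_channel = 'a option ref

  (* The heap size is accepted but unused in this stand-in. *)
  let launch _minor_heap_words f x = ref (Some (f x))

  (* Hand out the first pending result and mark its channel as consumed. *)
  let get_result channels =
    let rec first = function
      | [] -> None
      | ch :: rest ->
        (match !ch with
         | Some v -> ch := None; Some v
         | None -> first rest)
    in
    first channels
end
```

Client code written against `MINITHREAD` would not change if a real
implementation (one mini-thread per core, say) were substituted for
`Sequential`; only the abstract `result_channel` type crosses the boundary.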

Cheers,
Christophe

-- 
Christophe Raffalli
Universite de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tel: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution 
can check this signature. The public key is
stored on www.keyserver.net
---------------------------------------------



* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs)
  2007-12-02 10:14               ` [Caml-list] OCalm " Xavier Leroy
  2007-12-02 16:22                 ` Mike Hogan
@ 2007-12-04  2:29                 ` Gordon Henriksen
  1 sibling, 0 replies; 28+ messages in thread
From: Gordon Henriksen @ 2007-12-04  2:29 UTC (permalink / raw)
  To: caml-list

On Dec 2, 2007, at 05:14, Xavier Leroy wrote:

>> I'd also be interested in any ideas for starting to explore whether/ 
>> how the Cell BE's power can be exploited using OCaml (hopefully  
>> simple ideas at the outset, I'm a newb on several fronts here).
>
> The SPU cores only have 256 Kb of local memory, so there is no hope  
> to run a Caml run-time system on them.  For some applications  
> (linear algebra, bignums), it might be possible to link with C  
> libraries that offload work to the SPU cores.
>
> A more general but extremely difficult approach is two-level  
> programming, where the Caml program, running on the PPC core,  
> generates programs in a simple data-parallel language which is then  
> compiled on the fly to SPU code.  Such an approach could also target  
> graphics coprocessors (the "GPGPU" approach).  But I have no idea  
> what such an intermediate language would look like.


Though difficult, this is probably a more practical approach.  
Statically extracting useful parallel programs is a very difficult  
task. (Witness the emergence of OpenMP.) It is probably easier for  
functional programs, but still.

In related news, Aerospace legal recently cleared the CellSPU backend
for upstream contribution to LLVM (http://llvm.org/) and its author is
finally committing it today. As has been mentioned, it's possible to
efficiently build LLVM IR in memory with OCaml. Anyone interested in
leveraging SPUs from OCaml in this manner could spool up quickly with
LLVM. Since LLVM has solid support for mainstream CPUs, it would be
quite possible to write programs which were also portable to standard
SMP hardware.

— Gordon



* Re: [Caml-list] minithread (was OCaml on Sony PS3)
  2007-12-03 20:16                       ` minithread (was OCaml on Sony PS3) Christophe Raffalli
@ 2007-12-04 14:25                         ` David MENTRE
  2007-12-04 14:37                         ` Basile STARYNKEVITCH
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: David MENTRE @ 2007-12-04 14:25 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: caml-list

Hello,

2007/12/3, Christophe Raffalli <Christophe.Raffalli@univ-savoie.fr>:
> If you launch one minithread per SPU or CORE with a minor heap of the
> correct size and you fine tune you application to produce not too much
> cache misses, then, I think this simple model could be usefull ????

I might not have completely understood your proposal, but it seems to
me that those mini-threads do not solve the issue. In the Cell
architecture, the SPUs are *independent* processors. They access the
main memory through DMA-like operations and do not have a cache. In
other words, for your mini-threads to work on the SPUs, you need to fit
the mini-thread's data, code and environment (e.g. GC) in 256 KB of
memory. As Xavier said, that seems quite difficult if not impossible.

Yours,
david



* Re: [Caml-list] minithread (was OCaml on Sony PS3)
  2007-12-03 20:16                       ` minithread (was OCaml on Sony PS3) Christophe Raffalli
  2007-12-04 14:25                         ` [Caml-list] " David MENTRE
@ 2007-12-04 14:37                         ` Basile STARYNKEVITCH
  2007-12-04 16:25                           ` Mattias Engdegård
  2007-12-04 17:33                         ` Gerd Stolpmann
  2007-12-04 18:00                         ` Mike Hogan
  3 siblings, 1 reply; 28+ messages in thread
From: Basile STARYNKEVITCH @ 2007-12-04 14:37 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: caml-list

Christophe Raffalli wrote:
> I propose the following idea for OCaml on Cell PowerPC or multicore
> machine (this is just an idea,
> there ay be a lot of thing I did not see ... in other word there is
> probably a lot of work to do, but may be not too much):
> 
> - Create two functions and one data type to start "mini-thread":


As David MENTRE explained, this is not very realistic.

However, one of the Cell coprocessors (e.g. an SPU) might be used to
implement the OCaml garbage collector.

A copying GC has to move quite a lot of data, and it is possible
that the Cell's coprocessors could be useful for that (assuming that they
access memory as quickly as the main processor).

I don't know if Gallium has resources for that (I suppose not, except
perhaps for an internship?), and I have no idea whether it is easily
doable or nearly impossible (maybe the current SPU limitations, in
particular code size, are too strict).

Anyway, it might not help performance that much on Cell systems (e.g. the
PS3), because the GC probably consumes less than half of the
resources (Damien & Xavier told me recently that the GC usually uses
less than 20% of the CPU, the KnuthBendix test case on ocamlopt being
unusual in that its GC eats about a third of the CPU time). The OCaml GC
is quite good (a big bravo to Damien Doligez & Xavier Leroy).


I still think that the SPUs on the PS3 are only useful for games or
specialized (e.g. graphical) applications.

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***



* Re: [Caml-list] minithread (was OCaml on Sony PS3)
  2007-12-04 14:37                         ` Basile STARYNKEVITCH
@ 2007-12-04 16:25                           ` Mattias Engdegård
  0 siblings, 0 replies; 28+ messages in thread
From: Mattias Engdegård @ 2007-12-04 16:25 UTC (permalink / raw)
  To: basile; +Cc: Christophe.Raffalli, caml-list

>However, (one of the) the CELL coprocessor -eg SPU) might be used to 
>implemented Ocaml garbage collector.
>
>A copying GC has to move quite a lot of data, and it could be possible 
>that CELL's coprocessors could be useful for that (assuming that they 
>access memory as quickly as the processor).

They don't, so it isn't; and doing GC on a coprocessor that cannot
directly access the memory it manages does not sound very practical.

The PPE has the memories of all SPUs mapped into its physical address
space, so it could possibly do the GC for them. But again, given the
limited amount of SPU-private memory, it would probably not be a useful
approach.

A better use of the SPUs would be to run computations that can use manual
memory management (perhaps not using a heap at all), operating on small
chunks of data at a time. Such computations could be described in
a simpler language that is more amenable to parallelisation.
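
As a rough sketch of what "small chunks at a time" means in practice, the
pattern is: copy a chunk into a fixed local buffer, compute on it, copy the
results back. The code below runs on the host only and is an assumption
throughout: `process_in_chunks` is a hypothetical name, and the two
`Array.blit` calls merely stand in for the DMA transfers a real SPU program
would issue.

```ocaml
(* Apply [f] to every element of [src], touching at most [chunk] elements
   at a time through one fixed buffer that models the SPU local store. *)
let process_in_chunks ~chunk f (src : float array) =
  if chunk <= 0 then invalid_arg "process_in_chunks: chunk must be positive";
  let n = Array.length src in
  let dst = Array.make n 0.0 in
  let local = Array.make chunk 0.0 in      (* allocated once, never grown *)
  let pos = ref 0 in
  while !pos < n do
    let len = min chunk (n - !pos) in
    Array.blit src !pos local 0 len;       (* stand-in for "DMA in" *)
    for i = 0 to len - 1 do
      local.(i) <- f local.(i)             (* compute in local memory only *)
    done;
    Array.blit local 0 dst !pos len;       (* stand-in for "DMA out" *)
    pos := !pos + len
  done;
  dst
```

The only allocation inside the loop's working set is the one fixed buffer,
which is why this style needs no garbage collector on the coprocessor.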

>I still think that SPU on PS3 are only useful for games, or specialized 
>(e.g. graphical) applications.

Maybe, but there are Cell blades with more reasonable amounts of
memory, and for experimentation regarding how to use the processor, a
PS3 goes quite far and is very economical. Ground-breaking science has
been done in less than 256 MB.



* Re: [Caml-list] minithread (was OCaml on Sony PS3)
  2007-12-03 20:16                       ` minithread (was OCaml on Sony PS3) Christophe Raffalli
  2007-12-04 14:25                         ` [Caml-list] " David MENTRE
  2007-12-04 14:37                         ` Basile STARYNKEVITCH
@ 2007-12-04 17:33                         ` Gerd Stolpmann
  2007-12-04 18:00                         ` Mike Hogan
  3 siblings, 0 replies; 28+ messages in thread
From: Gerd Stolpmann @ 2007-12-04 17:33 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: caml-list

Am Montag, den 03.12.2007, 21:16 +0100 schrieb Christophe Raffalli:
> Now the point is this: each mini-thread has its own minor-heap whose
> size is given as the first argument with the following restrictions:

What could work is that you really switch to a copying collector. That
means that there are two minor heaps of fixed size, and a minor GC
copies one heap to the other. While one heap is used, the other is
unused. Every coprocessor would have such a pair of heaps. Of course,
this means that:

- You have very limited memory, and you have to set its max
  size in advance. This heap cannot be extended as needed.
  But this is ok for a coprocessor.
- You waste half of the memory. E.g. if you want to have
  64 K of heap, you have to buy 128 K. On the other hand,
  this saves a lot of code in the OCaml runtime, surely
  more than 64 K, so this is a net win.
- Maybe even this works: The minor GC is done by the main
  processor, and the other heap is also there. This could
  work if the GC is not invoked too often.

Such a copying collector is very small (the minor_gc.c file in the runtime
has fewer than 300 lines, so you could have a miniature memory manager in
only a few K). If you remove most features of the OCaml runtime, there
is some chance that it really fits into the remaining memory: no I/O,
no generic comparison, no backtraces, no MD5, no lexing, ... You
couldn't use these features in the SPU anyway. From the stdlib I would
only keep arrays and strings (no lists), and add a communication channel
with the main processor.
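
To show how little such a collector needs, here is a toy model of the
two-heap scheme described above, in the style of Cheney's algorithm. It is
a sketch under stated assumptions, not the OCaml runtime's minor GC: the
`value` and `heap` types, the block layout (one header word holding the
field count, then the fields) and all function names are invented for the
example, and allocation never triggers a collection by itself.

```ocaml
(* Toy values: immediate integers or pointers (indices) into the heap. *)
type value = Int of int | Ptr of int

type heap = {
  mutable from_space : value array;  (* half currently allocated into *)
  mutable to_space : value array;    (* idle half, used only during GC *)
  mutable free : int;                (* next free index in from_space *)
}

exception Heap_full

(* Both halves have a fixed size chosen in advance. *)
let create words =
  { from_space = Array.make words (Int 0);
    to_space = Array.make words (Int 0);
    free = 0 }

(* Bump allocation of a block: header [Int n] followed by n fields. *)
let alloc h fields =
  let n = Array.length fields in
  if h.free + n + 1 > Array.length h.from_space then raise Heap_full;
  let addr = h.free in
  h.from_space.(addr) <- Int n;
  Array.blit fields 0 h.from_space (addr + 1) n;
  h.free <- addr + n + 1;
  Ptr addr

(* Copy everything reachable from [roots] into the idle half, then swap
   the halves.  [roots] is updated in place with the new addresses. *)
let collect h roots =
  let next = ref 0 in
  let forward v =
    match v with
    | Int _ -> v
    | Ptr addr ->
      (match h.from_space.(addr) with
       | Ptr moved -> Ptr moved               (* already copied *)
       | Int n ->
         let dst = !next in
         Array.blit h.from_space addr h.to_space dst (n + 1);
         next := dst + n + 1;
         h.from_space.(addr) <- Ptr dst;      (* leave a forwarding pointer *)
         Ptr dst)
  in
  Array.iteri (fun i v -> roots.(i) <- forward v) roots;
  let scan = ref 0 in
  while !scan < !next do
    (match h.to_space.(!scan) with
     | Int n ->
       for i = 1 to n do
         h.to_space.(!scan + i) <- forward h.to_space.(!scan + i)
       done;
       scan := !scan + n + 1
     | Ptr _ -> assert false)                 (* copied headers are never Ptr *)
  done;
  let old = h.from_space in
  h.from_space <- h.to_space;
  h.to_space <- old;
  h.free <- !next
```

The cost is the one named in the list above: half of the reserved memory
sits idle between collections, in exchange for a collector that is a few
dozen lines long.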

Of course, programming in this context is not much fun. I
mean you'll get stack overflows really quickly. Maybe you can run very
simplistic algorithms. On the one hand I really have doubts whether it
makes sense to run OCaml in such an environment, but on the other hand
it's fun to have such a thing at all...

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: [Caml-list] minithread (was OCaml on Sony PS3)
  2007-12-03 20:16                       ` minithread (was OCaml on Sony PS3) Christophe Raffalli
                                           ` (2 preceding siblings ...)
  2007-12-04 17:33                         ` Gerd Stolpmann
@ 2007-12-04 18:00                         ` Mike Hogan
  3 siblings, 0 replies; 28+ messages in thread
From: Mike Hogan @ 2007-12-04 18:00 UTC (permalink / raw)
  To: caml-list


Neat stuff.  

For anyone genuinely interested in this problem, a look at CorePy may be in
order -- it is about the simplest model imaginable for processor-specific
access, and the Cell interface offers insight into the architectural
specifics that would need to be addressed for the Cell.  Not knowing Caml at
all, really, I wonder if a similar approach can be applied to Caml --
basically "escape" to specialized SPU code wrapped in an encapsulation (a la
"mini-thread"?).  If the SPU code can be autogenerated and transparently
integrated using a Caml-based DSL of some sort, then that would be even
better.

I was also wondering recently if there is any practical possibility of code
extraction from Coq to Python in order to make a verified CorePy application
(basically CorePy as an IL) -- or is this just swapping one very difficult
problem for another? 

Coq (which also seems to run fine on the PS3) seems to open up some
interesting possibilities, for example optimization proofs where some
algorithm "X" converted to a high-performance Cell-specific equivalent using
some magic transformation "Y" results in the algorithm "Cell-X" whose
results are equivalent to the original "X".  Furthermore, the
characteristics of the "Y"s developed along the way would seem to provide
formal insight into what a "CoreCaml" language might entail.  

As an aside, would the Y's be functors, "Cell" be a domain, and the inverse
of any A from "Cell" be the domain "algorithms that can be transformed to
"Cell" equivalents using functor A"? (Apologies in advance if this is a
stupid question beneath an answer.)


-- 
View this message in context: http://www.nabble.com/More-registers-in-modern-day-CPUs-tf4389938.html#a14156018
Sent from the Caml Discuss2 mailing list archive at Nabble.com.



end of thread, other threads:[~2007-12-04 18:00 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-09-06  6:20 More registers in modern day CPUs Tom
2007-09-06  7:17 ` [Caml-list] " skaller
2007-09-06  9:07 ` Richard Jones
2007-09-06 14:55 ` Chris King
2007-09-06 15:17   ` Brian Hurt
2007-09-06 15:54     ` Harrison, John R
2007-09-06 17:10       ` David MENTRE
2007-09-06 18:27         ` Harrison, John R
2007-09-06 18:28         ` Christophe Raffalli
2007-09-06 18:48           ` Brian Hurt
2007-09-06 18:48           ` Pal-Kristian Engstad
2007-11-20 15:32             ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Mike Hogan
2007-11-21 17:20               ` Richard Jones
2007-11-21 19:05                 ` [Caml-list] OCaml " Mike Hogan
2007-11-23  6:44                 ` Mike Hogan
2007-12-02 10:14               ` [Caml-list] OCalm " Xavier Leroy
2007-12-02 16:22                 ` Mike Hogan
2007-12-02 22:19                   ` Konrad Meyer
2007-12-03  0:09                     ` [Caml-list] OCaml " Mike Hogan
2007-12-03 20:16                       ` minithread (was OCaml on Sony PS3) Christophe Raffalli
2007-12-04 14:25                         ` [Caml-list] " David MENTRE
2007-12-04 14:37                         ` Basile STARYNKEVITCH
2007-12-04 16:25                           ` Mattias Engdegård
2007-12-04 17:33                         ` Gerd Stolpmann
2007-12-04 18:00                         ` Mike Hogan
2007-12-04  2:29                 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Gordon Henriksen
2007-09-06 20:48   ` [Caml-list] More registers in modern day CPUs Richard Jones
     [not found]   ` <20070906204524.GB10798@furbychan.cocan.org>
2007-09-06 20:59     ` Chris King
