From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <oliver@first.in-berlin.de>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled 
	version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by yquem.inria.fr (Postfix) with ESMTP id 16B45BB84
	for <caml-list@yquem.inria.fr>; Sat, 19 Jul 2008 14:41:24 +0200 (CEST)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AtsCALt9gUjAbSoIiGdsb2JhbACSPgEBAQ8gmyI
X-IronPort-AV: E=Sophos;i="4.31,215,1215381600"; 
   d="scan'208";a="13270970"
Received: from einhorn.in-berlin.de ([192.109.42.8])
  by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 19 Jul 2008 14:41:23 +0200
X-Envelope-From: oliver@first.in-berlin.de
Received: from einhorn.in-berlin.de (localhost [127.0.0.1])
	by einhorn.in-berlin.de (8.13.6/8.13.6/Debian-1) with ESMTP id m6JCfItn012241
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
	Sat, 19 Jul 2008 14:41:19 +0200
Received: (from www-data@localhost)
	by einhorn.in-berlin.de (8.13.6/8.13.6/Submit) id m6JCfI9o012239;
	Sat, 19 Jul 2008 14:41:18 +0200
X-Authentication-Warning: einhorn.in-berlin.de: www-data set sender to oliver@first.in-berlin.de using -f
Received: from dslb-088-073-068-085.pools.arcor-ip.net (dslb-088-073-068-085.pools.arcor-ip.net [88.73.68.85]) 
	by webmail.in-berlin.de (IMP) with HTTP 
	for <first@localhost>; Sat, 19 Jul 2008 14:41:18 +0200
Message-ID: <1216471278.4881e0ee770e7@webmail.in-berlin.de>
Date: Sat, 19 Jul 2008 14:41:18 +0200
From: Oliver Bandel <oliver@first.in-berlin.de>
To: Kuba Ober <ober.14@osu.edu>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] thousands of CPU cores
References: <c6c368760807092257v7a125fey534fbc6d5444d14@mail.gmail.com> <200807100944.29221.peng.zang@gmail.com> <200807101500.03079.jon@ffconsultancy.com> <200807151039.51519.ober.14@osu.edu>
In-Reply-To: <200807151039.51519.ober.14@osu.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.2.6
X-Scanned-By: MIMEDefang_at_IN-Berlin_e.V. on 192.109.42.8
X-Spam: no; 0.00; bandel:01 in-berlin:01 byte:01 wikipedia:01 wiki:01 speedup:01 amortize:01 byte:01 mmaped:01 mmaped:01 cheers:01 beginner's:01 ocaml:01 bug:01 sram:98 

Zitat von Kuba Ober <ober.14@osu.edu>:
[...]
> If you count the "efficiency" of such out-of-the-blue uncached truly
> random
> access in terms of clock cycles, current hardware may be 1-2 orders
> of
> magnitude less efficient than almost any 8-bit microcontroller out
> there...
> On most MCUs you can read a random byte out of the SRAM in say 1-4
> clock
> cycles. On your commonplace modern multicore CPU, it may take a
> hundred clock
> cycles to do the same, and essentially the same amount of time in
> terms of the
> wall clock (a 2GHz CPU has only 100 times faster clock than a run of
> the mill
> 20MHz MCU).
[...]

Given a RAM with a certain clock frequency, when
the Microcontroller works at the same frequency as the RAM,
and a CPU of a typical computer uses a much higher
frequency, it's normal that the CPU has to wait longer.
That's thze reasion why cache-RAM is used on more then one level.

But there are also processors, that can lookup RAM while at the same
time working on instructions that were fetched before.
Some DSPs are really fast... for example Analog Devices'
TigerShark can do between one and four operations in one
clock cycle (on average two instructions per cycle).
And itÄs a single-core DSP.

When it runs at 600 MHz, it can do 32-Bit Floating-Point
operations  with 3.6 GFlops.

http://www.analog.com/en/embedded-processing-dsp/tigersharc/content/tigersharc_benchmarks/fca.html


OK, this is a very specific processor and comparing it with
CPU's of the computers we are using today, is maybe
a littlebid inproper. But I just wanted to say: a good design can do a
lot
things possible, which may not be used in many CPUs.

For example the TigerShark also has links to other processors
and therefore can be used in multi-processor systems.

http://www.analog.com/en/embedded-processing-dsp/tigersharc/content/tigersharc_processor__architectural_features/fca.html

http://www.analog.com/en/embedded-processing-dsp/tigersharc/content/tigersharc_architectural_backgrounder/fca.html


And the idea of Links between processors was used in the 1980's
by T-400 and T-800 from INMOS:
   http://en.wikipedia.org/wiki/Transputer

It seems, they were too far ahead to be commercially successful.

Ciao,
   Oliver

P.S.: Remember the Altivec unit of G4-processor, for example...
        ...they also gave good speedup and Math-speed.
       So, a good CPU-design can give advanatges in speed.


>
> What I'm trying to say is that such random, small memory accesses
> highlight
> the inherent message passing / transactional overhead of the hardware
> implementation. Those overheads amortize when you run real number
> tasks,
> not a made-up cold single byte access of course. But they are there.
>
> It's akin to mmaped file: you can use CPU's MMU to implement it in
> the
> usual OS/stock hardware framework, or you can have an FPGA handle
> memory
> transactions and talk directly to the hard drive. It doesn't change
> the
> fact that it's still a mmaped file :)
>
> Cheers, Kuba
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>