From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gclci-caml-list@m.gmane.org>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by yquem.inria.fr (Postfix) with ESMTP id 6DDCBBC57
	for <caml-list@yquem.inria.fr>; Wed, 17 Nov 2010 12:34:36 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Aq0DAP9L40xQW+UMgWdsb2JhbACiQRUBARYiIsBMhUsEilg
X-IronPort-AV: E=Sophos;i="4.59,210,1288566000"; 
   d="scan'208";a="78765630"
Received: from lo.gmane.org ([80.91.229.12])
  by mail4-smtp-sop.national.inria.fr with ESMTP; 17 Nov 2010 12:34:36 +0100
Received: from list by lo.gmane.org with local (Exim 4.69)
	(envelope-from <gclci-caml-list@m.gmane.org>)
	id 1PIgHZ-0005ES-FO
	for caml-list@inria.fr; Wed, 17 Nov 2010 12:34:33 +0100
Received: from avelizy-155-1-94-54.w90-35.abo.wanadoo.fr ([90.35.89.54])
        by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00
        for <caml-list@inria.fr>; Wed, 17 Nov 2010 12:34:33 +0100
Received: from sylvain by avelizy-155-1-94-54.w90-35.abo.wanadoo.fr with local (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00
        for <caml-list@inria.fr>; Wed, 17 Nov 2010 12:34:33 +0100
X-Injected-Via-Gmane: http://gmane.org/
To: caml-list@inria.fr
From: Sylvain Le Gall <sylvain@le-gall.net>
Subject: Re: SMP multithreading
Date: Wed, 17 Nov 2010 11:34:19 +0000 (UTC)
Message-ID: <slrnie7fdr.r67.sylvain@gallu.homelinux.org>
References: <20101115182737.42b8dcae@loki.yggdrasil.draxit.de>
 <slrnie4vah.r67.sylvain@gallu.homelinux.org>
 <8762vwdx09.fsf@frosties.localnet>
X-Complaints-To: usenet@dough.gmane.org
X-Gmane-NNTP-Posting-Host: avelizy-155-1-94-54.w90-35.abo.wanadoo.fr
User-Agent: slrn/pre1.0.0-18 (Linux)
X-Spam: no; 0.00; le-gall:01 le-gall:01 speedups:01 ocaml:01 runtime:01 speedup:01 speedup:01 ocaml:01 cameleon:01 maillist:98 acb:98 15.:98 1.5:98 utilize:98 threads:01 

On 17-11-2010, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> Sylvain Le Gall <sylvain@le-gall.net> writes:
>
>> Hi,
>>
>> On 15-11-2010, Wolfgang Draxinger <wdraxinger.maillist@draxit.de> wrote:
>>> Hi,
>>>
>>> I've just read
>>> http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html
>>> in particular this paragraph:
>>>| What about hyperthreading?  Well, I believe it's the last convulsive
>>>| movement of SMP's corpse :-)  We'll see how it goes market-wise.  At
>>>| any rate, the speedups announced for hyperthreading in the Pentium 4
>>>| are below a factor of 1.5; probably not enough to offset the overhead
>>>| of making the OCaml runtime system thread-safe.
>>>
>>> This reads just like the "640k ought be enough for everyone". Multicore
>>> systems are the standard today. Even the cheapest consumer machines
>>> come with at least two cores. Once can easily get 6 core machines today.
>>>
>>> Still thinking SMP was a niche and was dying?
>>>
>>
>> Hyperthreading was never remarkable about performance or whatever and is
>> probably not pure SMP (emulated SMP maybe?).
>
> Hyperthreading is a hack to better utilize idle cpu sub units. The CPU
> has multiple complete sets of registers, one per hyper thread. Execution
> of the threads is interleaved. Now when one thread is doing some
> floating point operation the cpu switches over to another thread and
> lets it do some integer aritmetic. But that assumes the threads are
> using different sub units. If they are using the same unit then they
> just block each other and no speedup occurs.
>
> The speedup of hyperthreading is purely from avoiding dead cycles when
> one thread waits for something. On te other hand the cache is shared
> between threads so per thread it is smaller and more easily
> trashed. Hyperthreading can be much slower too.
>

Indeed, the HT extension was designed to reduce pipeline bubbles, which
most of the time occurs when you need to load data from a slow memory
(slow = RAM as opposed to L1/L2 cache). 

In the old time of my P4, ocaml was performing quite well on the
processor. One story about it: while compiling cameleon on it, I often
get into "thermal warning" (the CPU was overheating). I think it could
have been related to the fact the CPU idle level was very low (e.g. no
pipeline bubble). I always thought that this was related to the fact the
minor heap can be stored inside the cache and that reduces the hit/miss
factor (i.e. avoid fetching data in RAM). I have never really tested
this hypothesis. Maybe you can tell me your opinion about this?

Regards,
Sylvain Le Gall