From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 5154BBB81 for ; Tue, 21 Mar 2006 05:03:33 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k2L43WJg015311 for ; Tue, 21 Mar 2006 05:03:32 +0100 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id FAA20518 for ; Tue, 21 Mar 2006 05:03:31 +0100 (MET) Received: from cenn.mc.mpls.visi.com (cenn.mc.mpls.visi.com [208.42.156.9]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k2L43Uxr015308 for ; Tue, 21 Mar 2006 05:03:31 +0100 Received: from [192.168.42.2] (bhurt.dsl.visi.com [208.42.141.66]) by cenn.mc.mpls.visi.com (Postfix) with ESMTP id A3462816D; Mon, 20 Mar 2006 22:03:29 -0600 (CST) Date: Mon, 20 Mar 2006 22:04:14 -0600 (CST) From: Brian Hurt X-X-Sender: bhurt@localhost.localdomain To: Markus Mottl Cc: Robert Roessler , Caml-list Subject: Re: [Caml-list] Severe loss of performance due to new signal handling In-Reply-To: Message-ID: References: <441E760D.6010801@inria.fr> <441F57FD.90206@rftp.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Miltered: at concorde with ID 441F7B14.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 441F7B12.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; markus:01 mottl:01 assertion:01 os-dependent:01 byterun:01 byterun:01 gnuc:01 gcc:01 gnuc:01 typedef:01 sizeof:01 extern:01 nail:98 wrote:01 wrote:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On Mon, 20 Mar 2006, Markus Mottl wrote: > On 3/20/06, Robert Roessler wrote: >> >> At the risk of being "irrelevant", I wanted to nail down exactly what >> assertion is being made here: are we talking about directly executing >> in assembly code the relevant x86[-64]/ppc/whatever instructions for >> "read-and-clear", or going through OS-dependent access routines like >> Windows' InterlockedExchange()? > > > We are talking of the assembly code. See file byterun/signals_machdep.h, > which contains the corresponding macros. OK, poking around a little bit in byterun, I'm seeing this peice of code: for (signal_number = 0; signal_number < NSIG; signal_number++) { Read_and_clear(signal_state, caml_pending_signals[signal_number]); if (signal_state) caml_execute_signal(signal_number, 0); } with Read_and_clear being defined as: #if defined(__GNUC__) && defined(__i386__) #define Read_and_clear(dst,src) \ asm("xorl %0, %0; xchgl %0, %1" \ : "=r" (dst), "=m" (src) \ : "m" (src)) xchgl is the atomic operation (this is always atomic when referencing a memory location, regardless of the presence or absence of a lock prefix). Appropos of nothing, a better definition of that macro would be: #define Read_and_clear(dst,src) \ asm volatile ("xchgl %0, %1" \ : "=r" (dst), "+m" (src) \ : "0" (0)) as this gives gcc the choice of how to move 0 into the register (using an xor will still be a popular choice, but it'll occassionally do a movl depending upon instruction scheduling choices). Some more poking around tells me that NSIG is defined on Linux to be 64. I think the problem is not doing an atomic operation, but doing 64 of them. I'd be inclined to move to a bitset implementation- allowing you to replace 64 atomic instructions with 2. On the x86, you can use the lock bts instruction to set the bit. Some implementation like: #if defined(__GNUC__) && defined(__i386__) typedef unsigned long sigword_t; #define Read_and_clear(dst,src) \ asm volatile ("xchgl %0, %1" \ : "=r" (dst), "+m" (src) \ : "0" (0)) #define Set_sigflag(sigflags, NR) \ asm volatile ("lock bts %1, %0" \ : "+m" (*sigflags) \ : "rN" (NR) \ : "cc") ... #define SIGWORD_BITS (CHAR_BITS * sizeof(sigword_t)) #define NR_SIGWORDS ((NSIG + SIGWORD_BITS - 1)/SIGWORD_BITS) extern sigword_t caml_pending_signals[NR_SIGWORDS]; for (i = 0; i < NR_SIGWORDS; i++) { sigword_t temp; int j; Read_and_clear(temp, caml_pending_signals[i]); for (j = 0; temp != 0; j++) { if ((temp & 1ul) != 0) { caml_execute_signal((i * SIGWORD_BITS) + j, 0) } temp >>= 1; } } This is somewhat more code, but i, j, and temp would all end up in registers, and it'd be two atomic instructions, not 64. The x86 assembly code I can dash off from the top of my head. Similiar bits of assembly can be written for other CPUs- I just have to go dig out the right books. Brian