Message-Id: <4.3.2.7.2.20021127090821.032eae90@localhost>
Date: Wed, 27 Nov 2002 10:04:55 -0800
To: Damien Doligez, caml-list@inria.fr
From: Chris Hecker
Subject: Re: [Caml-list] Why systhreads?
References: <4.3.2.7.2.20021125134858.037b4ef8@localhost>

[sorry for the longwinded response]

>Do you really think so ? In my experience, 95% of the costs of threads
>(with shared memory) are in the debugging (of the threads implementation,
>AND of the programs). Cheap SMP machines and HT do not change the
>cost/benefit equation very much.

Like I said in my previous mail, I think it's going to be similar to MMX/SSE. The performance improvement you get is not worth the development and support headache, until the technology is ubiquitous. Once it's everywhere, it becomes worthwhile. I'm using a middleware library for my game right now that requires MMX. That's finally an acceptable requirement.
On Xbox, which is a fixed platform with a known CPU, every game uses SSE, because it's just guaranteed to be there, and can make a big difference if you're willing to work with its problems (using structure of arrays layout, etc.). And let's not even talk about the insanity of the PS2 architecture. Xbox2 will use a CPU with HT, because there won't be any Intel CPUs that don't have HT, so it'll get used there by apps.

Now, as you point out, threads are complicated to design, program, and debug. I agree with this completely. As I said, I never use threaded designs if I can avoid it. However, if it becomes very easy to spawn very small scale parallel threads in C on an HT processor, then it could make a big performance difference for some algorithms. People are working on C compilers that have these extensions built in. Intel's got one now. They'll be first, everyone will ignore it until the installed base is big enough, and then it'll go into MSVC. MMX, SSE, and 3DNow! followed the exact same path.

The reason this is different (or has the potential to be different) with HT compared to discrete CPUs is that a) HT is free so it will be ubiquitous eventually, and b) HT drops the thread context switch time to 0. It's not worth starting up a thread on another CPU to do a few instructions' worth of work, but it is conceivable that it would be for HT. Again, I think this will mirror MMX. The original version of MMX had a horrible context switch time, and overloaded the FPU registers. It was worthless. They fixed it. I assume there are similar gotchas with the first version of HT. But, in a couple revs, they'll fix it and it will be possible to have a second thread do half the work in a small loop, with no overhead (there'll be a hw thread pool, hw wait on mutex/sleep, etc.).

The reason HT can make a performance difference is that your app is stalling in the CPU all the time anyway.
Even tight loops aren't memory bandwidth bound (unless it's a copy or fill), they're memory access (latency) bound; there's a huge difference between the two. HT can take advantage of the latter and give you way more utilization, even on a small-scale loop. In theory, anyway. :) But, as I said, I have [non-Intel] colleagues who have seen big wins with HT on some applications, enough to make them say, "huh, this actually works!"

Now, you could just say, "hey, caml's not for that kind of low-level stuff", which is a fine response. However, I've been doing a lot of low-level stuff in my game, all in caml (linear algebra, 3d transforms, bitmap operations, etc.), and it's so close to being good enough to just stay in caml and not have to drop to C. I understand the point of using the right tool for the job, but there is overhead (both cognitive and development-process-wise, both important) associated with hooking something in C, and so it would be really nice to stay in caml all the time.

Bringing this back to HT, this is the kind of feature that requires INRIA to do it, because I don't think anybody else understands the GC. By contrast, I could probably get an SSE code generator working if I thought it was worth it. But there's no way I could multithread the GC. :)

>More important, you don't need threads and shared memory to make use
>of a SMP machine. Any kind of parallelism will do. Several processes
>with message-passing can easily get you 100% load on all your processors.
>Also, message-passing is more general; for example it will work on clusters.

Sure, but an HT CPU shares L1 and L2 caches between the threads. This means that you really want your threads to be working on the same data and code if you can help it. It'll still work for processes, but you're going to thrash way more than if you're doing local stuff.

Again, I'm not an HT zealot; I don't even know if it's going to succeed.
But, I do think it has the potential to have a big impact on performance-oriented programming, and it would be great if there's a plan for supporting it in caml if it actually works. If it's simply not possible to multithread the GC well, then that's that. But it seems like something you want to have simmering on the mental back burner in case it turns out you want it later.

Sorry for the huge post,
Chris