On Fri, May 9, 2008 at 11:00 PM, Till Varoquaux wrote:
> First of all, let's try to stop the squabbling and have some actual
> discussions with actual content (trolling is very tempting and I am the
> first to fall for it). OCaml is extremely nice but not perfect. Other
> languages have other tradeoffs, and INRIA is not here to fulfill all
> our desires.
>
> On Fri, May 9, 2008 at 9:40 PM, Gerd Stolpmann wrote:
> >
> > On Friday, 09.05.2008, 19:10 +0100, Jon Harrop wrote:
> >> On Friday 09 May 2008 12:12:00 Gerd Stolpmann wrote:
> >> > I think the parallelism capabilities are already excellent. We have
> >> > been able to implement the application backend of Wink's people
> >> > search in O'Caml, and it is of course a highly parallel system of
> >> > programs. This is not the same class raytracers or desktop
> >> > parallelism fall into - this is highly professional supercomputing.
> >> > I'm talking about a cluster of ~20 computers with something like
> >> > 60 CPUs.
> >> >
> >> > Of course, we did not use multithreading very much. We are relying
> >> > on multi-processing (both "fork"ed style and separately started
> >> > programs), and multiplexing (i.e. application-driven
> >> > micro-threading). I especially like the latter: Doing multiplexing
> >> > in O'Caml is fun, and a substitute for most applications of
> >> > multithreading. For example, you want to query multiple remote
> >> > servers in parallel: Very easy with multiplexing, whereas the
> >> > multithreaded counterpart would quickly run into scalability
> >> > problems (threads are heavy-weight, and need a lot of resources).
> >>
> >> If OCaml is good for concurrency on distributed systems that is
> >> great, but it is completely different to CPU-bound parallelism on
> >> multicores.
> >
> > You sound like somebody who tries to sell hardware :-)
> >
> > Well, our algorithms are quite easy to parallelize. I don't see a
> > difference in whether they are CPU-bound or disk-bound - we also have
> > lots of CPU-bound stuff, and the parallelization strategies are the
> > same.
> >
> > The important thing is whether the algorithm can be formulated in a
> > way so that state mutations are rare, or can at least be done in a
> > "cache-friendly" way. Such algorithms exist for a lot of problems. I
> > don't know which problems you want to solve, but it sounds as if they
> > were special problems. Like for most industries, most of our problems
> > are simply "do the same for N objects" where N is very large, and
> > sometimes "sort data", also for large N.
> >
> >> > In our case, the mutable data structures that count are on disk.
> >> > Everything else is only temporary state.
> >>
> >> Exactly. That is a completely different kettle of fish to writing
> >> high performance numerical codes for scientific computing.
> >
> > I don't understand. Relying on disk for sharing state is a big
> > problem for us, but unavoidable. Disk is slow memory with a very
> > special timing. Experience shows that even accessing state over the
> > network is cheaper than over disk. Often, we end up designing our
> > algorithms around the disk access characteristics. Compared to that,
> > access to RAM-backed state over the network is fast and easy.
>
> shm_open shares memory through file descriptors and, under
> linux/glibc, this is done using /dev/shm. You can mmap this as a
> bigarray and, voila, shared memory. This is quite nice for numerical
> computation, plus you get closures etc... in your forks.
> Oh and COW on modern OS's makes this very cheap.

Yes, that's the kind of approach I like (a rough sketch of the shared
Bigarray plus fork pattern is at the end of this message).

- Do not forget to do a Gc.compact before forking, to avoid collecting
  the same unreachable data in each fork.

- For sharing complex data, you can marshal into a shared Bigarray. If
  the speed of Marshal becomes a bottleneck, a specialized Marshal that
  skips most of the checks and the byte-oriented, compact serialization
  that extern.c currently does could speed things up.

- A means for inter-process synchronization/communication is still
  needed. A userland solution using a shared-memory consensus algorithm
  (which would probably require some C or assembly for atomic
  operations) could be cheap.

-- Berke
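
P.S. A minimal, illustrative sketch of the pattern referred to above:
map a file under /dev/shm as a float Bigarray, Gc.compact, then fork
workers that each fill a disjoint slice of the shared array. It assumes
a recent OCaml where Unix.map_file is available (older releases exposed
Bigarray.Array1.map_file for the same job); the /dev/shm path, worker
count and array size are arbitrary choices for the example.

  let n = 1_000_000

  let () =
    (* Backing file in /dev/shm so the mapping lives in RAM. *)
    let path = "/dev/shm/ocaml_shared_demo" in
    let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
    (* Shared mapping: writes in the children are visible to the parent.
       map_file grows the backing file to the requested size. *)
    let a =
      Bigarray.array1_of_genarray
        (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |])
    in
    (* Compact before forking so each child does not re-trace the same
       unreachable data. *)
    Gc.compact ();
    let nworkers = 4 in
    let chunk = n / nworkers in
    for w = 0 to nworkers - 1 do
      match Unix.fork () with
      | 0 ->
          (* Child: fill its own slice of the shared array, then exit. *)
          let lo = w * chunk and hi = (w + 1) * chunk - 1 in
          for i = lo to hi do a.{i} <- 2.0 *. float_of_int i done;
          exit 0
      | _pid -> ()
    done;
    (* Parent: wait for all children, then read the shared results. *)
    for _ = 1 to nworkers do ignore (Unix.wait ()) done;
    Printf.printf "a.{42} = %f\n" a.{42};
    Unix.close fd;
    Sys.remove path

Because the mapping is shared, the children's writes reach the parent
without any copying, and COW keeps the fork itself cheap; the only
synchronization here is the final Unix.wait.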