From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA04559 for caml-red; Mon, 11 Dec 2000 18:29:39 +0100 (MET) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA04757 for ; Sat, 9 Dec 2000 14:37:56 +0100 (MET) Received: from moutvdom00.kundenserver.de (moutvdom00.kundenserver.de [195.20.224.149]) by nez-perce.inria.fr (8.11.1/8.10.0) with ESMTP id eB9Dbtr10182 for ; Sat, 9 Dec 2000 14:37:56 +0100 (MET) Received: from [195.20.224.208] (helo=mrvdom01.schlund.de) by moutvdom00.kundenserver.de with esmtp (Exim 2.12 #2) id 144kCE-0002Dr-00; Sat, 9 Dec 2000 14:37:50 +0100 Received: from drms-3e364b52.pool.mediaways.net ([62.54.75.82] helo=ice.gerd-stolpmann.de) by mrvdom01.schlund.de with esmtp (Exim 2.12 #2) id 144kCA-0008CI-00; Sat, 9 Dec 2000 14:37:46 +0100 Received: from localhost (localhost [[UNIX: localhost]]) by ice.gerd-stolpmann.de (8.9.3/8.9.3) id OAA23603; Sat, 9 Dec 2000 14:37:39 +0100 From: Gerd Stolpmann Reply-To: gerd@gerd-stolpmann.de Organization: privat To: Markus Mottl Subject: Re: features of PCRE-OCaml Date: Sat, 9 Dec 2000 14:12:13 +0100 X-Mailer: KMail [version 1.0.28] Content-Type: text/plain Cc: Miles Egan , OCAML References: <20001206015139.D31140@miss.wu-wien.ac.at> <00120816434509.00625@ice> <20001209040315.D26367@miss.wu-wien.ac.at> In-Reply-To: <20001209040315.D26367@miss.wu-wien.ac.at> MIME-Version: 1.0 Message-Id: <0012091437390B.00625@ice> Content-Transfer-Encoding: 8bit Sender: weis@pauillac.inria.fr On Sat, 09 Dec 2000, Markus Mottl wrote: >Gerd Stolpmann schrieb am Friday, den 08. December 2000: >> There are two functions making it easy: enter_blocking_section and >> leave_blocking_section. For example, the stub for the read syscall of the Unix >> library: > >Ok, I have found an article by Xavier on these functions: > > http://caml.inria.fr/archives/199905/msg00035.html > >So if I am not mistaken, a function that calls the GC or allocates memory >on the OCaml-heap cannot be considered reentrant even if its semantics >is otherwise referentially transparent. This means that just "tagging" >a function as "pure" is no guarantee that it won't mess up the runtime >when e.g. calling the GC concurrently - right? For example, the situation must not occur where one thread is initializing memory and is interrupted by another thread allocating memory and calling the GC. One precondition of the GC is that memory is always initialized. "Reentrancy" is an abstract view on the function interface; it is not true for lower coding levels because (heap) memory is nothing but a large global variable implicitly shared by every piece of code. >In other terms I can put those functions around the largest section of >C-code that doesn't interfere with the OCaml-runtime system - then I >should be safe. I think so. >The only question now is: would it really pay for pattern matching in the >PCRE? I have taken a look at the implementation of these functions and on >their use, but have only found cases where some function really blocks for >either an indefinite (e.g. read) or at least potentially very long amount >of time (e.g. gethostbyaddr, which might need to contact a nameserver). > >Without threads we won't benefit, anyway, and if we use threads, there >is a small overhead associated with calling these functions. Pattern >matching maybe does not eat up so much time in the average case that this >is justified. Any experiences or suggestions when using these functions >is advisable? I would say it depends on the problem size. For example, when searching in a long text it is definitely worth-while to release the masterlock. The more interesting case is the average text processing program with many invocations of the PCRE engine with average problem sizes. The question is whether the sum of all invocations is big enough such that an effect is measurable. Ideally, I can imagine a two processor system in which one CPU only executes Caml code, and the other only regexps. From the Caml CPU's point of view, the PCRE calls are (ideally) cost-free (because they are delegated to the other CPU). However, there is a synchronization overhead, and nothing is won if the Caml CPU must spend more time with synchronization than it would spend with executing the regexp itself. I think it is worth an experiment. Gerd -- ---------------------------------------------------------------------------- Gerd Stolpmann Telefon: +49 6151 997705 (privat) Viktoriastr. 100 64293 Darmstadt EMail: gerd@gerd-stolpmann.de Germany ----------------------------------------------------------------------------