From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id EB7E7BBAF for ; Wed, 27 Oct 2010 11:36:57 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AkoEAF+Px0zZSMDje2dsb2JhbAChWBUBAQsLChYFH70JhUgEjVs X-IronPort-AV: E=Sophos;i="4.58,245,1286143200"; d="scan'208";a="64409257" Received: from fmmailgate02.web.de ([217.72.192.227]) by mail3-smtp-sop.national.inria.fr with ESMTP; 27 Oct 2010 11:36:57 +0200 Received: from smtp01.web.de ( [172.20.0.243]) by fmmailgate02.web.de (Postfix) with ESMTP id 24BCD17AD6FE9; Wed, 27 Oct 2010 11:33:52 +0200 (CEST) Received: from [78.43.204.177] (helo=frosties.localdomain) by smtp01.web.de with asmtp (TLSv1:AES256-SHA:256) (WEB.DE 4.110 #24) id 1PB2OG-000240-00; Wed, 27 Oct 2010 11:33:52 +0200 Received: from mrvn by frosties.localdomain with local (Exim 4.72) (envelope-from ) id 1PB2OF-0000Gh-JO; Wed, 27 Oct 2010 11:33:51 +0200 From: Goswin von Brederlow To: Jeremie Dimino Cc: Yaron Minsky , caml-list@inria.fr Subject: Re: [Caml-list] Asynchronous IO programming in OCaml In-Reply-To: <20101025172633.GC32282@aurora> ("Jeremie Dimino"'s message of "Mon, 25 Oct 2010 19:26:33 +0200") References: <044101cb7367$10f94b30$32ebe190$@com> <20101024225037.GA8999@aurora> <87y69mk6jm.fsf@frosties.localdomain> <20101025111022.GA32282@aurora> <20101025143317.GB32282@aurora> <20101025172633.GC32282@aurora> User-Agent: Gnus/5.110009 (No Gnus v0.9) XEmacs/21.4.22 (linux, no MULE) Date: Wed, 27 Oct 2010 11:33:51 +0200 Message-ID: <877hh4dlog.fsf@frosties.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Sender: goswin-v-b@web.de X-Sender: goswin-v-b@web.de X-Provags-ID: V01U2FsdGVkX18kcbORgs87XfvQ4iO4QOkifyAfrkPGiDBmloGR LPY3s8g9CuyiBTQFCHrMr3GpduGJS3h5qWATNy7MQ8Xf5JLsLq KFZ4q3ELg= X-Spam: no; 0.00; ocaml:01 yaron:01 minsky:01 fcntl:01 unistd:01 buffer:01 4096:01 buffer:01 4096:01 fcntl:01 unistd:01 parallelism:01 mutexes:01 synchronous:01 mfg:98 Jérémie Dimino writes: > On Mon, Oct 25, 2010 at 11:34:41AM -0400, Yaron Minsky wrote: >> I don't quite understand how this whole benchmark holds together.  Could >> you post the C code?  I don't understand the differences between (1), (2) >> and (3) well enough to explain where the factor of 100 comes in. > > Yes. Here is the code of the first program: > > ,---- > | #include > | #include > | #include > | #include > | > | int main() > | { > | int fd = open("data", O_RDONLY); > | char buffer[4096]; > | > | while (read(fd, buffer, 4096) > 0); > | > | close(fd); > | > | return 0; > | } > `---- Obvious example so nothing to comment. :) > the code of the second: > > ,---- > | #include > | #include > | #include > | #include > | #include > | > | int fd; > | char buffer[4096]; > | int done = 0; > | > | void *callback(void* data) > | { > | int count = read(fd, buffer, 4096); > | if (count == 0) done = 1; > | return NULL; > | } > | > | int main() > | { > | fd = open("data", O_RDONLY); > | > | while (!done) { > | pthread_t thread; > | pthread_create(&thread, NULL, callback, NULL); > | pthread_join(thread, NULL); > | } > | > | close(fd); > | > | return 0; > | } > `---- You aren't doing any multithreading. You are creating a thread and waiting for the thread to finish its read before strating a second. There are never ever 2 reads running in parallel. So all you do is add thread creation and destruction for every read to your first example. You should start multiple threads and let them read from different offsets (use pread) and only once they are all started join them all again. > and the third: > > ,---- > | #include > | #include > | #include > | #include > | #include > | > | int fd; > | char buffer[4096]; > | int done = 0; > | pthread_cond_t start = PTHREAD_COND_INITIALIZER; > | pthread_cond_t stop = PTHREAD_COND_INITIALIZER; > | pthread_mutex_t start_mutex = PTHREAD_MUTEX_INITIALIZER; > | pthread_mutex_t stop_mutex = PTHREAD_MUTEX_INITIALIZER; > | > | void *callback(void* data) > | { > | while (!done) { > | pthread_cond_wait(&start, &start_mutex); > | > | int count = read(fd, buffer, 4096); > | if (count == 0) done = 1; > | > | pthread_mutex_lock(&stop_mutex); > | pthread_cond_signal(&stop); > | pthread_mutex_unlock(&stop_mutex); > | } > | return NULL; > | } > | > | int main() > | { > | fd = open("data", O_RDONLY); > | > | pthread_cond_init(&start, NULL); > | pthread_cond_init(&stop, NULL); > | > | pthread_mutex_lock(&start_mutex); > | pthread_mutex_lock(&stop_mutex); > | > | pthread_t thread; > | pthread_create(&thread, NULL, callback, NULL); > | > | while (!done) { > | pthread_mutex_lock(&start_mutex); > | pthread_cond_signal(&start); > | pthread_mutex_unlock(&start_mutex); > | > | pthread_cond_wait(&stop, &stop_mutex); > | } > | > | pthread_join(thread, NULL); > | close(fd); > | > | return 0; > | } > `---- Again no parallelism at all. Instead of thread creation and destruction you now add mutexes between the reads. Again the only expected result is a slowdown. You should create X threads at the start and then repeadately give them work (an offset to read). Only when they are all busy you wait for one of them to become idle again. > Jérémie So far you have failed to do asynchronous IO. You examples all wait for the read to complete making it synchronous and then threads are obviously slower. Also sequential reads from a file should result in sequential reads from the disk (unless the filesystem fragments the file a lot or you created it that way). That is the ideal situation for the disks and kernels read-ahead. One advantage of doing many read/writes asynchronous is that the kernel can reorder the requests and potentially merge requests. Unless you crafted your input file to be fragmented you won't see this effect in a test with sequential reads. And that effect would be the only thing making multithreaded reads faster in a test like the above. MfG Goswin