From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <goswin-v-b@web.de>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by yquem.inria.fr (Postfix) with ESMTP id EB7E7BBAF
	for <caml-list@yquem.inria.fr>; Wed, 27 Oct 2010 11:36:57 +0200 (CEST)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AkoEAF+Px0zZSMDje2dsb2JhbAChWBUBAQsLChYFH70JhUgEjVs
X-IronPort-AV: E=Sophos;i="4.58,245,1286143200"; 
   d="scan'208";a="64409257"
Received: from fmmailgate02.web.de ([217.72.192.227])
  by mail3-smtp-sop.national.inria.fr with ESMTP; 27 Oct 2010 11:36:57 +0200
Received:  from smtp01.web.de  ( [172.20.0.243])
	by fmmailgate02.web.de (Postfix) with ESMTP id 24BCD17AD6FE9;
	Wed, 27 Oct 2010 11:33:52 +0200 (CEST)
Received: from [78.43.204.177] (helo=frosties.localdomain)
	by smtp01.web.de with asmtp (TLSv1:AES256-SHA:256)
	(WEB.DE 4.110 #24)
	id 1PB2OG-000240-00; Wed, 27 Oct 2010 11:33:52 +0200
Received: from mrvn by frosties.localdomain with local (Exim 4.72)
	(envelope-from <goswin-v-b@web.de>)
	id 1PB2OF-0000Gh-JO; Wed, 27 Oct 2010 11:33:51 +0200
From: Goswin von Brederlow <goswin-v-b@web.de>
To: Jeremie Dimino <jeremie@dimino.org>
Cc: Yaron Minsky <yminsky@gmail.com>, caml-list@inria.fr
Subject: Re: [Caml-list] Asynchronous IO programming in OCaml
In-Reply-To: <20101025172633.GC32282@aurora> ("Jeremie Dimino"'s message of
	"Mon, 25 Oct 2010 19:26:33 +0200")
References: <044101cb7367$10f94b30$32ebe190$@com>
	<AANLkTikgf2qF4y=o+S8PYUxZoCrCG-TRcovUq8uB=y7k@mail.gmail.com>
	<A7C6ED92-10BA-444B-A381-FC6AB7666932@recoil.org>
	<20101024225037.GA8999@aurora> <87y69mk6jm.fsf@frosties.localdomain>
	<20101025111022.GA32282@aurora>
	<AANLkTimP77PDEChW3Yt6uUy_qxYpj6EOZWQ_==id-LBC@mail.gmail.com>
	<20101025143317.GB32282@aurora>
	<AANLkTikZvCMQaOFknx4TBKUESpYcc2riRUexpU4QnFht@mail.gmail.com>
	<20101025172633.GC32282@aurora>
User-Agent: Gnus/5.110009 (No Gnus v0.9) XEmacs/21.4.22 (linux, no MULE)
Date: Wed, 27 Oct 2010 11:33:51 +0200
Message-ID: <877hh4dlog.fsf@frosties.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: goswin-v-b@web.de
X-Sender: goswin-v-b@web.de
X-Provags-ID: V01U2FsdGVkX18kcbORgs87XfvQ4iO4QOkifyAfrkPGiDBmloGR
	LPY3s8g9CuyiBTQFCHrMr3GpduGJS3h5qWATNy7MQ8Xf5JLsLq
	KFZ4q3ELg=
X-Spam: no; 0.00; ocaml:01 yaron:01 minsky:01 fcntl:01 unistd:01 buffer:01 4096:01 buffer:01 4096:01 fcntl:01 unistd:01 parallelism:01 mutexes:01 synchronous:01 mfg:98 

Jérémie Dimino <jeremie@dimino.org> writes:

> On Mon, Oct 25, 2010 at 11:34:41AM -0400, Yaron Minsky wrote:
>>    I don't quite understand how this whole benchmark holds together.  Could
>>    you post the C code?  I don't understand the differences between (1), (2)
>>    and (3) well enough to explain where the factor of 100 comes in.
>
> Yes. Here is the code of the first program:
>
> ,----
> | #include <sys/types.h>
> | #include <sys/stat.h>
> | #include <fcntl.h>
> | #include <unistd.h>
> | 
> | int main()
> | {
> |   int fd = open("data", O_RDONLY);
> |   char buffer[4096];
> | 
> |   while (read(fd, buffer, 4096) > 0);
> | 
> |   close(fd);
> | 
> |   return 0;
> | }
> `----

Obvious example, so nothing to comment on. :)

> the code of the second:
>
> ,----
> | #include <sys/types.h>
> | #include <sys/stat.h>
> | #include <fcntl.h>
> | #include <unistd.h>
> | #include <pthread.h>
> | 
> | int fd;
> | char buffer[4096];
> | int done = 0;
> | 
> | void *callback(void* data)
> | {
> |   int count = read(fd, buffer, 4096);
> |   if (count == 0) done = 1;
> |   return NULL;
> | }
> | 
> | int main()
> | {
> |   fd = open("data", O_RDONLY);
> | 
> |   while (!done) {
> |     pthread_t thread;
> |     pthread_create(&thread, NULL, callback, NULL);
> |     pthread_join(thread, NULL);
> |   }
> | 
> |   close(fd);
> | 
> |   return 0;
> | }
> `----

You aren't doing any multithreading. You are creating a thread and
waiting for that thread to finish its read before starting the next one.
There are never two reads running in parallel. So all you do is add
thread creation and destruction for every read to your first example.

You should start multiple threads and let them read from different
offsets (use pread), and join them only once they have all been
started.
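
A minimal sketch of what I mean, not a tuned benchmark. The thread
count, the striping of blocks across threads and the names
parallel_read, reader_main and struct reader are my own assumptions;
only the 4096 byte block size comes from your programs. A read error
simply ends that thread's loop.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

#define NUM_THREADS 4      /* assumption: arbitrary thread count */
#define BLOCK_SIZE  4096   /* same block size as the quoted programs */

struct reader {
  int fd;           /* shared descriptor; pread does not move its offset */
  int index;        /* which stripe of blocks this thread reads */
  long long bytes;  /* bytes read by this thread */
};

/* Thread i reads blocks i, i + N, i + 2N, ... so no block is read twice. */
static void *reader_main(void *arg)
{
  struct reader *r = arg;
  char buffer[BLOCK_SIZE];                      /* private buffer per thread */
  off_t offset = (off_t)r->index * BLOCK_SIZE;
  ssize_t count;

  while ((count = pread(r->fd, buffer, BLOCK_SIZE, offset)) > 0) {
    r->bytes += count;
    offset += (off_t)NUM_THREADS * BLOCK_SIZE;
  }
  return NULL;
}

/* Returns the total number of bytes read, or -1 if setup failed. */
long long parallel_read(const char *path)
{
  pthread_t threads[NUM_THREADS];
  struct reader readers[NUM_THREADS];
  long long total = 0;
  int started = 0;
  int fd = open(path, O_RDONLY);

  if (fd < 0) return -1;

  /* Start all threads first ... */
  for (int i = 0; i < NUM_THREADS; i++) {
    readers[i] = (struct reader){ fd, i, 0 };
    if (pthread_create(&threads[i], NULL, reader_main, &readers[i]) != 0)
      break;
    started++;
  }

  /* ... and only then join them, so their reads can overlap. */
  for (int i = 0; i < started; i++) {
    pthread_join(threads[i], NULL);
    total += readers[i].bytes;
  }

  close(fd);
  return started == NUM_THREADS ? total : -1;
}
```

Calling parallel_read("data") would take the place of the loop in your
second program. Each thread has its own buffer, so the threads do not
overwrite each other's data.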

> and the third:
>
> ,----
> | #include <sys/types.h>
> | #include <sys/stat.h>
> | #include <fcntl.h>
> | #include <unistd.h>
> | #include <pthread.h>
> | 
> | int fd;
> | char buffer[4096];
> | int done = 0;
> | pthread_cond_t start = PTHREAD_COND_INITIALIZER;
> | pthread_cond_t stop = PTHREAD_COND_INITIALIZER;
> | pthread_mutex_t start_mutex = PTHREAD_MUTEX_INITIALIZER;
> | pthread_mutex_t stop_mutex = PTHREAD_MUTEX_INITIALIZER;
> | 
> | void *callback(void* data)
> | {
> |   while (!done) {
> |     pthread_cond_wait(&start, &start_mutex);
> | 
> |     int count = read(fd, buffer, 4096);
> |     if (count == 0) done = 1;
> | 
> |     pthread_mutex_lock(&stop_mutex);
> |     pthread_cond_signal(&stop);
> |     pthread_mutex_unlock(&stop_mutex);
> |   }
> |   return NULL;
> | }
> | 
> | int main()
> | {
> |   fd = open("data", O_RDONLY);
> | 
> |   pthread_cond_init(&start, NULL);
> |   pthread_cond_init(&stop, NULL);
> | 
> |   pthread_mutex_lock(&start_mutex);
> |   pthread_mutex_lock(&stop_mutex);
> | 
> |   pthread_t thread;
> |   pthread_create(&thread, NULL, callback, NULL);
> | 
> |   while (!done) {
> |     pthread_mutex_lock(&start_mutex);
> |     pthread_cond_signal(&start);
> |     pthread_mutex_unlock(&start_mutex);
> | 
> |     pthread_cond_wait(&stop, &stop_mutex);
> |   }
> | 
> |   pthread_join(thread, NULL);
> |   close(fd);
> | 
> |   return 0;
> | }
> `----

Again no parallelism at all. Instead of thread creation and destruction
you now add mutexes between the reads. Again the only expected result is
a slowdown.

You should create X threads at the start and then repeatedly give them
work (an offset to read). Only when they are all busy do you wait for
one of them to become idle again.
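
Roughly like the following sketch. The worker count, the single-slot
handoff and the names pool_read, worker_main and struct pool are
assumptions of mine for illustration. The dispatcher blocks only while
the offset it handed out last has not been picked up yet, which in
practice means every worker is busy with a read.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

#define NUM_WORKERS 4     /* assumption: the "X" above, chosen arbitrarily */
#define BLOCK_SIZE  4096

struct pool {
  pthread_mutex_t lock;
  pthread_cond_t work_ready;  /* dispatcher -> workers: an offset is pending */
  pthread_cond_t slot_free;   /* workers -> dispatcher: offset taken, or EOF */
  int fd;
  off_t pending;              /* offset waiting to be picked up */
  int has_work, shutdown, eof, error;
  long long bytes;
};

static void *worker_main(void *arg)
{
  struct pool *p = arg;
  char buffer[BLOCK_SIZE];

  pthread_mutex_lock(&p->lock);
  for (;;) {
    /* The mutex is held here, as pthread_cond_wait requires. */
    while (!p->has_work && !p->shutdown)
      pthread_cond_wait(&p->work_ready, &p->lock);
    if (p->shutdown) break;

    off_t offset = p->pending;
    p->has_work = 0;
    pthread_cond_signal(&p->slot_free);

    pthread_mutex_unlock(&p->lock);     /* do the read without the lock */
    ssize_t count = pread(p->fd, buffer, BLOCK_SIZE, offset);
    pthread_mutex_lock(&p->lock);

    if (count > 0) {
      p->bytes += count;
    } else {                            /* end of file or read error */
      if (count < 0) p->error = 1;
      p->eof = 1;
      pthread_cond_signal(&p->slot_free);
    }
  }
  pthread_mutex_unlock(&p->lock);
  return NULL;
}

/* Returns the total number of bytes read, or -1 on error. */
long long pool_read(const char *path)
{
  struct pool p = { .pending = 0 };     /* remaining fields start at zero */
  pthread_t workers[NUM_WORKERS];
  off_t next = 0;
  int started = 0;

  p.fd = open(path, O_RDONLY);
  if (p.fd < 0) return -1;
  pthread_mutex_init(&p.lock, NULL);
  pthread_cond_init(&p.work_ready, NULL);
  pthread_cond_init(&p.slot_free, NULL);

  while (started < NUM_WORKERS &&
         pthread_create(&workers[started], NULL, worker_main, &p) == 0)
    started++;

  pthread_mutex_lock(&p.lock);
  if (started == 0) p.error = p.eof = 1;
  while (!p.eof) {
    /* Wait only while the previous offset is still unclaimed. */
    while (p.has_work && !p.eof)
      pthread_cond_wait(&p.slot_free, &p.lock);
    if (p.eof) break;
    p.pending = next;
    p.has_work = 1;
    next += BLOCK_SIZE;
    pthread_cond_signal(&p.work_ready);
  }
  p.shutdown = 1;                       /* in-flight reads still finish */
  pthread_cond_broadcast(&p.work_ready);
  pthread_mutex_unlock(&p.lock);

  for (int i = 0; i < started; i++) pthread_join(workers[i], NULL);
  pthread_cond_destroy(&p.work_ready);
  pthread_cond_destroy(&p.slot_free);
  pthread_mutex_destroy(&p.lock);
  close(p.fd);
  return p.error ? -1 : p.bytes;
}
```

Offsets are handed out in increasing order, so any offset still pending
when a worker reports end of file lies past the end and can be dropped.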

> Jérémie


So far you have failed to do asynchronous IO. Your examples all wait for
each read to complete, which makes them synchronous, and then the
threaded versions are obviously slower.

Also, sequential reads from a file should result in sequential reads
from the disk (unless the filesystem fragments the file a lot or you
created it that way). That is the ideal situation for the disk's and the
kernel's read-ahead. One advantage of doing many reads/writes
asynchronously is that the kernel can reorder the requests and
potentially merge them. Unless you crafted your input file to be
fragmented you won't see this effect in a test with sequential reads.
And that effect would be the only thing making multithreaded reads
faster in a test like the above.
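
For comparison, here is a sketch of reads that really are in flight at
the same time without any threads in your own code, using POSIX AIO
(aio_read, aio_error, aio_suspend, aio_return from <aio.h>). The batch
size and the name aio_read_file are assumptions of mine. Whether the
requests reach the kernel as true asynchronous requests depends on the
libc implementation, and older glibc versions need -lrt to link this.

```c
#define _XOPEN_SOURCE 700
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define IN_FLIGHT  8      /* assumption: requests queued per batch */
#define BLOCK_SIZE 4096

/* Returns the total number of bytes read, or -1 on error. */
long long aio_read_file(const char *path)
{
  char buffers[IN_FLIGHT][BLOCK_SIZE];
  struct aiocb cbs[IN_FLIGHT];
  long long total = 0;
  off_t next = 0;
  int eof = 0, error = 0;
  int fd = open(path, O_RDONLY);

  if (fd < 0) return -1;

  while (!eof) {
    int queued = 0;

    /* Queue a whole batch of reads before waiting for any of them. */
    for (int i = 0; i < IN_FLIGHT; i++) {
      memset(&cbs[i], 0, sizeof cbs[i]);
      cbs[i].aio_fildes = fd;
      cbs[i].aio_buf = buffers[i];
      cbs[i].aio_nbytes = BLOCK_SIZE;
      cbs[i].aio_offset = next;
      if (aio_read(&cbs[i]) != 0) break;   /* could not queue this one */
      next += BLOCK_SIZE;
      queued++;
    }
    if (queued == 0) { error = 1; break; }

    /* Collect the results of everything that was queued. */
    for (int i = 0; i < queued; i++) {
      const struct aiocb *wait_list[1] = { &cbs[i] };

      while (aio_error(&cbs[i]) == EINPROGRESS)
        aio_suspend(wait_list, 1, NULL);

      ssize_t count = aio_return(&cbs[i]);
      if (count > 0) {
        total += count;
      } else {                             /* end of file or read error */
        if (count < 0) error = 1;
        eof = 1;
      }
    }
  }

  close(fd);
  return error ? -1 : total;
}
```

With a sequential, unfragmented file I would not expect this to beat
your first program either, for the read-ahead reasons given above.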

MfG
        Goswin