caml-list - the Caml user's mailing list
From: Goswin von Brederlow <goswin-v-b@web.de>
To: Jeremie Dimino <jeremie@dimino.org>
Cc: Yaron Minsky <yminsky@gmail.com>, caml-list@inria.fr
Subject: Re: [Caml-list] Asynchronous IO programming in OCaml
Date: Wed, 27 Oct 2010 11:33:51 +0200
Message-ID: <877hh4dlog.fsf@frosties.localdomain>
In-Reply-To: <20101025172633.GC32282@aurora> ("Jeremie Dimino"'s message of "Mon, 25 Oct 2010 19:26:33 +0200")

Jérémie Dimino <jeremie@dimino.org> writes:

> On Mon, Oct 25, 2010 at 11:34:41AM -0400, Yaron Minsky wrote:
>>    I don't quite understand how this whole benchmark holds together.  Could
>>    you post the C code?  I don't understand the differences between (1), (2)
>>    and (3) well enough to explain where the factor of 100 comes in.
>
> Yes. Here is the code of the first program:
>
> ,----
> | #include <sys/types.h>
> | #include <sys/stat.h>
> | #include <fcntl.h>
> | #include <unistd.h>
> | 
> | int main()
> | {
> |   int fd = open("data", O_RDONLY);
> |   char buffer[4096];
> | 
> |   while (read(fd, buffer, 4096) > 0);
> | 
> |   close(fd);
> | 
> |   return 0;
> | }
> `----

Obvious example so nothing to comment. :)

> the code of the second:
>
> ,----
> | #include <sys/types.h>
> | #include <sys/stat.h>
> | #include <fcntl.h>
> | #include <unistd.h>
> | #include <pthread.h>
> | 
> | int fd;
> | char buffer[4096];
> | int done = 0;
> | 
> | void *callback(void* data)
> | {
> |   int count = read(fd, buffer, 4096);
> |   if (count == 0) done = 1;
> |   return NULL;
> | }
> | 
> | int main()
> | {
> |   fd = open("data", O_RDONLY);
> | 
> |   while (!done) {
> |     pthread_t thread;
> |     pthread_create(&thread, NULL, callback, NULL);
> |     pthread_join(thread, NULL);
> |   }
> | 
> |   close(fd);
> | 
> |   return 0;
> | }
> `----

You aren't doing any multithreading. You are creating a thread and
waiting for it to finish its read before starting the next one.
There are never two reads running in parallel, so all you do is add
thread creation and destruction for every read to your first example.

You should start multiple threads and let them read from different
offsets (use pread) and only once they are all started join them all
again.
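
A minimal sketch of that structure, not a tuned benchmark. The names
parallel_read, read_slice and struct slice are made up for illustration,
and splitting the file into one contiguous range per thread is only an
assumption about how the offsets could be handed out:

```c
#define _XOPEN_SOURCE 700  /* for pread() */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

/* Hypothetical per-thread work description: one byte range of the file. */
struct slice {
  int fd;
  off_t begin;      /* first byte to read */
  off_t end;        /* one past the last byte to read */
  long long bytes;  /* filled in by the thread: bytes actually read */
};

static void *read_slice(void *arg)
{
  struct slice *s = arg;
  char buffer[BLOCK_SIZE];  /* private buffer, nothing shared between threads */
  off_t pos = s->begin;

  while (pos < s->end) {
    size_t want = BLOCK_SIZE;
    if ((off_t)want > s->end - pos) want = (size_t)(s->end - pos);
    /* pread takes an explicit offset, so threads do not race on the
       file position the way concurrent read() calls would. */
    ssize_t n = pread(s->fd, buffer, want, pos);
    if (n <= 0) break;      /* EOF or error ends this slice */
    pos += n;
    s->bytes += n;
  }
  return NULL;
}

/* Read the first `size` bytes of `fd` with `nthreads` threads.
   Returns the number of bytes read, or -1 on setup failure. */
long long parallel_read(int fd, off_t size, int nthreads)
{
  assert(nthreads > 0);
  pthread_t *threads = malloc(nthreads * sizeof *threads);
  struct slice *slices = malloc(nthreads * sizeof *slices);
  if (!threads || !slices) { free(threads); free(slices); return -1; }

  off_t chunk = (size + nthreads - 1) / nthreads;
  int started = 0;

  /* Start all threads first ... */
  for (int i = 0; i < nthreads; i++) {
    off_t begin = (off_t)i * chunk;
    off_t end = begin + chunk;
    if (begin > size) begin = size;
    if (end > size) end = size;
    slices[i] = (struct slice){ fd, begin, end, 0 };
    if (pthread_create(&threads[i], NULL, read_slice, &slices[i]) != 0) break;
    started++;
  }

  /* ... and only then join them, so the reads can overlap. */
  long long total = 0;
  for (int i = 0; i < started; i++) {
    pthread_join(threads[i], NULL);
    total += slices[i].bytes;
  }
  free(threads);
  free(slices);
  return started == nthreads ? total : -1;
}
```

Calling parallel_read(fd, size, 4) on the "data" file, with size taken
from fstat, would replace the create/join loop in the second program.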

> and the third:
>
> ,----
> | #include <sys/types.h>
> | #include <sys/stat.h>
> | #include <fcntl.h>
> | #include <unistd.h>
> | #include <pthread.h>
> | 
> | int fd;
> | char buffer[4096];
> | int done = 0;
> | pthread_cond_t start = PTHREAD_COND_INITIALIZER;
> | pthread_cond_t stop = PTHREAD_COND_INITIALIZER;
> | pthread_mutex_t start_mutex = PTHREAD_MUTEX_INITIALIZER;
> | pthread_mutex_t stop_mutex = PTHREAD_MUTEX_INITIALIZER;
> | 
> | void *callback(void* data)
> | {
> |   while (!done) {
> |     pthread_cond_wait(&start, &start_mutex);
> | 
> |     int count = read(fd, buffer, 4096);
> |     if (count == 0) done = 1;
> | 
> |     pthread_mutex_lock(&stop_mutex);
> |     pthread_cond_signal(&stop);
> |     pthread_mutex_unlock(&stop_mutex);
> |   }
> |   return NULL;
> | }
> | 
> | int main()
> | {
> |   fd = open("data", O_RDONLY);
> | 
> |   pthread_cond_init(&start, NULL);
> |   pthread_cond_init(&stop, NULL);
> | 
> |   pthread_mutex_lock(&start_mutex);
> |   pthread_mutex_lock(&stop_mutex);
> | 
> |   pthread_t thread;
> |   pthread_create(&thread, NULL, callback, NULL);
> | 
> |   while (!done) {
> |     pthread_mutex_lock(&start_mutex);
> |     pthread_cond_signal(&start);
> |     pthread_mutex_unlock(&start_mutex);
> | 
> |     pthread_cond_wait(&stop, &stop_mutex);
> |   }
> | 
> |   pthread_join(thread, NULL);
> |   close(fd);
> | 
> |   return 0;
> | }
> `----

Again no parallelism at all. Instead of thread creation and destruction
you now add a mutex/condition-variable handshake around every read.
Again the only expected result is a slowdown.

You should create X threads at the start and then repeatedly give them
work (an offset to read). Only when they are all busy do you wait for
one of them to become idle again.
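
Roughly like this, as a sketch under some assumptions: pool_read, worker
and struct pool are invented names, and the single-slot handoff (one
pending offset at a time) is just one simple way to make the main thread
block only when no worker is free to take the next job:

```c
#define _XOPEN_SOURCE 700  /* for pread() */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

/* Hypothetical shared state: one pending job slot guarded by a mutex. */
struct pool {
  pthread_mutex_t lock;
  pthread_cond_t work_ready;  /* a job was posted, or shutdown was requested */
  pthread_cond_t slot_free;   /* a worker took the pending job */
  int fd;
  off_t offset;               /* offset of the pending job */
  int has_job;
  int shutdown;
  long long bytes;            /* total bytes read by all workers */
};

static void *worker(void *arg)
{
  struct pool *p = arg;
  char buffer[BLOCK_SIZE];    /* private buffer per worker */

  pthread_mutex_lock(&p->lock);
  for (;;) {
    while (!p->has_job && !p->shutdown)
      pthread_cond_wait(&p->work_ready, &p->lock);
    if (!p->has_job) break;   /* shutdown and nothing left to do */
    off_t offset = p->offset;
    p->has_job = 0;
    pthread_cond_signal(&p->slot_free);
    pthread_mutex_unlock(&p->lock);

    /* The read itself runs without the lock, so reads can overlap. */
    ssize_t n = pread(p->fd, buffer, BLOCK_SIZE, offset);

    pthread_mutex_lock(&p->lock);
    if (n > 0) p->bytes += n;
  }
  pthread_mutex_unlock(&p->lock);
  return NULL;
}

/* Read the first `size` bytes of `fd` using up to `nworkers` threads.
   Returns the number of bytes read, or -1 if no worker could be started. */
long long pool_read(int fd, off_t size, int nworkers)
{
  assert(nworkers > 0);
  struct pool p = { .fd = fd };
  pthread_t *threads = malloc(nworkers * sizeof *threads);
  if (!threads) return -1;
  pthread_mutex_init(&p.lock, NULL);
  pthread_cond_init(&p.work_ready, NULL);
  pthread_cond_init(&p.slot_free, NULL);

  /* Create the threads once, up front. */
  int started = 0;
  while (started < nworkers &&
         pthread_create(&threads[started], NULL, worker, &p) == 0)
    started++;

  pthread_mutex_lock(&p.lock);
  for (off_t off = 0; started > 0 && off < size; off += BLOCK_SIZE) {
    /* Blocks only while the previous job is still waiting for a worker,
       i.e. while every worker is busy. */
    while (p.has_job)
      pthread_cond_wait(&p.slot_free, &p.lock);
    p.offset = off;
    p.has_job = 1;
    pthread_cond_signal(&p.work_ready);
  }
  p.shutdown = 1;
  pthread_cond_broadcast(&p.work_ready);
  pthread_mutex_unlock(&p.lock);

  for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);
  long long total = started > 0 ? p.bytes : -1;
  pthread_cond_destroy(&p.slot_free);
  pthread_cond_destroy(&p.work_ready);
  pthread_mutex_destroy(&p.lock);
  free(threads);
  return total;
}
```

Unlike the third program, the condition waits here sit in loops that
re-check a flag under the mutex, so a signal sent while nobody is
waiting is not lost.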

> Jérémie


So far you have failed to do asynchronous IO. Your examples all wait for
the read to complete, which makes it synchronous, and then threads are
obviously slower.

Also sequential reads from a file should result in sequential reads from
the disk (unless the filesystem fragments the file a lot or you created
it that way). That is the ideal situation for the disk's and the kernel's
read-ahead. One advantage of doing many reads/writes asynchronously is
that the kernel can reorder the requests and potentially merge
them. Unless you crafted your input file to be fragmented you won't
see this effect in a test with sequential reads. And that effect would
be the only thing making multithreaded reads faster in a test like the
above.

MfG
        Goswin

