[Caml-list] How is Async implemented?

caml-list - the Caml user's mailing list

* [Caml-list] How is Async implemented?
@ 2014-06-03 15:39 Dan Stark
  2014-06-03 16:29 ` David House
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Stark @ 2014-06-03 15:39 UTC (permalink / raw)
  To: caml-list

Hi all

I am trying to get a rough overview of how Async is implemented (or the
idea behind it) before I really dig into its source code.

I have the following questions:

*Q1:* Is Async like an event loop?

From the API and some docs on Async's usage, it feels quite like an
event loop.

You create a Deferred.t, which might be added to a queue, and a scheduler
behind the scenes might adjust the order in which the Deferred.t values in
the queue are run.

Am I correct?

*Q2:* Deferred.return and Deferred.bind

If I say

Deferred.return 1

It returns a Deferred.t, but inside the function *return* or *bind* an
"event" is somehow implicitly added to the default queue for scheduling,
right?

If I am correct above,

*Q3:* Does Async depend on -thread? Do the queue or scheduler need compiler
support?

I just need to understand the whole picture in a rough way first.

Thanks

Dan


* Re: [Caml-list] How is Async implemented?
  2014-06-03 15:39 [Caml-list] How is Async implemented? Dan Stark
@ 2014-06-03 16:29 ` David House
  2014-06-03 20:59   ` Dan Stark
  0 siblings, 1 reply; 7+ messages in thread
From: David House @ 2014-06-03 16:29 UTC (permalink / raw)
  To: Dan Stark; +Cc: OCaml Mailing List

There is a queue of jobs in the scheduler. The scheduler runs the jobs one
by one. Jobs may schedule other jobs. A job is a pair of type
['a * ('a -> unit)]: a value and the function to apply to it.
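
To make the job queue concrete, here is a toy sketch in plain OCaml. It is
not Async's actual code: the names [job], [schedule] and [run_all] are made
up for illustration, and a GADT is assumed as one way to hide the ['a] so
that jobs of different types can share a queue.

```ocaml
(* Toy model of a job queue; NOT Async's real implementation. *)

(* A job packs a function together with its argument. The constructor hides
   the type 'a, so jobs of different types fit in the same queue. *)
type job = Job : ('a -> unit) * 'a -> job

let jobs : job Queue.t = Queue.create ()

(* Hypothetical helper: enqueue [f x] to be run later. *)
let schedule f x = Queue.add (Job (f, x)) jobs

(* Run jobs one by one until the queue is empty. Jobs added by a running
   job are picked up by the same loop. *)
let run_all () =
  while not (Queue.is_empty jobs) do
    match Queue.take jobs with
    | Job (f, x) -> f x
  done

let log : string list ref = ref []
let record s = log := s :: !log

let () =
  schedule
    (fun n ->
      record (string_of_int n);
      schedule record "scheduled by a job")
    1;
  schedule record "second";
  run_all ();
  List.iter print_endline (List.rev !log)
```

The job scheduled from inside another job runs last, after everything that
was already in the queue.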

There's a thing called a deferred. ['a Deferred.t] is an initially empty
box that may become filled later with something of type ['a]. There is a
similar type called ['a Ivar.t] -- the difference is that ivars have a
function to actually fill in the value, whereas deferreds do not: a
deferred is a "read-only" view on an ivar.

You can wait on a deferred using bind. Doing [x >>= f] mutates the deferred
x to add f as a "handler". When a deferred is filled, it adds a job to the
scheduler for each handler it has.

Doing [Deferred.return 1] allocates a deferred which is already filled and
has no handlers. Binding on that will immediately schedule a job to run
your function. (The job is still scheduled though, rather than being run
immediately, to ensure that you don't have an immediate context switch --
in async, the only context switch points are the binds.)
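
The relationship between ivars, deferreds, bind and return can be sketched
with a toy model. This is an illustration under assumptions, not Async's
source: the module layout, [upon], and the handler list representation are
all invented here, and error handling is reduced to a single failwith.

```ocaml
(* Toy model of ivars, deferreds and bind; NOT Async's real implementation. *)

let jobs : (unit -> unit) Queue.t = Queue.create ()
let schedule job = Queue.add job jobs
let run_all () =
  while not (Queue.is_empty jobs) do (Queue.take jobs) () done

module Ivar : sig
  type 'a t
  type 'a deferred                 (* read-only view: there is no fill for it *)
  val create : unit -> 'a t
  val fill : 'a t -> 'a -> unit
  val read : 'a t -> 'a deferred
  val upon : 'a deferred -> ('a -> unit) -> unit
end = struct
  type 'a state = Empty of ('a -> unit) list | Full of 'a
  type 'a t = 'a state ref
  type 'a deferred = 'a t
  let create () = ref (Empty [])
  let fill t v =
    match !t with
    | Full _ -> failwith "Ivar.fill: already full"
    | Empty handlers ->
      t := Full v;
      (* One job per handler, in registration order. *)
      List.iter (fun h -> schedule (fun () -> h v)) (List.rev handlers)
  let read t = t
  let upon d h =
    match !d with
    | Empty hs -> d := Empty (h :: hs)      (* wait: remember the handler *)
    | Full v -> schedule (fun () -> h v)    (* scheduled, never run inline *)
end

(* An already-filled deferred with no handlers. *)
let return v =
  let i = Ivar.create () in
  Ivar.fill i v;
  Ivar.read i

(* [bind d f] registers a handler on [d] and returns a fresh deferred that
   is filled once the deferred produced by [f] is filled. *)
let bind d f =
  let result = Ivar.create () in
  Ivar.upon d (fun v -> Ivar.upon (f v) (Ivar.fill result));
  Ivar.read result

let ( >>= ) = bind

let trace : string list ref = ref []
let note s = trace := s :: !trace

let () =
  let d = return 1 >>= fun x -> note "handler ran"; return (x + 1) in
  note "bind returned";
  Ivar.upon d (fun v -> note (string_of_int v));
  run_all ();
  List.iter print_endline (List.rev !trace)
```

Even though the deferred is already full, "bind returned" is recorded before
"handler ran": the handler is queued as a job rather than called inside bind.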

The primitive operations that block are replaced with functions that return
deferreds, and go do their work in a separate thread. There's a thread pool
to make sure you don't use an unbounded number of threads. (I think the
default cap is 50 threads.) So yes, I think Async does depend on -thread.

There is an important optimisation: if you want to read or write to certain
file descriptors, that doesn't use a thread. Instead there's a central list
of such file descriptors. There's also a central list of all "timer events"
(e.g. deferreds that become determined after some amount of time). The
scheduler is actually based around a select loop. Each cycle does the
following:

  1. run all the jobs
  2. if more jobs have been scheduled, run those too
  3. keep going until there are no more jobs, or we hit the
     maximum-jobs-per-cycle cap
  4. sleep using select until a read fd is ready, or a write fd is ready,
     or a timer event is due to fire
  5. handle whichever of those events woke us up, then go round again
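
The cycle above can be sketched as a small simulation. Several things here
are assumptions made so that the example runs with the standard library
alone: a virtual clock stands in for the blocking select call, file
descriptors are left out entirely, and the names [after], [cycle], [go] and
the cap value are illustrative rather than Async's own.

```ocaml
(* Toy model of the scheduler loop; a virtual clock replaces select. *)

let jobs : (unit -> unit) Queue.t = Queue.create ()
let schedule job = Queue.add job jobs

(* Timer events as (due time, job), kept sorted by due time. *)
let timers : (float * (unit -> unit)) list ref = ref []
let now = ref 0.0

(* Hypothetical helper: run [job] once [span] seconds have passed. *)
let after span job =
  let due = !now +. span in
  timers := List.merge (fun (a, _) (b, _) -> compare a b) !timers [ (due, job) ]

let max_jobs_per_cycle = 500   (* illustrative value for the cap *)

(* Steps 1-3: run queued jobs, including newly scheduled ones, up to the cap. *)
let run_jobs () =
  let ran = ref 0 in
  while (not (Queue.is_empty jobs)) && !ran < max_jobs_per_cycle do
    (Queue.take jobs) ();
    incr ran
  done

(* One cycle. Returns false when there is nothing left to run or wait for. *)
let cycle () =
  run_jobs ();
  if not (Queue.is_empty jobs) then true   (* cap hit: go round again *)
  else
    match !timers with
    | [] -> false
    | (due, job) :: rest ->
      now := due;        (* step 4: a real loop would block in select here *)
      timers := rest;
      schedule job;      (* step 5: the event that fired becomes a job *)
      true

let go () = while cycle () do () done

let out : string list ref = ref []

let () =
  after 2.0 (fun () -> out := "timer at 2s" :: !out);
  after 1.0 (fun () -> out := "timer at 1s" :: !out);
  schedule (fun () -> out := "immediate job" :: !out);
  go ();
  List.iter print_endline (List.rev !out)
```

The already-queued job runs first, then the timers fire in order of their due
time regardless of the order in which they were registered.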

There's also a way to manually interrupt the scheduler. Blocking operations
other than reading/writing to fds do this: they run in a thread, grab the
async scheduler lock, fill in an ivar, then wake up the scheduler to ensure
timely running of the jobs they just scheduled. The async scheduler lock is
necessary because the scheduler itself is not re-entrant: you cannot have
multiple threads modifying the scheduler's internals.


* Re: [Caml-list] How is Async implemented?
  2014-06-03 16:29 ` David House
@ 2014-06-03 20:59   ` Dan Stark
  2014-06-03 22:33     ` Ashish Agarwal
  2014-06-03 23:17     ` Yaron Minsky
  0 siblings, 2 replies; 7+ messages in thread
From: Dan Stark @ 2014-06-03 20:59 UTC (permalink / raw)
  To: David House; +Cc: OCaml Mailing List

Hi David

Thank you very much for this comprehensive explanation.

Can I also ask who is responsible for the queue and scheduler?

Are they created and maintained by OCaml threads (internal to OCaml) or by
Async (a third-party library, meaning Async creates the job queue and has
its own scheduler)?

In addition, does the compiler get involved in handling Deferred.t?

I ask these questions because I am curious about what is happening in the
following:

Suppose we have a normal function:

let f1 () = print_endline "hello"; whatever_result;;


*Normally*, no matter what *whatever_result* is, when I do *let _ = f1 ();;*,
*print_endline "hello"* will be executed, am I right? For example, whether it
finally returns an int, a record, a Lazy.t, etc., "hello" would be printed.

However, if I do

let f2 () = print_endline "hello"; return 1;;


*let _ = f2 ();;* would do nothing unless I run the scheduler with *let _ =
ignore(Scheduler.go());;*

Since for *f2* I am not using any other special creation function and the
only special bit is *return 1* after *print_endline*, if the compiler
doesn't get involved, how can it know that the whole application of *f2()*
should be deferred to future execution?

Sorry if these verbose questions are boring. I am just trying to understand
more, and I guess I will eventually look into the code once I grasp the big
picture.

thanks

Dan










* Re: [Caml-list] How is Async implemented?
  2014-06-03 20:59   ` Dan Stark
@ 2014-06-03 22:33     ` Ashish Agarwal
  2014-06-03 23:17       ` Dan Stark
  2014-06-03 23:17     ` Yaron Minsky
  1 sibling, 1 reply; 7+ messages in thread
From: Ashish Agarwal @ 2014-06-03 22:33 UTC (permalink / raw)
  To: Dan Stark; +Cc: David House, OCaml Mailing List

When you use Async, you normally do `open Async.Std`, which shadows the
blocking functions from the standard library with Async versions. Thus, in
f2, it's not that the "return 1" part somehow changes the behavior of the
previous code. Rather, since you've written "return 1", you've presumably
done `open Async.Std`, so the print_endline function is actually the one
from Async. So no, the compiler doesn't get involved. Async is implemented
purely as a library.
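
The shadowing effect can be reproduced without Async at all. The sketch
below is only an analogy: [Toy_async], its [pending] queue, and its [go] are
hypothetical stand-ins, and the real library's print functions are more
involved. It uses [Stdlib], the current name for what was called Pervasives
at the time of this thread.

```ocaml
(* Toy illustration of shadowing via [open]; NOT how Async is written. *)

module Toy_async = struct
  let pending : string Queue.t = Queue.create ()

  (* Same name as the standard function, but it only queues the text. *)
  let print_endline s = Queue.add s pending

  (* Placeholder for Deferred.return; here it is just the identity. *)
  let return x = x

  (* Stand-in for running the scheduler: flush what was queued. *)
  let go () =
    Queue.iter Stdlib.print_endline pending;
    Queue.clear pending
end

open Toy_async

(* After the [open], this [print_endline] is Toy_async's, not Stdlib's. *)
let f2 () = print_endline "hello"; return 1

let result = f2 ()                    (* nothing appears on stdout yet *)
let queued = Queue.length pending     (* the text is waiting in the queue *)
let () = go ()                        (* "hello" is printed at this point *)
```

No compiler support is involved: ordinary name resolution picks the most
recently opened definition of print_endline, and writing
Stdlib.print_endline explicitly bypasses it.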



* Re: [Caml-list] How is Async implemented?
  2014-06-03 22:33     ` Ashish Agarwal
@ 2014-06-03 23:17       ` Dan Stark
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Stark @ 2014-06-03 23:17 UTC (permalink / raw)
  To: Ashish Agarwal; +Cc: David House, OCaml Mailing List

Hi Ashish

Ah, OK, I understand now.

You are right, if I do

let f () = Pervasives.print_endline "hello"; return 1;;


"hello" will be printed.

Thanks

Dan






* Re: [Caml-list] How is Async implemented?
  2014-06-03 20:59   ` Dan Stark
  2014-06-03 22:33     ` Ashish Agarwal
@ 2014-06-03 23:17     ` Yaron Minsky
  2014-06-03 23:51       ` Dan Stark
  1 sibling, 1 reply; 7+ messages in thread
From: Yaron Minsky @ 2014-06-03 23:17 UTC (permalink / raw)
  To: Dan Stark; +Cc: David House, OCaml Mailing List

For what it's worth, this gives a decent overview of Async (if I do
say so myself)

https://realworldocaml.org/v1/en/html/concurrent-programming-with-async.html

This isn't quite right, but a good mental model is to think of Async
as being single threaded.  Scheduler.go starts up the async scheduler
on the main thread, and you do indeed need to do that for any IO to
actually happen.

y


* Re: [Caml-list] How is Async implemented?
  2014-06-03 23:17     ` Yaron Minsky
@ 2014-06-03 23:51       ` Dan Stark
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Stark @ 2014-06-03 23:51 UTC (permalink / raw)
  To: Yaron Minsky; +Cc: David House, OCaml Mailing List

Hi Yaron

Yes, I am reading your book and am currently on this chapter, which is why
I became interested in how Async is implemented behind the scenes.

Thanks for all the help I got from here.

Dan




On Wed, Jun 4, 2014 at 12:17 AM, Yaron Minsky <yminsky@janestreet.com>
wrote:

> For what it's worth, this gives a decent overview of Async (if I do
> say so myself)
>
>
> https://realworldocaml.org/v1/en/html/concurrent-programming-with-async.html
>
> This isn't quite right, but a good mental model is to think of Async
> as being single threaded.  Scheduler.go starts up the async scheduler
> on the main thread, and you do indeed need to do that for any IO to
> actually happen.
>
> y
>
> On Tue, Jun 3, 2014 at 4:59 PM, Dan Stark <interlock.public@gmail.com>
> wrote:
> > Hi David
> >
> > Thank you very much for this comprehensive explanation.
> >
> > Can I also know who is responsible for the queue and scheduler?
> >
> > Are they created and maintained by OCaml thread (OCaml internal) or Async
> > (3rd party library, which means Async create the job queue and has its
> own
> > scheduler)?
> >
> > In addition, will the compiler got involved in handling Deferred.t?
> >
> > I ask above questions because I felt quite curious about what is
> happening
> > in the followings:
> >
> > Suppose we have a normal function:
> >
> >> let f1 () = print_endline "hello"; whatever_result;;
> >
> >
> > Normally, no matter what whatever_result is, when I do let _ = f1 ();;,
> > print_endline "hello" will be executed, am I right? For example, finally
> > returning an int or a record or a lazy.t, etc, "hello" would be printed
> out.
> >
> > However, if I do
> >
> >> let f2 () = print_endline "hello"; return 1;;
> >
> >
> > let _ = f2 ();; would do nothing unless I run the schedule let _ =
> > ignore(Scheduler.go());;
> >
> > Since for f2 I am not using any other special creation function and the
> > only special bit is return 1 after print_endline, if the compiler doesn't
> > get involved, how can the compiler know that the whole application of f2()
> > should be executed in the future?
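
A small sketch may help with the f2 question. OCaml is strict, so everything before the final expression of f2 runs at the moment f2 () is called; the compiler defers nothing. One way the output could still appear late is if the print function in scope only buffers its argument until the scheduler runs. That buffering behaviour is an assumption here, not something confirmed in this thread, and the names buffered_print_endline, pending, flushed and run_scheduler are all hypothetical. No Async is involved.

```ocaml
(* Toy model, no Async involved. All names are hypothetical. *)

(* Output that has been requested but not yet written. *)
let pending : string Queue.t = Queue.create ()

(* Output that has actually been "written", newest first. *)
let flushed : string list ref = ref []

(* Assumed behaviour: the print function only enqueues the line. *)
let buffered_print_endline s = Queue.add s pending

(* Stand-in for running the scheduler: write out everything pending. *)
let run_scheduler () =
  while not (Queue.is_empty pending) do
    flushed := Queue.pop pending :: !flushed
  done

(* Same shape as f2: a side effect, then a plain result. *)
let f2 () = buffered_print_endline "hello"; 1

let () =
  let r = f2 () in
  (* The call itself has already happened; only the output is delayed. *)
  Printf.printf "f2 returned %d, pending lines: %d\n" r (Queue.length pending);
  run_scheduler ();
  List.iter print_endline (List.rev !flushed)
```

In this model the body of f2 executes eagerly; what waits for the scheduler is only the flushing of the buffered line.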
> >
> > Sorry if the above questions are verbose or boring. I am just trying to
> > understand more, and I guess I will eventually look into the code once I
> > grasp the big picture.
> >
> > thanks
> >
> > Dan
> >
> > On Tue, Jun 3, 2014 at 5:29 PM, David House <dhouse@janestreet.com>
> wrote:
> >>
> >> There is a queue of jobs in the scheduler. The scheduler runs the jobs one
> >> by one. Jobs may schedule other jobs. A job is a pair ['a * ('a -> unit)]:
> >> a value and a function to apply to it.
> >>
> >> There's a thing called a deferred. ['a Deferred.t] is an initially empty
> >> box that may become filled later with something of type ['a]. There is a
> >> similar type called ['a Ivar.t] -- the difference is that ivars have a
> >> function to actually fill in the value, whereas deferreds do not: a
> >> deferred is a "read-only" view on an ivar.
> >>
> >> You can wait on a deferred using bind. Doing [x >>= f] mutates the
> >> deferred x to add f as a "handler". When a deferred is filled, it adds a
> >> job to the scheduler for each handler it has.
> >>
> >> Doing [Deferred.return 1] allocates a deferred which is already filled and
> >> has no handlers. Binding on that will immediately schedule a job to run
> >> your function. (The job is still scheduled though, rather than being run
> >> immediately, to ensure that you don't have an immediate context switch --
> >> in async, the only context switch points are the binds.)
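
The description of deferreds, ivars, bind and return above can be sketched in a few lines of plain OCaml. This is a toy model under my own assumptions, not Async's actual code: the names jobs, create, fill, upon and run_all are made up for illustration, a job is modelled as a plain closure, and the deferred type is simply an alias of the ivar type rather than a real read-only view.

```ocaml
(* Toy model only: not Async's real implementation. *)

(* The scheduler's job queue; a job is modelled as a closure. *)
let jobs : (unit -> unit) Queue.t = Queue.create ()

type 'a state =
  | Empty of ('a -> unit) list  (* handlers waiting for a value *)
  | Full of 'a

type 'a ivar = { mutable state : 'a state }

(* In the real library a deferred is a read-only view; here it is an alias. *)
type 'a deferred = 'a ivar

let create () = { state = Empty [] }

(* Filling schedules one job per handler; it does not run them. *)
let fill ivar v =
  match ivar.state with
  | Full _ -> failwith "ivar already filled"
  | Empty handlers ->
    ivar.state <- Full v;
    List.iter (fun h -> Queue.add (fun () -> h v) jobs) (List.rev handlers)

(* Register a handler; if the value is already there, schedule a job now. *)
let upon (d : 'a deferred) (h : 'a -> unit) =
  match d.state with
  | Full v -> Queue.add (fun () -> h v) jobs
  | Empty hs -> d.state <- Empty (h :: hs)

(* An already-filled deferred with no handlers. *)
let return v : 'a deferred = { state = Full v }

let bind (d : 'a deferred) (f : 'a -> 'b deferred) : 'b deferred =
  let result = create () in
  upon d (fun v -> upon (f v) (fun w -> fill result w));
  result

let ( >>= ) = bind

(* Run jobs until the queue is empty. *)
let run_all () =
  while not (Queue.is_empty jobs) do
    (Queue.pop jobs) ()
  done

let () =
  let d = return 1 >>= fun x -> print_endline "handler runs"; return (x + 1) in
  print_endline "bind returned";  (* the handler has not run yet *)
  run_all ();
  match d.state with
  | Full n -> Printf.printf "result: %d\n" n
  | Empty _ -> print_endline "still empty"
```

Even though return 1 is already filled, the handler only runs once run_all drains the queue, which mirrors the point that binding schedules a job rather than calling the function immediately.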
> >>
> >> The primitive operations that block are replaced with functions that
> >> return deferreds, and go do their work in a separate thread. There's a
> >> thread pool to make sure you don't use infinity threads. (I think the
> >> default cap is 50 threads.) I think yes, async does depend on -thread.
> >>
> >> There is an important optimisation: if you want to read or write to
> >> certain file descriptors, that doesn't use a thread. Instead there's a
> >> central list of such file descriptors. There's also a central list of all
> >> "timer events" (e.g. deferreds that become filled after some amount of
> >> time). The scheduler is actually based around a select loop. It does the
> >> following:
> >>
> >> 1. run all the jobs
> >> 2. if more jobs have been scheduled, run those too
> >> 3. keep going until there are no more jobs, or we hit the
> >>    maximum-jobs-per-cycle cap
> >> 4. sleep using select until a read fd is ready, or a write fd is ready,
> >>    or a timer event is due to fire
> >> 5. do that thing
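
The loop above can be sketched as follows. This is a toy model only: a simulated clock and a timer list stand in for select and real file descriptors, and every name here (schedule, after, run_cycle, wait_for_event, go), as well as the cap of 500, is hypothetical rather than taken from Async.

```ocaml
(* Toy model: a simulated clock stands in for select and real fds. *)

let jobs : (unit -> unit) Queue.t = Queue.create ()

(* Pending timer events as (due time, job) pairs. *)
let timers : (float * (unit -> unit)) list ref = ref []

(* Simulated current time. *)
let now = ref 0.0

(* Hypothetical value for the maximum-jobs-per-cycle cap. *)
let max_jobs_per_cycle = 500

let schedule job = Queue.add job jobs

let after delay job = timers := (!now +. delay, job) :: !timers

(* One cycle: run queued jobs, including newly scheduled ones, up to the cap. *)
let run_cycle () =
  let ran = ref 0 in
  while (not (Queue.is_empty jobs)) && !ran < max_jobs_per_cycle do
    (Queue.pop jobs) ();
    incr ran
  done

(* "Sleep" until the earliest timer is due, then schedule its job.
   Returns false when there is nothing left to wait for. *)
let wait_for_event () =
  match List.stable_sort (fun (a, _) (b, _) -> compare a b) !timers with
  | [] -> false
  | (due, job) :: rest ->
    if due > !now then now := due;
    timers := rest;
    schedule job;
    true

let go () =
  let running = ref true in
  while !running do
    run_cycle ();
    (* Only wait when the queue is drained; otherwise start another cycle. *)
    if Queue.is_empty jobs then running := wait_for_event ()
  done
```

A real scheduler would also poll file descriptors between cycles and would normally never return; this sketch stops once there are no jobs and no timers left, which makes it easy to try out.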
> >>
> >> There's also a way to manually interrupt the scheduler. Blocking
> >> operations other than reading/writing to fds do this: they run in a
> >> thread, grab the async scheduler lock, fill in an ivar, then wake up the
> >> scheduler to ensure timely running of the jobs they just scheduled. The
> >> async scheduler lock is necessary because the scheduler itself is not
> >> re-entrant: you cannot have multiple threads modifying the scheduler's
> >> internals.
> >>
> >>
> >> On 3 June 2014 16:39, Dan Stark <interlock.public@gmail.com> wrote:
> >>>
> >>> Hi all
> >>>
> >>> I am trying to get a rough overview of how Async is implemented (or the
> >>> idea behind it) before I really dig into its source code.
> >>>
> >>> I have the following questions:
> >>>
> >>> Q1: Is Async event-loop like?
> >>>
> >>> From the API and some docs for Async's usage, I feel it is quite like an
> >>> event loop.
> >>>
> >>> You create a Deferred.t, it might be added to a queue, and a scheduler
> >>> behind the scenes might adjust the running order of all the Deferred.t
> >>> values in the queue.
> >>>
> >>> Am I correct?
> >>>
> >>> Q2: Deferred.return and Deferred.bind
> >>>
> >>> If I say
> >>>
> >>>> Deferred.return 1
> >>>
> >>>
> >>> It will return a Deferred.t, but inside the function return or bind
> >>> somehow an "event" is implicitly added to the default queue for
> >>> scheduling, right?
> >>>
> >>> If I am correct above,
> >>>
> >>> Q3: Does Async depend on -thread? Does the queue or scheduler need
> >>> compiler support?
> >>>
> >>> I just need to understand the whole picture in a rough way first.
> >>>
> >>> Thanks
> >>>
> >>> Dan
> >>>
> >>
> >
>


end of thread, other threads:[~2014-06-03 23:52 UTC | newest]

Thread overview: 7+ messages
2014-06-03 15:39 [Caml-list] How is Async implemented? Dan Stark
2014-06-03 16:29 ` David House
2014-06-03 20:59   ` Dan Stark
2014-06-03 22:33     ` Ashish Agarwal
2014-06-03 23:17       ` Dan Stark
2014-06-03 23:17     ` Yaron Minsky
2014-06-03 23:51       ` Dan Stark
