caml-list - the Caml user's mailing list
* Concurrency for services
@ 2005-07-12 22:40 Christophe TROESTLER
  2005-07-13  4:33 ` [Caml-list] " John Skaller
  2005-07-13 11:27 ` Gerd Stolpmann
  0 siblings, 2 replies; 5+ messages in thread
From: Christophe TROESTLER @ 2005-07-12 22:40 UTC (permalink / raw)
  To: O'Caml Mailing List

Hi,

I have been confronted a couple of times with the following situation
and I am looking for advice on a good (the best, if possible :)
reusable solution.

Imagine you want to build a server library (e.g. an LPD daemon, a file
server, a web service, ...).  In broad terms, you can describe it as a
protocol for which you will handle the various "events" by callbacks.
You do not want to wire any concurrency model into your library but
instead provide the user with the appropriate functions so they can use
their favorite concurrency scheme.  My question is: what is the best way
to do that?

Perhaps naively, I am thinking along these lines.  There are three
points where a possibility of concurrency naturally offers itself:

1. several threads/processes can listen (run "accept") on the same
   socket;

2. each time a connection is accepted, one may decide to process it in
   a new thread/process (or send it to a pool of threads/processes,...);

3. each time one calls a callback (that only makes sense if several
   callbacks can be called independently for a single connection).

It seems to me that 3. belongs to the design of the library and cannot
easily be abstracted.  1. and 2., however, are basically the same from
one library to the next.  For 2., one has to distinguish whether one
uses threads or processes to know when the accept socket has to be
closed.  So one ends up with 3 functions:

val socket : unit -> Unix.file_descr
val accept_fork :
  fork:((Unix.file_descr -> unit) -> int * Unix.file_descr) ->
  'a connection_handler -> Unix.inet_addr -> unit
val accept_threads :
  ?thread:((unit -> unit) -> unit) ->
  'a connection_handler -> Unix.inet_addr -> unit

The [connection_handler] is an abstract type that is created by a
[handle_connection] which states which callback(s) to use.  The file
descriptor in [fork] is a communication channel for the child to send
messages to the parent (the parent may need to react to some commands
in the protocol, e.g. "shutdown").  A simple usage is

let () = accept_threads (handle_connection f) (socket())
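
For instance (just a sketch, and only one of many possible choices),
the [fork] argument could be a plain [Unix.fork] plus a pipe, the
returned int being the child's pid and the returned descriptor the
parent's end of the child-to-parent channel:

(* Sketch of a possible [fork] argument for [accept_fork]: spawn a
   child process running [handler], connected to the parent by a pipe.
   The child gets the write end, the parent keeps (pid, read end). *)
let fork_child (handler : Unix.file_descr -> unit)
    : int * Unix.file_descr =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      Unix.close rd;          (* child: keep only the write end *)
      handler wr;             (* handle the connection, talk to parent *)
      exit 0
  | pid ->
      Unix.close wr;          (* parent: keep only the read end *)
      (pid, rd)

The forking variant would then be invoked like the threads example
above, passing [~fork:fork_child].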

Of course, there usually will be many optional arguments to tailor the
behavior of these functions.  I wonder:

i. are the above functions able to take care of just about any concurrency
   model one can imagine (new thread/process, pool of threads,
   dispatching to a set of machines,...);

ii. are they designed well enough so that they can form a signature on
    which one can build usual concurrency models through functors?

I do not have much experience with concurrency models, so your
input is very much appreciated.

Regards,
ChriS



* Re: [Caml-list] Concurrency for services
  2005-07-12 22:40 Concurrency for services Christophe TROESTLER
@ 2005-07-13  4:33 ` John Skaller
  2005-07-13  8:59   ` Richard Jones
  2005-07-13 11:27 ` Gerd Stolpmann
  1 sibling, 1 reply; 5+ messages in thread
From: John Skaller @ 2005-07-13  4:33 UTC (permalink / raw)
  To: Christophe TROESTLER; +Cc: O'Caml Mailing List


On Wed, 2005-07-13 at 00:40 +0200, Christophe TROESTLER wrote:
> Hi,

> Imagine you want to build a server library (e.g. an LPD daemon, a file
> server, a web service, ...).  In broad terms, you can describe it as a
> protocol for which you will handle the various "events" by callbacks.
> You do not want to wire any concurrency model into your library but
> instead provide the user with the appropriate functions so they can use
> their favorite concurrency scheme.  My question is: what is the best way
> to do that?

The way the Felix model handles that is:

(a) The code is represented by a library which 'reads' 
abstract events of some type determined by the library.

(b) The programmer who is the client of this library
is responsible for all scheduling, threading, and  I/O.

That is, the library leaves the I/O and scheduling up to
the "operating system", which the client programmer is
required to write.

A twist with the Felix model is that the code is written
with 'read' commands, but the Felix translator
control-inverts (twists) the code into event-driven code.

Actually a version for Ocaml is possible although I
haven't tried to create one yet -- Felix targets C++
but it should work for any language which supports
classes with virtual functions and switches.

[Anyone want to take on an Ocaml back end?]

The system generates a class with a 'resume()' method
which looks like this in C++

con_t *resume() {
  switch (pc) {
    case 1: ....

          // read variable is translated to this:
          read=true;            // flag we need an event
          read_ptr = &variable; // where to deliver it
          pc = 2;               // set return address
          return this;
    case 2: ....

  }
}

An Ocaml version would look the same. The point is that the
library code incorporates all the abstract business logic,
protocol rules, etc., but leaves out all the physical I/O
and timing issues.
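
For instance, a hand-written Ocaml sketch of the same shape could be
(this is only an illustration of the idea, not actual Felix output):

(* One control-inverted procedure.  A driver calls [resume]; whenever
   [want_read] is set it obtains an event somehow (select, ...), stores
   it in [variable], clears the flag and calls [resume] again. *)
type con = {
  mutable pc : int;            (* return address *)
  mutable want_read : bool;    (* flag: we need an event *)
  mutable variable : string;   (* where to deliver it *)
}

let resume (c : con) : con option =
  match c.pc with
  | 1 ->
      (* ... code before the read ... *)
      c.want_read <- true;     (* flag we need an event *)
      c.pc <- 2;               (* set return address *)
      Some c                   (* give control back to the driver *)
  | 2 ->
      (* ... code after the read, using c.variable ... *)
      None                     (* this continuation is finished *)
  | _ -> None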

I would love to recommend examining the Felix package;
however, there aren't any good examples yet.

It was originally designed to run with an existing C++ framework
using ACE, ACN, TCAP, and other telephony stuff --
a million lines of C++ code -- where it provided the
'business logic' required by each client (a client
being a national telecoms carrier).

Examples of business logic rules include 1800,
911, prepaid, conference, and other telephony
services and 'products' offered by the client.

In this case the processing model used included
setting up a database server, plus a multi-CPU
Solaris box with a worker thread per CPU, plus
a collection of threads to do the event collation,
I/O, database caching, load balancing, etc etc etc.


-- 
John Skaller <skaller at users dot sourceforge dot net>
Download Felix: http://felix.sf.net


* Re: [Caml-list] Concurrency for services
  2005-07-13  4:33 ` [Caml-list] " John Skaller
@ 2005-07-13  8:59   ` Richard Jones
  2005-07-13 14:10     ` John Skaller
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Jones @ 2005-07-13  8:59 UTC (permalink / raw)
  To: caml-list

On Wed, Jul 13, 2005 at 02:33:32PM +1000, John Skaller wrote:
> A twist with the Felix model is that the code is written
> with 'read' commands, but the Felix translator
> control-inverts (twists) the code into event-driven code.
> 
> Actually a version for Ocaml is possible although I
> haven't tried to create one yet -- Felix targets C++
> but it should work for any language which supports
> classes with virtual functions and switches.

It should definitely be possible in OCaml.  For anyone who wants to
see how this trick is done (in C using setcontext), also look at:
http://www.annexia.org/freeware/pthrlib

Specifically at the files src/pthr_reactor.c (an ordinary event
Reactor) and src/pthr_pseudothread.c (implements threads using control
inversion on top of the Reactor).

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com



* Re: [Caml-list] Concurrency for services
  2005-07-12 22:40 Concurrency for services Christophe TROESTLER
  2005-07-13  4:33 ` [Caml-list] " John Skaller
@ 2005-07-13 11:27 ` Gerd Stolpmann
  1 sibling, 0 replies; 5+ messages in thread
From: Gerd Stolpmann @ 2005-07-13 11:27 UTC (permalink / raw)
  To: Christophe TROESTLER; +Cc: O'Caml Mailing List

Am Mittwoch, den 13.07.2005, 00:40 +0200 schrieb Christophe TROESTLER:
> Hi,
> 
> I have been confronted a couple of times with the following situation
> and I am looking for advice on a good (the best, if possible :)
> reusable solution.
> 
> Imagine you want to build a server library (e.g. an LPD daemon, a file
> server, a web service, ...).  In broad terms, you can describe it as a
> protocol for which you will handle the various "events" by callbacks.
> You do not want to wire any concurrency model into your library but
> instead provide the user with the appropriate functions so they can use
> their favorite concurrency scheme.  My question is: what is the best way
> to do that?

For Nethttpd, the soon-to-be-published web server component, I had the
same problem. Event-driven code is the most general, so the core of the
protocol engine was written in this style (manually, without a
control-inverting "twister"). On top of that there are several
encapsulations, and the user can also choose multi-threading or
multi-processing (at least in principle).

> Perhaps naively, I am thinking along these lines.  There are three
> points where a possibility of concurrency naturally offers itself:
> 
> 1. several threads/processes can listen (run "accept") on the same
>    socket;
> 
> 2. each time a connection is accepted, one may decide to process it in
>    a new thread/process (or send it to a pool of threads/processes,...);
> 
> 3. each time one calls a callback (that only makes sense if several
>    callbacks can be called independently for a single connection).

The common property is that you have to establish a socket descriptor in
some kind of container. The multi-processing model is the most
restrictive one because of the process boundaries and because the
descriptors can only be shared by inheriting them from a common parent
process. You cannot hide the properties of the service containers from
the user.
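
To make the inheritance point concrete, here is a minimal pre-forking
sketch (plain Unix calls, only an illustration): the listening
descriptor is created once in the parent, and every worker inherits it
across [fork] and accepts on it concurrently.

(* Pre-forking sketch: create the listening socket first, then fork the
   workers, which all accept on the inherited descriptor. *)
let preforked_server ~addr ~workers handle =
  let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.setsockopt sock Unix.SO_REUSEADDR true;
  Unix.bind sock addr;
  Unix.listen sock 20;
  for _i = 1 to workers do
    if Unix.fork () = 0 then
      while true do                       (* worker loop *)
        let fd, peer = Unix.accept sock in
        handle fd peer;                   (* serve one connection *)
        Unix.close fd
      done
  done
  (* the parent would normally supervise the workers here *)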

> It seems to me that 3. belongs to the design of the library and cannot
> easily be abstracted.  1. and 2., however, are basically the same from
> one library to the next.  For 2., one has to distinguish whether one
> uses threads or processes to know when the accept socket has to be
> closed.  So one ends up with 3 functions:
> 
> val socket : unit -> Unix.file_descr
> val accept_fork :
>   fork:((Unix.file_descr -> unit) -> int * Unix.file_descr) ->
>   'a connection_handler -> Unix.inet_addr -> unit
> val accept_threads :
>   ?thread:((unit -> unit) -> unit) ->
>   'a connection_handler -> Unix.inet_addr -> unit
> 
> The [connection_handler] is an abstract type that is created by a
> [handle_connection] which states which callback(s) to use.  The file
> descriptor in [fork] is a communication channel for the child to send
> messages to the parent (the parent may need to react to some commands
> in the protocol, e.g. "shutdown").  A simple usage is
> 
> let () = accept_threads (handle_connection f) (socket())
> 
> Of course, there usually will be many optional arguments to tailor the
> behavior of these functions.  I wonder:
> 
> i. are the above functions able to take care of just about any concurrency
>    model one can imagine (new thread/process, pool of threads,
>    dispatching to a set of machines,...);
> 
> ii. are they designed well enough so that they can form a signature on
>     which one can build usual concurrency models through functors?

These functions outline what to do, but aren't complete enough. I see
the following limitations:

- Missing parameters for timeouts
- Missing parameters for controlling the speed of starting and stopping
  containers
- Missing parameters for error behaviour: What to do if one container
  does not respond?
- Many services listen on several sockets, and not only Internet 
  sockets.
- Missing hooks that could be executed on startup/shutdown of a 
  container
- A means to broadcast messages is very helpful

For a general solution I would prefer an abstract type [controller].
That could be a class type like:

class type controller =
object
  method type_of_container : [`Process|`Thread|...]
  method services : Unix.sockaddr list
  method on_container_startup : unit -> unit
  method on_container_shutdown : unit -> unit
  method serve : unit -> unit
  method shutdown : unit -> unit   (* shutdown all services *)
end

class type container =
object
  method controller : controller
  method broadcast : message -> unit
  method accept : unit -> [ `Connection of sockaddr * file_descr
                          | `Message of message
                          | `Shutdown ]
end


And implementations are classes:

class process_controller : 
  config:config -> handler:(container->unit) -> controller
class thread_controller : 
  config:config -> handler:(container->unit) -> controller

The idea is that [config] describes the properties of each
implementation. There can be common configuration options (like the
list of socket addresses), and options that are only applicable to a
certain type of controller (e.g. timeouts). The [handler] is called
when the container is established (i.e. when a process or thread is
started). The handler is typically a loop that calls [accept] until it
receives a shutdown message. With [broadcast], containers can send
messages to all other containers.

An example would be:

let rec my_handler cont =
  match cont # accept() with
    | `Connection (servaddr,fd) ->
         (* Got a new connection for service [servaddr] as descriptor [fd] *)
         ...;
         my_handler cont
    | `Message msg ->
         (* Got a new message *)
         ...;
         my_handler cont
    | `Shutdown ->
         ()

let my_service =
  new process_controller
     ~config:(object
                method services = [ Unix.ADDR_INET ... ]
                method on_startup = (fun () -> ...)
                method on_shutdown = (fun () -> ...)
                method min_processes = 10
                method max_processes = 20
                method timeout = 30
              end)
     ~handler:my_handler in
my_service # serve ()

It is easy to see that one only needs to instantiate thread_controller
instead of process_controller to get a multi-threading server.

One problem of this approach is that it is not easy to unify two
container handlers that implement services independently in order to
get a handler that serves both, i.e. the function

val union : controller -> controller -> controller

is non-trivial.
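
A naive attempt makes the difficulty visible (sketch only, matching
the class types above): the bookkeeping merges easily, but both halves
want to own the accept machinery, so [serve] cannot simply delegate.

(* Naive [union] sketch: service lists and lifecycle hooks can be
   merged, but each controller runs its own containers, and the
   containers of one half know nothing about the sockets of the
   other, so [serve] is the real problem. *)
let union a b =
  object
    method type_of_container = a#type_of_container
    method services = a#services @ b#services
    method on_container_startup () =
      a#on_container_startup (); b#on_container_startup ()
    method on_container_shutdown () =
      a#on_container_shutdown (); b#on_container_shutdown ()
    method serve () = a#serve ()    (* and b?  the two loops conflict *)
    method shutdown () = a#shutdown (); b#shutdown ()
  end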

Btw, I would like to include such a feature into OCamlnet. A sponsor
would be very welcome.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Telefon: 06151/153855                  Telefax: 06151/997714
------------------------------------------------------------



* Re: [Caml-list] Concurrency for services
  2005-07-13  8:59   ` Richard Jones
@ 2005-07-13 14:10     ` John Skaller
  0 siblings, 0 replies; 5+ messages in thread
From: John Skaller @ 2005-07-13 14:10 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list


On Wed, 2005-07-13 at 09:59 +0100, Richard Jones wrote:

> > Actually a version for Ocaml is possible although I
> > haven't tried to create one yet -- Felix targets C++
> > but it should work for any language which supports
> > classes with virtual functions and switches.
> 
> It should definitely be possible in OCaml.  For anyone who wants to
> see how this trick is done (in C using setcontext), also look at:
> http://www.annexia.org/freeware/pthrlib

Ah yes, but there is an important difference: this library
supports C code by swapping machine stacks.

This works, but it isn't capable of supporting large numbers
of threads due to a lack of address space: each stack must
reserve the maximum possible amount of memory it could use.
If you use malloc, it is even worse, since malloc is required
to allocate memory (not merely address space).
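
To put rough numbers on it (just for illustration): 10,000 such
threads, each reserving a 1 MB stack, already need about 10 GB of
address space, which simply does not fit in a 32 bit process.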

This kind of library is actually a good argument
for a 64 bit processor :)

Felix uses a linked list of heap-allocated stack frames
to avoid this problem. Stackless Python works this way
too, I think. The cost is slower allocation; however, that
wouldn't apply to an Ocaml version. Also, the whole-program
optimiser eliminates many calls by inlining, or by
observing that there is no blocking operation in the
control path, which allows the machine stack to be used
instead. The C++ version also switches much faster than
setjmp/longjmp since only 'the usual' function call
overhead is incurred; there's no need to save the whole
CPU state.
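
As a rough illustration of the representation (mine, not Felix's
actual data structures): each procedure activation is a heap value
linked to its caller, so a "stack" is just a chain in the heap and a
suspended thread costs only the frames it really holds.

(* Heap-allocated activation frames: calling allocates a frame linked
   to the caller, returning just follows the link. *)
type frame = {
  caller : frame option;      (* link to the calling frame *)
  mutable pc : int;           (* where to resume in this procedure *)
  mutable locals : int array; (* this procedure's locals *)
}

let call ~caller = { caller = Some caller; pc = 0; locals = Array.make 4 0 }
let return frame = frame.caller   (* "pop" = follow the caller link *)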

MLton uses linear stacks, but they're not machine stacks,
and MLton's copying collector can grow and shrink them.

There's another problem with the C stack-swapping
trick -- it only works with C. It may or may not work
with C++, and it is very unlikely to work properly
in a multi-language environment (e.g. C and Ocaml together).

BTW: I'm curious if g++ supports get/setcontext?

-- 
John Skaller <skaller at users dot sourceforge dot net>
Download Felix: http://felix.sf.net

