From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <weis@pauillac.inria.fr>
Delivered-To: caml-list@yquem.inria.fr
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39])
	by yquem.inria.fr (Postfix) with ESMTP id B8578BC8E
	for <caml-list@yquem.inria.fr>; Fri,  6 May 2005 22:15:00 +0200 (CEST)
Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35])
	by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j46KExu4013067
	for <caml-list@yquem.inria.fr>; Fri, 6 May 2005 22:15:00 +0200
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id WAA08510 for <caml-list@pauillac.inria.fr>; Fri, 6 May 2005 22:14:59 +0200 (MET DST)
Received: from hedwig1.umh.ac.be (hedwig2.umh.ac.be [193.190.193.73])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j46KExBO016040
	for <caml-list@inria.fr>; Fri, 6 May 2005 22:14:59 +0200
Received: from poincare (Debian-exim@poincare.swapping.umh.ac.be [10.102.100.35])
	by hedwig1.umh.ac.be (8.13.1/8.13.1) with ESMTP id j46KFJvu155800;
	Fri, 6 May 2005 22:15:23 +0200
Received: from localhost ([127.0.0.1] ident=trch)
	by poincare with esmtp (Exim 4.50)
	id 1DU9Df-0005Yc-0O; Fri, 06 May 2005 22:14:43 +0200
Date: Fri, 06 May 2005 22:14:35 +0200 (CEST)
Message-Id: <20050506.221435.115446774.debian00@tiscali.be>
To: Gerd Stolpmann <info@gerd-stolpmann.de>
Cc: "O'Caml Mailing List" <caml-list@inria.fr>,
	ocamlnet-devel@lists.sourceforge.net
Subject: Re: [Caml-list] Re: Common CGI interface
From: Christophe TROESTLER <debian00@tiscali.be>
In-Reply-To: <1114513681.6324.156.camel@localhost.localdomain>
References: <1114031186.6243.48.camel@localhost.localdomain>
	<20050425.123834.56365302.debian00@tiscali.be>
	<1114513681.6324.156.camel@localhost.localdomain>
Organization: Universite de Mons-Hainaut (http://math.umh.ac.be/an/)
X-Spook: 2600 Magazine NSA broadside Skipjack Mafia Commecen Albania ASPIC
 Firefly NORAD 
X-Blessing: Om Ah Hum Vajra Guru Pema Siddhi Hum
X-Operating-System: GNU/Linux (http://www.linux.org/)
X-Mailer-URL: http://www.mew.org/
X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Multipart/Mixed;
 boundary="--Next_Part(Fri_May__6_22_14_35_2005_012)--"
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.1 (www dot roaringpenguin dot com slash mimedefang)
X-Miltered: at concorde with ID 427BD044.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Miltered: at nez-perce with ID 427BD043.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Spam: no; 0.00; caml-list:01 christophe:01 troestler:01 gerd:01 stolpmann:01 model:01 model:01 reused:01 sockaddr:01 threads:01 jserv:01 jserv:01 dispatching:01 threads:01 merging:01 
X-Attachments: name="netcgi.mli" 
X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr
X-Spam-Status: No, score=0.5 required=5.0 tests=FROM_ENDS_IN_NUMS 
	autolearn=disabled version=3.0.2
X-Spam-Level: 

----Next_Part(Fri_May__6_22_14_35_2005_012)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi,

To start with, sorry to reply at such a slow pace but I am quite busy
with my main job...

On Tue, 26 Apr 2005, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
> 
> - The model is controlled by the web server. [...] establish_server:
> I think this is not applicable here

I agree.  This is the easier case as, from the point of view of the
app writer, it basically amounts to a single process (that may serve
several requests sequentially).  [This is also the model for
mod_caml.]  I would suggest that a default function [run] takes care
of this case for every connector.  Resources "opened" (e.g. a DB
connection) before [run f] can be reused by each call of [f].  For
convenience, I would add an optional argument [?sockaddr] to [run] that
turns it into a remote app server (still sequential).
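
To make the intended behaviour of [run] concrete, here is a minimal
sketch in plain OCaml.  Everything in it is hypothetical: the [request]
type, the [next] function standing in for the connector's input, and
the [db] record standing in for a DB connection are illustrations only
and are not part of Netcgi.

```ocaml
(* Hypothetical stand-ins: none of these names exist in Netcgi. *)
type request = { path : string }
type db = { mutable queries : int }      (* pretend DB connection *)

(* [run ~next f] calls [f] on each request, one after the other, until
   [next] returns [None] (nothing more to serve). *)
let run ~(next : unit -> request option) (f : request -> unit) : unit =
  let rec loop () =
    match next () with
    | None -> ()
    | Some req -> f req; loop ()
  in
  loop ()

(* A fake source of requests, so the sketch runs without a web server. *)
let source_of_list l =
  let pending = ref l in
  fun () ->
    match !pending with
    | [] -> None
    | r :: tl -> pending := tl; Some r

let () =
  let conn = { queries = 0 } in          (* "opened" once, before [run] *)
  let next = source_of_list [ { path = "/a" }; { path = "/b" } ] in
  run ~next (fun req ->
    conn.queries <- conn.queries + 1;    (* reused by every call of [f] *)
    Printf.printf "%s served (query #%d)\n" req.path conn.queries)
```

The point is only that the resource is created outside [run] and that
[f] sees the requests strictly one at a time.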

No other connectors are needed for CGI and Test.

> - Multi-processing controlled by the application itself. [...]
> - Multi-threading controlled by the application itself. [...]
>
> - Whether only one URL is served by the application, or several [...]
> - [...] persistent connections to external services [...]
> - [...] instances of the application [...] communicate with each other [...]
>
> My point is that it is not easy to find a common description of all
> that [...] really want to say for every CGI that you don't have all
> that features that are only available for server models?  [...]

As said above, sequential processing should just be done through a
[run] function.  My idea is NOT to implement all the above models,
just to have the minimal set of primitives handling their respective
protocols and factored in such a way that ANY kind of concurrency one
wants CAN be programmed on top of them.  After all, as you point out,
one may not be able to describe all possibilities that the user may
want.

The two points where launching a new thread/process makes sense are:
- to accept connections on a given (list of) socket(s);
- to handle a request (when its input has been completely provided).

As far as I understand, for AJP, the second possibility is not very
interesting (requests are sequential).  It is, however, for FCGI since
requests may be multiplexed (thus one may want to continue reading the
other requests while processing some).
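
The two hook points above can be sketched as follows.  This is only an
illustration under stated assumptions: [conn], [request],
[read_request], [serve_connection] and [accept_loop] are hypothetical
names (not the existing Netcgi_jserv functions), and a [spawn] function
abstracts "start a thread or a process" so that the code needs no
library beyond the standard one.

```ocaml
(* Hypothetical stand-ins, not Netcgi types. *)
type request = { id : int; body : string }
type conn = { mutable pending : request list }   (* fake connection *)

(* How to start a job: inline, in a thread, in a child process, ... *)
type spawn = (unit -> unit) -> unit

let sequential : spawn = fun job -> job ()

(* Read the next complete request on a connection, if any. *)
let read_request (c : conn) : request option =
  match c.pending with
  | [] -> None
  | r :: tl -> c.pending <- tl; Some r

(* Hook 2: [on_request] decides how each complete request is run. *)
let serve_connection ?(on_request = sequential) handler c =
  let rec loop () =
    match read_request c with
    | None -> ()
    | Some r -> on_request (fun () -> handler r); loop ()
  in
  loop ()

(* Hook 1: [on_accept] decides how each accepted connection is run. *)
let accept_loop ?(on_accept = sequential) ~accept serve =
  let rec loop () =
    match accept () with
    | None -> ()
    | Some c -> on_accept (fun () -> serve c); loop ()
  in
  loop ()

let () =
  let conns =
    ref [ { pending = [ { id = 1; body = "a" }; { id = 2; body = "b" } ] };
          { pending = [ { id = 3; body = "c" } ] } ] in
  let accept () =
    match !conns with
    | [] -> None
    | c :: tl -> conns := tl; Some c in
  (* A deferred "spawn": jobs are queued and run later, to show that
     the scheduling policy is entirely in the caller's hands. *)
  let queue = Queue.create () in
  let deferred job = Queue.add job queue in
  accept_loop ~accept
    (serve_connection ~on_request:deferred
       (fun r -> Printf.printf "request %d: %s\n" r.id r.body));
  Queue.iter (fun job -> job ()) queue
```

With real sockets, [deferred] could be replaced by something that
creates a thread or forks; the two loops would not need to change.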

Currently the primitives on which multi-processes/threads apps can be
built are:
- Netcgi_jserv.server_init
- Netcgi_jserv.server_loop
- Netcgi_jserv_ajp12.serve_connection

IMO they are a bit cumbersome to use (looking at their implementation
is necessary to grasp them fully) but they are a good example of what
I mean when I'm talking about a "minimal set of primitives".  I'll
have to think more to see if I can come up with something that I like
better (and hopefully that you will too! :).  (BTW, the AJP protocol
includes its version, so I think it would be good to have a
[serve_connection] that adapts to it -- this makes
[Netcgi_jserv_app.protocol_type] useless.)


A related note on connectors is how they should handle exceptions
raised by the script.  My feeling is that they should catch them and
log them and then get ready for the next request (obviously the last
part only makes sense when several requests are handled -- maybe
sequentially -- by the script).  That seems better than simply
crashing the app.  IMO the exception [Exit] should be treated
specially and be accepted as an appropriate way of ending the script
prematurely (it is really useful to finish early after error
reporting).
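
A minimal sketch of this policy, assuming a hypothetical helper
[protect] (not part of Netcgi) that a connector would wrap around each
call of the script, with [log] standing for whatever logging facility
the connector offers:

```ocaml
(* Hypothetical helper: run the script [f] on one request and keep the
   connector alive whatever happens.  Note that this catches every
   exception; a real connector may want to let some of them through. *)
let protect ~(log : string -> unit) (f : 'a -> unit) (req : 'a) : unit =
  try f req with
  | Exit -> ()                           (* deliberate early termination *)
  | e -> log ("Uncaught exception: " ^ Printexc.to_string e)

let () =
  let script n =
    if n = 2 then raise Exit;            (* e.g. after reporting an error *)
    if n = 3 then failwith "boom";
    Printf.printf "request %d handled\n" n
  in
  (* All four requests are attempted; the failure of #3 is only logged
     and #2 ends silently. *)
  List.iter (protect ~log:prerr_endline script) [1; 2; 3; 4]
```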

> > I do not understand why [Netcgi_jserv.server_init] is not just
> > included in [server_loop].
> 
> This is not possible for multi-processing models: server_init runs in
> the master process, and server_loop in every child process.

I am wondering: why have concurrent, possibly mutex-protected, accepts
on the socket instead of having one process listening on it and
dispatching (to processes or threads) on each accept?

> > * all connectors are treated equally
>
> As explained, this isn't as simple as you think. The connectors
> aren't equal, and the user must know that, and merging the specific
> differences into a common standard is far from being trivial.

The purpose is not to hide *all* differences between the connectors
but, as much as possible, to do the same things the same way.  In
other words: have them share a common "philosophy".

> I am currently thinking about a system of configuration objects. [...]

Yes, they are a good idea and may help in designing something elegant.

> > > > About arguments: is the mutability of arguments useful?
> 
> [...] the arguments are often considered as session state, passed
> from one web page to the next [...] mutability is quite natural

I would agree if there was a way to "return" them to the next page.
But of course, there can't be at the level of the "connector".  So
having mutability at this level is in fact misleading.  IMO, session
preservation belongs to a "model" of web applications -- that is to a
library built on top of connectors [1].  It is the role of that
library to provide pages-as-functions, session management through
arguments/databases/continuations/session-key/cookies,... [2].  The
bottom line is that I believe that connectors should not "push"
towards a given model -- especially since they can only provide a
half-baked solution [3].

---
[1] It is important I believe that the community who can develop such
    "higher level" libraries is offered a simple and standard
    interface to connectors (including mod_caml if you ask me).

[2] I've been dreaming of a session module that can dialog with fully
    typed templates (a la Kartz, but typed) which lets you define the
    session variables you need and manages the state transparently...

[3] The arguments being strings or files, they are very much mutable by
    their very own nature.  However, the flexibility to directly
    modify them is barely matched by the interface (e.g. there is no
    need to "open" the argument for reading AND writing).

    Also, the session management may want to have "hidden" variables
    (like a session id that is automatically generated and not user's
    business) and it is not nice that there is a way to modify those
    "behind the back" of the library.

> > [std_environment] and [test_environment]
> > [custom_environment] is fine
> 
> Ok, this exception exposes internals. But, as already pointed out, I
> don't see why this is so bad.

Well, that will not give me nightmares either.  Nonetheless, I am a
strong believer that interfaces should be minimal and hide irrelevant
details unless there is a strong case for exposing them ("it's there but
ignore it" is not neat IMO).  Note that my question (now stripped) was
broader: what is the modularity gained by providing [std_environment]
and [test_environment] ? -- seems to me [custom_environment] is all we
need.

To discuss something concrete, I attach a piece of the interface as I
see it.  It does not include the "extension" interface.  If we agree
that it is only useful to define new connectors, it should either
be an inner module (hidden when -pack'ing) or a separate module.
Normal comments explain the intent or raise questions.  OCamldoc
comments document new functions.

Regards,
ChriS


---
P.S. tmp_directory in make_message_board (in file
netcgi_jserv_app.ml) should take its value from the config record.

----Next_Part(Fri_May__6_22_14_35_2005_012)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Description: Interface proposal
Content-Disposition: inline; filename="netcgi.mli"

(*
 * Types and functions shared by all connectors
 ***********************************************************************)

module Random :
sig
  val init : ?lock:(unit -> unit) -> ?unlock:(unit -> unit) -> string -> unit
  val init_from_file : ?lock:(unit -> unit) -> ?unlock:(unit -> unit) ->
    ?length:int -> string -> unit
  val byte : unit -> int
  val sessionid : int -> string
    (** [sessionid length] generates a [8 * length] bit ([length] hex
        digits) random string which can be used for session IDs,
        random cookies and so on.  The string returned will be very
        random and hard to predict *)
end


class type argument = (* I do not see the point of the cgi prefix *)
object
  (* ... methods to be discussed ... *)

  method storage : [`Memory | `File of string]
    (* No need to define [store] as it is only used here -- saying it
       here makes it easier to figure out what the method is. *)
  method representation : [ `Simple of Netmime.mime_body
                          | `MIME of Netmime.mime_message ]
    (* Same justification as above: having a single point of entry to
       understand a method is easier. *)
end


type cookie = ... (* I do not see the point of the cgi prefix *)


class type environment =
object
  (* Only changed methods are listed *)

  method cookie : string -> cookie
  method cookies : (string * cookie) list
    (* The first one is convenient.  Moreover, both should use the
       cookie type. *)

  (* Since [set_input_state] and [set_output_state] are not supposed
     to be for the final user, it would be nicer if they did not
     appear here.

     In the same vein, I would not include [input_ch] and
     [input_state] in the *public* interface: they are only useful for
     the [cgi] to initialize itself. *)
end

class type cgi = (* formerly cgi_activation *)
object
  (* I believe short names are better so long as they are as readable *)
  method arg : string -> argument
  method arg_val : ?default:string -> string -> string (*was [argument_value]*)
  method arg_true : string -> bool
    (** This method returns false if the named parameter is missing,
        is an empty string, or is the string ["0"]. Otherwise it
        returns true. Thus the intent of this is to return true in the
        Perl sense of the word.  If a parameter appears multiple
        times, then this uses the first definition and ignores the
        others. *)
  method arg_all : string -> argument list (* formerly [multiple_argument] *)
  method args : (string * argument) list

  method url : ?protocol:Netcgi.protocol -> ... -> unit -> string

  method set_header : ?status:status -> ... -> unit -> unit
  method set_redirection_header : string -> unit
  method output : Netchannels.trans_out_obj_channel

  method log : Netchannels.out_obj_channel
    (** A channel whose data is appended to the webserver log. *)

  method finalize  : unit -> unit

  method environment : Netcgi.environment
  method request_method : [`GET | `HEAD | `POST | `DELETE | `PUT of argument]
    (* Single point of doc. *)
end


type config = {
  tmp_directory : string;
  tmp_prefix : string;
  permitted_http_methods : [`GET | `HEAD | `POST | `DELETE | `PUT] list;
                                                             (* Uniformity *)
  permitted_input_content_types : string list;
  input_content_length_limit : int;
  workarounds : [ `MSIE_Content_type_bug | `backslash_bug ] list;
    (* Single point of documentation. *)
}

(* These names better convey the intent I think *)
type output_type = [`Direct | `Transactional]
type arg_storage = [`Memory | `File | `Automatic]

(*
 * Specific connectors
 ***********************************************************************)
module CGI :
sig
  val run :
    ?config:config ->
    ?output_type:output_type ->
    ?arg_storage:(string -> Netmime.mime_header -> arg_storage) ->
    (cgi -> unit) -> unit
end

module FCGI :
sig
  val run :
    ?config:config ->
    ?output_type:output_type ->
    ?arg_storage:(string -> Netmime.mime_header -> arg_storage) ->
    ?sockaddr:Unix.sockaddr ->
    (cgi -> unit) -> unit

  (* Some flexible functions that allow any concurrency model.  Here is
     a possibility. *)
  val socket : ?backlog:int -> ?reuseaddr:bool ->
    ?addr:Unix.inet_addr -> ?port:int -> Unix.file_descr
    (** [socket ?backlog ?reuseaddr ?addr ?port] sets up a FCGI socket
	listening for incoming requests.

	@param backlog Length of the backlog queue (connections not yet
	accepted by the FCGI server)
	@param reuseaddr Whether to reuse the port
	@param addr defaults to localhost.
	@param port if not present, uses stdin (which is a socket on
 	Unix or -- contrary to the spec -- a pipe on win$).
    *)

  (* More thinking is needed on "accept" and "request-handler". *)
end

module AJP :
sig
  val run :
    ?config:config ->
    ?output_type:output_type ->
    ?arg_storage:(string -> Netmime.mime_header -> arg_storage) ->
    ?sockaddr:Unix.sockaddr ->
    (cgi -> unit) -> unit

(* Some flexible functions that allow any concurrency model.  Similar
   to the ones for FCGI. *)
end

module Test :
sig
  val simple_arg : string -> string -> argument
  val mime_argument : ?work_around_backslash_bug:bool ->
    string -> Netmime.mime_message -> argument

  val run :
    ?config:config ->
    ?output_type:output_type ->
    ?arg_storage:(string -> Netmime.mime_header -> arg_storage) ->
    ?args:argument list ->
    ?meth:request_method ->
    (cgi -> unit) -> unit
    (* More flexibility is definitely required here -- along the lines
       of [custom_environment].  I am thinking one could e.g. be
       general enough to allow the output to be set into a frame, and
       another frame being used for control, logging,... -- a live
       debugger if you like! *)
end

----Next_Part(Fri_May__6_22_14_35_2005_012)----