From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id B8578BC8E for ; Fri, 6 May 2005 22:15:00 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j46KExu4013067 for ; Fri, 6 May 2005 22:15:00 +0200 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id WAA08510 for ; Fri, 6 May 2005 22:14:59 +0200 (MET DST) Received: from hedwig1.umh.ac.be (hedwig2.umh.ac.be [193.190.193.73]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j46KExBO016040 for ; Fri, 6 May 2005 22:14:59 +0200 Received: from poincare (Debian-exim@poincare.swapping.umh.ac.be [10.102.100.35]) by hedwig1.umh.ac.be (8.13.1/8.13.1) with ESMTP id j46KFJvu155800; Fri, 6 May 2005 22:15:23 +0200 Received: from localhost ([127.0.0.1] ident=trch) by poincare with esmtp (Exim 4.50) id 1DU9Df-0005Yc-0O; Fri, 06 May 2005 22:14:43 +0200 Date: Fri, 06 May 2005 22:14:35 +0200 (CEST) Message-Id: <20050506.221435.115446774.debian00@tiscali.be> To: Gerd Stolpmann Cc: "O'Caml Mailing List" , ocamlnet-devel@lists.sourceforge.net Subject: Re: [Caml-list] Re: Common CGI interface From: Christophe TROESTLER In-Reply-To: <1114513681.6324.156.camel@localhost.localdomain> References: <1114031186.6243.48.camel@localhost.localdomain> <20050425.123834.56365302.debian00@tiscali.be> <1114513681.6324.156.camel@localhost.localdomain> Organization: Universite de Mons-Hainaut (http://math.umh.ac.be/an/) X-Spook: 2600 Magazine NSA broadside Skipjack Mafia Commecen Albania ASPIC Firefly NORAD X-Blessing: Om Ah Hum Vajra Guru Pema Siddhi Hum X-Operating-System: GNU/Linux (http://www.linux.org/) X-Mailer-URL: http://www.mew.org/ X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Multipart/Mixed; boundary="--Next_Part(Fri_May__6_22_14_35_2005_012)--" Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.1 (www dot roaringpenguin dot com slash mimedefang) X-Miltered: at concorde with ID 427BD044.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 427BD043.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 christophe:01 troestler:01 gerd:01 stolpmann:01 model:01 model:01 reused:01 sockaddr:01 threads:01 jserv:01 jserv:01 dispatching:01 threads:01 merging:01 X-Attachments: name="netcgi.mli" X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.5 required=5.0 tests=FROM_ENDS_IN_NUMS autolearn=disabled version=3.0.2 X-Spam-Level: ----Next_Part(Fri_May__6_22_14_35_2005_012)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi, To start with, sorry to reply at such a slow pace but I am quite busy with my main job... On Tue, 26 Apr 2005, Gerd Stolpmann wrote: > > - The model is controlled by the web server. [...] establish_server: > I think this is not applicable here I agree. This is the easier case as, from the point of view of the app writer, it basically amounts to a single process (that may serve several requests sequentially). [This is also the model for mod_caml.] I would suggest that a default function [run] takes care of this case for every connector. Resources "opened" (e.g. a DB connection) before [run f] can be reused by each call of [f]. For convenience, I would add an optional argument [?sockaddr] to [run] can turn it into a distant app server (still sequential). No other connectors are needed for CGI and Test. > - Multi-processing controlled by the application itself. [...] > - Multi-threading controlled by the application itself. [...] > > - Whether only one URL is served by the application, or several [...] > - [...] persistent connections to external services [...] > - [...] instances of the application [...] communicate with each other [...] > > My point is that it is not easy to find a common description of all > that [...] really want to say for every CGI that you don't have all > that features that are only available for server models? [...] As said above, sequential processing should just be done through a [run] function. My idea is NOT to implement all the above models, just to have the minimal set of primitives handling their respective protocols and factored in such a way that ANY kind of concurrency one want CAN be programmed on top of them. After all, as you point out, one may not be able to describe all possibilities that the user may want. The two points where launching a new thread/process makes sense are: - to accept connections on a given (list of) socket(s); - to handle a request (when its input has been completely provided). As far as I understand, for AJP, the second possibility is not so much interesting (requests are sequential). It is however for FCGI since requests may be multiplexed (thus one may want to continue reading the other requests while processing some). Currently the primirives on which multi-processes/threads apps can be built are: - Netcgi_jserv.server_init - Netcgi_jserv.server_loop - Netcgi_jserv_ajp12.serve_connection IMO they are a bit cumbersome to use (looking at their implementation is necessary to graps them fully) but they are a good example of what I mean when I'm talking about a "minimal set of primitives". I'll have to think more to see if I can come up with something that I like better (and hopefully that you will too! :). (BTW, the AJP protocol includes its version, so I think it would be good to have a [serve_connection] that adapts to it -- this makes [Netcgi_jserv_app.protocol_type] useless.) A related note on connectors is how they should handle exceptions raised by the script. My feeling is that they should catch them and log them and then get ready for the next request (obviously the last part only makes sense when several requests are handled -- maybe sequentially -- by the script). That seems better than simply crashing the app. IMO the exception [Exit] should be treated specially and be accepted as an appropriate way of ending prematurely the script (it is really useful to finish early after error reporting). > > I do not understand why [Netcgi_jserv.server_init] is not just > > included in [server_loop]. > > This is not possible for multi-processing models: server_init runs in > the master process, and server_loop in every child process. I am wondering: why have concurrent, possibly mutex protected, accept to the sockect instead of having one process listening on it and dispatching (to processes or threads) on each accept? > > * all connectors are treated equally > > As explained, this isn't as simple as you think. The connectors > aren't equal, and the user must know that, and merging the specific > differences into a common standard is far from being trivial. The purpose is not to hide *all* differences between the connectors but, as much as possible, to do the same things the same way. In other words: have them share a common "philosophy". > I am currently thinking about a system of configuration objects. [...] Yes, they are a good idea and may help designing something elegant. > > > > About arguments: is the mutability of arguments useful? > > [...] the arguments are often considered as session state, passed > from one web page to the next [...] mutability is quite natural I would agree if there was a way to "return" them to the next page. But of course, there can't be at the level of the "connector". So having mutability at this level is in fact misleading. IMO, session preservation belongs to a "model" of web applications -- that is to a library build on the top of connectors [1]. It is the role of that library to provide pages-as-functions, session managment through arguments/databases/continuations/session-key/cookies,... [2]. The bottom line is that I believe that connectors should not "push" towards a given model -- as moreover they can only provide a half baked solution [3]. --- [1] It is important I believe that the community who can develop such "higher level" libraries is offered a simple and standard interface to connectors (including mod_caml if you ask me). [2] I've been dreaming of a session module that can dialog with fully typed templates (a la Kartz, but typed) which lets you define the session variables you need and manages the state transparently... [3] The arguments being string or files, they are very much mutable by their very own nature. However, the flexibility to directly modify them is barely matched by the interface (e.g. there is no need to "open" the argument for reading AND writing). Also, the session management may want to have "hidden" variables (like a session id that is automatically generated and not user's business) and it is not nice that there is a way to modify those "behind the back" of the library. > > [std_environment] and [test_environment] > > [custom_environment] is fine > > Ok, this exception exposes internals. But, as already pointed out, I > don't see why this is so bad. Well, that will not give me nightmares either. Nonetheless, I am a strong believer that interfaces should be minimal and hide irrelevant details unless there is a strong case about it ("it's there but ignore it" is not neat IMO). Note that my question (now stripped) was broader: what is the modularity gained by providing [std_environment] and [test_environment] ? -- seems to me [custom_environment] is all we need. To speak on something concrete I attach a piece of the interface as I see it. It does not include the "extension" interface. If we agree that it is only is useful to define new connectors, it should either be an inner module (hidden when -pack'ing) or a separate module. Normal comments explain the intent or raise questions. OCamldoc comments document new functions. Regards, ChriS --- P.S. tmp_directory in make_message_board (in file netcgi_jserv_app.ml) should take its value from the config record. ----Next_Part(Fri_May__6_22_14_35_2005_012)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Description: Interface proposal Content-Disposition: inline; filename="netcgi.mli" (* * Types and functions shared by all connectors ***********************************************************************) module Random : sig val init : ?lock:(unit -> unit) -> ?unlock:(unit -> unit) -> string -> unit val init_from_file : ?lock:(unit -> unit) -> ?unlock:(unit -> unit) -> ?length:int -> string -> unit val byte : unit -> int val sessionid : int -> string (** [sessionid length] generates a [8 * length] bit ([lenght] hex digits) random string which can be used for session IDs, random cookies and so on. The string returned will be very random and hard to predict *) end class type argument = (* I do not see the point of the cgi prefix *) object (* ... methods to be discussed ... *) method storage : [`Memory | `File of string] (* No need to define [store] as it is only used here -- saying it here makes it easier to figure out what the method is. *) method representation : [ `Simple of Netmime.mime_body | `MIME of Netmime.mime_message ] (* Same justification as above: having a single point of entry to undertand a method is easier. *) end type cookie = ... (* I do not see the point of the cgi prefix *) class type environment = object (* Only changed methods are listed *) method cookie : string -> cookie method cookies : (string * cookie) list (* The first one is convenient. Moreover, both should use the cookie type. *) (* Since [set_input_state] and [set_output_state] are not supposed to be for the final user, it would be nicer if they did not appear here. In the same vein, I would not include [input_ch] and [input_state] in the *public* interface: they are only useful for the [cgi] to initialize itself. *) end class type cgi = (* formerly cgi_activation *) object (* I believe short names are better so long they are as readable *) method arg : string -> argument method arg_val : ?default:string -> string -> string (*was [argument_value]*) method arg_true : string -> bool (** This method returns false if the named parameter is missing, is an empty string, or is the string ["0"]. Otherwise it returns true. Thus the intent of this is to return true in the Perl sense of the word. If a parameter appears multiple times, then this uses the first definition and ignores the others. *) method arg_all : string -> argument list (* formerly [multiple_argument] *) method args : (string * argument) list method url : ?protocol:Netcgi.protocol -> ... -> unit -> string method set_header : ?status:status -> ... -> unit -> unit method set_redirection_header : string -> unit method output : Netchannels.trans_out_obj_channel method log : Netchannels.out_obj_channel (** A channel whose data is appended to the webserver log. *) method finalize : unit -> unit method environment : Netcgi.environment method request_method : [`GET | `HEAD | `POST | `DELETE | `PUT of argument] (* Single point of doc. *) end type config = { tmp_directory : string; tmp_prefix : string; permitted_http_methods : [`GET | `HEAD | `POST | `DELETE | `PUT] list; (* Uniformity *) permitted_input_content_types : string list; input_content_length_limit : int; workarounds : [ `MSIE_Content_type_bug | `backslash_bug ] list; (* Single point of documentation. *) } (* These names better convey the intent I think *) type output_type = [`Direct | `Transactional] type arg_storage = [`Memory | `File | `Automatic] (* * Specific connectors ***********************************************************************) module CGI : sig val run : ?config:config -> ?output_type:output_type -> ?arg_storage:(string -> Netmime.mime_header -> arg_storage) -> (cgi -> unit) -> unit end module FCGI : sig val run : ?config:config -> ?output_type:output_type -> ?arg_storage:(string -> Netmime.mime_header -> arg_storage) -> ?sockaddr:Unix.sockaddr -> (cgi -> unit) -> unit (* Some flexible functions that allow any concurrency model. Here is a possibility. *) val socket : ?backlog:int -> ?reuseaddr:bool -> ?addr:Unix.inet_addr -> ?port:int -> Unix.file_descr (** [socket ?backlog ?reuseaddr ?addr ?port] setup a FCGI socket listening to incomming requests. @param backlog Length of the backlog queue (connections not yet accepted by the AJP server) @param reuseaddr Whether to reuse the port @param addr defaults to localhost. @param port if not present, uses stdin (which is a socket on Unix or -- contrarily to the spec -- a pipe on win$). *) (* More thinking is needed on "accept" and "request-handler". *) end module AJP : sig val run : ?config:config -> ?output_type:output_type -> ?arg_storage:(string -> Netmime.mime_header -> arg_storage) -> ?sockaddr:Unix.sockaddr -> (cgi -> unit) -> unit (* Some flexible functions that allow any concurrency model. Similar to the ones for FCGI. *) end module Test : sig val simple_arg : string -> string -> argument val mime_argument : ?work_around_backslash_bug:bool -> string -> Netmime.mime_message -> argument val run : ?config:config -> ?output_type:output_type -> ?arg_storage:(string -> Netmime.mime_header -> arg_storage) -> ?args:cgi_argument list -> ?meth:request_method -> (cgi -> unit) -> unit (* More flexibility is definitely required here -- along the lines of [custom_environment]. I am thinking one could e.g. be general enough to allow the output to be set into a frame, and another frame being used for control, logging,... -- a live debugger if you like! *) end ----Next_Part(Fri_May__6_22_14_35_2005_012)----