caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Hendrik Tews <tews@cs.ru.nl>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] cookies in netclient
Date: Sun, 30 Sep 2007 18:45:17 +0200
Message-ID: <1191170717.7114.39.camel@localhost.localdomain>
In-Reply-To: <wwuir5vunjz.fsf@tandem.cs.ru.nl>

Am Freitag, den 28.09.2007, 11:49 +0200 schrieb Hendrik Tews:
> Gerd Stolpmann <info@gerd-stolpmann.de> writes:
> 
>    >    Unfortunately, get_set_cookie is missing (I have an implementation if
>    >    you really need it).
>    > 
>    > This one would retrieve the cookies as an Nethttp.cookie list? I
>    > don't know yet if I need it.
> 
>    yes. You find it if you need it:
> 
>    https://godirepo.camlcity.org/wwwsvn/trunk/code/get-set-cookie.ml?rev=1145&root=lib-ocamlnet2&view=auto
> 
> 
> In my opinion it would be more convenient to have something of
> type 
> 
>    #Nethttp.http_header_ro -> Nethttp.cookie list
> 
> eg
> 
> let get_set_cookies mh =
>   List.map get_set_cookie (mh#multiple_field "set-cookie")

Right. Please keep in mind that the posted function is just an
extraction of a bigger program, and I didn't need more there.

> Further I propose to add a function to set cookies that accepts a
> cookie list, like 
> 
> let set_cookies mh l = 
>   Nethttp.Header.set_cookie mh 
>     (List.map (fun c -> (c.Nethttp.cookie_name, c.Nethttp.cookie_value)) l)

I usually resist including such convenience functions because (a) they
are ultimately simple and (b) the docs would be longer than the
function.

> The docs for http_call#request_header says
> 
>  The user should set the following headers:
> 
>     * Content-length: Set this to the length of the request body
>       if known. (The client falls back to HTTP 1.0 if not set!) 
> 
> Do I have to care about this when using
> Nethttp.Header.set_cookie?

I don't understand the question. The problem with Content-length is the
following. Older HTTP versions (i.e. 1.0) had no way to transfer request
messages of unknown length other than sending EOF. That means you can
only indicate the end of the request by closing the sending side of the
connection. Although HTTP 1.1 fixes that problem, a client cannot know
in advance whether it is talking to a 1.0 or a 1.1 server, so it has to
be 1.0-compatible. That means either including the Content-length
header or relying on EOF. (And there are still many 1.0-only servers
around!)

Of course, this does not have anything to do with cookies.
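
To make the Content-length point concrete, here is a minimal sketch using
only the standard library. The function request_head and its parameters are
hypothetical names for illustration; this is not how Http_client builds
requests internally, only a picture of the choice described above.

```ocaml
(* Hypothetical helper, not part of ocamlnet: render the head of an HTTP
   request for a body whose length may or may not be known in advance. *)
let request_head ~meth ~path ~host (body_length : int option) : string =
  let lines =
    match body_length with
    | Some n ->
        (* Length known: the receiver can tell where the body ends. *)
        [ Printf.sprintf "%s %s HTTP/1.1" meth path;
          "Host: " ^ host;
          Printf.sprintf "Content-length: %d" n ]
    | None ->
        (* Length unknown: fall back to 1.0 style; the sender signals the
           end of the body by closing its side of the connection (EOF). *)
        [ Printf.sprintf "%s %s HTTP/1.0" meth path;
          "Host: " ^ host ]
  in
  String.concat "\r\n" lines ^ "\r\n\r\n"

let body = "name=value"

let head_with_length =
  request_head ~meth:"POST" ~path:"/submit" ~host:"example.org"
    (Some (String.length body))

let head_without_length =
  request_head ~meth:"POST" ~path:"/submit" ~host:"example.org" None
```

With the length given, the head carries Content-length: 10 for the body
above; without it, only EOF can terminate the body.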

> From what I read in the docs, it was not clear to me if
> #request_header returns a copy of the header. I.e. do I have to
> #set_request_header after modifying the header? (It works
> without, so I guess #request_header does not copy.)

Right, it is not a copy, so there is no need to call
#set_request_header after modifying the header.
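
The effect of returning the header object itself rather than a copy can be
sketched with two stand-in classes. The classes header and call below are
hypothetical and only mimic the sharing behaviour; they are not the ocamlnet
classes.

```ocaml
(* Hypothetical stand-in for a mutable header object. *)
class header = object
  val mutable fields : (string * string) list = []
  method update_field name value =
    fields <- (name, value) :: List.remove_assoc name fields
  method field name = List.assoc name fields
end

(* Hypothetical stand-in for a call object: request_header hands out the
   very same header object every time, not a copy. *)
class call = object
  val hdr = new header
  method request_header = hdr
end

let c = new call

let () =
  let h = c#request_header in
  (* Modify the header through the returned object ... *)
  h#update_field "Cookie" "session=abc"

(* ... and the change is visible through the call object without setting
   the header back. *)
let cookie_seen_by_call = c#request_header#field "Cookie"
```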

> Yet another question: The docs for Netmime.mime_body_ro#value
> says it will return the decoded body. 

That means that any Base-64 or quoted-printable encoding is
automatically decoded. These encodings are only used for mail messages,
and not for HTTP.

> But in which encoding? For
> instance, if I want to extract pieces of an html page, what
> should I pass as in_enc:Netconversion.encoding to
> Netencoding.Html.decode? (At the moment decode_to_latin1 works
> fine with me, but that's probably not the right way.)

The character encoding is a different thing. It can be sent in two ways:

If there is a Content-type header in the response with a charset
parameter, that one takes precedence. It looks like

Content-type: text/html;charset=euc-kr

but the grammar allows more complex expressions as well. Use the
#content_type method of the response header to parse it, e.g.

let ct, ct_params = http_call#response_header#content_type () in
let charset = List.assoc "charset" ct_params in
let charset_s = Mimestring.param_value charset in
...

If List.assoc raises Not_found, there is a second way to get the
character encoding. The HTML document may contain 

<meta http-equiv="Content-type" content="text/html;charset=euc-kr">

There is unfortunately no other way than to HTML-parse the document and
look for this element (Nethtml should do well).

If this method also fails, there is no clean way of determining the
character encoding. Browsers usually fall back to something called
"auto-recognition" but it works only if you know the language (e.g. if
you know it is Japanese these algorithms can distinguish euc-jp from
Shift-JIS). There is no auto-recognition implementation in ocamlnet.
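
The lookup order described above (header first, then the meta element, then
give up) can be sketched with the standard library alone. The names
charset_of_content_type and detect_charset are hypothetical. The parser is
deliberately simplified: it splits on ';' and '=' and does not handle the
quoted strings or comments the full grammar permits, so for real headers the
#content_type method remains the better choice. The meta argument is assumed
to be the content attribute already extracted by an HTML parser.

```ocaml
(* Simplified extraction of the charset parameter from a Content-type
   value such as "text/html;charset=euc-kr". Does not handle quoting. *)
let charset_of_content_type (value : string) : string option =
  match String.split_on_char ';' value with
  | [] | [ _ ] -> None
  | _media_type :: params ->
      List.find_map
        (fun p ->
          match String.index_opt p '=' with
          | None -> None
          | Some i ->
              let name =
                String.lowercase_ascii (String.trim (String.sub p 0 i)) in
              let v =
                String.trim (String.sub p (i + 1) (String.length p - i - 1)) in
              if name = "charset" && v <> ""
              then Some (String.lowercase_ascii v)
              else None)
        params

(* Fallback chain: the response header counts first, then the content
   attribute of the meta element; None means the encoding is unknown. *)
let detect_charset ~(header : string option) ~(meta : string option)
    : string option =
  let from = function
    | None -> None
    | Some v -> charset_of_content_type v
  in
  match from header with
  | Some _ as found -> found
  | None -> from meta
```

For example, detect_charset ~header:(Some "text/html")
~meta:(Some "text/html;charset=euc-kr") falls through to the meta value,
and a None result is the case where only auto-recognition would be left.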

> Ocamlnet works now fine for me: Thanks for this great package!

Great to hear it!

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------

