caml-list - the Caml user's mailing list
From: Drup <drupyog+caml@zoho.com>
To: "Anton Bachin" <antonbachin@yahoo.com>,
	"François Bobot" <francois.bobot@cea.fr>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] [ANN] Lambda Soup - HTML scraping and rewriting with CSS selectors
Date: Mon, 23 Nov 2015 18:16:41 +0100
Message-ID: <565349F9.6020405@zoho.com>
In-Reply-To: <98E819C0-76A2-4038-A5E6-DFBDC08DF7FA@yahoo.com>

There seems to be a slight misunderstanding about how tyxml is 
constructed, so let me clarify things a bit.

- Tyxml doesn't have a canonical xml datatype; it is functorized over a 
generic Xml signature (defined in [Xml_sigs.T]). As far as tyxml is 
concerned, xml nodes are a fully abstract type and can only be 
constructed. Multiple modules implement this signature in the ocsigen 
stack (two in js_of_ocaml's Tyxml_js, three in eliom), and they have 
different characteristics. In particular, some of them are really 
abstract (React signals ...) and I doubt you could construct selectors 
over them in a meaningful way (but I would be happy to be proven wrong).
- Another signature, [Xml_sigs.ITERABLE], implements global iteration 
over xml trees. An XML implementation used by tyxml is not required to 
satisfy it and, in particular, it is not implemented for 
js_of_ocaml's Tyxml_js. As pointed out previously, it doesn't make sense 
for all implementations, but we could implement it for some of them.
- There is no signature for mutation (at the moment). This may be an 
interesting improvement.
- The [Xml] module implements a "bare" XML datatype that is not really 
used by ocsigen, but can be used to build simple xml trees in a typeful 
manner (and then print them). It also implements ITERABLE.
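To make the points above concrete, here is a minimal sketch of the 
layering. It does not use tyxml's real signatures: XML, ITERABLE, Tree, 
Printer, Make and Count are simplified, hypothetical stand-ins, and the 
Printer module is only an invented example of an implementation with 
nothing to iterate over.

```ocaml
(* Simplified stand-in for an abstract XML signature: nodes can only be
   constructed, never inspected. Not tyxml's actual Xml_sigs.T. *)
module type XML = sig
  type elt
  val pcdata : string -> elt
  val node : string -> elt list -> elt
end

(* Hypothetical extension that also allows inspection, in the spirit of
   an ITERABLE signature. *)
module type ITERABLE = sig
  include XML
  type view = Text of string | Node of string * elt list
  val view : elt -> view
end

(* A "bare" tree implementation: supports construction and iteration. *)
module Tree : ITERABLE = struct
  type elt = T of string | N of string * elt list
  type view = Text of string | Node of string * elt list
  let pcdata s = T s
  let node name children = N (name, children)
  let view = function T s -> Text s | N (n, cs) -> Node (n, cs)
end

(* An implementation that renders while building: no tree is kept, so
   there is nothing meaningful to iterate over afterwards. *)
module Printer : XML with type elt = string = struct
  type elt = string
  let pcdata s = s
  let node name children =
    Printf.sprintf "<%s>%s</%s>" name (String.concat "" children) name
end

(* The combinator layer is a functor over any XML implementation. *)
module Make (X : XML) = struct
  let txt = X.pcdata
  let p children = X.node "p" children
  let div children = X.node "div" children
end

(* Generic traversals can only be written against ITERABLE. *)
module Count (X : ITERABLE) = struct
  let rec count_tag tag e =
    match X.view e with
    | X.Text _ -> 0
    | X.Node (n, cs) ->
      (if n = tag then 1 else 0)
      + List.fold_left (fun acc c -> acc + count_tag tag c) 0 cs
end

module H = Make (Tree)
module C = Count (Tree)
let doc = H.div [ H.p [ H.txt "a" ]; H.p [ H.txt "b" ] ]
let n_paragraphs = C.count_tag "p" doc

module S = Make (Printer)
let printed = S.div [ S.p [ S.txt "a" ] ]
```

Note that Count (Printer) is rejected by the type checker, since Printer 
does not provide view. That is the situation described above for 
implementations where iteration makes no sense.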

Now, in order to type lambda_soup using tyxml's types: it's going to be 
a bit of work. You can perfectly well reuse all of tyxml's types, but you 
need typeful combinators instead of strings, otherwise you have no way to 
know what your selection is going to return. You may be able to cheat 
your way through by creating a fake xml module and instantiating tyxml's 
functors on it to create all the combinators (that would be fun :p)
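
As a rough illustration of "typeful combinators instead of strings", the 
sketch below tags selectors and results with a phantom type. Everything 
here (node, elt, selector, select, anchor_tag) is hypothetical and is 
neither lambda_soup's nor tyxml's API; it only shows how the type of a 
selector can determine the type of what it returns.

```ocaml
(* Untyped tree, standing in for a scraped document. *)
type node = { tag : string; children : node list }

(* The phantom parameter 'a records which kind of element a value
   denotes; it has no runtime representation. *)
type 'a elt = Elt of node
type 'a selector = Sel of string

(* Typed selectors replace raw strings such as "div" or "a". *)
let div : [ `Div ] selector = Sel "div"
let a : [ `A ] selector = Sel "a"

(* Plain, untyped search for every node with the given tag. *)
let rec select_raw tag n =
  let rest = List.concat_map (select_raw tag) n.children in
  if n.tag = tag then n :: rest else rest

(* The result type is determined by the selector's phantom type. *)
let select : 'a selector -> node -> 'a elt list =
  fun (Sel tag) root -> List.map (fun n -> Elt n) (select_raw tag root)

(* A function that only accepts anchor elements. *)
let anchor_tag : [ `A ] elt -> string = fun (Elt n) -> n.tag

let doc =
  { tag = "div";
    children =
      [ { tag = "a"; children = [] };
        { tag = "p"; children = [ { tag = "a"; children = [] } ] } ] }

let links = select a doc
let tags = List.map anchor_tag links
```

With these definitions, List.map anchor_tag (select div doc) does not 
type-check, whereas a string-based select would accept it silently. The 
cost is one combinator per element kind, which is where the extra 
verbosity comes from.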

In any case, you will pay for type safety with a significant increase in 
verbosity and awkwardness. I'm not sure it's worth the effort, since a 
lot of real-world html trees are not correct and you never really need 
to run selections on tyxml-constructed trees anyway. Simple compatibility 
with tyxml is much easier: you just have to agree with tyxml's signatures 
(which would deserve a bit of a cleanup).

[Xml_sigs.T]: 
https://github.com/ocsigen/tyxml/blob/master/lib/xml_sigs.mli#L21
[Xml_sigs.ITERABLE]: 
https://github.com/ocsigen/tyxml/blob/master/lib/xml_sigs.mli#L70
[Xml]: https://github.com/ocsigen/tyxml/blob/master/lib/xml.mli
[Tyxml_js]: 
https://github.com/ocsigen/js_of_ocaml/blob/master/lib/tyxml/tyxml_js.mli



