caml-list - the Caml user's mailing list
From: Anton Bachin <antonbachin@yahoo.com>
To: Drup <drupyog+caml@zoho.com>
Cc: "François Bobot" <francois.bobot@cea.fr>, caml-list@inria.fr
Subject: Re: [Caml-list] [ANN] Lambda Soup - HTML scraping and rewriting with CSS selectors
Date: Mon, 23 Nov 2015 11:35:41 -0600
Message-ID: <5D7BF541-7E63-4A21-842A-8C34F36B550B@yahoo.com>
In-Reply-To: <565349F9.6020405@zoho.com>


> There seems to be a slight misunderstanding about how tyxml is constructed, so let me clarify things a bit.

Thanks. I will still have to look at tyxml, though.

> Now, in order to type lambda_soup using tyxml's types: It's going to be a bit of work. You can perfectly reuse all tyxml's type, but you need typeful combinators instead of strings, otherwise you have no way to know what your selection is going to return. You may be able to cheat your way through by creating a fake xml module and instantiate tyxml's functors on it to create all the combinators (that would be fun :p)

Does tyxml have checked coercions? I was thinking of something like
filtering a traversal by a checked coercion. This is how Lambda Soup
currently traverses elements: while traversing nodes, it filters by a
checked coercion to elements. Typed selection, as you suggest, is
another possibility, but my guess is that it would take quite a while
to design something that is easy to learn and not very challenging to
type, if that is possible at all, as you seem to agree.
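
To make "filtering a traversal by a checked coercion" concrete, here is
a minimal, self-contained sketch. It is only a model of the idea: the
types and the names of_raw, element, name, descendants and
element_descendants are hypothetical and are not taken from Lambda
Soup's or tyxml's actual interfaces.

```ocaml
(* Hypothetical model; not the API of Lambda Soup or tyxml. *)

(* Untyped tree, as a parser might produce it from arbitrary input. *)
type raw =
  | Text of string
  | Element of string * raw list

(* Phantom-typed handle: the parameter records what is known statically. *)
type general
type element
type 'kind node = Node of raw

(* Any raw tree can be viewed as a general node. *)
let of_raw (r : raw) : general node = Node r

(* Checked coercion: succeeds only if the node really is an element. *)
let element (Node r : _ node) : element node option =
  match r with
  | Element _ -> Some (Node r)
  | Text _ -> None

(* Accepts only element nodes, so callers need no failure case. *)
let name (Node r : element node) : string =
  match r with
  | Element (n, _) -> n
  | Text _ -> assert false (* not reachable via [element] *)

(* All descendants in document order, excluding the node itself. *)
let rec raw_descendants = function
  | Text _ -> []
  | Element (_, children) ->
    List.concat_map (fun c -> c :: raw_descendants c) children

let descendants (Node r : _ node) : general node list =
  List.map of_raw (raw_descendants r)

(* The traversal, filtered by the checked coercion. *)
let element_descendants (n : _ node) : element node list =
  List.filter_map element (descendants n)

let () =
  let doc =
    of_raw (Element ("p", [Text "a"; Element ("b", [Text "c"]); Element ("i", [])]))
  in
  List.iter (fun e -> print_endline (name e)) (element_descendants doc)
```

In this model, applying name to a general node is rejected by the type
checker; the only route to an element node is through the coercion. A
real library would also hide the Node constructor behind a signature.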

> In any case, you will pay typesafety by a significant increase in verbosity and awkwardness. I'm not sure it's worth the effort, since a lot of real world html trees are not correct and that you never really need to select tyxml-constructed trees anyway. Simple compatibility with tyxml is much easier: you just have to agree with tyxml's signatures (which would deserve a bit of a cleanup).

This is what I would go for by default, since without resorting to
coercions, that is the best typing you could hope for when parsing.
My main concern beyond that, as expressed in my previous message, is
how Lambda Soup could best interact at the type level with trees
constructed by tyxml, not what types it (or any other OCaml library)
could assign to a tree constructed from arbitrary input. I suppose
that if people really never need to select on tyxml trees, as you say,
then Lambda Soup and tyxml simply address different use cases that
don’t interact very much, which is what I suspected from the
beginning. Having no experience with tyxml, however, I would like more
feedback :)
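
As a rough illustration of what "agreeing on signatures" could mean,
the sketch below writes a builder against a small module type and then
instantiates it twice. Everything here is an assumption made for the
example: XML_LIKE is a hypothetical, much-reduced stand-in and not
tyxml's real signature, and Page, Tree and Printer are invented names.

```ocaml
(* Hypothetical stand-in; tyxml's real signatures are larger and differ. *)
module type XML_LIKE = sig
  type elt
  val pcdata : string -> elt
  val node : string -> elt list -> elt
end

(* A document builder written only against the signature. *)
module Page (X : XML_LIKE) = struct
  let greeting who =
    X.node "p" [X.pcdata "Hello, "; X.node "b" [X.pcdata who]]
end

(* One implementation: a plain tree that a selection library could walk. *)
module Tree = struct
  type elt = Text of string | Element of string * elt list
  let pcdata s = Text s
  let node name children = Element (name, children)
end

(* Another: direct serialization (no escaping; illustration only). *)
module Printer = struct
  type elt = string
  let pcdata s = s
  let node name children =
    Printf.sprintf "<%s>%s</%s>" name (String.concat "" children) name
end

module Tree_page = Page (Tree)
module Printed_page = Page (Printer)

let () = print_endline (Printed_page.greeting "world")
```

The same Page code yields either a traversable tree or a string,
depending on which module it is applied to. That is the kind of
interaction that sharing signatures would allow, without assigning
precise types to parsed input.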

Regards,
Anton

