From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 866997F0F9 for ; Mon, 23 Nov 2015 18:17:09 +0100 (CET) IronPort-PHdr: 9a23:pxH7cB+n5Gr4df9uRHKM819IXTAuvvDOBiVQ1KB80uocTK2v8tzYMVDF4r011RmSDdidtKoP27SempujcFJDyK7JiGoFfp1IWk1NouQttCtkPvS4D1bmJuXhdS0wEZcKflZk+3amLRodQ56mNBXsq3G/pQQfBg/4fVIsYL+lR8iN14/niaibwN76XUZhvHKFe7R8LRG7/036l/I9ps9cEJs30QbDuXBSeu5blitCLFOXmAvgtI/rpMYwuwwZgf8q9tZBXKPmZOx4COUAVHV1e1wyseTtqR7FBSGG7XsdVC1CmxxUBA7P5Rr6X5HZoyL6se070y6fa4m+Y6o9X7ul7rxcYhjijztPYzAj+Wfcjc1ryqhcqhW9jxdyysjaetfGGuB5e/bxZ84CDT5NRNtJRitOQYi1ao8nHe0BOqBTqIyr9AhGlge3GQT5XLCn8TRPnHKjmPRii+k= Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=drupyog+caml@zoho.com; spf=Pass smtp.mailfrom=drupyog+caml@zoho.com; spf=None smtp.helo=postmaster@sender153-mail.zoho.com Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of drupyog+caml@zoho.com) identity=pra; client-ip=74.201.84.153; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="drupyog+caml@zoho.com"; x-sender="drupyog+caml@zoho.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of drupyog+caml@zoho.com designates 74.201.84.153 as permitted sender) identity=mailfrom; client-ip=74.201.84.153; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="drupyog+caml@zoho.com"; x-sender="drupyog+caml@zoho.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@sender153-mail.zoho.com) identity=helo; client-ip=74.201.84.153; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="drupyog+caml@zoho.com"; x-sender="postmaster@sender153-mail.zoho.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0B9AABpSVNWm5lUyUpehA5vwREjhWwCgUU8EAEBAQEBAQEBEAEBAQEBCAkLCSEugi2CCAEBBCMVCwE0AQIOCxoCBRYLAgIJAwIBAgFFBgEMCAEBiBQBAwEEDQ2uDnGEaQKFXSIoJYRVAQEBAQEBBAEBAQEBAQEVAwSBAYpRgnGFBIFEllWBXINIgnKFG4FbgUCGAo88g3I4glIdgUFxhSsBAQE X-IPAS-Result: A0B9AABpSVNWm5lUyUpehA5vwREjhWwCgUU8EAEBAQEBAQEBEAEBAQEBCAkLCSEugi2CCAEBBCMVCwE0AQIOCxoCBRYLAgIJAwIBAgFFBgEMCAEBiBQBAwEEDQ2uDnGEaQKFXSIoJYRVAQEBAQEBBAEBAQEBAQEVAwSBAYpRgnGFBIFEllWBXINIgnKFG4FbgUCGAo88g3I4glIdgUFxhSsBAQE X-IronPort-AV: E=Sophos;i="5.20,338,1444687200"; d="scan'208";a="154849586" Received: from sender153-mail.zoho.com ([74.201.84.153]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/AES128-SHA; 23 Nov 2015 18:17:08 +0100 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=zapps768; d=zoho.com; h=from:subject:to:references:cc:message-id:date:user-agent:mime-version:in-reply-to:content-type; b=Ufxf4azd3iI0NOw2F8phaQbfCDoDJwPKR9quF62JNQ0Vb+/5+RxGp54viRpSYkeCW4y/ltLJfoS0 y6qEFYYTooHFKFkGR/XPS7iyiDhU7wqQL84X2LDpC2HS8q3uDqoG Received: from [192.168.1.106] (perens.inria.fr [128.93.60.79]) by mx.zohomail.com with SMTPS id 1448299003643518.3647349067213; Mon, 23 Nov 2015 09:16:43 -0800 (PST) From: Drup To: Anton Bachin , =?UTF-8?Q?Fran=c3=a7ois_Bobot?= References: <4824377F-4045-4D47-9BAB-E06B0C939988@yahoo.com> <564AF405.10400@cea.fr> <1A569326-8749-4332-A88C-719165728F20@yahoo.com> <5652EDFC.5070105@cea.fr> <98E819C0-76A2-4038-A5E6-DFBDC08DF7FA@yahoo.com> Cc: caml-list@inria.fr Message-ID: <565349F9.6020405@zoho.com> Date: Mon, 23 Nov 2015 18:16:41 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <98E819C0-76A2-4038-A5E6-DFBDC08DF7FA@yahoo.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Caml-list] [ANN] Lambda Soup - HTML scraping and rewriting with CSS selectors There seems to be a slight misunderstanding about how tyxml is constructed, so let me clarify things a bit. - Tyxml doesn't have a canonical xml datatype, it's functorized over a generic Xml signature (implemented in [Xml_sigs.T]). As far as tyxml is concerned, xml nodes are a fully abstract type and can only be constructed. Multiple modules implements this signature in the ocsigen stack (two in js_of_ocaml's Tyxml_js, tree in eliom) that presents different characteristics. In particular some of them are really abstracts (React signals ...) and I doubt you could construct selectors over them in a meaningful way (but I would be happy to be proven wrong). - Another signature, [Xml_sigs.ITERABLE], implement global iteration over xml trees. It is not necessary for an XML implementation used by tyxml to respect it and, in particular, it is not implemented for js_of_ocaml's Tyxml_js. As pointed out previously, it doesn't make sense for all implementations, but we could implement it for some of them. - There is no signature for mutation (at the moment). This may be an interesting improvement. - The [Xml] module implements a "bare" XML datatype that is not really used by ocsigen, but can be used to build simple xml trees in a typeful manner (and then print them). It also answers ITERABLE. Now, in order to type lambda_soup using tyxml's types: It's going to be a bit of work. You can perfectly reuse all tyxml's type, but you need typeful combinators instead of strings, otherwise you have no way to know what your selection is going to return. You may be able to cheat your way through by creating a fake xml module and instantiate tyxml's functors on it to create all the combinators (that would be fun :p) In any case, you will pay typesafety by a significant increase in verbosity and awkwardness. I'm not sure it's worth the effort, since a lot of real world html trees are not correct and that you never really need to select tyxml-constructed trees anyway. Simple compatibility with tyxml is much easier: you just have to agree with tyxml's signatures (which would deserve a bit of a cleanup). [Xml_sigs.T]: https://github.com/ocsigen/tyxml/blob/master/lib/xml_sigs.mli#L21 [Xml_sigs.ITERABLE]: https://github.com/ocsigen/tyxml/blob/master/lib/xml_sigs.mli#L70 [Xml]: https://github.com/ocsigen/tyxml/blob/master/lib/xml.mli [Tyxml_js]: https://github.com/ocsigen/js_of_ocaml/blob/master/lib/tyxml/tyxml_js.mli