I evaluated HTML parsers back in February. Most of the available options
are really XML parsers, and a lot of them are very old. Other than Nethtml,
I came up with two alternatives to consider:
http://erratique.ch/software/xmlm
https://github.com/facebook/pfff/tree/master/lang_html
I didn't end up spending much time on either. It quickly became clear that
Nethtml was what I needed. It handles content that isn't strictly valid,
which was important to me, and has good performance.
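
For what it's worth, here is a minimal sketch of the kind of recursive
traversal Christophe mentions below. To keep it runnable without ocamlnet
installed, it declares a local type with the same shape as Nethtml.document
(Element of name, attributes and children, or Data of text). The helper names
find_all, attr and text and the sample tree are made up for illustration;
they are not part of Nethtml. It assumes OCaml 4.08 or later for
List.filter_map.

```ocaml
(* Local stand-in with the same shape as Nethtml.document, so the sketch
   runs on its own. With ocamlnet you would use Nethtml.document directly. *)
type document =
  | Element of string * (string * string) list * document list
  | Data of string

(* Hypothetical helper: collect every element with the given tag name,
   in document order, including nested matches. *)
let rec find_all tag doc =
  match doc with
  | Data _ -> []
  | Element (name, _, children) ->
    let below = List.concat (List.map (find_all tag) children) in
    if name = tag then doc :: below else below

(* Hypothetical helper: look up an attribute; None for text nodes. *)
let attr name = function
  | Element (_, attrs, _) -> List.assoc_opt name attrs
  | Data _ -> None

(* Hypothetical helper: concatenate all text beneath a node. *)
let rec text = function
  | Data s -> s
  | Element (_, _, children) -> String.concat "" (List.map text children)

(* Hand-written sample tree, standing in for a parsed page. *)
let sample =
  Element ("div", [], [
    Element ("a", ["href", "http://jsoup.org"], [Data "JSoup"]);
    Element ("p", [], [
      Data "See ";
      Element ("a", ["href", "http://erratique.ch/software/xmlm"],
               [Data "xmlm"]);
    ]);
  ])

(* Roughly what a jQuery-style "select all links" would give you. *)
let links = List.filter_map (attr "href") (find_all "a" sample)
let labels = List.map text (find_all "a" sample)
```

With ocamlnet, the tree would come from Nethtml.parse applied to an input
channel object such as (new Netchannels.input_string html), which returns a
document list, so you would map these helpers over that list.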
Cheers,
Andy
On Mon, Aug 11, 2014 at 4:57 PM, Jacques du Preez wrote:
> Thanks. I eventually discovered ocamlnet, but I'm hoping there's maybe
> more than 1 option?
>
> ==============================
> Jacques du Preez
>
> Web: OpenLandscape.net
> Twitter: @jacquesdp
>
>
> On Sun, Aug 10, 2014 at 10:42 PM, Christophe Troestler <
> Christophe.Troestler@umons.ac.be> wrote:
>
>> Hi,
>>
>> On Sun, 10 Aug 2014 19:38:39 +0200, Jacques du Preez wrote:
>> >
>> > I've been searching for an OCaml library to parse HTML, and then be
>> > able to query and manipulate it similar to jQuery.
>> >
>> > The JSoup Java library, http://jsoup.org, allows me to do this. Is
>> > there something like this for OCaml?
>>
>> Nethtml in ocamlnet partly does what you need (you can easily write
>> recursive functions to extract the desired data from the HTML tree).
>>
>> Best,
>> C.
>>