I did an evaluation of HTML parsers back in February. Most of the options
are XML parsers, and a lot of them are very old. Other than Nethtml, I came
up with two alternatives to consider:

http://erratique.ch/software/xmlm
https://github.com/facebook/pfff/tree/master/lang_html

I didn't end up spending much time on either. It quickly became clear that
Nethtml was what I needed. It handles content that isn't strictly valid,
which was important to me, and has good performance.

Cheers,
Andy

On Mon, Aug 11, 2014 at 4:57 PM, Jacques du Preez wrote:

> Thanks. I eventually discovered ocamlnet, but I'm hoping there's maybe
> more than 1 option?
>
> ==============================
> Jacques du Preez
>
> Web: OpenLandscape.net
> Twitter: @jacquesdp
>
>
> On Sun, Aug 10, 2014 at 10:42 PM, Christophe Troestler
> <Christophe.Troestler@umons.ac.be> wrote:
>
>> Hi,
>>
>> On Sun, 10 Aug 2014 19:38:39 +0200, Jacques du Preez wrote:
>> >
>> > I've been searching for an OCaml library to parse HTML, and then be
>> > able to query and manipulate it similar to jQuery.
>> >
>> > The JSoup Java library, http://jsoup.org, allows me to do this. Is
>> > there something like this for OCaml?
>>
>> Nethtml in ocamlnet partly does what you need (you can easily write
>> recursive functions to extract the desired data from the HTML tree).
>>
>> Best,
>> C.