Hi,

On 15/08/2013 13:21, John Whitington wrote:
> The first new release of the CamlPDF library for a while is here:
>
> http://www.github.com/johnwhitington/camlpdf
>
> (Or, shortly, via OPAM.)

Thanks!

I have been playing with CamlPDF a bit, trying to do text extraction. I'm a total novice when it comes to the PDF format, so I might be doing it wrong, but I was wondering whether CamlPDF has facilities to handle diacritics and ligatures.

For example, when reading the PDF operators for "Université", I get

  Pdfops_TJ
    (Pdf.Array
      [Pdf.String "Universit"; Pdf.String "\019"; Pdf.Real 486.; Pdf.String "e"])

For "efficient", with "ffi" being ligated, I get

  Pdfops_TJ (Pdf.Array [Pdf.String "e\014cient"])

How can I convert these back to the original text, especially the ligature? I tried the conversion functions in Pdftext, such as codepoints_of_text followed by utf8_of_codepoints, but that didn't seem to work. It's quite possible that I'm doing something wrong there too.

Armaël