caml-list - the Caml user's mailing list
From: John Whitington <john@coherentgraphics.co.uk>
To: "Armaël Guéneau" <armael.gueneau@ens-lyon.fr>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] ANN: CamlPDF 1.7
Date: Fri, 16 Aug 2013 15:26:21 +0100
Message-ID: <520E368D.40604@coherentgraphics.co.uk>
In-Reply-To: <520E10E1.5020701@ens-lyon.fr>

Hi,

Armaël Guéneau wrote:
>> So 0o019 looks like a floating acute in that encoding, followed by a
>> kern of 486/1000 of a point to shift leftward, followed by an 'e'. So,
>> this is an accented character built by composition of glyphs.
>>
>>> For "efficient", with "ffi" being ligated, I get
>>>
>>> Pdfops_TJ (Pdf.Array [Pdf.String "e\014cient"])
>>
>> In the font in use here, character 0o014 appears to be a single glyph
>> for the ffi ligature.
> Yes, ok. How do you know that? I mean, without knowing the displayed text.
> Is there a way, knowing the glyph code (here, 0o019 or 0o014), to convert
> it to something more "readable"? Like, hum, ['] for the floating acute,
> and [ffi] for the ligature.

Sometimes you can, sometimes not; this is what the Pdftext module tries
to do. Modern PDFs usually have a /ToUnicode entry for each font, which
points to a special data structure (a CMap) mapping bytes or sequences
of bytes directly to Unicode code points or sequences of Unicode code
points.
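To make the idea concrete, here is a minimal sketch in plain OCaml of what a /ToUnicode lookup amounts to. It is not CamlPDF's API: the table `to_unicode` and the functions `codepoints_of_byte`, `codepoints_of_string` and `utf8_of_codepoints` are hypothetical. It assumes single-byte character codes (real CMaps may use multi-byte codes) and, purely for brevity, passes unmapped bytes through as ASCII. Code 14 is mapped to three code points to show that one byte can stand for a sequence of characters.

```ocaml
(* Hypothetical table for one font: byte code -> Unicode code points.
   In a real PDF this would be parsed from the font's /ToUnicode CMap. *)
let to_unicode : (int * int list) list =
  [ (14, [0x66; 0x66; 0x69]) ]  (* code 14 -> "f" "f" "i" *)

(* Look up one byte. Bytes missing from the table are assumed, for this
   sketch only, to be ASCII and are passed through unchanged. *)
let codepoints_of_byte c =
  match List.assoc_opt (Char.code c) to_unicode with
  | Some cps -> cps
  | None -> [Char.code c]

(* Decode a whole string from a text-showing operator. *)
let codepoints_of_string s =
  List.concat_map codepoints_of_byte (List.of_seq (String.to_seq s))

(* Encode a list of code points as UTF-8 text. *)
let utf8_of_codepoints cps =
  let b = Buffer.create 16 in
  List.iter (fun cp -> Buffer.add_utf_8_uchar b (Uchar.of_int cp)) cps;
  Buffer.contents b

let () = print_endline (utf8_of_codepoints (codepoints_of_string "e\014cient"))
```

With this table, the string from the TJ operator above decodes to "efficient".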

In the absence of this, one can fall back on other parts of the font
metadata, such as its /Encoding entry, or even on the font program
itself, which may reveal the encoding in use.
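A sketch of that fallback path, again in plain OCaml and with hypothetical names throughout: the byte code is first turned into a glyph name using the font's encoding, and the glyph name is then mapped to Unicode with a table in the style of the Adobe Glyph List. The encoding here is an assumption, a two-entry subset of the TeX OT1 layout, in which code 14 is "ffi" and code 19 is "acute"; a real extractor would build it from /Encoding or the embedded font. Note that OCaml's `\ddd` string escapes are decimal, so the `\014` in `"e\014cient"` is character code 14.

```ocaml
(* Assumed encoding: a small subset of the TeX OT1 layout. *)
let encoding : (int * string) list =
  [ (14, "ffi"); (19, "acute") ]

(* A few glyph-name -> code point entries, as in the Adobe Glyph List. *)
let glyph_to_unicode : (string * int) list =
  [ ("ffi", 0xFB03); ("acute", 0x00B4) ]

(* Glyph name for a byte code. Codes not in the table are treated, for
   brevity, as ASCII characters whose glyph name is the character. *)
let glyphname_of_code c =
  match List.assoc_opt c encoding with
  | Some name -> name
  | None -> String.make 1 (Char.chr c)

(* Code point for a glyph name, if one is known. *)
let codepoint_of_glyphname name =
  match List.assoc_opt name glyph_to_unicode with
  | Some cp -> Some cp
  | None when String.length name = 1 -> Some (Char.code name.[0])
  | None -> None  (* unknown name, e.g. ".notdef": no mapping *)

(* Decode a string to UTF-8, using U+FFFD where no mapping exists. *)
let text_of_string s =
  let b = Buffer.create 16 in
  String.iter
    (fun ch ->
      match codepoint_of_glyphname (glyphname_of_code (Char.code ch)) with
      | Some cp -> Buffer.add_utf_8_uchar b (Uchar.of_int cp)
      | None -> Buffer.add_utf_8_uchar b Uchar.rep)
    s;
  Buffer.contents b

let () = print_endline (text_of_string "e\014cient")
```

Here the ligature survives as the single character U+FB03; an extractor could equally expand it to "f", "f", "i". The acute, by contrast, comes out as a spacing U+00B4 next to the "e" rather than as a composed character.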

> I tried to copy paste the text from the pdf using evince, and the
> floating acute is indeed rendered separately, but the ligature is
> properly converted to "ffi".

> I guess the interpretation of the glyph code depends on the font, but I
> don't find how to do that with CamlPDF - using glyphnames_of_text just
> returned only "/.notdef"...

So it looks as though only the ffi ligature will be recoverable; the
accent, being a separate glyph, will come out on its own. I can't really
comment further without seeing the PDF and your code...

Thanks,

-- 
John Whitington
Director, Coherent Graphics Ltd
http://www.coherentpdf.com/