discuss@mandoc.bsd.lv
From: "Anthony J. Bentley" <anthony@cathet.us>
To: discuss@mdocml.bsd.lv
Cc: Ted Unangst <tedu@tedunangst.com>
Subject: Accents vs. combining accents
Date: Wed, 26 Feb 2014 00:51:38 -0700
Message-ID: <CAFRrxynppe1pB7_hc=oUGKaOQr+XvhyweBJU0zOH0=63P5gGMQ@mail.gmail.com>

Mandoc misbehaves a bit when printing accents in UTF-8. In summary:

- Under normal circumstances, \` and \' should print spacing accents
and not combining accents.
- Maybe we should consider printing real quotes (U+2018 and U+2019)
for raw ` and ' in UTF-8 mode. Maybe worth bringing up on the groff
list too.


texinfo2man (part of the devel/gindent build process) converts the
following info source:

`slithy_toves.c'

into the following man(7) source:

\`slithy_toves.c\'

This is, of course, wrong. ` and ' in TeX represent left and right
single quotes, but in troff \` and \' are accents, not quotation
marks, so this is a bug in texinfo2man.

mandoc also has a bug in this situation. It represents \` and \' as
combining grave and acute accents (U+0300 and U+0301, respectively).
But according to groff.info, section "Using Symbols":


 -- Escape: \'
     This is a backslash followed by the apostrophe character, ASCII
     character `0x27' (EBCDIC character `0x7D').  The same as `\[aa]',
     the acute accent.

 -- Escape: \`
     This is a backslash followed by ASCII character `0x60' (EBCDIC
     character `0x79' usually).  The same as `\[ga]', the grave accent.


And in turn, groff_char(7):


       The composite request is used to map most of the accents to non-spacing
       glyph names; the values given in parentheses are the original (spacing)
       ones.

       Output   Input   PostScript     Unicode         Notes
       ------------------------------------------------------------
       '        \[aa]   acute          u0301 (u00B4)   +
       `        \[ga]   grave          u0300 (u0060)   +


In situations with no composite request (I guess?), mandoc should
print U+0060 and U+00B4 for \` and \' respectively, as groff does.
Wrongly printing combining accents as it does now leads to dramatic
visual artifacts throughout the manpage.
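
To illustrate the difference, here is a minimal Python sketch (not mandoc's actual code, which is C). The code points come from the groff_char(7) table quoted above; the table and function names are made up for this example, and the naive string replacement stands in for a real roff parser.

```python
import unicodedata

# Spacing code points groff uses for \` and \' when no composite
# request applies (values in parentheses in the groff_char(7) table).
SPACING = {r"\`": "\u0060", r"\'": "\u00b4"}

# Combining code points, which is what mandoc printed as described above.
COMBINING = {r"\`": "\u0300", r"\'": "\u0301"}


def render(src, table):
    """Replace the \\` and \\' escapes in src using the given table.

    Hypothetical helper: plain substring replacement, no roff parsing.
    """
    out = src
    for escape, char in table.items():
        out = out.replace(escape, char)
    return out


def has_combining(text):
    """Return True if any character in text is a combining mark."""
    return any(unicodedata.combining(c) != 0 for c in text)


src = r"\`slithy_toves.c\'"
spaced = render(src, SPACING)
combined = render(src, COMBINING)

# ascii() makes the otherwise hard-to-see code points visible.
print(ascii(spaced), has_combining(spaced))
print(ascii(combined), has_combining(combined))
```

A combining mark attaches to whatever character precedes it, so the trailing U+0301 lands on top of the final "c" and the leading U+0300 on whatever came before the word. That is where the visual artifacts come from.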

(Side note: in TeXinfo, ` and ' represent quote marks while \` and \'
represent accents. In troff, likewise, ` and ' represent quote marks
while \` and \' represent accents. So this is a bit of overzealous
escaping on texinfo2man's part. But then again, it looks like neither
groff nor mandoc actually renders ` and ' as quotes except in print
formats like PDF. Which means that (unless groff and/or mandoc start
converting ` and ' in UTF-8 output, which neither currently does) the
real correct characters there are \(oq and \(cq. texinfo2man had a
patch submitted to use those in 2005, but it never got committed...
sigh. I'll nudge upstream, and try to push this to ports after
unlock.)
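
For what it's worth, the conversion I have in mind could look roughly like this. It is a minimal Python sketch and not the 2005 patch or texinfo2man's actual code. The function name and the regular expression are hypothetical, and it assumes a quoted span sits on one line and contains no apostrophe of its own.

```python
import re

# Hypothetical pattern: a `quoted' span on a single line with no
# backtick or apostrophe inside it.
QUOTED = re.compile(r"`([^`'\n]*)'")


def texinfo_quotes_to_roff(line):
    """Rewrite `text' as \\(oqtext\\(cq instead of \\`text\\'."""
    # A function is used as the replacement so the backslashes in the
    # roff escapes are not interpreted by re.sub.
    return QUOTED.sub(lambda m: r"\(oq" + m.group(1) + r"\(cq", line)


print(texinfo_quotes_to_roff("`slithy_toves.c'"))
```

Contractions inside a quoted span (`don't') would confuse this pattern, so a real converter would need something smarter than a single regex.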

(Boy, lots of manpages wrongly escape the ' character. afm2pl(1),
bzr(1), curl(1), kpsetool(1), lacheck(1), makeindex(1), mendex(1),
pdfinfo(1), pdftops(1), xsltproc(1)... Maybe people should be forced
to run their manpages through gropdf and make sure they look
typographically pretty!)
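
A quick way to find candidates like these is to scan the man source for \` and \' escapes. The following is a rough Python sketch under my own assumptions: the names are invented, it does not parse roff, and it only reports where the escapes occur. Whether a given hit is a genuine accent or a mis-escaped quote still needs a human.

```python
import re

# Heuristic: a backslash followed by ` or ', where the backslash is
# not itself preceded by another backslash.
ACCENT_ESCAPE = re.compile(r"(?<!\\)\\[`']")


def find_accent_escapes(text):
    """Return (line, column, escape) tuples, both numbers 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCENT_ESCAPE.finditer(line):
            hits.append((lineno, match.start() + 1, match.group(0)))
    return hits


sample = ".SH NAME\n\\`slithy_toves.c\\'\n"
for lineno, column, escape in find_accent_escapes(sample):
    print(f"line {lineno}, column {column}: {escape}")
```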

-- 
Anthony J. Bentley