discuss@mandoc.bsd.lv
* Accents vs. combining accents
@ 2014-02-26  7:51 Anthony J. Bentley
  2014-02-28 17:06 ` Ingo Schwarze
From: Anthony J. Bentley @ 2014-02-26  7:51 UTC (permalink / raw)
  To: discuss; +Cc: Ted Unangst

Mandoc misbehaves a bit when printing accents in UTF-8. In summary:

- Under normal circumstances, \` and \' should print spacing accents
and not combining accents.
- Maybe we should consider printing real quotes (U+2018/9) on raw `
and ' in UTF-8 mode. Maybe worth bringing up on the groff list too.


texinfo2man (part of the devel/gindent build process) converts the
following info source:

`slithy_toves.c'

into the following man(7) source:

\`slithy_toves.c\'

This is, of course, wrong. ` and ' in TeX represent left and right
single quotes, but in troff \` and \' are accents, not quotation
marks, so this is a bug in texinfo2man.

mandoc also has a bug in this situation. It represents \` and \' as
combining grave and acute accents (U+0300 and U+0301, respectively).
But according to groff.info, section "Using Symbols":


 -- Escape: \'
     This is a backslash followed by the apostrophe character, ASCII
     character `0x27' (EBCDIC character `0x7D').  The same as `\[aa]',
     the acute accent.

 -- Escape: \`
     This is a backslash followed by ASCII character `0x60' (EBCDIC
     character `0x79' usually).  The same as `\[ga]', the grave accent.


And in turn, groff_char(7):


       The composite request is used to map most of the accents to non-spacing
       glyph names; the values given in parentheses are the original (spacing)
       ones.

       Output   Input   PostScript     Unicode         Notes
       ------------------------------------------------------------
       '        \[aa]   acute          u0301 (u00B4)   +
       `        \[ga]   grave          u0300 (u0060)   +


In situations with no composite request (I guess?), mandoc should
print U+0060 and U+00B4 for \` and \' respectively, as groff does.
Wrongly printing combining accents, as it does now, makes each accent
attach to whatever character precedes it, which leads to dramatic
visual artifacts throughout the manpage.
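
To make the difference concrete, here is a rough sketch that renders
the texinfo2man output from above with both mappings. The table names
and the render() helper are made up for illustration; this is not how
mandoc is structured internally, only the code points come from the
documentation quoted above.

```python
import unicodedata

# Hypothetical lookup tables, for illustration only.
SPACING = {"\\`": "\u0060", "\\'": "\u00b4"}    # spacing grave / acute
COMBINING = {"\\`": "\u0300", "\\'": "\u0301"}  # combining grave / acute


def render(src, table):
    """Replace each two-character escape in src with its mapped code point."""
    out = src
    for esc, ch in table.items():
        out = out.replace(esc, ch)
    return out


src = "\\`slithy_toves.c\\'"
spacing = render(src, SPACING)
combining = render(src, COMBINING)

# A nonzero combining class means the character attaches to its
# neighbour instead of occupying a cell of its own.
leading_is_combining = unicodedata.combining(combining[0]) != 0
trailing_is_combining = unicodedata.combining(combining[-1]) != 0
```

With the combining table, the trailing accent lands on top of the "c"
and the leading one has no base character at all, which is where the
artifacts come from.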

(Side note: in Texinfo, ` and ' represent quote marks while \` and \'
represent accents. In troff, ` and ' represent quote marks while \`
and \' represent accents. So this is a bit of overzealous escaping on
texinfo2man's part. But then again, it looks like neither groff nor
mandoc actually renders ` and ' as quote marks except in print formats
like PDF. Which means that (unless groff and/or mandoc start
converting ` and ' in UTF-8 output, which neither currently does) the
really correct characters there are \(oq and \(cq. A patch making
texinfo2man use those was submitted in 2005, but it never got
committed... sigh. I'll nudge upstream, and try to push this to ports
after unlock.)
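
For what it's worth, the conversion I have in mind looks roughly like
the sketch below. It is not the 2005 patch and not texinfo2man's
actual code; quote_for_man() and the regular expression are
hypothetical, and it only handles a simple `quoted' span on one line.

```python
import re

# Matches a Texinfo-style `quoted' span that stays on one line.
QUOTED = re.compile(r"`([^`'\n]*)'")


def quote_for_man(text):
    """Rewrite `foo' as \\(oqfoo\\(cq, the roff single-quote escapes."""
    # A function replacement avoids backslash processing in the template.
    return QUOTED.sub(lambda m: "\\(oq" + m.group(1) + "\\(cq", text)


converted = quote_for_man("`slithy_toves.c'")
```

Nested quotes and apostrophes inside the quoted text would need more
care than this.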

(Boy, lots of manpages wrongly escape the ' character. afm2pl(1),
bzr(1), curl(1), kpsetool(1), lacheck(1), makeindex(1), mendex(1),
pdfinfo(1), pdftops(1), xsltproc(1)... Maybe people should be forced
to run their manpages through gropdf and make sure they look
typographically pretty!)
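
A crude way to spot candidates, short of eyeballing PDF output: flag
lines where \' sits between two letters, since that is almost
certainly meant as an apostrophe. This is only a heuristic of my own,
with a made-up function name; it will miss other kinds of misuse.

```python
import re

# \' with a letter on both sides, as in don\'t.
SUSPECT = re.compile(r"(?<=[A-Za-z])\\'(?=[A-Za-z])")


def suspect_lines(man_source):
    """Return 1-based numbers of lines that contain a letter-\\'-letter run."""
    return [
        number
        for number, line in enumerate(man_source.splitlines(), 1)
        if SUSPECT.search(line)
    ]


sample = "don\\'t do this\nplain text\n\\'quoted\\'"
flagged = suspect_lines(sample)
```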

-- 
Anthony J. Bentley
