* Accents vs. combining accents
From: Anthony J. Bentley @ 2014-02-26 7:51 UTC
To: discuss; +Cc: Ted Unangst

Mandoc misbehaves a bit when printing accents in UTF-8.

In summary:

- Under normal circumstances, \` and \' should print spacing accents
  and not combining accents.
- Maybe we should consider printing real quotes (U+2018/9) on raw `
  and ' in UTF-8 mode. Maybe worth bringing up on the groff list too.

texinfo2man (part of the devel/gindent build process) converts the
following info source:

    `slithy_toves.c'

into the following man(7) source:

    \`slithy_toves.c\'

This is, of course, wrong. ` and ' in TeX represent left and right
single quotes, but in troff \` and \' are accents, not quotation marks,
so this is a bug in texinfo2man.

mandoc also has a bug in this situation. It represents \` and \' as
combining grave and acute accents (U+0300 and U+0301, respectively).
But according to groff.info, section "Using Symbols":

 -- Escape: \'
     This is a backslash followed by the apostrophe character, ASCII
     character `0x27' (EBCDIC character `0x7D').  The same as
     `\[aa]', the acute accent.

 -- Escape: \`
     This is a backslash followed by ASCII character `0x60' (EBCDIC
     character `0x79' usually).  The same as `\[ga]', the grave accent.

And in turn, groff_char(7):

    The composite request is used to map most of the accents to
    non-spacing glyph names; the values given in parentheses are the
    original (spacing) ones.

    Output  Input   PostScript  Unicode        Notes
    ------------------------------------------------------------
    '       \[aa]   acute       u0301 (u00B4)  +
    `       \[ga]   grave       u0300 (u0060)  +

In situations with no composite request (I guess?), mandoc should
print U+0060 and U+00B4 for \` and \' respectively, as groff does.
Wrongly printing combining accents as it does now leads to dramatic
visual artifacts throughout the manpage.

(Side note: in Texinfo, ` and ' represent quote marks while \` and \'
represent accents. In troff, ` and ' represent quote marks while \` and
\' represent accents. A bit of overzealous escaping on texinfo2man's
part. But then again, it looks like neither groff nor mandoc actually
represents ` and ' as quotes except in print formats like PDF. Which
means that (unless groff and/or mandoc start converting ` and ' in
UTF-8 output, which neither currently does) the real correct characters
there are \(oq and \(cq. texinfo2man had a patch submitted to use those
in 2005, but it never got committed... sigh. I'll nudge upstream, and
try to push this to ports after unlock.)

(Boy, lots of manpages wrongly escape the ' character. afm2pl(1),
bzr(1), curl(1), kpsetool(1), lacheck(1), makeindex(1), mendex(1),
pdfinfo(1), pdftops(1), xsltproc(1)... Maybe people should be forced
to run their manpages through gropdf and make sure it looks
typographically pretty!)

-- 
Anthony J. Bentley
--
 To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv
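
The combining-versus-spacing distinction described in the message above can be checked against the Unicode character database. The following is only an illustrative sketch in Python, not anything mandoc or groff runs; the names COMBINING, SPACING, describe and render are made up for this example, and only the four code points come from the message.

```python
import unicodedata

# Code points cited in the message. The dictionary names are hypothetical.
# COMBINING: what mandoc printed for \` and \' at the time of the report.
# SPACING:   what groff prints when no composite request applies.
COMBINING = {"\\`": 0x0300, "\\'": 0x0301}
SPACING = {"\\`": 0x0060, "\\'": 0x00B4}

def describe(cp):
    """Return the code point, its Unicode name, and whether it combines."""
    ch = chr(cp)
    kind = "combining" if unicodedata.combining(ch) else "spacing"
    return f"U+{cp:04X} {unicodedata.name(ch)} ({kind})"

def render(table, name="slithy_toves.c"):
    """Wrap a file name the way the converted man(7) source does."""
    return chr(table["\\`"]) + name + chr(table["\\'"])

for esc in COMBINING:
    print(esc, "combining:", describe(COMBINING[esc]))
    print(esc, "spacing:  ", describe(SPACING[esc]))

# With combining marks the accents attach to neighbouring characters;
# with spacing accents they occupy their own cells.
print(render(COMBINING))
print(render(SPACING))
```

How the combining variant actually looks depends on the terminal and font, which is why the sketch prints both strings rather than asserting anything about their appearance.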

* Re: Accents vs. combining accents
From: Ingo Schwarze @ 2014-02-28 17:06 UTC
To: discuss; +Cc: Ted Unangst, Anthony J. Bentley

Hi Anthony,

Anthony J. Bentley wrote on Wed, Feb 26, 2014 at 12:51:38AM -0700:

> Mandoc misbehaves a bit when printing accents in UTF-8.

Right, see below for a patch which i intend to commit.

> In summary:
>
> - Under normal circumstances, \` and \' should print spacing accents
>   and not combining accents.

Correct. The same is true for many other accent escape sequences.

> - Maybe we should consider printing real quotes (U+2018/9) on raw `
>   and ' in UTF-8 mode. Maybe worth bringing up on the groff list too.

That would cause similar issues with copy and paste like the ones
just discussed regarding hyphens and dashes. So at least for manuals,
it would probably have to be disabled right away, just like manuals
output the ASCII character for - and \-.

I tend to agree with Dmitrij Czarkoff that plain ASCII input is
better left as-is, and people should use escape sequences if they
want specific fancy UTF-8 characters (except that in manuals, they
probably shouldn't, it merely harms portability to ask for fancy
characters).

[...]

> In situations with no composite request (I guess?), mandoc should
> print U+0060 and U+00B4 for \` and \' respectively, as groff does.

Correct. Mandoc doesn't support escape sequences involving composite
characters at all, so mandoc has to use the codes you cite in all
cases.

Yours,
  Ingo

P.S.
I snipped your discussion of texinfo2man, which makes sense to me.

Index: chars.in
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/chars.in,v
retrieving revision 1.20
diff -u -p -r1.20 chars.in
--- chars.in	22 Jan 2014 20:58:35 -0000	1.20
+++ chars.in	28 Feb 2014 16:46:26 -0000
@@ -49,21 +49,21 @@ CHAR("c", "", 0)
 CHAR("}", "", 0)
 
 /* Accents. */
-CHAR("a\"", "\"", 779)
+CHAR("a\"", "\"", 733)
 CHAR("a-", "-", 175)
 CHAR("a.", ".", 729)
-CHAR("a^", "^", 770)
-CHAR("\'", "\'", 769)
-CHAR("aa", "\'", 769)
-CHAR("ga", "`", 768)
-CHAR("`", "`", 768)
-CHAR("ab", "`", 774)
-CHAR("ac", ",", 807)
-CHAR("ad", "\"", 776)
+CHAR("a^", "^", 94)
+CHAR("\'", "\'", 180)
+CHAR("aa", "\'", 180)
+CHAR("ga", "`", 96)
+CHAR("`", "`", 96)
+CHAR("ab", "`", 728)
+CHAR("ac", ",", 184)
+CHAR("ad", "\"", 168)
 CHAR("ah", "v", 711)
 CHAR("ao", "o", 730)
-CHAR("a~", "~", 771)
-CHAR("ho", ",", 808)
+CHAR("a~", "~", 126)
+CHAR("ho", ",", 731)
 CHAR("ha", "^", 94)
 CHAR("ti", "~", 126)
 
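
The patch above gives code points in decimal, which hides the pattern behind the change. As a rough cross-check, the sketch below lists each changed entry with its old and new Unicode names. It is written in Python purely for illustration; the CHANGED list and the is_combining helper are hypothetical names for this example, and the numbers are copied from the removed and added CHAR() lines.

```python
import unicodedata

# (escape name, old code point, new code point), decimal values copied
# from the "-" and "+" lines of the diff. The list name is hypothetical.
CHANGED = [
    ('a"', 779, 733),
    ("a^", 770, 94),
    ("'", 769, 180),
    ("aa", 769, 180),
    ("ga", 768, 96),
    ("`", 768, 96),
    ("ab", 774, 728),
    ("ac", 807, 184),
    ("ad", 776, 168),
    ("a~", 771, 126),
    ("ho", 808, 731),
]

def is_combining(cp):
    """True if the code point has a non-zero canonical combining class."""
    return unicodedata.combining(chr(cp)) != 0

for name, old, new in CHANGED:
    print(
        f"{name:3} U+{old:04X} {unicodedata.name(chr(old))}"
        f" -> U+{new:04X} {unicodedata.name(chr(new))}"
    )
```

Every old value falls in the Combining Diacritical Marks block (U+0300 to U+036F) and every new value is a spacing character, which matches the statement in the reply that the same problem affected many other accent escape sequences.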