Date: Wed, 26 Feb 2014 00:51:38 -0700
Subject: Accents vs. combining accents
From: "Anthony J. Bentley"
To: discuss@mdocml.bsd.lv
Cc: Ted Unangst

Mandoc misbehaves a bit when printing accents in UTF-8. In summary:

- Under normal circumstances, \` and \' should print spacing accents
  and not combining accents.
- Maybe we should consider printing real quotes (U+2018/9) on raw ` and '
  in UTF-8 mode. Maybe worth bringing up on the groff list too.
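As a rough illustration of the first point (a Python sketch, not mandoc or groff code; the table names and the render helper are made up for this example), the following maps \` and \' to either the combining code points U+0300/U+0301 or the spacing ones U+0060/U+00B4, and reports which output characters are combining marks:

```python
import unicodedata

# Hypothetical lookup tables for illustration only; neither is taken from
# mandoc or groff source code.
COMBINING = {"\\`": "\u0300", "\\'": "\u0301"}  # combining grave / acute
SPACING = {"\\`": "\u0060", "\\'": "\u00b4"}    # spacing grave / acute


def render(source: str, table: dict) -> str:
    """Replace the two accent escapes in source with the characters from table."""
    for escape, char in table.items():
        source = source.replace(escape, char)
    return source


def is_combining(char: str) -> bool:
    """True if char has a nonzero canonical combining class."""
    return unicodedata.combining(char) != 0


# The man(7) source that texinfo2man produces, as quoted in this message.
sample = "\\`slithy_toves.c\\'"

for label, table in (("combining", COMBINING), ("spacing", SPACING)):
    out = render(sample, table)
    marks = [f"U+{ord(c):04X}" for c in out if is_combining(c)]
    print(f"{label:9} {out!r} combining marks: {marks}")
```

With the combining table, the two marks have no base character of their own, so a terminal attaches them to whatever happens to precede them. With the spacing table, no character is flagged as combining.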
texinfo2man (part of the devel/gindent build process) converts the
following info source:

    `slithy_toves.c'

into the following man(7) source:

    \`slithy_toves.c\'

This is, of course, wrong. ` and ' in TeX represent left and right single
quotes, but in troff \` and \' are accents, not quotation marks, so this
is a bug in texinfo2man.

mandoc also has a bug in this situation. It represents \` and \' as
combining grave and acute accents (U+0300 and U+0301, respectively). But
according to groff.info, section "Using Symbols":

    -- Escape: \'
        This is a backslash followed by the apostrophe character, ASCII
        character `0x27' (EBCDIC character `0x7D'). The same as `\[aa]',
        the acute accent.

    -- Escape: \`
        This is a backslash followed by ASCII character `0x60' (EBCDIC
        character `0x79' usually). The same as `\[ga]', the grave accent.

And in turn, groff_char(7):

    The composite request is used to map most of the accents to
    non-spacing glyph names; the values given in parentheses are the
    original (spacing) ones.

    Output   Input   PostScript   Unicode         Notes
    ------------------------------------------------------------
    '        \[aa]   acute        u0301 (u00B4)   +
    `        \[ga]   grave        u0300 (u0060)   +

In situations with no composite request (I guess?), mandoc should print
U+0060 and U+00B4 for \` and \' respectively, as groff does. Wrongly
printing combining accents, as it does now, leads to dramatic visual
artifacts throughout the manpage.

(Side note: in TeXinfo, ` and ' represent quote marks while \` and \'
represent accents. In troff, ` and ' represent quote marks while \` and \'
represent accents. A bit of overzealous escaping on texinfo2man's part.
But then again, it looks like neither groff nor mandoc actually represents
` and ' as quote marks except in print formats like PDF. Which means that
(unless groff and/or mandoc start converting ` and ' in UTF-8 output, which
neither currently does) the real correct characters there are \(oq and \(cq.
texinfo2man had a patch submitted to use those in 2005, but it never got
committed... sigh. I'll nudge upstream, and try to push this to ports after
unlock.)

(Boy, lots of manpages wrongly escape the ' character. afm2pl(1), bzr(1),
curl(1), kpsetool(1), lacheck(1), makeindex(1), mendex(1), pdfinfo(1),
pdftops(1), xsltproc(1)... Maybe people should be forced to run their
manpages through gropdf and make sure it looks typographically pretty!)

--
Anthony J. Bentley