Date: Fri, 10 Aug 2018 17:13:15 -0500 (EST)
X-Mailinglist: mandoc-source
Reply-To: source@mandoc.bsd.lv
MIME-Version: 1.0
From: schwarze@mandoc.bsd.lv
To: source@mandoc.bsd.lv
Subject: mandoc: handle the non-portable GNU-style \[charNN], \[charNNN]
X-Mailer: activitymail 1.26, http://search.cpan.org/dist/activitymail/
Content-Type: text/plain; charset=utf-8

Log Message:
-----------
handle the non-portable GNU-style \[charNN], \[charNNN]
character escape sequences, used for example in the groff_char(7)
manual page

Modified Files:
--------------
    mandoc:
        TODO
        mandoc.c
        mandoc_char.7

Revision Data
-------------
Index: TODO
===================================================================
RCS file: /home/cvs/mandoc/mandoc/TODO,v
retrieving revision 1.260
retrieving revision 1.261
diff -LTODO -LTODO -u -p -r1.260 -r1.261
--- TODO
+++ TODO
@@ -40,9 +40,10 @@ are mere guesses, and some may be wrong.
 
 - \*(.T prints the device being used,
   see groff_char(7) for an example
-
-- \[charNN], \[charNNN] prints a single-byte codepoint
-  see groff_char(7) for examples
+  This is slightly hard because -Tlocale only decides to use ascii or
+  utf8 when initializing the formatter, so the information is not
+  yet available to the preprocessor at the parsing stage.
+  loc ** exist ** algo * size * imp *
 
 - .ad (adjust margins)
   .ad l -- adjust left margin only (flush left)
Index: mandoc.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/mandoc.c,v
retrieving revision 1.104
retrieving revision 1.105
diff -Lmandoc.c -Lmandoc.c -u -p -r1.104 -r1.105
--- mandoc.c
+++ mandoc.c
@@ -41,7 +41,7 @@ enum mandoc_esc
 mandoc_escape(const char **end, const char **start, int *sz)
 {
 	const char	*local_start;
-	int		 local_sz;
+	int		 local_sz, c, i;
 	char		 term;
 	enum mandoc_esc	 gly;
 
@@ -330,8 +330,26 @@ mandoc_escape(const char **end, const ch
 		}
 		break;
 	case ESCAPE_SPECIAL:
-		if (1 == *sz && 'c' == **start)
-			gly = ESCAPE_NOSPACE;
+		if (**start == 'c') {
+			if (*sz == 1) {
+				gly = ESCAPE_NOSPACE;
+				break;
+			}
+			if (*sz < 6 || *sz > 7 ||
+			    strncmp(*start, "char", 4) != 0 ||
+			    (int)strspn(*start + 4, "0123456789") + 4 < *sz)
+				break;
+			c = 0;
+			for (i = 4; i < *sz; i++)
+				c = 10 * c + ((*start)[i] - '0');
+			if (c < 0x21 || (c > 0x7e && c < 0xa0) || c > 0xff)
+				break;
+			*start += 4;
+			*sz -= 4;
+			gly = ESCAPE_NUMBERED;
+			break;
+		}
+
 		/*
 		 * Unicode escapes are defined in groff as \[u0000]
 		 * to \[u10FFFF], where the contained value must be
Index: mandoc_char.7
===================================================================
RCS file: /home/cvs/mandoc/mandoc/mandoc_char.7,v
retrieving revision 1.72
retrieving revision 1.73
diff -Lmandoc_char.7 -Lmandoc_char.7 -u -p -r1.72 -r1.73
--- mandoc_char.7
+++ mandoc_char.7
@@ -761,14 +761,16 @@ For backward compatibility with existing
 .Xr mandoc 1
 also supports the
 .Pp
-.Dl \eN\(aq Ns Ar number Ns \(aq
+.Dl \eN\(aq Ns Ar number Ns \(aq and \e[ Ns Cm char Ns Ar number ]
 .Pp
-escape sequence, inserting the character
+escape sequences, inserting the character
 .Ar number
 from the current character set into the output.
 Of course, this is inherently non-portable and is already marked
-as deprecated in the Heirloom roff manual.
-For example, do not use \eN\(aq34\(aq, use \e(dq, or even the plain
+as deprecated in the Heirloom roff manual;
+on top of that, the second form is a GNU extension.
+For example, do not use \eN\(aq34\(aq or \e[char34], use \e(dq,
+or even the plain
 .Sq \(dq
 character where possible.
 .Sh COMPATIBILITY
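For readers following the mandoc.c hunk, the check added to the ESCAPE_SPECIAL case can be sketched in isolation. This is a hedged illustration, not mandoc code. The function name parse_char_escape and its (name, length) interface are hypothetical. It tests the digits with an explicit loop bounded by the length, where the commit calls strspn(). In the commit itself, a successful match then advances *start past "char" and returns ESCAPE_NUMBERED, so the digits are handled the same way as in \N'number'.

```c
#include <string.h>

/*
 * Hypothetical stand-alone helper (not part of mandoc's API) that
 * mirrors the validation added to the ESCAPE_SPECIAL case of
 * mandoc_escape().  "name" points at the text inside \[...] and "sz"
 * is its length; the text need not be NUL-terminated at sz.
 * Returns the decimal character number for "charNN" or "charNNN",
 * or -1 if the name does not match or the value is not printable.
 */
int
parse_char_escape(const char *name, int sz)
{
	int	 c, i;

	/* "char" followed by two or three digits: length 6 or 7. */
	if (sz < 6 || sz > 7 || strncmp(name, "char", 4) != 0)
		return -1;

	/* Every byte after "char" must be a decimal digit. */
	for (i = 4; i < sz; i++)
		if (name[i] < '0' || name[i] > '9')
			return -1;

	/* Accumulate the decimal value. */
	c = 0;
	for (i = 4; i < sz; i++)
		c = 10 * c + (name[i] - '0');

	/*
	 * Same range check as the commit: reject controls and space
	 * (below 0x21), DEL and the C1 range (0x7f to 0x9f), and
	 * anything that does not fit in a single byte.
	 */
	if (c < 0x21 || (c > 0x7e && c < 0xa0) || c > 0xff)
		return -1;
	return c;
}
```

For example, parse_char_escape("char34", 6) returns 34. That is the double quote, which the manual recommends writing as \(dq instead. "char32" (a space) and "char127" (DEL) are both rejected with -1.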