From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from krisdoz.my.domain (kristaps@localhost [127.0.0.1])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id p4FFUX2r003579
	for ; Sun, 15 May 2011 11:30:33 -0400 (EDT)
Received: (from kristaps@localhost)
	by krisdoz.my.domain (8.14.3/8.14.3/Submit) id p4FFUX2R006350;
	Sun, 15 May 2011 11:30:33 -0400 (EDT)
Date: Sun, 15 May 2011 11:30:33 -0400 (EDT)
Message-Id: <201105151530.p4FFUX2R006350@krisdoz.my.domain>
X-Mailinglist: mdocml-source
Reply-To: source@mdocml.bsd.lv
MIME-Version: 1.0
From: kristaps@mdocml.bsd.lv
To: source@mdocml.bsd.lv
Subject: mdocml: Support groff's escape for Unicode input.
X-Mailer: activitymail 1.26, http://search.cpan.org/dist/activitymail/
Content-Type: text/plain; charset=utf-8

Log Message:
-----------
Support groff's escape for Unicode input.  See

http://mdocml.bsd.lv/archives/tech/0368.html

For the time being, we just throw it away.

Modified Files:
--------------
    mdocml:
        mandoc.c
        mandoc.h
        mandoc_char.7


Revision Data
-------------
Index: mandoc.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc.c,v
retrieving revision 1.51
retrieving revision 1.52
diff -Lmandoc.c -Lmandoc.c -u -p -r1.51 -r1.52
--- mandoc.c
+++ mandoc.c
@@ -125,6 +125,14 @@ mandoc_escape(const char **end, const ch
 		break;
 	case ('['):
 		gly = ESCAPE_SPECIAL;
+		/*
+		 * Unicode escapes are defined in groff as \[uXXXX] to
+		 * \[u10FFFF], where the contained value must be a valid
+		 * Unicode codepoint.  Here, however, only check whether
+		 * it's not a zero-width escape.
+		 */
+		if ('u' == cp[i] && ']' != cp[i + 1])
+			gly = ESCAPE_UNICODE;
 		term = ']';
 		break;
 	case ('C'):
Index: mandoc_char.7
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc_char.7,v
retrieving revision 1.44
retrieving revision 1.45
diff -Lmandoc_char.7 -Lmandoc_char.7 -u -p -r1.44 -r1.45
--- mandoc_char.7
+++ mandoc_char.7
@@ -520,6 +520,20 @@ portable.
 .It \e*(Px Ta \*(Px Ta POSIX standard name
 .It \e*(Ai Ta \*(Ai Ta ANSI standard name
 .El
+.Sh UNICODE CHARACTERS
+The escape sequence
+.Pp
+.Dl \e[uXXXX]
+.Pp
+is interpreted as a Unicode codepoint.
+The codepoint must be in the range above U+0080 and less than U+10FFFF.
+For compatibility, points must be zero-padded to four characters; if
+greater than four characters, no zero padding is allowed.
+Unicode surrogates are not allowed.
+.\" .Pp
+.\" Unicode glyphs attenuate to the
+.\" .Sq \&?
+.\" character if invalid or not rendered by current output media.
 .Sh NUMBERED CHARACTERS
 For backward compatibility with existing manuals,
 .Xr mandoc 1
Index: mandoc.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc.h,v
retrieving revision 1.74
retrieving revision 1.75
diff -Lmandoc.h -Lmandoc.h -u -p -r1.74 -r1.75
--- mandoc.h
+++ mandoc.h
@@ -299,6 +299,7 @@ enum mandoc_esc {
 	ESCAPE_FONTROMAN, /* roman font mode */
 	ESCAPE_FONTPREV, /* previous font mode */
 	ESCAPE_NUMBERED, /* a numbered glyph */
+	ESCAPE_UNICODE, /* a unicode codepoint */
 	ESCAPE_NOSPACE /* suppress space if the last on a line */
 };
 
--
 To unsubscribe send an email to source+unsubscribe@mdocml.bsd.lv
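
Note on the rules documented in the mandoc_char.7 hunk: the commit only
tags \[u...] as ESCAPE_UNICODE and then discards it; it does not check
the value. The sketch below shows one way those documented rules could
be checked. It is not code from mandoc. The function name
unicode_escape_value() is hypothetical, both range bounds are assumed
to be inclusive (the manual's wording is ambiguous), and hex digit case
is not restricted.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: take the text between
 * "\[" and "]" (for example "u2014") and return the codepoint,
 * or -1 if it breaks the rules stated in mandoc_char(7).
 */
static long
unicode_escape_value(const char *body)
{
	size_t	 i, len;
	long	 cp;

	if ('u' != body[0])
		return -1;
	body++;
	len = strlen(body);

	/* Four to six hex digits: XXXX up to 10FFFF. */
	if (len < 4 || len > 6)
		return -1;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)body[i]))
			return -1;

	/* More than four digits: no zero padding is allowed. */
	if (len > 4 && '0' == body[0])
		return -1;

	cp = strtol(body, NULL, 16);

	/* Assumption: both range bounds are treated as inclusive. */
	if (cp < 0x80 || cp > 0x10FFFF)
		return -1;

	/* Surrogates (U+D800 to U+DFFF) are not allowed. */
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return -1;

	return cp;
}
```

With this sketch, "u2014" and "u10FFFF" are accepted, while "u02014"
(padded beyond four digits), "uD800" (a surrogate) and "u0041" (below
U+0080) are rejected.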