source@mandoc.bsd.lv
* mdocml: Support groff's escape for Unicode input.
@ 2011-05-15 15:30 kristaps
From: kristaps @ 2011-05-15 15:30 UTC (permalink / raw)
  To: source

Log Message:
-----------
Support groff's escape for Unicode input.  See

  http://mdocml.bsd.lv/archives/tech/0368.html

For the time being, we just throw it away.

Modified Files:
--------------
    mdocml:
        mandoc.c
        mandoc.h
        mandoc_char.7

Revision Data
-------------
Index: mandoc.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc.c,v
retrieving revision 1.51
retrieving revision 1.52
diff -Lmandoc.c -Lmandoc.c -u -p -r1.51 -r1.52
--- mandoc.c
+++ mandoc.c
@@ -125,6 +125,14 @@ mandoc_escape(const char **end, const ch
 		break;
 	case ('['):
 		gly = ESCAPE_SPECIAL;
+		/*
+		 * Unicode escapes are defined in groff as \[uXXXX] to
+		 * \[u10FFFF], where the contained value must be a valid
+		 * Unicode codepoint.  Here, however, only check whether
+		 * it's not a zero-width escape.
+		 */
+		if ('u' == cp[i] && ']' != cp[i + 1])
+			gly = ESCAPE_UNICODE;
 		term = ']';
 		break;
 	case ('C'):
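
The test added in the hunk above can be read in isolation as the following minimal sketch. The enum and the function name classify_bracket are hypothetical stand-ins for illustration, not mandoc's own identifiers; cp is assumed to point at the first character after the opening '['.

```c
/* Hypothetical stand-ins for ESCAPE_SPECIAL and ESCAPE_UNICODE. */
enum esc_kind { KIND_SPECIAL, KIND_UNICODE };

/*
 * Classify the name that follows "\[".  As in the committed check,
 * the only conditions are a leading 'u' and at least one more
 * character before the closing ']'.  The digits are not validated.
 */
static enum esc_kind
classify_bracket(const char *cp)
{
	if ('u' == cp[0] && ']' != cp[1])
		return KIND_UNICODE;
	return KIND_SPECIAL;
}
```

Under this sketch, "u2014]" is classified as Unicode and "u]" stays a special character. Because nothing after the 'u' is inspected, a name such as "ua]" would be classified as Unicode as well, which is consistent with the comment's statement that only the zero-width case is checked.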
Index: mandoc_char.7
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc_char.7,v
retrieving revision 1.44
retrieving revision 1.45
diff -Lmandoc_char.7 -Lmandoc_char.7 -u -p -r1.44 -r1.45
--- mandoc_char.7
+++ mandoc_char.7
@@ -520,6 +520,20 @@ portable.
 .It \e*(Px   Ta \*(Px       Ta POSIX standard name
 .It \e*(Ai   Ta \*(Ai       Ta ANSI standard name
 .El
+.Sh UNICODE CHARACTERS
+The escape sequence
+.Pp
+.Dl \e[uXXXX]
+.Pp
+is interpreted as a Unicode codepoint.
+The codepoint must be in the range above U+0080 and less than U+10FFFF.
+For compatibility, points must be zero-padded to four characters; if
+greater than four characters, no zero padding is allowed.
+Unicode surrogates are not allowed.
+.\" .Pp
+.\" Unicode glyphs attenuate to the
+.\" .Sq \&?
+.\" character if invalid or not rendered by current output media.
 .Sh NUMBERED CHARACTERS
 For backward compatibility with existing manuals,
 .Xr mandoc 1
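
The rules stated in the new UNICODE CHARACTERS section could be enforced roughly as sketched below. This is not code from the commit, which only tags the escape and discards it. The function name unicode_name_to_cp is hypothetical, and two points are assumptions rather than statements from the manual text: the hexadecimal digits are taken to be uppercase only, and the range is taken to be U+0080 through U+10FFFF inclusive.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of mandoc: parse the name found
 * between '[' and ']' (including the leading 'u', excluding the
 * brackets).  Returns the codepoint, or -1 if a rule is violated.
 */
static long
unicode_name_to_cp(const char *name, size_t len)
{
	long cp = 0;
	size_t i, ndigits;

	if (len < 2 || 'u' != name[0])
		return -1;
	ndigits = len - 1;
	/* Short values must be zero-padded to four digits. */
	if (ndigits < 4 || ndigits > 6)
		return -1;
	/* No zero padding when more than four digits are given. */
	if (ndigits > 4 && '0' == name[1])
		return -1;
	for (i = 1; i < len; i++) {
		char c = name[i];
		int d;

		if (c >= '0' && c <= '9')
			d = c - '0';
		else if (c >= 'A' && c <= 'F')
			d = c - 'A' + 10;
		else
			return -1;	/* assumption: uppercase hex only */
		cp = cp * 16 + d;
	}
	/* Unicode surrogates are not allowed. */
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return -1;
	/* Assumption: both range bounds are inclusive. */
	if (cp < 0x80 || cp > 0x10FFFF)
		return -1;
	return cp;
}
```

For example, unicode_name_to_cp("u2014", 5) yields 0x2014, while "u02014" (padded beyond four digits), "u41" (too short), and "uD800" (a surrogate) are all rejected with -1.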
Index: mandoc.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc.h,v
retrieving revision 1.74
retrieving revision 1.75
diff -Lmandoc.h -Lmandoc.h -u -p -r1.74 -r1.75
--- mandoc.h
+++ mandoc.h
@@ -299,6 +299,7 @@ enum	mandoc_esc {
 	ESCAPE_FONTROMAN, /* roman font mode */
 	ESCAPE_FONTPREV, /* previous font mode */
 	ESCAPE_NUMBERED, /* a numbered glyph */
+	ESCAPE_UNICODE, /* a unicode codepoint */
 	ESCAPE_NOSPACE /* suppress space if the last on a line */
 };
 
--
 To unsubscribe send an email to source+unsubscribe@mdocml.bsd.lv
