source@mandoc.bsd.lv
* mandoc: Improve coverage of edge cases for 3-byte UTF-8 sequences.
@ 2024-05-16 20:37 schwarze
From: schwarze @ 2024-05-16 20:37 UTC
  To: source


Log Message:
-----------
Improve coverage of edge cases for 3-byte UTF-8 sequences.
Coverage for 2-byte and 4-byte sequences was already reasonable.
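
The 3-byte edge cases come down to a few rules. The lead byte is 0xe0 to 0xef, and both continuation bytes are 0x80 to 0xbf. The decoded value must not be an overlong encoding below U+0800, such as the 0xe0 0x9f 0xbf rejected in the lint output below. It also must not be a UTF-16 surrogate, encoded as 0xed 0xa0 0x80 to 0xed 0xbf 0xbf. The following C function is a minimal sketch of those rules. The name utf8_decode3 is hypothetical, and this is not mandoc's own decoder.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from mandoc: decode one 3-byte UTF-8
 * sequence starting at s (at least 3 bytes must be readable).
 * Returns the code point, or -1 if the bytes are not a well-formed
 * 3-byte sequence.
 */
static int32_t
utf8_decode3(const unsigned char *s)
{
	int32_t cp;

	/* Lead byte of a 3-byte sequence is 1110xxxx (0xe0 to 0xef). */
	if ((s[0] & 0xf0) != 0xe0)
		return -1;
	/* Both continuation bytes must be 10xxxxxx (0x80 to 0xbf). */
	if ((s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80)
		return -1;

	cp = ((int32_t)(s[0] & 0x0f) << 12) |
	    ((int32_t)(s[1] & 0x3f) << 6) |
	    (int32_t)(s[2] & 0x3f);

	/* Overlong: 0xe0 followed by 0x80..0x9f would encode U+0000..U+07FF. */
	if (cp < 0x800)
		return -1;
	/* UTF-16 surrogates U+D800..U+DFFF (0xed 0xa0..0xbf) are invalid. */
	if (cp >= 0xd800 && cp <= 0xdfff)
		return -1;
	/* Noncharacters such as U+FFFE and U+FFFF are well-formed: accept. */
	return cp;
}
```

With these rules, 0xed 0x9f 0xbb decodes to U+D7FB and 0xef 0xbb 0xbf decodes to U+FEFF. Both 0xed 0xa0 0x80 and 0xe0 0x9f 0xbf are rejected.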

Modified Files:
--------------
    mandoc/regress/char/unicode:
        input.in
        input.out_ascii
        input.out_lint
        input.out_utf8

Revision Data
-------------
Index: input.out_lint
===================================================================
RCS file: /home/cvs/mandoc/mandoc/regress/char/unicode/input.out_lint,v
diff -Lregress/char/unicode/input.out_lint -Lregress/char/unicode/input.out_lint -u -p -r1.7 -r1.8
--- regress/char/unicode/input.out_lint
+++ regress/char/unicode/input.out_lint
@@ -21,61 +21,61 @@ mandoc: input.in:34:19: ERROR: skipping 
 mandoc: input.in:35:17: ERROR: skipping bad character: 0xe0
 mandoc: input.in:35:18: ERROR: skipping bad character: 0x9f
 mandoc: input.in:35:19: ERROR: skipping bad character: 0xbf
-mandoc: input.in:42:25: ERROR: skipping bad character: 0xed
-mandoc: input.in:42:26: ERROR: skipping bad character: 0xa0
-mandoc: input.in:42:27: ERROR: skipping bad character: 0x80
-mandoc: input.in:42:17: ERROR: invalid special character: \[uD800]
 mandoc: input.in:43:25: ERROR: skipping bad character: 0xed
-mandoc: input.in:43:26: ERROR: skipping bad character: 0xbf
-mandoc: input.in:43:27: ERROR: skipping bad character: 0xbf
-mandoc: input.in:43:17: ERROR: invalid special character: \[uDFFF]
-mandoc: input.in:53:19: ERROR: skipping bad character: 0xf0
-mandoc: input.in:53:20: ERROR: skipping bad character: 0x80
-mandoc: input.in:53:21: ERROR: skipping bad character: 0x80
-mandoc: input.in:53:22: ERROR: skipping bad character: 0x80
-mandoc: input.in:54:19: ERROR: skipping bad character: 0xf0
-mandoc: input.in:54:20: ERROR: skipping bad character: 0x80
-mandoc: input.in:54:21: ERROR: skipping bad character: 0x81
-mandoc: input.in:54:22: ERROR: skipping bad character: 0xbf
-mandoc: input.in:55:19: ERROR: skipping bad character: 0xf0
-mandoc: input.in:55:20: ERROR: skipping bad character: 0x80
-mandoc: input.in:55:21: ERROR: skipping bad character: 0x82
-mandoc: input.in:55:22: ERROR: skipping bad character: 0x80
-mandoc: input.in:56:19: ERROR: skipping bad character: 0xf0
-mandoc: input.in:56:20: ERROR: skipping bad character: 0x80
-mandoc: input.in:56:21: ERROR: skipping bad character: 0x9f
-mandoc: input.in:56:22: ERROR: skipping bad character: 0xbf
-mandoc: input.in:57:19: ERROR: skipping bad character: 0xf0
-mandoc: input.in:57:20: ERROR: skipping bad character: 0x80
-mandoc: input.in:57:21: ERROR: skipping bad character: 0xa0
-mandoc: input.in:57:22: ERROR: skipping bad character: 0x80
+mandoc: input.in:43:26: ERROR: skipping bad character: 0xa0
+mandoc: input.in:43:27: ERROR: skipping bad character: 0x80
+mandoc: input.in:43:17: ERROR: invalid special character: \[uD800]
+mandoc: input.in:44:25: ERROR: skipping bad character: 0xed
+mandoc: input.in:44:26: ERROR: skipping bad character: 0xbf
+mandoc: input.in:44:27: ERROR: skipping bad character: 0xbf
+mandoc: input.in:44:17: ERROR: invalid special character: \[uDFFF]
 mandoc: input.in:58:19: ERROR: skipping bad character: 0xf0
-mandoc: input.in:58:20: ERROR: skipping bad character: 0x8f
-mandoc: input.in:58:21: ERROR: skipping bad character: 0xbf
-mandoc: input.in:58:22: ERROR: skipping bad character: 0xbf
-mandoc: input.in:67:31: ERROR: skipping bad character: 0xf4
-mandoc: input.in:67:32: ERROR: skipping bad character: 0x90
-mandoc: input.in:67:33: ERROR: skipping bad character: 0x80
-mandoc: input.in:67:34: ERROR: skipping bad character: 0x80
-mandoc: input.in:67:21: ERROR: invalid special character: \[u110000]
-mandoc: input.in:68:31: ERROR: skipping bad character: 0xf4
-mandoc: input.in:68:32: ERROR: skipping bad character: 0xbf
-mandoc: input.in:68:33: ERROR: skipping bad character: 0xbf
-mandoc: input.in:68:34: ERROR: skipping bad character: 0xbf
-mandoc: input.in:68:21: ERROR: invalid special character: \[u13FFFF]
-mandoc: input.in:69:31: ERROR: skipping bad character: 0xf5
-mandoc: input.in:69:32: ERROR: skipping bad character: 0x80
-mandoc: input.in:69:33: ERROR: skipping bad character: 0x80
-mandoc: input.in:69:34: ERROR: skipping bad character: 0x80
-mandoc: input.in:69:21: ERROR: invalid special character: \[u140000]
-mandoc: input.in:70:31: ERROR: skipping bad character: 0xf7
-mandoc: input.in:70:32: ERROR: skipping bad character: 0xbf
-mandoc: input.in:70:33: ERROR: skipping bad character: 0xbf
-mandoc: input.in:70:34: ERROR: skipping bad character: 0xbf
-mandoc: input.in:70:21: ERROR: invalid special character: \[u1FFFFF]
-mandoc: input.in:71:33: ERROR: skipping bad character: 0xf8
-mandoc: input.in:71:34: ERROR: skipping bad character: 0x88
-mandoc: input.in:71:35: ERROR: skipping bad character: 0x80
-mandoc: input.in:71:36: ERROR: skipping bad character: 0x80
-mandoc: input.in:71:37: ERROR: skipping bad character: 0x80
-mandoc: input.in:71:23: ERROR: invalid special character: \[u200000]
+mandoc: input.in:58:20: ERROR: skipping bad character: 0x80
+mandoc: input.in:58:21: ERROR: skipping bad character: 0x80
+mandoc: input.in:58:22: ERROR: skipping bad character: 0x80
+mandoc: input.in:59:19: ERROR: skipping bad character: 0xf0
+mandoc: input.in:59:20: ERROR: skipping bad character: 0x80
+mandoc: input.in:59:21: ERROR: skipping bad character: 0x81
+mandoc: input.in:59:22: ERROR: skipping bad character: 0xbf
+mandoc: input.in:60:19: ERROR: skipping bad character: 0xf0
+mandoc: input.in:60:20: ERROR: skipping bad character: 0x80
+mandoc: input.in:60:21: ERROR: skipping bad character: 0x82
+mandoc: input.in:60:22: ERROR: skipping bad character: 0x80
+mandoc: input.in:61:19: ERROR: skipping bad character: 0xf0
+mandoc: input.in:61:20: ERROR: skipping bad character: 0x80
+mandoc: input.in:61:21: ERROR: skipping bad character: 0x9f
+mandoc: input.in:61:22: ERROR: skipping bad character: 0xbf
+mandoc: input.in:62:19: ERROR: skipping bad character: 0xf0
+mandoc: input.in:62:20: ERROR: skipping bad character: 0x80
+mandoc: input.in:62:21: ERROR: skipping bad character: 0xa0
+mandoc: input.in:62:22: ERROR: skipping bad character: 0x80
+mandoc: input.in:63:19: ERROR: skipping bad character: 0xf0
+mandoc: input.in:63:20: ERROR: skipping bad character: 0x8f
+mandoc: input.in:63:21: ERROR: skipping bad character: 0xbf
+mandoc: input.in:63:22: ERROR: skipping bad character: 0xbf
+mandoc: input.in:72:31: ERROR: skipping bad character: 0xf4
+mandoc: input.in:72:32: ERROR: skipping bad character: 0x90
+mandoc: input.in:72:33: ERROR: skipping bad character: 0x80
+mandoc: input.in:72:34: ERROR: skipping bad character: 0x80
+mandoc: input.in:72:21: ERROR: invalid special character: \[u110000]
+mandoc: input.in:73:31: ERROR: skipping bad character: 0xf4
+mandoc: input.in:73:32: ERROR: skipping bad character: 0xbf
+mandoc: input.in:73:33: ERROR: skipping bad character: 0xbf
+mandoc: input.in:73:34: ERROR: skipping bad character: 0xbf
+mandoc: input.in:73:21: ERROR: invalid special character: \[u13FFFF]
+mandoc: input.in:74:31: ERROR: skipping bad character: 0xf5
+mandoc: input.in:74:32: ERROR: skipping bad character: 0x80
+mandoc: input.in:74:33: ERROR: skipping bad character: 0x80
+mandoc: input.in:74:34: ERROR: skipping bad character: 0x80
+mandoc: input.in:74:21: ERROR: invalid special character: \[u140000]
+mandoc: input.in:75:31: ERROR: skipping bad character: 0xf7
+mandoc: input.in:75:32: ERROR: skipping bad character: 0xbf
+mandoc: input.in:75:33: ERROR: skipping bad character: 0xbf
+mandoc: input.in:75:34: ERROR: skipping bad character: 0xbf
+mandoc: input.in:75:21: ERROR: invalid special character: \[u1FFFFF]
+mandoc: input.in:76:33: ERROR: skipping bad character: 0xf8
+mandoc: input.in:76:34: ERROR: skipping bad character: 0x88
+mandoc: input.in:76:35: ERROR: skipping bad character: 0x80
+mandoc: input.in:76:36: ERROR: skipping bad character: 0x80
+mandoc: input.in:76:37: ERROR: skipping bad character: 0x80
+mandoc: input.in:76:23: ERROR: invalid special character: \[u200000]
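
The lint output above rejects the \[uXXXX] escapes for surrogates (\[uD800], \[uDFFF]) and for values above U+10FFFF (\[u110000] and higher). The following C function is a minimal sketch of a check with that outcome. The name u_escape_value is hypothetical, and the function takes the text following "\[u". Any further rules mandoc applies to such escapes are not modelled here.

```c
#include <ctype.h>

/*
 * Hypothetical check, not mandoc's code.  hex points just past "\[u"
 * in an escape such as \[uD800].  Returns the code point, or -1 if the
 * digits are missing, the escape is not closed by ']', or the value is
 * a surrogate or lies above U+10FFFF.
 */
static long
u_escape_value(const char *hex)
{
	const char	*p = hex;
	long		 cp = 0;
	int		 c, digit;

	while (isxdigit((unsigned char)*p)) {
		c = (unsigned char)*p++;
		digit = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
		/* Stop accumulating once out of range, so cp cannot overflow. */
		if (cp <= 0x10ffff)
			cp = cp * 16 + digit;
	}
	if (p == hex || *p != ']')
		return -1;		/* no hex digits, or no closing ']' */
	if (cp >= 0xd800 && cp <= 0xdfff)
		return -1;		/* UTF-16 surrogate */
	if (cp > 0x10ffff)
		return -1;		/* beyond the Unicode code space */
	return cp;
}
```

For example, u_escape_value("D7FB]") yields 0xD7FB. By contrast, "D800]" and "110000]" both yield -1, matching the "invalid special character" errors listed above.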
Index: input.out_utf8
===================================================================
RCS file: /home/cvs/mandoc/mandoc/regress/char/unicode/input.out_utf8,v
diff -Lregress/char/unicode/input.out_utf8 -Lregress/char/unicode/input.out_utf8 -u -p -r1.8 -r1.9
--- regress/char/unicode/input.out_utf8
+++ regress/char/unicode/input.out_utf8
@@ -31,12 +31,17 @@ D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN
      U+1000   0xe18080   ကက     begin of second start byte
      U+CFFF   0xecbfbf   ì¿¿ì¿¿   end of last normal start byte
      U+D000   0xed8080   퀀퀀   begin of last start byte
+     U+D7FB   0xed9fbb   ퟻퟻ   highest valid public three-byte
      U+D7FF   0xed9fbf   ퟿퟿       highest public three-byte
      U+D800   0xeda080   ???    lowest surrogate
      U+DFFF   0xedbfbf   ???    highest surrogate
      U+E000   0xee8080        lowest private use
      U+F8FF   0xefa3bf        highest private use
      U+F900   0xefa480   豈豈   lowest post-private
+     U+FEFF   0xefbbbf          byte-order mark
+     U+FFFC   0xefbfbc        object replacement character
+     U+FFFD   0xefbfbd   ��     replacement character
+     U+FFFE   0xefbfbe   ￾￾       reversed byte-order mark
      U+FFFF   0xefbfbf   ï¿¿ï¿¿       highest three-byte
 
    F\bFo\bou\bur\br-\b-b\bby\byt\bte\be r\bra\ban\bng\bge\be
@@ -60,4 +65,4 @@ D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN
      U+1FFFFF   0xf7bfbfbf     ????    highest invalid four-byte
      U+200000   0xf888808080   ?????   lowest five-byte
 
-OpenBSD                          June 2, 2021            CHAR-UNICODE-INPUT(1)
+OpenBSD                          May 16, 2024            CHAR-UNICODE-INPUT(1)
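
The second column of the table is the UTF-8 encoding of the code point in the first column. The following C function is a minimal encoder for double-checking the new rows, for example U+D7FB giving 0xed9fbb and U+FEFF giving 0xefbbbf. The name utf8_encode is hypothetical. The surrogate rows list the bytes the 3-byte bit pattern would yield for U+D800 and U+DFFF, and a strict encoder such as this one refuses to produce them.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode code point cp as UTF-8 into buf, which
 * must have room for 4 bytes.  Returns the number of bytes written,
 * or 0 for surrogates and values above U+10FFFF, which have no valid
 * UTF-8 encoding.
 */
static size_t
utf8_encode(uint32_t cp, unsigned char *buf)
{
	if (cp < 0x80) {
		buf[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		buf[0] = (unsigned char)(0xc0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return 2;
	}
	if (cp >= 0xd800 && cp <= 0xdfff)
		return 0;	/* surrogates are not encodable */
	if (cp < 0x10000) {
		/* Three-byte range: 1110xxxx 10xxxxxx 10xxxxxx. */
		buf[0] = (unsigned char)(0xe0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3f));
		return 3;
	}
	if (cp <= 0x10ffff) {
		buf[0] = (unsigned char)(0xf0 | (cp >> 18));
		buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
		buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		buf[3] = (unsigned char)(0x80 | (cp & 0x3f));
		return 4;
	}
	return 0;	/* above U+10FFFF */
}
```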
Index: input.out_ascii
===================================================================
RCS file: /home/cvs/mandoc/mandoc/regress/char/unicode/input.out_ascii,v
diff -Lregress/char/unicode/input.out_ascii -Lregress/char/unicode/input.out_ascii -u -p -r1.7 -r1.8
--- regress/char/unicode/input.out_ascii
+++ regress/char/unicode/input.out_ascii
@@ -31,12 +31,17 @@ D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN
      U+1000   0xe18080   <?><?>   begin of second start byte
      U+CFFF   0xecbfbf   <?><?>   end of last normal start byte
      U+D000   0xed8080   <?><?>   begin of last start byte
+     U+D7FB   0xed9fbb   <?><?>   highest valid public three-byte
      U+D7FF   0xed9fbf   <?><?>   highest public three-byte
      U+D800   0xeda080   ???      lowest surrogate
      U+DFFF   0xedbfbf   ???      highest surrogate
      U+E000   0xee8080   <?><?>   lowest private use
      U+F8FF   0xefa3bf   <?><?>   highest private use
      U+F900   0xefa480   <?><?>   lowest post-private
+     U+FEFF   0xefbbbf   <?><?>   byte-order mark
+     U+FFFC   0xefbfbc   <?><?>   object replacement character
+     U+FFFD   0xefbfbd   <?><?>   replacement character
+     U+FFFE   0xefbfbe   <?><?>   reversed byte-order mark
      U+FFFF   0xefbfbf   <?><?>   highest three-byte
 
    F\bFo\bou\bur\br-\b-b\bby\byt\bte\be r\bra\ban\bng\bge\be
@@ -60,4 +65,4 @@ D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN
      U+1FFFFF   0xf7bfbfbf     ????     highest invalid four-byte
      U+200000   0xf888808080   ?????    lowest five-byte
 
-OpenBSD                          June 2, 2021            CHAR-UNICODE-INPUT(1)
+OpenBSD                          May 16, 2024            CHAR-UNICODE-INPUT(1)