* UTF kernel/user space discrepancy
@ 2016-03-10 17:43 Maurice Quennet
2016-03-10 18:55 ` [9front] " cinap_lenrek
From: Maurice Quennet @ 2016-03-10 17:43 UTC
To: 9front
Hi,
while reading some source code I discovered that /sys/include/libc.h defines
enum
{
	UTFmax = 4, /* maximum bytes per rune */
	Runesync = 0x80, /* cannot represent part of a UTF sequence (<) */
	Runeself = 0x80, /* rune and UTF sequences are the same (<) */
	Runeerror = 0xFFFD, /* decoding error in UTF */
	Runemax = 0x10FFFF, /* 21 bit rune */
	Runemask = 0x1FFFFF, /* bits used by runes (see grep) */
};
whereas /sys/src/9/port/lib.h defines
enum
{
	UTFmax = 3, /* maximum bytes per rune */
	Runesync = 0x80, /* cannot represent part of a UTF sequence */
	Runeself = 0x80, /* rune and UTF sequences are the same (<) */
	Runeerror = 0xFFFD, /* decoding error in UTF */
	Runemax = 0xFFFF, /* 16 bit rune */
};
I'm not sure whether this counts as a bug (the system works either way),
but it struck me as odd that the kernel and user space would use
different rune sizes: 16 bit in the kernel, 21 bit in libc. (Just to be
crystal clear: I have no technical knowledge of UTF whatsoever, other
than "has more characters than ASCII".) It is especially odd since
vanilla Plan 9 consistently uses 21 bit runes (although they call it
"24 bit rune[s]" in port/lib.h).
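To make the relation between UTFmax and Runemax in the two headers concrete, here is a minimal sketch in portable C. The helper name utf8len is hypothetical and appears in neither libc.h nor port/lib.h; it only counts the bytes standard UTF-8 needs for a code point, and it ignores surrogates and other validity checks.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from libc.h or port/lib.h:
 * number of bytes standard UTF-8 needs to encode code point c,
 * or 0 if c lies beyond 0x10FFFF.
 */
static int
utf8len(unsigned long c)
{
	if(c < 0x80)		/* below Runeself: one byte, same as ASCII */
		return 1;
	if(c < 0x800)		/* up to 11 bits: two bytes */
		return 2;
	if(c <= 0xFFFF)		/* 16 bit Runemax: fits in UTFmax = 3 */
		return 3;
	if(c <= 0x10FFFF)	/* 21 bit Runemax: needs UTFmax = 4 */
		return 4;
	return 0;		/* outside the range either header allows */
}
```

Under these assumptions, every rune up to 0xFFFF fits in three bytes, which matches the kernel's UTFmax = 3, while anything from 0x10000 to 0x10FFFF takes four, which matches UTFmax = 4 in libc.h. Runemask = 0x1FFFFF is simply the lowest 21 bits set, so it covers every value up to Runemax = 0x10FFFF.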
I tested the patch below and it seems to work (on 386 …).
- Maurice
diff -r 6b193fcbc781 sys/src/9/port/lib.h
--- a/sys/src/9/port/lib.h Tue Mar 08 16:45:29 2016 +0100
+++ b/sys/src/9/port/lib.h Wed Mar 09 23:28:35 2016 +0100
@@ -35,11 +35,12 @@
enum
{
- UTFmax = 3, /* maximum bytes per rune */
- Runesync = 0x80, /* cannot represent part of a UTF sequence */
- Runeself = 0x80, /* rune and UTF sequences are the same (<) */
+ UTFmax = 4, /* maximum bytes per rune */
+ Runesync = 0x80, /* cannot represent part of a UTF sequence (<) */
+ Runeself = 0x80, /* rune and UTF sequences are the same (<) */
Runeerror = 0xFFFD, /* decoding error in UTF */
- Runemax = 0xFFFF, /* 16 bit rune */
+ Runemax = 0x10FFFF, /* 21 bit rune */
+ Runemask = 0x1FFFFF, /* bits used by runes (see grep) */
};
/*
* Re: [9front] UTF kernel/user space discrepancy
2016-03-10 17:43 UTF kernel/user space discrepancy Maurice Quennet
@ 2016-03-10 18:55 ` cinap_lenrek
From: cinap_lenrek @ 2016-03-10 18:55 UTC
To: 9front
yes. that was an oversight. thanks.
--
cinap