9front - general discussion about 9front
* UTF kernel/user space discrepancy
@ 2016-03-10 17:43 Maurice Quennet
From: Maurice Quennet @ 2016-03-10 17:43 UTC
  To: 9front

Hi,

while reading some source code I discovered that /sys/include/libc.h defines

enum
{
         UTFmax          = 4,            /* maximum bytes per rune */
         Runesync        = 0x80,         /* cannot represent part of a UTF sequence (<) */
         Runeself        = 0x80,         /* rune and UTF sequences are the same (<) */
         Runeerror       = 0xFFFD,       /* decoding error in UTF */
         Runemax         = 0x10FFFF,     /* 21 bit rune */
         Runemask        = 0x1FFFFF,     /* bits used by runes (see grep) */
};

whereas /sys/src/9/port/lib.h defines

enum
{
         UTFmax          = 3,    /* maximum bytes per rune */
         Runesync        = 0x80, /* cannot represent part of a UTF sequence */
         Runeself        = 0x80, /* rune and UTF sequences are the same (<) */
         Runeerror       = 0xFFFD,       /* decoding error in UTF */
         Runemax         = 0xFFFF,       /* 16 bit rune */
};

I'm not sure whether this counts as a bug (the system works either way),
but it struck me as odd that the kernel and user space use different
rune sizes: 16 bit in the kernel, 21 bit in user space. (To be crystal
clear: I have no technical knowledge of UTF whatsoever, other than "has
more characters than ASCII".) It is especially odd since vanilla Plan 9
consistently uses 21 bit runes (although the comment in its port/lib.h
calls them "24 bit rune[s]").
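
For illustration, here is a minimal sketch in portable C of what the two
limits mean for a code point outside the 16 bit range. encode_rune and
its runemax parameter are hypothetical names made up for this example;
this is not the Plan 9 runetochar(), and surrogate handling is left out.
It assumes runemax is at most 0x10FFFF.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Rune;	/* wide enough for 21 bit code points */

enum
{
	Runeerror	= 0xFFFD,	/* decoding error in UTF, as in both headers */
};

/*
 * Hypothetical helper: encode r as UTF-8 into buf, which must hold at
 * least 4 bytes, and return the number of bytes written. A rune above
 * runemax is replaced by Runeerror, so with runemax = 0xFFFF nothing
 * ever needs more than 3 bytes.
 */
int
encode_rune(unsigned char *buf, Rune r, Rune runemax)
{
	assert(buf != NULL);

	if(r > runemax)
		r = Runeerror;

	if(r < 0x80){		/* 1 byte: 0xxxxxxx */
		buf[0] = r;
		return 1;
	}
	if(r < 0x800){		/* 2 bytes: 110xxxxx 10xxxxxx */
		buf[0] = 0xC0 | (r >> 6);
		buf[1] = 0x80 | (r & 0x3F);
		return 2;
	}
	if(r < 0x10000){	/* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
		buf[0] = 0xE0 | (r >> 12);
		buf[1] = 0x80 | ((r >> 6) & 0x3F);
		buf[2] = 0x80 | (r & 0x3F);
		return 3;
	}
	/* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
	buf[0] = 0xF0 | (r >> 18);
	buf[1] = 0x80 | ((r >> 12) & 0x3F);
	buf[2] = 0x80 | ((r >> 6) & 0x3F);
	buf[3] = 0x80 | (r & 0x3F);
	return 4;
}
```

Calling encode_rune(buf, 0x1F600, 0xFFFF) stores the three bytes of
Runeerror, while encode_rune(buf, 0x1F600, 0x10FFFF) stores a four byte
sequence. That is the pairing visible in the two enums: UTFmax = 3 goes
with Runemax = 0xFFFF, and UTFmax = 4 with Runemax = 0x10FFFF.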

I tested the patch below and it seems to work (on 386 …).

- Maurice


diff -r 6b193fcbc781 sys/src/9/port/lib.h
--- a/sys/src/9/port/lib.h	Tue Mar 08 16:45:29 2016 +0100
+++ b/sys/src/9/port/lib.h	Wed Mar 09 23:28:35 2016 +0100
@@ -35,11 +35,12 @@

  enum
  {
-	UTFmax		= 3,	/* maximum bytes per rune */
-	Runesync	= 0x80,	/* cannot represent part of a UTF sequence */
-	Runeself	= 0x80,	/* rune and UTF sequences are the same (<) */
+	UTFmax		= 4,		/* maximum bytes per rune */
+	Runesync	= 0x80,		/* cannot represent part of a UTF sequence (<) */
+	Runeself	= 0x80,		/* rune and UTF sequences are the same (<) */
  	Runeerror	= 0xFFFD,	/* decoding error in UTF */
-	Runemax		= 0xFFFF,	/* 16 bit rune */
+	Runemax		= 0x10FFFF,	/* 21 bit rune */
+	Runemask	= 0x1FFFFF,	/* bits used by runes (see grep) */
  };

  /*



* Re: [9front] UTF kernel/user space discrepancy
  2016-03-10 17:43 UTF kernel/user space discrepancy Maurice Quennet
@ 2016-03-10 18:55 ` cinap_lenrek
From: cinap_lenrek @ 2016-03-10 18:55 UTC
  To: 9front

yes. that was an oversight. thanks.

--
cinap

