9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] plan9 and the Unicode Consortium definitions
@ 2005-08-19 15:23 Dimitry Golubovsky
  0 siblings, 0 replies; 5+ messages in thread
From: Dimitry Golubovsky @ 2005-08-19 15:23 UTC (permalink / raw)
  To: mirtchov; +Cc: 9fans

Andrey,

Andrey wrote:

>> I am just wondering whether any API to access a more complete set of
>> character properties defined by Unicode.org is available in Plan9.

>you mean things like diacritics?

I mean character categories defined in 

http://www.unicode.org/Public/4.1.0/ucd/UCD.html#General_Category_Values

Abbr.  Description
 
Lu Letter, Uppercase 
Ll Letter, Lowercase 
Lt Letter, Titlecase 
Lm Letter, Modifier 
Lo Letter, Other 
Mn Mark, Nonspacing 
Mc Mark, Spacing Combining 
Me Mark, Enclosing 
Nd Number, Decimal Digit 

etc., total about 30 or so. isxxxrune distinguishes only among 5 categories.
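
To make the category field concrete: in UnicodeData.txt each record is a semicolon-separated line, with the code point (hex) in the first field, the character name in the second, and the General_Category abbreviation in the third. Below is a minimal sketch in C of pulling that field out of one record. The function name parsecat is hypothetical, invented for this illustration; it is not part of Plan 9's libc.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a Plan 9 API): read the code point and the
 * two-letter General_Category from one UnicodeData.txt record.
 * Returns 0 on success, -1 if the line does not look like a record.
 */
int
parsecat(const char *line, unsigned long *cp, char cat[3])
{
	const char *p, *q;
	char *end;

	/* field 0: code point in hexadecimal, terminated by ';' */
	*cp = strtoul(line, &end, 16);
	if(end == line || *end != ';')
		return -1;

	/* field 1: character name, skipped */
	p = strchr(end + 1, ';');
	if(p == NULL)
		return -1;
	p++;

	/* field 2: General_Category, expected to be exactly two letters */
	q = strchr(p, ';');
	if(q == NULL || q - p != 2)
		return -1;
	cat[0] = p[0];
	cat[1] = p[1];
	cat[2] = '\0';
	return 0;
}
```

For example, the record "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;" gives code point 0x41 and category "Lu".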

This would probably include diacritics, but my question was more
general (maybe even philosophical): Unicode.org recommends a set of
character properties, APIs, and interfaces. Plan9, which probably
influenced how some aspects of Unicode were implemented in other
systems, does not follow them. Is there any historical / political /
technical / other reason? The related man pages do mention "The
Unicode Standard" in the SEE ALSO section, though.

What is more interesting to me (technically, as I asked in my first
message) - is 16-bitness of runes hardcoded anywhere in the kernel, or
only in libc?

-- 
Dimitry Golubovsky

Anywhere on the Web


* [9fans] plan9 and the Unicode Consortium definitions
@ 2005-08-19 14:51 Dimitry Golubovsky
  2005-08-19 15:00 ` Christoph Lohmann
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Dimitry Golubovsky @ 2005-08-19 14:51 UTC (permalink / raw)
  To: 9fans

I am just wondering whether any API to access a more complete set of
character properties defined by Unicode.org is available in Plan9. So
far I have seen only library functions like isalpharune(2) defined in
runetype.c, but they do not cover all the character categories defined
by the Unicode Consortium. Something might be expected in Section
7 of the man pages, might it not?

BTW I've got some code I wrote earlier for Hugs and the Glasgow Haskell
Compiler, which is autogenerated from UnicodeData.txt (runetype.c
seems to be hardcoded by hand, or at least there is nothing in the
mkfile that shows how it was generated). If there is any interest, I
can send a link. My code is based on the same principles I see in
runetype.c: binary search over sorted lists of character ranges.
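
To illustrate the principle (this is neither the actual runetype.c nor my Haskell code), here is a minimal sketch in C of a binary search over a sorted table of character ranges. The Range type, the runecat function, and the five-entry table are assumptions made up for the example; a real table would be generated from UnicodeData.txt and be much longer.

```c
#include <stddef.h>

/* One closed range [lo, hi] of code points sharing a General_Category. */
typedef struct Range Range;
struct Range {
	unsigned long lo;
	unsigned long hi;
	const char *cat;
};

/*
 * Tiny illustrative excerpt only. Entries must be sorted by lo and
 * must not overlap, otherwise the search below is not valid.
 */
static const Range ranges[] = {
	{0x0030, 0x0039, "Nd"},		/* ASCII digits */
	{0x0041, 0x005A, "Lu"},		/* ASCII uppercase letters */
	{0x0061, 0x007A, "Ll"},		/* ASCII lowercase letters */
	{0x0300, 0x036F, "Mn"},		/* combining diacritical marks */
	{0x10400, 0x10427, "Lu"},	/* Deseret capitals, beyond 16 bits */
};

/*
 * Hypothetical lookup: return the category abbreviation for c,
 * or NULL if c falls in none of the table's ranges.
 */
const char*
runecat(unsigned long c)
{
	size_t lo, hi, mid;

	lo = 0;
	hi = sizeof ranges / sizeof ranges[0];
	while(lo < hi){
		mid = lo + (hi - lo)/2;
		if(c < ranges[mid].lo)
			hi = mid;		/* c lies left of this range */
		else if(c > ranges[mid].hi)
			lo = mid + 1;		/* c lies right of this range */
		else
			return ranges[mid].cat;
	}
	return NULL;
}
```

With this table, runecat('A') yields "Lu" and runecat(0x0301) yields "Mn", while a code point absent from the excerpt, such as the space character, yields NULL.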

Another question: is (historical) 16-bitness of runes a limitation of
the C runtime library only, or is the kernel rune-size-aware, too?
Because what Unicode.org defines is wider than 16 bits, as everybody
knows.
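
To show what I mean by the limitation, here is a small sketch in C. The type names Rune16 and Rune32 and the function fits16 are hypothetical, chosen for the example; I am assuming a rune type that is an unsigned 16-bit integer, not quoting the Plan 9 headers.

```c
#include <stdint.h>

typedef uint16_t Rune16;	/* assumed 16-bit rune, for illustration */
typedef uint32_t Rune32;	/* wide enough for every Unicode code point */

/* Return 1 if code point cp survives storage in a 16-bit rune, else 0. */
int
fits16(Rune32 cp)
{
	return (Rune16)cp == cp;
}

/* Store cp in a 16-bit rune; the bits above the low 16 are dropped. */
Rune16
truncate16(Rune32 cp)
{
	return (Rune16)cp;
}
```

U+0041 fits, but U+10400 does not: truncated to 16 bits it becomes 0x0400, which is a different character altogether.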

Unless there is any intentional divergence from the Unicode.org definitions.

PS I looked at the sources mirror at 9grid.de, and manpages at the
Bell Labs website. Outdated?

-- 
Dimitry Golubovsky

Anywhere on the Web



2005-08-19 15:23 [9fans] plan9 and the Unicode Consortium definitions Dimitry Golubovsky
  -- strict thread matches above, loose matches on Subject: below --
2005-08-19 14:51 Dimitry Golubovsky
2005-08-19 15:00 ` Christoph Lohmann
2005-08-19 15:03 ` andrey mirtchovski
2005-08-19 15:29 ` Rob Pike