From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
To: 9fans@cse.psu.edu
From: dbailey27@ameritech.net
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: [9fans] The cost of Runes in Lexi
Date: Tue, 24 Feb 2004 18:07:22 -0500
Topicbox-Message-UUID: f3f40ff6-eacc-11e9-9e20-41e7f4b1d025

Hey, all,

I'm redesigning my tokenizer and preprocessor to use Runes, and it got me
thinking: is it worth the time to interpret UTF patterns besides Latin
'0' -> '9' as integers? If one performs a grep on /lib/unicode:

	grep '(digit|number)' /lib/unicode

it is obvious that more than a few languages have distinct UTF patterns
for digits. Is it desirable to interpret these values as digits?

I'm especially interested in the opinion of our Asian friends. In
Japanese (and others?), integers are depicted by multiple glyphs when
written as Hiragana. However, are there single-glyph Kanji that can be
interpreted as an integer when seen in isolation from other NAME-class
glyphs?

Basically, this would allow Unicode known to equate to digits to be
interpreted as integers in expressions and assignments throughout an ASM
or C source file. Every Unicode character *not* known as a digit would be
interpreted as a NAME-class character. Thus, one could still properly
program C source in Thai, Lao, etc., as long as each reserved word was
still in English.

What are your thoughts?

Don (north_)
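For concreteness, the classification the post describes might be sketched as below. The function name `runedigit` and the particular ranges are illustrative assumptions, not anything from the post; they cover only a few of the DIGIT entries a grep of /lib/unicode would turn up, with everything else falling through to NAME-class treatment.

```c
typedef unsigned int Rune;	/* stand-in for Plan 9's Rune type */

/* Hypothetical lexer helper: return the decimal value 0-9 if r is a
   known Unicode decimal digit, or -1 if it should be treated as a
   NAME-class character.  The ranges below are a small illustrative
   subset; /lib/unicode lists many more digit codepoints. */
int
runedigit(Rune r)
{
	if(r >= '0' && r <= '9')	/* Latin */
		return r - '0';
	if(r >= 0x0660 && r <= 0x0669)	/* Arabic-Indic digits */
		return r - 0x0660;
	if(r >= 0x0966 && r <= 0x096F)	/* Devanagari digits */
		return r - 0x0966;
	if(r >= 0x0E50 && r <= 0x0E59)	/* Thai digits */
		return r - 0x0E50;
	return -1;			/* not a digit: NAME class */
}
```

A tokenizer built this way accumulates consecutive runes for which `runedigit` returns nonnegative into an integer token, scaling by ten each step, exactly as it would for ASCII digits.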