From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
To: 9fans@cse.psu.edu
From: dbailey27@ameritech.net
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: [9fans] The cost of Runes in Lexi
Date: Tue, 24 Feb 2004 18:07:22 -0500
Topicbox-Message-UUID: f3f40ff6-eacc-11e9-9e20-41e7f4b1d025

Hey, all,

I'm redesigning my tokenizer and preprocessor to use Runes, and it got me
thinking: is it worth the time to interpret UTF patterns besides Latin
'0' -> '9' as integers? If one performs a grep on /lib/unicode:

	grep '(digit|number)' /lib/unicode

it is obvious that more than a few languages have distinct UTF patterns
for digits. Is it desirable to interpret these values as digits?

I'm especially interested in the opinion of our Asian friends. In
Japanese (and others?), integers are depicted by multiple glyphs when
written as Hiragana. However, are there single-glyph Kanji that can be
interpreted as an integer when seen in isolation from other NAME-class
glyphs?

Basically, this would allow Unicode known to equate to digits to be
interpreted as integers in expressions and assignments throughout an ASM
or C source file. Every Unicode character *not* known as a digit would be
interpreted as a NAME-class character. Thus, one could still properly
program C source in Thai, Lao, etc., as long as each reserved word was
still in English.

What are your thoughts?

Don (north_)
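For concreteness, the classification the post describes might be sketched as below. The function name `runedigit` and the particular ranges are illustrative assumptions, not anything from the post; they cover only a few of the DIGIT entries a grep of /lib/unicode would turn up, with everything else falling through to NAME-class treatment.

```c
typedef unsigned int Rune;	/* stand-in for Plan 9's Rune type */

/* Hypothetical lexer helper: return the decimal value 0-9 if r is a
   known Unicode decimal digit, or -1 if it should be treated as a
   NAME-class character.  The ranges below are a small illustrative
   subset; /lib/unicode lists many more digit codepoints. */
int
runedigit(Rune r)
{
	if(r >= '0' && r <= '9')	/* Latin */
		return r - '0';
	if(r >= 0x0660 && r <= 0x0669)	/* Arabic-Indic digits */
		return r - 0x0660;
	if(r >= 0x0966 && r <= 0x096F)	/* Devanagari digits */
		return r - 0x0966;
	if(r >= 0x0E50 && r <= 0x0E59)	/* Thai digits */
		return r - 0x0E50;
	return -1;			/* not a digit: NAME class */
}
```

A tokenizer built this way accumulates consecutive runes for which `runedigit` returns nonnegative into an integer token, scaling by ten each step, exactly as it would for ASCII digits.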