From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Thu, 28 Jan 2010 15:05:03 -0500
To: 9fans@9fans.net
Message-ID: <6277a4dcc738c2eee17e029efeb1b324@ladd.quanstro.net>
In-Reply-To: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca>
References: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1
Topicbox-Message-UUID: c9bd4334-ead5-11e9-9d60-3106f5b1d025

> A colleague put me on to Plan9, some of whose online documentation I
> have read with interest, in particular the "Hello World" discussion as
> it relates to Unicode/UTF-8.
>
> I'm one of the authors of the Cuneiform proposal now encoded under
> Unicode (see block U+12000), and I'm interested in lex/yacc-like
> parsing of Unicode input to produce (among other things) Cuneiform
> output.
>
> I realize some of the documentation was written long ago... so I'm
> unclear as to whether or not (or how easily) Plan9 (and specifically its
> lex/yacc software, etc.) handles such things? (this was sparked by the
> references to four hex digits etc.)

that's interesting stuff.

lex(1) is generally not used, and doesn't support
unicode.  yacc(1) does a fine job with unicode,
though, to be fair, most of that job falls on the
lexer.  however, a lexer is not hard to write by hand,
and there are many good examples in the distribution.
the bio(2) buffered io library provides Bgetrune,
which reads one utf-8 encoded rune at a time and is
generally what a hand-written lexer wants.
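to make the hand-written route concrete, here is a minimal
sketch in portable c rather than plan 9 c, so it compiles
anywhere.  getrune is a hypothetical stand-in for Bgetrune
that decodes utf-8 from a string instead of a Biobuf; SIGN
and input are likewise made up for illustration (yacc would
normally generate SIGN in y.tab.h, and yylval's type would
come from %union).  it assumes a 32-bit int, so any code
point, including the cuneiform block at U+12000, fits in a
token.

```c
/* portable c sketch, not plan 9 c: getrune stands in for Bgetrune */
#include <assert.h>
#include <stddef.h>

enum {
	Eof = -1,
	Runeerror = 0xFFFD,
	SIGN = 0x110001	/* hypothetical token code, above any code point */
};

/* decode one utf-8 sequence at *pp and advance *pp past it.
 * returns Eof at the terminating NUL and Runeerror on a bad
 * sequence; overlong forms are not rejected in this sketch. */
static long
getrune(const unsigned char **pp)
{
	const unsigned char *p;
	long r;
	int n, i;

	assert(pp != NULL && *pp != NULL);
	p = *pp;
	if(p[0] == 0)
		return Eof;
	if(p[0] < 0x80){
		*pp = p+1;
		return p[0];
	}
	if((p[0] & 0xE0) == 0xC0){
		r = p[0] & 0x1F;
		n = 2;
	}else if((p[0] & 0xF0) == 0xE0){
		r = p[0] & 0x0F;
		n = 3;
	}else if((p[0] & 0xF8) == 0xF0){
		r = p[0] & 0x07;
		n = 4;
	}else{
		*pp = p+1;	/* stray continuation or invalid lead byte */
		return Runeerror;
	}
	for(i = 1; i < n; i++){
		if((p[i] & 0xC0) != 0x80){
			*pp = p+i;	/* truncated sequence: resync here */
			return Runeerror;
		}
		r = (r << 6) | (p[i] & 0x3F);
	}
	*pp = p+n;
	return r;
}

static const unsigned char *input;	/* hypothetical: utf-8 text to lex */
static long yylval;			/* rune of the most recent token */

/* yacc calls yylex for each token; 0 means end of input */
static int
yylex(void)
{
	long r;

	do
		r = getrune(&input);
	while(r == ' ' || r == '\t' || r == '\n');
	if(r == Eof)
		return 0;
	yylval = r;
	if(r >= 0x12000 && r <= 0x123FF)
		return SIGN;	/* cuneiform block, U+12000-U+123FF */
	return (int)r;		/* other runes are literal tokens */
}
```

fed the bytes F0 92 80 80 (U+12000) followed by " a", yylex
returns SIGN with yylval 0x12000, then 'a', then 0.  SIGN sits
just above U+10FFFF so it can't collide with a rune returned
as a literal token.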

(i have some patches, partially stolen from russ,
that should support extended plane runes, i.e.
those above U+FFFF such as cuneiform, at the cost
of doubling the storage for each rune.)
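the storage remark follows from rune width: assuming, as it
implies, a 16-bit Rune, anything above U+FFFF simply doesn't
fit.  a small sketch with hypothetical Rune16/Rune32 types
shows what narrowing does to a cuneiform sign:

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical names: a 16-bit rune and a 32-bit rune that
 * can hold any unicode code point (twice the storage). */
typedef uint16_t Rune16;
typedef uint32_t Rune32;

/* store a code point in each rune type; the 16-bit conversion
 * silently drops the high bits of anything above U+FFFF. */
static Rune16
store16(uint32_t c)
{
	return (Rune16)c;
}

static Rune32
store32(uint32_t c)
{
	return (Rune32)c;
}
```

store16(0x12000) gives 0x2000 (EN QUAD, a space), so without
the wider rune the sign is silently mangled rather than rejected.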

- erik