From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Thu, 28 Jan 2010 15:05:03 -0500
To: 9fans@9fans.net
Message-ID: <6277a4dcc738c2eee17e029efeb1b324@ladd.quanstro.net>
In-Reply-To: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca>
References: <4B61A280020000CC0001D4A1@wlgw07.wlu.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] Lex, Yacc, Unicode Plane 1
Topicbox-Message-UUID: c9bd4334-ead5-11e9-9d60-3106f5b1d025

> A colleague put me on to Plan9, some of whose online documentation I
> have read with interest, in particular the "Hello World" discussion as
> it relates to Unicode/UTF-8.
>
> I'm one of the authors of the Cuneiform proposal now encoded under
> Unicode (see block U+12000), and I'm interested in lex/yacc-like
> parsing of Unicode input to produce (among other things) Cuneiform
> output.
>
> I realize some of the documentation was written long ago... so I'm
> unclear as to whether or not (or how easily) Plan9 (and specifically its
> lex/yacc software, etc.) handles such things? (this sparked by the
> references to four hex digits etc.)

that's interesting stuff.

lex(1) is generally not used, and doesn't support unicode.
yacc(1) does a fine job with unicode, though, to be fair,
most of that job falls on the lexer. however, a lexer is not
hard to write by hand, and there are many good examples in
the distribution. the bio(2) buffered io library provides a
Bgetrune function, which is generally what is desired.

(i have some patches, partially stolen from russ, that
should support extended plane runes at the cost of
double the storage.)

- erik
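
A rough sketch of what the rune-reading core of a hand-written lexer might look like for plane 1 input. This is portable C, not Plan 9 code: it does not use bio(2) or Bgetrune, and the names decode_rune, is_cuneiform, count_cuneiform and RUNE_ERROR are made up for illustration. It assumes runes are held in 32 bits, since code points above U+FFFF such as the Cuneiform block (U+12000 to U+123FF) do not fit in 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { RUNE_ERROR = 0xFFFD };	/* replacement character for bad input */

/*
 * Decode one UTF-8 sequence from s[0..n) into *r.
 * Returns the number of bytes consumed (1 to 4), or 0 if n == 0.
 * Malformed input yields RUNE_ERROR and consumes one byte.
 */
size_t
decode_rune(const unsigned char *s, size_t n, uint32_t *r)
{
	size_t i, len;
	uint32_t c, min;

	assert(s != NULL && r != NULL);
	if(n == 0)
		return 0;
	c = s[0];
	if(c < 0x80){
		*r = c;
		return 1;
	}
	if((c & 0xE0) == 0xC0){
		len = 2; c &= 0x1F; min = 0x80;
	}else if((c & 0xF0) == 0xE0){
		len = 3; c &= 0x0F; min = 0x800;
	}else if((c & 0xF8) == 0xF0){
		len = 4; c &= 0x07; min = 0x10000;	/* planes 1 and up */
	}else
		goto bad;
	if(n < len)
		goto bad;
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80)
			goto bad;
		c = c<<6 | (s[i] & 0x3F);
	}
	/* reject overlong forms, surrogates, and values past U+10FFFF */
	if(c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		goto bad;
	*r = c;
	return len;
bad:
	*r = RUNE_ERROR;
	return 1;
}

/* True if r lies in the Cuneiform block, U+12000 to U+123FF. */
int
is_cuneiform(uint32_t r)
{
	return r >= 0x12000 && r <= 0x123FF;
}

/* Walk a NUL-terminated UTF-8 string and count Cuneiform runes. */
size_t
count_cuneiform(const char *s)
{
	const unsigned char *p = (const unsigned char*)s;
	size_t n = strlen(s), k, count = 0;
	uint32_t r;

	while((k = decode_rune(p, n, &r)) > 0){
		if(is_cuneiform(r))
			count++;
		p += k;
		n -= k;
	}
	return count;
}
```

A real lexer would switch on the decoded rune to build tokens for yacc rather than just count; the loop in count_cuneiform only shows the shape of the read-classify-advance cycle.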