From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Tue, 28 Dec 2010 23:40:43 -0500
To: 9fans@9fans.net
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: [9fans] hard-coded UTF-8 in wc.c
Topicbox-Message-UUID: 8ec89e58-ead6-11e9-9d60-3106f5b1d025

this just popped up when i was searching the archive.

On Mon 15 Mar 2010 18:44:41 EST 2010, quanstro@quanstro.net wrote:
> On Mon Mar 15 17:46:11 EDT 2010, aim0shei@lav... wrote:
> > Yes, but why wc utility counts runes (wc(1) call them runes) manually
> > using huge table instead of using functions from rune(3) such as utflen?
>
> i didn't write wc, but i would imagine that it's for speed.

i took some time a few weeks ago to extend wc to handle runes up to
0x10ffff, which meant adding 3 states for 4-byte runes and adding an
additional table.  with that perspective ...

wc is a big state machine.  using the rune functions would hide
a good deal of the state machine, which would make the states
harder to understand, and some of this work would need to be redone.

the tables are actually really easy to understand and generate.
wikipedia has a discussion of the bit patterns which can help.

- erik
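
a minimal sketch of the idea, not the actual wc.c code: the lead-byte bit patterns of UTF-8 are enough to generate a 256-entry byte class table, and a small state machine over that table can count runes. the names bclass, mkclass and countrunes are hypothetical, chosen for illustration. the sketch assumes that each malformed or truncated sequence counts as one rune, and it does not reject overlong forms, surrogates, or values above 0x10ffff.

```c
#include <stddef.h>

/* byte classes, derived from the leading bits of each byte */
enum { ASCII, CONT, LEAD2, LEAD3, LEAD4, BAD };

static unsigned char bclass[256];	/* hypothetical table name */

/* generate the class table from the UTF-8 bit patterns */
void
mkclass(void)
{
	int c;

	for(c = 0; c < 256; c++){
		if((c & 0x80) == 0x00)
			bclass[c] = ASCII;	/* 0xxxxxxx */
		else if((c & 0xC0) == 0x80)
			bclass[c] = CONT;	/* 10xxxxxx */
		else if((c & 0xE0) == 0xC0)
			bclass[c] = LEAD2;	/* 110xxxxx */
		else if((c & 0xF0) == 0xE0)
			bclass[c] = LEAD3;	/* 1110xxxx */
		else if((c & 0xF8) == 0xF0)
			bclass[c] = LEAD4;	/* 11110xxx */
		else
			bclass[c] = BAD;	/* 11111xxx never appears in UTF-8 */
	}
}

/*
 * count runes in p[0..n).  the state is `need', the number of
 * continuation bytes still expected; a 4-byte rune passes
 * through need = 3, 2, 1.  mkclass must be called first.
 */
size_t
countrunes(const unsigned char *p, size_t n)
{
	size_t i, nrune;
	int need, k;

	nrune = 0;
	need = 0;
	for(i = 0; i < n; i++){
		k = bclass[p[i]];
		if(need > 0){
			if(k == CONT){
				if(--need == 0)
					nrune++;	/* sequence complete */
				continue;
			}
			/* truncated sequence: count one bad rune, then handle this byte */
			nrune++;
			need = 0;
		}
		switch(k){
		case LEAD2:
			need = 1;
			break;
		case LEAD3:
			need = 2;
			break;
		case LEAD4:
			need = 3;
			break;
		default:
			/* ASCII, or a stray continuation/invalid byte counted as one bad rune */
			nrune++;
			break;
		}
	}
	if(need > 0)
		nrune++;	/* input ended in the middle of a sequence */
	return nrune;
}
```

calling mkclass() once and then countrunes() on a buffer gives the rune count. a real wc would fold the word and line states into the same machine, which is where precomputed tables pay off.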