From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Mon, 15 Mar 2010 17:13:40 -0400
To: 9fans@9fans.net
Message-ID:
In-Reply-To: <20100315210251.GA26934@machine>
References: <20100315210251.GA26934@machine>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] hard-coded UTF-8 in wc.c
Topicbox-Message-UUID: e96a3052-ead5-11e9-9d60-3106f5b1d025

On Mon Mar 15 17:12:06 EDT 2010, aim0shei@lavabit.com wrote:
> Just looked at source of wc
> (http://plan9.bell-labs.com/sources/plan9/sys/src/cmd/wc.c). UTF-8
> is hard-coded here. What is the reason? Nobody wants to rewrite it,
> it is optimization or it is impossible to rewrite it using runes for
> some reason?
>
> http://plan9.bell-labs.com/sys/doc/utf.html says all you need to do to
> change encoding is:
> 1. Rewrite UTF encoding/decoding code.
> 2. Convert all text files.
> 3. Recompile all software.
>
> Looks like it is impossible with current code. It is not fixed just
> because there is more important work or there is some serious problem
> in design?

perhaps you have misunderstood. inside programs, sometimes
unicode text is represented as runes. runes are not sent over
pipes nor stored in files. therefore, there is no need to wc
runes.

- erik
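
To illustrate the point about files and pipes carrying UTF-8 bytes rather than runes, here is a minimal sketch of counting characters directly on a byte buffer. It is not taken from wc.c and does not claim to match how wc.c works; the function name countrunes is hypothetical, and the sketch assumes the input is well-formed UTF-8. It relies only on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so each byte that is not a continuation byte starts one character.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not from Plan 9's wc.c.
 * Counts characters in a UTF-8 byte buffer by counting every byte
 * that is NOT a continuation byte (10xxxxxx).
 * Assumption: buf holds well-formed UTF-8; malformed input is not
 * detected and will simply be counted by the same rule.
 */
size_t
countrunes(const unsigned char *buf, size_t n)
{
	size_t i, nrune;

	nrune = 0;
	for(i = 0; i < n; i++)
		if((buf[i] & 0xC0) != 0x80)	/* lead byte or plain ASCII */
			nrune++;
	return nrune;
}
```

A caller would read bytes from a file or pipe into a buffer and pass each chunk to countrunes, summing the results. Because the test is applied per byte, a character split across two chunks is still counted exactly once.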