From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@labs.coraid.com>
Date: Mon, 15 Mar 2010 17:13:40 -0400
To: 9fans@9fans.net
Message-ID: <f162c25ba3a2d4f206b1eccf52df8742@coraid.com>
In-Reply-To: <20100315210251.GA26934@machine>
References: <20100315210251.GA26934@machine>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] hard-coded UTF-8 in wc.c
Topicbox-Message-UUID: e96a3052-ead5-11e9-9d60-3106f5b1d025

On Mon Mar 15 17:12:06 EDT 2010, aim0shei@lavabit.com wrote:
> Just looked at source of wc
> (http://plan9.bell-labs.com/sources/plan9/sys/src/cmd/wc.c). UTF-8
> is hard-coded here. What is the reason? Nobody wants to rewrite it,
> it is optimization or it is impossible to rewrite it using runes for
> some reason?
>
> http://plan9.bell-labs.com/sys/doc/utf.html says all you need to do to
> change encoding is:
> 1. Rewrite UTF encoding/decoding code.
> 2. Convert all text files.
> 3. Recompile all software.
>
> Looks like it is impossible with current code. It is not fixed just
> because there is more important work or there is some serious problem
> in design?

perhaps you have misunderstood.

inside programs, unicode text is sometimes represented as
runes.  runes are not sent over pipes nor stored in files;
text outside a program is utf-8 bytes.

therefore, there is no need to wc runes.
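
to illustrate: a character count only ever has utf-8 bytes to work
from, since that is what arrives from a file or pipe.  the sketch
below is portable c, not the code in wc.c, and count_runes is a
hypothetical name.  it assumes well-formed utf-8 input and counts
every byte that is not a continuation byte (10xxxxxx).

```c
#include <stddef.h>

/*
 * Count code points (runes) in a buffer of UTF-8 bytes.
 * Each encoded rune contains exactly one byte that is not a
 * continuation byte (bit pattern 10xxxxxx), so counting those
 * bytes counts runes.  Assumes well-formed UTF-8; hypothetical
 * helper, not taken from wc.c.
 */
size_t
count_runes(const unsigned char *buf, size_t n)
{
	size_t i, runes = 0;

	for(i = 0; i < n; i++)
		if((buf[i] & 0xC0) != 0x80)	/* lead byte or ASCII */
			runes++;
	return runes;
}
```

the decoding step is tied to the external encoding by necessity;
runes only exist after that step, inside the program.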

- erik