From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <7ba10925935da3080b62c7cb6e2649d5@coraid.com>
From: erik quanstrom <quanstro@coraid.com>
Date: Mon, 17 Sep 2007 11:55:04 -0400
To: 9fans@cse.psu.edu
Subject: Re: [9fans] simplicity
In-Reply-To: <46EE9A41.7DD78E60@null.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Topicbox-Message-UUID: c050adde-ead2-11e9-9d60-3106f5b1d025

> erik quanstrom wrote:
> > i think the devolution of gnu grep is quite instructive.  ...
> > it gets to the heart of why plan9's invention and use (thanks, rob, ken) of
> > utf-8 is so great.
>
> If the problem is that Gnu grep converts any non-8-bit character set
> to wchar_t (the equivalent of Plan 9 "rune"), then it's not really a
> fair criticism of the software.  The conversion approach handles a
> wide variety of character encoding schemes, whereas grepping the
> encodings directly (the fast approach) doesn't work well for many
> non-UTF-8 encodings.

performance may suck, but that's just a symptom of a bigger problem.

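wchar_t is not the equivalent of Rune.  a Rune is always a unicode
code point, and its encoding in files is always utf-8.  what a wchar_t
holds is up to the implementation and the locale.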

this is not a feature.  it is a bug.

suppose Linux user a and user b grep the same "text" file for the same string.
results will depend on the users' locales.

contrast plan 9.  any two users grepping the same file for the same string
will get the same results.
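
a minimal sketch of the difference, in python rather than c.  it is not
how gnu grep or plan 9 grep is implemented; the grep() helper is
hypothetical, and the two encoding names merely stand in for two users'
locales.  the same bytes are decoded two ways and matched against the
same pattern.

```python
import re

# the same bytes on disk: "café" stored as utf-8 (é is 0xC3 0xA9)
data = "caf\u00e9\n".encode("utf-8")

def grep(pattern, raw, encoding):
    """hypothetical helper: decode raw under one encoding, match per line."""
    text = raw.decode(encoding, errors="replace")
    return [line for line in text.splitlines() if re.search(pattern, line)]

# "caf", then exactly one character, then end of line
pattern = r"^caf.$"

utf8_hits = grep(pattern, data, "utf-8")      # user a: a utf-8 locale
latin1_hits = grep(pattern, data, "latin-1")  # user b: an iso-8859-1 locale

print(utf8_hits)    # the line matches: é is one character
print(latin1_hits)  # no match: the two bytes decode as two characters
```

with a single fixed encoding there is only the first case, so every
reader sees the same answer.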

in either case a character set conversion might be necessary before
the file greps correctly.  but in the plan 9 case, one conversion (to
utf-8) will fix things for every plan 9 user.  in the Linux case, no
single conversion will fix things for every Linux user, since each
locale expects something different.
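
a rough sketch of the "one conversion" point, again in python and under
assumptions: the file is taken to have been written as iso-8859-1, and
search() is a hypothetical stand-in for a reader that always assumes
utf-8.

```python
# a file as written by a latin-1 user: "naïve" with ï as the single byte 0xEF
legacy = "na\u00efve\n".encode("latin-1")

def search(raw, needle):
    """hypothetical reader: always decodes as utf-8, consults no locale."""
    return needle in raw.decode("utf-8", errors="replace")

# before conversion the lone 0xEF byte is not valid utf-8, so the match fails
before = search(legacy, "na\u00efve")

# one conversion, done once, to the encoding every reader agrees on
converted = legacy.decode("latin-1").encode("utf-8")
after = search(converted, "na\u00efve")

print(before, after)
```

after the conversion there is nothing left that depends on who is
reading the file.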

- erik

p.s. gnu grep does special-case utf-8 and avoids wchar_t conversions