From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <5d375e920709180838t4070c23al11bc0eb5cc7280c9@mail.gmail.com>
Date: Tue, 18 Sep 2007 17:38:48 +0200
From: Uriel <uriel99@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] simplicity
In-Reply-To: <7359f0490709180827h6978ae52re27825646a091ec8@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <8ccc8ba40709161155t356da3dcvc9735a2fe4f42a03@mail.gmail.com>
	<88ec1a25417025b5f86c7cdf76b249ff@quanstro.net>
	<46EE9A41.7DD78E60@null.net>
	<7359f0490709180827h6978ae52re27825646a091ec8@mail.gmail.com>
Topicbox-Message-UUID: c0a23c12-ead2-11e9-9d60-3106f5b1d025

Don't complain; at least it is not producing random behaviour. I have
seen versions of gnu awk that, when fed plain ASCII input with the
locale set to UTF-8, would match rules against random lines of input.
The fix? Set the locale to 'C' at the top of all your scripts (and don't
even think of dealing with files that actually contain non-ASCII UTF-8).

This was some years ago and it might be fixed by now, but it demonstrates
how the locale insanity makes life so much more fun.

And talking of simplicity, don't forget to mention X. By chance I just
found this gem in one of the many X headers:

#define NBBY    8       /* number of bits in a byte */

uriel


On 9/18/07, Rob Pike <robpike@gmail.com> wrote:
> On 9/17/07, Douglas A. Gwyn <DAGwyn@null.net> wrote:
> > erik quanstrom wrote:
> > > i think the devolution of gnu grep is quite instructive.  ...
> > > it gets to the heart of why plan9's invention and use (thanks, rob, ken) of
> > > utf-8 is so great.
> >
> > If the problem is that Gnu grep converts any non-8-bit character set
> > to wchar_t (the equivalent of Plan 9 "rune"), then it's not really a
> > fair criticism of the software.  The conversion approach handles a
> > wide variety of character encoding schemes, whereas grepping the
> > encodings directly (the fast approach) doesn't work well for many
> > non-UTF-8 encodings.
>
> Well, on a 2GHz x86, gnu grep ran for me at about 9600 baud on an
> ASCII file if I set my locale to the UTF-8 locale.  UTF-8 is ASCII
> compatible - explicitly, publicly, and on purpose - so there is no
> excuse for this sort of performance penalty.  To be specific, in
> the UTF-8 locale it should take just a few instructions to convert
> any character to wchar_t, ASCII or not, but gnu grep was calling
> malloc for this, even for an ASCII byte.
>
> It is a fair criticism to say this is unacceptable, whatever the
> intentions of the authors may be.
>
> -rob
>
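
To make the "just a few instructions" point above concrete, here is a
minimal sketch of an allocation-free UTF-8 to wchar_t conversion. The
name utf8_decode is hypothetical; this is not code from gnu grep or
Plan 9, and it assumes wchar_t is wide enough to hold any Unicode code
point (true on typical Unix systems, not where wchar_t is 16 bits).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * utf8_decode: hypothetical helper for illustration only.
 * Decodes one UTF-8 sequence from s (n bytes available) into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * malformed or truncated. No malloc, no locale lookup.
 * Assumption: wchar_t can represent values up to 0x10FFFF.
 */
static size_t utf8_decode(const unsigned char *s, size_t n, wchar_t *out)
{
    unsigned char c;
    unsigned long cp, min;
    size_t len, i;

    assert(s != NULL && out != NULL);
    if (n == 0)
        return 0;

    c = s[0];
    if (c < 0x80) {             /* ASCII fast path: one compare, one store */
        *out = (wchar_t)c;
        return 1;
    }

    /* The lead byte gives the sequence length and the smallest legal value. */
    if ((c & 0xE0) == 0xC0) {
        len = 2; cp = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
        len = 3; cp = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {
        len = 4; cp = c & 0x07; min = 0x10000;
    } else {
        return 0;               /* stray continuation byte or invalid lead */
    }
    if (n < len)
        return 0;               /* truncated sequence */

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = (wchar_t)cp;
    return len;
}
```

A caller would loop over a buffer, advancing by the returned length;
for a pure ASCII file every call takes the first branch and returns
immediately.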