From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <7359f0490709180827h6978ae52re27825646a091ec8@mail.gmail.com>
Date: Tue, 18 Sep 2007 08:27:32 -0700
From: "Rob Pike" <robpike@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] simplicity
In-Reply-To: <46EE9A41.7DD78E60@null.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <8ccc8ba40709161155t356da3dcvc9735a2fe4f42a03@mail.gmail.com>
	<88ec1a25417025b5f86c7cdf76b249ff@quanstro.net>
	<46EE9A41.7DD78E60@null.net>
Topicbox-Message-UUID: c09e19b6-ead2-11e9-9d60-3106f5b1d025

On 9/17/07, Douglas A. Gwyn <DAGwyn@null.net> wrote:
> erik quanstrom wrote:
> > i think the devolution of gnu grep is quite instructive.  ...
> > it gets to the heart of why plan9's invention and use (thanks, rob, ken) of
> > utf-8 is so great.
>
> If the problem is that Gnu grep converts any non-8-bit character set
> to wchar_t (the equivalent of Plan 9 "rune"), then it's not really a
> fair criticism of the software.  The conversion approach handles a
> wide variety of character encoding schemes, whereas grepping the
> encodings directly (the fast approach) doesn't work well for many
> non-UTF-8 encodings.

Well, on a 2GHz x86, gnu grep ran for me at about 9600 baud on an
ASCII file if I set my locale to the UTF-8 locale.  UTF-8 is ASCII
compatible - explicitly, publicly, and on purpose - so there is no
excuse for this sort of performance penalty.  To be specific, in
the UTF-8 locale it should take just a few instructions to convert
any character to wchar_t, ASCII or not, but gnu grep was calling
malloc for this, even for an ASCII byte.
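
To make the "few instructions" claim concrete, here is a minimal sketch
of a UTF-8 decoder in C. It is not GNU grep's code, nor Plan 9's
chartorune; the names decode_rune and RUNE_ERROR are made up for
illustration, and uint32_t stands in for wchar_t or Rune. It assumes
the caller passes at least one readable byte. The point is only that
the ASCII case is one compare and one store, and that nothing in any
path needs to allocate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not from any real library. */
enum { RUNE_ERROR = 0xFFFD };	/* Unicode replacement character */

/*
 * Decode one UTF-8 sequence starting at s, with n >= 1 bytes available.
 * Stores the code point in *r and returns the number of bytes consumed.
 * Malformed, truncated, overlong, or surrogate input yields RUNE_ERROR
 * and consumes one byte.  No allocation, no locale lookup.
 */
static size_t
decode_rune(const unsigned char *s, size_t n, uint32_t *r)
{
	uint32_t c = s[0], v;

	/* ASCII fast path: one compare, one store. */
	if (c < 0x80) {
		*r = c;
		return 1;
	}
	/* Two bytes: 110xxxxx 10xxxxxx (0xC0 and 0xC1 would be overlong). */
	if (c >= 0xC2 && c <= 0xDF && n >= 2 && (s[1] & 0xC0) == 0x80) {
		*r = (c & 0x1F) << 6 | (s[1] & 0x3F);
		return 2;
	}
	/* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx. */
	if ((c & 0xF0) == 0xE0 && n >= 3 &&
	    (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
		v = (c & 0x0F) << 12 | (s[1] & 0x3F) << 6 | (s[2] & 0x3F);
		if (v >= 0x800 && (v < 0xD800 || v > 0xDFFF)) {
			*r = v;
			return 3;
		}
	}
	/* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. */
	if ((c & 0xF8) == 0xF0 && n >= 4 &&
	    (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80 &&
	    (s[3] & 0xC0) == 0x80) {
		v = (c & 0x07) << 18 | (s[1] & 0x3F) << 12 |
		    (s[2] & 0x3F) << 6 | (s[3] & 0x3F);
		if (v >= 0x10000 && v <= 0x10FFFF) {
			*r = v;
			return 4;
		}
	}
	*r = RUNE_ERROR;
	return 1;
}
```

A matcher built on something like this can walk a buffer by calling
decode_rune repeatedly and advancing by the return value; on a pure
ASCII file it never leaves the first branch.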

It is a fair criticism to say this is unacceptable, whatever the
intentions of the authors may be.

-rob