From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <7359f0490709180827h6978ae52re27825646a091ec8@mail.gmail.com>
Date: Tue, 18 Sep 2007 08:27:32 -0700
From: "Rob Pike"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] simplicity
In-Reply-To: <46EE9A41.7DD78E60@null.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <8ccc8ba40709161155t356da3dcvc9735a2fe4f42a03@mail.gmail.com>
	<88ec1a25417025b5f86c7cdf76b249ff@quanstro.net>
	<46EE9A41.7DD78E60@null.net>
Topicbox-Message-UUID: c09e19b6-ead2-11e9-9d60-3106f5b1d025

On 9/17/07, Douglas A. Gwyn wrote:
> erik quanstrom wrote:
> > i think the devolution of gnu grep is quite instructive. ...
> > it gets to the heart of why plan9's invention and use (thank's rob, ken) of
> > utf-8 is so great.
>
> If the problem is that Gnu grep converts any non-8-bit character set
> to wchar_t (the equivalent of Plan 9 "rune"), then it's not really a
> fair criticism of the software.  The conversion approach handles a
> wide variety of character encoding scheme, whereas grepping the
> encodings directly (the fast approach) doesn't work well for many
> non-UTF-8 encodings.

Well, on a 2GHz x86, gnu grep ran for me at about 9600 baud on an
ASCII file if I set my locale to the UTF-8 locale.  UTF-8 is ASCII
compatible - explicitly, publicly, and on purpose - so there is no
excuse for this sort of performance penalty.

To be specific, in the UTF-8 locale it should take just a few
instructions to convert any character to wchar_t, ASCII or not, but
gnu grep was calling malloc for this, even for an ASCII byte.

It is a fair criticism to say this is unacceptable, whatever the
intentions of the authors may be.

-rob
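
For illustration only, here is a minimal sketch in C of the kind of conversion the message describes: an ASCII byte costs one comparison and one store, a multi-byte sequence costs a few shifts and masks, and nothing is allocated. The name utf8_decode and its signature are hypothetical and are not taken from gnu grep or from Plan 9; uint32_t stands in for wchar_t or rune so the width is the same on every platform.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from any real library): decode one UTF-8
 * sequence from s, with n bytes available, into *rune.
 * Returns the number of bytes consumed, or 0 if the input is empty,
 * truncated, or malformed. No allocation takes place.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *rune)
{
    unsigned char c;
    size_t len, i;
    uint32_t r, min;

    if (n == 0)
        return 0;
    c = s[0];

    /* ASCII fast path: one comparison and one store. */
    if (c < 0x80) {
        *rune = c;
        return 1;
    }

    /* The lead byte gives the sequence length and the payload bits. */
    if ((c & 0xE0) == 0xC0) {
        len = 2; r = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
        len = 3; r = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {
        len = 4; r = c & 0x07; min = 0x10000;
    } else {
        return 0;   /* stray continuation byte or invalid lead byte */
    }
    if (n < len)
        return 0;   /* truncated sequence */

    /* Each continuation byte has the form 10xxxxxx and adds six bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        r = (r << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (r < min || r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF))
        return 0;

    *rune = r;
    return len;
}
```

A caller would loop over a buffer, advancing by the returned length each time; on a pure ASCII file only the fast path at the top ever runs.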