From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5d375e920709180838t4070c23al11bc0eb5cc7280c9@mail.gmail.com>
Date: Tue, 18 Sep 2007 17:38:48 +0200
From: Uriel
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] simplicity
In-Reply-To: <7359f0490709180827h6978ae52re27825646a091ec8@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <8ccc8ba40709161155t356da3dcvc9735a2fe4f42a03@mail.gmail.com> <88ec1a25417025b5f86c7cdf76b249ff@quanstro.net> <46EE9A41.7DD78E60@null.net> <7359f0490709180827h6978ae52re27825646a091ec8@mail.gmail.com>
Topicbox-Message-UUID: c0a23c12-ead2-11e9-9d60-3106f5b1d025

Don't complain; at least it is not producing random behaviour. I have
seen versions of gnu awk that, when fed plain ASCII input with the
locale set to UTF-8, had rules match random lines of input. The fix?
Set the locale to 'C' at the top of all your scripts (and don't even
think of dealing with files which actually contain non-ASCII UTF-8).

This was some years ago and it might be fixed by now, but it
demonstrates how the locale insanity makes life so much more fun.

And talking of simplicity, don't forget to mention X. By chance I just
found this gem in one of the many X headers:

#define NBBY 8 /* number of bits in a byte */

uriel

On 9/18/07, Rob Pike wrote:
> On 9/17/07, Douglas A. Gwyn wrote:
> > erik quanstrom wrote:
> > > i think the devolution of gnu grep is quite instructive. ...
> > > it gets to the heart of why plan9's invention and use (thanks rob, ken) of
> > > utf-8 is so great.
> >
> > If the problem is that Gnu grep converts any non-8-bit character set
> > to wchar_t (the equivalent of Plan 9 "rune"), then it's not really a
> > fair criticism of the software. The conversion approach handles a
> > wide variety of character encoding schemes, whereas grepping the
> > encodings directly (the fast approach) doesn't work well for many
> > non-UTF-8 encodings.
>
> Well, on a 2GHz x86, gnu grep ran for me at about 9600 baud on an
> ASCII file if I set my locale to the UTF-8 locale. UTF-8 is ASCII
> compatible - explicitly, publicly, and on purpose - so there is no
> excuse for this sort of performance penalty. To be specific, in
> the UTF-8 locale it should take just a few instructions to convert
> any character to wchar_t, ASCII or not, but gnu grep was calling
> malloc for this, even for an ASCII byte.
>
> It is a fair criticism to say this is unacceptable, whatever the
> intentions of the authors may be.
>
> -rob
>
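
To make the quoted "just a few instructions" point concrete, here is a
minimal sketch of an allocation-free UTF-8 decoder with a one-compare
ASCII fast path. It is neither gnu grep's code nor Plan 9's chartorune;
the name utf8_decode is hypothetical, and uint32_t is assumed as a
stand-in for wchar_t or Rune.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from gnu grep or Plan 9): decode one UTF-8
 * sequence from s, where n bytes are available, into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or malformed. Nothing is allocated.
 */
static size_t
utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
	/* smallest code point allowed for each sequence length */
	static const uint32_t min[] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t c;
	size_t len, i;

	assert(s != NULL && out != NULL);
	if (n == 0)
		return 0;

	c = s[0];
	if (c < 0x80) {			/* ASCII: one compare, one store */
		*out = c;
		return 1;
	}

	/* the lead byte gives the sequence length and the top bits */
	if ((c & 0xE0) == 0xC0) {
		len = 2;
		c &= 0x1F;
	} else if ((c & 0xF0) == 0xE0) {
		len = 3;
		c &= 0x0F;
	} else if ((c & 0xF8) == 0xF0) {
		len = 4;
		c &= 0x07;
	} else
		return 0;		/* stray continuation or invalid lead byte */

	if (n < len)
		return 0;		/* truncated sequence */

	/* each continuation byte is 10xxxxxx and adds six bits */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* reject overlong forms, surrogates, and values past U+10FFFF */
	if (c < min[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return 0;

	*out = c;
	return len;
}
```

A caller would loop over a buffer, advancing by the returned length and
treating 0 as a bad byte to skip or report. For an ASCII byte the work
is a bounds check, a compare, and a store.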