From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <599f06db0802260654x555970e8q8a055c2889a0c121@mail.gmail.com>
Date: Tue, 26 Feb 2008 15:54:50 +0100
From: "Gorka Guardiola"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] awk, not utf aware...
In-Reply-To: <20080226131613.GA811@shodan.homeunix.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
References: <599f06db0802260418m1c2732fdt1487051c59152e27@mail.gmail.com> <20080226131613.GA811@shodan.homeunix.net>
Topicbox-Message-UUID: 61eeaa60-ead3-11e9-9d60-3106f5b1d025

On Tue, Feb 26, 2008 at 2:16 PM, Martin Neubauer wrote:
> Awk is one of the few programs in the distribution that is maintained
> externally (by Brian Kernighan) and is pulled in via ape and pcc (it might
> actually be the only one - I didn't bother to check.) A quick glimpse at
> lex.c suggests that awk scans input one char at a time. In hindsight I'm a
> bit surprised that I haven't got bitten by this, but I probably didn't split
> within multibyte sequences. It's probably not too hard to change awk to read
> runes for the price of creating ``the other one true awk.''
>

I don't know if it is that easy. I'll leave it on my todo list for the
future :-). Anyway, the BUGS section should say that it does not know about
UTF. I'll send a patch.

-- 
- curiosity sKilled the cat
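
To make the "split within multibyte sequences" point concrete, here is a small sketch of the general failure mode. It is in Python for brevity and is not awk's code; the helper names (split_bytes, split_runes, is_valid_utf8) are made up for illustration. It only shows what happens when UTF-8 text is cut at a byte offset rather than at a character (rune) boundary.

```python
# Illustrative sketch only: not taken from awk's lex.c or any awk source.
# All helper names here are hypothetical.

text = "a\u00f1o"             # "año"; U+00F1 takes two bytes in UTF-8
data = text.encode("utf-8")   # b'a\xc3\xb1o', 4 bytes for 3 characters


def split_bytes(data: bytes, n: int):
    """Cut after n bytes, as a byte-at-a-time scanner would see the input."""
    return data[:n], data[n:]


def split_runes(data: bytes, n: int):
    """Decode to characters first, then cut after n characters (runes)."""
    s = data.decode("utf-8")
    return s[:n].encode("utf-8"), s[n:].encode("utf-8")


def is_valid_utf8(b: bytes) -> bool:
    """Report whether b decodes cleanly as UTF-8."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Cutting at byte 2 lands in the middle of the two-byte sequence.
byte_head, byte_tail = split_bytes(data, 2)
# Cutting at rune 2 lands on a character boundary.
rune_head, rune_tail = split_runes(data, 2)

print(byte_head, is_valid_utf8(byte_head))
print(rune_head, is_valid_utf8(rune_head))
```

As long as the cut points happen to fall between characters (for example, splitting only on ASCII separators), the byte-wise version gives the same result as the rune-wise one, which would be consistent with not having been bitten by this so far.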