From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <39d22acfc53470335fdb74156c738feb@plan9.bell-labs.com>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] awk, not utf aware...
Date: Wed, 27 Feb 2008 10:54:27 -0500
From: Sape Mullender <sape@plan9.bell-labs.com>
In-Reply-To: <599f06db0802262336n7e418f22p1a94e2cfbb564069@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Topicbox-Message-UUID: 6531fcc2-ead3-11e9-9d60-3106f5b1d025

> There is split and other functions,
> for example:
>
> toupper("aí")
> gives
> Aí
>
> My guess is that there are many more little (or not) corners where it
> doesn't work.

Yes, and then there is locale: does [a-z] include ĳ when you run it
in Holland (it should)?  Does it include á, è, ô in France (it should)?
Does it include ø, å in Norway (it should not)?  And what happens when
you evaluate "è" < "o" (it depends)?
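
A rough illustration of those corners, as a Python sketch rather than
awk itself.  The helper ascii_toupper is hypothetical: it only assumes
that a byte-oriented toupper leaves everything outside ASCII alone,
which matches the output quoted above but is not taken from the awk
source.  The range test and the comparison use plain code point order,
with no locale involved.

```python
import re

def ascii_toupper(data: bytes) -> bytes:
    """Hypothetical byte-wise toupper: only bytes 'a'..'z' are changed."""
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in data)

text = "a\u00ed"  # "a" followed by i-acute

# Byte-wise: the two UTF-8 bytes of i-acute (C3 AD) pass through untouched.
bytewise = ascii_toupper(text.encode("utf-8")).decode("utf-8")

# Character-wise: Python's str.upper() works on code points.
charwise = text.upper()

# [a-z] by code point: none of these letters fall inside U+0061..U+007A.
# Letters: ij ligature, a-acute, e-grave, o-circumflex, o-slash, a-ring.
letters = "\u0133\u00e1\u00e8\u00f4\u00f8\u00e5"
in_range = {c: bool(re.fullmatch("[a-z]", c)) for c in letters}

# "e-grave" < "o" by code point: U+00E8 is larger than U+006F.
codepoint_less = "\u00e8" < "o"

print(bytewise, charwise, in_range, codepoint_less)
```

Whether any of these answers is "right" is exactly the locale question:
a locale-aware collation may place è before o and may or may not count
ĳ as part of [a-z], and code point order knows nothing about that.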

Fixing awk is much harder than anyone thinks.  I had a chat about it with
Brian Kernighan and he says he's been thinking about fixing awk for a
long time, but that it really is a hard problem.

	Sape