From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <39d22acfc53470335fdb74156c738feb@plan9.bell-labs.com>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] awk, not utf aware...
Date: Wed, 27 Feb 2008 10:54:27 -0500
From: Sape Mullender
In-Reply-To: <599f06db0802262336n7e418f22p1a94e2cfbb564069@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Topicbox-Message-UUID: 6531fcc2-ead3-11e9-9d60-3106f5b1d025

> There is split and other functions,
> for example:
>
> toupper("aí")
> gives
> Aí
>
> My guess is that there are many more little (or not) corners where it
> doesn't work.

Yes, and then there is locale: does [a-z] include ĳ when you run it in
Holland (it should)? Does it include á, è, ô in France (it should)?
Does it include ø, å in Norway (it should not)? And what happens when you
evaluate "è" < "o" (it depends)?

Fixing awk is much harder than anyone thinks. I had a chat about it with
Brian Kernighan and he says he's been thinking about fixing awk for a long
time, but that it really is a hard problem.

Sape
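
For illustration only, here is a rough Python sketch of the corner cases mentioned in this thread. It assumes that a byte-oriented toupper only touches the ASCII bytes a-z, which would explain the quoted result; it is not awk's actual code. The helper names ascii_toupper and in_ascii_range are made up for this sketch. It deliberately avoids locale settings, so it shows only the locale-blind behaviour that the questions about Holland, France and Norway are contrasted with.

```python
def ascii_toupper(s: str) -> str:
    """Hypothetical helper: uppercase only the bytes 'a'..'z' of the UTF-8
    encoding, the way a byte-oriented toupper might (an assumption)."""
    raw = s.encode("utf-8")
    out = bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in raw)
    return out.decode("utf-8")


def in_ascii_range(ch: str) -> bool:
    """Hypothetical helper: [a-z] read as a plain code point range,
    with no locale-specific collation."""
    return "a" <= ch <= "z"


# Byte-wise uppercasing leaves the two-byte sequence for í untouched.
print(ascii_toupper("aí"))   # Aí

# A Unicode-aware uppercase maps í to Í as well.
print("aí".upper())          # AÍ

# Read as a code point range, [a-z] does not include the Dutch ĳ (U+0133).
print(in_ascii_range("ĳ"))   # False

# Compared by code point, è (U+00E8) sorts after o (U+006F);
# a locale-aware collation could well answer differently.
print("è" < "o")             # False
```

None of this settles what the right answer should be in a given locale; it only shows where a byte-wise or code-point-wise reading and a reader's expectation part ways.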