9fans - fans of the OS Plan 9 from Bell Labs
From: "Gorka Guardiola" <paurea@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] awk, not utf aware...
Date: Wed, 27 Feb 2008 08:36:14 +0100
Message-ID: <599f06db0802262336n7e418f22p1a94e2cfbb564069@mail.gmail.com>
In-Reply-To: <7f50f59eb69c5e678cbb97e2b978b112@quanstro.net>

On Tue, Feb 26, 2008 at 9:24 PM, erik quanstrom <quanstro@quanstro.net> wrote:
>
>  i think the comments about this problem are missing the point
>  a bit.  utf8 should be transparent to awk unless the situation demands

No. It is not transparent at all. It is semi-translucent because someone did
it partway, and because of that I have been bitten hard by this in different
situations (I am not complaining, just saying that this may not be the right
approach to take in the future).

What someone did is make it so that
/a.j/
matches
a☺j
Because someone fixed the regexp part of awk, it already understands this,
which made me (falsely) think at first that UTF-8 works, and conned me
into the bug.
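
The difference that regexp fix makes can be sketched outside awk. The
snippet below is Python, not awk's code, and the names text and raw are
made up for illustration. It shows "." matching one character when the
pattern works on code points, and failing when it works on the three
UTF-8 bytes of ☺ instead:

```python
import re

text = "a\u263aj"            # a☺j as a string of code points
raw = text.encode("utf-8")   # the same string as UTF-8 bytes

# Character-aware matching: "." consumes the whole ☺.
char_match = re.fullmatch(r"a.j", text) is not None

# Byte-wise matching: ☺ is three bytes, so a single "." cannot cover it.
byte_match = re.fullmatch(rb"a.j", raw) is not None

print(len(text), len(raw), char_match, byte_match)
```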

Then there are split and other functions, which are not UTF-aware.
For example:

toupper("aí")
gives
Aí
where AÍ is expected.
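
That toupper result is consistent with byte-wise case mapping: only the
ASCII byte for a is changed, and the two bytes of í pass through
untouched. A sketch in Python (not awk's implementation; word, bytewise
and runewise are made-up names):

```python
word = "a\u00ed"  # aí

# Byte-wise: only ASCII letters are mapped, the bytes of í are left alone.
bytewise = word.encode("utf-8").upper().decode("utf-8")

# Code-point-wise: every character is mapped.
runewise = word.upper()

print(bytewise, runewise)
```

Working on decoded characters rather than bytes is roughly what using
runes would buy here.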

My guess is that there are many more corners, little or not, where it
doesn't work.
We can go on and on looking for crevices and sweeping the bugs further
under the rug, so that they are not evident and catch everyone completely
unaware; leave awk as it is now; or really fix the problem. The first
approach doesn't work. I am going to take the second until I have time for
the third, which means using runes or at least revising all the code so
that it is uniformly aware of the existence of non-ASCII characters.
-- 
- curiosity sKilled the cat
