From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] simplicity
Date: Wed, 10 Oct 2007 00:02:38 -0400
Message-ID: <e53cd4b0faac1f3090800c8190d4958c@quanstro.net>
In-Reply-To: <6e35c0620710092030u1187029dhf54f67e48a62b85c@mail.gmail.com>
> Yes, old thread, sorry. Blame Uriel.
>
> On 9/18/07, Douglas A. Gwyn <DAGwyn@null.net> wrote:
> > erik quanstrom wrote:
> > > suppose Linux user a and user b grep the same "text" file for the same string.
> > > results will depend on the users' locales.
> >
> > But if they're trying to match an alphabetic character class, the
> > result *should* depend on the locale.
>
> This baffles me. Can anyone think of examples where one might want
> differing results depending on your locale?
>
> -Jack
i think i see what the reasoning is. the thought is that, e.g.,
in spanish [a-z] should match ñ.
the problem is this means that grep(regexp, data) now
returns a set of results, one for each locale.
so on the one hand, one would like [a-z] to do the Right Thing,
depending on language. and on the other hand, one wants
grep(regexp, data) to return a single result.
i think the way to see through this issue is to notice that
the reason we want ñ to be in [a-z] is because of visual
similarity. what if we were dealing with chinese? i think
it's pretty clear that [a-z] should map to a contiguous set
of unicode codepoints.
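
a rough sketch of that reading, in python purely for illustration (the helper in_range and the sample words are made up here, not taken from any real grep): [a-z] becomes a plain comparison on codepoint values, so the answer cannot vary with the user's locale.

```python
import re

def in_range(ch, lo="a", hi="z"):
    # compare by unicode codepoint only; no locale or collation table is consulted
    return ord(lo) <= ord(ch) <= ord(hi)

# hypothetical sample data
words = ["nino", "niño", "año"]

# python's re treats [a-z] on str patterns as the codepoint range U+0061..U+007A
pattern = re.compile(r"^[a-z]+$")
matches = [w for w in words if pattern.match(w)]
```

under this rule ñ (U+00F1) falls outside [a-z] for every user, so grep(regexp, data) has exactly one result.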
if you want to deal with ñ, the unicode tables do note that ñ
is n+combining ~, so one could come up with a new
denotation for base codepoint. unfortunately, combining
that with existing regexp would be a bit painful.
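
one way the base-codepoint idea might look, again as a hedged python sketch: it leans on the standard unicodedata module's canonical decomposition (NFD) rather than any new regexp denotation, and the names base_codepoints and match_base are hypothetical.

```python
import re
import unicodedata

def base_codepoints(s):
    # NFD splits precomposed characters: ñ (U+00F1) becomes n (U+006E) + U+0303
    decomposed = unicodedata.normalize("NFD", s)
    # drop combining marks (nonzero canonical combining class), keep base codepoints
    return "".join(c for c in decomposed if unicodedata.combining(c) == 0)

def match_base(pattern, s):
    # match an ordinary codepoint-range regexp against the base codepoints only
    return re.search(pattern, base_codepoints(s)) is not None
```

here match_base(r"^[a-z]+$", "niño") succeeds because the match sees "nino". this sidesteps the painful part by preprocessing the data instead of extending the regexp syntax, at the cost of losing the distinction between n and ñ entirely.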
- erik