9fans - fans of the OS Plan 9 from Bell Labs
From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] simplicity
Date: Wed, 10 Oct 2007 00:02:38 -0400
Message-ID: <e53cd4b0faac1f3090800c8190d4958c@quanstro.net>
In-Reply-To: <6e35c0620710092030u1187029dhf54f67e48a62b85c@mail.gmail.com>

> Yes, old thread, sorry.  Blame Uriel.
> 
> On 9/18/07, Douglas A. Gwyn <DAGwyn@null.net> wrote:
> > erik quanstrom wrote:
> > > suppose Linux user a and user b grep the same "text" file for the same string.
> > > results will depend on the users' locales.
> >
> > But if they're trying to match an alphabetic character class, the
> > result *should* depend on the locale.
> 
> This baffles me.  Can anyone think of examples where one might want
> differing results depending on your locale?
> 
> -Jack

i think i see what the reasoning is.  the thought is that, e.g.,
in spanish [a-z] should match ñ.  

the problem is this means that grep(regexp, data) now
returns a set of results, one for each locale.

so on the one hand, one would like [a-z] to do the Right Thing,
depending on language.  and on the other hand, one wants
grep(regexp, data) to return a single result.
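
to make the conflict concrete, here is a rough python sketch.  the
per-locale table CLASS_AZ and this grep() helper are made up for
illustration; they are not how any real grep gets its locale data,
which comes from the system's collation tables.

```python
import re

# hypothetical per-locale meaning of [a-z].  real tables come from the
# system's collation data and are not reproduced here.
CLASS_AZ = {
    "C": "[a-z]",
    "es_ES": "[a-z\u00f1]",   # \u00f1 is precomposed ñ
}

def grep(regexp, lines, locale):
    """return the lines matching regexp, with [a-z] expanded per locale."""
    pattern = re.compile(regexp.replace("[a-z]", CLASS_AZ[locale]))
    return [line for line in lines if pattern.search(line)]

data = ["ano", "a\u00f1o"]
for loc in CLASS_AZ:
    # same regexp, same data, one result per locale
    print(loc, grep("^[a-z]+$", data, loc))
```

under these assumptions the same regexp and data give one answer
in "C" and a different one in "es_ES", which is the set-of-results
problem.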

i think the way to see through this issue is to notice that
the reason we want ñ to be in [a-z] is visual similarity.
what if we were dealing with chinese?  i think it's pretty
clear that [a-z] should map to a contiguous set of unicode
codepoints.
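
as a small sketch of the codepoint reading, python's re module
already treats [a-z] on str patterns as the range U+0061..U+007A.
the helper name in_az is made up for this example.

```python
import re

# [a-z] read as the contiguous codepoint range U+0061..U+007A,
# with no reference to any locale.
AZ = re.compile(r"[a-z]")

def in_az(ch):
    """true if the single character ch falls inside [a-z]."""
    return AZ.fullmatch(ch) is not None

for ch in ["n", "\u00f1", "\u4e2d"]:   # n, precomposed ñ, a chinese character
    print(hex(ord(ch)), in_az(ch))
```

with this reading every user gets the same single answer for
the same regexp and data, and ñ (U+00F1) is simply outside the range.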

if you want to deal with ñ, the unicode tables do note that ñ
is n + combining ~ (U+0303), so one could come up with a new
denotation for base codepoint.  unfortunately, combining
that with existing regexp would be a bit painful.
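
a rough sketch of the base codepoint idea, using python's
unicodedata module.  the helpers base() and matches_base() are
hypothetical names for this example, not a proposed regexp
notation, and stripping marks before matching is only one
possible way to do it.

```python
import re
import unicodedata

def base(s):
    """hypothetical helper: decompose (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def matches_base(pattern, s):
    """match pattern against the base codepoints of s."""
    return re.fullmatch(pattern, base(s)) is not None

word = "se\u00f1or"                               # señor, precomposed ñ
print(unicodedata.decomposition("\u00f1"))        # the table entry for ñ
print(re.fullmatch("[a-z]+", word) is not None)   # plain codepoint match
print(matches_base("[a-z]+", word))               # match on base codepoints
```

some of the pain shows up even here: the match runs on a
transformed copy of the text, so match offsets no longer line up
with the original, and the pattern has no way to ask for base
matching on only part of the expression.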

- erik


