zsh-workers
* 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase
@ 1998-07-06 17:28 C. v. Stuckrad
  1998-07-06 18:14 ` Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: C. v. Stuckrad @ 1998-07-06 17:28 UTC (permalink / raw)
  To: Zsh workers list



Hi!

Is it 'really correct' that after setting 'LANG=de' or 'LC_COLLATE=de'
ranges of characters no longer differentiate between uppercase
and lowercase?  So 'rm [A-Z]*' will remove not only 'FOO' but 'bar' too!

Explicitly writing out all the letters works!
'rm [ABCDEFGHIJKLMNOPQRSTUVWXYZ]*' removes 'FOO' leaves 'bar' alone.
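
A minimal sketch of the workaround above, written as POSIX sh rather
than anything zsh-specific.  The function name starts_with_listed_upper
is made up for illustration.  Because every letter is listed explicitly,
no range has to be resolved through the locale's collation order:

```sh
#!/bin/sh
# Succeed if the first character of $1 is one of the 26 listed letters.
# An explicit list names its members directly, so collation plays no part.
starts_with_listed_upper() {
    case $1 in
        [ABCDEFGHIJKLMNOPQRSTUVWXYZ]*) return 0 ;;
        *) return 1 ;;
    esac
}

for name in FOO bar; do
    if starts_with_listed_upper "$name"; then
        echo "$name: matches"
    else
        echo "$name: does not match"
    fi
done
```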

Is this a bug?  Or a feature the manuals never warned me about?

Stucki

PS.: I'm using zsh-3.0.5 on Solaris 2.4 compiled by gcc-2.7.2
     (soon to be zsh 3.1.4 on Solaris 2.6) 

Christoph von Stuckrad       * *  | talk to  | <stucki@math.fu-berlin.de> \
Freie Universitaet Berlin    |/_* | nickname | ...!unido!fub!leibniz!stucki|
Fachbereich Mathematik, EDV  |\ * | 'stucki' | Tel:+49 30 838-75459        |
Arnimallee 2-6/14195 Berlin  * *  |  on IRC  | Fax:+49 30 838-75454       /



* Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase
  1998-07-06 17:28 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase C. v. Stuckrad
@ 1998-07-06 18:14 ` Bart Schaefer
  1998-07-07 15:30   ` C. v. Stuckrad
  1998-07-08  6:40   ` Zoltan Hidvegi
  0 siblings, 2 replies; 8+ messages in thread
From: Bart Schaefer @ 1998-07-06 18:14 UTC (permalink / raw)
  To: C. v. Stuckrad, Zsh workers list

On Jul 6,  7:28pm, C. v. Stuckrad wrote:
} Subject: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lower
}
} 
} Is it 'really correct' that after setting 'LANG=de' or 'LC_COLLATE=de'
} ranges of characters no longer differentiate between uppercase
} and lowercase?  So 'rm [A-Z]*' will remove not only 'FOO' but 'bar' too!

Ranges like [A-Z] are computed using strcoll() when it is available.  If
that collation function returns that "b" is greater than "A" and less
than "Z" then 'b' is considered to be in the range [A-Z].

It's entirely possible that setting LANG and/or LC_COLLATE to something
other than C or ASCII could cause sorting to become case-insensitive or
to mix the letters (e.g. AaBbCcDd...).  In the latter case, [A-Z] would
include 'a' through 'y' but not 'z', which is seriously confusing.
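
The effect described above can be modelled without calling strcoll() at
all.  The sketch below is an illustration only, not the matchonce() code:
it treats a collation sequence as a plain string and asks whether a
character sits between the two endpoints of a range.  The helper
in_range and the two sequence strings are assumptions made for the
example; the order used by a real locale may differ.

```sh
#!/bin/sh
# in_range SEQ LO HI CH
# Succeed if CH lies between LO and HI (inclusive) in the ordering SEQ.
# The body runs in a subshell, so its variables do not leak out.
in_range() (
    seq=$1 lo=$2 hi=$3 ch=$4
    before_lo=${seq%%"$lo"*}    # text preceding LO; its length is LO's rank
    before_hi=${seq%%"$hi"*}
    before_ch=${seq%%"$ch"*}    # whole string if CH is absent, so never in range
    [ "${#before_lo}" -le "${#before_ch}" ] &&
        [ "${#before_ch}" -le "${#before_hi}" ]
)

# Character-code order (as in the C locale) and a hypothetical mixed order.
ascii=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
mixed=AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz

in_range "$mixed" A Z b && echo "mixed: b is in [A-Z]"
in_range "$mixed" A Z z || echo "mixed: z is not in [A-Z]"
in_range "$ascii" A Z b || echo "ascii: b is not in [A-Z]"
```

With the mixed sequence, 'Z' is the second-to-last element, which is why
every lowercase letter except 'z' falls inside the range.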

} Is this a bug?  Or a feature the manuals never warned me about?

I'd have to list it as the latter, but it sure creeps awfully close to
being a bug, because it's totally unexpected if you actually know about
the numeric values of your character set.

I'd vote in favor of removing HAVE_STRCOLL from matchonce() in glob.c.


-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase
  1998-07-06 18:14 ` Bart Schaefer
@ 1998-07-07 15:30   ` C. v. Stuckrad
  1998-07-08  6:40   ` Zoltan Hidvegi
  1 sibling, 0 replies; 8+ messages in thread
From: C. v. Stuckrad @ 1998-07-07 15:30 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh workers list

On Mon, 6 Jul 1998, Bart Schaefer wrote:

> I'd vote in favor of removing HAVE_STRCOLL from matchonce() in glob.c.
I second that!

Then I'd get the A-Z before a-z sorting behaviour (right?),
while LC_COLLATE=de seems to generate 'normal German dictionary order'.

Really, slightly confusing (:-). My problem then will be the
(L)users on Solaris CDE who switch to German (LANG=de),
think they've understood globbing, then rm [A-Z]* ...

Stucki

Christoph von Stuckrad       * *  | talk to  | <stucki@math.fu-berlin.de> \
Freie Universitaet Berlin    |/_* | nickname | ...!unido!fub!leibniz!stucki|
Fachbereich Mathematik, EDV  |\ * | 'stucki' | Tel:+49 30 838-75459        |
Arnimallee 2-6/14195 Berlin  * *  |  on IRC  | Fax:+49 30 838-75454       /



* Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase
  1998-07-06 18:14 ` Bart Schaefer
  1998-07-07 15:30   ` C. v. Stuckrad
@ 1998-07-08  6:40   ` Zoltan Hidvegi
  1998-07-08 11:02     ` Bart Schaefer
  1 sibling, 1 reply; 8+ messages in thread
From: Zoltan Hidvegi @ 1998-07-08  6:40 UTC (permalink / raw)
  To: Zsh hacking and development

> I'd vote in favor of removing HAVE_STRCOLL from matchonce() in glob.c.

According to the standard:

   LC_COLLATE 
         This variable determines the behaviour of range expressions,
         equivalence classes and multi-character collating elements
         within pattern matching.

Of course the standard also requires POSIX character classes, so instead
of [a-z] you are supposed to use [[:lower:]], but unfortunately the latter
is not supported yet.
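
For a shell that does implement POSIX character classes, a hedged sketch
of the class-based form follows.  It is plain POSIX sh, and the function
name starts_with_upper is invented for the example.  Class membership
comes from the locale's character classification rather than from a
range, so it does not depend on where letters fall in the collation
order:

```sh
#!/bin/sh
# Succeed if the first character of $1 is classified as uppercase.
# [[:upper:]] is a character class, not a range with two endpoints.
starts_with_upper() {
    case $1 in
        [[:upper:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

for name in FOO bar; do
    if starts_with_upper "$name"; then
        echo "$name: uppercase first letter"
    else
        echo "$name: no uppercase first letter"
    fi
done
```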

Zoli



* Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase
  1998-07-08  6:40   ` Zoltan Hidvegi
@ 1998-07-08 11:02     ` Bart Schaefer
  1998-07-08 15:36       ` Zoltan Hidvegi
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Schaefer @ 1998-07-08 11:02 UTC (permalink / raw)
  To: Zoltan Hidvegi, Zsh hacking and development

On Jul 8,  1:40am, Zoltan Hidvegi wrote:
} Subject: Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including l
}
} > I'd vote in favor of removing HAVE_STRCOLL from matchonce() in glob.c.
} 
} According to the standard:
} 
}    LC_COLLATE 
}          This variable determines the behaviour of range expressions,
}          equivalence classes and multi-character collating elements
}          within pattern matching.

Which standard, specifically?

This is a case where that standard is harmfully flying in the face of
common sense.  Of course LC_COLLATE ought to apply to POSIX character
classes once those are supported, and ought to apply to the collation
of any resulting ordered list.  But to have it apply to the characters
within ranges like [A-Z] will cause the simplest shell scripts to go
completely haywire; we'll have to start putting "local LC_COLLATE=C"
or some such at the top of every autoloaded function, along with all
the "emulate -R" and "setopt localoptions" junk that's already there.
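
A rough sketch of the defensive style described above, as POSIX sh
rather than a zsh autoloaded function.  It uses LC_ALL=C instead of
LC_COLLATE=C, on the assumption that LC_ALL may already be set in the
caller's environment, where it would take precedence over LC_COLLATE.
The function name list_upper and the temporary directory are
illustrative only.

```sh
#!/bin/sh
# Print the names in directory $1 that begin with an ASCII uppercase letter.
# The body is a subshell, so the locale change and the cd stay local.
list_upper() (
    LC_ALL=C            # takes precedence over LC_COLLATE and LANG
    export LC_ALL
    cd "$1" || exit 1
    for f in [A-Z]*; do
        # In sh an unmatched pattern is left as literal text; skip it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)

dir=$(mktemp -d) || exit 1
touch "$dir/FOO" "$dir/bar"
list_upper "$dir"
rm -rf "$dir"
```

The cost is exactly the one complained about above: every function that
relies on [A-Z] has to carry this preamble.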

Standards are not meant to be followed blindly.  That's why the IETF,
for example, has a collection of rules for interpreting MUST, SHOULD,
MAY, etc. when they appear in IETF documents.

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase
  1998-07-08 11:02     ` Bart Schaefer
@ 1998-07-08 15:36       ` Zoltan Hidvegi
  1998-07-08 17:24         ` Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Zoltan Hidvegi @ 1998-07-08 15:36 UTC (permalink / raw)
  To: Zsh hacking and development

> } According to the standard:
> } 
> }    LC_COLLATE 
> }          This variable determines the behaviour of range expressions,
> }          equivalence classes and multi-character collating elements
> }          within pattern matching.
> 
> Which standard, specifically?

POSIX 1003.2 and the X/Open Single UNIX Specification, Version 2.  It says:

     (7)  A range expression represents the set of collating elements that
          fall between two elements in the current collation sequence,
          inclusively.  It shall be expressed as the starting point and
          the ending point separated by a hyphen (-).

          Range expressions shall not be used in Strictly Conforming
          POSIX.2 Applications because their behavior is dependent on the
          collating sequence.  Range expressions shall be supported by
          conforming implementations.

Zoli



* Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase
  1998-07-08 15:36       ` Zoltan Hidvegi
@ 1998-07-08 17:24         ` Bart Schaefer
  1998-07-08 19:35           ` Zoltan Hidvegi
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Schaefer @ 1998-07-08 17:24 UTC (permalink / raw)
  To: Zsh hacking and development

On Jul 8, 10:36am, Zoltan Hidvegi wrote:
} Subject: Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including l
}
} > Which standard, specifically?
} 
} POSIX 1003.2 and the X/Open Single UNIX Specification, Version 2.

Well, yes, but I meant which volume.  Commands and Utilities?

} It says:
} 
}           Range expressions shall not be used in Strictly Conforming
}           POSIX.2 Applications because their behavior is dependent on the
}           collating sequence.  Range expressions shall be supported by
}           conforming implementations.

I just love it when committees do stuff like that.  What does it mean for
a shell to "support" but "not use" range expressions?

I still think that in this case the cure is worse than the disease.

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including lowercase
  1998-07-08 17:24         ` Bart Schaefer
@ 1998-07-08 19:35           ` Zoltan Hidvegi
  0 siblings, 0 replies; 8+ messages in thread
From: Zoltan Hidvegi @ 1998-07-08 19:35 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

> On Jul 8, 10:36am, Zoltan Hidvegi wrote:
> } Subject: Re: 'LC_COLLATE=de ls [A-Z]*' expands to 'every file' including l
> }
> } > Which standard, specifically?
> } 
> } POSIX 1003.2 and the X/Open Single UNIX Specification, Version 2.
> 
> Well, yes, but I meant which volume.  Commands and Utilities?

Base Definitions -> Regular Expressions.  Commands and Utilities ->
Shell Command Language refers to regular expressions describing range
patterns.  Which means that you should not be surprised when sed and
grep start matching lower case letters with [A-Z].  And if you ever
port zsh to some EBCDIC machine, you'd probably want to use the
collate order.  But it looks like the GNU regexp library does not use
the collate order for ranges, and neither does bash.  In fact, I haven't
found anything that uses the collate order for ranges, so it is
probably OK to make zsh non-conforming here.
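
For the regular-expression utilities mentioned above, one common way to
keep [A-Z] tied to character codes, whatever a given implementation does
in other locales, is to run the tool in the C locale.  A small sketch,
with the function name upper_lines invented for the example:

```sh
#!/bin/sh
# Filter standard input down to lines that start with an ASCII uppercase
# letter.  LC_ALL=C applies to this one grep invocation only.
upper_lines() {
    LC_ALL=C grep '^[A-Z]'
}

printf 'FOO\nbar\n' | upper_lines
```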

Zoli


