* New D07 test failure
From: Vin Shelton @ 2008-02-27 22:48 UTC (permalink / raw)
To: zsh workers
Peter et al,
There's a new test failure in the latest CVS sources:
/opt/src/zsh-2008-02-27/Test/D07multibyte.ztst: starting.
Testing multibyte with locale en_US.UTF-8
*** /tmp/zsh.ztst.out.28335 Wed Feb 27 17:43:12 2008
--- /tmp/zsh.ztst.tout.28335 Wed Feb 27 17:43:12 2008
***************
*** 1,3 ****
--- 1,9 ----
Diff output should be empty
+ 3,4d2
+ < HÃH
+ < HÃH
+ 5a4,5
+ > HÃH
+ > HÃH
Sort in C locale
HAH HEH HUH HÈH HÉH
Test /opt/src/zsh-2008-02-27/Test/D07multibyte.ztst failed: output
differs from expected as shown above for:
print -loi HAH HUH HEH HÉH HÈH >zshsort.txt
print -l HAH HUH HEH HÉH HÈH | sort >sortsort.txt
print Diff output should be empty
diff zshsort.txt sortsort.txt
print Sort in C locale
(LC_ALL=C; print -oi HAH HUH HEH HÉH HÈH)
Was testing: Multibyte characters in print sorting
/opt/src/zsh-2008-02-27/Test/D07multibyte.ztst: test failed.
Let me know if more details are needed.
Regards,
Vin
* Re: New D07 test failure
From: Peter Stephenson @ 2008-02-28 9:51 UTC (permalink / raw)
To: zsh workers
"Vin Shelton" wrote:
> Peter et al,
>
> There's a new test failure in the latest CVS sources:
OK, so we can't rely on the collation sequence in UTF-8 being consistent
across implementations, and we can't rely on "sort" either. Any other
ideas before I simply remove the sort tests? (I can leave the one where
it sets LC_ALL=C, but that's not giving us much.)
--
Peter Stephenson <pws@csr.com> Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070
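For reference, the LC_ALL=C case is the one where the order is fixed by byte values alone, which is why it stays stable across systems. A minimal sketch, assuming a POSIX sh and sort(1); the helper name sort_bytes is made up for illustration and is not part of the zsh test suite:

```shell
# Hypothetical helper: print the arguments one per line in plain byte order,
# regardless of the locale the caller is running in.
sort_bytes() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# The same words the test uses (script assumed to be saved as UTF-8).
sort_bytes HAH HUH HEH HÉH HÈH
```

In UTF-8, È is the byte pair 0xC3 0x88 and É is 0xC3 0x89, so both sort after every ASCII letter and È comes first. The result says nothing about how en_US.UTF-8 collates the same words.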
* Re: New D07 test failure
From: Bart Schaefer @ 2008-03-03 6:30 UTC (permalink / raw)
To: zsh workers
On Feb 28, 9:51am, Peter Stephenson wrote:
}
} OK, so we can't rely on the collation sequence in UTF-8 being consistent
} across implementations, and we can't rely on "sort" either. Any other
} ideas before I simply remove the sort tests?
Maybe you just need to choose the inputs more carefully?
I don't know whether this is really the case here, but it's entirely
possible that E-with-grave and E-with-acute have equivalent collation
in some locales, and therefore we're running into sort algorithm
stability differences that have nothing to do with correctness.
The important thing would be to find some characters whose collation
order actually inverts (rather than becoming indistinguishable) in
some locales. Unfortunately I don't know how to go about that, either.
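One rough way to probe the "equivalent collation" theory is to ask sort(1) whether it treats two strings as duplicates. This is only a sketch: it assumes that `sort -u` keeps a single line from each set of lines that collate equal in the current locale, which is how POSIX describes `-u` but may not hold for every implementation. The function name collate_equal is invented for illustration.

```shell
# Hypothetical probe: succeed (exit 0) if sort(1) cannot tell the two
# strings apart in the current locale, fail otherwise.
collate_equal() {
    count=$(printf '%s\n' "$1" "$2" | sort -u | wc -l | tr -d ' ')
    [ "$count" -eq 1 ]
}

# Check the two accented test words in whatever locale is active.
if collate_equal 'HÉH' 'HÈH'; then
    echo "indistinguishable in this locale"
else
    echo "distinct in this locale"
fi
```

If the probe reports the pair as indistinguishable in some locale, the relative order of the two words is left to each sorting implementation, and a difference between zsh and sort(1) there would not indicate a bug in either.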
* Re: New D07 test failure
From: Peter Stephenson @ 2008-03-03 16:06 UTC (permalink / raw)
To: zsh workers
On Sun, 02 Mar 2008 22:30:56 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Feb 28, 9:51am, Peter Stephenson wrote:
> }
> } OK, so we can't rely on the collation sequence in UTF-8 being consistent
> } across implementations, and we can't rely on "sort" either. Any other
> } ideas before I simply remove the sort tests?
>
> Maybe you just need to choose the inputs more carefully?
Yes, maybe it's worth trying with a different well-ordered alphabet before
giving up. Note those really are Greek upper case letters even though some
may be rendered the same as Roman. (And people still say a classical
education is useless. Sheesh.)
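One way to confirm that a letter is the Greek one rather than its Roman lookalike is to dump its bytes. A small sketch, assuming the script is saved as UTF-8 and a POSIX od(1) is available; hex_bytes is an illustrative name only:

```shell
# Hypothetical helper: print the bytes of a string as lowercase hex,
# with no separators.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

hex_bytes 'Α'; echo    # Greek capital alpha, U+0391
hex_bytes 'A'; echo    # Latin capital A, U+0041
```

Greek capital alpha is the two bytes ce 91 in UTF-8, while Latin A is the single byte 41. The five letters used in the new test, Α through Ε, are U+0391 to U+0395 and encode as ce 91 to ce 95, so their byte order already matches alphabetical order.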
Ismail's later problem looked like it was to do with "0" sorting before
".". Perhaps not making assumptions about punctuation symbols also
helps... If "0" doesn't come before "t", I will lose interest in fixing
this.
Index: Test/D07multibyte.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D07multibyte.ztst,v
retrieving revision 1.23
diff -u -r1.23 D07multibyte.ztst
--- Test/D07multibyte.ztst 27 Feb 2008 20:03:23 -0000 1.23
+++ Test/D07multibyte.ztst 3 Mar 2008 16:03:28 -0000
@@ -322,16 +322,10 @@
# We ask for case-insensitive sorting here (and supply upper case
# characters) so that we exercise the logic in the shell that lowers the
# case of the string for case-insensitive sorting.
-# As all letters are upper case, however, sort should produce the same order.
- print -loi HAH HUH HEH HÉH HÈH >zshsort.txt
- print -l HAH HUH HEH HÉH HÈH | sort >sortsort.txt
- print Diff output should be empty
- diff zshsort.txt sortsort.txt
- print Sort in C locale
+ print -oi HΕH HΔH HΓH HΒH HΑH
(LC_ALL=C; print -oi HAH HUH HEH HÉH HÈH)
0:Multibyte characters in print sorting
->Diff output should be empty
->Sort in C locale
+>HΑH HΒH HΓH HΔH HΕH
>HAH HEH HUH HÈH HÉH
# These are control characters in Unicode, so don't show up.
@@ -366,24 +360,24 @@
>1 149
>1 150
- touch ngs1.txt ngs2.txt ngs10.txt ngs20.txt ngs100.txt ngs200.txt
+ touch ngs1txt ngs2txt ngs10txt ngs20txt ngs100txt ngs200txt
setopt numericglobsort
print -l ngs*
unsetopt numericglobsort
print -l ngs*
0:NUMERIC_GLOB_SORT option in UTF-8 locale
->ngs1.txt
->ngs2.txt
->ngs10.txt
->ngs20.txt
->ngs100.txt
->ngs200.txt
->ngs100.txt
->ngs10.txt
->ngs1.txt
->ngs200.txt
->ngs20.txt
->ngs2.txt
+>ngs1txt
+>ngs2txt
+>ngs10txt
+>ngs20txt
+>ngs100txt
+>ngs200txt
+>ngs100txt
+>ngs10txt
+>ngs1txt
+>ngs200txt
+>ngs20txt
+>ngs2txt
# Not strictly multibyte, but gives us a well-defined locale for testing.
foo=$'X\xc0Y\x07Z\x7fT'
--
Peter Stephenson <pws@csr.com> Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070
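The punctuation point can be illustrated by sorting both sets of file names in plain byte order. This is a sketch under the assumption of a POSIX sh and sort(1); byte_order is a made-up helper name, and the C locale is used only because its order is known in advance:

```shell
# Hypothetical helper: list the arguments one per line in C-locale byte order.
byte_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Old names: "." (0x2e) sorts before "0" (0x30) in byte order.
byte_order ngs1.txt ngs2.txt ngs10.txt ngs20.txt ngs100.txt ngs200.txt

# New names: the digits are followed by "t" (0x74), which sorts after "0".
byte_order ngs1txt ngs2txt ngs10txt ngs20txt ngs100txt ngs200txt
```

With the dotted names, byte order gives ngs1.txt, ngs10.txt, ngs100.txt, ngs2.txt, ngs20.txt, ngs200.txt, which is not the order the old test expected for the non-numeric glob. With the dot-free names, byte order gives ngs100txt, ngs10txt, ngs1txt, ngs200txt, ngs20txt, ngs2txt, matching the new expected output, so the test now depends only on digits sorting before "t".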
* Re: New D07 test failure
From: İsmail Dönmez @ 2008-03-03 16:54 UTC (permalink / raw)
To: Peter Stephenson; +Cc: zsh workers
Hi,
On Mon, Mar 3, 2008 at 6:06 PM, Peter Stephenson <pws@csr.com> wrote:
> On Sun, 02 Mar 2008 22:30:56 -0800
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > On Feb 28, 9:51am, Peter Stephenson wrote:
> > }
> > } OK, so we can't rely on the collation sequence in UTF-8 being consistent
> > } across implementations, and we can't rely on "sort" either. Any other
> > } ideas before I simply remove the sort tests?
> >
> > Maybe you just need to choose the inputs more carefully?
>
> Yes, maybe it's worth trying with a different well-ordered alphabet before
> giving up. Note those really are Greek upper case letters even though some
> may be rendered the same as Roman. (And people still say a classical
> education is useless. Sheesh.)
>
> Ismail's later problem looked like it was to do with "0" sorting before
> ".". Perhaps not making assumptions about punctuation symbols also
> helps... If "0" doesn't come before "t", I will lose interest in fixing
> this.
This fixed it for me, great thanks.
Regards,
ismail
--
UNIX is basically a simple operating system, but you have to be a
genius to understand the simplicity. - Dennis Ritchie