zsh-workers
* Regression in UTF-8 support
@ 2005-09-25 16:05 Andrey Borzenkov
  2005-09-25 21:56 ` Mikael Magnusson
  2005-09-26 18:37 ` Peter Stephenson
  0 siblings, 2 replies; 10+ messages in thread
From: Andrey Borzenkov @ 2005-09-25 16:05 UTC (permalink / raw)
  To: zsh-workers

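I did not really need Russian filenames until recently, and using them gave
quite unexpected results. The following is in UTF-8; please compare the file
listing with the completion listing (ignore the obvious formatting error).
For reference, the directories are "My videos", "My documents", "My photos",
"My music", "Friends' folders" and "Public folders", and "итого 0" means
"total 0":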

{pts/1}% ll
итого 0
drwxr-xr-x  1 root root 0 Сен 24 11:57 arvidjaar/
drwxr-xr-x  1 root root 0 Сен 24 11:57 Мои видеозаписи/
drwxr-xr-x  1 root root 0 Сен 24 11:57 Мои документы/
drwxr-xr-x  1 root root 0 Сен 24 11:57 Мои фотографии/
drwxr-xr-x  1 root root 0 Сен 24 11:57 Моя музыка/
drwxr-xr-x  1 root root 0 Сен 25 19:40 Папки друзей/
drwxr-xr-x  1 root root 0 Сен 25 19:40 Публичные папки/
{pts/1}% cd arvidjaar/
Completing local directory
arvidjaar/                         Папки\ друзей/
Мои\ видеозаписи/    Мои\ документу/
Мои\ уотограуии/    Моу\ музука/
Публиунуе\ папки/   

Here are the byte codes of some of the characters that get mixed up:

{pts/2}% echo фу | xxd
0000000: d184 d183 0a                             .....
{pts/2}% echo ф <= recalled from history with up-arrow!!!
ф
{pts/2}% echo уы | xxd
0000000: d183 d18b 0a                             .....
{pts/2}% echo  <= recalled from history with up-arrow!!!

So something mangles the characters (d184 -> d183, d18b -> d183, etc.);
moreover, parsing stops at this character (d183).

* Re: Regression in UTF-8 support
  2005-09-25 16:05 Regression in UTF-8 support Andrey Borzenkov
@ 2005-09-25 21:56 ` Mikael Magnusson
  2005-09-26 18:37 ` Peter Stephenson
  1 sibling, 0 replies; 10+ messages in thread
From: Mikael Magnusson @ 2005-09-25 21:56 UTC (permalink / raw)
  To: zsh-workers

On 9/25/05, Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> I did not really need Russian filenames until recently; with quite unexpected
> results. The following is in UTF; please compare file listing with completion
> listing (ignore obvious formatting error):
>
> {pts/1}% ll
> итого 0
> drwxr-xr-x  1 root root 0 Сен 24 11:57 arvidjaar/
> drwxr-xr-x  1 root root 0 Сен 24 11:57 Мои видеозаписи/
> drwxr-xr-x  1 root root 0 Сен 24 11:57 Мои документы/
> drwxr-xr-x  1 root root 0 Сен 24 11:57 Мои фотографии/
> drwxr-xr-x  1 root root 0 Сен 24 11:57 Моя музыка/
> drwxr-xr-x  1 root root 0 Сен 25 19:40 Папки друзей/
> drwxr-xr-x  1 root root 0 Сен 25 19:40 Публичные папки/
> {pts/1}% cd arvidjaar/
> Completing local directory
> arvidjaar/                         Папки\ друзей/
> Мои\ видеозаписи/    Мои\ документу/
> Мои\ уотограуии/    Моу\ музука/
> Публиунуе\ папки/
>
> Here are codes of some characters that are mixed:
>
> {pts/2}% echo фу | xxd
> 0000000: d184 d183 0a                             .....
> {pts/2}% echo ф <= result of up history!!!
> ф
> {pts/2}% echo уы | xxd
> 0000000: d183 d18b 0a                             .....
> {pts/2}% echo  <= result of up history!!!
>
> so something mangles characters (d184 -> d183, d18b -> d183 etc), moreover,
> parsing stops at this character (d183)

I think I brought this up in my thread about UTF-8 a while ago, but
maybe listing several issues in one mail wasn't a good idea. I just
wanted to say that it is reproducible here too, at least the history
truncation part.

--
Mikael Magnusson

* Re: Regression in UTF-8 support
  2005-09-25 16:05 Regression in UTF-8 support Andrey Borzenkov
  2005-09-25 21:56 ` Mikael Magnusson
@ 2005-09-26 18:37 ` Peter Stephenson
  2005-09-26 18:53   ` Andrey Borzenkov
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2005-09-26 18:37 UTC (permalink / raw)
  To: zsh-workers

Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> I did not really need Russian filenames until recently; with quite
> unexpected results. The following is in UTF; please compare file listing
> with completion listing (ignore obvious formatting error):
>...
> so something mangles characters (d184 -> d183, d18b -> d183 etc), moreover, 
> parsing stops at this character (d183)

I think this improves matters, but whether it's the whole thing I don't
know.  It's a simple interface issue.  I'm now less convinced I should have
let stringaszleline() operate in place.

Index: Src/Zle/zle_hist.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_hist.c,v
retrieving revision 1.27
diff -u -r1.27 zle_hist.c
--- Src/Zle/zle_hist.c	15 Aug 2005 10:01:50 -0000	1.27
+++ Src/Zle/zle_hist.c	26 Sep 2005 18:34:59 -0000
@@ -75,6 +75,8 @@
 static void
 zletext(Histent ent, struct zle_text *zt)
 {
+    char *duptext;
+
     if (ent->zle_text) {
 	zt->text = ent->zle_text;
 	zt->len = ent->zle_len;
@@ -82,8 +84,10 @@
 	return;
     }
 
-    zt->text = stringaszleline((unsigned char *)ent->text, 0,
+    duptext = ztrdup(ent->text);
+    zt->text = stringaszleline((unsigned char *)duptext, 0,
 			       &zt->len, NULL, NULL);
+    zsfree(duptext);
     zt->alloced = 1;
 }
 


-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

**********************************************************************

* Re: Regression in UTF-8 support
  2005-09-26 18:37 ` Peter Stephenson
@ 2005-09-26 18:53   ` Andrey Borzenkov
  2005-09-27 14:22     ` Peter Stephenson
  0 siblings, 1 reply; 10+ messages in thread
From: Andrey Borzenkov @ 2005-09-26 18:53 UTC (permalink / raw)
  To: zsh-workers

On Monday 26 September 2005 22:37, Peter Stephenson wrote:
> Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> > I did not really need Russian filenames until recently; with quite
> > unexpected results. The following is in UTF; please compare file listing
> > with completion listing (ignore obvious formatting error):
> >...
> > so something mangles characters (d184 -> d183, d18b -> d183 etc),
> > moreover, parsing stops at this character (d183)
>
> I think this improves matters, but whether it's the whole thing I don't
> know.  It's a simple interface issue.  I'm now less convinced I should have
> let stringaszleline() operate in place.
>

This fixed the history truncation, but not the strange mangling in the
completion listing. I'll look into it a bit more tomorrow.

* Re: Regression in UTF-8 support
  2005-09-26 18:53   ` Andrey Borzenkov
@ 2005-09-27 14:22     ` Peter Stephenson
  2005-09-27 17:00       ` Mikael Magnusson
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2005-09-27 14:22 UTC (permalink / raw)
  To: zsh-workers

Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> this fixed history truncation but not strange mangling in completion
> listing.

There were some bits I missed or got wrong when updating nicechar().

Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.94
diff -u -r1.94 utils.c
--- Src/utils.c	20 Sep 2005 16:33:01 -0000	1.94
+++ Src/utils.c	27 Sep 2005 14:19:45 -0000
@@ -260,7 +260,7 @@
      * This can't happen if the character is printed "nicely", so
      * this results in a maximum of two bytes total (plus the null).
      */
-    if (itok(c)) {
+    if (imeta(c)) {
 	*s++ = Meta;
 	*s++ = c ^ 32;
     } else
Index: Src/Zle/complist.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/complist.c,v
retrieving revision 1.71
diff -u -r1.71 complist.c
--- Src/Zle/complist.c	10 Aug 2005 13:21:16 -0000	1.71
+++ Src/Zle/complist.c	27 Sep 2005 14:19:45 -0000
@@ -570,11 +570,12 @@
 	    cc = *s++ ^ 32;
 
 	for (t = nicechar(cc); *t; t++) {
+	    int nc = (*t == Meta) ? STOUC(*++t ^ 32) : STOUC(*t);
 	    if (ml == mlend - 1 && col == columns - 1) {
 		mlprinted = ml - oml;
 		return 0;
 	    }
-	    putc(*t, shout);
+	    putc(nc, shout);
 	    if (++col == columns) {
 		ml++;
 		if (mscroll && !--mrestlines && (ask = asklistscroll(ml))) {
@@ -978,11 +979,12 @@
 	    c = *s++ ^ 32;
 
 	for (t = nicechar(c); *t; t++) {
+	    int nc = (*t == Meta) ? STOUC(*++t ^ 32) : STOUC(*t);
 	    if (ml == mlend - 1 && col == columns - 1) {
 		mlprinted = ml - oml;
 		return 0;
 	    }
-	    putc(*t, shout);
+	    putc(nc, shout);
 	    if (++col == columns) {
 		ml++;
 		if (mscroll && !--mrestlines && (ask = asklistscroll(ml))) {


* Re: Regression in UTF-8 support
  2005-09-27 14:22     ` Peter Stephenson
@ 2005-09-27 17:00       ` Mikael Magnusson
  2005-09-28  3:04         ` Andrey Borzenkov
  0 siblings, 1 reply; 10+ messages in thread
From: Mikael Magnusson @ 2005-09-27 17:00 UTC (permalink / raw)
  To: zsh-workers

On 9/27/05, Peter Stephenson <pws@csr.com> wrote:
> Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> > this fixed history truncation but not strange mangling in completion
> > listing.
>
> There were some bits I missed or got wrong when updating nicechar().

This seems to fix most things here. However, when I look at the history
file, some UTF-8 characters aren't saved correctly, although they come out
correctly when up-arrowing in zsh. Manually entering the correct UTF-8 bytes
in the history file doesn't seem to confuse zsh, but pressing Enter saves
the "malformed" entry again. In my case the character is し (hiragana shi),
0xE38197. It is saved in the history as ぃ・, 0xE38183C2B7.

--
Mikael Magnusson

* Re: Regression in UTF-8 support
  2005-09-27 17:00       ` Mikael Magnusson
@ 2005-09-28  3:04         ` Andrey Borzenkov
  2005-09-28 10:15           ` Peter Stephenson
  0 siblings, 1 reply; 10+ messages in thread
From: Andrey Borzenkov @ 2005-09-28  3:04 UTC (permalink / raw)
  To: zsh-workers

On Tuesday 27 September 2005 21:00, Mikael Magnusson wrote:
> On 9/27/05, Peter Stephenson <pws@csr.com> wrote:
> > Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> > > this fixed history truncation but not strange mangling in completion
> > > listing.
> >
> > There were some bits I missed or got wrong when updating nicechar().
>
> This seems to fix most things here, 

Yes, the completion listing for sure (apart from the width calculation :)

> but when i look at the history 
> file, some utf characters aren't saved correctly, but they become
> correct when up-arrowing in zsh. Manually entering the same utf-8 code
> in the history file seems to not confuse zsh though, but pressing
> enter saves the "malformed" entry again. In my case the utf is し,
> hiragana shi, 0xE38197. It is saved in history as ぃ・, 0xE38183C2B7.
>

Zsh saves it metafied. I agree that the external representation should be
unmetafied. On the other hand, this is unlikely to depend on UTF-8 support;
it is just that those byte values are usually unused in 8-bit character
sets, so probably nobody has noticed this before.

-andrey 

PS I am quite impressed; finally there is a genuine use for UTF-8 encoding
in email. Goodbye, legacy terminals?

* Re: Regression in UTF-8 support
  2005-09-28  3:04         ` Andrey Borzenkov
@ 2005-09-28 10:15           ` Peter Stephenson
  2005-09-28 10:22             ` Peter Stephenson
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2005-09-28 10:15 UTC (permalink / raw)
  To: Zsh hackers list

Andrey Borzenkov wrote:
> Zsh saves it metafied. I agree, external representation should be unmetafied;
> OTOH this is unlikely to depend on UTF-8 support, it is just that those
> characters are usually unused in 8-bit character sets so nobody has probably
> noticed this before

Metafication at this stage is mostly just to preserve a NULL.  Its
other use, protecting tokens, doesn't really apply in strings in the
command line editor (although it's needed while the string is being processed
by the main shell during completion).  However, a NULL won't occur in
the character sets we're using except as an ASCII NULL (since the
character set must be an extension of ASCII, contrary to the test I
added to the prompt code), so this isn't really a multibyte issue.
On the other hand, you *can* add a literal NULL to a command line if
you want.

pws

* Re: Regression in UTF-8 support
  2005-09-28 10:15           ` Peter Stephenson
@ 2005-09-28 10:22             ` Peter Stephenson
  2005-09-28 14:45               ` Bart Schaefer
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2005-09-28 10:22 UTC (permalink / raw)
  To: Zsh hackers list

Peter Stephenson wrote:
> Metafication at this stage is mostly just to preserve a NULL.
>...
> On the other hand, you *can* add a literal NULL to a command line if
> you want.

On the gripping hand, it's not clear we even need to quote a null in
a history file, since it's not a set of null-terminated strings.
The only special character is newline which is already treated.

pws

* Re: Regression in UTF-8 support
  2005-09-28 10:22             ` Peter Stephenson
@ 2005-09-28 14:45               ` Bart Schaefer
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 2005-09-28 14:45 UTC (permalink / raw)
  To: Zsh hackers list

On Sep 28, 11:22am, Peter Stephenson wrote:
}
} On the gripping hand, it's not clear we even need to quote a null in
} a history file, since it's not a set of null-terminated strings.

History files have traditionally had problems with NFS-mounted home
directories when zsh instances on multiple machines are sharing the
files.  A common NFS problem, at least in years past, has been to
dump a bunch of NULs into files when e.g. two processes disagree on
the ftruncate() length.  Presently (IIRC) the zsh history mechanism
discards these unmetafied NULs, which masks a lot of potential idiocy.

Also, zsh history files have long been designed such that they are
compatible with other shells as long as you don't turn on assorted
extended features.  Metafication probably breaks a little of this
already, but unmetafied NULs would abolish it entirely.

However, my biggest objection would be that changing the history file
format would mean that previous versions of zsh would not be able to
share the files with newer versions.

Thread overview: 10+ messages
2005-09-25 16:05 Regression in UTF-8 support Andrey Borzenkov
2005-09-25 21:56 ` Mikael Magnusson
2005-09-26 18:37 ` Peter Stephenson
2005-09-26 18:53   ` Andrey Borzenkov
2005-09-27 14:22     ` Peter Stephenson
2005-09-27 17:00       ` Mikael Magnusson
2005-09-28  3:04         ` Andrey Borzenkov
2005-09-28 10:15           ` Peter Stephenson
2005-09-28 10:22             ` Peter Stephenson
2005-09-28 14:45               ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/