* Regression in UTF-8 support
@ 2005-09-25 16:05 Andrey Borzenkov
  2005-09-25 21:56 ` Mikael Magnusson
  2005-09-26 18:37 ` Peter Stephenson
  0 siblings, 2 replies; 10+ messages in thread
From: Andrey Borzenkov @ 2005-09-25 16:05 UTC
To: zsh-workers

I did not really need Russian filenames until recently; with quite unexpected
results. The following is in UTF; please compare file listing with completion
listing (ignore obvious formatting error):

{pts/1}% ll
итого 0
drwxr-xr-x 1 root root 0 Сен 24 11:57 arvidjaar/
drwxr-xr-x 1 root root 0 Сен 24 11:57 Мои видеозаписи/
drwxr-xr-x 1 root root 0 Сен 24 11:57 Мои документы/
drwxr-xr-x 1 root root 0 Сен 24 11:57 Мои фотографии/
drwxr-xr-x 1 root root 0 Сен 24 11:57 Моя музыка/
drwxr-xr-x 1 root root 0 Сен 25 19:40 Папки друзей/
drwxr-xr-x 1 root root 0 Сен 25 19:40 Публичные папки/
{pts/1}% cd arvidjaar/
Completing local directory
arvidjaar/ Папки\ друзей/
Мои\ видеозаписи/ Мои\ документу/
Мои\ уотограуии/ Моу\ музука/
Публиунуе\ папки/

Here are codes of some characters that are mixed:

{pts/2}% echo фу | xxd
0000000: d184 d183 0a .....
{pts/2}% echo ф <= result of up history!!!
ф
{pts/2}% echo уы | xxd
0000000: d183 d18b 0a .....
{pts/2}% echo <= result of up history!!!

so something mangles characters (d184 -> d183, d18b -> d183 etc), moreover,
parsing stops at this character (d183)
* Re: Regression in UTF-8 support
  2005-09-25 16:05 Regression in UTF-8 support Andrey Borzenkov
@ 2005-09-25 21:56 ` Mikael Magnusson
  2005-09-26 18:37 ` Peter Stephenson
  1 sibling, 0 replies; 10+ messages in thread
From: Mikael Magnusson @ 2005-09-25 21:56 UTC
To: zsh-workers

On 9/25/05, Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> I did not really need Russian filenames until recently; with quite unexpected
> results. The following is in UTF; please compare file listing with completion
> listing (ignore obvious formatting error):
>
> {pts/1}% ll
> итого 0
> drwxr-xr-x 1 root root 0 Сен 24 11:57 arvidjaar/
> drwxr-xr-x 1 root root 0 Сен 24 11:57 Мои видеозаписи/
> drwxr-xr-x 1 root root 0 Сен 24 11:57 Мои документы/
> drwxr-xr-x 1 root root 0 Сен 24 11:57 Мои фотографии/
> drwxr-xr-x 1 root root 0 Сен 24 11:57 Моя музыка/
> drwxr-xr-x 1 root root 0 Сен 25 19:40 Папки друзей/
> drwxr-xr-x 1 root root 0 Сен 25 19:40 Публичные папки/
> {pts/1}% cd arvidjaar/
> Completing local directory
> arvidjaar/ Папки\ друзей/
> Мои\ видеозаписи/ Мои\ документу/
> Мои\ уотограуии/ Моу\ музука/
> Публиунуе\ папки/
>
> Here are codes of some characters that are mixed:
>
> {pts/2}% echo фу | xxd
> 0000000: d184 d183 0a .....
> {pts/2}% echo ф <= result of up history!!!
> ф
> {pts/2}% echo уы | xxd
> 0000000: d183 d18b 0a .....
> {pts/2}% echo <= result of up history!!!
>
> so something mangles characters (d184 -> d183, d18b -> d183 etc), moreover,
> parsing stops at this character (d183)

I think i brought this up in my thread about utf a while ago, but
maybe listing several issues in one mail wasn't really a good idea.
Just wanted to say it is reproducible here too, at least the history
truncating part.

--
Mikael Magnusson
* Re: Regression in UTF-8 support
  2005-09-25 16:05 Regression in UTF-8 support Andrey Borzenkov
  2005-09-25 21:56 ` Mikael Magnusson
@ 2005-09-26 18:37 ` Peter Stephenson
  2005-09-26 18:53   ` Andrey Borzenkov
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2005-09-26 18:37 UTC
To: zsh-workers

Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> I did not really need Russian filenames until recently; with quite
> unexpected results. The following is in UTF; please compare file listing
> with completion listing (ignore obvious formatting error):
>...
> so something mangles characters (d184 -> d183, d18b -> d183 etc), moreover,
> parsing stops at this character (d183)

I think this improves matters, but whether it's the whole thing I don't
know. It's a simple interface issue. I'm now less convinced I should have
let stringaszleline() operate in place.

Index: Src/Zle/zle_hist.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_hist.c,v
retrieving revision 1.27
diff -u -r1.27 zle_hist.c
--- Src/Zle/zle_hist.c 15 Aug 2005 10:01:50 -0000 1.27
+++ Src/Zle/zle_hist.c 26 Sep 2005 18:34:59 -0000
@@ -75,6 +75,8 @@
 static void
 zletext(Histent ent, struct zle_text *zt)
 {
+    char *duptext;
+
     if (ent->zle_text) {
         zt->text = ent->zle_text;
         zt->len = ent->zle_len;
@@ -82,8 +84,10 @@
         return;
     }

-    zt->text = stringaszleline((unsigned char *)ent->text, 0,
+    duptext = ztrdup(ent->text);
+    zt->text = stringaszleline((unsigned char *)duptext, 0,
                                &zt->len, NULL, NULL);
+    zsfree(duptext);
     zt->alloced = 1;
 }

--
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.
**********************************************************************
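The patch above copies the history text before handing it to a routine
that works in place. A minimal sketch of that pattern follows. It is not
the zsh code: decode_in_place() is a hypothetical stand-in for an
in-place converter, decode_copy() is a hypothetical caller, and the
value 0x83 for the Meta marker byte is an assumption.

```c
#include <stdlib.h>
#include <string.h>

#define META 0x83  /* assumed value of the Meta marker byte */

/* Hypothetical stand-in for a converter that decodes its argument in
 * place: each (META, b) pair collapses to the single byte b ^ 32.
 * Returns the decoded length; the buffer stays NUL-terminated. */
static size_t decode_in_place(unsigned char *s)
{
    unsigned char *r = s, *w = s;

    while (*r) {
        if (*r == META && r[1]) {
            *w++ = r[1] ^ 32;
            r += 2;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return (size_t)(w - s);
}

/* Safe caller: decode a private copy so the stored entry is left
 * untouched. The caller frees the result. Returns NULL if the
 * allocation fails. */
static unsigned char *decode_copy(const unsigned char *stored, size_t *len)
{
    size_t n = strlen((const char *)stored);
    unsigned char *dup = malloc(n + 1);

    if (!dup)
        return NULL;
    memcpy(dup, stored, n + 1);
    *len = decode_in_place(dup);
    return dup;
}
```

Calling decode_in_place() directly on the stored text would shorten the
stored entry itself, so a second recall of the same entry would decode
already-decoded bytes. Decoding a copy avoids that.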
* Re: Regression in UTF-8 support
  2005-09-26 18:37 ` Peter Stephenson
@ 2005-09-26 18:53   ` Andrey Borzenkov
  2005-09-27 14:22     ` Peter Stephenson
  0 siblings, 1 reply; 10+ messages in thread
From: Andrey Borzenkov @ 2005-09-26 18:53 UTC
To: zsh-workers

On Monday 26 September 2005 22:37, Peter Stephenson wrote:
> Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> > I did not really need Russian filenames until recently; with quite
> > unexpected results. The following is in UTF; please compare file listing
> > with completion listing (ignore obvious formatting error):
> >...
> > so something mangles characters (d184 -> d183, d18b -> d183 etc),
> > moreover, parsing stops at this character (d183)
>
> I think this improves matters, but whether it's the whole thing I don't
> know. It's a simple interface issue. I'm now less convinced I should have
> let stringaszleline() operate in place.
>

this fixed history truncation but not strange mangling in completion listing.
I'll try a bit more tomorrow.
* Re: Regression in UTF-8 support
  2005-09-26 18:53   ` Andrey Borzenkov
@ 2005-09-27 14:22     ` Peter Stephenson
  2005-09-27 17:00       ` Mikael Magnusson
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2005-09-27 14:22 UTC
To: zsh-workers

Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> this fixed history truncation but not strange mangling in completion
> listing.

There were some bits I missed or got wrong when updating nicechar().

Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.94
diff -u -r1.94 utils.c
--- Src/utils.c 20 Sep 2005 16:33:01 -0000 1.94
+++ Src/utils.c 27 Sep 2005 14:19:45 -0000
@@ -260,7 +260,7 @@
      * This can't happen if the character is printed "nicely", so
      * this results in a maximum of two bytes total (plus the null).
      */
-    if (itok(c)) {
+    if (imeta(c)) {
         *s++ = Meta;
         *s++ = c ^ 32;
     } else
Index: Src/Zle/complist.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/complist.c,v
retrieving revision 1.71
diff -u -r1.71 complist.c
--- Src/Zle/complist.c 10 Aug 2005 13:21:16 -0000 1.71
+++ Src/Zle/complist.c 27 Sep 2005 14:19:45 -0000
@@ -570,11 +570,12 @@
             cc = *s++ ^ 32;

         for (t = nicechar(cc); *t; t++) {
+            int nc = (*t == Meta) ? STOUC(*++t ^ 32) : STOUC(*t);
             if (ml == mlend - 1 && col == columns - 1) {
                 mlprinted = ml - oml;
                 return 0;
             }
-            putc(*t, shout);
+            putc(nc, shout);
             if (++col == columns) {
                 ml++;
                 if (mscroll && !--mrestlines && (ask = asklistscroll(ml))) {
@@ -978,11 +979,12 @@
             c = *s++ ^ 32;

         for (t = nicechar(c); *t; t++) {
+            int nc = (*t == Meta) ? STOUC(*++t ^ 32) : STOUC(*t);
             if (ml == mlend - 1 && col == columns - 1) {
                 mlprinted = ml - oml;
                 return 0;
             }
-            putc(*t, shout);
+            putc(nc, shout);
             if (++col == columns) {
                 ml++;
                 if (mscroll && !--mrestlines && (ask = asklistscroll(ml))) {

--
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
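The complist.c hunks above decode a Meta pair back into one raw byte
just before it is written to the terminal. The same step can be
sketched in isolation. The helper names next_unmeta() and put_unmeta()
are hypothetical and the Meta value 0x83 is an assumption; this is an
illustration, not the code in complist.c.

```c
#include <stdio.h>

#define META 0x83  /* assumed value of the Meta marker byte */

/* Return the next logical byte of a metafied string and advance *tp
 * past it: two bytes for a (META, b) pair, otherwise one byte. */
static int next_unmeta(const char **tp)
{
    const unsigned char *t = (const unsigned char *)*tp;
    int nc;

    if (*t == META && t[1]) {
        nc = t[1] ^ 32;     /* undo the XOR applied when encoding */
        t += 2;
    } else {
        nc = *t++;
    }
    *tp = (const char *)t;
    return nc;
}

/* Write a metafied, NUL-terminated string to `out` as raw bytes.
 * Returns the number of bytes written. */
static size_t put_unmeta(const char *t, FILE *out)
{
    size_t n = 0;

    while (*t) {
        putc(next_unmeta(&t), out);
        n++;
    }
    return n;
}
```

Writing the encoded bytes unchanged, as the old putc(*t, shout) did,
would send the marker byte and the XORed byte to the terminal, which
would fit the swapped letters in the original report.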
* Re: Regression in UTF-8 support
  2005-09-27 14:22     ` Peter Stephenson
@ 2005-09-27 17:00       ` Mikael Magnusson
  2005-09-28  3:04         ` Andrey Borzenkov
  0 siblings, 1 reply; 10+ messages in thread
From: Mikael Magnusson @ 2005-09-27 17:00 UTC
To: zsh-workers

On 9/27/05, Peter Stephenson <pws@csr.com> wrote:
> Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> > this fixed history truncation but not strange mangling in completion
> > listing.
>
> There were some bits I missed or got wrong when updating nicechar().

This seems to fix most things here, but when i look at the history
file, some utf characters aren't saved correctly, but they become
correct when up-arrowing in zsh. Manually entering the same utf-8 code
in the history file seems to not confuse zsh though, but pressing
enter saves the "malformed" entry again. In my case the utf is し,
hiragana shi, 0xE38197. It is saved in history as ぃ・, 0xE38183C2B7.

--
Mikael Magnusson
* Re: Regression in UTF-8 support
  2005-09-27 17:00       ` Mikael Magnusson
@ 2005-09-28  3:04         ` Andrey Borzenkov
  2005-09-28 10:15           ` Peter Stephenson
  0 siblings, 1 reply; 10+ messages in thread
From: Andrey Borzenkov @ 2005-09-28 3:04 UTC
To: zsh-workers

On Tuesday 27 September 2005 21:00, Mikael Magnusson wrote:
> On 9/27/05, Peter Stephenson <pws@csr.com> wrote:
> > Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> > > this fixed history truncation but not strange mangling in completion
> > > listing.
> >
> > There were some bits I missed or got wrong when updating nicechar().
>
> This seems to fix most things here,

yes, completion listing for sure (sans width calculation :)

> but when i look at the history
> file, some utf characters aren't saved correctly, but they become
> correct when up-arrowing in zsh. Manually entering the same utf-8 code
> in the history file seems to not confuse zsh though, but pressing
> enter saves the "malformed" entry again. In my case the utf is し,
> hiragana shi, 0xE38197. It is saved in history as ぃ・, 0xE38183C2B7.
>

Zsh saves it metafied. I agree, external representation should be unmetafied;
OTOH this is unlikely to depend on UTF-8 support, it is just that those
characters are usually unused in 8-bit character sets so nobody has probably
noticed this before

-andrey

PS I am pretty much impressed; finally there is valid usage for UTF-8 encoding
in E-Mail. Good bye legacy terminals?
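A small sketch of what "saved metafied" means for the bytes discussed
above. The UTF-8 encoding of し (U+3057) is E3 81 97. The set of bytes
that get escaped is an assumption here (NUL plus the range 0x83..0xA2);
the real set is defined by the imeta() test in the zsh sources, and
needs_meta() and metafy_bytes() are hypothetical names.

```c
#include <stddef.h>

#define META 0x83  /* assumed value of the Meta marker byte */

/* Assumed predicate: NUL and the bytes 0x83..0xA2 need escaping. */
static int needs_meta(unsigned char c)
{
    return c == 0 || (c >= 0x83 && c <= 0xA2);
}

/* Encode `len` raw bytes from `in` into `out`, which must have room
 * for 2 * len bytes. Each byte that needs escaping becomes the pair
 * (META, byte ^ 32). Returns the encoded length. */
static size_t metafy_bytes(const unsigned char *in, size_t len,
                           unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (needs_meta(in[i])) {
            out[n++] = META;
            out[n++] = in[i] ^ 32;
        } else {
            out[n++] = in[i];
        }
    }
    return n;
}
```

Under these assumptions E3 81 97 encodes to E3 81 83 B7. A UTF-8 viewer
would read E3 81 83 as ぃ and treat the lone B7 as a stray byte, which
fits the history file contents described in the message.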
* Re: Regression in UTF-8 support
  2005-09-28  3:04         ` Andrey Borzenkov
@ 2005-09-28 10:15           ` Peter Stephenson
  2005-09-28 10:22             ` Peter Stephenson
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2005-09-28 10:15 UTC
To: Zsh hackers list

Andrey Borzenkov wrote:
> Zsh saves it metafied. I agree, external representation should be unmetafied;
> OTOH this is unlikely to depend on UTF-8 support, it is just that those
> characters are usually unused in 8-bit character sets so nobody has probably
> noticed this before

Metafication at this stage is mostly just to preserve a NULL. Its other
use, protecting tokens, doesn't really apply in strings in the command
line editor (although it's needed while the string is being processed by
the main shell during completion). However, a NULL won't occur in the
character sets we're using except as an ASCII NULL (since the character
set must be an extension of ASCII, contrary to the test I added to the
prompt code), so this isn't really a multibyte issue.

On the other hand, you *can* add a literal NULL to a command line if
you want.

pws
* Re: Regression in UTF-8 support
  2005-09-28 10:15           ` Peter Stephenson
@ 2005-09-28 10:22             ` Peter Stephenson
  2005-09-28 14:45               ` Bart Schaefer
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2005-09-28 10:22 UTC
To: Zsh hackers list

Peter Stephenson wrote:
> Metafication at this stage is mostly just to preserve a NULL.
>...
> On the other hand, you *can* add a literal NULL to a command line if
> you want.

On the gripping hand, it's not clear we even need to quote a null in
a history file, since it's not a set of null-terminated strings. The
only special character is newline which is already treated.

pws
* Re: Regression in UTF-8 support
  2005-09-28 10:22             ` Peter Stephenson
@ 2005-09-28 14:45               ` Bart Schaefer
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 2005-09-28 14:45 UTC
To: Zsh hackers list

On Sep 28, 11:22am, Peter Stephenson wrote:
}
} On the gripping hand, it's not clear we even need to quote a null in
} a history file, since it's not a set of null-terminated strings.

History files have traditionally had problems with NFS-mounted home
directories when zsh instances on multiple machines are sharing the
files. A common NFS problem, at least in years past, has been to dump
a bunch of NULs into files when e.g. two processes disagree on the
ftruncate() length. Presently (IIRC) the zsh history mechanism discards
these unmetafied NULs, which masks a lot of potential idiocy.

Also, zsh history files have long been designed such that they are
compatible with other shells as long as you don't turn on assorted
extended features. Metafication probably breaks a little of this
already, but unmetafied NULs would abolish it entirely.

However, my biggest objection would be that changing the history file
format would mean that previous versions of zsh would not be able to
share the files with newer versions.
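The idea of discarding stray NULs when reading a history file, as
described in the message above, can be sketched as a filter over a
buffer that has just been read. This only illustrates the idea under
the assumption that raw NUL bytes never carry data (an intentional NUL
would be stored in escaped form); drop_nuls() is a hypothetical helper
and not the zsh history reader.

```c
#include <stddef.h>

/* Remove every NUL byte from buf[0..len) in place, keeping the other
 * bytes in their original order. Returns the new length. */
static size_t drop_nuls(char *buf, size_t len)
{
    size_t r, w = 0;

    for (r = 0; r < len; r++) {
        if (buf[r] != '\0')
            buf[w++] = buf[r];
    }
    return w;
}
```

If raw NULs were allowed to carry data in the file, a reader could no
longer tell an intentional NUL from a run of NULs left behind by a
truncated write.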