zsh-workers
* various weirdnesses with unicode support
@ 2005-09-07 20:30 Mikael Magnusson
  2005-09-08  8:55 ` David Gómez
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Mikael Magnusson @ 2005-09-07 20:30 UTC (permalink / raw)
  To: zsh-workers

I've been using zsh in sv_SE.utf8 locale now for a few weeks, and it
is mostly working fine. Some things are a bit weird though.
* Pressing alt-t for transpose word doesn't work, and inserts lots of
NULs in the command line, shown as ^@.
* NULs aren't saved in the history, and when up-arrowing to a command
with a NUL in it, everything after it is cut off. (This one might not
have anything to do with unicode; I am too lazy to turn it off to try,
and it's probably some sort of bug anyway :)
* Sometimes, strings with multibyte characters are only partially
saved in history too, like katakana PU. Maybe they contain a NUL, I
don't know.
* Having zsh in utf-8 locale but the terminal inputting for example
ISO-8859-1 makes zsh enter some weird state where it doesn't accept
input correctly. Going to a history entry with invalid UTF-8 also
seems to do weird things.

I can't provide any backtraces or something like that since zsh
doesn't actually crash anywhere.
I think there was something else too, but I can't seem to remember it right now.

-- 
Mikael Magnusson



* Re: various weirdnesses with unicode support
  2005-09-07 20:30 various weirdnesses with unicode support Mikael Magnusson
@ 2005-09-08  8:55 ` David Gómez
  2005-09-08 10:02   ` Peter Stephenson
  2005-09-09 12:29 ` Mikael Magnusson
  2005-09-09 20:32 ` Peter Stephenson
  2 siblings, 1 reply; 13+ messages in thread
From: David Gómez @ 2005-09-08  8:55 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: zsh-workers

Hi Mikael ;),

On Sep 07 at 10:30:24, Mikael Magnusson wrote:
> * Having zsh in utf-8 locale but the terminal inputting for example
> ISO-8859-1 makes zsh enter some weird state where

Yep, I noticed this one too. It becomes necessary to log out; resetting
the terminal doesn't fix it. From your list, I think this is the
most serious problem.

bye

-- 
David Gómez                                      Jabber ID: davidge@jabber.org



* Re: various weirdnesses with unicode support
  2005-09-08  8:55 ` David Gómez
@ 2005-09-08 10:02   ` Peter Stephenson
  2005-09-08 10:32     ` Peter Stephenson
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2005-09-08 10:02 UTC (permalink / raw)
  To: zsh-workers

David Gómez wrote:
> Hi Mikael ;),
> 
> On Sep 07 at 10:30:24, Mikael Magnusson wrote:
> > * Having zsh in utf-8 locale but the terminal inputting for example
> > ISO-8859-1 makes zsh enter some weird state where
> 
> Yep, i noticed too this one. It becomes necessary to logout, resetting
> the terminal doesn't fix it. From your list, i think this is the
> most serious problem.

It might be because the multibyte input state never gets reset
(getrestchar() in zle_main.c).  How does this manifest itself?  Are you
unable even to generate a new line?  If you can, resetting the state for
each line would be enough.  Otherwise we could time out multibyte
characters and return, for example, a '?'.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

**********************************************************************



* Re: various weirdnesses with unicode support
  2005-09-08 10:02   ` Peter Stephenson
@ 2005-09-08 10:32     ` Peter Stephenson
  2005-09-08 14:31       ` Mikael Magnusson
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2005-09-08 10:32 UTC (permalink / raw)
  To: zsh-workers

Peter Stephenson <pws@csr.com> wrote:
> David Gómez wrote:
> > Hi Mikael ;),
> > 
> > On Sep 07 at 10:30:24, Mikael Magnusson wrote:
> > > * Having zsh in utf-8 locale but the terminal inputting for example
> > > ISO-8859-1 makes zsh enter some weird state where
> > 
> > Yep, i noticed too this one. It becomes necessary to logout, resetting
> > the terminal doesn't fix it. From your list, i think this is the
> > most serious problem.
> 
> It might be because the multibyte input state never gets reset
> (getrestchar() in zle_main.c).  How does this manifest itself?  Are you
> unable even to generate a new line?  If you can, resetting the state for
> each line would be enough.  Otherwise we could time out multibyte
> characters and return, for example, a '?'.

For example, this patch uses the existing $KEYTIMEOUT variable to time out
the remaining bytes of a multibyte character.  If mbrtowc() reported
there was more, but reading the next byte took more than $KEYTIMEOUT
hundredths of a second, the character is returned as a wide '?' and
the shift state for character input is reset.

Much of the patch is adding the extra argument to getbyte to differentiate
a real EOF from a timeout.

Does this help?  I suspect we probably need something like this anyway.
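
To make the idea concrete outside the zsh sources, here is a minimal
standalone sketch of the "extra argument" approach.  It is not the zsh
code: the scripted byte source and the names next_byte() and read_rest()
are invented for illustration, and the mbstate_t is only cleared, never
fed to a real conversion.

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the terminal: a scripted list of events. */
enum { EV_TIMEOUT = -2, EV_EOF = -1 };

struct byte_source {
    const int *events;	/* byte values 0..255, EV_TIMEOUT or EV_EOF */
    size_t len, pos;
};

/*
 * Return the next byte, or EOF on failure.  If timed_out is non-NULL it
 * is set to 1 when the failure was a timeout and to 0 in every other
 * case, so callers that don't care can simply pass NULL.
 */
static int
next_byte(struct byte_source *src, int *timed_out)
{
    int ev;

    if (timed_out)
	*timed_out = 0;
    if (src->pos >= src->len)
	return EOF;
    ev = src->events[src->pos++];
    if (ev == EV_TIMEOUT) {
	if (timed_out)
	    *timed_out = 1;
	return EOF;
    }
    if (ev == EV_EOF)
	return EOF;
    return ev;
}

/*
 * Called after a lead byte has announced `need` further bytes.
 * Returns 0 if they all arrive, WEOF on a genuine end of input, and
 * L'?' (after clearing the conversion state) if one of them times out.
 */
static wint_t
read_rest(struct byte_source *src, int need, mbstate_t *ps)
{
    int timed_out;

    while (need-- > 0) {
	if (next_byte(src, &timed_out) == EOF) {
	    if (!timed_out)
		return WEOF;		/* genuine end of input */
	    memset(ps, 0, sizeof(*ps));	/* forget the partial character */
	    return (wint_t)L'?';
	}
    }
    return 0;
}
```

Passing NULL for the flag keeps the callers that don't care about the
difference unchanged apart from the extra argument, which is why most of
the patch is mechanical.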

Index: Src/builtin.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/builtin.c,v
retrieving revision 1.146
diff -u -r1.146 builtin.c
--- Src/builtin.c	10 Aug 2005 07:45:17 -0000	1.146
+++ Src/builtin.c	8 Sep 2005 10:24:14 -0000
@@ -4539,7 +4539,7 @@
 
 	do {
 	    if (izle) {
-		if ((val = getkeyptr(0)) < 0)
+		if ((val = getkeyptr(0, NULL)) < 0)
 		    break;
 		*bptr++ = (char) val;
 		nchars--;
@@ -4595,7 +4595,7 @@
 
 	/* get, and store, reply */
 	if (izle) {
-	    int key = getkeyptr(0);
+	    int key = getkeyptr(0, NULL);
 
 	    readbuf[0] = (key == 'y' ? 'y' : 'n');
 	} else {
@@ -4818,7 +4818,7 @@
     int ret;
 
     if (izle) {
-	int c = getkeyptr(0);
+	int c = getkeyptr(0, NULL);
 
 	return (c < 0 ? EOF : c);
     }
Index: Src/init.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/init.c,v
retrieving revision 1.56
diff -u -r1.56 init.c
--- Src/init.c	9 Aug 2005 09:33:50 -0000	1.56
+++ Src/init.c	8 Sep 2005 10:24:14 -0000
@@ -82,7 +82,7 @@
 /* Pointer to read-key function from zle */
 
 /**/
-mod_export int (*getkeyptr) _((int));
+mod_export int (*getkeyptr) _((int, int *));
 
 /* SIGCHLD mask */
 
Index: Src/Zle/zle_keymap.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_keymap.c,v
retrieving revision 1.17
diff -u -r1.17 zle_keymap.c
--- Src/Zle/zle_keymap.c	15 Aug 2005 10:01:50 -0000	1.17
+++ Src/Zle/zle_keymap.c	8 Sep 2005 10:24:14 -0000
@@ -1341,7 +1341,7 @@
 static int
 getkeybuf(int w)
 {
-    int c = getbyte(w);
+    int c = getbyte(w, NULL);
 
     if(c < 0)
 	return EOF;
Index: Src/Zle/zle_main.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_main.c,v
retrieving revision 1.73
diff -u -r1.73 zle_main.c
--- Src/Zle/zle_main.c	10 Aug 2005 10:56:41 -0000	1.73
+++ Src/Zle/zle_main.c	8 Sep 2005 10:24:14 -0000
@@ -628,13 +628,16 @@
 
 /**/
 mod_export int
-getbyte(int keytmout)
+getbyte(int keytmout, int *timeout)
 {
     char cc;
     unsigned int ret;
     int die = 0, r, icnt = 0;
     int old_errno = errno, obreaks = breaks;
 
+    if (timeout)
+	*timeout = 0;
+
 #ifdef ZLE_UNICODE_SUPPORT
     /*
      * Reading a single byte always invalidates the status
@@ -660,8 +663,12 @@
 	    dont_queue_signals();
 	    r = raw_getbyte(keytmout, &cc);
 	    restore_queue_signals(q);
-	    if (r == -2)	/* timeout */
+	    if (r == -2) {
+		/* timeout */
+		if (timeout)
+		    *timeout = 1;
 		return lastchar = EOF;
+	    }
 	    if (r == 1)
 		break;
 	    if (r == 0) {
@@ -733,7 +740,7 @@
 mod_export ZLE_INT_T
 getfullchar(int keytmout)
 {
-    int inchar = getbyte(keytmout);
+    int inchar = getbyte(keytmout, NULL);
 
 #ifdef ZLE_UNICODE_SUPPORT
     return getrestchar(inchar);
@@ -759,7 +766,7 @@
     /* char cnull = '\0'; */
     char c = inchar;
     wchar_t outchar;
-    int ret;
+    int ret, timeout;
     static mbstate_t ps;
 
     /*
@@ -784,12 +791,30 @@
 	    return lastchar_wide = WEOF;
 	}
 
-	/* No timeout here as we really need the character. */
-	inchar = getbyte(0);
+	/*
+	 * Always apply KEYTIMEOUT to the remains of the input
+	 * character.  The parts of a multibyte character should
+	 * arrive together.  If we don't do this the input can
+	 * get stuck if an invalid byte sequence arrives.
+	 */
+	inchar = getbyte(1, &timeout);
 	/* getbyte deliberately resets lastchar_wide_valid */
 	lastchar_wide_valid = 1;
-	if (inchar == EOF)
-	    return lastchar_wide = WEOF;
+	if (inchar == EOF) {
+	    if (timeout)
+	    {
+		/*
+		 * This case means that we got a valid initial byte
+		 * (since we tested for EOF above), but the followup
+		 * timed out.  This probably indicates a duff character.
+		 * Reset the shift state and return a '?'.
+		 */
+		memset(&ps, 0, sizeof(ps));
+		lastchar_wide = L'?';
+	    }
+	    else
+		return lastchar_wide = WEOF;
+	}
 	c = inchar;
     }
     return lastchar_wide = (ZLE_INT_T)outchar;
Index: Src/Zle/zle_misc.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_misc.c,v
retrieving revision 1.26
diff -u -r1.26 zle_misc.c
--- Src/Zle/zle_misc.c	15 Aug 2005 15:47:54 -0000	1.26
+++ Src/Zle/zle_misc.c	8 Sep 2005 10:24:14 -0000
@@ -595,7 +595,7 @@
      *
      * Hence for now this remains byte-by-byte.
      */
-    while ((gotk = getbyte(0)) != EOF) {
+    while ((gotk = getbyte(0, NULL)) != EOF) {
 	if (gotk == '-' && !digcnt) {
 	    minus = -1;
 	    digcnt++;
Index: Src/Zle/zle_vi.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_vi.c,v
retrieving revision 1.10
diff -u -r1.10 zle_vi.c
--- Src/Zle/zle_vi.c	17 Aug 2005 19:26:03 -0000	1.10
+++ Src/Zle/zle_vi.c	8 Sep 2005 10:24:14 -0000
@@ -108,7 +108,7 @@
     char m[3], *str;
     Thingy cmd;
 
-    if(getbyte(0) == EOF)
+    if (getbyte(0, NULL) == EOF)
 	return ZLEEOF;
 
     m[0] = lastchar;


-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: various weirdnesses with unicode support
  2005-09-08 10:32     ` Peter Stephenson
@ 2005-09-08 14:31       ` Mikael Magnusson
  2005-09-09 11:53         ` Peter Stephenson
  0 siblings, 1 reply; 13+ messages in thread
From: Mikael Magnusson @ 2005-09-08 14:31 UTC (permalink / raw)
  To: zsh-workers

On 9/8/05, Peter Stephenson <pws@csr.com> wrote:
> Peter Stephenson <pws@csr.com> wrote:
> > David Gómez wrote:
> > > Hi Mikael ;),
> > >
> > > On Sep 07 at 10:30:24, Mikael Magnusson wrote:
> > > > * Having zsh in utf-8 locale but the terminal inputting for example
> > > > ISO-8859-1 makes zsh enter some weird state where
> > >
> > > Yep, i noticed too this one. It becomes necessary to logout, resetting
> > > the terminal doesn't fix it. From your list, i think this is the
> > > most serious problem.
> >
> > It might be because the multibyte input state never gets reset
> > (getrestchar() in zle_main.c).  How does this manifest itself?  Are you
> > unable even to generate a new line?  If you can, resetting the state for
> > each line would be enough.  Otherwise we could time out multibyte
> > characters and return, for example, a '?'.
> 
> For example, this patch uses the existing $KEYTIMEOUT variable to time out
> the remaining bytes of a multibyte character.  If mbrtowc() reported
> there was more, but reading the next byte took more than $KEYTIMEOUT
> hundredths of a second, the character is returned as a wide '?' and
> the shift state for character input is reset.
> 
> Much of the patch is adding the extra argument to getbyte to differentiate
> a real EOF from a timeout.
> 
> Does this help?  I suspect we probably need something like this anyway.

It seems to help a bit. If i press å (a with a ring), and then press a
a few times, zle still seems to be in a mostly normal state. But if i
press several å in a row i can't seem to get out.

-- 
Mikael Magnusson



* Re: various weirdnesses with unicode support
  2005-09-08 14:31       ` Mikael Magnusson
@ 2005-09-09 11:53         ` Peter Stephenson
  2005-09-09 12:33           ` Mikael Magnusson
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2005-09-09 11:53 UTC (permalink / raw)
  To: zsh-workers

Mikael Magnusson <mikachu@gmail.com> wrote:
> > > > > * Having zsh in utf-8 locale but the terminal inputting for example
> > > > > ISO-8859-1 makes zsh enter some weird state where
>
> > For example, this patch uses the existing $KEYTIMEOUT variable to time out
> > the remaining bytes of a multibyte character.  If mbrtowc() reported
> > there was more, but reading the next byte took more than $KEYTIMEOUT
> > hundredths of a second, the character is returned as a wide '?' and
> > the shift state for character input is reset.
> > 
> > Much of the patch is adding the extra argument to getbyte to differentiate
> > a real EOF from a timeout.
> > 
> > Does this help?  I suspect we probably need something like this anyway.
> 
> It seems to help a bit. If i press å (a with a ring), and then press a
> a few times, zle still seems to be in a mostly normal state. But if i
> press several å in a row i can't seem to get out.

I've checked this in anyway, with some slight tweaks: the MB state is
also reset if getrestchar() returns EOF, and I've added zle documentation
indicating that $KEYTIMEOUT is applied to multibyte characters.

I won't be much use debugging this, since I currently have no way
of generating invalid characters.  X Windows is smart enough to convert
between windows.  There may be some way of confusing it with the locale
or character set.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: various weirdnesses with unicode support
  2005-09-07 20:30 various weirdnesses with unicode support Mikael Magnusson
  2005-09-08  8:55 ` David Gómez
@ 2005-09-09 12:29 ` Mikael Magnusson
  2005-09-09 13:44   ` Peter Stephenson
  2005-09-09 20:32 ` Peter Stephenson
  2 siblings, 1 reply; 13+ messages in thread
From: Mikael Magnusson @ 2005-09-09 12:29 UTC (permalink / raw)
  To: zsh-workers

On 9/7/05, Mikael Magnusson <mikachu@gmail.com> wrote:
> I've been using zsh in sv_SE.utf8 locale now for a few weeks, and it
> is mostly working fine. Some things are a bit weird though.

> I think there was something else too, but I can't seem to remember it right now.

I just hit one thing i forgot about. The minibuffer doesn't seem to
work at all. I just tried zsh -f and ran
bindkey '^a' where-is
and it doesn't work, so it's not my rc file being weird or something.
The prompt does show up but i can't enter any characters into it.
(ctrl-r history search still works fine though).

-- 
Mikael Magnusson



* Re: various weirdnesses with unicode support
  2005-09-09 11:53         ` Peter Stephenson
@ 2005-09-09 12:33           ` Mikael Magnusson
  2005-09-09 16:52             ` Peter Stephenson
  0 siblings, 1 reply; 13+ messages in thread
From: Mikael Magnusson @ 2005-09-09 12:33 UTC (permalink / raw)
  To: zsh-workers

On 9/9/05, Peter Stephenson <pws@csr.com> wrote:
> Mikael Magnusson <mikachu@gmail.com> wrote:
> > It seems to help a bit. If i press å (a with a ring), and then press a
> > a few times, zle still seems to be in a mostly normal state. But if i
> > press several å in a row i can't seem to get out.
> 
> I've checked this in anyway, with some slight tweaks: the MB state is
> also reset if getrestchar() returns EOF, and I've added zle documentation
> indicating that $KEYTIMEOUT is applied to multibyte characters.
> 
> I won't be much use with debugging this since I have currently no way
> of generating invalid characters.  X Windows is smart enough to convert
> between windows.  There may be some way of confusing it with the locale
> or character set.

You can try stuff like this (with åäö or other characters):
echo `echo åäö | iconv -f UTF-8 -t ISO-8859-1`<tab>
It will just expand to an empty string in the command line, but the
completion listing does show the å (rxvt-unicode shows invalid
characters as latin-1 which is not really the right thing to do but
still more helpful than a box).
As for confusing zsh with keyboard input, just running a terminal in
some iso-8859 mode and running export LC_CTYPE=xx_XX.UTF-8 should be
enough to trigger the thing i talked about, since zsh then expects the
input to be UTF-8.
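
For anyone wondering why a Latin-1 å is enough to upset a UTF-8 reader:
in ISO-8859-1 å is the single byte 0xE5, and in UTF-8 a byte of that
form is the lead byte of a three-byte character, so the reader waits for
two continuation bytes that a Latin-1 terminal never sends.  A rough
sketch of the byte classification follows.  It is hand-written for
illustration and the function names are made up; zsh itself leaves this
to mbrtowc().

```c
/*
 * Number of continuation bytes a UTF-8 decoder expects after the lead
 * byte b: 0 for ASCII, 1 to 3 for multibyte lead bytes, and -1 if b
 * cannot start a character at all (a stray continuation byte or a
 * value that never appears in valid UTF-8).
 */
static int
utf8_bytes_to_follow(unsigned char b)
{
    if (b < 0x80)
	return 0;	/* 0xxxxxxx: ASCII */
    if (b < 0xC2)
	return -1;	/* continuation byte, or overlong lead 0xC0/0xC1 */
    if (b < 0xE0)
	return 1;	/* 110xxxxx */
    if (b < 0xF0)
	return 2;	/* 1110xxxx */
    if (b < 0xF5)
	return 3;	/* 11110xxx, up to U+10FFFF */
    return -1;
}

/* Non-zero if b has the form 10xxxxxx, i.e. is a continuation byte. */
static int
utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

With these, utf8_bytes_to_follow(0xE5) is 2, while neither a following
'a' (0x61) nor another 0xE5 passes utf8_is_continuation(), so every
Latin-1 å starts a character that can never be completed.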

-- 
Mikael Magnusson



* Re: various weirdnesses with unicode support
  2005-09-09 12:29 ` Mikael Magnusson
@ 2005-09-09 13:44   ` Peter Stephenson
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Stephenson @ 2005-09-09 13:44 UTC (permalink / raw)
  To: zsh-workers

Mikael Magnusson <mikachu@gmail.com> wrote:
> I just hit one thing i forgot about. The minibuffer doesn't seem to
> work at all. I just tried zsh -f and ran
> bindkey '^a' where-is
> and it doesn't work, so it's not my rc file being weird or something.
> The prompt does show up but i can't enter any characters into it.
> (ctrl-r history search still works fine though).

Oops.

Index: Src/Zle/zle_misc.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_misc.c,v
retrieving revision 1.27
diff -u -r1.27 zle_misc.c
--- Src/Zle/zle_misc.c	9 Sep 2005 11:48:28 -0000	1.27
+++ Src/Zle/zle_misc.c	9 Sep 2005 13:42:10 -0000
@@ -955,7 +955,7 @@
 		else {
 #ifdef ZLE_UNICODE_SUPPORT
 		    if (!lastchar_wide_valid)
-			getrestchar(0);
+			getrestchar(lastchar);
 		    if (iswcntrl(lastchar_wide))
 #else
 		    if (icntrl(lastchar))


-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: various weirdnesses with unicode support
  2005-09-09 12:33           ` Mikael Magnusson
@ 2005-09-09 16:52             ` Peter Stephenson
  2005-09-09 16:56               ` David Gómez
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2005-09-09 16:52 UTC (permalink / raw)
  To: zsh-workers

Mikael Magnusson <mikachu@gmail.com> wrote:
> As for confusing zsh with keyboard input, just running a terminal in
> some iso-8859 mode and running export LC_CTYPE=xx_XX.UTF-8 should be
> enough to trigger the thing i talked about, since zsh then expects the
> input to be UTF-8.

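Something like this worked, so I could reproduce the problem.  That leads
to the following patch, which should improve matters a bit: on a timeout,
getrestchar() now sets lastchar too and returns the '?' straight away.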

Index: Src/Zle/zle_main.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_main.c,v
retrieving revision 1.74
diff -u -r1.74 zle_main.c
--- Src/Zle/zle_main.c	9 Sep 2005 11:48:28 -0000	1.74
+++ Src/Zle/zle_main.c	9 Sep 2005 16:50:19 -0000
@@ -814,7 +814,8 @@
 		 * timed out.  This probably indicates a duff character.
 		 * Return a '?'.
 		 */
-		lastchar_wide = L'?';
+		lastchar = '?';
+		return lastchar_wide = L'?';
 	    }
 	    else
 		return lastchar_wide = WEOF;



-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: various weirdnesses with unicode support
  2005-09-09 16:52             ` Peter Stephenson
@ 2005-09-09 16:56               ` David Gómez
  0 siblings, 0 replies; 13+ messages in thread
From: David Gómez @ 2005-09-09 16:56 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

Hi Peter ;),

On Sep 09 at 05:52:44, Peter Stephenson wrote:
> Mikael Magnusson <mikachu@gmail.com> wrote:
> > As for confusing zsh with keyboard input, just running a terminal in
> > some iso-8859 mode and running export LC_CTYPE=xx_XX.UTF-8 should be
> > enough to trigger the thing i talked about, since zsh then expects the
> > input to be UTF-8.
> 
> Something like this worked... this leads to the following patch which
> should improve matters a bit.

I've been testing with your previous patch (the one you said has already
been checked in) and it solved the problem: the text console didn't get
into a weird state when non-UTF-8 characters were typed. I'll test again
with this latest patch, but everything seems quite good ;).

regards,

-- 
David Gómez                                      Jabber ID: davidge@jabber.org



* Re: various weirdnesses with unicode support
  2005-09-07 20:30 various weirdnesses with unicode support Mikael Magnusson
  2005-09-08  8:55 ` David Gómez
  2005-09-09 12:29 ` Mikael Magnusson
@ 2005-09-09 20:32 ` Peter Stephenson
  2005-09-09 20:40   ` Peter Stephenson
  2 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2005-09-09 20:32 UTC (permalink / raw)
  To: zsh-workers

Mikael Magnusson wrote:
> * Pressing alt-t for transpose word doesn't work, and inserts lots of
> NULs in the command line, shown as ^@.

There were two issues here.  The first was that the code for
transpose-words still only handled the line as single-byte strings,
which was just plain wrong.
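
As a standalone illustration of what handling the line as wide
characters means here, the sketch below swaps two words in a wchar_t
buffer, counting everything in elements rather than bytes.  It is an
assumption-laden sketch, not the zsh code: swap_words() is a made-up
name, and it uses malloc() and wmemcpy() where zsh has zhalloc() and
ZS_memcpy().  A byte-oriented strncpy() over the same data stops at the
first zero byte of a wide ASCII character and pads the rest with NULs,
which would be consistent with the ^@ characters reported.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Swap the words line[p1..p2) and line[p3..p4), keeping the separator
 * line[p2..p3) between them.  Requires p1 <= p2 <= p3 <= p4.
 * All offsets and lengths count wchar_t elements, not bytes.
 * Returns 0 on success, 1 if memory could not be allocated.
 */
static int
swap_words(wchar_t *line, size_t p1, size_t p2, size_t p3, size_t p4)
{
    size_t total = p4 - p1;
    wchar_t *temp, *pp;

    if (total == 0)
	return 0;
    pp = temp = malloc(total * sizeof(wchar_t));
    if (!temp)
	return 1;
    wmemcpy(pp, line + p3, p4 - p3);	/* second word goes first */
    pp += p4 - p3;
    wmemcpy(pp, line + p2, p3 - p2);	/* separator stays in the middle */
    pp += p3 - p2;
    wmemcpy(pp, line + p1, p2 - p1);	/* first word goes last */
    wmemcpy(line + p1, temp, total);	/* copy the result back in place */
    free(temp);
    return 0;
}
```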

The second was that our iword() macro doesn't handle wide characters.
The partial fix here should break the back of the problem by allowing
the existing iword() to work on ASCII characters and assuming for now
that testing for alphanumerics is good enough for the remainder.
I don't think the extra execution time from using a function instead of
a macro is all that significant for the uses we have.
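
In outline the test looks something like the following sketch.  The
names is_word_byte() and is_word_wide() and the wordchars string are
hypothetical stand-ins for iword() and $WORDCHARS, and the sketch assumes
that wide-character values below 0x80 are ASCII, where the real function
converts the character back to multibyte form to find out.

```c
#include <ctype.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for $WORDCHARS; ASCII only in this sketch. */
static const char *wordchars = "*?_-.[]~=/&;!#$%^(){}<>";

/* Byte-oriented test, playing the role of the existing iword(). */
static int
is_word_byte(unsigned char c)
{
    /* strchr() would match the terminating NUL, so exclude it first. */
    return c != '\0' && (isalnum(c) || strchr(wordchars, c) != NULL);
}

/*
 * Wide-character test: ASCII goes through the byte-oriented test,
 * anything else is a word character if it is alphanumeric.
 */
static int
is_word_wide(wchar_t c)
{
    if ((unsigned long)c < 0x80)
	return is_word_byte((unsigned char)c);
    return iswalnum((wint_t)c) != 0;
}
```

The non-ASCII branch depends on the current locale, so what counts as
alphanumeric there is whatever iswalnum() says after setlocale().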

A full fix will be to scan $WORDCHARS for multibyte characters and
squirrel those or the corresponding wide characters away somewhere,
either in a hash table (probably easiest since we have the
infrastructure, although I don't know if the hashing algorithm will be
good enough to cope) or something like the multibyte keymaps, i.e. a set
of sparse tables.

As I've noted, extending iident() along the same lines should be
easy---it's the same fix, but here we simply return 0 if the character
isn't ASCII.  This would be a nice straightforward exercise for someone
eager to get on in the zsh world.  (No, I haven't either.)

Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.88
diff -u -r1.88 utils.c
--- Src/utils.c	17 Aug 2005 19:17:40 -0000	1.88
+++ Src/utils.c	9 Sep 2005 20:14:27 -0000
@@ -2469,6 +2469,42 @@
 	typtab[bangchar] |= ISPECIAL;
 }
 
+
+#ifdef ZLE_UNICODE_SUPPORT
+/*
+ * iword() macro extended to support wide characters.
+ */
+
+/**/
+mod_export int
+wcsiword(wchar_t c)
+{
+    int len;
+    VARARR(char, outstr, MB_CUR_MAX);
+    /*
+     * Strategy:  the shell requires that the multibyte representation
+     * be an extension of ASCII.  So see if converting the character
+     * produces an ASCII character.  If it does, use iword on that.
+     * If it doesn't, use iswalnum on the original character.  This
+     * is pretty good most of the time.
+     *
+     * TODO: extend WORDCHARS to handle multibyte chars by some kind
+     * of hierarchical list or hash table.
+     */
+    len = wctomb(outstr, c);
+
+    if (len == 0) {
+	/* NULL is special */
+	return iword(0);
+    } else if (len == 1 && isascii(*outstr)) {
+	return iword(*outstr);
+    } else {
+	return iswalnum(c);
+    }
+}
+#endif
+
+
 /**/
 mod_export char **
 arrdup(char **s)
Index: Src/Zle/zle.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle.h,v
retrieving revision 1.17
diff -u -r1.17 zle.h
--- Src/Zle/zle.h	15 Aug 2005 17:20:29 -0000	1.17
+++ Src/Zle/zle.h	9 Sep 2005 20:14:31 -0000
@@ -69,11 +69,13 @@
 /*
  * TODO: doesn't work on arguments with side effects.
  * Also YUK.  Not even sure this is guaranteed to work.
+ * Should be easy to do along the lines of wcsiword.
  */
 #define ZC_iident(x)	(x < 256 && iident((int)x))
 
 #define ZC_tolower towlower
 #define ZC_toupper towupper
+#define ZC_iword  wcsiword
 
 #define LASTFULLCHAR	lastchar_wide
 
@@ -122,6 +124,7 @@
 
 #define ZC_tolower tulower
 #define ZC_toupper tuupper
+#define ZC_iword   iword
 
 #define LASTFULLCHAR	lastchar
 
Index: Src/Zle/zle_misc.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_misc.c,v
retrieving revision 1.28
diff -u -r1.28 zle_misc.c
--- Src/Zle/zle_misc.c	9 Sep 2005 13:49:00 -0000	1.28
+++ Src/Zle/zle_misc.c	9 Sep 2005 20:14:42 -0000
@@ -623,10 +623,10 @@
     int len, t0;
 
     for (t0 = zlecs - 1; t0 >= 0; t0--)
-	if (iword(zleline[t0]))
+	if (ZC_iword(zleline[t0]))
 	    break;
     for (; t0 >= 0; t0--)
-	if (!iword(zleline[t0]))
+	if (!ZC_iword(zleline[t0]))
 	    break;
     if (t0)
 	t0++;
Index: Src/Zle/zle_word.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_word.c,v
retrieving revision 1.5
diff -u -r1.5 zle_word.c
--- Src/Zle/zle_word.c	26 Feb 2005 07:40:57 -0000	1.5
+++ Src/Zle/zle_word.c	9 Sep 2005 20:14:44 -0000
@@ -30,11 +30,6 @@
 #include "zle.mdh"
 #include "zle_word.pro"
 
-/*
- * TODO: use of iword needs completely rethinking for Unicode
- * since we can't base it on a table lookup.
- */
-
 /**/
 int
 forwardword(char **args)
@@ -49,11 +44,11 @@
 	return ret;
     }
     while (n--) {
-	while (zlecs != zlell && iword(zleline[zlecs]))
+	while (zlecs != zlell && ZC_iword(zleline[zlecs]))
 	    zlecs++;
 	if (wordflag && !n)
 	    return 0;
-	while (zlecs != zlell && !iword(zleline[zlecs]))
+	while (zlecs != zlell && !ZC_iword(zleline[zlecs]))
 	    zlecs++;
     }
     return 0;
@@ -125,11 +120,11 @@
 	return ret;
     }
     while (n--) {
-	while (zlecs != zlell && !iword(zleline[zlecs]))
+	while (zlecs != zlell && !ZC_iword(zleline[zlecs]))
 	    zlecs++;
 	if (wordflag && !n)
 	    return 0;
-	while (zlecs != zlell && iword(zleline[zlecs]))
+	while (zlecs != zlell && ZC_iword(zleline[zlecs]))
 	    zlecs++;
     }
     return 0;
@@ -197,9 +192,9 @@
 	return ret;
     }
     while (n--) {
-	while (zlecs && !iword(zleline[zlecs - 1]))
+	while (zlecs && !ZC_iword(zleline[zlecs - 1]))
 	    zlecs--;
-	while (zlecs && iword(zleline[zlecs - 1]))
+	while (zlecs && ZC_iword(zleline[zlecs - 1]))
 	    zlecs--;
     }
     return 0;
@@ -267,9 +262,9 @@
 	return ret;
     }
     while (n--) {
-	while (zlecs && !iword(zleline[zlecs - 1]))
+	while (zlecs && !ZC_iword(zleline[zlecs - 1]))
 	    zlecs--;
-	while (zlecs && iword(zleline[zlecs - 1]))
+	while (zlecs && ZC_iword(zleline[zlecs - 1]))
 	    zlecs--;
     }
     return 0;
@@ -289,9 +284,9 @@
 	return ret;
     }
     while (n--) {
-	while (x && !iword(zleline[x - 1]))
+	while (x && !ZC_iword(zleline[x - 1]))
 	    x--;
-	while (x && iword(zleline[x - 1]))
+	while (x && ZC_iword(zleline[x - 1]))
 	    x--;
     }
     backdel(zlecs - x);
@@ -337,9 +332,9 @@
 	return ret;
     }
     while (n--) {
-	while (x && !iword(zleline[x - 1]))
+	while (x && !ZC_iword(zleline[x - 1]))
 	    x--;
-	while (x && iword(zleline[x - 1]))
+	while (x && ZC_iword(zleline[x - 1]))
 	    x--;
     }
     backkill(zlecs - x, 1);
@@ -356,9 +351,9 @@
     if (neg)
 	n = -n;
     while (n--) {
-	while (zlecs != zlell && !iword(zleline[zlecs]))
+	while (zlecs != zlell && !ZC_iword(zleline[zlecs]))
 	    zlecs++;
-	while (zlecs != zlell && iword(zleline[zlecs])) {
+	while (zlecs != zlell && ZC_iword(zleline[zlecs])) {
 	    zleline[zlecs] = ZC_toupper(zleline[zlecs]);
 	    zlecs++;
 	}
@@ -378,9 +373,9 @@
     if (neg)
 	n = -n;
     while (n--) {
-	while (zlecs != zlell && !iword(zleline[zlecs]))
+	while (zlecs != zlell && !ZC_iword(zleline[zlecs]))
 	    zlecs++;
-	while (zlecs != zlell && iword(zleline[zlecs])) {
+	while (zlecs != zlell && ZC_iword(zleline[zlecs])) {
 	    zleline[zlecs] = ZC_tolower(zleline[zlecs]);
 	    zlecs++;
 	}
@@ -401,11 +396,11 @@
 	n = -n;
     while (n--) {
 	first = 1;
-	while (zlecs != zlell && !iword(zleline[zlecs]))
+	while (zlecs != zlell && !ZC_iword(zleline[zlecs]))
 	    zlecs++;
-	while (zlecs != zlell && iword(zleline[zlecs]) && !isalpha(zleline[zlecs]))
+	while (zlecs != zlell && ZC_iword(zleline[zlecs]) && !isalpha(zleline[zlecs]))
 	    zlecs++;
-	while (zlecs != zlell && iword(zleline[zlecs])) {
+	while (zlecs != zlell && ZC_iword(zleline[zlecs])) {
 	    zleline[zlecs] = (first) ? ZC_toupper(zleline[zlecs]) :
 		ZC_tolower(zleline[zlecs]);
 	    first = 0;
@@ -432,9 +427,9 @@
 	return ret;
     }
     while (n--) {
-	while (x != zlell && !iword(zleline[x]))
+	while (x != zlell && !ZC_iword(zleline[x]))
 	    x++;
-	while (x != zlell && iword(zleline[x]))
+	while (x != zlell && ZC_iword(zleline[x]))
 	    x++;
     }
     foredel(x - zlecs);
@@ -456,9 +451,9 @@
 	return ret;
     }
     while (n--) {
-	while (x != zlell && !iword(zleline[x]))
+	while (x != zlell && !ZC_iword(zleline[x]))
 	    x++;
-	while (x != zlell && iword(zleline[x]))
+	while (x != zlell && ZC_iword(zleline[x]))
 	    x++;
     }
     forekill(x - zlecs, 0);
@@ -469,36 +464,43 @@
 int
 transposewords(UNUSED(char **args))
 {
-    int p1, p2, p3, p4, x = zlecs;
-    char *temp, *pp;
+    int p1, p2, p3, p4, len, x = zlecs;
+    ZLE_STRING_T temp, pp;
     int n = zmult;
     int neg = n < 0, ocs = zlecs;
 
     if (neg)
 	n = -n;
     while (n--) {
-	while (x != zlell && zleline[x] != '\n' && !iword(zleline[x]))
+	while (x != zlell && zleline[x] != ZWC('\n') && !ZC_iword(zleline[x]))
 	    x++;
-	if (x == zlell || zleline[x] == '\n') {
+	if (x == zlell || zleline[x] == ZWC('\n')) {
 	    x = zlecs;
-	    while (x && zleline[x - 1] != '\n' && !iword(zleline[x]))
+	    while (x && zleline[x - 1] != ZWC('\n') && !ZC_iword(zleline[x]))
 		x--;
-	    if (!x || zleline[x - 1] == '\n')
+	    if (!x || zleline[x - 1] == ZWC('\n'))
 		return 1;
 	}
-	for (p4 = x; p4 != zlell && iword(zleline[p4]); p4++);
-	for (p3 = p4; p3 && iword(zleline[p3 - 1]); p3--);
+	for (p4 = x; p4 != zlell && ZC_iword(zleline[p4]); p4++);
+	for (p3 = p4; p3 && ZC_iword(zleline[p3 - 1]); p3--);
 	if (!p3)
 	    return 1;
-	for (p2 = p3; p2 && !iword(zleline[p2 - 1]); p2--);
+	for (p2 = p3; p2 && !ZC_iword(zleline[p2 - 1]); p2--);
 	if (!p2)
 	    return 1;
-	for (p1 = p2; p1 && iword(zleline[p1 - 1]); p1--);
-	pp = temp = (char *)zhalloc(p4 - p1 + 1);
-	struncpy(&pp, (char *) zleline + p3, p4 - p3);
-	struncpy(&pp, (char *) zleline + p2, p3 - p2);
-	struncpy(&pp, (char *) zleline + p1, p2 - p1);
-	strncpy((char *)zleline + p1, temp, p4 - p1);
+	for (p1 = p2; p1 && ZC_iword(zleline[p1 - 1]); p1--);
+
+	pp = temp = (ZLE_STRING_T)zhalloc((p4 - p1)*ZLE_CHAR_SIZE);
+	len = p4 - p3;
+	ZS_memcpy(pp, zleline + p3, len);
+	pp += len;
+	len = p3 - p2;
+	ZS_memcpy(pp, zleline + p2, len);
+	pp += len;
+	ZS_memcpy(pp, zleline + p1, p2 - p1);
+
+	ZS_memcpy(zleline + p1, temp, p4 - p1);
+
 	zlecs = p4;
     }
     if (neg)

-- 
Peter Stephenson <pws@pwstephenson.fsnet.co.uk>
Work: pws@csr.com
Web: http://www.pwstephenson.fsnet.co.uk



* Re: various weirdnesses with unicode support
  2005-09-09 20:32 ` Peter Stephenson
@ 2005-09-09 20:40   ` Peter Stephenson
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Stephenson @ 2005-09-09 20:40 UTC (permalink / raw)
  To: Zsh hackers list

Peter Stephenson wrote:
By the way, I'll now be away until about Saturday week.

pws



Thread overview: 13+ messages
2005-09-07 20:30 various weirdnesses with unicode support Mikael Magnusson
2005-09-08  8:55 ` David Gómez
2005-09-08 10:02   ` Peter Stephenson
2005-09-08 10:32     ` Peter Stephenson
2005-09-08 14:31       ` Mikael Magnusson
2005-09-09 11:53         ` Peter Stephenson
2005-09-09 12:33           ` Mikael Magnusson
2005-09-09 16:52             ` Peter Stephenson
2005-09-09 16:56               ` David Gómez
2005-09-09 12:29 ` Mikael Magnusson
2005-09-09 13:44   ` Peter Stephenson
2005-09-09 20:32 ` Peter Stephenson
2005-09-09 20:40   ` Peter Stephenson
