zsh-workers
* Re: PATCH: read full multibyte string a bit more sooner
@ 2015-09-12 19:41 Peter Stephenson
  2015-09-12 19:49 ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2015-09-12 19:41 UTC (permalink / raw)
  To: Bart Schaefer, Zsh hackers list



> Which means that read-command is still returning something that
> matches [[:INCOMPLETE:]]* on that very first call, which ought to
> be impossible as I understand it.  And indeed, if I step across
> getrestchar() with a debugger, it's failing on any character that
> is more than two bytes wide (returning only the first two bytes),
> which probably leaves mbrtowc() in an indeterminate state.  (This
> is reading from the "zle -U" buffer so key timeout does not matter.)

getrestchar() only changed to the extent of passing back extra info.
It's being called at a different point but as long as the first byte is
correct that shouldn't matter. Is that happening with 5.1.1?

pws


* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-12 19:41 PATCH: read full multibyte string a bit more sooner Peter Stephenson
@ 2015-09-12 19:49 ` Bart Schaefer
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Schaefer @ 2015-09-12 19:49 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

On Sat, Sep 12, 2015 at 12:41 PM, Peter Stephenson
<p.w.stephenson@ntlworld.com> wrote:
>
>> ... if I step across
>> getrestchar() with a debugger, it's failing on any character that
>> is more than two bytes wide (returning only the first two bytes),
>
> getrestchar() only changed to the extent of passing back extra info.
> It's being called at a different point but as long as the first byte is
> correct that shouldn't matter. Is that happening with 5.1.1?

Yes, this appears also to happen with 5.1.1, which is why I began to
suspect something external such as my LC_CTYPE value -- but I tried
setting that to a couple of different non-English things and still
have the same problem.  mbrtowc() is always claiming a complete
character after two bytes on that sample input.



* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-19 19:25     ` Peter Stephenson
@ 2015-09-19 20:49       ` Bart Schaefer
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Schaefer @ 2015-09-19 20:49 UTC (permalink / raw)
  To: Zsh hackers list

On Sep 19,  8:25pm, Peter Stephenson wrote:
} Subject: Re: PATCH: read full multibyte string a bit more sooner
}
} Is that the last of the problems with multibyte strings?  It sounded
} more complicated than that...

I haven't run into anything else (at least nothing that wasn't
addressed by subsequent patches) but I may not be a good test case.



* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-12 23:09   ` Bart Schaefer
@ 2015-09-19 19:25     ` Peter Stephenson
  2015-09-19 20:49       ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2015-09-19 19:25 UTC (permalink / raw)
  To: Zsh hackers list

Is that the last of the problems with multibyte strings?  It sounded
more complicated than that...

pws



* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-12 20:35 ` Bart Schaefer
@ 2015-09-12 23:09   ` Bart Schaefer
  2015-09-19 19:25     ` Peter Stephenson
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Schaefer @ 2015-09-12 23:09 UTC (permalink / raw)
  To: Zsh hackers list

On Sep 12,  1:35pm, Bart Schaefer wrote:
}
} So it's specific to reading bytes out of (or putting them into) the
} kungetbuf.

This probably goes back all the way to the initial implementation of
multibyte.  If the input to "zle -U" is the wide character represented
by the two bytes 0xc4 0x84, then after it passes through "zle -U" it
comes back as the three bytes 0xc4 0x83 0xa4, which are then handled
literally by getbyte() -- so this is a metafication problem.

The string is already metafied when it comes to bin_zle_unget() in
the args array.  I suppose it should be unmetafied there, only to be
metafied again later?  Indeed, that seems to work.

(Weird that unmeta() calls its argument "file_name".  Historical, I
suppose.)


diff --git a/Src/Zle/zle_thingy.c b/Src/Zle/zle_thingy.c
index 7fd3a59..da3a6d4 100644
--- a/Src/Zle/zle_thingy.c
+++ b/Src/Zle/zle_thingy.c
@@ -466,7 +466,7 @@ bin_zle_mesg(char *name, char **args, UNUSED(Options ops), UNUSED(char func))
 static int
 bin_zle_unget(char *name, char **args, UNUSED(Options ops), UNUSED(char func))
 {
-    char *b = *args, *p = b + strlen(b);
+    char *b = unmeta(*args), *p = b + strlen(b);
 
     if (!zleactive) {
 	zwarnnam(name, "can only be called from widget function");



* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-12 20:07 Peter Stephenson
@ 2015-09-12 20:35 ` Bart Schaefer
  2015-09-12 23:09   ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Schaefer @ 2015-09-12 20:35 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

On Sat, Sep 12, 2015 at 1:07 PM, Peter Stephenson
<p.w.stephenson@ntlworld.com> wrote:
> Certainly sounds like it thinks you're in some 16-bit locale.

That doesn't explain why there's a sensible result with self-insert
instead of with read-command.

I think I've got this narrowed down now; this reproduces it:

burner% zfoo() { zle -U 'Ą Пётр Ильич Чайковский 梶浦由記' }
burner% zle -N zfoo
burner% ă<ffffffff> Ѓ<ffffffff>у<ffffffff>тр
Ѓ<ffffffff>лу<ffffffff>иу<ffffffff> Чайковский
梶浦烴<ffffffff>訃<ffffffff>

(esc-x zfoo ret after the zle -N)

So it's specific to reading bytes out of (or putting them into) the kungetbuf.



* Re: PATCH: read full multibyte string a bit more sooner
@ 2015-09-12 20:07 Peter Stephenson
  2015-09-12 20:35 ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2015-09-12 20:07 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list


Certainly sounds like it thinks you're in some 16-bit locale. 
LC_ALL is the only obvious spanner in the works. Try setting
that to your UTF-8 locale too? 

pws


* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-12 16:46     ` Bart Schaefer
  2015-09-12 17:56       ` Peter Stephenson
@ 2015-09-12 18:02       ` Bart Schaefer
  1 sibling, 0 replies; 13+ messages in thread
From: Bart Schaefer @ 2015-09-12 18:02 UTC (permalink / raw)
  To: Zsh hackers list

On Sep 12,  9:46am, Bart Schaefer wrote:
}
} I then switched back to zsh-5.1.1-dev-0 and tried to repeat this.
} 
} The very first time I pasted the test string, I got this: [...]
} 
} As you can see this is ALMOST correct, except for that unexpected
} trailing tilde

So one more tidbit:  I reverted back to the 5.1 version of the
bracketed-paste-magic source with the 5.1.1-dev-0 binary and tried
this again, and with the mbchar+= loop removed (so relying solely
on read-command to read multiple bytes properly), I NEVER get a
correct result, even on first paste -- I always get the <ffffffff>
garbage.

Which means that read-command is still returning something that
matches [[:INCOMPLETE:]]* on that very first call, which ought to
be impossible as I understand it.  And indeed, if I step across
getrestchar() with a debugger, it's failing on any character that
is more than two bytes wide (returning only the first two bytes),
which probably leaves mbrtowc() in an indeterminate state.  (This
is reading from the "zle -U" buffer so key timeout does not matter.)

Is it possible this is happening because my environment LC_CTYPE
is not set, even though I have LANG=en_US.UTF-8 ?



* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-12 16:46     ` Bart Schaefer
@ 2015-09-12 17:56       ` Peter Stephenson
  2015-09-12 18:02       ` Bart Schaefer
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Stephenson @ 2015-09-12 17:56 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list


On 12 Sep 2015 17:46, Bart Schaefer <schaefer@brasslantern.com> wrote:
> burner% Ą Пётр Ильич Чайковский 梶浦由記~ 
>
> As you can see this is ALMOST correct, except for that unexpected 
> trailing tilde, which must be part of the terminal escape for ending 
> paste-mode? 
>
> Sadly the next time I try pasting, I get this: 
>
> burner% ă<ffffffff> Ѓ<ffffffff>у<ffffffff>тр 
> Ѓ<ffffffff>лу<ffffffff>иу<ffffffff> Чайковский 
> 梶浦烴<ffffffff>訃<ffffffff> 
> Ą Пётр Ильич Чайковский 梶浦由記 
>
> (where all those <ffffffff> are highlighted).

Rather than an issue with meta conversion, that's looking more like a
length miscalculation of a more basic nature or some mixture of effects.

Most likely it's not too far from the code I changed.  I suspect tracing
the string that becomes KEYS might reveal it.

I'll be drinking my Tribute and following with great
amusement (until it becomes my problem).

pws


* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-12  9:57   ` Peter Stephenson
@ 2015-09-12 16:46     ` Bart Schaefer
  2015-09-12 17:56       ` Peter Stephenson
  2015-09-12 18:02       ` Bart Schaefer
  0 siblings, 2 replies; 13+ messages in thread
From: Bart Schaefer @ 2015-09-12 16:46 UTC (permalink / raw)
  To: Zsh hackers list

On Sat, Sep 12, 2015 at 2:57 AM, Peter Stephenson
<p.w.stephenson@ntlworld.com> wrote:
>
> On 11 Sep 2015 23:42, Bart Schaefer <schaefer@brasslantern.com> wrote:
>>
>> This breaks for me with bracketed-paste-magic when pasting the multibyte
>> strings from Test/D07multibyte, specifically "More metafied characters
>> in prompt expansion" test that has several different languages.

I just reverted to the zsh-5.1.1 tag and tried again, and it breaks
there, too, so this is probably not specific to the patch in 36483.

> I won't have the source or anything more than
> phones or tablets for a week, but it might be
> meta aggro again.

Unfortunately I don't know what that refers to.

> I've a vague memory 'a grave' has one, if you
> want an easy check.

I threw in a { zle -M -- "$PASTED"; zle -R } in the read-command loop
and got the following output (hope it comes through OK with the
multibyte in email).  Here is the test string I'm pasting:

Ą Пётр Ильич Чайковский 梶浦由記

And the result (5.1.1 without 36483):

burner% ă Ѓутр Ѓлуиу Чайковский 梶浦烴訃
Ą Пётр Ильич Чайковский 梶浦由記

The first line is what got composed by mbchar+=$KEYS and the second
line is what is actually in $PASTED.  As you can see they match for
some but not all characters.

I then switched back to zsh-5.1.1-dev-0 and tried to repeat this.
Here's where things get really interesting.

The very first time I pasted the test string, I got this:

burner% Ą Пётр Ильич Чайковский 梶浦由記~

As you can see this is ALMOST correct, except for that unexpected
trailing tilde, which must be part of the terminal escape for ending
paste-mode?

Sadly the next time I try pasting, I get this:

burner% ă<ffffffff> Ѓ<ffffffff>у<ffffffff>тр
Ѓ<ffffffff>лу<ffffffff>иу<ffffffff> Чайковский
梶浦烴<ffffffff>訃<ffffffff>
Ą Пётр Ильич Чайковский 梶浦由記

(where all those <ffffffff> are highlighted).  So either there's some
memory corruption, or the internal multibyte parsing state is messed
up, or both.

Is there someone who works in a multibyte character set all the time
who can help with figuring out where this is going wrong?  (Insight
into what happened in the first [5.1.1] case would also be
interesting.)



* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-11 22:42 ` Bart Schaefer
@ 2015-09-12  9:57   ` Peter Stephenson
  2015-09-12 16:46     ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2015-09-12  9:57 UTC (permalink / raw)
  To: zsh-workers


On 11 Sep 2015 23:42, Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sep 11,  9:29pm, Peter Stephenson wrote: 
> } Subject: PATCH: read full multibyte string a bit more sooner 
> } 
> } I'll commit it now the release is out of the way but it'll need some 
> } shaking down. 
>
> This breaks for me with bracketed-paste-magic when pasting the multibyte 
> strings from Test/D07multibyte, specifically "More metafied characters 
> in prompt expansion" test that has several different languages. 
>
> I even tried reverting to the previous bracketed-paste-magic that does 
> not have the [[:INCOMPLETE:]] loop, and it fails in the same way.  I'm 
> not entirely certain, but I think the value in $KEYS is wrong, it no 
> longer works to use PASTED=${PASTED#$KEYS} (nor the $mbchar variation) 
> so the read-command loop never stops. 

I won't have the source or anything more than
phones or tablets for a week, but it might be
meta aggro again.

I've a vague memory 'a grave' has one, if you
want an easy check.

If you want a difficult check Mikael has lots
and lots.

pws



* Re: PATCH: read full multibyte string a bit more sooner
  2015-09-11 20:29 Peter Stephenson
@ 2015-09-11 22:42 ` Bart Schaefer
  2015-09-12  9:57   ` Peter Stephenson
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Schaefer @ 2015-09-11 22:42 UTC (permalink / raw)
  To: Zsh hackers list

On Sep 11,  9:29pm, Peter Stephenson wrote:
} Subject: PATCH: read full multibyte string a bit more sooner
}
} I'll commit it now the release is out of the way but it'll need some
} shaking down.

This breaks for me with bracketed-paste-magic when pasting the multibyte
strings from Test/D07multibyte, specifically "More metafied characters
in prompt expansion" test that has several different languages.

I even tried reverting to the previous bracketed-paste-magic that does
not have the [[:INCOMPLETE:]] loop, and it fails in the same way.  I'm
not entirely certain, but I think the value in $KEYS is wrong, it no
longer works to use PASTED=${PASTED#$KEYS} (nor the $mbchar variation)
so the read-command loop never stops.



* PATCH: read full multibyte string a bit more sooner
@ 2015-09-11 20:29 Peter Stephenson
  2015-09-11 22:42 ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Stephenson @ 2015-09-11 20:29 UTC (permalink / raw)
  To: Zsh hackers list

This is going along the right lines to make KEYS and read-command do the
right thing with self-insert.  I'm a little unhappy at the special
casing, but that aspect isn't obviously any worse than what we did
before.

I'll commit it now the release is out of the way but it'll need some
shaking down.

Some other tests for !lastchar_wide_valid may have become redundant.

pws

diff --git a/Src/Zle/zle_hist.c b/Src/Zle/zle_hist.c
index c61b4ef..0cff039 100644
--- a/Src/Zle/zle_hist.c
+++ b/Src/Zle/zle_hist.c
@@ -1643,7 +1643,7 @@ doisearch(char **args, int dir, int pattern)
 	    } else if (cmd == Th(z_selfinsert)) {
 #ifdef MULTIBYTE_SUPPORT
 		if (!lastchar_wide_valid)
-		    if (getrestchar(lastchar) == WEOF) {
+		    if (getrestchar(lastchar, NULL, NULL) == WEOF) {
 			handlefeep(zlenoargs);
 			continue;
 		    }
@@ -1877,7 +1877,7 @@ getvisrchstr(void)
 	    } else {
 #ifdef MULTIBYTE_SUPPORT
 		if (!lastchar_wide_valid)
-		    if (getrestchar(lastchar) == WEOF) {
+		    if (getrestchar(lastchar, NULL, NULL) == WEOF) {
 			handlefeep(zlenoargs);
 			continue;
 		    }
diff --git a/Src/Zle/zle_keymap.c b/Src/Zle/zle_keymap.c
index 5b4189f..0405c84 100644
--- a/Src/Zle/zle_keymap.c
+++ b/Src/Zle/zle_keymap.c
@@ -1501,6 +1501,20 @@ getkeymapcmd(Keymap km, Thingy *funcp, char **strp)
 	     * they wait till a key is pressed for the movement anyway      */
 	    timeout = !(!virangeflag && !region_active && f && f->widget &&
 		    f->widget->flags & ZLE_VIOPER);
+#ifdef MULTIBYTE_SUPPORT
+	    if ((f == Th(z_selfinsert) || f == Th(z_selfinsertunmeta)) &&
+		!lastchar_wide_valid) {
+		int len;
+		VARARR(char, mbc, MB_CUR_MAX);
+		ZLE_INT_T inchar = getrestchar(lastchar, mbc, &len);
+		if (inchar != WEOF && len) {
+		    char *ptr = mbc;
+		    lastlen += len;
+		    while (len--)
+			addkeybuf(STOUC(*ptr++));
+		}
+	    }
+#endif
 	}
 	if (!ispfx)
 	    break;
@@ -1521,6 +1535,20 @@ getkeymapcmd(Keymap km, Thingy *funcp, char **strp)
     return keybuf;
 }
 
+/**/
+static void
+addkeybuf(int c)
+{
+    if(keybuflen + 3 > keybufsz)
+	keybuf = realloc(keybuf, keybufsz *= 2);
+    if(imeta(c)) {
+	keybuf[keybuflen++] = Meta;
+	keybuf[keybuflen++] = c ^ 32;
+    } else
+	keybuf[keybuflen++] = c;
+    keybuf[keybuflen] = 0;
+}
+
 /*
  * Add a (possibly metafied) byte to the key input so far.
  * This handles individual bytes of a multibyte string separately;
@@ -1542,14 +1570,7 @@ getkeybuf(int w)
 
     if(c < 0)
 	return EOF;
-    if(keybuflen + 3 > keybufsz)
-	keybuf = realloc(keybuf, keybufsz *= 2);
-    if(imeta(c)) {
-	keybuf[keybuflen++] = Meta;
-	keybuf[keybuflen++] = c ^ 32;
-    } else
-	keybuf[keybuflen++] = c;
-    keybuf[keybuflen] = 0;
+    addkeybuf(c);
     return c;
 }
 
diff --git a/Src/Zle/zle_main.c b/Src/Zle/zle_main.c
index ec3d2c3..992f152 100644
--- a/Src/Zle/zle_main.c
+++ b/Src/Zle/zle_main.c
@@ -933,7 +933,7 @@ getfullchar(int do_keytmout)
     int inchar = getbyte((long)do_keytmout, NULL);
 
 #ifdef MULTIBYTE_SUPPORT
-    return getrestchar(inchar);
+    return getrestchar(inchar, NULL, NULL);
 #else
     return inchar;
 #endif
@@ -951,7 +951,7 @@ getfullchar(int do_keytmout)
 
 /**/
 mod_export ZLE_INT_T
-getrestchar(int inchar)
+getrestchar(int inchar, char *outstr, int *outcount)
 {
     char c = inchar;
     wchar_t outchar;
@@ -965,6 +965,8 @@ getrestchar(int inchar)
      */
     lastchar_wide_valid = 1;
 
+    if (outcount)
+	*outcount = 0;
     if (inchar == EOF) {
 	/* End of input, so reset the shift state. */
 	memset(&mbs, 0, sizeof mbs);
@@ -1013,6 +1015,10 @@ getrestchar(int inchar)
 		return lastchar_wide = WEOF;
 	}
 	c = inchar;
+	if (outstr) {
+	    *outstr++ = c;
+	    (*outcount)++;
+	}
     }
     return lastchar_wide = (ZLE_INT_T)outchar;
 }
diff --git a/Src/Zle/zle_misc.c b/Src/Zle/zle_misc.c
index 2d18628..12143e0 100644
--- a/Src/Zle/zle_misc.c
+++ b/Src/Zle/zle_misc.c
@@ -115,9 +115,7 @@ selfinsert(UNUSED(char **args))
     ZLE_CHAR_T tmp;
 
 #ifdef MULTIBYTE_SUPPORT
-    if (!lastchar_wide_valid)
-	if (getrestchar(lastchar) == WEOF)
-	    return 1;
+    DPUTS(!lastchar_wide_valid, "keybuf did not read full wide character");
 #endif
     tmp = LASTFULLCHAR;
     doinsert(&tmp, 1);
@@ -1431,7 +1429,7 @@ executenamedcommand(char *prmt)
 		else {
 #ifdef MULTIBYTE_SUPPORT
 		    if (!lastchar_wide_valid)
-			getrestchar(lastchar);
+			getrestchar(lastchar, NULL, NULL);
 		    if (lastchar_wide == WEOF)
 			feep = 1;
 		    else
diff --git a/Src/Zle/zle_vi.c b/Src/Zle/zle_vi.c
index 42dc46e..86840bd 100644
--- a/Src/Zle/zle_vi.c
+++ b/Src/Zle/zle_vi.c
@@ -151,7 +151,7 @@ vigetkey(void)
 #ifdef MULTIBYTE_SUPPORT
     if (!lastchar_wide_valid)
     {
-	getrestchar(lastchar);
+	getrestchar(lastchar, NULL, NULL);
     }
 #endif
     return LASTFULLCHAR;



end of thread, other threads:[~2015-09-19 20:49 UTC | newest]

Thread overview: 13+ messages
2015-09-12 19:41 PATCH: read full multibyte string a bit more sooner Peter Stephenson
2015-09-12 19:49 ` Bart Schaefer
  -- strict thread matches above, loose matches on Subject: below --
2015-09-12 20:07 Peter Stephenson
2015-09-12 20:35 ` Bart Schaefer
2015-09-12 23:09   ` Bart Schaefer
2015-09-19 19:25     ` Peter Stephenson
2015-09-19 20:49       ` Bart Schaefer
2015-09-11 20:29 Peter Stephenson
2015-09-11 22:42 ` Bart Schaefer
2015-09-12  9:57   ` Peter Stephenson
2015-09-12 16:46     ` Bart Schaefer
2015-09-12 17:56       ` Peter Stephenson
2015-09-12 18:02       ` Bart Schaefer
