zsh-workers
* Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
@ 2015-09-06 15:57 Axel Beckert
  2015-09-10 11:17 ` Axel Beckert
  2015-09-10 14:39 ` Bart Schaefer
  0 siblings, 2 replies; 23+ messages in thread
From: Axel Beckert @ 2015-09-06 15:57 UTC
  To: zsh-workers

Hi,

while trying to paste a git commit message after 'git commit -m "'
using a middle click of my mouse into a uxterm, nothing happened
afterwards. The pasted text didn't display and nothing I typed
showed up anymore...

Until I pressed Ctrl-C. Then the text appeared without the prepended
'git commit -m "' on the commandline, plus a new prompt.

This does not happen with "zsh -f", and so far it has only happened
when the pasted text contained a UTF-8 character, namely "…" or "ä".
The position of the UTF-8 character inside the pasted string doesn't
matter (none of the characters in the pasted string show up), nor does
it matter whether I paste into the uxterm with middle click or
Shift-Ins.

The minimal way to reproduce seems the following:

Start a "zsh -f" in a uxterm. Any other UTF-8-capable terminal will
likely show it, too. Run the following two commands:

autoload -Uz bracketed-paste-magic
zle -N bracketed-paste bracketed-paste-magic

Then type "echo ", but don't press enter. Copy a UTF-8 "ä" from
somewhere else with the mouse and paste it after "echo ".

Nothing will happen. Type a few words. Nothing will happen either.

Then press Ctrl-C. The UTF-8 character and the typed words will appear
where the "echo " had been before and you will be back at a new,
emptied prompt.

		Kind regards, Axel
-- 
/~\  Plain Text Ribbon Campaign                   | Axel Beckert
\ /  Say No to HTML in E-Mail and News            | abe@deuxchevaux.org  (Mail)
 X   See http://www.nonhtmlmail.org/campaign.html | abe@noone.org (Mail+Jabber)
/ \  I love long mails: http://email.is-not-s.ms/ | http://abe.noone.org/ (Web)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-06 15:57 Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1 Axel Beckert
@ 2015-09-10 11:17 ` Axel Beckert
  2015-09-10 15:26   ` Yuri D'Elia
  2015-09-10 14:39 ` Bart Schaefer
  1 sibling, 1 reply; 23+ messages in thread
From: Axel Beckert @ 2015-09-10 11:17 UTC
  To: zsh-workers

Hi,

On Sun, Sep 06, 2015 at 05:57:52PM +0200, Axel Beckert wrote:
> while trying to paste a git commit message after 'git commit -m "'
> using a middle click of my mouse into a uxterm, nothing happened
> afterwards. The pasted text didn't display and nothing I typed
> showed up anymore...
> 
> Until I pressed Ctrl-C. Then the text appeared without the prepended
> 'git commit -m "' on the commandline, plus a new prompt.
[...]
> Start a "zsh -f" in a uxterm. Any other UTF-8 capable terminal likely
> does it, too. Run the following two commands:
> 
> autoload -Uz bracketed-paste-magic
> zle -N bracketed-paste bracketed-paste-magic
> 
> Then type "echo ", but don't press enter. Copy a UTF-8 "ä" from
> somewhere else with the mouse and paste it after "echo ".
> 
> Nothing will happen. Type a few words. Nothing will happen either.
> 
> Then press Ctrl-C. The UTF-8 character and the typed words will appear
> where the "echo " had been before and you will be back at a new,
> emptied prompt.

Does nobody have an idea what could cause this?

I'd be glad to see this fixed in 5.1.1 as I consider it quite annoying
(and confusing), but I have no idea what I should look for in the
code.

		Kind regards, Axel



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-06 15:57 Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1 Axel Beckert
  2015-09-10 11:17 ` Axel Beckert
@ 2015-09-10 14:39 ` Bart Schaefer
  2015-09-10 14:57   ` Axel Beckert
  1 sibling, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2015-09-10 14:39 UTC
  To: Axel Beckert, zsh-workers

Hi Axel, I was traveling all day on Tuesday and hadn't caught up
on all zsh list mail until today.

On Sep 6,  5:57pm, Axel Beckert wrote:
}
} This does not happen with "zsh -f" and it only happened so far if the
} pasted text contained a UTF-8 character, namely with "…" or "ä" so
} far. The position of the UTF-8 character inside the pasted string does
} neither matter (none of the characters in the pasted string will show
} up) nor does matter if I paste into the uxterm with middle click or
} Shift-Ins.
} 
} autoload -Uz bracketed-paste-magic
} zle -N bracketed-paste bracketed-paste-magic

OK, this most likely means that either:

(1) There is a problem handling multibyte characters in the built-in
read-command widget; or

(2) There is a problem with using ${PASTED#$KEYS} to remove multi-byte
characters from the beginning of the pasted text.

I'm still dealing with other post-travel stuff so may not get a chance
to look at this today, but perhaps that gives you some ideas.



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 14:39 ` Bart Schaefer
@ 2015-09-10 14:57   ` Axel Beckert
  2015-09-10 15:45     ` Bart Schaefer
  0 siblings, 1 reply; 23+ messages in thread
From: Axel Beckert @ 2015-09-10 14:57 UTC
  To: zsh-workers

Hi Bart,

On Thu, Sep 10, 2015 at 07:39:20AM -0700, Bart Schaefer wrote:
> Hi Axel, I was traveling all day on Tuesday and have not caught up
> on all zsh list mail until today.

No offense meant. I wouldn't have sent that "ping" after less than a
week if 5.1.1 weren't said to be so close.

But indeed, you guys do an impressive job of replying promptly to
nearly every issue! So people might start to wonder whether a mail has
been overlooked or lost in some inbox if there's no reply at all. :-)

> OK, this most likely means that either:
> 
> (1) There is a problem handling multibyte characters in the built-in
> read-command widget; or
> 
> (2) There is a problem with using ${PASTED#$KEYS} to remove multi-byte
> characters from the beginning of the pasted text.

(2) looks fine on a first glance:

PASTED=äoö
KEYS=äo
echo ${PASTED#$KEYS}
ö

> I'm still dealing with other post-travel stuff so may not get a chance
> to look at this today, but perhaps that gives you some ideas.

Maybe we should wait with 5.1.1 a few more days...

		Kind regards, Axel



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 11:17 ` Axel Beckert
@ 2015-09-10 15:26   ` Yuri D'Elia
  0 siblings, 0 replies; 23+ messages in thread
From: Yuri D'Elia @ 2015-09-10 15:26 UTC
  To: zsh-workers

On 10/09/15 13:17, Axel Beckert wrote:
> Does nobody have an idea what could cause this?

I also get the same.

It seems to be triggered by the undo processing in the
(while [[ -n $PASTED ]] && zle .read-command; do ...) block, although
I don't have time to analyze what fails exactly.




* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 14:57   ` Axel Beckert
@ 2015-09-10 15:45     ` Bart Schaefer
  2015-09-10 16:07       ` Peter Stephenson
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2015-09-10 15:45 UTC
  To: zsh-workers

On Sep 10,  4:57pm, Axel Beckert wrote:
}
} > (1) There is a problem handling multibyte characters in the built-in
} > read-command widget; or
} > 
} > (2) There is a problem with using ${PASTED#$KEYS} to remove multi-byte
} > characters from the beginning of the pasted text.
} 
} (2) looks fine on a first glance:
} 
} PASTED=äoö
} KEYS=äo
} echo ${PASTED#$KEYS}
} ö

Yes, but KEYS is a ZLE local so its content may be escaped or something.



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 15:45     ` Bart Schaefer
@ 2015-09-10 16:07       ` Peter Stephenson
  2015-09-10 16:16         ` Bart Schaefer
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Stephenson @ 2015-09-10 16:07 UTC
  To: zsh-workers

On Thu, 10 Sep 2015 08:45:16 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 10,  4:57pm, Axel Beckert wrote:
> }
> } > (1) There is a problem handling multibyte characters in the built-in
> } > read-command widget; or
> } > 
> } > (2) There is a problem with using ${PASTED#$KEYS} to remove multi-byte
> } > characters from the beginning of the pasted text.
> } 
> } (2) looks fine on a first glance:
> } 
> } PASTED=äoö
> } KEYS=äo
> } echo ${PASTED#$KEYS}
> } ö
> 
> Yes, but KEYS is a ZLE local so its content may be escaped or something.

read-command doesn't explicitly handle multibyte characters in the way
self-insert does.  self-insert gets the remainder of the character.
That doesn't work for you because you want KEYS to be correct before the
self-insert.  KEYS refers to the level where you originally read the
keys, in getkeybuf() / getkeymapcmd(), and at this point you've only got
the initial byte.

So I'm not sure what the fix is.  I think it might be (despite what the
comment in getkeymapcmd() currently says) to move the special stuff down
to the end of getkeymapcmd() and postprocess it into keybuf, which isn't
very nice but, well.  I'll look later on.

Can't really see a good reason for not producing a bug fix version anyway,
though, since there's plenty of stuff that needs fixing.

pws



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 16:07       ` Peter Stephenson
@ 2015-09-10 16:16         ` Bart Schaefer
  2015-09-10 16:28           ` Peter Stephenson
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2015-09-10 16:16 UTC
  To: zsh-workers

On Sep 10,  5:07pm, Peter Stephenson wrote:
}
} read-command doesn't explicitly handle multibyte characters in the way
} self-insert does.  self-insert gets the remainder of the character.

Hrm.  So the problem is that a multibyte character isn't explicitly
bound to self-insert, rather the first byte is bound to self-insert
and self-insert knows that when it sees that byte it should read more?

Should be possible to handle that in the loop in bracketed-paste-magic
with a test of $KEYS and a call to read -k.  Might try to get to this
later, out of time now ... or someone else can jump in.



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 16:16         ` Bart Schaefer
@ 2015-09-10 16:28           ` Peter Stephenson
  2015-09-10 18:57             ` Peter Stephenson
  2015-09-10 19:20             ` Bart Schaefer
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Stephenson @ 2015-09-10 16:28 UTC
  To: zsh-workers

On Thu, 10 Sep 2015 09:16:49 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Sep 10,  5:07pm, Peter Stephenson wrote:
> }
> } read-command doesn't explicitly handle multibyte characters in the way
> } self-insert does.  self-insert gets the remainder of the character.
> 
> Hrm.  So the problem is that a multibyte character isn't explicitly
> bound to self-insert, rather the first byte is bound to self-insert
> and self-insert knows that when it sees that byte it should read more?

Right.

> Should be possible to handle that in the loop in bracketed-paste-magic
> with a test of $KEYS and a call to read -k.  Might try to get to this
> later, out of time now ... or someone else can jump in.

Not really clear to me what level the "right" fix is, since this was all
designed for a pre-multibyte world... I don't think we have a test for a
valid/invalid multibyte character at the shell level currently, do we?
Wouldn't be hard to add [[:INCOMPLETE:]] or [[:INVALID:]] to the
pattern code, but that's an extra step...

pws



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 16:28           ` Peter Stephenson
@ 2015-09-10 18:57             ` Peter Stephenson
  2015-09-10 19:35               ` Peter Stephenson
  2015-09-11 21:53               ` Daniel Shahaf
  2015-09-10 19:20             ` Bart Schaefer
  1 sibling, 2 replies; 23+ messages in thread
From: Peter Stephenson @ 2015-09-10 18:57 UTC
  To: Peter Stephenson; +Cc: zsh-workers

On Thu, 10 Sep 2015 17:28:40 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> Wouldn't be hard to add [[:INCOMPLETE:]] or [[:INVALID:]] to the
> pattern code, but that's an extra step...

Easy to write, though slightly less convenient to use than you might
hope.  The point is that we treat invalid and incomplete characters byte
by byte, so you can guarantee to detect [[:INCOMPLETE:]] as the first
byte, but you can't in general guarantee how the rest will be treated,
particularly since we don't insist multibyte means UTF-8.  So
[[:INCOMPLETE:]]* is about the best you can do to determine your
sequence is incomplete.  But in general you can't be sure the sequence
is ever going to be complete anyway, so this isn't so much of a
limitation, and I've documented it.
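A minimal C sketch of the byte-by-byte treatment described above, assuming UTF-8 only (the patch itself goes through mbrtowc() in charref() and deliberately does not assume UTF-8). The names enum mb_kind, classify_utf8() and classify_all() are hypothetical, chosen to echo the patch's ZMB_VALID / ZMB_INCOMPLETE / ZMB_INVALID values: a valid character is consumed whole, anything else one byte at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names echoing the patch's ZMB_* values. */
enum mb_kind { MB_VALID, MB_INCOMPLETE, MB_INVALID };

/*
 * Classify the UTF-8 bytes at s (n > 0 bytes available).
 * On MB_VALID, *len is the character's length in bytes; otherwise
 * *len is 1, because bad sequences are consumed one byte at a time.
 * Simplification: a truncated sequence is called incomplete as long
 * as its lead byte and continuation bytes have the right shape.
 */
static enum mb_kind classify_utf8(const unsigned char *s, size_t n,
                                  size_t *len)
{
    size_t need, i;
    unsigned long cp;

    assert(n > 0);
    *len = 1;
    if (s[0] < 0x80)
        return MB_VALID;                 /* plain ASCII */
    if (s[0] < 0xc2 || s[0] > 0xf4)
        return MB_INVALID;   /* stray continuation or never-valid lead */
    if (s[0] < 0xe0)      { need = 2; cp = s[0] & 0x1f; }
    else if (s[0] < 0xf0) { need = 3; cp = s[0] & 0x0f; }
    else                  { need = 4; cp = s[0] & 0x07; }

    for (i = 1; i < need; i++) {
        if (i == n)
            return MB_INCOMPLETE;   /* good so far, but input ran out */
        if ((s[i] & 0xc0) != 0x80)
            return MB_INVALID;      /* expected a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3f);
    }
    /* Reject overlong 3- and 4-byte forms, surrogates, > U+10FFFF. */
    if ((need == 3 && cp < 0x800) || (need == 4 && cp < 0x10000) ||
        (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
        return MB_INVALID;
    *len = need;
    return MB_VALID;
}

/* Split s into units: one per valid character or per stray byte.
 * out needs room for n entries; returns the number of units. */
static size_t classify_all(const unsigned char *s, size_t n,
                           enum mb_kind *out)
{
    size_t pos = 0, count = 0, len;

    while (pos < n) {
        out[count++] = classify_utf8(s + pos, n - pos, &len);
        pos += len;
    }
    return count;
}
```

With this model, $'\xe3' gives a single MB_INCOMPLETE unit, $'\xe3\x83' gives MB_INCOMPLETE followed by MB_INVALID, and $'\xe3\x83\x9b' gives one MB_VALID unit, which lines up with the first, second and fourth cases in the D07multibyte.ztst test in the patch below. A locale's real decoder may reject some truncated sequences (for example \xe0\x80) as invalid earlier than this sketch does.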

It should now be possible to do more in shell code...

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index d44b40a..de12c85 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -1956,6 +1956,20 @@ ifzman(the zmanref(zshparam) manual page)\
 ifnzman(noderef(Parameters Used By The Shell))\
 .
 )
+item(tt([:INCOMPLETE:]))(
+Matches a byte that starts an incomplete multibyte character.
+Note that there may be a sequence of more than one byte that
+taken together form the prefix of a multibyte character.  To
+test for a potentially incomplete byte sequence, use the pattern
+`tt([[:INCOMPLETE:]]*)'.  This will never match a sequence starting
+with a valid multibyte character.
+)
+item(tt([:INVALID:]))(
+Matches a byte that does not start a valid multibyte character.
+Note this may be a continuation byte of an incomplete multibyte
+character as any part of a multibyte string consisting of invalid and
+incomplete multibyte characters is treated as single bytes.
+)
 item(tt([:WORD:]))(
 The character is treated as part of a word; this test is sensitive
 to the value of the tt(WORDCHARS) parameter
diff --git a/Src/Zle/comp.h b/Src/Zle/comp.h
index 34da2ca..023c418 100644
--- a/Src/Zle/comp.h
+++ b/Src/Zle/comp.h
@@ -202,8 +202,9 @@ struct cpattern {
  * TODO: this will change.
  */
 #ifdef MULTIBYTE_SUPPORT
-#define PATMATCHRANGE(r, c, ip, mtp)	mb_patmatchrange(r, c, ip, mtp)
-#define PATMATCHINDEX(r, i, cp, mtp)	mb_patmatchindex(r, i, cp, mtp)
+#define PATMATCHRANGE(r, c, ip, mtp)		\
+    mb_patmatchrange(r, c, ZMB_VALID, ip, mtp)
+#define PATMATCHINDEX(r, i, cp, mtp)    mb_patmatchindex(r, i, cp, mtp)
 #define CONVCAST(c)			((wchar_t)(c))
 #define CHR_INVALID			(WEOF)
 #else
diff --git a/Src/pattern.c b/Src/pattern.c
index b4ba33e..3b55ccf 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -145,7 +145,7 @@ typedef union upat *Upat;
  *
  *  P_ANY, P_ANYOF:  the operand is a null terminated
  *    string.  Normal characters match as expected.  Characters
- *    in the range Meta+PP_ALPHA..Meta+PP_UNKNWN do the appropriate
+ *    in the range Meta+PP_ALPHA..Meta+PP_UNKWN do the appropriate
  *    Posix range tests.  This relies on imeta returning true for these
  *    characters.  We treat unknown POSIX ranges as never matching.
  *    PP_RANGE means the next two (possibly metafied) characters form
@@ -1119,7 +1119,7 @@ patgetglobflags(char **strp, long *assertp, int *ignore)
 static const char *colon_stuffs[]  = {
     "alpha", "alnum", "ascii", "blank", "cntrl", "digit", "graph", 
     "lower", "print", "punct", "space", "upper", "xdigit", "IDENT",
-    "IFS", "IFSSPACE", "WORD", NULL
+    "IFS", "IFSSPACE", "WORD", "INCOMPLETE", "INVALID", NULL
 };
 
 /*
@@ -1870,9 +1870,9 @@ static int globdots;			/* Glob initial dots? */
 #ifdef MULTIBYTE_SUPPORT
 
 /* Get a character from the start point in a string */
-#define CHARREF(x, y)	charref((x), (y))
+#define CHARREF(x, y)	charref((x), (y), (int *)NULL)
 static wchar_t
-charref(char *x, char *y)
+charref(char *x, char *y, int *zmb_ind)
 {
     wchar_t wc;
     size_t ret;
@@ -1886,9 +1886,13 @@ charref(char *x, char *y)
 	/* Error. */
 	/* Reset the shift state for next time. */
 	memset(&shiftstate, 0, sizeof(shiftstate));
+	if (zmb_ind)
+	    *zmb_ind = (ret == MB_INVALID) ? ZMB_INVALID : ZMB_INCOMPLETE;
 	return WCHAR_INVALID(*x);
     }
 
+    if (zmb_ind)
+	*zmb_ind = ZMB_VALID;
     return wc;
 }
 
@@ -2580,10 +2584,11 @@ patmatch(Upat prog)
 		fail = 1;
 	    else {
 #ifdef MULTIBYTE_SUPPORT
-		wchar_t cr = CHARREF(patinput, patinend);
+		int zmb_ind;
+		wchar_t cr = charref(patinput, patinend, &zmb_ind);
 		char *scanop = (char *)P_OPERAND(scan);
 		if (patglobflags & GF_MULTIBYTE) {
-		    if (mb_patmatchrange(scanop, cr, NULL, NULL) ^
+		    if (mb_patmatchrange(scanop, cr, zmb_ind, NULL, NULL) ^
 			(P_OP(scan) == P_ANYOF))
 			fail = 1;
 		    else
@@ -3351,6 +3356,9 @@ patmatch(Upat prog)
  * The null-terminated specification is in range; the test
  * character is in ch.
  *
+ * zmb_ind is one of the ZMB_* enum values defined in zsh.h, for indicating
+ * incomplete or invalid multibyte characters.
+ *
  * indptr is used by completion matching, which is why this
  * function is exported.  If indptr is not NULL we set *indptr
  * to the index of the character in the range string, adjusted
@@ -3367,7 +3375,7 @@ patmatch(Upat prog)
 
 /**/
 mod_export int
-mb_patmatchrange(char *range, wchar_t ch, wint_t *indptr, int *mtp)
+mb_patmatchrange(char *range, wchar_t ch, int zmb_ind, wint_t *indptr, int *mtp)
 {
     wchar_t r1, r2;
 
@@ -3476,6 +3484,14 @@ mb_patmatchrange(char *range, wchar_t ch, wint_t *indptr, int *mtp)
 		    *indptr += r2 - r1;
 		}
 		break;
+	    case PP_INCOMPLETE:
+		if (zmb_ind == ZMB_INCOMPLETE)
+		    return 1;
+		break;
+	    case PP_INVALID:
+		if (zmb_ind == ZMB_INVALID)
+		    return 1;
+		break;
 	    case PP_UNKWN:
 		DPUTS(1, "BUG: unknown posix range passed through.\n");
 		break;
@@ -3545,6 +3561,8 @@ mb_patmatchindex(char *range, wint_t ind, wint_t *chr, int *mtp)
 	    case PP_IFS:
 	    case PP_IFSSPACE:
 	    case PP_WORD:
+	    case PP_INCOMPLETE:
+	    case PP_INVALID:
 		if (!ind) {
 		    *mtp = swtype;
 		    return 1;
@@ -3698,6 +3716,10 @@ patmatchrange(char *range, int ch, int *indptr, int *mtp)
 		if (indptr && r1 < r2)
 		    *indptr += r2 - r1;
 		break;
+	    case PP_INCOMPLETE:
+	    case PP_INVALID:
+		/* Never true if not in multibyte mode */
+		break;
 	    case PP_UNKWN:
 		DPUTS(1, "BUG: unknown posix range passed through.\n");
 		break;
@@ -3768,6 +3790,8 @@ patmatchindex(char *range, int ind, int *chr, int *mtp)
 	    case PP_IFS:
 	    case PP_IFSSPACE:
 	    case PP_WORD:
+	    case PP_INCOMPLETE:
+	    case PP_INVALID:
 		if (!ind) {
 		    *mtp = swtype;
 		    return 1;
@@ -3851,9 +3875,10 @@ static int patrepeat(Upat p, char *charstart)
     case P_ANYBUT:
 	while (scan < patinend) {
 #ifdef MULTIBYTE_SUPPORT
-	    wchar_t cr = CHARREF(scan, patinend);
+	    int zmb_ind;
+	    wchar_t cr = charref(scan, patinend, &zmb_ind);
 	    if (patglobflags & GF_MULTIBYTE) {
-		if (mb_patmatchrange(opnd, cr, NULL, NULL) ^
+		if (mb_patmatchrange(opnd, cr, zmb_ind, NULL, NULL) ^
 		    (P_OP(p) == P_ANYOF))
 		    break;
 	    } else if (patmatchrange(opnd, (int)cr, NULL, NULL) ^
diff --git a/Src/zsh.h b/Src/zsh.h
index a99c900..4e2cb65 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -1562,13 +1562,15 @@ typedef struct zpc_disables_save *Zpc_disables_save;
 #define PP_IFS    15
 #define PP_IFSSPACE   16
 #define PP_WORD   17
+#define PP_INCOMPLETE 18
+#define PP_INVALID 19
 /* Special value for last definition */
-#define PP_LAST   17
+#define PP_LAST   19
 
 /* Unknown type.  Not used in a valid token. */
-#define PP_UNKWN  18
+#define PP_UNKWN  20
 /* Range: token followed by the (possibly multibyte) start and end */
-#define PP_RANGE  19
+#define PP_RANGE  21
 
 /* Globbing flags: lower 8 bits gives approx count */
 #define GF_LCMATCHUC	0x0100
@@ -1577,6 +1579,15 @@ typedef struct zpc_disables_save *Zpc_disables_save;
 #define GF_MATCHREF	0x0800
 #define GF_MULTIBYTE	0x1000	/* Use multibyte if supported by build */
 
+enum {
+    /* Valid multibyte character from charref */
+    ZMB_VALID,
+    /* Incomplete multibyte character from charref */
+    ZMB_INCOMPLETE,
+    /* Invalid multibyte character from charref */
+    ZMB_INVALID
+};
+
 /* Dummy Patprog pointers. Used mainly in executable code, but the
  * pattern code needs to know about it, too. */
 
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 3fadd80..ace191f 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -525,3 +525,9 @@
     fi
   done
 0:Invalid characters in pattern matching
+
+  [[ $'\xe3' == [[:INCOMPLETE:]] ]] || print fail 1
+  [[ $'\xe3\x83' == [[:INCOMPLETE:]][[:INVALID:]] ]] || print fail 2
+  [[ $'\xe3\x83\x9b' != [[:INCOMPLETE:][:INVALID:]] ]] || print fail 3
+  [[ $'\xe3\x83\x9b' = ? ]] || print fail 4
+0:Testing incomplete and invalid multibyte character components



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 16:28           ` Peter Stephenson
  2015-09-10 18:57             ` Peter Stephenson
@ 2015-09-10 19:20             ` Bart Schaefer
  2015-09-10 19:29               ` Bart Schaefer
  1 sibling, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2015-09-10 19:20 UTC
  To: zsh-workers

On Sep 10,  5:28pm, Peter Stephenson wrote:
} Subject: Re: Pasting UTF-8 characters with bracketed-paste-magic seems bro
}
} On Thu, 10 Sep 2015 09:16:49 -0700
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} 
} > Should be possible to handle that in the loop in bracketed-paste-magic
} > with a test of $KEYS and a call to read -k.  Might try to get to this
} > later, out of time now ... or someone else can jump in.
} 
} Not really clear to me what level the "right" fix is, since this was all
} designed for a pre-multibyte world... I don't think we have a test for a
} valid/invalid multibyte character at the shell level currently, do we?

We can probably get by with some variant of this:

    read-multibyte() {
	local K=$KEYS[-1]
	if (( ##K & 0x80 )); then
	    if (( ##K & 0xe0 == 0xc0 )); then
		read -k 1 K
	    elif (( ##K & 0xf0 == 0xe0 )); then
		read -k 2 K
	    elif (( ##K & 0xf8 == 0xf0 )); then
		read -k 3 K
	    fi
	    KEYS+="$K"
	fi
    }

    while zle .read-command; do
	read-multibyte
	# etc.
    done

Except "read -k" is supposed to read a full multibyte character, so I
don't know if it behaves consistently when invoked in the middle of one.
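For comparison, the lead-byte test in read-multibyte above maps directly onto C. This is a sketch with a hypothetical function name; like the shell function, it only answers "how many more bytes should follow this one" and does not validate the bytes it would then read. One thing to watch when porting: C's == binds more tightly than &, so the masks need parentheses here, whereas zsh's default arithmetic precedence (without the C_PRECEDENCES option) puts & above ==.

```c
/*
 * Number of continuation bytes that follow a UTF-8 lead byte, using the
 * same masks as the read-multibyte sketch.  ASCII, stray continuation
 * bytes and 0xf8..0xff all give 0, as they do in the shell version.
 */
static int utf8_continuation_count(unsigned char lead)
{
    if ((lead & 0x80) == 0)
        return 0;          /* ASCII: a complete character by itself */
    if ((lead & 0xe0) == 0xc0)
        return 1;          /* 110xxxxx: two-byte sequence */
    if ((lead & 0xf0) == 0xe0)
        return 2;          /* 1110xxxx: three-byte sequence */
    if ((lead & 0xf8) == 0xf0)
        return 3;          /* 11110xxx: four-byte sequence */
    return 0;              /* 10xxxxxx or 11111xxx: nothing to read */
}
```

So "ä" (lead byte 0xc3) needs one more byte and the Katakana character \xe3\x83\x9b needs two.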

-- 
Barton E. Schaefer



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 19:20             ` Bart Schaefer
@ 2015-09-10 19:29               ` Bart Schaefer
  2015-09-10 19:53                 ` Peter Stephenson
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2015-09-10 19:29 UTC
  To: zsh-workers

On Sep 10, 12:20pm, Bart Schaefer wrote:
}
} We can probably get by with some variant of this:
} 
}     while zle .read-command; do
} 	read-multibyte
} 	# etc.
}     done

Oh, but "read -k" is always going to read from the terminal?  It does
not use the "zle -U" pushback?  In which case we'd have to loop on
"zle .read-command", even with the [[:INCOMPLETE:]] pattern.

Urk.  I think read-command is going to have to know multibyte ...?



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 18:57             ` Peter Stephenson
@ 2015-09-10 19:35               ` Peter Stephenson
  2015-09-10 23:24                 ` Bart Schaefer
  2015-09-11 21:53               ` Daniel Shahaf
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Stephenson @ 2015-09-10 19:35 UTC
  To: zsh-workers

On Thu, 10 Sep 2015 19:57:13 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> Now should be possible to do more in shell code...

This seems to be getting somewhere, but possibly needs more expert
examination...

I suppose I'd better leave making a build till tomorrow.

pws

diff --git a/Functions/Zle/bracketed-paste-magic b/Functions/Zle/bracketed-paste-magic
index 49f4b66..464c6b3 100644
--- a/Functions/Zle/bracketed-paste-magic
+++ b/Functions/Zle/bracketed-paste-magic
@@ -164,17 +164,25 @@ bracketed-paste-magic() {
 	integer bpm_limit=$UNDO_LIMIT_NO bpm_undo=$UNDO_CHANGE_NO
 	UNDO_LIMIT_NO=$UNDO_CHANGE_NO
 
+	local mbchar
+	integer ismb
 	while [[ -n $PASTED ]] && zle .read-command; do
-	    PASTED=${PASTED#$KEYS}
-	    if [[ $KEYS = ${(~j:|:)${(b)bpm_inactive}} ]]; then
-		zle .self-insert-unmeta
+	    mbchar=$KEYS
+	    ismb=0
+	    while [[ $mbchar = [[:INCOMPLETE:]]* ]] && zle .read-command; do
+		mbchar+=$KEYS
+		ismb=1
+	    done
+	    PASTED=${PASTED#$mbchar}
+	    if [[ ismb -ne 0 || $mbchar = ${(~j:|:)${(b)bpm_inactive}} ]]; then
+		LBUFFER+=$mbchar
 	    else
 		case $REPLY in
 		    (${~bpm_active}) function () {
 			emulate -L $bpm_emulate; set -$bpm_opts
 			zle $REPLY
 		    };;
-		    (*) zle .self-insert-unmeta;;
+		    (*) LBUFFER+=$mbchar;
 		esac
 	    fi
 	done
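A rough C model of the inner loop in this patch, assuming UTF-8 and treating the pasted text as an in-memory byte array. utf8_incomplete() stands in for the [[:INCOMPLETE:]]* test and take_char() for one pass of the outer loop; both are hypothetical helpers, not zsh functions, and in the real widget each byte arrives separately via zle .read-command.

```c
#include <stddef.h>
#include <string.h>

/* Rough UTF-8 equivalent of [[ $mbchar = [[:INCOMPLETE:]]* ]]: true if
 * buf[0..n) is a proper prefix of a sequence (lead byte plus
 * continuation bytes; overlong forms are not checked). */
static int utf8_incomplete(const unsigned char *buf, size_t n)
{
    size_t need, i;

    if (n == 0 || buf[0] < 0xc2 || buf[0] > 0xf4)
        return 0;       /* ASCII, or a byte that cannot start a sequence */
    need = buf[0] < 0xe0 ? 2 : buf[0] < 0xf0 ? 3 : 4;
    for (i = 1; i < n; i++)
        if ((buf[i] & 0xc0) != 0x80)
            return 0;   /* broken sequence: invalid, not incomplete */
    return n < need;
}

/*
 * One pass of the patched loop: take the first "key" byte from pasted,
 * keep taking bytes while what has been collected is still incomplete,
 * then append the whole run to lbuf (the LBUFFER+=$mbchar step).
 * Returns the number of bytes consumed, 0 if pasted is empty.  Stops at
 * the end of pasted rather than waiting for more input; if lbuf is too
 * small the bytes are consumed but not appended.
 */
static size_t take_char(const unsigned char *pasted, size_t len,
                        char *lbuf, size_t lbufsize)
{
    size_t n, used = strlen(lbuf);

    if (len == 0)
        return 0;
    n = 1;                                   /* zle .read-command */
    while (n < len && utf8_incomplete(pasted, n))
        n++;                                 /* read one more byte */
    if (used + n < lbufsize) {
        memcpy(lbuf + used, pasted, n);
        lbuf[used + n] = '\0';
    }
    return n;
}
```

For "ä" (bytes \xc3\xa4) followed by "x", the first call consumes both bytes of "ä" and appends them together, and the next call consumes just "x", so no single byte of the character is ever handed to a widget on its own.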


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 19:29               ` Bart Schaefer
@ 2015-09-10 19:53                 ` Peter Stephenson
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Stephenson @ 2015-09-10 19:53 UTC
  To: Bart Schaefer; +Cc: zsh-workers

On Thu, 10 Sep 2015 12:29:53 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> Oh, but "read -k" is always going to read from the terminal?  It does
> not use the "zle -U" pushback?  In which case we'd have to loop on
> "zle .read-command", even with the [[:INCOMPLETE:]] pattern.
> 
> Urk.  I think read-command is going to have to know multibyte ...?

I don't think that's a problem.  The whole shell is built around the
requirement that multibyte characters are an 8-bit extension of ASCII,
else it would need rewriting from the ground up.  So, even if it's not
UTF-8, the character set needs to have the property that bytes in a
multibyte character are not ASCII characters, or, to put it another way,
every byte is equivalent as far as .read-command is concerned.  So I
think the code I posted using [[:INCOMPLETE:]] should be the core of a
reasonable solution.
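For UTF-8 specifically, the property described above (no byte of a multibyte character falls in the ASCII range) is easy to check exhaustively. The sketch below is illustrative only: utf8_encode() and multibyte_bytes_never_ascii() are hypothetical helpers, and the check says nothing about other multibyte encodings a locale might use.

```c
#include <stddef.h>

/* Encode code point cp (not a surrogate, <= 0x10FFFF) as UTF-8 into
 * out; returns the number of bytes written (1 to 4). */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    out[0] = (unsigned char)(0xf0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (cp & 0x3f));
    return 4;
}

/* Returns 1 if no byte of any non-ASCII character's UTF-8 encoding lies
 * in 0x00..0x7f, checking every Unicode scalar value. */
static int multibyte_bytes_never_ascii(void)
{
    unsigned char buf[4];
    unsigned long cp;
    size_t i, n;

    for (cp = 0x80; cp <= 0x10ffff; cp++) {
        if (cp >= 0xd800 && cp <= 0xdfff)
            continue;                   /* surrogates are not encoded */
        n = utf8_encode(cp, buf);
        for (i = 0; i < n; i++)
            if (buf[i] < 0x80)
                return 0;
    }
    return 1;
}
```

For example, U+00E4 ("ä") encodes as \xc3\xa4 and U+30DB encodes as \xe3\x83\x9b, the byte string used in the D07multibyte.ztst cases; every byte in both is at least 0x80, so none of them can be mistaken for an ASCII key binding.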

Note that if we do upgrade read-command, it's future-proof, since then
we never get something starting with [[:INCOMPLETE:]] (though it may in
principle start with [[:INVALID:]]).

pws




* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 19:35               ` Peter Stephenson
@ 2015-09-10 23:24                 ` Bart Schaefer
  2015-09-11  8:10                   ` Peter Stephenson
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2015-09-10 23:24 UTC
  To: zsh-workers

On Sep 10,  8:35pm, Peter Stephenson wrote:
}
} This seems to be getting somewhere, but possibly needs more expert
} examination...

This diff looks OK, though I'm not sure why you need $ismb ?  Is it just
a shortcut?  (i.e., because if the multibyte character had been bound to
something, then .read-command would have consumed it as one $KEYS, so
we know it can't match $bpm_inactive)

Thinking harder, though ... isn't the only reason that this gets "stuck"
because "zle .self-insert-unmeta" blocks waiting for more bytes?  Do we
really need the loop-within-a-loop given LBUFFER+= instead?  Perhaps
because the trailing part of the multibyte character might be bound to
an active widget?

I keep coming back to how much cleaner this would be if .read-command
consumed a whole character.



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 23:24                 ` Bart Schaefer
@ 2015-09-11  8:10                   ` Peter Stephenson
  2015-09-11  9:42                     ` Axel Beckert
  2015-09-11 15:33                     ` Bart Schaefer
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Stephenson @ 2015-09-11  8:10 UTC
  To: zsh-workers

On Thu, 10 Sep 2015 16:24:15 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 10,  8:35pm, Peter Stephenson wrote:
> }
> } This seems to be getting somewhere, but possibly needs more expert
> } examination...
> 
> This diff looks OK, though I'm not sure why you need $ismb ?  Is it just
> a shortcut?  (i.e., because if the multibyte character had been bound to
> something, then .read-command would have consumed it as one $KEYS, so
> we know it can't match $bpm_inactive)

It didn't work otherwise.  I don't know what the pattern match was doing
that would cause it to go down the wrong way.

> Thinking harder, though ... isn't the only reason that this gets "stuck"
> because "zle .self-insert-unmeta" blocks waiting for more bytes?  Do we
> really need the loop-within-a-loop given LBUFFER+= instead?  Perhaps
> because the trailing part of the multibyte character might be bound to
> an active widget?

That's correct about self-insert, but if we don't have the loop, I'm not
sure we've done enough analysis to decide between executing a command
and putting stuff in the buffer.  But feel free to experiment as you
know the sort of things that should work.

I should say I don't really have any interest in this beyond getting it
basically working as a proof of concept --- if you want to tweak it
further, go ahead.  An elegant fix to the internals of the line editor
can wait.

pws



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-11  8:10                   ` Peter Stephenson
@ 2015-09-11  9:42                     ` Axel Beckert
  2015-09-11 15:33                     ` Bart Schaefer
  1 sibling, 0 replies; 23+ messages in thread
From: Axel Beckert @ 2015-09-11  9:42 UTC
  To: zsh-workers

Hi Peter,

On Fri, Sep 11, 2015 at 09:10:31AM +0100, Peter Stephenson wrote:
> I should say I don't really have any interest in this beyond getting it
> basically working as a proof of concept

Oh, OK. I expected that this affects quite a few people. Maybe I was
wrong. So thanks for having taken care of it anyway!

		Kind regards, Axel



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-11  8:10                   ` Peter Stephenson
  2015-09-11  9:42                     ` Axel Beckert
@ 2015-09-11 15:33                     ` Bart Schaefer
  2015-09-11 17:41                       ` Peter Stephenson
  1 sibling, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2015-09-11 15:33 UTC
  To: zsh-workers

On Sep 11,  9:10am, Peter Stephenson wrote:
}
} I should say I don't really have any interest in this beyond getting it
} basically working as a proof of concept

Given this, if your patch from 36483 seems to work for you, go ahead and
commit it so you can do the 5.1.1 release.



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-11 15:33                     ` Bart Schaefer
@ 2015-09-11 17:41                       ` Peter Stephenson
  2015-09-11 19:22                         ` Bart Schaefer
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Stephenson @ 2015-09-11 17:41 UTC
  To: Bart Schaefer; +Cc: zsh-workers

On Fri, 11 Sep 2015 08:33:44 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 11,  9:10am, Peter Stephenson wrote:
> }
> } I should say I don't really have any interest in this beyond getting it
> } basically working as a proof of concept
> 
> Given this, if your patch from 36483 seems to work for you, go ahead and
> commit it so you can do the 5.1.1 release.

I simply tried it with a few simple accented latin characters and it
seemed to work.  It doesn't sound like it's had much testing otherwise.
I've committed it.

pws



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-11 17:41                       ` Peter Stephenson
@ 2015-09-11 19:22                         ` Bart Schaefer
  2015-09-11 19:49                           ` Axel Beckert
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2015-09-11 19:22 UTC
  To: zsh-workers

On Sep 11,  6:41pm, Peter Stephenson wrote:
}
} > Given this, if your patch from 36483 seems to work for you, go ahead and
} > commit it so you can do the 5.1.1 release.
} 
} I simply tried it with a few simple accented latin characters and it
} seemed to work.  It doesn't sound like it's had much testing otherwise.
} I've committed it.

Now that it's out:

(1) someone who was experiencing the multibyte problem should try it again

(2) it should be tested for the case where a multibyte character is bound
to a widget that has been declared in active-widgets

I'm actually pretty sure this is still broken for obscure cases, but in
fact the whole scheme in which multibyte recognition is buried inside
the self-insert widget is probably also broken for those same cases even
when simply typing.  E.g., if I override the self-insert widget with my
own, my widget is only going to get passed in $KEYS the first byte of
any multibyte character.  (Just verified this with 4.3.17.)

Compare:

  shove-in-LBUFFER() { LBUFFER+=$KEYS }
  zle -N self-insert shove-in-LBUFFER

Against:

  call-self-insert() { zle .self-insert }
  zle -N self-insert call-self-insert

With shove-in-LBUFFER, pasting a three-byte multibyte character results
in interpretation as three separate characters.  With call-self-insert,
the trailing bytes are picked up by getrestchar() and the insert works.
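To make the difference between the two widgets concrete, here is a small Python model, not zsh code, assuming a UTF-8 locale. The helper names are hypothetical. Splitting the input into single bytes mimics shove-in-LBUFFER, where each call only sees one byte in $KEYS. Decoding the whole sequence mimics what getrestchar() achieves for .self-insert. The sample is the three-byte character U+30DB (bytes e3 83 9b).

```python
def per_byte_inserts(data: bytes) -> list:
    """Model of shove-in-LBUFFER: each call sees a single byte in $KEYS."""
    return [bytes([b]) for b in data]


def whole_char_inserts(data: bytes) -> list:
    """Model of calling .self-insert: the trailing bytes are read as well,
    so each code point is inserted exactly once."""
    return list(data.decode("utf-8"))


# KATAKANA LETTER HO, encoded as the three bytes e3 83 9b.
ho = "\u30db".encode("utf-8")
```

If each per-byte piece is decoded on its own, the result is a separate replacement character (U+FFFD) per byte: three "characters" where the whole-sequence version yields one.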

So we're no worse off than before, but this needs re-thinking.



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-11 19:22                         ` Bart Schaefer
@ 2015-09-11 19:49                           ` Axel Beckert
  2015-09-11 20:04                             ` Axel Beckert
  0 siblings, 1 reply; 23+ messages in thread
From: Axel Beckert @ 2015-09-11 19:49 UTC
  To: zsh-workers

Hi,

On Fri, Sep 11, 2015 at 12:22:59PM -0700, Bart Schaefer wrote:
> } > Given this, if your patch from 36483 seems to work for you, go ahead and
> } > commit it so you can do the 5.1.1 release.
> } 
> } I simply tried it with a few simple accented latin characters and it
> } seemed to work.  It doesn't sound like it's had much testing otherwise.
> } I've committed it.
> 
> Now that it's out:
> 
> (1) someone who was experiencing the multibyte problem should try it again

Yep, will do so within the next few hours.

		Kind regards, Axel



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-11 19:49                           ` Axel Beckert
@ 2015-09-11 20:04                             ` Axel Beckert
  0 siblings, 0 replies; 23+ messages in thread
From: Axel Beckert @ 2015-09-11 20:04 UTC
  To: zsh-workers

Hi,

On Fri, Sep 11, 2015 at 09:49:30PM +0200, Axel Beckert wrote:
> On Fri, Sep 11, 2015 at 12:22:59PM -0700, Bart Schaefer wrote:
> > } > Given this, if your patch from 36483 seems to work for you, go ahead and
> > } > commit it so you can do the 5.1.1 release.
> > } 
> > } I simply tried it with a few simple accented latin characters and it
> > } seemed to work.  It doesn't sound like it's had much testing otherwise.
> > } I've committed it.
> > 
> > Now that it's out:
> > 
> > (1) someone who was experiencing the multibyte problem should try it again
> 
> Yep, will do so within the next few hours.

Yep, works for me! *beinghappy* :-)

Thanks Peter and Bart!

		Kind regards, Axel



* Re: Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1
  2015-09-10 18:57             ` Peter Stephenson
  2015-09-10 19:35               ` Peter Stephenson
@ 2015-09-11 21:53               ` Daniel Shahaf
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Shahaf @ 2015-09-11 21:53 UTC
  To: Peter Stephenson; +Cc: Peter Stephenson, zsh-workers

Peter Stephenson wrote on Thu, Sep 10, 2015 at 19:57:13 +0100:
> +  [[ $'\xe3' == [[:INCOMPLETE:]] ]] || print fail 1
> +  [[ $'\xe3\x83' == [[:INCOMPLETE:]][[:INVALID:]] ]] || print fail 2
> +  [[ $'\xe3\x83\x9b' != [[:INCOMPLETE:][:NVALID:]] ]] || print fail 3

Typo: s/NVALID/INVALID/

> +  [[ $'\xe3\x83\x9b' = ? ]] || print fail 4
> +0:Testing incomplete and invalid multibyte character components
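Going by the test above, [:INCOMPLETE:] appears to match a byte that begins a multibyte character cut off by the end of the string. [:INVALID:] appears to match a byte that cannot begin a character at all. Under that reading, $'\xe3\x83' is one incomplete unit followed by one invalid unit. The following is a rough Python model of that classification, assuming UTF-8. It is not zsh's pattern-matching code, and the function names are hypothetical.

```python
def utf8_seq_len(lead: int) -> int:
    """Length implied by a UTF-8 lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte (0x80-0xBF) or a never-valid lead byte


def classify(data: bytes) -> list:
    """Split data into units labelled CHAR, INCOMPLETE or INVALID.

    A complete character is one unit; an incomplete or invalid byte is a
    one-byte unit, and scanning resumes at the following byte."""
    units = []
    i = 0
    while i < len(data):
        need = utf8_seq_len(data[i])
        kind = "CHAR"
        if need == 0:
            kind = "INVALID"
        else:
            for k in range(1, need):
                if i + k >= len(data):
                    kind = "INCOMPLETE"  # string ends mid-character
                    break
                if data[i + k] & 0xC0 != 0x80:
                    kind = "INVALID"  # lead byte without its continuation
                    break
            if kind == "CHAR":
                try:
                    data[i:i + need].decode("utf-8")
                except UnicodeDecodeError:
                    kind = "INVALID"  # e.g. overlong form or surrogate
        units.append(kind)
        i += need if kind == "CHAR" else 1
    return units
```

With this model, b"\xe3" gives ["INCOMPLETE"], b"\xe3\x83" gives ["INCOMPLETE", "INVALID"], and b"\xe3\x83\x9b" gives a single "CHAR". These results line up with tests 1, 2 and 4 above.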


Thread overview: 23+ messages
2015-09-06 15:57 Pasting UTF-8 characters with bracketed-paste-magic seems broken in 5.1 Axel Beckert
2015-09-10 11:17 ` Axel Beckert
2015-09-10 15:26   ` Yuri D'Elia
2015-09-10 14:39 ` Bart Schaefer
2015-09-10 14:57   ` Axel Beckert
2015-09-10 15:45     ` Bart Schaefer
2015-09-10 16:07       ` Peter Stephenson
2015-09-10 16:16         ` Bart Schaefer
2015-09-10 16:28           ` Peter Stephenson
2015-09-10 18:57             ` Peter Stephenson
2015-09-10 19:35               ` Peter Stephenson
2015-09-10 23:24                 ` Bart Schaefer
2015-09-11  8:10                   ` Peter Stephenson
2015-09-11  9:42                     ` Axel Beckert
2015-09-11 15:33                     ` Bart Schaefer
2015-09-11 17:41                       ` Peter Stephenson
2015-09-11 19:22                         ` Bart Schaefer
2015-09-11 19:49                           ` Axel Beckert
2015-09-11 20:04                             ` Axel Beckert
2015-09-11 21:53               ` Daniel Shahaf
2015-09-10 19:20             ` Bart Schaefer
2015-09-10 19:29               ` Bart Schaefer
2015-09-10 19:53                 ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
