zsh-workers
 help / color / mirror / code / Atom feed
* [PATCH] fix mulibyte input/mbstate_t problem
@ 2005-02-22 21:20 Andrey Borzenkov
  2005-02-23 10:51 ` Peter Stephenson
  0 siblings, 1 reply; 3+ messages in thread
From: Andrey Borzenkov @ 2005-02-22 21:20 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 921 bytes --]

Attached patch fixes multibyte input (verified with UTF-8). As it turns out, 
mbstate_t works quite differently from expectation :)

The patch makes it static (with implicit initialization). It is fundamentally 
wrong to reinitialize it every time. mbstate_t is a function of all preceding 
input; for shift state encoding it will also keep current shift state among 
other things. It also means that in the long run every input must have own 
mbstate_t which is initialized when stream is first opened. We need one 
mbstate_t for zle.

It also has small fixes in zsh_utils.

Editing Russian is funny; "echo xxxx" outputs correct text but during line 
editing display is wrong (it counts every UTF-8 as 2 screen characters).

BTW calling getbyte from getsrestchar resets lastchar_wide_valid.

-andrey

PS am I the only one to have problems with SourceForge ssh CVS? It does not 
hang completely but it is painfully slow.

[-- Attachment #2: mbrtowc.diff --]
[-- Type: text/x-diff, Size: 2378 bytes --]

Index: Src/Zle/zle_main.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_main.c,v
retrieving revision 1.60
diff -u -p -r1.60 zle_main.c
--- Src/Zle/zle_main.c	22 Feb 2005 13:13:05 -0000	1.60
+++ Src/Zle/zle_main.c	22 Feb 2005 21:01:07 -0000
@@ -749,10 +749,10 @@ mod_export ZLE_INT_T
 getrestchar(int inchar)
 {
     /* char cnull = '\0'; */
-    char buf[MB_CUR_MAX], *ptr;
+    char c = inchar;
     wchar_t outchar;
     int ret;
-    mbstate_t ps;
+    static mbstate_t ps;
 
     /*
      * We are guaranteed to set a valid wide last character,
@@ -764,28 +764,23 @@ getrestchar(int inchar)
     if (inchar == EOF)
 	return lastchar_wide = WEOF;
 
-    /* reset shift state by converting null */
-    /* mbrtowc(&outchar, &cnull, 1, &ps); */
-    memset (&ps, '\0', sizeof (ps));
-
-    ptr = buf;
-    *ptr++ = inchar;
     /*
      * Return may be zero if we have a NULL; handle this like
      * any other character.
      */
-    while ((ret = mbrtowc(&outchar, buf, ptr - buf, &ps)) < 0) {
+    while ((ret = mbrtowc(&outchar, &c, 1, &ps)) < 0) {
 	if (ret == -1) {
 	    /*
 	     * Invalid input.  Hmm, what's the right thing to do here?
 	     */
 	    return lastchar_wide = WEOF;
 	}
+
 	/* No timeout here as we really need the character. */
 	inchar = getbyte(0);
 	if (inchar == EOF)
 	    return lastchar_wide = WEOF;
-	*ptr++ = inchar;
+	c = inchar;
     }
     return lastchar_wide = (ZLE_INT_T)outchar;
 }
Index: Src/Zle/zle_utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_utils.c,v
retrieving revision 1.19
diff -u -p -r1.19 zle_utils.c
--- Src/Zle/zle_utils.c	22 Feb 2005 13:13:08 -0000	1.19
+++ Src/Zle/zle_utils.c	22 Feb 2005 21:01:07 -0000
@@ -116,8 +116,8 @@ zlelineasstring(ZLE_STRING_T instr, int 
 
     s = zalloc(inll * MB_CUR_MAX + 1);
 
-    for(i=0; i < inll; i++) {
-	if (outcs != NULL && i == incs)
+    for(i=0; i < inll; i++, incs--) {
+	if (outcs != NULL && incs == 0)
 	    *outcs = mb_len;
 	j = wctomb(s + mb_len, instr[i]);
 	if (j == -1) {
@@ -206,7 +206,7 @@ stringaszleline(unsigned char *instr, in
 	wchar_t *outptr = outstr;
 
 	/* mbrtowc(outstr, &cnull, 1, &ps); */
-	memset(&ps, \0, sizeof(ps));
+	memset(&ps, '\0', sizeof(ps));
 
 	while (ll) {
 	    size_t ret = mbrtowc(outptr, inptr, ll, &ps);

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] fix mulibyte input/mbstate_t problem
  2005-02-22 21:20 [PATCH] fix mulibyte input/mbstate_t problem Andrey Borzenkov
@ 2005-02-23 10:51 ` Peter Stephenson
  2005-02-23 11:11   ` Andrey Borzenkov
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Stephenson @ 2005-02-23 10:51 UTC (permalink / raw)
  To: zsh-workers

Andrey Borzenkov wrote:
> Attached patch fixes multibyte input (verified with UTF-8). As it turns out, 
> mbstate_t works quite differently from expectation :)
> 
> The patch makes it static (with implicit initialization). It is fundamentally
> wrong to reinitialize it every time. mbstate_t is a function of all preceding
> input; for shift state encoding it will also keep current shift state among 
> other things. It also means that in the long run every input must have own 
> mbstate_t which is initialized when stream is first opened. We need one 
> mbstate_t for zle.

It's not quite so clear in our case.  We're not just inputting a stream
of characters, we've got editing characters in between.  However,
there's not a lot we can do apart from making consecutively typed
characters work properly, so what you say is the best bet.

We do probably need to reset the shift state each time zleread() is
called to get a complete line, however, so the static in getrestchar()
probably needs to be a global somewhere.

Presumably also leaving the mbrtowc in stringaszleline separate, as it
is at the moment, is also the right thing to do.

> Editing Russian is funny; "echo xxxx" outputs correct text but during line 
> editing display is wrong (it counts every UTF-8 as 2 screen characters).

If it works otherwise, e.g. the cursor moves over the right number of
characters (but not screen columns), I guess that's something in
zle_refresh.c.  We're not using wcwidth, yet, but it doesn't sound like
that's the problem since it should assume a single column per character.

> BTW calling getbyte from getsrestchar resets lastchar_wide_valid.

Aha, that's probably what Bart meant by "are we setting
lastchar_wide_valid too early", which I misinterpreted.  I'll fix it in
the patch I'm doing.

> PS am I the only one to have problems with SourceForge ssh CVS? It does not 
> hang completely but it is painfully slow.

I've been having that for a day or two, too.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

**********************************************************************


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] fix mulibyte input/mbstate_t problem
  2005-02-23 10:51 ` Peter Stephenson
@ 2005-02-23 11:11   ` Andrey Borzenkov
  0 siblings, 0 replies; 3+ messages in thread
From: Andrey Borzenkov @ 2005-02-23 11:11 UTC (permalink / raw)
  To: zsh-workers

On Wednesday 23 February 2005 13:51, Peter Stephenson wrote:
> We do probably need to reset the shift state each time zleread() is
> called to get a complete line, however, so the static in getrestchar()
> probably needs to be a global somewhere.
>

Yes, I intend make getrestchar to accept mbstate as parameter at some point.

> Presumably also leaving the mbrtowc in stringaszleline separate, as it
> is at the moment, is also the right thing to do.
>
> > Editing Russian is funny; "echo xxxx" outputs correct text but during
> > line editing display is wrong (it counts every UTF-8 as 2 screen
> > characters).
>
> If it works otherwise, e.g. the cursor moves over the right number of
> characters (but not screen columns), I guess that's something in
> zle_refresh.c.  We're not using wcwidth, yet, but it doesn't sound like
> that's the problem since it should assume a single column per character.
>

I am rewriting it to use wide char. the problem was that cursor position was 
assumed equal to string position which was wrong for multibyte characters. 
Having refresh to deal consistently with wide chars simplifies everthing a 
lot.

it still does not wcwidth, currently everything is assumed to be exactly 1 
char.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2005-02-23 11:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-02-22 21:20 [PATCH] fix mulibyte input/mbstate_t problem Andrey Borzenkov
2005-02-23 10:51 ` Peter Stephenson
2005-02-23 11:11   ` Andrey Borzenkov

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).