* [PATCH] fix mulibyte input/mbstate_t problem
@ 2005-02-22 21:20 Andrey Borzenkov
2005-02-23 10:51 ` Peter Stephenson
0 siblings, 1 reply; 3+ messages in thread
From: Andrey Borzenkov @ 2005-02-22 21:20 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 921 bytes --]
Attached patch fixes multibyte input (verified with UTF-8). As it turns out,
mbstate_t works quite differently from expectation :)
The patch makes it static (with implicit initialization). It is fundamentally
wrong to reinitialize it every time. mbstate_t is a function of all preceding
input; for shift state encoding it will also keep current shift state among
other things. It also means that in the long run every input must have own
mbstate_t which is initialized when stream is first opened. We need one
mbstate_t for zle.
It also has small fixes in zsh_utils.
Editing Russian is funny; "echo xxxx" outputs correct text but during line
editing display is wrong (it counts every UTF-8 as 2 screen characters).
BTW calling getbyte from getsrestchar resets lastchar_wide_valid.
-andrey
PS am I the only one to have problems with SourceForge ssh CVS? It does not
hang completely but it is painfully slow.
[-- Attachment #2: mbrtowc.diff --]
[-- Type: text/x-diff, Size: 2378 bytes --]
Index: Src/Zle/zle_main.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_main.c,v
retrieving revision 1.60
diff -u -p -r1.60 zle_main.c
--- Src/Zle/zle_main.c 22 Feb 2005 13:13:05 -0000 1.60
+++ Src/Zle/zle_main.c 22 Feb 2005 21:01:07 -0000
@@ -749,10 +749,10 @@ mod_export ZLE_INT_T
getrestchar(int inchar)
{
/* char cnull = '\0'; */
- char buf[MB_CUR_MAX], *ptr;
+ char c = inchar;
wchar_t outchar;
int ret;
- mbstate_t ps;
+ static mbstate_t ps;
/*
* We are guaranteed to set a valid wide last character,
@@ -764,28 +764,23 @@ getrestchar(int inchar)
if (inchar == EOF)
return lastchar_wide = WEOF;
- /* reset shift state by converting null */
- /* mbrtowc(&outchar, &cnull, 1, &ps); */
- memset (&ps, '\0', sizeof (ps));
-
- ptr = buf;
- *ptr++ = inchar;
/*
* Return may be zero if we have a NULL; handle this like
* any other character.
*/
- while ((ret = mbrtowc(&outchar, buf, ptr - buf, &ps)) < 0) {
+ while ((ret = mbrtowc(&outchar, &c, 1, &ps)) < 0) {
if (ret == -1) {
/*
* Invalid input. Hmm, what's the right thing to do here?
*/
return lastchar_wide = WEOF;
}
+
/* No timeout here as we really need the character. */
inchar = getbyte(0);
if (inchar == EOF)
return lastchar_wide = WEOF;
- *ptr++ = inchar;
+ c = inchar;
}
return lastchar_wide = (ZLE_INT_T)outchar;
}
Index: Src/Zle/zle_utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_utils.c,v
retrieving revision 1.19
diff -u -p -r1.19 zle_utils.c
--- Src/Zle/zle_utils.c 22 Feb 2005 13:13:08 -0000 1.19
+++ Src/Zle/zle_utils.c 22 Feb 2005 21:01:07 -0000
@@ -116,8 +116,8 @@ zlelineasstring(ZLE_STRING_T instr, int
s = zalloc(inll * MB_CUR_MAX + 1);
- for(i=0; i < inll; i++) {
- if (outcs != NULL && i == incs)
+ for(i=0; i < inll; i++, incs--) {
+ if (outcs != NULL && incs == 0)
*outcs = mb_len;
j = wctomb(s + mb_len, instr[i]);
if (j == -1) {
@@ -206,7 +206,7 @@ stringaszleline(unsigned char *instr, in
wchar_t *outptr = outstr;
/* mbrtowc(outstr, &cnull, 1, &ps); */
- memset(&ps, \0, sizeof(ps));
+ memset(&ps, '\0', sizeof(ps));
while (ll) {
size_t ret = mbrtowc(outptr, inptr, ll, &ps);
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] fix mulibyte input/mbstate_t problem
2005-02-22 21:20 [PATCH] fix mulibyte input/mbstate_t problem Andrey Borzenkov
@ 2005-02-23 10:51 ` Peter Stephenson
2005-02-23 11:11 ` Andrey Borzenkov
0 siblings, 1 reply; 3+ messages in thread
From: Peter Stephenson @ 2005-02-23 10:51 UTC (permalink / raw)
To: zsh-workers
Andrey Borzenkov wrote:
> Attached patch fixes multibyte input (verified with UTF-8). As it turns out,
> mbstate_t works quite differently from expectation :)
>
> The patch makes it static (with implicit initialization). It is fundamentally
> wrong to reinitialize it every time. mbstate_t is a function of all preceding
> input; for shift state encoding it will also keep current shift state among
> other things. It also means that in the long run every input must have own
> mbstate_t which is initialized when stream is first opened. We need one
> mbstate_t for zle.
It's not quite so clear in our case. We're not just inputting a stream
of characters, we've got editing characters in between. However,
there's not a lot we can do apart from making consecutively typed
characters work properly, so what you say is the best bet.
We do probably need to reset the shift state each time zleread() is
called to get a complete line, however, so the static in getrestchar()
probably needs to be a global somewhere.
Presumably also leaving the mbrtowc in stringaszleline separate, as it
is at the moment, is also the right thing to do.
> Editing Russian is funny; "echo xxxx" outputs correct text but during line
> editing display is wrong (it counts every UTF-8 as 2 screen characters).
If it works otherwise, e.g. the cursor moves over the right number of
characters (but not screen columns), I guess that's something in
zle_refresh.c. We're not using wcwidth, yet, but it doesn't sound like
that's the problem since it should assume a single column per character.
> BTW calling getbyte from getsrestchar resets lastchar_wide_valid.
Aha, that's probably what Bart meant by "are we setting
lastchar_wide_valid too early", which I misinterpreted. I'll fix it in
the patch I'm doing.
> PS am I the only one to have problems with SourceForge ssh CVS? It does not
> hang completely but it is painfully slow.
I've been having that for a day or two, too.
--
Peter Stephenson <pws@csr.com> Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070
**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.
**********************************************************************
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] fix mulibyte input/mbstate_t problem
2005-02-23 10:51 ` Peter Stephenson
@ 2005-02-23 11:11 ` Andrey Borzenkov
0 siblings, 0 replies; 3+ messages in thread
From: Andrey Borzenkov @ 2005-02-23 11:11 UTC (permalink / raw)
To: zsh-workers
On Wednesday 23 February 2005 13:51, Peter Stephenson wrote:
> We do probably need to reset the shift state each time zleread() is
> called to get a complete line, however, so the static in getrestchar()
> probably needs to be a global somewhere.
>
Yes, I intend make getrestchar to accept mbstate as parameter at some point.
> Presumably also leaving the mbrtowc in stringaszleline separate, as it
> is at the moment, is also the right thing to do.
>
> > Editing Russian is funny; "echo xxxx" outputs correct text but during
> > line editing display is wrong (it counts every UTF-8 as 2 screen
> > characters).
>
> If it works otherwise, e.g. the cursor moves over the right number of
> characters (but not screen columns), I guess that's something in
> zle_refresh.c. We're not using wcwidth, yet, but it doesn't sound like
> that's the problem since it should assume a single column per character.
>
I am rewriting it to use wide char. the problem was that cursor position was
assumed equal to string position which was wrong for multibyte characters.
Having refresh to deal consistently with wide chars simplifies everthing a
lot.
it still does not wcwidth, currently everything is assumed to be exactly 1
char.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2005-02-23 11:12 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-02-22 21:20 [PATCH] fix mulibyte input/mbstate_t problem Andrey Borzenkov
2005-02-23 10:51 ` Peter Stephenson
2005-02-23 11:11 ` Andrey Borzenkov
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/zsh/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).