From: Peter Stephenson <pws@csr.com>
To: zsh-workers@sunsite.dk (Zsh hackers list)
Subject: PATCH: Solaris multibyte stuff
Date: Mon, 17 Dec 2007 16:58:16 +0000
Message-ID: <15829.1197910696@csr.com>

Thanks to Danek, I was able to trace the multibyte problem on Solaris.
It turns out that mbrlen(), and presumably other functions that
return the number of bytes in a multibyte character, return the
full length of the character rather than the number of bytes remaining,
even if reading one byte at a time has left you in the middle of a
character from a previous call.  So we were miscounting.  Luckily, this
doesn't affect much of the shell, and it's easy to make the code robust
since (obviously) we know how many bytes we've just examined.  However,
it's something to remember for future code.
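
The clamp the patch adds (if mbrlen() reports more bytes than were passed in, use the number passed in) can be sketched on its own as below. This is a minimal illustration and not code from the zsh sources. mb_consumed() and count_mb_chars() are hypothetical names. Like the patch, the sketch only bounds the result by the bytes actually handed to mbrlen(). It does not try to work out how many bytes of a partial character were fed in earlier.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from zsh): turn a raw mbrlen() result into
 * the number of bytes actually consumed, given that only `avail` bytes
 * were passed in.  Returns 0 for an invalid ((size_t)-1) or incomplete
 * ((size_t)-2) sequence so the caller decides what to do.
 */
static size_t mb_consumed(size_t ret, size_t avail)
{
    if (ret == (size_t)-1 || ret == (size_t)-2)
        return 0;
    if (ret == 0)           /* NUL byte: treat as a normal one-byte char */
        return 1;
    if (ret > avail)        /* some mbrlen()s report the full char length */
        return avail;       /* even when resuming mid-character           */
    return ret;
}

/*
 * Hypothetical: count characters completed while scanning buf[0..len),
 * carrying shift state in *mbs so a later call can resume a character
 * split across reads.  An invalid byte counts as one character and the
 * state is reset; a trailing incomplete sequence stays pending in *mbs.
 */
static size_t count_mb_chars(const char *buf, size_t len, mbstate_t *mbs)
{
    size_t nchars = 0;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t ret = mbrlen(buf, len, mbs);
        size_t used;

        if (ret == (size_t)-2)  /* incomplete: all len bytes absorbed */
            break;
        if (ret == (size_t)-1) {        /* invalid: skip a byte, reset */
            memset(mbs, 0, sizeof *mbs);
            used = 1;
        } else {
            used = mb_consumed(ret, len);
        }
        nchars++;
        buf += used;
        len -= used;
    }
    return nchars;
}
```

In the default C locale every byte is one character, so scanning "a\0b" with length 3 counts three characters, including the NUL. Reproducing the Solaris behaviour itself would need a multibyte locale on an affected system.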

One other test was failing: I was expecting $'\u00e9' in the C locale
to fail to convert, but it seems it gets converted silently to a question
mark, so I've tested for that as an alternative.

All the tests now pass.

Index: Src/builtin.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/builtin.c,v
retrieving revision 1.183
diff -u -r1.183 builtin.c
--- Src/builtin.c	12 Dec 2007 18:43:29 -0000	1.183
+++ Src/builtin.c	17 Dec 2007 16:54:01 -0000
@@ -4927,7 +4927,7 @@
 		    break;
 		}
 		*bptr = (char) val;
-#ifdef MULTIBYTE_SUPPORT	
+#ifdef MULTIBYTE_SUPPORT
 		if (isset(MULTIBYTE)) {
 		    ret = mbrlen(bptr++, 1, &mbs);
 		    if (ret == MB_INVALID)
@@ -4954,8 +4954,8 @@
 		    eof = 1;
 		    break;
 		}
-	    
-#ifdef MULTIBYTE_SUPPORT	
+
+#ifdef MULTIBYTE_SUPPORT
 		if (isset(MULTIBYTE)) {
 		    while (val > 0) {
 			ret = mbrlen(bptr, val, &mbs);
@@ -4970,6 +4970,10 @@
 			    }
 			    else if (ret == 0) /* handle null as normal char */
 				ret = 1;
+			    else if (ret > val) {
+				/* Some mbrlen()s return the full char len */
+				ret = val;
+			    }
 			    nchars--;
 			    val -= ret;
 			    bptr += ret;
Index: Src/Zle/zle_utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_utils.c,v
retrieving revision 1.42
diff -u -r1.42 zle_utils.c
--- Src/Zle/zle_utils.c	19 Apr 2007 14:16:23 -0000	1.42
+++ Src/Zle/zle_utils.c	17 Dec 2007 16:54:03 -0000
@@ -294,6 +294,16 @@
 		 * (certainly true for Unicode and unlikely to be false
 		 * in any non-pathological multibyte representation). */
 		cnt = 1;
+	    } else if (cnt > ll) {
+		/*
+		 * Some multibyte implementations return the
+		 * full length of a previous incomplete character
+		 * instead of the remaining length.
+		 * This is paranoia: it only applies if we start
+		 * midway through a multibyte character, which
+		 * presumably can't happen.
+		 */
+		cnt = ll;
 	    }
 
 	    if (outcs) {
@@ -843,6 +853,12 @@
 		cnt = 1;
 		/* FALL THROUGH */
 	    default:
+		/*
+		 * Paranoia: only needed if we start in the middle
+		 * of a multibyte string and only in some implementations.
+		 */
+		if (cnt > ulen)
+		    cnt = ulen;
 		n = wcs_nicechar(c, &width, NULL);
 		break;
 	    }
Index: Test/D07multibyte.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D07multibyte.ztst,v
retrieving revision 1.21
diff -u -r1.21 D07multibyte.ztst
--- Test/D07multibyte.ztst	6 Nov 2007 20:45:09 -0000	1.21
+++ Test/D07multibyte.ztst	17 Dec 2007 16:54:03 -0000
@@ -388,9 +388,18 @@
 # This also isn't strictly multibyte and is here to reduce the
 # likelihood of a "can't do character set conversion" error.
   testfn() { (LC_ALL=C; print $'\u00e9') }
-  repeat 4 testfn
-1:error handling in Unicode quoting
-?testfn: character not in range
-?testfn: character not in range
-?testfn: character not in range
-?testfn: character not in range
+  repeat 4 testfn 2>&1 | while read line; do
+    if [[ $line = *"character not in range"* ]]; then
+      print OK
+    elif [[ $line = "?" ]]; then
+      print OK
+    else
+      print Failed: no error message and no question mark
+    fi
+  done
+  true
+0:error handling in Unicode quoting
+>OK
+>OK
+>OK
+>OK


-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

