zsh-workers
From: Oliver Kiddle <opk@zsh.org>
To: Jun T <takimoto-j@kba.biglobe.ne.jp>
Cc: zsh-workers@zsh.org
Subject: Re: read -d $'\200' doesn't work with set +o multibyte (and [PATCH])
Date: Wed, 14 Dec 2022 22:42:54 +0100
Message-ID: <46661-1671054174.401235@OHsn.sB58.XThR>
In-Reply-To: <1FF79E35-0103-4B80-BA4A-ECC6FD2ADF7E@kba.biglobe.ne.jp>

Jun T wrote:
> Thanks, I think it fixes the problem for the '#ifdef MULTIBYTE_SUPPORT' section.
>
> When MULTIBYTE_SUPPORT is not defined, delim is char, so we need
> STOUC() not when assigning to delim but when using delim.
> But instead of adding STOUC() to every use of delim (in nondef
> MULTIBYTE_SUPPORT section), it would be easier to define delim as int.

At least in my testing, it also appears to work to define delim as
unsigned char, which I would find less confusing.
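
To illustrate the signedness issue being discussed, here is a minimal
sketch outside of zsh. The function names are hypothetical and only
model the comparison between a byte read as an int (as getc() returns
it) and a stored delimiter. The cast in the last function stands in for
STOUC(), which as far as I understand is a cast to unsigned char.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers for illustration only; none of this is zsh code. */

/*
 * Delimiter stored in a plain char.  Where char is signed, a delimiter
 * of 0x80 is promoted to -128 in the comparison, so it never equals
 * the value 128 that getc() would return for that byte.
 */
int matches_plain_char(int c, char delim)
{
    return c == delim;
}

/* Delimiter stored as unsigned char: it is promoted to 0..255. */
int matches_unsigned_char(int c, unsigned char delim)
{
    return c == delim;
}

/*
 * Delimiter kept as plain char but converted at each point of use,
 * which is the "STOUC() when using delim" alternative.
 */
int matches_with_cast(int c, char delim)
{
    return c == (unsigned char)delim;
}
```

Whether matches_plain_char(0x80, (char)0x80) succeeds depends on
whether plain char is signed on the platform, which is why changing the
storage type or casting at the point of use is needed.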

> + print -n $'first line\x80second line\x80' |
> + while read -d $'\x80' line; do print $line; done
> +0:read with a delimeter >= 0x80

There's a typo here: "delimeter" should be "delimiter".

The patch below needs to be applied on top of your patch. It adds a few
more test cases, and documents (and tests) that an empty string is an
alternative way to set the delimiter to NUL. It also addresses the
additional problem I was hitting when trying to reproduce the original
problem. Rather than follow the 0xdc00 + byte suggestion, it was
easier to simply set a separate flag variable and follow the
!isset(MULTIBYTE) path through the later code.
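
As a rough sketch of what following the raw-byte path means (this is
not the code in bin_read(); the function name and signature are
hypothetical), the input is scanned byte by byte with no multibyte
decoding, and each byte is compared as an unsigned value against the
delimiter:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.  Returns the number of
 * bytes in buf that precede the first occurrence of delim, or len if
 * delim does not occur.  Bytes are compared as unsigned values and are
 * never decoded as multibyte characters.
 */
size_t span_before_raw_delim(const char *buf, size_t len, unsigned char delim)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if ((unsigned char)buf[i] == delim)
            break;
    }
    return i;
}
```

With the UTF-8 input "one£two" (where £ is the bytes 0xc2 0xa3) and a
delimiter of 0xc2, this stops after "one", which is the behaviour the
last new test in D07multibyte.ztst expects.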

Oliver

diff --git a/Doc/Zsh/builtins.yo b/Doc/Zsh/builtins.yo
index b6217f66d..56428a714 100644
--- a/Doc/Zsh/builtins.yo
+++ b/Doc/Zsh/builtins.yo
@@ -1589,7 +1589,8 @@ Input is read from the coprocess.
 )
 item(tt(-d) var(delim))(
 Input is terminated by the first character of var(delim) instead of
-by newline.
+by newline.  For compatibility with other shells, if var(delim) is an
+empty string, input is terminated at the first NUL.
 )
 item(tt(-t) [ var(num) ])(
 Test if input is available before attempting to read.  If var(num)
diff --git a/Src/builtin.c b/Src/builtin.c
index a6fadb622..09d0ca2f0 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -6282,6 +6282,7 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
     long izle_timeout = 0;
 #ifdef MULTIBYTE_SUPPORT
     wchar_t delim = L'\n', wc;
+    int rawbyte = 0;
     mbstate_t mbs;
     char *laststart;
     size_t ret;
@@ -6412,9 +6413,11 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
 	    wi = WEOF;
 	if (wi != WEOF)
 	    delim = (wchar_t)wi;
-	else
+	else {
 	    delim = (wchar_t)STOUC((delimstr[0] == Meta) ?
 			      delimstr[1] ^ 32 : delimstr[0]);
+	    rawbyte = 1;
+	}
 #else
         delim = STOUC((delimstr[0] == Meta) ? delimstr[1] ^ 32 : delimstr[0]);
 #endif
@@ -6841,7 +6844,7 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
 		break;
 	    }
 	    *bptr = (char)c;
-	    if (isset(MULTIBYTE)) {
+	    if (isset(MULTIBYTE) && !rawbyte) {
 		ret = mbrtowc(&wc, bptr, 1, &mbs);
 		if (!ret)	/* NULL */
 		    ret = 1;
diff --git a/Test/B04read.ztst b/Test/B04read.ztst
index a2f03c9b3..f50c43682 100644
--- a/Test/B04read.ztst
+++ b/Test/B04read.ztst
@@ -82,6 +82,10 @@
 >Testing the
 >null hypothesis
 
+ read -ed '' <<<$'one\0two'
+0:empty delimiter terminates at nulls
+>one
+
  print -n $'first line\x80second line\x80' |
  while read -d $'\x80' line; do print $line; done
 0:read with a delimeter >= 0x80
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 6909346cb..413c4fe73 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -212,6 +212,20 @@
 >first
 >second
 
+  read -ed £
+0:read with multibyte delimiter where bytes of delimiter also occur in input
+<one¤twoãthree£four
+>one¤twoãthree
+
+  read -ed $'\xa0' <<<$'first\xa0second'
+0:read delimited by a byte that isn't a valid multibyte character
+>first
+
+  read -ed $'\xc2'
+0:read delimited by a single byte terminates if the byte is part of a multibyte character
+<one£two
+>one
+
   (IFS=«
   read -d » -A array
   print -l $array)


Thread overview: 10+ messages
2022-12-09 15:42 read -d $'\200' doesn't work with set +o multibyte Stephane Chazelas
2022-12-09 20:05 ` Oliver Kiddle
2022-12-10  9:06   ` read -d $'\200' doesn't work with set +o multibyte (and [PATCH]) Stephane Chazelas
2022-12-13 11:12     ` Jun T
2022-12-14 21:42       ` Oliver Kiddle [this message]
2022-12-15 12:37         ` Jun. T
2022-12-16  8:29           ` Oliver Kiddle
2022-12-18 10:51             ` Jun. T
2022-12-18 17:58               ` Stephane Chazelas
2022-12-15  2:01     ` Oliver Kiddle
