From: Oliver Kiddle
To: Jun T
cc: zsh-workers@zsh.org
In-reply-to: <1FF79E35-0103-4B80-BA4A-ECC6FD2ADF7E@kba.biglobe.ne.jp>
References: <20221209154225.2z3lbtf422ypnmjx@chazelas.org>
 <99492-1670616302.663548@1brw.o7tP.wgJL>
 <20221210090626.mkv7bxeqnap6awah@chazelas.org>
 <1FF79E35-0103-4B80-BA4A-ECC6FD2ADF7E@kba.biglobe.ne.jp>
Subject: Re: read -d $'\200' doesn't work with set +o multibyte (and [PATCH])
Date: Wed, 14 Dec 2022 22:42:54 +0100
Message-ID: <46661-1671054174.401235@OHsn.sB58.XThR>
X-Seq: 51214

Jun T wrote:
> Thanks, I think it fixes the problem for the '#ifdef MULTIBYTE_SUPPORT' section.
>
> When MULTIBYTE_SUPPORT is not defined, delim is char, so we need
> STOUC() not when assigning to delim but when using delim.
> But instead of adding STOUC() to every use of delim (in nondef
> MULTIBYTE_SUPPORT section), it would be easier to define delim as int.

At least in my testing, it appears to also work to define delim as
unsigned char which I would find less confusing.

> +  print -n $'first line\x80second line\x80' |
> +  while read -d $'\x80' line; do print $line; done
> +0:read with a delimeter >= 0x80

There's a typo in "delimiter"

The patch below needs to be applied on top of your patch. It adds a few
more test cases, documents (and tests) the empty string being an
alternative way to set the delimiter to NUL.

It also addresses the additional problem I was hitting when trying to
reproduce the original problem. Rather than follow the 0xdc00 + byte
suggestion it was easier to simply set a separate flag variable and
follow the !isset(MULTIBYTE) path through the later code.

Oliver

diff --git a/Doc/Zsh/builtins.yo b/Doc/Zsh/builtins.yo
index b6217f66d..56428a714 100644
--- a/Doc/Zsh/builtins.yo
+++ b/Doc/Zsh/builtins.yo
@@ -1589,7 +1589,8 @@ Input is read from the coprocess.
 )
 item(tt(-d) var(delim))(
 Input is terminated by the first character of var(delim) instead of
-by newline.
+by newline. For compatibility with other shells, if var(delim) is an
+empty string, input is terminated at the first NUL.
 )
 item(tt(-t) [ var(num) ])(
 Test if input is available before attempting to read. If var(num)
diff --git a/Src/builtin.c b/Src/builtin.c
index a6fadb622..09d0ca2f0 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -6282,6 +6282,7 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
     long izle_timeout = 0;
 #ifdef MULTIBYTE_SUPPORT
     wchar_t delim = L'\n', wc;
+    int rawbyte = 0;
     mbstate_t mbs;
     char *laststart;
     size_t ret;
@@ -6412,9 +6413,11 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
             wi = WEOF;
         if (wi != WEOF)
             delim = (wchar_t)wi;
-        else
+        else {
             delim = (wchar_t)STOUC((delimstr[0] == Meta) ?
                                    delimstr[1] ^ 32 : delimstr[0]);
+            rawbyte = 1;
+        }
 #else
         delim = STOUC((delimstr[0] == Meta) ? delimstr[1] ^ 32 : delimstr[0]);
 #endif
@@ -6841,7 +6844,7 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
                 break;
             }
             *bptr = (char)c;
-            if (isset(MULTIBYTE)) {
+            if (isset(MULTIBYTE) && !rawbyte) {
                 ret = mbrtowc(&wc, bptr, 1, &mbs);
                 if (!ret) /* NULL */
                     ret = 1;
diff --git a/Test/B04read.ztst b/Test/B04read.ztst
index a2f03c9b3..f50c43682 100644
--- a/Test/B04read.ztst
+++ b/Test/B04read.ztst
@@ -82,6 +82,10 @@
 >Testing the
 >null hypothesis
 
+  read -ed '' <<<$'one\0two'
+0:empty delimiter terminates at nulls
+>one
+
   print -n $'first line\x80second line\x80' |
   while read -d $'\x80' line; do print $line; done
 0:read with a delimeter >= 0x80
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 6909346cb..413c4fe73 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -212,6 +212,20 @@
 >first
 >second
 
+  read -ed £
+0:read with multibyte delimiter where bytes of delimiter also occur in input
+one¤twoãthree
+
+  read -ed $'\xa0' <<<$'first\xa0second'
+0:read delimited by a byte that isn't a valid multibyte character
+>first
+
+  read -ed $'\xc2'
+0:read delimited by a single byte terminates if the byte is part of a multibyte character
+one
+
   (IFS=« read -d » -A array
   print -l $array)
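The unsigned char suggestion and the raw-byte path can be illustrated with a small standalone C sketch. It is not code from zsh: read_until() and its parameters are hypothetical, and it only models the plain byte-comparison path (MULTIBYTE unset, or the rawbyte flag set), not the mbrtowc() branch. Holding the delimiter as unsigned char means a byte such as 0x80 promotes to 128 and compares equal to an input byte held in an int. A plain char delimiter on a platform where char is signed would instead hold -128 and never match.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not zsh's bin_read(): copy bytes from the
 * len-byte buffer `in` into `out` until the byte `delim` is seen,
 * comparing raw bytes only (no multibyte decoding).
 *
 * `delim` is unsigned char on purpose: each input byte is held in an
 * int in the range 0..255, so the delimiter must promote to the same
 * range.  A signed char holding 0x80 would be -128 and never match.
 *
 * Returns the number of bytes copied; `out` is always NUL-terminated.
 */
static size_t read_until(const char *in, size_t len, unsigned char delim,
                         char *out, size_t outsz)
{
    size_t n = 0;

    assert(outsz > 0);                  /* need room for the terminator */
    for (size_t i = 0; i < len && n + 1 < outsz; i++) {
        int c = (unsigned char)in[i];   /* 0..255, never negative */
        if (c == delim)                 /* delim also promotes to 0..255 */
            break;
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Because the comparison is byte-by-byte, a delimiter of 0xC2 stops at the first byte of a UTF-8 character such as £ (C2 A3), which is the behaviour the new D07multibyte test title describes. Passing 0 as the delimiter stops at the first NUL, matching the documented empty-string case.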