From: Oliver Kiddle
To: "Jun. T"
cc: zsh-workers@zsh.org
Subject: Re: read -d $'\200' doesn't work with set +o multibyte (and [PATCH])
Date: Fri, 16 Dec 2022 09:29:44 +0100
Message-ID: <18686-1671179384.136789@8qJu.Y1PF.BJgr>

"Jun. T" wrote:
> > --- a/Test/B04read.ztst
> > +++ b/Test/B04read.ztst
> (snip)
> > + read -ed $'\xc2'
> > +0:read delimited by a single byte terminates if the byte is part of a multibyte character
> > +
> +>one
>
> Is this really what the standard requires (or will require)?
> Breaking in the middle of a valid multibyte character looks
> rather odd to me.

The proposed standard wording appears to cover only the case of a
delimiter consisting of "one single-byte character". On its own,
$'\xc2' is not a valid UTF-8 character, so my interpretation is that
they are leaving this case undefined.

Treating the input as raw bytes when the delimiter is a raw byte is
consistent. It also retains compatibility with the way things work in
a non-multibyte locale. Not all files are valid UTF-8, and it can be
useful to force things to work at the raw byte level.

The only alternative I can think of would be to report an error for
such a delimiter. Did you have something else in mind?

Oliver
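To make the raw-byte interpretation concrete, here is a minimal Python sketch. It is not zsh code, and the helper names read_until_byte and is_valid_utf8_char are hypothetical, chosen only for illustration. It assumes the input is scanned as bytes, so a delimiter of 0xC2 stops at the first byte of the two-byte UTF-8 encoding of U+00A0 (C2 A0), the same kind of mid-character split the test description refers to. The second helper shows the check the error-reporting alternative would need: a lone 0xC2 does not decode as a UTF-8 character.

```python
# Illustrative sketch of raw-byte delimiter semantics (hypothetical
# helpers, not zsh's implementation).

def read_until_byte(data: bytes, delim: int) -> bytes:
    """Return the bytes before the first occurrence of `delim`.

    The input is scanned byte by byte, so the delimiter can match a
    byte in the middle of a multibyte UTF-8 sequence.  If `delim`
    never occurs, the whole input is returned.
    """
    end = data.find(bytes([delim]))
    return data if end == -1 else data[:end]


def is_valid_utf8_char(delim: bytes) -> bool:
    """True if `delim` decodes as exactly one UTF-8 character."""
    try:
        return len(delim.decode("utf-8")) == 1
    except UnicodeDecodeError:
        return False


# U+00A0 (no-break space) is encoded in UTF-8 as the two bytes C2 A0.
data = "one\u00a0two".encode("utf-8")         # b'one\xc2\xa0two'
print(read_until_byte(data, 0xC2))             # b'one'
print(is_valid_utf8_char(b"\xc2"))             # False: a lone lead byte
print(is_valid_utf8_char("\u00a0".encode()))   # True
```

Under byte semantics the result depends only on the delimiter byte, which matches how a non-multibyte locale behaves; a validity check like the second helper would only be needed if such delimiters were rejected instead.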