zsh-workers
From: "Jun. T" <takimoto-j@kba.biglobe.ne.jp>
To: zsh-workers@zsh.org
Subject: Re: read -d $'\200' doesn't work with set +o multibyte (and [PATCH])
Date: Sun, 18 Dec 2022 19:51:22 +0900
Message-ID: <5D83D776-4F97-499D-8848-A680F712DD31@kba.biglobe.ne.jp>
In-Reply-To: <18686-1671179384.136789@8qJu.Y1PF.BJgr>


> 2022/12/16 17:29, Oliver Kiddle <opk@zsh.org> wrote:
> 
>>> +  read -ed $'\xc2'
>>> +0:read delimited by a single byte terminates if the byte is part of a multibyte character
>>> +<one£two
>>> +>one
>> 
>> Is this really what the standard requires (or will require)?
>> Breaking in the middle of a valid multibyte character looks
>> rather odd to me.
> 
> The proposed standard wording appears to only talk about the case of the
> delimiter consisting of "one single-byte character". $'\xc2' is not a
> valid UTF-8 character so my interpretation is that they are leaving this
> undefined.

I thought the "one single-byte character" wording (and the like) applies
only when the C or POSIX locale is in use.

> Behaviour that treats the input as raw bytes for a raw byte delimiter
> is consistent. This retains compatibility with the way things
> work for a non-multibyte locale. Not all files are valid UTF-8 and it
> can be useful to force things to work at a raw byte level.

I was thinking it would be enough if we could do byte-by-byte analysis by
using the C/POSIX locale (or by turning the MULTIBYTE option off).

On the web page Stephane mentioned:
https://austingroupbugs.net/view.php?id=243#c6091

"When the current locale is not the C or POSIX locale, pathnames can contain bytes that do not form part of a valid character, and therefore portable applications need to ensure that the current locale is the C or POSIX locale when using read with arbitrary pathnames as input."

But I'm not familiar with this type of document.




Thread overview: 10+ messages
2022-12-09 15:42 read -d $'\200' doesn't work with set +o multibyte Stephane Chazelas
2022-12-09 20:05 ` Oliver Kiddle
2022-12-10  9:06   ` read -d $'\200' doesn't work with set +o multibyte (and [PATCH]) Stephane Chazelas
2022-12-13 11:12     ` Jun T
2022-12-14 21:42       ` Oliver Kiddle
2022-12-15 12:37         ` Jun. T
2022-12-16  8:29           ` Oliver Kiddle
2022-12-18 10:51             ` Jun. T [this message]
2022-12-18 17:58               ` Stephane Chazelas
2022-12-15  2:01     ` Oliver Kiddle

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
