From: Roman Perepelitsa <roman.perepelitsa@gmail.com>
To: Bart Schaefer <schaefer@brasslantern.com>
Cc: Zsh Users <zsh-users@zsh.org>
Subject: Re: A comment about "slurp" and -o multibyte
Date: Wed, 17 Jan 2024 07:07:50 +0100
Message-ID: <CAN=4vMrAKoZ-R7kaCpESkEU4PuBM-8Yxig1d1Jg7QLqNK7jMNw@mail.gmail.com>
In-Reply-To: <CAH+w=7YpEjmzROcrOsqwJ+EMsa7dQUMFQKJoY7YqFC1VpBGtzQ@mail.gmail.com>

On Wed, Jan 17, 2024 at 4:46 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sun, Jan 14, 2024 at 2:34 AM Roman Perepelitsa
> <roman.perepelitsa@gmail.com> wrote:
> >
> >     function slurp() {
> >       emulate -L zsh -o no_multibyte
> > [...]
> >       typeset -g REPLY=${(j::)content}
> >     }
>
> Although the function faithfully reads the input stream into $REPLY,
> later references to $REPLY with the multibyte option back in effect
> will (re-)interpret the content as multibyte characters.  This may not
> be what's desired.
>
> % slurp < =zsh
> % () {
> print $#REPLY
> print ${(m)#REPLY}
> print ${(mm)#REPLY}
> setopt localoptions nomultibyte
> print $#REPLY
> }
> 872903  <-- number of characters
> 873259  <-- width of printable characters
> 872383  <-- number of glyphs
> 878288  <-- actual number of bytes
>
> (Of course those first three numbers are all garbage because it's just
> interpreting an executable as wide character text.)

To me this behavior looks as expected. It's consistent with `read`,
`sysread` and command substitution (`$(<file)`).

    % head -c $((1 << 20)) </dev/urandom | tr '\0' x >1MB
    % slurp <1MB
    % IFS= read -rd '' read <1MB
    % zmodload zsh/system   # provides the sysread builtin
    % sysread -s $((1 << 20)) sysread <1MB
    % procsubst=${"$(<1MB; print -n .)"%.}
    % () {
      print -r -- $#REPLY $#read $#sysread $#procsubst
      setopt local_options no_multibyte
      print -r -- $#REPLY $#read $#sysread $#procsubst
    }
    1008389 1008389 1008389 1008389
    1048576 1048576 1048576 1048576

Roman.

