zsh-users
From: Roman Perepelitsa <roman.perepelitsa@gmail.com>
To: Bart Schaefer <schaefer@brasslantern.com>
Cc: Zsh Users <zsh-users@zsh.org>
Subject: Re: Slurping a file (was: more spllitting travails)
Date: Mon, 15 Jan 2024 09:53:05 +0100
Message-ID: <CAN=4vMqU1CAMEEuL9dT0D3eqRSdSJE+cRdjr1WNOMsESYoPGNQ@mail.gmail.com>
In-Reply-To: <CAH+w=7bZqKwJT-D8BMRe+Smo70iUzAV3SCFFnG-9HSY=XGMzHw@mail.gmail.com>

On Sun, Jan 14, 2024 at 11:10 PM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> On Sun, Jan 14, 2024 at 2:34 AM Roman Perepelitsa
> <roman.perepelitsa@gmail.com> wrote:
> >
> > On Sat, Jan 13, 2024 at 9:02 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> > >
> > >   IFS= read -rd '' file_content <file
> >
> > In addition to being unable to read files with nul bytes, this
> > solution suffers from additional drawbacks:
> >
> > - It's impossible to distinguish EOF from I/O error.
>
> Pretty sure you can do that by examining $ERRNO on nonzero status?

I wouldn't do that other than for debugging. In general, you can
examine errno only for functions that explicitly document how they set
it. If this part is not documented, you have to assume the function
may set errno to anything both on success and on error. Also, most
libc functions may set errno to anything on success.

In this specific case perhaps `read` calls `malloc` after an I/O
error, which may trash errno. Or perhaps at the end of `read <file`
the file descriptor is closed, which again may trash errno. I haven't
verified either of these things. I am merely suggesting why `read`
conceivably could fail to propagate errno from an I/O error in the
absence of explicit guarantees in the docs.

> I'm curious whether
>   setopt nomultibyte
>   read -u 0 -k 8192 ...
> is actually that much slower in a slurp-like loop.

It is slightly *faster*. For smaller files the difference is about
25%. From 512KB and up there is no discernible difference.

> Another thought:  Use -c count option to get number of bytes read and
> -s $size option to specify buffer size.  If (( $count == $size )) then
> double $size for the next read.

This does not seem to help, although this might be dependent on the
device and filesystem. Here's a benchmark for various file sizes
(rows) and various fixed buffer sizes (columns):

     n   fsize    1KB    2KB     4KB    8KB   16KB   32KB   64KB
     1       0     41     43      43     43     51     52     53
     2       1     47     48      49     48     57     57     59
     3       2     48     48      48     48     56     57     58
     4       4     49     49      48     49     62     61     59
     5       8     74     75      51     49     62     61     63
     6      16     47     51      49     49     57     61     63
     7      32     47     50      49     50     58     58     59
     8      64     54     53      49     50     59     58     71
     9     128     50     50      51     51     59     60     61
    10     256     49     52      51     51     60     61     63
    11     512     53     55      55     54     64     64     65
    12    1024     58     61      60     61     57     68     71
    13    2048     77     72      71     74     83     83     83
    14    4096    112    102      88     89    107    100    108
    15    8192    188    153     152    145    161    163    140
    16   16384    343    290     270    259    265    240    225
    17   32768    658    577     427    471    499    495    489
    18   65536   1281   1082     983    771    938    827    937
    19  131072   2659   2214    2046   1952   1893   1928   1506
    20  262144   4818   4608    4195   4254   3810   3955   3043
    21  524288  10174   8967    7502   6382   7632   6142   7148
    22 1048576  21591  18205   16424  15691  15243  14327  14889
    23 2097152  41156  36087   32731  31840  30104  30090  29913
    24 4194304  89814  72949   66447  62716  60998  60252  59485
    25 8388608 191579 147195  125987 116327 121544 122384 122631

4KB and 8KB buffers perform best in this benchmark across all file
sizes. Given that 8KB is the default for sysread, there is no apparent
reason to use `-s`.

> >       typeset -g REPLY=${(j::)content}
>
> Why the typeset here?  Just assign?

Just a habit from using warn_create_global in my scripts. It catches
typos and missing `local` declarations quite well.

> Sadly there's another utility named "slurp":
>
> slurp
>   cli utility to select a region in a Wayland compositor

That's too bad: "slurp" is a well-known moniker for reading the full
content of a file (https://www.google.com/search?q=file+slurp).

Perhaps zslurp?

Roman.

