From: Roman Perepelitsa
Date: Mon, 15 Jan 2024 09:53:05 +0100
Subject: Re: Slurping a file (was: more spllitting travails)
To: Bart Schaefer
Cc: Zsh Users
References: <205735b2-11e1-4b5e-baa2-7418753f591f@eastlink.ca>
X-Seq: 29487

On Sun, Jan 14, 2024 at 11:10 PM Bart Schaefer wrote:
>
> On Sun, Jan 14, 2024 at 2:34 AM Roman Perepelitsa wrote:
> >
> > On Sat, Jan 13, 2024 at 9:02 PM Bart Schaefer wrote:
> > >
> > > IFS= read -rd '' file_content
> >
> > In addition to being unable to read files with nul bytes, this
> > solution suffers from additional drawbacks:
> >
> > - It's impossible to distinguish EOF from I/O error.
>
> Pretty sure you can do that by examining $ERRNO on nonzero status?

I wouldn't do that other than for debugging. In general, you can
examine errno only for functions that explicitly document how they set
it. If this part is not documented, you have to assume the function
may set errno to anything both on success and on error. Also, most
libc functions may set errno to anything on success.

In this specific case perhaps `read` calls `malloc` after an I/O
error, which may trash errno. Or perhaps at the end of `read

> I'm curious whether
>   setopt nomultibyte
>   read -u 0 -k 8192 ...
> is actually that much slower in a slurp-like loop.

It is slightly *faster*. For smaller files the difference is about
25%. From 512KB and up there is no discernible difference.

> Another thought: Use -c count option to get number of bytes read and
> -s $size option to specify buffer size. If (( $count == $size )) then
> double $size for the next read.

This does not seem to help, although this might be dependent on the
device and filesystem.
Here's a benchmark for various file sizes (rows) and various fixed
buffer sizes (columns):

 n    fsize     1KB     2KB     4KB     8KB    16KB    32KB    64KB
 1        0      41      43      43      43      51      52      53
 2        1      47      48      49      48      57      57      59
 3        2      48      48      48      48      56      57      58
 4        4      49      49      48      49      62      61      59
 5        8      74      75      51      49      62      61      63
 6       16      47      51      49      49      57      61      63
 7       32      47      50      49      50      58      58      59
 8       64      54      53      49      50      59      58      71
 9      128      50      50      51      51      59      60      61
10      256      49      52      51      51      60      61      63
11      512      53      55      55      54      64      64      65
12     1024      58      61      60      61      57      68      71
13     2048      77      72      71      74      83      83      83
14     4096     112     102      88      89     107     100     108
15     8192     188     153     152     145     161     163     140
16    16384     343     290     270     259     265     240     225
17    32768     658     577     427     471     499     495     489
18    65536    1281    1082     983     771     938     827     937
19   131072    2659    2214    2046    1952    1893    1928    1506
20   262144    4818    4608    4195    4254    3810    3955    3043
21   524288   10174    8967    7502    6382    7632    6142    7148
22  1048576   21591   18205   16424   15691   15243   14327   14889
23  2097152   41156   36087   32731   31840   30104   30090   29913
24  4194304   89814   72949   66447   62716   60998   60252   59485
25  8388608  191579  147195  125987  116327  121544  122384  122631

4KB and 8KB buffers perform best in this benchmark across all file
sizes. Given that 8KB is the default for sysread, there is no apparent
reason to use `-s`.

> > typeset -g REPLY=${(j::)content}
>
> Why the typeset here? Just assign?

Just a habit from using warn_create_global in my scripts. It catches
typos and missing `local` declarations quite well.

> Sadly there's another utility named "slurp":
>
> slurp
>   cli utility to select a region in a Wayland compositor

That's too bad: "slurp" is a well-known moniker for reading the full
content of a file (https://www.google.com/search?q=file+slurp).
Perhaps zslurp?

Roman.