zsh-workers
From: Sebastian Gniazdowski <psprint@zdharma.org>
To: zsh-workers@zsh.org
Subject: Re: An idea for fast "last-N-lines" read
Date: Thu, 23 Mar 2017 11:17:26 +0100
Message-ID: <etPan.58d3a0b6.19495cff.10ab3@MacMini.local>
In-Reply-To: <etPan.58d39baa.74b0dc51.10ab3@MacMini.local>

On 23 March 2017 at 05:05:33, Bart Schaefer (schaefer@brasslantern.com) wrote:
> This could, however, be made a lot better, e.g. by introducing a cache 
> of mapped files into mapfile.c and causing get_contents() to first use 
> the cache (and setpmmapfile to update it, unsetpmmapfile to erase an 
> entry from it) before resorting to remapping the actual file. 

I think this should be done (I might get to it myself at some point). As for the "last-N-lines" use case, one could do ${${mapfile[name]}[-250000,-1]}, assuming the average line length has been computed to be 250, so that the 250000-character slice is expected to hold the 1000 lines needed. I tried this on a log that I aggregate:

% lines=(); typeset -F SECONDS=0; lines=( "${(@f)${${mapfile[input.db]}[-250000,-1]}}" ); echo $SECONDS, ${#lines} 
0.3925410000, 1536 

Doing this the traditional way:

% lines=(); typeset -F SECONDS=0; lines=( ${"${(@f)"$(<input.db)"}"[-1000,-1]} ); echo $SECONDS, ${#lines} 
0.8828770000, 1000 

Without slicing: 

% lines=(); typeset -F SECONDS=0; lines=( "${(@f)"$(<input.db)"}" ); echo $SECONDS, ${#lines} 
0.8219170000, 38707 

I would have expected mapfile to perform a little better. For an input file with 12500 lines, the result is 0.1625320000, 1151. So the time rises considerably, from 162 ms to 400 ms, when the input file is larger. It could be just constant. It looks like [-250000,-1] does the typical multibyte string iteration over the buffer to establish the offset. If one just took the last 250000 bytes, and did this at the mapfile level, the time would be constant. I am not sure whether Unicode could be broken for the whole buffer this way; judging from how Zsh handles Unicode, only the first few characters could be broken. For finding the last N lines, I wonder how '\n' should be handled in a Unicode buffer that is read in reverse.

> } BTW, (@f) skips trailing \n\n... That's quite problematic and there's 
> } probably no workaround? 
> 
> In double quotes, (@f) retains empty elements, which includes making 
> empty elements out of trailing newlines. However, there is no way to 
> get $(<file) [or $(<<<string) etc.] to retain trailing newlines, which 
> is most likely what's misleading you.  

Ah, thanks. I wonder how sysread would perform, and what happens with metafication when using it.

-- 
Sebastian Gniazdowski 
psprint [at] zdharma.org



Thread overview: 4+ messages
2017-03-21  6:04 Sebastian Gniazdowski
2017-03-23  3:53 ` Bart Schaefer
     [not found]   ` <etPan.58d39baa.74b0dc51.10ab3@MacMini.local>
2017-03-23 10:17     ` Sebastian Gniazdowski [this message]
2017-03-23 16:46       ` Bart Schaefer
