Re: Why sourcing a file is not faster than doing a loop with eval, zle -N

zsh-users

* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Stephane Chazelas @ 2017-06-19 12:24 UTC
  To: Sebastian Gniazdowski; +Cc: zsh-users

Note:

$ time zsh -c 'repeat 100 . ./fsh_cache'
[...]
./fsh_cache:zle:269: invalid widget `.menu-select'
./fsh_cache:zle:269: invalid widget `.menu-select'
zsh -c 'repeat 100 . ../hacking-private/FSH/fsh_cache'  1.13s user 0.98s system 99% cpu 2.109 total

A lot of "system" time.

$ wc ./fsh_cache
  554  2964 58524 ./fsh_cache

$ strace -c  zsh -c '. ./fsh_cache'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000996           0     60022           rt_sigprocmask


60022 calls to rt_sigprocmask seems a bit much. They all seem to come from this backtrace:

#0  0x00007ffff730d730 in __sigprocmask (how=1, set=0x7fffffffb1c0, oset=0x7fffffffb120) at ../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#1  0x000000000049b2c0 in signal_unblock (set=...) at signals.c:274
#2  0x00000000004580ac in shingetline () at input.c:148
#3  0x000000000045899b in inputline () at input.c:278
#4  0x000000000045882a in ingetc () at input.c:226
#5  0x000000000046211e in gettok () at lex.c:611
#6  0x000000000046183b in zshlex () at lex.c:275
#7  0x0000000000484825 in parse_event (endtok=37) at parse.c:569
#8  0x0000000000453f6e in loop (toplevel=0, justonce=0) at init.c:146
#9  0x0000000000456db0 in source (s=0x708930 "../hacking-private/FSH/fsh_cache") at init.c:1386
#10 0x0000000000425a0e in bin_dot (name=0x7ffff7ff2550 ".", argv=0x7ffff7ff25b0, ops=0x7fffffffd980, func=0) at builtin.c:5699
#11 0x00000000004105ff in execbuiltin (args=0x7ffff7ff2580, assigns=0x0, bn=0x6dc7c0 <builtins+384>) at builtin.c:485
#12 0x0000000000437fd4 in execcmd_exec (state=0x7fffffffe300, eparams=0x7fffffffdef0, input=0, output=0, how=18, last1=1) at exec.c:3958
#13 0x0000000000431a50 in execpline2 (state=0x7fffffffe300, pcode=131, how=18, input=0, output=0, last1=1) at exec.c:1873
#14 0x0000000000430665 in execpline (state=0x7fffffffe300, slcode=4098, how=18, last1=1) at exec.c:1602
#15 0x000000000042f95a in execlist (state=0x7fffffffe300, dont_change_job=0, exiting=1) at exec.c:1360
#16 0x000000000042efd4 in execode (p=0x7ffff7ff2488, dont_change_job=0, exiting=1, context=0x4c37a2 "cmdarg") at exec.c:1141
#17 0x000000000042ee9c in execstring (s=0x7fffffffe772 ". ../hacking-private/FSH/fsh_cache", dont_change_job=0, exiting=1,
    context=0x4c37a2 "cmdarg") at exec.c:1107
#18 0x0000000000456a61 in init_misc (cmd=0x7fffffffe772 ". ../hacking-private/FSH/fsh_cache", zsh_name=0x7fffffffe76a "zsh") at init.c:1292
#19 0x0000000000457e8e in zsh_main (argc=3, argv=0x7fffffffe4f8) at init.c:1678
#20 0x000000000040f7f6 in main (argc=3, argv=0x7fffffffe4f8) at ./main.c:93

That probably explains why one gets about as many rt_sigprocmask calls as
there are bytes in the file (58524).
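To make the cost pattern concrete, here is a minimal C sketch of what the backtrace suggests: one SIGWINCH unblock (one rt_sigprocmask system call) before every character taken from a stdio stream. The function name is hypothetical and this is not zsh's shingetline(); it only counts the calls.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper, NOT zsh's code: unblock SIGWINCH before every
 * single fgetc(), the per-character pattern seen in the backtrace.
 * Returns the number of sigprocmask() calls; *nbytes gets the bytes read. */
static long read_per_byte_unblock(FILE *fp, long *nbytes)
{
    sigset_t winch;
    long calls = 0;
    int c;

    assert(fp != NULL && nbytes != NULL);
    sigemptyset(&winch);
    sigaddset(&winch, SIGWINCH);
    *nbytes = 0;
    for (;;) {
        /* One rt_sigprocmask system call per character ... */
        sigprocmask(SIG_UNBLOCK, &winch, NULL);
        calls++;
        /* ... even though fgetc() is usually served from stdio's
         * buffer without any system call at all. */
        c = fgetc(fp);
        if (c == EOF)
            break;
        (*nbytes)++;
    }
    return calls;
}
```

Fed a 58524-byte file like fsh_cache, this would make 58525 sigprocmask calls, while stdio itself (assuming a 4096-byte buffer, as in the strace output) issues only around 16 read() calls.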

$ time zsh -c 'repeat 100 eval  "$(<fsh_cache)"'

gives:

1.18s user 0.05s system 99% cpu 1.239 total

With "only" 942 rt_sigprocmask calls according to strace -c.

There's probably scope for optimisation here, though I can't
comment further as I don't know why that signal handling code is
there in the first place.

-- 
Stephane



* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Bart Schaefer @ 2017-06-19 15:31 UTC
  To: zsh-users

On Jun 19,  1:24pm, Stephane Chazelas wrote:
}
} There's probably scope for optimisation here, though I can't
} comment further as I don't know why that signal handling code is
} there in the first place.

rt_sigprocmask should not be significantly more expensive than an
assignment to an integer.

The signal handling code is there because the shell MUST NOT respond
instantly to arbitrary signals while doing operations such as token
interpretation or memory management -- the signal handlers might
themselves invoke shell commands/functions and many of those layers
are not safe for re-entrancy -- but it MUST respond to those signals 
whenever it may be blocked for an unknown length of time, such as when
reading from a file descriptor.
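The "must not respond instantly" half of this can be illustrated with plain POSIX signal masking. A minimal C sketch, assuming SIGUSR1 as a stand-in for an arbitrary signal and hypothetical helper names (zsh's real machinery is more involved): a signal raised while the mask is held stays pending, and its handler only runs once the old mask is restored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;   /* a real shell might run a user trap from here */
}

/* Hypothetical: hold SIGUSR1 around work that is not re-entrant. */
static void enter_critical(sigset_t *saved)
{
    sigset_t s;
    sigemptyset(&s);
    sigaddset(&s, SIGUSR1);
    sigprocmask(SIG_BLOCK, &s, saved);
}

/* Restoring the old mask delivers any signal that arrived meanwhile. */
static void leave_critical(const sigset_t *saved)
{
    sigprocmask(SIG_SETMASK, saved, NULL);
}

/* Records got_usr1 inside the critical section and right after it. */
static void demo_deferred_signal(int *inside, int *after)
{
    struct sigaction sa;
    sigset_t saved;

    assert(inside != NULL && after != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
    got_usr1 = 0;

    enter_critical(&saved);
    raise(SIGUSR1);           /* arrives "mid-tokenizing" */
    *inside = got_usr1;       /* still 0: the signal is only pending */
    leave_critical(&saved);   /* handler runs before this returns */
    *after = got_usr1;
}
```

POSIX guarantees that a pending signal unblocked by sigprocmask() is delivered before that call returns, which is what makes the deferral deterministic here.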

Many years of "I can't interrupt my script when X" or "interrupting
my script when Y causes a crash" resulted in the current signal
paradigm.  When the shell was first written, processors weren't fast
enough and process scheduling not well-threaded enough to expose a
lot of these issues, but the better our computers get the greater
the likelihood of hitting an ever-smaller race condition window, so
those windows have to be aggressively closed.


* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Stephane Chazelas @ 2017-06-19 16:16 UTC
  To: Bart Schaefer; +Cc: zsh-users

2017-06-19 08:31:16 -0700, Bart Schaefer:
> On Jun 19,  1:24pm, Stephane Chazelas wrote:
> }
> } There's probably scope for optimisation here, though I can't
> } comment further as I don't know why that signal handling code is
> } there in the first place.
> 
> rt_sigprocmask should not be significantly more expensive than an
> assignment to an integer.

Still,

$ time zsh -c 'repeat 100 . ./fsh_cache'  2> /dev/null
zsh -c 'repeat 100 . ./fsh_cache' 2> /dev/null  0.73s user 0.78s system 99% cpu 1.522 total
$ time zsh -c 'repeat 100 eval "$(<fsh_cache)"'  2> /dev/null
zsh -c 'repeat 100 eval "$(<fsh_cache)"' 2> /dev/null  0.80s user 0.04s system 99% cpu 0.848 total

See how the system time falls to almost 0 with the eval variant.
I get the same kind of performance gain if I comment out the
line that eventually calls rt_sigprocmask there, which is
winch_unblock() (so it only concerns SIGWINCH).

> The signal handling code is there because the shell MUST NOT respond
> instantly to arbitrary signals while doing operations such as token
> interpretation or memory management -- the signal handlers might
> themselves invoke shell commands/functions and many of those layers
> are not safe for re-entrancy -- but it MUST respond to those signals 
> whenever it may be blocked for an unknown length of time, such as when
> reading from a file descriptor.
> 
> Many years of "I can't interrupt my script when X" or "interrupting
> my script when Y causes a crash" resulted in the current signal
> paradigm.  When the shell was first written, processors weren't fast
> enough and process scheduling not well-threaded enough to expose a
> lot of these issues, but the better our computers get the greater
> the likelihood of hitting an ever-smaller race condition window, so
> those windows have to be aggressively closed.

I suspected it would be something like that, but note that here
it's done for every byte of the data even though the file is
read in 4096-byte chunks (by stdio, underneath fgetc()).

If you look at the strace output, you can see it:

open("/etc/zsh/zshenv", O_RDONLY|O_NOCTTY) = 3
fcntl(3, F_DUPFD, 10)                   = 11
read(11, "# /etc/zsh/zshenv: system-wide ."..., 4096) = 623
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [], 8) = 0
[...]
open("./fsh_cache", O_RDONLY|O_NOCTTY)  = 3
fcntl(3, F_DUPFD, 10)                   = 13
read(13, "zle -N orig-s0.0000060000-r9037-"..., 4096) = 4096
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [CHLD], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [CHLD], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [CHLD], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [CHLD], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [CHLD], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [CHLD], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [WINCH], [CHLD], 8) = 0
[...]

Most of those rt_sigprocmask calls are unnecessary.

That defeats the benefit of stdio saving read() system calls by
reading in chunks, if we end up making one system call per byte
anyway.
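A minimal C sketch of the alternative this implies, assuming the shell managed its own input buffer on top of read(2) instead of stdio (the struct and function names are hypothetical; this is not what zsh does, and the next reply explains why it relies on stdio): SIGWINCH is only unblocked around the read() that may actually block, i.e. once per 4096-byte refill rather than once per byte.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

#define INBUF_SIZE 4096

/* Hypothetical input buffer. Because we own it, we always know whether
 * the next byte is already in memory or whether read() may block. */
struct inbuf {
    int fd;
    unsigned char data[INBUF_SIZE];
    ssize_t len, pos;
    long windows;            /* times SIGWINCH was unblocked */
};

static void inbuf_init(struct inbuf *b, int fd)
{
    assert(b != NULL && fd >= 0);
    b->fd = fd;
    b->len = b->pos = 0;
    b->windows = 0;
}

/* Returns the next byte (0..255), or -1 on end of file or error. */
static int inbuf_getc(struct inbuf *b)
{
    if (b->pos == b->len) {
        sigset_t winch, old;
        ssize_t n;
        int saved_errno;

        sigemptyset(&winch);
        sigaddset(&winch, SIGWINCH);
        /* Open the signal window only around the call that can block. */
        sigprocmask(SIG_UNBLOCK, &winch, &old);
        b->windows++;
        do
            n = read(b->fd, b->data, sizeof b->data);
        while (n < 0 && errno == EINTR);
        saved_errno = errno;
        sigprocmask(SIG_SETMASK, &old, NULL);
        errno = saved_errno;
        if (n <= 0)
            return -1;
        b->len = n;
        b->pos = 0;
    }
    return b->data[b->pos++];
}
```

For the 58524-byte fsh_cache this would open the signal window about 16 times (15 refills plus the final end-of-file read) instead of once per byte.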

-- 
Stephane



* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Bart Schaefer @ 2017-06-19 19:14 UTC
  To: Stephane Chazelas; +Cc: zsh-users

This is now WELL into zsh-workers territory, please direct replies there
rather than to the -users list.

On Jun 19,  5:16pm, Stephane Chazelas wrote:
}
} That defeats the benefit of stdio saving read() system calls by
} reading in chunks, if we end up making one system call per byte
} anyway.

Unfortunately we need to read from stdio one byte at a time, and
as far as I know there is no way to "ask" stdio whether it is still
working on a buffer, or is instead going to refill its buffer (and
therefore possibly block) on the next attempted getc() -- and to
find out would likely be more expensive than doing the system call.

Also stdio is *itself* not re-entrant, so we have to control signals
around all stdio operations.

Just to demonstrate why the signal handling is necessary, consider
this:

% echo $(trap '' INT; sleep 100)

That shell is now un-interruptible for 100 seconds, because readoutput()
does not do signal management around its fgetc() calls.  Worse, if you
type ^Z the sleep is silently suspended and the parent is hung forever.


* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Sebastian Gniazdowski @ 2017-06-17 17:25 UTC
  To: Bart Schaefer, Zsh Users

On 17 Jun 2017 at 18:39:55, Bart Schaefer (schaefer@brasslantern.com) wrote:
> On Jun 17, 5:44pm, Sebastian Gniazdowski wrote:
> }
> } So the gain from zcompiled .fsh_cache seems to be maximal.
>  
> As long as you've got a good way to test these timings ... compile
> your .fsh_cache file with "zcompile -R" and see if you still get
> any speedup?
>  
> Some simple tests that I did seem to indicate that zcompile does
> not do much good if the default "zcompile -M" behavior is disabled.
>  

No problem. The results seem the same: best "zcompile" time (previously reported) 0.375 s, best "zcompile -R" 0.376 s. I ran it a second time, worried that I had forgotten "-R" (very unlikely): 0.374 s.

BTW, I have a wacky idea:

1. Invoke "source_prepare ~/.plugins/aplugin.plugin.zsh.zwc", etc. for all used plugins
2. Continue with normal zshrc
3. After it, invoke source_load with the same paths
4. source_prepare will use threads to load .zwc files
5. An internal hash table will map paths to Eprogs and mutexes
6. If an Eprog is not yet ready, source_load will block on its mutex

The goal is to run the normal zshrc while the bytecode loads in the background. I checked that ~/.fsh_cache.zwc is 158 kB in size, which is quite large. But all this is probably a losing game: the mutexes and thread creation will likely eat up the gain. Still, it's a cool thing to code, so I think I will do it anyway.
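Steps 1-6 amount to a producer/consumer handoff per file. A minimal C sketch with POSIX threads, under these assumptions: source_prepare() and source_load() are the hypothetical names from the list above, not existing zsh builtins; a fixed array with linear lookup stands in for the hash table; and the raw file bytes stand in for a parsed Eprog. Each slot pairs a mutex with a condition variable, so source_load() sleeps until its file is ready instead of spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_JOBS 16

/* One slot per prepared path (stand-in for the path -> Eprog table). */
struct job {
    char path[256];
    char *data;                 /* stand-in for the loaded Eprog */
    size_t len;
    int ready;                  /* guarded by lock */
    pthread_mutex_t lock;
    pthread_cond_t done;
};

static struct job jobs[MAX_JOBS];
static int njobs;               /* only touched by the main thread */

static void *load_worker(void *arg)
{
    struct job *j = arg;
    char chunk[4096], *buf = NULL;
    size_t len = 0, n;
    FILE *fp = fopen(j->path, "rb");

    if (fp != NULL) {
        while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
            char *p = realloc(buf, len + n);
            if (p == NULL)
                break;
            buf = p;
            memcpy(buf + len, chunk, n);
            len += n;
        }
        fclose(fp);
    }
    pthread_mutex_lock(&j->lock);
    j->data = buf;
    j->len = len;
    j->ready = 1;
    pthread_cond_broadcast(&j->done);   /* wake any waiting source_load() */
    pthread_mutex_unlock(&j->lock);
    return NULL;
}

/* Hypothetical "source_prepare": start loading, return immediately. */
static int source_prepare(const char *path)
{
    struct job *j;
    pthread_t tid;

    assert(path != NULL);
    if (njobs == MAX_JOBS || strlen(path) >= sizeof jobs[0].path)
        return -1;
    j = &jobs[njobs];
    strcpy(j->path, path);
    j->data = NULL;
    j->len = 0;
    j->ready = 0;
    pthread_mutex_init(&j->lock, NULL);
    pthread_cond_init(&j->done, NULL);
    if (pthread_create(&tid, NULL, load_worker, j) != 0) {
        pthread_mutex_destroy(&j->lock);
        pthread_cond_destroy(&j->done);
        return -1;
    }
    pthread_detach(tid);        /* completion is signalled via the condvar */
    njobs++;
    return 0;
}

/* Hypothetical "source_load": block until the background load is done. */
static const char *source_load(const char *path, size_t *len)
{
    for (int i = 0; i < njobs; i++) {
        struct job *j = &jobs[i];
        if (strcmp(j->path, path) != 0)
            continue;
        pthread_mutex_lock(&j->lock);
        while (!j->ready)       /* step 6: wait until the data is ready */
            pthread_cond_wait(&j->done, &j->lock);
        pthread_mutex_unlock(&j->lock);
        *len = j->len;
        return j->data;
    }
    return NULL;                /* path was never prepared */
}
```

Older glibc versions need -pthread at link time. Whether thread start-up costs more than the load time it hides is exactly the open question raised above.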

--  
Sebastian Gniazdowski
psprint /at/ zdharma.org


* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Bart Schaefer @ 2017-06-17 16:39 UTC
  To: Zsh Users

On Jun 17,  5:44pm, Sebastian Gniazdowski wrote:
}
} So the gain from zcompiled .fsh_cache seems to be maximal.

As long as you've got a good way to test these timings ... compile
your .fsh_cache file with "zcompile -R" and see if you still get
any speedup?

Some simple tests that I did seem to indicate that zcompile does
not do much good if the default "zcompile -M" behavior is disabled.
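For readers unfamiliar with the distinction: per the zcompile documentation, -M maps the compiled file into the shell's memory (so several shells on the same host can share it), while -R reads the contents into the shell's own memory instead. A minimal C sketch of those two loading strategies, assuming a plain file and hypothetical helper names; it is not zsh's .zwc loader.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* "-R"-style: copy the whole file into private, malloc'd memory. */
static char *load_by_read(const char *path, size_t *len)
{
    struct stat st;
    char *buf;
    size_t off = 0, size;
    int fd;

    assert(path != NULL && len != NULL);
    if ((fd = open(path, O_RDONLY)) < 0)
        return NULL;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }
    size = (size_t)st.st_size;
    if ((buf = malloc(size ? size : 1)) == NULL) {
        close(fd);
        return NULL;
    }
    while (off < size) {
        ssize_t n = read(fd, buf + off, size - off);
        if (n <= 0)
            break;
        off += (size_t)n;
    }
    close(fd);
    *len = off;
    return buf;                 /* caller frees */
}

/* "-M"-style: map the file read-only; clean pages can be shared by every
 * process mapping the same file. Caller releases with munmap(p, *len). */
static void *load_by_mmap(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);
    if ((fd = open(path, O_RDONLY)) < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {   /* can't map 0 bytes */
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}
```

Both return the same bytes; the difference is who owns the memory and whether it can be shared, which is why disabling mapping could plausibly remove part of zcompile's benefit.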


* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Sebastian Gniazdowski @ 2017-06-17 15:44 UTC
  To: Bart Schaefer; +Cc: Zsh Users


On 17 June 2017 at 16:56:12, Sebastian Gniazdowski (psprint@zdharma.org) wrote:
> The time with a zcompiled ~/.fsh_cache is 0.375 s in the new test (previously 0.383 s),
> i.e. about 20 ms faster than the normal 0.396 s. Almost reached the 40 ms goal. For
> someone with a startup time of 200-250 ms, 40 ms would matter.

I've checked with zprof how long the loop normally takes:

19,41    19,41   36,83%     19,41    19,41   36,83%  _zsh_highlight_bind_widgets

It varies between 16 ms and 20 ms. So the gain from zcompiled .fsh_cache seems to be maximal.

--  
Sebastian Gniazdowski
psprint /at/ zdharma.org



* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Sebastian Gniazdowski @ 2017-06-17 14:56 UTC
  To: Bart Schaefer; +Cc: Zsh Users

On 17.06.2017 at 16:06:59, Bart Schaefer (schaefer@brasslantern.com) wrote:
> You might get faster parsing if you setopt noaliases. Just a thought,
> can't try it right now.

I tried it, and I don't think there's any change: the setup that reads ~/.fsh_cache takes 0.414 s vs 0.417 s with noaliases. Earlier I gave 0.432 s, but that included a single large value of 0.599 s, so this time I summed 9 values and divided by 9 (giving 0.414 s).

As a reminder, the normal time is 0.396 s (0.389 s in the second test).

The time with a zcompiled ~/.fsh_cache is 0.375 s in the new test (previously 0.383 s), i.e. about 20 ms faster than the normal 0.396 s. Almost reached the 40 ms goal. For someone with a startup time of 200-250 ms, 40 ms would matter.

--  
Sebastian Gniazdowski
psprint /at/ zdharma.org


* Re: Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Bart Schaefer @ 2017-06-17 14:05 UTC
  To: Sebastian Gniazdowski; +Cc: Zsh Users


You might get faster parsing if you setopt noaliases.  Just a thought,
can't try it right now.


* Why sourcing a file is not faster than doing a loop with eval, zle -N
From: Sebastian Gniazdowski @ 2017-06-17 11:34 UTC
  To: zsh-users

Hello,
I've tried to optimize my fast-syntax-highlighting. The idea is simple: instead of running this loop:

  for cur_widget in $widgets_to_bind; do
    case $widgets[$cur_widget] in
      ...
      builtin) eval "_zsh_highlight_widget_${(q)prefix}-${(q)cur_widget}() { _call_widget .${(q)cur_widget} -- \"\$@\" }"
               zle -N $cur_widget _zsh_highlight_widget_$prefix-$cur_widget;;
  ...
  ...

I do, in the same loop:
    ...
    print -r "zle -N" "$prefix-$cur_widget" "${widgets[$cur_widget]#*:}" >>| ~/.fsh_cache
    ...

and so on, so that later startups only need to detect ~/.fsh_cache and source it, skipping the loop. Timings of "zsh -i -c exit" are:

- normal FSH:          0.3968 sec on average
- cache-feature FSH:   0.4329 sec on average
- zcompiled cache:     0.3831 sec on average

So only after compiling ~/.fsh_cache do I get a slightly better time; otherwise it is ~30 ms slower. I would expect the cache to always be faster, and by a larger margin. Why isn't it?

I now suspect there's simply more parsing: the loop doesn't have 554 lines like ~/.fsh_cache does, so it is parsed more quickly.

Test data: https://github.com/zdharma/hacking-private/tree/master/FSH

-- 
Sebastian Gniazdowski
psprint /at/ zdharma.org


Thread overview: 10+ messages
2017-06-19 12:24 ` Why sourcing a file is not faster than doing a loop with eval, zle -N Stephane Chazelas
2017-06-19 15:31   ` Bart Schaefer
2017-06-19 16:16     ` Stephane Chazelas
2017-06-19 19:14       ` Bart Schaefer
2017-06-17 11:34 Sebastian Gniazdowski
2017-06-17 14:05     ` Bart Schaefer
2017-06-17 14:56       ` Sebastian Gniazdowski
2017-06-17 15:44         ` Sebastian Gniazdowski
2017-06-17 16:39           ` Bart Schaefer
2017-06-17 17:25             ` Sebastian Gniazdowski

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
