On 23 January 2015 at 11:46, <ondra@mistotebe.net> wrote:
> when running zsh from a terminal multiplexer like tmux, it is possible
> (and not uncommon) to close several processes at the exact same moment.
> In that case, all of them try to sync their history and all but one fail
> to acquire the lock. So far so good.
>
> However, all of these call nanosleep({1,0}), and since we might be on a
> multicore system, they might just be scheduled at the same time again,
> only one of them will progress and the scenario repeats. What the user
> sees is the terminals not disappearing immediately, but one after the
> other, a second per process.
>
> A common solution to this problem is to randomize the back-off, for
> example in the interval (0, 1s). On a decent system without load, this
> would make it more responsive (all of them would be likely to finish
> within a second of the user's request if possible) while not causing
> any undue increase in the load on the machine.

Yes, this is sensible.  I think the current simplistic algorithm
is mostly there because we aren't making assumptions about the
availability of sleep with a resolution better than 1 second ---
and a random backoff of multiples of seconds isn't very useful.
However, it can be improved in various largely standard ways which
are already used in other parts of the shell, so this is fixable without
needing to rely on system-specific features.

A better scheme would be to start with a small base wait time, say 0.1
seconds, plus a random extra delay of up to the same time again, and
then to double the base time each time we back off, while keeping the
same overall limit of 10 seconds since we first tried before reporting
a failure.  We can then tweak this as necessary.  Or maybe just
having a random backoff with a doubling maximum and no fixed
base time is good enough for typical problems.
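
As a rough illustration of the first scheme, here is a minimal sketch in
C.  It is not taken from the shell source: the names backoff_delay_usec,
lock_with_backoff and try_lock_fn are made up for the example, and it
assumes nanosleep() is available for sub-second sleeps, which is exactly
the sort of assumption the real code would have to avoid or wrap.  The
caller is also assumed to have seeded rand() differently in each process
(for example from the pid), otherwise the delays would coincide again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define BACKOFF_BASE_USEC   100000L    /* initial base wait: 0.1 s */
#define BACKOFF_LIMIT_USEC  10000000L  /* give up after 10 s of waiting */

/*
 * Hypothetical helper: the next wait in microseconds.
 * The wait is the base time plus a random extra in [0, base), clipped
 * so the total time slept never exceeds BACKOFF_LIMIT_USEC.
 * `rnd` is any non-negative random number; returns 0 when time is up.
 */
long backoff_delay_usec(long base, long elapsed, long rnd)
{
    long delay, remaining;

    assert(base > 0 && rnd >= 0);
    remaining = BACKOFF_LIMIT_USEC - elapsed;
    if (remaining <= 0)
        return 0;
    delay = base + rnd % base;
    return delay < remaining ? delay : remaining;
}

/* Hypothetical callback: returns non-zero once the lock is acquired. */
typedef int (*try_lock_fn)(void *ctx);

/*
 * Retry try_lock with a randomized, doubling backoff.
 * Returns 1 on success, 0 if the limit was reached first.
 * Only time spent sleeping is counted towards the limit.
 */
int lock_with_backoff(try_lock_fn try_lock, void *ctx)
{
    long base = BACKOFF_BASE_USEC;
    long elapsed = 0;

    for (;;) {
        long delay;
        struct timespec ts;

        if (try_lock(ctx))
            return 1;
        delay = backoff_delay_usec(base, elapsed, (long)rand());
        if (delay == 0)
            return 0;               /* waited long enough: report failure */
        ts.tv_sec = delay / 1000000L;
        ts.tv_nsec = (delay % 1000000L) * 1000L;
        nanosleep(&ts, NULL);       /* an early wakeup is harmless here */
        elapsed += delay;
        base *= 2;                  /* double the base time each round */
    }
}
```

The second variant mentioned above (random backoff with a doubling
maximum and no fixed base time) would amount to replacing
"base + rnd % base" with something like "1 + rnd % base" in this sketch.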

pws

P.S. if acronyms like EDCA and SIFS went through anyone else's mind,
too, my commiserations :-).