zsh-workers
 help / color / mirror / code / Atom feed
From: Sebastian Gniazdowski <psprint@zdharma.org>
To: zsh-workers@zsh.org
Subject: Idea for optimization (use case: iterate string with index parameter)
Date: Fri, 5 Jan 2018 14:38:37 +0100	[thread overview]
Message-ID: <etPan.5a4f7fdd.52e15119.14e5a@zdharma.org> (raw)

Hello,
iterating string with index parameter is quite slow, because unicode characters are skipped and counted using mbrtowc(). This function takes much CPU time. It is called theta(0.5 * n^2) times when iterating string of length n. The goal is to make this theta(n).

The idea is to avoid calling mbrtowc by holding array of 5-10 structures with:
- pointer to beginning of string
- requested index (request to getarg() / getindex())
- computed in-string pointer for that index (by getarg() / getindex())

In general, the array would hold #N (5-10 or so) last string-index requests. If new request would target the same string, but index greater by 1, getarg() would call mbrtowc() once (via MB_METACHARLEN macro) reusing the previous in-string pointer.

So the idea is to catch not too large loops (using not too many scalar indexing) into the cache. For example, I saw z-sy-h uses such loops, my projects sometimes use them too. The point is that iterating a string and doing something with letters, e.g. counting brackets, is a very common use case, and the optimization would be triggered often.

I'm sharing as maybe there's better way to do this and I'm not sure when I'll get time to write this, and maybe someone has time and energy to do it. It would solve the rather shocking revelation of a typical C programmer, that strings are iterated like lists :)
--  
Sebastian Gniazdowski
psprint /at/ zdharma.org


             reply	other threads:[~2018-01-05 13:39 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-05 13:38 Sebastian Gniazdowski [this message]
2018-01-05 22:23 ` Bart Schaefer
2018-01-06  5:16   ` Sebastian Gniazdowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=etPan.5a4f7fdd.52e15119.14e5a@zdharma.org \
    --to=psprint@zdharma.org \
    --cc=zsh-workers@zsh.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).