Date: Sat, 6 Jan 2018 06:16:12 +0100
From: Sebastian Gniazdowski
To: Bart Schaefer, zsh-workers@zsh.org
Subject: Re: Idea for optimization (use case: iterate string with index parameter)
List-Id: Zsh Workers List
X-Seq: 42237

On 5 Jan 2018 at 23:23:57, Bart Schaefer (schaefer@brasslantern.com) wrote:
> On Fri, Jan 5, 2018 at 5:38 AM, Sebastian Gniazdowski wrote:
> > iterating string with index parameter is quite slow, because unicode characters are
> > skipped and counted using mbrtowc().
>
> I can't remember the last time I needed to do that kind of iteration.

Maybe it's indeed not that common. It's one of the basic things one can do with strings, but in practice, hmm.
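
The slowness described in the quoted report comes from the fact that, in a multibyte locale, the byte offset of the i-th character is not known in advance: each `$string[i]` lookup has to walk from the start of the string, converting one character at a time. The C sketch below illustrates that cost under simplifying assumptions. It works on a plain byte string rather than zsh's metafied strings; `naive_offset()`, `char_len()`, `use_utf8_locale()` and the `conversions` counter are hypothetical names, not zsh internals; and `char_len()` deliberately calls `mbrtowc()` for every character (no ASCII shortcut) so that each conversion is counted. The locale names tried by `use_utf8_locale()` are assumptions about the host system.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Number of mbrtowc() calls made so far, so the cost can be measured. */
static unsigned long conversions = 0;

/* Switch LC_CTYPE to a UTF-8 locale; returns nonzero on success.
 * The locale names tried here are assumptions about the host. */
static int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Byte length of the character starting at s, always calling mbrtowc().
 * Invalid, incomplete or NUL sequences are treated as a single byte. */
static size_t char_len(const char *s, size_t remaining)
{
    mbstate_t st;
    size_t n;

    memset(&st, 0, sizeof st);      /* fresh state: UTF-8 is stateless */
    conversions++;
    n = mbrtowc(NULL, s, remaining, &st);
    if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
        return 1;
    return n;
}

/* Naive $string[i]: walk from the start of s (len bytes) to the i-th
 * (1-based) character and return its byte offset, or -1 if out of range. */
static long naive_offset(const char *s, size_t len, long i)
{
    size_t off = 0;
    long k;

    assert(i >= 1);
    for (k = 1; k < i; k++) {
        if (off >= len)
            return -1;
        off += char_len(s + off, len - off);
    }
    return off < len ? (long)off : -1;
}
```

Reading characters 1 through n in a loop this way costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 conversions, which is quadratic in the length of the string; for the five-character string "hello" that is already 10 calls.
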

I would still accumulate that optimization, though: the overall optimization effort is starting to pay off, even though it is largely composed of individually disappointing optimizations.

> typeset -a iter=(${(s//)string})
> for ((i=1; i <= $#iter; i++)); do something with $iter[i]; done
> string=${(j//)iter} # if needed
>
> That is more memory-intensive, of course, but it also assists with
> cases of unordered access into the array of characters.

It might help. I had been blindly following the "for letter in $iter" path and missed the obvious $iter[i] approach; without an index, "for letter ..." couldn't replace the existing code.

> > In general, the array would hold #N (5-10 or so) last string-index requests. If new request
> > would target the same string, but index greater by 1, getarg() would call mbrtowc() once
> > (via MB_METACHARLEN macro) reusing the previous in-string pointer.
>
> Why only when greater by 1? If greater, scan to and record the next
> needed position. Same number of mbrtowc() conversions, overall.

Yes, it should be generalized that way; I just didn't want to complicate the example.

Yesterday I recalled that for ASCII there's a short path that returns 1 without calling mbrtowc() to compute the size of the character. A discussion on IRC led to the conclusion that the cache should probably hold only one element, because anything larger would be overkill for simple $string[2]-style indexing. That way the code should stay very simple. The params.c part in question is:

https://github.com/zsh-users/zsh/blob/c2cc8b0fbefc9868fa83537f5b6d90fc1ec438dd/Src/params.c#L1478-L1489

I'm a little afraid that getarg() might be called in some generalized situations, but heck, it shouldn't be called for a="$b", so the cache might well survive in many typical loops. And maybe a 2-element cache wouldn't add much code or slow down simple indexing.

-- 
Sebastian Gniazdowski
psprint /at/ zdharma.org
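
The one-element cache discussed in this message, generalized as Bart suggests (any larger index on the same string continues scanning from the remembered position), could look roughly like the C sketch below. It is a hypothetical illustration, not the params.c code linked above: `cached_offset()`, `idx_cache`, `idx_cache_invalidate()`, `char_len()` and `use_utf8_locale()` are made-up names, the strings are plain byte strings rather than zsh's metafied ones, and the cache is keyed only on the string pointer and its length, so a real implementation would also have to invalidate the entry whenever the parameter is reassigned or freed. The ASCII shortcut in `char_len()` imitates the short path mentioned above, assuming an ASCII-compatible encoding such as UTF-8.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Number of mbrtowc() calls made so far, so the savings can be measured. */
static unsigned long conversions = 0;

/* Switch LC_CTYPE to a UTF-8 locale; returns nonzero on success.
 * The locale names tried here are assumptions about the host. */
static int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Byte length of the character at s. ASCII bytes take a short path that
 * returns 1 without calling mbrtowc() (assumes an ASCII-compatible
 * encoding); invalid, incomplete or NUL sequences count as one byte. */
static size_t char_len(const char *s, size_t remaining)
{
    mbstate_t st;
    size_t n;

    if ((unsigned char)*s < 0x80)
        return 1;
    memset(&st, 0, sizeof st);
    conversions++;
    n = mbrtowc(NULL, s, remaining, &st);
    if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
        return 1;
    return n;
}

/* Hypothetical one-element cache: where the last successful lookup ended. */
static struct {
    const char *str;    /* string of the last lookup (NULL = empty) */
    size_t len;         /* its byte length at that time */
    long index;         /* 1-based character index of the last lookup */
    size_t offset;      /* byte offset of that character */
} idx_cache;

/* Must be called when a cached string is modified or freed. */
static void idx_cache_invalidate(void)
{
    idx_cache.str = NULL;
}

/* $string[i] with the cache: for the same string and an index not smaller
 * than the cached one, continue scanning from the cached position;
 * otherwise scan from the start. Returns the byte offset or -1. */
static long cached_offset(const char *s, size_t len, long i)
{
    size_t off = 0;
    long k = 1;

    assert(i >= 1);
    if (idx_cache.str == s && idx_cache.len == len && idx_cache.index <= i) {
        k = idx_cache.index;
        off = idx_cache.offset;
    }
    for (; k < i; k++) {
        if (off >= len)
            return -1;
        off += char_len(s + off, len - off);
    }
    if (off >= len)
        return -1;
    idx_cache.str = s;          /* remember this position for next time */
    idx_cache.len = len;
    idx_cache.index = i;
    idx_cache.offset = off;
    return (long)off;
}
```

With the cache, a forward loop over all indices converts each non-ASCII character only once, so the total work is linear in the length of the string, while a lookup below the cached index simply falls back to a scan from the start. The two-element variant mentioned above would keep two such entries instead of one.
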