Date: Fri, 25 Mar 2011 04:50:17 +0200
Subject: Re: Why large arrays are extremely slow to handle?
From: nix@myproxylists.com
To: zsh-users@zsh.org
List-Id: Zsh Users List
Message-ID: <31bd8ebb9d28bb9817c7d702aba8dd63.squirrel@gameframe.net>
In-Reply-To: <110324192914.ZM29861@torch.brasslantern.com>
References: <5c830d92034e61a01c96e6aa4d10798c.squirrel@gameframe.net>
 <110324192914.ZM29861@torch.brasslantern.com>

> On Mar 25, 2:37am, nix@myproxylists.com wrote:
> }
> } I think there's a big flaw somewhere that causes the following:
> }
> } #!/bin/zsh
> } emulate zsh
> } TEST=()
> } for i in {1..10000} ; do
> }   TEST+="$i"   # append (push) to an array
> } done
> }
> } --- 10K
> } time ./bench
> } real    0m3.944s
> }
> } --- 50K BOOOM! WTF?
> }
> } time ./bench
> } real    1m53.321s
> }
> } Any ideas why it's extremely slow?
>
> It's not the array, it's the loop interpretation thereof.
>
> TEST=({1..50000})
>
> will populate a 50k-element array almost instantly. Here's a 500,000
> element array on my home desktop:
>
> torch% typeset -F SECONDS
> torch% print $SECONDS; TEST=({1..500000}); print $SECONDS
> 24.9600260000
> 25.4452710000
> torch%
>
> Put that in a loop instead, and you're interpreting a fetch/replace of the
> whole array on every cycle. This is in part because array assignment is
> generalized for replacing arbitrary slices of the array; append is not
> treated specially. [If someone wants to try to optimize this, start at
> the final "else" block in Src/params.c : setarrvalue() -- but beware of
> what happens in freearray().]
>
> As it happens, you can get much better update performance at the cost of
> some memory performance by using an associative array instead. Try:
>
> typeset -A TEST
> for i in {1..50000} ; do
>   TEST[$i]=$i
> done
>
> Individual elements of hashes *are* fetched by reference without the
> whole hash coming along, and are updated in place rather than treated
> as slices, so this is your fastest option without a C-code change.

typeset -A did the trick. Now the speed is decent enough on my 1090T.
For 50K elements:

time ./bench
real    0m0.681s

My C skills are very limited (barely the basics), so I'm afraid it's
better not to touch that code at all.

> You can also build up the "array" as a simple text block with delimiters,
> then split it to an actual array very quickly. Append to a scalar isn't
> really any better algorithmically than an array, but it does fewer memory
> operations.
>
> torch% for i in {1..50000}; do TEST+="$i"$'\n' ; done
> torch% TEST=(${(f)TEST})
> torch% print $#TEST
> 50000

As usual, thank you for the very detailed explanation of the problem and
the solution.
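For the archives, the benchmark script now looks roughly like this (a
minimal sketch; the 50000 iteration count and the ./bench file name are
just carried over from the examples above):

#!/bin/zsh
emulate zsh
# Build the 50k-element set as an associative array; each assignment
# updates a single hash entry in place instead of copying the whole array.
typeset -A TEST
for i in {1..50000} ; do
  TEST[$i]=$i
done
print $#TEST   # number of key/value pairs stored

Timed the same way as before:

time ./bench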
Now even 'TEST=( $(print -r -- ${(u)=TEST}) )  # keep only the unique
elements of the array' gives reasonable speed after switching to the
associative array. I like "foreach", very similar to PHP's foreach. Thanks.
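(For reference, a minimal sketch of the csh-style foreach form in zsh;
the variable name and word list here are just illustrative:)

foreach i (foo bar baz)
  print $i   # runs once per word, like PHP's foreach over an array
end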