From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 1338 invoked by alias); 25 Mar 2011 02:29:55 -0000 Mailing-List: contact zsh-users-help@zsh.org; run by ezmlm Precedence: bulk X-No-Archive: yes List-Id: Zsh Users List List-Post: List-Help: X-Seq: 15894 Received: (qmail 12809 invoked from network); 25 Mar 2011 02:29:43 -0000 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on f.primenet.com.au X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received-SPF: none (ns1.primenet.com.au: domain at closedmail.com does not designate permitted sender hosts) From: Bart Schaefer Message-id: <110324192914.ZM29861@torch.brasslantern.com> Date: Thu, 24 Mar 2011 19:29:14 -0700 In-reply-to: <5c830d92034e61a01c96e6aa4d10798c.squirrel@gameframe.net> Comments: In reply to nix@myproxylists.com "Why large arrays are extremely slow to handle?" (Mar 25, 2:37am) References: <5c830d92034e61a01c96e6aa4d10798c.squirrel@gameframe.net> X-Mailer: OpenZMail Classic (0.9.2 24April2005) To: zsh-users@zsh.org Subject: Re: Why large arrays are extremely slow to handle? MIME-version: 1.0 Content-type: text/plain; charset=us-ascii On Mar 25, 2:37am, nix@myproxylists.com wrote: } } I think there's is a big flaw somewhere that causes the following: } } #!/bin/zsh } emulate zsh } TEST=() } for i in {1..10000} ; do } TEST+="$i" # append (push) to an array } done } } --- 10K } time ./bench } real 0m3.944s } } --- 50K BOOOM! WTF? } } time ./bench } real 1m53.321s } } Any ideas why it's extremely slow? It's not the array, it's the loop interpretation thereof. TEST=({1..50000}) will populate a 50k-element array almost instantly. Here's a 500,000 element array on my home desktop: torch% typeset -F SECONDS torch% print $SECONDS; TEST=({1..500000}); print $SECONDS 24.9600260000 25.4452710000 torch% Put that in a loop instead, and you're interpreting a fetch/replace of the whole array on every cycle. This is in part because array assignment is generalized for replacing arbitrary slices of the array; append is not treated specially. [If someone wants to try to optimize this, start at the final "else" block in Src/params.c : setarrvalue() -- but beware of what happens in freearray().] As it happens, you can get much better update performance at the cost of some memory performance by using an associative array instead. Try: typeset -A TEST for i in {1..50000} ; do TEST[$i]=$i done Individual elements of hashes *are* fetched by reference without the whole hash coming along, and are updated in place rather than treated as slices, so this is your fastest option without a C-code change. You can also build up the "array" as a simple text block with delimiters, then split it to an actual array very quickly. Append to a scalar isn't really any better algorithmically than an array, but it does fewer memory operations. torch% for i in {1..50000}; do TEST+="$i"$'\n' ; done torch% TEST=(${(f)TEST}) torch% print $#TEST 50000