From: Roman Perepelitsa
Date: Wed, 17 Jul 2019 11:54:35 +0200
Subject: Re: Nested function definition question
To: Nick Cross
Cc: Ray Andrews, Zsh Users

On Wed, Jul 17, 2019 at 11:35 AM Nick Cross wrote:
>
> On 17/07/2019 06:49, Roman Perepelitsa wrote:
> > Inlining functions makes a big difference because function calls are
> > very expensive in ZSH. Calling a function to do something trivial
> > takes ~10 times longer than doing the same thing inline.
>
> Really? Thats interesting. Is there any benchmarks available showing
> this?

Here's a quick one:

    local -i c=0

    function inc() { ((++c)) }

    function outofline() {
      inc; inc; inc; inc; inc; inc; inc; inc;
      inc; inc; inc; inc; inc; inc; inc; inc;
      inc; inc; inc; inc; inc; inc; inc; inc;
      inc; inc; inc; inc; inc; inc; inc; inc;
      inc; inc; inc; inc; inc; inc; inc; inc;
      inc; inc; inc; inc; inc; inc; inc; inc;
      inc; inc; inc; inc; inc; inc; inc; inc;
      inc; inc; inc; inc; inc; inc; inc; inc;
    }

    function inline() {
      ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
      ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
      ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
      ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
      ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
      ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
      ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
      ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
    }

    time ( repeat 10000 outofline )
    time ( repeat 10000 inline )

I got:

    ( repeat 10000; do; outofline; done; ) cpu 4.184 total
    ( repeat 10000; do; inline; done; ) cpu 0.171 total

Apparently, ((++c)) is 24 times faster than a call to a function that
does the same thing. You can adapt this benchmark to measure something
closer to what you care about.

> So is it actually possible to do this?

It's definitely possible to inline functions by hand the same way I
did in the benchmark.

Another thing that can make a difference is manual loop unrolling,
although the overhead of looping isn't as big as that of function
calls. That is, you can replace this:

    function normal() {
      local -i i c
      for ((i = 0; i != 6400000; ++i)); do
        ((++c))
      done
    }

With this:

    function unrolled() {
      local -i i c
      for ((i = 0; i != 6400000; i+=64)); do
        ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
        ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
        ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
        ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
        ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
        ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
        ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
        ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c)); ((++c));
      done
    }

I got about a 3x speedup on my machine with this transformation.

It's quite rare in my experience that you can get to the point where it
makes sense to perform these low-level optimizations. You can usually
get decent performance by getting rid of all forks and replacing loops
with expansions. Array expansions are especially powerful when it
comes to making your scripts faster.

Roman.
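
To illustrate the point about forks and array expansions, here is a
minimal sketch. It is not taken from the message above: the array name
`files`, the sample paths, and the choice of basename(1) as the forked
command are all assumptions made for illustration. It strips the
directory part from every element, first with a loop that forks once
per element, then with a single array expansion.

```shell
# Hypothetical sample data, for illustration only.
files=(/tmp/a/one.txt /tmp/b/two.txt /tmp/c/three.txt)

# Slow variant: a loop that forks basename(1) once per element.
names_slow=()
for f in "${files[@]}"; do
  names_slow+=("$(basename "$f")")
done

# Fast variant: one array expansion, no loop and no fork.
# ${name##*/} removes the longest prefix ending in "/" from each element.
names_fast=("${files[@]##*/}")

printf '%s\n' "${names_fast[@]}"
```

Both variants should produce the same list. In zsh, the modifier form
${files:t} is probably the more idiomatic way to write the expansion.
As with the benchmarks above, wrap each variant in `time ( repeat N ... )`
to see how much it matters on your machine.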