From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3618
Path: news.gmane.org!not-for-mail
From: Nathan McSween
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Wed, 10 Jul 2013 13:49:50 -0700
Message-ID:
References: <20130709053711.GO29800@brightrain.aerifal.cx> <1373485116.27613.40@driftwood>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7b5d4dae3ad6bf04e12e6bfd
X-Trace: ger.gmane.org 1373489402 29062 80.91.229.3 (10 Jul 2013 20:50:02 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Wed, 10 Jul 2013 20:50:02 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-3622-gllmg-musl=m.gmane.org@lists.openwall.com Wed Jul 10 22:50:02 2013
Return-path:
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
        by plane.gmane.org with smtp (Exim 4.69) (envelope-from )
        id 1Ux1Ks-00012F-OF for gllmg-musl@plane.gmane.org;
        Wed, 10 Jul 2013 22:50:02 +0200
Original-Received: (qmail 3586 invoked by uid 550); 10 Jul 2013 20:50:02 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 3576 invoked from network); 10 Jul 2013 20:50:02 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
        :content-type;
        bh=ndln9uONumZMaX0ZRT4QR9wdRdgPhLUUexJYrkbvEJU=;
        b=RfB8h1CtNb5/fdtCBMakWNX7nldfUvKp+9vuVE+z1VIVFjU5TM1mSG7iO69BMhZW6p
        SIrx+9soWF5cm1r/5Gnhw8RX/Ez/DvTNJdFWUnLU3f0DVHx8EWoX4Qz9mj8IUwIg59GJ
        Bfw/s0a5Pf4n7Ill/WZ/Jfdf4ArxZZCkfLbEXqjXh3VQ2YnLZilOALYLN28vAJ3JT76X
        YvsRkPLCEqo+0Z4XYndbsRMkJiwjIS0+e5CZ748R0fwP0n92D2Q5R3t/B8KrxN6J/Jad
        +0mzX5BF1S8NtZG4/Vhsw3ryVhvPsYynvB+jgSef4LfcfNWiV4ekhUv35AWLNbg0HHFL
        +YfQ==
X-Received: by 10.194.19.130 with
        SMTP id f2mr18832663wje.22.1373489390603; Wed, 10 Jul 2013 13:49:50 -0700 (PDT)
In-Reply-To:
Xref: news.gmane.org gmane.linux.lib.musl.general:3618
Archived-At:

--047d7b5d4dae3ad6bf04e12e6bfd
Content-Type: text/plain; charset=UTF-8

I would think the iterate-per-char-till-zero step would take the most
time. Even if GCC vectorized the loop without SIMD, it would still need
to iterate to find the zero within the word that contains it. Current
musl does this as well, though.

On Jul 10, 2013 1:34 PM, "Andre Renaud" wrote:
> >> What also might be worth testing is whether GCC can compete if you
> >> just give it a naive loop (not the fancy pseudo-vectorized stuff
> >> currently in musl) and good CFLAGS. I know on x86 I was able to beat
> >> the fanciest asm strlen I could come up with simply by writing the
> >> naive loop in C and unrolling it a lot.
> >
> >
> > Duff's device!
>
> That was exactly my first idea too, but interestingly it turns out not
> to have really added any performance improvement. Looking at the
> assembler, with -O3, gcc does a pretty good job of unrolling as it is.
>
> Regards,
> Andre
>

--047d7b5d4dae3ad6bf04e12e6bfd
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

--047d7b5d4dae3ad6bf04e12e6bfd--
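
For reference, the two shapes of strlen being compared in this thread
can be sketched roughly as below. This is only an illustration under
stated assumptions, not the actual musl source: the names naive_strlen,
wordwise_strlen, ONES, HIGHS and HASZERO are chosen here for the
example. The word-wise version uses the common "does this word contain
a zero byte" bit trick, and its last step is the per-byte scan that the
message above expects to remain regardless of how the loop is unrolled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive byte-at-a-time loop: the form one would hand to the compiler
 * together with good CFLAGS and let it unroll. */
size_t naive_strlen(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Illustrative helper macros (assumed names for this sketch). */
#define ONES  ((size_t)-1 / UCHAR_MAX)       /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))   /* 0x8080...80 */
/* Nonzero if and only if some byte of x is zero. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

/* Word-at-a-time sketch. Caveat: the aligned word load may read a few
 * bytes past the terminator. That stays inside one aligned word (so it
 * cannot cross a page), but it is outside what ISO C guarantees. */
size_t wordwise_strlen(const char *s)
{
    assert(s != NULL);
    const char *p = s;

    /* Step 1: advance byte by byte until p is word-aligned. */
    for (; (uintptr_t)p % sizeof(size_t); p++)
        if (!*p)
            return (size_t)(p - s);

    /* Step 2: skip whole words that contain no zero byte. */
    for (;;) {
        size_t w;
        memcpy(&w, p, sizeof w);   /* aligned load without aliasing issues */
        if (HASZERO(w))
            break;
        p += sizeof w;
    }

    /* Step 3: the word at p holds the terminator; the test above only
     * says that a zero exists, so scan per byte to locate it. */
    while (*p)
        p++;
    return (size_t)(p - s);
}
```

Both functions should return the same value as strlen for any
NUL-terminated string; which one is faster depends on the compiler,
the flags and the target, which is what the thread is trying to
measure.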