From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Guillen Fandos
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Do not use 64 bit division if possible
Date: Sun, 26 Nov 2017 00:46:56 +0100
Message-ID: <5575a0c9-0f53-f8e7-e0dc-6c1ff2b594f7@davidgf.es>
References: <424674f0-8460-7807-7366-a87d8588e8bc@davidgf.es> <9716E0B3-B86C-4CFF-8636-6DE4BAA0D716@mac.com>
In-Reply-To: <9716E0B3-B86C-4CFF-8636-6DE4BAA0D716@mac.com>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Thanks for your response.

Please note that PAGE_SIZE is not a constant but an alias for libc.page_size, which is a variable of type size_t. That's why gcc doesn't generate a shift, even at -O1 and above.

I also created a patch to add libc.page_shift, but as far as I can see no other functions would benefit from it, since there are no other divisions there (only negations, additions and subtractions).

And yes, I agree, a_ctz_l is not exactly inexpensive, but I guess it is better than a full 64-bit division (that's why I cast to unsigned; otherwise the right shift is not trivial due to the sign).

Thanks!
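For reference, a minimal C sketch of the point under discussion, assuming GCC or Clang: for an unsigned 64-bit byte count and a nonzero power-of-two page size, dividing and shifting right by the trailing-zero count of the page size give the same result. __builtin_ctzl is a GCC/Clang builtin used here only as a stand-in for musl's internal a_ctz_l, and all function names are hypothetical. The signed helpers show why the shift is only a drop-in replacement for unsigned operands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: page count via a real division. When page_size is
   not a compile-time constant, 32-bit targets without a 64-bit divider
   lower this to a libgcc helper call (such as __udivdi3). */
static uint64_t pages_by_div(uint64_t bytes, unsigned long page_size)
{
    return bytes / page_size;
}

/* Hypothetical helper: same result with a right shift. Only valid when
   page_size is a nonzero power of two. __builtin_ctzl (GCC/Clang) stands
   in for musl's internal a_ctz_l. */
static uint64_t pages_by_shift(uint64_t bytes, unsigned long page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return bytes >> __builtin_ctzl(page_size);
}

/* Signed operands differ for negative values: C division truncates toward
   zero, while >> on a negative value is implementation-defined (GCC and
   Clang use an arithmetic shift, which rounds toward negative infinity). */
static int64_t signed_div(int64_t x, int64_t d) { return x / d; }
static int64_t signed_shift(int64_t x, int s) { return x >> s; }
```

With a constant divisor such as 4096UL the compiler performs this reduction on its own; the explicit shift only matters when the page size is a runtime variable, as with libc.page_size.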
David

On 26/11/17 00:15, Michael Clark wrote:
> At -O0 and above, clang and gcc strength-reduce division by a constant power of two into a right shift (arithmetic or logical, depending on the signedness of the types).
>
> - https://cx.rv8.io/g/kDrEkB
>
> a_ctz_l is not exactly inexpensive, given it has a multiply, an AND, a negate, a shift and a table load (possible cache miss).
>
> We'd be better off defining PAGE_SHIFT if we want to be certain the code uses a shift when optimisation is disabled; however, I trust the compilers to turn the division into a shift.
>
> #ifndef a_ctz_l
> #define a_ctz_l a_ctz_l
> static inline int a_ctz_l(unsigned long x)
> {
> 	static const char debruijn32[32] = {
> 		0, 1, 23, 2, 29, 24, 19, 3, 30, 27, 25, 11, 20, 8, 4, 13,
> 		31, 22, 28, 18, 26, 10, 7, 12, 21, 17, 9, 6, 16, 5, 15, 14
> 	};
> 	if (sizeof(long) == 8) return a_ctz_64(x);
> 	return debruijn32[(x&-x)*0x076be629 >> 27];
> }
> #endif
>
> If you study the codegen, this might be a better change (applied to all other archs as well).
>
> $ git diff arch/x86_64/bits/limits.h
> diff --git a/arch/x86_64/bits/limits.h b/arch/x86_64/bits/limits.h
> index 792a30b..32f29bf 100644
> --- a/arch/x86_64/bits/limits.h
> +++ b/arch/x86_64/bits/limits.h
> @@ -1,6 +1,6 @@
>  #if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
>   || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> -#define PAGE_SIZE 4096
> +#define PAGE_SIZE 4096UL
>  #define LONG_BIT 64
>  #endif
>
> Try removing the UL suffix from the constant in the compiler explorer example above and see the change in codegen.
>
>> On 26/11/2017, at 9:52 AM, David Guillen Fandos wrote:
>>
>> Hey there,
>>
>> I just noticed that my binary was pulling in some gcc integer-division helper functions, called from musl. I checked, and it seems that even though musl assumes PAGE_SIZE is always a power of two, we divide by it instead of using a shift. This results in extra overhead and slow division on platforms that do not have a 64-bit divider (even those that do have a 32-bit divider).
>>
>> So I propose the patch below; let me know what you think.
>>
>> David
>>
>>
>> diff --git a/src/conf/sysconf.c b/src/conf/sysconf.c
>> index b8b761d0..aa9fc9d1 100644
>> --- a/src/conf/sysconf.c
>> +++ b/src/conf/sysconf.c
>> @@ -4,6 +4,7 @@ long sysconf(int name)
>>  #include 
>>  #include "syscall.h"
>>  #include "libc.h"
>> +#include "atomic.h"
>>
>>  #define JT(x) (-256|(x))
>>  #define VER JT(1)
>> @@ -206,7 +206,7 @@ long sysconf(int name)
>>  		if (name==_SC_PHYS_PAGES) mem = si.totalram;
>>  		else mem = si.freeram + si.bufferram;
>>  		mem *= si.mem_unit;
>> -		mem /= PAGE_SIZE;
>> +		mem >>= (unsigned)(a_ctz_l(PAGE_SIZE));
>>  		return (mem > LONG_MAX) ? LONG_MAX : mem;
>>  	case JT_ZERO & 255:
>>  		return 0;
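The a_ctz_l fallback quoted above relies on a de Bruijn multiply. A standalone 32-bit sketch of the same technique, using the same constant and table, might look like this; the function name ctz32_debruijn is hypothetical and not part of musl.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone version of the 32-bit a_ctz_l fallback:
   count trailing zero bits of a nonzero value with a de Bruijn multiply
   and a 32-entry lookup table. */
static int ctz32_debruijn(uint32_t x)
{
    static const unsigned char debruijn32[32] = {
        0, 1, 23, 2, 29, 24, 19, 3, 30, 27, 25, 11, 20, 8, 4, 13,
        31, 22, 28, 18, 26, 10, 7, 12, 21, 17, 9, 6, 16, 5, 15, 14
    };
    assert(x != 0); /* ctz(0) is undefined */

    /* x & -x isolates the lowest set bit, i.e. 2^k. */
    uint32_t lowest = x & (0u - x);

    /* Multiplying the de Bruijn constant by 2^k shifts it left by k
       (mod 2^32); the top 5 bits of the product differ for every k,
       so they index the table that maps them back to k. */
    uint32_t product = lowest * UINT32_C(0x076be629);
    return debruijn32[product >> 27];
}
```

This is where the cost Michael lists comes from: one negate, one AND, one multiply, one shift and one table load. On targets where long is 64 bits, the quoted a_ctz_l defers to a_ctz_64 instead, so the table path only applies to 32-bit longs.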