From: Jaydeep Patil
To: Rich Felker
Cc: musl@lists.openwall.com
Newsgroups: gmane.linux.lib.musl.general
Subject: RE: [PATCH] Fix atomic_arch.h for MIPS32 R6
Date: Wed, 30 Mar 2016 09:45:59 +0000
>-----Original Message-----
>From: Rich Felker [mailto:dalias@aerifal.cx] On Behalf Of Rich Felker
>Sent: 29 March 2016 PM 07:03
>To: Jaydeep Patil
>Cc: musl@lists.openwall.com
>Subject: Re: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
>
>On Tue, Mar 29, 2016 at 07:16:46AM +0000, Jaydeep Patil wrote:
>> >-----Original Message-----
>> >From: Rich Felker [mailto:dalias@aerifal.cx] On Behalf Of Rich Felker
>> >Sent: 29 March 2016 AM 09:41
>> >To: Jaydeep Patil
>> >Cc: musl@lists.openwall.com
>> >Subject: Re: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
>> >
>> >On Tue, Mar 29, 2016 at 03:54:02AM +0000, Jaydeep Patil wrote:
>> >> >-----Original Message-----
>> >> >From: Rich Felker [mailto:dalias@aerifal.cx] On Behalf Of Rich Felker
>> >> >Sent: 28 March 2016 PM 06:35
>> >> >To: Jaydeep Patil
>> >> >Cc: musl@lists.openwall.com
>> >> >Subject: Re: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
>> >> >
>> >> >On Mon, Mar 28, 2016 at 05:07:39AM +0000, Jaydeep Patil wrote:
>> >> >> >> >I was just saying it makes the code less cluttered to use
>> >> >> >> >them spuriously even though we don't need to:
>> >> >> >> >
>> >> >> >> >	".set push ; "
>> >> >> >> >#if __mips_isa_rev < 6
>> >> >> >> >	".set mips2 ; "
>> >> >> >> >#endif
>> >> >> >> >	"ll %0, %1 ; .set pop"
>> >> >> >> >
>> >> >> >> >or similar.
>> >> >> >> >
>> >> >> >> >It's also not clear to me whether the "m" constraint is
>> >> >> >> >valid anymore for the R6 ll/sc instructions since they take
>> >> >> >> >a 9-bit offset now instead of a 16-bit offset.
>> >> >> >> >The compiler could generate an address expression whose
>> >> >> >> >offset part does not fit in 9 bits. In that case we may need
>> >> >> >> >to #if the whole function (or at least the __asm__
>> >> >> >> >statement) separately rather than just skipping the .set mips2....
>> >> >> >> >
>> >> >> >>
>> >> >> >> The "m" constraint is still valid here, as the offset will be 0 in this case.
>> >> >> >
>> >> >> >How can you assume the offset will be 0? It's the compiler's
>> >> >> >choice what to use. For instance, a_cas(&foo->bar, t, s) is
>> >> >> >likely to have an offset equal to
>> >> >> >offsetof(__typeof__(foo),bar). AFAIK this happens in practice
>> >> >> >with small offsets in mutex structures, etc. so the bug may be
>> >> >> >unlikely to be hit, but I think it's still an incorrect-constraint bug.
>> >> >>
>> >> >> The compiler generates the appropriate LL/SC based on the offset.
>> >> >> The compiler adds the offset to the base register if it does not fit in 9 bits.
>> >> >
>> >> >The compiler has no way of knowing that the operand will be used
>> >> >with ll with the 9-bit offset restriction; as far as it knows, it
>> >> >will be used in a normal context where a 16-bit offset is valid. I
>> >> >don't have a toolchain that will target r6, but you can try the
>> >> >following program which produces an offset of 4096 for loading p[1024]:
>> >> >
>> >> >unsigned ll1k(volatile unsigned *p) {
>> >> >	unsigned val;
>> >> >	__asm__ __volatile__ ("ll %0, %1" : "=r"(val) : "m"(p[1024]) : "memory" );
>> >> >	return val;
>> >> >}
>> >> >
>> >> >I would expect this to produce errors at assembly time on r6.
>> >> >Rich
>> >>
>> >> This is what the compiler generated for the above function:
>> >>
>> >> $ gcc -c -o main.o main.c -O3 -mips32r6 -mabi=32
>> >>
>> >> Objdump:
>> >>
>> >> 00000000 :
>> >>    0:   24821000        addiu   v0,a0,4096
>> >>    4:   7c420036        ll      v0,0(v0)
>> >>    8:   d81f0000        jrc     ra
>> >>    c:   00000000        nop
>> >
>> >Can you try gcc -S instead of -c (still at -O3) to produce asm output
>> >without assembling it?
>>
>> Generated assembly:
>>
>> #APP
>> # 4 "test.c" 1
>> 	ll $2, 4096($4)
>> # 0 "" 2
>> #NO_APP
>> 	jrc $31
>>
>> Even if we set "noreorder" before LL, the assembler generates addiu+ll:
>>
>> 00000000 :
>>    0:   24821000        addiu   v0,a0,4096
>>    4:   7c420036        ll      v0,0(v0)
>>    8:   d81f0000        jrc     ra
>>    c:   00000000        nop
>
>I see. I suspected the assembler was doing it. "noat", not "noreorder", is the
>way to suppress things like this but I doubt even "noat" does it since a
>separate temp register ("at") is not needed in this case.
>
>If all assemblers that support R6 support this rewriting, then the ZC constraint
>in gcc is really just an optimization, not strictly necessary. We should probably
>check (1) whether clang's internal assembler can do the rewriting, and (2)
>whether clang supports the ZC constraint. I would prefer using ZC but I want
>to do whatever is more compatible; I don't think the codegen efficiency
>matters a lot either way.
>Rich

Clang's integrated assembler does not support this rewriting. However, ZC is
supported. I have modified both atomic_arch.h and pthread_arch.h to reflect this.

Please refer to https://github.com/JaydeepIMG/musl-1/tree/fix_inline_asm_for_R6
for the patch (also listed below). I have also added R6 as a subarch.
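The disagreement above comes down to immediate ranges: pre-R6 ll/sc take a signed 16-bit offset, while the R6 encodings take a signed 9-bit one (-256..255). The 4096($4) operand gcc emitted for p[1024] therefore only assembles because GNU as rewrites it, as the objdump listing shows. Below is a rough C model of that decision; it is not binutils code, and the helper names fits_simm9 and expand_r6_ll are hypothetical. It also illustrates Rich's remark that no "at" register is needed for ll: the destination register can hold the address because it is overwritten anyway.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if off fits the signed 9-bit offset
 * field of the R6 ll/sc encodings, i.e. -256 <= off <= 255. */
int fits_simm9(long off)
{
	return off >= -256 && off <= 255;
}

/* Hypothetical model (not binutils code) of the rewrite visible in the
 * objdump listing: an out-of-range offset is folded into the address
 * with addiu, written to the destination register, and ll then uses a
 * zero offset. Reusing rt is only safe for ll, whose destination is
 * overwritten anyway; sc reads rt as the value to store, so the same
 * trick there would need a separate scratch register.
 * Returns the snprintf result, or -1 for offsets outside addiu's own
 * signed 16-bit range, which this sketch does not handle. */
int expand_r6_ll(char *buf, size_t len, const char *rt,
                 const char *base, long off)
{
	if (fits_simm9(off))
		return snprintf(buf, len, "ll %s,%ld(%s)", rt, off, base);
	if (off < -32768 || off > 32767)
		return -1;
	return snprintf(buf, len, "addiu %s,%s,%ld; ll %s,0(%s)",
	                rt, base, off, rt, rt);
}
```

With the values from the thread, expand_r6_ll(buf, sizeof buf, "v0", "a0", 4096) writes "addiu v0,a0,4096; ll v0,0(v0)", matching the listing, while a small struct-member offset such as 8 stays a single "ll v0,8(a0)". An assembler without this rewriting would instead reject the 4096 offset, which is the case the ZC constraint avoids by making the compiler pick an in-range address.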
From 20054ee55643d9e81163ca58ac63cc38b5080969 Mon Sep 17 00:00:00 2001
From: Jaydeep Patil
Date: Wed, 30 Mar 2016 10:37:30 +0100
Subject: [PATCH] [MIPS] Update inline asm for R6 and add R6 as subtarget

---
 arch/mips/atomic_arch.h    | 17 +++--------------
 arch/mips/pthread_arch.h   |  8 +-------
 arch/mips64/atomic_arch.h  | 12 +++++-------
 arch/mips64/pthread_arch.h |  7 +------
 configure                  |  2 ++
 5 files changed, 12 insertions(+), 34 deletions(-)

diff --git a/arch/mips/atomic_arch.h b/arch/mips/atomic_arch.h
index ce2823b..4dbe4bb 100644
--- a/arch/mips/atomic_arch.h
+++ b/arch/mips/atomic_arch.h
@@ -3,10 +3,8 @@ static inline int a_ll(volatile int *p)
 {
 	int v;
 	__asm__ __volatile__ (
-		".set push ; .set mips2\n\t"
 		"ll %0, %1"
-		"\n\t.set pop"
-		: "=r"(v) : "m"(*p));
+		: "=r"(v) : "ZC"(*p));
 	return v;
 }
 
@@ -15,24 +13,15 @@ static inline int a_sc(volatile int *p, int v)
 {
 	int r;
 	__asm__ __volatile__ (
-		".set push ; .set mips2\n\t"
 		"sc %0, %1"
-		"\n\t.set pop"
-		: "=r"(r), "=m"(*p) : "0"(v) : "memory");
+		: "=r"(r), "=ZC"(*p) : "0"(v) : "memory");
 	return r;
 }
 
 #define a_barrier a_barrier
 static inline void a_barrier()
 {
-	/* mips2 sync, but using too many directives causes
-	 * gcc not to inline it, so encode with .long instead. */
-	__asm__ __volatile__ (".long 0xf" : : : "memory");
-#if 0
-	__asm__ __volatile__ (
-		".set push ; .set mips2 ; sync ; .set pop"
-		: : : "memory");
-#endif
+	__asm__ __volatile__ ("sync" : : : "memory");
 }
 
 #define a_pre_llsc a_barrier
diff --git a/arch/mips/pthread_arch.h b/arch/mips/pthread_arch.h
index 8a49965..d8b6955 100644
--- a/arch/mips/pthread_arch.h
+++ b/arch/mips/pthread_arch.h
@@ -1,13 +1,7 @@
 static inline struct pthread *__pthread_self()
 {
-#ifdef __clang__
-	char *tp;
-	__asm__ __volatile__ (".word 0x7c03e83b ; move %0, $3" : "=r" (tp) : : "$3" );
-#else
 	register char *tp __asm__("$3");
-	/* rdhwr $3,$29 */
-	__asm__ __volatile__ (".word 0x7c03e83b" : "=r" (tp) );
-#endif
+	__asm__ __volatile__ ("rdhwr %0,$29" : "=r" (tp));
 	return (pthread_t)(tp - 0x7000 - sizeof(struct pthread));
 }
 
diff --git a/arch/mips64/atomic_arch.h b/arch/mips64/atomic_arch.h
index b468fd9..ac92891 100644
--- a/arch/mips64/atomic_arch.h
+++ b/arch/mips64/atomic_arch.h
@@ -4,7 +4,7 @@ static inline int a_ll(volatile int *p)
 	int v;
 	__asm__ __volatile__ (
 		"ll %0, %1"
-		: "=r"(v) : "m"(*p));
+		: "=r"(v) : "ZC"(*p));
 	return v;
 }
 
@@ -14,7 +14,7 @@ static inline int a_sc(volatile int *p, int v)
 	int r;
 	__asm__ __volatile__ (
 		"sc %0, %1"
-		: "=r"(r), "=m"(*p) : "0"(v) : "memory");
+		: "=r"(r), "=ZC"(*p) : "0"(v) : "memory");
 	return r;
 }
 
@@ -24,7 +24,7 @@ static inline void *a_ll_p(volatile void *p)
 	void *v;
 	__asm__ __volatile__ (
 		"lld %0, %1"
-		: "=r"(v) : "m"(*(void *volatile *)p));
+		: "=r"(v) : "ZC"(*(void *volatile *)p));
 	return v;
 }
 
@@ -34,16 +34,14 @@ static inline int a_sc_p(volatile void *p, void *v)
 	long r;
 	__asm__ __volatile__ (
 		"scd %0, %1"
-		: "=r"(r), "=m"(*(void *volatile *)p) : "0"(v) : "memory");
+		: "=r"(r), "=ZC"(*(void *volatile *)p) : "0"(v) : "memory");
 	return r;
 }
 
 #define a_barrier a_barrier
 static inline void a_barrier()
 {
-	/* mips2 sync, but using too many directives causes
-	 * gcc not to inline it, so encode with .long instead. */
-	__asm__ __volatile__ (".long 0xf" : : : "memory");
+	__asm__ __volatile__ ("sync" : : : "memory");
 }
 
 #define a_pre_llsc a_barrier
diff --git a/arch/mips64/pthread_arch.h b/arch/mips64/pthread_arch.h
index b42edbe..d8b6955 100644
--- a/arch/mips64/pthread_arch.h
+++ b/arch/mips64/pthread_arch.h
@@ -1,12 +1,7 @@
 static inline struct pthread *__pthread_self()
 {
-#ifdef __clang__
-	char *tp;
-	__asm__ __volatile__ (".word 0x7c03e83b ; move %0, $3" : "=r" (tp) : : "$3" );
-#else
 	register char *tp __asm__("$3");
-	__asm__ __volatile__ (".word 0x7c03e83b" : "=r" (tp) );
-#endif
+	__asm__ __volatile__ ("rdhwr %0,$29" : "=r" (tp));
 	return (pthread_t)(tp - 0x7000 - sizeof(struct pthread));
 }
 
diff --git a/configure b/configure
index 213a825..969671d 100755
--- a/configure
+++ b/configure
@@ -612,11 +612,13 @@ trycppif __AARCH64EB__ "$t" && SUBARCH=${SUBARCH}_be
 fi
 
 if test "$ARCH" = "mips" ; then
+trycppif "__mips_isa_rev >= 6" "$t" && SUBARCH=${SUBARCH}r6
 trycppif "_MIPSEL || __MIPSEL || __MIPSEL__" "$t" && SUBARCH=${SUBARCH}el
 trycppif __mips_soft_float "$t" && SUBARCH=${SUBARCH}-sf
 fi
 
 if test "$ARCH" = "mips64" ; then
+trycppif "__mips_isa_rev >= 6" "$t" && SUBARCH=${SUBARCH}r6
 trycppif "_MIPSEL || __MIPSEL || __MIPSEL__" "$t" && SUBARCH=${SUBARCH}el
 trycppif __mips_soft_float "$t" && SUBARCH=${SUBARCH}-sf
 fi
-- 
2.1.4

Thanks,
Jaydeep
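The configure hunk only appends suffixes, so the resulting subarch string depends on the order of the trycppif checks: the new r6 test runs before the endianness and soft-float tests. A minimal sketch of that ordering in C, assuming the suffix ends up appended to the architecture name as the SUBARCH=${SUBARCH}... assignments suggest; mips_subarch_name and its integer flags are hypothetical stand-ins for the preprocessor tests configure runs against the compiler.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the order of the trycppif checks in the
 * configure hunk: "r6" when __mips_isa_rev >= 6, then "el" for
 * little-endian, then "-sf" for soft-float. The int flags stand in for
 * the preprocessor tests configure actually runs.
 * Returns the snprintf result. */
int mips_subarch_name(char *buf, size_t len, const char *arch,
                      int isa_rev, int little_endian, int soft_float)
{
	return snprintf(buf, len, "%s%s%s%s", arch,
	                isa_rev >= 6 ? "r6" : "",
	                little_endian ? "el" : "",
	                soft_float ? "-sf" : "");
}
```

For a little-endian, soft-float R6 target this yields "mipsr6el-sf", while a pre-R6 big-endian hard-float mips64 target keeps the plain "mips64"; comparing the buffer with strcmp is enough to check either case.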