From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/9778
Path: news.gmane.org!not-for-mail
From: Jaydeep Patil <Jaydeep.Patil@imgtec.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: RE: [PATCH] Fix atomic_arch.h for MIPS32 R6
Date: Wed, 30 Mar 2016 09:45:59 +0000
Message-ID: <BD7773622145634B952E5B54ACA8E349AA24C18C@PUMAIL01.pu.imgtec.org>
References: <20160321173754.GC21636@brightrain.aerifal.cx>
 <BD7773622145634B952E5B54ACA8E349AA24AFEB@PUMAIL01.pu.imgtec.org>
 <20160322212211.GG21636@brightrain.aerifal.cx>
 <BD7773622145634B952E5B54ACA8E349AA24B25D@PUMAIL01.pu.imgtec.org>
 <20160323150302.GK21636@brightrain.aerifal.cx>
 <BD7773622145634B952E5B54ACA8E349AA24BAFA@PUMAIL01.pu.imgtec.org>
 <20160328130451.GH21636@brightrain.aerifal.cx>
 <BD7773622145634B952E5B54ACA8E349AA24BD1A@PUMAIL01.pu.imgtec.org>
 <20160329041055.GL21636@brightrain.aerifal.cx>
 <BD7773622145634B952E5B54ACA8E349AA24BE1D@PUMAIL01.pu.imgtec.org>
 <20160329133254.GM21636@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1459331181 12815 80.91.229.3 (30 Mar 2016 09:46:21 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Wed, 30 Mar 2016 09:46:21 +0000 (UTC)
Cc: "musl@lists.openwall.com" <musl@lists.openwall.com>
To: Rich Felker <dalias@libc.org>
Original-X-From: musl-return-9791-gllmg-musl=m.gmane.org@lists.openwall.com Wed Mar 30 11:46:20 2016
Return-path: <musl-return-9791-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-9791-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1alChf-0004Tn-By
	for gllmg-musl@m.gmane.org; Wed, 30 Mar 2016 11:46:19 +0200
Original-Received: (qmail 19818 invoked by uid 550); 30 Mar 2016 09:46:16 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 19800 invoked from network); 30 Mar 2016 09:46:15 -0000
Thread-Topic: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
Thread-Index: AQHRg5hzySIsGLLUbES01dme3Dr6SZ9k53DggAC29oCAANgNAIAAUFkAgAeMHPCAAC6HgIABVBQA//+pFACAAI87AIAADcgAgAFZ7RA=
In-Reply-To: <20160329133254.GM21636@brightrain.aerifal.cx>
Accept-Language: en-IN, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [192.168.93.60]
Xref: news.gmane.org gmane.linux.lib.musl.general:9778
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/9778>

>-----Original Message-----
>From: Rich Felker [mailto:dalias@aerifal.cx] On Behalf Of Rich Felker
>Sent: 29 March 2016 PM 07:03
>To: Jaydeep Patil
>Cc: musl@lists.openwall.com
>Subject: Re: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
>
>On Tue, Mar 29, 2016 at 07:16:46AM +0000, Jaydeep Patil wrote:
>> >-----Original Message-----
>> >From: Rich Felker [mailto:dalias@aerifal.cx] On Behalf Of Rich Felker
>> >Sent: 29 March 2016 AM 09:41
>> >To: Jaydeep Patil
>> >Cc: musl@lists.openwall.com
>> >Subject: Re: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
>> >
>> >On Tue, Mar 29, 2016 at 03:54:02AM +0000, Jaydeep Patil wrote:
>> >> >-----Original Message-----
>> >> >From: Rich Felker [mailto:dalias@aerifal.cx] On Behalf Of Rich
>> >> >Felker
>> >> >Sent: 28 March 2016 PM 06:35
>> >> >To: Jaydeep Patil
>> >> >Cc: musl@lists.openwall.com
>> >> >Subject: Re: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
>> >> >
>> >> >On Mon, Mar 28, 2016 at 05:07:39AM +0000, Jaydeep Patil wrote:
>> >> >> >> >I was just saying it makes the code less cluttered to use
>> >> >> >> >them spuriously even though we don't need to:
>> >> >> >> >
>> >> >> >> >		".set push ; "
>> >> >> >> >#if __mips_isa_rev < 6
>> >> >> >> >		".set mips2 ; "
>> >> >> >> >#endif
>> >> >> >> >		"ll %0, %1 ; .set pop"
>> >> >> >> >
>> >> >> >> >or similar.
>> >> >> >> >
>> >> >> >> >It's also not clear to me whether the "m" constraint is
>> >> >> >> >valid anymore for the R6 ll/sc instructions since they take
>> >> >> >> >a 9-bit offset now instead of a
>> >> >> >16-bit offset.
>> >> >> >> >The compiler could generate an address expression whose
>> >> >> >> >offset part does not fit in 9 bits. In that case we may need
>> >> >> >> >to #if the whole function (or at least the __asm__
>> >> >> >> >statement) separately rather than just
>> >> >> >skipping the .set mips2....
>> >> >> >> >
>> >> >> >>
>> >> >> >> The "m" constrain is still valid here, as the offset will be 0 =
in this
>case..
>> >> >> >
>> >> >> >How can you assume the offset will be 0? It's the compiler's
>> >> >> >choice what to use. For instance, a_cas(&foo->bar, t, s) is
>> >> >> >likely to have an offset equal to
>> >> >> >offsetof(__typeof__(foo),bar). AFAIK this happens in practice
>> >> >> >with small offsets in mutex structures, etc. so the bug may be
>> >> >> >unlikely to be hit, but I think it's still an incorrect-
>> >constraint bug.
>> >> >>
>> >> >> Compiler generates appropriate LL/SC based on the offset.
>> >> >> Compiler adds the offset to the base register if it does not fit 9=
bits.
>> >> >
>> >> >The compiler has no way of knowing that the operand will be used
>> >> >with ll with the 9-bit offset restriction; as far as it knows, it
>> >> >will be used in a normal context where a 16-bit offset is valid. I
>> >> >don't have a toolchain that will target r6, but you can try the
>> >> >following program which produces an offset of 4096 for loading p[102=
4]:
>> >> >
>> >> >unsigned ll1k(volatile unsigned *p) {
>> >> >	unsigned val;
>> >> >	__asm__ __volatile__ ("ll %0, %1" : "=3Dr"(val) : "m"(p[1024]) :
>> >> >"memory" );
>> >> >	return val;
>> >> >}
>> >> >
>> >> >I would expect this to produce errors at assembly time on r6.
>> >> >Rich
>> >>
>> >> This is what compiler has generated for above function:
>> >>
>> >> $ gcc -c -o main.o main.c -O3 -mips32r6 -mabi=3D32
>> >>
>> >> Objdump:
>> >>
>> >> 00000000 <ll1k>:
>> >>    0:   24821000        addiu   v0,a0,4096
>> >>    4:   7c420036        ll      v0,0(v0)
>> >>    8:   d81f0000        jrc     ra
>> >>    c:   00000000        nop
>> >
>> >Can you try gcc -S instead of -c (still at -O3) to produce asm output
>> >without assembling it?
>>
>> Generated asssembly:
>>
>> #APP
>>  # 4 "test.c" 1
>>         ll $2, 4096($4)
>>  # 0 "" 2
>> #NO_APP
>>         jrc     $31
>>
>> Even if we set "noreorder" before LL, assembler generates addiu+ll:
>>
>> 00000000 <ll1k>:
>>    0:   24821000        addiu   v0,a0,4096
>>    4:   7c420036        ll      v0,0(v0)
>>    8:   d81f0000        jrc     ra
>>    c:   00000000        nop
>
>I see. I suspected the assembler was doing it. "noat", not "noreorder", is=
 the
>way to suppress things like this but I doubt even "noat" does it since a
>separate temp register ("at") is not needed in this case.
>
>If all assembers that support R6 support this rewriting, then the ZC const=
raint
>in gcc is really just an optimization, not strictly necessary. We should p=
robably
>check (1) whether clang's internal assembler can do the rewriting, and (2)
>whether clang supports the ZC constraint. I would prefer using ZC but I wa=
nt
>to do whatever is more compatible; I don't think the codegen efficiency
>matters a lot either way.
>Rich

Clang's integrated assembler does not support this rewriting. However, ZC is supported.
I have modified both atomic_arch.h and pthread_arch.h to reflect this.
Please refer to https://github.com/JaydeepIMG/musl-1/tree/fix_inline_asm_for_R6 for the patch (also listed below).
I have also added R6 as a subarch.
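
As a rough guide to the resulting names, the sketch below mirrors the
order of the trycppif checks in the configure hunk: "r6" first, then "el",
then "-sf". The mips_subarch helper and its parameters are hypothetical
and only model the string concatenation; configure itself decides each
flag by preprocessing with the target compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the SUBARCH suffix built by the patched
 * configure for ARCH=mips or ARCH=mips64. */
static char *mips_subarch(char *buf, size_t n,
	int isa_rev, int little_endian, int soft_float)
{
	assert(buf && n > 0);
	snprintf(buf, n, "%s%s%s",
		isa_rev >= 6 ? "r6" : "",     /* __mips_isa_rev >= 6 */
		little_endian ? "el" : "",    /* _MIPSEL and friends */
		soft_float ? "-sf" : "");     /* __mips_soft_float */
	return buf;
}
```

For example, a little-endian hard-float R6 target would come out as
"r6el" under this model.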


>From 20054ee55643d9e81163ca58ac63cc38b5080969 Mon Sep 17 00:00:00 2001
From: Jaydeep Patil <jaydeep.patil@imgtec.com>
Date: Wed, 30 Mar 2016 10:37:30 +0100
Subject: [PATCH] [MIPS] Update inline asm for R6 and add R6 as subtarget

---
 arch/mips/atomic_arch.h    | 17 +++--------------
 arch/mips/pthread_arch.h   |  8 +-------
 arch/mips64/atomic_arch.h  | 12 +++++-------
 arch/mips64/pthread_arch.h |  7 +------
 configure                  |  2 ++
 5 files changed, 12 insertions(+), 34 deletions(-)

diff --git a/arch/mips/atomic_arch.h b/arch/mips/atomic_arch.h
index ce2823b..4dbe4bb 100644
--- a/arch/mips/atomic_arch.h
+++ b/arch/mips/atomic_arch.h
@@ -3,10 +3,8 @@ static inline int a_ll(volatile int *p)
 {
 	int v;
 	__asm__ __volatile__ (
-		".set push ; .set mips2\n\t"
 		"ll %0, %1"
-		"\n\t.set pop"
-		: "=3Dr"(v) : "m"(*p));
+		: "=3Dr"(v) : "ZC"(*p));
 	return v;
 }
=20
@@ -15,24 +13,15 @@ static inline int a_sc(volatile int *p, int v)
 {
 	int r;
 	__asm__ __volatile__ (
-		".set push ; .set mips2\n\t"
 		"sc %0, %1"
-		"\n\t.set pop"
-		: "=3Dr"(r), "=3Dm"(*p) : "0"(v) : "memory");
+		: "=3Dr"(r), "=3DZC"(*p) : "0"(v) : "memory");
 	return r;
 }
=20
 #define a_barrier a_barrier
 static inline void a_barrier()
 {
-	/* mips2 sync, but using too many directives causes
-	 * gcc not to inline it, so encode with .long instead. */
-	__asm__ __volatile__ (".long 0xf" : : : "memory");
-#if 0
-	__asm__ __volatile__ (
-		".set push ; .set mips2 ; sync ; .set pop"
-		: : : "memory");
-#endif
+	__asm__ __volatile__ ("sync" : : : "memory");
 }
=20
 #define a_pre_llsc a_barrier
diff --git a/arch/mips/pthread_arch.h b/arch/mips/pthread_arch.h
index 8a49965..d8b6955 100644
--- a/arch/mips/pthread_arch.h
+++ b/arch/mips/pthread_arch.h
@@ -1,13 +1,7 @@
 static inline struct pthread *__pthread_self()
 {
-#ifdef __clang__
-	char *tp;
-	__asm__ __volatile__ (".word 0x7c03e83b ; move %0, $3" : "=3Dr" (tp) : : =
"$3" );
-#else
 	register char *tp __asm__("$3");
-	/* rdhwr $3,$29 */
-	__asm__ __volatile__ (".word 0x7c03e83b" : "=3Dr" (tp) );
-#endif
+	__asm__ __volatile__ ("rdhwr %0,$29" : "=3Dr" (tp));
 	return (pthread_t)(tp - 0x7000 - sizeof(struct pthread));
 }
=20
diff --git a/arch/mips64/atomic_arch.h b/arch/mips64/atomic_arch.h
index b468fd9..ac92891 100644
--- a/arch/mips64/atomic_arch.h
+++ b/arch/mips64/atomic_arch.h
@@ -4,7 +4,7 @@ static inline int a_ll(volatile int *p)
 	int v;
 	__asm__ __volatile__ (
 		"ll %0, %1"
-		: "=3Dr"(v) : "m"(*p));
+		: "=3Dr"(v) : "ZC"(*p));
 	return v;
 }
=20
@@ -14,7 +14,7 @@ static inline int a_sc(volatile int *p, int v)
 	int r;
 	__asm__ __volatile__ (
 		"sc %0, %1"
-		: "=3Dr"(r), "=3Dm"(*p) : "0"(v) : "memory");
+		: "=3Dr"(r), "=3DZC"(*p) : "0"(v) : "memory");
 	return r;
 }
=20
@@ -24,7 +24,7 @@ static inline void *a_ll_p(volatile void *p)
 	void *v;
 	__asm__ __volatile__ (
 		"lld %0, %1"
-		: "=3Dr"(v) : "m"(*(void *volatile *)p));
+		: "=3Dr"(v) : "ZC"(*(void *volatile *)p));
 	return v;
 }
=20
@@ -34,16 +34,14 @@ static inline int a_sc_p(volatile void *p, void *v)
 	long r;
 	__asm__ __volatile__ (
 		"scd %0, %1"
-		: "=3Dr"(r), "=3Dm"(*(void *volatile *)p) : "0"(v) : "memory");
+		: "=3Dr"(r), "=3DZC"(*(void *volatile *)p) : "0"(v) : "memory");
 	return r;
 }
=20
 #define a_barrier a_barrier
 static inline void a_barrier()
 {
-	/* mips2 sync, but using too many directives causes
-	 * gcc not to inline it, so encode with .long instead. */
-	__asm__ __volatile__ (".long 0xf" : : : "memory");
+	__asm__ __volatile__ ("sync" : : : "memory");
 }
=20
 #define a_pre_llsc a_barrier
diff --git a/arch/mips64/pthread_arch.h b/arch/mips64/pthread_arch.h
index b42edbe..d8b6955 100644
--- a/arch/mips64/pthread_arch.h
+++ b/arch/mips64/pthread_arch.h
@@ -1,12 +1,7 @@
 static inline struct pthread *__pthread_self()
 {
-#ifdef __clang__
-	char *tp;
-	__asm__ __volatile__ (".word 0x7c03e83b ; move %0, $3" : "=3Dr" (tp) : : =
"$3" );
-#else
 	register char *tp __asm__("$3");
-	__asm__ __volatile__ (".word 0x7c03e83b" : "=3Dr" (tp) );
-#endif
+	__asm__ __volatile__ ("rdhwr %0,$29" : "=3Dr" (tp));
 	return (pthread_t)(tp - 0x7000 - sizeof(struct pthread));
 }
=20
diff --git a/configure b/configure
index 213a825..969671d 100755
--- a/configure
+++ b/configure
@@ -612,11 +612,13 @@ trycppif __AARCH64EB__ "$t" && SUBARCH=3D${SUBARCH}_b=
e
 fi
=20
 if test "$ARCH" =3D "mips" ; then
+trycppif "__mips_isa_rev >=3D 6" "$t" && SUBARCH=3D${SUBARCH}r6
 trycppif "_MIPSEL || __MIPSEL || __MIPSEL__" "$t" && SUBARCH=3D${SUBARCH}e=
l
 trycppif __mips_soft_float "$t" && SUBARCH=3D${SUBARCH}-sf
 fi
=20
 if test "$ARCH" =3D "mips64" ; then
+trycppif "__mips_isa_rev >=3D 6" "$t" && SUBARCH=3D${SUBARCH}r6
 trycppif "_MIPSEL || __MIPSEL || __MIPSEL__" "$t" && SUBARCH=3D${SUBARCH}e=
l
 trycppif __mips_soft_float "$t" && SUBARCH=3D${SUBARCH}-sf
 fi
--=20
2.1.4


Thanks,
Jaydeep