mailing list of musl libc
 help / color / mirror / code / Atom feed
* build musl for armv7m
@ 2016-06-14  8:49 weimingz
  2016-06-14 13:00 ` Rich Felker
  2016-06-14 14:38 ` Rich Felker
  0 siblings, 2 replies; 21+ messages in thread
From: weimingz @ 2016-06-14  8:49 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 385 bytes --]

Hi,

I'm building MUSL with -mcpu=cortex-m3. There are a few .s files that 
cannot be assembled because: (1) use predicated instructions without IT 
instr (2) use sp inside reg list in ldmia/stmia.

Please help to review the attached patch.

Also, is there any easy way of disabling string/arm/memcpy_le.S ? For 
baremetal, unaligned access may be unavailable.

Thanks,
Weiming








[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-fix-asm-for-cortex-m3.patch --]
[-- Type: text/x-diff; name=0001-fix-asm-for-cortex-m3.patch, Size: 2137 bytes --]

From c8fafd75832b0974be59429c6d59b5c19b535c45 Mon Sep 17 00:00:00 2001
From: Weiming Zhao <weimingz@codeaurora.org>
Date: Tue, 14 Jun 2016 01:36:16 -0700
Subject: [PATCH] fix asm for cortex-m3

---
 src/setjmp/arm/longjmp.s    | 5 ++++-
 src/setjmp/arm/setjmp.s     | 4 +++-
 src/thread/arm/atomics.s    | 3 +++
 src/thread/arm/syscall_cp.s | 1 +
 4 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
index e28d8f3..5c85cba 100644
--- a/src/setjmp/arm/longjmp.s
+++ b/src/setjmp/arm/longjmp.s
@@ -7,8 +7,11 @@ _longjmp:
 longjmp:
 	mov ip,r0
 	movs r0,r1
+  it eq
 	moveq r0,#1
-	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
+ ldr sp, [ip]!
+ ldr lr, [ip]!
 
 	adr r1,1f
 	ldr r2,1f
diff --git a/src/setjmp/arm/setjmp.s b/src/setjmp/arm/setjmp.s
index 8779163..f13068a 100644
--- a/src/setjmp/arm/setjmp.s
+++ b/src/setjmp/arm/setjmp.s
@@ -9,7 +9,9 @@ __setjmp:
 _setjmp:
 setjmp:
 	mov ip,r0
-	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
+ str sp, [ip]!
+ str lr, [ip]!
 	mov r0,#0
 
 	adr r1,1f
diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
index 673fc03..05fd9b9 100644
--- a/src/thread/arm/atomics.s
+++ b/src/thread/arm/atomics.s
@@ -49,6 +49,7 @@ __a_cas_dummy:
 	mov r3,r0
 	ldr r0,[r2]
 	subs r0,r3,r0
+ it eq
 	streq r1,[r2]
 	bx lr
 .global __a_cas_v6
@@ -58,6 +59,7 @@ __a_cas_v6:
 	mcr p15,0,r0,c7,c10,5
 1:	.word 0xe1920f9f        /* ldrex r0,[r2] */
 	subs r0,r3,r0
+ itt eq
 	.word 0x01820f91        /* strexeq r0,r1,[r2] */
 	teqeq r0,#1
 	beq 1b
@@ -70,6 +72,7 @@ __a_cas_v7:
 	.word 0xf57ff05b        /* dmb ish */
 1:	.word 0xe1920f9f        /* ldrex r0,[r2] */
 	subs r0,r3,r0
+ itt eq
 	.word 0x01820f91        /* strexeq r0,r1,[r2] */
 	teqeq r0,#1
 	beq 1b
diff --git a/src/thread/arm/syscall_cp.s b/src/thread/arm/syscall_cp.s
index a5730c0..399700b 100644
--- a/src/thread/arm/syscall_cp.s
+++ b/src/thread/arm/syscall_cp.s
@@ -15,6 +15,7 @@ __syscall_cp_asm:
 __cp_begin:
 	ldr r0,[r0]
 	cmp r0,#0
+ it ne
 	blne __cp_cancel
 	mov r7,r1
 	mov r0,r2
2.8.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14  8:49 build musl for armv7m weimingz
@ 2016-06-14 13:00 ` Rich Felker
  2016-06-14 16:12   ` Zhao, Weiming
  2016-06-14 14:38 ` Rich Felker
  1 sibling, 1 reply; 21+ messages in thread
From: Rich Felker @ 2016-06-14 13:00 UTC (permalink / raw)
  To: musl

On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org wrote:
> Hi,
> 
> I'm building MUSL with -mcpu=cortex-m3. There are a few .s files
> that cannot be assembled because: (1) use predicated instructions
> without IT instr (2) use sp inside reg list in ldmia/stmia.
> 
> Please help to review the attached patch.

Did you test anything? These patches do not result in working code;
they just make it assemble without errors. There's already a gas
option to automatically add IT instructions where needed for
thumb-only targets, but that's not the only thing needed to support
thumb-only/cortex-m. I'd be interested in knowing more about the setup
you're trying to target. Is it Linux or bare-metal? If Linux, are you
going to use the ARM/FDPIC toolchain & kernel mods? I'm about to leave
at the moment but I'll follow up with a more detailed review of your
patch later.

> Also, is there any easy way of disabling string/arm/memcpy_le.S ?
> For baremetal, unaligned access may be unavailable.

That file does not perform any unaligned access. It should work on any
EABI-supported version of the arm instruction set.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14  8:49 build musl for armv7m weimingz
  2016-06-14 13:00 ` Rich Felker
@ 2016-06-14 14:38 ` Rich Felker
  2016-06-14 16:35   ` Zhao, Weiming
  1 sibling, 1 reply; 21+ messages in thread
From: Rich Felker @ 2016-06-14 14:38 UTC (permalink / raw)
  To: musl

On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org wrote:
> diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
> index e28d8f3..5c85cba 100644
> --- a/src/setjmp/arm/longjmp.s
> +++ b/src/setjmp/arm/longjmp.s
> @@ -7,8 +7,11 @@ _longjmp:
>  longjmp:
>  	mov ip,r0
>  	movs r0,r1
> +  it eq
>  	moveq r0,#1
> -	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
> +	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
> + ldr sp, [ip]!
> + ldr lr, [ip]!

Small note: your indention is inconsistent -- using a single space
instead of a tab.

In my earlier work trying to get an all-thumb build working, I had
tried doing this as:

+       ldmia ip!, {r2,v1,v2,v3,v4,v5,v6,sl,fp,lr}
+       mov sp,r2

which might be marginally faster but of course it reorders the fields
in the jmp_buf. Nothing should be depending on them anyway but I like
your version better I think. Likewise for setjmp.

> diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
> index 673fc03..05fd9b9 100644
> --- a/src/thread/arm/atomics.s
> +++ b/src/thread/arm/atomics.s
> @@ -49,6 +49,7 @@ __a_cas_dummy:
>  	mov r3,r0
>  	ldr r0,[r2]
>  	subs r0,r3,r0
> + it eq
>  	streq r1,[r2]
>  	bx lr
>  .global __a_cas_v6
> @@ -58,6 +59,7 @@ __a_cas_v6:
>  	mcr p15,0,r0,c7,c10,5
>  1:	.word 0xe1920f9f        /* ldrex r0,[r2] */
>  	subs r0,r3,r0
> + itt eq
>  	.word 0x01820f91        /* strexeq r0,r1,[r2] */
>  	teqeq r0,#1
>  	beq 1b

You're ignoring that the .word-encoded instruction is not valid thumb,
and that the two 'eq' conditions are independent. The second one
(teqeq) is using the result of the strexeq, not the result of the
subs. But there are much deeper reasons this file cannot work as thumb
without significant changes: the arithmetic on pc is not
thumb-compatible.

> @@ -70,6 +72,7 @@ __a_cas_v7:
>  	.word 0xf57ff05b        /* dmb ish */
>  1:	.word 0xe1920f9f        /* ldrex r0,[r2] */
>  	subs r0,r3,r0
> + itt eq
>  	.word 0x01820f91        /* strexeq r0,r1,[r2] */
>  	teqeq r0,#1
>  	beq 1b

Same here.

There's also the matter of the code to read the thread pointer, which
requires a special CP15 register that cortex-m's lack, I think. I
don't see any alternate way to do this functionality (and it would be
ABI-incompatible anyway) so I think the kernel (or bare-metal
undefined instruction exception trap) needs to just trap and emulate
it.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14 13:00 ` Rich Felker
@ 2016-06-14 16:12   ` Zhao, Weiming
  2016-06-14 16:32     ` Szabolcs Nagy
  0 siblings, 1 reply; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-14 16:12 UTC (permalink / raw)
  To: musl

I haven't did a full test as the functions I modified are not actually 
being used.

It is a bare-metal environment, using clang to compile.

Main flags are -mcpu=cortex-m3 -Os -fdata-sections -ffunction-sections 
-mno-unaligned-access

Could you please let me know the gas option?

Thanks,

Weiing

On 6/14/2016 6:00 AM, Rich Felker wrote:
> On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org wrote:
>> Hi,
>>
>> I'm building MUSL with -mcpu=cortex-m3. There are a few .s files
>> that cannot be assembled because: (1) use predicated instructions
>> without IT instr (2) use sp inside reg list in ldmia/stmia.
>>
>> Please help to review the attached patch.
> Did you test anything? These patches do not result in working code;
> they just make it assemble without errors. There's already a gas
> option to automatically add IT instructions where needed for
> thumb-only targets, but that's not the only thing needed to support
> thumb-only/cortex-m. I'd be interested in knowing more about the setup
> you're trying to target. Is it Linux or bare-metal? If Linux, are you
> going to use the ARM/FDPIC toolchain & kernel mods? I'm about to leave
> at the moment but I'll follow up with a more detailed review of your
> patch later.
>
>> Also, is there any easy way of disabling string/arm/memcpy_le.S ?
>> For baremetal, unaligned access may be unavailable.
> That file does not perform any unaligned access. It should work on any
> EABI-supported version of the arm instruction set.
>
> Rich

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14 16:12   ` Zhao, Weiming
@ 2016-06-14 16:32     ` Szabolcs Nagy
  2016-06-14 16:58       ` Zhao, Weiming
  0 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2016-06-14 16:32 UTC (permalink / raw)
  To: musl

* Zhao, Weiming <weimingz@codeaurora.org> [2016-06-14 09:12:17 -0700]:
> I haven't did a full test as the functions I modified are not actually being
> used.
> 
> It is a bare-metal environment, using clang to compile.
> 
> Main flags are -mcpu=cortex-m3 -Os -fdata-sections -ffunction-sections
> -mno-unaligned-access
> 
> Could you please let me know the gas option?
> 

-mimplicit-it=always

https://sourceware.org/binutils/docs/as/ARM-Options.html

> Thanks,
> 
> Weiing
> 
> On 6/14/2016 6:00 AM, Rich Felker wrote:
> >On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org wrote:
> >>Hi,
> >>
> >>I'm building MUSL with -mcpu=cortex-m3. There are a few .s files
> >>that cannot be assembled because: (1) use predicated instructions
> >>without IT instr (2) use sp inside reg list in ldmia/stmia.
> >>
> >>Please help to review the attached patch.
> >Did you test anything? These patches do not result in working code;
> >they just make it assemble without errors. There's already a gas
> >option to automatically add IT instructions where needed for
> >thumb-only targets, but that's not the only thing needed to support
> >thumb-only/cortex-m. I'd be interested in knowing more about the setup
> >you're trying to target. Is it Linux or bare-metal? If Linux, are you
> >going to use the ARM/FDPIC toolchain & kernel mods? I'm about to leave
> >at the moment but I'll follow up with a more detailed review of your
> >patch later.
> >
> >>Also, is there any easy way of disabling string/arm/memcpy_le.S ?
> >>For baremetal, unaligned access may be unavailable.
> >That file does not perform any unaligned access. It should work on any
> >EABI-supported version of the arm instruction set.
> >
> >Rich
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14 14:38 ` Rich Felker
@ 2016-06-14 16:35   ` Zhao, Weiming
  0 siblings, 0 replies; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-14 16:35 UTC (permalink / raw)
  To: musl

I'll fix the indention.

Why do we use .word encoded instruction? Clang's integrated-as can 
encode the instruction correctly. Is that some workaround with some 
compiler?

Regarding " The second one(teqeq) is using the result of the strexeq, 
not the result of thesubs.":

strexeq won't change flag register: 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/BABFFBJB.html

If it would, the execution of teqeq will depend on the conditionally 
executed "strexeq", is that expected?

Thanks,

Weiming


On 6/14/2016 7:38 AM, Rich Felker wrote:
> On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org wrote:
>> diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
>> index e28d8f3..5c85cba 100644
>> --- a/src/setjmp/arm/longjmp.s
>> +++ b/src/setjmp/arm/longjmp.s
>> @@ -7,8 +7,11 @@ _longjmp:
>>   longjmp:
>>   	mov ip,r0
>>   	movs r0,r1
>> +  it eq
>>   	moveq r0,#1
>> -	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
>> +	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
>> + ldr sp, [ip]!
>> + ldr lr, [ip]!
> Small note: your indention is inconsistent -- using a single space
> instead of a tab.
>
> In my earlier work trying to get an all-thumb build working, I had
> tried doing this as:
>
> +       ldmia ip!, {r2,v1,v2,v3,v4,v5,v6,sl,fp,lr}
> +       mov sp,r2
>
> which might be marginally faster but of course it reorders the fields
> in the jmp_buf. Nothing should be depending on them anyway but I like
> your version better I think. Likewise for setjmp.
>
>> diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
>> index 673fc03..05fd9b9 100644
>> --- a/src/thread/arm/atomics.s
>> +++ b/src/thread/arm/atomics.s
>> @@ -49,6 +49,7 @@ __a_cas_dummy:
>>   	mov r3,r0
>>   	ldr r0,[r2]
>>   	subs r0,r3,r0
>> + it eq
>>   	streq r1,[r2]
>>   	bx lr
>>   .global __a_cas_v6
>> @@ -58,6 +59,7 @@ __a_cas_v6:
>>   	mcr p15,0,r0,c7,c10,5
>>   1:	.word 0xe1920f9f        /* ldrex r0,[r2] */
>>   	subs r0,r3,r0
>> + itt eq
>>   	.word 0x01820f91        /* strexeq r0,r1,[r2] */
>>   	teqeq r0,#1
>>   	beq 1b
> You're ignoring that the .word-encoded instruction is not valid thumb,
> and that the two 'eq' conditions are independent. The second one
> (teqeq) is using the result of the strexeq, not the result of the
> subs. But there are much deeper reasons this file cannot work as thumb
> without significant changes: the arithmetic on pc is not
> thumb-compatible.
>
>> @@ -70,6 +72,7 @@ __a_cas_v7:
>>   	.word 0xf57ff05b        /* dmb ish */
>>   1:	.word 0xe1920f9f        /* ldrex r0,[r2] */
>>   	subs r0,r3,r0
>> + itt eq
>>   	.word 0x01820f91        /* strexeq r0,r1,[r2] */
>>   	teqeq r0,#1
>>   	beq 1b
> Same here.
>
> There's also the matter of the code to read the thread pointer, which
> requires a special CP15 register that cortex-m's lack, I think. I
> don't see any alternate way to do this functionality (and it would be
> ABI-incompatible anyway) so I think the kernel (or bare-metal
> undefined instruction exception trap) needs to just trap and emulate
> it.
>
> Rich

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14 16:32     ` Szabolcs Nagy
@ 2016-06-14 16:58       ` Zhao, Weiming
  2016-06-14 17:40         ` Zhao, Weiming
  0 siblings, 1 reply; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-14 16:58 UTC (permalink / raw)
  To: musl

Thank you, Szabolcs.

It's GAS-only flag, clang's integrated-as doesn't support it. I can use 
-no-integrated-as.

It also expose other errors:

src/thread/arm/atomics.s:9: Error: cannot use register index with 
PC-relative addressing -- `ldr ip,[pc,ip]'

Clang's integrated-as accepts it but seems encoded incorrectly.

I can fix it.


On 6/14/2016 9:32 AM, Szabolcs Nagy wrote:
> * Zhao, Weiming <weimingz@codeaurora.org> [2016-06-14 09:12:17 -0700]:
>> I haven't did a full test as the functions I modified are not actually being
>> used.
>>
>> It is a bare-metal environment, using clang to compile.
>>
>> Main flags are -mcpu=cortex-m3 -Os -fdata-sections -ffunction-sections
>> -mno-unaligned-access
>>
>> Could you please let me know the gas option?
>>
> -mimplicit-it=always
>
> https://sourceware.org/binutils/docs/as/ARM-Options.html
>
>> Thanks,
>>
>> Weiing
>>
>> On 6/14/2016 6:00 AM, Rich Felker wrote:
>>> On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org wrote:
>>>> Hi,
>>>>
>>>> I'm building MUSL with -mcpu=cortex-m3. There are a few .s files
>>>> that cannot be assembled because: (1) use predicated instructions
>>>> without IT instr (2) use sp inside reg list in ldmia/stmia.
>>>>
>>>> Please help to review the attached patch.
>>> Did you test anything? These patches do not result in working code;
>>> they just make it assemble without errors. There's already a gas
>>> option to automatically add IT instructions where needed for
>>> thumb-only targets, but that's not the only thing needed to support
>>> thumb-only/cortex-m. I'd be interested in knowing more about the setup
>>> you're trying to target. Is it Linux or bare-metal? If Linux, are you
>>> going to use the ARM/FDPIC toolchain & kernel mods? I'm about to leave
>>> at the moment but I'll follow up with a more detailed review of your
>>> patch later.
>>>
>>>> Also, is there any easy way of disabling string/arm/memcpy_le.S ?
>>>> For baremetal, unaligned access may be unavailable.
>>> That file does not perform any unaligned access. It should work on any
>>> EABI-supported version of the arm instruction set.
>>>
>>> Rich
>> -- 
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14 16:58       ` Zhao, Weiming
@ 2016-06-14 17:40         ` Zhao, Weiming
  2016-06-16 18:34           ` Zhao, Weiming
  2016-06-20 17:17           ` Zhao, Weiming
  0 siblings, 2 replies; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-14 17:40 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2614 bytes --]

Please review the changes.

I'm able to build libc.a with -mcpu=cortex-m3 -Wa,-implicit-it=always 
using clang + gas

Thanks,

weiming


On 6/14/2016 9:58 AM, Zhao, Weiming wrote:
> Thank you, Szabolcs.
>
> It's GAS-only flag, clang's integrated-as doesn't support it. I can 
> use -no-integrated-as.
>
> It also expose other errors:
>
> src/thread/arm/atomics.s:9: Error: cannot use register index with 
> PC-relative addressing -- `ldr ip,[pc,ip]'
>
> Clang's integrated-as accepts it but seems encoded incorrectly.
>
> I can fix it.
>
>
> On 6/14/2016 9:32 AM, Szabolcs Nagy wrote:
>> * Zhao, Weiming <weimingz@codeaurora.org> [2016-06-14 09:12:17 -0700]:
>>> I haven't did a full test as the functions I modified are not 
>>> actually being
>>> used.
>>>
>>> It is a bare-metal environment, using clang to compile.
>>>
>>> Main flags are -mcpu=cortex-m3 -Os -fdata-sections -ffunction-sections
>>> -mno-unaligned-access
>>>
>>> Could you please let me know the gas option?
>>>
>> -mimplicit-it=always
>>
>> https://sourceware.org/binutils/docs/as/ARM-Options.html
>>
>>> Thanks,
>>>
>>> Weiing
>>>
>>> On 6/14/2016 6:00 AM, Rich Felker wrote:
>>>> On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org 
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm building MUSL with -mcpu=cortex-m3. There are a few .s files
>>>>> that cannot be assembled because: (1) use predicated instructions
>>>>> without IT instr (2) use sp inside reg list in ldmia/stmia.
>>>>>
>>>>> Please help to review the attached patch.
>>>> Did you test anything? These patches do not result in working code;
>>>> they just make it assemble without errors. There's already a gas
>>>> option to automatically add IT instructions where needed for
>>>> thumb-only targets, but that's not the only thing needed to support
>>>> thumb-only/cortex-m. I'd be interested in knowing more about the setup
>>>> you're trying to target. Is it Linux or bare-metal? If Linux, are you
>>>> going to use the ARM/FDPIC toolchain & kernel mods? I'm about to leave
>>>> at the moment but I'll follow up with a more detailed review of your
>>>> patch later.
>>>>
>>>>> Also, is there any easy way of disabling string/arm/memcpy_le.S ?
>>>>> For baremetal, unaligned access may be unavailable.
>>>> That file does not perform any unaligned access. It should work on any
>>>> EABI-supported version of the arm instruction set.
>>>>
>>>> Rich
>>> -- 
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
>>> hosted by The Linux Foundation
>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


[-- Attachment #2: cortex_m3.patch --]
[-- Type: text/plain, Size: 2136 bytes --]

diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
index e28d8f3..e9b9b32 100644
--- a/src/setjmp/arm/longjmp.s
+++ b/src/setjmp/arm/longjmp.s
@@ -8,7 +8,9 @@ longjmp:
 	mov ip,r0
 	movs r0,r1
 	moveq r0,#1
-	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
+	ldr sp, [ip]!
+	ldr lr, [ip]!
 
 	adr r1,1f
 	ldr r2,1f
diff --git a/src/setjmp/arm/setjmp.s b/src/setjmp/arm/setjmp.s
index 8779163..fd380b0 100644
--- a/src/setjmp/arm/setjmp.s
+++ b/src/setjmp/arm/setjmp.s
@@ -9,7 +9,9 @@ __setjmp:
 _setjmp:
 setjmp:
 	mov ip,r0
-	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
+	str sp, [ip]!
+	str lr, [ip]!
 	mov r0,#0
 
 	adr r1,1f
diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
index 4db4844..2517d15 100644
--- a/src/string/arm/memcpy_le.S
+++ b/src/string/arm/memcpy_le.S
@@ -241,7 +241,8 @@ non_congruent:
 	beq     2f
 	ldr     r5, [r1], #4
 	sub     r2, r2, #4
-	orr     r4, r3, r5,             lsl lr
+	lsl     r4, r5, lr
+	orr     r4, r3, r4
 	mov     r3, r5,                 lsr r12
 	str     r4, [r0], #4
 	cmp     r2, #4
@@ -348,7 +349,8 @@ less_than_thirtytwo:
 
 1:      ldr     r5, [r1], #4
 	sub     r2, r2, #4
-	orr     r4, r3, r5,             lsl lr
+	lsl     r4, r5, lr
+	orr     r4, r3, r4
 	mov     r3,     r5,                     lsr r12
 	str     r4, [r0], #4
 	cmp     r2, #4
diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
index 673fc03..a4bd03a 100644
--- a/src/thread/arm/atomics.s
+++ b/src/thread/arm/atomics.s
@@ -6,7 +6,8 @@
 .type __a_barrier,%function
 __a_barrier:
 	ldr ip,1f
-	ldr ip,[pc,ip]
+	add ip,pc,ip
+	ldr ip,[ip]
 	add pc,pc,ip
 1:	.word __a_barrier_ptr-1b
 .global __a_barrier_dummy
@@ -40,7 +41,8 @@ __a_barrier_v7:
 .type __a_cas,%function
 __a_cas:
 	ldr ip,1f
-	ldr ip,[pc,ip]
+	add ip,pc,ip
+	ldr ip,[ip]
 	add pc,pc,ip
 1:	.word __a_cas_ptr-1b
 .global __a_cas_dummy
@@ -85,7 +87,8 @@ __aeabi_read_tp:
 .type __a_gettp,%function
 __a_gettp:
 	ldr r0,1f
-	ldr r0,[pc,r0]
+	add r0,pc,r0
+	ldr r0,[r0]
 	add pc,pc,r0
 1:	.word __a_gettp_ptr-1b
 .global __a_gettp_dummy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14 17:40         ` Zhao, Weiming
@ 2016-06-16 18:34           ` Zhao, Weiming
  2016-06-20 19:58             ` Rich Felker
  2016-06-20 17:17           ` Zhao, Weiming
  1 sibling, 1 reply; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-16 18:34 UTC (permalink / raw)
  To: musl

I tried to build for armv6m (cortex-m0) and I got other build issues 
with .S and inline asms.

Below are the changes for building armv7m:

diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
index e28d8f3..e9b9b32 100644
--- a/src/setjmp/arm/longjmp.s
+++ b/src/setjmp/arm/longjmp.s
@@ -8,7 +8,9 @@ longjmp:
      mov ip,r0
      movs r0,r1
      moveq r0,#1
-    ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+    ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
+    ldr sp, [ip]!
+    ldr lr, [ip]!

      adr r1,1f
      ldr r2,1f
diff --git a/src/setjmp/arm/setjmp.s b/src/setjmp/arm/setjmp.s
index 8779163..fd380b0 100644
--- a/src/setjmp/arm/setjmp.s
+++ b/src/setjmp/arm/setjmp.s
@@ -9,7 +9,9 @@ __setjmp:
  _setjmp:
  setjmp:
      mov ip,r0
-    stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+    stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
+    str sp, [ip]!
+    str lr, [ip]!
      mov r0,#0

      adr r1,1f
diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
index 4db4844..2517d15 100644
--- a/src/string/arm/memcpy_le.S
+++ b/src/string/arm/memcpy_le.S
@@ -241,7 +241,8 @@ non_congruent:
      beq     2f
      ldr     r5, [r1], #4
      sub     r2, r2, #4
-    orr     r4, r3, r5,             lsl lr
+    lsl     r4, r5, lr
+    orr     r4, r3, r4
      mov     r3, r5,                 lsr r12
      str     r4, [r0], #4
      cmp     r2, #4
@@ -348,7 +349,8 @@ less_than_thirtytwo:

  1:      ldr     r5, [r1], #4
      sub     r2, r2, #4
-    orr     r4, r3, r5,             lsl lr
+    lsl     r4, r5, lr
+    orr     r4, r3, r4
      mov     r3,     r5,                     lsr r12
      str     r4, [r0], #4
      cmp     r2, #4
diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
index 673fc03..a4bd03a 100644
--- a/src/thread/arm/atomics.s
+++ b/src/thread/arm/atomics.s
@@ -6,7 +6,8 @@
  .type __a_barrier,%function
  __a_barrier:
      ldr ip,1f
-    ldr ip,[pc,ip]
+    add ip,pc,ip
+    ldr ip,[ip]
      add pc,pc,ip
  1:    .word __a_barrier_ptr-1b
  .global __a_barrier_dummy
@@ -40,7 +41,8 @@ __a_barrier_v7:
  .type __a_cas,%function
  __a_cas:
      ldr ip,1f
-    ldr ip,[pc,ip]
+    add ip,pc,ip
+    ldr ip,[ip]
      add pc,pc,ip
  1:    .word __a_cas_ptr-1b
  .global __a_cas_dummy
@@ -85,7 +87,8 @@ __aeabi_read_tp:
  .type __a_gettp,%function
  __a_gettp:
      ldr r0,1f
-    ldr r0,[pc,r0]
+    add r0,pc,r0
+    ldr r0,[r0]
      add pc,pc,r0
  1:    .word __a_gettp_ptr-1b
  .global __a_gettp_dummy

Thanks,

Weiming



On 6/14/2016 10:40 AM, Zhao, Weiming wrote:
> Please review the changes.
>
> I'm able to build libc.a with -mcpu=cortex-m3 -Wa,-implicit-it=always 
> using clang + gas
>
> Thanks,
>
> weiming
>
>
> On 6/14/2016 9:58 AM, Zhao, Weiming wrote:
>> Thank you, Szabolcs.
>>
>> It's GAS-only flag, clang's integrated-as doesn't support it. I can 
>> use -no-integrated-as.
>>
>> It also expose other errors:
>>
>> src/thread/arm/atomics.s:9: Error: cannot use register index with 
>> PC-relative addressing -- `ldr ip,[pc,ip]'
>>
>> Clang's integrated-as accepts it but seems encoded incorrectly.
>>
>> I can fix it.
>>
>>
>> On 6/14/2016 9:32 AM, Szabolcs Nagy wrote:
>>> * Zhao, Weiming <weimingz@codeaurora.org> [2016-06-14 09:12:17 -0700]:
>>>> I haven't did a full test as the functions I modified are not 
>>>> actually being
>>>> used.
>>>>
>>>> It is a bare-metal environment, using clang to compile.
>>>>
>>>> Main flags are -mcpu=cortex-m3 -Os -fdata-sections -ffunction-sections
>>>> -mno-unaligned-access
>>>>
>>>> Could you please let me know the gas option?
>>>>
>>> -mimplicit-it=always
>>>
>>> https://sourceware.org/binutils/docs/as/ARM-Options.html
>>>
>>>> Thanks,
>>>>
>>>> Weiing
>>>>
>>>> On 6/14/2016 6:00 AM, Rich Felker wrote:
>>>>> On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org 
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm building MUSL with -mcpu=cortex-m3. There are a few .s files
>>>>>> that cannot be assembled because: (1) use predicated instructions
>>>>>> without IT instr (2) use sp inside reg list in ldmia/stmia.
>>>>>>
>>>>>> Please help to review the attached patch.
>>>>> Did you test anything? These patches do not result in working code;
>>>>> they just make it assemble without errors. There's already a gas
>>>>> option to automatically add IT instructions where needed for
>>>>> thumb-only targets, but that's not the only thing needed to support
>>>>> thumb-only/cortex-m. I'd be interested in knowing more about the 
>>>>> setup
>>>>> you're trying to target. Is it Linux or bare-metal? If Linux, are you
>>>>> going to use the ARM/FDPIC toolchain & kernel mods? I'm about to 
>>>>> leave
>>>>> at the moment but I'll follow up with a more detailed review of your
>>>>> patch later.
>>>>>
>>>>>> Also, is there any easy way of disabling string/arm/memcpy_le.S ?
>>>>>> For baremetal, unaligned access may be unavailable.
>>>>> That file does not perform any unaligned access. It should work on 
>>>>> any
>>>>> EABI-supported version of the arm instruction set.
>>>>>
>>>>> Rich
>>>> -- 
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
>>>> hosted by The Linux Foundation
>>
>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-14 17:40         ` Zhao, Weiming
  2016-06-16 18:34           ` Zhao, Weiming
@ 2016-06-20 17:17           ` Zhao, Weiming
  1 sibling, 0 replies; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-20 17:17 UTC (permalink / raw)
  To: musl

ping?


On 6/14/2016 10:40 AM, Zhao, Weiming wrote:
> Please review the changes.
>
> I'm able to build libc.a with -mcpu=cortex-m3 -Wa,-implicit-it=always 
> using clang + gas
>
> Thanks,
>
> weiming
>
>
> On 6/14/2016 9:58 AM, Zhao, Weiming wrote:
>> Thank you, Szabolcs.
>>
>> It's GAS-only flag, clang's integrated-as doesn't support it. I can 
>> use -no-integrated-as.
>>
>> It also expose other errors:
>>
>> src/thread/arm/atomics.s:9: Error: cannot use register index with 
>> PC-relative addressing -- `ldr ip,[pc,ip]'
>>
>> Clang's integrated-as accepts it but seems encoded incorrectly.
>>
>> I can fix it.
>>
>>
>> On 6/14/2016 9:32 AM, Szabolcs Nagy wrote:
>>> * Zhao, Weiming <weimingz@codeaurora.org> [2016-06-14 09:12:17 -0700]:
>>>> I haven't did a full test as the functions I modified are not 
>>>> actually being
>>>> used.
>>>>
>>>> It is a bare-metal environment, using clang to compile.
>>>>
>>>> Main flags are -mcpu=cortex-m3 -Os -fdata-sections -ffunction-sections
>>>> -mno-unaligned-access
>>>>
>>>> Could you please let me know the gas option?
>>>>
>>> -mimplicit-it=always
>>>
>>> https://sourceware.org/binutils/docs/as/ARM-Options.html
>>>
>>>> Thanks,
>>>>
>>>> Weiing
>>>>
>>>> On 6/14/2016 6:00 AM, Rich Felker wrote:
>>>>> On Tue, Jun 14, 2016 at 01:49:40AM -0700, weimingz@codeaurora.org 
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm building MUSL with -mcpu=cortex-m3. There are a few .s files
>>>>>> that cannot be assembled because: (1) use predicated instructions
>>>>>> without IT instr (2) use sp inside reg list in ldmia/stmia.
>>>>>>
>>>>>> Please help to review the attached patch.
>>>>> Did you test anything? These patches do not result in working code;
>>>>> they just make it assemble without errors. There's already a gas
>>>>> option to automatically add IT instructions where needed for
>>>>> thumb-only targets, but that's not the only thing needed to support
>>>>> thumb-only/cortex-m. I'd be interested in knowing more about the 
>>>>> setup
>>>>> you're trying to target. Is it Linux or bare-metal? If Linux, are you
>>>>> going to use the ARM/FDPIC toolchain & kernel mods? I'm about to 
>>>>> leave
>>>>> at the moment but I'll follow up with a more detailed review of your
>>>>> patch later.
>>>>>
>>>>>> Also, is there any easy way of disabling string/arm/memcpy_le.S ?
>>>>>> For baremetal, unaligned access may be unavailable.
>>>>> That file does not perform any unaligned access. It should work on 
>>>>> any
>>>>> EABI-supported version of the arm instruction set.
>>>>>
>>>>> Rich
>>>> -- 
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
>>>> hosted by The Linux Foundation
>>
>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-16 18:34           ` Zhao, Weiming
@ 2016-06-20 19:58             ` Rich Felker
  2016-06-22 19:08               ` Zhao, Weiming
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2016-06-20 19:58 UTC (permalink / raw)
  To: musl

On Thu, Jun 16, 2016 at 11:34:28AM -0700, Zhao, Weiming wrote:
> I tried to build for armv6m (cortex-m0) and I got other build issues
> with .S and inline asms.
> 
> Below are the changes for building armv7m:

I thought I'd already replied to this but I don't see my reply so I'm
doing it [again?] now.

> diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
> index e28d8f3..e9b9b32 100644
> --- a/src/setjmp/arm/longjmp.s
> +++ b/src/setjmp/arm/longjmp.s
> @@ -8,7 +8,9 @@ longjmp:
>      mov ip,r0
>      movs r0,r1
>      moveq r0,#1
> -    ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
> +    ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
> +    ldr sp, [ip]!
> +    ldr lr, [ip]!

I think changes like this are ok. They could be conditional on
__thumb__ if they hurt performance measurably on arm but I doubt it
matters.

> diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
> index 4db4844..2517d15 100644
> --- a/src/string/arm/memcpy_le.S
> +++ b/src/string/arm/memcpy_le.S
> @@ -241,7 +241,8 @@ non_congruent:
>      beq     2f
>      ldr     r5, [r1], #4
>      sub     r2, r2, #4
> -    orr     r4, r3, r5,             lsl lr
> +    lsl     r4, r5, lr
> +    orr     r4, r3, r4
>      mov     r3, r5,                 lsr r12
>      str     r4, [r0], #4
>      cmp     r2, #4

If this is in a hot path it may need to be conditional.

> diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
> index 673fc03..a4bd03a 100644
> --- a/src/thread/arm/atomics.s
> +++ b/src/thread/arm/atomics.s
> @@ -6,7 +6,8 @@
>  .type __a_barrier,%function
>  __a_barrier:
>      ldr ip,1f
> -    ldr ip,[pc,ip]
> +    add ip,pc,ip
> +    ldr ip,[ip]
>      add pc,pc,ip
>  1:    .word __a_barrier_ptr-1b
>  .global __a_barrier_dummy

As far as I can tell, this does not work at all. The arithmetic on pc
is assuming the particular offset between the instruction using pc and
the following code as arm opcodes.

There's also the matter of the cp15 register load in this file that
doesn't exist on cortex-m. IMO the kernel (or bare-metal trap handler)
needs to trap and emulate it.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-20 19:58             ` Rich Felker
@ 2016-06-22 19:08               ` Zhao, Weiming
  2016-06-22 19:19                 ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-22 19:08 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2818 bytes --]

Thanks for reviewing.

I add tests for ARMv7m for memcpy.

For atomics.s, I think the below are equivalent:

      ldr ip,1f  ==> assembler will computes the offset from current inst to the label

-    ldr ip,[pc,ip] ==> here the address to be loaded is current PC + ip
+    add ip,pc,ip  ==> here, the PC is the same as above
+    ldr ip,[ip]
But I'm not familiar with the CP15 issue you mentioned.
So, anyway, I skip the change for atomics.s in this patch.

Thanks,
Weiming


On 6/20/2016 12:58 PM, Rich Felker wrote:
> On Thu, Jun 16, 2016 at 11:34:28AM -0700, Zhao, Weiming wrote:
>> I tried to build for armv6m (cortex-m0) and I got other build issues
>> with .S and inline asms.
>>
>> Below are the changes for building armv7m:
> I thought I'd already replied to this but I don't see my reply so I'm
> doing it [again?] now.
>
>> diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
>> index e28d8f3..e9b9b32 100644
>> --- a/src/setjmp/arm/longjmp.s
>> +++ b/src/setjmp/arm/longjmp.s
>> @@ -8,7 +8,9 @@ longjmp:
>>       mov ip,r0
>>       movs r0,r1
>>       moveq r0,#1
>> -    ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
>> +    ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
>> +    ldr sp, [ip]!
>> +    ldr lr, [ip]!
> I think changes like this are ok. They could be conditional on
> __thumb__ if they hurt performance measurably on arm but I doubt it
> matters.
>
>> diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
>> index 4db4844..2517d15 100644
>> --- a/src/string/arm/memcpy_le.S
>> +++ b/src/string/arm/memcpy_le.S
>> @@ -241,7 +241,8 @@ non_congruent:
>>       beq     2f
>>       ldr     r5, [r1], #4
>>       sub     r2, r2, #4
>> -    orr     r4, r3, r5,             lsl lr
>> +    lsl     r4, r5, lr
>> +    orr     r4, r3, r4
>>       mov     r3, r5,                 lsr r12
>>       str     r4, [r0], #4
>>       cmp     r2, #4
> If this is in a hot path it may need to be conditional.
>
>> diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
>> index 673fc03..a4bd03a 100644
>> --- a/src/thread/arm/atomics.s
>> +++ b/src/thread/arm/atomics.s
>> @@ -6,7 +6,8 @@
>>   .type __a_barrier,%function
>>   __a_barrier:
>>       ldr ip,1f
>> -    ldr ip,[pc,ip]
>> +    add ip,pc,ip
>> +    ldr ip,[ip]
>>       add pc,pc,ip
>>   1:    .word __a_barrier_ptr-1b
>>   .global __a_barrier_dummy
> As far as I can tell, this does not work at all. The arithmetic on pc
> is assuming the particular offset between the instruction using pc and
> the following code as arm opcodes.
>
> There's also the matter of the cp15 register load in this file that
> doesn't exist on cortex-m. IMO the kernel (or bare-metal trap handler)
> needs to trap and emulate it.
>
> Rich

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1546 bytes --]

diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
index e28d8f3..e9b9b32 100644
--- a/src/setjmp/arm/longjmp.s
+++ b/src/setjmp/arm/longjmp.s
@@ -8,7 +8,9 @@ longjmp:
 	mov ip,r0
 	movs r0,r1
 	moveq r0,#1
-	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
+	ldr sp, [ip]!
+	ldr lr, [ip]!
 
 	adr r1,1f
 	ldr r2,1f
diff --git a/src/setjmp/arm/setjmp.s b/src/setjmp/arm/setjmp.s
index 8779163..fd380b0 100644
--- a/src/setjmp/arm/setjmp.s
+++ b/src/setjmp/arm/setjmp.s
@@ -9,7 +9,9 @@ __setjmp:
 _setjmp:
 setjmp:
 	mov ip,r0
-	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
+	str sp, [ip]!
+	str lr, [ip]!
 	mov r0,#0
 
 	adr r1,1f
diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
index 4db4844..1137f55 100644
--- a/src/string/arm/memcpy_le.S
+++ b/src/string/arm/memcpy_le.S
@@ -241,7 +241,12 @@ non_congruent:
 	beq     2f
 	ldr     r5, [r1], #4
 	sub     r2, r2, #4
+#if (__ARM_ARCH_7A || __ARM_ARCH_7R || __ARM_ARCH > 7)
 	orr     r4, r3, r5,             lsl lr
+#else
+	lsl     r4, r5, lr
+	orr     r4, r3, r4
+#endif
 	mov     r3, r5,                 lsr r12
 	str     r4, [r0], #4
 	cmp     r2, #4
@@ -348,7 +353,12 @@ less_than_thirtytwo:
 
 1:      ldr     r5, [r1], #4
 	sub     r2, r2, #4
+#if (__ARM_ARCH_7A || __ARM_ARCH_7R || __ARM_ARCH > 7)
 	orr     r4, r3, r5,             lsl lr
+#else
+	lsl     r4, r5, lr
+	orr     r4, r3, r4
+#endif
 	mov     r3,     r5,                     lsr r12
 	str     r4, [r0], #4
 	cmp     r2, #4

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-22 19:08               ` Zhao, Weiming
@ 2016-06-22 19:19                 ` Rich Felker
  2016-06-22 20:37                   ` Zhao, Weiming
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2016-06-22 19:19 UTC (permalink / raw)
  To: musl

On Wed, Jun 22, 2016 at 12:08:13PM -0700, Zhao, Weiming wrote:
> Thanks for reviewing.
> 
> I add tests for ARMv7m for memcpy.
> 
> For atomics.s, I think the below are equivalent:
> 
>      ldr ip,1f  ==> assembler will computes the offset from current inst to the label
> 
> -    ldr ip,[pc,ip] ==> here the address to be loaded is current PC + ip
> +    add ip,pc,ip  ==> here, the PC is the same as above
> +    ldr ip,[ip]

In the original code, pc reads as the address of the instruction
following the ".word" mnemonic (2 ARM insns ahead of the current
instruction). In your version, I believe pc reads as the address of
the .word (2 thumb insns ahead of the current insn) which would be
wrong, but I may be wrong; one source I saw suggested that arithmetic
on pc like this was not even defined in thumb mode on some models.

> But I'm not familiar with the CP15 issue you mentioned.
> So, anyway, I skip the change for atomics.s in this patch.

OK. But just know that you can't expect it to work unless that's
implemented.

> diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
> index 4db4844..1137f55 100644
> --- a/src/string/arm/memcpy_le.S
> +++ b/src/string/arm/memcpy_le.S
> @@ -241,7 +241,12 @@ non_congruent:
>  	beq     2f
>  	ldr     r5, [r1], #4
>  	sub     r2, r2, #4
> +#if (__ARM_ARCH_7A || __ARM_ARCH_7R || __ARM_ARCH > 7)
>  	orr     r4, r3, r5,             lsl lr
> +#else
> +	lsl     r4, r5, lr
> +	orr     r4, r3, r4
> +#endif

These are not the right conditions. The selection should be made based
on whether the code is being assembled as thumb, not on the ISA
revision level. As written your patch uses the thumb code on all
pre-v7 targets.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-22 19:19                 ` Rich Felker
@ 2016-06-22 20:37                   ` Zhao, Weiming
  2016-06-22 23:26                     ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-22 20:37 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2252 bytes --]

How about using the following test?

!defined(__thumb__) || (__ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH 
 > 7)

Because I think those instructions are ISA related, not just arm/thumb. 
For example, using -mcpu=cortex-a8 -mthumb, the original code still 
works.  So if only testing __thumb__, the armv7-a thumb mode will fall 
into the slow path too.


On 6/22/2016 12:19 PM, Rich Felker wrote:
> On Wed, Jun 22, 2016 at 12:08:13PM -0700, Zhao, Weiming wrote:
>> Thanks for reviewing.
>>
>> I add tests for ARMv7m for memcpy.
>>
>> For atomics.s, I think the below are equivalent:
>>
>>       ldr ip,1f  ==> assembler will computes the offset from current inst to the label
>>
>> -    ldr ip,[pc,ip] ==> here the address to be loaded is current PC + ip
>> +    add ip,pc,ip  ==> here, the PC is the same as above
>> +    ldr ip,[ip]
> In the original code, pc reads as the address of the instruction
> following the ".word" mnemonic (2 ARM insns ahead of the current
> instruction). In your version, I believe pc reads as the address of
> the .word (2 thumb insns ahead of the current insn) which would be
> wrong, but I may be wrong; one source I saw suggested that arithmetic
> on pc like this was not even defined in thumb mode on some models.
>
>> But I'm not familiar with the CP15 issue you mentioned.
>> So, anyway, I skip the change for atomics.s in this patch.
> OK. But just know that you can't expect it to work unless that's
> implemented.

>> diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
>> index 4db4844..1137f55 100644
>> --- a/src/string/arm/memcpy_le.S
>> +++ b/src/string/arm/memcpy_le.S
>> @@ -241,7 +241,12 @@ non_congruent:
>>   	beq     2f
>>   	ldr     r5, [r1], #4
>>   	sub     r2, r2, #4
>> +#if (__ARM_ARCH_7A || __ARM_ARCH_7R || __ARM_ARCH > 7)
>>   	orr     r4, r3, r5,             lsl lr
>> +#else
>> +	lsl     r4, r5, lr
>> +	orr     r4, r3, r4
>> +#endif
> These are not the right conditions. The selection should be made based
> on whether the code is being assembled as thumb, not on the ISA
> revision level. As written your patch uses the thumb code on all
> pre-v7 targets.

> Rich

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1600 bytes --]

diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
index e28d8f3..e9b9b32 100644
--- a/src/setjmp/arm/longjmp.s
+++ b/src/setjmp/arm/longjmp.s
@@ -8,7 +8,9 @@ longjmp:
 	mov ip,r0
 	movs r0,r1
 	moveq r0,#1
-	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
+	ldr sp, [ip]!
+	ldr lr, [ip]!
 
 	adr r1,1f
 	ldr r2,1f
diff --git a/src/setjmp/arm/setjmp.s b/src/setjmp/arm/setjmp.s
index 8779163..fd380b0 100644
--- a/src/setjmp/arm/setjmp.s
+++ b/src/setjmp/arm/setjmp.s
@@ -9,7 +9,9 @@ __setjmp:
 _setjmp:
 setjmp:
 	mov ip,r0
-	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
+	str sp, [ip]!
+	str lr, [ip]!
 	mov r0,#0
 
 	adr r1,1f
diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
index 4db4844..7c61979 100644
--- a/src/string/arm/memcpy_le.S
+++ b/src/string/arm/memcpy_le.S
@@ -241,7 +241,12 @@ non_congruent:
 	beq     2f
 	ldr     r5, [r1], #4
 	sub     r2, r2, #4
+#if !defined(__thumb__) || (__ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH > 7)
 	orr     r4, r3, r5,             lsl lr
+#else
+	lsl     r4, r5, lr
+	orr     r4, r3, r4
+#endif
 	mov     r3, r5,                 lsr r12
 	str     r4, [r0], #4
 	cmp     r2, #4
@@ -348,7 +353,12 @@ less_than_thirtytwo:
 
 1:      ldr     r5, [r1], #4
 	sub     r2, r2, #4
+#if !defined(__thumb__) || (__ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH > 7)
 	orr     r4, r3, r5,             lsl lr
+#else
+	lsl     r4, r5, lr
+	orr     r4, r3, r4
+#endif
 	mov     r3,     r5,                     lsr r12
 	str     r4, [r0], #4
 	cmp     r2, #4

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-22 20:37                   ` Zhao, Weiming
@ 2016-06-22 23:26                     ` Rich Felker
  2016-06-23  0:21                       ` Zhao, Weiming
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2016-06-22 23:26 UTC (permalink / raw)
  To: musl

On Wed, Jun 22, 2016 at 01:37:38PM -0700, Zhao, Weiming wrote:
> How about using the following test?
> 
> !defined(__thumb__) || (__ARM_ARCH_7A__ || __ARM_ARCH_7R__ ||
> __ARM_ARCH > 7)
> 
> Because I think those instructions are ISA related, not just
> arm/thumb. For example, using -mcpu=cortex-a8 -mthumb, the original
> code still works.  So if only testing __thumb__, the armv7-a thumb
> mode will fall into the slow path too.

I think perhaps __thumb2__ is what you're looking for...?

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-22 23:26                     ` Rich Felker
@ 2016-06-23  0:21                       ` Zhao, Weiming
  2016-06-23  4:22                         ` Rich Felker
  2016-07-05 20:08                         ` Rich Felker
  0 siblings, 2 replies; 21+ messages in thread
From: Zhao, Weiming @ 2016-06-23  0:21 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 922 bytes --]

Fixed.

Btw, can we make exit() weak function? (patch attached)

This allows user code to redefine it because in baremetal environment, 
sometime we want to customize it. For example, some code never exits, so 
we can define a dummy exit() and thus save code size.


On 6/22/2016 4:26 PM, Rich Felker wrote:
> On Wed, Jun 22, 2016 at 01:37:38PM -0700, Zhao, Weiming wrote:
>> How about using the following test?
>>
>> !defined(__thumb__) || (__ARM_ARCH_7A__ || __ARM_ARCH_7R__ ||
>> __ARM_ARCH > 7)
>>
>> Because I think those instructions are ISA related, not just
>> arm/thumb. For example, using -mcpu=cortex-a8 -mthumb, the original
>> code still works.  So if only testing __thumb__, the armv7-a thumb
>> mode will fall into the slow path too.
> I think perhaps __thumb2__ is what you're looking for...?
>
> Rich

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1602 bytes --]

diff --git a/src/setjmp/arm/longjmp.s b/src/setjmp/arm/longjmp.s
index e28d8f3..e9b9b32 100644
--- a/src/setjmp/arm/longjmp.s
+++ b/src/setjmp/arm/longjmp.s
@@ -8,7 +8,9 @@ longjmp:
 	mov ip,r0
 	movs r0,r1
 	moveq r0,#1
-	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
+	ldr sp, [ip]!
+	ldr lr, [ip]!
 
 	adr r1,1f
 	ldr r2,1f
diff --git a/src/setjmp/arm/setjmp.s b/src/setjmp/arm/setjmp.s
index 8779163..fd380b0 100644
--- a/src/setjmp/arm/setjmp.s
+++ b/src/setjmp/arm/setjmp.s
@@ -9,7 +9,9 @@ __setjmp:
 _setjmp:
 setjmp:
 	mov ip,r0
-	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp,sp,lr}
+	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
+	str sp, [ip]!
+	str lr, [ip]!
 	mov r0,#0
 
 	adr r1,1f
diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
index 4db4844..df9f5c8 100644
--- a/src/string/arm/memcpy_le.S
+++ b/src/string/arm/memcpy_le.S
@@ -241,7 +241,12 @@ non_congruent:
 	beq     2f
 	ldr     r5, [r1], #4
 	sub     r2, r2, #4
+#if !defined(__thumb2__) || (__ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH > 7)
 	orr     r4, r3, r5,             lsl lr
+#else
+	lsl     r4, r5, lr
+	orr     r4, r3, r4
+#endif
 	mov     r3, r5,                 lsr r12
 	str     r4, [r0], #4
 	cmp     r2, #4
@@ -348,7 +353,12 @@ less_than_thirtytwo:
 
 1:      ldr     r5, [r1], #4
 	sub     r2, r2, #4
+#if !defined(__thumb2__) || (__ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH > 7)
 	orr     r4, r3, r5,             lsl lr
+#else
+	lsl     r4, r5, lr
+	orr     r4, r3, r4
+#endif
 	mov     r3,     r5,                     lsr r12
 	str     r4, [r0], #4
 	cmp     r2, #4

[-- Attachment #3: patch_exit.diff --]
[-- Type: text/plain, Size: 305 bytes --]

diff --git a/src/exit/exit.c b/src/exit/exit.c
index bf7835a..694d277 100644
--- a/src/exit/exit.c
+++ b/src/exit/exit.c
@@ -25,6 +25,7 @@ static void libc_exit_fini(void)
 
 weak_alias(libc_exit_fini, __libc_exit_fini);
 
+__attribute__((__weak__))
 _Noreturn void exit(int code)
 {
 	__funcs_on_exit();

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-23  0:21                       ` Zhao, Weiming
@ 2016-06-23  4:22                         ` Rich Felker
  2016-06-23  6:04                           ` weimingz
  2016-07-05 20:08                         ` Rich Felker
  1 sibling, 1 reply; 21+ messages in thread
From: Rich Felker @ 2016-06-23  4:22 UTC (permalink / raw)
  To: musl

On Wed, Jun 22, 2016 at 05:21:54PM -0700, Zhao, Weiming wrote:
> Fixed.
> 
> Btw, can we make exit() weak function? (patch attached)
> 
> This allows user code to redefine it because in baremetal
> environment, sometime we want to customize it. For example, some
> code never exits, so we can define a dummy exit() and thus save code
> size.

No, this is a hack and is not even needed to do what you're asking
for. The way linking an archive works, an object in the archive is
only pulled in to the link process when it satisfies an undefined
symbol reference, and if you defined your own exit, then exit.o would
never have reason to get linked.

But there are all sorts of things that can break unexpectedly from
redefining standard functions, which is why it's UB and this is not
supported usage.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-23  4:22                         ` Rich Felker
@ 2016-06-23  6:04                           ` weimingz
  2016-06-23  9:57                             ` Szabolcs Nagy
  0 siblings, 1 reply; 21+ messages in thread
From: weimingz @ 2016-06-23  6:04 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker, Rich Felker

If it is not a weak symbol, you may get "multi-def" error. It depends on 
the obj/lib order on command line.
If it is a weak symbol, then a non-weak symbol can override it.

Anyway, your comment reminds me that we can use -wrap=exit to override 
it. So no change is needed.

btw, will you commit the change of those .s files for armv7m ?

Thanks,
Weiming

On 2016-06-22 21:22, Rich Felker wrote:
> On Wed, Jun 22, 2016 at 05:21:54PM -0700, Zhao, Weiming wrote:
>> Fixed.
>> 
>> Btw, can we make exit() weak function? (patch attached)
>> 
>> This allows user code to redefine it because in baremetal
>> environment, sometime we want to customize it. For example, some
>> code never exits, so we can define a dummy exit() and thus save code
>> size.
> 
> No, this is a hack and is not even needed to do what you're asking
> for. The way linking an archive works, an object in the archive is
> only pulled in to the link process when it satisfies an undefined
> symbol reference, and if you defined your own exit, then exit.o would
> never have reason to get linked.
> 
> But there are all sorts of things that can break unexpectedly from
> redefining standard functions, which is why it's UB and this is not
> supported usage.
> 
> Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-23  6:04                           ` weimingz
@ 2016-06-23  9:57                             ` Szabolcs Nagy
  2016-06-23 14:22                               ` weimingz
  0 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2016-06-23  9:57 UTC (permalink / raw)
  To: weimingz; +Cc: musl, Rich Felker, Rich Felker

* weimingz@codeaurora.org <weimingz@codeaurora.org> [2016-06-22 23:04:18 -0700]:
> If it is not a weak symbol, you may get "multi-def" error. It depends on the
> obj/lib order on command line.

the only external symbol definition with non-weak binding
in the exit.o object is exit.

so the only way exit.o can get linked is an undefined reference
to exit.

you only need to make sure that -lc is at the end of the cmdline
when you want to interpose exit with your own definition
(then libc exit.o wont get linked in).

but to repeat rich: interposing libc symbols can break things.

> If it is a weak symbol, then a non-weak symbol can override it.
> 
> Anyway, your comment reminds me that we can use -wrap=exit to override it.
> So no change is needed.
> 
> btw, will you commit the change of those .s files for armv7m ?
> 
> Thanks,
> Weiming
> 
> On 2016-06-22 21:22, Rich Felker wrote:
> >On Wed, Jun 22, 2016 at 05:21:54PM -0700, Zhao, Weiming wrote:
> >>Fixed.
> >>
> >>Btw, can we make exit() weak function? (patch attached)
> >>
> >>This allows user code to redefine it because in baremetal
> >>environment, sometime we want to customize it. For example, some
> >>code never exits, so we can define a dummy exit() and thus save code
> >>size.
> >
> >No, this is a hack and is not even needed to do what you're asking
> >for. The way linking an archive works, an object in the archive is
> >only pulled in to the link process when it satisfies an undefined
> >symbol reference, and if you defined your own exit, then exit.o would
> >never have reason to get linked.
> >
> >But there are all sorts of things that can break unexpectedly from
> >redefining standard functions, which is why it's UB and this is not
> >supported usage.
> >
> >Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-23  9:57                             ` Szabolcs Nagy
@ 2016-06-23 14:22                               ` weimingz
  0 siblings, 0 replies; 21+ messages in thread
From: weimingz @ 2016-06-23 14:22 UTC (permalink / raw)
  To: weimingz; +Cc: musl, Rich Felker, Rich Felker

Correct. Thank you Szabolcs.

Weiming

On 2016-06-23 02:57, Szabolcs Nagy wrote:
> * weimingz@codeaurora.org <weimingz@codeaurora.org> [2016-06-22 
> 23:04:18 -0700]:
>> If it is not a weak symbol, you may get "multi-def" error. It depends 
>> on the
>> obj/lib order on command line.
> 
> the only external symbol definition with non-weak binding
> in the exit.o object is exit.
> 
> so the only way exit.o can get linked is an undefined reference
> to exit.
> 
> you only need to make sure that -lc is at the end of the cmdline
> when you want to interpose exit with your own definition
> (then libc exit.o wont get linked in).
> 
> but to repeat rich: interposing libc symbols can break things.
> 
>> If it is a weak symbol, then a non-weak symbol can override it.
>> 
>> Anyway, your comment reminds me that we can use -wrap=exit to override 
>> it.
>> So no change is needed.
>> 
>> btw, will you commit the change of those .s files for armv7m ?
>> 
>> Thanks,
>> Weiming
>> 
>> On 2016-06-22 21:22, Rich Felker wrote:
>> >On Wed, Jun 22, 2016 at 05:21:54PM -0700, Zhao, Weiming wrote:
>> >>Fixed.
>> >>
>> >>Btw, can we make exit() weak function? (patch attached)
>> >>
>> >>This allows user code to redefine it because in baremetal
>> >>environment, sometime we want to customize it. For example, some
>> >>code never exits, so we can define a dummy exit() and thus save code
>> >>size.
>> >
>> >No, this is a hack and is not even needed to do what you're asking
>> >for. The way linking an archive works, an object in the archive is
>> >only pulled in to the link process when it satisfies an undefined
>> >symbol reference, and if you defined your own exit, then exit.o would
>> >never have reason to get linked.
>> >
>> >But there are all sorts of things that can break unexpectedly from
>> >redefining standard functions, which is why it's UB and this is not
>> >supported usage.
>> >
>> >Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: build musl for armv7m
  2016-06-23  0:21                       ` Zhao, Weiming
  2016-06-23  4:22                         ` Rich Felker
@ 2016-07-05 20:08                         ` Rich Felker
  1 sibling, 0 replies; 21+ messages in thread
From: Rich Felker @ 2016-07-05 20:08 UTC (permalink / raw)
  To: musl

On Wed, Jun 22, 2016 at 05:21:54PM -0700, Zhao, Weiming wrote:
> diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
> index 4db4844..df9f5c8 100644
> --- a/src/string/arm/memcpy_le.S
> +++ b/src/string/arm/memcpy_le.S
> @@ -241,7 +241,12 @@ non_congruent:
>  	beq     2f
>  	ldr     r5, [r1], #4
>  	sub     r2, r2, #4
> +#if !defined(__thumb2__) || (__ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH > 7)
>  	orr     r4, r3, r5,             lsl lr
> +#else
> +	lsl     r4, r5, lr
> +	orr     r4, r3, r4
> +#endif

I don't see how this new logic makes sense at all. The goal is to use
the less-efficient code for the case where you're building as thumb
and thumb2 is not supported (i.e. limited to thumb1). That would be:

#if defined(__thumb2__) || !defined(__thumb__)
 	orr     r4, r3, r5,             lsl lr
#else
	lsl     r4, r5, lr
	orr     r4, r3, r4
#endif

Does that look right?

I'm trying to wrap up the 1.1.15 release right now so I don't think
these changes will make it in, but I want to finish addressing _all_
of the thumb/cortex-m compat issues soon, hopefully in the next
release cycle. Let's start a new thread figuring out what's still
needed.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-07-05 20:08 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-14  8:49 build musl for armv7m weimingz
2016-06-14 13:00 ` Rich Felker
2016-06-14 16:12   ` Zhao, Weiming
2016-06-14 16:32     ` Szabolcs Nagy
2016-06-14 16:58       ` Zhao, Weiming
2016-06-14 17:40         ` Zhao, Weiming
2016-06-16 18:34           ` Zhao, Weiming
2016-06-20 19:58             ` Rich Felker
2016-06-22 19:08               ` Zhao, Weiming
2016-06-22 19:19                 ` Rich Felker
2016-06-22 20:37                   ` Zhao, Weiming
2016-06-22 23:26                     ` Rich Felker
2016-06-23  0:21                       ` Zhao, Weiming
2016-06-23  4:22                         ` Rich Felker
2016-06-23  6:04                           ` weimingz
2016-06-23  9:57                             ` Szabolcs Nagy
2016-06-23 14:22                               ` weimingz
2016-07-05 20:08                         ` Rich Felker
2016-06-20 17:17           ` Zhao, Weiming
2016-06-14 14:38 ` Rich Felker
2016-06-14 16:35   ` Zhao, Weiming

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).