From: Jörg Mische
Newsgroups: gmane.linux.lib.musl.general
Subject: simplification of __aeabi_read_tp
Date: Fri, 1 Sep 2017 00:26:05 +0200
To: musl@lists.openwall.com

Hi,

I am trying to adapt the ARM assembler parts to ARMv6-M (the Thumb-2 subset implemented by the Cortex-M0) without breaking ARMv4T compatibility.

One issue is the function __aeabi_read_tp(), which must not clobber any registers except the return value in r0. Register-saving code that avoids "pop {lr}" (not supported by ARMv6-M) and "pop {pc}" (which does not interwork on ARMv4T, i.e. it cannot return to a caller running in the other instruction set) is very ugly. Therefore, I took a closer look at its internals and found the following:

__aeabi_read_tp() calls __aeabi_read_tp_c(), which inlines the function __pthread_self(). On ARMv7 and above, __pthread_self() simply reads the coprocessor register c13 without clobbering any registers. Below ARMv7, it calls through the function pointer __a_gettp_ptr, which points either to __a_gettp_cp15() (a routine that reads c13) or to the kuser_get_tls function provided by the kernel.
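
Why returning the raw c13 value is enough: the removed helper __aeabi_read_tp_c() returns __pthread_self()-8+sizeof(struct pthread). Assuming that on ARM __pthread_self() is defined as the thread pointer plus 8 minus sizeof(struct pthread) (an assumption about musl's ARM-specific internals that this mail does not show), the two offsets cancel and the helper just hands back the thread pointer. A minimal C sketch of that arithmetic, with a hypothetical struct pthread_mock standing in for musl's internal struct pthread and made-up addresses:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for musl's internal struct pthread: only its size
 * matters for this arithmetic; the real layout is different. */
struct pthread_mock {
    void *self;
    uintptr_t other_fields[30];
};

/* ASSUMED ARM definition of __pthread_self(): the thread pointer (TP)
 * read from c13 lies 8 bytes below the end of the thread descriptor. */
static uintptr_t self_from_tp(uintptr_t tp)
{
    assert(tp >= sizeof(struct pthread_mock) - 8); /* avoid wrap-around */
    return tp + 8 - sizeof(struct pthread_mock);
}

/* The expression computed by the removed __aeabi_read_tp_c() helper,
 * applied to the descriptor address returned by __pthread_self(). */
static uintptr_t tp_from_self(uintptr_t self)
{
    return self - 8 + sizeof(struct pthread_mock);
}
```

Under that assumption, tp_from_self(self_from_tp(tp)) == tp for any non-wrapping tp, which is consistent with the new assembly returning the mrc result in r0 unchanged.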

The interesting point is that neither __a_gettp_cp15() (a single instruction plus a return) nor kuser_get_tls (according to the kernel documentation) clobbers any registers. The only reason for saving the registers is the indirection through the C function __aeabi_read_tp_c(), in which the compiler is allowed to clobber r0-r3.

Since inline functions cannot be called from assembler and any C code must be avoided, I rewrote the code of __pthread_self() directly in assembler in __aeabi_read_tp.S (capital S, so that the #if is run through the C preprocessor).

With these modifications, the code is not only faster but also works on the Cortex-M0.

Best regards,
Jörg

---
 src/thread/arm/__aeabi_read_tp.S   | 22 ++++++++++++++++++++++
 src/thread/arm/__aeabi_read_tp.s   |  8 --------
 src/thread/arm/__aeabi_read_tp_c.c |  8 --------
 3 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/src/thread/arm/__aeabi_read_tp.S b/src/thread/arm/__aeabi_read_tp.S
new file mode 100644
index 0000000..897b4f8
--- /dev/null
+++ b/src/thread/arm/__aeabi_read_tp.S
@@ -0,0 +1,22 @@
+.syntax unified
+.global __a_gettp_ptr
+.hidden __a_gettp_ptr
+.global __aeabi_read_tp
+.type __aeabi_read_tp,%function
+__aeabi_read_tp:
+
+#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6ZK__) && !__thumb__) || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
+
+	mrc p15,0,r0,c13,c0,3
+	bx lr
+
+#else
+
+	ldr r0,2f
+	add r0,r0,pc
+	ldr r0,[r0]
+1:	bx r0
+	.align 2
+2:	.word __a_gettp_ptr-1b
+
+#endif
diff --git a/src/thread/arm/__aeabi_read_tp.s b/src/thread/arm/__aeabi_read_tp.s
deleted file mode 100644
index 9d0cd31..0000000
--- a/src/thread/arm/__aeabi_read_tp.s
+++ /dev/null
@@ -1,8 +0,0 @@
-.syntax unified
-.global __aeabi_read_tp
-.type __aeabi_read_tp,%function
-__aeabi_read_tp:
-	push {r1,r2,r3,lr}
-	bl __aeabi_read_tp_c
-	pop {r1,r2,r3,lr}
-	bx lr
diff --git a/src/thread/arm/__aeabi_read_tp_c.c b/src/thread/arm/__aeabi_read_tp_c.c
deleted file mode 100644
index 654bdc5..0000000
--- a/src/thread/arm/__aeabi_read_tp_c.c
+++ /dev/null
@@ -1,8 +0,0 @@
-#include "pthread_impl.h"
-#include
-
-__attribute__((__visibility__("hidden")))
-void *__aeabi_read_tp_c(void)
-{
-	return (void *)((uintptr_t)__pthread_self()-8+sizeof(struct pthread));
-}
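
The PC-relative sequence in the #else branch works in both ARM and Thumb state because label 1 sits exactly where PC points when "add r0,r0,pc" executes. Reading PC yields the current instruction's address plus 8 in ARM state (4-byte instructions) and plus 4 in Thumb state (2-byte encodings), i.e. two instructions ahead in both cases, which is the "bx r0". The offset stored at label 2 (__a_gettp_ptr-1b) therefore adds back up to the address of __a_gettp_ptr in either state. "bx r0" then tail-calls the selected routine with lr still holding the caller's return address, so nothing needs to be saved. The C sketch below only simulates that address arithmetic; the struct, function names and addresses are hypothetical, and it assumes all four instructions use their 16-bit encodings in Thumb state.

```c
#include <assert.h>
#include <stdint.h>

/* Per-state parameters: instruction size in bytes, and how far ahead of
 * the executing instruction a read of PC appears to be. */
struct isa_state {
    uint32_t insn_size;
    uint32_t pc_ahead;
};

static const struct isa_state arm_state   = { 4, 8 };
static const struct isa_state thumb_state = { 2, 4 }; /* 16-bit encodings */

/* Simulates the #else branch of __aeabi_read_tp:
 *   entry + 0*s:     ldr r0,2f      r0 = literal (symbol - label1)
 *   entry + 1*s:     add r0,r0,pc   r0 += value read from PC
 *   entry + 2*s:     ldr r0,[r0]    load through the computed address
 *   entry + 3*s: 1:  bx r0
 * Returns the address computed by the add, i.e. the address that
 * "ldr r0,[r0]" dereferences. uint32_t arithmetic wraps like the CPU's. */
static uint32_t computed_address(struct isa_state st, uint32_t entry,
                                 uint32_t symbol)
{
    assert(entry % st.insn_size == 0);            /* instruction alignment */
    uint32_t label1   = entry + 3 * st.insn_size; /* address of "1: bx r0" */
    uint32_t literal  = symbol - label1;          /* .word __a_gettp_ptr-1b */
    uint32_t add_addr = entry + 1 * st.insn_size;
    uint32_t pc_value = add_addr + st.pc_ahead;   /* what "pc" reads as */
    return literal + pc_value;
}
```

For example, with a made-up entry address of 0x8000 and __a_gettp_ptr at 0x20010, both computed_address(arm_state, 0x8000, 0x20010) and computed_address(thumb_state, 0x8000, 0x20010) give back 0x20010. Placing label 1 on any other instruction would leave the result off by a multiple of the instruction size.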