From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/12175 Path: news.gmane.org!.POSTED!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: [PATCH] Wasm support patch 3 (the actual arch/wasm) Date: Wed, 29 Nov 2017 15:36:41 -0500 Message-ID: <20171129203641.GF1627@brightrain.aerifal.cx> References: Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: blaine.gmane.org 1511987821 5816 195.159.176.226 (29 Nov 2017 20:37:01 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 29 Nov 2017 20:37:01 +0000 (UTC) User-Agent: Mutt/1.5.21 (2010-09-15) To: musl@lists.openwall.com Original-X-From: musl-return-12191-gllmg-musl=m.gmane.org@lists.openwall.com Wed Nov 29 21:36:57 2017 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1eK96A-0000sB-VN for gllmg-musl@m.gmane.org; Wed, 29 Nov 2017 21:36:51 +0100 Original-Received: (qmail 21821 invoked by uid 550); 29 Nov 2017 20:36:55 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 21796 invoked from network); 29 Nov 2017 20:36:54 -0000 Content-Disposition: inline In-Reply-To: Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:12175 Archived-At: On Tue, Nov 28, 2017 at 12:39:03PM +0000, Nicholas Wilson wrote: > Hi, > > Here's the patch with the actual Wasm arch implementation! > > It's got a certain amount of boilerplate, mainly in "arch/wasm/bits" > where certain structures have to be defined per-architecture. I just > copied x32 where possible, since Wasm a new arch and we have free > choice of how to define things, and x32 is another modern arch > that's like i386 but with some legacy cruft removed. > > As you can see, I'm using the "static syscalls" approach. There's a > README explaining the approach in "src/internal/wasm", where I > explain that most syscalls are to be provided by the embedding > environment, but some syscalls like brk can actually be implemented > using native Wasm intrinsics, which I've done in > "src/internal/wasm". > > The result of these three patches (this and the last two emails), is > a version of Musl that builds using Clang's Wasm support, and > creates executable Wasm modules that can use malloc, do I/O, and all > sorts of other things. I haven't tested the full range of syscalls > yet, but the ones I have tried work well. > > Ready for feedback! See inline below: > diff --git a/arch/wasm/atomic_arch.h b/arch/wasm/atomic_arch.h > new file mode 100644 > index 00000000..9da04059 > --- /dev/null > +++ b/arch/wasm/atomic_arch.h > @@ -0,0 +1,74 @@ > +#include > + > +#define a_ctz_l a_ctz_l > +static inline int a_ctz_l(unsigned long x) > +{ > + return __builtin_ctzl(x); > +} > +#define a_ctz_64 a_ctz_64 > +static inline int a_ctz_64(uint64_t x) > +{ > + return __builtin_ctzll(x); > +} > + Unsure about whether this is good; you can just omit them and the higher-level header provides default definitions. I think clang even optimizes them to a single logical insn. Also note that coding style is to indent with tabs not spaces. > +#define a_and_64 a_and_64 > +static inline void a_and_64(volatile uint64_t *p, uint64_t v) > +{ > + // TODO use a WebAssembly CAS builtin, when those arrive with the threads feature > + //__atomic_fetch_and(p, v, __ATOMIC_SEQ_CST); > + *p &= v; > +} These need to be non-dummy at some point but we can discuss that later. However you shouldn't define dummy versions of all of them like this. Either just define dummy a_ll and a_sc, or dummy a_cas. Let the higher level framework take care of all the rest unless you really have a working, better-optimized version of them like x86 does. > +#define a_crash a_crash > +static inline void a_crash() > +{ > + // This generates the Wasm "unreachable" instruction which traps when reached > + __builtin_unreachable(); > +} Shouldn't it be __builtin_trap()? __builtin_unreachable allows the compiler to assume the code path is not reachable, which is exactly the opposite of what we want. > diff --git a/arch/wasm/bits/alltypes.h.in b/arch/wasm/bits/alltypes.h.in > new file mode 100644 > index 00000000..ca4e94a2 > --- /dev/null > +++ b/arch/wasm/bits/alltypes.h.in > @@ -0,0 +1,33 @@ > +#ifdef __wasm32__ > +#define _Addr int > +#define _Int64 long long > +#define _Reg int > + > +#elif defined __wasm64__ > +#define _Addr long > +#define _Int64 long long > +#define _Reg long > + > +#endif Generally we don't do 32/64-bit archs together as one arch with #ifdefs but as separate ones. There are a number of additional things that have to be different, like... > +TYPEDEF __builtin_va_list va_list; > +TYPEDEF __builtin_va_list __isoc_va_list; > + > +#ifndef __cplusplus > +TYPEDEF __WCHAR_TYPE__ wchar_t; > +#endif > +TYPEDEF __WINT_TYPE__ wint_t; > + > +TYPEDEF float float_t; > +TYPEDEF double double_t; > + > +TYPEDEF long time_t; > +TYPEDEF long suseconds_t; > + > +TYPEDEF struct { union { int __i[9]; volatile int __vi[9]; unsigned __s[9]; } __u; } pthread_attr_t; > +TYPEDEF struct { union { int __i[6]; volatile int __vi[6]; volatile void *volatile __p[6]; } __u; } pthread_mutex_t; > +TYPEDEF struct { union { int __i[6]; volatile int __vi[6]; volatile void *volatile __p[6]; } __u; } mtx_t; > +TYPEDEF struct { union { int __i[12]; volatile int __vi[12]; void *__p[12]; } __u; } pthread_cond_t; > +TYPEDEF struct { union { int __i[12]; volatile int __vi[12]; void *__p[12]; } __u; } cnd_t; > +TYPEDEF struct { union { int __i[8]; volatile int __vi[8]; void *__p[8]; } __u; } pthread_rwlock_t; > +TYPEDEF struct { union { int __i[5]; volatile int __vi[5]; void *__p[5]; } __u; } pthread_barrier_t; ..despite these being in the arch headers, musl policy actually requires all 32-bit archs and all 64-bit archs to have the same pthread type sizes/definitions. Using the 32-bit definitions on a 64-bit arch will not work because of how pthread_impl.h lays out the actual usage of the slots; even if it did for some of them it wouldn't be future-proof. > diff --git a/arch/wasm/bits/float.h b/arch/wasm/bits/float.h > new file mode 100644 > index 00000000..9a56ad14 > --- /dev/null > +++ b/arch/wasm/bits/float.h > @@ -0,0 +1,16 @@ > +#define FLT_EVAL_METHOD __FLT_EVAL_METHOD__ > + > +#define LDBL_TRUE_MIN __LDBL_DENORM_MIN__ > +#define LDBL_MIN __LDBL_MIN__ > +#define LDBL_MAX __LDBL_MAX__ > +#define LDBL_EPSILON __LDBL_EPSILON__ > + > +#define LDBL_MANT_DIG __LDBL_MANT_DIG__ > +#define LDBL_MIN_EXP __LDBL_MIN_EXP__ > +#define LDBL_MAX_EXP __LDBL_MAX_EXP__ > + > +#define LDBL_DIG __LDBL_DIG__ > +#define LDBL_MIN_10_EXP __LDBL_MIN_10_EXP__ > +#define LDBL_MAX_10_EXP __LDBL_MAX_10_EXP__ > + > +#define DECIMAL_DIG __DECIMAL_DIG__ These should be defined explicitly to the correct values so that there are not silently varying arch parameters resulting in undocumentedly incompatible ABIs. I would assume they're the same as double, no? > diff --git a/arch/wasm/bits/limits.h b/arch/wasm/bits/limits.h > new file mode 100644 > index 00000000..649d7d74 > --- /dev/null > +++ b/arch/wasm/bits/limits.h > @@ -0,0 +1,9 @@ > +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \ > + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE) > +// The WebAssembly fixed page size is 64KiB > +#define PAGE_SIZE 65536 > +#define LONG_BIT (__SIZEOF_LONG__*8) > +#endif > + > +#define LONG_MAX __LONG_MAX__ > +#define LLONG_MAX __LONG_LONG_MAX__ > diff --git a/arch/wasm/bits/posix.h b/arch/wasm/bits/posix.h > new file mode 100644 > index 00000000..c37b94c1 > --- /dev/null > +++ b/arch/wasm/bits/posix.h > @@ -0,0 +1,2 @@ > +#define _POSIX_V6_LP64_OFF64 1 > +#define _POSIX_V7_LP64_OFF64 1 These differ by 32/64-bit. > diff --git a/arch/wasm/bits/setjmp.h b/arch/wasm/bits/setjmp.h > new file mode 100644 > index 00000000..a9262a64 > --- /dev/null > +++ b/arch/wasm/bits/setjmp.h > @@ -0,0 +1 @@ > +typedef unsigned long long __jmp_buf[8]; Are you sure that makes sense? What does the wasm register file look like? What registers are call-saved? > diff --git a/arch/wasm/bits/signal.h b/arch/wasm/bits/signal.h > new file mode 100644 > index 00000000..c8d10a81 > --- /dev/null > +++ b/arch/wasm/bits/signal.h > @@ -0,0 +1,77 @@ > +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \ > + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE) > + > +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE) > +#define MINSIGSTKSZ 2048 > +#define SIGSTKSZ 8192 > +#endif > + > +// I'm not expecting any of this to be actually usable... these definitions are > +// just the bare minimum so that src/signal/*.c compiles. > + > +typedef struct { > + unsigned long long __ip_dummy; > +} mcontext_t; Likewise here. > +struct sigaltstack { > + void *ss_sp; > + int ss_flags; > + size_t ss_size; > +}; > + > +typedef struct __ucontext { > + unsigned long uc_flags; > + struct __ucontext *uc_link; > +// stack_t uc_stack; > + mcontext_t uc_mcontext; > + sigset_t uc_sigmask; > +// unsigned long long __fpregs_mem[64]; > +} ucontext_t; Why is uc_stack omitted? > diff --git a/arch/wasm/bits/syscall.h.in b/arch/wasm/bits/syscall.h.in > new file mode 100644 > index 00000000..00c31889 > --- /dev/null > +++ b/arch/wasm/bits/syscall.h.in > @@ -0,0 +1,321 @@ > +// For Wasm, we don't use syscall numbers! We statically link in only the > +// syscalls which are invoked. > + > +#define SYS_accept4 syscall_accept4 They probably need casts to an arithmetic type and need to be in a reserved namespace (like __syscall_accept4, etc.). > diff --git a/arch/wasm/crt_arch.h b/arch/wasm/crt_arch.h > new file mode 100644 > index 00000000..e69de29b If this is empty, how does program entry point get created? How does __libc_start_main get called? > diff --git a/arch/wasm/pthread_arch.h b/arch/wasm/pthread_arch.h > new file mode 100644 > index 00000000..75aac668 > --- /dev/null > +++ b/arch/wasm/pthread_arch.h > @@ -0,0 +1,5 @@ > +static inline struct pthread *__pthread_self(void) { return pthread_self(); } This is a circular definition. You need code to load the thread pointer from whatever part of the register file it's stored in. > diff --git a/arch/wasm/reloc.h b/arch/wasm/reloc.h > new file mode 100644 > index 00000000..eca58d9b > --- /dev/null > +++ b/arch/wasm/reloc.h > @@ -0,0 +1 @@ > +#define CRTJMP(pc,sp) __builtin_unreachable() > \ No newline at end of file Needs newline. This is probably not actually important without dynamic linking. > diff --git a/arch/wasm/syscall_arch.h b/arch/wasm/syscall_arch.h > new file mode 100644 > index 00000000..41baf881 > --- /dev/null > +++ b/arch/wasm/syscall_arch.h > @@ -0,0 +1,333 @@ > +// For both Wasm32 and Wasm64, we assume that the host environment can't > +// provide 64-bit types, and split 64-bit values into two arguments. A Wasm > +// interpreter *could* provide i64 support, but in practice web browsers aren't > +// doing that right now, and any i64 we pass out to a syscall will get chopped > +// to 58-bit precision. > +#define __SYSCALL_LL_E(x) \ > +((union { long long ll; long l[2]; }){ .ll = x }).l[0], \ > +((union { long long ll; long l[2]; }){ .ll = x }).l[1] > +#define __SYSCALL_LL_O(x) 0, __SYSCALL_LL_E((x)) > + > +long __syscall_accept4(long arg1, ...); ... > +### Details > + > +* Due to the type-safe nature of Wasm linkage, syscalls cannot actually be > + variadic if defined externally. Any syscalls called with a variable argument > + count, and not provided here by Musl, could be fixed on a case-by-case basis > + as needed. > \ No newline at end of file If variadic is a problem you could make them just always take 6 fixed long args and make the syscall_arch.h machinery just pass dummy zeros for the unused slots. > diff --git a/src/internal/wasm/syscall_brk.c b/src/internal/wasm/syscall_brk.c > new file mode 100644 > index 00000000..28c5fc23 > --- /dev/null > +++ b/src/internal/wasm/syscall_brk.c > @@ -0,0 +1,17 @@ > +#include > +#include "syscall.h" > + > +long __syscall_brk(long arg1, ...) > +{ > + unsigned long newbrk = (unsigned long)arg1; > + unsigned long pages = __builtin_wasm_current_memory(); > + if (newbrk % PAGE_SIZE) > + goto end; > + unsigned long new_pages = newbrk / PAGE_SIZE; > + if (new_pages <= pages || new_pages >= (0xffffffffu / PAGE_SIZE)) > + goto end; > + if (__builtin_wasm_grow_memory(new_pages - pages) != (unsigned long)-1) > + pages = new_pages; > + end: > + return pages * PAGE_SIZE; > +} As noted before I think it makes sense to just drop brk. It's not needed. > diff --git a/src/internal/wasm/syscall_futex.c b/src/internal/wasm/syscall_futex.c > new file mode 100644 > index 00000000..9c14fa0b > --- /dev/null > +++ b/src/internal/wasm/syscall_futex.c > @@ -0,0 +1,45 @@ > +#include > +#include > +#include > +#include "syscall.h" > + > +// Wasm doesn't *yet* have futex(), but it's being planned as part of the > +// threaded-Wasm support in Spring 2018. > +// > +// For now, Wasm is single-threaded and we simply assert that the lock is not > +// held, and abort if a wait would be required (assume it's a corrupted lock). I think this is probably a bogus assumption; you can't assume anything about how futex is being used to implement locks. Note that a perfectly valid implementation of futex is just one that always returns success; it will simply result in spinning at 100% cpu load waiting for the event to happen rather than going to sleep. This was all a pretty quick, high-level review, but I hope it's helpful. Rich