From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/12247 Path: news.gmane.org!.POSTED!not-for-mail From: Nicholas Wilson Newsgroups: gmane.linux.lib.musl.general Subject: Re: [PATCH] Wasm support patch 3 (the actual arch/wasm) Date: Fri, 15 Dec 2017 17:02:47 +0000 Message-ID: References: ,<20171129203641.GF1627@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Trace: blaine.gmane.org 1513357386 15630 195.159.176.226 (15 Dec 2017 17:03:06 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Fri, 15 Dec 2017 17:03:06 +0000 (UTC) To: "musl@lists.openwall.com" Original-X-From: musl-return-12263-gllmg-musl=m.gmane.org@lists.openwall.com Fri Dec 15 18:03:01 2017 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1ePtO1-0003jq-4l for gllmg-musl@m.gmane.org; Fri, 15 Dec 2017 18:03:01 +0100 Original-Received: (qmail 11511 invoked by uid 550); 15 Dec 2017 17:03:05 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 11493 invoked from network); 15 Dec 2017 17:03:05 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=realvnc.onmicrosoft.com; s=selector1-realvnc-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=9tGvU9PHs0ltBW+y9AMFkNKaYBQKP4Ge6QVujOmvPaA=; b=kOlvtDYPyKAh5dDiT5eXm9QeLxljVPsZeu8ahsjQ0MdCw5KdxLy5ivg9638CjjsX3/ZqSITaSbjMKdRpkZjpxEDUlg4WC0pVuYKbVl0RH7vAxSXzN9IHtAP1q22jYXHOlg1MkICmD3TIZaDMFs8R5JDLUBWR/n7vyA6M0q1Z9DM= Thread-Topic: [musl] [PATCH] Wasm support patch 3 (the actual arch/wasm) Thread-Index: AQHTaEUVsg/yBrmdbUC/9rjnZh7tOqMr0vuAgBi0lR0= In-Reply-To: <20171129203641.GF1627@brightrain.aerifal.cx> Accept-Language: en-GB, en-US Content-Language: en-GB X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=nicholas.wilson@realvnc.com; x-originating-ip: [2a02:390:a001:192:d6be:d9ff:fe9c:1892] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1;HE1PR0502MB3881;6:ajcLzxaLM/X7kdHANtEwuYfOLcv3K57X8TfIx2UNZj6uk8U9s1gZR82spEGE6VHuEXp9D31h675kjskt4CRu0+LPODyxed/vjd4nqYIHtb1Kyva3iKYNj9pa6xTDdxXaJBi1MdWRpJMI0++KSQBxIbgMQr/poEBKYd3UpPEPIojrjzY3V0hPVUnmFJ079sPsUaR6RFSdXt6un6cgLWaRyfFj40uT3MvC9LL4EA/GwQ5FXHKErLcQTxlMEBaGzQZqSAmpsNgPRZKboWM4U27edfd1CmpZOvStjUHfXE83csjuK2bQG49vNO4DWyBTbMoPn12M9MDYlirUnqKR67ZwIjLqxmzdsmldQuKln2wJhkY=;5:fu38nr2ml+I7M1vz97nzZTwq8e+oJlbs7s+zBTlEP9GvWe43050Cn6/0IXVkZetUbrlaWHtrwU43NZaX8Dz8NIVlVUY1h2ZkBChcwo79VgtN0h7tKV5/3cuWG0JuPIZ7Lpslg1mNDhw5GVmgkZpt/n3+mqSfLOkdLxYUDjhrqr0=;24:m1+UAjTFo+SpHRPSER+x2IETPrhWgVubpXky9lfZ8Ska7tmtHcGgBuGBlG1A9D6Jw/HgUSWBjaFL/xTXB1UphwKTp/daLL0e2jZrHa9Kfs0=;7:4oT7Dkw8zA086VrLrR3rihkf+6IVq/q2r8cGyBdIwZ4YpJIsDNJoukrSHDmTK/6+dWK4m5M5AwXPgnwe2mI/GUDmAhjp w6gyuv+gefMejVa6dtMiZsVZ1OEV2U7I8D/TQJs1Don2sc5PTRk3Juvem+gkD2ExWRLcWdfOzTRCQJRTzVEvWW7nxLwi+Ort/+ x-ms-exchange-antispam-srfa-diagnostics: SSOS; x-ms-office365-filtering-correlation-id: 8a8e4fce-7e0e-48c4-93f5-08d543ddab30 x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(5600026)(4604075)(4534020)(4602075)(4603075)(4627115)(201702281549075)(2017052603307);SRVR:HE1PR0502MB3881; x-ms-traffictypediagnostic: HE1PR0502MB3881: x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(166708455590820); x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(6040450)(2401047)(5005006)(8121501046)(3002001)(10201501046)(3231023)(93006095)(93001095)(6041248)(20161123555025)(20161123560025)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(2016111802025)(20161123558100)(20161123564025)(20161123562025)(6043046)(6072148)(201708071742011);SRVR:HE1PR0502MB3881;BCL:0;PCL:0;RULEID:(100000803101)(100110400095);SRVR:HE1PR0502MB3881; x-forefront-prvs: 05220145DE x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(366004)(396003)(346002)(39850400004)(376002)(43544003)(24454002)(189003)(199004)(52314003)(5640700003)(5660300001)(53936002)(25786009)(6246003)(102836003)(6116002)(33656002)(6306002)(9686003)(561944003)(229853002)(2351001)(6436002)(316002)(86362001)(7736002)(105586002)(2900100001)(2906002)(106356001)(3280700002)(2501003)(305945005)(74316002)(55016002)(68736007)(6506007)(99286004)(7696005)(76176011)(966005)(97736004)(14454004)(53546011)(59450400001)(5250100002)(6916009)(8676002)(345774005)(3660700001)(1730700003)(8936002)(81156014)(81166006)(478600001)(2950100002);DIR:OUT;SFP:1101;SCL:1;SRVR:HE1PR0502MB3881;H:HE1PR0502MB3883.eurprd05.prod.outlook.com;FPR:;SPF:None;PTR:InfoNoRecords;MX:1;A:1;LANG:en; received-spf: None (protection.outlook.com: realvnc.com does not designate permitted sender hosts) spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM X-MS-Exchange-CrossTenant-Network-Message-Id: 8a8e4fce-7e0e-48c4-93f5-08d543ddab30 X-MS-Exchange-CrossTenant-originalarrivaltime: 15 Dec 2017 17:02:47.9995 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 9ad766d3-c6a5-4458-8c58-244e7c118728 X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0502MB3881 X-OriginatorOrg: realvnc.com Xref: news.gmane.org gmane.linux.lib.musl.general:12247 Archived-At: Hi, Thank you for this feedback, it was helpful. It was also put in my spam bin= , hence the late reply :( I've moved the patch into a GitHub branch while I'm working on it, so the l= atest version of the patch is now here: https://github.com/NWilson/musl/pul= l/1 On 29 November 2017 20:36, Rich Felker wrote: >> +#define a_ctz_64 a_ctz_64 >> +static inline int a_ctz_64(uint64_t x) >> +{ >> + return __builtin_ctzll(x); >> +} >> + > Unsure about whether this is good; you can just omit them and the > higher-level header provides default definitions. I think clang even > optimizes them to a single logical insn. If Clang can optimise the DeBruijn thing to the native instruction, that's = magic :) Maybe it's target-specific though. I tested with the a_ctz_64 definition fr= om atomic.h, and found that unfortunately it's *not* optimised with -O2 for= Wasm. Perhaps Musl should default to using the intrinsic for all platforms that d= on't provide it, since the intrinsic's fallback ought to be as good as Musl= 's. Anyway, using the builtin works for now, and it's the only way I've found t= o get the fast instruction to be used. > Also note that coding style is to indent with tabs not spaces. Fixed :) >> +#define a_and_64 a_and_64 >> +static inline void a_and_64(volatile uint64_t *p, uint64_t v) >> +{ >> + // TODO use a WebAssembly CAS builtin, when those arrive with the thr= eads feature >> + //__atomic_fetch_and(p, v, __ATOMIC_SEQ_CST); >> + *p &=3D v; >> +} > These need to be non-dummy at some point but we can discuss that > later. However you shouldn't define dummy versions of all of them like > this. Either just define dummy a_ll and a_sc, or dummy a_cas. Let the > higher level framework take care of all the rest unless you really > have a working, better-optimized version of them like x86 does. I put a placeholder in wasm/atomic_arch.h for each atomic instruction in th= e Wasm atomics proposal, as a reminder and to show what's coming. Multithre= aded Wasm support will come sometime in 2018, for now it's impossible to ha= ve races or threads, so the placeholders are equivalent to what would be ge= nerated via a_cas. >> +#define a_crash a_crash >> +static inline void a_crash() >> +{ >> + // This generates the Wasm "unreachable" instruction which traps when= reached >> + __builtin_unreachable(); >> +} > Shouldn't it be __builtin_trap()? __builtin_unreachable allows the > compiler to assume the code path is not reachable, which is exactly > the opposite of what we want. Good point, __builtin_unreachable is emitted as a trap on Wasm (has the sam= e effect), but you're right the C frontend will probably treat them differe= ntly with respect to inlining and branch elimination. __builtin_trap won't = cause the code leading to it to be culled. Fixed. > Generally we don't do 32/64-bit archs together as one arch with > #ifdefs but as separate ones. We might find that we can get away with it on Wasm though? Apart from the s= ize of long and void* changing, they are pretty much architectures (like ru= nning a CPU in 32/64 bit "mode"). I could take out the Wasm64 support for now, given that it's hypothetical a= nd the toolchain's not done yet. I just thought it would save time later if= the headers were 64-bit ready. >> +TYPEDEF struct { union { int __i[9]; volatile int __vi[9]; unsigned __s= [9]; } __u; } pthread_attr_t; >> +TYPEDEF struct { union { int __i[6]; volatile int __vi[6]; volatile voi= d *volatile __p[6]; } __u; } pthread_mutex_t; >> +TYPEDEF struct { union { int __i[6]; volatile int __vi[6]; volatile voi= d *volatile __p[6]; } __u; } mtx_t; >> +TYPEDEF struct { union { int __i[12]; volatile int __vi[12]; void *__p[= 12]; } __u; } pthread_cond_t; >> +TYPEDEF struct { union { int __i[12]; volatile int __vi[12]; void *__p[= 12]; } __u; } cnd_t; >> +TYPEDEF struct { union { int __i[8]; volatile int __vi[8]; void *__p[8]= ; } __u; } pthread_rwlock_t; >> +TYPEDEF struct { union { int __i[5]; volatile int __vi[5]; void *__p[5]= ; } __u; } pthread_barrier_t; > ..despite these being in the arch headers, musl policy actually > requires all 32-bit archs and all 64-bit archs to have the same > pthread type sizes/definitions. Using the 32-bit definitions on a > 64-bit arch will not work because of how pthread_impl.h lays out the > actual usage of the slots; even if it did for some of them it wouldn't > be future-proof. Interesting, I had wondered what the rationale was. It looks like most of t= he slots are unused? I can't work out why the 64-bit structures actually ne= ed to be bigger than the 32-bit ones. In any case these are just stubs for now - threading won't work on Wasm. It= 's easier to put some dummy defs in the headers than to hack out all the so= urce files which use threading. >> +++ b/arch/wasm/bits/float.h > These should be defined explicitly to the correct values so that there > are not silently varying arch parameters resulting in undocumentedly > incompatible ABIs. I would assume they're the same as double, no? Fair enough, I've replaced the float constants with hardcoded values. Slightly surprisingly to me, long double seems to be 80-bit precision, like= on ARM64. I haven't really thought about how it should work, I've not been= involved with that part of the compiler frontend. float/double are normal = IEEE on Wasm, so presumably we've got the compiler to emulate higher-precis= ion. >> diff --git a/arch/wasm/bits/limits.h b/arch/wasm/bits/limits.h >> +#define _POSIX_V6_LP64_OFF64 1 >> +#define _POSIX_V7_LP64_OFF64 1 > These differ by 32/64-bit. Thanks! Dan Gohman (Mozilla) spotted it too, hence the patch for x32 the ot= her day. >> diff --git a/arch/wasm/bits/setjmp.h b/arch/wasm/bits/setjmp.h >> +typedef unsigned long long __jmp_buf[8]; > Are you sure that makes sense? What does the wasm register file look > like? What registers are call-saved? setjmp doesn't make sense for Wasm. We don't support setjmp, and Wasm doesn= 't even have any registers at all. It's a stack-based architecture, where i= nstructions push/pop data to the stack for subsequent instructions to consu= me. All the things that don't make sense for Wasm just have dummy definitions, = to keep the build working. In this case, signal.c includes setjmp.h, which = in turn depends on __jmp_buf being defined. Any actual calls to setjmp will= fail to link though, because Wasm doesn't define it. It's not ideal, but putting these dummies in the arch/wasm headers seems ni= cer than messing with code outside the wasm directories. > Why is uc_stack omitted? Ditto for arch/wasm/bits/signal.h, where I've got a comment explaining the = definitions are dummies. All the members of struct __ucontext that are not = referenced by the platform-independent code (only used by arch-specific fil= es) can be omitted from Wasm. >> diff --git a/arch/wasm/bits/syscall.h.in b/arch/wasm/bits/syscall.h.in >> +#define SYS_accept4 syscall_accept4 > They probably need casts to an arithmetic type and need to be in a > reserved namespace (like __syscall_accept4, etc.). Yes, I've tweaked it now so that the syscall casts work well. I think it's = quite a satisfactory solution for the static syscalls now. >> diff --git a/arch/wasm/crt_arch.h b/arch/wasm/crt_arch.h >> new file mode 100644 >> index 00000000..e69de29b > If this is empty, how does program entry point get created? How does > __libc_start_main get called? It's not empty now :) I've added two entrypoints: * The standard _start entrypoint is there for people who want to run run "e= xit(main(...))" straight away. We think that very few people will actually = want this however. * In src/internal/wasm/start_wasm.c I've added a function "_start_wasm" whi= ch initialises stdio and calls pre-main constructors, then simply returns. = We're expecting that this (or something like it) will become the default en= trypoint for the Wasm linker. Typical usage for Wasm will be calling into the (compiled) WebAssembly modu= le from JavaScript. The Wasm module will sit there doing nothing until you = call a function from it, which then becomes the bottom of the stack and we = enter into the Wasm code. When that invocation returns, the heap and all th= e global variables etc are still there in memory, the Wasm module just sits= doing nothing (no execution stack) until the next time you call into it fr= om JavaScript. The linker arranges for _start_wasm to run automatically at the point when = the browser first loads the module, before any other code is executed. The Wasm linker is currently being reworked not to use the .init_array sect= ion anymore, this may need some tweaking soon. Future support will be avail= able exclusively via the @llvm.global.ctors intrinsic - this is the mechani= sm that C++ constructors and __attribute__((ctor)) functions hook into. So - although I thought it was sorted last week, it's now back in flux. >> diff --git a/arch/wasm/pthread_arch.h b/arch/wasm/pthread_arch.h >> +static inline struct pthread *__pthread_self(void) { return pthread_sel= f(); } > This is a circular definition. You need code to load the thread > pointer from whatever part of the register file it's stored in. Hm, you're right, I just inherited this from some Emscripten code. There are no registers or threads, I'll have to implement this using a glob= al. >> diff --git a/arch/wasm/reloc.h b/arch/wasm/reloc.h >> +#define CRTJMP(pc,sp) __builtin_unreachable() > Needs newline. This is probably not actually important without dynamic > linking. Newline added. Making it unreachable seems a good compromise - attempts to = use the dynamic linker won't work, but no special work needs to be done to = adapt the existing code to Wasm yet. >> +* Due to the type-safe nature of Wasm linkage, syscalls cannot actually= be >> + variadic if defined externally. Any syscalls called with a variable = argument >> + count, and not provided here by Musl, could be fixed on a case-by-cas= e basis >> + as needed. >> \ No newline at end of file > If variadic is a problem you could make them just always take 6 fixed > long args and make the syscall_arch.h machinery just pass dummy zeros > for the unused slots. Thanks, that's exactly what I ended up doing. :) > As noted before I think it makes sense to just drop brk. It's not > needed. I've done that, and implemented mmap instead. >> +// Wasm doesn't *yet* have futex(), but it's being planned as part of t= he >> +// threaded-Wasm support in Spring 2018. >> +// >> +// For now, Wasm is single-threaded and we simply assert that the lock = is not >> +// held, and abort if a wait would be required (assume it's a corrupted= lock). > I think this is probably a bogus assumption; you can't assume anything > about how futex is being used to implement locks. Note that a > perfectly valid implementation of futex is just one that always > returns success; it will simply result in spinning at 100% cpu load > waiting for the event to happen rather than going to sleep. I'd argue it is valid. In Wasm, there's only one thread of execution, so if= the condition isn't set now then it won't ever be. It doesn't matter that = futex might have spurious wakeups - on "normal" archs futex will wait, so i= f a single-threaded arch like Wasm finds itself trying to wait on condition= that can't ever happen, then it's reasonable to abort. It's certainly more useful for debugging that causing a spin at 100% CPU. Thanks very much for your feedback Rich, I'll try to keep maintaining my br= anch and keeping you updated on it, if you're happy with the idea that Wasm= might be an accepted architecture (when the code's right). Nick=