From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To:
References:
Date: Wed, 3 Feb 2016 09:51:50 -0600
Message-ID:
From: Steven Stallion
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=UTF-8
Subject: Re: [9fans] FP register usage in Plan9 assembler
Topicbox-Message-UUID: 822fec98-ead9-11e9-9d60-3106f5b1d025

On Wed, Feb 3, 2016 at 9:24 AM, erik quanstrom wrote:
> i think this is off the original point, but as to modifying the assembler.
> to add a new instruction, the linker, assembler and libmach need modification.
> typically this is a matter of adding a line to each one for assembly, linking,
> and disassembly, respectively. it's worth looking at the addition of the ymm,
> and then zmm registers and those patterns to 6[al] and libmach. this is an
> example of adding a new instruction encoding to an existing arch.
>
>> * The diff to update support for ARMv7-A to 5a came in at over 2800
>> lines; this was to add a handful of instructions.
>
> do you perhaps mean the linker?
>
> the pi kernels, which supports v6 (original) and v7 (pi2) rely on small asm
> files for arch-specific functions. i think you provided some explanation of
> why this approach would not work for v5, but unfortunately, i don't remember
> it.

The patch ended up in the usual place on sources. The patch itself
modernized all of the v7-A ports (this pre-dated the pi2 port by a
couple of years) to support common instructions rather than cramming
opcodes into the instruction stream. The exynos port relied on it
heavily since the Cortex-A15 required a bit more special handling than
the simpler cores in the omap, pi2, and teg2 ports. As you mentioned,
5[al], libmach, even acid were updated. You really can't touch one
thing without modifying everything else.

WRT drawbacks in the loader for writing assembly, one of the biggest
problems is merciless optimization that cannot be disabled on a per
translation-unit basis (we're using a loader, remember?). As an
example, it's damned near impossible to perform PC-relative branching
in the vector table. Instead you end up having to use the slower (and
sillier) load method that exists today.

As far as consistency between architectures goes, it's a non-goal for
me. C is my portability layer - odds are very good that if I need to
dip into assembly, I need complete control over the instruction
stream.

Steve