Github messages for voidlinux
* [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup.
@ 2024-01-31  8:15 yoshiyoshyosh
From: yoshiyoshyosh @ 2024-01-31  8:15 UTC
  To: ml

[-- Attachment #1: Type: text/plain, Size: 2765 bytes --]

There is a new pull request by yoshiyoshyosh against master on the void-packages repository

https://github.com/yoshiyoshyosh/void-packages luajit-2.1-rolling
https://github.com/void-linux/void-packages/pull/48453

LuaJIT: update to 2.1.1692580715, cleanup.
#### Testing the changes
- I tested the changes in this PR: **briefly**
  - The only Lua thing I really run is awesomewm. I built awesomewm against LuaJIT with the build option and everything seems fine, but further testing is of course encouraged.

#### Local build testing
- I built this PR locally for my native architecture (`x86_64-glibc`)
- I built this PR locally for these architectures (if supported. mark crossbuilds):
  - `x86_64-musl`
  - `i686-glibc` (both crossbuild and masterdir)
  - `aarch64-glibc` (crossbuild)
  - `aarch64-musl` (crossbuild)
  - `armv7l-glibc` (crossbuild)

This addresses #48349.

LuaJIT has moved to "rolling releases" on branches in their git repo, which basically means releases are git commits to a `v2.1` branch. Of course, this is incompatible with Void's packaging philosophy. However, there also seems to be a `v2.1` *tag* that was created during the move and has not been updated since. I'm unsure whether this tag is simply meant to mark the start of v2.1 in the new rolling-release era, or whether they intend it to be a stable tag that "releases" occasionally get pushed to.
Whatever the case, this tag was "released" in a form they seemingly deem stable enough, which is why I think it is good enough to update to (especially since we'd be going from a 6-year-old version to a 5-month-old one).

Regarding the version number: In the makefiles, there exists a `RELVER` macro [that gets set by a `git show` command](https://repo.or.cz/luajit-2.0.git/blob/2090842410e0ba6f81fad310a77bf5432488249a:/src/Makefile#l478). The "canonical" version number in the makefiles then becomes `major.minor.relver`, and the binary/library is installed with this version number. This is the only real patch version number that we have, so I believe it is what should go in the version number. I just used the same `git show` that they use and baked the result into `version`.
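
As a rough illustration of the scheme described above, the sketch below composes a `major.minor.relver` string and decodes the `relver` part. It assumes that `RELVER` is the Unix commit timestamp printed by the upstream `git show` invocation; that reading is an assumption based on the shape of the number, not something stated in this PR. The function names `luajit_version` and `relver_date` are hypothetical helpers and are not part of LuaJIT or xbps-src.

```python
from datetime import datetime, timezone


def luajit_version(major: int, minor: int, relver: int) -> str:
    """Compose the major.minor.relver version string (hypothetical helper)."""
    return f"{major}.{minor}.{relver}"


def relver_date(relver: int) -> datetime:
    """Interpret relver as a Unix timestamp in UTC (an assumption, see above)."""
    return datetime.fromtimestamp(relver, tz=timezone.utc)


RELVER = 1692580715  # the value used in this PR's version

print(luajit_version(2, 1, RELVER))    # 2.1.1692580715
print(relver_date(RELVER).isoformat())  # 2023-08-21T01:18:35+00:00
```

If that assumption holds, the decoded date is consistent with the "5-month-old" estimate given earlier in this description.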

I removed all but two of the patches, as they have either been upstreamed into the `v2.1` tag or were for powerpc, which void doesn't support anymore. Should we even have the "enable debug symbols" patch for main repo builds instead of leaving it to `-dbg`? I'm only keeping it because every other distribution keeps it.

I also just cleaned up the template in general; it's more concise and organized IMO while achieving the same thing.

A patch file from https://github.com/void-linux/void-packages/pull/48453.patch is attached

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: github-pr-luajit-2.1-rolling-48453.patch --]
[-- Type: text/x-diff, Size: 158326 bytes --]

From f101aea54d1e29c899c7d5361bbdab64cb6b00fb Mon Sep 17 00:00:00 2001
From: yosh <yosh-git@riseup.net>
Date: Wed, 31 Jan 2024 02:54:09 -0500
Subject: [PATCH] LuaJIT: update to 2.1.1692580715, cleanup.

---
 .../patches/ppc/musl-ppc-secureplt.patch      |   93 -
 .../patches/ppc64/add-ppc64-support.patch     | 3521 -----------------
 .../patches/ppc64/fix-vm-jit-ppc64.patch      |   11 -
 .../aarch64-Fix-exit-stub-patching.patch      |  231 --
 .../aarch64-register-allocation-bug-fix.patch |   29 -
 ...1abec542e6f9851ff2368e7f196b6382a44c.patch |  562 ---
 .../LuaJIT/patches/enable-debug-symbols.patch |   14 +-
 srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch |   33 -
 .../get-rid-of-luajit-version-sym.patch       |   37 +-
 .../patches/unpollute-global-namespace.patch  |   21 -
 srcpkgs/LuaJIT/template                       |   73 +-
 srcpkgs/LuaJIT/update                         |    2 +-
 12 files changed, 47 insertions(+), 4580 deletions(-)
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch

diff --git a/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch b/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
deleted file mode 100644
index 3000ca0ed3d53..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
+++ /dev/null
@@ -1,93 +0,0 @@
-Imported from https://github.com/LuaJIT/LuaJIT/pull/486.
-
-This fixes crashes on ppc-musl, as musl only supports secureplt.
-
---- a/src/lj_dispatch.c
-+++ b/src/lj_dispatch.c
-@@ -56,6 +56,18 @@ static const ASMFunction dispatch_got[] = {
- #undef GOTFUNC
- #endif
- 
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+#include <math.h>
-+LJ_FUNCA_NORET void LJ_FASTCALL lj_ffh_coroutine_wrap_err(lua_State *L,
-+							  lua_State *co);
-+
-+#define GOTFUNC(name)	(ASMFunction)name,
-+static const ASMFunction dispatch_got[] = {
-+  GOTDEF(GOTFUNC)
-+};
-+#undef GOTFUNC
-+#endif
-+
- /* Initialize instruction dispatch table and hot counters. */
- void lj_dispatch_init(GG_State *GG)
- {
-@@ -77,6 +89,9 @@ void lj_dispatch_init(GG_State *GG)
- #if LJ_TARGET_MIPS
-   memcpy(GG->got, dispatch_got, LJ_GOT__MAX*sizeof(ASMFunction *));
- #endif
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+  memcpy(GG->got, dispatch_got, LJ_GOT__MAX*4);
-+#endif
- }
- 
- #if LJ_HASJIT
---- a/src/lj_dispatch.h
-+++ b/src/lj_dispatch.h
-@@ -66,6 +66,21 @@ GOTDEF(GOTENUM)
- };
- #endif
- 
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+/* Need our own global offset table for the dreaded MIPS calling conventions. */
-+#define GOTDEF(_) \
-+  _(floor) _(ceil) _(trunc) _(log) _(log10) _(exp) _(sin) _(cos) _(tan) \
-+  _(asin) _(acos) _(atan) _(sinh) _(cosh) _(tanh) _(frexp) _(modf) _(atan2) \
-+  _(pow) _(fmod) _(ldexp) _(sqrt)
-+
-+enum {
-+#define GOTENUM(name) LJ_GOT_##name,
-+GOTDEF(GOTENUM)
-+#undef GOTENUM
-+  LJ_GOT__MAX
-+};
-+#endif
-+
- /* Type of hot counter. Must match the code in the assembler VM. */
- /* 16 bits are sufficient. Only 0.0015% overhead with maximum slot penalty. */
- typedef uint16_t HotCount;
-@@ -89,7 +104,7 @@ typedef uint16_t HotCount;
- typedef struct GG_State {
-   lua_State L;				/* Main thread. */
-   global_State g;			/* Global state. */
--#if LJ_TARGET_MIPS
-+#if LJ_TARGET_MIPS || (LJ_TARGET_PPC && (LJ_ARCH_BITS == 32))
-   ASMFunction got[LJ_GOT__MAX];		/* Global offset table. */
- #endif
- #if LJ_HASJIT
---- a/src/vm_ppc.dasc
-+++ b/src/vm_ppc.dasc
-@@ -59,7 +59,12 @@
- |.define ENV_OFS,	8
- |.endif
- |.else  // No TOC.
--|.macro blex, target; bl extern target@plt; .endmacro
-+|.macro blex, target
-+|  lwz TMP0, DISPATCH_GOT(target)(DISPATCH)
-+|  mtctr TMP0
-+|  bctrl
-+|  //bl extern target@plt
-+|.endmacro
- |.macro .toc, a, b; .endmacro
- |.endif
- |.macro .tocenv, a, b; .if TOCENV; a, b; .endif; .endmacro
-@@ -448,6 +453,8 @@
- |// Assumes DISPATCH is relative to GL.
- #define DISPATCH_GL(field)	(GG_DISP2G + (int)offsetof(global_State, field))
- #define DISPATCH_J(field)	(GG_DISP2J + (int)offsetof(jit_State, field))
-+#define GG_DISP2GOT		(GG_OFS(got) - GG_OFS(dispatch))
-+#define DISPATCH_GOT(name)	(GG_DISP2GOT + 4*LJ_GOT_##name)
- |
- #define PC2PROTO(field)  ((int)offsetof(GCproto, field)-(int)sizeof(GCproto))
- |
diff --git a/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch b/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
deleted file mode 100644
index 7c865859da923..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
+++ /dev/null
@@ -1,3521 +0,0 @@
-From: "Rodrigo R. Galvao" <rosattig@br.ibm.com>
-Date: Wed, 11 Oct 2017 08:41:47 +0000
-Subject: New patch proposal for PPC64 support
-
- Create a patch for PPC64 support based on 
-https://github.com/LuaJIT/LuaJIT/pull/140.
- It replaces the old patch since this new one is more likely to be merged 
-with luajit upstream.
-
-
-Author: Rodrigo R. Galvao <rosattig@br.ibm.com>
----
- dynasm/dasm_ppc.lua    |    5 +
- src/Makefile           |   11 +-
- src/host/buildvm_asm.c |   16 +-
- src/lj_arch.h          |   18 +-
- src/lj_ccall.c         |  166 ++++++-
- src/lj_ccall.h         |   13 +
- src/lj_ccallback.c     |   68 ++-
- src/lj_ctype.h         |    2 +-
- src/lj_def.h           |    4 +
- src/lj_frame.h         |    9 +
- src/lj_target_ppc.h    |   14 +
- src/vm_ppc.dasc        | 1290 ++++++++++++++++++++++++++++++++----------------
- 12 files changed, 1162 insertions(+), 454 deletions(-)
-
-diff --git dynasm/dasm_ppc.lua dynasm/dasm_ppc.lua
-index f73974d..a4ad70b 100644
---- a/dynasm/dasm_ppc.lua
-+++ b/dynasm/dasm_ppc.lua
-@@ -257,9 +257,11 @@ map_op = {
-   addic_3 =	"30000000RRI",
-   ["addic._3"] = "34000000RRI",
-   addi_3 =	"38000000RR0I",
-+  addil_3 =	"38000000RR0J",
-   li_2 =	"38000000RI",
-   la_2 =	"38000000RD",
-   addis_3 =	"3c000000RR0I",
-+  addisl_3 =	"3c000000RR0J",
-   lis_2 =	"3c000000RI",
-   lus_2 =	"3c000000RU",
-   bc_3 =	"40000000AAK",
-@@ -842,6 +844,9 @@ map_op = {
-   srdi_3 =	op_alias("rldicl_4", function(p)
-     p[4] = p[3]; p[3] = "64-("..p[3]..")"
-   end),
-+  ["srdi._3"] =	op_alias("rldicl._4", function(p)
-+    p[4] = p[3]; p[3] = "64-("..p[3]..")"
-+  end),
-   clrldi_3 =	op_alias("rldicl_4", function(p)
-     p[4] = p[3]; p[3] = "0"
-   end),
-diff --git src/Makefile src/Makefile
-index 6b73a89..cc50bae 100644
---- a/src/Makefile
-+++ b/src/Makefile
-@@ -453,7 +453,16 @@ ifeq (ppc,$(TARGET_LJARCH))
-     DASM_AFLAGS+= -D GPR64
-   endif
-   ifeq (PS3,$(TARGET_SYS))
--    DASM_AFLAGS+= -D PPE -D TOC
-+    DASM_AFLAGS+= -D PPE
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_OPD 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D OPD
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_OPDENV 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D OPDENV
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_ELFV2 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D ELFV2
-   endif
-   ifneq (,$(findstring LJ_ARCH_PPC64 ,$(TARGET_TESTARCH)))
-     DASM_ARCH= ppc64
-diff --git src/host/buildvm_asm.c src/host/buildvm_asm.c
-index ffd1490..6bb995e 100644
---- a/src/host/buildvm_asm.c
-+++ b/src/host/buildvm_asm.c
-@@ -140,18 +140,14 @@ static void emit_asm_wordreloc(BuildCtx *ctx, uint8_t *p, int n,
- #else
- #define TOCPREFIX ""
- #endif
--  if ((ins >> 26) == 16) {
-+  if ((ins >> 26) == 14) {
-+    fprintf(ctx->fp, "\taddi %d,%d,%s\n", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-+  } else if ((ins >> 26) == 15) {
-+    fprintf(ctx->fp, "\taddis %d,%d,%s\n", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-+  } else if ((ins >> 26) == 16) {
-     fprintf(ctx->fp, "\t%s %d, %d, " TOCPREFIX "%s\n",
- 	    (ins & 1) ? "bcl" : "bc", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-   } else if ((ins >> 26) == 18) {
--#if LJ_ARCH_PPC64
--    const char *suffix = strchr(sym, '@');
--    if (suffix && suffix[1] == 'h') {
--      fprintf(ctx->fp, "\taddis 11, 2, %s\n", sym);
--    } else if (suffix && suffix[1] == 'l') {
--      fprintf(ctx->fp, "\tld 12, %s\n", sym);
--    } else
--#endif
-     fprintf(ctx->fp, "\t%s " TOCPREFIX "%s\n", (ins & 1) ? "bl" : "b", sym);
-   } else {
-     fprintf(stderr,
-@@ -250,7 +246,7 @@ void emit_asm(BuildCtx *ctx)
-   int i, rel;
- 
-   fprintf(ctx->fp, "\t.file \"buildvm_%s.dasc\"\n", ctx->dasm_arch);
--#if LJ_ARCH_PPC64
-+#if LJ_ARCH_PPC_ELFV2
-   fprintf(ctx->fp, "\t.abiversion 2\n");
- #endif
-   fprintf(ctx->fp, "\t.text\n");
-diff --git src/lj_arch.h src/lj_arch.h
-index d609b37..53bc651 100644
---- a/src/lj_arch.h
-+++ b/src/lj_arch.h
-@@ -269,10 +269,18 @@
- #if LJ_TARGET_CONSOLE
- #define LJ_ARCH_PPC32ON64	1
- #define LJ_ARCH_NOFFI		1
-+#if LJ_TARGET_PS3
-+#define LJ_ARCH_PPC_OPD		1
-+#endif
- #elif LJ_ARCH_BITS == 64
--#define LJ_ARCH_PPC64		1
--#define LJ_TARGET_GC64		1
-+#define LJ_ARCH_PPC32ON64	1
- #define LJ_ARCH_NOJIT		1	/* NYI */
-+#if _CALL_ELF == 2
-+#define LJ_ARCH_PPC_ELFV2	1
-+#else
-+#define LJ_ARCH_PPC_OPD		1
-+#define LJ_ARCH_PPC_OPDENV	1
-+#endif
- #endif
- 
- #if _ARCH_PWR7
-@@ -423,12 +431,6 @@
- #if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
- #error "No support for PowerPC CPUs without double-precision FPU"
- #endif
--#if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
--#error "No support for little-endian PPC32"
--#endif
--#if LJ_ARCH_PPC64
--#error "No support for PowerPC 64 bit mode (yet)"
--#endif
- #ifdef __NO_FPRS__
- #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
- #endif
-diff --git src/lj_ccall.c src/lj_ccall.c
-index 5c252e5..b891591 100644
---- a/src/lj_ccall.c
-+++ b/src/lj_ccall.c
-@@ -369,21 +369,97 @@
- #elif LJ_TARGET_PPC
- /* -- PPC calling conventions --------------------------------------------- */
- 
-+#if LJ_ARCH_BITS == 64
-+
-+#if LJ_ARCH_PPC_ELFV2
-+
-+#define CCALL_HANDLE_STRUCTRET \
-+  if (sz > 16 && ccall_classify_fp(cts, ctr) <= 0) { \
-+    cc->retref = 1;  /* Return by reference. */ \
-+    cc->gpr[ngpr++] = (GPRArg)dp; \
-+  }
-+
-+#define CCALL_HANDLE_STRUCTRET2 \
-+  int isfp = ccall_classify_fp(cts, ctr); \
-+  int i; \
-+  if (isfp == FTYPE_FLOAT) { \
-+    for (i = 0; i < ctr->size / 4; i++) \
-+      ((float *)dp)[i] = cc->fpr[i]; \
-+  } else if (isfp == FTYPE_DOUBLE) { \
-+    for (i = 0; i < ctr->size / 8; i++) \
-+      ((double *)dp)[i] = cc->fpr[i]; \
-+  } else { \
-+    if (ctr->size < 8 && LJ_BE) { \
-+      sp += 8 - ctr->size; \
-+    } \
-+    memcpy(dp, sp, ctr->size); \
-+  }
-+
-+#else
-+
- #define CCALL_HANDLE_STRUCTRET \
-   cc->retref = 1;  /* Return all structs by reference. */ \
-   cc->gpr[ngpr++] = (GPRArg)dp;
- 
-+#endif
-+
- #define CCALL_HANDLE_COMPLEXRET \
-   /* Complex values are returned in 2 or 4 GPRs. */ \
-   cc->retref = 0;
- 
-+#define CCALL_HANDLE_STRUCTARG
-+
- #define CCALL_HANDLE_COMPLEXRET2 \
--  memcpy(dp, sp, ctr->size);  /* Copy complex from GPRs. */
-+  if (ctr->size == 2*sizeof(float)) {  /* Copy complex float from FPRs. */ \
-+    ((float *)dp)[0] = cc->fpr[0]; \
-+    ((float *)dp)[1] = cc->fpr[1]; \
-+  } else {  /* Copy complex double from FPRs. */ \
-+    ((double *)dp)[0] = cc->fpr[0]; \
-+    ((double *)dp)[1] = cc->fpr[1]; \
-+  }
-+
-+#define CCALL_HANDLE_COMPLEXARG \
-+  isfp = 1; \
-+  if (d->size == sizeof(float) * 2) { \
-+    d = ctype_get(cts, CTID_COMPLEX_DOUBLE); \
-+    isf32 = 1; \
-+  }
-+
-+#define CCALL_HANDLE_REGARG \
-+  if (isfp && d->size == sizeof(float)) { \
-+    d = ctype_get(cts, CTID_DOUBLE); \
-+    isf32 = 1; \
-+  } \
-+  if (ngpr < maxgpr) { \
-+   dp = &cc->gpr[ngpr]; \
-+   ngpr += n; \
-+   if (ngpr > maxgpr) { \
-+     nsp += ngpr - 8; \
-+     ngpr = 8; \
-+     if (nsp > CCALL_MAXSTACK) { \
-+       goto err_nyi; \
-+     } \
-+   } \
-+   goto done; \
-+  }
-+
-+#else
-+
-+#define CCALL_HANDLE_STRUCTRET \
-+  cc->retref = 1;  /* Return all structs by reference. */ \
-+  cc->gpr[ngpr++] = (GPRArg)dp;
-+
-+#define CCALL_HANDLE_COMPLEXRET \
-+  /* Complex values are returned in 2 or 4 GPRs. */ \
-+  cc->retref = 0;
- 
- #define CCALL_HANDLE_STRUCTARG \
-   rp = cdataptr(lj_cdata_new(cts, did, sz)); \
-   sz = CTSIZE_PTR;  /* Pass all structs by reference. */
- 
-+#define CCALL_HANDLE_COMPLEXRET2 \
-+  memcpy(dp, sp, ctr->size);  /* Copy complex from GPRs. */
-+
- #define CCALL_HANDLE_COMPLEXARG \
-   /* Pass complex by value in 2 or 4 GPRs. */
- 
-@@ -410,6 +486,8 @@
-     } \
-   }
- 
-+#endif
-+
- #define CCALL_HANDLE_RET \
-   if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
-     ctr = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */
-@@ -801,6 +879,50 @@ noth:  /* Not a homogeneous float/double aggregate. */
- 
- #endif
- 
-+/* -- PowerPC64 ELFv2 ABI struct classification ------------------- */
-+
-+#if LJ_ARCH_PPC_ELFV2
-+
-+#define FTYPE_FLOAT	1
-+#define FTYPE_DOUBLE	2
-+
-+static unsigned int ccall_classify_fp(CTState *cts, CType *ct) {
-+  if (ctype_isfp(ct->info)) {
-+    if (ct->size == sizeof(float))
-+      return FTYPE_FLOAT;
-+    else
-+      return FTYPE_DOUBLE;
-+  } else if (ctype_iscomplex(ct->info)) {
-+    if (ct->size == sizeof(float) * 2)
-+      return FTYPE_FLOAT;
-+    else
-+      return FTYPE_DOUBLE;
-+  } else if (ctype_isstruct(ct->info)) {
-+    int res = -1;
-+    int sz = ct->size;
-+    while (ct->sib) {
-+      ct = ctype_get(cts, ct->sib);
-+      if (ctype_isfield(ct->info)) {
-+        int sub = ccall_classify_fp(cts, ctype_rawchild(cts, ct));
-+        if (res == -1)
-+          res = sub;
-+        if (sub != -1 && sub != res)
-+          return 0;
-+      } else if (ctype_isbitfield(ct->info) ||
-+        ctype_isxattrib(ct->info, CTA_SUBTYPE)) {
-+        return 0;
-+      }
-+    }
-+    if (res > 0 && sz > res * 4 * 8)
-+      return 0;
-+    return res;
-+  } else {
-+    return 0;
-+  }
-+}
-+
-+#endif
-+
- /* -- MIPS64 ABI struct classification ---------------------------- */
- 
- #if LJ_TARGET_MIPS64
-@@ -974,6 +1096,9 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-     CTSize sz;
-     MSize n, isfp = 0, isva = 0;
-     void *dp, *rp = NULL;
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    int isf32 = 0;
-+#endif
- 
-     if (fid) {  /* Get argument type from field. */
-       CType *ctf = ctype_get(cts, fid);
-@@ -1030,7 +1155,37 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-       *(void **)dp = rp;
-       dp = rp;
-     }
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64 && LJ_BE
-+    if (ctype_isstruct(d->info) && sz < CTSIZE_PTR) {
-+      dp = (char *)dp + (CTSIZE_PTR - sz);
-+    }
-+#endif
-     lj_cconv_ct_tv(cts, d, (uint8_t *)dp, o, CCF_ARG(narg));
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if (isfp) {
-+      int i;
-+      for (i = 0; i < d->size / 8 && nfpr < CCALL_NARG_FPR; i++)
-+        cc->fpr[nfpr++] = ((double *)dp)[i];
-+    }
-+    if (isf32) {
-+      int i;
-+      for (i = 0; i < d->size / 8; i++)
-+        ((float *)dp)[i*2] = ((double *)dp)[i];
-+    }
-+#endif
-+#if LJ_ARCH_PPC_ELFV2
-+    if (ctype_isstruct(d->info)) {
-+      isfp = ccall_classify_fp(cts, d);
-+      int i;
-+      if (isfp == FTYPE_FLOAT) {
-+        for (i = 0; i < d->size / 4 && nfpr < CCALL_NARG_FPR; i++)
-+          cc->fpr[nfpr++] = ((float *)dp)[i];
-+      } else if (isfp == FTYPE_DOUBLE) {
-+        for (i = 0; i < d->size / 8 && nfpr < CCALL_NARG_FPR; i++)
-+          cc->fpr[nfpr++] = ((double *)dp)[i];
-+      }
-+    }
-+#endif
-     /* Extend passed integers to 32 bits at least. */
-     if (ctype_isinteger_or_bool(d->info) && d->size < 4) {
-       if (d->info & CTF_UNSIGNED)
-@@ -1044,6 +1199,15 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-     if (isfp && d->size == sizeof(float))
-       ((float *)dp)[1] = ((float *)dp)[0];  /* Floats occupy high slot. */
- #endif
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if ((ctype_isinteger_or_bool(d->info) || ctype_isenum(d->info))
-+	&& d->size <= 4) {
-+      if (d->info & CTF_UNSIGNED)
-+	*(uint64_t *)dp = (uint64_t)*(uint32_t *)dp;
-+      else
-+        *(int64_t *)dp = (int64_t)*(int32_t *)dp;
-+    }
-+#endif
- #if LJ_TARGET_MIPS64 || (LJ_TARGET_ARM64 && LJ_BE)
-     if ((ctype_isinteger_or_bool(d->info) || ctype_isenum(d->info)
- #if LJ_TARGET_MIPS64
-diff --git src/lj_ccall.h src/lj_ccall.h
-index 59f6648..bbf309f 100644
---- a/src/lj_ccall.h
-+++ b/src/lj_ccall.h
-@@ -86,10 +86,23 @@ typedef union FPRArg {
- #elif LJ_TARGET_PPC
- 
- #define CCALL_NARG_GPR		8
-+#if LJ_ARCH_BITS == 64
-+#define CCALL_NARG_FPR		13
-+#if LJ_ARCH_PPC_ELFV2
-+#define CCALL_NRET_GPR		2
-+#define CCALL_NRET_FPR		8
-+#define CCALL_SPS_EXTRA		14
-+#else
-+#define CCALL_NRET_GPR		1
-+#define CCALL_NRET_FPR		2
-+#define CCALL_SPS_EXTRA		16
-+#endif
-+#else
- #define CCALL_NARG_FPR		8
- #define CCALL_NRET_GPR		4	/* For complex double. */
- #define CCALL_NRET_FPR		1
- #define CCALL_SPS_EXTRA		4
-+#endif
- #define CCALL_SPS_FREE		0
- 
- typedef intptr_t GPRArg;
-diff --git src/lj_ccallback.c src/lj_ccallback.c
-index 846827b..eb7f445 100644
---- a/src/lj_ccallback.c
-+++ b/src/lj_ccallback.c
-@@ -61,8 +61,24 @@ static MSize CALLBACK_OFS2SLOT(MSize ofs)
- 
- #elif LJ_TARGET_PPC
- 
-+#if LJ_ARCH_PPC_OPD
-+
-+#define CALLBACK_SLOT2OFS(slot)		(24*(slot))
-+#define CALLBACK_OFS2SLOT(ofs)		((ofs)/24)
-+#define CALLBACK_MAX_SLOT		(CALLBACK_OFS2SLOT(CALLBACK_MCODE_SIZE))
-+
-+#elif LJ_ARCH_PPC_ELFV2
-+
-+#define CALLBACK_SLOT2OFS(slot)		(4*(slot))
-+#define CALLBACK_OFS2SLOT(ofs)		((ofs)/4)
-+#define CALLBACK_MAX_SLOT		(CALLBACK_MCODE_SIZE/4 - 10)
-+
-+#else
-+
- #define CALLBACK_MCODE_HEAD		24
- 
-+#endif
-+
- #elif LJ_TARGET_MIPS32
- 
- #define CALLBACK_MCODE_HEAD		20
-@@ -188,24 +204,59 @@ static void callback_mcode_init(global_State *g, uint32_t *page)
-   lua_assert(p - page <= CALLBACK_MCODE_SIZE);
- }
- #elif LJ_TARGET_PPC
-+#if LJ_ARCH_PPC_OPD
-+register void *vm_toc __asm__("r2");
-+static void callback_mcode_init(global_State *g, uint64_t *page)
-+{
-+  uint64_t *p = page;
-+  void *target = (void *)lj_vm_ffi_callback;
-+  MSize slot;
-+  for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
-+    *p++ = (uint64_t)target;
-+    *p++ = (uint64_t)vm_toc;
-+    *p++ = (uint64_t)g | ((uint64_t)slot << 47);
-+  }
-+  lua_assert(p - page <= CALLBACK_MCODE_SIZE / 8);
-+}
-+#else
- static void callback_mcode_init(global_State *g, uint32_t *page)
- {
-   uint32_t *p = page;
-   void *target = (void *)lj_vm_ffi_callback;
-   MSize slot;
-+#if LJ_ARCH_PPC_ELFV2
-+  // Needs to be in sync with lj_vm_ffi_callback.
-+  lua_assert(CALLBACK_MCODE_SIZE == 4096);
-+  for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
-+    *p = PPCI_B | (((page+CALLBACK_MAX_SLOT-p) & 0x00ffffffu) << 2);
-+    p++;
-+  }
-+  *p++ = PPCI_LI | PPCF_T(RID_SYS1) | ((((intptr_t)target) >> 32) & 0xffff);
-+  *p++ = PPCI_LI | PPCF_T(RID_R11) | ((((intptr_t)g) >> 32) & 0xffff);
-+  *p++ = PPCI_RLDICR | PPCF_T(RID_SYS1) | PPCF_A(RID_SYS1) | PPCF_SH(32) | PPCF_M6(63-32);  /* sldi */
-+  *p++ = PPCI_RLDICR | PPCF_T(RID_R11) | PPCF_A(RID_R11) | PPCF_SH(32) | PPCF_M6(63-32);  /* sldi */
-+  *p++ = PPCI_ORIS | PPCF_A(RID_SYS1) | PPCF_T(RID_SYS1) | ((((intptr_t)target) >> 16) & 0xffff);
-+  *p++ = PPCI_ORIS | PPCF_A(RID_R11) | PPCF_T(RID_R11) | ((((intptr_t)g) >> 16) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_SYS1) | PPCF_T(RID_SYS1) | (((intptr_t)target) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_R11) | PPCF_T(RID_R11) | (((intptr_t)g) & 0xffff);
-+  *p++ = PPCI_MTCTR | PPCF_T(RID_SYS1);
-+  *p++ = PPCI_BCTR;
-+#else
-   *p++ = PPCI_LIS | PPCF_T(RID_TMP) | (u32ptr(target) >> 16);
--  *p++ = PPCI_LIS | PPCF_T(RID_R12) | (u32ptr(g) >> 16);
-+  *p++ = PPCI_LIS | PPCF_T(RID_R11) | (u32ptr(g) >> 16);
-   *p++ = PPCI_ORI | PPCF_A(RID_TMP)|PPCF_T(RID_TMP) | (u32ptr(target) & 0xffff);
--  *p++ = PPCI_ORI | PPCF_A(RID_R12)|PPCF_T(RID_R12) | (u32ptr(g) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_R11)|PPCF_T(RID_R11) | (u32ptr(g) & 0xffff);
-   *p++ = PPCI_MTCTR | PPCF_T(RID_TMP);
-   *p++ = PPCI_BCTR;
-   for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
--    *p++ = PPCI_LI | PPCF_T(RID_R11) | slot;
-+    *p++ = PPCI_LI | PPCF_T(RID_R12) | slot;
-     *p = PPCI_B | (((page-p) & 0x00ffffffu) << 2);
-     p++;
-   }
--  lua_assert(p - page <= CALLBACK_MCODE_SIZE);
-+#endif
-+  lua_assert(p - page <= CALLBACK_MCODE_SIZE / 4);
- }
-+#endif
- #elif LJ_TARGET_MIPS
- static void callback_mcode_init(global_State *g, uint32_t *page)
- {
-@@ -641,6 +692,15 @@ static void callback_conv_result(CTState *cts, lua_State *L, TValue *o)
- 	*(int32_t *)dp = ctr->size == 1 ? (int32_t)*(int8_t *)dp :
- 					  (int32_t)*(int16_t *)dp;
-     }
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if (ctr->size <= 4 &&
-+       (ctype_isinteger_or_bool(ctr->info) || ctype_isenum(ctr->info))) {
-+      if (ctr->info & CTF_UNSIGNED)
-+        *(uint64_t *)dp = (uint64_t)*(uint32_t *)dp;
-+      else
-+        *(int64_t *)dp = (int64_t)*(int32_t *)dp;
-+    }
-+#endif
- #if LJ_TARGET_MIPS64 || (LJ_TARGET_ARM64 && LJ_BE)
-     /* Always sign-extend results to 64 bits. Even a soft-fp 'float'. */
-     if (ctr->size <= 4 &&
-diff --git src/lj_ctype.h src/lj_ctype.h
-index 0c220a8..105865b 100644
---- a/src/lj_ctype.h
-+++ b/src/lj_ctype.h
-@@ -153,7 +153,7 @@ typedef struct CType {
- 
- /* Simplify target-specific configuration. Checked in lj_ccall.h. */
- #define CCALL_MAX_GPR		8
--#define CCALL_MAX_FPR		8
-+#define CCALL_MAX_FPR		14
- 
- typedef LJ_ALIGN(8) union FPRCBArg { double d; float f[2]; } FPRCBArg;
- 
-diff --git src/lj_def.h src/lj_def.h
-index 2d8fff6..381d6f5 100644
---- a/src/lj_def.h
-+++ b/src/lj_def.h
-@@ -71,7 +71,11 @@ typedef unsigned int uintptr_t;
- #define LJ_MAX_IDXCHAIN	100		/* __index/__newindex chain limit. */
- #define LJ_STACK_EXTRA	(5+2*LJ_FR2)	/* Extra stack space (metamethods). */
- 
-+#if defined(__powerpc64__) && _CALL_ELF != 2
-+#define LJ_NUM_CBPAGE	4		/* Number of FFI callback pages. */
-+#else
- #define LJ_NUM_CBPAGE	1		/* Number of FFI callback pages. */
-+#endif
- 
- /* Minimum table/buffer sizes. */
- #define LJ_MIN_GLOBAL	6		/* Min. global table size (hbits). */
-diff --git src/lj_frame.h src/lj_frame.h
-index 19c49a4..c666418 100644
---- a/src/lj_frame.h
-+++ b/src/lj_frame.h
-@@ -210,6 +210,15 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
- #define CFRAME_OFS_MULTRES	408
- #define CFRAME_SIZE		384
- #define CFRAME_SHIFT_MULTRES	3
-+#elif LJ_ARCH_PPC_ELFV2
-+#define CFRAME_OFS_ERRF		360
-+#define CFRAME_OFS_NRES		356
-+#define CFRAME_OFS_PREV		336
-+#define CFRAME_OFS_L		352
-+#define CFRAME_OFS_PC		348
-+#define CFRAME_OFS_MULTRES	344
-+#define CFRAME_SIZE		368
-+#define CFRAME_SHIFT_MULTRES	3
- #elif LJ_ARCH_PPC32ON64
- #define CFRAME_OFS_ERRF		472
- #define CFRAME_OFS_NRES		468
-diff --git src/lj_target_ppc.h src/lj_target_ppc.h
-index c5c991a..f0c8c94 100644
---- a/src/lj_target_ppc.h
-+++ b/src/lj_target_ppc.h
-@@ -30,8 +30,13 @@ enum {
- 
-   /* Calling conventions. */
-   RID_RET = RID_R3,
-+#if LJ_LE
-+  RID_RETHI = RID_R4,
-+  RID_RETLO = RID_R3,
-+#else
-   RID_RETHI = RID_R3,
-   RID_RETLO = RID_R4,
-+#endif
-   RID_FPRET = RID_F1,
- 
-   /* These definitions must match with the *.dasc file(s): */
-@@ -131,6 +136,8 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define PPCF_C(r)	((r) << 6)
- #define PPCF_MB(n)	((n) << 6)
- #define PPCF_ME(n)	((n) << 1)
-+#define PPCF_SH(n)	((((n) & 31) << (11+1)) | (((n) & 32) >> (5-1)))
-+#define PPCF_M6(n)	((((n) & 31) << (5+1)) | (((n) & 32) << (11-5)))
- #define PPCF_Y		0x00200000
- #define PPCF_DOT	0x00000001
- 
-@@ -200,6 +207,13 @@ typedef enum PPCIns {
-   PPCI_RLWINM = 0x54000000,
-   PPCI_RLWIMI = 0x50000000,
- 
-+  PPCI_RLDICL = 0x78000000,
-+  PPCI_RLDICR = 0x78000004,
-+  PPCI_RLDIC = 0x78000008,
-+  PPCI_RLDIMI = 0x7800000c,
-+  PPCI_RLDCL = 0x78000010,
-+  PPCI_RLDCR = 0x78000012,
-+
-   PPCI_B = 0x48000000,
-   PPCI_BL = 0x48000001,
-   PPCI_BC = 0x40800000,
-diff --git src/vm_ppc.dasc src/vm_ppc.dasc
-index b4260eb..abb381e 100644
---- a/src/vm_ppc.dasc
-+++ b/src/vm_ppc.dasc
-@@ -22,35 +22,40 @@
- |// GPR64   64 bit registers (but possibly 32 bit pointers, e.g. PS3).
- |//         Affects reg saves, stack layout, carry/overflow/dot flags etc.
- |// FRAME32 Use 32 bit frame layout, even with GPR64 (Xbox 360).
--|// TOC     Need table of contents (64 bit or 32 bit variant, e.g. PS3).
-+|// OPD     Need function descriptors (64 bit or 32 bit variant, e.g. PS3).
- |//         Function pointers are really a struct: code, TOC, env (optional).
--|// TOCENV  Function pointers have an environment pointer, too (not on PS3).
-+|// OPDENV  Function pointers have an environment pointer, too (not on PS3).
-+|// ELFV2   The 64-bit ELF V2 ABI is in use.
- |// PPE     Power Processor Element of Cell (PS3) or Xenon (Xbox 360).
- |//         Must avoid (slow) micro-coded instructions.
- |
- |.if P64
--|.define TOC, 1
--|.define TOCENV, 1
- |.macro lpx, a, b, c; ldx a, b, c; .endmacro
- |.macro lp, a, b; ld a, b; .endmacro
- |.macro stp, a, b; std a, b; .endmacro
-+|.macro stpx, a, b, c; stdx a, b, c; .endmacro
- |.define decode_OPP, decode_OP8
--|.if FFI
--|// Missing: Calling conventions, 64 bit regs, TOC.
--|.error lib_ffi not yet implemented for PPC64
--|.endif
-+|.define PSIZE, 8
- |.else
- |.macro lpx, a, b, c; lwzx a, b, c; .endmacro
- |.macro lp, a, b; lwz a, b; .endmacro
- |.macro stp, a, b; stw a, b; .endmacro
-+|.macro stpx, a, b, c; stwx a, b, c; .endmacro
- |.define decode_OPP, decode_OP4
-+|.define PSIZE, 4
- |.endif
- |
- |// Convenience macros for TOC handling.
--|.if TOC
-+|.if OPD or ELFV2
- |// Linker needs a TOC patch area for every external call relocation.
--|.macro blex, target; bl extern target@plt; nop; .endmacro
-+|.macro blex, target; bl extern target; nop; .endmacro
- |.macro .toc, a, b; a, b; .endmacro
-+|.else
-+|.macro blex, target; bl extern target@plt; .endmacro
-+|.macro .toc, a, b; .endmacro
-+|.endif
-+|.if OPD
-+|.macro .opd, a, b; a, b; .endmacro
- |.if P64
- |.define TOC_OFS,	 8
- |.define ENV_OFS,	16
-@@ -58,13 +63,13 @@
- |.define TOC_OFS,	4
- |.define ENV_OFS,	8
- |.endif
--|.else  // No TOC.
--|.macro blex, target; bl extern target@plt; .endmacro
--|.macro .toc, a, b; .endmacro
-+|.else  // No OPD.
-+|.macro .opd, a, b; .endmacro
- |.endif
--|.macro .tocenv, a, b; .if TOCENV; a, b; .endif; .endmacro
-+|.macro .opdenv, a, b; .if OPDENV; a, b; .endif; .endmacro
- |
- |.macro .gpr64, a, b; .if GPR64; a, b; .endif; .endmacro
-+|.macro .elfv2, a, b; .if ELFV2; a, b; .endif; .endmacro
- |
- |.macro andix., y, a, i
- |.if PPE
-@@ -75,29 +80,6 @@
- |.endif
- |.endmacro
- |
--|.macro clrso, reg
--|.if PPE
--|  li reg, 0
--|  mtxer reg
--|.else
--|  mcrxr cr0
--|.endif
--|.endmacro
--|
--|.macro checkov, reg, noov
--|.if PPE
--|  mfxer reg
--|  add reg, reg, reg
--|  cmpwi reg, 0
--|   li reg, 0
--|   mtxer reg
--|  bgey noov
--|.else
--|  mcrxr cr0
--|  bley noov
--|.endif
--|.endmacro
--|
- |//-----------------------------------------------------------------------
- |
- |// Fixed register assignments for the interpreter.
-@@ -111,6 +93,7 @@
- |.define LREG,		r18	// Register holding lua_State (also in SAVE_L).
- |.define MULTRES,	r19	// Size of multi-result: (nresults+1)*8.
- |.define JGL,		r31	// On-trace: global_State + 32768.
-+|.define BASEP4,	r25	// Equal to BASE + 4
- |
- |// Constants for type-comparisons, stores and conversions. C callee-save.
- |.define TISNUM,	r22
-@@ -143,12 +126,19 @@
- |
- |.define FARG1,		f1
- |.define FARG2,		f2
-+|.define FARG3,		f3
-+|.define FARG4,		f4
-+|.define FARG5,		f5
-+|.define FARG6,		f6
-+|.define FARG7,		f7
-+|.define FARG8,		f8
- |
- |.define CRET1,		r3
- |.define CRET2,		r4
- |
- |.define TOCREG,	r2	// TOC register (only used by C code).
- |.define ENVREG,	r11	// Environment pointer (nested C functions).
-+|.define FUNCREG,	r12	// ELFv2 function pointer (overlaps RD)
- |
- |// Stack layout while in interpreter. Must match with lj_frame.h.
- |.if GPR64
-@@ -182,6 +172,49 @@
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
- |
-+|.elif ELFV2
-+|
-+|//			392(sp) // \ 32 bit C frame info.
-+|.define SAVE_LR,	384(sp)
-+|.define SAVE_CR,	376(sp) // 64 bit CR save.
-+|.define CFRAME_SPACE,	368     // Delta for sp.
-+|// Back chain for sp:	368(sp) <-- sp entering interpreter
-+|.define SAVE_ERRF,	360(sp) // |
-+|.define SAVE_NRES,	356(sp) // |
-+|.define SAVE_L,	352(sp) //  > Parameter save area.
-+|.define SAVE_PC,	348(sp) // |
-+|.define SAVE_MULTRES,	344(sp) // |
-+|.define SAVE_CFRAME,	336(sp) // / 64 bit C frame chain.
-+|.define SAVE_FPR_,	192     // .. 192+18*8: 64 bit FPR saves.
-+|.define SAVE_GPR_,	48      // .. 48+18*8: 64 bit GPR saves.
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	44(sp)
-+|.define TMPD_LO,	40(sp)
-+|.define TONUM_HI,	36(sp)
-+|.define TONUM_LO,	32(sp)
-+|.else
-+|.define TMPD_LO,	44(sp)
-+|.define TMPD_HI,	40(sp)
-+|.define TONUM_LO,	36(sp)
-+|.define TONUM_HI,	32(sp)
-+|.endif
-+|.define SAVE_TOC,	24(sp)  // TOC save area.
-+|// Next frame lr:	16(sp)
-+|// Next frame cr:	8(sp)
-+|// Back chain for sp:	0(sp)	<-- sp while in interpreter
-+|
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	32(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
-+|.define TMPD_BLO,	39(sp)
-+|.define TMPD,		TMPD_HI
-+|.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	32
-+|
- |.else
- |
- |//			508(sp) // \ 32 bit C frame info.
-@@ -192,23 +225,39 @@
- |.define SAVE_MULTRES,	456(sp) // |
- |.define SAVE_CFRAME,	448(sp) // / 64 bit C frame chain.
- |.define SAVE_LR,	416(sp)
-+|.define SAVE_CR,	408(sp)  // 64 bit CR save.
- |.define CFRAME_SPACE,	400     // Delta for sp.
- |// Back chain for sp:	400(sp) <-- sp entering interpreter
- |.define SAVE_FPR_,	256     // .. 256+18*8: 64 bit FPR saves.
- |.define SAVE_GPR_,	112     // .. 112+18*8: 64 bit GPR saves.
- |//			48(sp)  // Callee parameter save area (ABI mandated).
- |.define SAVE_TOC,	40(sp)  // TOC save area.
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	36(sp)  // \ Link editor temp (ABI mandated).
-+|.define TMPD_LO,	32(sp)  // /
-+|.define TONUM_HI,	28(sp)  // \ Compiler temp (ABI mandated).
-+|.define TONUM_LO,	24(sp)  // /
-+|.else
- |.define TMPD_LO,	36(sp)  // \ Link editor temp (ABI mandated).
- |.define TMPD_HI,	32(sp)  // /
- |.define TONUM_LO,	28(sp)  // \ Compiler temp (ABI mandated).
- |.define TONUM_HI,	24(sp)  // /
-+|.endif
- |// Next frame lr:	16(sp)
--|.define SAVE_CR,	8(sp)  // 64 bit CR save.
-+|// Next frame cr:	8(sp)
- |// Back chain for sp:	0(sp)	<-- sp while in interpreter
- |
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	32(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
- |.define TMPD_BLO,	39(sp)
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	112
- |
- |.endif
- |.else
-@@ -226,16 +275,31 @@
- |.define SAVE_PC,	32(sp)
- |.define SAVE_MULTRES,	28(sp)
- |.define UNUSED1,	24(sp)
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	20(sp)
-+|.define TMPD_LO,	16(sp)
-+|.define TONUM_HI,	12(sp)
-+|.define TONUM_LO,	8(sp)
-+|.else
- |.define TMPD_LO,	20(sp)
- |.define TMPD_HI,	16(sp)
- |.define TONUM_LO,	12(sp)
- |.define TONUM_HI,	8(sp)
-+|.endif
- |// Next frame lr:	4(sp)
- |// Back chain for sp:	0(sp)	<-- sp while in interpreter
- |
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	16(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
- |.define TMPD_BLO,	23(sp)
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	16
- |
- |.endif
- |
-@@ -350,8 +414,35 @@
- |//-----------------------------------------------------------------------
- |
- |// Access to frame relative to BASE.
-+|.if ENDIAN_LE
-+|.define FRAME_PC,	-4
-+|.define FRAME_FUNC,	-8
-+|.define FRAME_CONTPC,	-12
-+|.define FRAME_CONTRET,	-16
-+|.define WORD_LO,	0
-+|.define WORD_HI,	4
-+|.define WORD_BLO,	0
-+|.define BASE_LO,	BASE
-+|.define BASE_HI,	BASEP4
-+|.macro lwzux2, hi, lo, base, idx
-+|  lwzux lo, base, idx
-+|  lwz hi, 4(base)
-+|.endmacro
-+|.else
- |.define FRAME_PC,	-8
- |.define FRAME_FUNC,	-4
-+|.define FRAME_CONTPC,	-16
-+|.define FRAME_CONTRET,	-12
-+|.define WORD_LO,	4
-+|.define WORD_HI,	0
-+|.define WORD_BLO,	7
-+|.define BASE_LO,	BASEP4
-+|.define BASE_HI,	BASE
-+|.macro lwzux2, hi, lo, base, idx
-+|  lwzux hi, base, idx
-+|  lwz lo, 4(base)
-+|.endmacro
-+|.endif
- |
- |// Instruction decode.
- |.macro decode_OP4, dst, ins; rlwinm dst, ins, 2, 22, 29; .endmacro
-@@ -412,6 +503,7 @@
- |// Call decode and dispatch.
- |.macro ins_callt
- |  // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC
-+|  addi BASEP4, BASE, 4
- |  lwz PC, LFUNC:RB->pc
- |  lwz INS, 0(PC)
- |   addi PC, PC, 4
-@@ -504,7 +596,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz PC, FRAME_PC(TMP2)		// Fetch PC of previous frame.
-   |  mr BASE, TMP2			// Restore caller base.
-   |  // Prepending may overwrite the pcall frame, so do it at the end.
--  |   stwu TMP1, FRAME_PC(RA)		// Prepend true to results.
-+  |  .if ENDIAN_LE
-+  |    addi RA, RA, -8
-+  |    stw TMP1, WORD_HI(RA)		// Prepend true to results.
-+  |  .else
-+  |    stwu TMP1, -8(RA)		// Prepend true to results.
-+  |  .endif
-   |
-   |->vm_returnc:
-   |  addi RD, RD, 8			// RD = (nresults+1)*8.
-@@ -560,7 +657,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz TMP1, L->maxstack
-   |  cmplw BASE, TMP1
-   |  bge >8
--  |  stw TISNIL, 0(BASE)
-+  |  stw TISNIL, WORD_HI(BASE)
-   |  addi RD, RD, 8
-   |  addi BASE, BASE, 8
-   |  b <2
-@@ -611,7 +708,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vm_unwind_ff_eh:			// Landing pad for external unwinder.
-   |  lwz L, SAVE_L
-   |  .toc ld TOCREG, SAVE_TOC
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |  lp BASE, L->base
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |   lwz DISPATCH, L->glref		// Setup pointer to dispatch table.
-@@ -626,7 +728,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la RA, -8(BASE)			// Results start at BASE-8.
-   |     stw TMP3, TMPD
-   |   addi DISPATCH, DISPATCH, GG_G2DISP
--  |  stw TMP1, 0(RA)			// Prepend false to error message.
-+  |  stw TMP1, WORD_HI(RA)		// Prepend false to error message.
-   |  li RD, 16				// 2 results: false + error message.
-   |    st_vmstate
-   |     lfs TONUM, TMPD
-@@ -687,7 +789,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  mr RA, BASE
-   |   lp BASE, L->base
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |   lp TMP1, L->top
-   |  lwz PC, FRAME_PC(BASE)
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-@@ -737,7 +844,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |3:  // Entry point for vm_cpcall/vm_resume (BASE = base, PC = ftype).
-   |  stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  lp TMP2, L->base			// TMP2 = old base (used in vmeta_call).
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |   lp TMP1, L->top
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |  add PC, PC, BASE
-@@ -757,8 +869,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->vm_call_dispatch:
-   |  // TMP2 = old base, BASE = new base, RC = nargs*8, PC = caller PC
--  |  lwz TMP0, FRAME_PC(BASE)
--  |   lwz LFUNC:RB, FRAME_FUNC(BASE)
-+  |  lwz TMP0, WORD_HI-8(BASE)
-+  |   lwz LFUNC:RB, WORD_LO-8(BASE)
-   |  checkfunc TMP0; bne ->vmeta_call
-   |
-   |->vm_call_dispatch_f:
-@@ -777,7 +889,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |   sub TMP0, TMP0, TMP1		// Compute -savestack(L, L->top).
-   |    lp TMP1, L->cframe
-   |     addi DISPATCH, DISPATCH, GG_G2DISP
--  |  .toc lp CARG4, 0(CARG4)
-+  |  .opd lp TOCREG, TOC_OFS(CARG4)
-+  |  .opdenv lp ENVREG, ENV_OFS(CARG4)
-+  |  .opd lp CARG4, 0(CARG4)
-   |  li TMP2, 0
-   |   stw TMP0, SAVE_NRES		// Neg. delta means cframe w/o frame.
-   |  stw TMP2, SAVE_ERRF		// No error function.
-@@ -785,7 +899,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |    stp sp, L->cframe		// Add our C frame to cframe chain.
-   |     stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  mtctr CARG4
-+  |  .elfv2 mr FUNCREG, CARG4
-   |  bctrl			// (lua_State *L, lua_CFunction func, void *ud)
-+  |  .toc lp TOCREG, SAVE_TOC
-   |.if PPE
-   |  mr BASE, CRET1
-   |  cmpwi CRET1, 0
-@@ -807,20 +923,27 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->cont_dispatch:
-   |  // BASE = meta base, RA = resultptr, RD = (nresults+1)*8
--  |  lwz TMP0, -12(BASE)		// Continuation.
-+  |  lwz TMP0, FRAME_CONTRET(BASE)	// Continuation.
-   |   mr RB, BASE
-   |   mr BASE, TMP2			// Restore caller BASE.
-   |    lwz LFUNC:TMP1, FRAME_FUNC(TMP2)
-   |.if FFI
-   |  cmplwi TMP0, 1
-   |.endif
--  |     lwz PC, -16(RB)			// Restore PC from [cont|PC].
--  |   subi TMP2, RD, 8
-+  |     lwz PC, FRAME_CONTPC(RB)	// Restore PC from [cont|PC].
-+  |  addi BASEP4, BASE, 4
-+  |   addi TMP2, RD, WORD_HI-8
-   |    lwz TMP1, LFUNC:TMP1->pc
-   |   stwx TISNIL, RA, TMP2		// Ensure one valid arg.
-+  |.if P64
-+  |   ld TMP3, 0(DISPATCH)
-+  |.endif
-   |.if FFI
-   |  ble >1
-   |.endif
-+  |.if P64
-+  |  add TMP0, TMP0, TMP3
-+  |.endif
-   |    lwz KBASE, PC2PROTO(k)(TMP1)
-   |  // BASE = base, RA = resultptr, RB = meta base
-   |  mtctr TMP0
-@@ -856,20 +979,20 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TSTR
-   |   decode_RB8 RB, INS
--  |  stw STR:RC, 4(CARG3)
-+  |  stw STR:RC, WORD_LO(CARG3)
-   |   add CARG2, BASE, RB
--  |  stw TMP0, 0(CARG3)
-+  |  stw TMP0, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tgets:
-   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TTAB
--  |  stw TAB:RB, 4(CARG2)
-+  |  stw TAB:RB, WORD_LO(CARG2)
-   |   la CARG3, DISPATCH_GL(tmptv2)(DISPATCH)
--  |  stw TMP0, 0(CARG2)
-+  |  stw TMP0, WORD_HI(CARG2)
-   |   li TMP1, LJ_TSTR
--  |   stw STR:RC, 4(CARG3)
--  |   stw TMP1, 0(CARG3)
-+  |   stw STR:RC, WORD_LO(CARG3)
-+  |   stw TMP1, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tgetb:			// TMP0 = index
-@@ -880,8 +1003,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |   add CARG2, BASE, RB
-   |.if DUALNUM
--  |  stw TISNUM, 0(CARG3)
--  |  stw TMP0, 4(CARG3)
-+  |  stw TISNUM, WORD_HI(CARG3)
-+  |  stw TMP0, WORD_LO(CARG3)
-   |.else
-   |  stfd f0, 0(CARG3)
-   |.endif
-@@ -909,7 +1032,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // BASE = base, L->top = new base, stack = cont/func/t/k
-   |  subfic TMP1, BASE, FRAME_CONT
-   |  lp BASE, L->top
--  |  stw PC, -16(BASE)			// [cont|PC]
-+  |  stw PC, FRAME_CONTPC(BASE)		// [cont|PC]
-   |   add PC, TMP1, BASE
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
-   |   li NARGS8:RC, 16			// 2 args for func(t, k).
-@@ -923,7 +1046,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f14, 0(CRET1)
-   |  b ->BC_TGETR_Z
-   |1:
--  |  stwx TISNIL, BASE, RA
-+  |  stwx TISNIL, BASE_HI, RA
-   |  b ->cont_nop
-   |
-   |//-----------------------------------------------------------------------
-@@ -932,20 +1055,20 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TSTR
-   |   decode_RB8 RB, INS
--  |  stw STR:RC, 4(CARG3)
-+  |  stw STR:RC, WORD_LO(CARG3)
-   |   add CARG2, BASE, RB
--  |  stw TMP0, 0(CARG3)
-+  |  stw TMP0, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tsets:
-   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TTAB
--  |  stw TAB:RB, 4(CARG2)
-+  |  stw TAB:RB, WORD_LO(CARG2)
-   |   la CARG3, DISPATCH_GL(tmptv2)(DISPATCH)
--  |  stw TMP0, 0(CARG2)
-+  |  stw TMP0, WORD_HI(CARG2)
-   |   li TMP1, LJ_TSTR
--  |   stw STR:RC, 4(CARG3)
--  |   stw TMP1, 0(CARG3)
-+  |   stw STR:RC, WORD_LO(CARG3)
-+  |   stw TMP1, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tsetb:			// TMP0 = index
-@@ -956,8 +1079,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |   add CARG2, BASE, RB
-   |.if DUALNUM
--  |  stw TISNUM, 0(CARG3)
--  |  stw TMP0, 4(CARG3)
-+  |  stw TISNUM, WORD_HI(CARG3)
-+  |  stw TMP0, WORD_LO(CARG3)
-   |.else
-   |  stfd f0, 0(CARG3)
-   |.endif
-@@ -986,7 +1109,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // BASE = base, L->top = new base, stack = cont/func/t/k/(v)
-   |  subfic TMP1, BASE, FRAME_CONT
-   |  lp BASE, L->top
--  |  stw PC, -16(BASE)			// [cont|PC]
-+  |  stw PC, FRAME_CONTPC(BASE)		// [cont|PC]
-   |   add PC, TMP1, BASE
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
-   |   li NARGS8:RC, 24			// 3 args for func(t, k, v)
-@@ -1006,17 +1129,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vmeta_comp:
-   |  mr CARG1, L
-   |   subi PC, PC, 4
--  |.if DUALNUM
--  |  mr CARG2, RA
--  |.else
-   |  add CARG2, BASE, RA
--  |.endif
-   |   stw PC, SAVE_PC
--  |.if DUALNUM
--  |  mr CARG3, RD
--  |.else
-   |  add CARG3, BASE, RD
--  |.endif
-   |   stp BASE, L->base
-   |  decode_OP1 CARG4, INS
-   |  bl extern lj_meta_comp  // (lua_State *L, TValue *o1, *o2, int op)
-@@ -1043,7 +1158,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  b ->cont_nop
-   |
-   |->cont_condt:			// RA = resultptr
--  |  lwz TMP0, 0(RA)
-+  |  lwz TMP0, WORD_HI(RA)
-   |  .gpr64 extsw TMP0, TMP0
-   |  subfic TMP0, TMP0, LJ_TTRUE	// Branch if result is true.
-   |  subfe CRET1, CRET1, CRET1
-@@ -1051,7 +1166,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  b <4
-   |
-   |->cont_condf:			// RA = resultptr
--  |  lwz TMP0, 0(RA)
-+  |  lwz TMP0, WORD_HI(RA)
-   |  .gpr64 extsw TMP0, TMP0
-   |  subfic TMP0, TMP0, LJ_TTRUE	// Branch if result is false.
-   |  subfe CRET1, CRET1, CRET1
-@@ -1103,8 +1218,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |.endif
-   |
-   |->vmeta_unm:
--  |  mr CARG3, RD
--  |  mr CARG4, RD
-+  |  add CARG3, BASE, RD
-+  |  add CARG4, BASE, RD
-   |  b >1
-   |
-   |->vmeta_arith_vn:
-@@ -1139,7 +1254,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vmeta_binop:
-   |  // BASE = old base, CRET1 = new base, stack = cont/func/o1/o2
-   |  sub TMP1, CRET1, BASE
--  |   stw PC, -16(CRET1)		// [cont|PC]
-+  |   stw PC, FRAME_CONTPC(CRET1)	// [cont|PC]
-   |   mr TMP2, BASE
-   |  addi PC, TMP1, FRAME_CONT
-   |   mr BASE, CRET1
-@@ -1150,7 +1265,7 @@ static void build_subroutines(BuildCtx *ctx)
- #if LJ_52
-   |  mr SAVE0, CARG1
- #endif
--  |  mr CARG2, RD
-+  |  add CARG2, BASE, RD
-   |   stp BASE, L->base
-   |  mr CARG1, L
-   |   stw PC, SAVE_PC
-@@ -1227,25 +1342,25 @@ static void build_subroutines(BuildCtx *ctx)
-   |.macro .ffunc_1, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz CARG1, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz CARG1, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |.endmacro
-   |
-   |.macro .ffunc_2, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
--  |    lwz CARG4, 8(BASE)
--  |   lwz CARG1, 4(BASE)
--  |    lwz CARG2, 12(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz CARG4, WORD_HI+8(BASE)
-+  |   lwz CARG1, WORD_LO(BASE)
-+  |    lwz CARG2, WORD_LO+8(BASE)
-   |  blt ->fff_fallback
-   |.endmacro
-   |
-   |.macro .ffunc_n, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1254,9 +1369,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |.macro .ffunc_nn, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, WORD_HI+8(BASE)
-   |    lfd FARG2, 8(BASE)
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1279,9 +1394,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmplw cr1, CARG3, TMP1
-   |    lwz PC, FRAME_PC(BASE)
-   |  bge cr1, ->fff_fallback
--  |   stw CARG3, 0(RA)
-+  |   stw CARG3, WORD_HI(RA)
-   |  addi RD, NARGS8:RC, 8		// Compute (nresults+1)*8.
--  |   stw CARG1, 4(RA)
-+  |   stw CARG1, WORD_LO(RA)
-   |  beq ->fff_res			// Done if exactly 1 argument.
-   |  li TMP1, 8
-   |  subi RC, RC, 8
-@@ -1295,17 +1410,36 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc type
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-   |  blt ->fff_fallback
-   |  .gpr64 extsw CARG1, CARG1
-+  |.if P64
-+  |  li TMP0, LJ_TNUMX
-+  |    srawi TMP3, CARG1, 15
-+  |  subfc TMP1, TMP0, CARG1
-+  |.else
-   |  subfc TMP0, TISNUM, CARG1
--  |  subfe TMP2, CARG1, CARG1
-+  |.endif
-+  |    subfe TMP2, CARG1, CARG1
-+  |.if P64
-+  |  cmpwi TMP3, -2
-+  |    orc TMP1, TMP2, TMP1
-+  |    subf TMP1, TMP0, TMP1
-+  |  beq >1
-+  |.else
-   |  orc TMP1, TMP2, TMP0
--  |  addi TMP1, TMP1, ~LJ_TISNUM+1
-+  |  subf TMP1, TISNUM, TMP1
-+  |.endif
-   |  slwi TMP1, TMP1, 3
-+  |2:
-   |   la TMP2, CFUNC:RB->upvalue
-   |  lfdx FARG1, TMP2, TMP1
-   |  b ->fff_resn
-+  |.if P64
-+  |1:
-+  |  li TMP1, ~LJ_TLIGHTUD<<3
-+  |  b <2
-+  |.endif
-   |
-   |//-- Base library: getters and setters ---------------------------------
-   |
-@@ -1328,10 +1462,10 @@ static void build_subroutines(BuildCtx *ctx)
-   |  sub TMP1, TMP0, TMP1
-   |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-   |3:  // Rearranged logic, because we expect _not_ to find the key.
--  |  lwz CARG4, NODE:TMP2->key
--  |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--  |    lwz CARG2, NODE:TMP2->val
--  |     lwz TMP1, 4+offsetof(Node, val)(NODE:TMP2)
-+  |  lwz CARG4, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+  |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+  |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-+  |     lwz TMP1, WORD_LO+offsetof(Node, val)(NODE:TMP2)
-   |  checkstr CARG4; bne >4
-   |   cmpw TMP0, STR:RC; beq >5
-   |4:
-@@ -1349,14 +1483,33 @@ static void build_subroutines(BuildCtx *ctx)
-   |6:
-   |  cmpwi CARG3, LJ_TUDATA; beq <1
-   |  .gpr64 extsw CARG3, CARG3
-+  |.if P64
-+  |  li TMP0, LJ_TNUMX
-+  |    srawi TMP3, CARG3, 15
-+  |  subfc TMP1, TMP0, CARG3
-+  |.else
-   |  subfc TMP0, TISNUM, CARG3
-+  |.endif
-   |  subfe TMP2, CARG3, CARG3
-+  |.if P64
-+  |  cmpwi TMP3, -2
-+  |    orc TMP1, TMP2, TMP1
-+  |    subf TMP1, TMP0, TMP1
-+  |  beq >7
-+  |.else
-   |  orc TMP1, TMP2, TMP0
--  |  addi TMP1, TMP1, ~LJ_TISNUM+1
-+  |  subf TMP1, TISNUM, TMP1
-+  |.endif
-   |  slwi TMP1, TMP1, 2
-+  |8:
-   |   la TMP2, DISPATCH_GL(gcroot[GCROOT_BASEMT])(DISPATCH)
-   |  lwzx TAB:CARG1, TMP2, TMP1
-   |  b <2
-+  |.if P64
-+  |7:
-+  |  li TMP1, ~LJ_TLIGHTUD<<2
-+  |  b <8
-+  |.endif
-   |
-   |.ffunc_2 setmetatable
-   |  // Fast path: no mt for table yet and not clearing the mt.
-@@ -1374,8 +1527,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc rawget
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG4, 0(BASE)
--  |    lwz TAB:CARG2, 4(BASE)
-+  |   lwz CARG4, WORD_HI(BASE)
-+  |    lwz TAB:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |  checktab CARG4; bne ->fff_fallback
-   |   la CARG3, 8(BASE)
-@@ -1390,7 +1543,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc tonumber
-   |  // Only handles the number case inline (without a base argument).
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Exactly one argument.
-   |   checknum CARG1; bgt ->fff_fallback
-@@ -1425,10 +1578,15 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc next
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
--  |    lwz TAB:CARG2, 4(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-+  |    lwz TAB:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-+  |.if ENDIAN_LE
-+  |   add TMP1, BASE, NARGS8:RC
-+  |   stw TISNIL, WORD_HI(TMP1)		// Set missing 2nd arg to nil.
-+  |.else
-   |   stwx TISNIL, BASE, NARGS8:RC	// Set missing 2nd arg to nil.
-+  |.endif
-   |  checktab CARG1
-   |   lwz PC, FRAME_PC(BASE)
-   |  bne ->fff_fallback
-@@ -1464,18 +1622,18 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f0, CFUNC:RB->upvalue[0]
-   |  la RA, -8(BASE)
- #endif
--  |   stw TISNIL, 8(BASE)
-+  |   stw TISNIL, 8+WORD_HI(BASE)
-   |  li RD, (3+1)*8
-   |  stfd f0, 0(RA)
-   |  b ->fff_res
-   |
-   |.ffunc ipairs_aux
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
--  |    lwz TAB:CARG1, 4(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz TAB:CARG1, WORD_LO(BASE)
-+  |   lwz CARG4, 8+WORD_HI(BASE)
-   |.if DUALNUM
--  |    lwz TMP2, 12(BASE)
-+  |    lwz TMP2, 8+WORD_LO(BASE)
-   |.else
-   |    lfd FARG2, 8(BASE)
-   |.endif
-@@ -1504,16 +1662,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |   la RA, -8(BASE)
-   |  cmplw TMP0, TMP2
-   |.if DUALNUM
--  |  stw TISNUM, 0(RA)
-+  |  stw TISNUM, WORD_HI(RA)
-   |   slwi TMP3, TMP2, 3
--  |  stw TMP2, 4(RA)
-+  |  stw TMP2, WORD_LO(RA)
-   |.else
-   |   slwi TMP3, TMP2, 3
-   |  stfd FARG2, 0(RA)
-   |.endif
-   |  ble >2				// Not in array part?
--  |  lwzx TMP2, TMP1, TMP3
--  |  lfdx f0, TMP1, TMP3
-+  |  lfdux f0, TMP1, TMP3
-+  |  lwz TMP2, WORD_HI(TMP1)
-   |1:
-   |  checknil TMP2
-   |   li RD, (0+1)*8
-@@ -1532,7 +1690,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmplwi CRET1, 0
-   |   li RD, (0+1)*8
-   |  beq ->fff_res
--  |  lwz TMP2, 0(CRET1)
-+  |  lwz TMP2, WORD_HI(CRET1)
-   |  lfd f0, 0(CRET1)
-   |  b <1
-   |
-@@ -1551,11 +1709,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la RA, -8(BASE)
- #endif
-   |.if DUALNUM
--  |  stw TISNUM, 8(BASE)
-+  |  stw TISNUM, 8+WORD_HI(BASE)
-   |.else
--  |  stw ZERO, 8(BASE)
-+  |  stw ZERO, 8+WORD_HI(BASE)
-   |.endif
--  |   stw ZERO, 12(BASE)
-+  |   stw ZERO, 8+WORD_LO(BASE)
-   |  li RD, (3+1)*8
-   |  stfd f0, 0(RA)
-   |  b ->fff_res
-@@ -1576,7 +1734,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc xpcall
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, 8+WORD_HI(BASE)
-   |    lfd FARG2, 8(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  blt ->fff_fallback
-@@ -1673,7 +1831,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if resume
-   |  li TMP1, LJ_TTRUE
-   |   la RA, -8(BASE)
--  |  stw TMP1, -8(BASE)			// Prepend true to results.
-+  |  stw TMP1, WORD_HI-8(BASE)		// Prepend true to results.
-   |  addi RD, RD, 16
-   |.else
-   |  mr RA, BASE
-@@ -1693,7 +1851,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f0, 0(TMP3)
-   |   stp TMP3, L:SAVE0->top		// Remove error from coroutine stack.
-   |    li RD, (2+1)*8
--  |   stw TMP1, -8(BASE)		// Prepend false to results.
-+  |   stw TMP1, WORD_HI-8(BASE)		// Prepend false to results.
-   |    la RA, -8(BASE)
-   |  stfd f0, 0(BASE)			// Copy error message.
-   |  b <7
-@@ -1746,8 +1904,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |->fff_resi:
-   |  lwz PC, FRAME_PC(BASE)
-   |  la RA, -8(BASE)
--  |  stw TISNUM, -8(BASE)
--  |  stw CRET1, -4(BASE)
-+  |  stw TISNUM, WORD_HI-8(BASE)
-+  |  stw CRET1, WORD_LO-8(BASE)
-   |  b ->fff_res1
-   |1:
-   |  lus CARG3, 0x41e0	// 2^31.
-@@ -1762,9 +1920,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |->fff_restv:
-   |  // CARG3/CARG1 = TValue result.
-   |  lwz PC, FRAME_PC(BASE)
--  |   stw CARG3, -8(BASE)
-+  |   stw CARG3, WORD_HI-8(BASE)
-   |  la RA, -8(BASE)
--  |   stw CARG1, -4(BASE)
-+  |   stw CARG1, WORD_LO-8(BASE)
-   |->fff_res1:
-   |  // RA = results, PC = return.
-   |  li RD, (1+1)*8
-@@ -1782,10 +1940,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  ins_next1
-   |  // Adjust BASE. KBASE is assumed to be set for the calling frame.
-   |   sub BASE, RA, TMP0
-+  |   addi BASEP4, BASE, 4
-   |  ins_next2
-   |
-   |6:  // Fill up results with nil.
--  |  subi TMP1, RD, 8
-+  |  addi TMP1, RD, WORD_HI-8
-   |   addi RD, RD, 8
-   |  stwx TISNIL, RA, TMP1
-   |  b <5
-@@ -1898,7 +2057,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc math_log
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Need exactly 1 argument.
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1923,13 +2082,13 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if DUALNUM
-   |.ffunc math_ldexp
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, WORD_HI+8(BASE)
-   |.if GPR64
--  |    lwz CARG2, 12(BASE)
-+  |    lwz CARG2, WORD_LO+8(BASE)
-   |.else
--  |    lwz CARG1, 12(BASE)
-+  |    lwz CARG1, WORD_LO+8(BASE)
-   |.endif
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1961,8 +2120,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stfd FARG1, 0(RA)
-   |  li RD, (2+1)*8
-   |.if DUALNUM
--  |   stw TISNUM, 8(RA)
--  |   stw TMP1, 12(RA)
-+  |   stw TISNUM, WORD_HI+8(RA)
-+  |   stw TMP1, WORD_LO+8(RA)
-   |.else
-   |   stfd FARG2, 8(RA)
-   |.endif
-@@ -1989,9 +2148,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |   add TMP2, BASE, NARGS8:RC
-   |  bne >4
-   |1:  // Handle integers.
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
--  |  lwz CARG2, 4(TMP1)
-+  |  lwz CARG2, WORD_LO(TMP1)
-   |   bge cr1, ->fff_resi
-   |  checknum CARG4
-   |   xoris TMP0, CARG1, 0x8000
-@@ -2020,7 +2179,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |   lfd FARG1, 0(BASE)
-   |  bge ->fff_fallback
-   |5:  // Handle numbers.
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
-   |  lfd FARG2, 0(TMP1)
-   |   bge cr1, ->fff_resn
-@@ -2035,7 +2194,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.endif
-   |  b <5
-   |7:  // Convert integer to number and continue above.
--  |   lwz CARG2, 4(TMP1)
-+  |   lwz CARG2, WORD_LO(TMP1)
-   |  bne ->fff_fallback
-   |  tonum_i FARG2, CARG2
-   |  b <6
-@@ -2043,7 +2202,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  .ffunc_n name
-   |  li TMP1, 8
-   |1:
-+  |.if ENDIAN_LE
-+  |   add CARG2, BASE, TMP1
-+  |   lwz CARG2, WORD_HI(CARG2)
-+  |.else
-   |   lwzx CARG2, BASE, TMP1
-+  |.endif
-   |   lfdx FARG2, BASE, TMP1
-   |  cmplw cr1, TMP1, NARGS8:RC
-   |   checknum CARG2
-@@ -2067,8 +2231,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc string_byte			// Only handle the 1-arg case here.
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz STR:CARG1, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz STR:CARG1, WORD_LO(BASE)
-   |  bne ->fff_fallback			// Need exactly 1 argument.
-   |   checkstr CARG3
-   |   bne ->fff_fallback
-@@ -2099,12 +2263,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc string_char			// Only handle the 1-arg case here.
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |.if DUALNUM
--  |    lwz TMP0, 4(BASE)
-+  |    lwz TMP0, WORD_LO(BASE)
-   |  bne ->fff_fallback			// Exactly 1 argument.
-   |  checknum CARG3; bne ->fff_fallback
--  |   la CARG2, 7(BASE)
-+  |   la CARG2, WORD_BLO(BASE)
-   |.else
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Exactly 1 argument.
-@@ -2128,16 +2292,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc string_sub
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 16(BASE)
-+  |   lwz CARG3, WORD_HI+16(BASE)
-   |.if not DUALNUM
-   |    lfd f0, 16(BASE)
-   |.endif
--  |   lwz TMP0, 0(BASE)
--  |    lwz STR:CARG1, 4(BASE)
-+  |   lwz TMP0, WORD_HI(BASE)
-+  |    lwz STR:CARG1, WORD_LO(BASE)
-   |  blt ->fff_fallback
--  |   lwz CARG2, 8(BASE)
-+  |   lwz CARG2, WORD_HI+8(BASE)
-   |.if DUALNUM
--  |    lwz TMP1, 12(BASE)
-+  |    lwz TMP1, WORD_LO+8(BASE)
-   |.else
-   |    lfd f1, 8(BASE)
-   |.endif
-@@ -2145,7 +2309,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  beq >1
-   |.if DUALNUM
-   |  checknum CARG3
--  |   lwz TMP2, 20(BASE)
-+  |   lwz TMP2, WORD_LO+16(BASE)
-   |  bne ->fff_fallback
-   |1:
-   |  checknum CARG2; bne ->fff_fallback
-@@ -2201,8 +2365,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  .ffunc string_ .. name
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz STR:CARG2, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz STR:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |  checkstr CARG3
-   |   la SBUF:CARG1, DISPATCH_GL(tmpbuf)(DISPATCH)
-@@ -2240,10 +2404,10 @@ static void build_subroutines(BuildCtx *ctx)
-   |  addi TMP1, BASE, 8
-   |  add TMP2, BASE, NARGS8:RC
-   |1:
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
-   |.if DUALNUM
--  |  lwz CARG2, 4(TMP1)
-+  |  lwz CARG2, WORD_LO(TMP1)
-   |.else
-   |  lfd FARG1, 0(TMP1)
-   |.endif
-@@ -2344,20 +2508,23 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->fff_fallback:			// Call fast function fallback handler.
-   |  // BASE = new base, RB = CFUNC, RC = nargs*8
--  |  lp TMP3, CFUNC:RB->f
-+  |  lp FUNCREG, CFUNC:RB->f
-   |    add TMP1, BASE, NARGS8:RC
-   |   lwz PC, FRAME_PC(BASE)		// Fallback may overwrite PC.
-   |    addi TMP0, TMP1, 8*LUA_MINSTACK
-   |     lwz TMP2, L->maxstack
-   |   stw PC, SAVE_PC			// Redundant (but a defined value).
--  |  .toc lp TMP3, 0(TMP3)
-+  |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+  |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-+  |  .opd lp FUNCREG, 0(FUNCREG)
-   |  cmplw TMP0, TMP2
-   |     stp BASE, L->base
-   |    stp TMP1, L->top
-   |   mr CARG1, L
-   |  bgt >5				// Need to grow stack.
--  |  mtctr TMP3
-+  |  mtctr FUNCREG
-   |  bctrl				// (lua_State *L)
-+  |  .toc lp TOCREG, SAVE_TOC
-   |  // Either throws an error, or recovers and returns -1, 0 or nresults+1.
-   |  lp BASE, L->base
-   |  cmpwi CRET1, 0
-@@ -2459,6 +2626,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |3:
-   |  lp BASE, L->base
-   |4:  // Re-dispatch to static ins.
-+  |  addi BASEP4, BASE, 4
-   |  lwz INS, -4(PC)
-   |  decode_OPP TMP1, INS
-   |   decode_RB8 RB, INS
-@@ -2472,7 +2640,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->cont_hook:				// Continue from hook yield.
-   |  addi PC, PC, 4
--  |  lwz MULTRES, -20(RB)		// Restore MULTRES for *M ins.
-+  |  lwz MULTRES, WORD_LO-24(RB)		// Restore MULTRES for *M ins.
-   |  b <4
-   |
-   |->vm_hotloop:			// Hot loop counter underflow.
-@@ -2514,6 +2682,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lp BASE, L->base
-   |   lp TMP0, L->top
-   |   stw ZERO, SAVE_PC			// Invalidate for subsequent line hook.
-+  |  addi BASEP4, BASE, 4
-   |  sub NARGS8:RC, TMP0, BASE
-   |  add RA, BASE, RA
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)
-@@ -2525,7 +2694,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if JIT
-   |  // RA = resultptr, RB = meta base
-   |  lwz INS, -4(PC)
--  |    lwz TRACE:TMP2, -20(RB)		// Save previous trace.
-+  |    lwz TRACE:TMP2, WORD_LO-24(RB)	// Save previous trace.
-   |   addic. TMP1, MULTRES, -8
-   |  decode_RA8 RC, INS			// Call base.
-   |   beq >2
-@@ -2560,10 +2729,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |  mr CARG2, PC
-   |  bl extern lj_dispatch_stitch	// (jit_State *J, const BCIns *pc)
-   |  lp BASE, L->base
-+  |  addi BASEP4, BASE, 4
-   |  b ->cont_nop
-   |
-   |9:
-+  |.if ENDIAN_LE
-+  |  addi BASEP4, BASE, 4
-+  |  stwx TISNIL, BASEP4, RC
-+  |.else
-   |  stwx TISNIL, BASE, RC
-+  |.endif
-   |  addi RC, RC, 8
-   |  b <3
-   |.endif
-@@ -2578,6 +2753,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction.
-   |  lp BASE, L->base
-   |  subi PC, PC, 4
-+  |  addi BASEP4, BASE, 4
-   |  b ->cont_nop
- #endif
-   |
-@@ -2586,39 +2762,72 @@ static void build_subroutines(BuildCtx *ctx)
-   |//-----------------------------------------------------------------------
-   |
-   |.macro savex_, a, b, c, d
--  |  stfd f..a, 16+a*8(sp)
--  |  stfd f..b, 16+b*8(sp)
--  |  stfd f..c, 16+c*8(sp)
--  |  stfd f..d, 16+d*8(sp)
-+  |  stfd f..a, EXIT_OFFSET+a*8(sp)
-+  |  stfd f..b, EXIT_OFFSET+b*8(sp)
-+  |  stfd f..c, EXIT_OFFSET+c*8(sp)
-+  |  stfd f..d, EXIT_OFFSET+d*8(sp)
-+  |.endmacro
-+  |
-+  |.macro saver, a
-+  |  stp r..a, EXIT_OFFSET+32*8+a*PSIZE(sp)
-   |.endmacro
-   |
-   |->vm_exit_handler:
-   |.if JIT
--  |  addi sp, sp, -(16+32*8+32*4)
--  |  stmw r2, 16+32*8+2*4(sp)
-+  |  addi sp, TMP0, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-+  |  saver 3 // CARG1
-+  |  saver 4 // CARG2
-+  |  saver 5 // CARG3
-+  |  saver 17 // DISPATCH
-   |    addi DISPATCH, JGL, -GG_DISP2G-32768
-   |    li CARG2, ~LJ_VMST_EXIT
--  |   lwz CARG1, 16+32*8+32*4(sp)	// Get stack chain.
-+  |   lp CARG1, EXIT_OFFSET+32*8+32*PSIZE(sp)	// Get stack chain.
-   |    stw CARG2, DISPATCH_GL(vmstate)(DISPATCH)
-+  |  saver 2
-+  |  saver 6
-+  |  saver 7
-+  |  saver 8
-+  |  saver 9
-+  |  saver 10
-+  |  saver 11
-+  |  saver 12
-+  |  saver 13
-   |  savex_ 0,1,2,3
--  |   stw CARG1, 0(sp)			// Store extended stack chain.
--  |   clrso TMP1
-+  |   stp CARG1, 0(sp)			// Store extended stack chain.
-+
-   |  savex_ 4,5,6,7
--  |   addi CARG2, sp, 16+32*8+32*4	// Recompute original value of sp.
-+  |  saver 14
-+  |  saver 15
-+  |  saver 16
-+  |  saver 18
-+  |   addi CARG2, sp, EXIT_OFFSET+32*8+32*PSIZE	// Recompute original value of sp.
-   |  savex_ 8,9,10,11
--  |   stw CARG2, 16+32*8+1*4(sp)	// Store sp in RID_SP.
-+  |   stp CARG2, EXIT_OFFSET+32*8+1*PSIZE(sp)	// Store sp in RID_SP.
-   |  savex_ 12,13,14,15
-   |   mflr CARG3
-   |   li TMP1, 0
-   |  savex_ 16,17,18,19
--  |   stw TMP1, 16+32*8+0*4(sp)		// Clear RID_TMP.
-+  |   stw TMP1, EXIT_OFFSET+32*8+0*PSIZE(sp)		// Clear RID_TMP.
-   |  savex_ 20,21,22,23
-   |   lhz CARG4, 2(CARG3)		// Load trace number.
-   |  savex_ 24,25,26,27
-   |  lwz L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  savex_ 28,29,30,31
-+  |  saver 19
-+  |  saver 20
-+  |  saver 21
-+  |  saver 22
-+  |  saver 23
-+  |  saver 24
-+  |  saver 25
-+  |  saver 26
-+  |  saver 27
-+  |  saver 28
-+  |  saver 29
-+  |  saver 30
-+  |  saver 31
-   |   sub CARG3, TMP0, CARG3		// Compute exit number.
--  |  lp BASE, DISPATCH_GL(jit_base)(DISPATCH)
-+  |  lwz BASE, DISPATCH_GL(jit_base)(DISPATCH)
-   |   srwi CARG3, CARG3, 2
-   |  stp L, DISPATCH_J(L)(DISPATCH)
-   |   subi CARG3, CARG3, 2
-@@ -2627,11 +2836,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stw TMP1, DISPATCH_GL(jit_base)(DISPATCH)
-   |  addi CARG1, DISPATCH, GG_DISP2J
-   |   stw CARG3, DISPATCH_J(exitno)(DISPATCH)
--  |  addi CARG2, sp, 16
-+  |  addi CARG2, sp, EXIT_OFFSET
-   |  bl extern lj_trace_exit		// (jit_State *J, ExitState *ex)
-   |  // Returns MULTRES (unscaled) or negated error code.
-   |  lp TMP1, L->cframe
--  |  lwz TMP2, 0(sp)
-+  |  lp TMP2, 0(sp)
-   |   lp BASE, L->base
-   |.if GPR64
-   |  rldicr sp, TMP1, 0, 61
-@@ -2639,7 +2848,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  rlwinm sp, TMP1, 0, 0, 29
-   |.endif
-   |   lwz PC, SAVE_PC			// Get SAVE_PC.
--  |  stw TMP2, 0(sp)
-+  |  stp TMP2, 0(sp)
-   |  stw L, SAVE_L			// Set SAVE_L (on-trace resume/yield).
-   |  b >1
-   |.endif
-@@ -2660,7 +2869,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |    stw TMP2, DISPATCH_GL(jit_base)(DISPATCH)
-   |  lwz KBASE, PC2PROTO(k)(TMP1)
-   |  // Setup type comparison constants.
-+  |.if P64
-+  |  lus TISNUM, LJ_TISNUM >> 16
-+  |  ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |  li TISNUM, LJ_TISNUM
-+  |.endif
-   |  lus TMP3, 0x59c0			// TOBIT = 2^52 + 2^51 (float).
-   |  stw TMP3, TMPD
-   |  li ZERO, 0
-@@ -2680,14 +2894,14 @@ static void build_subroutines(BuildCtx *ctx)
-   |   decode_RA8 RA, INS
-   |  lpx TMP0, DISPATCH, TMP1
-   |  mtctr TMP0
--  |  cmplwi TMP1, BC_FUNCF*4		// Function header?
-+  |  cmplwi TMP1, BC_FUNCF*PSIZE	// Function header?
-   |  bge >2
-   |   decode_RB8 RB, INS
-   |   decode_RD8 RD, INS
-   |   decode_RC8 RC, INS
-   |  bctr
-   |2:
--  |  cmplwi TMP1, (BC_FUNCC+2)*4	// Fast function?
-+  |  cmplwi TMP1, (BC_FUNCC+2)*PSIZE	// Fast function?
-   |  blt >3
-   |  // Check frame below fast function.
-   |  lwz TMP1, FRAME_PC(BASE)
-@@ -2697,7 +2911,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz TMP2, -4(TMP1)
-   |  decode_RA8 TMP0, TMP2
-   |  sub TMP1, BASE, TMP0
--  |  lwz LFUNC:TMP2, -12(TMP1)
-+  |  lwz LFUNC:TMP2, WORD_LO-16(TMP1)
-   |  lwz TMP1, LFUNC:TMP2->pc
-   |  lwz KBASE, PC2PROTO(k)(TMP1)
-   |3:
-@@ -2718,6 +2932,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |// NYI: Use internal implementations of floor, ceil, trunc.
-   |
-   |->vm_modi:
-+  |  li TMP1, 0
-+  |  mtxer TMP1
-   |  divwo. TMP0, CARG1, CARG2
-   |  bso >1
-   |.if GPR64
-@@ -2736,7 +2952,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmpwi CARG2, 0
-   |   li CARG1, 0
-   |  beqlr
--  |  clrso TMP0			// Clear SO for -2147483648 % -1 and return 0.
-+  |  // Clear SO for -2147483648 % -1 and return 0.
-+  |  crxor 4*cr0+so, 4*cr0+so, 4*cr0+so
-   |  blr
-   |
-   |//-----------------------------------------------------------------------
-@@ -2749,10 +2966,18 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vm_cachesync:
-   |.if JIT or FFI
-   |  // Compute start of first cache line and number of cache lines.
-+  |  .if GPR64
-+  |  rldicr CARG1, CARG1, 0, 58
-+  |  .else
-   |  rlwinm CARG1, CARG1, 0, 0, 26
-+  |  .endif
-   |  sub CARG2, CARG2, CARG1
-   |  addi CARG2, CARG2, 31
-+  |  .if GPR64
-+  |  srdi. CARG2, CARG2, 5
-+  |  .else
-   |  rlwinm. CARG2, CARG2, 27, 5, 31
-+  |  .endif
-   |  beqlr
-   |  mtctr CARG2
-   |  mr CARG3, CARG1
-@@ -2774,39 +2999,70 @@ static void build_subroutines(BuildCtx *ctx)
-   |//-- FFI helper functions -----------------------------------------------
-   |//-----------------------------------------------------------------------
-   |
--  |// Handler for callback functions. Callback slot number in r11, g in r12.
-+  |// Handler for callback functions.
-+  |// 32-bit: Callback slot number in r12, g in r11.
-+  |// 64-bit v1: Callback slot number in bits 47+ of r11, g in 0-46, TOC in r2.
-+  |// 64-bit v2: Callback slot number in bits 2-11 of r12, g in r11,
-+  |// vm_ffi_callback in r2.
-   |->vm_ffi_callback:
-   |.if FFI
-   |.type CTSTATE, CTState, PC
-+  |  .if OPD
-+  |   rldicl r12, r11, 17, 47
-+  |   rldicl r11, r11, 0, 17
-+  |  .endif
-+  |  .if ELFV2
-+  |   rlwinm r12, r12, 30, 22, 31
-+  |   addisl TOCREG, TOCREG, extern .TOC.-lj_vm_ffi_callback@ha
-+  |   addil TOCREG, TOCREG, extern .TOC.-lj_vm_ffi_callback@l
-+  |  .endif
-   |  saveregs
--  |  lwz CTSTATE, GL:r12->ctype_state
--  |   addi DISPATCH, r12, GG_G2DISP
--  |  stw r11, CTSTATE->cb.slot
--  |  stw r3, CTSTATE->cb.gpr[0]
-+  |  lwz CTSTATE, GL:r11->ctype_state
-+  |   addi DISPATCH, r11, GG_G2DISP
-+  |  stw r12, CTSTATE->cb.slot
-+  |  stp r3, CTSTATE->cb.gpr[0]
-   |   stfd f1, CTSTATE->cb.fpr[0]
--  |  stw r4, CTSTATE->cb.gpr[1]
-+  |  stp r4, CTSTATE->cb.gpr[1]
-   |   stfd f2, CTSTATE->cb.fpr[1]
--  |  stw r5, CTSTATE->cb.gpr[2]
-+  |  stp r5, CTSTATE->cb.gpr[2]
-   |   stfd f3, CTSTATE->cb.fpr[2]
--  |  stw r6, CTSTATE->cb.gpr[3]
-+  |  stp r6, CTSTATE->cb.gpr[3]
-   |   stfd f4, CTSTATE->cb.fpr[3]
--  |  stw r7, CTSTATE->cb.gpr[4]
-+  |  stp r7, CTSTATE->cb.gpr[4]
-   |   stfd f5, CTSTATE->cb.fpr[4]
--  |  stw r8, CTSTATE->cb.gpr[5]
-+  |  stp r8, CTSTATE->cb.gpr[5]
-   |   stfd f6, CTSTATE->cb.fpr[5]
--  |  stw r9, CTSTATE->cb.gpr[6]
-+  |  stp r9, CTSTATE->cb.gpr[6]
-   |   stfd f7, CTSTATE->cb.fpr[6]
--  |  stw r10, CTSTATE->cb.gpr[7]
-+  |  stp r10, CTSTATE->cb.gpr[7]
-   |   stfd f8, CTSTATE->cb.fpr[7]
-+  |  .if GPR64
-+  |   stfd f9, CTSTATE->cb.fpr[8]
-+  |   stfd f10, CTSTATE->cb.fpr[9]
-+  |   stfd f11, CTSTATE->cb.fpr[10]
-+  |   stfd f12, CTSTATE->cb.fpr[11]
-+  |   stfd f13, CTSTATE->cb.fpr[12]
-+  |  .endif
-+  |  .if ELFV2
-+  |  addi TMP0, sp, CFRAME_SPACE+96
-+  |  .elif GPR64
-+  |  addi TMP0, sp, CFRAME_SPACE+112
-+  |  .else
-   |  addi TMP0, sp, CFRAME_SPACE+8
--  |  stw TMP0, CTSTATE->cb.stack
-+  |  .endif
-+  |  stp TMP0, CTSTATE->cb.stack
-   |   mr CARG1, CTSTATE
-   |  stw CTSTATE, SAVE_PC		// Any value outside of bytecode is ok.
-   |   mr CARG2, sp
-   |  bl extern lj_ccallback_enter	// (CTState *cts, void *cf)
-   |  // Returns lua_State *.
-   |  lp BASE, L:CRET1->base
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |  lp RC, L:CRET1->top
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |     li ZERO, 0
-@@ -2835,9 +3091,21 @@ static void build_subroutines(BuildCtx *ctx)
-   |  mr CARG1, CTSTATE
-   |  mr CARG2, RA
-   |  bl extern lj_ccallback_leave	// (CTState *cts, TValue *o)
--  |  lwz CRET1, CTSTATE->cb.gpr[0]
-+  |  lp CRET1, CTSTATE->cb.gpr[0]
-   |  lfd FARG1, CTSTATE->cb.fpr[0]
--  |  lwz CRET2, CTSTATE->cb.gpr[1]
-+  |  lp CRET2, CTSTATE->cb.gpr[1]
-+  |  .if GPR64
-+  |    lfd FARG2, CTSTATE->cb.fpr[1]
-+  |  .else
-+  |    lp CARG3, CTSTATE->cb.gpr[2]
-+  |    lp CARG4, CTSTATE->cb.gpr[3]
-+  |  .endif
-+  |  .elfv2 lfd f3, CTSTATE->cb.fpr[2]
-+  |  .elfv2 lfd f4, CTSTATE->cb.fpr[3]
-+  |  .elfv2 lfd f5, CTSTATE->cb.fpr[4]
-+  |  .elfv2 lfd f6, CTSTATE->cb.fpr[5]
-+  |  .elfv2 lfd f7, CTSTATE->cb.fpr[6]
-+  |  .elfv2 lfd f8, CTSTATE->cb.fpr[7]
-   |  b ->vm_leave_unw
-   |.endif
-   |
-@@ -2850,23 +3118,46 @@ static void build_subroutines(BuildCtx *ctx)
-   |   lbz CARG2, CCSTATE->nsp
-   |   lbz CARG3, CCSTATE->nfpr
-   |  neg TMP1, TMP1
-+  |  .if GPR64
-+  |    std TMP0, 16(sp)
-+  |  .else
-   |    stw TMP0, 4(sp)
-+  |  .endif
-   |   cmpwi cr1, CARG3, 0
-   |  mr TMP2, sp
-   |   addic. CARG2, CARG2, -1
-+  |  .if GPR64
-+  |  stdux sp, sp, TMP1
-+  |  .else
-   |  stwux sp, sp, TMP1
-+  |  .endif
-   |   crnot 4*cr1+eq, 4*cr1+eq		// For vararg calls.
--  |  stw r14, -4(TMP2)
--  |  stw CCSTATE, -8(TMP2)
-+  |  .if GPR64
-+  |    std r14, -8(TMP2)
-+  |    std CCSTATE, -16(TMP2)
-+  |  .else
-+  |    stw r14, -4(TMP2)
-+  |    stw CCSTATE, -8(TMP2)
-+  |  .endif
-   |  mr r14, TMP2
-   |  la TMP1, CCSTATE->stack
-+  |  .if GPR64
-+  |   sldi CARG2, CARG2, 3
-+  |  .else
-   |   slwi CARG2, CARG2, 2
-+  |  .endif
-   |   blty >2
--  |  la TMP2, 8(sp)
-+  |  .if ELFV2
-+  |    la TMP2, 96(sp)
-+  |  .elif GPR64
-+  |    la TMP2, 112(sp)
-+  |  .else
-+  |    la TMP2, 8(sp)
-+  |  .endif
-   |1:
--  |  lwzx TMP0, TMP1, CARG2
--  |  stwx TMP0, TMP2, CARG2
--  |   addic. CARG2, CARG2, -4
-+  |  lpx TMP0, TMP1, CARG2
-+  |  stpx TMP0, TMP2, CARG2
-+  |   addic. CARG2, CARG2, -PSIZE
-   |  bge <1
-   |2:
-   |  bney cr1, >3
-@@ -2878,28 +3169,55 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f6, CCSTATE->fpr[5]
-   |  lfd f7, CCSTATE->fpr[6]
-   |  lfd f8, CCSTATE->fpr[7]
-+  |  .if GPR64
-+  |  lfd f9, CCSTATE->fpr[8]
-+  |  lfd f10, CCSTATE->fpr[9]
-+  |  lfd f11, CCSTATE->fpr[10]
-+  |  lfd f12, CCSTATE->fpr[11]
-+  |  lfd f13, CCSTATE->fpr[12]
-+  |  .endif
-   |3:
--  |   lp TMP0, CCSTATE->func
--  |  lwz CARG2, CCSTATE->gpr[1]
--  |  lwz CARG3, CCSTATE->gpr[2]
--  |  lwz CARG4, CCSTATE->gpr[3]
--  |  lwz CARG5, CCSTATE->gpr[4]
--  |   mtctr TMP0
--  |  lwz r8, CCSTATE->gpr[5]
--  |  lwz r9, CCSTATE->gpr[6]
--  |  lwz r10, CCSTATE->gpr[7]
--  |  lwz CARG1, CCSTATE->gpr[0]		// Do this last, since CCSTATE is CARG1.
-+  |  .toc std TOCREG, SAVE_TOC
-+  |   lp FUNCREG, CCSTATE->func
-+  |  lp CARG2, CCSTATE->gpr[1]
-+  |  lp CARG3, CCSTATE->gpr[2]
-+  |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+  |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-+  |  .opd lp FUNCREG, 0(FUNCREG)
-+  |  lp CARG4, CCSTATE->gpr[3]
-+  |  lp CARG5, CCSTATE->gpr[4]
-+  |   mtctr FUNCREG
-+  |  lp r8, CCSTATE->gpr[5]
-+  |  lp r9, CCSTATE->gpr[6]
-+  |  lp r10, CCSTATE->gpr[7]
-+  |  lp CARG1, CCSTATE->gpr[0]		// Do this last, since CCSTATE is CARG1.
-   |   bctrl
--  |  lwz CCSTATE:TMP1, -8(r14)
--  |  lwz TMP2, -4(r14)
-+  |   .toc lp TOCREG, SAVE_TOC
-+  |  .if GPR64
-+  |   ld CCSTATE:TMP1, -16(r14)
-+  |   ld TMP2, -8(r14)
-+  |   ld TMP0, 16(r14)
-+  |  .else
-+  |   lwz CCSTATE:TMP1, -8(r14)
-+  |   lwz TMP2, -4(r14)
-   |   lwz TMP0, 4(r14)
--  |  stw CARG1, CCSTATE:TMP1->gpr[0]
-+  |  .endif
-+  |  stp CARG1, CCSTATE:TMP1->gpr[0]
-   |  stfd FARG1, CCSTATE:TMP1->fpr[0]
--  |  stw CARG2, CCSTATE:TMP1->gpr[1]
-+  |  stp CARG2, CCSTATE:TMP1->gpr[1]
-+  |  .if GPR64
-+  |   stfd FARG2, CCSTATE:TMP1->fpr[1]
-+  |  .endif
-+  |  .elfv2 stfd FARG3, CCSTATE:TMP1->fpr[2]
-+  |  .elfv2 stfd FARG4, CCSTATE:TMP1->fpr[3]
-+  |  .elfv2 stfd FARG5, CCSTATE:TMP1->fpr[4]
-+  |  .elfv2 stfd FARG6, CCSTATE:TMP1->fpr[5]
-+  |  .elfv2 stfd FARG7, CCSTATE:TMP1->fpr[6]
-+  |  .elfv2 stfd FARG8, CCSTATE:TMP1->fpr[7]
-   |   mtlr TMP0
--  |  stw CARG3, CCSTATE:TMP1->gpr[2]
-+  |  stp CARG3, CCSTATE:TMP1->gpr[2]
-   |   mr sp, r14
--  |  stw CARG4, CCSTATE:TMP1->gpr[3]
-+  |  stp CARG4, CCSTATE:TMP1->gpr[3]
-   |   mr r14, TMP2
-   |  blr
-   |.endif
-@@ -2923,13 +3241,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
-     |  // RA = src1*8, RD = src2*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, BASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  lwzx TMP1, BASE_HI, RD
-     |    lwz TMP2, -4(PC)
-     |  checknum cr0, TMP0
--    |   lwz CARG3, 4(RD)
-+    |   lwzx CARG3, BASE_LO, RD
-     |    decode_RD4 TMP2, TMP2
-     |  checknum cr1, TMP1
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-@@ -2953,7 +3271,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |7:  // RA is not an integer.
-     |  bgt cr0, ->vmeta_comp
-     |  // RA is a number.
--    |   lfd f0, 0(RA)
-+    |   lfdx f0, BASE, RA
-     |  bgt cr1, ->vmeta_comp
-     |  blt cr1, >4
-     |  // RA is a number, RD is an integer.
-@@ -2965,7 +3283,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // RA is an integer, RD is a number.
-     |  tonum_i f0, CARG2
-     |4:
--    |  lfd f1, 0(RD)
-+    |  lfdx f1, BASE, RD
-     |5:
-     |  fcmpu cr0, f0, f1
-     if (op == BC_ISLT) {
-@@ -2981,10 +3299,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |  b <1
-     |.else
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
-     |   lfdx f0, BASE, RA
--    |  lwzx TMP1, BASE, RD
-+    |  lwzx TMP1, BASE_HI, RD
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |   lfdx f1, BASE, RD
-@@ -3015,15 +3333,23 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = op == BC_ISEQV;
-     |  // RA = src1*8, RD = src2*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, BASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  .if ENDIAN_LE
-+    |    lwzx TMP1, BASE_HI, RD
-+    |  .else
-+    |    lwzux TMP1, RD, BASE_HI
-+    |  .endif
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |  checknum cr1, TMP1
-     |    decode_RD4 TMP2, TMP2
--    |   lwz CARG3, 4(RD)
-+    |  .if ENDIAN_LE
-+    |   lwzux CARG3, RD, BASE_LO
-+    |  .else
-+    |   lwz CARG3, WORD_LO(RD)
-+    |  .endif
-     |  cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     if (vk) {
-@@ -3032,14 +3358,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |  ble cr7, ->BC_ISNEN_Z
-     }
-     |.else
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |   lwz TMP2, 0(PC)
--    |    lfd f0, 0(RA)
-+    |    lfdx f0, BASE, RA
-     |   addi PC, PC, 4
--    |  lwzux TMP1, RD, BASE
-+    |  lwzx TMP1, BASE_HI, RD
-     |  checknum cr0, TMP0
-     |   decode_RD4 TMP2, TMP2
--    |    lfd f1, 0(RD)
-+    |    lfdx f1, BASE, RD
-     |  checknum cr1, TMP1
-     |   addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     |  bge cr0, >5
-@@ -3057,8 +3383,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.endif
-     |5:  // Either or both types are not numbers.
-     |.if not DUALNUM
--    |    lwz CARG2, 4(RA)
--    |    lwz CARG3, 4(RD)
-+    |    lwzx CARG2, BASE_LO, RA
-+    |    lwzx CARG3, BASE_LO, RD
-     |.endif
-     |.if FFI
-     |  cmpwi cr7, TMP0, LJ_TCDATA
-@@ -3074,10 +3400,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.if FFI
-     |  beq cr7, ->vmeta_equal_cd
-     |.endif
-+    |.if P64
-+    |   cmplwi cr7, TMP3, ~LJ_TUDATA		// Avoid 64 bit lightuserdata.
-+    |.endif
-     |    cmplw cr5, CARG2, CARG3
-     |  crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt	// 2: Same type and primitive.
-     |  crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq	// 1: Same tv or different type.
-     |  crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq	// 0: Same type and same tv.
-+    |.if P64
-+    |   cror 4*cr6+lt, 4*cr6+lt, 4*cr7+gt
-+    |.endif
-     |   mr SAVE0, PC
-     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt	// 0 or 2.
-     |  cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt	// 1 or 2.
-@@ -3116,9 +3448,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISEQS: case BC_ISNES:
-     vk = op == BC_ISEQS;
-     |  // RA = src*8, RD = str_const*8 (~), JMP with RD = target
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |   srwi RD, RD, 1
--    |  lwz STR:TMP3, 4(RA)
-+    |  lwzx STR:TMP3, BASE_LO, RA
-     |    lwz TMP2, 0(PC)
-     |   subfic RD, RD, -4
-     |    addi PC, PC, 4
-@@ -3150,15 +3482,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = op == BC_ISEQN;
-     |  // RA = src*8, RD = num_const*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, KBASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  lwzux2 TMP1, CARG3, RD, KBASE
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |  checknum cr1, TMP1
-     |    decode_RD4 TMP2, TMP2
--    |   lwz CARG3, 4(RD)
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     if (vk) {
-       |->BC_ISEQN_Z:
-@@ -3175,7 +3506,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     } else {
-       |->BC_ISNEN_Z:  // Dummy label.
-     }
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
-     |   lfdx f0, BASE, RA
-     |    lwz TMP2, -4(PC)
-@@ -3213,7 +3544,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |7:  // RA is not an integer.
-     |  bge cr0, <3
-     |  // RA is a number.
--    |   lfd f0, 0(RA)
-+    |   lfdx f0, BASE, RA
-     |  blt cr1, >1
-     |  // RA is a number, RD is an integer.
-     |  tonum_i f1, CARG3
-@@ -3232,7 +3563,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISEQP: case BC_ISNEP:
-     vk = op == BC_ISEQP;
-     |  // RA = src*8, RD = primitive_type*8 (~), JMP with RD = target
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |   srwi TMP1, RD, 3
-     |    lwz TMP2, 0(PC)
-     |   not TMP1, TMP1
-@@ -3262,7 +3593,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF:
-     |  // RA = dst*8 or unused, RD = src*8, JMP with RD = target
--    |  lwzx TMP0, BASE, RD
-+    |  lwzx TMP0, BASE_HI, RD
-     |   lwz INS, 0(PC)
-     |   addi PC, PC, 4
-     if (op == BC_IST || op == BC_ISF) {
-@@ -3297,7 +3628,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_ISTYPE:
-     |  // RA = src*8, RD = -type*8
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |  srwi TMP1, RD, 3
-     |  ins_next1
-     |.if not PPE and not GPR64
-@@ -3311,7 +3642,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_ISNUM:
-     |  // RA = src*8, RD = -(TISNUM-1)*8
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |  ins_next1
-     |  checknum TMP0
-     |  bge ->vmeta_istype
-@@ -3330,17 +3661,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_NOT:
-     |  // RA = dst*8, RD = src*8
-     |  ins_next1
--    |  lwzx TMP0, BASE, RD
-+    |  lwzx TMP0, BASE_HI, RD
-     |  .gpr64 extsw TMP0, TMP0
-     |  subfic TMP1, TMP0, LJ_TTRUE
-     |  adde TMP0, TMP0, TMP1
--    |  stwx TMP0, BASE, RA
-+    |  stwx TMP0, BASE_HI, RA
-     |  ins_next2
-     break;
-   case BC_UNM:
-     |  // RA = dst*8, RD = src*8
--    |  lwzux TMP1, RD, BASE
--    |   lwz TMP0, 4(RD)
-+    |  lwzx TMP1, BASE_HI, RD
-+    |   lwzx TMP0, BASE_LO, RD
-+    |.if DUALNUM and not GPR64
-+    |  mtxer ZERO
-+    |.endif
-     |  checknum TMP1
-     |.if DUALNUM
-     |  bne >5
-@@ -3352,18 +3686,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.else
-     |  nego. TMP0, TMP0
-     |  bso >4
--    |1:
-     |.endif
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |   stw TMP0, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |   stwx TMP0, BASE_LO, RA
-     |3:
-     |  ins_next2
-     |4:
--    |.if not GPR64
--    |  // Potential overflow.
--    |  checkov TMP1, <1			// Ignore unrelated overflow.
--    |.endif
-     |  lus TMP1, 0x41e0			// 2^31.
-     |  li TMP0, 0
-     |  b >7
-@@ -3373,8 +3702,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  xoris TMP1, TMP1, 0x8000
-     |7:
-     |  ins_next1
--    |  stwux TMP1, RA, BASE
--    |   stw TMP0, 4(RA)
-+    |  stwx TMP1, BASE_HI, RA
-+    |   stwx TMP0, BASE_LO, RA
-     |.if DUALNUM
-     |  b <3
-     |.else
-@@ -3383,15 +3712,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_LEN:
-     |  // RA = dst*8, RD = src*8
--    |  lwzux TMP0, RD, BASE
--    |   lwz CARG1, 4(RD)
-+    |  lwzx TMP0, BASE_HI, RD
-+    |   lwzx CARG1, BASE_LO, RD
-     |  checkstr TMP0; bne >2
-     |  lwz CRET1, STR:CARG1->len
-     |1:
-     |.if DUALNUM
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |   stw CRET1, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |   stwx CRET1, BASE_LO, RA
-     |.else
-     |  tonum_u f0, CRET1		// Result is a non-negative integer.
-     |  ins_next1
-@@ -3426,9 +3755,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
-     ||switch (vk) {
-     ||case 0:
--    |   lwzx TMP1, BASE, RB
-+    |   .if ENDIAN_LE and DUALNUM
-+    |     addi TMP2, RC, 4
-+    |   .endif
-+    |   lwzx TMP1, BASE_HI, RB
-     |   .if DUALNUM
--    |     lwzx TMP2, KBASE, RC
-+    |     .if ENDIAN_LE
-+    |       lwzx TMP2, KBASE, TMP2
-+    |     .else
-+    |       lwzx TMP2, KBASE, RC
-+    |     .endif
-     |   .endif
-     |    lfdx f14, BASE, RB
-     |    lfdx f15, KBASE, RC
-@@ -3442,9 +3778,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   .endif
-     ||  break;
-     ||case 1:
--    |   lwzx TMP1, BASE, RB
-+    |   .if ENDIAN_LE and DUALNUM
-+    |     addi TMP2, RC, 4
-+    |   .endif
-+    |   lwzx TMP1, BASE_HI, RB
-     |   .if DUALNUM
--    |     lwzx TMP2, KBASE, RC
-+    |     .if ENDIAN_LE
-+    |       lwzx TMP2, KBASE, TMP2
-+    |     .else
-+    |       lwzx TMP2, KBASE, RC
-+    |     .endif
-     |   .endif
-     |    lfdx f15, BASE, RB
-     |    lfdx f14, KBASE, RC
-@@ -3458,8 +3801,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   .endif
-     ||  break;
-     ||default:
--    |   lwzx TMP1, BASE, RB
--    |   lwzx TMP2, BASE, RC
-+    |   lwzx TMP1, BASE_HI, RB
-+    |   lwzx TMP2, BASE_HI, RC
-     |    lfdx f14, BASE, RB
-     |    lfdx f15, BASE, RC
-     |   checknum cr0, TMP1
-@@ -3514,41 +3857,62 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
-     ||switch (vk) {
-     ||case 0:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, KBASE
--    |    lwz CARG1, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG2, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzux CARG2, RC, KBASE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |      lwz TMP2, 4(RC)
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG1, RB, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, KBASE
-+    |      lwz CARG1, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG2, 4(RC)
-+    |   .endif
-     ||  break;
-     ||case 1:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, KBASE
--    |    lwz CARG2, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG1, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzux CARG1, RC, KBASE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |      lwz TMP2, 4(RC)
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG2, RB, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, KBASE
-+    |      lwz CARG2, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG1, 4(RC)
-+    |   .endif
-     ||  break;
-     ||default:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, BASE
--    |    lwz CARG1, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG2, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |     lwzx TMP2, RC, BASE_HI
-+    |      lwzux CARG1, RB, BASE
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG2, RC, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, BASE
-+    |      lwz CARG1, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG2, 4(RC)
-+    |   .endif
-     ||  break;
-     ||}
-+    |  mtxer ZERO
-     |  checknum cr1, TMP2
-     |  bne >5
-     |  bne cr1, >5
-     |  intins CARG1, CARG1, CARG2
--    |  bso >4
--    |1:
-+    |  ins_arithfallback bso
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |  stw CARG1, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |  stwx CARG1, BASE_LO, RA
-     |2:
-     |  ins_next2
--    |4:  // Overflow.
--    |  checkov TMP0, <1			// Ignore unrelated overflow.
--    |  ins_arithfallback b
-     |5:  // FP variant.
-     ||if (vk == 1) {
-     |  lfd f15, 0(RB)
-@@ -3620,9 +3984,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_POW:
-     |  // NYI: (partial) integer arithmetic.
--    |  lwzx TMP1, BASE, RB
-+    |  lwzx TMP1, BASE_HI, RB
-     |   lfdx FARG1, BASE, RB
--    |  lwzx TMP2, BASE, RC
-+    |  lwzx TMP2, BASE_HI, RC
-     |   lfdx FARG2, BASE, RC
-     |  checknum cr0, TMP1
-     |  checknum cr1, TMP2
-@@ -3648,6 +4012,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Returns NULL (finished) or TValue * (metamethod).
-     |  cmplwi CRET1, 0
-     |   lp BASE, L->base
-+    |   addi BASEP4, BASE, 4
-     |  bne ->vmeta_binop
-     |  ins_next1
-     |  lfdx f0, BASE, SAVE0		// Copy result from RB to RA.
-@@ -3664,8 +4029,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  ins_next1
-     |  lwzx TMP0, KBASE, TMP1		// KBASE-4-str_const*4
-     |  li TMP2, LJ_TSTR
--    |  stwux TMP2, RA, BASE
--    |  stw TMP0, 4(RA)
-+    |  stwx TMP2, BASE_HI, RA
-+    |  stwx TMP0, BASE_LO, RA
-     |  ins_next2
-     break;
-   case BC_KCDATA:
-@@ -3676,8 +4041,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  ins_next1
-     |  lwzx TMP0, KBASE, TMP1		// KBASE-4-cdata_const*4
-     |  li TMP2, LJ_TCDATA
--    |  stwux TMP2, RA, BASE
--    |  stw TMP0, 4(RA)
-+    |  stwx TMP2, BASE_HI, RA
-+    |  stwx TMP0, BASE_LO, RA
-     |  ins_next2
-     |.endif
-     break;
-@@ -3687,14 +4052,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  slwi RD, RD, 13
-     |  srawi RD, RD, 16
-     |  ins_next1
--    |   stwux TISNUM, RA, BASE
--    |   stw RD, 4(RA)
-+    |   stwx TISNUM, BASE_HI, RA
-+    |   stwx RD, BASE_LO, RA
-     |  ins_next2
-     |.else
-     |  // The soft-float approach is faster.
-     |  slwi RD, RD, 13
-     |  srawi TMP1, RD, 31
-     |  xor TMP2, TMP1, RD
-+    |  .gpr64 extsw RD, RD
-     |  sub TMP2, TMP2, TMP1		// TMP2 = abs(x)
-     |  cntlzw TMP3, TMP2
-     |  subfic TMP1, TMP3, 0x40d		// TMP1 = exponent-1
-@@ -3706,8 +4072,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   add RD, RD, TMP1		// hi = hi + exponent-1
-     |    and RD, RD, TMP0		// hi = x == 0 ? 0 : hi
-     |  ins_next1
--    |    stwux RD, RA, BASE
--    |    stw ZERO, 4(RA)
-+    |    stwx RD, BASE_HI, RA
-+    |    stwx ZERO, BASE_LO, RA
-     |  ins_next2
-     |.endif
-     break;
-@@ -3723,15 +4089,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  srwi TMP1, RD, 3
-     |  not TMP0, TMP1
-     |  ins_next1
--    |  stwx TMP0, BASE, RA
-+    |  stwx TMP0, BASE_HI, RA
-     |  ins_next2
-     break;
-   case BC_KNIL:
-     |  // RA = base*8, RD = end*8
--    |  stwx TISNIL, BASE, RA
-+    |  stwx TISNIL, BASE_HI, RA
-     |   addi RA, RA, 8
-     |1:
--    |  stwx TISNIL, BASE, RA
-+    |  stwx TISNIL, BASE_HI, RA
-     |  cmpw RA, RD
-     |   addi RA, RA, 8
-     |  blt <1
-@@ -3763,10 +4129,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   lwz CARG2, UPVAL:RB->v
-     |  andix. TMP3, TMP3, LJ_GC_BLACK	// isblack(uv)
-     |    lbz TMP0, UPVAL:RB->closed
--    |   lwz TMP2, 0(RD)
-+    |   lwz TMP2, WORD_HI(RD)
-     |   stfd f0, 0(CARG2)
-     |    cmplwi cr1, TMP0, 0
--    |   lwz TMP1, 4(RD)
-+    |   lwz TMP1, WORD_LO(RD)
-     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
-     |   subi TMP2, TMP2, (LJ_TNUMX+1)
-     |  bne >2				// Upvalue is closed and black?
-@@ -3799,8 +4165,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   lbz TMP3, STR:TMP1->marked
-     |   lbz TMP2, UPVAL:RB->closed
-     |   li TMP0, LJ_TSTR
--    |   stw STR:TMP1, 4(CARG2)
--    |   stw TMP0, 0(CARG2)
-+    |   stw STR:TMP1, WORD_LO(CARG2)
-+    |   stw TMP0, WORD_HI(CARG2)
-     |  bne >2
-     |1:
-     |  ins_next
-@@ -3837,7 +4203,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  lwzx UPVAL:RB, LFUNC:RB, RA
-     |  ins_next1
-     |  lwz TMP1, UPVAL:RB->v
--    |  stw TMP0, 0(TMP1)
-+    |  stw TMP0, WORD_HI(TMP1)
-     |  ins_next2
-     break;
- 
-@@ -3852,6 +4218,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   add CARG2, BASE, RA
-     |  bl extern lj_func_closeuv	// (lua_State *L, TValue *level)
-     |  lp BASE, L->base
-+    |  addi BASEP4, BASE, 4
-     |1:
-     |  ins_next
-     break;
-@@ -3870,8 +4237,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Returns GCfuncL *.
-     |  lp BASE, L->base
-     |   li TMP0, LJ_TFUNC
--    |  stwux TMP0, RA, BASE
--    |  stw LFUNC:CRET1, 4(RA)
-+    |  addi BASEP4, BASE, 4
-+    |  stwx TMP0, BASE_HI, RA
-+    |  stwx LFUNC:CRET1, BASE_LO, RA
-     |  ins_next
-     break;
- 
-@@ -3904,8 +4272,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |  lp BASE, L->base
-     |   li TMP0, LJ_TTAB
--    |  stwux TMP0, RA, BASE
--    |  stw TAB:CRET1, 4(RA)
-+    |  addi BASEP4, BASE, 4
-+    |  stwx TMP0, BASE_HI, RA
-+    |  stwx TAB:CRET1, BASE_LO, RA
-     |  ins_next
-     if (op == BC_TNEW) {
-       |3:
-@@ -3938,13 +4307,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_TGETV:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  lwzux CARG1, RB, BASE
--    |  lwzux CARG2, RC, BASE
--    |   lwz TAB:RB, 4(RB)
-+    |  lwzx CARG1, BASE_HI, RB
-+    |  lwzx CARG2, BASE_HI, RC
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |.if DUALNUM
--    |   lwz RC, 4(RC)
-+    |   lwzx RC, BASE_LO, RC
-     |.else
--    |   lfd f0, 0(RC)
-+    |   lfdx f0, BASE, RC
-     |.endif
-     |  checktab CARG1
-     |   checknum cr1, CARG2
-@@ -3971,8 +4340,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   slwi TMP2, TMP2, 3
-     |.endif
-     |  ble ->vmeta_tgetv		// Integer key and in array part?
--    |  lwzx TMP0, TMP1, TMP2
--    |   lfdx f14, TMP1, TMP2
-+    |  .if ENDIAN_LE
-+    |    lfdux f14, TMP1, TMP2
-+    |    lwz TMP0, WORD_HI(TMP1)
-+    |  .else
-+    |    lwzx TMP0, TMP1, TMP2
-+    |    lfdx f14, TMP1, TMP2
-+    |  .endif
-     |  checknil TMP0; beq >2
-     |1:
-     |  ins_next1
-@@ -3991,15 +4365,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:
-     |  checkstr CARG2; bne ->vmeta_tgetv
-     |.if not DUALNUM
--    |  lwz STR:RC, 4(RC)
-+    |  lwzx STR:RC, BASE_LO, RC
-     |.endif
-     |  b ->BC_TGETS_Z			// String key?
-     break;
-   case BC_TGETS:
-     |  // RA = dst*8, RB = table*8, RC = str_const*8 (~)
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP1, RC, 1
--    |    lwz TAB:RB, 4(RB)
-+    |    lwzx TAB:RB, BASE_LO, RB
-     |   subfic TMP1, TMP1, -4
-     |  checktab CARG1
-     |   lwzx STR:RC, KBASE, TMP1	// KBASE-4-str_const*4
-@@ -4015,16 +4389,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  sub TMP1, TMP0, TMP1
-     |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-     |1:
--    |  lwz CARG1, NODE:TMP2->key
--    |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--    |    lwz CARG2, NODE:TMP2->val
--    |     lwz TMP1, 4+offsetof(Node, val)(NODE:TMP2)
-+    |  lwz CARG1, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+    |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+    |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-+    |     lwz TMP1, WORD_LO+offsetof(Node, val)(NODE:TMP2)
-     |  checkstr CARG1; bne >4
-     |   cmpw TMP0, STR:RC; bne >4
-     |    checknil CARG2; beq >5		// Key found, but nil value?
-     |3:
--    |    stwux CARG2, RA, BASE
--    |     stw TMP1, 4(RA)
-+    |    stwx CARG2, BASE_HI, RA
-+    |     stwx TMP1, BASE_LO, RA
-     |  ins_next
-     |
-     |4:  // Follow hash chain.
-@@ -4045,15 +4419,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TGETB:
-     |  // RA = dst*8, RB = table*8, RC = index*8
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP0, RC, 3
--    |   lwz TAB:RB, 4(RB)
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |  checktab CARG1; bne ->vmeta_tgetb
-     |  lwz TMP1, TAB:RB->asize
-     |   lwz TMP2, TAB:RB->array
-     |  cmplw TMP0, TMP1; bge ->vmeta_tgetb
--    |  lwzx TMP1, TMP2, RC
--    |   lfdx f0, TMP2, RC
-+    |  .if ENDIAN_LE
-+    |    lfdux f0, TMP2, RC
-+    |    lwz TMP1, WORD_HI(TMP2)
-+    |  .else
-+    |    lwzx TMP1, TMP2, RC
-+    |    lfdx f0, TMP2, RC
-+    |  .endif
-     |  checknil TMP1; beq >5
-     |1:
-     |  ins_next1
-@@ -4071,12 +4450,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TGETR:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  add RB, BASE, RB
--    |  lwz TAB:CARG1, 4(RB)
-+    |  lwzx TAB:CARG1, BASE_LO, RB
-     |.if DUALNUM
--    |  add RC, BASE, RC
-     |  lwz TMP0, TAB:CARG1->asize
--    |  lwz CARG2, 4(RC)
-+    |  lwzx CARG2, BASE_LO, RC
-     |   lwz TMP1, TAB:CARG1->array
-     |.else
-     |  lfdx f0, BASE, RC
-@@ -4096,13 +4473,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_TSETV:
-     |  // RA = src*8, RB = table*8, RC = key*8
--    |  lwzux CARG1, RB, BASE
--    |  lwzux CARG2, RC, BASE
--    |   lwz TAB:RB, 4(RB)
-+    |  lwzx CARG1, BASE_HI, RB
-+    |  lwzx CARG2, BASE_HI, RC
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |.if DUALNUM
--    |   lwz RC, 4(RC)
-+    |   lwzx RC, BASE_LO, RC
-     |.else
--    |   lfd f0, 0(RC)
-+    |   lfdx f0, BASE, RC
-     |.endif
-     |  checktab CARG1
-     |   checknum cr1, CARG2
-@@ -4129,7 +4506,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   slwi TMP0, TMP2, 3
-     |.endif
-     |  ble ->vmeta_tsetv		// Integer key and in array part?
-+    |  .if ENDIAN_LE
-+    |   addi TMP2, TMP1, 4
-+    |   lwzx TMP2, TMP2, TMP0
-+    |  .else
-     |   lwzx TMP2, TMP1, TMP0
-+    |  .endif
-     |  lbz TMP3, TAB:RB->marked
-     |    lfdx f14, BASE, RA
-     |   checknil TMP2; beq >3
-@@ -4152,7 +4534,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:
-     |  checkstr CARG2; bne ->vmeta_tsetv
-     |.if not DUALNUM
--    |  lwz STR:RC, 4(RC)
-+    |  lwzx STR:RC, BASE_LO, RC
-     |.endif
-     |  b ->BC_TSETS_Z			// String key?
-     |
-@@ -4162,9 +4544,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETS:
-     |  // RA = src*8, RB = table*8, RC = str_const*8 (~)
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP1, RC, 1
--    |    lwz TAB:RB, 4(RB)
-+    |    lwzx TAB:RB, BASE_LO, RB
-     |   subfic TMP1, TMP1, -4
-     |  checktab CARG1
-     |   lwzx STR:RC, KBASE, TMP1	// KBASE-4-str_const*4
-@@ -4183,9 +4565,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    lbz TMP3, TAB:RB->marked
-     |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-     |1:
--    |  lwz CARG1, NODE:TMP2->key
--    |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--    |    lwz CARG2, NODE:TMP2->val
-+    |  lwz CARG1, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+    |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+    |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-     |     lwz NODE:TMP1, NODE:TMP2->next
-     |  checkstr CARG1; bne >5
-     |   cmpw TMP0, STR:RC; bne >5
-@@ -4225,13 +4607,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  beq ->vmeta_tsets		// 'no __newindex' flag NOT set: check.
-     |6:
-     |  li TMP0, LJ_TSTR
--    |   stw STR:RC, 4(CARG3)
-+    |   stw STR:RC, WORD_LO(CARG3)
-     |   mr CARG2, TAB:RB
--    |  stw TMP0, 0(CARG3)
-+    |  stw TMP0, WORD_HI(CARG3)
-     |  bl extern lj_tab_newkey		// (lua_State *L, GCtab *t, TValue *k)
-     |  // Returns TValue *.
-     |  lp BASE, L->base
-     |  stfd f14, 0(CRET1)
-+    |   addi BASEP4, BASE, 4
-     |  b <3				// No 2nd write barrier needed.
-     |
-     |7:  // Possible table write barrier for the value. Skip valiswhite check.
-@@ -4240,9 +4623,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETB:
-     |  // RA = src*8, RB = table*8, RC = index*8
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP0, RC, 3
--    |   lwz TAB:RB, 4(RB)
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |  checktab CARG1; bne ->vmeta_tsetb
-     |  lwz TMP1, TAB:RB->asize
-     |   lwz TMP2, TAB:RB->array
-@@ -4250,7 +4633,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  cmplw TMP0, TMP1
-     |   lfdx f14, BASE, RA
-     |  bge ->vmeta_tsetb
--    |  lwzx TMP1, TMP2, RC
-+    |  .if ENDIAN_LE
-+    |    addi TMP1, TMP2, 4
-+    |    lwzx TMP1, TMP1, RC
-+    |  .else
-+    |    lwzx TMP1, TMP2, RC
-+    |  .endif
-     |  checknil TMP1; beq >5
-     |1:
-     |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
-@@ -4274,13 +4662,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETR:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  add RB, BASE, RB
--    |  lwz TAB:CARG2, 4(RB)
-+    |  lwzx TAB:CARG2, BASE_LO, RB
-     |.if DUALNUM
--    |  add RC, BASE, RC
-     |    lbz TMP3, TAB:CARG2->marked
-     |  lwz TMP0, TAB:CARG2->asize
--    |  lwz CARG3, 4(RC)
-+    |  lwzx CARG3, BASE_LO, RC
-     |   lwz TMP1, TAB:CARG2->array
-     |.else
-     |  lfdx f0, BASE, RC
-@@ -4311,9 +4697,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  add RA, BASE, RA
-     |1:
-     |   add TMP3, KBASE, RD
--    |  lwz TAB:CARG2, -4(RA)		// Guaranteed to be a table.
-+    |  lwz TAB:CARG2, WORD_LO-8(RA)	// Guaranteed to be a table.
-     |    addic. TMP0, MULTRES, -8
--    |   lwz TMP3, 4(TMP3)		// Integer constant is in lo-word.
-+    |   lwz TMP3, WORD_LO(TMP3)		// Integer constant is in lo-word.
-     |    srwi CARG3, TMP0, 3
-     |    beq >4				// Nothing to copy?
-     |  add CARG3, CARG3, TMP3
-@@ -4362,8 +4748,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_CALL:
-     |  // RA = base*8, (RB = (nresults+1)*8,) RC = (nargs+1)*8
-     |  mr TMP2, BASE
--    |  lwzux TMP0, BASE, RA
--    |   lwz LFUNC:RB, 4(BASE)
-+    |  lwzux2 TMP0, LFUNC:RB, BASE, RA
-     |    subi NARGS8:RC, NARGS8:RC, 8
-     |   addi BASE, BASE, 8
-     |  checkfunc TMP0; bne ->vmeta_call
-@@ -4377,8 +4762,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_CALLT:
-     |  // RA = base*8, (RB = 0,) RC = (nargs+1)*8
--    |  lwzux TMP0, RA, BASE
--    |   lwz LFUNC:RB, 4(RA)
-+    |  lwzux2 TMP0, LFUNC:RB, RA, BASE
-     |    subi NARGS8:RC, NARGS8:RC, 8
-     |    lwz TMP1, FRAME_PC(BASE)
-     |  checkfunc TMP0
-@@ -4430,12 +4814,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // RA = base*8, (RB = (nresults+1)*8, RC = (nargs+1)*8 ((2+1)*8))
-     |  mr TMP2, BASE
-     |  add BASE, BASE, RA
--    |  lwz TMP1, -24(BASE)
--    |   lwz LFUNC:RB, -20(BASE)
-+    |  lwz TMP1, WORD_HI-24(BASE)
-+    |   lwz LFUNC:RB, WORD_LO-24(BASE)
-     |    lfd f1, -8(BASE)
-     |    lfd f0, -16(BASE)
--    |  stw TMP1, 0(BASE)		// Copy callable.
--    |   stw LFUNC:RB, 4(BASE)
-+    |  stw TMP1, WORD_HI(BASE)		// Copy callable.
-+    |   stw LFUNC:RB, WORD_LO(BASE)
-     |  checkfunc TMP1
-     |    stfd f1, 16(BASE)		// Copy control var.
-     |     li NARGS8:RC, 16		// Iterators get 2 arguments.
-@@ -4450,8 +4834,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // NYI: add hotloop, record BC_ITERN.
-     |.endif
-     |  add RA, BASE, RA
--    |  lwz TAB:RB, -12(RA)
--    |  lwz RC, -4(RA)			// Get index from control var.
-+    |  lwz TAB:RB, WORD_LO-16(RA)
-+    |  lwz RC, WORD_LO-8(RA)		// Get index from control var.
-     |  lwz TMP0, TAB:RB->asize
-     |  lwz TMP1, TAB:RB->array
-     |   addi PC, PC, 4
-@@ -4459,14 +4843,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  cmplw RC, TMP0
-     |   slwi TMP3, RC, 3
-     |  bge >5				// Index points after array part?
--    |  lwzx TMP2, TMP1, TMP3
--    |   lfdx f0, TMP1, TMP3
-+    |  lfdux f0, TMP3, TMP1
-+    |   lwz TMP2, WORD_HI(TMP3)
-     |  checknil TMP2
-     |     lwz INS, -4(PC)
-     |  beq >4
-     |.if DUALNUM
--    |   stw RC, 4(RA)
--    |   stw TISNUM, 0(RA)
-+    |   stw RC, WORD_LO(RA)
-+    |   stw TISNUM, WORD_HI(RA)
-     |.else
-     |   tonum_u f1, RC
-     |.endif
-@@ -4474,7 +4858,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |     addis TMP3, PC, -(BCBIAS_J*4 >> 16)
-     |  stfd f0, 8(RA)
-     |     decode_RD4 TMP1, INS
--    |    stw RC, -4(RA)			// Update control var.
-+    |    stw RC, WORD_LO-8(RA)		// Update control var.
-     |     add PC, TMP1, TMP3
-     |.if not DUALNUM
-     |   stfd f1, 0(RA)
-@@ -4496,9 +4880,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgty <3
-     |   slwi RB, RC, 3
-     |   sub TMP3, TMP3, RB
--    |  lwzx RB, TMP2, TMP3
--    |  lfdx f0, TMP2, TMP3
--    |   add NODE:TMP3, TMP2, TMP3
-+    |  lfdux f0, TMP3, TMP2
-+    |  lwz RB, WORD_HI(TMP3)
-     |  checknil RB
-     |     lwz INS, -4(PC)
-     |  beq >7
-@@ -4510,7 +4893,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   stfd f1, 0(RA)
-     |    addi RC, RC, 1
-     |     add PC, TMP1, TMP2
--    |    stw RC, -4(RA)			// Update control var.
-+    |    stw RC, WORD_LO-8(RA)		// Update control var.
-     |  b <3
-     |
-     |7:  // Skip holes in hash part.
-@@ -4521,10 +4904,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISNEXT:
-     |  // RA = base*8, RD = target (points to ITERN)
-     |  add RA, BASE, RA
--    |  lwz TMP0, -24(RA)
--    |  lwz CFUNC:TMP1, -20(RA)
--    |   lwz TMP2, -16(RA)
--    |    lwz TMP3, -8(RA)
-+    |  lwz TMP0, WORD_HI-24(RA)
-+    |  lwz CFUNC:TMP1, WORD_LO-24(RA)
-+    |   lwz TMP2, WORD_HI-16(RA)
-+    |    lwz TMP3, WORD_HI-8(RA)
-     |   cmpwi cr0, TMP2, LJ_TTAB
-     |  cmpwi cr1, TMP0, LJ_TFUNC
-     |    cmpwi cr6, TMP3, LJ_TNIL
-@@ -4538,17 +4921,25 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bne cr0, >5
-     |  lus TMP1, 0xfffe
-     |  ori TMP1, TMP1, 0x7fff
--    |  stw ZERO, -4(RA)			// Initialize control var.
--    |  stw TMP1, -8(RA)
-+    |  stw ZERO, WORD_LO-8(RA)		// Initialize control var.
-+    |  stw TMP1, WORD_HI-8(RA)
-     |    addis PC, TMP3, -(BCBIAS_J*4 >> 16)
-     |1:
-     |  ins_next
-     |5:  // Despecialize bytecode if any of the checks fail.
-     |  li TMP0, BC_JMP
-     |   li TMP1, BC_ITERC
-+    |  .if ENDIAN_LE
-+    |  stb TMP0, -4(PC)
-+    |  .else
-     |  stb TMP0, -1(PC)
-+    |  .endif
-     |    addis PC, TMP3, -(BCBIAS_J*4 >> 16)
-+    |  .if ENDIAN_LE
-+    |   stb TMP1, 0(PC)
-+    |  .else
-     |   stb TMP1, 3(PC)
-+    |  .endif
-     |  b <1
-     break;
- 
-@@ -4582,7 +4973,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    addi RA, RA, 8
-     |   blt cr1, <1			// More vararg slots?
-     |2:  // Fill up remainder with nil.
--    |  stw TISNIL, 0(RA)
-+    |  stw TISNIL, WORD_HI(RA)
-     |  cmplw RA, TMP2
-     |   addi RA, RA, 8
-     |  blt <2
-@@ -4619,6 +5010,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  add RA, BASE, RA
-     |  add RC, BASE, SAVE0
-     |  subi TMP3, BASE, 8
-+    |  addi BASEP4, BASE, 4
-     |  b <6
-     break;
- 
-@@ -4667,13 +5059,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgt >6
-     |   sub BASE, TMP2, RA
-     |  lwz LFUNC:TMP1, FRAME_FUNC(BASE)
-+    |  addi BASEP4, BASE, 4
-     |  ins_next1
-     |  lwz TMP1, LFUNC:TMP1->pc
-     |  lwz KBASE, PC2PROTO(k)(TMP1)
-     |  ins_next2
-     |
-     |6:  // Fill up results with nil.
--    |  subi TMP1, RD, 8
-+    |  addi TMP1, RD, WORD_HI-8
-     |   addi RD, RD, 8
-     |  stwx TISNIL, TMP2, TMP1
-     |  b <5
-@@ -4709,13 +5102,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgt >6
-     |   sub BASE, TMP2, RA
-     |  lwz LFUNC:TMP1, FRAME_FUNC(BASE)
-+    |  addi BASEP4, BASE, 4
-     |  ins_next1
-     |  lwz TMP1, LFUNC:TMP1->pc
-     |  lwz KBASE, PC2PROTO(k)(TMP1)
-     |  ins_next2
-     |
-     |6:  // Fill up results with nil.
--    |  subi TMP1, RD, 8
-+    |  addi TMP1, RD, WORD_HI-8
-     |   addi RD, RD, 8
-     |  stwx TISNIL, TMP2, TMP1
-     |  b <5
-@@ -4741,11 +5135,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = (op == BC_IFORL || op == BC_JFORL);
-     |.if DUALNUM
-     |  // Integer loop.
--    |  lwzux TMP1, RA, BASE
--    |   lwz CARG1, FORL_IDX*8+4(RA)
-+    |  lwzux2 TMP1, CARG1, RA, BASE
-+    if (vk) {
-+      |  mtxer ZERO
-+    }
-     |  cmplw cr0, TMP1, TISNUM
-     if (vk) {
--      |   lwz CARG3, FORL_STEP*8+4(RA)
-+      |   lwz CARG3, FORL_STEP*8+WORD_LO(RA)
-       |  bne >9
-       |.if GPR64
-       |  // Need to check overflow for (a<<32) + (b<<32).
-@@ -4757,15 +5153,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |  addo. CARG1, CARG1, CARG3
-       |.endif
-       |    cmpwi cr6, CARG3, 0
--      |   lwz CARG2, FORL_STOP*8+4(RA)
--      |  bso >6
-+      |   lwz CARG2, FORL_STOP*8+WORD_LO(RA)
-+      |  bso >2
-       |4:
--      |  stw CARG1, FORL_IDX*8+4(RA)
-+      |  stw CARG1, FORL_IDX*8+WORD_LO(RA)
-     } else {
--      |  lwz TMP3, FORL_STEP*8(RA)
--      |   lwz CARG3, FORL_STEP*8+4(RA)
--      |  lwz TMP2, FORL_STOP*8(RA)
--      |   lwz CARG2, FORL_STOP*8+4(RA)
-+      |  lwz TMP3, FORL_STEP*8+WORD_HI(RA)
-+      |   lwz CARG3, FORL_STEP*8+WORD_LO(RA)
-+      |  lwz TMP2, FORL_STOP*8+WORD_HI(RA)
-+      |   lwz CARG2, FORL_STOP*8+WORD_LO(RA)
-       |  cmplw cr7, TMP3, TISNUM
-       |  cmplw cr1, TMP2, TISNUM
-       |  crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
-@@ -4776,11 +5172,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    blt cr6, >5
-     |  cmpw CARG1, CARG2
-     |1:
--    |   stw TISNUM, FORL_EXT*8(RA)
-+    |   stw TISNUM, FORL_EXT*8+WORD_HI(RA)
-     if (op != BC_JFORL) {
-       |  srwi RD, RD, 1
-     }
--    |   stw CARG1, FORL_EXT*8+4(RA)
-+    |   stw CARG1, FORL_EXT*8+WORD_LO(RA)
-     if (op != BC_JFORL) {
-       |  add RD, PC, RD
-     }
-@@ -4800,11 +5196,6 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:  // Invert check for negative step.
-     |  cmpw CARG2, CARG1
-     |  b <1
--    if (vk) {
--      |6:  // Potential overflow.
--      |  checkov TMP0, <4		// Ignore unrelated overflow.
--      |  b <2
--    }
-     |.endif
-     if (vk) {
-       |.if DUALNUM
-@@ -4815,14 +5206,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |.endif
-       |  lfd f3, FORL_STEP*8(RA)
-       |  lfd f2, FORL_STOP*8(RA)
--      |   lwz TMP3, FORL_STEP*8(RA)
-+      |   lwz TMP3, FORL_STEP*8+WORD_HI(RA)
-       |  fadd f1, f1, f3
-       |  stfd f1, FORL_IDX*8(RA)
-     } else {
-       |.if DUALNUM
-       |9:  // FP loop.
-       |.else
-+      |.if ENDIAN_LE
-+      |  lwzx TMP1, RA, BASE_LO
-+      |  add RA, RA, BASE
-+      |.else
-       |  lwzux TMP1, RA, BASE
-+      |.endif
-       |  lwz TMP3, FORL_STEP*8(RA)
-       |  lwz TMP2, FORL_STOP*8(RA)
-       |  cmplw cr0, TMP1, TISNUM
-@@ -4903,17 +5299,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- #endif
-   case BC_IITERL:
-     |  // RA = base*8, RD = target
--    |  lwzux TMP1, RA, BASE
--    |   lwz TMP2, 4(RA)
-+    |  lwzux2 TMP1, TMP2, RA, BASE
-     |  checknil TMP1; beq >1		// Stop if iterator returned nil.
-     if (op == BC_JITERL) {
--      |  stw TMP1, -8(RA)
--      |   stw TMP2, -4(RA)
-+      |  stw TMP1, WORD_HI-8(RA)
-+      |   stw TMP2, WORD_LO-8(RA)
-       |  b =>BC_JLOOP
-     } else {
-       |  branch_RD			// Otherwise save control var + branch.
--      |  stw TMP1, -8(RA)
--      |   stw TMP2, -4(RA)
-+      |  stw TMP1, WORD_HI-8(RA)
-+      |   stw TMP2, WORD_LO-8(RA)
-     }
-     |1:
-     |  ins_next
-@@ -4942,7 +5337,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Traces on PPC don't store the trace number, so use 0.
-     |   stw ZERO, DISPATCH_GL(vmstate)(DISPATCH)
-     |  lwzx TRACE:TMP2, TMP1, RD
--    |  clrso TMP1
-+    |  mtxer ZERO
-     |  lp TMP2, TRACE:TMP2->mcode
-     |   stw BASE, DISPATCH_GL(jit_base)(DISPATCH)
-     |  mtctr TMP2
-@@ -4994,7 +5389,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |
-     |3:  // Clear missing parameters.
--    |  stwx TISNIL, BASE, NARGS8:RC
-+    |  stwx TISNIL, BASE_HI, NARGS8:RC
-     |  addi NARGS8:RC, NARGS8:RC, 8
-     |  b <2
-     break;
-@@ -5011,11 +5406,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  lwz TMP2, L->maxstack
-     |   add TMP1, BASE, RC
-     |  add TMP0, RA, RC
--    |   stw LFUNC:RB, 4(TMP1)		// Store copy of LFUNC.
-+    |   stw LFUNC:RB, WORD_LO(TMP1)	// Store copy of LFUNC.
-     |   addi TMP3, RC, 8+FRAME_VARG
-     |    lwz KBASE, -4+PC2PROTO(k)(PC)
-     |  cmplw TMP0, TMP2
--    |   stw TMP3, 0(TMP1)		// Store delta + FRAME_VARG.
-+    |   stw TMP3, WORD_HI(TMP1)		// Store delta + FRAME_VARG.
-     |  bge ->vm_growstack_l
-     |  lbz TMP2, -4+PC2PROTO(numparams)(PC)
-     |   mr RA, BASE
-@@ -5026,18 +5421,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  beq >3
-     |1:
-     |  cmplw RA, RC			// Less args than parameters?
--    |   lwz TMP0, 0(RA)
--    |   lwz TMP3, 4(RA)
-+    |   lwz TMP0, WORD_HI(RA)
-+    |   lwz TMP3, WORD_LO(RA)
-     |  bge >4
--    |    stw TISNIL, 0(RA)		// Clear old fixarg slot (help the GC).
-+    |    stw TISNIL, WORD_HI(RA)	// Clear old fixarg slot (help the GC).
-     |    addi RA, RA, 8
-     |2:
-     |  addic. TMP2, TMP2, -1
--    |   stw TMP0, 8(TMP1)
--    |   stw TMP3, 12(TMP1)
-+    |   stw TMP0, WORD_HI+8(TMP1)
-+    |   stw TMP3, WORD_LO+8(TMP1)
-     |    addi TMP1, TMP1, 8
-     |  bne <1
-     |3:
-+    |  addi BASEP4, BASE, 4
-     |  ins_next2
-     |
-     |4:  // Clear missing parameters.
-@@ -5049,35 +5445,35 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_FUNCCW:
-     |  // BASE = new base, RA = BASE+framesize*8, RB = CFUNC, RC = nargs*8
-     if (op == BC_FUNCC) {
--      |  lp RD, CFUNC:RB->f
-+      |  lp FUNCREG, CFUNC:RB->f
-     } else {
--      |  lp RD, DISPATCH_GL(wrapf)(DISPATCH)
-+      |  lp FUNCREG, DISPATCH_GL(wrapf)(DISPATCH)
-     }
-     |   add TMP1, RA, NARGS8:RC
-     |   lwz TMP2, L->maxstack
--    |  .toc lp TMP3, 0(RD)
-+    |  .opd lp TMP3, 0(FUNCREG)
-     |    add RC, BASE, NARGS8:RC
-     |   stp BASE, L->base
-     |   cmplw TMP1, TMP2
-     |    stp RC, L->top
-     |     li_vmstate C
--    |.if TOC
-+    |.if OPD
-     |  mtctr TMP3
-     |.else
--    |  mtctr RD
-+    |  mtctr FUNCREG
-     |.endif
-     if (op == BC_FUNCCW) {
-       |  lp CARG2, CFUNC:RB->f
-     }
-     |  mr CARG1, L
-     |   bgt ->vm_growstack_c		// Need to grow stack.
--    |  .toc lp TOCREG, TOC_OFS(RD)
--    |  .tocenv lp ENVREG, ENV_OFS(RD)
-+    |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+    |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-     |     st_vmstate
-     |  bctrl				// (lua_State *L [, lua_CFunction f])
-+    |  .toc lp TOCREG, SAVE_TOC
-     |  // Returns nresults.
-     |  lp BASE, L->base
--    |  .toc ld TOCREG, SAVE_TOC
-     |   slwi RD, CRET1, 3
-     |  lp TMP1, L->top
-     |    li_vmstate INTERP
-@@ -5128,7 +5524,11 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.byte 0x1\n"
- 	"\t.string \"\"\n"
- 	"\t.uleb128 0x1\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.sleb128 -8\n"
-+#else
- 	"\t.sleb128 -4\n"
-+#endif
- 	"\t.byte 65\n"
- 	"\t.byte 0xc\n\t.uleb128 1\n\t.uleb128 0\n"
- 	"\t.align 2\n"
-@@ -5141,14 +5541,24 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long .Lbegin\n"
- 	"\t.long %d\n"
- 	"\t.byte 0xe\n\t.uleb128 %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+	"\t.byte 0x11\n\t.uleb128 70\n\t.sleb128 -1\n",
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
- 	"\t.byte 0x5\n\t.uleb128 70\n\t.uleb128 55\n",
-+#endif
- 	fcofs, CFRAME_SIZE);
-     for (i = 14; i <= 31; i++)
-       fprintf(ctx->fp,
- 	"\t.byte %d\n\t.uleb128 %d\n"
- 	"\t.byte %d\n\t.uleb128 %d\n",
--	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i));
-+#if LJ_ARCH_PPC32ON64
-+	0x80+i, 19+(31-i), 0x80+32+i, 1+(31-i)
-+#else
-+	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i)
-+#endif
-+      );
-     fprintf(ctx->fp,
- 	"\t.align 2\n"
- 	".LEFDE0:\n\n");
-@@ -5164,8 +5574,12 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long lj_vm_ffi_call\n"
- #endif
- 	"\t.long %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
--	"\t.byte 0x8e\n\t.uleb128 2\n"
-+#endif
-+	"\t.byte 0x8e\n\t.uleb128 1\n"
- 	"\t.byte 0xd\n\t.uleb128 0xe\n"
- 	"\t.align 2\n"
- 	".LEFDE1:\n\n", (int)ctx->codesz - fcofs);
-@@ -5180,7 +5594,11 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.byte 0x1\n"
- 	"\t.string \"zPR\"\n"
- 	"\t.uleb128 0x1\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.sleb128 -8\n"
-+#else
- 	"\t.sleb128 -4\n"
-+#endif
- 	"\t.byte 65\n"
- 	"\t.uleb128 6\n"			/* augmentation length */
- 	"\t.byte 0x1b\n"			/* pcrel|sdata4 */
-@@ -5198,14 +5616,24 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long %d\n"
- 	"\t.uleb128 0\n"			/* augmentation length */
- 	"\t.byte 0xe\n\t.uleb128 %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+	"\t.byte 0x11\n\t.uleb128 70\n\t.sleb128 -1\n",
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
- 	"\t.byte 0x5\n\t.uleb128 70\n\t.uleb128 55\n",
-+#endif
- 	fcofs, CFRAME_SIZE);
-     for (i = 14; i <= 31; i++)
-       fprintf(ctx->fp,
- 	"\t.byte %d\n\t.uleb128 %d\n"
- 	"\t.byte %d\n\t.uleb128 %d\n",
--	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i));
-+#if LJ_ARCH_PPC32ON64
-+	0x80+i, 19+(31-i), 0x80+32+i, 1+(31-i)
-+#else
-+	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i)
-+#endif
-+      );
-     fprintf(ctx->fp,
- 	"\t.align 2\n"
- 	".LEFDE2:\n\n");
-@@ -5233,8 +5661,12 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long lj_vm_ffi_call-.\n"
- 	"\t.long %d\n"
- 	"\t.uleb128 0\n"			/* augmentation length */
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
--	"\t.byte 0x8e\n\t.uleb128 2\n"
-+#endif
-+	"\t.byte 0x8e\n\t.uleb128 1\n"
- 	"\t.byte 0xd\n\t.uleb128 0xe\n"
- 	"\t.align 2\n"
- 	".LEFDE3:\n\n", (int)ctx->codesz - fcofs);
diff --git a/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch b/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
deleted file mode 100644
index f4e760b738361..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
+++ /dev/null
@@ -1,11 +0,0 @@
---- a/src/vm_ppc.dasc	2019-06-03 19:41:50.214671731 +0200
-+++ b/src/vm_ppc.dasc	2019-06-03 19:44:40.229686143 +0200
-@@ -2774,7 +2774,7 @@
-   |
-   |->vm_exit_handler:
-   |.if JIT
--  |  addi sp, TMP0, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-+  |  addi sp, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-   |  saver 3 // CARG1
-   |  saver 4 // CARG2
-   |  saver 5 // CARG3
diff --git a/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch b/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
deleted file mode 100644
index 487a1cd1ca787..0000000000000
--- a/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
+++ /dev/null
@@ -1,231 +0,0 @@
-commit 9da06535092d6d9dec442641a26c64bce5574322
-Author: Mike Pall <mike>
-Date:   Sun Jun 24 14:08:59 2018 +0200
-
-    ARM64: Fix exit stub patching.
-    
-    Contributed by Javier Guerra Giraldez.
-
-diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
-index cbb186d3..baafa21a 100644
---- a/src/lj_asm_arm64.h
-+++ b/src/lj_asm_arm64.h
-@@ -56,11 +56,11 @@ static void asm_exitstub_setup(ASMState *as, ExitNo nexits)
-     asm_mclimit(as);
-   /* 1: str lr,[sp]; bl ->vm_exit_handler; movz w0,traceno; bl <1; bl <1; ... */
-   for (i = nexits-1; (int32_t)i >= 0; i--)
--    *--mxp = A64I_LE(A64I_BL|((-3-i)&0x03ffffffu));
--  *--mxp = A64I_LE(A64I_MOVZw|A64F_U16(as->T->traceno));
-+    *--mxp = A64I_LE(A64I_BL | A64F_S26(-3-i));
-+  *--mxp = A64I_LE(A64I_MOVZw | A64F_U16(as->T->traceno));
-   mxp--;
--  *mxp = A64I_LE(A64I_BL|(((MCode *)(void *)lj_vm_exit_handler-mxp)&0x03ffffffu));
--  *--mxp = A64I_LE(A64I_STRx|A64F_D(RID_LR)|A64F_N(RID_SP));
-+  *mxp = A64I_LE(A64I_BL | A64F_S26(((MCode *)(void *)lj_vm_exit_handler-mxp)));
-+  *--mxp = A64I_LE(A64I_STRx | A64F_D(RID_LR) | A64F_N(RID_SP));
-   as->mctop = mxp;
- }
- 
-@@ -77,7 +77,7 @@ static void asm_guardcc(ASMState *as, A64CC cc)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_cond_branch(as, cc^1, p-1);
-     return;
-   }
-@@ -91,7 +91,7 @@ static void asm_guardtnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_tnb(as, ai^0x01000000u, r, bit, p-1);
-     return;
-   }
-@@ -105,7 +105,7 @@ static void asm_guardcnb(ASMState *as, A64Ins ai, Reg r)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_cnb(as, ai^0x01000000u, r, p-1);
-     return;
-   }
-@@ -1850,7 +1850,7 @@ static void asm_loop_fixup(ASMState *as)
-     p[-2] |= ((uint32_t)delta & mask) << 5;
-   } else {
-     ptrdiff_t delta = target - (p - 1);
--    p[-1] = A64I_B | ((uint32_t)(delta) & 0x03ffffffu);
-+    p[-1] = A64I_B | A64F_S26(delta);
-   }
- }
- 
-@@ -1919,7 +1919,7 @@ static void asm_tail_fixup(ASMState *as, TraceNo lnk)
-   }
-   /* Patch exit branch. */
-   target = lnk ? traceref(as->J, lnk)->mcode : (MCode *)lj_vm_exit_interp;
--  p[-1] = A64I_B | (((target-p)+1)&0x03ffffffu);
-+  p[-1] = A64I_B | A64F_S26((target-p)+1);
- }
- 
- /* Prepare tail of code. */
-@@ -1982,40 +1982,50 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
- {
-   MCode *p = T->mcode;
-   MCode *pe = (MCode *)((char *)p + T->szmcode);
--  MCode *cstart = NULL, *cend = p;
-+  MCode *cstart = NULL;
-   MCode *mcarea = lj_mcode_patch(J, p, 0);
-   MCode *px = exitstub_trace_addr(T, exitno);
-+  /* Note: this assumes a trace exit is only ever patched once. */
-   for (; p < pe; p++) {
-     /* Look for exitstub branch, replace with branch to target. */
-+    ptrdiff_t delta = target - p;
-     MCode ins = A64I_LE(*p);
-     if ((ins & 0xff000000u) == 0x54000000u &&
- 	((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) {
--      /* Patch bcc exitstub. */
--      *p = A64I_LE((ins & 0xff00001fu) | (((target-p)<<5) & 0x00ffffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch bcc, if within range. */
-+      if (A64F_S_OK(delta, 19)) {
-+	*p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta));
-+	if (!cstart) cstart = p;
-+      }
-     } else if ((ins & 0xfc000000u) == 0x14000000u &&
- 	       ((ins ^ (px-p)) & 0x03ffffffu) == 0) {
--      /* Patch b exitstub. */
--      *p = A64I_LE((ins & 0xfc000000u) | ((target-p) & 0x03ffffffu));
--      cend = p+1;
-+      /* Patch b. */
-+      lua_assert(A64F_S_OK(delta, 26));
-+      *p = A64I_LE((ins & 0xfc000000u) | A64F_S26(delta));
-       if (!cstart) cstart = p;
-     } else if ((ins & 0x7e000000u) == 0x34000000u &&
- 	       ((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) {
--      /* Patch cbz/cbnz exitstub. */
--      *p = A64I_LE((ins & 0xff00001f) | (((target-p)<<5) & 0x00ffffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch cbz/cbnz, if within range. */
-+      if (A64F_S_OK(delta, 19)) {
-+	*p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta));
-+	if (!cstart) cstart = p;
-+      }
-     } else if ((ins & 0x7e000000u) == 0x36000000u &&
- 	       ((ins ^ ((px-p)<<5)) & 0x0007ffe0u) == 0) {
--      /* Patch tbz/tbnz exitstub. */
--      *p = A64I_LE((ins & 0xfff8001fu) | (((target-p)<<5) & 0x0007ffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch tbz/tbnz, if within range. */
-+      if (A64F_S_OK(delta, 14)) {
-+	*p = A64I_LE((ins & 0xfff8001fu) | A64F_S14(delta));
-+	if (!cstart) cstart = p;
-+      }
-     }
-   }
--  lua_assert(cstart != NULL);
--  lj_mcode_sync(cstart, cend);
-+  {  /* Always patch long-range branch in exit stub itself. */
-+    ptrdiff_t delta = target - px;
-+    lua_assert(A64F_S_OK(delta, 26));
-+    *px = A64I_B | A64F_S26(delta);
-+    if (!cstart) cstart = px;
-+  }
-+  lj_mcode_sync(cstart, px+1);
-   lj_mcode_patch(J, mcarea, 1);
- }
- 
-diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
-index 6da4c7d4..1001b1d8 100644
---- a/src/lj_emit_arm64.h
-+++ b/src/lj_emit_arm64.h
-@@ -241,7 +241,7 @@ static void emit_loadk(ASMState *as, Reg rd, uint64_t u64, int is64)
- #define mcpofs(as, k) \
-   ((intptr_t)((uintptr_t)(k) - (uintptr_t)(as->mcp - 1)))
- #define checkmcpofs(as, k) \
--  ((((mcpofs(as, k)>>2) + 0x00040000) >> 19) == 0)
-+  (A64F_S_OK(mcpofs(as, k)>>2, 19))
- 
- static Reg ra_allock(ASMState *as, intptr_t k, RegSet allow);
- 
-@@ -312,7 +312,7 @@ static void emit_cond_branch(ASMState *as, A64CC cond, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x40000) >> 19) == 0);
-+  lua_assert(A64F_S_OK(delta, 19));
-   *p = A64I_BCC | A64F_S19(delta) | cond;
- }
- 
-@@ -320,24 +320,24 @@ static void emit_branch(ASMState *as, A64Ins ai, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x02000000) >> 26) == 0);
--  *p = ai | ((uint32_t)delta & 0x03ffffffu);
-+  lua_assert(A64F_S_OK(delta, 26));
-+  *p = ai | A64F_S26(delta);
- }
- 
- static void emit_tnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(bit < 63 && ((delta + 0x2000) >> 14) == 0);
-+  lua_assert(bit < 63 && A64F_S_OK(delta, 14));
-   if (bit > 31) ai |= A64I_X;
--  *p = ai | A64F_BIT(bit & 31) | A64F_S14((uint32_t)delta & 0x3fffu) | r;
-+  *p = ai | A64F_BIT(bit & 31) | A64F_S14(delta) | r;
- }
- 
- static void emit_cnb(ASMState *as, A64Ins ai, Reg r, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x40000) >> 19) == 0);
-+  lua_assert(A64F_S_OK(delta, 19));
-   *p = ai | A64F_S19(delta) | r;
- }
- 
-@@ -347,8 +347,8 @@ static void emit_call(ASMState *as, void *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = (char *)target - (char *)p;
--  if ((((delta>>2) + 0x02000000) >> 26) == 0) {
--    *p = A64I_BL | ((uint32_t)(delta>>2) & 0x03ffffffu);
-+  if (A64F_S_OK(delta>>2, 26)) {
-+    *p = A64I_BL | A64F_S26(delta>>2);
-   } else {  /* Target out of range: need indirect call. But don't use R0-R7. */
-     Reg r = ra_allock(as, i64ptr(target),
- 		      RSET_RANGE(RID_X8, RID_MAX_GPR)-RSET_FIXED);
-diff --git a/src/lj_target_arm64.h b/src/lj_target_arm64.h
-index 520023ae..a207a2ba 100644
---- a/src/lj_target_arm64.h
-+++ b/src/lj_target_arm64.h
-@@ -132,9 +132,9 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define A64F_IMMR(x)	((x) << 16)
- #define A64F_U16(x)	((x) << 5)
- #define A64F_U12(x)	((x) << 10)
--#define A64F_S26(x)	(x)
-+#define A64F_S26(x)	(((uint32_t)(x) & 0x03ffffffu))
- #define A64F_S19(x)	(((uint32_t)(x) & 0x7ffffu) << 5)
--#define A64F_S14(x)	((x) << 5)
-+#define A64F_S14(x)	(((uint32_t)(x) & 0x3fffu) << 5)
- #define A64F_S9(x)	((x) << 12)
- #define A64F_BIT(x)	((x) << 19)
- #define A64F_SH(sh, x)	(((sh) << 22) | ((x) << 10))
-@@ -145,6 +145,9 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define A64F_LSL16(x)	(((x) / 16) << 21)
- #define A64F_BSH(sh)	((sh) << 10)
- 
-+/* Check for valid field range. */
-+#define A64F_S_OK(x, b)	((((x) + (1 << (b-1))) >> (b)) == 0)
-+
- typedef enum A64Ins {
-   A64I_S = 0x20000000,
-   A64I_X = 0x80000000,
diff --git a/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch b/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
deleted file mode 100644
index c30264786755f..0000000000000
--- a/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
+++ /dev/null
@@ -1,29 +0,0 @@
-From: Jason Teplitz <jason@tensyr.com>
-Date: Mon, 9 Oct 2017 23:03:09 +0000
-Subject: Fix register allocation bug in arm64
-
----
- src/lj_asm_arm64.h | 3 +--
- 1 file changed, 1 insertion(+), 2 deletions(-)
-
-diff --git src/lj_asm_arm64.h src/lj_asm_arm64.h
-index 8fd92e7..549f8a6 100644
---- a/src/lj_asm_arm64.h
-+++ b/src/lj_asm_arm64.h
-@@ -871,7 +871,7 @@ static void asm_hrefk(ASMState *as, IRIns *ir)
-   int bigofs = !emit_checkofs(A64I_LDRx, ofs);
-   RegSet allow = RSET_GPR;
-   Reg dest = (ra_used(ir) || bigofs) ? ra_dest(as, ir, RSET_GPR) : RID_NONE;
--  Reg node = ra_alloc1(as, ir->op1, allow);
-+  Reg node = ra_alloc1(as, ir->op1, ra_hasreg(dest) ? rset_clear(allow, dest) : allow);
-   Reg key = ra_scratch(as, rset_clear(allow, node));
-   Reg idx = node;
-   uint64_t k;
-@@ -879,7 +879,6 @@ static void asm_hrefk(ASMState *as, IRIns *ir)
-   rset_clear(allow, key);
-   if (bigofs) {
-     idx = dest;
--    rset_clear(allow, dest);
-     kofs = (int32_t)offsetof(Node, key);
-   } else if (ra_hasreg(dest)) {
-     emit_opk(as, A64I_ADDx, dest, node, ofs, allow);
diff --git a/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch b/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
deleted file mode 100644
index a217866c392cf..0000000000000
--- a/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
+++ /dev/null
@@ -1,562 +0,0 @@
-From e9af1abec542e6f9851ff2368e7f196b6382a44c Mon Sep 17 00:00:00 2001
-From: Mike Pall <mike>
-Date: Wed, 30 Sep 2020 01:31:27 +0200
-Subject: [PATCH] Add support for full-range 64 bit lightuserdata.
-
----
- doc/status.html   | 11 ---------
- src/jit/dump.lua  |  4 +++-
- src/lib_debug.c   | 12 +++++-----
- src/lib_jit.c     | 14 ++++++------
- src/lib_package.c |  8 +++----
- src/lib_string.c  |  2 +-
- src/lj_api.c      | 40 +++++++++++++++++++++++++++++----
- src/lj_ccall.c    |  2 +-
- src/lj_cconv.c    |  2 +-
- src/lj_crecord.c  |  6 ++---
- src/lj_dispatch.c |  2 +-
- src/lj_ir.c       |  6 +++--
- src/lj_obj.c      |  5 +++--
- src/lj_obj.h      | 57 ++++++++++++++++++++++++++++++-----------------
- src/lj_snap.c     |  9 +++++++-
- src/lj_state.c    |  6 +++++
- src/lj_strfmt.c   |  2 +-
- 17 files changed, 121 insertions(+), 67 deletions(-)
-
-#diff --git a/doc/status.html b/doc/status.html
-#index 0aafe13a2..fd0ae8bae 100644
-#--- a/doc/status.html
-#+++ b/doc/status.html
-#@@ -91,17 +91,6 @@ <h2>Current Status</h2>
-# <tt>lua_atpanic</tt> on x64. This issue will be fixed with the new
-# garbage collector.
-# </li>
-#-<li>
-#-LuaJIT on 64 bit systems provides a <b>limited range</b> of 47 bits for the
-#-<b>legacy <tt>lightuserdata</tt></b> data type.
-#-This is only relevant on x64 systems which use the negative part of the
-#-virtual address space in user mode, e.g. Solaris/x64, and on ARM64 systems
-#-configured with a 48 bit or 52 bit VA.
-#-Avoid using <tt>lightuserdata</tt> to hold pointers that may point outside
-#-of that range, e.g. variables on the stack. In general, avoid this data
-#-type for new code and replace it with (much more performant) FFI bindings.
-#-FFI cdata pointers can address the full 64 bit range.
-#-</li>
-# </ul>
-# <br class="flush">
-# </div>
-Index: luajit/src/jit/dump.lua
-===================================================================
---- luajit.orig/src/jit/dump.lua
-+++ luajit/src/jit/dump.lua
-@@ -315,7 +315,9 @@
-   local tn = type(k)
-   local s
-   if tn == "number" then
--    if band(sn or 0, 0x30000) ~= 0 then
-+    if t < 12 then
-+      s = k == 0 and "NULL" or format("[0x%08x]", k)
-+    elseif band(sn or 0, 0x30000) ~= 0 then
-       s = band(sn, 0x20000) ~= 0 and "contpc" or "ftsz"
-     elseif k == 2^52+2^51 then
-       s = "bias"
-Index: luajit/src/lib_debug.c
-===================================================================
---- luajit.orig/src/lib_debug.c
-+++ luajit/src/lib_debug.c
-@@ -231,8 +231,8 @@
-   int32_t n = lj_lib_checkint(L, 2) - 1;
-   if ((uint32_t)n >= fn->l.nupvalues)
-     lj_err_arg(L, 2, LJ_ERR_IDXRNG);
--  setlightudV(L->top-1, isluafunc(fn) ? (void *)gcref(fn->l.uvptr[n]) :
--					(void *)&fn->c.upvalue[n]);
-+  lua_pushlightuserdata(L, isluafunc(fn) ? (void *)gcref(fn->l.uvptr[n]) :
-+					   (void *)&fn->c.upvalue[n]);
-   return 1;
- }
- 
-@@ -283,13 +283,13 @@
- 
- /* ------------------------------------------------------------------------ */
- 
--#define KEY_HOOK	((void *)0x3004)
-+#define KEY_HOOK	(U64x(80000000,00000000)|'h')
- 
- static void hookf(lua_State *L, lua_Debug *ar)
- {
-   static const char *const hooknames[] =
-     {"call", "return", "line", "count", "tail return"};
--  lua_pushlightuserdata(L, KEY_HOOK);
-+  (L->top++)->u64 = KEY_HOOK;
-   lua_rawget(L, LUA_REGISTRYINDEX);
-   if (lua_isfunction(L, -1)) {
-     lua_pushstring(L, hooknames[(int)ar->event]);
-@@ -334,7 +334,7 @@
-     count = luaL_optint(L, arg+3, 0);
-     func = hookf; mask = makemask(smask, count);
-   }
--  lua_pushlightuserdata(L, KEY_HOOK);
-+  (L->top++)->u64 = KEY_HOOK;
-   lua_pushvalue(L, arg+1);
-   lua_rawset(L, LUA_REGISTRYINDEX);
-   lua_sethook(L, func, mask, count);
-@@ -349,7 +349,7 @@
-   if (hook != NULL && hook != hookf) {  /* external hook? */
-     lua_pushliteral(L, "external hook");
-   } else {
--    lua_pushlightuserdata(L, KEY_HOOK);
-+    (L->top++)->u64 = KEY_HOOK;
-     lua_rawget(L, LUA_REGISTRYINDEX);   /* get hook */
-   }
-   lua_pushstring(L, unmakemask(mask, buff));
-Index: luajit/src/lib_jit.c
-===================================================================
---- luajit.orig/src/lib_jit.c
-+++ luajit/src/lib_jit.c
-@@ -540,15 +540,15 @@
- 
- /* Not loaded by default, use: local profile = require("jit.profile") */
- 
--static const char KEY_PROFILE_THREAD = 't';
--static const char KEY_PROFILE_FUNC = 'f';
-+#define KEY_PROFILE_THREAD	(U64x(80000000,00000000)|'t')
-+#define KEY_PROFILE_FUNC	(U64x(80000000,00000000)|'f')
- 
- static void jit_profile_callback(lua_State *L2, lua_State *L, int samples,
- 				 int vmstate)
- {
-   TValue key;
-   cTValue *tv;
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   tv = lj_tab_get(L, tabV(registry(L)), &key);
-   if (tvisfunc(tv)) {
-     char vmst = (char)vmstate;
-@@ -575,9 +575,9 @@
-   lua_State *L2 = lua_newthread(L);  /* Thread that runs profiler callback. */
-   TValue key;
-   /* Anchor thread and function in registry. */
--  setlightudV(&key, (void *)&KEY_PROFILE_THREAD);
-+  key.u64 = KEY_PROFILE_THREAD;
-   setthreadV(L, lj_tab_set(L, registry, &key), L2);
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   setfuncV(L, lj_tab_set(L, registry, &key), func);
-   lj_gc_anybarriert(L, registry);
-   luaJIT_profile_start(L, mode ? strdata(mode) : "",
-@@ -592,9 +592,9 @@
-   TValue key;
-   luaJIT_profile_stop(L);
-   registry = tabV(registry(L));
--  setlightudV(&key, (void *)&KEY_PROFILE_THREAD);
-+  key.u64 = KEY_PROFILE_THREAD;
-   setnilV(lj_tab_set(L, registry, &key));
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   setnilV(lj_tab_set(L, registry, &key));
-   lj_gc_anybarriert(L, registry);
-   return 0;
-Index: luajit/src/lib_package.c
-===================================================================
---- luajit.orig/src/lib_package.c
-+++ luajit/src/lib_package.c
-@@ -398,7 +398,7 @@
- 
- /* ------------------------------------------------------------------------ */
- 
--#define sentinel	((void *)0x4004)
-+#define KEY_SENTINEL	(U64x(80000000,00000000)|'s')
- 
- static int lj_cf_package_require(lua_State *L)
- {
-@@ -408,7 +408,7 @@
-   lua_getfield(L, LUA_REGISTRYINDEX, "_LOADED");
-   lua_getfield(L, 2, name);
-   if (lua_toboolean(L, -1)) {  /* is it there? */
--    if (lua_touserdata(L, -1) == sentinel)  /* check loops */
-+    if ((L->top-1)->u64 == KEY_SENTINEL)  /* check loops */
-       luaL_error(L, "loop or previous error loading module " LUA_QS, name);
-     return 1;  /* package is already loaded */
-   }
-@@ -431,14 +431,14 @@
-     else
-       lua_pop(L, 1);
-   }
--  lua_pushlightuserdata(L, sentinel);
-+  (L->top++)->u64 = KEY_SENTINEL;
-   lua_setfield(L, 2, name);  /* _LOADED[name] = sentinel */
-   lua_pushstring(L, name);  /* pass name as argument to module */
-   lua_call(L, 1, 1);  /* run loaded module */
-   if (!lua_isnil(L, -1))  /* non-nil return? */
-     lua_setfield(L, 2, name);  /* _LOADED[name] = returned value */
-   lua_getfield(L, 2, name);
--  if (lua_touserdata(L, -1) == sentinel) {   /* module did not set a value? */
-+  if ((L->top-1)->u64 == KEY_SENTINEL) {   /* module did not set a value? */
-     lua_pushboolean(L, 1);  /* use true as result */
-     lua_pushvalue(L, -1);  /* extra copy to be returned */
-     lua_setfield(L, 2, name);  /* _LOADED[name] = true */
-Index: luajit/src/lib_string.c
-===================================================================
---- luajit.orig/src/lib_string.c
-+++ luajit/src/lib_string.c
-@@ -714,7 +714,7 @@
- 	lj_strfmt_putfchar(sb, sf, lj_lib_checkint(L, arg));
- 	break;
-       case STRFMT_PTR:  /* No formatting. */
--	lj_strfmt_putptr(sb, lj_obj_ptr(L->base+arg-1));
-+	lj_strfmt_putptr(sb, lj_obj_ptr(G(L), L->base+arg-1));
- 	break;
-       default:
- 	lua_assert(0);
-Index: luajit/src/lj_api.c
-===================================================================
---- luajit.orig/src/lj_api.c
-+++ luajit/src/lj_api.c
-@@ -595,7 +595,7 @@
-   if (tvisudata(o))
-     return uddata(udataV(o));
-   else if (tvislightud(o))
--    return lightudV(o);
-+    return lightudV(G(L), o);
-   else
-     return NULL;
- }
-@@ -608,7 +608,7 @@
- 
- LUA_API const void *lua_topointer(lua_State *L, int idx)
- {
--  return lj_obj_ptr(index2adr(L, idx));
-+  return lj_obj_ptr(G(L), index2adr(L, idx));
- }
- 
- /* -- Stack setters (object creation) ------------------------------------- */
-@@ -694,9 +694,38 @@
-   incr_top(L);
- }
- 
-+#if LJ_64
-+static void *lightud_intern(lua_State *L, void *p)
-+{
-+  global_State *g = G(L);
-+  uint64_t u = (uint64_t)p;
-+  uint32_t up = lightudup(u);
-+  uint32_t *segmap = mref(g->gc.lightudseg, uint32_t);
-+  MSize segnum = g->gc.lightudnum;
-+  if (segmap) {
-+    MSize seg;
-+    for (seg = 0; seg <= segnum; seg++)
-+      if (segmap[seg] == up)  /* Fast path. */
-+	return (void *)(((uint64_t)seg << LJ_LIGHTUD_BITS_LO) | lightudlo(u));
-+    segnum++;
-+  }
-+  if (!((segnum-1) & segnum) && segnum != 1) {
-+    if (segnum >= (1 << LJ_LIGHTUD_BITS_SEG)) lj_err_msg(L, LJ_ERR_BADLU);
-+    lj_mem_reallocvec(L, segmap, segnum, segnum ? 2*segnum : 2u, uint32_t);
-+    setmref(g->gc.lightudseg, segmap);
-+  }
-+  g->gc.lightudnum = segnum;
-+  segmap[segnum] = up;
-+  return (void *)(((uint64_t)segnum << LJ_LIGHTUD_BITS_LO) | lightudlo(u));
-+}
-+#endif
-+
- LUA_API void lua_pushlightuserdata(lua_State *L, void *p)
- {
--  setlightudV(L->top, checklightudptr(L, p));
-+#if LJ_64
-+  p = lightud_intern(L, p);
-+#endif
-+  setrawlightudV(L->top, p);
-   incr_top(L);
- }
- 
-@@ -1138,7 +1167,10 @@
-   fn->c.f = func;
-   setfuncV(L, top++, fn);
-   if (LJ_FR2) setnilV(top++);
--  setlightudV(top++, checklightudptr(L, ud));
-+#if LJ_64
-+  ud = lightud_intern(L, ud);
-+#endif
-+  setrawlightudV(top++, ud);
-   cframe_nres(L->cframe) = 1+0;  /* Zero results. */
-   L->top = top;
-   return top-1;  /* Now call the newly allocated C function. */
-Index: luajit/src/lj_ccall.c
-===================================================================
---- luajit.orig/src/lj_ccall.c
-+++ luajit/src/lj_ccall.c
-@@ -1314,7 +1314,7 @@
-     lj_vm_ffi_call(&cc);
-     if (cts->cb.slot != ~0u) {  /* Blacklist function that called a callback. */
-       TValue tv;
--      setlightudV(&tv, (void *)cc.func);
-+      tv.u64 = ((uintptr_t)(void *)cc.func >> 2) | U64x(800000000, 00000000);
-       setboolV(lj_tab_set(L, cts->miscmap, &tv), 1);
-     }
-     ct = (CType *)((intptr_t)ct+(intptr_t)cts->tab);  /* May be reallocated. */
-Index: luajit/src/lj_cconv.c
-===================================================================
---- luajit.orig/src/lj_cconv.c
-+++ luajit/src/lj_cconv.c
-@@ -611,7 +611,7 @@
-     if (ud->udtype == UDTYPE_IO_FILE)
-       tmpptr = *(void **)tmpptr;
-   } else if (tvislightud(o)) {
--    tmpptr = lightudV(o);
-+    tmpptr = lightudV(cts->g, o);
-   } else if (tvisfunc(o)) {
-     void *p = lj_ccallback_new(cts, d, funcV(o));
-     if (p) {
-Index: luajit/src/lj_crecord.c
-===================================================================
---- luajit.orig/src/lj_crecord.c
-+++ luajit/src/lj_crecord.c
-@@ -643,8 +643,7 @@
-     }
-   } else if (tref_islightud(sp)) {
- #if LJ_64
--    sp = emitir(IRT(IR_BAND, IRT_P64), sp,
--		lj_ir_kint64(J, U64x(00007fff,ffffffff)));
-+    lj_trace_err(J, LJ_TRERR_NYICONV);
- #endif
-   } else {  /* NYI: tref_istab(sp). */
-     IRType t;
-@@ -1209,8 +1208,7 @@
-     TRef tr;
-     TValue tv;
-     /* Check for blacklisted C functions that might call a callback. */
--    setlightudV(&tv,
--		cdata_getptr(cdataptr(cd), (LJ_64 && tp == IRT_P64) ? 8 : 4));
-+    tv.u64 = ((uintptr_t)cdata_getptr(cdataptr(cd), (LJ_64 && tp == IRT_P64) ? 8 : 4) >> 2) | U64x(800000000, 00000000);
-     if (tvistrue(lj_tab_get(J->L, cts->miscmap, &tv)))
-       lj_trace_err(J, LJ_TRERR_BLACKL);
-     if (ctype_isvoid(ctr->info)) {
-Index: luajit/src/lj_dispatch.c
-===================================================================
---- luajit.orig/src/lj_dispatch.c
-+++ luajit/src/lj_dispatch.c
-@@ -302,7 +302,7 @@
-       if (idx != 0) {
- 	cTValue *tv = idx > 0 ? L->base + (idx-1) : L->top + idx;
- 	if (tvislightud(tv))
--	  g->wrapf = (lua_CFunction)lightudV(tv);
-+	  g->wrapf = (lua_CFunction)lightudV(g, tv);
- 	else
- 	  return 0;  /* Failed. */
-       } else {
-Index: luajit/src/lj_ir.c
-===================================================================
---- luajit.orig/src/lj_ir.c
-+++ luajit/src/lj_ir.c
-@@ -386,8 +386,10 @@
-   case IR_KPRI: setpriV(tv, irt_toitype(ir->t)); break;
-   case IR_KINT: setintV(tv, ir->i); break;
-   case IR_KGC: setgcV(L, tv, ir_kgc(ir), irt_toitype(ir->t)); break;
--  case IR_KPTR: case IR_KKPTR: setlightudV(tv, ir_kptr(ir)); break;
--  case IR_KNULL: setlightudV(tv, NULL); break;
-+  case IR_KPTR: case IR_KKPTR:
-+    setnumV(tv, (lua_Number)(uintptr_t)ir_kptr(ir));
-+    break;
-+  case IR_KNULL: setintV(tv, 0); break;
-   case IR_KNUM: setnumV(tv, ir_knum(ir)->n); break;
- #if LJ_HASFFI
-   case IR_KINT64: {
-Index: luajit/src/lj_obj.c
-===================================================================
---- luajit.orig/src/lj_obj.c
-+++ luajit/src/lj_obj.c
-@@ -34,12 +34,13 @@
- }
- 
- /* Return pointer to object or its object data. */
--const void * LJ_FASTCALL lj_obj_ptr(cTValue *o)
-+const void * LJ_FASTCALL lj_obj_ptr(global_State *g, cTValue *o)
- {
-+  UNUSED(g);
-   if (tvisudata(o))
-     return uddata(udataV(o));
-   else if (tvislightud(o))
--    return lightudV(o);
-+    return lightudV(g, o);
-   else if (LJ_HASFFI && tviscdata(o))
-     return cdataptr(cdataV(o));
-   else if (tvisgcv(o))
-Index: luajit/src/lj_obj.h
-===================================================================
---- luajit.orig/src/lj_obj.h
-+++ luajit/src/lj_obj.h
-@@ -232,7 +232,7 @@
- **                  ---MSW---.---LSW---
- ** primitive types |  itype  |         |
- ** lightuserdata   |  itype  |  void * |  (32 bit platforms)
--** lightuserdata   |ffff|    void *    |  (64 bit platforms, 47 bit pointers)
-+** lightuserdata   |ffff|seg|    ofs   |  (64 bit platforms)
- ** GC objects      |  itype  |  GCRef  |
- ** int (LJ_DUALNUM)|  itype  |   int   |
- ** number           -------double------
-@@ -245,7 +245,8 @@
- **
- **                     ------MSW------.------LSW------
- ** primitive types    |1..1|itype|1..................1|
--** GC objects/lightud |1..1|itype|-------GCRef--------|
-+** GC objects         |1..1|itype|-------GCRef--------|
-+** lightuserdata      |1..1|itype|seg|------ofs-------|
- ** int (LJ_DUALNUM)   |1..1|itype|0..0|-----int-------|
- ** number              ------------double-------------
- **
-@@ -285,6 +286,12 @@
- #define LJ_GCVMASK		(((uint64_t)1 << 47) - 1)
- #endif
- 
-+#if LJ_64
-+/* To stay within 47 bits, lightuserdata is segmented. */
-+#define LJ_LIGHTUD_BITS_SEG	8
-+#define LJ_LIGHTUD_BITS_LO	(47 - LJ_LIGHTUD_BITS_SEG)
-+#endif
-+
- /* -- String object ------------------------------------------------------- */
- 
- /* String object header. String payload follows. */
-@@ -576,7 +583,11 @@
-   uint8_t currentwhite;	/* Current white color. */
-   uint8_t state;	/* GC state. */
-   uint8_t nocdatafin;	/* No cdata finalizer called. */
--  uint8_t unused2;
-+#if LJ_64
-+  uint8_t lightudnum;	/* Number of lightuserdata segments - 1. */
-+#else
-+  uint8_t unused1;
-+#endif
-   MSize sweepstr;	/* Sweep position in string table. */
-   GCRef root;		/* List of all collectable objects. */
-   MRef sweep;		/* Sweep position in root list. */
-@@ -588,6 +599,9 @@
-   GCSize estimate;	/* Estimate of memory actually in use. */
-   MSize stepmul;	/* Incremental GC step granularity. */
-   MSize pause;		/* Pause between successive GC cycles. */
-+#if LJ_64
-+  MRef lightudseg;	/* Upper bits of lightuserdata segments. */
-+#endif
- } GCState;
- 
- /* Global state, shared by all threads of a Lua universe. */
-@@ -795,10 +809,23 @@
- #endif
- #define boolV(o)	check_exp(tvisbool(o), (LJ_TFALSE - itype(o)))
- #if LJ_64
--#define lightudV(o) \
--  check_exp(tvislightud(o), (void *)((o)->u64 & U64x(00007fff,ffffffff)))
-+#define lightudseg(u) \
-+  (((u) >> LJ_LIGHTUD_BITS_LO) & ((1 << LJ_LIGHTUD_BITS_SEG)-1))
-+#define lightudlo(u) \
-+  ((u) & (((uint64_t)1 << LJ_LIGHTUD_BITS_LO) - 1))
-+#define lightudup(p) \
-+  ((uint32_t)(((p) >> LJ_LIGHTUD_BITS_LO) << (LJ_LIGHTUD_BITS_LO-32)))
-+static LJ_AINLINE void *lightudV(global_State *g, cTValue *o)
-+{
-+  uint64_t u = o->u64;
-+  uint64_t seg = lightudseg(u);
-+  uint32_t *segmap = mref(g->gc.lightudseg, uint32_t);
-+  lua_assert(tvislightud(o));
-+  lua_assert(seg <= g->gc.lightudnum);
-+  return (void *)(((uint64_t)segmap[seg] << 32) | lightudlo(u));
-+}
- #else
--#define lightudV(o)	check_exp(tvislightud(o), gcrefp((o)->gcr, void))
-+#define lightudV(g, o)	check_exp(tvislightud(o), gcrefp((o)->gcr, void))
- #endif
- #define gcV(o)		check_exp(tvisgcv(o), gcval(o))
- #define strV(o)		check_exp(tvisstr(o), &gcval(o)->str)
-@@ -824,7 +851,7 @@
- #define setpriV(o, i)		(setitype((o), (i)))
- #endif
- 
--static LJ_AINLINE void setlightudV(TValue *o, void *p)
-+static LJ_AINLINE void setrawlightudV(TValue *o, void *p)
- {
- #if LJ_GC64
-   o->u64 = (uint64_t)p | (((uint64_t)LJ_TLIGHTUD) << 47);
-@@ -835,24 +862,14 @@
- #endif
- }
- 
--#if LJ_64
--#define checklightudptr(L, p) \
--  (((uint64_t)(p) >> 47) ? (lj_err_msg(L, LJ_ERR_BADLU), NULL) : (p))
--#else
--#define checklightudptr(L, p)	(p)
--#endif
--
--#if LJ_FR2
-+#if LJ_FR2 || LJ_32
- #define contptr(f)		((void *)(f))
- #define setcont(o, f)		((o)->u64 = (uint64_t)(uintptr_t)contptr(f))
--#elif LJ_64
-+#else
- #define contptr(f) \
-   ((void *)(uintptr_t)(uint32_t)((intptr_t)(f) - (intptr_t)lj_vm_asm_begin))
- #define setcont(o, f) \
-   ((o)->u64 = (uint64_t)(void *)(f) - (uint64_t)lj_vm_asm_begin)
--#else
--#define contptr(f)		((void *)(f))
--#define setcont(o, f)		setlightudV((o), contptr(f))
- #endif
- 
- #define tvchecklive(L, o) \
-@@ -978,6 +995,6 @@
- 
- /* Compare two objects without calling metamethods. */
- LJ_FUNC int LJ_FASTCALL lj_obj_equal(cTValue *o1, cTValue *o2);
--LJ_FUNC const void * LJ_FASTCALL lj_obj_ptr(cTValue *o);
-+LJ_FUNC const void * LJ_FASTCALL lj_obj_ptr(global_State *g, cTValue *o);
- 
- #endif
-Index: luajit/src/lj_snap.c
-===================================================================
---- luajit.orig/src/lj_snap.c
-+++ luajit/src/lj_snap.c
-@@ -626,7 +626,12 @@
-   IRType1 t = ir->t;
-   RegSP rs = ir->prev;
-   if (irref_isk(ref)) {  /* Restore constant slot. */
--    lj_ir_kvalue(J->L, o, ir);
-+    if (ir->o == IR_KPTR) {
-+      o->u64 = (uint64_t)(uintptr_t)ir_kptr(ir);
-+    } else {
-+      lua_assert(!(ir->o == IR_KKPTR || ir->o == IR_KNULL));
-+      lj_ir_kvalue(J->L, o, ir);
-+    }
-     return;
-   }
-   if (LJ_UNLIKELY(bloomtest(rfilt, ref)))
-Index: luajit/src/lj_state.c
-===================================================================
---- luajit.orig/src/lj_state.c
-+++ luajit/src/lj_state.c
-@@ -171,6 +171,12 @@
-   lj_mem_freevec(g, g->strhash, g->strmask+1, GCRef);
-   lj_buf_free(g, &g->tmpbuf);
-   lj_mem_freevec(g, tvref(L->stack), L->stacksize, TValue);
-+#if LJ_64
-+  if (mref(g->gc.lightudseg, uint32_t)) {
-+    MSize segnum = g->gc.lightudnum ? (2 << lj_fls(g->gc.lightudnum)) : 2;
-+    lj_mem_freevec(g, mref(g->gc.lightudseg, uint32_t), segnum, uint32_t);
-+  }
-+#endif
-   lua_assert(g->gc.total == sizeof(GG_State));
- #ifndef LUAJIT_USE_SYSMALLOC
-   if (g->allocf == lj_alloc_f)
-Index: luajit/src/lj_strfmt.c
-===================================================================
---- luajit.orig/src/lj_strfmt.c
-+++ luajit/src/lj_strfmt.c
-@@ -393,7 +393,7 @@
-       p = lj_buf_wmem(p, "builtin#", 8);
-       p = lj_strfmt_wint(p, funcV(o)->c.ffid);
-     } else {
--      p = lj_strfmt_wptr(p, lj_obj_ptr(o));
-+      p = lj_strfmt_wptr(p, lj_obj_ptr(G(L), o));
-     }
-     return lj_str_new(L, buf, (size_t)(p - buf));
-   }
diff --git a/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch b/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
index 96e4c9106acf9..ac2a967c974d4 100644
--- a/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
+++ b/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
@@ -1,16 +1,8 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Tue, 17 Nov 2015 16:27:11 +0100
-Subject: Enable debugging symbols in the build
-
----
- src/Makefile | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git src/Makefile src/Makefile
-index 8a38efd..6b73a89 100644
+diff --git a/src/Makefile b/src/Makefile
+index 3a6a432..b606927 100644
 --- a/src/Makefile
 +++ b/src/Makefile
-@@ -54,9 +54,9 @@ CCOPT_arm64=
+@@ -53,9 +53,9 @@ CCOPT_arm64=
  CCOPT_ppc=
  CCOPT_mips=
  #
diff --git a/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch b/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
deleted file mode 100644
index f53e211071063..0000000000000
--- a/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
+++ /dev/null
@@ -1,33 +0,0 @@
---- a/src/jit/bcsave.lua	2018-12-17 19:06:27.215042417 +0100
-+++ b/src/jit/bcsave.lua	2018-12-17 19:17:12.982477945 +0100
-@@ -64,7 +64,7 @@
- 
- local map_arch = {
-   x86 = true, x64 = true, arm = true, arm64 = true, arm64be = true,
--  ppc = true, mips = true, mipsel = true,
-+  ppc = true, ppc64 = true, ppc64le = true, mips = true, mipsel = true,
- }
- 
- local map_os = {
-@@ -200,9 +200,10 @@
- ]]
-   local symname = LJBC_PREFIX..ctx.modname
-   local is64, isbe = false, false
--  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" then
-+  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" or ctx.arch == "ppc64" or ctx.arch == "ppc64le" then
-     is64 = true
--  elseif ctx.arch == "ppc" or ctx.arch == "mips" then
-+  end
-+  if ctx.arch == "ppc" or ctx.arch == "ppc64" or ctx.arch == "mips" then
-     isbe = true
-   end
- 
-@@ -237,7 +238,7 @@
-   hdr.eendian = isbe and 2 or 1
-   hdr.eversion = 1
-   hdr.type = f16(1)
--  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, mips=8, mipsel=8 })[ctx.arch])
-+  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, ppc64=21, ppc64le=21, mips=8, mipsel=8 })[ctx.arch])
-   if ctx.arch == "mips" or ctx.arch == "mipsel" then
-     hdr.flags = f32(0x50001006)
-   end
diff --git a/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch b/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
index 59e1ee72fcbb8..207762263de53 100644
--- a/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
+++ b/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
@@ -1,18 +1,8 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Thu, 19 Nov 2015 16:29:02 +0200
-Subject: Get rid of LUAJIT_VERSION_SYM that changes ABI on every patch release
-
----
- src/lj_dispatch.c | 5 -----
- src/luajit.c      | 2 --
- src/luajit.h      | 3 ---
- 3 files changed, 10 deletions(-)
-
-diff --git src/lj_dispatch.c src/lj_dispatch.c
-index 5d6795f..e865a78 100644
+diff --git a/src/lj_dispatch.c b/src/lj_dispatch.c
+index 7b66be7..6d31a61 100644
 --- a/src/lj_dispatch.c
 +++ b/src/lj_dispatch.c
-@@ -319,11 +319,6 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
+@@ -318,11 +318,6 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
    return 1;  /* OK. */
  }
  
@@ -24,23 +14,22 @@ index 5d6795f..e865a78 100644
  /* -- Hooks --------------------------------------------------------------- */
  
  /* This function can be called asynchronously (e.g. during a signal). */
-diff --git src/luajit.c src/luajit.c
-index 1ca2430..ccf425e 100644
+diff --git a/src/luajit.c b/src/luajit.c
+index 73e29d4..31fdba1 100644
 --- a/src/luajit.c
 +++ b/src/luajit.c
-@@ -516,8 +516,6 @@ static int pmain(lua_State *L)
+@@ -515,7 +515,6 @@ static int pmain(lua_State *L)
+   int argn;
+   int flags = 0;
    globalL = L;
-   if (argv[0] && argv[0][0]) progname = argv[0];
- 
 -  LUAJIT_VERSION_SYM();  /* Linker-enforced version check. */
--
+ 
    argn = collectargs(argv, &flags);
    if (argn < 0) {  /* Invalid args? */
-     print_usage();
-diff --git src/luajit.h src/luajit.h
-index 708a5a1..35ae02c 100644
---- a/src/luajit.h
-+++ b/src/luajit.h
+diff --git a/src/luajit_rolling.h b/src/luajit_rolling.h
+index e564477..1c7c142 100644
+--- a/src/luajit_rolling.h
++++ b/src/luajit_rolling.h
 @@ -73,7 +73,4 @@ LUA_API void luaJIT_profile_stop(lua_State *L);
  LUA_API const char *luaJIT_profile_dumpstack(lua_State *L, const char *fmt,
  					     int depth, size_t *len);
diff --git a/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch b/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch
deleted file mode 100644
index aedaacbaaea46..0000000000000
--- a/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch
+++ /dev/null
@@ -1,21 +0,0 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Wed, 11 Oct 2017 08:42:41 +0000
-Subject: Make ccall_copy_struct static to unpollute global library namespace
-
----
- src/lj_ccall.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git src/lj_ccall.c src/lj_ccall.c
-index b891591..a7dcc1b 100644
---- a/src/lj_ccall.c
-+++ b/src/lj_ccall.c
-@@ -960,7 +960,7 @@ noth:  /* Not a homogeneous float/double aggregate. */
-   return 0;  /* Struct is in GPRs. */
- }
- 
--void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft)
-+static void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft)
- {
-   if (LJ_ABI_SOFTFP ? ft :
-       ((ft & 3) == FTYPE_FLOAT || (ft >> 2) == FTYPE_FLOAT)) {
diff --git a/srcpkgs/LuaJIT/template b/srcpkgs/LuaJIT/template
index 85449ac3d6f73..d521d528c2a18 100644
--- a/srcpkgs/LuaJIT/template
+++ b/srcpkgs/LuaJIT/template
@@ -1,71 +1,58 @@
 # Template file for 'LuaJIT'
 pkgname=LuaJIT
-version=2.1.0beta3
-revision=2
-_so_version=2.1.0
-_dist_version=${_so_version}-beta3
+version=2.1.1692580715
+revision=1
+_dist_version=2.1.ROLLING
 hostmakedepends="lua52-BitOp"
+build_style=gnu-makefile
 short_desc="Just-In-Time Compiler for Lua"
 maintainer="Orphaned <orphan@voidlinux.org>"
 license="MIT"
 homepage="http://www.luajit.org"
-distfiles="http://luajit.org/download/${pkgname}-${_dist_version}.tar.gz"
-checksum=1ad2e34b111c802f9d0cdf019e986909123237a28c746b21295b63c9e785d9c3
+distfiles="https://repo.or.cz/luajit-2.0.git/snapshot/refs/tags/v${_dist_version}.tar.gz"
+checksum=e4b2e127def9b7c7fa97161c4d2f35d1d273a8c73c8734f97bc54208324e8fea
 
 build_options="lua52compat"
+desc_option_lua52compat="higher compatibility with lua 5.2"
 
-_cross_cc="cc"
-if [ "$XBPS_WORDSIZE" != "$XBPS_TARGET_WORDSIZE" ]; then
-	if [ "${XBPS_MACHINE/-musl/}" = "x86_64" ]; then
-		hostmakedepends+=" cross-i686-linux-musl"
-		_cross_cc="i686-linux-musl-gcc -static"
-	else
-		broken="Host and target wordsize must match"
+_host_cc="cc"
+if [ -n "$CROSS_BUILD" ]; then
+	if [ "$XBPS_WORDSIZE" != "$XBPS_TARGET_WORDSIZE" ]; then
+		if [ "${XBPS_MACHINE%%-*}" = "x86_64" ]; then
+			hostmakedepends+=" cross-i686-linux-musl"
+			_host_cc="i686-linux-musl-gcc -static"
+		else
+			broken="Host and target wordsize must match when not on x86_64"
+		fi
 	fi
-fi
 
-# the ppc64 patchset subtly breaks ppc, needs investigation; for
-# now apply patches conditionally, separately for ppc64 and ppc
-post_patch() {
-	local patchdir
+	make_build_args+=" CROSS=${XBPS_CROSS_TRIPLET}-"
+fi
 
-	case "$XBPS_TARGET_MACHINE" in
-		ppc64*) patchdir="ppc64";;
-		ppc*) patchdir="ppc";;
-		*) return;;
-	esac
+pre_build() {
+	if [ "$build_option_lua52compat" ]; then
+		make_build_args+=" XCFLAGS=-DLUAJIT_ENABLE_LUA52COMPAT"
+	fi
 
-	for i in ${FILESDIR}/patches/${patchdir}/*.patch; do
-		msg_normal "patching: $i\n"
-		patch -sNp1 -i ${i}
-	done
+	# luajit gets its lowest version from this file if not using git
+	printf '%s' "${version##*.}" > "${wrksrc}/.relver"
 }
 
 do_build() {
+	# unset CFLAGS/LDFLAGS, otherwise cross-compilation fails
+	# because they conflict with the makefile's own macros
 	local _cflags=$CFLAGS
 	local _ldflags=$LDFLAGS
-
-	if [ "$CROSS_BUILD" ]; then
-		local cross="CROSS=${XBPS_CROSS_TRIPLET}-"
-	fi
-
-	if [ "$build_option_lua52compat" ]; then
-		local _xcflags="XCFLAGS=-DLUAJIT_ENABLE_LUA52COMPAT"
-	fi
-
 	unset CFLAGS LDFLAGS
-	make ${makejobs} PREFIX=/usr HOST_LUA=lua5.2 HOST_CC="${_cross_cc}" \
+
+	make ${makejobs} PREFIX=/usr HOST_LUA=lua5.2 \
 		HOST_CFLAGS="$XBPS_CFLAGS" HOST_LDFLAGS="$XBPS_LDFLAGS" \
 		TARGET_CFLAGS="${_cflags}" TARGET_LDFLAGS="${_ldflags}" \
-		${_xcflags} ${cross}
+		HOST_CC="${_host_cc}" ${make_build_args}
 }
 
-do_install() {
-	make DPREFIX=${DESTDIR}/usr DESTDIR=${DESTDIR} \
-		INSTALL_SHARE=${DESTDIR}/usr/share PREFIX=/usr install
-
+post_install() {
 	mv ${DESTDIR}/usr/bin/luajit-* ${DESTDIR}/usr/bin/luajit
-	ln -fs libluajit-5.1.so.${_so_version} ${DESTDIR}/usr/lib/libluajit-5.1.so.2
 	vlicense COPYRIGHT
 }
 
diff --git a/srcpkgs/LuaJIT/update b/srcpkgs/LuaJIT/update
index 15613910677c9..96bbbd453c32c 100644
--- a/srcpkgs/LuaJIT/update
+++ b/srcpkgs/LuaJIT/update
@@ -1 +1 @@
-site="http://luajit.org/download.html"
+disabled="impossible to determine given release style and versioning scheme"

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR PATCH] [Updated] LuaJIT: update to 2.1.1692580715, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
@ 2024-01-31  8:17 ` yoshiyoshyosh
  2024-01-31 17:41 ` [PR REVIEW] " Chocimier
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-01-31  8:17 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 2770 bytes --]

There is an updated pull request by yoshiyoshyosh against master on the void-packages repository

https://github.com/yoshiyoshyosh/void-packages luajit-2.1-rolling
https://github.com/void-linux/void-packages/pull/48453

LuaJIT: update to 2.1.1692580715, cleanup.
#### Testing the changes
- I tested the changes in this PR: **briefly**
  - The only lua thing I really run is awesomewm. I built awesomewm against luajit with the build option and everything seems good, but of course any further testing is encouraged.

#### Local build testing
- I built this PR locally for my native architecture (`x86_64-glibc`)
- I built this PR locally for these architectures (if supported. mark crossbuilds):
  - `x86_64-musl`
  - `i686-glibc` (both crossbuild and masterdir)
  - `aarch64-glibc` (crossbuild)
  - `aarch64-musl` (crossbuild)
  - `armv7l-glibc` (crossbuild)

This addresses #48349.

LuaJIT has moved to "rolling releases" on branches in their git repo, which basically means releases are git commits to a `v2.1` branch. Of course, this is incompatible with Void's packaging philosophy. However, there also seems to be a `v2.1` *tag* that was created during the move and has not been updated since. I'm unsure whether this tag is simply meant to mark the start of v2.1 in the new rolling release era, or whether they intend it to be a stable tag that "releases" occasionally get pushed to.
Whatever the case, this tag was "released" in a form they seemingly deem stable enough, which is why I think it is good enough to update to (especially since we'd be moving from a 6-year-old version to a 5-month-old one).

Regarding the version number: in the makefiles there is a `RELVER` macro [that gets set by a `git show` command](https://repo.or.cz/luajit-2.0.git/blob/2090842410e0ba6f81fad310a77bf5432488249a:/src/Makefile#l478). The "canonical" version number in the makefiles then becomes `major.minor.relver`, and the binary and library are installed under this version number. This is the only real patch-level version number we have, so I believe it is what should go in `version`. I ran the same `git show` that they use and baked the result into `version`.
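
As a rough illustration of the `major.minor.relver` scheme, here is a minimal sketch of how the packaged `version` splits back into its parts, using the same `${version##*.}` expansion the template uses to write `.relver`. The variable names `relver` and `majmin` are made up for this example, and the `git show` flags in the comment are my assumption of what upstream runs, not something verified here.

```shell
# Sketch only: split a Void-style version into LuaJIT's components.
# In a git checkout the last component would presumably come from
# something like `git show -s --format=%ct` (assumed flags); release
# tarballs have no .git, so the template writes the value to .relver.
version="2.1.1692580715"

relver="${version##*.}"   # drop everything up to the last dot
majmin="${version%.*}"    # drop the last dot-separated component

printf 'major.minor = %s\n' "$majmin"
printf 'relver      = %s\n' "$relver"
```

Because both expansions are plain POSIX parameter expansion, the split works in `sh` without extra tools.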

I removed all but two of the patches, as they have either been upstreamed into the `v2.1` tag or were for PowerPC, which Void no longer supports. Should we even keep the "enable debug symbols" patch for main repo builds instead of leaving it to `-dbg`? I'm only keeping it because every other distribution does.

I also just cleaned up the template in general; it's more concise and organized IMO while achieving the same thing.

A patch file from https://github.com/void-linux/void-packages/pull/48453.patch is attached

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: github-pr-luajit-2.1-rolling-48453.patch --]
[-- Type: text/x-diff, Size: 158326 bytes --]

From 144bac441c0ad7f77679b614d66fef3fa3a0da26 Mon Sep 17 00:00:00 2001
From: yosh <yosh-git@riseup.net>
Date: Wed, 31 Jan 2024 02:54:09 -0500
Subject: [PATCH] LuaJIT: update to 2.1.1692580715, cleanup.

---
 .../patches/ppc/musl-ppc-secureplt.patch      |   93 -
 .../patches/ppc64/add-ppc64-support.patch     | 3521 -----------------
 .../patches/ppc64/fix-vm-jit-ppc64.patch      |   11 -
 .../aarch64-Fix-exit-stub-patching.patch      |  231 --
 .../aarch64-register-allocation-bug-fix.patch |   29 -
 ...1abec542e6f9851ff2368e7f196b6382a44c.patch |  562 ---
 .../LuaJIT/patches/enable-debug-symbols.patch |   14 +-
 srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch |   33 -
 .../get-rid-of-luajit-version-sym.patch       |   37 +-
 .../patches/unpollute-global-namespace.patch  |   21 -
 srcpkgs/LuaJIT/template                       |   73 +-
 srcpkgs/LuaJIT/update                         |    2 +-
 12 files changed, 47 insertions(+), 4580 deletions(-)
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch

diff --git a/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch b/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
deleted file mode 100644
index 3000ca0ed3d53..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
+++ /dev/null
@@ -1,93 +0,0 @@
-Imported from https://github.com/LuaJIT/LuaJIT/pull/486.
-
-This fixes crashes on ppc-musl, as musl only supports secureplt.
-
---- a/src/lj_dispatch.c
-+++ b/src/lj_dispatch.c
-@@ -56,6 +56,18 @@ static const ASMFunction dispatch_got[] = {
- #undef GOTFUNC
- #endif
- 
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+#include <math.h>
-+LJ_FUNCA_NORET void LJ_FASTCALL lj_ffh_coroutine_wrap_err(lua_State *L,
-+							  lua_State *co);
-+
-+#define GOTFUNC(name)	(ASMFunction)name,
-+static const ASMFunction dispatch_got[] = {
-+  GOTDEF(GOTFUNC)
-+};
-+#undef GOTFUNC
-+#endif
-+
- /* Initialize instruction dispatch table and hot counters. */
- void lj_dispatch_init(GG_State *GG)
- {
-@@ -77,6 +89,9 @@ void lj_dispatch_init(GG_State *GG)
- #if LJ_TARGET_MIPS
-   memcpy(GG->got, dispatch_got, LJ_GOT__MAX*sizeof(ASMFunction *));
- #endif
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+  memcpy(GG->got, dispatch_got, LJ_GOT__MAX*4);
-+#endif
- }
- 
- #if LJ_HASJIT
---- a/src/lj_dispatch.h
-+++ b/src/lj_dispatch.h
-@@ -66,6 +66,21 @@ GOTDEF(GOTENUM)
- };
- #endif
- 
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+/* Need our own global offset table for the dreaded MIPS calling conventions. */
-+#define GOTDEF(_) \
-+  _(floor) _(ceil) _(trunc) _(log) _(log10) _(exp) _(sin) _(cos) _(tan) \
-+  _(asin) _(acos) _(atan) _(sinh) _(cosh) _(tanh) _(frexp) _(modf) _(atan2) \
-+  _(pow) _(fmod) _(ldexp) _(sqrt)
-+
-+enum {
-+#define GOTENUM(name) LJ_GOT_##name,
-+GOTDEF(GOTENUM)
-+#undef GOTENUM
-+  LJ_GOT__MAX
-+};
-+#endif
-+
- /* Type of hot counter. Must match the code in the assembler VM. */
- /* 16 bits are sufficient. Only 0.0015% overhead with maximum slot penalty. */
- typedef uint16_t HotCount;
-@@ -89,7 +104,7 @@ typedef uint16_t HotCount;
- typedef struct GG_State {
-   lua_State L;				/* Main thread. */
-   global_State g;			/* Global state. */
--#if LJ_TARGET_MIPS
-+#if LJ_TARGET_MIPS || (LJ_TARGET_PPC && (LJ_ARCH_BITS == 32))
-   ASMFunction got[LJ_GOT__MAX];		/* Global offset table. */
- #endif
- #if LJ_HASJIT
---- a/src/vm_ppc.dasc
-+++ b/src/vm_ppc.dasc
-@@ -59,7 +59,12 @@
- |.define ENV_OFS,	8
- |.endif
- |.else  // No TOC.
--|.macro blex, target; bl extern target@plt; .endmacro
-+|.macro blex, target
-+|  lwz TMP0, DISPATCH_GOT(target)(DISPATCH)
-+|  mtctr TMP0
-+|  bctrl
-+|  //bl extern target@plt
-+|.endmacro
- |.macro .toc, a, b; .endmacro
- |.endif
- |.macro .tocenv, a, b; .if TOCENV; a, b; .endif; .endmacro
-@@ -448,6 +453,8 @@
- |// Assumes DISPATCH is relative to GL.
- #define DISPATCH_GL(field)	(GG_DISP2G + (int)offsetof(global_State, field))
- #define DISPATCH_J(field)	(GG_DISP2J + (int)offsetof(jit_State, field))
-+#define GG_DISP2GOT		(GG_OFS(got) - GG_OFS(dispatch))
-+#define DISPATCH_GOT(name)	(GG_DISP2GOT + 4*LJ_GOT_##name)
- |
- #define PC2PROTO(field)  ((int)offsetof(GCproto, field)-(int)sizeof(GCproto))
- |
diff --git a/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch b/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
deleted file mode 100644
index 7c865859da923..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
+++ /dev/null
@@ -1,3521 +0,0 @@
-From: "Rodrigo R. Galvao" <rosattig@br.ibm.com>
-Date: Wed, 11 Oct 2017 08:41:47 +0000
-Subject: New patch proposal for PPC64 support
-
- Create a patch for PPC64 support based on 
-https://github.com/LuaJIT/LuaJIT/pull/140.
- It replaces the old patch since this new one is more likely to be merged 
-with luajit upstream.
-
-
-Author: Rodrigo R. Galvao <rosattig@br.ibm.com>
----
- dynasm/dasm_ppc.lua    |    5 +
- src/Makefile           |   11 +-
- src/host/buildvm_asm.c |   16 +-
- src/lj_arch.h          |   18 +-
- src/lj_ccall.c         |  166 ++++++-
- src/lj_ccall.h         |   13 +
- src/lj_ccallback.c     |   68 ++-
- src/lj_ctype.h         |    2 +-
- src/lj_def.h           |    4 +
- src/lj_frame.h         |    9 +
- src/lj_target_ppc.h    |   14 +
- src/vm_ppc.dasc        | 1290 ++++++++++++++++++++++++++++++++----------------
- 12 files changed, 1162 insertions(+), 454 deletions(-)
-
-diff --git dynasm/dasm_ppc.lua dynasm/dasm_ppc.lua
-index f73974d..a4ad70b 100644
---- a/dynasm/dasm_ppc.lua
-+++ b/dynasm/dasm_ppc.lua
-@@ -257,9 +257,11 @@ map_op = {
-   addic_3 =	"30000000RRI",
-   ["addic._3"] = "34000000RRI",
-   addi_3 =	"38000000RR0I",
-+  addil_3 =	"38000000RR0J",
-   li_2 =	"38000000RI",
-   la_2 =	"38000000RD",
-   addis_3 =	"3c000000RR0I",
-+  addisl_3 =	"3c000000RR0J",
-   lis_2 =	"3c000000RI",
-   lus_2 =	"3c000000RU",
-   bc_3 =	"40000000AAK",
-@@ -842,6 +844,9 @@ map_op = {
-   srdi_3 =	op_alias("rldicl_4", function(p)
-     p[4] = p[3]; p[3] = "64-("..p[3]..")"
-   end),
-+  ["srdi._3"] =	op_alias("rldicl._4", function(p)
-+    p[4] = p[3]; p[3] = "64-("..p[3]..")"
-+  end),
-   clrldi_3 =	op_alias("rldicl_4", function(p)
-     p[4] = p[3]; p[3] = "0"
-   end),
-diff --git src/Makefile src/Makefile
-index 6b73a89..cc50bae 100644
---- a/src/Makefile
-+++ b/src/Makefile
-@@ -453,7 +453,16 @@ ifeq (ppc,$(TARGET_LJARCH))
-     DASM_AFLAGS+= -D GPR64
-   endif
-   ifeq (PS3,$(TARGET_SYS))
--    DASM_AFLAGS+= -D PPE -D TOC
-+    DASM_AFLAGS+= -D PPE
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_OPD 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D OPD
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_OPDENV 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D OPDENV
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_ELFV2 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D ELFV2
-   endif
-   ifneq (,$(findstring LJ_ARCH_PPC64 ,$(TARGET_TESTARCH)))
-     DASM_ARCH= ppc64
-diff --git src/host/buildvm_asm.c src/host/buildvm_asm.c
-index ffd1490..6bb995e 100644
---- a/src/host/buildvm_asm.c
-+++ b/src/host/buildvm_asm.c
-@@ -140,18 +140,14 @@ static void emit_asm_wordreloc(BuildCtx *ctx, uint8_t *p, int n,
- #else
- #define TOCPREFIX ""
- #endif
--  if ((ins >> 26) == 16) {
-+  if ((ins >> 26) == 14) {
-+    fprintf(ctx->fp, "\taddi %d,%d,%s\n", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-+  } else if ((ins >> 26) == 15) {
-+    fprintf(ctx->fp, "\taddis %d,%d,%s\n", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-+  } else if ((ins >> 26) == 16) {
-     fprintf(ctx->fp, "\t%s %d, %d, " TOCPREFIX "%s\n",
- 	    (ins & 1) ? "bcl" : "bc", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-   } else if ((ins >> 26) == 18) {
--#if LJ_ARCH_PPC64
--    const char *suffix = strchr(sym, '@');
--    if (suffix && suffix[1] == 'h') {
--      fprintf(ctx->fp, "\taddis 11, 2, %s\n", sym);
--    } else if (suffix && suffix[1] == 'l') {
--      fprintf(ctx->fp, "\tld 12, %s\n", sym);
--    } else
--#endif
-     fprintf(ctx->fp, "\t%s " TOCPREFIX "%s\n", (ins & 1) ? "bl" : "b", sym);
-   } else {
-     fprintf(stderr,
-@@ -250,7 +246,7 @@ void emit_asm(BuildCtx *ctx)
-   int i, rel;
- 
-   fprintf(ctx->fp, "\t.file \"buildvm_%s.dasc\"\n", ctx->dasm_arch);
--#if LJ_ARCH_PPC64
-+#if LJ_ARCH_PPC_ELFV2
-   fprintf(ctx->fp, "\t.abiversion 2\n");
- #endif
-   fprintf(ctx->fp, "\t.text\n");
-diff --git src/lj_arch.h src/lj_arch.h
-index d609b37..53bc651 100644
---- a/src/lj_arch.h
-+++ b/src/lj_arch.h
-@@ -269,10 +269,18 @@
- #if LJ_TARGET_CONSOLE
- #define LJ_ARCH_PPC32ON64	1
- #define LJ_ARCH_NOFFI		1
-+#if LJ_TARGET_PS3
-+#define LJ_ARCH_PPC_OPD		1
-+#endif
- #elif LJ_ARCH_BITS == 64
--#define LJ_ARCH_PPC64		1
--#define LJ_TARGET_GC64		1
-+#define LJ_ARCH_PPC32ON64	1
- #define LJ_ARCH_NOJIT		1	/* NYI */
-+#if _CALL_ELF == 2
-+#define LJ_ARCH_PPC_ELFV2	1
-+#else
-+#define LJ_ARCH_PPC_OPD		1
-+#define LJ_ARCH_PPC_OPDENV	1
-+#endif
- #endif
- 
- #if _ARCH_PWR7
-@@ -423,12 +431,6 @@
- #if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
- #error "No support for PowerPC CPUs without double-precision FPU"
- #endif
--#if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
--#error "No support for little-endian PPC32"
--#endif
--#if LJ_ARCH_PPC64
--#error "No support for PowerPC 64 bit mode (yet)"
--#endif
- #ifdef __NO_FPRS__
- #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
- #endif
-diff --git src/lj_ccall.c src/lj_ccall.c
-index 5c252e5..b891591 100644
---- a/src/lj_ccall.c
-+++ b/src/lj_ccall.c
-@@ -369,21 +369,97 @@
- #elif LJ_TARGET_PPC
- /* -- PPC calling conventions --------------------------------------------- */
- 
-+#if LJ_ARCH_BITS == 64
-+
-+#if LJ_ARCH_PPC_ELFV2
-+
-+#define CCALL_HANDLE_STRUCTRET \
-+  if (sz > 16 && ccall_classify_fp(cts, ctr) <= 0) { \
-+    cc->retref = 1;  /* Return by reference. */ \
-+    cc->gpr[ngpr++] = (GPRArg)dp; \
-+  }
-+
-+#define CCALL_HANDLE_STRUCTRET2 \
-+  int isfp = ccall_classify_fp(cts, ctr); \
-+  int i; \
-+  if (isfp == FTYPE_FLOAT) { \
-+    for (i = 0; i < ctr->size / 4; i++) \
-+      ((float *)dp)[i] = cc->fpr[i]; \
-+  } else if (isfp == FTYPE_DOUBLE) { \
-+    for (i = 0; i < ctr->size / 8; i++) \
-+      ((double *)dp)[i] = cc->fpr[i]; \
-+  } else { \
-+    if (ctr->size < 8 && LJ_BE) { \
-+      sp += 8 - ctr->size; \
-+    } \
-+    memcpy(dp, sp, ctr->size); \
-+  }
-+
-+#else
-+
- #define CCALL_HANDLE_STRUCTRET \
-   cc->retref = 1;  /* Return all structs by reference. */ \
-   cc->gpr[ngpr++] = (GPRArg)dp;
- 
-+#endif
-+
- #define CCALL_HANDLE_COMPLEXRET \
-   /* Complex values are returned in 2 or 4 GPRs. */ \
-   cc->retref = 0;
- 
-+#define CCALL_HANDLE_STRUCTARG
-+
- #define CCALL_HANDLE_COMPLEXRET2 \
--  memcpy(dp, sp, ctr->size);  /* Copy complex from GPRs. */
-+  if (ctr->size == 2*sizeof(float)) {  /* Copy complex float from FPRs. */ \
-+    ((float *)dp)[0] = cc->fpr[0]; \
-+    ((float *)dp)[1] = cc->fpr[1]; \
-+  } else {  /* Copy complex double from FPRs. */ \
-+    ((double *)dp)[0] = cc->fpr[0]; \
-+    ((double *)dp)[1] = cc->fpr[1]; \
-+  }
-+
-+#define CCALL_HANDLE_COMPLEXARG \
-+  isfp = 1; \
-+  if (d->size == sizeof(float) * 2) { \
-+    d = ctype_get(cts, CTID_COMPLEX_DOUBLE); \
-+    isf32 = 1; \
-+  }
-+
-+#define CCALL_HANDLE_REGARG \
-+  if (isfp && d->size == sizeof(float)) { \
-+    d = ctype_get(cts, CTID_DOUBLE); \
-+    isf32 = 1; \
-+  } \
-+  if (ngpr < maxgpr) { \
-+   dp = &cc->gpr[ngpr]; \
-+   ngpr += n; \
-+   if (ngpr > maxgpr) { \
-+     nsp += ngpr - 8; \
-+     ngpr = 8; \
-+     if (nsp > CCALL_MAXSTACK) { \
-+       goto err_nyi; \
-+     } \
-+   } \
-+   goto done; \
-+  }
-+
-+#else
-+
-+#define CCALL_HANDLE_STRUCTRET \
-+  cc->retref = 1;  /* Return all structs by reference. */ \
-+  cc->gpr[ngpr++] = (GPRArg)dp;
-+
-+#define CCALL_HANDLE_COMPLEXRET \
-+  /* Complex values are returned in 2 or 4 GPRs. */ \
-+  cc->retref = 0;
- 
- #define CCALL_HANDLE_STRUCTARG \
-   rp = cdataptr(lj_cdata_new(cts, did, sz)); \
-   sz = CTSIZE_PTR;  /* Pass all structs by reference. */
- 
-+#define CCALL_HANDLE_COMPLEXRET2 \
-+  memcpy(dp, sp, ctr->size);  /* Copy complex from GPRs. */
-+
- #define CCALL_HANDLE_COMPLEXARG \
-   /* Pass complex by value in 2 or 4 GPRs. */
- 
-@@ -410,6 +486,8 @@
-     } \
-   }
- 
-+#endif
-+
- #define CCALL_HANDLE_RET \
-   if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
-     ctr = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */
-@@ -801,6 +879,50 @@ noth:  /* Not a homogeneous float/double aggregate. */
- 
- #endif
- 
-+/* -- PowerPC64 ELFv2 ABI struct classification ------------------- */
-+
-+#if LJ_ARCH_PPC_ELFV2
-+
-+#define FTYPE_FLOAT	1
-+#define FTYPE_DOUBLE	2
-+
-+static unsigned int ccall_classify_fp(CTState *cts, CType *ct) {
-+  if (ctype_isfp(ct->info)) {
-+    if (ct->size == sizeof(float))
-+      return FTYPE_FLOAT;
-+    else
-+      return FTYPE_DOUBLE;
-+  } else if (ctype_iscomplex(ct->info)) {
-+    if (ct->size == sizeof(float) * 2)
-+      return FTYPE_FLOAT;
-+    else
-+      return FTYPE_DOUBLE;
-+  } else if (ctype_isstruct(ct->info)) {
-+    int res = -1;
-+    int sz = ct->size;
-+    while (ct->sib) {
-+      ct = ctype_get(cts, ct->sib);
-+      if (ctype_isfield(ct->info)) {
-+        int sub = ccall_classify_fp(cts, ctype_rawchild(cts, ct));
-+        if (res == -1)
-+          res = sub;
-+        if (sub != -1 && sub != res)
-+          return 0;
-+      } else if (ctype_isbitfield(ct->info) ||
-+        ctype_isxattrib(ct->info, CTA_SUBTYPE)) {
-+        return 0;
-+      }
-+    }
-+    if (res > 0 && sz > res * 4 * 8)
-+      return 0;
-+    return res;
-+  } else {
-+    return 0;
-+  }
-+}
-+
-+#endif
-+
- /* -- MIPS64 ABI struct classification ---------------------------- */
- 
- #if LJ_TARGET_MIPS64
-@@ -974,6 +1096,9 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-     CTSize sz;
-     MSize n, isfp = 0, isva = 0;
-     void *dp, *rp = NULL;
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    int isf32 = 0;
-+#endif
- 
-     if (fid) {  /* Get argument type from field. */
-       CType *ctf = ctype_get(cts, fid);
-@@ -1030,7 +1155,37 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-       *(void **)dp = rp;
-       dp = rp;
-     }
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64 && LJ_BE
-+    if (ctype_isstruct(d->info) && sz < CTSIZE_PTR) {
-+      dp = (char *)dp + (CTSIZE_PTR - sz);
-+    }
-+#endif
-     lj_cconv_ct_tv(cts, d, (uint8_t *)dp, o, CCF_ARG(narg));
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if (isfp) {
-+      int i;
-+      for (i = 0; i < d->size / 8 && nfpr < CCALL_NARG_FPR; i++)
-+        cc->fpr[nfpr++] = ((double *)dp)[i];
-+    }
-+    if (isf32) {
-+      int i;
-+      for (i = 0; i < d->size / 8; i++)
-+        ((float *)dp)[i*2] = ((double *)dp)[i];
-+    }
-+#endif
-+#if LJ_ARCH_PPC_ELFV2
-+    if (ctype_isstruct(d->info)) {
-+      isfp = ccall_classify_fp(cts, d);
-+      int i;
-+      if (isfp == FTYPE_FLOAT) {
-+        for (i = 0; i < d->size / 4 && nfpr < CCALL_NARG_FPR; i++)
-+          cc->fpr[nfpr++] = ((float *)dp)[i];
-+      } else if (isfp == FTYPE_DOUBLE) {
-+        for (i = 0; i < d->size / 8 && nfpr < CCALL_NARG_FPR; i++)
-+          cc->fpr[nfpr++] = ((double *)dp)[i];
-+      }
-+    }
-+#endif
-     /* Extend passed integers to 32 bits at least. */
-     if (ctype_isinteger_or_bool(d->info) && d->size < 4) {
-       if (d->info & CTF_UNSIGNED)
-@@ -1044,6 +1199,15 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-     if (isfp && d->size == sizeof(float))
-       ((float *)dp)[1] = ((float *)dp)[0];  /* Floats occupy high slot. */
- #endif
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if ((ctype_isinteger_or_bool(d->info) || ctype_isenum(d->info))
-+	&& d->size <= 4) {
-+      if (d->info & CTF_UNSIGNED)
-+	*(uint64_t *)dp = (uint64_t)*(uint32_t *)dp;
-+      else
-+        *(int64_t *)dp = (int64_t)*(int32_t *)dp;
-+    }
-+#endif
- #if LJ_TARGET_MIPS64 || (LJ_TARGET_ARM64 && LJ_BE)
-     if ((ctype_isinteger_or_bool(d->info) || ctype_isenum(d->info)
- #if LJ_TARGET_MIPS64
-diff --git src/lj_ccall.h src/lj_ccall.h
-index 59f6648..bbf309f 100644
---- a/src/lj_ccall.h
-+++ b/src/lj_ccall.h
-@@ -86,10 +86,23 @@ typedef union FPRArg {
- #elif LJ_TARGET_PPC
- 
- #define CCALL_NARG_GPR		8
-+#if LJ_ARCH_BITS == 64
-+#define CCALL_NARG_FPR		13
-+#if LJ_ARCH_PPC_ELFV2
-+#define CCALL_NRET_GPR		2
-+#define CCALL_NRET_FPR		8
-+#define CCALL_SPS_EXTRA		14
-+#else
-+#define CCALL_NRET_GPR		1
-+#define CCALL_NRET_FPR		2
-+#define CCALL_SPS_EXTRA		16
-+#endif
-+#else
- #define CCALL_NARG_FPR		8
- #define CCALL_NRET_GPR		4	/* For complex double. */
- #define CCALL_NRET_FPR		1
- #define CCALL_SPS_EXTRA		4
-+#endif
- #define CCALL_SPS_FREE		0
- 
- typedef intptr_t GPRArg;
-diff --git src/lj_ccallback.c src/lj_ccallback.c
-index 846827b..eb7f445 100644
---- a/src/lj_ccallback.c
-+++ b/src/lj_ccallback.c
-@@ -61,8 +61,24 @@ static MSize CALLBACK_OFS2SLOT(MSize ofs)
- 
- #elif LJ_TARGET_PPC
- 
-+#if LJ_ARCH_PPC_OPD
-+
-+#define CALLBACK_SLOT2OFS(slot)		(24*(slot))
-+#define CALLBACK_OFS2SLOT(ofs)		((ofs)/24)
-+#define CALLBACK_MAX_SLOT		(CALLBACK_OFS2SLOT(CALLBACK_MCODE_SIZE))
-+
-+#elif LJ_ARCH_PPC_ELFV2
-+
-+#define CALLBACK_SLOT2OFS(slot)		(4*(slot))
-+#define CALLBACK_OFS2SLOT(ofs)		((ofs)/4)
-+#define CALLBACK_MAX_SLOT		(CALLBACK_MCODE_SIZE/4 - 10)
-+
-+#else
-+
- #define CALLBACK_MCODE_HEAD		24
- 
-+#endif
-+
- #elif LJ_TARGET_MIPS32
- 
- #define CALLBACK_MCODE_HEAD		20
-@@ -188,24 +204,59 @@ static void callback_mcode_init(global_State *g, uint32_t *page)
-   lua_assert(p - page <= CALLBACK_MCODE_SIZE);
- }
- #elif LJ_TARGET_PPC
-+#if LJ_ARCH_PPC_OPD
-+register void *vm_toc __asm__("r2");
-+static void callback_mcode_init(global_State *g, uint64_t *page)
-+{
-+  uint64_t *p = page;
-+  void *target = (void *)lj_vm_ffi_callback;
-+  MSize slot;
-+  for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
-+    *p++ = (uint64_t)target;
-+    *p++ = (uint64_t)vm_toc;
-+    *p++ = (uint64_t)g | ((uint64_t)slot << 47);
-+  }
-+  lua_assert(p - page <= CALLBACK_MCODE_SIZE / 8);
-+}
-+#else
- static void callback_mcode_init(global_State *g, uint32_t *page)
- {
-   uint32_t *p = page;
-   void *target = (void *)lj_vm_ffi_callback;
-   MSize slot;
-+#if LJ_ARCH_PPC_ELFV2
-+  // Needs to be in sync with lj_vm_ffi_callback.
-+  lua_assert(CALLBACK_MCODE_SIZE == 4096);
-+  for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
-+    *p = PPCI_B | (((page+CALLBACK_MAX_SLOT-p) & 0x00ffffffu) << 2);
-+    p++;
-+  }
-+  *p++ = PPCI_LI | PPCF_T(RID_SYS1) | ((((intptr_t)target) >> 32) & 0xffff);
-+  *p++ = PPCI_LI | PPCF_T(RID_R11) | ((((intptr_t)g) >> 32) & 0xffff);
-+  *p++ = PPCI_RLDICR | PPCF_T(RID_SYS1) | PPCF_A(RID_SYS1) | PPCF_SH(32) | PPCF_M6(63-32);  /* sldi */
-+  *p++ = PPCI_RLDICR | PPCF_T(RID_R11) | PPCF_A(RID_R11) | PPCF_SH(32) | PPCF_M6(63-32);  /* sldi */
-+  *p++ = PPCI_ORIS | PPCF_A(RID_SYS1) | PPCF_T(RID_SYS1) | ((((intptr_t)target) >> 16) & 0xffff);
-+  *p++ = PPCI_ORIS | PPCF_A(RID_R11) | PPCF_T(RID_R11) | ((((intptr_t)g) >> 16) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_SYS1) | PPCF_T(RID_SYS1) | (((intptr_t)target) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_R11) | PPCF_T(RID_R11) | (((intptr_t)g) & 0xffff);
-+  *p++ = PPCI_MTCTR | PPCF_T(RID_SYS1);
-+  *p++ = PPCI_BCTR;
-+#else
-   *p++ = PPCI_LIS | PPCF_T(RID_TMP) | (u32ptr(target) >> 16);
--  *p++ = PPCI_LIS | PPCF_T(RID_R12) | (u32ptr(g) >> 16);
-+  *p++ = PPCI_LIS | PPCF_T(RID_R11) | (u32ptr(g) >> 16);
-   *p++ = PPCI_ORI | PPCF_A(RID_TMP)|PPCF_T(RID_TMP) | (u32ptr(target) & 0xffff);
--  *p++ = PPCI_ORI | PPCF_A(RID_R12)|PPCF_T(RID_R12) | (u32ptr(g) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_R11)|PPCF_T(RID_R11) | (u32ptr(g) & 0xffff);
-   *p++ = PPCI_MTCTR | PPCF_T(RID_TMP);
-   *p++ = PPCI_BCTR;
-   for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
--    *p++ = PPCI_LI | PPCF_T(RID_R11) | slot;
-+    *p++ = PPCI_LI | PPCF_T(RID_R12) | slot;
-     *p = PPCI_B | (((page-p) & 0x00ffffffu) << 2);
-     p++;
-   }
--  lua_assert(p - page <= CALLBACK_MCODE_SIZE);
-+#endif
-+  lua_assert(p - page <= CALLBACK_MCODE_SIZE / 4);
- }
-+#endif
- #elif LJ_TARGET_MIPS
- static void callback_mcode_init(global_State *g, uint32_t *page)
- {
-@@ -641,6 +692,15 @@ static void callback_conv_result(CTState *cts, lua_State *L, TValue *o)
- 	*(int32_t *)dp = ctr->size == 1 ? (int32_t)*(int8_t *)dp :
- 					  (int32_t)*(int16_t *)dp;
-     }
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if (ctr->size <= 4 &&
-+       (ctype_isinteger_or_bool(ctr->info) || ctype_isenum(ctr->info))) {
-+      if (ctr->info & CTF_UNSIGNED)
-+        *(uint64_t *)dp = (uint64_t)*(uint32_t *)dp;
-+      else
-+        *(int64_t *)dp = (int64_t)*(int32_t *)dp;
-+    }
-+#endif
- #if LJ_TARGET_MIPS64 || (LJ_TARGET_ARM64 && LJ_BE)
-     /* Always sign-extend results to 64 bits. Even a soft-fp 'float'. */
-     if (ctr->size <= 4 &&
-diff --git src/lj_ctype.h src/lj_ctype.h
-index 0c220a8..105865b 100644
---- a/src/lj_ctype.h
-+++ b/src/lj_ctype.h
-@@ -153,7 +153,7 @@ typedef struct CType {
- 
- /* Simplify target-specific configuration. Checked in lj_ccall.h. */
- #define CCALL_MAX_GPR		8
--#define CCALL_MAX_FPR		8
-+#define CCALL_MAX_FPR		14
- 
- typedef LJ_ALIGN(8) union FPRCBArg { double d; float f[2]; } FPRCBArg;
- 
-diff --git src/lj_def.h src/lj_def.h
-index 2d8fff6..381d6f5 100644
---- a/src/lj_def.h
-+++ b/src/lj_def.h
-@@ -71,7 +71,11 @@ typedef unsigned int uintptr_t;
- #define LJ_MAX_IDXCHAIN	100		/* __index/__newindex chain limit. */
- #define LJ_STACK_EXTRA	(5+2*LJ_FR2)	/* Extra stack space (metamethods). */
- 
-+#if defined(__powerpc64__) && _CALL_ELF != 2
-+#define LJ_NUM_CBPAGE	4		/* Number of FFI callback pages. */
-+#else
- #define LJ_NUM_CBPAGE	1		/* Number of FFI callback pages. */
-+#endif
- 
- /* Minimum table/buffer sizes. */
- #define LJ_MIN_GLOBAL	6		/* Min. global table size (hbits). */
-diff --git src/lj_frame.h src/lj_frame.h
-index 19c49a4..c666418 100644
---- a/src/lj_frame.h
-+++ b/src/lj_frame.h
-@@ -210,6 +210,15 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
- #define CFRAME_OFS_MULTRES	408
- #define CFRAME_SIZE		384
- #define CFRAME_SHIFT_MULTRES	3
-+#elif LJ_ARCH_PPC_ELFV2
-+#define CFRAME_OFS_ERRF		360
-+#define CFRAME_OFS_NRES		356
-+#define CFRAME_OFS_PREV		336
-+#define CFRAME_OFS_L		352
-+#define CFRAME_OFS_PC		348
-+#define CFRAME_OFS_MULTRES	344
-+#define CFRAME_SIZE		368
-+#define CFRAME_SHIFT_MULTRES	3
- #elif LJ_ARCH_PPC32ON64
- #define CFRAME_OFS_ERRF		472
- #define CFRAME_OFS_NRES		468
-diff --git src/lj_target_ppc.h src/lj_target_ppc.h
-index c5c991a..f0c8c94 100644
---- a/src/lj_target_ppc.h
-+++ b/src/lj_target_ppc.h
-@@ -30,8 +30,13 @@ enum {
- 
-   /* Calling conventions. */
-   RID_RET = RID_R3,
-+#if LJ_LE
-+  RID_RETHI = RID_R4,
-+  RID_RETLO = RID_R3,
-+#else
-   RID_RETHI = RID_R3,
-   RID_RETLO = RID_R4,
-+#endif
-   RID_FPRET = RID_F1,
- 
-   /* These definitions must match with the *.dasc file(s): */
-@@ -131,6 +136,8 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define PPCF_C(r)	((r) << 6)
- #define PPCF_MB(n)	((n) << 6)
- #define PPCF_ME(n)	((n) << 1)
-+#define PPCF_SH(n)	((((n) & 31) << (11+1)) | (((n) & 32) >> (5-1)))
-+#define PPCF_M6(n)	((((n) & 31) << (5+1)) | (((n) & 32) << (11-5)))
- #define PPCF_Y		0x00200000
- #define PPCF_DOT	0x00000001
- 
-@@ -200,6 +207,13 @@ typedef enum PPCIns {
-   PPCI_RLWINM = 0x54000000,
-   PPCI_RLWIMI = 0x50000000,
- 
-+  PPCI_RLDICL = 0x78000000,
-+  PPCI_RLDICR = 0x78000004,
-+  PPCI_RLDIC = 0x78000008,
-+  PPCI_RLDIMI = 0x7800000c,
-+  PPCI_RLDCL = 0x78000010,
-+  PPCI_RLDCR = 0x78000012,
-+
-   PPCI_B = 0x48000000,
-   PPCI_BL = 0x48000001,
-   PPCI_BC = 0x40800000,
-diff --git src/vm_ppc.dasc src/vm_ppc.dasc
-index b4260eb..abb381e 100644
---- a/src/vm_ppc.dasc
-+++ b/src/vm_ppc.dasc
-@@ -22,35 +22,40 @@
- |// GPR64   64 bit registers (but possibly 32 bit pointers, e.g. PS3).
- |//         Affects reg saves, stack layout, carry/overflow/dot flags etc.
- |// FRAME32 Use 32 bit frame layout, even with GPR64 (Xbox 360).
--|// TOC     Need table of contents (64 bit or 32 bit variant, e.g. PS3).
-+|// OPD     Need function descriptors (64 bit or 32 bit variant, e.g. PS3).
- |//         Function pointers are really a struct: code, TOC, env (optional).
--|// TOCENV  Function pointers have an environment pointer, too (not on PS3).
-+|// OPDENV  Function pointers have an environment pointer, too (not on PS3).
-+|// ELFV2   The 64-bit ELF V2 ABI is in use.
- |// PPE     Power Processor Element of Cell (PS3) or Xenon (Xbox 360).
- |//         Must avoid (slow) micro-coded instructions.
- |
- |.if P64
--|.define TOC, 1
--|.define TOCENV, 1
- |.macro lpx, a, b, c; ldx a, b, c; .endmacro
- |.macro lp, a, b; ld a, b; .endmacro
- |.macro stp, a, b; std a, b; .endmacro
-+|.macro stpx, a, b, c; stdx a, b, c; .endmacro
- |.define decode_OPP, decode_OP8
--|.if FFI
--|// Missing: Calling conventions, 64 bit regs, TOC.
--|.error lib_ffi not yet implemented for PPC64
--|.endif
-+|.define PSIZE, 8
- |.else
- |.macro lpx, a, b, c; lwzx a, b, c; .endmacro
- |.macro lp, a, b; lwz a, b; .endmacro
- |.macro stp, a, b; stw a, b; .endmacro
-+|.macro stpx, a, b, c; stwx a, b, c; .endmacro
- |.define decode_OPP, decode_OP4
-+|.define PSIZE, 4
- |.endif
- |
- |// Convenience macros for TOC handling.
--|.if TOC
-+|.if OPD or ELFV2
- |// Linker needs a TOC patch area for every external call relocation.
--|.macro blex, target; bl extern target@plt; nop; .endmacro
-+|.macro blex, target; bl extern target; nop; .endmacro
- |.macro .toc, a, b; a, b; .endmacro
-+|.else
-+|.macro blex, target; bl extern target@plt; .endmacro
-+|.macro .toc, a, b; .endmacro
-+|.endif
-+|.if OPD
-+|.macro .opd, a, b; a, b; .endmacro
- |.if P64
- |.define TOC_OFS,	 8
- |.define ENV_OFS,	16
-@@ -58,13 +63,13 @@
- |.define TOC_OFS,	4
- |.define ENV_OFS,	8
- |.endif
--|.else  // No TOC.
--|.macro blex, target; bl extern target@plt; .endmacro
--|.macro .toc, a, b; .endmacro
-+|.else  // No OPD.
-+|.macro .opd, a, b; .endmacro
- |.endif
--|.macro .tocenv, a, b; .if TOCENV; a, b; .endif; .endmacro
-+|.macro .opdenv, a, b; .if OPDENV; a, b; .endif; .endmacro
- |
- |.macro .gpr64, a, b; .if GPR64; a, b; .endif; .endmacro
-+|.macro .elfv2, a, b; .if ELFV2; a, b; .endif; .endmacro
- |
- |.macro andix., y, a, i
- |.if PPE
-@@ -75,29 +80,6 @@
- |.endif
- |.endmacro
- |
--|.macro clrso, reg
--|.if PPE
--|  li reg, 0
--|  mtxer reg
--|.else
--|  mcrxr cr0
--|.endif
--|.endmacro
--|
--|.macro checkov, reg, noov
--|.if PPE
--|  mfxer reg
--|  add reg, reg, reg
--|  cmpwi reg, 0
--|   li reg, 0
--|   mtxer reg
--|  bgey noov
--|.else
--|  mcrxr cr0
--|  bley noov
--|.endif
--|.endmacro
--|
- |//-----------------------------------------------------------------------
- |
- |// Fixed register assignments for the interpreter.
-@@ -111,6 +93,7 @@
- |.define LREG,		r18	// Register holding lua_State (also in SAVE_L).
- |.define MULTRES,	r19	// Size of multi-result: (nresults+1)*8.
- |.define JGL,		r31	// On-trace: global_State + 32768.
-+|.define BASEP4,	r25	// Equal to BASE + 4
- |
- |// Constants for type-comparisons, stores and conversions. C callee-save.
- |.define TISNUM,	r22
-@@ -143,12 +126,19 @@
- |
- |.define FARG1,		f1
- |.define FARG2,		f2
-+|.define FARG3,		f3
-+|.define FARG4,		f4
-+|.define FARG5,		f5
-+|.define FARG6,		f6
-+|.define FARG7,		f7
-+|.define FARG8,		f8
- |
- |.define CRET1,		r3
- |.define CRET2,		r4
- |
- |.define TOCREG,	r2	// TOC register (only used by C code).
- |.define ENVREG,	r11	// Environment pointer (nested C functions).
-+|.define FUNCREG,	r12	// ELFv2 function pointer (overlaps RD)
- |
- |// Stack layout while in interpreter. Must match with lj_frame.h.
- |.if GPR64
-@@ -182,6 +172,49 @@
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
- |
-+|.elif ELFV2
-+|
-+|//			392(sp) // \ 32 bit C frame info.
-+|.define SAVE_LR,	384(sp)
-+|.define SAVE_CR,	376(sp) // 64 bit CR save.
-+|.define CFRAME_SPACE,	368     // Delta for sp.
-+|// Back chain for sp:	368(sp) <-- sp entering interpreter
-+|.define SAVE_ERRF,	360(sp) // |
-+|.define SAVE_NRES,	356(sp) // |
-+|.define SAVE_L,	352(sp) //  > Parameter save area.
-+|.define SAVE_PC,	348(sp) // |
-+|.define SAVE_MULTRES,	344(sp) // |
-+|.define SAVE_CFRAME,	336(sp) // / 64 bit C frame chain.
-+|.define SAVE_FPR_,	192     // .. 192+18*8: 64 bit FPR saves.
-+|.define SAVE_GPR_,	48      // .. 48+18*8: 64 bit GPR saves.
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	44(sp)
-+|.define TMPD_LO,	40(sp)
-+|.define TONUM_HI,	36(sp)
-+|.define TONUM_LO,	32(sp)
-+|.else
-+|.define TMPD_LO,	44(sp)
-+|.define TMPD_HI,	40(sp)
-+|.define TONUM_LO,	36(sp)
-+|.define TONUM_HI,	32(sp)
-+|.endif
-+|.define SAVE_TOC,	24(sp)  // TOC save area.
-+|// Next frame lr:	16(sp)
-+|// Next frame cr:	8(sp)
-+|// Back chain for sp:	0(sp)	<-- sp while in interpreter
-+|
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	32(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
-+|.define TMPD_BLO,	39(sp)
-+|.define TMPD,		TMPD_HI
-+|.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	32
-+|
- |.else
- |
- |//			508(sp) // \ 32 bit C frame info.
-@@ -192,23 +225,39 @@
- |.define SAVE_MULTRES,	456(sp) // |
- |.define SAVE_CFRAME,	448(sp) // / 64 bit C frame chain.
- |.define SAVE_LR,	416(sp)
-+|.define SAVE_CR,	408(sp)  // 64 bit CR save.
- |.define CFRAME_SPACE,	400     // Delta for sp.
- |// Back chain for sp:	400(sp) <-- sp entering interpreter
- |.define SAVE_FPR_,	256     // .. 256+18*8: 64 bit FPR saves.
- |.define SAVE_GPR_,	112     // .. 112+18*8: 64 bit GPR saves.
- |//			48(sp)  // Callee parameter save area (ABI mandated).
- |.define SAVE_TOC,	40(sp)  // TOC save area.
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	36(sp)  // \ Link editor temp (ABI mandated).
-+|.define TMPD_LO,	32(sp)  // /
-+|.define TONUM_HI,	28(sp)  // \ Compiler temp (ABI mandated).
-+|.define TONUM_LO,	24(sp)  // /
-+|.else
- |.define TMPD_LO,	36(sp)  // \ Link editor temp (ABI mandated).
- |.define TMPD_HI,	32(sp)  // /
- |.define TONUM_LO,	28(sp)  // \ Compiler temp (ABI mandated).
- |.define TONUM_HI,	24(sp)  // /
-+|.endif
- |// Next frame lr:	16(sp)
--|.define SAVE_CR,	8(sp)  // 64 bit CR save.
-+|// Next frame cr:	8(sp)
- |// Back chain for sp:	0(sp)	<-- sp while in interpreter
- |
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	32(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
- |.define TMPD_BLO,	39(sp)
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	112
- |
- |.endif
- |.else
-@@ -226,16 +275,31 @@
- |.define SAVE_PC,	32(sp)
- |.define SAVE_MULTRES,	28(sp)
- |.define UNUSED1,	24(sp)
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	20(sp)
-+|.define TMPD_LO,	16(sp)
-+|.define TONUM_HI,	12(sp)
-+|.define TONUM_LO,	8(sp)
-+|.else
- |.define TMPD_LO,	20(sp)
- |.define TMPD_HI,	16(sp)
- |.define TONUM_LO,	12(sp)
- |.define TONUM_HI,	8(sp)
-+|.endif
- |// Next frame lr:	4(sp)
- |// Back chain for sp:	0(sp)	<-- sp while in interpreter
- |
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	16(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
- |.define TMPD_BLO,	23(sp)
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	16
- |
- |.endif
- |
-@@ -350,8 +414,35 @@
- |//-----------------------------------------------------------------------
- |
- |// Access to frame relative to BASE.
-+|.if ENDIAN_LE
-+|.define FRAME_PC,	-4
-+|.define FRAME_FUNC,	-8
-+|.define FRAME_CONTPC,	-12
-+|.define FRAME_CONTRET,	-16
-+|.define WORD_LO,	0
-+|.define WORD_HI,	4
-+|.define WORD_BLO,	0
-+|.define BASE_LO,	BASE
-+|.define BASE_HI,	BASEP4
-+|.macro lwzux2, hi, lo, base, idx
-+|  lwzux lo, base, idx
-+|  lwz hi, 4(base)
-+|.endmacro
-+|.else
- |.define FRAME_PC,	-8
- |.define FRAME_FUNC,	-4
-+|.define FRAME_CONTPC,	-16
-+|.define FRAME_CONTRET,	-12
-+|.define WORD_LO,	4
-+|.define WORD_HI,	0
-+|.define WORD_BLO,	7
-+|.define BASE_LO,	BASEP4
-+|.define BASE_HI,	BASE
-+|.macro lwzux2, hi, lo, base, idx
-+|  lwzux hi, base, idx
-+|  lwz lo, 4(base)
-+|.endmacro
-+|.endif
- |
- |// Instruction decode.
- |.macro decode_OP4, dst, ins; rlwinm dst, ins, 2, 22, 29; .endmacro
-@@ -412,6 +503,7 @@
- |// Call decode and dispatch.
- |.macro ins_callt
- |  // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC
-+|  addi BASEP4, BASE, 4
- |  lwz PC, LFUNC:RB->pc
- |  lwz INS, 0(PC)
- |   addi PC, PC, 4
-@@ -504,7 +596,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz PC, FRAME_PC(TMP2)		// Fetch PC of previous frame.
-   |  mr BASE, TMP2			// Restore caller base.
-   |  // Prepending may overwrite the pcall frame, so do it at the end.
--  |   stwu TMP1, FRAME_PC(RA)		// Prepend true to results.
-+  |  .if ENDIAN_LE
-+  |    addi RA, RA, -8
-+  |    stw TMP1, WORD_HI(RA)		// Prepend true to results.
-+  |  .else
-+  |    stwu TMP1, -8(RA)		// Prepend true to results.
-+  |  .endif
-   |
-   |->vm_returnc:
-   |  addi RD, RD, 8			// RD = (nresults+1)*8.
-@@ -560,7 +657,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz TMP1, L->maxstack
-   |  cmplw BASE, TMP1
-   |  bge >8
--  |  stw TISNIL, 0(BASE)
-+  |  stw TISNIL, WORD_HI(BASE)
-   |  addi RD, RD, 8
-   |  addi BASE, BASE, 8
-   |  b <2
-@@ -611,7 +708,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vm_unwind_ff_eh:			// Landing pad for external unwinder.
-   |  lwz L, SAVE_L
-   |  .toc ld TOCREG, SAVE_TOC
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |  lp BASE, L->base
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |   lwz DISPATCH, L->glref		// Setup pointer to dispatch table.
-@@ -626,7 +728,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la RA, -8(BASE)			// Results start at BASE-8.
-   |     stw TMP3, TMPD
-   |   addi DISPATCH, DISPATCH, GG_G2DISP
--  |  stw TMP1, 0(RA)			// Prepend false to error message.
-+  |  stw TMP1, WORD_HI(RA)		// Prepend false to error message.
-   |  li RD, 16				// 2 results: false + error message.
-   |    st_vmstate
-   |     lfs TONUM, TMPD
-@@ -687,7 +789,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  mr RA, BASE
-   |   lp BASE, L->base
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |   lp TMP1, L->top
-   |  lwz PC, FRAME_PC(BASE)
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-@@ -737,7 +844,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |3:  // Entry point for vm_cpcall/vm_resume (BASE = base, PC = ftype).
-   |  stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  lp TMP2, L->base			// TMP2 = old base (used in vmeta_call).
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |   lp TMP1, L->top
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |  add PC, PC, BASE
-@@ -757,8 +869,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->vm_call_dispatch:
-   |  // TMP2 = old base, BASE = new base, RC = nargs*8, PC = caller PC
--  |  lwz TMP0, FRAME_PC(BASE)
--  |   lwz LFUNC:RB, FRAME_FUNC(BASE)
-+  |  lwz TMP0, WORD_HI-8(BASE)
-+  |   lwz LFUNC:RB, WORD_LO-8(BASE)
-   |  checkfunc TMP0; bne ->vmeta_call
-   |
-   |->vm_call_dispatch_f:
-@@ -777,7 +889,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |   sub TMP0, TMP0, TMP1		// Compute -savestack(L, L->top).
-   |    lp TMP1, L->cframe
-   |     addi DISPATCH, DISPATCH, GG_G2DISP
--  |  .toc lp CARG4, 0(CARG4)
-+  |  .opd lp TOCREG, TOC_OFS(CARG4)
-+  |  .opdenv lp ENVREG, ENV_OFS(CARG4)
-+  |  .opd lp CARG4, 0(CARG4)
-   |  li TMP2, 0
-   |   stw TMP0, SAVE_NRES		// Neg. delta means cframe w/o frame.
-   |  stw TMP2, SAVE_ERRF		// No error function.
-@@ -785,7 +899,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |    stp sp, L->cframe		// Add our C frame to cframe chain.
-   |     stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  mtctr CARG4
-+  |  .elfv2 mr FUNCREG, CARG4
-   |  bctrl			// (lua_State *L, lua_CFunction func, void *ud)
-+  |  .toc lp TOCREG, SAVE_TOC
-   |.if PPE
-   |  mr BASE, CRET1
-   |  cmpwi CRET1, 0
-@@ -807,20 +923,27 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->cont_dispatch:
-   |  // BASE = meta base, RA = resultptr, RD = (nresults+1)*8
--  |  lwz TMP0, -12(BASE)		// Continuation.
-+  |  lwz TMP0, FRAME_CONTRET(BASE)	// Continuation.
-   |   mr RB, BASE
-   |   mr BASE, TMP2			// Restore caller BASE.
-   |    lwz LFUNC:TMP1, FRAME_FUNC(TMP2)
-   |.if FFI
-   |  cmplwi TMP0, 1
-   |.endif
--  |     lwz PC, -16(RB)			// Restore PC from [cont|PC].
--  |   subi TMP2, RD, 8
-+  |     lwz PC, FRAME_CONTPC(RB)	// Restore PC from [cont|PC].
-+  |  addi BASEP4, BASE, 4
-+  |   addi TMP2, RD, WORD_HI-8
-   |    lwz TMP1, LFUNC:TMP1->pc
-   |   stwx TISNIL, RA, TMP2		// Ensure one valid arg.
-+  |.if P64
-+  |   ld TMP3, 0(DISPATCH)
-+  |.endif
-   |.if FFI
-   |  ble >1
-   |.endif
-+  |.if P64
-+  |  add TMP0, TMP0, TMP3
-+  |.endif
-   |    lwz KBASE, PC2PROTO(k)(TMP1)
-   |  // BASE = base, RA = resultptr, RB = meta base
-   |  mtctr TMP0
-@@ -856,20 +979,20 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TSTR
-   |   decode_RB8 RB, INS
--  |  stw STR:RC, 4(CARG3)
-+  |  stw STR:RC, WORD_LO(CARG3)
-   |   add CARG2, BASE, RB
--  |  stw TMP0, 0(CARG3)
-+  |  stw TMP0, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tgets:
-   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TTAB
--  |  stw TAB:RB, 4(CARG2)
-+  |  stw TAB:RB, WORD_LO(CARG2)
-   |   la CARG3, DISPATCH_GL(tmptv2)(DISPATCH)
--  |  stw TMP0, 0(CARG2)
-+  |  stw TMP0, WORD_HI(CARG2)
-   |   li TMP1, LJ_TSTR
--  |   stw STR:RC, 4(CARG3)
--  |   stw TMP1, 0(CARG3)
-+  |   stw STR:RC, WORD_LO(CARG3)
-+  |   stw TMP1, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tgetb:			// TMP0 = index
-@@ -880,8 +1003,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |   add CARG2, BASE, RB
-   |.if DUALNUM
--  |  stw TISNUM, 0(CARG3)
--  |  stw TMP0, 4(CARG3)
-+  |  stw TISNUM, WORD_HI(CARG3)
-+  |  stw TMP0, WORD_LO(CARG3)
-   |.else
-   |  stfd f0, 0(CARG3)
-   |.endif
-@@ -909,7 +1032,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // BASE = base, L->top = new base, stack = cont/func/t/k
-   |  subfic TMP1, BASE, FRAME_CONT
-   |  lp BASE, L->top
--  |  stw PC, -16(BASE)			// [cont|PC]
-+  |  stw PC, FRAME_CONTPC(BASE)		// [cont|PC]
-   |   add PC, TMP1, BASE
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
-   |   li NARGS8:RC, 16			// 2 args for func(t, k).
-@@ -923,7 +1046,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f14, 0(CRET1)
-   |  b ->BC_TGETR_Z
-   |1:
--  |  stwx TISNIL, BASE, RA
-+  |  stwx TISNIL, BASE_HI, RA
-   |  b ->cont_nop
-   |
-   |//-----------------------------------------------------------------------
-@@ -932,20 +1055,20 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TSTR
-   |   decode_RB8 RB, INS
--  |  stw STR:RC, 4(CARG3)
-+  |  stw STR:RC, WORD_LO(CARG3)
-   |   add CARG2, BASE, RB
--  |  stw TMP0, 0(CARG3)
-+  |  stw TMP0, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tsets:
-   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TTAB
--  |  stw TAB:RB, 4(CARG2)
-+  |  stw TAB:RB, WORD_LO(CARG2)
-   |   la CARG3, DISPATCH_GL(tmptv2)(DISPATCH)
--  |  stw TMP0, 0(CARG2)
-+  |  stw TMP0, WORD_HI(CARG2)
-   |   li TMP1, LJ_TSTR
--  |   stw STR:RC, 4(CARG3)
--  |   stw TMP1, 0(CARG3)
-+  |   stw STR:RC, WORD_LO(CARG3)
-+  |   stw TMP1, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tsetb:			// TMP0 = index
-@@ -956,8 +1079,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |   add CARG2, BASE, RB
-   |.if DUALNUM
--  |  stw TISNUM, 0(CARG3)
--  |  stw TMP0, 4(CARG3)
-+  |  stw TISNUM, WORD_HI(CARG3)
-+  |  stw TMP0, WORD_LO(CARG3)
-   |.else
-   |  stfd f0, 0(CARG3)
-   |.endif
-@@ -986,7 +1109,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // BASE = base, L->top = new base, stack = cont/func/t/k/(v)
-   |  subfic TMP1, BASE, FRAME_CONT
-   |  lp BASE, L->top
--  |  stw PC, -16(BASE)			// [cont|PC]
-+  |  stw PC, FRAME_CONTPC(BASE)		// [cont|PC]
-   |   add PC, TMP1, BASE
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
-   |   li NARGS8:RC, 24			// 3 args for func(t, k, v)
-@@ -1006,17 +1129,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vmeta_comp:
-   |  mr CARG1, L
-   |   subi PC, PC, 4
--  |.if DUALNUM
--  |  mr CARG2, RA
--  |.else
-   |  add CARG2, BASE, RA
--  |.endif
-   |   stw PC, SAVE_PC
--  |.if DUALNUM
--  |  mr CARG3, RD
--  |.else
-   |  add CARG3, BASE, RD
--  |.endif
-   |   stp BASE, L->base
-   |  decode_OP1 CARG4, INS
-   |  bl extern lj_meta_comp  // (lua_State *L, TValue *o1, *o2, int op)
-@@ -1043,7 +1158,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  b ->cont_nop
-   |
-   |->cont_condt:			// RA = resultptr
--  |  lwz TMP0, 0(RA)
-+  |  lwz TMP0, WORD_HI(RA)
-   |  .gpr64 extsw TMP0, TMP0
-   |  subfic TMP0, TMP0, LJ_TTRUE	// Branch if result is true.
-   |  subfe CRET1, CRET1, CRET1
-@@ -1051,7 +1166,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  b <4
-   |
-   |->cont_condf:			// RA = resultptr
--  |  lwz TMP0, 0(RA)
-+  |  lwz TMP0, WORD_HI(RA)
-   |  .gpr64 extsw TMP0, TMP0
-   |  subfic TMP0, TMP0, LJ_TTRUE	// Branch if result is false.
-   |  subfe CRET1, CRET1, CRET1
-@@ -1103,8 +1218,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |.endif
-   |
-   |->vmeta_unm:
--  |  mr CARG3, RD
--  |  mr CARG4, RD
-+  |  add CARG3, BASE, RD
-+  |  add CARG4, BASE, RD
-   |  b >1
-   |
-   |->vmeta_arith_vn:
-@@ -1139,7 +1254,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vmeta_binop:
-   |  // BASE = old base, CRET1 = new base, stack = cont/func/o1/o2
-   |  sub TMP1, CRET1, BASE
--  |   stw PC, -16(CRET1)		// [cont|PC]
-+  |   stw PC, FRAME_CONTPC(CRET1)	// [cont|PC]
-   |   mr TMP2, BASE
-   |  addi PC, TMP1, FRAME_CONT
-   |   mr BASE, CRET1
-@@ -1150,7 +1265,7 @@ static void build_subroutines(BuildCtx *ctx)
- #if LJ_52
-   |  mr SAVE0, CARG1
- #endif
--  |  mr CARG2, RD
-+  |  add CARG2, BASE, RD
-   |   stp BASE, L->base
-   |  mr CARG1, L
-   |   stw PC, SAVE_PC
-@@ -1227,25 +1342,25 @@ static void build_subroutines(BuildCtx *ctx)
-   |.macro .ffunc_1, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz CARG1, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz CARG1, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |.endmacro
-   |
-   |.macro .ffunc_2, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
--  |    lwz CARG4, 8(BASE)
--  |   lwz CARG1, 4(BASE)
--  |    lwz CARG2, 12(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz CARG4, WORD_HI+8(BASE)
-+  |   lwz CARG1, WORD_LO(BASE)
-+  |    lwz CARG2, WORD_LO+8(BASE)
-   |  blt ->fff_fallback
-   |.endmacro
-   |
-   |.macro .ffunc_n, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1254,9 +1369,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |.macro .ffunc_nn, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, WORD_HI+8(BASE)
-   |    lfd FARG2, 8(BASE)
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1279,9 +1394,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmplw cr1, CARG3, TMP1
-   |    lwz PC, FRAME_PC(BASE)
-   |  bge cr1, ->fff_fallback
--  |   stw CARG3, 0(RA)
-+  |   stw CARG3, WORD_HI(RA)
-   |  addi RD, NARGS8:RC, 8		// Compute (nresults+1)*8.
--  |   stw CARG1, 4(RA)
-+  |   stw CARG1, WORD_LO(RA)
-   |  beq ->fff_res			// Done if exactly 1 argument.
-   |  li TMP1, 8
-   |  subi RC, RC, 8
-@@ -1295,17 +1410,36 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc type
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-   |  blt ->fff_fallback
-   |  .gpr64 extsw CARG1, CARG1
-+  |.if P64
-+  |  li TMP0, LJ_TNUMX
-+  |    srawi TMP3, CARG1, 15
-+  |  subfc TMP1, TMP0, CARG1
-+  |.else
-   |  subfc TMP0, TISNUM, CARG1
--  |  subfe TMP2, CARG1, CARG1
-+  |.endif
-+  |    subfe TMP2, CARG1, CARG1
-+  |.if P64
-+  |  cmpwi TMP3, -2
-+  |    orc TMP1, TMP2, TMP1
-+  |    subf TMP1, TMP0, TMP1
-+  |  beq >1
-+  |.else
-   |  orc TMP1, TMP2, TMP0
--  |  addi TMP1, TMP1, ~LJ_TISNUM+1
-+  |  subf TMP1, TISNUM, TMP1
-+  |.endif
-   |  slwi TMP1, TMP1, 3
-+  |2:
-   |   la TMP2, CFUNC:RB->upvalue
-   |  lfdx FARG1, TMP2, TMP1
-   |  b ->fff_resn
-+  |.if P64
-+  |1:
-+  |  li TMP1, ~LJ_TLIGHTUD<<3
-+  |  b <2
-+  |.endif
-   |
-   |//-- Base library: getters and setters ---------------------------------
-   |
-@@ -1328,10 +1462,10 @@ static void build_subroutines(BuildCtx *ctx)
-   |  sub TMP1, TMP0, TMP1
-   |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-   |3:  // Rearranged logic, because we expect _not_ to find the key.
--  |  lwz CARG4, NODE:TMP2->key
--  |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--  |    lwz CARG2, NODE:TMP2->val
--  |     lwz TMP1, 4+offsetof(Node, val)(NODE:TMP2)
-+  |  lwz CARG4, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+  |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+  |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-+  |     lwz TMP1, WORD_LO+offsetof(Node, val)(NODE:TMP2)
-   |  checkstr CARG4; bne >4
-   |   cmpw TMP0, STR:RC; beq >5
-   |4:
-@@ -1349,14 +1483,33 @@ static void build_subroutines(BuildCtx *ctx)
-   |6:
-   |  cmpwi CARG3, LJ_TUDATA; beq <1
-   |  .gpr64 extsw CARG3, CARG3
-+  |.if P64
-+  |  li TMP0, LJ_TNUMX
-+  |    srawi TMP3, CARG3, 15
-+  |  subfc TMP1, TMP0, CARG3
-+  |.else
-   |  subfc TMP0, TISNUM, CARG3
-+  |.endif
-   |  subfe TMP2, CARG3, CARG3
-+  |.if P64
-+  |  cmpwi TMP3, -2
-+  |    orc TMP1, TMP2, TMP1
-+  |    subf TMP1, TMP0, TMP1
-+  |  beq >7
-+  |.else
-   |  orc TMP1, TMP2, TMP0
--  |  addi TMP1, TMP1, ~LJ_TISNUM+1
-+  |  subf TMP1, TISNUM, TMP1
-+  |.endif
-   |  slwi TMP1, TMP1, 2
-+  |8:
-   |   la TMP2, DISPATCH_GL(gcroot[GCROOT_BASEMT])(DISPATCH)
-   |  lwzx TAB:CARG1, TMP2, TMP1
-   |  b <2
-+  |.if P64
-+  |7:
-+  |  li TMP1, ~LJ_TLIGHTUD<<2
-+  |  b <8
-+  |.endif
-   |
-   |.ffunc_2 setmetatable
-   |  // Fast path: no mt for table yet and not clearing the mt.
-@@ -1374,8 +1527,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc rawget
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG4, 0(BASE)
--  |    lwz TAB:CARG2, 4(BASE)
-+  |   lwz CARG4, WORD_HI(BASE)
-+  |    lwz TAB:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |  checktab CARG4; bne ->fff_fallback
-   |   la CARG3, 8(BASE)
-@@ -1390,7 +1543,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc tonumber
-   |  // Only handles the number case inline (without a base argument).
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Exactly one argument.
-   |   checknum CARG1; bgt ->fff_fallback
-@@ -1425,10 +1578,15 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc next
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
--  |    lwz TAB:CARG2, 4(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-+  |    lwz TAB:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-+  |.if ENDIAN_LE
-+  |   add TMP1, BASE, NARGS8:RC
-+  |   stw TISNIL, WORD_HI(TMP1)		// Set missing 2nd arg to nil.
-+  |.else
-   |   stwx TISNIL, BASE, NARGS8:RC	// Set missing 2nd arg to nil.
-+  |.endif
-   |  checktab CARG1
-   |   lwz PC, FRAME_PC(BASE)
-   |  bne ->fff_fallback
-@@ -1464,18 +1622,18 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f0, CFUNC:RB->upvalue[0]
-   |  la RA, -8(BASE)
- #endif
--  |   stw TISNIL, 8(BASE)
-+  |   stw TISNIL, 8+WORD_HI(BASE)
-   |  li RD, (3+1)*8
-   |  stfd f0, 0(RA)
-   |  b ->fff_res
-   |
-   |.ffunc ipairs_aux
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
--  |    lwz TAB:CARG1, 4(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz TAB:CARG1, WORD_LO(BASE)
-+  |   lwz CARG4, 8+WORD_HI(BASE)
-   |.if DUALNUM
--  |    lwz TMP2, 12(BASE)
-+  |    lwz TMP2, 8+WORD_LO(BASE)
-   |.else
-   |    lfd FARG2, 8(BASE)
-   |.endif
-@@ -1504,16 +1662,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |   la RA, -8(BASE)
-   |  cmplw TMP0, TMP2
-   |.if DUALNUM
--  |  stw TISNUM, 0(RA)
-+  |  stw TISNUM, WORD_HI(RA)
-   |   slwi TMP3, TMP2, 3
--  |  stw TMP2, 4(RA)
-+  |  stw TMP2, WORD_LO(RA)
-   |.else
-   |   slwi TMP3, TMP2, 3
-   |  stfd FARG2, 0(RA)
-   |.endif
-   |  ble >2				// Not in array part?
--  |  lwzx TMP2, TMP1, TMP3
--  |  lfdx f0, TMP1, TMP3
-+  |  lfdux f0, TMP1, TMP3
-+  |  lwz TMP2, WORD_HI(TMP1)
-   |1:
-   |  checknil TMP2
-   |   li RD, (0+1)*8
-@@ -1532,7 +1690,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmplwi CRET1, 0
-   |   li RD, (0+1)*8
-   |  beq ->fff_res
--  |  lwz TMP2, 0(CRET1)
-+  |  lwz TMP2, WORD_HI(CRET1)
-   |  lfd f0, 0(CRET1)
-   |  b <1
-   |
-@@ -1551,11 +1709,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la RA, -8(BASE)
- #endif
-   |.if DUALNUM
--  |  stw TISNUM, 8(BASE)
-+  |  stw TISNUM, 8+WORD_HI(BASE)
-   |.else
--  |  stw ZERO, 8(BASE)
-+  |  stw ZERO, 8+WORD_HI(BASE)
-   |.endif
--  |   stw ZERO, 12(BASE)
-+  |   stw ZERO, 8+WORD_LO(BASE)
-   |  li RD, (3+1)*8
-   |  stfd f0, 0(RA)
-   |  b ->fff_res
-@@ -1576,7 +1734,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc xpcall
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, 8+WORD_HI(BASE)
-   |    lfd FARG2, 8(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  blt ->fff_fallback
-@@ -1673,7 +1831,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if resume
-   |  li TMP1, LJ_TTRUE
-   |   la RA, -8(BASE)
--  |  stw TMP1, -8(BASE)			// Prepend true to results.
-+  |  stw TMP1, WORD_HI-8(BASE)		// Prepend true to results.
-   |  addi RD, RD, 16
-   |.else
-   |  mr RA, BASE
-@@ -1693,7 +1851,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f0, 0(TMP3)
-   |   stp TMP3, L:SAVE0->top		// Remove error from coroutine stack.
-   |    li RD, (2+1)*8
--  |   stw TMP1, -8(BASE)		// Prepend false to results.
-+  |   stw TMP1, WORD_HI-8(BASE)		// Prepend false to results.
-   |    la RA, -8(BASE)
-   |  stfd f0, 0(BASE)			// Copy error message.
-   |  b <7
-@@ -1746,8 +1904,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |->fff_resi:
-   |  lwz PC, FRAME_PC(BASE)
-   |  la RA, -8(BASE)
--  |  stw TISNUM, -8(BASE)
--  |  stw CRET1, -4(BASE)
-+  |  stw TISNUM, WORD_HI-8(BASE)
-+  |  stw CRET1, WORD_LO-8(BASE)
-   |  b ->fff_res1
-   |1:
-   |  lus CARG3, 0x41e0	// 2^31.
-@@ -1762,9 +1920,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |->fff_restv:
-   |  // CARG3/CARG1 = TValue result.
-   |  lwz PC, FRAME_PC(BASE)
--  |   stw CARG3, -8(BASE)
-+  |   stw CARG3, WORD_HI-8(BASE)
-   |  la RA, -8(BASE)
--  |   stw CARG1, -4(BASE)
-+  |   stw CARG1, WORD_LO-8(BASE)
-   |->fff_res1:
-   |  // RA = results, PC = return.
-   |  li RD, (1+1)*8
-@@ -1782,10 +1940,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  ins_next1
-   |  // Adjust BASE. KBASE is assumed to be set for the calling frame.
-   |   sub BASE, RA, TMP0
-+  |   addi BASEP4, BASE, 4
-   |  ins_next2
-   |
-   |6:  // Fill up results with nil.
--  |  subi TMP1, RD, 8
-+  |  addi TMP1, RD, WORD_HI-8
-   |   addi RD, RD, 8
-   |  stwx TISNIL, RA, TMP1
-   |  b <5
-@@ -1898,7 +2057,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc math_log
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Need exactly 1 argument.
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1923,13 +2082,13 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if DUALNUM
-   |.ffunc math_ldexp
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, WORD_HI+8(BASE)
-   |.if GPR64
--  |    lwz CARG2, 12(BASE)
-+  |    lwz CARG2, WORD_LO+8(BASE)
-   |.else
--  |    lwz CARG1, 12(BASE)
-+  |    lwz CARG1, WORD_LO+8(BASE)
-   |.endif
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1961,8 +2120,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stfd FARG1, 0(RA)
-   |  li RD, (2+1)*8
-   |.if DUALNUM
--  |   stw TISNUM, 8(RA)
--  |   stw TMP1, 12(RA)
-+  |   stw TISNUM, WORD_HI+8(RA)
-+  |   stw TMP1, WORD_LO+8(RA)
-   |.else
-   |   stfd FARG2, 8(RA)
-   |.endif
-@@ -1989,9 +2148,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |   add TMP2, BASE, NARGS8:RC
-   |  bne >4
-   |1:  // Handle integers.
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
--  |  lwz CARG2, 4(TMP1)
-+  |  lwz CARG2, WORD_LO(TMP1)
-   |   bge cr1, ->fff_resi
-   |  checknum CARG4
-   |   xoris TMP0, CARG1, 0x8000
-@@ -2020,7 +2179,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |   lfd FARG1, 0(BASE)
-   |  bge ->fff_fallback
-   |5:  // Handle numbers.
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
-   |  lfd FARG2, 0(TMP1)
-   |   bge cr1, ->fff_resn
-@@ -2035,7 +2194,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.endif
-   |  b <5
-   |7:  // Convert integer to number and continue above.
--  |   lwz CARG2, 4(TMP1)
-+  |   lwz CARG2, WORD_LO(TMP1)
-   |  bne ->fff_fallback
-   |  tonum_i FARG2, CARG2
-   |  b <6
-@@ -2043,7 +2202,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  .ffunc_n name
-   |  li TMP1, 8
-   |1:
-+  |.if ENDIAN_LE
-+  |   add CARG2, BASE, TMP1
-+  |   lwz CARG2, WORD_HI(CARG2)
-+  |.else
-   |   lwzx CARG2, BASE, TMP1
-+  |.endif
-   |   lfdx FARG2, BASE, TMP1
-   |  cmplw cr1, TMP1, NARGS8:RC
-   |   checknum CARG2
-@@ -2067,8 +2231,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc string_byte			// Only handle the 1-arg case here.
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz STR:CARG1, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz STR:CARG1, WORD_LO(BASE)
-   |  bne ->fff_fallback			// Need exactly 1 argument.
-   |   checkstr CARG3
-   |   bne ->fff_fallback
-@@ -2099,12 +2263,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc string_char			// Only handle the 1-arg case here.
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |.if DUALNUM
--  |    lwz TMP0, 4(BASE)
-+  |    lwz TMP0, WORD_LO(BASE)
-   |  bne ->fff_fallback			// Exactly 1 argument.
-   |  checknum CARG3; bne ->fff_fallback
--  |   la CARG2, 7(BASE)
-+  |   la CARG2, WORD_BLO(BASE)
-   |.else
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Exactly 1 argument.
-@@ -2128,16 +2292,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc string_sub
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 16(BASE)
-+  |   lwz CARG3, WORD_HI+16(BASE)
-   |.if not DUALNUM
-   |    lfd f0, 16(BASE)
-   |.endif
--  |   lwz TMP0, 0(BASE)
--  |    lwz STR:CARG1, 4(BASE)
-+  |   lwz TMP0, WORD_HI(BASE)
-+  |    lwz STR:CARG1, WORD_LO(BASE)
-   |  blt ->fff_fallback
--  |   lwz CARG2, 8(BASE)
-+  |   lwz CARG2, WORD_HI+8(BASE)
-   |.if DUALNUM
--  |    lwz TMP1, 12(BASE)
-+  |    lwz TMP1, WORD_LO+8(BASE)
-   |.else
-   |    lfd f1, 8(BASE)
-   |.endif
-@@ -2145,7 +2309,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  beq >1
-   |.if DUALNUM
-   |  checknum CARG3
--  |   lwz TMP2, 20(BASE)
-+  |   lwz TMP2, WORD_LO+16(BASE)
-   |  bne ->fff_fallback
-   |1:
-   |  checknum CARG2; bne ->fff_fallback
-@@ -2201,8 +2365,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  .ffunc string_ .. name
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz STR:CARG2, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz STR:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |  checkstr CARG3
-   |   la SBUF:CARG1, DISPATCH_GL(tmpbuf)(DISPATCH)
-@@ -2240,10 +2404,10 @@ static void build_subroutines(BuildCtx *ctx)
-   |  addi TMP1, BASE, 8
-   |  add TMP2, BASE, NARGS8:RC
-   |1:
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
-   |.if DUALNUM
--  |  lwz CARG2, 4(TMP1)
-+  |  lwz CARG2, WORD_LO(TMP1)
-   |.else
-   |  lfd FARG1, 0(TMP1)
-   |.endif
-@@ -2344,20 +2508,23 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->fff_fallback:			// Call fast function fallback handler.
-   |  // BASE = new base, RB = CFUNC, RC = nargs*8
--  |  lp TMP3, CFUNC:RB->f
-+  |  lp FUNCREG, CFUNC:RB->f
-   |    add TMP1, BASE, NARGS8:RC
-   |   lwz PC, FRAME_PC(BASE)		// Fallback may overwrite PC.
-   |    addi TMP0, TMP1, 8*LUA_MINSTACK
-   |     lwz TMP2, L->maxstack
-   |   stw PC, SAVE_PC			// Redundant (but a defined value).
--  |  .toc lp TMP3, 0(TMP3)
-+  |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+  |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-+  |  .opd lp FUNCREG, 0(FUNCREG)
-   |  cmplw TMP0, TMP2
-   |     stp BASE, L->base
-   |    stp TMP1, L->top
-   |   mr CARG1, L
-   |  bgt >5				// Need to grow stack.
--  |  mtctr TMP3
-+  |  mtctr FUNCREG
-   |  bctrl				// (lua_State *L)
-+  |  .toc lp TOCREG, SAVE_TOC
-   |  // Either throws an error, or recovers and returns -1, 0 or nresults+1.
-   |  lp BASE, L->base
-   |  cmpwi CRET1, 0
-@@ -2459,6 +2626,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |3:
-   |  lp BASE, L->base
-   |4:  // Re-dispatch to static ins.
-+  |  addi BASEP4, BASE, 4
-   |  lwz INS, -4(PC)
-   |  decode_OPP TMP1, INS
-   |   decode_RB8 RB, INS
-@@ -2472,7 +2640,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->cont_hook:				// Continue from hook yield.
-   |  addi PC, PC, 4
--  |  lwz MULTRES, -20(RB)		// Restore MULTRES for *M ins.
-+  |  lwz MULTRES, WORD_LO-24(RB)		// Restore MULTRES for *M ins.
-   |  b <4
-   |
-   |->vm_hotloop:			// Hot loop counter underflow.
-@@ -2514,6 +2682,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lp BASE, L->base
-   |   lp TMP0, L->top
-   |   stw ZERO, SAVE_PC			// Invalidate for subsequent line hook.
-+  |  addi BASEP4, BASE, 4
-   |  sub NARGS8:RC, TMP0, BASE
-   |  add RA, BASE, RA
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)
-@@ -2525,7 +2694,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if JIT
-   |  // RA = resultptr, RB = meta base
-   |  lwz INS, -4(PC)
--  |    lwz TRACE:TMP2, -20(RB)		// Save previous trace.
-+  |    lwz TRACE:TMP2, WORD_LO-24(RB)	// Save previous trace.
-   |   addic. TMP1, MULTRES, -8
-   |  decode_RA8 RC, INS			// Call base.
-   |   beq >2
-@@ -2560,10 +2729,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |  mr CARG2, PC
-   |  bl extern lj_dispatch_stitch	// (jit_State *J, const BCIns *pc)
-   |  lp BASE, L->base
-+  |  addi BASEP4, BASE, 4
-   |  b ->cont_nop
-   |
-   |9:
-+  |.if ENDIAN_LE
-+  |  addi BASEP4, BASE, 4
-+  |  stwx TISNIL, BASEP4, RC
-+  |.else
-   |  stwx TISNIL, BASE, RC
-+  |.endif
-   |  addi RC, RC, 8
-   |  b <3
-   |.endif
-@@ -2578,6 +2753,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction.
-   |  lp BASE, L->base
-   |  subi PC, PC, 4
-+  |  addi BASEP4, BASE, 4
-   |  b ->cont_nop
- #endif
-   |
-@@ -2586,39 +2762,72 @@ static void build_subroutines(BuildCtx *ctx)
-   |//-----------------------------------------------------------------------
-   |
-   |.macro savex_, a, b, c, d
--  |  stfd f..a, 16+a*8(sp)
--  |  stfd f..b, 16+b*8(sp)
--  |  stfd f..c, 16+c*8(sp)
--  |  stfd f..d, 16+d*8(sp)
-+  |  stfd f..a, EXIT_OFFSET+a*8(sp)
-+  |  stfd f..b, EXIT_OFFSET+b*8(sp)
-+  |  stfd f..c, EXIT_OFFSET+c*8(sp)
-+  |  stfd f..d, EXIT_OFFSET+d*8(sp)
-+  |.endmacro
-+  |
-+  |.macro saver, a
-+  |  stp r..a, EXIT_OFFSET+32*8+a*PSIZE(sp)
-   |.endmacro
-   |
-   |->vm_exit_handler:
-   |.if JIT
--  |  addi sp, sp, -(16+32*8+32*4)
--  |  stmw r2, 16+32*8+2*4(sp)
-+  |  addi sp, TMP0, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-+  |  saver 3 // CARG1
-+  |  saver 4 // CARG2
-+  |  saver 5 // CARG3
-+  |  saver 17 // DISPATCH
-   |    addi DISPATCH, JGL, -GG_DISP2G-32768
-   |    li CARG2, ~LJ_VMST_EXIT
--  |   lwz CARG1, 16+32*8+32*4(sp)	// Get stack chain.
-+  |   lp CARG1, EXIT_OFFSET+32*8+32*PSIZE(sp)	// Get stack chain.
-   |    stw CARG2, DISPATCH_GL(vmstate)(DISPATCH)
-+  |  saver 2
-+  |  saver 6
-+  |  saver 7
-+  |  saver 8
-+  |  saver 9
-+  |  saver 10
-+  |  saver 11
-+  |  saver 12
-+  |  saver 13
-   |  savex_ 0,1,2,3
--  |   stw CARG1, 0(sp)			// Store extended stack chain.
--  |   clrso TMP1
-+  |   stp CARG1, 0(sp)			// Store extended stack chain.
-+
-   |  savex_ 4,5,6,7
--  |   addi CARG2, sp, 16+32*8+32*4	// Recompute original value of sp.
-+  |  saver 14
-+  |  saver 15
-+  |  saver 16
-+  |  saver 18
-+  |   addi CARG2, sp, EXIT_OFFSET+32*8+32*PSIZE	// Recompute original value of sp.
-   |  savex_ 8,9,10,11
--  |   stw CARG2, 16+32*8+1*4(sp)	// Store sp in RID_SP.
-+  |   stp CARG2, EXIT_OFFSET+32*8+1*PSIZE(sp)	// Store sp in RID_SP.
-   |  savex_ 12,13,14,15
-   |   mflr CARG3
-   |   li TMP1, 0
-   |  savex_ 16,17,18,19
--  |   stw TMP1, 16+32*8+0*4(sp)		// Clear RID_TMP.
-+  |   stw TMP1, EXIT_OFFSET+32*8+0*PSIZE(sp)		// Clear RID_TMP.
-   |  savex_ 20,21,22,23
-   |   lhz CARG4, 2(CARG3)		// Load trace number.
-   |  savex_ 24,25,26,27
-   |  lwz L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  savex_ 28,29,30,31
-+  |  saver 19
-+  |  saver 20
-+  |  saver 21
-+  |  saver 22
-+  |  saver 23
-+  |  saver 24
-+  |  saver 25
-+  |  saver 26
-+  |  saver 27
-+  |  saver 28
-+  |  saver 29
-+  |  saver 30
-+  |  saver 31
-   |   sub CARG3, TMP0, CARG3		// Compute exit number.
--  |  lp BASE, DISPATCH_GL(jit_base)(DISPATCH)
-+  |  lwz BASE, DISPATCH_GL(jit_base)(DISPATCH)
-   |   srwi CARG3, CARG3, 2
-   |  stp L, DISPATCH_J(L)(DISPATCH)
-   |   subi CARG3, CARG3, 2
-@@ -2627,11 +2836,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stw TMP1, DISPATCH_GL(jit_base)(DISPATCH)
-   |  addi CARG1, DISPATCH, GG_DISP2J
-   |   stw CARG3, DISPATCH_J(exitno)(DISPATCH)
--  |  addi CARG2, sp, 16
-+  |  addi CARG2, sp, EXIT_OFFSET
-   |  bl extern lj_trace_exit		// (jit_State *J, ExitState *ex)
-   |  // Returns MULTRES (unscaled) or negated error code.
-   |  lp TMP1, L->cframe
--  |  lwz TMP2, 0(sp)
-+  |  lp TMP2, 0(sp)
-   |   lp BASE, L->base
-   |.if GPR64
-   |  rldicr sp, TMP1, 0, 61
-@@ -2639,7 +2848,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  rlwinm sp, TMP1, 0, 0, 29
-   |.endif
-   |   lwz PC, SAVE_PC			// Get SAVE_PC.
--  |  stw TMP2, 0(sp)
-+  |  stp TMP2, 0(sp)
-   |  stw L, SAVE_L			// Set SAVE_L (on-trace resume/yield).
-   |  b >1
-   |.endif
-@@ -2660,7 +2869,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |    stw TMP2, DISPATCH_GL(jit_base)(DISPATCH)
-   |  lwz KBASE, PC2PROTO(k)(TMP1)
-   |  // Setup type comparison constants.
-+  |.if P64
-+  |  lus TISNUM, LJ_TISNUM >> 16
-+  |  ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |  li TISNUM, LJ_TISNUM
-+  |.endif
-   |  lus TMP3, 0x59c0			// TOBIT = 2^52 + 2^51 (float).
-   |  stw TMP3, TMPD
-   |  li ZERO, 0
-@@ -2680,14 +2894,14 @@ static void build_subroutines(BuildCtx *ctx)
-   |   decode_RA8 RA, INS
-   |  lpx TMP0, DISPATCH, TMP1
-   |  mtctr TMP0
--  |  cmplwi TMP1, BC_FUNCF*4		// Function header?
-+  |  cmplwi TMP1, BC_FUNCF*PSIZE	// Function header?
-   |  bge >2
-   |   decode_RB8 RB, INS
-   |   decode_RD8 RD, INS
-   |   decode_RC8 RC, INS
-   |  bctr
-   |2:
--  |  cmplwi TMP1, (BC_FUNCC+2)*4	// Fast function?
-+  |  cmplwi TMP1, (BC_FUNCC+2)*PSIZE	// Fast function?
-   |  blt >3
-   |  // Check frame below fast function.
-   |  lwz TMP1, FRAME_PC(BASE)
-@@ -2697,7 +2911,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz TMP2, -4(TMP1)
-   |  decode_RA8 TMP0, TMP2
-   |  sub TMP1, BASE, TMP0
--  |  lwz LFUNC:TMP2, -12(TMP1)
-+  |  lwz LFUNC:TMP2, WORD_LO-16(TMP1)
-   |  lwz TMP1, LFUNC:TMP2->pc
-   |  lwz KBASE, PC2PROTO(k)(TMP1)
-   |3:
-@@ -2718,6 +2932,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |// NYI: Use internal implementations of floor, ceil, trunc.
-   |
-   |->vm_modi:
-+  |  li TMP1, 0
-+  |  mtxer TMP1
-   |  divwo. TMP0, CARG1, CARG2
-   |  bso >1
-   |.if GPR64
-@@ -2736,7 +2952,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmpwi CARG2, 0
-   |   li CARG1, 0
-   |  beqlr
--  |  clrso TMP0			// Clear SO for -2147483648 % -1 and return 0.
-+  |  // Clear SO for -2147483648 % -1 and return 0.
-+  |  crxor 4*cr0+so, 4*cr0+so, 4*cr0+so
-   |  blr
-   |
-   |//-----------------------------------------------------------------------
-@@ -2749,10 +2966,18 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vm_cachesync:
-   |.if JIT or FFI
-   |  // Compute start of first cache line and number of cache lines.
-+  |  .if GPR64
-+  |  rldicr CARG1, CARG1, 0, 58
-+  |  .else
-   |  rlwinm CARG1, CARG1, 0, 0, 26
-+  |  .endif
-   |  sub CARG2, CARG2, CARG1
-   |  addi CARG2, CARG2, 31
-+  |  .if GPR64
-+  |  srdi. CARG2, CARG2, 5
-+  |  .else
-   |  rlwinm. CARG2, CARG2, 27, 5, 31
-+  |  .endif
-   |  beqlr
-   |  mtctr CARG2
-   |  mr CARG3, CARG1
-@@ -2774,39 +2999,70 @@ static void build_subroutines(BuildCtx *ctx)
-   |//-- FFI helper functions -----------------------------------------------
-   |//-----------------------------------------------------------------------
-   |
--  |// Handler for callback functions. Callback slot number in r11, g in r12.
-+  |// Handler for callback functions.
-+  |// 32-bit: Callback slot number in r12, g in r11.
-+  |// 64-bit v1: Callback slot number in bits 47+ of r11, g in 0-46, TOC in r2.
-+  |// 64-bit v2: Callback slot number in bits 2-11 of r12, g in r11,
-+  |// vm_ffi_callback in r2.
-   |->vm_ffi_callback:
-   |.if FFI
-   |.type CTSTATE, CTState, PC
-+  |  .if OPD
-+  |   rldicl r12, r11, 17, 47
-+  |   rldicl r11, r11, 0, 17
-+  |  .endif
-+  |  .if ELFV2
-+  |   rlwinm r12, r12, 30, 22, 31
-+  |   addisl TOCREG, TOCREG, extern .TOC.-lj_vm_ffi_callback@ha
-+  |   addil TOCREG, TOCREG, extern .TOC.-lj_vm_ffi_callback@l
-+  |  .endif
-   |  saveregs
--  |  lwz CTSTATE, GL:r12->ctype_state
--  |   addi DISPATCH, r12, GG_G2DISP
--  |  stw r11, CTSTATE->cb.slot
--  |  stw r3, CTSTATE->cb.gpr[0]
-+  |  lwz CTSTATE, GL:r11->ctype_state
-+  |   addi DISPATCH, r11, GG_G2DISP
-+  |  stw r12, CTSTATE->cb.slot
-+  |  stp r3, CTSTATE->cb.gpr[0]
-   |   stfd f1, CTSTATE->cb.fpr[0]
--  |  stw r4, CTSTATE->cb.gpr[1]
-+  |  stp r4, CTSTATE->cb.gpr[1]
-   |   stfd f2, CTSTATE->cb.fpr[1]
--  |  stw r5, CTSTATE->cb.gpr[2]
-+  |  stp r5, CTSTATE->cb.gpr[2]
-   |   stfd f3, CTSTATE->cb.fpr[2]
--  |  stw r6, CTSTATE->cb.gpr[3]
-+  |  stp r6, CTSTATE->cb.gpr[3]
-   |   stfd f4, CTSTATE->cb.fpr[3]
--  |  stw r7, CTSTATE->cb.gpr[4]
-+  |  stp r7, CTSTATE->cb.gpr[4]
-   |   stfd f5, CTSTATE->cb.fpr[4]
--  |  stw r8, CTSTATE->cb.gpr[5]
-+  |  stp r8, CTSTATE->cb.gpr[5]
-   |   stfd f6, CTSTATE->cb.fpr[5]
--  |  stw r9, CTSTATE->cb.gpr[6]
-+  |  stp r9, CTSTATE->cb.gpr[6]
-   |   stfd f7, CTSTATE->cb.fpr[6]
--  |  stw r10, CTSTATE->cb.gpr[7]
-+  |  stp r10, CTSTATE->cb.gpr[7]
-   |   stfd f8, CTSTATE->cb.fpr[7]
-+  |  .if GPR64
-+  |   stfd f9, CTSTATE->cb.fpr[8]
-+  |   stfd f10, CTSTATE->cb.fpr[9]
-+  |   stfd f11, CTSTATE->cb.fpr[10]
-+  |   stfd f12, CTSTATE->cb.fpr[11]
-+  |   stfd f13, CTSTATE->cb.fpr[12]
-+  |  .endif
-+  |  .if ELFV2
-+  |  addi TMP0, sp, CFRAME_SPACE+96
-+  |  .elif GPR64
-+  |  addi TMP0, sp, CFRAME_SPACE+112
-+  |  .else
-   |  addi TMP0, sp, CFRAME_SPACE+8
--  |  stw TMP0, CTSTATE->cb.stack
-+  |  .endif
-+  |  stp TMP0, CTSTATE->cb.stack
-   |   mr CARG1, CTSTATE
-   |  stw CTSTATE, SAVE_PC		// Any value outside of bytecode is ok.
-   |   mr CARG2, sp
-   |  bl extern lj_ccallback_enter	// (CTState *cts, void *cf)
-   |  // Returns lua_State *.
-   |  lp BASE, L:CRET1->base
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |  lp RC, L:CRET1->top
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |     li ZERO, 0
-@@ -2835,9 +3091,21 @@ static void build_subroutines(BuildCtx *ctx)
-   |  mr CARG1, CTSTATE
-   |  mr CARG2, RA
-   |  bl extern lj_ccallback_leave	// (CTState *cts, TValue *o)
--  |  lwz CRET1, CTSTATE->cb.gpr[0]
-+  |  lp CRET1, CTSTATE->cb.gpr[0]
-   |  lfd FARG1, CTSTATE->cb.fpr[0]
--  |  lwz CRET2, CTSTATE->cb.gpr[1]
-+  |  lp CRET2, CTSTATE->cb.gpr[1]
-+  |  .if GPR64
-+  |    lfd FARG2, CTSTATE->cb.fpr[1]
-+  |  .else
-+  |    lp CARG3, CTSTATE->cb.gpr[2]
-+  |    lp CARG4, CTSTATE->cb.gpr[3]
-+  |  .endif
-+  |  .elfv2 lfd f3, CTSTATE->cb.fpr[2]
-+  |  .elfv2 lfd f4, CTSTATE->cb.fpr[3]
-+  |  .elfv2 lfd f5, CTSTATE->cb.fpr[4]
-+  |  .elfv2 lfd f6, CTSTATE->cb.fpr[5]
-+  |  .elfv2 lfd f7, CTSTATE->cb.fpr[6]
-+  |  .elfv2 lfd f8, CTSTATE->cb.fpr[7]
-   |  b ->vm_leave_unw
-   |.endif
-   |
-@@ -2850,23 +3118,46 @@ static void build_subroutines(BuildCtx *ctx)
-   |   lbz CARG2, CCSTATE->nsp
-   |   lbz CARG3, CCSTATE->nfpr
-   |  neg TMP1, TMP1
-+  |  .if GPR64
-+  |    std TMP0, 16(sp)
-+  |  .else
-   |    stw TMP0, 4(sp)
-+  |  .endif
-   |   cmpwi cr1, CARG3, 0
-   |  mr TMP2, sp
-   |   addic. CARG2, CARG2, -1
-+  |  .if GPR64
-+  |  stdux sp, sp, TMP1
-+  |  .else
-   |  stwux sp, sp, TMP1
-+  |  .endif
-   |   crnot 4*cr1+eq, 4*cr1+eq		// For vararg calls.
--  |  stw r14, -4(TMP2)
--  |  stw CCSTATE, -8(TMP2)
-+  |  .if GPR64
-+  |    std r14, -8(TMP2)
-+  |    std CCSTATE, -16(TMP2)
-+  |  .else
-+  |    stw r14, -4(TMP2)
-+  |    stw CCSTATE, -8(TMP2)
-+  |  .endif
-   |  mr r14, TMP2
-   |  la TMP1, CCSTATE->stack
-+  |  .if GPR64
-+  |   sldi CARG2, CARG2, 3
-+  |  .else
-   |   slwi CARG2, CARG2, 2
-+  |  .endif
-   |   blty >2
--  |  la TMP2, 8(sp)
-+  |  .if ELFV2
-+  |    la TMP2, 96(sp)
-+  |  .elif GPR64
-+  |    la TMP2, 112(sp)
-+  |  .else
-+  |    la TMP2, 8(sp)
-+  |  .endif
-   |1:
--  |  lwzx TMP0, TMP1, CARG2
--  |  stwx TMP0, TMP2, CARG2
--  |   addic. CARG2, CARG2, -4
-+  |  lpx TMP0, TMP1, CARG2
-+  |  stpx TMP0, TMP2, CARG2
-+  |   addic. CARG2, CARG2, -PSIZE
-   |  bge <1
-   |2:
-   |  bney cr1, >3
-@@ -2878,28 +3169,55 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f6, CCSTATE->fpr[5]
-   |  lfd f7, CCSTATE->fpr[6]
-   |  lfd f8, CCSTATE->fpr[7]
-+  |  .if GPR64
-+  |  lfd f9, CCSTATE->fpr[8]
-+  |  lfd f10, CCSTATE->fpr[9]
-+  |  lfd f11, CCSTATE->fpr[10]
-+  |  lfd f12, CCSTATE->fpr[11]
-+  |  lfd f13, CCSTATE->fpr[12]
-+  |  .endif
-   |3:
--  |   lp TMP0, CCSTATE->func
--  |  lwz CARG2, CCSTATE->gpr[1]
--  |  lwz CARG3, CCSTATE->gpr[2]
--  |  lwz CARG4, CCSTATE->gpr[3]
--  |  lwz CARG5, CCSTATE->gpr[4]
--  |   mtctr TMP0
--  |  lwz r8, CCSTATE->gpr[5]
--  |  lwz r9, CCSTATE->gpr[6]
--  |  lwz r10, CCSTATE->gpr[7]
--  |  lwz CARG1, CCSTATE->gpr[0]		// Do this last, since CCSTATE is CARG1.
-+  |  .toc std TOCREG, SAVE_TOC
-+  |   lp FUNCREG, CCSTATE->func
-+  |  lp CARG2, CCSTATE->gpr[1]
-+  |  lp CARG3, CCSTATE->gpr[2]
-+  |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+  |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-+  |  .opd lp FUNCREG, 0(FUNCREG)
-+  |  lp CARG4, CCSTATE->gpr[3]
-+  |  lp CARG5, CCSTATE->gpr[4]
-+  |   mtctr FUNCREG
-+  |  lp r8, CCSTATE->gpr[5]
-+  |  lp r9, CCSTATE->gpr[6]
-+  |  lp r10, CCSTATE->gpr[7]
-+  |  lp CARG1, CCSTATE->gpr[0]		// Do this last, since CCSTATE is CARG1.
-   |   bctrl
--  |  lwz CCSTATE:TMP1, -8(r14)
--  |  lwz TMP2, -4(r14)
-+  |   .toc lp TOCREG, SAVE_TOC
-+  |  .if GPR64
-+  |   ld CCSTATE:TMP1, -16(r14)
-+  |   ld TMP2, -8(r14)
-+  |   ld TMP0, 16(r14)
-+  |  .else
-+  |   lwz CCSTATE:TMP1, -8(r14)
-+  |   lwz TMP2, -4(r14)
-   |   lwz TMP0, 4(r14)
--  |  stw CARG1, CCSTATE:TMP1->gpr[0]
-+  |  .endif
-+  |  stp CARG1, CCSTATE:TMP1->gpr[0]
-   |  stfd FARG1, CCSTATE:TMP1->fpr[0]
--  |  stw CARG2, CCSTATE:TMP1->gpr[1]
-+  |  stp CARG2, CCSTATE:TMP1->gpr[1]
-+  |  .if GPR64
-+  |   stfd FARG2, CCSTATE:TMP1->fpr[1]
-+  |  .endif
-+  |  .elfv2 stfd FARG3, CCSTATE:TMP1->fpr[2]
-+  |  .elfv2 stfd FARG4, CCSTATE:TMP1->fpr[3]
-+  |  .elfv2 stfd FARG5, CCSTATE:TMP1->fpr[4]
-+  |  .elfv2 stfd FARG6, CCSTATE:TMP1->fpr[5]
-+  |  .elfv2 stfd FARG7, CCSTATE:TMP1->fpr[6]
-+  |  .elfv2 stfd FARG8, CCSTATE:TMP1->fpr[7]
-   |   mtlr TMP0
--  |  stw CARG3, CCSTATE:TMP1->gpr[2]
-+  |  stp CARG3, CCSTATE:TMP1->gpr[2]
-   |   mr sp, r14
--  |  stw CARG4, CCSTATE:TMP1->gpr[3]
-+  |  stp CARG4, CCSTATE:TMP1->gpr[3]
-   |   mr r14, TMP2
-   |  blr
-   |.endif
-@@ -2923,13 +3241,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
-     |  // RA = src1*8, RD = src2*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, BASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  lwzx TMP1, BASE_HI, RD
-     |    lwz TMP2, -4(PC)
-     |  checknum cr0, TMP0
--    |   lwz CARG3, 4(RD)
-+    |   lwzx CARG3, BASE_LO, RD
-     |    decode_RD4 TMP2, TMP2
-     |  checknum cr1, TMP1
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-@@ -2953,7 +3271,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |7:  // RA is not an integer.
-     |  bgt cr0, ->vmeta_comp
-     |  // RA is a number.
--    |   lfd f0, 0(RA)
-+    |   lfdx f0, BASE, RA
-     |  bgt cr1, ->vmeta_comp
-     |  blt cr1, >4
-     |  // RA is a number, RD is an integer.
-@@ -2965,7 +3283,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // RA is an integer, RD is a number.
-     |  tonum_i f0, CARG2
-     |4:
--    |  lfd f1, 0(RD)
-+    |  lfdx f1, BASE, RD
-     |5:
-     |  fcmpu cr0, f0, f1
-     if (op == BC_ISLT) {
-@@ -2981,10 +3299,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |  b <1
-     |.else
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
-     |   lfdx f0, BASE, RA
--    |  lwzx TMP1, BASE, RD
-+    |  lwzx TMP1, BASE_HI, RD
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |   lfdx f1, BASE, RD
-@@ -3015,15 +3333,23 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = op == BC_ISEQV;
-     |  // RA = src1*8, RD = src2*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, BASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  .if ENDIAN_LE
-+    |    lwzx TMP1, BASE_HI, RD
-+    |  .else
-+    |    lwzux TMP1, RD, BASE_HI
-+    |  .endif
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |  checknum cr1, TMP1
-     |    decode_RD4 TMP2, TMP2
--    |   lwz CARG3, 4(RD)
-+    |  .if ENDIAN_LE
-+    |   lwzux CARG3, RD, BASE_LO
-+    |  .else
-+    |   lwz CARG3, WORD_LO(RD)
-+    |  .endif
-     |  cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     if (vk) {
-@@ -3032,14 +3358,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |  ble cr7, ->BC_ISNEN_Z
-     }
-     |.else
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |   lwz TMP2, 0(PC)
--    |    lfd f0, 0(RA)
-+    |    lfdx f0, BASE, RA
-     |   addi PC, PC, 4
--    |  lwzux TMP1, RD, BASE
-+    |  lwzx TMP1, BASE_HI, RD
-     |  checknum cr0, TMP0
-     |   decode_RD4 TMP2, TMP2
--    |    lfd f1, 0(RD)
-+    |    lfdx f1, BASE, RD
-     |  checknum cr1, TMP1
-     |   addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     |  bge cr0, >5
-@@ -3057,8 +3383,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.endif
-     |5:  // Either or both types are not numbers.
-     |.if not DUALNUM
--    |    lwz CARG2, 4(RA)
--    |    lwz CARG3, 4(RD)
-+    |    lwzx CARG2, BASE_LO, RA
-+    |    lwzx CARG3, BASE_LO, RD
-     |.endif
-     |.if FFI
-     |  cmpwi cr7, TMP0, LJ_TCDATA
-@@ -3074,10 +3400,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.if FFI
-     |  beq cr7, ->vmeta_equal_cd
-     |.endif
-+    |.if P64
-+    |   cmplwi cr7, TMP3, ~LJ_TUDATA		// Avoid 64 bit lightuserdata.
-+    |.endif
-     |    cmplw cr5, CARG2, CARG3
-     |  crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt	// 2: Same type and primitive.
-     |  crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq	// 1: Same tv or different type.
-     |  crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq	// 0: Same type and same tv.
-+    |.if P64
-+    |   cror 4*cr6+lt, 4*cr6+lt, 4*cr7+gt
-+    |.endif
-     |   mr SAVE0, PC
-     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt	// 0 or 2.
-     |  cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt	// 1 or 2.
-@@ -3116,9 +3448,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISEQS: case BC_ISNES:
-     vk = op == BC_ISEQS;
-     |  // RA = src*8, RD = str_const*8 (~), JMP with RD = target
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |   srwi RD, RD, 1
--    |  lwz STR:TMP3, 4(RA)
-+    |  lwzx STR:TMP3, BASE_LO, RA
-     |    lwz TMP2, 0(PC)
-     |   subfic RD, RD, -4
-     |    addi PC, PC, 4
-@@ -3150,15 +3482,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = op == BC_ISEQN;
-     |  // RA = src*8, RD = num_const*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, KBASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  lwzux2 TMP1, CARG3, RD, KBASE
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |  checknum cr1, TMP1
-     |    decode_RD4 TMP2, TMP2
--    |   lwz CARG3, 4(RD)
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     if (vk) {
-       |->BC_ISEQN_Z:
-@@ -3175,7 +3506,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     } else {
-       |->BC_ISNEN_Z:  // Dummy label.
-     }
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
-     |   lfdx f0, BASE, RA
-     |    lwz TMP2, -4(PC)
-@@ -3213,7 +3544,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |7:  // RA is not an integer.
-     |  bge cr0, <3
-     |  // RA is a number.
--    |   lfd f0, 0(RA)
-+    |   lfdx f0, BASE, RA
-     |  blt cr1, >1
-     |  // RA is a number, RD is an integer.
-     |  tonum_i f1, CARG3
-@@ -3232,7 +3563,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISEQP: case BC_ISNEP:
-     vk = op == BC_ISEQP;
-     |  // RA = src*8, RD = primitive_type*8 (~), JMP with RD = target
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |   srwi TMP1, RD, 3
-     |    lwz TMP2, 0(PC)
-     |   not TMP1, TMP1
-@@ -3262,7 +3593,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF:
-     |  // RA = dst*8 or unused, RD = src*8, JMP with RD = target
--    |  lwzx TMP0, BASE, RD
-+    |  lwzx TMP0, BASE_HI, RD
-     |   lwz INS, 0(PC)
-     |   addi PC, PC, 4
-     if (op == BC_IST || op == BC_ISF) {
-@@ -3297,7 +3628,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_ISTYPE:
-     |  // RA = src*8, RD = -type*8
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |  srwi TMP1, RD, 3
-     |  ins_next1
-     |.if not PPE and not GPR64
-@@ -3311,7 +3642,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_ISNUM:
-     |  // RA = src*8, RD = -(TISNUM-1)*8
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |  ins_next1
-     |  checknum TMP0
-     |  bge ->vmeta_istype
-@@ -3330,17 +3661,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_NOT:
-     |  // RA = dst*8, RD = src*8
-     |  ins_next1
--    |  lwzx TMP0, BASE, RD
-+    |  lwzx TMP0, BASE_HI, RD
-     |  .gpr64 extsw TMP0, TMP0
-     |  subfic TMP1, TMP0, LJ_TTRUE
-     |  adde TMP0, TMP0, TMP1
--    |  stwx TMP0, BASE, RA
-+    |  stwx TMP0, BASE_HI, RA
-     |  ins_next2
-     break;
-   case BC_UNM:
-     |  // RA = dst*8, RD = src*8
--    |  lwzux TMP1, RD, BASE
--    |   lwz TMP0, 4(RD)
-+    |  lwzx TMP1, BASE_HI, RD
-+    |   lwzx TMP0, BASE_LO, RD
-+    |.if DUALNUM and not GPR64
-+    |  mtxer ZERO
-+    |.endif
-     |  checknum TMP1
-     |.if DUALNUM
-     |  bne >5
-@@ -3352,18 +3686,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.else
-     |  nego. TMP0, TMP0
-     |  bso >4
--    |1:
-     |.endif
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |   stw TMP0, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |   stwx TMP0, BASE_LO, RA
-     |3:
-     |  ins_next2
-     |4:
--    |.if not GPR64
--    |  // Potential overflow.
--    |  checkov TMP1, <1			// Ignore unrelated overflow.
--    |.endif
-     |  lus TMP1, 0x41e0			// 2^31.
-     |  li TMP0, 0
-     |  b >7
-@@ -3373,8 +3702,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  xoris TMP1, TMP1, 0x8000
-     |7:
-     |  ins_next1
--    |  stwux TMP1, RA, BASE
--    |   stw TMP0, 4(RA)
-+    |  stwx TMP1, BASE_HI, RA
-+    |   stwx TMP0, BASE_LO, RA
-     |.if DUALNUM
-     |  b <3
-     |.else
-@@ -3383,15 +3712,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_LEN:
-     |  // RA = dst*8, RD = src*8
--    |  lwzux TMP0, RD, BASE
--    |   lwz CARG1, 4(RD)
-+    |  lwzx TMP0, BASE_HI, RD
-+    |   lwzx CARG1, BASE_LO, RD
-     |  checkstr TMP0; bne >2
-     |  lwz CRET1, STR:CARG1->len
-     |1:
-     |.if DUALNUM
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |   stw CRET1, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |   stwx CRET1, BASE_LO, RA
-     |.else
-     |  tonum_u f0, CRET1		// Result is a non-negative integer.
-     |  ins_next1
-@@ -3426,9 +3755,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
-     ||switch (vk) {
-     ||case 0:
--    |   lwzx TMP1, BASE, RB
-+    |   .if ENDIAN_LE and DUALNUM
-+    |     addi TMP2, RC, 4
-+    |   .endif
-+    |   lwzx TMP1, BASE_HI, RB
-     |   .if DUALNUM
--    |     lwzx TMP2, KBASE, RC
-+    |     .if ENDIAN_LE
-+    |       lwzx TMP2, KBASE, TMP2
-+    |     .else
-+    |       lwzx TMP2, KBASE, RC
-+    |     .endif
-     |   .endif
-     |    lfdx f14, BASE, RB
-     |    lfdx f15, KBASE, RC
-@@ -3442,9 +3778,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   .endif
-     ||  break;
-     ||case 1:
--    |   lwzx TMP1, BASE, RB
-+    |   .if ENDIAN_LE and DUALNUM
-+    |     addi TMP2, RC, 4
-+    |   .endif
-+    |   lwzx TMP1, BASE_HI, RB
-     |   .if DUALNUM
--    |     lwzx TMP2, KBASE, RC
-+    |     .if ENDIAN_LE
-+    |       lwzx TMP2, KBASE, TMP2
-+    |     .else
-+    |       lwzx TMP2, KBASE, RC
-+    |     .endif
-     |   .endif
-     |    lfdx f15, BASE, RB
-     |    lfdx f14, KBASE, RC
-@@ -3458,8 +3801,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   .endif
-     ||  break;
-     ||default:
--    |   lwzx TMP1, BASE, RB
--    |   lwzx TMP2, BASE, RC
-+    |   lwzx TMP1, BASE_HI, RB
-+    |   lwzx TMP2, BASE_HI, RC
-     |    lfdx f14, BASE, RB
-     |    lfdx f15, BASE, RC
-     |   checknum cr0, TMP1
-@@ -3514,41 +3857,62 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
-     ||switch (vk) {
-     ||case 0:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, KBASE
--    |    lwz CARG1, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG2, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzux CARG2, RC, KBASE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |      lwz TMP2, 4(RC)
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG1, RB, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, KBASE
-+    |      lwz CARG1, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG2, 4(RC)
-+    |   .endif
-     ||  break;
-     ||case 1:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, KBASE
--    |    lwz CARG2, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG1, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzux CARG1, RC, KBASE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |      lwz TMP2, 4(RC)
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG2, RB, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, KBASE
-+    |      lwz CARG2, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG1, 4(RC)
-+    |   .endif
-     ||  break;
-     ||default:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, BASE
--    |    lwz CARG1, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG2, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |     lwzx TMP2, RC, BASE_HI
-+    |      lwzux CARG1, RB, BASE
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG2, RC, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, BASE
-+    |      lwz CARG1, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG2, 4(RC)
-+    |   .endif
-     ||  break;
-     ||}
-+    |  mtxer ZERO
-     |  checknum cr1, TMP2
-     |  bne >5
-     |  bne cr1, >5
-     |  intins CARG1, CARG1, CARG2
--    |  bso >4
--    |1:
-+    |  ins_arithfallback bso
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |  stw CARG1, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |  stwx CARG1, BASE_LO, RA
-     |2:
-     |  ins_next2
--    |4:  // Overflow.
--    |  checkov TMP0, <1			// Ignore unrelated overflow.
--    |  ins_arithfallback b
-     |5:  // FP variant.
-     ||if (vk == 1) {
-     |  lfd f15, 0(RB)
-@@ -3620,9 +3984,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_POW:
-     |  // NYI: (partial) integer arithmetic.
--    |  lwzx TMP1, BASE, RB
-+    |  lwzx TMP1, BASE_HI, RB
-     |   lfdx FARG1, BASE, RB
--    |  lwzx TMP2, BASE, RC
-+    |  lwzx TMP2, BASE_HI, RC
-     |   lfdx FARG2, BASE, RC
-     |  checknum cr0, TMP1
-     |  checknum cr1, TMP2
-@@ -3648,6 +4012,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Returns NULL (finished) or TValue * (metamethod).
-     |  cmplwi CRET1, 0
-     |   lp BASE, L->base
-+    |   addi BASEP4, BASE, 4
-     |  bne ->vmeta_binop
-     |  ins_next1
-     |  lfdx f0, BASE, SAVE0		// Copy result from RB to RA.
-@@ -3664,8 +4029,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  ins_next1
-     |  lwzx TMP0, KBASE, TMP1		// KBASE-4-str_const*4
-     |  li TMP2, LJ_TSTR
--    |  stwux TMP2, RA, BASE
--    |  stw TMP0, 4(RA)
-+    |  stwx TMP2, BASE_HI, RA
-+    |  stwx TMP0, BASE_LO, RA
-     |  ins_next2
-     break;
-   case BC_KCDATA:
-@@ -3676,8 +4041,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  ins_next1
-     |  lwzx TMP0, KBASE, TMP1		// KBASE-4-cdata_const*4
-     |  li TMP2, LJ_TCDATA
--    |  stwux TMP2, RA, BASE
--    |  stw TMP0, 4(RA)
-+    |  stwx TMP2, BASE_HI, RA
-+    |  stwx TMP0, BASE_LO, RA
-     |  ins_next2
-     |.endif
-     break;
-@@ -3687,14 +4052,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  slwi RD, RD, 13
-     |  srawi RD, RD, 16
-     |  ins_next1
--    |   stwux TISNUM, RA, BASE
--    |   stw RD, 4(RA)
-+    |   stwx TISNUM, BASE_HI, RA
-+    |   stwx RD, BASE_LO, RA
-     |  ins_next2
-     |.else
-     |  // The soft-float approach is faster.
-     |  slwi RD, RD, 13
-     |  srawi TMP1, RD, 31
-     |  xor TMP2, TMP1, RD
-+    |  .gpr64 extsw RD, RD
-     |  sub TMP2, TMP2, TMP1		// TMP2 = abs(x)
-     |  cntlzw TMP3, TMP2
-     |  subfic TMP1, TMP3, 0x40d		// TMP1 = exponent-1
-@@ -3706,8 +4072,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   add RD, RD, TMP1		// hi = hi + exponent-1
-     |    and RD, RD, TMP0		// hi = x == 0 ? 0 : hi
-     |  ins_next1
--    |    stwux RD, RA, BASE
--    |    stw ZERO, 4(RA)
-+    |    stwx RD, BASE_HI, RA
-+    |    stwx ZERO, BASE_LO, RA
-     |  ins_next2
-     |.endif
-     break;
-@@ -3723,15 +4089,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  srwi TMP1, RD, 3
-     |  not TMP0, TMP1
-     |  ins_next1
--    |  stwx TMP0, BASE, RA
-+    |  stwx TMP0, BASE_HI, RA
-     |  ins_next2
-     break;
-   case BC_KNIL:
-     |  // RA = base*8, RD = end*8
--    |  stwx TISNIL, BASE, RA
-+    |  stwx TISNIL, BASE_HI, RA
-     |   addi RA, RA, 8
-     |1:
--    |  stwx TISNIL, BASE, RA
-+    |  stwx TISNIL, BASE_HI, RA
-     |  cmpw RA, RD
-     |   addi RA, RA, 8
-     |  blt <1
-@@ -3763,10 +4129,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   lwz CARG2, UPVAL:RB->v
-     |  andix. TMP3, TMP3, LJ_GC_BLACK	// isblack(uv)
-     |    lbz TMP0, UPVAL:RB->closed
--    |   lwz TMP2, 0(RD)
-+    |   lwz TMP2, WORD_HI(RD)
-     |   stfd f0, 0(CARG2)
-     |    cmplwi cr1, TMP0, 0
--    |   lwz TMP1, 4(RD)
-+    |   lwz TMP1, WORD_LO(RD)
-     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
-     |   subi TMP2, TMP2, (LJ_TNUMX+1)
-     |  bne >2				// Upvalue is closed and black?
-@@ -3799,8 +4165,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   lbz TMP3, STR:TMP1->marked
-     |   lbz TMP2, UPVAL:RB->closed
-     |   li TMP0, LJ_TSTR
--    |   stw STR:TMP1, 4(CARG2)
--    |   stw TMP0, 0(CARG2)
-+    |   stw STR:TMP1, WORD_LO(CARG2)
-+    |   stw TMP0, WORD_HI(CARG2)
-     |  bne >2
-     |1:
-     |  ins_next
-@@ -3837,7 +4203,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  lwzx UPVAL:RB, LFUNC:RB, RA
-     |  ins_next1
-     |  lwz TMP1, UPVAL:RB->v
--    |  stw TMP0, 0(TMP1)
-+    |  stw TMP0, WORD_HI(TMP1)
-     |  ins_next2
-     break;
- 
-@@ -3852,6 +4218,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   add CARG2, BASE, RA
-     |  bl extern lj_func_closeuv	// (lua_State *L, TValue *level)
-     |  lp BASE, L->base
-+    |  addi BASEP4, BASE, 4
-     |1:
-     |  ins_next
-     break;
-@@ -3870,8 +4237,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Returns GCfuncL *.
-     |  lp BASE, L->base
-     |   li TMP0, LJ_TFUNC
--    |  stwux TMP0, RA, BASE
--    |  stw LFUNC:CRET1, 4(RA)
-+    |  addi BASEP4, BASE, 4
-+    |  stwx TMP0, BASE_HI, RA
-+    |  stwx LFUNC:CRET1, BASE_LO, RA
-     |  ins_next
-     break;
- 
-@@ -3904,8 +4272,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |  lp BASE, L->base
-     |   li TMP0, LJ_TTAB
--    |  stwux TMP0, RA, BASE
--    |  stw TAB:CRET1, 4(RA)
-+    |  addi BASEP4, BASE, 4
-+    |  stwx TMP0, BASE_HI, RA
-+    |  stwx TAB:CRET1, BASE_LO, RA
-     |  ins_next
-     if (op == BC_TNEW) {
-       |3:
-@@ -3938,13 +4307,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_TGETV:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  lwzux CARG1, RB, BASE
--    |  lwzux CARG2, RC, BASE
--    |   lwz TAB:RB, 4(RB)
-+    |  lwzx CARG1, BASE_HI, RB
-+    |  lwzx CARG2, BASE_HI, RC
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |.if DUALNUM
--    |   lwz RC, 4(RC)
-+    |   lwzx RC, BASE_LO, RC
-     |.else
--    |   lfd f0, 0(RC)
-+    |   lfdx f0, BASE, RC
-     |.endif
-     |  checktab CARG1
-     |   checknum cr1, CARG2
-@@ -3971,8 +4340,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   slwi TMP2, TMP2, 3
-     |.endif
-     |  ble ->vmeta_tgetv		// Integer key and in array part?
--    |  lwzx TMP0, TMP1, TMP2
--    |   lfdx f14, TMP1, TMP2
-+    |  .if ENDIAN_LE
-+    |    lfdux f14, TMP1, TMP2
-+    |    lwz TMP0, WORD_HI(TMP1)
-+    |  .else
-+    |    lwzx TMP0, TMP1, TMP2
-+    |    lfdx f14, TMP1, TMP2
-+    |  .endif
-     |  checknil TMP0; beq >2
-     |1:
-     |  ins_next1
-@@ -3991,15 +4365,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:
-     |  checkstr CARG2; bne ->vmeta_tgetv
-     |.if not DUALNUM
--    |  lwz STR:RC, 4(RC)
-+    |  lwzx STR:RC, BASE_LO, RC
-     |.endif
-     |  b ->BC_TGETS_Z			// String key?
-     break;
-   case BC_TGETS:
-     |  // RA = dst*8, RB = table*8, RC = str_const*8 (~)
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP1, RC, 1
--    |    lwz TAB:RB, 4(RB)
-+    |    lwzx TAB:RB, BASE_LO, RB
-     |   subfic TMP1, TMP1, -4
-     |  checktab CARG1
-     |   lwzx STR:RC, KBASE, TMP1	// KBASE-4-str_const*4
-@@ -4015,16 +4389,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  sub TMP1, TMP0, TMP1
-     |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-     |1:
--    |  lwz CARG1, NODE:TMP2->key
--    |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--    |    lwz CARG2, NODE:TMP2->val
--    |     lwz TMP1, 4+offsetof(Node, val)(NODE:TMP2)
-+    |  lwz CARG1, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+    |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+    |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-+    |     lwz TMP1, WORD_LO+offsetof(Node, val)(NODE:TMP2)
-     |  checkstr CARG1; bne >4
-     |   cmpw TMP0, STR:RC; bne >4
-     |    checknil CARG2; beq >5		// Key found, but nil value?
-     |3:
--    |    stwux CARG2, RA, BASE
--    |     stw TMP1, 4(RA)
-+    |    stwx CARG2, BASE_HI, RA
-+    |     stwx TMP1, BASE_LO, RA
-     |  ins_next
-     |
-     |4:  // Follow hash chain.
-@@ -4045,15 +4419,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TGETB:
-     |  // RA = dst*8, RB = table*8, RC = index*8
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP0, RC, 3
--    |   lwz TAB:RB, 4(RB)
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |  checktab CARG1; bne ->vmeta_tgetb
-     |  lwz TMP1, TAB:RB->asize
-     |   lwz TMP2, TAB:RB->array
-     |  cmplw TMP0, TMP1; bge ->vmeta_tgetb
--    |  lwzx TMP1, TMP2, RC
--    |   lfdx f0, TMP2, RC
-+    |  .if ENDIAN_LE
-+    |    lfdux f0, TMP2, RC
-+    |    lwz TMP1, WORD_HI(TMP2)
-+    |  .else
-+    |    lwzx TMP1, TMP2, RC
-+    |    lfdx f0, TMP2, RC
-+    |  .endif
-     |  checknil TMP1; beq >5
-     |1:
-     |  ins_next1
-@@ -4071,12 +4450,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TGETR:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  add RB, BASE, RB
--    |  lwz TAB:CARG1, 4(RB)
-+    |  lwzx TAB:CARG1, BASE_LO, RB
-     |.if DUALNUM
--    |  add RC, BASE, RC
-     |  lwz TMP0, TAB:CARG1->asize
--    |  lwz CARG2, 4(RC)
-+    |  lwzx CARG2, BASE_LO, RC
-     |   lwz TMP1, TAB:CARG1->array
-     |.else
-     |  lfdx f0, BASE, RC
-@@ -4096,13 +4473,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_TSETV:
-     |  // RA = src*8, RB = table*8, RC = key*8
--    |  lwzux CARG1, RB, BASE
--    |  lwzux CARG2, RC, BASE
--    |   lwz TAB:RB, 4(RB)
-+    |  lwzx CARG1, BASE_HI, RB
-+    |  lwzx CARG2, BASE_HI, RC
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |.if DUALNUM
--    |   lwz RC, 4(RC)
-+    |   lwzx RC, BASE_LO, RC
-     |.else
--    |   lfd f0, 0(RC)
-+    |   lfdx f0, BASE, RC
-     |.endif
-     |  checktab CARG1
-     |   checknum cr1, CARG2
-@@ -4129,7 +4506,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   slwi TMP0, TMP2, 3
-     |.endif
-     |  ble ->vmeta_tsetv		// Integer key and in array part?
-+    |  .if ENDIAN_LE
-+    |   addi TMP2, TMP1, 4
-+    |   lwzx TMP2, TMP2, TMP0
-+    |  .else
-     |   lwzx TMP2, TMP1, TMP0
-+    |  .endif
-     |  lbz TMP3, TAB:RB->marked
-     |    lfdx f14, BASE, RA
-     |   checknil TMP2; beq >3
-@@ -4152,7 +4534,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:
-     |  checkstr CARG2; bne ->vmeta_tsetv
-     |.if not DUALNUM
--    |  lwz STR:RC, 4(RC)
-+    |  lwzx STR:RC, BASE_LO, RC
-     |.endif
-     |  b ->BC_TSETS_Z			// String key?
-     |
-@@ -4162,9 +4544,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETS:
-     |  // RA = src*8, RB = table*8, RC = str_const*8 (~)
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP1, RC, 1
--    |    lwz TAB:RB, 4(RB)
-+    |    lwzx TAB:RB, BASE_LO, RB
-     |   subfic TMP1, TMP1, -4
-     |  checktab CARG1
-     |   lwzx STR:RC, KBASE, TMP1	// KBASE-4-str_const*4
-@@ -4183,9 +4565,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    lbz TMP3, TAB:RB->marked
-     |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-     |1:
--    |  lwz CARG1, NODE:TMP2->key
--    |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--    |    lwz CARG2, NODE:TMP2->val
-+    |  lwz CARG1, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+    |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+    |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-     |     lwz NODE:TMP1, NODE:TMP2->next
-     |  checkstr CARG1; bne >5
-     |   cmpw TMP0, STR:RC; bne >5
-@@ -4225,13 +4607,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  beq ->vmeta_tsets		// 'no __newindex' flag NOT set: check.
-     |6:
-     |  li TMP0, LJ_TSTR
--    |   stw STR:RC, 4(CARG3)
-+    |   stw STR:RC, WORD_LO(CARG3)
-     |   mr CARG2, TAB:RB
--    |  stw TMP0, 0(CARG3)
-+    |  stw TMP0, WORD_HI(CARG3)
-     |  bl extern lj_tab_newkey		// (lua_State *L, GCtab *t, TValue *k)
-     |  // Returns TValue *.
-     |  lp BASE, L->base
-     |  stfd f14, 0(CRET1)
-+    |   addi BASEP4, BASE, 4
-     |  b <3				// No 2nd write barrier needed.
-     |
-     |7:  // Possible table write barrier for the value. Skip valiswhite check.
-@@ -4240,9 +4623,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETB:
-     |  // RA = src*8, RB = table*8, RC = index*8
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP0, RC, 3
--    |   lwz TAB:RB, 4(RB)
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |  checktab CARG1; bne ->vmeta_tsetb
-     |  lwz TMP1, TAB:RB->asize
-     |   lwz TMP2, TAB:RB->array
-@@ -4250,7 +4633,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  cmplw TMP0, TMP1
-     |   lfdx f14, BASE, RA
-     |  bge ->vmeta_tsetb
--    |  lwzx TMP1, TMP2, RC
-+    |  .if ENDIAN_LE
-+    |    addi TMP1, TMP2, 4
-+    |    lwzx TMP1, TMP1, RC
-+    |  .else
-+    |    lwzx TMP1, TMP2, RC
-+    |  .endif
-     |  checknil TMP1; beq >5
-     |1:
-     |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
-@@ -4274,13 +4662,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETR:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  add RB, BASE, RB
--    |  lwz TAB:CARG2, 4(RB)
-+    |  lwzx TAB:CARG2, BASE_LO, RB
-     |.if DUALNUM
--    |  add RC, BASE, RC
-     |    lbz TMP3, TAB:CARG2->marked
-     |  lwz TMP0, TAB:CARG2->asize
--    |  lwz CARG3, 4(RC)
-+    |  lwzx CARG3, BASE_LO, RC
-     |   lwz TMP1, TAB:CARG2->array
-     |.else
-     |  lfdx f0, BASE, RC
-@@ -4311,9 +4697,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  add RA, BASE, RA
-     |1:
-     |   add TMP3, KBASE, RD
--    |  lwz TAB:CARG2, -4(RA)		// Guaranteed to be a table.
-+    |  lwz TAB:CARG2, WORD_LO-8(RA)	// Guaranteed to be a table.
-     |    addic. TMP0, MULTRES, -8
--    |   lwz TMP3, 4(TMP3)		// Integer constant is in lo-word.
-+    |   lwz TMP3, WORD_LO(TMP3)		// Integer constant is in lo-word.
-     |    srwi CARG3, TMP0, 3
-     |    beq >4				// Nothing to copy?
-     |  add CARG3, CARG3, TMP3
-@@ -4362,8 +4748,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_CALL:
-     |  // RA = base*8, (RB = (nresults+1)*8,) RC = (nargs+1)*8
-     |  mr TMP2, BASE
--    |  lwzux TMP0, BASE, RA
--    |   lwz LFUNC:RB, 4(BASE)
-+    |  lwzux2 TMP0, LFUNC:RB, BASE, RA
-     |    subi NARGS8:RC, NARGS8:RC, 8
-     |   addi BASE, BASE, 8
-     |  checkfunc TMP0; bne ->vmeta_call
-@@ -4377,8 +4762,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_CALLT:
-     |  // RA = base*8, (RB = 0,) RC = (nargs+1)*8
--    |  lwzux TMP0, RA, BASE
--    |   lwz LFUNC:RB, 4(RA)
-+    |  lwzux2 TMP0, LFUNC:RB, RA, BASE
-     |    subi NARGS8:RC, NARGS8:RC, 8
-     |    lwz TMP1, FRAME_PC(BASE)
-     |  checkfunc TMP0
-@@ -4430,12 +4814,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // RA = base*8, (RB = (nresults+1)*8, RC = (nargs+1)*8 ((2+1)*8))
-     |  mr TMP2, BASE
-     |  add BASE, BASE, RA
--    |  lwz TMP1, -24(BASE)
--    |   lwz LFUNC:RB, -20(BASE)
-+    |  lwz TMP1, WORD_HI-24(BASE)
-+    |   lwz LFUNC:RB, WORD_LO-24(BASE)
-     |    lfd f1, -8(BASE)
-     |    lfd f0, -16(BASE)
--    |  stw TMP1, 0(BASE)		// Copy callable.
--    |   stw LFUNC:RB, 4(BASE)
-+    |  stw TMP1, WORD_HI(BASE)		// Copy callable.
-+    |   stw LFUNC:RB, WORD_LO(BASE)
-     |  checkfunc TMP1
-     |    stfd f1, 16(BASE)		// Copy control var.
-     |     li NARGS8:RC, 16		// Iterators get 2 arguments.
-@@ -4450,8 +4834,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // NYI: add hotloop, record BC_ITERN.
-     |.endif
-     |  add RA, BASE, RA
--    |  lwz TAB:RB, -12(RA)
--    |  lwz RC, -4(RA)			// Get index from control var.
-+    |  lwz TAB:RB, WORD_LO-16(RA)
-+    |  lwz RC, WORD_LO-8(RA)		// Get index from control var.
-     |  lwz TMP0, TAB:RB->asize
-     |  lwz TMP1, TAB:RB->array
-     |   addi PC, PC, 4
-@@ -4459,14 +4843,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  cmplw RC, TMP0
-     |   slwi TMP3, RC, 3
-     |  bge >5				// Index points after array part?
--    |  lwzx TMP2, TMP1, TMP3
--    |   lfdx f0, TMP1, TMP3
-+    |  lfdux f0, TMP3, TMP1
-+    |   lwz TMP2, WORD_HI(TMP3)
-     |  checknil TMP2
-     |     lwz INS, -4(PC)
-     |  beq >4
-     |.if DUALNUM
--    |   stw RC, 4(RA)
--    |   stw TISNUM, 0(RA)
-+    |   stw RC, WORD_LO(RA)
-+    |   stw TISNUM, WORD_HI(RA)
-     |.else
-     |   tonum_u f1, RC
-     |.endif
-@@ -4474,7 +4858,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |     addis TMP3, PC, -(BCBIAS_J*4 >> 16)
-     |  stfd f0, 8(RA)
-     |     decode_RD4 TMP1, INS
--    |    stw RC, -4(RA)			// Update control var.
-+    |    stw RC, WORD_LO-8(RA)		// Update control var.
-     |     add PC, TMP1, TMP3
-     |.if not DUALNUM
-     |   stfd f1, 0(RA)
-@@ -4496,9 +4880,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgty <3
-     |   slwi RB, RC, 3
-     |   sub TMP3, TMP3, RB
--    |  lwzx RB, TMP2, TMP3
--    |  lfdx f0, TMP2, TMP3
--    |   add NODE:TMP3, TMP2, TMP3
-+    |  lfdux f0, TMP3, TMP2
-+    |  lwz RB, WORD_HI(TMP3)
-     |  checknil RB
-     |     lwz INS, -4(PC)
-     |  beq >7
-@@ -4510,7 +4893,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   stfd f1, 0(RA)
-     |    addi RC, RC, 1
-     |     add PC, TMP1, TMP2
--    |    stw RC, -4(RA)			// Update control var.
-+    |    stw RC, WORD_LO-8(RA)		// Update control var.
-     |  b <3
-     |
-     |7:  // Skip holes in hash part.
-@@ -4521,10 +4904,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISNEXT:
-     |  // RA = base*8, RD = target (points to ITERN)
-     |  add RA, BASE, RA
--    |  lwz TMP0, -24(RA)
--    |  lwz CFUNC:TMP1, -20(RA)
--    |   lwz TMP2, -16(RA)
--    |    lwz TMP3, -8(RA)
-+    |  lwz TMP0, WORD_HI-24(RA)
-+    |  lwz CFUNC:TMP1, WORD_LO-24(RA)
-+    |   lwz TMP2, WORD_HI-16(RA)
-+    |    lwz TMP3, WORD_HI-8(RA)
-     |   cmpwi cr0, TMP2, LJ_TTAB
-     |  cmpwi cr1, TMP0, LJ_TFUNC
-     |    cmpwi cr6, TMP3, LJ_TNIL
-@@ -4538,17 +4921,25 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bne cr0, >5
-     |  lus TMP1, 0xfffe
-     |  ori TMP1, TMP1, 0x7fff
--    |  stw ZERO, -4(RA)			// Initialize control var.
--    |  stw TMP1, -8(RA)
-+    |  stw ZERO, WORD_LO-8(RA)		// Initialize control var.
-+    |  stw TMP1, WORD_HI-8(RA)
-     |    addis PC, TMP3, -(BCBIAS_J*4 >> 16)
-     |1:
-     |  ins_next
-     |5:  // Despecialize bytecode if any of the checks fail.
-     |  li TMP0, BC_JMP
-     |   li TMP1, BC_ITERC
-+    |  .if ENDIAN_LE
-+    |  stb TMP0, -4(PC)
-+    |  .else
-     |  stb TMP0, -1(PC)
-+    |  .endif
-     |    addis PC, TMP3, -(BCBIAS_J*4 >> 16)
-+    |  .if ENDIAN_LE
-+    |   stb TMP1, 0(PC)
-+    |  .else
-     |   stb TMP1, 3(PC)
-+    |  .endif
-     |  b <1
-     break;
- 
-@@ -4582,7 +4973,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    addi RA, RA, 8
-     |   blt cr1, <1			// More vararg slots?
-     |2:  // Fill up remainder with nil.
--    |  stw TISNIL, 0(RA)
-+    |  stw TISNIL, WORD_HI(RA)
-     |  cmplw RA, TMP2
-     |   addi RA, RA, 8
-     |  blt <2
-@@ -4619,6 +5010,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  add RA, BASE, RA
-     |  add RC, BASE, SAVE0
-     |  subi TMP3, BASE, 8
-+    |  addi BASEP4, BASE, 4
-     |  b <6
-     break;
- 
-@@ -4667,13 +5059,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgt >6
-     |   sub BASE, TMP2, RA
-     |  lwz LFUNC:TMP1, FRAME_FUNC(BASE)
-+    |  addi BASEP4, BASE, 4
-     |  ins_next1
-     |  lwz TMP1, LFUNC:TMP1->pc
-     |  lwz KBASE, PC2PROTO(k)(TMP1)
-     |  ins_next2
-     |
-     |6:  // Fill up results with nil.
--    |  subi TMP1, RD, 8
-+    |  addi TMP1, RD, WORD_HI-8
-     |   addi RD, RD, 8
-     |  stwx TISNIL, TMP2, TMP1
-     |  b <5
-@@ -4709,13 +5102,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgt >6
-     |   sub BASE, TMP2, RA
-     |  lwz LFUNC:TMP1, FRAME_FUNC(BASE)
-+    |  addi BASEP4, BASE, 4
-     |  ins_next1
-     |  lwz TMP1, LFUNC:TMP1->pc
-     |  lwz KBASE, PC2PROTO(k)(TMP1)
-     |  ins_next2
-     |
-     |6:  // Fill up results with nil.
--    |  subi TMP1, RD, 8
-+    |  addi TMP1, RD, WORD_HI-8
-     |   addi RD, RD, 8
-     |  stwx TISNIL, TMP2, TMP1
-     |  b <5
-@@ -4741,11 +5135,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = (op == BC_IFORL || op == BC_JFORL);
-     |.if DUALNUM
-     |  // Integer loop.
--    |  lwzux TMP1, RA, BASE
--    |   lwz CARG1, FORL_IDX*8+4(RA)
-+    |  lwzux2 TMP1, CARG1, RA, BASE
-+    if (vk) {
-+      |  mtxer ZERO
-+    }
-     |  cmplw cr0, TMP1, TISNUM
-     if (vk) {
--      |   lwz CARG3, FORL_STEP*8+4(RA)
-+      |   lwz CARG3, FORL_STEP*8+WORD_LO(RA)
-       |  bne >9
-       |.if GPR64
-       |  // Need to check overflow for (a<<32) + (b<<32).
-@@ -4757,15 +5153,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |  addo. CARG1, CARG1, CARG3
-       |.endif
-       |    cmpwi cr6, CARG3, 0
--      |   lwz CARG2, FORL_STOP*8+4(RA)
--      |  bso >6
-+      |   lwz CARG2, FORL_STOP*8+WORD_LO(RA)
-+      |  bso >2
-       |4:
--      |  stw CARG1, FORL_IDX*8+4(RA)
-+      |  stw CARG1, FORL_IDX*8+WORD_LO(RA)
-     } else {
--      |  lwz TMP3, FORL_STEP*8(RA)
--      |   lwz CARG3, FORL_STEP*8+4(RA)
--      |  lwz TMP2, FORL_STOP*8(RA)
--      |   lwz CARG2, FORL_STOP*8+4(RA)
-+      |  lwz TMP3, FORL_STEP*8+WORD_HI(RA)
-+      |   lwz CARG3, FORL_STEP*8+WORD_LO(RA)
-+      |  lwz TMP2, FORL_STOP*8+WORD_HI(RA)
-+      |   lwz CARG2, FORL_STOP*8+WORD_LO(RA)
-       |  cmplw cr7, TMP3, TISNUM
-       |  cmplw cr1, TMP2, TISNUM
-       |  crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
-@@ -4776,11 +5172,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    blt cr6, >5
-     |  cmpw CARG1, CARG2
-     |1:
--    |   stw TISNUM, FORL_EXT*8(RA)
-+    |   stw TISNUM, FORL_EXT*8+WORD_HI(RA)
-     if (op != BC_JFORL) {
-       |  srwi RD, RD, 1
-     }
--    |   stw CARG1, FORL_EXT*8+4(RA)
-+    |   stw CARG1, FORL_EXT*8+WORD_LO(RA)
-     if (op != BC_JFORL) {
-       |  add RD, PC, RD
-     }
-@@ -4800,11 +5196,6 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:  // Invert check for negative step.
-     |  cmpw CARG2, CARG1
-     |  b <1
--    if (vk) {
--      |6:  // Potential overflow.
--      |  checkov TMP0, <4		// Ignore unrelated overflow.
--      |  b <2
--    }
-     |.endif
-     if (vk) {
-       |.if DUALNUM
-@@ -4815,14 +5206,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |.endif
-       |  lfd f3, FORL_STEP*8(RA)
-       |  lfd f2, FORL_STOP*8(RA)
--      |   lwz TMP3, FORL_STEP*8(RA)
-+      |   lwz TMP3, FORL_STEP*8+WORD_HI(RA)
-       |  fadd f1, f1, f3
-       |  stfd f1, FORL_IDX*8(RA)
-     } else {
-       |.if DUALNUM
-       |9:  // FP loop.
-       |.else
-+      |.if ENDIAN_LE
-+      |  lwzx TMP1, RA, BASE_LO
-+      |  add RA, RA, BASE
-+      |.else
-       |  lwzux TMP1, RA, BASE
-+      |.endif
-       |  lwz TMP3, FORL_STEP*8(RA)
-       |  lwz TMP2, FORL_STOP*8(RA)
-       |  cmplw cr0, TMP1, TISNUM
-@@ -4903,17 +5299,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- #endif
-   case BC_IITERL:
-     |  // RA = base*8, RD = target
--    |  lwzux TMP1, RA, BASE
--    |   lwz TMP2, 4(RA)
-+    |  lwzux2 TMP1, TMP2, RA, BASE
-     |  checknil TMP1; beq >1		// Stop if iterator returned nil.
-     if (op == BC_JITERL) {
--      |  stw TMP1, -8(RA)
--      |   stw TMP2, -4(RA)
-+      |  stw TMP1, WORD_HI-8(RA)
-+      |   stw TMP2, WORD_LO-8(RA)
-       |  b =>BC_JLOOP
-     } else {
-       |  branch_RD			// Otherwise save control var + branch.
--      |  stw TMP1, -8(RA)
--      |   stw TMP2, -4(RA)
-+      |  stw TMP1, WORD_HI-8(RA)
-+      |   stw TMP2, WORD_LO-8(RA)
-     }
-     |1:
-     |  ins_next
-@@ -4942,7 +5337,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Traces on PPC don't store the trace number, so use 0.
-     |   stw ZERO, DISPATCH_GL(vmstate)(DISPATCH)
-     |  lwzx TRACE:TMP2, TMP1, RD
--    |  clrso TMP1
-+    |  mtxer ZERO
-     |  lp TMP2, TRACE:TMP2->mcode
-     |   stw BASE, DISPATCH_GL(jit_base)(DISPATCH)
-     |  mtctr TMP2
-@@ -4994,7 +5389,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |
-     |3:  // Clear missing parameters.
--    |  stwx TISNIL, BASE, NARGS8:RC
-+    |  stwx TISNIL, BASE_HI, NARGS8:RC
-     |  addi NARGS8:RC, NARGS8:RC, 8
-     |  b <2
-     break;
-@@ -5011,11 +5406,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  lwz TMP2, L->maxstack
-     |   add TMP1, BASE, RC
-     |  add TMP0, RA, RC
--    |   stw LFUNC:RB, 4(TMP1)		// Store copy of LFUNC.
-+    |   stw LFUNC:RB, WORD_LO(TMP1)	// Store copy of LFUNC.
-     |   addi TMP3, RC, 8+FRAME_VARG
-     |    lwz KBASE, -4+PC2PROTO(k)(PC)
-     |  cmplw TMP0, TMP2
--    |   stw TMP3, 0(TMP1)		// Store delta + FRAME_VARG.
-+    |   stw TMP3, WORD_HI(TMP1)		// Store delta + FRAME_VARG.
-     |  bge ->vm_growstack_l
-     |  lbz TMP2, -4+PC2PROTO(numparams)(PC)
-     |   mr RA, BASE
-@@ -5026,18 +5421,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  beq >3
-     |1:
-     |  cmplw RA, RC			// Less args than parameters?
--    |   lwz TMP0, 0(RA)
--    |   lwz TMP3, 4(RA)
-+    |   lwz TMP0, WORD_HI(RA)
-+    |   lwz TMP3, WORD_LO(RA)
-     |  bge >4
--    |    stw TISNIL, 0(RA)		// Clear old fixarg slot (help the GC).
-+    |    stw TISNIL, WORD_HI(RA)	// Clear old fixarg slot (help the GC).
-     |    addi RA, RA, 8
-     |2:
-     |  addic. TMP2, TMP2, -1
--    |   stw TMP0, 8(TMP1)
--    |   stw TMP3, 12(TMP1)
-+    |   stw TMP0, WORD_HI+8(TMP1)
-+    |   stw TMP3, WORD_LO+8(TMP1)
-     |    addi TMP1, TMP1, 8
-     |  bne <1
-     |3:
-+    |  addi BASEP4, BASE, 4
-     |  ins_next2
-     |
-     |4:  // Clear missing parameters.
-@@ -5049,35 +5445,35 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_FUNCCW:
-     |  // BASE = new base, RA = BASE+framesize*8, RB = CFUNC, RC = nargs*8
-     if (op == BC_FUNCC) {
--      |  lp RD, CFUNC:RB->f
-+      |  lp FUNCREG, CFUNC:RB->f
-     } else {
--      |  lp RD, DISPATCH_GL(wrapf)(DISPATCH)
-+      |  lp FUNCREG, DISPATCH_GL(wrapf)(DISPATCH)
-     }
-     |   add TMP1, RA, NARGS8:RC
-     |   lwz TMP2, L->maxstack
--    |  .toc lp TMP3, 0(RD)
-+    |  .opd lp TMP3, 0(FUNCREG)
-     |    add RC, BASE, NARGS8:RC
-     |   stp BASE, L->base
-     |   cmplw TMP1, TMP2
-     |    stp RC, L->top
-     |     li_vmstate C
--    |.if TOC
-+    |.if OPD
-     |  mtctr TMP3
-     |.else
--    |  mtctr RD
-+    |  mtctr FUNCREG
-     |.endif
-     if (op == BC_FUNCCW) {
-       |  lp CARG2, CFUNC:RB->f
-     }
-     |  mr CARG1, L
-     |   bgt ->vm_growstack_c		// Need to grow stack.
--    |  .toc lp TOCREG, TOC_OFS(RD)
--    |  .tocenv lp ENVREG, ENV_OFS(RD)
-+    |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+    |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-     |     st_vmstate
-     |  bctrl				// (lua_State *L [, lua_CFunction f])
-+    |  .toc lp TOCREG, SAVE_TOC
-     |  // Returns nresults.
-     |  lp BASE, L->base
--    |  .toc ld TOCREG, SAVE_TOC
-     |   slwi RD, CRET1, 3
-     |  lp TMP1, L->top
-     |    li_vmstate INTERP
-@@ -5128,7 +5524,11 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.byte 0x1\n"
- 	"\t.string \"\"\n"
- 	"\t.uleb128 0x1\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.sleb128 -8\n"
-+#else
- 	"\t.sleb128 -4\n"
-+#endif
- 	"\t.byte 65\n"
- 	"\t.byte 0xc\n\t.uleb128 1\n\t.uleb128 0\n"
- 	"\t.align 2\n"
-@@ -5141,14 +5541,24 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long .Lbegin\n"
- 	"\t.long %d\n"
- 	"\t.byte 0xe\n\t.uleb128 %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+	"\t.byte 0x11\n\t.uleb128 70\n\t.sleb128 -1\n",
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
- 	"\t.byte 0x5\n\t.uleb128 70\n\t.uleb128 55\n",
-+#endif
- 	fcofs, CFRAME_SIZE);
-     for (i = 14; i <= 31; i++)
-       fprintf(ctx->fp,
- 	"\t.byte %d\n\t.uleb128 %d\n"
- 	"\t.byte %d\n\t.uleb128 %d\n",
--	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i));
-+#if LJ_ARCH_PPC32ON64
-+	0x80+i, 19+(31-i), 0x80+32+i, 1+(31-i)
-+#else
-+	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i)
-+#endif
-+      );
-     fprintf(ctx->fp,
- 	"\t.align 2\n"
- 	".LEFDE0:\n\n");
-@@ -5164,8 +5574,12 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long lj_vm_ffi_call\n"
- #endif
- 	"\t.long %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
--	"\t.byte 0x8e\n\t.uleb128 2\n"
-+#endif
-+	"\t.byte 0x8e\n\t.uleb128 1\n"
- 	"\t.byte 0xd\n\t.uleb128 0xe\n"
- 	"\t.align 2\n"
- 	".LEFDE1:\n\n", (int)ctx->codesz - fcofs);
-@@ -5180,7 +5594,11 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.byte 0x1\n"
- 	"\t.string \"zPR\"\n"
- 	"\t.uleb128 0x1\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.sleb128 -8\n"
-+#else
- 	"\t.sleb128 -4\n"
-+#endif
- 	"\t.byte 65\n"
- 	"\t.uleb128 6\n"			/* augmentation length */
- 	"\t.byte 0x1b\n"			/* pcrel|sdata4 */
-@@ -5198,14 +5616,24 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long %d\n"
- 	"\t.uleb128 0\n"			/* augmentation length */
- 	"\t.byte 0xe\n\t.uleb128 %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+	"\t.byte 0x11\n\t.uleb128 70\n\t.sleb128 -1\n",
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
- 	"\t.byte 0x5\n\t.uleb128 70\n\t.uleb128 55\n",
-+#endif
- 	fcofs, CFRAME_SIZE);
-     for (i = 14; i <= 31; i++)
-       fprintf(ctx->fp,
- 	"\t.byte %d\n\t.uleb128 %d\n"
- 	"\t.byte %d\n\t.uleb128 %d\n",
--	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i));
-+#if LJ_ARCH_PPC32ON64
-+	0x80+i, 19+(31-i), 0x80+32+i, 1+(31-i)
-+#else
-+	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i)
-+#endif
-+      );
-     fprintf(ctx->fp,
- 	"\t.align 2\n"
- 	".LEFDE2:\n\n");
-@@ -5233,8 +5661,12 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long lj_vm_ffi_call-.\n"
- 	"\t.long %d\n"
- 	"\t.uleb128 0\n"			/* augmentation length */
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
--	"\t.byte 0x8e\n\t.uleb128 2\n"
-+#endif
-+	"\t.byte 0x8e\n\t.uleb128 1\n"
- 	"\t.byte 0xd\n\t.uleb128 0xe\n"
- 	"\t.align 2\n"
- 	".LEFDE3:\n\n", (int)ctx->codesz - fcofs);
diff --git a/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch b/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
deleted file mode 100644
index f4e760b738361..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
+++ /dev/null
@@ -1,11 +0,0 @@
---- a/src/vm_ppc.dasc	2019-06-03 19:41:50.214671731 +0200
-+++ b/src/vm_ppc.dasc	2019-06-03 19:44:40.229686143 +0200
-@@ -2774,7 +2774,7 @@
-   |
-   |->vm_exit_handler:
-   |.if JIT
--  |  addi sp, TMP0, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-+  |  addi sp, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-   |  saver 3 // CARG1
-   |  saver 4 // CARG2
-   |  saver 5 // CARG3
diff --git a/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch b/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
deleted file mode 100644
index 487a1cd1ca787..0000000000000
--- a/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
+++ /dev/null
@@ -1,231 +0,0 @@
-commit 9da06535092d6d9dec442641a26c64bce5574322
-Author: Mike Pall <mike>
-Date:   Sun Jun 24 14:08:59 2018 +0200
-
-    ARM64: Fix exit stub patching.
-    
-    Contributed by Javier Guerra Giraldez.
-
-diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
-index cbb186d3..baafa21a 100644
---- a/src/lj_asm_arm64.h
-+++ b/src/lj_asm_arm64.h
-@@ -56,11 +56,11 @@ static void asm_exitstub_setup(ASMState *as, ExitNo nexits)
-     asm_mclimit(as);
-   /* 1: str lr,[sp]; bl ->vm_exit_handler; movz w0,traceno; bl <1; bl <1; ... */
-   for (i = nexits-1; (int32_t)i >= 0; i--)
--    *--mxp = A64I_LE(A64I_BL|((-3-i)&0x03ffffffu));
--  *--mxp = A64I_LE(A64I_MOVZw|A64F_U16(as->T->traceno));
-+    *--mxp = A64I_LE(A64I_BL | A64F_S26(-3-i));
-+  *--mxp = A64I_LE(A64I_MOVZw | A64F_U16(as->T->traceno));
-   mxp--;
--  *mxp = A64I_LE(A64I_BL|(((MCode *)(void *)lj_vm_exit_handler-mxp)&0x03ffffffu));
--  *--mxp = A64I_LE(A64I_STRx|A64F_D(RID_LR)|A64F_N(RID_SP));
-+  *mxp = A64I_LE(A64I_BL | A64F_S26(((MCode *)(void *)lj_vm_exit_handler-mxp)));
-+  *--mxp = A64I_LE(A64I_STRx | A64F_D(RID_LR) | A64F_N(RID_SP));
-   as->mctop = mxp;
- }
- 
-@@ -77,7 +77,7 @@ static void asm_guardcc(ASMState *as, A64CC cc)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_cond_branch(as, cc^1, p-1);
-     return;
-   }
-@@ -91,7 +91,7 @@ static void asm_guardtnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_tnb(as, ai^0x01000000u, r, bit, p-1);
-     return;
-   }
-@@ -105,7 +105,7 @@ static void asm_guardcnb(ASMState *as, A64Ins ai, Reg r)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_cnb(as, ai^0x01000000u, r, p-1);
-     return;
-   }
-@@ -1850,7 +1850,7 @@ static void asm_loop_fixup(ASMState *as)
-     p[-2] |= ((uint32_t)delta & mask) << 5;
-   } else {
-     ptrdiff_t delta = target - (p - 1);
--    p[-1] = A64I_B | ((uint32_t)(delta) & 0x03ffffffu);
-+    p[-1] = A64I_B | A64F_S26(delta);
-   }
- }
- 
-@@ -1919,7 +1919,7 @@ static void asm_tail_fixup(ASMState *as, TraceNo lnk)
-   }
-   /* Patch exit branch. */
-   target = lnk ? traceref(as->J, lnk)->mcode : (MCode *)lj_vm_exit_interp;
--  p[-1] = A64I_B | (((target-p)+1)&0x03ffffffu);
-+  p[-1] = A64I_B | A64F_S26((target-p)+1);
- }
- 
- /* Prepare tail of code. */
-@@ -1982,40 +1982,50 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
- {
-   MCode *p = T->mcode;
-   MCode *pe = (MCode *)((char *)p + T->szmcode);
--  MCode *cstart = NULL, *cend = p;
-+  MCode *cstart = NULL;
-   MCode *mcarea = lj_mcode_patch(J, p, 0);
-   MCode *px = exitstub_trace_addr(T, exitno);
-+  /* Note: this assumes a trace exit is only ever patched once. */
-   for (; p < pe; p++) {
-     /* Look for exitstub branch, replace with branch to target. */
-+    ptrdiff_t delta = target - p;
-     MCode ins = A64I_LE(*p);
-     if ((ins & 0xff000000u) == 0x54000000u &&
- 	((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) {
--      /* Patch bcc exitstub. */
--      *p = A64I_LE((ins & 0xff00001fu) | (((target-p)<<5) & 0x00ffffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch bcc, if within range. */
-+      if (A64F_S_OK(delta, 19)) {
-+	*p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta));
-+	if (!cstart) cstart = p;
-+      }
-     } else if ((ins & 0xfc000000u) == 0x14000000u &&
- 	       ((ins ^ (px-p)) & 0x03ffffffu) == 0) {
--      /* Patch b exitstub. */
--      *p = A64I_LE((ins & 0xfc000000u) | ((target-p) & 0x03ffffffu));
--      cend = p+1;
-+      /* Patch b. */
-+      lua_assert(A64F_S_OK(delta, 26));
-+      *p = A64I_LE((ins & 0xfc000000u) | A64F_S26(delta));
-       if (!cstart) cstart = p;
-     } else if ((ins & 0x7e000000u) == 0x34000000u &&
- 	       ((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) {
--      /* Patch cbz/cbnz exitstub. */
--      *p = A64I_LE((ins & 0xff00001f) | (((target-p)<<5) & 0x00ffffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch cbz/cbnz, if within range. */
-+      if (A64F_S_OK(delta, 19)) {
-+	*p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta));
-+	if (!cstart) cstart = p;
-+      }
-     } else if ((ins & 0x7e000000u) == 0x36000000u &&
- 	       ((ins ^ ((px-p)<<5)) & 0x0007ffe0u) == 0) {
--      /* Patch tbz/tbnz exitstub. */
--      *p = A64I_LE((ins & 0xfff8001fu) | (((target-p)<<5) & 0x0007ffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch tbz/tbnz, if within range. */
-+      if (A64F_S_OK(delta, 14)) {
-+	*p = A64I_LE((ins & 0xfff8001fu) | A64F_S14(delta));
-+	if (!cstart) cstart = p;
-+      }
-     }
-   }
--  lua_assert(cstart != NULL);
--  lj_mcode_sync(cstart, cend);
-+  {  /* Always patch long-range branch in exit stub itself. */
-+    ptrdiff_t delta = target - px;
-+    lua_assert(A64F_S_OK(delta, 26));
-+    *px = A64I_B | A64F_S26(delta);
-+    if (!cstart) cstart = px;
-+  }
-+  lj_mcode_sync(cstart, px+1);
-   lj_mcode_patch(J, mcarea, 1);
- }
- 
-diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
-index 6da4c7d4..1001b1d8 100644
---- a/src/lj_emit_arm64.h
-+++ b/src/lj_emit_arm64.h
-@@ -241,7 +241,7 @@ static void emit_loadk(ASMState *as, Reg rd, uint64_t u64, int is64)
- #define mcpofs(as, k) \
-   ((intptr_t)((uintptr_t)(k) - (uintptr_t)(as->mcp - 1)))
- #define checkmcpofs(as, k) \
--  ((((mcpofs(as, k)>>2) + 0x00040000) >> 19) == 0)
-+  (A64F_S_OK(mcpofs(as, k)>>2, 19))
- 
- static Reg ra_allock(ASMState *as, intptr_t k, RegSet allow);
- 
-@@ -312,7 +312,7 @@ static void emit_cond_branch(ASMState *as, A64CC cond, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x40000) >> 19) == 0);
-+  lua_assert(A64F_S_OK(delta, 19));
-   *p = A64I_BCC | A64F_S19(delta) | cond;
- }
- 
-@@ -320,24 +320,24 @@ static void emit_branch(ASMState *as, A64Ins ai, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x02000000) >> 26) == 0);
--  *p = ai | ((uint32_t)delta & 0x03ffffffu);
-+  lua_assert(A64F_S_OK(delta, 26));
-+  *p = ai | A64F_S26(delta);
- }
- 
- static void emit_tnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(bit < 63 && ((delta + 0x2000) >> 14) == 0);
-+  lua_assert(bit < 63 && A64F_S_OK(delta, 14));
-   if (bit > 31) ai |= A64I_X;
--  *p = ai | A64F_BIT(bit & 31) | A64F_S14((uint32_t)delta & 0x3fffu) | r;
-+  *p = ai | A64F_BIT(bit & 31) | A64F_S14(delta) | r;
- }
- 
- static void emit_cnb(ASMState *as, A64Ins ai, Reg r, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x40000) >> 19) == 0);
-+  lua_assert(A64F_S_OK(delta, 19));
-   *p = ai | A64F_S19(delta) | r;
- }
- 
-@@ -347,8 +347,8 @@ static void emit_call(ASMState *as, void *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = (char *)target - (char *)p;
--  if ((((delta>>2) + 0x02000000) >> 26) == 0) {
--    *p = A64I_BL | ((uint32_t)(delta>>2) & 0x03ffffffu);
-+  if (A64F_S_OK(delta>>2, 26)) {
-+    *p = A64I_BL | A64F_S26(delta>>2);
-   } else {  /* Target out of range: need indirect call. But don't use R0-R7. */
-     Reg r = ra_allock(as, i64ptr(target),
- 		      RSET_RANGE(RID_X8, RID_MAX_GPR)-RSET_FIXED);
-diff --git a/src/lj_target_arm64.h b/src/lj_target_arm64.h
-index 520023ae..a207a2ba 100644
---- a/src/lj_target_arm64.h
-+++ b/src/lj_target_arm64.h
-@@ -132,9 +132,9 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define A64F_IMMR(x)	((x) << 16)
- #define A64F_U16(x)	((x) << 5)
- #define A64F_U12(x)	((x) << 10)
--#define A64F_S26(x)	(x)
-+#define A64F_S26(x)	(((uint32_t)(x) & 0x03ffffffu))
- #define A64F_S19(x)	(((uint32_t)(x) & 0x7ffffu) << 5)
--#define A64F_S14(x)	((x) << 5)
-+#define A64F_S14(x)	(((uint32_t)(x) & 0x3fffu) << 5)
- #define A64F_S9(x)	((x) << 12)
- #define A64F_BIT(x)	((x) << 19)
- #define A64F_SH(sh, x)	(((sh) << 22) | ((x) << 10))
-@@ -145,6 +145,9 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define A64F_LSL16(x)	(((x) / 16) << 21)
- #define A64F_BSH(sh)	((sh) << 10)
- 
-+/* Check for valid field range. */
-+#define A64F_S_OK(x, b)	((((x) + (1 << (b-1))) >> (b)) == 0)
-+
- typedef enum A64Ins {
-   A64I_S = 0x20000000,
-   A64I_X = 0x80000000,
diff --git a/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch b/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
deleted file mode 100644
index c30264786755f..0000000000000
--- a/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
+++ /dev/null
@@ -1,29 +0,0 @@
-From: Jason Teplitz <jason@tensyr.com>
-Date: Mon, 9 Oct 2017 23:03:09 +0000
-Subject: Fix register allocation bug in arm64
-
----
- src/lj_asm_arm64.h | 3 +--
- 1 file changed, 1 insertion(+), 2 deletions(-)
-
-diff --git src/lj_asm_arm64.h src/lj_asm_arm64.h
-index 8fd92e7..549f8a6 100644
---- a/src/lj_asm_arm64.h
-+++ b/src/lj_asm_arm64.h
-@@ -871,7 +871,7 @@ static void asm_hrefk(ASMState *as, IRIns *ir)
-   int bigofs = !emit_checkofs(A64I_LDRx, ofs);
-   RegSet allow = RSET_GPR;
-   Reg dest = (ra_used(ir) || bigofs) ? ra_dest(as, ir, RSET_GPR) : RID_NONE;
--  Reg node = ra_alloc1(as, ir->op1, allow);
-+  Reg node = ra_alloc1(as, ir->op1, ra_hasreg(dest) ? rset_clear(allow, dest) : allow);
-   Reg key = ra_scratch(as, rset_clear(allow, node));
-   Reg idx = node;
-   uint64_t k;
-@@ -879,7 +879,6 @@ static void asm_hrefk(ASMState *as, IRIns *ir)
-   rset_clear(allow, key);
-   if (bigofs) {
-     idx = dest;
--    rset_clear(allow, dest);
-     kofs = (int32_t)offsetof(Node, key);
-   } else if (ra_hasreg(dest)) {
-     emit_opk(as, A64I_ADDx, dest, node, ofs, allow);
diff --git a/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch b/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
deleted file mode 100644
index a217866c392cf..0000000000000
--- a/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
+++ /dev/null
@@ -1,562 +0,0 @@
-From e9af1abec542e6f9851ff2368e7f196b6382a44c Mon Sep 17 00:00:00 2001
-From: Mike Pall <mike>
-Date: Wed, 30 Sep 2020 01:31:27 +0200
-Subject: [PATCH] Add support for full-range 64 bit lightuserdata.
-
----
- doc/status.html   | 11 ---------
- src/jit/dump.lua  |  4 +++-
- src/lib_debug.c   | 12 +++++-----
- src/lib_jit.c     | 14 ++++++------
- src/lib_package.c |  8 +++----
- src/lib_string.c  |  2 +-
- src/lj_api.c      | 40 +++++++++++++++++++++++++++++----
- src/lj_ccall.c    |  2 +-
- src/lj_cconv.c    |  2 +-
- src/lj_crecord.c  |  6 ++---
- src/lj_dispatch.c |  2 +-
- src/lj_ir.c       |  6 +++--
- src/lj_obj.c      |  5 +++--
- src/lj_obj.h      | 57 ++++++++++++++++++++++++++++++-----------------
- src/lj_snap.c     |  9 +++++++-
- src/lj_state.c    |  6 +++++
- src/lj_strfmt.c   |  2 +-
- 17 files changed, 121 insertions(+), 67 deletions(-)
-
-#diff --git a/doc/status.html b/doc/status.html
-#index 0aafe13a2..fd0ae8bae 100644
-#--- a/doc/status.html
-#+++ b/doc/status.html
-#@@ -91,17 +91,6 @@ <h2>Current Status</h2>
-# <tt>lua_atpanic</tt> on x64. This issue will be fixed with the new
-# garbage collector.
-# </li>
-#-<li>
-#-LuaJIT on 64 bit systems provides a <b>limited range</b> of 47 bits for the
-#-<b>legacy <tt>lightuserdata</tt></b> data type.
-#-This is only relevant on x64 systems which use the negative part of the
-#-virtual address space in user mode, e.g. Solaris/x64, and on ARM64 systems
-#-configured with a 48 bit or 52 bit VA.
-#-Avoid using <tt>lightuserdata</tt> to hold pointers that may point outside
-#-of that range, e.g. variables on the stack. In general, avoid this data
-#-type for new code and replace it with (much more performant) FFI bindings.
-#-FFI cdata pointers can address the full 64 bit range.
-#-</li>
-# </ul>
-# <br class="flush">
-# </div>
-Index: luajit/src/jit/dump.lua
-===================================================================
---- luajit.orig/src/jit/dump.lua
-+++ luajit/src/jit/dump.lua
-@@ -315,7 +315,9 @@
-   local tn = type(k)
-   local s
-   if tn == "number" then
--    if band(sn or 0, 0x30000) ~= 0 then
-+    if t < 12 then
-+      s = k == 0 and "NULL" or format("[0x%08x]", k)
-+    elseif band(sn or 0, 0x30000) ~= 0 then
-       s = band(sn, 0x20000) ~= 0 and "contpc" or "ftsz"
-     elseif k == 2^52+2^51 then
-       s = "bias"
-Index: luajit/src/lib_debug.c
-===================================================================
---- luajit.orig/src/lib_debug.c
-+++ luajit/src/lib_debug.c
-@@ -231,8 +231,8 @@
-   int32_t n = lj_lib_checkint(L, 2) - 1;
-   if ((uint32_t)n >= fn->l.nupvalues)
-     lj_err_arg(L, 2, LJ_ERR_IDXRNG);
--  setlightudV(L->top-1, isluafunc(fn) ? (void *)gcref(fn->l.uvptr[n]) :
--					(void *)&fn->c.upvalue[n]);
-+  lua_pushlightuserdata(L, isluafunc(fn) ? (void *)gcref(fn->l.uvptr[n]) :
-+					   (void *)&fn->c.upvalue[n]);
-   return 1;
- }
- 
-@@ -283,13 +283,13 @@
- 
- /* ------------------------------------------------------------------------ */
- 
--#define KEY_HOOK	((void *)0x3004)
-+#define KEY_HOOK	(U64x(80000000,00000000)|'h')
- 
- static void hookf(lua_State *L, lua_Debug *ar)
- {
-   static const char *const hooknames[] =
-     {"call", "return", "line", "count", "tail return"};
--  lua_pushlightuserdata(L, KEY_HOOK);
-+  (L->top++)->u64 = KEY_HOOK;
-   lua_rawget(L, LUA_REGISTRYINDEX);
-   if (lua_isfunction(L, -1)) {
-     lua_pushstring(L, hooknames[(int)ar->event]);
-@@ -334,7 +334,7 @@
-     count = luaL_optint(L, arg+3, 0);
-     func = hookf; mask = makemask(smask, count);
-   }
--  lua_pushlightuserdata(L, KEY_HOOK);
-+  (L->top++)->u64 = KEY_HOOK;
-   lua_pushvalue(L, arg+1);
-   lua_rawset(L, LUA_REGISTRYINDEX);
-   lua_sethook(L, func, mask, count);
-@@ -349,7 +349,7 @@
-   if (hook != NULL && hook != hookf) {  /* external hook? */
-     lua_pushliteral(L, "external hook");
-   } else {
--    lua_pushlightuserdata(L, KEY_HOOK);
-+    (L->top++)->u64 = KEY_HOOK;
-     lua_rawget(L, LUA_REGISTRYINDEX);   /* get hook */
-   }
-   lua_pushstring(L, unmakemask(mask, buff));
-Index: luajit/src/lib_jit.c
-===================================================================
---- luajit.orig/src/lib_jit.c
-+++ luajit/src/lib_jit.c
-@@ -540,15 +540,15 @@
- 
- /* Not loaded by default, use: local profile = require("jit.profile") */
- 
--static const char KEY_PROFILE_THREAD = 't';
--static const char KEY_PROFILE_FUNC = 'f';
-+#define KEY_PROFILE_THREAD	(U64x(80000000,00000000)|'t')
-+#define KEY_PROFILE_FUNC	(U64x(80000000,00000000)|'f')
- 
- static void jit_profile_callback(lua_State *L2, lua_State *L, int samples,
- 				 int vmstate)
- {
-   TValue key;
-   cTValue *tv;
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   tv = lj_tab_get(L, tabV(registry(L)), &key);
-   if (tvisfunc(tv)) {
-     char vmst = (char)vmstate;
-@@ -575,9 +575,9 @@
-   lua_State *L2 = lua_newthread(L);  /* Thread that runs profiler callback. */
-   TValue key;
-   /* Anchor thread and function in registry. */
--  setlightudV(&key, (void *)&KEY_PROFILE_THREAD);
-+  key.u64 = KEY_PROFILE_THREAD;
-   setthreadV(L, lj_tab_set(L, registry, &key), L2);
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   setfuncV(L, lj_tab_set(L, registry, &key), func);
-   lj_gc_anybarriert(L, registry);
-   luaJIT_profile_start(L, mode ? strdata(mode) : "",
-@@ -592,9 +592,9 @@
-   TValue key;
-   luaJIT_profile_stop(L);
-   registry = tabV(registry(L));
--  setlightudV(&key, (void *)&KEY_PROFILE_THREAD);
-+  key.u64 = KEY_PROFILE_THREAD;
-   setnilV(lj_tab_set(L, registry, &key));
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   setnilV(lj_tab_set(L, registry, &key));
-   lj_gc_anybarriert(L, registry);
-   return 0;
-Index: luajit/src/lib_package.c
-===================================================================
---- luajit.orig/src/lib_package.c
-+++ luajit/src/lib_package.c
-@@ -398,7 +398,7 @@
- 
- /* ------------------------------------------------------------------------ */
- 
--#define sentinel	((void *)0x4004)
-+#define KEY_SENTINEL	(U64x(80000000,00000000)|'s')
- 
- static int lj_cf_package_require(lua_State *L)
- {
-@@ -408,7 +408,7 @@
-   lua_getfield(L, LUA_REGISTRYINDEX, "_LOADED");
-   lua_getfield(L, 2, name);
-   if (lua_toboolean(L, -1)) {  /* is it there? */
--    if (lua_touserdata(L, -1) == sentinel)  /* check loops */
-+    if ((L->top-1)->u64 == KEY_SENTINEL)  /* check loops */
-       luaL_error(L, "loop or previous error loading module " LUA_QS, name);
-     return 1;  /* package is already loaded */
-   }
-@@ -431,14 +431,14 @@
-     else
-       lua_pop(L, 1);
-   }
--  lua_pushlightuserdata(L, sentinel);
-+  (L->top++)->u64 = KEY_SENTINEL;
-   lua_setfield(L, 2, name);  /* _LOADED[name] = sentinel */
-   lua_pushstring(L, name);  /* pass name as argument to module */
-   lua_call(L, 1, 1);  /* run loaded module */
-   if (!lua_isnil(L, -1))  /* non-nil return? */
-     lua_setfield(L, 2, name);  /* _LOADED[name] = returned value */
-   lua_getfield(L, 2, name);
--  if (lua_touserdata(L, -1) == sentinel) {   /* module did not set a value? */
-+  if ((L->top-1)->u64 == KEY_SENTINEL) {   /* module did not set a value? */
-     lua_pushboolean(L, 1);  /* use true as result */
-     lua_pushvalue(L, -1);  /* extra copy to be returned */
-     lua_setfield(L, 2, name);  /* _LOADED[name] = true */
-Index: luajit/src/lib_string.c
-===================================================================
---- luajit.orig/src/lib_string.c
-+++ luajit/src/lib_string.c
-@@ -714,7 +714,7 @@
- 	lj_strfmt_putfchar(sb, sf, lj_lib_checkint(L, arg));
- 	break;
-       case STRFMT_PTR:  /* No formatting. */
--	lj_strfmt_putptr(sb, lj_obj_ptr(L->base+arg-1));
-+	lj_strfmt_putptr(sb, lj_obj_ptr(G(L), L->base+arg-1));
- 	break;
-       default:
- 	lua_assert(0);
-Index: luajit/src/lj_api.c
-===================================================================
---- luajit.orig/src/lj_api.c
-+++ luajit/src/lj_api.c
-@@ -595,7 +595,7 @@
-   if (tvisudata(o))
-     return uddata(udataV(o));
-   else if (tvislightud(o))
--    return lightudV(o);
-+    return lightudV(G(L), o);
-   else
-     return NULL;
- }
-@@ -608,7 +608,7 @@
- 
- LUA_API const void *lua_topointer(lua_State *L, int idx)
- {
--  return lj_obj_ptr(index2adr(L, idx));
-+  return lj_obj_ptr(G(L), index2adr(L, idx));
- }
- 
- /* -- Stack setters (object creation) ------------------------------------- */
-@@ -694,9 +694,38 @@
-   incr_top(L);
- }
- 
-+#if LJ_64
-+static void *lightud_intern(lua_State *L, void *p)
-+{
-+  global_State *g = G(L);
-+  uint64_t u = (uint64_t)p;
-+  uint32_t up = lightudup(u);
-+  uint32_t *segmap = mref(g->gc.lightudseg, uint32_t);
-+  MSize segnum = g->gc.lightudnum;
-+  if (segmap) {
-+    MSize seg;
-+    for (seg = 0; seg <= segnum; seg++)
-+      if (segmap[seg] == up)  /* Fast path. */
-+	return (void *)(((uint64_t)seg << LJ_LIGHTUD_BITS_LO) | lightudlo(u));
-+    segnum++;
-+  }
-+  if (!((segnum-1) & segnum) && segnum != 1) {
-+    if (segnum >= (1 << LJ_LIGHTUD_BITS_SEG)) lj_err_msg(L, LJ_ERR_BADLU);
-+    lj_mem_reallocvec(L, segmap, segnum, segnum ? 2*segnum : 2u, uint32_t);
-+    setmref(g->gc.lightudseg, segmap);
-+  }
-+  g->gc.lightudnum = segnum;
-+  segmap[segnum] = up;
-+  return (void *)(((uint64_t)segnum << LJ_LIGHTUD_BITS_LO) | lightudlo(u));
-+}
-+#endif
-+
- LUA_API void lua_pushlightuserdata(lua_State *L, void *p)
- {
--  setlightudV(L->top, checklightudptr(L, p));
-+#if LJ_64
-+  p = lightud_intern(L, p);
-+#endif
-+  setrawlightudV(L->top, p);
-   incr_top(L);
- }
- 
-@@ -1138,7 +1167,10 @@
-   fn->c.f = func;
-   setfuncV(L, top++, fn);
-   if (LJ_FR2) setnilV(top++);
--  setlightudV(top++, checklightudptr(L, ud));
-+#if LJ_64
-+  ud = lightud_intern(L, ud);
-+#endif
-+  setrawlightudV(top++, ud);
-   cframe_nres(L->cframe) = 1+0;  /* Zero results. */
-   L->top = top;
-   return top-1;  /* Now call the newly allocated C function. */
-Index: luajit/src/lj_ccall.c
-===================================================================
---- luajit.orig/src/lj_ccall.c
-+++ luajit/src/lj_ccall.c
-@@ -1314,7 +1314,7 @@
-     lj_vm_ffi_call(&cc);
-     if (cts->cb.slot != ~0u) {  /* Blacklist function that called a callback. */
-       TValue tv;
--      setlightudV(&tv, (void *)cc.func);
-+      tv.u64 = ((uintptr_t)(void *)cc.func >> 2) | U64x(800000000, 00000000);
-       setboolV(lj_tab_set(L, cts->miscmap, &tv), 1);
-     }
-     ct = (CType *)((intptr_t)ct+(intptr_t)cts->tab);  /* May be reallocated. */
-Index: luajit/src/lj_cconv.c
-===================================================================
---- luajit.orig/src/lj_cconv.c
-+++ luajit/src/lj_cconv.c
-@@ -611,7 +611,7 @@
-     if (ud->udtype == UDTYPE_IO_FILE)
-       tmpptr = *(void **)tmpptr;
-   } else if (tvislightud(o)) {
--    tmpptr = lightudV(o);
-+    tmpptr = lightudV(cts->g, o);
-   } else if (tvisfunc(o)) {
-     void *p = lj_ccallback_new(cts, d, funcV(o));
-     if (p) {
-Index: luajit/src/lj_crecord.c
-===================================================================
---- luajit.orig/src/lj_crecord.c
-+++ luajit/src/lj_crecord.c
-@@ -643,8 +643,7 @@
-     }
-   } else if (tref_islightud(sp)) {
- #if LJ_64
--    sp = emitir(IRT(IR_BAND, IRT_P64), sp,
--		lj_ir_kint64(J, U64x(00007fff,ffffffff)));
-+    lj_trace_err(J, LJ_TRERR_NYICONV);
- #endif
-   } else {  /* NYI: tref_istab(sp). */
-     IRType t;
-@@ -1209,8 +1208,7 @@
-     TRef tr;
-     TValue tv;
-     /* Check for blacklisted C functions that might call a callback. */
--    setlightudV(&tv,
--		cdata_getptr(cdataptr(cd), (LJ_64 && tp == IRT_P64) ? 8 : 4));
-+    tv.u64 = ((uintptr_t)cdata_getptr(cdataptr(cd), (LJ_64 && tp == IRT_P64) ? 8 : 4) >> 2) | U64x(800000000, 00000000);
-     if (tvistrue(lj_tab_get(J->L, cts->miscmap, &tv)))
-       lj_trace_err(J, LJ_TRERR_BLACKL);
-     if (ctype_isvoid(ctr->info)) {
-Index: luajit/src/lj_dispatch.c
-===================================================================
---- luajit.orig/src/lj_dispatch.c
-+++ luajit/src/lj_dispatch.c
-@@ -302,7 +302,7 @@
-       if (idx != 0) {
- 	cTValue *tv = idx > 0 ? L->base + (idx-1) : L->top + idx;
- 	if (tvislightud(tv))
--	  g->wrapf = (lua_CFunction)lightudV(tv);
-+	  g->wrapf = (lua_CFunction)lightudV(g, tv);
- 	else
- 	  return 0;  /* Failed. */
-       } else {
-Index: luajit/src/lj_ir.c
-===================================================================
---- luajit.orig/src/lj_ir.c
-+++ luajit/src/lj_ir.c
-@@ -386,8 +386,10 @@
-   case IR_KPRI: setpriV(tv, irt_toitype(ir->t)); break;
-   case IR_KINT: setintV(tv, ir->i); break;
-   case IR_KGC: setgcV(L, tv, ir_kgc(ir), irt_toitype(ir->t)); break;
--  case IR_KPTR: case IR_KKPTR: setlightudV(tv, ir_kptr(ir)); break;
--  case IR_KNULL: setlightudV(tv, NULL); break;
-+  case IR_KPTR: case IR_KKPTR:
-+    setnumV(tv, (lua_Number)(uintptr_t)ir_kptr(ir));
-+    break;
-+  case IR_KNULL: setintV(tv, 0); break;
-   case IR_KNUM: setnumV(tv, ir_knum(ir)->n); break;
- #if LJ_HASFFI
-   case IR_KINT64: {
-Index: luajit/src/lj_obj.c
-===================================================================
---- luajit.orig/src/lj_obj.c
-+++ luajit/src/lj_obj.c
-@@ -34,12 +34,13 @@
- }
- 
- /* Return pointer to object or its object data. */
--const void * LJ_FASTCALL lj_obj_ptr(cTValue *o)
-+const void * LJ_FASTCALL lj_obj_ptr(global_State *g, cTValue *o)
- {
-+  UNUSED(g);
-   if (tvisudata(o))
-     return uddata(udataV(o));
-   else if (tvislightud(o))
--    return lightudV(o);
-+    return lightudV(g, o);
-   else if (LJ_HASFFI && tviscdata(o))
-     return cdataptr(cdataV(o));
-   else if (tvisgcv(o))
-Index: luajit/src/lj_obj.h
-===================================================================
---- luajit.orig/src/lj_obj.h
-+++ luajit/src/lj_obj.h
-@@ -232,7 +232,7 @@
- **                  ---MSW---.---LSW---
- ** primitive types |  itype  |         |
- ** lightuserdata   |  itype  |  void * |  (32 bit platforms)
--** lightuserdata   |ffff|    void *    |  (64 bit platforms, 47 bit pointers)
-+** lightuserdata   |ffff|seg|    ofs   |  (64 bit platforms)
- ** GC objects      |  itype  |  GCRef  |
- ** int (LJ_DUALNUM)|  itype  |   int   |
- ** number           -------double------
-@@ -245,7 +245,8 @@
- **
- **                     ------MSW------.------LSW------
- ** primitive types    |1..1|itype|1..................1|
--** GC objects/lightud |1..1|itype|-------GCRef--------|
-+** GC objects         |1..1|itype|-------GCRef--------|
-+** lightuserdata      |1..1|itype|seg|------ofs-------|
- ** int (LJ_DUALNUM)   |1..1|itype|0..0|-----int-------|
- ** number              ------------double-------------
- **
-@@ -285,6 +286,12 @@
- #define LJ_GCVMASK		(((uint64_t)1 << 47) - 1)
- #endif
- 
-+#if LJ_64
-+/* To stay within 47 bits, lightuserdata is segmented. */
-+#define LJ_LIGHTUD_BITS_SEG	8
-+#define LJ_LIGHTUD_BITS_LO	(47 - LJ_LIGHTUD_BITS_SEG)
-+#endif
-+
- /* -- String object ------------------------------------------------------- */
- 
- /* String object header. String payload follows. */
-@@ -576,7 +583,11 @@
-   uint8_t currentwhite;	/* Current white color. */
-   uint8_t state;	/* GC state. */
-   uint8_t nocdatafin;	/* No cdata finalizer called. */
--  uint8_t unused2;
-+#if LJ_64
-+  uint8_t lightudnum;	/* Number of lightuserdata segments - 1. */
-+#else
-+  uint8_t unused1;
-+#endif
-   MSize sweepstr;	/* Sweep position in string table. */
-   GCRef root;		/* List of all collectable objects. */
-   MRef sweep;		/* Sweep position in root list. */
-@@ -588,6 +599,9 @@
-   GCSize estimate;	/* Estimate of memory actually in use. */
-   MSize stepmul;	/* Incremental GC step granularity. */
-   MSize pause;		/* Pause between successive GC cycles. */
-+#if LJ_64
-+  MRef lightudseg;	/* Upper bits of lightuserdata segments. */
-+#endif
- } GCState;
- 
- /* Global state, shared by all threads of a Lua universe. */
-@@ -795,10 +809,23 @@
- #endif
- #define boolV(o)	check_exp(tvisbool(o), (LJ_TFALSE - itype(o)))
- #if LJ_64
--#define lightudV(o) \
--  check_exp(tvislightud(o), (void *)((o)->u64 & U64x(00007fff,ffffffff)))
-+#define lightudseg(u) \
-+  (((u) >> LJ_LIGHTUD_BITS_LO) & ((1 << LJ_LIGHTUD_BITS_SEG)-1))
-+#define lightudlo(u) \
-+  ((u) & (((uint64_t)1 << LJ_LIGHTUD_BITS_LO) - 1))
-+#define lightudup(p) \
-+  ((uint32_t)(((p) >> LJ_LIGHTUD_BITS_LO) << (LJ_LIGHTUD_BITS_LO-32)))
-+static LJ_AINLINE void *lightudV(global_State *g, cTValue *o)
-+{
-+  uint64_t u = o->u64;
-+  uint64_t seg = lightudseg(u);
-+  uint32_t *segmap = mref(g->gc.lightudseg, uint32_t);
-+  lua_assert(tvislightud(o));
-+  lua_assert(seg <= g->gc.lightudnum);
-+  return (void *)(((uint64_t)segmap[seg] << 32) | lightudlo(u));
-+}
- #else
--#define lightudV(o)	check_exp(tvislightud(o), gcrefp((o)->gcr, void))
-+#define lightudV(g, o)	check_exp(tvislightud(o), gcrefp((o)->gcr, void))
- #endif
- #define gcV(o)		check_exp(tvisgcv(o), gcval(o))
- #define strV(o)		check_exp(tvisstr(o), &gcval(o)->str)
-@@ -824,7 +851,7 @@
- #define setpriV(o, i)		(setitype((o), (i)))
- #endif
- 
--static LJ_AINLINE void setlightudV(TValue *o, void *p)
-+static LJ_AINLINE void setrawlightudV(TValue *o, void *p)
- {
- #if LJ_GC64
-   o->u64 = (uint64_t)p | (((uint64_t)LJ_TLIGHTUD) << 47);
-@@ -835,24 +862,14 @@
- #endif
- }
- 
--#if LJ_64
--#define checklightudptr(L, p) \
--  (((uint64_t)(p) >> 47) ? (lj_err_msg(L, LJ_ERR_BADLU), NULL) : (p))
--#else
--#define checklightudptr(L, p)	(p)
--#endif
--
--#if LJ_FR2
-+#if LJ_FR2 || LJ_32
- #define contptr(f)		((void *)(f))
- #define setcont(o, f)		((o)->u64 = (uint64_t)(uintptr_t)contptr(f))
--#elif LJ_64
-+#else
- #define contptr(f) \
-   ((void *)(uintptr_t)(uint32_t)((intptr_t)(f) - (intptr_t)lj_vm_asm_begin))
- #define setcont(o, f) \
-   ((o)->u64 = (uint64_t)(void *)(f) - (uint64_t)lj_vm_asm_begin)
--#else
--#define contptr(f)		((void *)(f))
--#define setcont(o, f)		setlightudV((o), contptr(f))
- #endif
- 
- #define tvchecklive(L, o) \
-@@ -978,6 +995,6 @@
- 
- /* Compare two objects without calling metamethods. */
- LJ_FUNC int LJ_FASTCALL lj_obj_equal(cTValue *o1, cTValue *o2);
--LJ_FUNC const void * LJ_FASTCALL lj_obj_ptr(cTValue *o);
-+LJ_FUNC const void * LJ_FASTCALL lj_obj_ptr(global_State *g, cTValue *o);
- 
- #endif
-Index: luajit/src/lj_snap.c
-===================================================================
---- luajit.orig/src/lj_snap.c
-+++ luajit/src/lj_snap.c
-@@ -626,7 +626,12 @@
-   IRType1 t = ir->t;
-   RegSP rs = ir->prev;
-   if (irref_isk(ref)) {  /* Restore constant slot. */
--    lj_ir_kvalue(J->L, o, ir);
-+    if (ir->o == IR_KPTR) {
-+      o->u64 = (uint64_t)(uintptr_t)ir_kptr(ir);
-+    } else {
-+      lua_assert(!(ir->o == IR_KKPTR || ir->o == IR_KNULL));
-+      lj_ir_kvalue(J->L, o, ir);
-+    }
-     return;
-   }
-   if (LJ_UNLIKELY(bloomtest(rfilt, ref)))
-Index: luajit/src/lj_state.c
-===================================================================
---- luajit.orig/src/lj_state.c
-+++ luajit/src/lj_state.c
-@@ -171,6 +171,12 @@
-   lj_mem_freevec(g, g->strhash, g->strmask+1, GCRef);
-   lj_buf_free(g, &g->tmpbuf);
-   lj_mem_freevec(g, tvref(L->stack), L->stacksize, TValue);
-+#if LJ_64
-+  if (mref(g->gc.lightudseg, uint32_t)) {
-+    MSize segnum = g->gc.lightudnum ? (2 << lj_fls(g->gc.lightudnum)) : 2;
-+    lj_mem_freevec(g, mref(g->gc.lightudseg, uint32_t), segnum, uint32_t);
-+  }
-+#endif
-   lua_assert(g->gc.total == sizeof(GG_State));
- #ifndef LUAJIT_USE_SYSMALLOC
-   if (g->allocf == lj_alloc_f)
-Index: luajit/src/lj_strfmt.c
-===================================================================
---- luajit.orig/src/lj_strfmt.c
-+++ luajit/src/lj_strfmt.c
-@@ -393,7 +393,7 @@
-       p = lj_buf_wmem(p, "builtin#", 8);
-       p = lj_strfmt_wint(p, funcV(o)->c.ffid);
-     } else {
--      p = lj_strfmt_wptr(p, lj_obj_ptr(o));
-+      p = lj_strfmt_wptr(p, lj_obj_ptr(G(L), o));
-     }
-     return lj_str_new(L, buf, (size_t)(p - buf));
-   }
diff --git a/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch b/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
index 96e4c9106acf9..ac2a967c974d4 100644
--- a/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
+++ b/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
@@ -1,16 +1,8 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Tue, 17 Nov 2015 16:27:11 +0100
-Subject: Enable debugging symbols in the build
-
----
- src/Makefile | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git src/Makefile src/Makefile
-index 8a38efd..6b73a89 100644
+diff --git a/src/Makefile b/src/Makefile
+index 3a6a432..b606927 100644
 --- a/src/Makefile
 +++ b/src/Makefile
-@@ -54,9 +54,9 @@ CCOPT_arm64=
+@@ -53,9 +53,9 @@ CCOPT_arm64=
  CCOPT_ppc=
  CCOPT_mips=
  #
diff --git a/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch b/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
deleted file mode 100644
index f53e211071063..0000000000000
--- a/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
+++ /dev/null
@@ -1,33 +0,0 @@
---- a/src/jit/bcsave.lua	2018-12-17 19:06:27.215042417 +0100
-+++ b/src/jit/bcsave.lua	2018-12-17 19:17:12.982477945 +0100
-@@ -64,7 +64,7 @@
- 
- local map_arch = {
-   x86 = true, x64 = true, arm = true, arm64 = true, arm64be = true,
--  ppc = true, mips = true, mipsel = true,
-+  ppc = true, ppc64 = true, ppc64le = true, mips = true, mipsel = true,
- }
- 
- local map_os = {
-@@ -200,9 +200,10 @@
- ]]
-   local symname = LJBC_PREFIX..ctx.modname
-   local is64, isbe = false, false
--  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" then
-+  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" or ctx.arch == "ppc64" or ctx.arch == "ppc64le" then
-     is64 = true
--  elseif ctx.arch == "ppc" or ctx.arch == "mips" then
-+  end
-+  if ctx.arch == "ppc" or ctx.arch == "ppc64" or ctx.arch == "mips" then
-     isbe = true
-   end
- 
-@@ -237,7 +238,7 @@
-   hdr.eendian = isbe and 2 or 1
-   hdr.eversion = 1
-   hdr.type = f16(1)
--  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, mips=8, mipsel=8 })[ctx.arch])
-+  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, ppc64=21, ppc64le=21, mips=8, mipsel=8 })[ctx.arch])
-   if ctx.arch == "mips" or ctx.arch == "mipsel" then
-     hdr.flags = f32(0x50001006)
-   end
diff --git a/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch b/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
index 59e1ee72fcbb8..207762263de53 100644
--- a/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
+++ b/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
@@ -1,18 +1,8 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Thu, 19 Nov 2015 16:29:02 +0200
-Subject: Get rid of LUAJIT_VERSION_SYM that changes ABI on every patch release
-
----
- src/lj_dispatch.c | 5 -----
- src/luajit.c      | 2 --
- src/luajit.h      | 3 ---
- 3 files changed, 10 deletions(-)
-
-diff --git src/lj_dispatch.c src/lj_dispatch.c
-index 5d6795f..e865a78 100644
+diff --git a/src/lj_dispatch.c b/src/lj_dispatch.c
+index 7b66be7..6d31a61 100644
 --- a/src/lj_dispatch.c
 +++ b/src/lj_dispatch.c
-@@ -319,11 +319,6 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
+@@ -318,11 +318,6 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
    return 1;  /* OK. */
  }
  
@@ -24,23 +14,22 @@ index 5d6795f..e865a78 100644
  /* -- Hooks --------------------------------------------------------------- */
  
  /* This function can be called asynchronously (e.g. during a signal). */
-diff --git src/luajit.c src/luajit.c
-index 1ca2430..ccf425e 100644
+diff --git a/src/luajit.c b/src/luajit.c
+index 73e29d4..31fdba1 100644
 --- a/src/luajit.c
 +++ b/src/luajit.c
-@@ -516,8 +516,6 @@ static int pmain(lua_State *L)
+@@ -515,7 +515,6 @@ static int pmain(lua_State *L)
+   int argn;
+   int flags = 0;
    globalL = L;
-   if (argv[0] && argv[0][0]) progname = argv[0];
- 
 -  LUAJIT_VERSION_SYM();  /* Linker-enforced version check. */
--
+ 
    argn = collectargs(argv, &flags);
    if (argn < 0) {  /* Invalid args? */
-     print_usage();
-diff --git src/luajit.h src/luajit.h
-index 708a5a1..35ae02c 100644
---- a/src/luajit.h
-+++ b/src/luajit.h
+diff --git a/src/luajit_rolling.h b/src/luajit_rolling.h
+index e564477..1c7c142 100644
+--- a/src/luajit_rolling.h
++++ b/src/luajit_rolling.h
 @@ -73,7 +73,4 @@ LUA_API void luaJIT_profile_stop(lua_State *L);
  LUA_API const char *luaJIT_profile_dumpstack(lua_State *L, const char *fmt,
  					     int depth, size_t *len);
diff --git a/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch b/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch
deleted file mode 100644
index aedaacbaaea46..0000000000000
--- a/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch
+++ /dev/null
@@ -1,21 +0,0 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Wed, 11 Oct 2017 08:42:41 +0000
-Subject: Make ccall_copy_struct static to unpollute global library namespace
-
----
- src/lj_ccall.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git src/lj_ccall.c src/lj_ccall.c
-index b891591..a7dcc1b 100644
---- a/src/lj_ccall.c
-+++ b/src/lj_ccall.c
-@@ -960,7 +960,7 @@ noth:  /* Not a homogeneous float/double aggregate. */
-   return 0;  /* Struct is in GPRs. */
- }
- 
--void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft)
-+static void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft)
- {
-   if (LJ_ABI_SOFTFP ? ft :
-       ((ft & 3) == FTYPE_FLOAT || (ft >> 2) == FTYPE_FLOAT)) {
diff --git a/srcpkgs/LuaJIT/template b/srcpkgs/LuaJIT/template
index 85449ac3d6f73..70b49147e98a6 100644
--- a/srcpkgs/LuaJIT/template
+++ b/srcpkgs/LuaJIT/template
@@ -1,71 +1,58 @@
 # Template file for 'LuaJIT'
 pkgname=LuaJIT
-version=2.1.0beta3
-revision=2
-_so_version=2.1.0
-_dist_version=${_so_version}-beta3
+version=2.1.1692580715
+revision=1
+_dist_version=2.1.ROLLING
+build_style=gnu-makefile
 hostmakedepends="lua52-BitOp"
 short_desc="Just-In-Time Compiler for Lua"
 maintainer="Orphaned <orphan@voidlinux.org>"
 license="MIT"
 homepage="http://www.luajit.org"
-distfiles="http://luajit.org/download/${pkgname}-${_dist_version}.tar.gz"
-checksum=1ad2e34b111c802f9d0cdf019e986909123237a28c746b21295b63c9e785d9c3
+distfiles="https://repo.or.cz/luajit-2.0.git/snapshot/refs/tags/v${_dist_version}.tar.gz"
+checksum=e4b2e127def9b7c7fa97161c4d2f35d1d273a8c73c8734f97bc54208324e8fea
 
 build_options="lua52compat"
+desc_option_lua52compat="higher compatibility with lua 5.2"
 
-_cross_cc="cc"
-if [ "$XBPS_WORDSIZE" != "$XBPS_TARGET_WORDSIZE" ]; then
-	if [ "${XBPS_MACHINE/-musl/}" = "x86_64" ]; then
-		hostmakedepends+=" cross-i686-linux-musl"
-		_cross_cc="i686-linux-musl-gcc -static"
-	else
-		broken="Host and target wordsize must match"
+_host_cc="cc"
+if [ -n "$CROSS_BUILD" ]; then
+	if [ "$XBPS_WORDSIZE" != "$XBPS_TARGET_WORDSIZE" ]; then
+		if [ "${XBPS_MACHINE%%-*}" = "x86_64" ]; then
+			hostmakedepends+=" cross-i686-linux-musl"
+			_host_cc="i686-linux-musl-gcc -static"
+		else
+			broken="Host and target wordsize must match when not on x86_64"
+		fi
 	fi
-fi
 
-# the ppc64 patchset subtly breaks ppc, needs investigation; for
-# now apply patches conditionally, separately for ppc64 and ppc
-post_patch() {
-	local patchdir
+	make_build_args+=" CROSS=${XBPS_CROSS_TRIPLET}-"
+fi
 
-	case "$XBPS_TARGET_MACHINE" in
-		ppc64*) patchdir="ppc64";;
-		ppc*) patchdir="ppc";;
-		*) return;;
-	esac
+pre_build() {
+	if [ "$build_option_lua52compat" ]; then
+		make_build_args+=" XCFLAGS=-DLUAJIT_ENABLE_LUA52COMPAT"
+	fi
 
-	for i in ${FILESDIR}/patches/${patchdir}/*.patch; do
-		msg_normal "patching: $i\n"
-		patch -sNp1 -i ${i}
-	done
+	# luajit gets its lowest version from this file if not using git
+	printf '%s' "${version##*.}" > "${wrksrc}/.relver"
 }
 
 do_build() {
+	# if we don't unset, the build fails to cross-compile
+	# due to confliction with the makefile macros
 	local _cflags=$CFLAGS
 	local _ldflags=$LDFLAGS
-
-	if [ "$CROSS_BUILD" ]; then
-		local cross="CROSS=${XBPS_CROSS_TRIPLET}-"
-	fi
-
-	if [ "$build_option_lua52compat" ]; then
-		local _xcflags="XCFLAGS=-DLUAJIT_ENABLE_LUA52COMPAT"
-	fi
-
 	unset CFLAGS LDFLAGS
-	make ${makejobs} PREFIX=/usr HOST_LUA=lua5.2 HOST_CC="${_cross_cc}" \
+
+	make ${makejobs} PREFIX=/usr HOST_LUA=lua5.2 \
 		HOST_CFLAGS="$XBPS_CFLAGS" HOST_LDFLAGS="$XBPS_LDFLAGS" \
 		TARGET_CFLAGS="${_cflags}" TARGET_LDFLAGS="${_ldflags}" \
-		${_xcflags} ${cross}
+		HOST_CC="${_host_cc}" ${make_build_args}
 }
 
-do_install() {
-	make DPREFIX=${DESTDIR}/usr DESTDIR=${DESTDIR} \
-		INSTALL_SHARE=${DESTDIR}/usr/share PREFIX=/usr install
-
+post_install() {
 	mv ${DESTDIR}/usr/bin/luajit-* ${DESTDIR}/usr/bin/luajit
-	ln -fs libluajit-5.1.so.${_so_version} ${DESTDIR}/usr/lib/libluajit-5.1.so.2
 	vlicense COPYRIGHT
 }
 
diff --git a/srcpkgs/LuaJIT/update b/srcpkgs/LuaJIT/update
index 15613910677c9..96bbbd453c32c 100644
--- a/srcpkgs/LuaJIT/update
+++ b/srcpkgs/LuaJIT/update
@@ -1 +1 @@
-site="http://luajit.org/download.html"
+disabled="impossible to determine given release style and versioning scheme"

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1692580715, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
  2024-01-31  8:17 ` [PR PATCH] [Updated] " yoshiyoshyosh
@ 2024-01-31 17:41 ` Chocimier
  2024-01-31 17:49 ` yoshiyoshyosh
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Chocimier @ 2024-01-31 17:41 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 279 bytes --]

New review comment by Chocimier on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1473213891

Comment:
Upstream merges into this branch up to a few times a day, which means the checksum can change before we accept the PR. Use a commit hash instead.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1692580715, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
  2024-01-31  8:17 ` [PR PATCH] [Updated] " yoshiyoshyosh
  2024-01-31 17:41 ` [PR REVIEW] " Chocimier
@ 2024-01-31 17:49 ` yoshiyoshyosh
  2024-01-31 17:53 ` yoshiyoshyosh
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-01-31 17:49 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 363 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1473226844

Comment:
I am *not* using the branch ref. I am using the *tag* ref that was updated 5 months ago at the start of the rolling release transition: https://repo.or.cz/luajit-2.0.git/refs

Should I still use a commit hash?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1692580715, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (2 preceding siblings ...)
  2024-01-31 17:49 ` yoshiyoshyosh
@ 2024-01-31 17:53 ` yoshiyoshyosh
  2024-01-31 17:53 ` yoshiyoshyosh
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-01-31 17:53 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 476 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1473226844

Comment:
I am *not* using the branch ref. I am using the *tag* ref that was updated 5 months ago at the start of the rolling release transition: https://repo.or.cz/luajit-2.0.git/refs

If this is the case, should I still use a commit hash?

Or, in general, would it be better to simply use the constantly updating branch instead?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1692580715, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (3 preceding siblings ...)
  2024-01-31 17:53 ` yoshiyoshyosh
@ 2024-01-31 17:53 ` yoshiyoshyosh
  2024-01-31 18:14 ` Chocimier
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-01-31 17:53 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 502 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1473226844

Comment:
I am *not* using the branch ref. I am using the *tag* ref that was updated 5 months ago at the start of the rolling release transition: https://repo.or.cz/luajit-2.0.git/refs

If this is the case, should I still use a commit hash?

Or, in general, would it be better to use the constantly updating branch instead (with the commit hash, of course)?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1692580715, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (4 preceding siblings ...)
  2024-01-31 17:53 ` yoshiyoshyosh
@ 2024-01-31 18:14 ` Chocimier
  2024-01-31 18:18 ` yoshiyoshyosh
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Chocimier @ 2024-01-31 18:14 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 259 bytes --]

New review comment by Chocimier on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1473269653

Comment:
Any time upstream moves a tag, branch, or any other ref, the template breaks. Please use a commit hash so we can still do rebuilds.
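
To illustrate the request, here is a minimal sketch of pinning the source by commit rather than by a movable ref. The `_commit` variable name and the `snapshot/<hash>.tar.gz` URL form are assumptions. The hash is simply the one that appears in this PR's link to upstream's `src/Makefile`. The checksum is left out because it would have to be recomputed for whichever commit is actually chosen.

```shell
# Sketch only: pin the distfile to an immutable commit instead of a tag or branch.
# _commit is a hypothetical template variable; the hash below is illustrative.
_commit=2090842410e0ba6f81fad310a77bf5432488249a
distfiles="https://repo.or.cz/luajit-2.0.git/snapshot/${_commit}.tar.gz"
printf '%s\n' "$distfiles"
```

With a commit hash in the URL, the template keeps pointing at the same source tree even if upstream later moves the tag or branch.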

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1692580715, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (5 preceding siblings ...)
  2024-01-31 18:14 ` Chocimier
@ 2024-01-31 18:18 ` yoshiyoshyosh
  2024-01-31 18:29 ` [PR PATCH] [Updated] " yoshiyoshyosh
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-01-31 18:18 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 224 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1473274130

Comment:
Alrighty. Might as well move to the actual rolling release branch then.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR PATCH] [Updated] LuaJIT: update to 2.1.1692580715, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (6 preceding siblings ...)
  2024-01-31 18:18 ` yoshiyoshyosh
@ 2024-01-31 18:29 ` yoshiyoshyosh
  2024-02-15 13:38 ` [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup Calandracas606
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-01-31 18:29 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 2770 bytes --]

There is an updated pull request by yoshiyoshyosh against master on the void-packages repository

https://github.com/yoshiyoshyosh/void-packages luajit-2.1-rolling
https://github.com/void-linux/void-packages/pull/48453

LuaJIT: update to 2.1.1692580715, cleanup.
#### Testing the changes
- I tested the changes in this PR: **briefly**
  - The only lua thing I really run is awesomewm. I built awesomewm against luajit with the build option and everything seems good, but of course any further testing is encouraged.

#### Local build testing
- I built this PR locally for my native architecture, (`x86_64-glibc`)
- I built this PR locally for these architectures (if supported. mark crossbuilds):
  - `x86_64-musl`
  - `i686-glibc` (both crossbuild and masterdir)
  - `aarch64-glibc` (crossbuild)
  - `aarch64-musl` (crossbuild)
  - `armv7l-glibc` (crossbuild)

This addresses #48349.

LuaJIT has moved to "rolling releases" on branches in their git repo, which basically means releases are git commits to a `v2.1` branch. Of course, this is incompatible with Void's packaging philosophy. However, there also seems to be a `v2.1` *tag* that was created during the move and has not been updated since. I'm unsure whether this tag is simply meant to mark the start of v2.1 in the new rolling release era, or whether they intend it to be a stable tag that "releases" might occasionally get pushed to.
Whatever the case, this is a tag that was "released" in a form they seemingly deem stable enough, which is why I think it is good enough to update to (especially since we'd be moving from a 6 year old version to a 5 month old one).

Regarding the version number: In the makefiles, there is a `RELVER` macro [that gets set by a `git show` command](https://repo.or.cz/luajit-2.0.git/blob/2090842410e0ba6f81fad310a77bf5432488249a:/src/Makefile#l478). The "canonical" version number in the makefiles then becomes `major.minor.relver`, and the binary and library are installed with this version number. This is the only real patch version number that we have, so I believe it is what should go in the version number. I just used the same `git show` that they use and baked the result into `version`.
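
As a rough sketch of how the baked-in number maps back to what the build expects: the template writes the last component of `version` into `.relver`, using the same parameter expansion shown below. The helper name `relver_from_version` is hypothetical and only for illustration. The `git show -s --format=%ct` form mentioned in the comment is my assumption about the exact upstream command.

```shell
# relver_from_version is a hypothetical helper, not part of the template.
# It strips the longest prefix ending in a dot, leaving the last component.
relver_from_version() {
	printf '%s' "${1##*.}"
}

# In a git checkout the number would come from something like
# `git show -s --format=%ct` (assumed form); here it is taken from $version.
version=2.1.1692580715
relver=$(relver_from_version "$version")
printf '%s\n' "$relver"   # prints 1692580715
```

The value is a Unix timestamp, so it only grows as new commits land, which keeps package versions ordered.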

I removed all but two of the patches, as they have either been upstreamed into the `v2.1` tag or were for PowerPC, which Void no longer supports. Should we even carry the "enable debug symbols" patch for main repo builds instead of leaving it to `-dbg`? I'm only keeping it because every other distribution keeps it.

I also just cleaned up the template in general; it's more concise and organized IMO while achieving the same thing.

A patch file from https://github.com/void-linux/void-packages/pull/48453.patch is attached

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: github-pr-luajit-2.1-rolling-48453.patch --]
[-- Type: text/x-diff, Size: 158630 bytes --]

From 537e65c11f39c4543ed1caa864b32fe5e816da24 Mon Sep 17 00:00:00 2001
From: yosh <yosh-git@riseup.net>
Date: Wed, 31 Jan 2024 02:54:09 -0500
Subject: [PATCH] LuaJIT: update to 2.1.1706708390, cleanup.

---
 .../patches/ppc/musl-ppc-secureplt.patch      |   93 -
 .../patches/ppc64/add-ppc64-support.patch     | 3521 -----------------
 .../patches/ppc64/fix-vm-jit-ppc64.patch      |   11 -
 .../aarch64-Fix-exit-stub-patching.patch      |  231 --
 .../aarch64-register-allocation-bug-fix.patch |   29 -
 ...1abec542e6f9851ff2368e7f196b6382a44c.patch |  562 ---
 .../LuaJIT/patches/enable-debug-symbols.patch |   14 +-
 srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch |   33 -
 .../get-rid-of-luajit-version-sym.patch       |   40 +-
 .../patches/unpollute-global-namespace.patch  |   21 -
 srcpkgs/LuaJIT/template                       |   73 +-
 srcpkgs/LuaJIT/update                         |    2 +-
 12 files changed, 49 insertions(+), 4581 deletions(-)
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch

diff --git a/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch b/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
deleted file mode 100644
index 3000ca0ed3d53..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
+++ /dev/null
@@ -1,93 +0,0 @@
-Imported from https://github.com/LuaJIT/LuaJIT/pull/486.
-
-This fixes crashes on ppc-musl, as musl only supports secureplt.
-
---- a/src/lj_dispatch.c
-+++ b/src/lj_dispatch.c
-@@ -56,6 +56,18 @@ static const ASMFunction dispatch_got[] = {
- #undef GOTFUNC
- #endif
- 
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+#include <math.h>
-+LJ_FUNCA_NORET void LJ_FASTCALL lj_ffh_coroutine_wrap_err(lua_State *L,
-+							  lua_State *co);
-+
-+#define GOTFUNC(name)	(ASMFunction)name,
-+static const ASMFunction dispatch_got[] = {
-+  GOTDEF(GOTFUNC)
-+};
-+#undef GOTFUNC
-+#endif
-+
- /* Initialize instruction dispatch table and hot counters. */
- void lj_dispatch_init(GG_State *GG)
- {
-@@ -77,6 +89,9 @@ void lj_dispatch_init(GG_State *GG)
- #if LJ_TARGET_MIPS
-   memcpy(GG->got, dispatch_got, LJ_GOT__MAX*sizeof(ASMFunction *));
- #endif
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+  memcpy(GG->got, dispatch_got, LJ_GOT__MAX*4);
-+#endif
- }
- 
- #if LJ_HASJIT
---- a/src/lj_dispatch.h
-+++ b/src/lj_dispatch.h
-@@ -66,6 +66,21 @@ GOTDEF(GOTENUM)
- };
- #endif
- 
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+/* Need our own global offset table for the dreaded MIPS calling conventions. */
-+#define GOTDEF(_) \
-+  _(floor) _(ceil) _(trunc) _(log) _(log10) _(exp) _(sin) _(cos) _(tan) \
-+  _(asin) _(acos) _(atan) _(sinh) _(cosh) _(tanh) _(frexp) _(modf) _(atan2) \
-+  _(pow) _(fmod) _(ldexp) _(sqrt)
-+
-+enum {
-+#define GOTENUM(name) LJ_GOT_##name,
-+GOTDEF(GOTENUM)
-+#undef GOTENUM
-+  LJ_GOT__MAX
-+};
-+#endif
-+
- /* Type of hot counter. Must match the code in the assembler VM. */
- /* 16 bits are sufficient. Only 0.0015% overhead with maximum slot penalty. */
- typedef uint16_t HotCount;
-@@ -89,7 +104,7 @@ typedef uint16_t HotCount;
- typedef struct GG_State {
-   lua_State L;				/* Main thread. */
-   global_State g;			/* Global state. */
--#if LJ_TARGET_MIPS
-+#if LJ_TARGET_MIPS || (LJ_TARGET_PPC && (LJ_ARCH_BITS == 32))
-   ASMFunction got[LJ_GOT__MAX];		/* Global offset table. */
- #endif
- #if LJ_HASJIT
---- a/src/vm_ppc.dasc
-+++ b/src/vm_ppc.dasc
-@@ -59,7 +59,12 @@
- |.define ENV_OFS,	8
- |.endif
- |.else  // No TOC.
--|.macro blex, target; bl extern target@plt; .endmacro
-+|.macro blex, target
-+|  lwz TMP0, DISPATCH_GOT(target)(DISPATCH)
-+|  mtctr TMP0
-+|  bctrl
-+|  //bl extern target@plt
-+|.endmacro
- |.macro .toc, a, b; .endmacro
- |.endif
- |.macro .tocenv, a, b; .if TOCENV; a, b; .endif; .endmacro
-@@ -448,6 +453,8 @@
- |// Assumes DISPATCH is relative to GL.
- #define DISPATCH_GL(field)	(GG_DISP2G + (int)offsetof(global_State, field))
- #define DISPATCH_J(field)	(GG_DISP2J + (int)offsetof(jit_State, field))
-+#define GG_DISP2GOT		(GG_OFS(got) - GG_OFS(dispatch))
-+#define DISPATCH_GOT(name)	(GG_DISP2GOT + 4*LJ_GOT_##name)
- |
- #define PC2PROTO(field)  ((int)offsetof(GCproto, field)-(int)sizeof(GCproto))
- |
diff --git a/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch b/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
deleted file mode 100644
index 7c865859da923..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
+++ /dev/null
@@ -1,3521 +0,0 @@
-From: "Rodrigo R. Galvao" <rosattig@br.ibm.com>
-Date: Wed, 11 Oct 2017 08:41:47 +0000
-Subject: New patch proposal for PPC64 support
-
- Create a patch for PPC64 support based on 
-https://github.com/LuaJIT/LuaJIT/pull/140.
- It replaces the old patch since this new one is more likely to be merged 
-with luajit upstream.
-
-
-Author: Rodrigo R. Galvao <rosattig@br.ibm.com>
----
- dynasm/dasm_ppc.lua    |    5 +
- src/Makefile           |   11 +-
- src/host/buildvm_asm.c |   16 +-
- src/lj_arch.h          |   18 +-
- src/lj_ccall.c         |  166 ++++++-
- src/lj_ccall.h         |   13 +
- src/lj_ccallback.c     |   68 ++-
- src/lj_ctype.h         |    2 +-
- src/lj_def.h           |    4 +
- src/lj_frame.h         |    9 +
- src/lj_target_ppc.h    |   14 +
- src/vm_ppc.dasc        | 1290 ++++++++++++++++++++++++++++++++----------------
- 12 files changed, 1162 insertions(+), 454 deletions(-)
-
-diff --git dynasm/dasm_ppc.lua dynasm/dasm_ppc.lua
-index f73974d..a4ad70b 100644
---- a/dynasm/dasm_ppc.lua
-+++ b/dynasm/dasm_ppc.lua
-@@ -257,9 +257,11 @@ map_op = {
-   addic_3 =	"30000000RRI",
-   ["addic._3"] = "34000000RRI",
-   addi_3 =	"38000000RR0I",
-+  addil_3 =	"38000000RR0J",
-   li_2 =	"38000000RI",
-   la_2 =	"38000000RD",
-   addis_3 =	"3c000000RR0I",
-+  addisl_3 =	"3c000000RR0J",
-   lis_2 =	"3c000000RI",
-   lus_2 =	"3c000000RU",
-   bc_3 =	"40000000AAK",
-@@ -842,6 +844,9 @@ map_op = {
-   srdi_3 =	op_alias("rldicl_4", function(p)
-     p[4] = p[3]; p[3] = "64-("..p[3]..")"
-   end),
-+  ["srdi._3"] =	op_alias("rldicl._4", function(p)
-+    p[4] = p[3]; p[3] = "64-("..p[3]..")"
-+  end),
-   clrldi_3 =	op_alias("rldicl_4", function(p)
-     p[4] = p[3]; p[3] = "0"
-   end),
-diff --git src/Makefile src/Makefile
-index 6b73a89..cc50bae 100644
---- a/src/Makefile
-+++ b/src/Makefile
-@@ -453,7 +453,16 @@ ifeq (ppc,$(TARGET_LJARCH))
-     DASM_AFLAGS+= -D GPR64
-   endif
-   ifeq (PS3,$(TARGET_SYS))
--    DASM_AFLAGS+= -D PPE -D TOC
-+    DASM_AFLAGS+= -D PPE
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_OPD 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D OPD
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_OPDENV 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D OPDENV
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_ELFV2 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D ELFV2
-   endif
-   ifneq (,$(findstring LJ_ARCH_PPC64 ,$(TARGET_TESTARCH)))
-     DASM_ARCH= ppc64
-diff --git src/host/buildvm_asm.c src/host/buildvm_asm.c
-index ffd1490..6bb995e 100644
---- a/src/host/buildvm_asm.c
-+++ b/src/host/buildvm_asm.c
-@@ -140,18 +140,14 @@ static void emit_asm_wordreloc(BuildCtx *ctx, uint8_t *p, int n,
- #else
- #define TOCPREFIX ""
- #endif
--  if ((ins >> 26) == 16) {
-+  if ((ins >> 26) == 14) {
-+    fprintf(ctx->fp, "\taddi %d,%d,%s\n", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-+  } else if ((ins >> 26) == 15) {
-+    fprintf(ctx->fp, "\taddis %d,%d,%s\n", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-+  } else if ((ins >> 26) == 16) {
-     fprintf(ctx->fp, "\t%s %d, %d, " TOCPREFIX "%s\n",
- 	    (ins & 1) ? "bcl" : "bc", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-   } else if ((ins >> 26) == 18) {
--#if LJ_ARCH_PPC64
--    const char *suffix = strchr(sym, '@');
--    if (suffix && suffix[1] == 'h') {
--      fprintf(ctx->fp, "\taddis 11, 2, %s\n", sym);
--    } else if (suffix && suffix[1] == 'l') {
--      fprintf(ctx->fp, "\tld 12, %s\n", sym);
--    } else
--#endif
-     fprintf(ctx->fp, "\t%s " TOCPREFIX "%s\n", (ins & 1) ? "bl" : "b", sym);
-   } else {
-     fprintf(stderr,
-@@ -250,7 +246,7 @@ void emit_asm(BuildCtx *ctx)
-   int i, rel;
- 
-   fprintf(ctx->fp, "\t.file \"buildvm_%s.dasc\"\n", ctx->dasm_arch);
--#if LJ_ARCH_PPC64
-+#if LJ_ARCH_PPC_ELFV2
-   fprintf(ctx->fp, "\t.abiversion 2\n");
- #endif
-   fprintf(ctx->fp, "\t.text\n");
-diff --git src/lj_arch.h src/lj_arch.h
-index d609b37..53bc651 100644
---- a/src/lj_arch.h
-+++ b/src/lj_arch.h
-@@ -269,10 +269,18 @@
- #if LJ_TARGET_CONSOLE
- #define LJ_ARCH_PPC32ON64	1
- #define LJ_ARCH_NOFFI		1
-+#if LJ_TARGET_PS3
-+#define LJ_ARCH_PPC_OPD		1
-+#endif
- #elif LJ_ARCH_BITS == 64
--#define LJ_ARCH_PPC64		1
--#define LJ_TARGET_GC64		1
-+#define LJ_ARCH_PPC32ON64	1
- #define LJ_ARCH_NOJIT		1	/* NYI */
-+#if _CALL_ELF == 2
-+#define LJ_ARCH_PPC_ELFV2	1
-+#else
-+#define LJ_ARCH_PPC_OPD		1
-+#define LJ_ARCH_PPC_OPDENV	1
-+#endif
- #endif
- 
- #if _ARCH_PWR7
-@@ -423,12 +431,6 @@
- #if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
- #error "No support for PowerPC CPUs without double-precision FPU"
- #endif
--#if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
--#error "No support for little-endian PPC32"
--#endif
--#if LJ_ARCH_PPC64
--#error "No support for PowerPC 64 bit mode (yet)"
--#endif
- #ifdef __NO_FPRS__
- #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
- #endif
-diff --git src/lj_ccall.c src/lj_ccall.c
-index 5c252e5..b891591 100644
---- a/src/lj_ccall.c
-+++ b/src/lj_ccall.c
-@@ -369,21 +369,97 @@
- #elif LJ_TARGET_PPC
- /* -- PPC calling conventions --------------------------------------------- */
- 
-+#if LJ_ARCH_BITS == 64
-+
-+#if LJ_ARCH_PPC_ELFV2
-+
-+#define CCALL_HANDLE_STRUCTRET \
-+  if (sz > 16 && ccall_classify_fp(cts, ctr) <= 0) { \
-+    cc->retref = 1;  /* Return by reference. */ \
-+    cc->gpr[ngpr++] = (GPRArg)dp; \
-+  }
-+
-+#define CCALL_HANDLE_STRUCTRET2 \
-+  int isfp = ccall_classify_fp(cts, ctr); \
-+  int i; \
-+  if (isfp == FTYPE_FLOAT) { \
-+    for (i = 0; i < ctr->size / 4; i++) \
-+      ((float *)dp)[i] = cc->fpr[i]; \
-+  } else if (isfp == FTYPE_DOUBLE) { \
-+    for (i = 0; i < ctr->size / 8; i++) \
-+      ((double *)dp)[i] = cc->fpr[i]; \
-+  } else { \
-+    if (ctr->size < 8 && LJ_BE) { \
-+      sp += 8 - ctr->size; \
-+    } \
-+    memcpy(dp, sp, ctr->size); \
-+  }
-+
-+#else
-+
- #define CCALL_HANDLE_STRUCTRET \
-   cc->retref = 1;  /* Return all structs by reference. */ \
-   cc->gpr[ngpr++] = (GPRArg)dp;
- 
-+#endif
-+
- #define CCALL_HANDLE_COMPLEXRET \
-   /* Complex values are returned in 2 or 4 GPRs. */ \
-   cc->retref = 0;
- 
-+#define CCALL_HANDLE_STRUCTARG
-+
- #define CCALL_HANDLE_COMPLEXRET2 \
--  memcpy(dp, sp, ctr->size);  /* Copy complex from GPRs. */
-+  if (ctr->size == 2*sizeof(float)) {  /* Copy complex float from FPRs. */ \
-+    ((float *)dp)[0] = cc->fpr[0]; \
-+    ((float *)dp)[1] = cc->fpr[1]; \
-+  } else {  /* Copy complex double from FPRs. */ \
-+    ((double *)dp)[0] = cc->fpr[0]; \
-+    ((double *)dp)[1] = cc->fpr[1]; \
-+  }
-+
-+#define CCALL_HANDLE_COMPLEXARG \
-+  isfp = 1; \
-+  if (d->size == sizeof(float) * 2) { \
-+    d = ctype_get(cts, CTID_COMPLEX_DOUBLE); \
-+    isf32 = 1; \
-+  }
-+
-+#define CCALL_HANDLE_REGARG \
-+  if (isfp && d->size == sizeof(float)) { \
-+    d = ctype_get(cts, CTID_DOUBLE); \
-+    isf32 = 1; \
-+  } \
-+  if (ngpr < maxgpr) { \
-+   dp = &cc->gpr[ngpr]; \
-+   ngpr += n; \
-+   if (ngpr > maxgpr) { \
-+     nsp += ngpr - 8; \
-+     ngpr = 8; \
-+     if (nsp > CCALL_MAXSTACK) { \
-+       goto err_nyi; \
-+     } \
-+   } \
-+   goto done; \
-+  }
-+
-+#else
-+
-+#define CCALL_HANDLE_STRUCTRET \
-+  cc->retref = 1;  /* Return all structs by reference. */ \
-+  cc->gpr[ngpr++] = (GPRArg)dp;
-+
-+#define CCALL_HANDLE_COMPLEXRET \
-+  /* Complex values are returned in 2 or 4 GPRs. */ \
-+  cc->retref = 0;
- 
- #define CCALL_HANDLE_STRUCTARG \
-   rp = cdataptr(lj_cdata_new(cts, did, sz)); \
-   sz = CTSIZE_PTR;  /* Pass all structs by reference. */
- 
-+#define CCALL_HANDLE_COMPLEXRET2 \
-+  memcpy(dp, sp, ctr->size);  /* Copy complex from GPRs. */
-+
- #define CCALL_HANDLE_COMPLEXARG \
-   /* Pass complex by value in 2 or 4 GPRs. */
- 
-@@ -410,6 +486,8 @@
-     } \
-   }
- 
-+#endif
-+
- #define CCALL_HANDLE_RET \
-   if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
-     ctr = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */
-@@ -801,6 +879,50 @@ noth:  /* Not a homogeneous float/double aggregate. */
- 
- #endif
- 
-+/* -- PowerPC64 ELFv2 ABI struct classification ------------------- */
-+
-+#if LJ_ARCH_PPC_ELFV2
-+
-+#define FTYPE_FLOAT	1
-+#define FTYPE_DOUBLE	2
-+
-+static unsigned int ccall_classify_fp(CTState *cts, CType *ct) {
-+  if (ctype_isfp(ct->info)) {
-+    if (ct->size == sizeof(float))
-+      return FTYPE_FLOAT;
-+    else
-+      return FTYPE_DOUBLE;
-+  } else if (ctype_iscomplex(ct->info)) {
-+    if (ct->size == sizeof(float) * 2)
-+      return FTYPE_FLOAT;
-+    else
-+      return FTYPE_DOUBLE;
-+  } else if (ctype_isstruct(ct->info)) {
-+    int res = -1;
-+    int sz = ct->size;
-+    while (ct->sib) {
-+      ct = ctype_get(cts, ct->sib);
-+      if (ctype_isfield(ct->info)) {
-+        int sub = ccall_classify_fp(cts, ctype_rawchild(cts, ct));
-+        if (res == -1)
-+          res = sub;
-+        if (sub != -1 && sub != res)
-+          return 0;
-+      } else if (ctype_isbitfield(ct->info) ||
-+        ctype_isxattrib(ct->info, CTA_SUBTYPE)) {
-+        return 0;
-+      }
-+    }
-+    if (res > 0 && sz > res * 4 * 8)
-+      return 0;
-+    return res;
-+  } else {
-+    return 0;
-+  }
-+}
-+
-+#endif
-+
- /* -- MIPS64 ABI struct classification ---------------------------- */
- 
- #if LJ_TARGET_MIPS64
-@@ -974,6 +1096,9 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-     CTSize sz;
-     MSize n, isfp = 0, isva = 0;
-     void *dp, *rp = NULL;
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    int isf32 = 0;
-+#endif
- 
-     if (fid) {  /* Get argument type from field. */
-       CType *ctf = ctype_get(cts, fid);
-@@ -1030,7 +1155,37 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-       *(void **)dp = rp;
-       dp = rp;
-     }
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64 && LJ_BE
-+    if (ctype_isstruct(d->info) && sz < CTSIZE_PTR) {
-+      dp = (char *)dp + (CTSIZE_PTR - sz);
-+    }
-+#endif
-     lj_cconv_ct_tv(cts, d, (uint8_t *)dp, o, CCF_ARG(narg));
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if (isfp) {
-+      int i;
-+      for (i = 0; i < d->size / 8 && nfpr < CCALL_NARG_FPR; i++)
-+        cc->fpr[nfpr++] = ((double *)dp)[i];
-+    }
-+    if (isf32) {
-+      int i;
-+      for (i = 0; i < d->size / 8; i++)
-+        ((float *)dp)[i*2] = ((double *)dp)[i];
-+    }
-+#endif
-+#if LJ_ARCH_PPC_ELFV2
-+    if (ctype_isstruct(d->info)) {
-+      isfp = ccall_classify_fp(cts, d);
-+      int i;
-+      if (isfp == FTYPE_FLOAT) {
-+        for (i = 0; i < d->size / 4 && nfpr < CCALL_NARG_FPR; i++)
-+          cc->fpr[nfpr++] = ((float *)dp)[i];
-+      } else if (isfp == FTYPE_DOUBLE) {
-+        for (i = 0; i < d->size / 8 && nfpr < CCALL_NARG_FPR; i++)
-+          cc->fpr[nfpr++] = ((double *)dp)[i];
-+      }
-+    }
-+#endif
-     /* Extend passed integers to 32 bits at least. */
-     if (ctype_isinteger_or_bool(d->info) && d->size < 4) {
-       if (d->info & CTF_UNSIGNED)
-@@ -1044,6 +1199,15 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-     if (isfp && d->size == sizeof(float))
-       ((float *)dp)[1] = ((float *)dp)[0];  /* Floats occupy high slot. */
- #endif
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if ((ctype_isinteger_or_bool(d->info) || ctype_isenum(d->info))
-+	&& d->size <= 4) {
-+      if (d->info & CTF_UNSIGNED)
-+	*(uint64_t *)dp = (uint64_t)*(uint32_t *)dp;
-+      else
-+        *(int64_t *)dp = (int64_t)*(int32_t *)dp;
-+    }
-+#endif
- #if LJ_TARGET_MIPS64 || (LJ_TARGET_ARM64 && LJ_BE)
-     if ((ctype_isinteger_or_bool(d->info) || ctype_isenum(d->info)
- #if LJ_TARGET_MIPS64
-diff --git src/lj_ccall.h src/lj_ccall.h
-index 59f6648..bbf309f 100644
---- a/src/lj_ccall.h
-+++ b/src/lj_ccall.h
-@@ -86,10 +86,23 @@ typedef union FPRArg {
- #elif LJ_TARGET_PPC
- 
- #define CCALL_NARG_GPR		8
-+#if LJ_ARCH_BITS == 64
-+#define CCALL_NARG_FPR		13
-+#if LJ_ARCH_PPC_ELFV2
-+#define CCALL_NRET_GPR		2
-+#define CCALL_NRET_FPR		8
-+#define CCALL_SPS_EXTRA		14
-+#else
-+#define CCALL_NRET_GPR		1
-+#define CCALL_NRET_FPR		2
-+#define CCALL_SPS_EXTRA		16
-+#endif
-+#else
- #define CCALL_NARG_FPR		8
- #define CCALL_NRET_GPR		4	/* For complex double. */
- #define CCALL_NRET_FPR		1
- #define CCALL_SPS_EXTRA		4
-+#endif
- #define CCALL_SPS_FREE		0
- 
- typedef intptr_t GPRArg;
-diff --git src/lj_ccallback.c src/lj_ccallback.c
-index 846827b..eb7f445 100644
---- a/src/lj_ccallback.c
-+++ b/src/lj_ccallback.c
-@@ -61,8 +61,24 @@ static MSize CALLBACK_OFS2SLOT(MSize ofs)
- 
- #elif LJ_TARGET_PPC
- 
-+#if LJ_ARCH_PPC_OPD
-+
-+#define CALLBACK_SLOT2OFS(slot)		(24*(slot))
-+#define CALLBACK_OFS2SLOT(ofs)		((ofs)/24)
-+#define CALLBACK_MAX_SLOT		(CALLBACK_OFS2SLOT(CALLBACK_MCODE_SIZE))
-+
-+#elif LJ_ARCH_PPC_ELFV2
-+
-+#define CALLBACK_SLOT2OFS(slot)		(4*(slot))
-+#define CALLBACK_OFS2SLOT(ofs)		((ofs)/4)
-+#define CALLBACK_MAX_SLOT		(CALLBACK_MCODE_SIZE/4 - 10)
-+
-+#else
-+
- #define CALLBACK_MCODE_HEAD		24
- 
-+#endif
-+
- #elif LJ_TARGET_MIPS32
- 
- #define CALLBACK_MCODE_HEAD		20
-@@ -188,24 +204,59 @@ static void callback_mcode_init(global_State *g, uint32_t *page)
-   lua_assert(p - page <= CALLBACK_MCODE_SIZE);
- }
- #elif LJ_TARGET_PPC
-+#if LJ_ARCH_PPC_OPD
-+register void *vm_toc __asm__("r2");
-+static void callback_mcode_init(global_State *g, uint64_t *page)
-+{
-+  uint64_t *p = page;
-+  void *target = (void *)lj_vm_ffi_callback;
-+  MSize slot;
-+  for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
-+    *p++ = (uint64_t)target;
-+    *p++ = (uint64_t)vm_toc;
-+    *p++ = (uint64_t)g | ((uint64_t)slot << 47);
-+  }
-+  lua_assert(p - page <= CALLBACK_MCODE_SIZE / 8);
-+}
-+#else
- static void callback_mcode_init(global_State *g, uint32_t *page)
- {
-   uint32_t *p = page;
-   void *target = (void *)lj_vm_ffi_callback;
-   MSize slot;
-+#if LJ_ARCH_PPC_ELFV2
-+  // Needs to be in sync with lj_vm_ffi_callback.
-+  lua_assert(CALLBACK_MCODE_SIZE == 4096);
-+  for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
-+    *p = PPCI_B | (((page+CALLBACK_MAX_SLOT-p) & 0x00ffffffu) << 2);
-+    p++;
-+  }
-+  *p++ = PPCI_LI | PPCF_T(RID_SYS1) | ((((intptr_t)target) >> 32) & 0xffff);
-+  *p++ = PPCI_LI | PPCF_T(RID_R11) | ((((intptr_t)g) >> 32) & 0xffff);
-+  *p++ = PPCI_RLDICR | PPCF_T(RID_SYS1) | PPCF_A(RID_SYS1) | PPCF_SH(32) | PPCF_M6(63-32);  /* sldi */
-+  *p++ = PPCI_RLDICR | PPCF_T(RID_R11) | PPCF_A(RID_R11) | PPCF_SH(32) | PPCF_M6(63-32);  /* sldi */
-+  *p++ = PPCI_ORIS | PPCF_A(RID_SYS1) | PPCF_T(RID_SYS1) | ((((intptr_t)target) >> 16) & 0xffff);
-+  *p++ = PPCI_ORIS | PPCF_A(RID_R11) | PPCF_T(RID_R11) | ((((intptr_t)g) >> 16) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_SYS1) | PPCF_T(RID_SYS1) | (((intptr_t)target) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_R11) | PPCF_T(RID_R11) | (((intptr_t)g) & 0xffff);
-+  *p++ = PPCI_MTCTR | PPCF_T(RID_SYS1);
-+  *p++ = PPCI_BCTR;
-+#else
-   *p++ = PPCI_LIS | PPCF_T(RID_TMP) | (u32ptr(target) >> 16);
--  *p++ = PPCI_LIS | PPCF_T(RID_R12) | (u32ptr(g) >> 16);
-+  *p++ = PPCI_LIS | PPCF_T(RID_R11) | (u32ptr(g) >> 16);
-   *p++ = PPCI_ORI | PPCF_A(RID_TMP)|PPCF_T(RID_TMP) | (u32ptr(target) & 0xffff);
--  *p++ = PPCI_ORI | PPCF_A(RID_R12)|PPCF_T(RID_R12) | (u32ptr(g) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_R11)|PPCF_T(RID_R11) | (u32ptr(g) & 0xffff);
-   *p++ = PPCI_MTCTR | PPCF_T(RID_TMP);
-   *p++ = PPCI_BCTR;
-   for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
--    *p++ = PPCI_LI | PPCF_T(RID_R11) | slot;
-+    *p++ = PPCI_LI | PPCF_T(RID_R12) | slot;
-     *p = PPCI_B | (((page-p) & 0x00ffffffu) << 2);
-     p++;
-   }
--  lua_assert(p - page <= CALLBACK_MCODE_SIZE);
-+#endif
-+  lua_assert(p - page <= CALLBACK_MCODE_SIZE / 4);
- }
-+#endif
- #elif LJ_TARGET_MIPS
- static void callback_mcode_init(global_State *g, uint32_t *page)
- {
-@@ -641,6 +692,15 @@ static void callback_conv_result(CTState *cts, lua_State *L, TValue *o)
- 	*(int32_t *)dp = ctr->size == 1 ? (int32_t)*(int8_t *)dp :
- 					  (int32_t)*(int16_t *)dp;
-     }
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if (ctr->size <= 4 &&
-+       (ctype_isinteger_or_bool(ctr->info) || ctype_isenum(ctr->info))) {
-+      if (ctr->info & CTF_UNSIGNED)
-+        *(uint64_t *)dp = (uint64_t)*(uint32_t *)dp;
-+      else
-+        *(int64_t *)dp = (int64_t)*(int32_t *)dp;
-+    }
-+#endif
- #if LJ_TARGET_MIPS64 || (LJ_TARGET_ARM64 && LJ_BE)
-     /* Always sign-extend results to 64 bits. Even a soft-fp 'float'. */
-     if (ctr->size <= 4 &&
-diff --git src/lj_ctype.h src/lj_ctype.h
-index 0c220a8..105865b 100644
---- a/src/lj_ctype.h
-+++ b/src/lj_ctype.h
-@@ -153,7 +153,7 @@ typedef struct CType {
- 
- /* Simplify target-specific configuration. Checked in lj_ccall.h. */
- #define CCALL_MAX_GPR		8
--#define CCALL_MAX_FPR		8
-+#define CCALL_MAX_FPR		14
- 
- typedef LJ_ALIGN(8) union FPRCBArg { double d; float f[2]; } FPRCBArg;
- 
-diff --git src/lj_def.h src/lj_def.h
-index 2d8fff6..381d6f5 100644
---- a/src/lj_def.h
-+++ b/src/lj_def.h
-@@ -71,7 +71,11 @@ typedef unsigned int uintptr_t;
- #define LJ_MAX_IDXCHAIN	100		/* __index/__newindex chain limit. */
- #define LJ_STACK_EXTRA	(5+2*LJ_FR2)	/* Extra stack space (metamethods). */
- 
-+#if defined(__powerpc64__) && _CALL_ELF != 2
-+#define LJ_NUM_CBPAGE	4		/* Number of FFI callback pages. */
-+#else
- #define LJ_NUM_CBPAGE	1		/* Number of FFI callback pages. */
-+#endif
- 
- /* Minimum table/buffer sizes. */
- #define LJ_MIN_GLOBAL	6		/* Min. global table size (hbits). */
-diff --git src/lj_frame.h src/lj_frame.h
-index 19c49a4..c666418 100644
---- a/src/lj_frame.h
-+++ b/src/lj_frame.h
-@@ -210,6 +210,15 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
- #define CFRAME_OFS_MULTRES	408
- #define CFRAME_SIZE		384
- #define CFRAME_SHIFT_MULTRES	3
-+#elif LJ_ARCH_PPC_ELFV2
-+#define CFRAME_OFS_ERRF		360
-+#define CFRAME_OFS_NRES		356
-+#define CFRAME_OFS_PREV		336
-+#define CFRAME_OFS_L		352
-+#define CFRAME_OFS_PC		348
-+#define CFRAME_OFS_MULTRES	344
-+#define CFRAME_SIZE		368
-+#define CFRAME_SHIFT_MULTRES	3
- #elif LJ_ARCH_PPC32ON64
- #define CFRAME_OFS_ERRF		472
- #define CFRAME_OFS_NRES		468
-diff --git src/lj_target_ppc.h src/lj_target_ppc.h
-index c5c991a..f0c8c94 100644
---- a/src/lj_target_ppc.h
-+++ b/src/lj_target_ppc.h
-@@ -30,8 +30,13 @@ enum {
- 
-   /* Calling conventions. */
-   RID_RET = RID_R3,
-+#if LJ_LE
-+  RID_RETHI = RID_R4,
-+  RID_RETLO = RID_R3,
-+#else
-   RID_RETHI = RID_R3,
-   RID_RETLO = RID_R4,
-+#endif
-   RID_FPRET = RID_F1,
- 
-   /* These definitions must match with the *.dasc file(s): */
-@@ -131,6 +136,8 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define PPCF_C(r)	((r) << 6)
- #define PPCF_MB(n)	((n) << 6)
- #define PPCF_ME(n)	((n) << 1)
-+#define PPCF_SH(n)	((((n) & 31) << (11+1)) | (((n) & 32) >> (5-1)))
-+#define PPCF_M6(n)	((((n) & 31) << (5+1)) | (((n) & 32) << (11-5)))
- #define PPCF_Y		0x00200000
- #define PPCF_DOT	0x00000001
- 
-@@ -200,6 +207,13 @@ typedef enum PPCIns {
-   PPCI_RLWINM = 0x54000000,
-   PPCI_RLWIMI = 0x50000000,
- 
-+  PPCI_RLDICL = 0x78000000,
-+  PPCI_RLDICR = 0x78000004,
-+  PPCI_RLDIC = 0x78000008,
-+  PPCI_RLDIMI = 0x7800000c,
-+  PPCI_RLDCL = 0x78000010,
-+  PPCI_RLDCR = 0x78000012,
-+
-   PPCI_B = 0x48000000,
-   PPCI_BL = 0x48000001,
-   PPCI_BC = 0x40800000,
-diff --git src/vm_ppc.dasc src/vm_ppc.dasc
-index b4260eb..abb381e 100644
---- a/src/vm_ppc.dasc
-+++ b/src/vm_ppc.dasc
-@@ -22,35 +22,40 @@
- |// GPR64   64 bit registers (but possibly 32 bit pointers, e.g. PS3).
- |//         Affects reg saves, stack layout, carry/overflow/dot flags etc.
- |// FRAME32 Use 32 bit frame layout, even with GPR64 (Xbox 360).
--|// TOC     Need table of contents (64 bit or 32 bit variant, e.g. PS3).
-+|// OPD     Need function descriptors (64 bit or 32 bit variant, e.g. PS3).
- |//         Function pointers are really a struct: code, TOC, env (optional).
--|// TOCENV  Function pointers have an environment pointer, too (not on PS3).
-+|// OPDENV  Function pointers have an environment pointer, too (not on PS3).
-+|// ELFV2   The 64-bit ELF V2 ABI is in use.
- |// PPE     Power Processor Element of Cell (PS3) or Xenon (Xbox 360).
- |//         Must avoid (slow) micro-coded instructions.
- |
- |.if P64
--|.define TOC, 1
--|.define TOCENV, 1
- |.macro lpx, a, b, c; ldx a, b, c; .endmacro
- |.macro lp, a, b; ld a, b; .endmacro
- |.macro stp, a, b; std a, b; .endmacro
-+|.macro stpx, a, b, c; stdx a, b, c; .endmacro
- |.define decode_OPP, decode_OP8
--|.if FFI
--|// Missing: Calling conventions, 64 bit regs, TOC.
--|.error lib_ffi not yet implemented for PPC64
--|.endif
-+|.define PSIZE, 8
- |.else
- |.macro lpx, a, b, c; lwzx a, b, c; .endmacro
- |.macro lp, a, b; lwz a, b; .endmacro
- |.macro stp, a, b; stw a, b; .endmacro
-+|.macro stpx, a, b, c; stwx a, b, c; .endmacro
- |.define decode_OPP, decode_OP4
-+|.define PSIZE, 4
- |.endif
- |
- |// Convenience macros for TOC handling.
--|.if TOC
-+|.if OPD or ELFV2
- |// Linker needs a TOC patch area for every external call relocation.
--|.macro blex, target; bl extern target@plt; nop; .endmacro
-+|.macro blex, target; bl extern target; nop; .endmacro
- |.macro .toc, a, b; a, b; .endmacro
-+|.else
-+|.macro blex, target; bl extern target@plt; .endmacro
-+|.macro .toc, a, b; .endmacro
-+|.endif
-+|.if OPD
-+|.macro .opd, a, b; a, b; .endmacro
- |.if P64
- |.define TOC_OFS,	 8
- |.define ENV_OFS,	16
-@@ -58,13 +63,13 @@
- |.define TOC_OFS,	4
- |.define ENV_OFS,	8
- |.endif
--|.else  // No TOC.
--|.macro blex, target; bl extern target@plt; .endmacro
--|.macro .toc, a, b; .endmacro
-+|.else  // No OPD.
-+|.macro .opd, a, b; .endmacro
- |.endif
--|.macro .tocenv, a, b; .if TOCENV; a, b; .endif; .endmacro
-+|.macro .opdenv, a, b; .if OPDENV; a, b; .endif; .endmacro
- |
- |.macro .gpr64, a, b; .if GPR64; a, b; .endif; .endmacro
-+|.macro .elfv2, a, b; .if ELFV2; a, b; .endif; .endmacro
- |
- |.macro andix., y, a, i
- |.if PPE
-@@ -75,29 +80,6 @@
- |.endif
- |.endmacro
- |
--|.macro clrso, reg
--|.if PPE
--|  li reg, 0
--|  mtxer reg
--|.else
--|  mcrxr cr0
--|.endif
--|.endmacro
--|
--|.macro checkov, reg, noov
--|.if PPE
--|  mfxer reg
--|  add reg, reg, reg
--|  cmpwi reg, 0
--|   li reg, 0
--|   mtxer reg
--|  bgey noov
--|.else
--|  mcrxr cr0
--|  bley noov
--|.endif
--|.endmacro
--|
- |//-----------------------------------------------------------------------
- |
- |// Fixed register assignments for the interpreter.
-@@ -111,6 +93,7 @@
- |.define LREG,		r18	// Register holding lua_State (also in SAVE_L).
- |.define MULTRES,	r19	// Size of multi-result: (nresults+1)*8.
- |.define JGL,		r31	// On-trace: global_State + 32768.
-+|.define BASEP4,	r25	// Equal to BASE + 4
- |
- |// Constants for type-comparisons, stores and conversions. C callee-save.
- |.define TISNUM,	r22
-@@ -143,12 +126,19 @@
- |
- |.define FARG1,		f1
- |.define FARG2,		f2
-+|.define FARG3,		f3
-+|.define FARG4,		f4
-+|.define FARG5,		f5
-+|.define FARG6,		f6
-+|.define FARG7,		f7
-+|.define FARG8,		f8
- |
- |.define CRET1,		r3
- |.define CRET2,		r4
- |
- |.define TOCREG,	r2	// TOC register (only used by C code).
- |.define ENVREG,	r11	// Environment pointer (nested C functions).
-+|.define FUNCREG,	r12	// ELFv2 function pointer (overlaps RD)
- |
- |// Stack layout while in interpreter. Must match with lj_frame.h.
- |.if GPR64
-@@ -182,6 +172,49 @@
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
- |
-+|.elif ELFV2
-+|
-+|//			392(sp) // \ 32 bit C frame info.
-+|.define SAVE_LR,	384(sp)
-+|.define SAVE_CR,	376(sp) // 64 bit CR save.
-+|.define CFRAME_SPACE,	368     // Delta for sp.
-+|// Back chain for sp:	368(sp) <-- sp entering interpreter
-+|.define SAVE_ERRF,	360(sp) // |
-+|.define SAVE_NRES,	356(sp) // |
-+|.define SAVE_L,	352(sp) //  > Parameter save area.
-+|.define SAVE_PC,	348(sp) // |
-+|.define SAVE_MULTRES,	344(sp) // |
-+|.define SAVE_CFRAME,	336(sp) // / 64 bit C frame chain.
-+|.define SAVE_FPR_,	192     // .. 192+18*8: 64 bit FPR saves.
-+|.define SAVE_GPR_,	48      // .. 48+18*8: 64 bit GPR saves.
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	44(sp)
-+|.define TMPD_LO,	40(sp)
-+|.define TONUM_HI,	36(sp)
-+|.define TONUM_LO,	32(sp)
-+|.else
-+|.define TMPD_LO,	44(sp)
-+|.define TMPD_HI,	40(sp)
-+|.define TONUM_LO,	36(sp)
-+|.define TONUM_HI,	32(sp)
-+|.endif
-+|.define SAVE_TOC,	24(sp)  // TOC save area.
-+|// Next frame lr:	16(sp)
-+|// Next frame cr:	8(sp)
-+|// Back chain for sp:	0(sp)	<-- sp while in interpreter
-+|
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	32(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
-+|.define TMPD_BLO,	39(sp)
-+|.define TMPD,		TMPD_HI
-+|.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	32
-+|
- |.else
- |
- |//			508(sp) // \ 32 bit C frame info.
-@@ -192,23 +225,39 @@
- |.define SAVE_MULTRES,	456(sp) // |
- |.define SAVE_CFRAME,	448(sp) // / 64 bit C frame chain.
- |.define SAVE_LR,	416(sp)
-+|.define SAVE_CR,	408(sp)  // 64 bit CR save.
- |.define CFRAME_SPACE,	400     // Delta for sp.
- |// Back chain for sp:	400(sp) <-- sp entering interpreter
- |.define SAVE_FPR_,	256     // .. 256+18*8: 64 bit FPR saves.
- |.define SAVE_GPR_,	112     // .. 112+18*8: 64 bit GPR saves.
- |//			48(sp)  // Callee parameter save area (ABI mandated).
- |.define SAVE_TOC,	40(sp)  // TOC save area.
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	36(sp)  // \ Link editor temp (ABI mandated).
-+|.define TMPD_LO,	32(sp)  // /
-+|.define TONUM_HI,	28(sp)  // \ Compiler temp (ABI mandated).
-+|.define TONUM_LO,	24(sp)  // /
-+|.else
- |.define TMPD_LO,	36(sp)  // \ Link editor temp (ABI mandated).
- |.define TMPD_HI,	32(sp)  // /
- |.define TONUM_LO,	28(sp)  // \ Compiler temp (ABI mandated).
- |.define TONUM_HI,	24(sp)  // /
-+|.endif
- |// Next frame lr:	16(sp)
--|.define SAVE_CR,	8(sp)  // 64 bit CR save.
-+|// Next frame cr:	8(sp)
- |// Back chain for sp:	0(sp)	<-- sp while in interpreter
- |
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	32(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
- |.define TMPD_BLO,	39(sp)
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	112
- |
- |.endif
- |.else
-@@ -226,16 +275,31 @@
- |.define SAVE_PC,	32(sp)
- |.define SAVE_MULTRES,	28(sp)
- |.define UNUSED1,	24(sp)
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	20(sp)
-+|.define TMPD_LO,	16(sp)
-+|.define TONUM_HI,	12(sp)
-+|.define TONUM_LO,	8(sp)
-+|.else
- |.define TMPD_LO,	20(sp)
- |.define TMPD_HI,	16(sp)
- |.define TONUM_LO,	12(sp)
- |.define TONUM_HI,	8(sp)
-+|.endif
- |// Next frame lr:	4(sp)
- |// Back chain for sp:	0(sp)	<-- sp while in interpreter
- |
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	16(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
- |.define TMPD_BLO,	23(sp)
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	16
- |
- |.endif
- |
-@@ -350,8 +414,35 @@
- |//-----------------------------------------------------------------------
- |
- |// Access to frame relative to BASE.
-+|.if ENDIAN_LE
-+|.define FRAME_PC,	-4
-+|.define FRAME_FUNC,	-8
-+|.define FRAME_CONTPC,	-12
-+|.define FRAME_CONTRET,	-16
-+|.define WORD_LO,	0
-+|.define WORD_HI,	4
-+|.define WORD_BLO,	0
-+|.define BASE_LO,	BASE
-+|.define BASE_HI,	BASEP4
-+|.macro lwzux2, hi, lo, base, idx
-+|  lwzux lo, base, idx
-+|  lwz hi, 4(base)
-+|.endmacro
-+|.else
- |.define FRAME_PC,	-8
- |.define FRAME_FUNC,	-4
-+|.define FRAME_CONTPC,	-16
-+|.define FRAME_CONTRET,	-12
-+|.define WORD_LO,	4
-+|.define WORD_HI,	0
-+|.define WORD_BLO,	7
-+|.define BASE_LO,	BASEP4
-+|.define BASE_HI,	BASE
-+|.macro lwzux2, hi, lo, base, idx
-+|  lwzux hi, base, idx
-+|  lwz lo, 4(base)
-+|.endmacro
-+|.endif
- |
- |// Instruction decode.
- |.macro decode_OP4, dst, ins; rlwinm dst, ins, 2, 22, 29; .endmacro
-@@ -412,6 +503,7 @@
- |// Call decode and dispatch.
- |.macro ins_callt
- |  // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC
-+|  addi BASEP4, BASE, 4
- |  lwz PC, LFUNC:RB->pc
- |  lwz INS, 0(PC)
- |   addi PC, PC, 4
-@@ -504,7 +596,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz PC, FRAME_PC(TMP2)		// Fetch PC of previous frame.
-   |  mr BASE, TMP2			// Restore caller base.
-   |  // Prepending may overwrite the pcall frame, so do it at the end.
--  |   stwu TMP1, FRAME_PC(RA)		// Prepend true to results.
-+  |  .if ENDIAN_LE
-+  |    addi RA, RA, -8
-+  |    stw TMP1, WORD_HI(RA)		// Prepend true to results.
-+  |  .else
-+  |    stwu TMP1, -8(RA)		// Prepend true to results.
-+  |  .endif
-   |
-   |->vm_returnc:
-   |  addi RD, RD, 8			// RD = (nresults+1)*8.
-@@ -560,7 +657,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz TMP1, L->maxstack
-   |  cmplw BASE, TMP1
-   |  bge >8
--  |  stw TISNIL, 0(BASE)
-+  |  stw TISNIL, WORD_HI(BASE)
-   |  addi RD, RD, 8
-   |  addi BASE, BASE, 8
-   |  b <2
-@@ -611,7 +708,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vm_unwind_ff_eh:			// Landing pad for external unwinder.
-   |  lwz L, SAVE_L
-   |  .toc ld TOCREG, SAVE_TOC
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |  lp BASE, L->base
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |   lwz DISPATCH, L->glref		// Setup pointer to dispatch table.
-@@ -626,7 +728,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la RA, -8(BASE)			// Results start at BASE-8.
-   |     stw TMP3, TMPD
-   |   addi DISPATCH, DISPATCH, GG_G2DISP
--  |  stw TMP1, 0(RA)			// Prepend false to error message.
-+  |  stw TMP1, WORD_HI(RA)		// Prepend false to error message.
-   |  li RD, 16				// 2 results: false + error message.
-   |    st_vmstate
-   |     lfs TONUM, TMPD
-@@ -687,7 +789,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  mr RA, BASE
-   |   lp BASE, L->base
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |   lp TMP1, L->top
-   |  lwz PC, FRAME_PC(BASE)
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-@@ -737,7 +844,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |3:  // Entry point for vm_cpcall/vm_resume (BASE = base, PC = ftype).
-   |  stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  lp TMP2, L->base			// TMP2 = old base (used in vmeta_call).
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |   lp TMP1, L->top
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |  add PC, PC, BASE
-@@ -757,8 +869,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->vm_call_dispatch:
-   |  // TMP2 = old base, BASE = new base, RC = nargs*8, PC = caller PC
--  |  lwz TMP0, FRAME_PC(BASE)
--  |   lwz LFUNC:RB, FRAME_FUNC(BASE)
-+  |  lwz TMP0, WORD_HI-8(BASE)
-+  |   lwz LFUNC:RB, WORD_LO-8(BASE)
-   |  checkfunc TMP0; bne ->vmeta_call
-   |
-   |->vm_call_dispatch_f:
-@@ -777,7 +889,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |   sub TMP0, TMP0, TMP1		// Compute -savestack(L, L->top).
-   |    lp TMP1, L->cframe
-   |     addi DISPATCH, DISPATCH, GG_G2DISP
--  |  .toc lp CARG4, 0(CARG4)
-+  |  .opd lp TOCREG, TOC_OFS(CARG4)
-+  |  .opdenv lp ENVREG, ENV_OFS(CARG4)
-+  |  .opd lp CARG4, 0(CARG4)
-   |  li TMP2, 0
-   |   stw TMP0, SAVE_NRES		// Neg. delta means cframe w/o frame.
-   |  stw TMP2, SAVE_ERRF		// No error function.
-@@ -785,7 +899,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |    stp sp, L->cframe		// Add our C frame to cframe chain.
-   |     stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  mtctr CARG4
-+  |  .elfv2 mr FUNCREG, CARG4
-   |  bctrl			// (lua_State *L, lua_CFunction func, void *ud)
-+  |  .toc lp TOCREG, SAVE_TOC
-   |.if PPE
-   |  mr BASE, CRET1
-   |  cmpwi CRET1, 0
-@@ -807,20 +923,27 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->cont_dispatch:
-   |  // BASE = meta base, RA = resultptr, RD = (nresults+1)*8
--  |  lwz TMP0, -12(BASE)		// Continuation.
-+  |  lwz TMP0, FRAME_CONTRET(BASE)	// Continuation.
-   |   mr RB, BASE
-   |   mr BASE, TMP2			// Restore caller BASE.
-   |    lwz LFUNC:TMP1, FRAME_FUNC(TMP2)
-   |.if FFI
-   |  cmplwi TMP0, 1
-   |.endif
--  |     lwz PC, -16(RB)			// Restore PC from [cont|PC].
--  |   subi TMP2, RD, 8
-+  |     lwz PC, FRAME_CONTPC(RB)	// Restore PC from [cont|PC].
-+  |  addi BASEP4, BASE, 4
-+  |   addi TMP2, RD, WORD_HI-8
-   |    lwz TMP1, LFUNC:TMP1->pc
-   |   stwx TISNIL, RA, TMP2		// Ensure one valid arg.
-+  |.if P64
-+  |   ld TMP3, 0(DISPATCH)
-+  |.endif
-   |.if FFI
-   |  ble >1
-   |.endif
-+  |.if P64
-+  |  add TMP0, TMP0, TMP3
-+  |.endif
-   |    lwz KBASE, PC2PROTO(k)(TMP1)
-   |  // BASE = base, RA = resultptr, RB = meta base
-   |  mtctr TMP0
-@@ -856,20 +979,20 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TSTR
-   |   decode_RB8 RB, INS
--  |  stw STR:RC, 4(CARG3)
-+  |  stw STR:RC, WORD_LO(CARG3)
-   |   add CARG2, BASE, RB
--  |  stw TMP0, 0(CARG3)
-+  |  stw TMP0, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tgets:
-   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TTAB
--  |  stw TAB:RB, 4(CARG2)
-+  |  stw TAB:RB, WORD_LO(CARG2)
-   |   la CARG3, DISPATCH_GL(tmptv2)(DISPATCH)
--  |  stw TMP0, 0(CARG2)
-+  |  stw TMP0, WORD_HI(CARG2)
-   |   li TMP1, LJ_TSTR
--  |   stw STR:RC, 4(CARG3)
--  |   stw TMP1, 0(CARG3)
-+  |   stw STR:RC, WORD_LO(CARG3)
-+  |   stw TMP1, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tgetb:			// TMP0 = index
-@@ -880,8 +1003,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |   add CARG2, BASE, RB
-   |.if DUALNUM
--  |  stw TISNUM, 0(CARG3)
--  |  stw TMP0, 4(CARG3)
-+  |  stw TISNUM, WORD_HI(CARG3)
-+  |  stw TMP0, WORD_LO(CARG3)
-   |.else
-   |  stfd f0, 0(CARG3)
-   |.endif
-@@ -909,7 +1032,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // BASE = base, L->top = new base, stack = cont/func/t/k
-   |  subfic TMP1, BASE, FRAME_CONT
-   |  lp BASE, L->top
--  |  stw PC, -16(BASE)			// [cont|PC]
-+  |  stw PC, FRAME_CONTPC(BASE)		// [cont|PC]
-   |   add PC, TMP1, BASE
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
-   |   li NARGS8:RC, 16			// 2 args for func(t, k).
-@@ -923,7 +1046,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f14, 0(CRET1)
-   |  b ->BC_TGETR_Z
-   |1:
--  |  stwx TISNIL, BASE, RA
-+  |  stwx TISNIL, BASE_HI, RA
-   |  b ->cont_nop
-   |
-   |//-----------------------------------------------------------------------
-@@ -932,20 +1055,20 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TSTR
-   |   decode_RB8 RB, INS
--  |  stw STR:RC, 4(CARG3)
-+  |  stw STR:RC, WORD_LO(CARG3)
-   |   add CARG2, BASE, RB
--  |  stw TMP0, 0(CARG3)
-+  |  stw TMP0, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tsets:
-   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TTAB
--  |  stw TAB:RB, 4(CARG2)
-+  |  stw TAB:RB, WORD_LO(CARG2)
-   |   la CARG3, DISPATCH_GL(tmptv2)(DISPATCH)
--  |  stw TMP0, 0(CARG2)
-+  |  stw TMP0, WORD_HI(CARG2)
-   |   li TMP1, LJ_TSTR
--  |   stw STR:RC, 4(CARG3)
--  |   stw TMP1, 0(CARG3)
-+  |   stw STR:RC, WORD_LO(CARG3)
-+  |   stw TMP1, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tsetb:			// TMP0 = index
-@@ -956,8 +1079,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |   add CARG2, BASE, RB
-   |.if DUALNUM
--  |  stw TISNUM, 0(CARG3)
--  |  stw TMP0, 4(CARG3)
-+  |  stw TISNUM, WORD_HI(CARG3)
-+  |  stw TMP0, WORD_LO(CARG3)
-   |.else
-   |  stfd f0, 0(CARG3)
-   |.endif
-@@ -986,7 +1109,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // BASE = base, L->top = new base, stack = cont/func/t/k/(v)
-   |  subfic TMP1, BASE, FRAME_CONT
-   |  lp BASE, L->top
--  |  stw PC, -16(BASE)			// [cont|PC]
-+  |  stw PC, FRAME_CONTPC(BASE)		// [cont|PC]
-   |   add PC, TMP1, BASE
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
-   |   li NARGS8:RC, 24			// 3 args for func(t, k, v)
-@@ -1006,17 +1129,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vmeta_comp:
-   |  mr CARG1, L
-   |   subi PC, PC, 4
--  |.if DUALNUM
--  |  mr CARG2, RA
--  |.else
-   |  add CARG2, BASE, RA
--  |.endif
-   |   stw PC, SAVE_PC
--  |.if DUALNUM
--  |  mr CARG3, RD
--  |.else
-   |  add CARG3, BASE, RD
--  |.endif
-   |   stp BASE, L->base
-   |  decode_OP1 CARG4, INS
-   |  bl extern lj_meta_comp  // (lua_State *L, TValue *o1, *o2, int op)
-@@ -1043,7 +1158,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  b ->cont_nop
-   |
-   |->cont_condt:			// RA = resultptr
--  |  lwz TMP0, 0(RA)
-+  |  lwz TMP0, WORD_HI(RA)
-   |  .gpr64 extsw TMP0, TMP0
-   |  subfic TMP0, TMP0, LJ_TTRUE	// Branch if result is true.
-   |  subfe CRET1, CRET1, CRET1
-@@ -1051,7 +1166,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  b <4
-   |
-   |->cont_condf:			// RA = resultptr
--  |  lwz TMP0, 0(RA)
-+  |  lwz TMP0, WORD_HI(RA)
-   |  .gpr64 extsw TMP0, TMP0
-   |  subfic TMP0, TMP0, LJ_TTRUE	// Branch if result is false.
-   |  subfe CRET1, CRET1, CRET1
-@@ -1103,8 +1218,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |.endif
-   |
-   |->vmeta_unm:
--  |  mr CARG3, RD
--  |  mr CARG4, RD
-+  |  add CARG3, BASE, RD
-+  |  add CARG4, BASE, RD
-   |  b >1
-   |
-   |->vmeta_arith_vn:
-@@ -1139,7 +1254,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vmeta_binop:
-   |  // BASE = old base, CRET1 = new base, stack = cont/func/o1/o2
-   |  sub TMP1, CRET1, BASE
--  |   stw PC, -16(CRET1)		// [cont|PC]
-+  |   stw PC, FRAME_CONTPC(CRET1)	// [cont|PC]
-   |   mr TMP2, BASE
-   |  addi PC, TMP1, FRAME_CONT
-   |   mr BASE, CRET1
-@@ -1150,7 +1265,7 @@ static void build_subroutines(BuildCtx *ctx)
- #if LJ_52
-   |  mr SAVE0, CARG1
- #endif
--  |  mr CARG2, RD
-+  |  add CARG2, BASE, RD
-   |   stp BASE, L->base
-   |  mr CARG1, L
-   |   stw PC, SAVE_PC
-@@ -1227,25 +1342,25 @@ static void build_subroutines(BuildCtx *ctx)
-   |.macro .ffunc_1, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz CARG1, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz CARG1, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |.endmacro
-   |
-   |.macro .ffunc_2, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
--  |    lwz CARG4, 8(BASE)
--  |   lwz CARG1, 4(BASE)
--  |    lwz CARG2, 12(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz CARG4, WORD_HI+8(BASE)
-+  |   lwz CARG1, WORD_LO(BASE)
-+  |    lwz CARG2, WORD_LO+8(BASE)
-   |  blt ->fff_fallback
-   |.endmacro
-   |
-   |.macro .ffunc_n, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1254,9 +1369,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |.macro .ffunc_nn, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, WORD_HI+8(BASE)
-   |    lfd FARG2, 8(BASE)
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1279,9 +1394,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmplw cr1, CARG3, TMP1
-   |    lwz PC, FRAME_PC(BASE)
-   |  bge cr1, ->fff_fallback
--  |   stw CARG3, 0(RA)
-+  |   stw CARG3, WORD_HI(RA)
-   |  addi RD, NARGS8:RC, 8		// Compute (nresults+1)*8.
--  |   stw CARG1, 4(RA)
-+  |   stw CARG1, WORD_LO(RA)
-   |  beq ->fff_res			// Done if exactly 1 argument.
-   |  li TMP1, 8
-   |  subi RC, RC, 8
-@@ -1295,17 +1410,36 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc type
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-   |  blt ->fff_fallback
-   |  .gpr64 extsw CARG1, CARG1
-+  |.if P64
-+  |  li TMP0, LJ_TNUMX
-+  |    srawi TMP3, CARG1, 15
-+  |  subfc TMP1, TMP0, CARG1
-+  |.else
-   |  subfc TMP0, TISNUM, CARG1
--  |  subfe TMP2, CARG1, CARG1
-+  |.endif
-+  |    subfe TMP2, CARG1, CARG1
-+  |.if P64
-+  |  cmpwi TMP3, -2
-+  |    orc TMP1, TMP2, TMP1
-+  |    subf TMP1, TMP0, TMP1
-+  |  beq >1
-+  |.else
-   |  orc TMP1, TMP2, TMP0
--  |  addi TMP1, TMP1, ~LJ_TISNUM+1
-+  |  subf TMP1, TISNUM, TMP1
-+  |.endif
-   |  slwi TMP1, TMP1, 3
-+  |2:
-   |   la TMP2, CFUNC:RB->upvalue
-   |  lfdx FARG1, TMP2, TMP1
-   |  b ->fff_resn
-+  |.if P64
-+  |1:
-+  |  li TMP1, ~LJ_TLIGHTUD<<3
-+  |  b <2
-+  |.endif
-   |
-   |//-- Base library: getters and setters ---------------------------------
-   |
-@@ -1328,10 +1462,10 @@ static void build_subroutines(BuildCtx *ctx)
-   |  sub TMP1, TMP0, TMP1
-   |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-   |3:  // Rearranged logic, because we expect _not_ to find the key.
--  |  lwz CARG4, NODE:TMP2->key
--  |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--  |    lwz CARG2, NODE:TMP2->val
--  |     lwz TMP1, 4+offsetof(Node, val)(NODE:TMP2)
-+  |  lwz CARG4, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+  |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+  |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-+  |     lwz TMP1, WORD_LO+offsetof(Node, val)(NODE:TMP2)
-   |  checkstr CARG4; bne >4
-   |   cmpw TMP0, STR:RC; beq >5
-   |4:
-@@ -1349,14 +1483,33 @@ static void build_subroutines(BuildCtx *ctx)
-   |6:
-   |  cmpwi CARG3, LJ_TUDATA; beq <1
-   |  .gpr64 extsw CARG3, CARG3
-+  |.if P64
-+  |  li TMP0, LJ_TNUMX
-+  |    srawi TMP3, CARG3, 15
-+  |  subfc TMP1, TMP0, CARG3
-+  |.else
-   |  subfc TMP0, TISNUM, CARG3
-+  |.endif
-   |  subfe TMP2, CARG3, CARG3
-+  |.if P64
-+  |  cmpwi TMP3, -2
-+  |    orc TMP1, TMP2, TMP1
-+  |    subf TMP1, TMP0, TMP1
-+  |  beq >7
-+  |.else
-   |  orc TMP1, TMP2, TMP0
--  |  addi TMP1, TMP1, ~LJ_TISNUM+1
-+  |  subf TMP1, TISNUM, TMP1
-+  |.endif
-   |  slwi TMP1, TMP1, 2
-+  |8:
-   |   la TMP2, DISPATCH_GL(gcroot[GCROOT_BASEMT])(DISPATCH)
-   |  lwzx TAB:CARG1, TMP2, TMP1
-   |  b <2
-+  |.if P64
-+  |7:
-+  |  li TMP1, ~LJ_TLIGHTUD<<2
-+  |  b <8
-+  |.endif
-   |
-   |.ffunc_2 setmetatable
-   |  // Fast path: no mt for table yet and not clearing the mt.
-@@ -1374,8 +1527,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc rawget
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG4, 0(BASE)
--  |    lwz TAB:CARG2, 4(BASE)
-+  |   lwz CARG4, WORD_HI(BASE)
-+  |    lwz TAB:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |  checktab CARG4; bne ->fff_fallback
-   |   la CARG3, 8(BASE)
-@@ -1390,7 +1543,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc tonumber
-   |  // Only handles the number case inline (without a base argument).
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Exactly one argument.
-   |   checknum CARG1; bgt ->fff_fallback
-@@ -1425,10 +1578,15 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc next
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
--  |    lwz TAB:CARG2, 4(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-+  |    lwz TAB:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-+  |.if ENDIAN_LE
-+  |   add TMP1, BASE, NARGS8:RC
-+  |   stw TISNIL, WORD_HI(TMP1)		// Set missing 2nd arg to nil.
-+  |.else
-   |   stwx TISNIL, BASE, NARGS8:RC	// Set missing 2nd arg to nil.
-+  |.endif
-   |  checktab CARG1
-   |   lwz PC, FRAME_PC(BASE)
-   |  bne ->fff_fallback
-@@ -1464,18 +1622,18 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f0, CFUNC:RB->upvalue[0]
-   |  la RA, -8(BASE)
- #endif
--  |   stw TISNIL, 8(BASE)
-+  |   stw TISNIL, 8+WORD_HI(BASE)
-   |  li RD, (3+1)*8
-   |  stfd f0, 0(RA)
-   |  b ->fff_res
-   |
-   |.ffunc ipairs_aux
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
--  |    lwz TAB:CARG1, 4(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz TAB:CARG1, WORD_LO(BASE)
-+  |   lwz CARG4, 8+WORD_HI(BASE)
-   |.if DUALNUM
--  |    lwz TMP2, 12(BASE)
-+  |    lwz TMP2, 8+WORD_LO(BASE)
-   |.else
-   |    lfd FARG2, 8(BASE)
-   |.endif
-@@ -1504,16 +1662,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |   la RA, -8(BASE)
-   |  cmplw TMP0, TMP2
-   |.if DUALNUM
--  |  stw TISNUM, 0(RA)
-+  |  stw TISNUM, WORD_HI(RA)
-   |   slwi TMP3, TMP2, 3
--  |  stw TMP2, 4(RA)
-+  |  stw TMP2, WORD_LO(RA)
-   |.else
-   |   slwi TMP3, TMP2, 3
-   |  stfd FARG2, 0(RA)
-   |.endif
-   |  ble >2				// Not in array part?
--  |  lwzx TMP2, TMP1, TMP3
--  |  lfdx f0, TMP1, TMP3
-+  |  lfdux f0, TMP1, TMP3
-+  |  lwz TMP2, WORD_HI(TMP1)
-   |1:
-   |  checknil TMP2
-   |   li RD, (0+1)*8
-@@ -1532,7 +1690,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmplwi CRET1, 0
-   |   li RD, (0+1)*8
-   |  beq ->fff_res
--  |  lwz TMP2, 0(CRET1)
-+  |  lwz TMP2, WORD_HI(CRET1)
-   |  lfd f0, 0(CRET1)
-   |  b <1
-   |
-@@ -1551,11 +1709,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la RA, -8(BASE)
- #endif
-   |.if DUALNUM
--  |  stw TISNUM, 8(BASE)
-+  |  stw TISNUM, 8+WORD_HI(BASE)
-   |.else
--  |  stw ZERO, 8(BASE)
-+  |  stw ZERO, 8+WORD_HI(BASE)
-   |.endif
--  |   stw ZERO, 12(BASE)
-+  |   stw ZERO, 8+WORD_LO(BASE)
-   |  li RD, (3+1)*8
-   |  stfd f0, 0(RA)
-   |  b ->fff_res
-@@ -1576,7 +1734,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc xpcall
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, 8+WORD_HI(BASE)
-   |    lfd FARG2, 8(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  blt ->fff_fallback
-@@ -1673,7 +1831,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if resume
-   |  li TMP1, LJ_TTRUE
-   |   la RA, -8(BASE)
--  |  stw TMP1, -8(BASE)			// Prepend true to results.
-+  |  stw TMP1, WORD_HI-8(BASE)		// Prepend true to results.
-   |  addi RD, RD, 16
-   |.else
-   |  mr RA, BASE
-@@ -1693,7 +1851,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f0, 0(TMP3)
-   |   stp TMP3, L:SAVE0->top		// Remove error from coroutine stack.
-   |    li RD, (2+1)*8
--  |   stw TMP1, -8(BASE)		// Prepend false to results.
-+  |   stw TMP1, WORD_HI-8(BASE)		// Prepend false to results.
-   |    la RA, -8(BASE)
-   |  stfd f0, 0(BASE)			// Copy error message.
-   |  b <7
-@@ -1746,8 +1904,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |->fff_resi:
-   |  lwz PC, FRAME_PC(BASE)
-   |  la RA, -8(BASE)
--  |  stw TISNUM, -8(BASE)
--  |  stw CRET1, -4(BASE)
-+  |  stw TISNUM, WORD_HI-8(BASE)
-+  |  stw CRET1, WORD_LO-8(BASE)
-   |  b ->fff_res1
-   |1:
-   |  lus CARG3, 0x41e0	// 2^31.
-@@ -1762,9 +1920,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |->fff_restv:
-   |  // CARG3/CARG1 = TValue result.
-   |  lwz PC, FRAME_PC(BASE)
--  |   stw CARG3, -8(BASE)
-+  |   stw CARG3, WORD_HI-8(BASE)
-   |  la RA, -8(BASE)
--  |   stw CARG1, -4(BASE)
-+  |   stw CARG1, WORD_LO-8(BASE)
-   |->fff_res1:
-   |  // RA = results, PC = return.
-   |  li RD, (1+1)*8
-@@ -1782,10 +1940,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  ins_next1
-   |  // Adjust BASE. KBASE is assumed to be set for the calling frame.
-   |   sub BASE, RA, TMP0
-+  |   addi BASEP4, BASE, 4
-   |  ins_next2
-   |
-   |6:  // Fill up results with nil.
--  |  subi TMP1, RD, 8
-+  |  addi TMP1, RD, WORD_HI-8
-   |   addi RD, RD, 8
-   |  stwx TISNIL, RA, TMP1
-   |  b <5
-@@ -1898,7 +2057,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc math_log
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Need exactly 1 argument.
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1923,13 +2082,13 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if DUALNUM
-   |.ffunc math_ldexp
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, WORD_HI+8(BASE)
-   |.if GPR64
--  |    lwz CARG2, 12(BASE)
-+  |    lwz CARG2, WORD_LO+8(BASE)
-   |.else
--  |    lwz CARG1, 12(BASE)
-+  |    lwz CARG1, WORD_LO+8(BASE)
-   |.endif
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1961,8 +2120,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stfd FARG1, 0(RA)
-   |  li RD, (2+1)*8
-   |.if DUALNUM
--  |   stw TISNUM, 8(RA)
--  |   stw TMP1, 12(RA)
-+  |   stw TISNUM, WORD_HI+8(RA)
-+  |   stw TMP1, WORD_LO+8(RA)
-   |.else
-   |   stfd FARG2, 8(RA)
-   |.endif
-@@ -1989,9 +2148,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |   add TMP2, BASE, NARGS8:RC
-   |  bne >4
-   |1:  // Handle integers.
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
--  |  lwz CARG2, 4(TMP1)
-+  |  lwz CARG2, WORD_LO(TMP1)
-   |   bge cr1, ->fff_resi
-   |  checknum CARG4
-   |   xoris TMP0, CARG1, 0x8000
-@@ -2020,7 +2179,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |   lfd FARG1, 0(BASE)
-   |  bge ->fff_fallback
-   |5:  // Handle numbers.
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
-   |  lfd FARG2, 0(TMP1)
-   |   bge cr1, ->fff_resn
-@@ -2035,7 +2194,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.endif
-   |  b <5
-   |7:  // Convert integer to number and continue above.
--  |   lwz CARG2, 4(TMP1)
-+  |   lwz CARG2, WORD_LO(TMP1)
-   |  bne ->fff_fallback
-   |  tonum_i FARG2, CARG2
-   |  b <6
-@@ -2043,7 +2202,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  .ffunc_n name
-   |  li TMP1, 8
-   |1:
-+  |.if ENDIAN_LE
-+  |   add CARG2, BASE, TMP1
-+  |   lwz CARG2, WORD_HI(CARG2)
-+  |.else
-   |   lwzx CARG2, BASE, TMP1
-+  |.endif
-   |   lfdx FARG2, BASE, TMP1
-   |  cmplw cr1, TMP1, NARGS8:RC
-   |   checknum CARG2
-@@ -2067,8 +2231,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc string_byte			// Only handle the 1-arg case here.
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz STR:CARG1, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz STR:CARG1, WORD_LO(BASE)
-   |  bne ->fff_fallback			// Need exactly 1 argument.
-   |   checkstr CARG3
-   |   bne ->fff_fallback
-@@ -2099,12 +2263,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc string_char			// Only handle the 1-arg case here.
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |.if DUALNUM
--  |    lwz TMP0, 4(BASE)
-+  |    lwz TMP0, WORD_LO(BASE)
-   |  bne ->fff_fallback			// Exactly 1 argument.
-   |  checknum CARG3; bne ->fff_fallback
--  |   la CARG2, 7(BASE)
-+  |   la CARG2, WORD_BLO(BASE)
-   |.else
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Exactly 1 argument.
-@@ -2128,16 +2292,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc string_sub
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 16(BASE)
-+  |   lwz CARG3, WORD_HI+16(BASE)
-   |.if not DUALNUM
-   |    lfd f0, 16(BASE)
-   |.endif
--  |   lwz TMP0, 0(BASE)
--  |    lwz STR:CARG1, 4(BASE)
-+  |   lwz TMP0, WORD_HI(BASE)
-+  |    lwz STR:CARG1, WORD_LO(BASE)
-   |  blt ->fff_fallback
--  |   lwz CARG2, 8(BASE)
-+  |   lwz CARG2, WORD_HI+8(BASE)
-   |.if DUALNUM
--  |    lwz TMP1, 12(BASE)
-+  |    lwz TMP1, WORD_LO+8(BASE)
-   |.else
-   |    lfd f1, 8(BASE)
-   |.endif
-@@ -2145,7 +2309,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  beq >1
-   |.if DUALNUM
-   |  checknum CARG3
--  |   lwz TMP2, 20(BASE)
-+  |   lwz TMP2, WORD_LO+16(BASE)
-   |  bne ->fff_fallback
-   |1:
-   |  checknum CARG2; bne ->fff_fallback
-@@ -2201,8 +2365,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  .ffunc string_ .. name
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz STR:CARG2, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz STR:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |  checkstr CARG3
-   |   la SBUF:CARG1, DISPATCH_GL(tmpbuf)(DISPATCH)
-@@ -2240,10 +2404,10 @@ static void build_subroutines(BuildCtx *ctx)
-   |  addi TMP1, BASE, 8
-   |  add TMP2, BASE, NARGS8:RC
-   |1:
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
-   |.if DUALNUM
--  |  lwz CARG2, 4(TMP1)
-+  |  lwz CARG2, WORD_LO(TMP1)
-   |.else
-   |  lfd FARG1, 0(TMP1)
-   |.endif
-@@ -2344,20 +2508,23 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->fff_fallback:			// Call fast function fallback handler.
-   |  // BASE = new base, RB = CFUNC, RC = nargs*8
--  |  lp TMP3, CFUNC:RB->f
-+  |  lp FUNCREG, CFUNC:RB->f
-   |    add TMP1, BASE, NARGS8:RC
-   |   lwz PC, FRAME_PC(BASE)		// Fallback may overwrite PC.
-   |    addi TMP0, TMP1, 8*LUA_MINSTACK
-   |     lwz TMP2, L->maxstack
-   |   stw PC, SAVE_PC			// Redundant (but a defined value).
--  |  .toc lp TMP3, 0(TMP3)
-+  |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+  |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-+  |  .opd lp FUNCREG, 0(FUNCREG)
-   |  cmplw TMP0, TMP2
-   |     stp BASE, L->base
-   |    stp TMP1, L->top
-   |   mr CARG1, L
-   |  bgt >5				// Need to grow stack.
--  |  mtctr TMP3
-+  |  mtctr FUNCREG
-   |  bctrl				// (lua_State *L)
-+  |  .toc lp TOCREG, SAVE_TOC
-   |  // Either throws an error, or recovers and returns -1, 0 or nresults+1.
-   |  lp BASE, L->base
-   |  cmpwi CRET1, 0
-@@ -2459,6 +2626,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |3:
-   |  lp BASE, L->base
-   |4:  // Re-dispatch to static ins.
-+  |  addi BASEP4, BASE, 4
-   |  lwz INS, -4(PC)
-   |  decode_OPP TMP1, INS
-   |   decode_RB8 RB, INS
-@@ -2472,7 +2640,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->cont_hook:				// Continue from hook yield.
-   |  addi PC, PC, 4
--  |  lwz MULTRES, -20(RB)		// Restore MULTRES for *M ins.
-+  |  lwz MULTRES, WORD_LO-24(RB)		// Restore MULTRES for *M ins.
-   |  b <4
-   |
-   |->vm_hotloop:			// Hot loop counter underflow.
-@@ -2514,6 +2682,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lp BASE, L->base
-   |   lp TMP0, L->top
-   |   stw ZERO, SAVE_PC			// Invalidate for subsequent line hook.
-+  |  addi BASEP4, BASE, 4
-   |  sub NARGS8:RC, TMP0, BASE
-   |  add RA, BASE, RA
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)
-@@ -2525,7 +2694,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if JIT
-   |  // RA = resultptr, RB = meta base
-   |  lwz INS, -4(PC)
--  |    lwz TRACE:TMP2, -20(RB)		// Save previous trace.
-+  |    lwz TRACE:TMP2, WORD_LO-24(RB)	// Save previous trace.
-   |   addic. TMP1, MULTRES, -8
-   |  decode_RA8 RC, INS			// Call base.
-   |   beq >2
-@@ -2560,10 +2729,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |  mr CARG2, PC
-   |  bl extern lj_dispatch_stitch	// (jit_State *J, const BCIns *pc)
-   |  lp BASE, L->base
-+  |  addi BASEP4, BASE, 4
-   |  b ->cont_nop
-   |
-   |9:
-+  |.if ENDIAN_LE
-+  |  addi BASEP4, BASE, 4
-+  |  stwx TISNIL, BASEP4, RC
-+  |.else
-   |  stwx TISNIL, BASE, RC
-+  |.endif
-   |  addi RC, RC, 8
-   |  b <3
-   |.endif
-@@ -2578,6 +2753,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction.
-   |  lp BASE, L->base
-   |  subi PC, PC, 4
-+  |  addi BASEP4, BASE, 4
-   |  b ->cont_nop
- #endif
-   |
-@@ -2586,39 +2762,72 @@ static void build_subroutines(BuildCtx *ctx)
-   |//-----------------------------------------------------------------------
-   |
-   |.macro savex_, a, b, c, d
--  |  stfd f..a, 16+a*8(sp)
--  |  stfd f..b, 16+b*8(sp)
--  |  stfd f..c, 16+c*8(sp)
--  |  stfd f..d, 16+d*8(sp)
-+  |  stfd f..a, EXIT_OFFSET+a*8(sp)
-+  |  stfd f..b, EXIT_OFFSET+b*8(sp)
-+  |  stfd f..c, EXIT_OFFSET+c*8(sp)
-+  |  stfd f..d, EXIT_OFFSET+d*8(sp)
-+  |.endmacro
-+  |
-+  |.macro saver, a
-+  |  stp r..a, EXIT_OFFSET+32*8+a*PSIZE(sp)
-   |.endmacro
-   |
-   |->vm_exit_handler:
-   |.if JIT
--  |  addi sp, sp, -(16+32*8+32*4)
--  |  stmw r2, 16+32*8+2*4(sp)
-+  |  addi sp, TMP0, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-+  |  saver 3 // CARG1
-+  |  saver 4 // CARG2
-+  |  saver 5 // CARG3
-+  |  saver 17 // DISPATCH
-   |    addi DISPATCH, JGL, -GG_DISP2G-32768
-   |    li CARG2, ~LJ_VMST_EXIT
--  |   lwz CARG1, 16+32*8+32*4(sp)	// Get stack chain.
-+  |   lp CARG1, EXIT_OFFSET+32*8+32*PSIZE(sp)	// Get stack chain.
-   |    stw CARG2, DISPATCH_GL(vmstate)(DISPATCH)
-+  |  saver 2
-+  |  saver 6
-+  |  saver 7
-+  |  saver 8
-+  |  saver 9
-+  |  saver 10
-+  |  saver 11
-+  |  saver 12
-+  |  saver 13
-   |  savex_ 0,1,2,3
--  |   stw CARG1, 0(sp)			// Store extended stack chain.
--  |   clrso TMP1
-+  |   stp CARG1, 0(sp)			// Store extended stack chain.
-+
-   |  savex_ 4,5,6,7
--  |   addi CARG2, sp, 16+32*8+32*4	// Recompute original value of sp.
-+  |  saver 14
-+  |  saver 15
-+  |  saver 16
-+  |  saver 18
-+  |   addi CARG2, sp, EXIT_OFFSET+32*8+32*PSIZE	// Recompute original value of sp.
-   |  savex_ 8,9,10,11
--  |   stw CARG2, 16+32*8+1*4(sp)	// Store sp in RID_SP.
-+  |   stp CARG2, EXIT_OFFSET+32*8+1*PSIZE(sp)	// Store sp in RID_SP.
-   |  savex_ 12,13,14,15
-   |   mflr CARG3
-   |   li TMP1, 0
-   |  savex_ 16,17,18,19
--  |   stw TMP1, 16+32*8+0*4(sp)		// Clear RID_TMP.
-+  |   stw TMP1, EXIT_OFFSET+32*8+0*PSIZE(sp)		// Clear RID_TMP.
-   |  savex_ 20,21,22,23
-   |   lhz CARG4, 2(CARG3)		// Load trace number.
-   |  savex_ 24,25,26,27
-   |  lwz L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  savex_ 28,29,30,31
-+  |  saver 19
-+  |  saver 20
-+  |  saver 21
-+  |  saver 22
-+  |  saver 23
-+  |  saver 24
-+  |  saver 25
-+  |  saver 26
-+  |  saver 27
-+  |  saver 28
-+  |  saver 29
-+  |  saver 30
-+  |  saver 31
-   |   sub CARG3, TMP0, CARG3		// Compute exit number.
--  |  lp BASE, DISPATCH_GL(jit_base)(DISPATCH)
-+  |  lwz BASE, DISPATCH_GL(jit_base)(DISPATCH)
-   |   srwi CARG3, CARG3, 2
-   |  stp L, DISPATCH_J(L)(DISPATCH)
-   |   subi CARG3, CARG3, 2
-@@ -2627,11 +2836,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stw TMP1, DISPATCH_GL(jit_base)(DISPATCH)
-   |  addi CARG1, DISPATCH, GG_DISP2J
-   |   stw CARG3, DISPATCH_J(exitno)(DISPATCH)
--  |  addi CARG2, sp, 16
-+  |  addi CARG2, sp, EXIT_OFFSET
-   |  bl extern lj_trace_exit		// (jit_State *J, ExitState *ex)
-   |  // Returns MULTRES (unscaled) or negated error code.
-   |  lp TMP1, L->cframe
--  |  lwz TMP2, 0(sp)
-+  |  lp TMP2, 0(sp)
-   |   lp BASE, L->base
-   |.if GPR64
-   |  rldicr sp, TMP1, 0, 61
-@@ -2639,7 +2848,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  rlwinm sp, TMP1, 0, 0, 29
-   |.endif
-   |   lwz PC, SAVE_PC			// Get SAVE_PC.
--  |  stw TMP2, 0(sp)
-+  |  stp TMP2, 0(sp)
-   |  stw L, SAVE_L			// Set SAVE_L (on-trace resume/yield).
-   |  b >1
-   |.endif
-@@ -2660,7 +2869,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |    stw TMP2, DISPATCH_GL(jit_base)(DISPATCH)
-   |  lwz KBASE, PC2PROTO(k)(TMP1)
-   |  // Setup type comparison constants.
-+  |.if P64
-+  |  lus TISNUM, LJ_TISNUM >> 16
-+  |  ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |  li TISNUM, LJ_TISNUM
-+  |.endif
-   |  lus TMP3, 0x59c0			// TOBIT = 2^52 + 2^51 (float).
-   |  stw TMP3, TMPD
-   |  li ZERO, 0
-@@ -2680,14 +2894,14 @@ static void build_subroutines(BuildCtx *ctx)
-   |   decode_RA8 RA, INS
-   |  lpx TMP0, DISPATCH, TMP1
-   |  mtctr TMP0
--  |  cmplwi TMP1, BC_FUNCF*4		// Function header?
-+  |  cmplwi TMP1, BC_FUNCF*PSIZE	// Function header?
-   |  bge >2
-   |   decode_RB8 RB, INS
-   |   decode_RD8 RD, INS
-   |   decode_RC8 RC, INS
-   |  bctr
-   |2:
--  |  cmplwi TMP1, (BC_FUNCC+2)*4	// Fast function?
-+  |  cmplwi TMP1, (BC_FUNCC+2)*PSIZE	// Fast function?
-   |  blt >3
-   |  // Check frame below fast function.
-   |  lwz TMP1, FRAME_PC(BASE)
-@@ -2697,7 +2911,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz TMP2, -4(TMP1)
-   |  decode_RA8 TMP0, TMP2
-   |  sub TMP1, BASE, TMP0
--  |  lwz LFUNC:TMP2, -12(TMP1)
-+  |  lwz LFUNC:TMP2, WORD_LO-16(TMP1)
-   |  lwz TMP1, LFUNC:TMP2->pc
-   |  lwz KBASE, PC2PROTO(k)(TMP1)
-   |3:
-@@ -2718,6 +2932,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |// NYI: Use internal implementations of floor, ceil, trunc.
-   |
-   |->vm_modi:
-+  |  li TMP1, 0
-+  |  mtxer TMP1
-   |  divwo. TMP0, CARG1, CARG2
-   |  bso >1
-   |.if GPR64
-@@ -2736,7 +2952,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmpwi CARG2, 0
-   |   li CARG1, 0
-   |  beqlr
--  |  clrso TMP0			// Clear SO for -2147483648 % -1 and return 0.
-+  |  // Clear SO for -2147483648 % -1 and return 0.
-+  |  crxor 4*cr0+so, 4*cr0+so, 4*cr0+so
-   |  blr
-   |
-   |//-----------------------------------------------------------------------
-@@ -2749,10 +2966,18 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vm_cachesync:
-   |.if JIT or FFI
-   |  // Compute start of first cache line and number of cache lines.
-+  |  .if GPR64
-+  |  rldicr CARG1, CARG1, 0, 58
-+  |  .else
-   |  rlwinm CARG1, CARG1, 0, 0, 26
-+  |  .endif
-   |  sub CARG2, CARG2, CARG1
-   |  addi CARG2, CARG2, 31
-+  |  .if GPR64
-+  |  srdi. CARG2, CARG2, 5
-+  |  .else
-   |  rlwinm. CARG2, CARG2, 27, 5, 31
-+  |  .endif
-   |  beqlr
-   |  mtctr CARG2
-   |  mr CARG3, CARG1
-@@ -2774,39 +2999,70 @@ static void build_subroutines(BuildCtx *ctx)
-   |//-- FFI helper functions -----------------------------------------------
-   |//-----------------------------------------------------------------------
-   |
--  |// Handler for callback functions. Callback slot number in r11, g in r12.
-+  |// Handler for callback functions.
-+  |// 32-bit: Callback slot number in r12, g in r11.
-+  |// 64-bit v1: Callback slot number in bits 47+ of r11, g in 0-46, TOC in r2.
-+  |// 64-bit v2: Callback slot number in bits 2-11 of r12, g in r11,
-+  |// vm_ffi_callback in r2.
-   |->vm_ffi_callback:
-   |.if FFI
-   |.type CTSTATE, CTState, PC
-+  |  .if OPD
-+  |   rldicl r12, r11, 17, 47
-+  |   rldicl r11, r11, 0, 17
-+  |  .endif
-+  |  .if ELFV2
-+  |   rlwinm r12, r12, 30, 22, 31
-+  |   addisl TOCREG, TOCREG, extern .TOC.-lj_vm_ffi_callback@ha
-+  |   addil TOCREG, TOCREG, extern .TOC.-lj_vm_ffi_callback@l
-+  |  .endif
-   |  saveregs
--  |  lwz CTSTATE, GL:r12->ctype_state
--  |   addi DISPATCH, r12, GG_G2DISP
--  |  stw r11, CTSTATE->cb.slot
--  |  stw r3, CTSTATE->cb.gpr[0]
-+  |  lwz CTSTATE, GL:r11->ctype_state
-+  |   addi DISPATCH, r11, GG_G2DISP
-+  |  stw r12, CTSTATE->cb.slot
-+  |  stp r3, CTSTATE->cb.gpr[0]
-   |   stfd f1, CTSTATE->cb.fpr[0]
--  |  stw r4, CTSTATE->cb.gpr[1]
-+  |  stp r4, CTSTATE->cb.gpr[1]
-   |   stfd f2, CTSTATE->cb.fpr[1]
--  |  stw r5, CTSTATE->cb.gpr[2]
-+  |  stp r5, CTSTATE->cb.gpr[2]
-   |   stfd f3, CTSTATE->cb.fpr[2]
--  |  stw r6, CTSTATE->cb.gpr[3]
-+  |  stp r6, CTSTATE->cb.gpr[3]
-   |   stfd f4, CTSTATE->cb.fpr[3]
--  |  stw r7, CTSTATE->cb.gpr[4]
-+  |  stp r7, CTSTATE->cb.gpr[4]
-   |   stfd f5, CTSTATE->cb.fpr[4]
--  |  stw r8, CTSTATE->cb.gpr[5]
-+  |  stp r8, CTSTATE->cb.gpr[5]
-   |   stfd f6, CTSTATE->cb.fpr[5]
--  |  stw r9, CTSTATE->cb.gpr[6]
-+  |  stp r9, CTSTATE->cb.gpr[6]
-   |   stfd f7, CTSTATE->cb.fpr[6]
--  |  stw r10, CTSTATE->cb.gpr[7]
-+  |  stp r10, CTSTATE->cb.gpr[7]
-   |   stfd f8, CTSTATE->cb.fpr[7]
-+  |  .if GPR64
-+  |   stfd f9, CTSTATE->cb.fpr[8]
-+  |   stfd f10, CTSTATE->cb.fpr[9]
-+  |   stfd f11, CTSTATE->cb.fpr[10]
-+  |   stfd f12, CTSTATE->cb.fpr[11]
-+  |   stfd f13, CTSTATE->cb.fpr[12]
-+  |  .endif
-+  |  .if ELFV2
-+  |  addi TMP0, sp, CFRAME_SPACE+96
-+  |  .elif GPR64
-+  |  addi TMP0, sp, CFRAME_SPACE+112
-+  |  .else
-   |  addi TMP0, sp, CFRAME_SPACE+8
--  |  stw TMP0, CTSTATE->cb.stack
-+  |  .endif
-+  |  stp TMP0, CTSTATE->cb.stack
-   |   mr CARG1, CTSTATE
-   |  stw CTSTATE, SAVE_PC		// Any value outside of bytecode is ok.
-   |   mr CARG2, sp
-   |  bl extern lj_ccallback_enter	// (CTState *cts, void *cf)
-   |  // Returns lua_State *.
-   |  lp BASE, L:CRET1->base
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |  lp RC, L:CRET1->top
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |     li ZERO, 0
-@@ -2835,9 +3091,21 @@ static void build_subroutines(BuildCtx *ctx)
-   |  mr CARG1, CTSTATE
-   |  mr CARG2, RA
-   |  bl extern lj_ccallback_leave	// (CTState *cts, TValue *o)
--  |  lwz CRET1, CTSTATE->cb.gpr[0]
-+  |  lp CRET1, CTSTATE->cb.gpr[0]
-   |  lfd FARG1, CTSTATE->cb.fpr[0]
--  |  lwz CRET2, CTSTATE->cb.gpr[1]
-+  |  lp CRET2, CTSTATE->cb.gpr[1]
-+  |  .if GPR64
-+  |    lfd FARG2, CTSTATE->cb.fpr[1]
-+  |  .else
-+  |    lp CARG3, CTSTATE->cb.gpr[2]
-+  |    lp CARG4, CTSTATE->cb.gpr[3]
-+  |  .endif
-+  |  .elfv2 lfd f3, CTSTATE->cb.fpr[2]
-+  |  .elfv2 lfd f4, CTSTATE->cb.fpr[3]
-+  |  .elfv2 lfd f5, CTSTATE->cb.fpr[4]
-+  |  .elfv2 lfd f6, CTSTATE->cb.fpr[5]
-+  |  .elfv2 lfd f7, CTSTATE->cb.fpr[6]
-+  |  .elfv2 lfd f8, CTSTATE->cb.fpr[7]
-   |  b ->vm_leave_unw
-   |.endif
-   |
-@@ -2850,23 +3118,46 @@ static void build_subroutines(BuildCtx *ctx)
-   |   lbz CARG2, CCSTATE->nsp
-   |   lbz CARG3, CCSTATE->nfpr
-   |  neg TMP1, TMP1
-+  |  .if GPR64
-+  |    std TMP0, 16(sp)
-+  |  .else
-   |    stw TMP0, 4(sp)
-+  |  .endif
-   |   cmpwi cr1, CARG3, 0
-   |  mr TMP2, sp
-   |   addic. CARG2, CARG2, -1
-+  |  .if GPR64
-+  |  stdux sp, sp, TMP1
-+  |  .else
-   |  stwux sp, sp, TMP1
-+  |  .endif
-   |   crnot 4*cr1+eq, 4*cr1+eq		// For vararg calls.
--  |  stw r14, -4(TMP2)
--  |  stw CCSTATE, -8(TMP2)
-+  |  .if GPR64
-+  |    std r14, -8(TMP2)
-+  |    std CCSTATE, -16(TMP2)
-+  |  .else
-+  |    stw r14, -4(TMP2)
-+  |    stw CCSTATE, -8(TMP2)
-+  |  .endif
-   |  mr r14, TMP2
-   |  la TMP1, CCSTATE->stack
-+  |  .if GPR64
-+  |   sldi CARG2, CARG2, 3
-+  |  .else
-   |   slwi CARG2, CARG2, 2
-+  |  .endif
-   |   blty >2
--  |  la TMP2, 8(sp)
-+  |  .if ELFV2
-+  |    la TMP2, 96(sp)
-+  |  .elif GPR64
-+  |    la TMP2, 112(sp)
-+  |  .else
-+  |    la TMP2, 8(sp)
-+  |  .endif
-   |1:
--  |  lwzx TMP0, TMP1, CARG2
--  |  stwx TMP0, TMP2, CARG2
--  |   addic. CARG2, CARG2, -4
-+  |  lpx TMP0, TMP1, CARG2
-+  |  stpx TMP0, TMP2, CARG2
-+  |   addic. CARG2, CARG2, -PSIZE
-   |  bge <1
-   |2:
-   |  bney cr1, >3
-@@ -2878,28 +3169,55 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f6, CCSTATE->fpr[5]
-   |  lfd f7, CCSTATE->fpr[6]
-   |  lfd f8, CCSTATE->fpr[7]
-+  |  .if GPR64
-+  |  lfd f9, CCSTATE->fpr[8]
-+  |  lfd f10, CCSTATE->fpr[9]
-+  |  lfd f11, CCSTATE->fpr[10]
-+  |  lfd f12, CCSTATE->fpr[11]
-+  |  lfd f13, CCSTATE->fpr[12]
-+  |  .endif
-   |3:
--  |   lp TMP0, CCSTATE->func
--  |  lwz CARG2, CCSTATE->gpr[1]
--  |  lwz CARG3, CCSTATE->gpr[2]
--  |  lwz CARG4, CCSTATE->gpr[3]
--  |  lwz CARG5, CCSTATE->gpr[4]
--  |   mtctr TMP0
--  |  lwz r8, CCSTATE->gpr[5]
--  |  lwz r9, CCSTATE->gpr[6]
--  |  lwz r10, CCSTATE->gpr[7]
--  |  lwz CARG1, CCSTATE->gpr[0]		// Do this last, since CCSTATE is CARG1.
-+  |  .toc std TOCREG, SAVE_TOC
-+  |   lp FUNCREG, CCSTATE->func
-+  |  lp CARG2, CCSTATE->gpr[1]
-+  |  lp CARG3, CCSTATE->gpr[2]
-+  |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+  |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-+  |  .opd lp FUNCREG, 0(FUNCREG)
-+  |  lp CARG4, CCSTATE->gpr[3]
-+  |  lp CARG5, CCSTATE->gpr[4]
-+  |   mtctr FUNCREG
-+  |  lp r8, CCSTATE->gpr[5]
-+  |  lp r9, CCSTATE->gpr[6]
-+  |  lp r10, CCSTATE->gpr[7]
-+  |  lp CARG1, CCSTATE->gpr[0]		// Do this last, since CCSTATE is CARG1.
-   |   bctrl
--  |  lwz CCSTATE:TMP1, -8(r14)
--  |  lwz TMP2, -4(r14)
-+  |   .toc lp TOCREG, SAVE_TOC
-+  |  .if GPR64
-+  |   ld CCSTATE:TMP1, -16(r14)
-+  |   ld TMP2, -8(r14)
-+  |   ld TMP0, 16(r14)
-+  |  .else
-+  |   lwz CCSTATE:TMP1, -8(r14)
-+  |   lwz TMP2, -4(r14)
-   |   lwz TMP0, 4(r14)
--  |  stw CARG1, CCSTATE:TMP1->gpr[0]
-+  |  .endif
-+  |  stp CARG1, CCSTATE:TMP1->gpr[0]
-   |  stfd FARG1, CCSTATE:TMP1->fpr[0]
--  |  stw CARG2, CCSTATE:TMP1->gpr[1]
-+  |  stp CARG2, CCSTATE:TMP1->gpr[1]
-+  |  .if GPR64
-+  |   stfd FARG2, CCSTATE:TMP1->fpr[1]
-+  |  .endif
-+  |  .elfv2 stfd FARG3, CCSTATE:TMP1->fpr[2]
-+  |  .elfv2 stfd FARG4, CCSTATE:TMP1->fpr[3]
-+  |  .elfv2 stfd FARG5, CCSTATE:TMP1->fpr[4]
-+  |  .elfv2 stfd FARG6, CCSTATE:TMP1->fpr[5]
-+  |  .elfv2 stfd FARG7, CCSTATE:TMP1->fpr[6]
-+  |  .elfv2 stfd FARG8, CCSTATE:TMP1->fpr[7]
-   |   mtlr TMP0
--  |  stw CARG3, CCSTATE:TMP1->gpr[2]
-+  |  stp CARG3, CCSTATE:TMP1->gpr[2]
-   |   mr sp, r14
--  |  stw CARG4, CCSTATE:TMP1->gpr[3]
-+  |  stp CARG4, CCSTATE:TMP1->gpr[3]
-   |   mr r14, TMP2
-   |  blr
-   |.endif
-@@ -2923,13 +3241,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
-     |  // RA = src1*8, RD = src2*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, BASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  lwzx TMP1, BASE_HI, RD
-     |    lwz TMP2, -4(PC)
-     |  checknum cr0, TMP0
--    |   lwz CARG3, 4(RD)
-+    |   lwzx CARG3, BASE_LO, RD
-     |    decode_RD4 TMP2, TMP2
-     |  checknum cr1, TMP1
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-@@ -2953,7 +3271,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |7:  // RA is not an integer.
-     |  bgt cr0, ->vmeta_comp
-     |  // RA is a number.
--    |   lfd f0, 0(RA)
-+    |   lfdx f0, BASE, RA
-     |  bgt cr1, ->vmeta_comp
-     |  blt cr1, >4
-     |  // RA is a number, RD is an integer.
-@@ -2965,7 +3283,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // RA is an integer, RD is a number.
-     |  tonum_i f0, CARG2
-     |4:
--    |  lfd f1, 0(RD)
-+    |  lfdx f1, BASE, RD
-     |5:
-     |  fcmpu cr0, f0, f1
-     if (op == BC_ISLT) {
-@@ -2981,10 +3299,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |  b <1
-     |.else
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
-     |   lfdx f0, BASE, RA
--    |  lwzx TMP1, BASE, RD
-+    |  lwzx TMP1, BASE_HI, RD
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |   lfdx f1, BASE, RD
-@@ -3015,15 +3333,23 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = op == BC_ISEQV;
-     |  // RA = src1*8, RD = src2*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, BASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  .if ENDIAN_LE
-+    |    lwzx TMP1, BASE_HI, RD
-+    |  .else
-+    |    lwzux TMP1, RD, BASE_HI
-+    |  .endif
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |  checknum cr1, TMP1
-     |    decode_RD4 TMP2, TMP2
--    |   lwz CARG3, 4(RD)
-+    |  .if ENDIAN_LE
-+    |   lwzux CARG3, RD, BASE_LO
-+    |  .else
-+    |   lwz CARG3, WORD_LO(RD)
-+    |  .endif
-     |  cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     if (vk) {
-@@ -3032,14 +3358,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |  ble cr7, ->BC_ISNEN_Z
-     }
-     |.else
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |   lwz TMP2, 0(PC)
--    |    lfd f0, 0(RA)
-+    |    lfdx f0, BASE, RA
-     |   addi PC, PC, 4
--    |  lwzux TMP1, RD, BASE
-+    |  lwzx TMP1, BASE_HI, RD
-     |  checknum cr0, TMP0
-     |   decode_RD4 TMP2, TMP2
--    |    lfd f1, 0(RD)
-+    |    lfdx f1, BASE, RD
-     |  checknum cr1, TMP1
-     |   addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     |  bge cr0, >5
-@@ -3057,8 +3383,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.endif
-     |5:  // Either or both types are not numbers.
-     |.if not DUALNUM
--    |    lwz CARG2, 4(RA)
--    |    lwz CARG3, 4(RD)
-+    |    lwzx CARG2, BASE_LO, RA
-+    |    lwzx CARG3, BASE_LO, RD
-     |.endif
-     |.if FFI
-     |  cmpwi cr7, TMP0, LJ_TCDATA
-@@ -3074,10 +3400,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.if FFI
-     |  beq cr7, ->vmeta_equal_cd
-     |.endif
-+    |.if P64
-+    |   cmplwi cr7, TMP3, ~LJ_TUDATA		// Avoid 64 bit lightuserdata.
-+    |.endif
-     |    cmplw cr5, CARG2, CARG3
-     |  crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt	// 2: Same type and primitive.
-     |  crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq	// 1: Same tv or different type.
-     |  crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq	// 0: Same type and same tv.
-+    |.if P64
-+    |   cror 4*cr6+lt, 4*cr6+lt, 4*cr7+gt
-+    |.endif
-     |   mr SAVE0, PC
-     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt	// 0 or 2.
-     |  cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt	// 1 or 2.
-@@ -3116,9 +3448,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISEQS: case BC_ISNES:
-     vk = op == BC_ISEQS;
-     |  // RA = src*8, RD = str_const*8 (~), JMP with RD = target
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |   srwi RD, RD, 1
--    |  lwz STR:TMP3, 4(RA)
-+    |  lwzx STR:TMP3, BASE_LO, RA
-     |    lwz TMP2, 0(PC)
-     |   subfic RD, RD, -4
-     |    addi PC, PC, 4
-@@ -3150,15 +3482,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = op == BC_ISEQN;
-     |  // RA = src*8, RD = num_const*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, KBASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  lwzux2 TMP1, CARG3, RD, KBASE
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |  checknum cr1, TMP1
-     |    decode_RD4 TMP2, TMP2
--    |   lwz CARG3, 4(RD)
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     if (vk) {
-       |->BC_ISEQN_Z:
-@@ -3175,7 +3506,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     } else {
-       |->BC_ISNEN_Z:  // Dummy label.
-     }
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
-     |   lfdx f0, BASE, RA
-     |    lwz TMP2, -4(PC)
-@@ -3213,7 +3544,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |7:  // RA is not an integer.
-     |  bge cr0, <3
-     |  // RA is a number.
--    |   lfd f0, 0(RA)
-+    |   lfdx f0, BASE, RA
-     |  blt cr1, >1
-     |  // RA is a number, RD is an integer.
-     |  tonum_i f1, CARG3
-@@ -3232,7 +3563,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISEQP: case BC_ISNEP:
-     vk = op == BC_ISEQP;
-     |  // RA = src*8, RD = primitive_type*8 (~), JMP with RD = target
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |   srwi TMP1, RD, 3
-     |    lwz TMP2, 0(PC)
-     |   not TMP1, TMP1
-@@ -3262,7 +3593,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF:
-     |  // RA = dst*8 or unused, RD = src*8, JMP with RD = target
--    |  lwzx TMP0, BASE, RD
-+    |  lwzx TMP0, BASE_HI, RD
-     |   lwz INS, 0(PC)
-     |   addi PC, PC, 4
-     if (op == BC_IST || op == BC_ISF) {
-@@ -3297,7 +3628,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_ISTYPE:
-     |  // RA = src*8, RD = -type*8
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |  srwi TMP1, RD, 3
-     |  ins_next1
-     |.if not PPE and not GPR64
-@@ -3311,7 +3642,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_ISNUM:
-     |  // RA = src*8, RD = -(TISNUM-1)*8
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |  ins_next1
-     |  checknum TMP0
-     |  bge ->vmeta_istype
-@@ -3330,17 +3661,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_NOT:
-     |  // RA = dst*8, RD = src*8
-     |  ins_next1
--    |  lwzx TMP0, BASE, RD
-+    |  lwzx TMP0, BASE_HI, RD
-     |  .gpr64 extsw TMP0, TMP0
-     |  subfic TMP1, TMP0, LJ_TTRUE
-     |  adde TMP0, TMP0, TMP1
--    |  stwx TMP0, BASE, RA
-+    |  stwx TMP0, BASE_HI, RA
-     |  ins_next2
-     break;
-   case BC_UNM:
-     |  // RA = dst*8, RD = src*8
--    |  lwzux TMP1, RD, BASE
--    |   lwz TMP0, 4(RD)
-+    |  lwzx TMP1, BASE_HI, RD
-+    |   lwzx TMP0, BASE_LO, RD
-+    |.if DUALNUM and not GPR64
-+    |  mtxer ZERO
-+    |.endif
-     |  checknum TMP1
-     |.if DUALNUM
-     |  bne >5
-@@ -3352,18 +3686,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.else
-     |  nego. TMP0, TMP0
-     |  bso >4
--    |1:
-     |.endif
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |   stw TMP0, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |   stwx TMP0, BASE_LO, RA
-     |3:
-     |  ins_next2
-     |4:
--    |.if not GPR64
--    |  // Potential overflow.
--    |  checkov TMP1, <1			// Ignore unrelated overflow.
--    |.endif
-     |  lus TMP1, 0x41e0			// 2^31.
-     |  li TMP0, 0
-     |  b >7
-@@ -3373,8 +3702,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  xoris TMP1, TMP1, 0x8000
-     |7:
-     |  ins_next1
--    |  stwux TMP1, RA, BASE
--    |   stw TMP0, 4(RA)
-+    |  stwx TMP1, BASE_HI, RA
-+    |   stwx TMP0, BASE_LO, RA
-     |.if DUALNUM
-     |  b <3
-     |.else
-@@ -3383,15 +3712,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_LEN:
-     |  // RA = dst*8, RD = src*8
--    |  lwzux TMP0, RD, BASE
--    |   lwz CARG1, 4(RD)
-+    |  lwzx TMP0, BASE_HI, RD
-+    |   lwzx CARG1, BASE_LO, RD
-     |  checkstr TMP0; bne >2
-     |  lwz CRET1, STR:CARG1->len
-     |1:
-     |.if DUALNUM
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |   stw CRET1, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |   stwx CRET1, BASE_LO, RA
-     |.else
-     |  tonum_u f0, CRET1		// Result is a non-negative integer.
-     |  ins_next1
-@@ -3426,9 +3755,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
-     ||switch (vk) {
-     ||case 0:
--    |   lwzx TMP1, BASE, RB
-+    |   .if ENDIAN_LE and DUALNUM
-+    |     addi TMP2, RC, 4
-+    |   .endif
-+    |   lwzx TMP1, BASE_HI, RB
-     |   .if DUALNUM
--    |     lwzx TMP2, KBASE, RC
-+    |     .if ENDIAN_LE
-+    |       lwzx TMP2, KBASE, TMP2
-+    |     .else
-+    |       lwzx TMP2, KBASE, RC
-+    |     .endif
-     |   .endif
-     |    lfdx f14, BASE, RB
-     |    lfdx f15, KBASE, RC
-@@ -3442,9 +3778,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   .endif
-     ||  break;
-     ||case 1:
--    |   lwzx TMP1, BASE, RB
-+    |   .if ENDIAN_LE and DUALNUM
-+    |     addi TMP2, RC, 4
-+    |   .endif
-+    |   lwzx TMP1, BASE_HI, RB
-     |   .if DUALNUM
--    |     lwzx TMP2, KBASE, RC
-+    |     .if ENDIAN_LE
-+    |       lwzx TMP2, KBASE, TMP2
-+    |     .else
-+    |       lwzx TMP2, KBASE, RC
-+    |     .endif
-     |   .endif
-     |    lfdx f15, BASE, RB
-     |    lfdx f14, KBASE, RC
-@@ -3458,8 +3801,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   .endif
-     ||  break;
-     ||default:
--    |   lwzx TMP1, BASE, RB
--    |   lwzx TMP2, BASE, RC
-+    |   lwzx TMP1, BASE_HI, RB
-+    |   lwzx TMP2, BASE_HI, RC
-     |    lfdx f14, BASE, RB
-     |    lfdx f15, BASE, RC
-     |   checknum cr0, TMP1
-@@ -3514,41 +3857,62 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
-     ||switch (vk) {
-     ||case 0:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, KBASE
--    |    lwz CARG1, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG2, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzux CARG2, RC, KBASE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |      lwz TMP2, 4(RC)
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG1, RB, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, KBASE
-+    |      lwz CARG1, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG2, 4(RC)
-+    |   .endif
-     ||  break;
-     ||case 1:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, KBASE
--    |    lwz CARG2, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG1, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzux CARG1, RC, KBASE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |      lwz TMP2, 4(RC)
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG2, RB, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, KBASE
-+    |      lwz CARG2, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG1, 4(RC)
-+    |   .endif
-     ||  break;
-     ||default:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, BASE
--    |    lwz CARG1, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG2, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |     lwzx TMP2, RC, BASE_HI
-+    |      lwzux CARG1, RB, BASE
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG2, RC, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, BASE
-+    |      lwz CARG1, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG2, 4(RC)
-+    |   .endif
-     ||  break;
-     ||}
-+    |  mtxer ZERO
-     |  checknum cr1, TMP2
-     |  bne >5
-     |  bne cr1, >5
-     |  intins CARG1, CARG1, CARG2
--    |  bso >4
--    |1:
-+    |  ins_arithfallback bso
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |  stw CARG1, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |  stwx CARG1, BASE_LO, RA
-     |2:
-     |  ins_next2
--    |4:  // Overflow.
--    |  checkov TMP0, <1			// Ignore unrelated overflow.
--    |  ins_arithfallback b
-     |5:  // FP variant.
-     ||if (vk == 1) {
-     |  lfd f15, 0(RB)
-@@ -3620,9 +3984,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_POW:
-     |  // NYI: (partial) integer arithmetic.
--    |  lwzx TMP1, BASE, RB
-+    |  lwzx TMP1, BASE_HI, RB
-     |   lfdx FARG1, BASE, RB
--    |  lwzx TMP2, BASE, RC
-+    |  lwzx TMP2, BASE_HI, RC
-     |   lfdx FARG2, BASE, RC
-     |  checknum cr0, TMP1
-     |  checknum cr1, TMP2
-@@ -3648,6 +4012,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Returns NULL (finished) or TValue * (metamethod).
-     |  cmplwi CRET1, 0
-     |   lp BASE, L->base
-+    |   addi BASEP4, BASE, 4
-     |  bne ->vmeta_binop
-     |  ins_next1
-     |  lfdx f0, BASE, SAVE0		// Copy result from RB to RA.
-@@ -3664,8 +4029,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  ins_next1
-     |  lwzx TMP0, KBASE, TMP1		// KBASE-4-str_const*4
-     |  li TMP2, LJ_TSTR
--    |  stwux TMP2, RA, BASE
--    |  stw TMP0, 4(RA)
-+    |  stwx TMP2, BASE_HI, RA
-+    |  stwx TMP0, BASE_LO, RA
-     |  ins_next2
-     break;
-   case BC_KCDATA:
-@@ -3676,8 +4041,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  ins_next1
-     |  lwzx TMP0, KBASE, TMP1		// KBASE-4-cdata_const*4
-     |  li TMP2, LJ_TCDATA
--    |  stwux TMP2, RA, BASE
--    |  stw TMP0, 4(RA)
-+    |  stwx TMP2, BASE_HI, RA
-+    |  stwx TMP0, BASE_LO, RA
-     |  ins_next2
-     |.endif
-     break;
-@@ -3687,14 +4052,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  slwi RD, RD, 13
-     |  srawi RD, RD, 16
-     |  ins_next1
--    |   stwux TISNUM, RA, BASE
--    |   stw RD, 4(RA)
-+    |   stwx TISNUM, BASE_HI, RA
-+    |   stwx RD, BASE_LO, RA
-     |  ins_next2
-     |.else
-     |  // The soft-float approach is faster.
-     |  slwi RD, RD, 13
-     |  srawi TMP1, RD, 31
-     |  xor TMP2, TMP1, RD
-+    |  .gpr64 extsw RD, RD
-     |  sub TMP2, TMP2, TMP1		// TMP2 = abs(x)
-     |  cntlzw TMP3, TMP2
-     |  subfic TMP1, TMP3, 0x40d		// TMP1 = exponent-1
-@@ -3706,8 +4072,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   add RD, RD, TMP1		// hi = hi + exponent-1
-     |    and RD, RD, TMP0		// hi = x == 0 ? 0 : hi
-     |  ins_next1
--    |    stwux RD, RA, BASE
--    |    stw ZERO, 4(RA)
-+    |    stwx RD, BASE_HI, RA
-+    |    stwx ZERO, BASE_LO, RA
-     |  ins_next2
-     |.endif
-     break;
-@@ -3723,15 +4089,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  srwi TMP1, RD, 3
-     |  not TMP0, TMP1
-     |  ins_next1
--    |  stwx TMP0, BASE, RA
-+    |  stwx TMP0, BASE_HI, RA
-     |  ins_next2
-     break;
-   case BC_KNIL:
-     |  // RA = base*8, RD = end*8
--    |  stwx TISNIL, BASE, RA
-+    |  stwx TISNIL, BASE_HI, RA
-     |   addi RA, RA, 8
-     |1:
--    |  stwx TISNIL, BASE, RA
-+    |  stwx TISNIL, BASE_HI, RA
-     |  cmpw RA, RD
-     |   addi RA, RA, 8
-     |  blt <1
-@@ -3763,10 +4129,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   lwz CARG2, UPVAL:RB->v
-     |  andix. TMP3, TMP3, LJ_GC_BLACK	// isblack(uv)
-     |    lbz TMP0, UPVAL:RB->closed
--    |   lwz TMP2, 0(RD)
-+    |   lwz TMP2, WORD_HI(RD)
-     |   stfd f0, 0(CARG2)
-     |    cmplwi cr1, TMP0, 0
--    |   lwz TMP1, 4(RD)
-+    |   lwz TMP1, WORD_LO(RD)
-     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
-     |   subi TMP2, TMP2, (LJ_TNUMX+1)
-     |  bne >2				// Upvalue is closed and black?
-@@ -3799,8 +4165,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   lbz TMP3, STR:TMP1->marked
-     |   lbz TMP2, UPVAL:RB->closed
-     |   li TMP0, LJ_TSTR
--    |   stw STR:TMP1, 4(CARG2)
--    |   stw TMP0, 0(CARG2)
-+    |   stw STR:TMP1, WORD_LO(CARG2)
-+    |   stw TMP0, WORD_HI(CARG2)
-     |  bne >2
-     |1:
-     |  ins_next
-@@ -3837,7 +4203,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  lwzx UPVAL:RB, LFUNC:RB, RA
-     |  ins_next1
-     |  lwz TMP1, UPVAL:RB->v
--    |  stw TMP0, 0(TMP1)
-+    |  stw TMP0, WORD_HI(TMP1)
-     |  ins_next2
-     break;
- 
-@@ -3852,6 +4218,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   add CARG2, BASE, RA
-     |  bl extern lj_func_closeuv	// (lua_State *L, TValue *level)
-     |  lp BASE, L->base
-+    |  addi BASEP4, BASE, 4
-     |1:
-     |  ins_next
-     break;
-@@ -3870,8 +4237,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Returns GCfuncL *.
-     |  lp BASE, L->base
-     |   li TMP0, LJ_TFUNC
--    |  stwux TMP0, RA, BASE
--    |  stw LFUNC:CRET1, 4(RA)
-+    |  addi BASEP4, BASE, 4
-+    |  stwx TMP0, BASE_HI, RA
-+    |  stwx LFUNC:CRET1, BASE_LO, RA
-     |  ins_next
-     break;
- 
-@@ -3904,8 +4272,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |  lp BASE, L->base
-     |   li TMP0, LJ_TTAB
--    |  stwux TMP0, RA, BASE
--    |  stw TAB:CRET1, 4(RA)
-+    |  addi BASEP4, BASE, 4
-+    |  stwx TMP0, BASE_HI, RA
-+    |  stwx TAB:CRET1, BASE_LO, RA
-     |  ins_next
-     if (op == BC_TNEW) {
-       |3:
-@@ -3938,13 +4307,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_TGETV:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  lwzux CARG1, RB, BASE
--    |  lwzux CARG2, RC, BASE
--    |   lwz TAB:RB, 4(RB)
-+    |  lwzx CARG1, BASE_HI, RB
-+    |  lwzx CARG2, BASE_HI, RC
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |.if DUALNUM
--    |   lwz RC, 4(RC)
-+    |   lwzx RC, BASE_LO, RC
-     |.else
--    |   lfd f0, 0(RC)
-+    |   lfdx f0, BASE, RC
-     |.endif
-     |  checktab CARG1
-     |   checknum cr1, CARG2
-@@ -3971,8 +4340,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   slwi TMP2, TMP2, 3
-     |.endif
-     |  ble ->vmeta_tgetv		// Integer key and in array part?
--    |  lwzx TMP0, TMP1, TMP2
--    |   lfdx f14, TMP1, TMP2
-+    |  .if ENDIAN_LE
-+    |    lfdux f14, TMP1, TMP2
-+    |    lwz TMP0, WORD_HI(TMP1)
-+    |  .else
-+    |    lwzx TMP0, TMP1, TMP2
-+    |    lfdx f14, TMP1, TMP2
-+    |  .endif
-     |  checknil TMP0; beq >2
-     |1:
-     |  ins_next1
-@@ -3991,15 +4365,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:
-     |  checkstr CARG2; bne ->vmeta_tgetv
-     |.if not DUALNUM
--    |  lwz STR:RC, 4(RC)
-+    |  lwzx STR:RC, BASE_LO, RC
-     |.endif
-     |  b ->BC_TGETS_Z			// String key?
-     break;
-   case BC_TGETS:
-     |  // RA = dst*8, RB = table*8, RC = str_const*8 (~)
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP1, RC, 1
--    |    lwz TAB:RB, 4(RB)
-+    |    lwzx TAB:RB, BASE_LO, RB
-     |   subfic TMP1, TMP1, -4
-     |  checktab CARG1
-     |   lwzx STR:RC, KBASE, TMP1	// KBASE-4-str_const*4
-@@ -4015,16 +4389,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  sub TMP1, TMP0, TMP1
-     |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-     |1:
--    |  lwz CARG1, NODE:TMP2->key
--    |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--    |    lwz CARG2, NODE:TMP2->val
--    |     lwz TMP1, 4+offsetof(Node, val)(NODE:TMP2)
-+    |  lwz CARG1, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+    |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+    |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-+    |     lwz TMP1, WORD_LO+offsetof(Node, val)(NODE:TMP2)
-     |  checkstr CARG1; bne >4
-     |   cmpw TMP0, STR:RC; bne >4
-     |    checknil CARG2; beq >5		// Key found, but nil value?
-     |3:
--    |    stwux CARG2, RA, BASE
--    |     stw TMP1, 4(RA)
-+    |    stwx CARG2, BASE_HI, RA
-+    |     stwx TMP1, BASE_LO, RA
-     |  ins_next
-     |
-     |4:  // Follow hash chain.
-@@ -4045,15 +4419,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TGETB:
-     |  // RA = dst*8, RB = table*8, RC = index*8
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP0, RC, 3
--    |   lwz TAB:RB, 4(RB)
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |  checktab CARG1; bne ->vmeta_tgetb
-     |  lwz TMP1, TAB:RB->asize
-     |   lwz TMP2, TAB:RB->array
-     |  cmplw TMP0, TMP1; bge ->vmeta_tgetb
--    |  lwzx TMP1, TMP2, RC
--    |   lfdx f0, TMP2, RC
-+    |  .if ENDIAN_LE
-+    |    lfdux f0, TMP2, RC
-+    |    lwz TMP1, WORD_HI(TMP2)
-+    |  .else
-+    |    lwzx TMP1, TMP2, RC
-+    |    lfdx f0, TMP2, RC
-+    |  .endif
-     |  checknil TMP1; beq >5
-     |1:
-     |  ins_next1
-@@ -4071,12 +4450,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TGETR:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  add RB, BASE, RB
--    |  lwz TAB:CARG1, 4(RB)
-+    |  lwzx TAB:CARG1, BASE_LO, RB
-     |.if DUALNUM
--    |  add RC, BASE, RC
-     |  lwz TMP0, TAB:CARG1->asize
--    |  lwz CARG2, 4(RC)
-+    |  lwzx CARG2, BASE_LO, RC
-     |   lwz TMP1, TAB:CARG1->array
-     |.else
-     |  lfdx f0, BASE, RC
-@@ -4096,13 +4473,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_TSETV:
-     |  // RA = src*8, RB = table*8, RC = key*8
--    |  lwzux CARG1, RB, BASE
--    |  lwzux CARG2, RC, BASE
--    |   lwz TAB:RB, 4(RB)
-+    |  lwzx CARG1, BASE_HI, RB
-+    |  lwzx CARG2, BASE_HI, RC
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |.if DUALNUM
--    |   lwz RC, 4(RC)
-+    |   lwzx RC, BASE_LO, RC
-     |.else
--    |   lfd f0, 0(RC)
-+    |   lfdx f0, BASE, RC
-     |.endif
-     |  checktab CARG1
-     |   checknum cr1, CARG2
-@@ -4129,7 +4506,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   slwi TMP0, TMP2, 3
-     |.endif
-     |  ble ->vmeta_tsetv		// Integer key and in array part?
-+    |  .if ENDIAN_LE
-+    |   addi TMP2, TMP1, 4
-+    |   lwzx TMP2, TMP2, TMP0
-+    |  .else
-     |   lwzx TMP2, TMP1, TMP0
-+    |  .endif
-     |  lbz TMP3, TAB:RB->marked
-     |    lfdx f14, BASE, RA
-     |   checknil TMP2; beq >3
-@@ -4152,7 +4534,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:
-     |  checkstr CARG2; bne ->vmeta_tsetv
-     |.if not DUALNUM
--    |  lwz STR:RC, 4(RC)
-+    |  lwzx STR:RC, BASE_LO, RC
-     |.endif
-     |  b ->BC_TSETS_Z			// String key?
-     |
-@@ -4162,9 +4544,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETS:
-     |  // RA = src*8, RB = table*8, RC = str_const*8 (~)
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP1, RC, 1
--    |    lwz TAB:RB, 4(RB)
-+    |    lwzx TAB:RB, BASE_LO, RB
-     |   subfic TMP1, TMP1, -4
-     |  checktab CARG1
-     |   lwzx STR:RC, KBASE, TMP1	// KBASE-4-str_const*4
-@@ -4183,9 +4565,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    lbz TMP3, TAB:RB->marked
-     |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-     |1:
--    |  lwz CARG1, NODE:TMP2->key
--    |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--    |    lwz CARG2, NODE:TMP2->val
-+    |  lwz CARG1, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+    |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+    |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-     |     lwz NODE:TMP1, NODE:TMP2->next
-     |  checkstr CARG1; bne >5
-     |   cmpw TMP0, STR:RC; bne >5
-@@ -4225,13 +4607,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  beq ->vmeta_tsets		// 'no __newindex' flag NOT set: check.
-     |6:
-     |  li TMP0, LJ_TSTR
--    |   stw STR:RC, 4(CARG3)
-+    |   stw STR:RC, WORD_LO(CARG3)
-     |   mr CARG2, TAB:RB
--    |  stw TMP0, 0(CARG3)
-+    |  stw TMP0, WORD_HI(CARG3)
-     |  bl extern lj_tab_newkey		// (lua_State *L, GCtab *t, TValue *k)
-     |  // Returns TValue *.
-     |  lp BASE, L->base
-     |  stfd f14, 0(CRET1)
-+    |   addi BASEP4, BASE, 4
-     |  b <3				// No 2nd write barrier needed.
-     |
-     |7:  // Possible table write barrier for the value. Skip valiswhite check.
-@@ -4240,9 +4623,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETB:
-     |  // RA = src*8, RB = table*8, RC = index*8
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP0, RC, 3
--    |   lwz TAB:RB, 4(RB)
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |  checktab CARG1; bne ->vmeta_tsetb
-     |  lwz TMP1, TAB:RB->asize
-     |   lwz TMP2, TAB:RB->array
-@@ -4250,7 +4633,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  cmplw TMP0, TMP1
-     |   lfdx f14, BASE, RA
-     |  bge ->vmeta_tsetb
--    |  lwzx TMP1, TMP2, RC
-+    |  .if ENDIAN_LE
-+    |    addi TMP1, TMP2, 4
-+    |    lwzx TMP1, TMP1, RC
-+    |  .else
-+    |    lwzx TMP1, TMP2, RC
-+    |  .endif
-     |  checknil TMP1; beq >5
-     |1:
-     |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
-@@ -4274,13 +4662,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETR:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  add RB, BASE, RB
--    |  lwz TAB:CARG2, 4(RB)
-+    |  lwzx TAB:CARG2, BASE_LO, RB
-     |.if DUALNUM
--    |  add RC, BASE, RC
-     |    lbz TMP3, TAB:CARG2->marked
-     |  lwz TMP0, TAB:CARG2->asize
--    |  lwz CARG3, 4(RC)
-+    |  lwzx CARG3, BASE_LO, RC
-     |   lwz TMP1, TAB:CARG2->array
-     |.else
-     |  lfdx f0, BASE, RC
-@@ -4311,9 +4697,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  add RA, BASE, RA
-     |1:
-     |   add TMP3, KBASE, RD
--    |  lwz TAB:CARG2, -4(RA)		// Guaranteed to be a table.
-+    |  lwz TAB:CARG2, WORD_LO-8(RA)	// Guaranteed to be a table.
-     |    addic. TMP0, MULTRES, -8
--    |   lwz TMP3, 4(TMP3)		// Integer constant is in lo-word.
-+    |   lwz TMP3, WORD_LO(TMP3)		// Integer constant is in lo-word.
-     |    srwi CARG3, TMP0, 3
-     |    beq >4				// Nothing to copy?
-     |  add CARG3, CARG3, TMP3
-@@ -4362,8 +4748,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_CALL:
-     |  // RA = base*8, (RB = (nresults+1)*8,) RC = (nargs+1)*8
-     |  mr TMP2, BASE
--    |  lwzux TMP0, BASE, RA
--    |   lwz LFUNC:RB, 4(BASE)
-+    |  lwzux2 TMP0, LFUNC:RB, BASE, RA
-     |    subi NARGS8:RC, NARGS8:RC, 8
-     |   addi BASE, BASE, 8
-     |  checkfunc TMP0; bne ->vmeta_call
-@@ -4377,8 +4762,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_CALLT:
-     |  // RA = base*8, (RB = 0,) RC = (nargs+1)*8
--    |  lwzux TMP0, RA, BASE
--    |   lwz LFUNC:RB, 4(RA)
-+    |  lwzux2 TMP0, LFUNC:RB, RA, BASE
-     |    subi NARGS8:RC, NARGS8:RC, 8
-     |    lwz TMP1, FRAME_PC(BASE)
-     |  checkfunc TMP0
-@@ -4430,12 +4814,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // RA = base*8, (RB = (nresults+1)*8, RC = (nargs+1)*8 ((2+1)*8))
-     |  mr TMP2, BASE
-     |  add BASE, BASE, RA
--    |  lwz TMP1, -24(BASE)
--    |   lwz LFUNC:RB, -20(BASE)
-+    |  lwz TMP1, WORD_HI-24(BASE)
-+    |   lwz LFUNC:RB, WORD_LO-24(BASE)
-     |    lfd f1, -8(BASE)
-     |    lfd f0, -16(BASE)
--    |  stw TMP1, 0(BASE)		// Copy callable.
--    |   stw LFUNC:RB, 4(BASE)
-+    |  stw TMP1, WORD_HI(BASE)		// Copy callable.
-+    |   stw LFUNC:RB, WORD_LO(BASE)
-     |  checkfunc TMP1
-     |    stfd f1, 16(BASE)		// Copy control var.
-     |     li NARGS8:RC, 16		// Iterators get 2 arguments.
-@@ -4450,8 +4834,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // NYI: add hotloop, record BC_ITERN.
-     |.endif
-     |  add RA, BASE, RA
--    |  lwz TAB:RB, -12(RA)
--    |  lwz RC, -4(RA)			// Get index from control var.
-+    |  lwz TAB:RB, WORD_LO-16(RA)
-+    |  lwz RC, WORD_LO-8(RA)		// Get index from control var.
-     |  lwz TMP0, TAB:RB->asize
-     |  lwz TMP1, TAB:RB->array
-     |   addi PC, PC, 4
-@@ -4459,14 +4843,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  cmplw RC, TMP0
-     |   slwi TMP3, RC, 3
-     |  bge >5				// Index points after array part?
--    |  lwzx TMP2, TMP1, TMP3
--    |   lfdx f0, TMP1, TMP3
-+    |  lfdux f0, TMP3, TMP1
-+    |   lwz TMP2, WORD_HI(TMP3)
-     |  checknil TMP2
-     |     lwz INS, -4(PC)
-     |  beq >4
-     |.if DUALNUM
--    |   stw RC, 4(RA)
--    |   stw TISNUM, 0(RA)
-+    |   stw RC, WORD_LO(RA)
-+    |   stw TISNUM, WORD_HI(RA)
-     |.else
-     |   tonum_u f1, RC
-     |.endif
-@@ -4474,7 +4858,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |     addis TMP3, PC, -(BCBIAS_J*4 >> 16)
-     |  stfd f0, 8(RA)
-     |     decode_RD4 TMP1, INS
--    |    stw RC, -4(RA)			// Update control var.
-+    |    stw RC, WORD_LO-8(RA)		// Update control var.
-     |     add PC, TMP1, TMP3
-     |.if not DUALNUM
-     |   stfd f1, 0(RA)
-@@ -4496,9 +4880,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgty <3
-     |   slwi RB, RC, 3
-     |   sub TMP3, TMP3, RB
--    |  lwzx RB, TMP2, TMP3
--    |  lfdx f0, TMP2, TMP3
--    |   add NODE:TMP3, TMP2, TMP3
-+    |  lfdux f0, TMP3, TMP2
-+    |  lwz RB, WORD_HI(TMP3)
-     |  checknil RB
-     |     lwz INS, -4(PC)
-     |  beq >7
-@@ -4510,7 +4893,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   stfd f1, 0(RA)
-     |    addi RC, RC, 1
-     |     add PC, TMP1, TMP2
--    |    stw RC, -4(RA)			// Update control var.
-+    |    stw RC, WORD_LO-8(RA)		// Update control var.
-     |  b <3
-     |
-     |7:  // Skip holes in hash part.
-@@ -4521,10 +4904,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISNEXT:
-     |  // RA = base*8, RD = target (points to ITERN)
-     |  add RA, BASE, RA
--    |  lwz TMP0, -24(RA)
--    |  lwz CFUNC:TMP1, -20(RA)
--    |   lwz TMP2, -16(RA)
--    |    lwz TMP3, -8(RA)
-+    |  lwz TMP0, WORD_HI-24(RA)
-+    |  lwz CFUNC:TMP1, WORD_LO-24(RA)
-+    |   lwz TMP2, WORD_HI-16(RA)
-+    |    lwz TMP3, WORD_HI-8(RA)
-     |   cmpwi cr0, TMP2, LJ_TTAB
-     |  cmpwi cr1, TMP0, LJ_TFUNC
-     |    cmpwi cr6, TMP3, LJ_TNIL
-@@ -4538,17 +4921,25 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bne cr0, >5
-     |  lus TMP1, 0xfffe
-     |  ori TMP1, TMP1, 0x7fff
--    |  stw ZERO, -4(RA)			// Initialize control var.
--    |  stw TMP1, -8(RA)
-+    |  stw ZERO, WORD_LO-8(RA)		// Initialize control var.
-+    |  stw TMP1, WORD_HI-8(RA)
-     |    addis PC, TMP3, -(BCBIAS_J*4 >> 16)
-     |1:
-     |  ins_next
-     |5:  // Despecialize bytecode if any of the checks fail.
-     |  li TMP0, BC_JMP
-     |   li TMP1, BC_ITERC
-+    |  .if ENDIAN_LE
-+    |  stb TMP0, -4(PC)
-+    |  .else
-     |  stb TMP0, -1(PC)
-+    |  .endif
-     |    addis PC, TMP3, -(BCBIAS_J*4 >> 16)
-+    |  .if ENDIAN_LE
-+    |   stb TMP1, 0(PC)
-+    |  .else
-     |   stb TMP1, 3(PC)
-+    |  .endif
-     |  b <1
-     break;
- 
-@@ -4582,7 +4973,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    addi RA, RA, 8
-     |   blt cr1, <1			// More vararg slots?
-     |2:  // Fill up remainder with nil.
--    |  stw TISNIL, 0(RA)
-+    |  stw TISNIL, WORD_HI(RA)
-     |  cmplw RA, TMP2
-     |   addi RA, RA, 8
-     |  blt <2
-@@ -4619,6 +5010,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  add RA, BASE, RA
-     |  add RC, BASE, SAVE0
-     |  subi TMP3, BASE, 8
-+    |  addi BASEP4, BASE, 4
-     |  b <6
-     break;
- 
-@@ -4667,13 +5059,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgt >6
-     |   sub BASE, TMP2, RA
-     |  lwz LFUNC:TMP1, FRAME_FUNC(BASE)
-+    |  addi BASEP4, BASE, 4
-     |  ins_next1
-     |  lwz TMP1, LFUNC:TMP1->pc
-     |  lwz KBASE, PC2PROTO(k)(TMP1)
-     |  ins_next2
-     |
-     |6:  // Fill up results with nil.
--    |  subi TMP1, RD, 8
-+    |  addi TMP1, RD, WORD_HI-8
-     |   addi RD, RD, 8
-     |  stwx TISNIL, TMP2, TMP1
-     |  b <5
-@@ -4709,13 +5102,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgt >6
-     |   sub BASE, TMP2, RA
-     |  lwz LFUNC:TMP1, FRAME_FUNC(BASE)
-+    |  addi BASEP4, BASE, 4
-     |  ins_next1
-     |  lwz TMP1, LFUNC:TMP1->pc
-     |  lwz KBASE, PC2PROTO(k)(TMP1)
-     |  ins_next2
-     |
-     |6:  // Fill up results with nil.
--    |  subi TMP1, RD, 8
-+    |  addi TMP1, RD, WORD_HI-8
-     |   addi RD, RD, 8
-     |  stwx TISNIL, TMP2, TMP1
-     |  b <5
-@@ -4741,11 +5135,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = (op == BC_IFORL || op == BC_JFORL);
-     |.if DUALNUM
-     |  // Integer loop.
--    |  lwzux TMP1, RA, BASE
--    |   lwz CARG1, FORL_IDX*8+4(RA)
-+    |  lwzux2 TMP1, CARG1, RA, BASE
-+    if (vk) {
-+      |  mtxer ZERO
-+    }
-     |  cmplw cr0, TMP1, TISNUM
-     if (vk) {
--      |   lwz CARG3, FORL_STEP*8+4(RA)
-+      |   lwz CARG3, FORL_STEP*8+WORD_LO(RA)
-       |  bne >9
-       |.if GPR64
-       |  // Need to check overflow for (a<<32) + (b<<32).
-@@ -4757,15 +5153,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |  addo. CARG1, CARG1, CARG3
-       |.endif
-       |    cmpwi cr6, CARG3, 0
--      |   lwz CARG2, FORL_STOP*8+4(RA)
--      |  bso >6
-+      |   lwz CARG2, FORL_STOP*8+WORD_LO(RA)
-+      |  bso >2
-       |4:
--      |  stw CARG1, FORL_IDX*8+4(RA)
-+      |  stw CARG1, FORL_IDX*8+WORD_LO(RA)
-     } else {
--      |  lwz TMP3, FORL_STEP*8(RA)
--      |   lwz CARG3, FORL_STEP*8+4(RA)
--      |  lwz TMP2, FORL_STOP*8(RA)
--      |   lwz CARG2, FORL_STOP*8+4(RA)
-+      |  lwz TMP3, FORL_STEP*8+WORD_HI(RA)
-+      |   lwz CARG3, FORL_STEP*8+WORD_LO(RA)
-+      |  lwz TMP2, FORL_STOP*8+WORD_HI(RA)
-+      |   lwz CARG2, FORL_STOP*8+WORD_LO(RA)
-       |  cmplw cr7, TMP3, TISNUM
-       |  cmplw cr1, TMP2, TISNUM
-       |  crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
-@@ -4776,11 +5172,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    blt cr6, >5
-     |  cmpw CARG1, CARG2
-     |1:
--    |   stw TISNUM, FORL_EXT*8(RA)
-+    |   stw TISNUM, FORL_EXT*8+WORD_HI(RA)
-     if (op != BC_JFORL) {
-       |  srwi RD, RD, 1
-     }
--    |   stw CARG1, FORL_EXT*8+4(RA)
-+    |   stw CARG1, FORL_EXT*8+WORD_LO(RA)
-     if (op != BC_JFORL) {
-       |  add RD, PC, RD
-     }
-@@ -4800,11 +5196,6 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:  // Invert check for negative step.
-     |  cmpw CARG2, CARG1
-     |  b <1
--    if (vk) {
--      |6:  // Potential overflow.
--      |  checkov TMP0, <4		// Ignore unrelated overflow.
--      |  b <2
--    }
-     |.endif
-     if (vk) {
-       |.if DUALNUM
-@@ -4815,14 +5206,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |.endif
-       |  lfd f3, FORL_STEP*8(RA)
-       |  lfd f2, FORL_STOP*8(RA)
--      |   lwz TMP3, FORL_STEP*8(RA)
-+      |   lwz TMP3, FORL_STEP*8+WORD_HI(RA)
-       |  fadd f1, f1, f3
-       |  stfd f1, FORL_IDX*8(RA)
-     } else {
-       |.if DUALNUM
-       |9:  // FP loop.
-       |.else
-+      |.if ENDIAN_LE
-+      |  lwzx TMP1, RA, BASE_LO
-+      |  add RA, RA, BASE
-+      |.else
-       |  lwzux TMP1, RA, BASE
-+      |.endif
-       |  lwz TMP3, FORL_STEP*8(RA)
-       |  lwz TMP2, FORL_STOP*8(RA)
-       |  cmplw cr0, TMP1, TISNUM
-@@ -4903,17 +5299,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- #endif
-   case BC_IITERL:
-     |  // RA = base*8, RD = target
--    |  lwzux TMP1, RA, BASE
--    |   lwz TMP2, 4(RA)
-+    |  lwzux2 TMP1, TMP2, RA, BASE
-     |  checknil TMP1; beq >1		// Stop if iterator returned nil.
-     if (op == BC_JITERL) {
--      |  stw TMP1, -8(RA)
--      |   stw TMP2, -4(RA)
-+      |  stw TMP1, WORD_HI-8(RA)
-+      |   stw TMP2, WORD_LO-8(RA)
-       |  b =>BC_JLOOP
-     } else {
-       |  branch_RD			// Otherwise save control var + branch.
--      |  stw TMP1, -8(RA)
--      |   stw TMP2, -4(RA)
-+      |  stw TMP1, WORD_HI-8(RA)
-+      |   stw TMP2, WORD_LO-8(RA)
-     }
-     |1:
-     |  ins_next
-@@ -4942,7 +5337,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Traces on PPC don't store the trace number, so use 0.
-     |   stw ZERO, DISPATCH_GL(vmstate)(DISPATCH)
-     |  lwzx TRACE:TMP2, TMP1, RD
--    |  clrso TMP1
-+    |  mtxer ZERO
-     |  lp TMP2, TRACE:TMP2->mcode
-     |   stw BASE, DISPATCH_GL(jit_base)(DISPATCH)
-     |  mtctr TMP2
-@@ -4994,7 +5389,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |
-     |3:  // Clear missing parameters.
--    |  stwx TISNIL, BASE, NARGS8:RC
-+    |  stwx TISNIL, BASE_HI, NARGS8:RC
-     |  addi NARGS8:RC, NARGS8:RC, 8
-     |  b <2
-     break;
-@@ -5011,11 +5406,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  lwz TMP2, L->maxstack
-     |   add TMP1, BASE, RC
-     |  add TMP0, RA, RC
--    |   stw LFUNC:RB, 4(TMP1)		// Store copy of LFUNC.
-+    |   stw LFUNC:RB, WORD_LO(TMP1)	// Store copy of LFUNC.
-     |   addi TMP3, RC, 8+FRAME_VARG
-     |    lwz KBASE, -4+PC2PROTO(k)(PC)
-     |  cmplw TMP0, TMP2
--    |   stw TMP3, 0(TMP1)		// Store delta + FRAME_VARG.
-+    |   stw TMP3, WORD_HI(TMP1)		// Store delta + FRAME_VARG.
-     |  bge ->vm_growstack_l
-     |  lbz TMP2, -4+PC2PROTO(numparams)(PC)
-     |   mr RA, BASE
-@@ -5026,18 +5421,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  beq >3
-     |1:
-     |  cmplw RA, RC			// Less args than parameters?
--    |   lwz TMP0, 0(RA)
--    |   lwz TMP3, 4(RA)
-+    |   lwz TMP0, WORD_HI(RA)
-+    |   lwz TMP3, WORD_LO(RA)
-     |  bge >4
--    |    stw TISNIL, 0(RA)		// Clear old fixarg slot (help the GC).
-+    |    stw TISNIL, WORD_HI(RA)	// Clear old fixarg slot (help the GC).
-     |    addi RA, RA, 8
-     |2:
-     |  addic. TMP2, TMP2, -1
--    |   stw TMP0, 8(TMP1)
--    |   stw TMP3, 12(TMP1)
-+    |   stw TMP0, WORD_HI+8(TMP1)
-+    |   stw TMP3, WORD_LO+8(TMP1)
-     |    addi TMP1, TMP1, 8
-     |  bne <1
-     |3:
-+    |  addi BASEP4, BASE, 4
-     |  ins_next2
-     |
-     |4:  // Clear missing parameters.
-@@ -5049,35 +5445,35 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_FUNCCW:
-     |  // BASE = new base, RA = BASE+framesize*8, RB = CFUNC, RC = nargs*8
-     if (op == BC_FUNCC) {
--      |  lp RD, CFUNC:RB->f
-+      |  lp FUNCREG, CFUNC:RB->f
-     } else {
--      |  lp RD, DISPATCH_GL(wrapf)(DISPATCH)
-+      |  lp FUNCREG, DISPATCH_GL(wrapf)(DISPATCH)
-     }
-     |   add TMP1, RA, NARGS8:RC
-     |   lwz TMP2, L->maxstack
--    |  .toc lp TMP3, 0(RD)
-+    |  .opd lp TMP3, 0(FUNCREG)
-     |    add RC, BASE, NARGS8:RC
-     |   stp BASE, L->base
-     |   cmplw TMP1, TMP2
-     |    stp RC, L->top
-     |     li_vmstate C
--    |.if TOC
-+    |.if OPD
-     |  mtctr TMP3
-     |.else
--    |  mtctr RD
-+    |  mtctr FUNCREG
-     |.endif
-     if (op == BC_FUNCCW) {
-       |  lp CARG2, CFUNC:RB->f
-     }
-     |  mr CARG1, L
-     |   bgt ->vm_growstack_c		// Need to grow stack.
--    |  .toc lp TOCREG, TOC_OFS(RD)
--    |  .tocenv lp ENVREG, ENV_OFS(RD)
-+    |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+    |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-     |     st_vmstate
-     |  bctrl				// (lua_State *L [, lua_CFunction f])
-+    |  .toc lp TOCREG, SAVE_TOC
-     |  // Returns nresults.
-     |  lp BASE, L->base
--    |  .toc ld TOCREG, SAVE_TOC
-     |   slwi RD, CRET1, 3
-     |  lp TMP1, L->top
-     |    li_vmstate INTERP
-@@ -5128,7 +5524,11 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.byte 0x1\n"
- 	"\t.string \"\"\n"
- 	"\t.uleb128 0x1\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.sleb128 -8\n"
-+#else
- 	"\t.sleb128 -4\n"
-+#endif
- 	"\t.byte 65\n"
- 	"\t.byte 0xc\n\t.uleb128 1\n\t.uleb128 0\n"
- 	"\t.align 2\n"
-@@ -5141,14 +5541,24 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long .Lbegin\n"
- 	"\t.long %d\n"
- 	"\t.byte 0xe\n\t.uleb128 %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+	"\t.byte 0x11\n\t.uleb128 70\n\t.sleb128 -1\n",
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
- 	"\t.byte 0x5\n\t.uleb128 70\n\t.uleb128 55\n",
-+#endif
- 	fcofs, CFRAME_SIZE);
-     for (i = 14; i <= 31; i++)
-       fprintf(ctx->fp,
- 	"\t.byte %d\n\t.uleb128 %d\n"
- 	"\t.byte %d\n\t.uleb128 %d\n",
--	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i));
-+#if LJ_ARCH_PPC32ON64
-+	0x80+i, 19+(31-i), 0x80+32+i, 1+(31-i)
-+#else
-+	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i)
-+#endif
-+      );
-     fprintf(ctx->fp,
- 	"\t.align 2\n"
- 	".LEFDE0:\n\n");
-@@ -5164,8 +5574,12 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long lj_vm_ffi_call\n"
- #endif
- 	"\t.long %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
--	"\t.byte 0x8e\n\t.uleb128 2\n"
-+#endif
-+	"\t.byte 0x8e\n\t.uleb128 1\n"
- 	"\t.byte 0xd\n\t.uleb128 0xe\n"
- 	"\t.align 2\n"
- 	".LEFDE1:\n\n", (int)ctx->codesz - fcofs);
-@@ -5180,7 +5594,11 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.byte 0x1\n"
- 	"\t.string \"zPR\"\n"
- 	"\t.uleb128 0x1\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.sleb128 -8\n"
-+#else
- 	"\t.sleb128 -4\n"
-+#endif
- 	"\t.byte 65\n"
- 	"\t.uleb128 6\n"			/* augmentation length */
- 	"\t.byte 0x1b\n"			/* pcrel|sdata4 */
-@@ -5198,14 +5616,24 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long %d\n"
- 	"\t.uleb128 0\n"			/* augmentation length */
- 	"\t.byte 0xe\n\t.uleb128 %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+	"\t.byte 0x11\n\t.uleb128 70\n\t.sleb128 -1\n",
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
- 	"\t.byte 0x5\n\t.uleb128 70\n\t.uleb128 55\n",
-+#endif
- 	fcofs, CFRAME_SIZE);
-     for (i = 14; i <= 31; i++)
-       fprintf(ctx->fp,
- 	"\t.byte %d\n\t.uleb128 %d\n"
- 	"\t.byte %d\n\t.uleb128 %d\n",
--	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i));
-+#if LJ_ARCH_PPC32ON64
-+	0x80+i, 19+(31-i), 0x80+32+i, 1+(31-i)
-+#else
-+	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i)
-+#endif
-+      );
-     fprintf(ctx->fp,
- 	"\t.align 2\n"
- 	".LEFDE2:\n\n");
-@@ -5233,8 +5661,12 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long lj_vm_ffi_call-.\n"
- 	"\t.long %d\n"
- 	"\t.uleb128 0\n"			/* augmentation length */
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
--	"\t.byte 0x8e\n\t.uleb128 2\n"
-+#endif
-+	"\t.byte 0x8e\n\t.uleb128 1\n"
- 	"\t.byte 0xd\n\t.uleb128 0xe\n"
- 	"\t.align 2\n"
- 	".LEFDE3:\n\n", (int)ctx->codesz - fcofs);
diff --git a/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch b/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
deleted file mode 100644
index f4e760b738361..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
+++ /dev/null
@@ -1,11 +0,0 @@
---- a/src/vm_ppc.dasc	2019-06-03 19:41:50.214671731 +0200
-+++ b/src/vm_ppc.dasc	2019-06-03 19:44:40.229686143 +0200
-@@ -2774,7 +2774,7 @@
-   |
-   |->vm_exit_handler:
-   |.if JIT
--  |  addi sp, TMP0, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-+  |  addi sp, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-   |  saver 3 // CARG1
-   |  saver 4 // CARG2
-   |  saver 5 // CARG3
diff --git a/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch b/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
deleted file mode 100644
index 487a1cd1ca787..0000000000000
--- a/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
+++ /dev/null
@@ -1,231 +0,0 @@
-commit 9da06535092d6d9dec442641a26c64bce5574322
-Author: Mike Pall <mike>
-Date:   Sun Jun 24 14:08:59 2018 +0200
-
-    ARM64: Fix exit stub patching.
-    
-    Contributed by Javier Guerra Giraldez.
-
-diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
-index cbb186d3..baafa21a 100644
---- a/src/lj_asm_arm64.h
-+++ b/src/lj_asm_arm64.h
-@@ -56,11 +56,11 @@ static void asm_exitstub_setup(ASMState *as, ExitNo nexits)
-     asm_mclimit(as);
-   /* 1: str lr,[sp]; bl ->vm_exit_handler; movz w0,traceno; bl <1; bl <1; ... */
-   for (i = nexits-1; (int32_t)i >= 0; i--)
--    *--mxp = A64I_LE(A64I_BL|((-3-i)&0x03ffffffu));
--  *--mxp = A64I_LE(A64I_MOVZw|A64F_U16(as->T->traceno));
-+    *--mxp = A64I_LE(A64I_BL | A64F_S26(-3-i));
-+  *--mxp = A64I_LE(A64I_MOVZw | A64F_U16(as->T->traceno));
-   mxp--;
--  *mxp = A64I_LE(A64I_BL|(((MCode *)(void *)lj_vm_exit_handler-mxp)&0x03ffffffu));
--  *--mxp = A64I_LE(A64I_STRx|A64F_D(RID_LR)|A64F_N(RID_SP));
-+  *mxp = A64I_LE(A64I_BL | A64F_S26(((MCode *)(void *)lj_vm_exit_handler-mxp)));
-+  *--mxp = A64I_LE(A64I_STRx | A64F_D(RID_LR) | A64F_N(RID_SP));
-   as->mctop = mxp;
- }
- 
-@@ -77,7 +77,7 @@ static void asm_guardcc(ASMState *as, A64CC cc)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_cond_branch(as, cc^1, p-1);
-     return;
-   }
-@@ -91,7 +91,7 @@ static void asm_guardtnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_tnb(as, ai^0x01000000u, r, bit, p-1);
-     return;
-   }
-@@ -105,7 +105,7 @@ static void asm_guardcnb(ASMState *as, A64Ins ai, Reg r)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_cnb(as, ai^0x01000000u, r, p-1);
-     return;
-   }
-@@ -1850,7 +1850,7 @@ static void asm_loop_fixup(ASMState *as)
-     p[-2] |= ((uint32_t)delta & mask) << 5;
-   } else {
-     ptrdiff_t delta = target - (p - 1);
--    p[-1] = A64I_B | ((uint32_t)(delta) & 0x03ffffffu);
-+    p[-1] = A64I_B | A64F_S26(delta);
-   }
- }
- 
-@@ -1919,7 +1919,7 @@ static void asm_tail_fixup(ASMState *as, TraceNo lnk)
-   }
-   /* Patch exit branch. */
-   target = lnk ? traceref(as->J, lnk)->mcode : (MCode *)lj_vm_exit_interp;
--  p[-1] = A64I_B | (((target-p)+1)&0x03ffffffu);
-+  p[-1] = A64I_B | A64F_S26((target-p)+1);
- }
- 
- /* Prepare tail of code. */
-@@ -1982,40 +1982,50 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
- {
-   MCode *p = T->mcode;
-   MCode *pe = (MCode *)((char *)p + T->szmcode);
--  MCode *cstart = NULL, *cend = p;
-+  MCode *cstart = NULL;
-   MCode *mcarea = lj_mcode_patch(J, p, 0);
-   MCode *px = exitstub_trace_addr(T, exitno);
-+  /* Note: this assumes a trace exit is only ever patched once. */
-   for (; p < pe; p++) {
-     /* Look for exitstub branch, replace with branch to target. */
-+    ptrdiff_t delta = target - p;
-     MCode ins = A64I_LE(*p);
-     if ((ins & 0xff000000u) == 0x54000000u &&
- 	((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) {
--      /* Patch bcc exitstub. */
--      *p = A64I_LE((ins & 0xff00001fu) | (((target-p)<<5) & 0x00ffffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch bcc, if within range. */
-+      if (A64F_S_OK(delta, 19)) {
-+	*p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta));
-+	if (!cstart) cstart = p;
-+      }
-     } else if ((ins & 0xfc000000u) == 0x14000000u &&
- 	       ((ins ^ (px-p)) & 0x03ffffffu) == 0) {
--      /* Patch b exitstub. */
--      *p = A64I_LE((ins & 0xfc000000u) | ((target-p) & 0x03ffffffu));
--      cend = p+1;
-+      /* Patch b. */
-+      lua_assert(A64F_S_OK(delta, 26));
-+      *p = A64I_LE((ins & 0xfc000000u) | A64F_S26(delta));
-       if (!cstart) cstart = p;
-     } else if ((ins & 0x7e000000u) == 0x34000000u &&
- 	       ((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) {
--      /* Patch cbz/cbnz exitstub. */
--      *p = A64I_LE((ins & 0xff00001f) | (((target-p)<<5) & 0x00ffffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch cbz/cbnz, if within range. */
-+      if (A64F_S_OK(delta, 19)) {
-+	*p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta));
-+	if (!cstart) cstart = p;
-+      }
-     } else if ((ins & 0x7e000000u) == 0x36000000u &&
- 	       ((ins ^ ((px-p)<<5)) & 0x0007ffe0u) == 0) {
--      /* Patch tbz/tbnz exitstub. */
--      *p = A64I_LE((ins & 0xfff8001fu) | (((target-p)<<5) & 0x0007ffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch tbz/tbnz, if within range. */
-+      if (A64F_S_OK(delta, 14)) {
-+	*p = A64I_LE((ins & 0xfff8001fu) | A64F_S14(delta));
-+	if (!cstart) cstart = p;
-+      }
-     }
-   }
--  lua_assert(cstart != NULL);
--  lj_mcode_sync(cstart, cend);
-+  {  /* Always patch long-range branch in exit stub itself. */
-+    ptrdiff_t delta = target - px;
-+    lua_assert(A64F_S_OK(delta, 26));
-+    *px = A64I_B | A64F_S26(delta);
-+    if (!cstart) cstart = px;
-+  }
-+  lj_mcode_sync(cstart, px+1);
-   lj_mcode_patch(J, mcarea, 1);
- }
- 
-diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
-index 6da4c7d4..1001b1d8 100644
---- a/src/lj_emit_arm64.h
-+++ b/src/lj_emit_arm64.h
-@@ -241,7 +241,7 @@ static void emit_loadk(ASMState *as, Reg rd, uint64_t u64, int is64)
- #define mcpofs(as, k) \
-   ((intptr_t)((uintptr_t)(k) - (uintptr_t)(as->mcp - 1)))
- #define checkmcpofs(as, k) \
--  ((((mcpofs(as, k)>>2) + 0x00040000) >> 19) == 0)
-+  (A64F_S_OK(mcpofs(as, k)>>2, 19))
- 
- static Reg ra_allock(ASMState *as, intptr_t k, RegSet allow);
- 
-@@ -312,7 +312,7 @@ static void emit_cond_branch(ASMState *as, A64CC cond, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x40000) >> 19) == 0);
-+  lua_assert(A64F_S_OK(delta, 19));
-   *p = A64I_BCC | A64F_S19(delta) | cond;
- }
- 
-@@ -320,24 +320,24 @@ static void emit_branch(ASMState *as, A64Ins ai, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x02000000) >> 26) == 0);
--  *p = ai | ((uint32_t)delta & 0x03ffffffu);
-+  lua_assert(A64F_S_OK(delta, 26));
-+  *p = ai | A64F_S26(delta);
- }
- 
- static void emit_tnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(bit < 63 && ((delta + 0x2000) >> 14) == 0);
-+  lua_assert(bit < 63 && A64F_S_OK(delta, 14));
-   if (bit > 31) ai |= A64I_X;
--  *p = ai | A64F_BIT(bit & 31) | A64F_S14((uint32_t)delta & 0x3fffu) | r;
-+  *p = ai | A64F_BIT(bit & 31) | A64F_S14(delta) | r;
- }
- 
- static void emit_cnb(ASMState *as, A64Ins ai, Reg r, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x40000) >> 19) == 0);
-+  lua_assert(A64F_S_OK(delta, 19));
-   *p = ai | A64F_S19(delta) | r;
- }
- 
-@@ -347,8 +347,8 @@ static void emit_call(ASMState *as, void *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = (char *)target - (char *)p;
--  if ((((delta>>2) + 0x02000000) >> 26) == 0) {
--    *p = A64I_BL | ((uint32_t)(delta>>2) & 0x03ffffffu);
-+  if (A64F_S_OK(delta>>2, 26)) {
-+    *p = A64I_BL | A64F_S26(delta>>2);
-   } else {  /* Target out of range: need indirect call. But don't use R0-R7. */
-     Reg r = ra_allock(as, i64ptr(target),
- 		      RSET_RANGE(RID_X8, RID_MAX_GPR)-RSET_FIXED);
-diff --git a/src/lj_target_arm64.h b/src/lj_target_arm64.h
-index 520023ae..a207a2ba 100644
---- a/src/lj_target_arm64.h
-+++ b/src/lj_target_arm64.h
-@@ -132,9 +132,9 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define A64F_IMMR(x)	((x) << 16)
- #define A64F_U16(x)	((x) << 5)
- #define A64F_U12(x)	((x) << 10)
--#define A64F_S26(x)	(x)
-+#define A64F_S26(x)	(((uint32_t)(x) & 0x03ffffffu))
- #define A64F_S19(x)	(((uint32_t)(x) & 0x7ffffu) << 5)
--#define A64F_S14(x)	((x) << 5)
-+#define A64F_S14(x)	(((uint32_t)(x) & 0x3fffu) << 5)
- #define A64F_S9(x)	((x) << 12)
- #define A64F_BIT(x)	((x) << 19)
- #define A64F_SH(sh, x)	(((sh) << 22) | ((x) << 10))
-@@ -145,6 +145,9 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define A64F_LSL16(x)	(((x) / 16) << 21)
- #define A64F_BSH(sh)	((sh) << 10)
- 
-+/* Check for valid field range. */
-+#define A64F_S_OK(x, b)	((((x) + (1 << (b-1))) >> (b)) == 0)
-+
- typedef enum A64Ins {
-   A64I_S = 0x20000000,
-   A64I_X = 0x80000000,
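Note on the removed aarch64-Fix-exit-stub-patching.patch: its core is the masked branch-offset fields plus the A64F_S_OK range check. The sketch below copies those four macros from the hunk above and wraps them in a small helper to show why the mask matters for negative deltas. SKETCH_A64I_B and sketch_encode_b are hypothetical names used only for illustration; they are not LuaJIT identifiers, and the opcode value is assumed from the 0x14000000 pattern that lj_asm_patchexit tests for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the removed patch (src/lj_target_arm64.h). */
#define A64F_S26(x)     (((uint32_t)(x) & 0x03ffffffu))
#define A64F_S19(x)     (((uint32_t)(x) & 0x7ffffu) << 5)
#define A64F_S14(x)     (((uint32_t)(x) & 0x3fffu) << 5)
/* True if x fits into a signed field of b bits. */
#define A64F_S_OK(x, b) ((((x) + (1 << (b-1))) >> (b)) == 0)

/* Assumption: opcode bits of an unconditional AArch64 "b", taken from the
** 0x14000000 pattern matched in lj_asm_patchexit. */
#define SKETCH_A64I_B   0x14000000u

/* Hypothetical helper: encode "b" with a delta counted in instructions.
** Returns 0 when the delta does not fit the signed 26 bit field. */
static uint32_t sketch_encode_b(ptrdiff_t delta)
{
  if (!A64F_S_OK(delta, 26)) return 0;
  /* Without the mask, a negative delta would overwrite the opcode bits. */
  return SKETCH_A64I_B | A64F_S26(delta);
}
```

With the pre-patch definition A64F_S26(x) = (x), a backward branch would have OR-ed the sign bits of the delta into the opcode field, which is why the old call sites each carried their own 0x03ffffffu mask.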
diff --git a/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch b/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
deleted file mode 100644
index c30264786755f..0000000000000
--- a/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
+++ /dev/null
@@ -1,29 +0,0 @@
-From: Jason Teplitz <jason@tensyr.com>
-Date: Mon, 9 Oct 2017 23:03:09 +0000
-Subject: Fix register allocation bug in arm64
-
----
- src/lj_asm_arm64.h | 3 +--
- 1 file changed, 1 insertion(+), 2 deletions(-)
-
-diff --git src/lj_asm_arm64.h src/lj_asm_arm64.h
-index 8fd92e7..549f8a6 100644
---- a/src/lj_asm_arm64.h
-+++ b/src/lj_asm_arm64.h
-@@ -871,7 +871,7 @@ static void asm_hrefk(ASMState *as, IRIns *ir)
-   int bigofs = !emit_checkofs(A64I_LDRx, ofs);
-   RegSet allow = RSET_GPR;
-   Reg dest = (ra_used(ir) || bigofs) ? ra_dest(as, ir, RSET_GPR) : RID_NONE;
--  Reg node = ra_alloc1(as, ir->op1, allow);
-+  Reg node = ra_alloc1(as, ir->op1, ra_hasreg(dest) ? rset_clear(allow, dest) : allow);
-   Reg key = ra_scratch(as, rset_clear(allow, node));
-   Reg idx = node;
-   uint64_t k;
-@@ -879,7 +879,6 @@ static void asm_hrefk(ASMState *as, IRIns *ir)
-   rset_clear(allow, key);
-   if (bigofs) {
-     idx = dest;
--    rset_clear(allow, dest);
-     kofs = (int32_t)offsetof(Node, key);
-   } else if (ra_hasreg(dest)) {
-     emit_opk(as, A64I_ADDx, dest, node, ofs, allow);
diff --git a/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch b/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
deleted file mode 100644
index a217866c392cf..0000000000000
--- a/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
+++ /dev/null
@@ -1,562 +0,0 @@
-From e9af1abec542e6f9851ff2368e7f196b6382a44c Mon Sep 17 00:00:00 2001
-From: Mike Pall <mike>
-Date: Wed, 30 Sep 2020 01:31:27 +0200
-Subject: [PATCH] Add support for full-range 64 bit lightuserdata.
-
----
- doc/status.html   | 11 ---------
- src/jit/dump.lua  |  4 +++-
- src/lib_debug.c   | 12 +++++-----
- src/lib_jit.c     | 14 ++++++------
- src/lib_package.c |  8 +++----
- src/lib_string.c  |  2 +-
- src/lj_api.c      | 40 +++++++++++++++++++++++++++++----
- src/lj_ccall.c    |  2 +-
- src/lj_cconv.c    |  2 +-
- src/lj_crecord.c  |  6 ++---
- src/lj_dispatch.c |  2 +-
- src/lj_ir.c       |  6 +++--
- src/lj_obj.c      |  5 +++--
- src/lj_obj.h      | 57 ++++++++++++++++++++++++++++++-----------------
- src/lj_snap.c     |  9 +++++++-
- src/lj_state.c    |  6 +++++
- src/lj_strfmt.c   |  2 +-
- 17 files changed, 121 insertions(+), 67 deletions(-)
-
-#diff --git a/doc/status.html b/doc/status.html
-#index 0aafe13a2..fd0ae8bae 100644
-#--- a/doc/status.html
-#+++ b/doc/status.html
-#@@ -91,17 +91,6 @@ <h2>Current Status</h2>
-# <tt>lua_atpanic</tt> on x64. This issue will be fixed with the new
-# garbage collector.
-# </li>
-#-<li>
-#-LuaJIT on 64 bit systems provides a <b>limited range</b> of 47 bits for the
-#-<b>legacy <tt>lightuserdata</tt></b> data type.
-#-This is only relevant on x64 systems which use the negative part of the
-#-virtual address space in user mode, e.g. Solaris/x64, and on ARM64 systems
-#-configured with a 48 bit or 52 bit VA.
-#-Avoid using <tt>lightuserdata</tt> to hold pointers that may point outside
-#-of that range, e.g. variables on the stack. In general, avoid this data
-#-type for new code and replace it with (much more performant) FFI bindings.
-#-FFI cdata pointers can address the full 64 bit range.
-#-</li>
-# </ul>
-# <br class="flush">
-# </div>
-Index: luajit/src/jit/dump.lua
-===================================================================
---- luajit.orig/src/jit/dump.lua
-+++ luajit/src/jit/dump.lua
-@@ -315,7 +315,9 @@
-   local tn = type(k)
-   local s
-   if tn == "number" then
--    if band(sn or 0, 0x30000) ~= 0 then
-+    if t < 12 then
-+      s = k == 0 and "NULL" or format("[0x%08x]", k)
-+    elseif band(sn or 0, 0x30000) ~= 0 then
-       s = band(sn, 0x20000) ~= 0 and "contpc" or "ftsz"
-     elseif k == 2^52+2^51 then
-       s = "bias"
-Index: luajit/src/lib_debug.c
-===================================================================
---- luajit.orig/src/lib_debug.c
-+++ luajit/src/lib_debug.c
-@@ -231,8 +231,8 @@
-   int32_t n = lj_lib_checkint(L, 2) - 1;
-   if ((uint32_t)n >= fn->l.nupvalues)
-     lj_err_arg(L, 2, LJ_ERR_IDXRNG);
--  setlightudV(L->top-1, isluafunc(fn) ? (void *)gcref(fn->l.uvptr[n]) :
--					(void *)&fn->c.upvalue[n]);
-+  lua_pushlightuserdata(L, isluafunc(fn) ? (void *)gcref(fn->l.uvptr[n]) :
-+					   (void *)&fn->c.upvalue[n]);
-   return 1;
- }
- 
-@@ -283,13 +283,13 @@
- 
- /* ------------------------------------------------------------------------ */
- 
--#define KEY_HOOK	((void *)0x3004)
-+#define KEY_HOOK	(U64x(80000000,00000000)|'h')
- 
- static void hookf(lua_State *L, lua_Debug *ar)
- {
-   static const char *const hooknames[] =
-     {"call", "return", "line", "count", "tail return"};
--  lua_pushlightuserdata(L, KEY_HOOK);
-+  (L->top++)->u64 = KEY_HOOK;
-   lua_rawget(L, LUA_REGISTRYINDEX);
-   if (lua_isfunction(L, -1)) {
-     lua_pushstring(L, hooknames[(int)ar->event]);
-@@ -334,7 +334,7 @@
-     count = luaL_optint(L, arg+3, 0);
-     func = hookf; mask = makemask(smask, count);
-   }
--  lua_pushlightuserdata(L, KEY_HOOK);
-+  (L->top++)->u64 = KEY_HOOK;
-   lua_pushvalue(L, arg+1);
-   lua_rawset(L, LUA_REGISTRYINDEX);
-   lua_sethook(L, func, mask, count);
-@@ -349,7 +349,7 @@
-   if (hook != NULL && hook != hookf) {  /* external hook? */
-     lua_pushliteral(L, "external hook");
-   } else {
--    lua_pushlightuserdata(L, KEY_HOOK);
-+    (L->top++)->u64 = KEY_HOOK;
-     lua_rawget(L, LUA_REGISTRYINDEX);   /* get hook */
-   }
-   lua_pushstring(L, unmakemask(mask, buff));
-Index: luajit/src/lib_jit.c
-===================================================================
---- luajit.orig/src/lib_jit.c
-+++ luajit/src/lib_jit.c
-@@ -540,15 +540,15 @@
- 
- /* Not loaded by default, use: local profile = require("jit.profile") */
- 
--static const char KEY_PROFILE_THREAD = 't';
--static const char KEY_PROFILE_FUNC = 'f';
-+#define KEY_PROFILE_THREAD	(U64x(80000000,00000000)|'t')
-+#define KEY_PROFILE_FUNC	(U64x(80000000,00000000)|'f')
- 
- static void jit_profile_callback(lua_State *L2, lua_State *L, int samples,
- 				 int vmstate)
- {
-   TValue key;
-   cTValue *tv;
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   tv = lj_tab_get(L, tabV(registry(L)), &key);
-   if (tvisfunc(tv)) {
-     char vmst = (char)vmstate;
-@@ -575,9 +575,9 @@
-   lua_State *L2 = lua_newthread(L);  /* Thread that runs profiler callback. */
-   TValue key;
-   /* Anchor thread and function in registry. */
--  setlightudV(&key, (void *)&KEY_PROFILE_THREAD);
-+  key.u64 = KEY_PROFILE_THREAD;
-   setthreadV(L, lj_tab_set(L, registry, &key), L2);
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   setfuncV(L, lj_tab_set(L, registry, &key), func);
-   lj_gc_anybarriert(L, registry);
-   luaJIT_profile_start(L, mode ? strdata(mode) : "",
-@@ -592,9 +592,9 @@
-   TValue key;
-   luaJIT_profile_stop(L);
-   registry = tabV(registry(L));
--  setlightudV(&key, (void *)&KEY_PROFILE_THREAD);
-+  key.u64 = KEY_PROFILE_THREAD;
-   setnilV(lj_tab_set(L, registry, &key));
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   setnilV(lj_tab_set(L, registry, &key));
-   lj_gc_anybarriert(L, registry);
-   return 0;
-Index: luajit/src/lib_package.c
-===================================================================
---- luajit.orig/src/lib_package.c
-+++ luajit/src/lib_package.c
-@@ -398,7 +398,7 @@
- 
- /* ------------------------------------------------------------------------ */
- 
--#define sentinel	((void *)0x4004)
-+#define KEY_SENTINEL	(U64x(80000000,00000000)|'s')
- 
- static int lj_cf_package_require(lua_State *L)
- {
-@@ -408,7 +408,7 @@
-   lua_getfield(L, LUA_REGISTRYINDEX, "_LOADED");
-   lua_getfield(L, 2, name);
-   if (lua_toboolean(L, -1)) {  /* is it there? */
--    if (lua_touserdata(L, -1) == sentinel)  /* check loops */
-+    if ((L->top-1)->u64 == KEY_SENTINEL)  /* check loops */
-       luaL_error(L, "loop or previous error loading module " LUA_QS, name);
-     return 1;  /* package is already loaded */
-   }
-@@ -431,14 +431,14 @@
-     else
-       lua_pop(L, 1);
-   }
--  lua_pushlightuserdata(L, sentinel);
-+  (L->top++)->u64 = KEY_SENTINEL;
-   lua_setfield(L, 2, name);  /* _LOADED[name] = sentinel */
-   lua_pushstring(L, name);  /* pass name as argument to module */
-   lua_call(L, 1, 1);  /* run loaded module */
-   if (!lua_isnil(L, -1))  /* non-nil return? */
-     lua_setfield(L, 2, name);  /* _LOADED[name] = returned value */
-   lua_getfield(L, 2, name);
--  if (lua_touserdata(L, -1) == sentinel) {   /* module did not set a value? */
-+  if ((L->top-1)->u64 == KEY_SENTINEL) {   /* module did not set a value? */
-     lua_pushboolean(L, 1);  /* use true as result */
-     lua_pushvalue(L, -1);  /* extra copy to be returned */
-     lua_setfield(L, 2, name);  /* _LOADED[name] = true */
-Index: luajit/src/lib_string.c
-===================================================================
---- luajit.orig/src/lib_string.c
-+++ luajit/src/lib_string.c
-@@ -714,7 +714,7 @@
- 	lj_strfmt_putfchar(sb, sf, lj_lib_checkint(L, arg));
- 	break;
-       case STRFMT_PTR:  /* No formatting. */
--	lj_strfmt_putptr(sb, lj_obj_ptr(L->base+arg-1));
-+	lj_strfmt_putptr(sb, lj_obj_ptr(G(L), L->base+arg-1));
- 	break;
-       default:
- 	lua_assert(0);
-Index: luajit/src/lj_api.c
-===================================================================
---- luajit.orig/src/lj_api.c
-+++ luajit/src/lj_api.c
-@@ -595,7 +595,7 @@
-   if (tvisudata(o))
-     return uddata(udataV(o));
-   else if (tvislightud(o))
--    return lightudV(o);
-+    return lightudV(G(L), o);
-   else
-     return NULL;
- }
-@@ -608,7 +608,7 @@
- 
- LUA_API const void *lua_topointer(lua_State *L, int idx)
- {
--  return lj_obj_ptr(index2adr(L, idx));
-+  return lj_obj_ptr(G(L), index2adr(L, idx));
- }
- 
- /* -- Stack setters (object creation) ------------------------------------- */
-@@ -694,9 +694,38 @@
-   incr_top(L);
- }
- 
-+#if LJ_64
-+static void *lightud_intern(lua_State *L, void *p)
-+{
-+  global_State *g = G(L);
-+  uint64_t u = (uint64_t)p;
-+  uint32_t up = lightudup(u);
-+  uint32_t *segmap = mref(g->gc.lightudseg, uint32_t);
-+  MSize segnum = g->gc.lightudnum;
-+  if (segmap) {
-+    MSize seg;
-+    for (seg = 0; seg <= segnum; seg++)
-+      if (segmap[seg] == up)  /* Fast path. */
-+	return (void *)(((uint64_t)seg << LJ_LIGHTUD_BITS_LO) | lightudlo(u));
-+    segnum++;
-+  }
-+  if (!((segnum-1) & segnum) && segnum != 1) {
-+    if (segnum >= (1 << LJ_LIGHTUD_BITS_SEG)) lj_err_msg(L, LJ_ERR_BADLU);
-+    lj_mem_reallocvec(L, segmap, segnum, segnum ? 2*segnum : 2u, uint32_t);
-+    setmref(g->gc.lightudseg, segmap);
-+  }
-+  g->gc.lightudnum = segnum;
-+  segmap[segnum] = up;
-+  return (void *)(((uint64_t)segnum << LJ_LIGHTUD_BITS_LO) | lightudlo(u));
-+}
-+#endif
-+
- LUA_API void lua_pushlightuserdata(lua_State *L, void *p)
- {
--  setlightudV(L->top, checklightudptr(L, p));
-+#if LJ_64
-+  p = lightud_intern(L, p);
-+#endif
-+  setrawlightudV(L->top, p);
-   incr_top(L);
- }
- 
-@@ -1138,7 +1167,10 @@
-   fn->c.f = func;
-   setfuncV(L, top++, fn);
-   if (LJ_FR2) setnilV(top++);
--  setlightudV(top++, checklightudptr(L, ud));
-+#if LJ_64
-+  ud = lightud_intern(L, ud);
-+#endif
-+  setrawlightudV(top++, ud);
-   cframe_nres(L->cframe) = 1+0;  /* Zero results. */
-   L->top = top;
-   return top-1;  /* Now call the newly allocated C function. */
-Index: luajit/src/lj_ccall.c
-===================================================================
---- luajit.orig/src/lj_ccall.c
-+++ luajit/src/lj_ccall.c
-@@ -1314,7 +1314,7 @@
-     lj_vm_ffi_call(&cc);
-     if (cts->cb.slot != ~0u) {  /* Blacklist function that called a callback. */
-       TValue tv;
--      setlightudV(&tv, (void *)cc.func);
-+      tv.u64 = ((uintptr_t)(void *)cc.func >> 2) | U64x(800000000, 00000000);
-       setboolV(lj_tab_set(L, cts->miscmap, &tv), 1);
-     }
-     ct = (CType *)((intptr_t)ct+(intptr_t)cts->tab);  /* May be reallocated. */
-Index: luajit/src/lj_cconv.c
-===================================================================
---- luajit.orig/src/lj_cconv.c
-+++ luajit/src/lj_cconv.c
-@@ -611,7 +611,7 @@
-     if (ud->udtype == UDTYPE_IO_FILE)
-       tmpptr = *(void **)tmpptr;
-   } else if (tvislightud(o)) {
--    tmpptr = lightudV(o);
-+    tmpptr = lightudV(cts->g, o);
-   } else if (tvisfunc(o)) {
-     void *p = lj_ccallback_new(cts, d, funcV(o));
-     if (p) {
-Index: luajit/src/lj_crecord.c
-===================================================================
---- luajit.orig/src/lj_crecord.c
-+++ luajit/src/lj_crecord.c
-@@ -643,8 +643,7 @@
-     }
-   } else if (tref_islightud(sp)) {
- #if LJ_64
--    sp = emitir(IRT(IR_BAND, IRT_P64), sp,
--		lj_ir_kint64(J, U64x(00007fff,ffffffff)));
-+    lj_trace_err(J, LJ_TRERR_NYICONV);
- #endif
-   } else {  /* NYI: tref_istab(sp). */
-     IRType t;
-@@ -1209,8 +1208,7 @@
-     TRef tr;
-     TValue tv;
-     /* Check for blacklisted C functions that might call a callback. */
--    setlightudV(&tv,
--		cdata_getptr(cdataptr(cd), (LJ_64 && tp == IRT_P64) ? 8 : 4));
-+    tv.u64 = ((uintptr_t)cdata_getptr(cdataptr(cd), (LJ_64 && tp == IRT_P64) ? 8 : 4) >> 2) | U64x(800000000, 00000000);
-     if (tvistrue(lj_tab_get(J->L, cts->miscmap, &tv)))
-       lj_trace_err(J, LJ_TRERR_BLACKL);
-     if (ctype_isvoid(ctr->info)) {
-Index: luajit/src/lj_dispatch.c
-===================================================================
---- luajit.orig/src/lj_dispatch.c
-+++ luajit/src/lj_dispatch.c
-@@ -302,7 +302,7 @@
-       if (idx != 0) {
- 	cTValue *tv = idx > 0 ? L->base + (idx-1) : L->top + idx;
- 	if (tvislightud(tv))
--	  g->wrapf = (lua_CFunction)lightudV(tv);
-+	  g->wrapf = (lua_CFunction)lightudV(g, tv);
- 	else
- 	  return 0;  /* Failed. */
-       } else {
-Index: luajit/src/lj_ir.c
-===================================================================
---- luajit.orig/src/lj_ir.c
-+++ luajit/src/lj_ir.c
-@@ -386,8 +386,10 @@
-   case IR_KPRI: setpriV(tv, irt_toitype(ir->t)); break;
-   case IR_KINT: setintV(tv, ir->i); break;
-   case IR_KGC: setgcV(L, tv, ir_kgc(ir), irt_toitype(ir->t)); break;
--  case IR_KPTR: case IR_KKPTR: setlightudV(tv, ir_kptr(ir)); break;
--  case IR_KNULL: setlightudV(tv, NULL); break;
-+  case IR_KPTR: case IR_KKPTR:
-+    setnumV(tv, (lua_Number)(uintptr_t)ir_kptr(ir));
-+    break;
-+  case IR_KNULL: setintV(tv, 0); break;
-   case IR_KNUM: setnumV(tv, ir_knum(ir)->n); break;
- #if LJ_HASFFI
-   case IR_KINT64: {
-Index: luajit/src/lj_obj.c
-===================================================================
---- luajit.orig/src/lj_obj.c
-+++ luajit/src/lj_obj.c
-@@ -34,12 +34,13 @@
- }
- 
- /* Return pointer to object or its object data. */
--const void * LJ_FASTCALL lj_obj_ptr(cTValue *o)
-+const void * LJ_FASTCALL lj_obj_ptr(global_State *g, cTValue *o)
- {
-+  UNUSED(g);
-   if (tvisudata(o))
-     return uddata(udataV(o));
-   else if (tvislightud(o))
--    return lightudV(o);
-+    return lightudV(g, o);
-   else if (LJ_HASFFI && tviscdata(o))
-     return cdataptr(cdataV(o));
-   else if (tvisgcv(o))
-Index: luajit/src/lj_obj.h
-===================================================================
---- luajit.orig/src/lj_obj.h
-+++ luajit/src/lj_obj.h
-@@ -232,7 +232,7 @@
- **                  ---MSW---.---LSW---
- ** primitive types |  itype  |         |
- ** lightuserdata   |  itype  |  void * |  (32 bit platforms)
--** lightuserdata   |ffff|    void *    |  (64 bit platforms, 47 bit pointers)
-+** lightuserdata   |ffff|seg|    ofs   |  (64 bit platforms)
- ** GC objects      |  itype  |  GCRef  |
- ** int (LJ_DUALNUM)|  itype  |   int   |
- ** number           -------double------
-@@ -245,7 +245,8 @@
- **
- **                     ------MSW------.------LSW------
- ** primitive types    |1..1|itype|1..................1|
--** GC objects/lightud |1..1|itype|-------GCRef--------|
-+** GC objects         |1..1|itype|-------GCRef--------|
-+** lightuserdata      |1..1|itype|seg|------ofs-------|
- ** int (LJ_DUALNUM)   |1..1|itype|0..0|-----int-------|
- ** number              ------------double-------------
- **
-@@ -285,6 +286,12 @@
- #define LJ_GCVMASK		(((uint64_t)1 << 47) - 1)
- #endif
- 
-+#if LJ_64
-+/* To stay within 47 bits, lightuserdata is segmented. */
-+#define LJ_LIGHTUD_BITS_SEG	8
-+#define LJ_LIGHTUD_BITS_LO	(47 - LJ_LIGHTUD_BITS_SEG)
-+#endif
-+
- /* -- String object ------------------------------------------------------- */
- 
- /* String object header. String payload follows. */
-@@ -576,7 +583,11 @@
-   uint8_t currentwhite;	/* Current white color. */
-   uint8_t state;	/* GC state. */
-   uint8_t nocdatafin;	/* No cdata finalizer called. */
--  uint8_t unused2;
-+#if LJ_64
-+  uint8_t lightudnum;	/* Number of lightuserdata segments - 1. */
-+#else
-+  uint8_t unused1;
-+#endif
-   MSize sweepstr;	/* Sweep position in string table. */
-   GCRef root;		/* List of all collectable objects. */
-   MRef sweep;		/* Sweep position in root list. */
-@@ -588,6 +599,9 @@
-   GCSize estimate;	/* Estimate of memory actually in use. */
-   MSize stepmul;	/* Incremental GC step granularity. */
-   MSize pause;		/* Pause between successive GC cycles. */
-+#if LJ_64
-+  MRef lightudseg;	/* Upper bits of lightuserdata segments. */
-+#endif
- } GCState;
- 
- /* Global state, shared by all threads of a Lua universe. */
-@@ -795,10 +809,23 @@
- #endif
- #define boolV(o)	check_exp(tvisbool(o), (LJ_TFALSE - itype(o)))
- #if LJ_64
--#define lightudV(o) \
--  check_exp(tvislightud(o), (void *)((o)->u64 & U64x(00007fff,ffffffff)))
-+#define lightudseg(u) \
-+  (((u) >> LJ_LIGHTUD_BITS_LO) & ((1 << LJ_LIGHTUD_BITS_SEG)-1))
-+#define lightudlo(u) \
-+  ((u) & (((uint64_t)1 << LJ_LIGHTUD_BITS_LO) - 1))
-+#define lightudup(p) \
-+  ((uint32_t)(((p) >> LJ_LIGHTUD_BITS_LO) << (LJ_LIGHTUD_BITS_LO-32)))
-+static LJ_AINLINE void *lightudV(global_State *g, cTValue *o)
-+{
-+  uint64_t u = o->u64;
-+  uint64_t seg = lightudseg(u);
-+  uint32_t *segmap = mref(g->gc.lightudseg, uint32_t);
-+  lua_assert(tvislightud(o));
-+  lua_assert(seg <= g->gc.lightudnum);
-+  return (void *)(((uint64_t)segmap[seg] << 32) | lightudlo(u));
-+}
- #else
--#define lightudV(o)	check_exp(tvislightud(o), gcrefp((o)->gcr, void))
-+#define lightudV(g, o)	check_exp(tvislightud(o), gcrefp((o)->gcr, void))
- #endif
- #define gcV(o)		check_exp(tvisgcv(o), gcval(o))
- #define strV(o)		check_exp(tvisstr(o), &gcval(o)->str)
-@@ -824,7 +851,7 @@
- #define setpriV(o, i)		(setitype((o), (i)))
- #endif
- 
--static LJ_AINLINE void setlightudV(TValue *o, void *p)
-+static LJ_AINLINE void setrawlightudV(TValue *o, void *p)
- {
- #if LJ_GC64
-   o->u64 = (uint64_t)p | (((uint64_t)LJ_TLIGHTUD) << 47);
-@@ -835,24 +862,14 @@
- #endif
- }
- 
--#if LJ_64
--#define checklightudptr(L, p) \
--  (((uint64_t)(p) >> 47) ? (lj_err_msg(L, LJ_ERR_BADLU), NULL) : (p))
--#else
--#define checklightudptr(L, p)	(p)
--#endif
--
--#if LJ_FR2
-+#if LJ_FR2 || LJ_32
- #define contptr(f)		((void *)(f))
- #define setcont(o, f)		((o)->u64 = (uint64_t)(uintptr_t)contptr(f))
--#elif LJ_64
-+#else
- #define contptr(f) \
-   ((void *)(uintptr_t)(uint32_t)((intptr_t)(f) - (intptr_t)lj_vm_asm_begin))
- #define setcont(o, f) \
-   ((o)->u64 = (uint64_t)(void *)(f) - (uint64_t)lj_vm_asm_begin)
--#else
--#define contptr(f)		((void *)(f))
--#define setcont(o, f)		setlightudV((o), contptr(f))
- #endif
- 
- #define tvchecklive(L, o) \
-@@ -978,6 +995,6 @@
- 
- /* Compare two objects without calling metamethods. */
- LJ_FUNC int LJ_FASTCALL lj_obj_equal(cTValue *o1, cTValue *o2);
--LJ_FUNC const void * LJ_FASTCALL lj_obj_ptr(cTValue *o);
-+LJ_FUNC const void * LJ_FASTCALL lj_obj_ptr(global_State *g, cTValue *o);
- 
- #endif
-Index: luajit/src/lj_snap.c
-===================================================================
---- luajit.orig/src/lj_snap.c
-+++ luajit/src/lj_snap.c
-@@ -626,7 +626,12 @@
-   IRType1 t = ir->t;
-   RegSP rs = ir->prev;
-   if (irref_isk(ref)) {  /* Restore constant slot. */
--    lj_ir_kvalue(J->L, o, ir);
-+    if (ir->o == IR_KPTR) {
-+      o->u64 = (uint64_t)(uintptr_t)ir_kptr(ir);
-+    } else {
-+      lua_assert(!(ir->o == IR_KKPTR || ir->o == IR_KNULL));
-+      lj_ir_kvalue(J->L, o, ir);
-+    }
-     return;
-   }
-   if (LJ_UNLIKELY(bloomtest(rfilt, ref)))
-Index: luajit/src/lj_state.c
-===================================================================
---- luajit.orig/src/lj_state.c
-+++ luajit/src/lj_state.c
-@@ -171,6 +171,12 @@
-   lj_mem_freevec(g, g->strhash, g->strmask+1, GCRef);
-   lj_buf_free(g, &g->tmpbuf);
-   lj_mem_freevec(g, tvref(L->stack), L->stacksize, TValue);
-+#if LJ_64
-+  if (mref(g->gc.lightudseg, uint32_t)) {
-+    MSize segnum = g->gc.lightudnum ? (2 << lj_fls(g->gc.lightudnum)) : 2;
-+    lj_mem_freevec(g, mref(g->gc.lightudseg, uint32_t), segnum, uint32_t);
-+  }
-+#endif
-   lua_assert(g->gc.total == sizeof(GG_State));
- #ifndef LUAJIT_USE_SYSMALLOC
-   if (g->allocf == lj_alloc_f)
-Index: luajit/src/lj_strfmt.c
-===================================================================
---- luajit.orig/src/lj_strfmt.c
-+++ luajit/src/lj_strfmt.c
-@@ -393,7 +393,7 @@
-       p = lj_buf_wmem(p, "builtin#", 8);
-       p = lj_strfmt_wint(p, funcV(o)->c.ffid);
-     } else {
--      p = lj_strfmt_wptr(p, lj_obj_ptr(o));
-+      p = lj_strfmt_wptr(p, lj_obj_ptr(G(L), o));
-     }
-     return lj_str_new(L, buf, (size_t)(p - buf));
-   }
diff --git a/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch b/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
index 96e4c9106acf9..ac2a967c974d4 100644
--- a/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
+++ b/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
@@ -1,16 +1,8 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Tue, 17 Nov 2015 16:27:11 +0100
-Subject: Enable debugging symbols in the build
-
----
- src/Makefile | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git src/Makefile src/Makefile
-index 8a38efd..6b73a89 100644
+diff --git a/src/Makefile b/src/Makefile
+index 3a6a432..b606927 100644
 --- a/src/Makefile
 +++ b/src/Makefile
-@@ -54,9 +54,9 @@ CCOPT_arm64=
+@@ -53,9 +53,9 @@ CCOPT_arm64=
  CCOPT_ppc=
  CCOPT_mips=
  #
diff --git a/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch b/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
deleted file mode 100644
index f53e211071063..0000000000000
--- a/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
+++ /dev/null
@@ -1,33 +0,0 @@
---- a/src/jit/bcsave.lua	2018-12-17 19:06:27.215042417 +0100
-+++ b/src/jit/bcsave.lua	2018-12-17 19:17:12.982477945 +0100
-@@ -64,7 +64,7 @@
- 
- local map_arch = {
-   x86 = true, x64 = true, arm = true, arm64 = true, arm64be = true,
--  ppc = true, mips = true, mipsel = true,
-+  ppc = true, ppc64 = true, ppc64le = true, mips = true, mipsel = true,
- }
- 
- local map_os = {
-@@ -200,9 +200,10 @@
- ]]
-   local symname = LJBC_PREFIX..ctx.modname
-   local is64, isbe = false, false
--  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" then
-+  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" or ctx.arch == "ppc64" or ctx.arch == "ppc64le" then
-     is64 = true
--  elseif ctx.arch == "ppc" or ctx.arch == "mips" then
-+  end
-+  if ctx.arch == "ppc" or ctx.arch == "ppc64" or ctx.arch == "mips" then
-     isbe = true
-   end
- 
-@@ -237,7 +238,7 @@
-   hdr.eendian = isbe and 2 or 1
-   hdr.eversion = 1
-   hdr.type = f16(1)
--  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, mips=8, mipsel=8 })[ctx.arch])
-+  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, ppc64=21, ppc64le=21, mips=8, mipsel=8 })[ctx.arch])
-   if ctx.arch == "mips" or ctx.arch == "mipsel" then
-     hdr.flags = f32(0x50001006)
-   end
diff --git a/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch b/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
index 59e1ee72fcbb8..2527bbef29961 100644
--- a/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
+++ b/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
@@ -1,18 +1,8 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Thu, 19 Nov 2015 16:29:02 +0200
-Subject: Get rid of LUAJIT_VERSION_SYM that changes ABI on every patch release
-
----
- src/lj_dispatch.c | 5 -----
- src/luajit.c      | 2 --
- src/luajit.h      | 3 ---
- 3 files changed, 10 deletions(-)
-
-diff --git src/lj_dispatch.c src/lj_dispatch.c
-index 5d6795f..e865a78 100644
+diff --git a/src/lj_dispatch.c b/src/lj_dispatch.c
+index b9748bba..d09238f8 100644
 --- a/src/lj_dispatch.c
 +++ b/src/lj_dispatch.c
-@@ -319,11 +319,6 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
+@@ -318,11 +318,6 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
    return 1;  /* OK. */
  }
  
@@ -24,28 +14,28 @@ index 5d6795f..e865a78 100644
  /* -- Hooks --------------------------------------------------------------- */
  
  /* This function can be called asynchronously (e.g. during a signal). */
-diff --git src/luajit.c src/luajit.c
-index 1ca2430..ccf425e 100644
+diff --git a/src/luajit.c b/src/luajit.c
+index 73e29d44..31fdba18 100644
 --- a/src/luajit.c
 +++ b/src/luajit.c
-@@ -516,8 +516,6 @@ static int pmain(lua_State *L)
+@@ -515,7 +515,6 @@ static int pmain(lua_State *L)
+   int argn;
+   int flags = 0;
    globalL = L;
-   if (argv[0] && argv[0][0]) progname = argv[0];
- 
 -  LUAJIT_VERSION_SYM();  /* Linker-enforced version check. */
--
+ 
    argn = collectargs(argv, &flags);
    if (argn < 0) {  /* Invalid args? */
-     print_usage();
-diff --git src/luajit.h src/luajit.h
-index 708a5a1..35ae02c 100644
---- a/src/luajit.h
-+++ b/src/luajit.h
-@@ -73,7 +73,4 @@ LUA_API void luaJIT_profile_stop(lua_State *L);
+diff --git a/src/luajit_rolling.h b/src/luajit_rolling.h
+index 2d04402c..5ab4167d 100644
+--- a/src/luajit_rolling.h
++++ b/src/luajit_rolling.h
+@@ -73,8 +73,5 @@ LUA_API void luaJIT_profile_stop(lua_State *L);
  LUA_API const char *luaJIT_profile_dumpstack(lua_State *L, const char *fmt,
  					     int depth, size_t *len);
  
 -/* Enforce (dynamic) linker error for version mismatches. Call from main. */
 -LUA_API void LUAJIT_VERSION_SYM(void);
 -
+ #error "DO NOT USE luajit_rolling.h -- only include build-generated luajit.h"
  #endif
diff --git a/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch b/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch
deleted file mode 100644
index aedaacbaaea46..0000000000000
--- a/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch
+++ /dev/null
@@ -1,21 +0,0 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Wed, 11 Oct 2017 08:42:41 +0000
-Subject: Make ccall_copy_struct static to unpollute global library namespace
-
----
- src/lj_ccall.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git src/lj_ccall.c src/lj_ccall.c
-index b891591..a7dcc1b 100644
---- a/src/lj_ccall.c
-+++ b/src/lj_ccall.c
-@@ -960,7 +960,7 @@ noth:  /* Not a homogeneous float/double aggregate. */
-   return 0;  /* Struct is in GPRs. */
- }
- 
--void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft)
-+static void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft)
- {
-   if (LJ_ABI_SOFTFP ? ft :
-       ((ft & 3) == FTYPE_FLOAT || (ft >> 2) == FTYPE_FLOAT)) {
diff --git a/srcpkgs/LuaJIT/template b/srcpkgs/LuaJIT/template
index 85449ac3d6f73..083696c7082d5 100644
--- a/srcpkgs/LuaJIT/template
+++ b/srcpkgs/LuaJIT/template
@@ -1,71 +1,58 @@
 # Template file for 'LuaJIT'
 pkgname=LuaJIT
-version=2.1.0beta3
-revision=2
-_so_version=2.1.0
-_dist_version=${_so_version}-beta3
+version=2.1.1706708390
+revision=1
+_commit_hash=9cc2e42b17148036d7d9ef36ab7afe52df345163
+build_style=gnu-makefile
 hostmakedepends="lua52-BitOp"
 short_desc="Just-In-Time Compiler for Lua"
 maintainer="Orphaned <orphan@voidlinux.org>"
 license="MIT"
 homepage="http://www.luajit.org"
-distfiles="http://luajit.org/download/${pkgname}-${_dist_version}.tar.gz"
-checksum=1ad2e34b111c802f9d0cdf019e986909123237a28c746b21295b63c9e785d9c3
+distfiles="https://repo.or.cz/luajit-2.0.git/snapshot/${_commit_hash}.tar.gz"
+checksum=a95ae5c59e2f9c0523f0a1b559e847cd341905547d1c6c6e2a18d780b7a5ffba
 
 build_options="lua52compat"
+desc_option_lua52compat="higher compatibility with lua 5.2"
 
-_cross_cc="cc"
-if [ "$XBPS_WORDSIZE" != "$XBPS_TARGET_WORDSIZE" ]; then
-	if [ "${XBPS_MACHINE/-musl/}" = "x86_64" ]; then
-		hostmakedepends+=" cross-i686-linux-musl"
-		_cross_cc="i686-linux-musl-gcc -static"
-	else
-		broken="Host and target wordsize must match"
+_host_cc="cc"
+if [ -n "$CROSS_BUILD" ]; then
+	if [ "$XBPS_WORDSIZE" != "$XBPS_TARGET_WORDSIZE" ]; then
+		if [ "${XBPS_MACHINE%%-*}" = "x86_64" ]; then
+			hostmakedepends+=" cross-i686-linux-musl"
+			_host_cc="i686-linux-musl-gcc -static"
+		else
+			broken="Host and target wordsize must match when not on x86_64"
+		fi
 	fi
-fi
 
-# the ppc64 patchset subtly breaks ppc, needs investigation; for
-# now apply patches conditionally, separately for ppc64 and ppc
-post_patch() {
-	local patchdir
+	make_build_args+=" CROSS=${XBPS_CROSS_TRIPLET}-"
+fi
 
-	case "$XBPS_TARGET_MACHINE" in
-		ppc64*) patchdir="ppc64";;
-		ppc*) patchdir="ppc";;
-		*) return;;
-	esac
+pre_build() {
+	if [ "$build_option_lua52compat" ]; then
+		make_build_args+=" XCFLAGS=-DLUAJIT_ENABLE_LUA52COMPAT"
+	fi
 
-	for i in ${FILESDIR}/patches/${patchdir}/*.patch; do
-		msg_normal "patching: $i\n"
-		patch -sNp1 -i ${i}
-	done
+	# luajit gets its lowest version from this file if not using git
+	printf '%s' "${version##*.}" > "${wrksrc}/.relver"
 }
 
 do_build() {
+	# if we don't unset, the build fails to cross-compile
+	# due to confliction with the makefile macros
 	local _cflags=$CFLAGS
 	local _ldflags=$LDFLAGS
-
-	if [ "$CROSS_BUILD" ]; then
-		local cross="CROSS=${XBPS_CROSS_TRIPLET}-"
-	fi
-
-	if [ "$build_option_lua52compat" ]; then
-		local _xcflags="XCFLAGS=-DLUAJIT_ENABLE_LUA52COMPAT"
-	fi
-
 	unset CFLAGS LDFLAGS
-	make ${makejobs} PREFIX=/usr HOST_LUA=lua5.2 HOST_CC="${_cross_cc}" \
+
+	make ${makejobs} PREFIX=/usr HOST_LUA=lua5.2 \
 		HOST_CFLAGS="$XBPS_CFLAGS" HOST_LDFLAGS="$XBPS_LDFLAGS" \
 		TARGET_CFLAGS="${_cflags}" TARGET_LDFLAGS="${_ldflags}" \
-		${_xcflags} ${cross}
+		HOST_CC="${_host_cc}" ${make_build_args}
 }
 
-do_install() {
-	make DPREFIX=${DESTDIR}/usr DESTDIR=${DESTDIR} \
-		INSTALL_SHARE=${DESTDIR}/usr/share PREFIX=/usr install
-
+post_install() {
 	mv ${DESTDIR}/usr/bin/luajit-* ${DESTDIR}/usr/bin/luajit
-	ln -fs libluajit-5.1.so.${_so_version} ${DESTDIR}/usr/lib/libluajit-5.1.so.2
 	vlicense COPYRIGHT
 }
 
diff --git a/srcpkgs/LuaJIT/update b/srcpkgs/LuaJIT/update
index 15613910677c9..96bbbd453c32c 100644
--- a/srcpkgs/LuaJIT/update
+++ b/srcpkgs/LuaJIT/update
@@ -1 +1 @@
-site="http://luajit.org/download.html"
+disabled="impossible to determine given release style and versioning scheme"

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (7 preceding siblings ...)
  2024-01-31 18:29 ` [PR PATCH] [Updated] " yoshiyoshyosh
@ 2024-02-15 13:38 ` Calandracas606
  2024-02-16 16:25 ` Calandracas606
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-15 13:38 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 244 bytes --]

New review comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1491022279

Comment:
Could using a version based on `git show` cause problems if the version number goes lower?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (8 preceding siblings ...)
  2024-02-15 13:38 ` [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup Calandracas606
@ 2024-02-16 16:25 ` Calandracas606
  2024-02-16 16:55 ` [PR REVIEW] " Calandracas606
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-16 16:25 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 717 bytes --]

New comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#issuecomment-1948804135

Comment:
Upstream recommends that distributions use the "amalg" build target:

https://luajit.org/install.html
```
The build system has a special target for an amalgamated build, i.e. make amalg. 
This compiles the LuaJIT core as one huge C file and allows GCC to generate faster and shorter code.
Alas, this requires lots of memory during the build. This may be a problem for some users,
that's why it's not enabled by default. But it shouldn't be a problem for most build farms.
It's recommended that binary distributions use this target for their LuaJIT builds. 
```
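
As a rough sketch of what this could look like in the template (an assumption on my part, not something this PR does): xbps-src's `make_build_target` variable names the target handed to `make`, but since this template overrides `do_build`, the target would probably have to be appended to the `make` call there by hand.

```shell
# Hypothetical template fragment; not part of this PR.
# make_build_target is the xbps-src variable naming the make target;
# "amalg" is the amalgamated build target from upstream's install notes.
make_build_target="amalg"

# With the custom do_build in this template, the target would be appended
# to the existing invocation, roughly:
#   make ${makejobs} PREFIX=/usr ... ${make_build_args} ${make_build_target}
```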

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (9 preceding siblings ...)
  2024-02-16 16:25 ` Calandracas606
@ 2024-02-16 16:55 ` Calandracas606
  2024-02-16 17:01 ` Calandracas606
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-16 16:55 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 227 bytes --]

New review comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1492743913

Comment:
Also the latest commit hash is: 0d313b243194a0b8d2399d8b549ca5a0ff234db5

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (10 preceding siblings ...)
  2024-02-16 16:55 ` [PR REVIEW] " Calandracas606
@ 2024-02-16 17:01 ` Calandracas606
  2024-02-16 17:18 ` Calandracas606
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-16 17:01 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 359 bytes --]

New review comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1492750454

Comment:
It's not a problem: it's using `git show -s --format=%ct`, which returns a Unix timestamp. Maybe there should be a comment in the template explaining where the version comes from and how to generate it.
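
For reference, a minimal sketch of how the version string could be put together. The helper name `luajit_version` is made up for illustration; the timestamp is the one from this PR's title, and in a LuaJIT checkout it would come from `git show -s --format=%ct <commit>`.

```shell
# Hypothetical helper: join major.minor and a commit timestamp into
# the major.minor.relver form used for the package version.
luajit_version() {
	# $1: major.minor (e.g. 2.1)
	# $2: unix commit timestamp, e.g. from: git show -s --format=%ct <commit>
	printf '%s.%s\n' "$1" "$2"
}

luajit_version 2.1 1706708390   # prints 2.1.1706708390
```

As long as newer commits carry later committer dates, each update should compare higher than the previous one.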

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (11 preceding siblings ...)
  2024-02-16 17:01 ` Calandracas606
@ 2024-02-16 17:18 ` Calandracas606
  2024-02-16 17:24 ` Calandracas606
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-16 17:18 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 234 bytes --]

New review comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1492771445

Comment:
This shouldn't be needed; the makefile creates this symlink automatically.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (12 preceding siblings ...)
  2024-02-16 17:18 ` Calandracas606
@ 2024-02-16 17:24 ` Calandracas606
  2024-02-16 18:21 ` Calandracas606
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-16 17:24 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 293 bytes --]

New review comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1492777653

Comment:
When downloading the snapshot from `https://repo.or.cz/luajit-2.0.git/snapshot`, the `.relver` file already exists with the proper revision.
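
If the template keeps a fallback anyway, it could be guarded so that the file shipped in the snapshot wins. A sketch, with `write_relver` as a made-up helper name; the `${version##*.}` expansion is the one already used in the template's `pre_build`.

```shell
# Hypothetical helper: only write .relver when the tarball did not ship one.
write_relver() {
	# $1: source directory, $2: package version (major.minor.relver)
	if [ -s "$1/.relver" ]; then
		return 0   # keep the file from the snapshot
	fi
	# strip everything up to the last dot, leaving the relver part
	printf '%s' "${2##*.}" > "$1/.relver"
}
```

In the template this would be called as `write_relver "${wrksrc}" "${version}"`.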

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (13 preceding siblings ...)
  2024-02-16 17:24 ` Calandracas606
@ 2024-02-16 18:21 ` Calandracas606
  2024-02-16 18:29 ` Calandracas606
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-16 18:21 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

New review comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1492835731

Comment:
Maybe the version should just be "2.1" and not include the timestamp?

As mentioned in my other comment, `.relver` is already included in the tarball.

That way, whenever this package updates, we could just bump the revision, rather than also needing to figure out the Unix timestamp.

This is just a thought; I don't know if this is correct or what the preferred way of doing this would be.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (14 preceding siblings ...)
  2024-02-16 18:21 ` Calandracas606
@ 2024-02-16 18:29 ` Calandracas606
  2024-02-19  5:00 ` yoshiyoshyosh
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-16 18:29 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

New review comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1492843461

Comment:
As you mentioned, it's probably possible to leave this up to xbps-src to add this flag when building `-dbg`.

The debug flag `-g` is added to CFLAGS automatically by xbps-src; see `common/xbps-src/shutils/common.sh`.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (15 preceding siblings ...)
  2024-02-16 18:29 ` Calandracas606
@ 2024-02-19  5:00 ` yoshiyoshyosh
  2024-02-19  5:06 ` yoshiyoshyosh
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-02-19  5:00 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 620 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1494011424

Comment:
I don't believe that we should set the version to simply "2.1" and revbump for updates.

For one, it obscures when this is actually updated or when something simply needs to change in the template/shared libs/etc.

For another, when luajit builds and installs, the shared lib has this version, the makefile echoes this as the "minor" version, and so on. By all means, the `git show` output *is* the canonical minor version, and it should be reflected in the template.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (16 preceding siblings ...)
  2024-02-19  5:00 ` yoshiyoshyosh
@ 2024-02-19  5:06 ` yoshiyoshyosh
  2024-02-19  5:06 ` yoshiyoshyosh
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-02-19  5:06 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 167 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1494013818

Comment:
That it does!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (17 preceding siblings ...)
  2024-02-19  5:06 ` yoshiyoshyosh
@ 2024-02-19  5:06 ` yoshiyoshyosh
  2024-02-19  5:20 ` yoshiyoshyosh
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-02-19  5:06 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 179 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1494013818

Comment:
That it does! Nice catch!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (18 preceding siblings ...)
  2024-02-19  5:06 ` yoshiyoshyosh
@ 2024-02-19  5:20 ` yoshiyoshyosh
  2024-02-19  5:24 ` yoshiyoshyosh
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-02-19  5:20 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 283 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1494022012

Comment:
Yeah, I kinda just grandfathered this patch in without thinking about it too much, but the debug symbols do indeed generate without it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (19 preceding siblings ...)
  2024-02-19  5:20 ` yoshiyoshyosh
@ 2024-02-19  5:24 ` yoshiyoshyosh
  2024-02-19  5:35 ` [PR PATCH] [Updated] " yoshiyoshyosh
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-02-19  5:24 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

New review comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#discussion_r1494011424

Comment:
I don't believe that we should set the version to simply "2.1" and revbump for updates.

For one, it obscures when this is actually updated or when something simply needs to change in the template/shared libs/etc.

For another, when luajit builds and installs, the shared lib has this version, the makefile echoes this as the "relver" version, and so on. By all means, the `git show` output *is* the canonical bugfix/relver version, and it should be reflected in the template.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PR PATCH] [Updated] LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (20 preceding siblings ...)
  2024-02-19  5:24 ` yoshiyoshyosh
@ 2024-02-19  5:35 ` yoshiyoshyosh
  2024-02-19 12:21 ` Calandracas606
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-02-19  5:35 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 2770 bytes --]

There is an updated pull request by yoshiyoshyosh against master on the void-packages repository

https://github.com/yoshiyoshyosh/void-packages luajit-2.1-rolling
https://github.com/void-linux/void-packages/pull/48453

LuaJIT: update to 2.1.1706708390, cleanup.
#### Testing the changes
- I tested the changes in this PR: **briefly**
  - The only lua thing I really run is awesomewm. I built awesomewm against luajit with the build option and everything seems good, but of course any further testing is encouraged.

#### Local build testing
- I built this PR locally for my native architecture, (`x86_64-glibc`)
- I built this PR locally for these architectures (if supported. mark crossbuilds):
  - `x86_64-musl`
  - `i686-glibc` (both crossbuild and masterdir)
  - `aarch64-glibc` (crossbuild)
  - `aarch64-musl` (crossbuild)
  - `armv7l-glibc` (crossbuild)

This addresses #48349.

LuaJIT has moved to "rolling releases" on branches in their git repo, which basically means releases are git commits to a `v2.1` branch. Of course, this is incompatible with void's packaging philosophy. However, there also seems to be a `v2.1` *tag* that was created during the move and has not been updated since. I'm unsure whether this tag is simply meant to mark the start of v2.1 in the new rolling release era, or whether they intend it to be a stable tag that "releases" occasionally get pushed to.
Whatever the case, this is a tag that was "released" in a form they seemingly deem stable enough, which is why I think it is good enough to update to (especially since we'd be going from a 6-year-old version to a 5-month-old one).

Regarding the version number: in the makefiles, there is a `RELVER` macro [that gets set by a `git show` command](https://repo.or.cz/luajit-2.0.git/blob/2090842410e0ba6f81fad310a77bf5432488249a:/src/Makefile#l478). The "canonical" version number in the makefiles then becomes `major.minor.relver`, and the binary/library is installed with this version number. This is the only real patch version number that we have, so it's what I believe should go in the version number. I just used the same `git show` that they use and baked the result into `version`.

I removed all but two of the patches, as they have either been upstreamed into the `v2.1` tag or were for powerpc, which void doesn't support anymore. Should we even have the "enable debug symbols" patch for main repo builds instead of leaving it to `-dbg`? I'm only keeping it because every other distribution keeps it.

I also just cleaned up the template in general; it's more concise and organized IMO while achieving the same thing.

A patch file from https://github.com/void-linux/void-packages/pull/48453.patch is attached

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: github-pr-luajit-2.1-rolling-48453.patch --]
[-- Type: text/x-diff, Size: 158714 bytes --]

From 4939cfc160c0e4093029186f08d075f5c76aad1f Mon Sep 17 00:00:00 2001
From: yosh <yosh-git@riseup.net>
Date: Wed, 31 Jan 2024 02:54:09 -0500
Subject: [PATCH] LuaJIT: update to 2.1.1706708390, cleanup.

---
 .../patches/ppc/musl-ppc-secureplt.patch      |   93 -
 .../patches/ppc64/add-ppc64-support.patch     | 3521 -----------------
 .../patches/ppc64/fix-vm-jit-ppc64.patch      |   11 -
 .../aarch64-Fix-exit-stub-patching.patch      |  231 --
 .../aarch64-register-allocation-bug-fix.patch |   29 -
 ...1abec542e6f9851ff2368e7f196b6382a44c.patch |  562 ---
 .../LuaJIT/patches/enable-debug-symbols.patch |   24 -
 srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch |   33 -
 .../get-rid-of-luajit-version-sym.patch       |   40 +-
 .../patches/unpollute-global-namespace.patch  |   21 -
 srcpkgs/LuaJIT/template                       |   75 +-
 srcpkgs/LuaJIT/update                         |    2 +-
 12 files changed, 46 insertions(+), 4596 deletions(-)
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
 delete mode 100644 srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
 delete mode 100644 srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch

diff --git a/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch b/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
deleted file mode 100644
index 3000ca0ed3d53..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc/musl-ppc-secureplt.patch
+++ /dev/null
@@ -1,93 +0,0 @@
-Imported from https://github.com/LuaJIT/LuaJIT/pull/486.
-
-This fixes crashes on ppc-musl, as musl only supports secureplt.
-
---- a/src/lj_dispatch.c
-+++ b/src/lj_dispatch.c
-@@ -56,6 +56,18 @@ static const ASMFunction dispatch_got[] = {
- #undef GOTFUNC
- #endif
- 
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+#include <math.h>
-+LJ_FUNCA_NORET void LJ_FASTCALL lj_ffh_coroutine_wrap_err(lua_State *L,
-+							  lua_State *co);
-+
-+#define GOTFUNC(name)	(ASMFunction)name,
-+static const ASMFunction dispatch_got[] = {
-+  GOTDEF(GOTFUNC)
-+};
-+#undef GOTFUNC
-+#endif
-+
- /* Initialize instruction dispatch table and hot counters. */
- void lj_dispatch_init(GG_State *GG)
- {
-@@ -77,6 +89,9 @@ void lj_dispatch_init(GG_State *GG)
- #if LJ_TARGET_MIPS
-   memcpy(GG->got, dispatch_got, LJ_GOT__MAX*sizeof(ASMFunction *));
- #endif
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+  memcpy(GG->got, dispatch_got, LJ_GOT__MAX*4);
-+#endif
- }
- 
- #if LJ_HASJIT
---- a/src/lj_dispatch.h
-+++ b/src/lj_dispatch.h
-@@ -66,6 +66,21 @@ GOTDEF(GOTENUM)
- };
- #endif
- 
-+#if LJ_TARGET_PPC && (LJ_ARCH_BITS == 32)
-+/* Need our own global offset table for the dreaded MIPS calling conventions. */
-+#define GOTDEF(_) \
-+  _(floor) _(ceil) _(trunc) _(log) _(log10) _(exp) _(sin) _(cos) _(tan) \
-+  _(asin) _(acos) _(atan) _(sinh) _(cosh) _(tanh) _(frexp) _(modf) _(atan2) \
-+  _(pow) _(fmod) _(ldexp) _(sqrt)
-+
-+enum {
-+#define GOTENUM(name) LJ_GOT_##name,
-+GOTDEF(GOTENUM)
-+#undef GOTENUM
-+  LJ_GOT__MAX
-+};
-+#endif
-+
- /* Type of hot counter. Must match the code in the assembler VM. */
- /* 16 bits are sufficient. Only 0.0015% overhead with maximum slot penalty. */
- typedef uint16_t HotCount;
-@@ -89,7 +104,7 @@ typedef uint16_t HotCount;
- typedef struct GG_State {
-   lua_State L;				/* Main thread. */
-   global_State g;			/* Global state. */
--#if LJ_TARGET_MIPS
-+#if LJ_TARGET_MIPS || (LJ_TARGET_PPC && (LJ_ARCH_BITS == 32))
-   ASMFunction got[LJ_GOT__MAX];		/* Global offset table. */
- #endif
- #if LJ_HASJIT
---- a/src/vm_ppc.dasc
-+++ b/src/vm_ppc.dasc
-@@ -59,7 +59,12 @@
- |.define ENV_OFS,	8
- |.endif
- |.else  // No TOC.
--|.macro blex, target; bl extern target@plt; .endmacro
-+|.macro blex, target
-+|  lwz TMP0, DISPATCH_GOT(target)(DISPATCH)
-+|  mtctr TMP0
-+|  bctrl
-+|  //bl extern target@plt
-+|.endmacro
- |.macro .toc, a, b; .endmacro
- |.endif
- |.macro .tocenv, a, b; .if TOCENV; a, b; .endif; .endmacro
-@@ -448,6 +453,8 @@
- |// Assumes DISPATCH is relative to GL.
- #define DISPATCH_GL(field)	(GG_DISP2G + (int)offsetof(global_State, field))
- #define DISPATCH_J(field)	(GG_DISP2J + (int)offsetof(jit_State, field))
-+#define GG_DISP2GOT		(GG_OFS(got) - GG_OFS(dispatch))
-+#define DISPATCH_GOT(name)	(GG_DISP2GOT + 4*LJ_GOT_##name)
- |
- #define PC2PROTO(field)  ((int)offsetof(GCproto, field)-(int)sizeof(GCproto))
- |
diff --git a/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch b/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
deleted file mode 100644
index 7c865859da923..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc64/add-ppc64-support.patch
+++ /dev/null
@@ -1,3521 +0,0 @@
-From: "Rodrigo R. Galvao" <rosattig@br.ibm.com>
-Date: Wed, 11 Oct 2017 08:41:47 +0000
-Subject: New patch proposal for PPC64 support
-
- Create a patch for PPC64 support based on 
-https://github.com/LuaJIT/LuaJIT/pull/140.
- It replaces the old patch since this new one is more likely to be merged 
-with luajit upstream.
-
-
-Author: Rodrigo R. Galvao <rosattig@br.ibm.com>
----
- dynasm/dasm_ppc.lua    |    5 +
- src/Makefile           |   11 +-
- src/host/buildvm_asm.c |   16 +-
- src/lj_arch.h          |   18 +-
- src/lj_ccall.c         |  166 ++++++-
- src/lj_ccall.h         |   13 +
- src/lj_ccallback.c     |   68 ++-
- src/lj_ctype.h         |    2 +-
- src/lj_def.h           |    4 +
- src/lj_frame.h         |    9 +
- src/lj_target_ppc.h    |   14 +
- src/vm_ppc.dasc        | 1290 ++++++++++++++++++++++++++++++++----------------
- 12 files changed, 1162 insertions(+), 454 deletions(-)
-
-diff --git dynasm/dasm_ppc.lua dynasm/dasm_ppc.lua
-index f73974d..a4ad70b 100644
---- a/dynasm/dasm_ppc.lua
-+++ b/dynasm/dasm_ppc.lua
-@@ -257,9 +257,11 @@ map_op = {
-   addic_3 =	"30000000RRI",
-   ["addic._3"] = "34000000RRI",
-   addi_3 =	"38000000RR0I",
-+  addil_3 =	"38000000RR0J",
-   li_2 =	"38000000RI",
-   la_2 =	"38000000RD",
-   addis_3 =	"3c000000RR0I",
-+  addisl_3 =	"3c000000RR0J",
-   lis_2 =	"3c000000RI",
-   lus_2 =	"3c000000RU",
-   bc_3 =	"40000000AAK",
-@@ -842,6 +844,9 @@ map_op = {
-   srdi_3 =	op_alias("rldicl_4", function(p)
-     p[4] = p[3]; p[3] = "64-("..p[3]..")"
-   end),
-+  ["srdi._3"] =	op_alias("rldicl._4", function(p)
-+    p[4] = p[3]; p[3] = "64-("..p[3]..")"
-+  end),
-   clrldi_3 =	op_alias("rldicl_4", function(p)
-     p[4] = p[3]; p[3] = "0"
-   end),
-diff --git src/Makefile src/Makefile
-index 6b73a89..cc50bae 100644
---- a/src/Makefile
-+++ b/src/Makefile
-@@ -453,7 +453,16 @@ ifeq (ppc,$(TARGET_LJARCH))
-     DASM_AFLAGS+= -D GPR64
-   endif
-   ifeq (PS3,$(TARGET_SYS))
--    DASM_AFLAGS+= -D PPE -D TOC
-+    DASM_AFLAGS+= -D PPE
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_OPD 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D OPD
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_OPDENV 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D OPDENV
-+  endif
-+  ifneq (,$(findstring LJ_ARCH_PPC_ELFV2 1,$(TARGET_TESTARCH)))
-+    DASM_AFLAGS+= -D ELFV2
-   endif
-   ifneq (,$(findstring LJ_ARCH_PPC64 ,$(TARGET_TESTARCH)))
-     DASM_ARCH= ppc64
-diff --git src/host/buildvm_asm.c src/host/buildvm_asm.c
-index ffd1490..6bb995e 100644
---- a/src/host/buildvm_asm.c
-+++ b/src/host/buildvm_asm.c
-@@ -140,18 +140,14 @@ static void emit_asm_wordreloc(BuildCtx *ctx, uint8_t *p, int n,
- #else
- #define TOCPREFIX ""
- #endif
--  if ((ins >> 26) == 16) {
-+  if ((ins >> 26) == 14) {
-+    fprintf(ctx->fp, "\taddi %d,%d,%s\n", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-+  } else if ((ins >> 26) == 15) {
-+    fprintf(ctx->fp, "\taddis %d,%d,%s\n", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-+  } else if ((ins >> 26) == 16) {
-     fprintf(ctx->fp, "\t%s %d, %d, " TOCPREFIX "%s\n",
- 	    (ins & 1) ? "bcl" : "bc", (ins >> 21) & 31, (ins >> 16) & 31, sym);
-   } else if ((ins >> 26) == 18) {
--#if LJ_ARCH_PPC64
--    const char *suffix = strchr(sym, '@');
--    if (suffix && suffix[1] == 'h') {
--      fprintf(ctx->fp, "\taddis 11, 2, %s\n", sym);
--    } else if (suffix && suffix[1] == 'l') {
--      fprintf(ctx->fp, "\tld 12, %s\n", sym);
--    } else
--#endif
-     fprintf(ctx->fp, "\t%s " TOCPREFIX "%s\n", (ins & 1) ? "bl" : "b", sym);
-   } else {
-     fprintf(stderr,
-@@ -250,7 +246,7 @@ void emit_asm(BuildCtx *ctx)
-   int i, rel;
- 
-   fprintf(ctx->fp, "\t.file \"buildvm_%s.dasc\"\n", ctx->dasm_arch);
--#if LJ_ARCH_PPC64
-+#if LJ_ARCH_PPC_ELFV2
-   fprintf(ctx->fp, "\t.abiversion 2\n");
- #endif
-   fprintf(ctx->fp, "\t.text\n");
-diff --git src/lj_arch.h src/lj_arch.h
-index d609b37..53bc651 100644
---- a/src/lj_arch.h
-+++ b/src/lj_arch.h
-@@ -269,10 +269,18 @@
- #if LJ_TARGET_CONSOLE
- #define LJ_ARCH_PPC32ON64	1
- #define LJ_ARCH_NOFFI		1
-+#if LJ_TARGET_PS3
-+#define LJ_ARCH_PPC_OPD		1
-+#endif
- #elif LJ_ARCH_BITS == 64
--#define LJ_ARCH_PPC64		1
--#define LJ_TARGET_GC64		1
-+#define LJ_ARCH_PPC32ON64	1
- #define LJ_ARCH_NOJIT		1	/* NYI */
-+#if _CALL_ELF == 2
-+#define LJ_ARCH_PPC_ELFV2	1
-+#else
-+#define LJ_ARCH_PPC_OPD		1
-+#define LJ_ARCH_PPC_OPDENV	1
-+#endif
- #endif
- 
- #if _ARCH_PWR7
-@@ -423,12 +431,6 @@
- #if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE)
- #error "No support for PowerPC CPUs without double-precision FPU"
- #endif
--#if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE
--#error "No support for little-endian PPC32"
--#endif
--#if LJ_ARCH_PPC64
--#error "No support for PowerPC 64 bit mode (yet)"
--#endif
- #ifdef __NO_FPRS__
- #error "No support for PPC/e500 anymore (use LuaJIT 2.0)"
- #endif
-diff --git src/lj_ccall.c src/lj_ccall.c
-index 5c252e5..b891591 100644
---- a/src/lj_ccall.c
-+++ b/src/lj_ccall.c
-@@ -369,21 +369,97 @@
- #elif LJ_TARGET_PPC
- /* -- PPC calling conventions --------------------------------------------- */
- 
-+#if LJ_ARCH_BITS == 64
-+
-+#if LJ_ARCH_PPC_ELFV2
-+
-+#define CCALL_HANDLE_STRUCTRET \
-+  if (sz > 16 && ccall_classify_fp(cts, ctr) <= 0) { \
-+    cc->retref = 1;  /* Return by reference. */ \
-+    cc->gpr[ngpr++] = (GPRArg)dp; \
-+  }
-+
-+#define CCALL_HANDLE_STRUCTRET2 \
-+  int isfp = ccall_classify_fp(cts, ctr); \
-+  int i; \
-+  if (isfp == FTYPE_FLOAT) { \
-+    for (i = 0; i < ctr->size / 4; i++) \
-+      ((float *)dp)[i] = cc->fpr[i]; \
-+  } else if (isfp == FTYPE_DOUBLE) { \
-+    for (i = 0; i < ctr->size / 8; i++) \
-+      ((double *)dp)[i] = cc->fpr[i]; \
-+  } else { \
-+    if (ctr->size < 8 && LJ_BE) { \
-+      sp += 8 - ctr->size; \
-+    } \
-+    memcpy(dp, sp, ctr->size); \
-+  }
-+
-+#else
-+
- #define CCALL_HANDLE_STRUCTRET \
-   cc->retref = 1;  /* Return all structs by reference. */ \
-   cc->gpr[ngpr++] = (GPRArg)dp;
- 
-+#endif
-+
- #define CCALL_HANDLE_COMPLEXRET \
-   /* Complex values are returned in 2 or 4 GPRs. */ \
-   cc->retref = 0;
- 
-+#define CCALL_HANDLE_STRUCTARG
-+
- #define CCALL_HANDLE_COMPLEXRET2 \
--  memcpy(dp, sp, ctr->size);  /* Copy complex from GPRs. */
-+  if (ctr->size == 2*sizeof(float)) {  /* Copy complex float from FPRs. */ \
-+    ((float *)dp)[0] = cc->fpr[0]; \
-+    ((float *)dp)[1] = cc->fpr[1]; \
-+  } else {  /* Copy complex double from FPRs. */ \
-+    ((double *)dp)[0] = cc->fpr[0]; \
-+    ((double *)dp)[1] = cc->fpr[1]; \
-+  }
-+
-+#define CCALL_HANDLE_COMPLEXARG \
-+  isfp = 1; \
-+  if (d->size == sizeof(float) * 2) { \
-+    d = ctype_get(cts, CTID_COMPLEX_DOUBLE); \
-+    isf32 = 1; \
-+  }
-+
-+#define CCALL_HANDLE_REGARG \
-+  if (isfp && d->size == sizeof(float)) { \
-+    d = ctype_get(cts, CTID_DOUBLE); \
-+    isf32 = 1; \
-+  } \
-+  if (ngpr < maxgpr) { \
-+   dp = &cc->gpr[ngpr]; \
-+   ngpr += n; \
-+   if (ngpr > maxgpr) { \
-+     nsp += ngpr - 8; \
-+     ngpr = 8; \
-+     if (nsp > CCALL_MAXSTACK) { \
-+       goto err_nyi; \
-+     } \
-+   } \
-+   goto done; \
-+  }
-+
-+#else
-+
-+#define CCALL_HANDLE_STRUCTRET \
-+  cc->retref = 1;  /* Return all structs by reference. */ \
-+  cc->gpr[ngpr++] = (GPRArg)dp;
-+
-+#define CCALL_HANDLE_COMPLEXRET \
-+  /* Complex values are returned in 2 or 4 GPRs. */ \
-+  cc->retref = 0;
- 
- #define CCALL_HANDLE_STRUCTARG \
-   rp = cdataptr(lj_cdata_new(cts, did, sz)); \
-   sz = CTSIZE_PTR;  /* Pass all structs by reference. */
- 
-+#define CCALL_HANDLE_COMPLEXRET2 \
-+  memcpy(dp, sp, ctr->size);  /* Copy complex from GPRs. */
-+
- #define CCALL_HANDLE_COMPLEXARG \
-   /* Pass complex by value in 2 or 4 GPRs. */
- 
-@@ -410,6 +486,8 @@
-     } \
-   }
- 
-+#endif
-+
- #define CCALL_HANDLE_RET \
-   if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \
-     ctr = ctype_get(cts, CTID_DOUBLE);  /* FPRs always hold doubles. */
-@@ -801,6 +879,50 @@ noth:  /* Not a homogeneous float/double aggregate. */
- 
- #endif
- 
-+/* -- PowerPC64 ELFv2 ABI struct classification ------------------- */
-+
-+#if LJ_ARCH_PPC_ELFV2
-+
-+#define FTYPE_FLOAT	1
-+#define FTYPE_DOUBLE	2
-+
-+static unsigned int ccall_classify_fp(CTState *cts, CType *ct) {
-+  if (ctype_isfp(ct->info)) {
-+    if (ct->size == sizeof(float))
-+      return FTYPE_FLOAT;
-+    else
-+      return FTYPE_DOUBLE;
-+  } else if (ctype_iscomplex(ct->info)) {
-+    if (ct->size == sizeof(float) * 2)
-+      return FTYPE_FLOAT;
-+    else
-+      return FTYPE_DOUBLE;
-+  } else if (ctype_isstruct(ct->info)) {
-+    int res = -1;
-+    int sz = ct->size;
-+    while (ct->sib) {
-+      ct = ctype_get(cts, ct->sib);
-+      if (ctype_isfield(ct->info)) {
-+        int sub = ccall_classify_fp(cts, ctype_rawchild(cts, ct));
-+        if (res == -1)
-+          res = sub;
-+        if (sub != -1 && sub != res)
-+          return 0;
-+      } else if (ctype_isbitfield(ct->info) ||
-+        ctype_isxattrib(ct->info, CTA_SUBTYPE)) {
-+        return 0;
-+      }
-+    }
-+    if (res > 0 && sz > res * 4 * 8)
-+      return 0;
-+    return res;
-+  } else {
-+    return 0;
-+  }
-+}
-+
-+#endif
-+
- /* -- MIPS64 ABI struct classification ---------------------------- */
- 
- #if LJ_TARGET_MIPS64
-@@ -974,6 +1096,9 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-     CTSize sz;
-     MSize n, isfp = 0, isva = 0;
-     void *dp, *rp = NULL;
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    int isf32 = 0;
-+#endif
- 
-     if (fid) {  /* Get argument type from field. */
-       CType *ctf = ctype_get(cts, fid);
-@@ -1030,7 +1155,37 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-       *(void **)dp = rp;
-       dp = rp;
-     }
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64 && LJ_BE
-+    if (ctype_isstruct(d->info) && sz < CTSIZE_PTR) {
-+      dp = (char *)dp + (CTSIZE_PTR - sz);
-+    }
-+#endif
-     lj_cconv_ct_tv(cts, d, (uint8_t *)dp, o, CCF_ARG(narg));
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if (isfp) {
-+      int i;
-+      for (i = 0; i < d->size / 8 && nfpr < CCALL_NARG_FPR; i++)
-+        cc->fpr[nfpr++] = ((double *)dp)[i];
-+    }
-+    if (isf32) {
-+      int i;
-+      for (i = 0; i < d->size / 8; i++)
-+        ((float *)dp)[i*2] = ((double *)dp)[i];
-+    }
-+#endif
-+#if LJ_ARCH_PPC_ELFV2
-+    if (ctype_isstruct(d->info)) {
-+      isfp = ccall_classify_fp(cts, d);
-+      int i;
-+      if (isfp == FTYPE_FLOAT) {
-+        for (i = 0; i < d->size / 4 && nfpr < CCALL_NARG_FPR; i++)
-+          cc->fpr[nfpr++] = ((float *)dp)[i];
-+      } else if (isfp == FTYPE_DOUBLE) {
-+        for (i = 0; i < d->size / 8 && nfpr < CCALL_NARG_FPR; i++)
-+          cc->fpr[nfpr++] = ((double *)dp)[i];
-+      }
-+    }
-+#endif
-     /* Extend passed integers to 32 bits at least. */
-     if (ctype_isinteger_or_bool(d->info) && d->size < 4) {
-       if (d->info & CTF_UNSIGNED)
-@@ -1044,6 +1199,15 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct,
-     if (isfp && d->size == sizeof(float))
-       ((float *)dp)[1] = ((float *)dp)[0];  /* Floats occupy high slot. */
- #endif
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if ((ctype_isinteger_or_bool(d->info) || ctype_isenum(d->info))
-+	&& d->size <= 4) {
-+      if (d->info & CTF_UNSIGNED)
-+	*(uint64_t *)dp = (uint64_t)*(uint32_t *)dp;
-+      else
-+        *(int64_t *)dp = (int64_t)*(int32_t *)dp;
-+    }
-+#endif
- #if LJ_TARGET_MIPS64 || (LJ_TARGET_ARM64 && LJ_BE)
-     if ((ctype_isinteger_or_bool(d->info) || ctype_isenum(d->info)
- #if LJ_TARGET_MIPS64
-diff --git src/lj_ccall.h src/lj_ccall.h
-index 59f6648..bbf309f 100644
---- a/src/lj_ccall.h
-+++ b/src/lj_ccall.h
-@@ -86,10 +86,23 @@ typedef union FPRArg {
- #elif LJ_TARGET_PPC
- 
- #define CCALL_NARG_GPR		8
-+#if LJ_ARCH_BITS == 64
-+#define CCALL_NARG_FPR		13
-+#if LJ_ARCH_PPC_ELFV2
-+#define CCALL_NRET_GPR		2
-+#define CCALL_NRET_FPR		8
-+#define CCALL_SPS_EXTRA		14
-+#else
-+#define CCALL_NRET_GPR		1
-+#define CCALL_NRET_FPR		2
-+#define CCALL_SPS_EXTRA		16
-+#endif
-+#else
- #define CCALL_NARG_FPR		8
- #define CCALL_NRET_GPR		4	/* For complex double. */
- #define CCALL_NRET_FPR		1
- #define CCALL_SPS_EXTRA		4
-+#endif
- #define CCALL_SPS_FREE		0
- 
- typedef intptr_t GPRArg;
-diff --git src/lj_ccallback.c src/lj_ccallback.c
-index 846827b..eb7f445 100644
---- a/src/lj_ccallback.c
-+++ b/src/lj_ccallback.c
-@@ -61,8 +61,24 @@ static MSize CALLBACK_OFS2SLOT(MSize ofs)
- 
- #elif LJ_TARGET_PPC
- 
-+#if LJ_ARCH_PPC_OPD
-+
-+#define CALLBACK_SLOT2OFS(slot)		(24*(slot))
-+#define CALLBACK_OFS2SLOT(ofs)		((ofs)/24)
-+#define CALLBACK_MAX_SLOT		(CALLBACK_OFS2SLOT(CALLBACK_MCODE_SIZE))
-+
-+#elif LJ_ARCH_PPC_ELFV2
-+
-+#define CALLBACK_SLOT2OFS(slot)		(4*(slot))
-+#define CALLBACK_OFS2SLOT(ofs)		((ofs)/4)
-+#define CALLBACK_MAX_SLOT		(CALLBACK_MCODE_SIZE/4 - 10)
-+
-+#else
-+
- #define CALLBACK_MCODE_HEAD		24
- 
-+#endif
-+
- #elif LJ_TARGET_MIPS32
- 
- #define CALLBACK_MCODE_HEAD		20
-@@ -188,24 +204,59 @@ static void callback_mcode_init(global_State *g, uint32_t *page)
-   lua_assert(p - page <= CALLBACK_MCODE_SIZE);
- }
- #elif LJ_TARGET_PPC
-+#if LJ_ARCH_PPC_OPD
-+register void *vm_toc __asm__("r2");
-+static void callback_mcode_init(global_State *g, uint64_t *page)
-+{
-+  uint64_t *p = page;
-+  void *target = (void *)lj_vm_ffi_callback;
-+  MSize slot;
-+  for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
-+    *p++ = (uint64_t)target;
-+    *p++ = (uint64_t)vm_toc;
-+    *p++ = (uint64_t)g | ((uint64_t)slot << 47);
-+  }
-+  lua_assert(p - page <= CALLBACK_MCODE_SIZE / 8);
-+}
-+#else
- static void callback_mcode_init(global_State *g, uint32_t *page)
- {
-   uint32_t *p = page;
-   void *target = (void *)lj_vm_ffi_callback;
-   MSize slot;
-+#if LJ_ARCH_PPC_ELFV2
-+  // Needs to be in sync with lj_vm_ffi_callback.
-+  lua_assert(CALLBACK_MCODE_SIZE == 4096);
-+  for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
-+    *p = PPCI_B | (((page+CALLBACK_MAX_SLOT-p) & 0x00ffffffu) << 2);
-+    p++;
-+  }
-+  *p++ = PPCI_LI | PPCF_T(RID_SYS1) | ((((intptr_t)target) >> 32) & 0xffff);
-+  *p++ = PPCI_LI | PPCF_T(RID_R11) | ((((intptr_t)g) >> 32) & 0xffff);
-+  *p++ = PPCI_RLDICR | PPCF_T(RID_SYS1) | PPCF_A(RID_SYS1) | PPCF_SH(32) | PPCF_M6(63-32);  /* sldi */
-+  *p++ = PPCI_RLDICR | PPCF_T(RID_R11) | PPCF_A(RID_R11) | PPCF_SH(32) | PPCF_M6(63-32);  /* sldi */
-+  *p++ = PPCI_ORIS | PPCF_A(RID_SYS1) | PPCF_T(RID_SYS1) | ((((intptr_t)target) >> 16) & 0xffff);
-+  *p++ = PPCI_ORIS | PPCF_A(RID_R11) | PPCF_T(RID_R11) | ((((intptr_t)g) >> 16) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_SYS1) | PPCF_T(RID_SYS1) | (((intptr_t)target) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_R11) | PPCF_T(RID_R11) | (((intptr_t)g) & 0xffff);
-+  *p++ = PPCI_MTCTR | PPCF_T(RID_SYS1);
-+  *p++ = PPCI_BCTR;
-+#else
-   *p++ = PPCI_LIS | PPCF_T(RID_TMP) | (u32ptr(target) >> 16);
--  *p++ = PPCI_LIS | PPCF_T(RID_R12) | (u32ptr(g) >> 16);
-+  *p++ = PPCI_LIS | PPCF_T(RID_R11) | (u32ptr(g) >> 16);
-   *p++ = PPCI_ORI | PPCF_A(RID_TMP)|PPCF_T(RID_TMP) | (u32ptr(target) & 0xffff);
--  *p++ = PPCI_ORI | PPCF_A(RID_R12)|PPCF_T(RID_R12) | (u32ptr(g) & 0xffff);
-+  *p++ = PPCI_ORI | PPCF_A(RID_R11)|PPCF_T(RID_R11) | (u32ptr(g) & 0xffff);
-   *p++ = PPCI_MTCTR | PPCF_T(RID_TMP);
-   *p++ = PPCI_BCTR;
-   for (slot = 0; slot < CALLBACK_MAX_SLOT; slot++) {
--    *p++ = PPCI_LI | PPCF_T(RID_R11) | slot;
-+    *p++ = PPCI_LI | PPCF_T(RID_R12) | slot;
-     *p = PPCI_B | (((page-p) & 0x00ffffffu) << 2);
-     p++;
-   }
--  lua_assert(p - page <= CALLBACK_MCODE_SIZE);
-+#endif
-+  lua_assert(p - page <= CALLBACK_MCODE_SIZE / 4);
- }
-+#endif
- #elif LJ_TARGET_MIPS
- static void callback_mcode_init(global_State *g, uint32_t *page)
- {
-@@ -641,6 +692,15 @@ static void callback_conv_result(CTState *cts, lua_State *L, TValue *o)
- 	*(int32_t *)dp = ctr->size == 1 ? (int32_t)*(int8_t *)dp :
- 					  (int32_t)*(int16_t *)dp;
-     }
-+#if LJ_TARGET_PPC && LJ_ARCH_BITS == 64
-+    if (ctr->size <= 4 &&
-+       (ctype_isinteger_or_bool(ctr->info) || ctype_isenum(ctr->info))) {
-+      if (ctr->info & CTF_UNSIGNED)
-+        *(uint64_t *)dp = (uint64_t)*(uint32_t *)dp;
-+      else
-+        *(int64_t *)dp = (int64_t)*(int32_t *)dp;
-+    }
-+#endif
- #if LJ_TARGET_MIPS64 || (LJ_TARGET_ARM64 && LJ_BE)
-     /* Always sign-extend results to 64 bits. Even a soft-fp 'float'. */
-     if (ctr->size <= 4 &&
-diff --git src/lj_ctype.h src/lj_ctype.h
-index 0c220a8..105865b 100644
---- a/src/lj_ctype.h
-+++ b/src/lj_ctype.h
-@@ -153,7 +153,7 @@ typedef struct CType {
- 
- /* Simplify target-specific configuration. Checked in lj_ccall.h. */
- #define CCALL_MAX_GPR		8
--#define CCALL_MAX_FPR		8
-+#define CCALL_MAX_FPR		14
- 
- typedef LJ_ALIGN(8) union FPRCBArg { double d; float f[2]; } FPRCBArg;
- 
-diff --git src/lj_def.h src/lj_def.h
-index 2d8fff6..381d6f5 100644
---- a/src/lj_def.h
-+++ b/src/lj_def.h
-@@ -71,7 +71,11 @@ typedef unsigned int uintptr_t;
- #define LJ_MAX_IDXCHAIN	100		/* __index/__newindex chain limit. */
- #define LJ_STACK_EXTRA	(5+2*LJ_FR2)	/* Extra stack space (metamethods). */
- 
-+#if defined(__powerpc64__) && _CALL_ELF != 2
-+#define LJ_NUM_CBPAGE	4		/* Number of FFI callback pages. */
-+#else
- #define LJ_NUM_CBPAGE	1		/* Number of FFI callback pages. */
-+#endif
- 
- /* Minimum table/buffer sizes. */
- #define LJ_MIN_GLOBAL	6		/* Min. global table size (hbits). */
-diff --git src/lj_frame.h src/lj_frame.h
-index 19c49a4..c666418 100644
---- a/src/lj_frame.h
-+++ b/src/lj_frame.h
-@@ -210,6 +210,15 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
- #define CFRAME_OFS_MULTRES	408
- #define CFRAME_SIZE		384
- #define CFRAME_SHIFT_MULTRES	3
-+#elif LJ_ARCH_PPC_ELFV2
-+#define CFRAME_OFS_ERRF		360
-+#define CFRAME_OFS_NRES		356
-+#define CFRAME_OFS_PREV		336
-+#define CFRAME_OFS_L		352
-+#define CFRAME_OFS_PC		348
-+#define CFRAME_OFS_MULTRES	344
-+#define CFRAME_SIZE		368
-+#define CFRAME_SHIFT_MULTRES	3
- #elif LJ_ARCH_PPC32ON64
- #define CFRAME_OFS_ERRF		472
- #define CFRAME_OFS_NRES		468
-diff --git src/lj_target_ppc.h src/lj_target_ppc.h
-index c5c991a..f0c8c94 100644
---- a/src/lj_target_ppc.h
-+++ b/src/lj_target_ppc.h
-@@ -30,8 +30,13 @@ enum {
- 
-   /* Calling conventions. */
-   RID_RET = RID_R3,
-+#if LJ_LE
-+  RID_RETHI = RID_R4,
-+  RID_RETLO = RID_R3,
-+#else
-   RID_RETHI = RID_R3,
-   RID_RETLO = RID_R4,
-+#endif
-   RID_FPRET = RID_F1,
- 
-   /* These definitions must match with the *.dasc file(s): */
-@@ -131,6 +136,8 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define PPCF_C(r)	((r) << 6)
- #define PPCF_MB(n)	((n) << 6)
- #define PPCF_ME(n)	((n) << 1)
-+#define PPCF_SH(n)	((((n) & 31) << (11+1)) | (((n) & 32) >> (5-1)))
-+#define PPCF_M6(n)	((((n) & 31) << (5+1)) | (((n) & 32) << (11-5)))
- #define PPCF_Y		0x00200000
- #define PPCF_DOT	0x00000001
- 
-@@ -200,6 +207,13 @@ typedef enum PPCIns {
-   PPCI_RLWINM = 0x54000000,
-   PPCI_RLWIMI = 0x50000000,
- 
-+  PPCI_RLDICL = 0x78000000,
-+  PPCI_RLDICR = 0x78000004,
-+  PPCI_RLDIC = 0x78000008,
-+  PPCI_RLDIMI = 0x7800000c,
-+  PPCI_RLDCL = 0x78000010,
-+  PPCI_RLDCR = 0x78000012,
-+
-   PPCI_B = 0x48000000,
-   PPCI_BL = 0x48000001,
-   PPCI_BC = 0x40800000,
-diff --git src/vm_ppc.dasc src/vm_ppc.dasc
-index b4260eb..abb381e 100644
---- a/src/vm_ppc.dasc
-+++ b/src/vm_ppc.dasc
-@@ -22,35 +22,40 @@
- |// GPR64   64 bit registers (but possibly 32 bit pointers, e.g. PS3).
- |//         Affects reg saves, stack layout, carry/overflow/dot flags etc.
- |// FRAME32 Use 32 bit frame layout, even with GPR64 (Xbox 360).
--|// TOC     Need table of contents (64 bit or 32 bit variant, e.g. PS3).
-+|// OPD     Need function descriptors (64 bit or 32 bit variant, e.g. PS3).
- |//         Function pointers are really a struct: code, TOC, env (optional).
--|// TOCENV  Function pointers have an environment pointer, too (not on PS3).
-+|// OPDENV  Function pointers have an environment pointer, too (not on PS3).
-+|// ELFV2   The 64-bit ELF V2 ABI is in use.
- |// PPE     Power Processor Element of Cell (PS3) or Xenon (Xbox 360).
- |//         Must avoid (slow) micro-coded instructions.
- |
- |.if P64
--|.define TOC, 1
--|.define TOCENV, 1
- |.macro lpx, a, b, c; ldx a, b, c; .endmacro
- |.macro lp, a, b; ld a, b; .endmacro
- |.macro stp, a, b; std a, b; .endmacro
-+|.macro stpx, a, b, c; stdx a, b, c; .endmacro
- |.define decode_OPP, decode_OP8
--|.if FFI
--|// Missing: Calling conventions, 64 bit regs, TOC.
--|.error lib_ffi not yet implemented for PPC64
--|.endif
-+|.define PSIZE, 8
- |.else
- |.macro lpx, a, b, c; lwzx a, b, c; .endmacro
- |.macro lp, a, b; lwz a, b; .endmacro
- |.macro stp, a, b; stw a, b; .endmacro
-+|.macro stpx, a, b, c; stwx a, b, c; .endmacro
- |.define decode_OPP, decode_OP4
-+|.define PSIZE, 4
- |.endif
- |
- |// Convenience macros for TOC handling.
--|.if TOC
-+|.if OPD or ELFV2
- |// Linker needs a TOC patch area for every external call relocation.
--|.macro blex, target; bl extern target@plt; nop; .endmacro
-+|.macro blex, target; bl extern target; nop; .endmacro
- |.macro .toc, a, b; a, b; .endmacro
-+|.else
-+|.macro blex, target; bl extern target@plt; .endmacro
-+|.macro .toc, a, b; .endmacro
-+|.endif
-+|.if OPD
-+|.macro .opd, a, b; a, b; .endmacro
- |.if P64
- |.define TOC_OFS,	 8
- |.define ENV_OFS,	16
-@@ -58,13 +63,13 @@
- |.define TOC_OFS,	4
- |.define ENV_OFS,	8
- |.endif
--|.else  // No TOC.
--|.macro blex, target; bl extern target@plt; .endmacro
--|.macro .toc, a, b; .endmacro
-+|.else  // No OPD.
-+|.macro .opd, a, b; .endmacro
- |.endif
--|.macro .tocenv, a, b; .if TOCENV; a, b; .endif; .endmacro
-+|.macro .opdenv, a, b; .if OPDENV; a, b; .endif; .endmacro
- |
- |.macro .gpr64, a, b; .if GPR64; a, b; .endif; .endmacro
-+|.macro .elfv2, a, b; .if ELFV2; a, b; .endif; .endmacro
- |
- |.macro andix., y, a, i
- |.if PPE
-@@ -75,29 +80,6 @@
- |.endif
- |.endmacro
- |
--|.macro clrso, reg
--|.if PPE
--|  li reg, 0
--|  mtxer reg
--|.else
--|  mcrxr cr0
--|.endif
--|.endmacro
--|
--|.macro checkov, reg, noov
--|.if PPE
--|  mfxer reg
--|  add reg, reg, reg
--|  cmpwi reg, 0
--|   li reg, 0
--|   mtxer reg
--|  bgey noov
--|.else
--|  mcrxr cr0
--|  bley noov
--|.endif
--|.endmacro
--|
- |//-----------------------------------------------------------------------
- |
- |// Fixed register assignments for the interpreter.
-@@ -111,6 +93,7 @@
- |.define LREG,		r18	// Register holding lua_State (also in SAVE_L).
- |.define MULTRES,	r19	// Size of multi-result: (nresults+1)*8.
- |.define JGL,		r31	// On-trace: global_State + 32768.
-+|.define BASEP4,	r25	// Equal to BASE + 4
- |
- |// Constants for type-comparisons, stores and conversions. C callee-save.
- |.define TISNUM,	r22
-@@ -143,12 +126,19 @@
- |
- |.define FARG1,		f1
- |.define FARG2,		f2
-+|.define FARG3,		f3
-+|.define FARG4,		f4
-+|.define FARG5,		f5
-+|.define FARG6,		f6
-+|.define FARG7,		f7
-+|.define FARG8,		f8
- |
- |.define CRET1,		r3
- |.define CRET2,		r4
- |
- |.define TOCREG,	r2	// TOC register (only used by C code).
- |.define ENVREG,	r11	// Environment pointer (nested C functions).
-+|.define FUNCREG,	r12	// ELFv2 function pointer (overlaps RD)
- |
- |// Stack layout while in interpreter. Must match with lj_frame.h.
- |.if GPR64
-@@ -182,6 +172,49 @@
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
- |
-+|.elif ELFV2
-+|
-+|//			392(sp) // \ 32 bit C frame info.
-+|.define SAVE_LR,	384(sp)
-+|.define SAVE_CR,	376(sp) // 64 bit CR save.
-+|.define CFRAME_SPACE,	368     // Delta for sp.
-+|// Back chain for sp:	368(sp) <-- sp entering interpreter
-+|.define SAVE_ERRF,	360(sp) // |
-+|.define SAVE_NRES,	356(sp) // |
-+|.define SAVE_L,	352(sp) //  > Parameter save area.
-+|.define SAVE_PC,	348(sp) // |
-+|.define SAVE_MULTRES,	344(sp) // |
-+|.define SAVE_CFRAME,	336(sp) // / 64 bit C frame chain.
-+|.define SAVE_FPR_,	192     // .. 192+18*8: 64 bit FPR saves.
-+|.define SAVE_GPR_,	48      // .. 48+18*8: 64 bit GPR saves.
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	44(sp)
-+|.define TMPD_LO,	40(sp)
-+|.define TONUM_HI,	36(sp)
-+|.define TONUM_LO,	32(sp)
-+|.else
-+|.define TMPD_LO,	44(sp)
-+|.define TMPD_HI,	40(sp)
-+|.define TONUM_LO,	36(sp)
-+|.define TONUM_HI,	32(sp)
-+|.endif
-+|.define SAVE_TOC,	24(sp)  // TOC save area.
-+|// Next frame lr:	16(sp)
-+|// Next frame cr:	8(sp)
-+|// Back chain for sp:	0(sp)	<-- sp while in interpreter
-+|
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	32(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
-+|.define TMPD_BLO,	39(sp)
-+|.define TMPD,		TMPD_HI
-+|.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	32
-+|
- |.else
- |
- |//			508(sp) // \ 32 bit C frame info.
-@@ -192,23 +225,39 @@
- |.define SAVE_MULTRES,	456(sp) // |
- |.define SAVE_CFRAME,	448(sp) // / 64 bit C frame chain.
- |.define SAVE_LR,	416(sp)
-+|.define SAVE_CR,	408(sp)  // 64 bit CR save.
- |.define CFRAME_SPACE,	400     // Delta for sp.
- |// Back chain for sp:	400(sp) <-- sp entering interpreter
- |.define SAVE_FPR_,	256     // .. 256+18*8: 64 bit FPR saves.
- |.define SAVE_GPR_,	112     // .. 112+18*8: 64 bit GPR saves.
- |//			48(sp)  // Callee parameter save area (ABI mandated).
- |.define SAVE_TOC,	40(sp)  // TOC save area.
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	36(sp)  // \ Link editor temp (ABI mandated).
-+|.define TMPD_LO,	32(sp)  // /
-+|.define TONUM_HI,	28(sp)  // \ Compiler temp (ABI mandated).
-+|.define TONUM_LO,	24(sp)  // /
-+|.else
- |.define TMPD_LO,	36(sp)  // \ Link editor temp (ABI mandated).
- |.define TMPD_HI,	32(sp)  // /
- |.define TONUM_LO,	28(sp)  // \ Compiler temp (ABI mandated).
- |.define TONUM_HI,	24(sp)  // /
-+|.endif
- |// Next frame lr:	16(sp)
--|.define SAVE_CR,	8(sp)  // 64 bit CR save.
-+|// Next frame cr:	8(sp)
- |// Back chain for sp:	0(sp)	<-- sp while in interpreter
- |
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	32(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
- |.define TMPD_BLO,	39(sp)
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	112
- |
- |.endif
- |.else
-@@ -226,16 +275,31 @@
- |.define SAVE_PC,	32(sp)
- |.define SAVE_MULTRES,	28(sp)
- |.define UNUSED1,	24(sp)
-+|.if ENDIAN_LE
-+|.define TMPD_HI,	20(sp)
-+|.define TMPD_LO,	16(sp)
-+|.define TONUM_HI,	12(sp)
-+|.define TONUM_LO,	8(sp)
-+|.else
- |.define TMPD_LO,	20(sp)
- |.define TMPD_HI,	16(sp)
- |.define TONUM_LO,	12(sp)
- |.define TONUM_HI,	8(sp)
-+|.endif
- |// Next frame lr:	4(sp)
- |// Back chain for sp:	0(sp)	<-- sp while in interpreter
- |
-+|.if ENDIAN_LE
-+|.define TMPD_BLO,	16(sp)
-+|.define TMPD,		TMPD_LO
-+|.define TONUM_D,	TONUM_LO
-+|.else
- |.define TMPD_BLO,	23(sp)
- |.define TMPD,		TMPD_HI
- |.define TONUM_D,	TONUM_HI
-+|.endif
-+|
-+|.define EXIT_OFFSET,	16
- |
- |.endif
- |
-@@ -350,8 +414,35 @@
- |//-----------------------------------------------------------------------
- |
- |// Access to frame relative to BASE.
-+|.if ENDIAN_LE
-+|.define FRAME_PC,	-4
-+|.define FRAME_FUNC,	-8
-+|.define FRAME_CONTPC,	-12
-+|.define FRAME_CONTRET,	-16
-+|.define WORD_LO,	0
-+|.define WORD_HI,	4
-+|.define WORD_BLO,	0
-+|.define BASE_LO,	BASE
-+|.define BASE_HI,	BASEP4
-+|.macro lwzux2, hi, lo, base, idx
-+|  lwzux lo, base, idx
-+|  lwz hi, 4(base)
-+|.endmacro
-+|.else
- |.define FRAME_PC,	-8
- |.define FRAME_FUNC,	-4
-+|.define FRAME_CONTPC,	-16
-+|.define FRAME_CONTRET,	-12
-+|.define WORD_LO,	4
-+|.define WORD_HI,	0
-+|.define WORD_BLO,	7
-+|.define BASE_LO,	BASEP4
-+|.define BASE_HI,	BASE
-+|.macro lwzux2, hi, lo, base, idx
-+|  lwzux hi, base, idx
-+|  lwz lo, 4(base)
-+|.endmacro
-+|.endif
- |
- |// Instruction decode.
- |.macro decode_OP4, dst, ins; rlwinm dst, ins, 2, 22, 29; .endmacro
-@@ -412,6 +503,7 @@
- |// Call decode and dispatch.
- |.macro ins_callt
- |  // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC
-+|  addi BASEP4, BASE, 4
- |  lwz PC, LFUNC:RB->pc
- |  lwz INS, 0(PC)
- |   addi PC, PC, 4
-@@ -504,7 +596,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz PC, FRAME_PC(TMP2)		// Fetch PC of previous frame.
-   |  mr BASE, TMP2			// Restore caller base.
-   |  // Prepending may overwrite the pcall frame, so do it at the end.
--  |   stwu TMP1, FRAME_PC(RA)		// Prepend true to results.
-+  |  .if ENDIAN_LE
-+  |    addi RA, RA, -8
-+  |    stw TMP1, WORD_HI(RA)		// Prepend true to results.
-+  |  .else
-+  |    stwu TMP1, -8(RA)		// Prepend true to results.
-+  |  .endif
-   |
-   |->vm_returnc:
-   |  addi RD, RD, 8			// RD = (nresults+1)*8.
-@@ -560,7 +657,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz TMP1, L->maxstack
-   |  cmplw BASE, TMP1
-   |  bge >8
--  |  stw TISNIL, 0(BASE)
-+  |  stw TISNIL, WORD_HI(BASE)
-   |  addi RD, RD, 8
-   |  addi BASE, BASE, 8
-   |  b <2
-@@ -611,7 +708,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vm_unwind_ff_eh:			// Landing pad for external unwinder.
-   |  lwz L, SAVE_L
-   |  .toc ld TOCREG, SAVE_TOC
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |  lp BASE, L->base
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |   lwz DISPATCH, L->glref		// Setup pointer to dispatch table.
-@@ -626,7 +728,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la RA, -8(BASE)			// Results start at BASE-8.
-   |     stw TMP3, TMPD
-   |   addi DISPATCH, DISPATCH, GG_G2DISP
--  |  stw TMP1, 0(RA)			// Prepend false to error message.
-+  |  stw TMP1, WORD_HI(RA)		// Prepend false to error message.
-   |  li RD, 16				// 2 results: false + error message.
-   |    st_vmstate
-   |     lfs TONUM, TMPD
-@@ -687,7 +789,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  mr RA, BASE
-   |   lp BASE, L->base
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |   lp TMP1, L->top
-   |  lwz PC, FRAME_PC(BASE)
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-@@ -737,7 +844,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |3:  // Entry point for vm_cpcall/vm_resume (BASE = base, PC = ftype).
-   |  stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  lp TMP2, L->base			// TMP2 = old base (used in vmeta_call).
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |   lp TMP1, L->top
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |  add PC, PC, BASE
-@@ -757,8 +869,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->vm_call_dispatch:
-   |  // TMP2 = old base, BASE = new base, RC = nargs*8, PC = caller PC
--  |  lwz TMP0, FRAME_PC(BASE)
--  |   lwz LFUNC:RB, FRAME_FUNC(BASE)
-+  |  lwz TMP0, WORD_HI-8(BASE)
-+  |   lwz LFUNC:RB, WORD_LO-8(BASE)
-   |  checkfunc TMP0; bne ->vmeta_call
-   |
-   |->vm_call_dispatch_f:
-@@ -777,7 +889,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |   sub TMP0, TMP0, TMP1		// Compute -savestack(L, L->top).
-   |    lp TMP1, L->cframe
-   |     addi DISPATCH, DISPATCH, GG_G2DISP
--  |  .toc lp CARG4, 0(CARG4)
-+  |  .opd lp TOCREG, TOC_OFS(CARG4)
-+  |  .opdenv lp ENVREG, ENV_OFS(CARG4)
-+  |  .opd lp CARG4, 0(CARG4)
-   |  li TMP2, 0
-   |   stw TMP0, SAVE_NRES		// Neg. delta means cframe w/o frame.
-   |  stw TMP2, SAVE_ERRF		// No error function.
-@@ -785,7 +899,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |    stp sp, L->cframe		// Add our C frame to cframe chain.
-   |     stw L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  mtctr CARG4
-+  |  .elfv2 mr FUNCREG, CARG4
-   |  bctrl			// (lua_State *L, lua_CFunction func, void *ud)
-+  |  .toc lp TOCREG, SAVE_TOC
-   |.if PPE
-   |  mr BASE, CRET1
-   |  cmpwi CRET1, 0
-@@ -807,20 +923,27 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->cont_dispatch:
-   |  // BASE = meta base, RA = resultptr, RD = (nresults+1)*8
--  |  lwz TMP0, -12(BASE)		// Continuation.
-+  |  lwz TMP0, FRAME_CONTRET(BASE)	// Continuation.
-   |   mr RB, BASE
-   |   mr BASE, TMP2			// Restore caller BASE.
-   |    lwz LFUNC:TMP1, FRAME_FUNC(TMP2)
-   |.if FFI
-   |  cmplwi TMP0, 1
-   |.endif
--  |     lwz PC, -16(RB)			// Restore PC from [cont|PC].
--  |   subi TMP2, RD, 8
-+  |     lwz PC, FRAME_CONTPC(RB)	// Restore PC from [cont|PC].
-+  |  addi BASEP4, BASE, 4
-+  |   addi TMP2, RD, WORD_HI-8
-   |    lwz TMP1, LFUNC:TMP1->pc
-   |   stwx TISNIL, RA, TMP2		// Ensure one valid arg.
-+  |.if P64
-+  |   ld TMP3, 0(DISPATCH)
-+  |.endif
-   |.if FFI
-   |  ble >1
-   |.endif
-+  |.if P64
-+  |  add TMP0, TMP0, TMP3
-+  |.endif
-   |    lwz KBASE, PC2PROTO(k)(TMP1)
-   |  // BASE = base, RA = resultptr, RB = meta base
-   |  mtctr TMP0
-@@ -856,20 +979,20 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TSTR
-   |   decode_RB8 RB, INS
--  |  stw STR:RC, 4(CARG3)
-+  |  stw STR:RC, WORD_LO(CARG3)
-   |   add CARG2, BASE, RB
--  |  stw TMP0, 0(CARG3)
-+  |  stw TMP0, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tgets:
-   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TTAB
--  |  stw TAB:RB, 4(CARG2)
-+  |  stw TAB:RB, WORD_LO(CARG2)
-   |   la CARG3, DISPATCH_GL(tmptv2)(DISPATCH)
--  |  stw TMP0, 0(CARG2)
-+  |  stw TMP0, WORD_HI(CARG2)
-   |   li TMP1, LJ_TSTR
--  |   stw STR:RC, 4(CARG3)
--  |   stw TMP1, 0(CARG3)
-+  |   stw STR:RC, WORD_LO(CARG3)
-+  |   stw TMP1, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tgetb:			// TMP0 = index
-@@ -880,8 +1003,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |   add CARG2, BASE, RB
-   |.if DUALNUM
--  |  stw TISNUM, 0(CARG3)
--  |  stw TMP0, 4(CARG3)
-+  |  stw TISNUM, WORD_HI(CARG3)
-+  |  stw TMP0, WORD_LO(CARG3)
-   |.else
-   |  stfd f0, 0(CARG3)
-   |.endif
-@@ -909,7 +1032,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // BASE = base, L->top = new base, stack = cont/func/t/k
-   |  subfic TMP1, BASE, FRAME_CONT
-   |  lp BASE, L->top
--  |  stw PC, -16(BASE)			// [cont|PC]
-+  |  stw PC, FRAME_CONTPC(BASE)		// [cont|PC]
-   |   add PC, TMP1, BASE
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
-   |   li NARGS8:RC, 16			// 2 args for func(t, k).
-@@ -923,7 +1046,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f14, 0(CRET1)
-   |  b ->BC_TGETR_Z
-   |1:
--  |  stwx TISNIL, BASE, RA
-+  |  stwx TISNIL, BASE_HI, RA
-   |  b ->cont_nop
-   |
-   |//-----------------------------------------------------------------------
-@@ -932,20 +1055,20 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TSTR
-   |   decode_RB8 RB, INS
--  |  stw STR:RC, 4(CARG3)
-+  |  stw STR:RC, WORD_LO(CARG3)
-   |   add CARG2, BASE, RB
--  |  stw TMP0, 0(CARG3)
-+  |  stw TMP0, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tsets:
-   |  la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
-   |  li TMP0, LJ_TTAB
--  |  stw TAB:RB, 4(CARG2)
-+  |  stw TAB:RB, WORD_LO(CARG2)
-   |   la CARG3, DISPATCH_GL(tmptv2)(DISPATCH)
--  |  stw TMP0, 0(CARG2)
-+  |  stw TMP0, WORD_HI(CARG2)
-   |   li TMP1, LJ_TSTR
--  |   stw STR:RC, 4(CARG3)
--  |   stw TMP1, 0(CARG3)
-+  |   stw STR:RC, WORD_LO(CARG3)
-+  |   stw TMP1, WORD_HI(CARG3)
-   |  b >1
-   |
-   |->vmeta_tsetb:			// TMP0 = index
-@@ -956,8 +1079,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
-   |   add CARG2, BASE, RB
-   |.if DUALNUM
--  |  stw TISNUM, 0(CARG3)
--  |  stw TMP0, 4(CARG3)
-+  |  stw TISNUM, WORD_HI(CARG3)
-+  |  stw TMP0, WORD_LO(CARG3)
-   |.else
-   |  stfd f0, 0(CARG3)
-   |.endif
-@@ -986,7 +1109,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // BASE = base, L->top = new base, stack = cont/func/t/k/(v)
-   |  subfic TMP1, BASE, FRAME_CONT
-   |  lp BASE, L->top
--  |  stw PC, -16(BASE)			// [cont|PC]
-+  |  stw PC, FRAME_CONTPC(BASE)		// [cont|PC]
-   |   add PC, TMP1, BASE
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)	// Guaranteed to be a function here.
-   |   li NARGS8:RC, 24			// 3 args for func(t, k, v)
-@@ -1006,17 +1129,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vmeta_comp:
-   |  mr CARG1, L
-   |   subi PC, PC, 4
--  |.if DUALNUM
--  |  mr CARG2, RA
--  |.else
-   |  add CARG2, BASE, RA
--  |.endif
-   |   stw PC, SAVE_PC
--  |.if DUALNUM
--  |  mr CARG3, RD
--  |.else
-   |  add CARG3, BASE, RD
--  |.endif
-   |   stp BASE, L->base
-   |  decode_OP1 CARG4, INS
-   |  bl extern lj_meta_comp  // (lua_State *L, TValue *o1, *o2, int op)
-@@ -1043,7 +1158,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  b ->cont_nop
-   |
-   |->cont_condt:			// RA = resultptr
--  |  lwz TMP0, 0(RA)
-+  |  lwz TMP0, WORD_HI(RA)
-   |  .gpr64 extsw TMP0, TMP0
-   |  subfic TMP0, TMP0, LJ_TTRUE	// Branch if result is true.
-   |  subfe CRET1, CRET1, CRET1
-@@ -1051,7 +1166,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  b <4
-   |
-   |->cont_condf:			// RA = resultptr
--  |  lwz TMP0, 0(RA)
-+  |  lwz TMP0, WORD_HI(RA)
-   |  .gpr64 extsw TMP0, TMP0
-   |  subfic TMP0, TMP0, LJ_TTRUE	// Branch if result is false.
-   |  subfe CRET1, CRET1, CRET1
-@@ -1103,8 +1218,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |.endif
-   |
-   |->vmeta_unm:
--  |  mr CARG3, RD
--  |  mr CARG4, RD
-+  |  add CARG3, BASE, RD
-+  |  add CARG4, BASE, RD
-   |  b >1
-   |
-   |->vmeta_arith_vn:
-@@ -1139,7 +1254,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vmeta_binop:
-   |  // BASE = old base, CRET1 = new base, stack = cont/func/o1/o2
-   |  sub TMP1, CRET1, BASE
--  |   stw PC, -16(CRET1)		// [cont|PC]
-+  |   stw PC, FRAME_CONTPC(CRET1)	// [cont|PC]
-   |   mr TMP2, BASE
-   |  addi PC, TMP1, FRAME_CONT
-   |   mr BASE, CRET1
-@@ -1150,7 +1265,7 @@ static void build_subroutines(BuildCtx *ctx)
- #if LJ_52
-   |  mr SAVE0, CARG1
- #endif
--  |  mr CARG2, RD
-+  |  add CARG2, BASE, RD
-   |   stp BASE, L->base
-   |  mr CARG1, L
-   |   stw PC, SAVE_PC
-@@ -1227,25 +1342,25 @@ static void build_subroutines(BuildCtx *ctx)
-   |.macro .ffunc_1, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz CARG1, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz CARG1, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |.endmacro
-   |
-   |.macro .ffunc_2, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
--  |    lwz CARG4, 8(BASE)
--  |   lwz CARG1, 4(BASE)
--  |    lwz CARG2, 12(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz CARG4, WORD_HI+8(BASE)
-+  |   lwz CARG1, WORD_LO(BASE)
-+  |    lwz CARG2, WORD_LO+8(BASE)
-   |  blt ->fff_fallback
-   |.endmacro
-   |
-   |.macro .ffunc_n, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1254,9 +1369,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |.macro .ffunc_nn, name
-   |->ff_ .. name:
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, WORD_HI+8(BASE)
-   |    lfd FARG2, 8(BASE)
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1279,9 +1394,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmplw cr1, CARG3, TMP1
-   |    lwz PC, FRAME_PC(BASE)
-   |  bge cr1, ->fff_fallback
--  |   stw CARG3, 0(RA)
-+  |   stw CARG3, WORD_HI(RA)
-   |  addi RD, NARGS8:RC, 8		// Compute (nresults+1)*8.
--  |   stw CARG1, 4(RA)
-+  |   stw CARG1, WORD_LO(RA)
-   |  beq ->fff_res			// Done if exactly 1 argument.
-   |  li TMP1, 8
-   |  subi RC, RC, 8
-@@ -1295,17 +1410,36 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc type
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-   |  blt ->fff_fallback
-   |  .gpr64 extsw CARG1, CARG1
-+  |.if P64
-+  |  li TMP0, LJ_TNUMX
-+  |    srawi TMP3, CARG1, 15
-+  |  subfc TMP1, TMP0, CARG1
-+  |.else
-   |  subfc TMP0, TISNUM, CARG1
--  |  subfe TMP2, CARG1, CARG1
-+  |.endif
-+  |    subfe TMP2, CARG1, CARG1
-+  |.if P64
-+  |  cmpwi TMP3, -2
-+  |    orc TMP1, TMP2, TMP1
-+  |    subf TMP1, TMP0, TMP1
-+  |  beq >1
-+  |.else
-   |  orc TMP1, TMP2, TMP0
--  |  addi TMP1, TMP1, ~LJ_TISNUM+1
-+  |  subf TMP1, TISNUM, TMP1
-+  |.endif
-   |  slwi TMP1, TMP1, 3
-+  |2:
-   |   la TMP2, CFUNC:RB->upvalue
-   |  lfdx FARG1, TMP2, TMP1
-   |  b ->fff_resn
-+  |.if P64
-+  |1:
-+  |  li TMP1, ~LJ_TLIGHTUD<<3
-+  |  b <2
-+  |.endif
-   |
-   |//-- Base library: getters and setters ---------------------------------
-   |
-@@ -1328,10 +1462,10 @@ static void build_subroutines(BuildCtx *ctx)
-   |  sub TMP1, TMP0, TMP1
-   |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-   |3:  // Rearranged logic, because we expect _not_ to find the key.
--  |  lwz CARG4, NODE:TMP2->key
--  |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--  |    lwz CARG2, NODE:TMP2->val
--  |     lwz TMP1, 4+offsetof(Node, val)(NODE:TMP2)
-+  |  lwz CARG4, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+  |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+  |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-+  |     lwz TMP1, WORD_LO+offsetof(Node, val)(NODE:TMP2)
-   |  checkstr CARG4; bne >4
-   |   cmpw TMP0, STR:RC; beq >5
-   |4:
-@@ -1349,14 +1483,33 @@ static void build_subroutines(BuildCtx *ctx)
-   |6:
-   |  cmpwi CARG3, LJ_TUDATA; beq <1
-   |  .gpr64 extsw CARG3, CARG3
-+  |.if P64
-+  |  li TMP0, LJ_TNUMX
-+  |    srawi TMP3, CARG3, 15
-+  |  subfc TMP1, TMP0, CARG3
-+  |.else
-   |  subfc TMP0, TISNUM, CARG3
-+  |.endif
-   |  subfe TMP2, CARG3, CARG3
-+  |.if P64
-+  |  cmpwi TMP3, -2
-+  |    orc TMP1, TMP2, TMP1
-+  |    subf TMP1, TMP0, TMP1
-+  |  beq >7
-+  |.else
-   |  orc TMP1, TMP2, TMP0
--  |  addi TMP1, TMP1, ~LJ_TISNUM+1
-+  |  subf TMP1, TISNUM, TMP1
-+  |.endif
-   |  slwi TMP1, TMP1, 2
-+  |8:
-   |   la TMP2, DISPATCH_GL(gcroot[GCROOT_BASEMT])(DISPATCH)
-   |  lwzx TAB:CARG1, TMP2, TMP1
-   |  b <2
-+  |.if P64
-+  |7:
-+  |  li TMP1, ~LJ_TLIGHTUD<<2
-+  |  b <8
-+  |.endif
-   |
-   |.ffunc_2 setmetatable
-   |  // Fast path: no mt for table yet and not clearing the mt.
-@@ -1374,8 +1527,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc rawget
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG4, 0(BASE)
--  |    lwz TAB:CARG2, 4(BASE)
-+  |   lwz CARG4, WORD_HI(BASE)
-+  |    lwz TAB:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |  checktab CARG4; bne ->fff_fallback
-   |   la CARG3, 8(BASE)
-@@ -1390,7 +1543,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc tonumber
-   |  // Only handles the number case inline (without a base argument).
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Exactly one argument.
-   |   checknum CARG1; bgt ->fff_fallback
-@@ -1425,10 +1578,15 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc next
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG1, 0(BASE)
--  |    lwz TAB:CARG2, 4(BASE)
-+  |   lwz CARG1, WORD_HI(BASE)
-+  |    lwz TAB:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-+  |.if ENDIAN_LE
-+  |   add TMP1, BASE, NARGS8:RC
-+  |   stw TISNIL, WORD_HI(TMP1)		// Set missing 2nd arg to nil.
-+  |.else
-   |   stwx TISNIL, BASE, NARGS8:RC	// Set missing 2nd arg to nil.
-+  |.endif
-   |  checktab CARG1
-   |   lwz PC, FRAME_PC(BASE)
-   |  bne ->fff_fallback
-@@ -1464,18 +1622,18 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f0, CFUNC:RB->upvalue[0]
-   |  la RA, -8(BASE)
- #endif
--  |   stw TISNIL, 8(BASE)
-+  |   stw TISNIL, 8+WORD_HI(BASE)
-   |  li RD, (3+1)*8
-   |  stfd f0, 0(RA)
-   |  b ->fff_res
-   |
-   |.ffunc ipairs_aux
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
--  |    lwz TAB:CARG1, 4(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz TAB:CARG1, WORD_LO(BASE)
-+  |   lwz CARG4, 8+WORD_HI(BASE)
-   |.if DUALNUM
--  |    lwz TMP2, 12(BASE)
-+  |    lwz TMP2, 8+WORD_LO(BASE)
-   |.else
-   |    lfd FARG2, 8(BASE)
-   |.endif
-@@ -1504,16 +1662,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |   la RA, -8(BASE)
-   |  cmplw TMP0, TMP2
-   |.if DUALNUM
--  |  stw TISNUM, 0(RA)
-+  |  stw TISNUM, WORD_HI(RA)
-   |   slwi TMP3, TMP2, 3
--  |  stw TMP2, 4(RA)
-+  |  stw TMP2, WORD_LO(RA)
-   |.else
-   |   slwi TMP3, TMP2, 3
-   |  stfd FARG2, 0(RA)
-   |.endif
-   |  ble >2				// Not in array part?
--  |  lwzx TMP2, TMP1, TMP3
--  |  lfdx f0, TMP1, TMP3
-+  |  lfdux f0, TMP1, TMP3
-+  |  lwz TMP2, WORD_HI(TMP1)
-   |1:
-   |  checknil TMP2
-   |   li RD, (0+1)*8
-@@ -1532,7 +1690,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmplwi CRET1, 0
-   |   li RD, (0+1)*8
-   |  beq ->fff_res
--  |  lwz TMP2, 0(CRET1)
-+  |  lwz TMP2, WORD_HI(CRET1)
-   |  lfd f0, 0(CRET1)
-   |  b <1
-   |
-@@ -1551,11 +1709,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  la RA, -8(BASE)
- #endif
-   |.if DUALNUM
--  |  stw TISNUM, 8(BASE)
-+  |  stw TISNUM, 8+WORD_HI(BASE)
-   |.else
--  |  stw ZERO, 8(BASE)
-+  |  stw ZERO, 8+WORD_HI(BASE)
-   |.endif
--  |   stw ZERO, 12(BASE)
-+  |   stw ZERO, 8+WORD_LO(BASE)
-   |  li RD, (3+1)*8
-   |  stfd f0, 0(RA)
-   |  b ->fff_res
-@@ -1576,7 +1734,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc xpcall
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, 8+WORD_HI(BASE)
-   |    lfd FARG2, 8(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  blt ->fff_fallback
-@@ -1673,7 +1831,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if resume
-   |  li TMP1, LJ_TTRUE
-   |   la RA, -8(BASE)
--  |  stw TMP1, -8(BASE)			// Prepend true to results.
-+  |  stw TMP1, WORD_HI-8(BASE)		// Prepend true to results.
-   |  addi RD, RD, 16
-   |.else
-   |  mr RA, BASE
-@@ -1693,7 +1851,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f0, 0(TMP3)
-   |   stp TMP3, L:SAVE0->top		// Remove error from coroutine stack.
-   |    li RD, (2+1)*8
--  |   stw TMP1, -8(BASE)		// Prepend false to results.
-+  |   stw TMP1, WORD_HI-8(BASE)		// Prepend false to results.
-   |    la RA, -8(BASE)
-   |  stfd f0, 0(BASE)			// Copy error message.
-   |  b <7
-@@ -1746,8 +1904,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |->fff_resi:
-   |  lwz PC, FRAME_PC(BASE)
-   |  la RA, -8(BASE)
--  |  stw TISNUM, -8(BASE)
--  |  stw CRET1, -4(BASE)
-+  |  stw TISNUM, WORD_HI-8(BASE)
-+  |  stw CRET1, WORD_LO-8(BASE)
-   |  b ->fff_res1
-   |1:
-   |  lus CARG3, 0x41e0	// 2^31.
-@@ -1762,9 +1920,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |->fff_restv:
-   |  // CARG3/CARG1 = TValue result.
-   |  lwz PC, FRAME_PC(BASE)
--  |   stw CARG3, -8(BASE)
-+  |   stw CARG3, WORD_HI-8(BASE)
-   |  la RA, -8(BASE)
--  |   stw CARG1, -4(BASE)
-+  |   stw CARG1, WORD_LO-8(BASE)
-   |->fff_res1:
-   |  // RA = results, PC = return.
-   |  li RD, (1+1)*8
-@@ -1782,10 +1940,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  ins_next1
-   |  // Adjust BASE. KBASE is assumed to be set for the calling frame.
-   |   sub BASE, RA, TMP0
-+  |   addi BASEP4, BASE, 4
-   |  ins_next2
-   |
-   |6:  // Fill up results with nil.
--  |  subi TMP1, RD, 8
-+  |  addi TMP1, RD, WORD_HI-8
-   |   addi RD, RD, 8
-   |  stwx TISNIL, RA, TMP1
-   |  b <5
-@@ -1898,7 +2057,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc math_log
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Need exactly 1 argument.
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1923,13 +2082,13 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if DUALNUM
-   |.ffunc math_ldexp
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |    lfd FARG1, 0(BASE)
--  |   lwz CARG4, 8(BASE)
-+  |   lwz CARG4, WORD_HI+8(BASE)
-   |.if GPR64
--  |    lwz CARG2, 12(BASE)
-+  |    lwz CARG2, WORD_LO+8(BASE)
-   |.else
--  |    lwz CARG1, 12(BASE)
-+  |    lwz CARG1, WORD_LO+8(BASE)
-   |.endif
-   |  blt ->fff_fallback
-   |  checknum CARG3; bge ->fff_fallback
-@@ -1961,8 +2120,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stfd FARG1, 0(RA)
-   |  li RD, (2+1)*8
-   |.if DUALNUM
--  |   stw TISNUM, 8(RA)
--  |   stw TMP1, 12(RA)
-+  |   stw TISNUM, WORD_HI+8(RA)
-+  |   stw TMP1, WORD_LO+8(RA)
-   |.else
-   |   stfd FARG2, 8(RA)
-   |.endif
-@@ -1989,9 +2148,9 @@ static void build_subroutines(BuildCtx *ctx)
-   |   add TMP2, BASE, NARGS8:RC
-   |  bne >4
-   |1:  // Handle integers.
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
--  |  lwz CARG2, 4(TMP1)
-+  |  lwz CARG2, WORD_LO(TMP1)
-   |   bge cr1, ->fff_resi
-   |  checknum CARG4
-   |   xoris TMP0, CARG1, 0x8000
-@@ -2020,7 +2179,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |   lfd FARG1, 0(BASE)
-   |  bge ->fff_fallback
-   |5:  // Handle numbers.
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
-   |  lfd FARG2, 0(TMP1)
-   |   bge cr1, ->fff_resn
-@@ -2035,7 +2194,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.endif
-   |  b <5
-   |7:  // Convert integer to number and continue above.
--  |   lwz CARG2, 4(TMP1)
-+  |   lwz CARG2, WORD_LO(TMP1)
-   |  bne ->fff_fallback
-   |  tonum_i FARG2, CARG2
-   |  b <6
-@@ -2043,7 +2202,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |  .ffunc_n name
-   |  li TMP1, 8
-   |1:
-+  |.if ENDIAN_LE
-+  |   add CARG2, BASE, TMP1
-+  |   lwz CARG2, WORD_HI(CARG2)
-+  |.else
-   |   lwzx CARG2, BASE, TMP1
-+  |.endif
-   |   lfdx FARG2, BASE, TMP1
-   |  cmplw cr1, TMP1, NARGS8:RC
-   |   checknum CARG2
-@@ -2067,8 +2231,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |.ffunc string_byte			// Only handle the 1-arg case here.
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz STR:CARG1, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz STR:CARG1, WORD_LO(BASE)
-   |  bne ->fff_fallback			// Need exactly 1 argument.
-   |   checkstr CARG3
-   |   bne ->fff_fallback
-@@ -2099,12 +2263,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc string_char			// Only handle the 1-arg case here.
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-   |.if DUALNUM
--  |    lwz TMP0, 4(BASE)
-+  |    lwz TMP0, WORD_LO(BASE)
-   |  bne ->fff_fallback			// Exactly 1 argument.
-   |  checknum CARG3; bne ->fff_fallback
--  |   la CARG2, 7(BASE)
-+  |   la CARG2, WORD_BLO(BASE)
-   |.else
-   |    lfd FARG1, 0(BASE)
-   |  bne ->fff_fallback			// Exactly 1 argument.
-@@ -2128,16 +2292,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |.ffunc string_sub
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 16
--  |   lwz CARG3, 16(BASE)
-+  |   lwz CARG3, WORD_HI+16(BASE)
-   |.if not DUALNUM
-   |    lfd f0, 16(BASE)
-   |.endif
--  |   lwz TMP0, 0(BASE)
--  |    lwz STR:CARG1, 4(BASE)
-+  |   lwz TMP0, WORD_HI(BASE)
-+  |    lwz STR:CARG1, WORD_LO(BASE)
-   |  blt ->fff_fallback
--  |   lwz CARG2, 8(BASE)
-+  |   lwz CARG2, WORD_HI+8(BASE)
-   |.if DUALNUM
--  |    lwz TMP1, 12(BASE)
-+  |    lwz TMP1, WORD_LO+8(BASE)
-   |.else
-   |    lfd f1, 8(BASE)
-   |.endif
-@@ -2145,7 +2309,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  beq >1
-   |.if DUALNUM
-   |  checknum CARG3
--  |   lwz TMP2, 20(BASE)
-+  |   lwz TMP2, WORD_LO+16(BASE)
-   |  bne ->fff_fallback
-   |1:
-   |  checknum CARG2; bne ->fff_fallback
-@@ -2201,8 +2365,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  .ffunc string_ .. name
-   |  ffgccheck
-   |  cmplwi NARGS8:RC, 8
--  |   lwz CARG3, 0(BASE)
--  |    lwz STR:CARG2, 4(BASE)
-+  |   lwz CARG3, WORD_HI(BASE)
-+  |    lwz STR:CARG2, WORD_LO(BASE)
-   |  blt ->fff_fallback
-   |  checkstr CARG3
-   |   la SBUF:CARG1, DISPATCH_GL(tmpbuf)(DISPATCH)
-@@ -2240,10 +2404,10 @@ static void build_subroutines(BuildCtx *ctx)
-   |  addi TMP1, BASE, 8
-   |  add TMP2, BASE, NARGS8:RC
-   |1:
--  |  lwz CARG4, 0(TMP1)
-+  |  lwz CARG4, WORD_HI(TMP1)
-   |   cmplw cr1, TMP1, TMP2
-   |.if DUALNUM
--  |  lwz CARG2, 4(TMP1)
-+  |  lwz CARG2, WORD_LO(TMP1)
-   |.else
-   |  lfd FARG1, 0(TMP1)
-   |.endif
-@@ -2344,20 +2508,23 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->fff_fallback:			// Call fast function fallback handler.
-   |  // BASE = new base, RB = CFUNC, RC = nargs*8
--  |  lp TMP3, CFUNC:RB->f
-+  |  lp FUNCREG, CFUNC:RB->f
-   |    add TMP1, BASE, NARGS8:RC
-   |   lwz PC, FRAME_PC(BASE)		// Fallback may overwrite PC.
-   |    addi TMP0, TMP1, 8*LUA_MINSTACK
-   |     lwz TMP2, L->maxstack
-   |   stw PC, SAVE_PC			// Redundant (but a defined value).
--  |  .toc lp TMP3, 0(TMP3)
-+  |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+  |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-+  |  .opd lp FUNCREG, 0(FUNCREG)
-   |  cmplw TMP0, TMP2
-   |     stp BASE, L->base
-   |    stp TMP1, L->top
-   |   mr CARG1, L
-   |  bgt >5				// Need to grow stack.
--  |  mtctr TMP3
-+  |  mtctr FUNCREG
-   |  bctrl				// (lua_State *L)
-+  |  .toc lp TOCREG, SAVE_TOC
-   |  // Either throws an error, or recovers and returns -1, 0 or nresults+1.
-   |  lp BASE, L->base
-   |  cmpwi CRET1, 0
-@@ -2459,6 +2626,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |3:
-   |  lp BASE, L->base
-   |4:  // Re-dispatch to static ins.
-+  |  addi BASEP4, BASE, 4
-   |  lwz INS, -4(PC)
-   |  decode_OPP TMP1, INS
-   |   decode_RB8 RB, INS
-@@ -2472,7 +2640,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |
-   |->cont_hook:				// Continue from hook yield.
-   |  addi PC, PC, 4
--  |  lwz MULTRES, -20(RB)		// Restore MULTRES for *M ins.
-+  |  lwz MULTRES, WORD_LO-24(RB)		// Restore MULTRES for *M ins.
-   |  b <4
-   |
-   |->vm_hotloop:			// Hot loop counter underflow.
-@@ -2514,6 +2682,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lp BASE, L->base
-   |   lp TMP0, L->top
-   |   stw ZERO, SAVE_PC			// Invalidate for subsequent line hook.
-+  |  addi BASEP4, BASE, 4
-   |  sub NARGS8:RC, TMP0, BASE
-   |  add RA, BASE, RA
-   |  lwz LFUNC:RB, FRAME_FUNC(BASE)
-@@ -2525,7 +2694,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |.if JIT
-   |  // RA = resultptr, RB = meta base
-   |  lwz INS, -4(PC)
--  |    lwz TRACE:TMP2, -20(RB)		// Save previous trace.
-+  |    lwz TRACE:TMP2, WORD_LO-24(RB)	// Save previous trace.
-   |   addic. TMP1, MULTRES, -8
-   |  decode_RA8 RC, INS			// Call base.
-   |   beq >2
-@@ -2560,10 +2729,16 @@ static void build_subroutines(BuildCtx *ctx)
-   |  mr CARG2, PC
-   |  bl extern lj_dispatch_stitch	// (jit_State *J, const BCIns *pc)
-   |  lp BASE, L->base
-+  |  addi BASEP4, BASE, 4
-   |  b ->cont_nop
-   |
-   |9:
-+  |.if ENDIAN_LE
-+  |  addi BASEP4, BASE, 4
-+  |  stwx TISNIL, BASEP4, RC
-+  |.else
-   |  stwx TISNIL, BASE, RC
-+  |.endif
-   |  addi RC, RC, 8
-   |  b <3
-   |.endif
-@@ -2578,6 +2753,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction.
-   |  lp BASE, L->base
-   |  subi PC, PC, 4
-+  |  addi BASEP4, BASE, 4
-   |  b ->cont_nop
- #endif
-   |
-@@ -2586,39 +2762,72 @@ static void build_subroutines(BuildCtx *ctx)
-   |//-----------------------------------------------------------------------
-   |
-   |.macro savex_, a, b, c, d
--  |  stfd f..a, 16+a*8(sp)
--  |  stfd f..b, 16+b*8(sp)
--  |  stfd f..c, 16+c*8(sp)
--  |  stfd f..d, 16+d*8(sp)
-+  |  stfd f..a, EXIT_OFFSET+a*8(sp)
-+  |  stfd f..b, EXIT_OFFSET+b*8(sp)
-+  |  stfd f..c, EXIT_OFFSET+c*8(sp)
-+  |  stfd f..d, EXIT_OFFSET+d*8(sp)
-+  |.endmacro
-+  |
-+  |.macro saver, a
-+  |  stp r..a, EXIT_OFFSET+32*8+a*PSIZE(sp)
-   |.endmacro
-   |
-   |->vm_exit_handler:
-   |.if JIT
--  |  addi sp, sp, -(16+32*8+32*4)
--  |  stmw r2, 16+32*8+2*4(sp)
-+  |  addi sp, TMP0, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-+  |  saver 3 // CARG1
-+  |  saver 4 // CARG2
-+  |  saver 5 // CARG3
-+  |  saver 17 // DISPATCH
-   |    addi DISPATCH, JGL, -GG_DISP2G-32768
-   |    li CARG2, ~LJ_VMST_EXIT
--  |   lwz CARG1, 16+32*8+32*4(sp)	// Get stack chain.
-+  |   lp CARG1, EXIT_OFFSET+32*8+32*PSIZE(sp)	// Get stack chain.
-   |    stw CARG2, DISPATCH_GL(vmstate)(DISPATCH)
-+  |  saver 2
-+  |  saver 6
-+  |  saver 7
-+  |  saver 8
-+  |  saver 9
-+  |  saver 10
-+  |  saver 11
-+  |  saver 12
-+  |  saver 13
-   |  savex_ 0,1,2,3
--  |   stw CARG1, 0(sp)			// Store extended stack chain.
--  |   clrso TMP1
-+  |   stp CARG1, 0(sp)			// Store extended stack chain.
-+
-   |  savex_ 4,5,6,7
--  |   addi CARG2, sp, 16+32*8+32*4	// Recompute original value of sp.
-+  |  saver 14
-+  |  saver 15
-+  |  saver 16
-+  |  saver 18
-+  |   addi CARG2, sp, EXIT_OFFSET+32*8+32*PSIZE	// Recompute original value of sp.
-   |  savex_ 8,9,10,11
--  |   stw CARG2, 16+32*8+1*4(sp)	// Store sp in RID_SP.
-+  |   stp CARG2, EXIT_OFFSET+32*8+1*PSIZE(sp)	// Store sp in RID_SP.
-   |  savex_ 12,13,14,15
-   |   mflr CARG3
-   |   li TMP1, 0
-   |  savex_ 16,17,18,19
--  |   stw TMP1, 16+32*8+0*4(sp)		// Clear RID_TMP.
-+  |   stw TMP1, EXIT_OFFSET+32*8+0*PSIZE(sp)		// Clear RID_TMP.
-   |  savex_ 20,21,22,23
-   |   lhz CARG4, 2(CARG3)		// Load trace number.
-   |  savex_ 24,25,26,27
-   |  lwz L, DISPATCH_GL(cur_L)(DISPATCH)
-   |  savex_ 28,29,30,31
-+  |  saver 19
-+  |  saver 20
-+  |  saver 21
-+  |  saver 22
-+  |  saver 23
-+  |  saver 24
-+  |  saver 25
-+  |  saver 26
-+  |  saver 27
-+  |  saver 28
-+  |  saver 29
-+  |  saver 30
-+  |  saver 31
-   |   sub CARG3, TMP0, CARG3		// Compute exit number.
--  |  lp BASE, DISPATCH_GL(jit_base)(DISPATCH)
-+  |  lwz BASE, DISPATCH_GL(jit_base)(DISPATCH)
-   |   srwi CARG3, CARG3, 2
-   |  stp L, DISPATCH_J(L)(DISPATCH)
-   |   subi CARG3, CARG3, 2
-@@ -2627,11 +2836,11 @@ static void build_subroutines(BuildCtx *ctx)
-   |  stw TMP1, DISPATCH_GL(jit_base)(DISPATCH)
-   |  addi CARG1, DISPATCH, GG_DISP2J
-   |   stw CARG3, DISPATCH_J(exitno)(DISPATCH)
--  |  addi CARG2, sp, 16
-+  |  addi CARG2, sp, EXIT_OFFSET
-   |  bl extern lj_trace_exit		// (jit_State *J, ExitState *ex)
-   |  // Returns MULTRES (unscaled) or negated error code.
-   |  lp TMP1, L->cframe
--  |  lwz TMP2, 0(sp)
-+  |  lp TMP2, 0(sp)
-   |   lp BASE, L->base
-   |.if GPR64
-   |  rldicr sp, TMP1, 0, 61
-@@ -2639,7 +2848,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  rlwinm sp, TMP1, 0, 0, 29
-   |.endif
-   |   lwz PC, SAVE_PC			// Get SAVE_PC.
--  |  stw TMP2, 0(sp)
-+  |  stp TMP2, 0(sp)
-   |  stw L, SAVE_L			// Set SAVE_L (on-trace resume/yield).
-   |  b >1
-   |.endif
-@@ -2660,7 +2869,12 @@ static void build_subroutines(BuildCtx *ctx)
-   |    stw TMP2, DISPATCH_GL(jit_base)(DISPATCH)
-   |  lwz KBASE, PC2PROTO(k)(TMP1)
-   |  // Setup type comparison constants.
-+  |.if P64
-+  |  lus TISNUM, LJ_TISNUM >> 16
-+  |  ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |  li TISNUM, LJ_TISNUM
-+  |.endif
-   |  lus TMP3, 0x59c0			// TOBIT = 2^52 + 2^51 (float).
-   |  stw TMP3, TMPD
-   |  li ZERO, 0
-@@ -2680,14 +2894,14 @@ static void build_subroutines(BuildCtx *ctx)
-   |   decode_RA8 RA, INS
-   |  lpx TMP0, DISPATCH, TMP1
-   |  mtctr TMP0
--  |  cmplwi TMP1, BC_FUNCF*4		// Function header?
-+  |  cmplwi TMP1, BC_FUNCF*PSIZE	// Function header?
-   |  bge >2
-   |   decode_RB8 RB, INS
-   |   decode_RD8 RD, INS
-   |   decode_RC8 RC, INS
-   |  bctr
-   |2:
--  |  cmplwi TMP1, (BC_FUNCC+2)*4	// Fast function?
-+  |  cmplwi TMP1, (BC_FUNCC+2)*PSIZE	// Fast function?
-   |  blt >3
-   |  // Check frame below fast function.
-   |  lwz TMP1, FRAME_PC(BASE)
-@@ -2697,7 +2911,7 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lwz TMP2, -4(TMP1)
-   |  decode_RA8 TMP0, TMP2
-   |  sub TMP1, BASE, TMP0
--  |  lwz LFUNC:TMP2, -12(TMP1)
-+  |  lwz LFUNC:TMP2, WORD_LO-16(TMP1)
-   |  lwz TMP1, LFUNC:TMP2->pc
-   |  lwz KBASE, PC2PROTO(k)(TMP1)
-   |3:
-@@ -2718,6 +2932,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |// NYI: Use internal implementations of floor, ceil, trunc.
-   |
-   |->vm_modi:
-+  |  li TMP1, 0
-+  |  mtxer TMP1
-   |  divwo. TMP0, CARG1, CARG2
-   |  bso >1
-   |.if GPR64
-@@ -2736,7 +2952,8 @@ static void build_subroutines(BuildCtx *ctx)
-   |  cmpwi CARG2, 0
-   |   li CARG1, 0
-   |  beqlr
--  |  clrso TMP0			// Clear SO for -2147483648 % -1 and return 0.
-+  |  // Clear SO for -2147483648 % -1 and return 0.
-+  |  crxor 4*cr0+so, 4*cr0+so, 4*cr0+so
-   |  blr
-   |
-   |//-----------------------------------------------------------------------
-@@ -2749,10 +2966,18 @@ static void build_subroutines(BuildCtx *ctx)
-   |->vm_cachesync:
-   |.if JIT or FFI
-   |  // Compute start of first cache line and number of cache lines.
-+  |  .if GPR64
-+  |  rldicr CARG1, CARG1, 0, 58
-+  |  .else
-   |  rlwinm CARG1, CARG1, 0, 0, 26
-+  |  .endif
-   |  sub CARG2, CARG2, CARG1
-   |  addi CARG2, CARG2, 31
-+  |  .if GPR64
-+  |  srdi. CARG2, CARG2, 5
-+  |  .else
-   |  rlwinm. CARG2, CARG2, 27, 5, 31
-+  |  .endif
-   |  beqlr
-   |  mtctr CARG2
-   |  mr CARG3, CARG1
-@@ -2774,39 +2999,70 @@ static void build_subroutines(BuildCtx *ctx)
-   |//-- FFI helper functions -----------------------------------------------
-   |//-----------------------------------------------------------------------
-   |
--  |// Handler for callback functions. Callback slot number in r11, g in r12.
-+  |// Handler for callback functions.
-+  |// 32-bit: Callback slot number in r12, g in r11.
-+  |// 64-bit v1: Callback slot number in bits 47+ of r11, g in 0-46, TOC in r2.
-+  |// 64-bit v2: Callback slot number in bits 2-11 of r12, g in r11,
-+  |// vm_ffi_callback in r2.
-   |->vm_ffi_callback:
-   |.if FFI
-   |.type CTSTATE, CTState, PC
-+  |  .if OPD
-+  |   rldicl r12, r11, 17, 47
-+  |   rldicl r11, r11, 0, 17
-+  |  .endif
-+  |  .if ELFV2
-+  |   rlwinm r12, r12, 30, 22, 31
-+  |   addisl TOCREG, TOCREG, extern .TOC.-lj_vm_ffi_callback@ha
-+  |   addil TOCREG, TOCREG, extern .TOC.-lj_vm_ffi_callback@l
-+  |  .endif
-   |  saveregs
--  |  lwz CTSTATE, GL:r12->ctype_state
--  |   addi DISPATCH, r12, GG_G2DISP
--  |  stw r11, CTSTATE->cb.slot
--  |  stw r3, CTSTATE->cb.gpr[0]
-+  |  lwz CTSTATE, GL:r11->ctype_state
-+  |   addi DISPATCH, r11, GG_G2DISP
-+  |  stw r12, CTSTATE->cb.slot
-+  |  stp r3, CTSTATE->cb.gpr[0]
-   |   stfd f1, CTSTATE->cb.fpr[0]
--  |  stw r4, CTSTATE->cb.gpr[1]
-+  |  stp r4, CTSTATE->cb.gpr[1]
-   |   stfd f2, CTSTATE->cb.fpr[1]
--  |  stw r5, CTSTATE->cb.gpr[2]
-+  |  stp r5, CTSTATE->cb.gpr[2]
-   |   stfd f3, CTSTATE->cb.fpr[2]
--  |  stw r6, CTSTATE->cb.gpr[3]
-+  |  stp r6, CTSTATE->cb.gpr[3]
-   |   stfd f4, CTSTATE->cb.fpr[3]
--  |  stw r7, CTSTATE->cb.gpr[4]
-+  |  stp r7, CTSTATE->cb.gpr[4]
-   |   stfd f5, CTSTATE->cb.fpr[4]
--  |  stw r8, CTSTATE->cb.gpr[5]
-+  |  stp r8, CTSTATE->cb.gpr[5]
-   |   stfd f6, CTSTATE->cb.fpr[5]
--  |  stw r9, CTSTATE->cb.gpr[6]
-+  |  stp r9, CTSTATE->cb.gpr[6]
-   |   stfd f7, CTSTATE->cb.fpr[6]
--  |  stw r10, CTSTATE->cb.gpr[7]
-+  |  stp r10, CTSTATE->cb.gpr[7]
-   |   stfd f8, CTSTATE->cb.fpr[7]
-+  |  .if GPR64
-+  |   stfd f9, CTSTATE->cb.fpr[8]
-+  |   stfd f10, CTSTATE->cb.fpr[9]
-+  |   stfd f11, CTSTATE->cb.fpr[10]
-+  |   stfd f12, CTSTATE->cb.fpr[11]
-+  |   stfd f13, CTSTATE->cb.fpr[12]
-+  |  .endif
-+  |  .if ELFV2
-+  |  addi TMP0, sp, CFRAME_SPACE+96
-+  |  .elif GPR64
-+  |  addi TMP0, sp, CFRAME_SPACE+112
-+  |  .else
-   |  addi TMP0, sp, CFRAME_SPACE+8
--  |  stw TMP0, CTSTATE->cb.stack
-+  |  .endif
-+  |  stp TMP0, CTSTATE->cb.stack
-   |   mr CARG1, CTSTATE
-   |  stw CTSTATE, SAVE_PC		// Any value outside of bytecode is ok.
-   |   mr CARG2, sp
-   |  bl extern lj_ccallback_enter	// (CTState *cts, void *cf)
-   |  // Returns lua_State *.
-   |  lp BASE, L:CRET1->base
-+  |.if P64
-+  |     lus TISNUM, LJ_TISNUM >> 16	// Setup type comparison constants.
-+  |     ori TISNUM, TISNUM, LJ_TISNUM & 0xffff
-+  |.else
-   |     li TISNUM, LJ_TISNUM		// Setup type comparison constants.
-+  |.endif
-   |  lp RC, L:CRET1->top
-   |     lus TMP3, 0x59c0		// TOBIT = 2^52 + 2^51 (float).
-   |     li ZERO, 0
-@@ -2835,9 +3091,21 @@ static void build_subroutines(BuildCtx *ctx)
-   |  mr CARG1, CTSTATE
-   |  mr CARG2, RA
-   |  bl extern lj_ccallback_leave	// (CTState *cts, TValue *o)
--  |  lwz CRET1, CTSTATE->cb.gpr[0]
-+  |  lp CRET1, CTSTATE->cb.gpr[0]
-   |  lfd FARG1, CTSTATE->cb.fpr[0]
--  |  lwz CRET2, CTSTATE->cb.gpr[1]
-+  |  lp CRET2, CTSTATE->cb.gpr[1]
-+  |  .if GPR64
-+  |    lfd FARG2, CTSTATE->cb.fpr[1]
-+  |  .else
-+  |    lp CARG3, CTSTATE->cb.gpr[2]
-+  |    lp CARG4, CTSTATE->cb.gpr[3]
-+  |  .endif
-+  |  .elfv2 lfd f3, CTSTATE->cb.fpr[2]
-+  |  .elfv2 lfd f4, CTSTATE->cb.fpr[3]
-+  |  .elfv2 lfd f5, CTSTATE->cb.fpr[4]
-+  |  .elfv2 lfd f6, CTSTATE->cb.fpr[5]
-+  |  .elfv2 lfd f7, CTSTATE->cb.fpr[6]
-+  |  .elfv2 lfd f8, CTSTATE->cb.fpr[7]
-   |  b ->vm_leave_unw
-   |.endif
-   |
-@@ -2850,23 +3118,46 @@ static void build_subroutines(BuildCtx *ctx)
-   |   lbz CARG2, CCSTATE->nsp
-   |   lbz CARG3, CCSTATE->nfpr
-   |  neg TMP1, TMP1
-+  |  .if GPR64
-+  |    std TMP0, 16(sp)
-+  |  .else
-   |    stw TMP0, 4(sp)
-+  |  .endif
-   |   cmpwi cr1, CARG3, 0
-   |  mr TMP2, sp
-   |   addic. CARG2, CARG2, -1
-+  |  .if GPR64
-+  |  stdux sp, sp, TMP1
-+  |  .else
-   |  stwux sp, sp, TMP1
-+  |  .endif
-   |   crnot 4*cr1+eq, 4*cr1+eq		// For vararg calls.
--  |  stw r14, -4(TMP2)
--  |  stw CCSTATE, -8(TMP2)
-+  |  .if GPR64
-+  |    std r14, -8(TMP2)
-+  |    std CCSTATE, -16(TMP2)
-+  |  .else
-+  |    stw r14, -4(TMP2)
-+  |    stw CCSTATE, -8(TMP2)
-+  |  .endif
-   |  mr r14, TMP2
-   |  la TMP1, CCSTATE->stack
-+  |  .if GPR64
-+  |   sldi CARG2, CARG2, 3
-+  |  .else
-   |   slwi CARG2, CARG2, 2
-+  |  .endif
-   |   blty >2
--  |  la TMP2, 8(sp)
-+  |  .if ELFV2
-+  |    la TMP2, 96(sp)
-+  |  .elif GPR64
-+  |    la TMP2, 112(sp)
-+  |  .else
-+  |    la TMP2, 8(sp)
-+  |  .endif
-   |1:
--  |  lwzx TMP0, TMP1, CARG2
--  |  stwx TMP0, TMP2, CARG2
--  |   addic. CARG2, CARG2, -4
-+  |  lpx TMP0, TMP1, CARG2
-+  |  stpx TMP0, TMP2, CARG2
-+  |   addic. CARG2, CARG2, -PSIZE
-   |  bge <1
-   |2:
-   |  bney cr1, >3
-@@ -2878,28 +3169,55 @@ static void build_subroutines(BuildCtx *ctx)
-   |  lfd f6, CCSTATE->fpr[5]
-   |  lfd f7, CCSTATE->fpr[6]
-   |  lfd f8, CCSTATE->fpr[7]
-+  |  .if GPR64
-+  |  lfd f9, CCSTATE->fpr[8]
-+  |  lfd f10, CCSTATE->fpr[9]
-+  |  lfd f11, CCSTATE->fpr[10]
-+  |  lfd f12, CCSTATE->fpr[11]
-+  |  lfd f13, CCSTATE->fpr[12]
-+  |  .endif
-   |3:
--  |   lp TMP0, CCSTATE->func
--  |  lwz CARG2, CCSTATE->gpr[1]
--  |  lwz CARG3, CCSTATE->gpr[2]
--  |  lwz CARG4, CCSTATE->gpr[3]
--  |  lwz CARG5, CCSTATE->gpr[4]
--  |   mtctr TMP0
--  |  lwz r8, CCSTATE->gpr[5]
--  |  lwz r9, CCSTATE->gpr[6]
--  |  lwz r10, CCSTATE->gpr[7]
--  |  lwz CARG1, CCSTATE->gpr[0]		// Do this last, since CCSTATE is CARG1.
-+  |  .toc std TOCREG, SAVE_TOC
-+  |   lp FUNCREG, CCSTATE->func
-+  |  lp CARG2, CCSTATE->gpr[1]
-+  |  lp CARG3, CCSTATE->gpr[2]
-+  |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+  |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-+  |  .opd lp FUNCREG, 0(FUNCREG)
-+  |  lp CARG4, CCSTATE->gpr[3]
-+  |  lp CARG5, CCSTATE->gpr[4]
-+  |   mtctr FUNCREG
-+  |  lp r8, CCSTATE->gpr[5]
-+  |  lp r9, CCSTATE->gpr[6]
-+  |  lp r10, CCSTATE->gpr[7]
-+  |  lp CARG1, CCSTATE->gpr[0]		// Do this last, since CCSTATE is CARG1.
-   |   bctrl
--  |  lwz CCSTATE:TMP1, -8(r14)
--  |  lwz TMP2, -4(r14)
-+  |   .toc lp TOCREG, SAVE_TOC
-+  |  .if GPR64
-+  |   ld CCSTATE:TMP1, -16(r14)
-+  |   ld TMP2, -8(r14)
-+  |   ld TMP0, 16(r14)
-+  |  .else
-+  |   lwz CCSTATE:TMP1, -8(r14)
-+  |   lwz TMP2, -4(r14)
-   |   lwz TMP0, 4(r14)
--  |  stw CARG1, CCSTATE:TMP1->gpr[0]
-+  |  .endif
-+  |  stp CARG1, CCSTATE:TMP1->gpr[0]
-   |  stfd FARG1, CCSTATE:TMP1->fpr[0]
--  |  stw CARG2, CCSTATE:TMP1->gpr[1]
-+  |  stp CARG2, CCSTATE:TMP1->gpr[1]
-+  |  .if GPR64
-+  |   stfd FARG2, CCSTATE:TMP1->fpr[1]
-+  |  .endif
-+  |  .elfv2 stfd FARG3, CCSTATE:TMP1->fpr[2]
-+  |  .elfv2 stfd FARG4, CCSTATE:TMP1->fpr[3]
-+  |  .elfv2 stfd FARG5, CCSTATE:TMP1->fpr[4]
-+  |  .elfv2 stfd FARG6, CCSTATE:TMP1->fpr[5]
-+  |  .elfv2 stfd FARG7, CCSTATE:TMP1->fpr[6]
-+  |  .elfv2 stfd FARG8, CCSTATE:TMP1->fpr[7]
-   |   mtlr TMP0
--  |  stw CARG3, CCSTATE:TMP1->gpr[2]
-+  |  stp CARG3, CCSTATE:TMP1->gpr[2]
-   |   mr sp, r14
--  |  stw CARG4, CCSTATE:TMP1->gpr[3]
-+  |  stp CARG4, CCSTATE:TMP1->gpr[3]
-   |   mr r14, TMP2
-   |  blr
-   |.endif
-@@ -2923,13 +3241,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
-     |  // RA = src1*8, RD = src2*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, BASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  lwzx TMP1, BASE_HI, RD
-     |    lwz TMP2, -4(PC)
-     |  checknum cr0, TMP0
--    |   lwz CARG3, 4(RD)
-+    |   lwzx CARG3, BASE_LO, RD
-     |    decode_RD4 TMP2, TMP2
-     |  checknum cr1, TMP1
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-@@ -2953,7 +3271,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |7:  // RA is not an integer.
-     |  bgt cr0, ->vmeta_comp
-     |  // RA is a number.
--    |   lfd f0, 0(RA)
-+    |   lfdx f0, BASE, RA
-     |  bgt cr1, ->vmeta_comp
-     |  blt cr1, >4
-     |  // RA is a number, RD is an integer.
-@@ -2965,7 +3283,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // RA is an integer, RD is a number.
-     |  tonum_i f0, CARG2
-     |4:
--    |  lfd f1, 0(RD)
-+    |  lfdx f1, BASE, RD
-     |5:
-     |  fcmpu cr0, f0, f1
-     if (op == BC_ISLT) {
-@@ -2981,10 +3299,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |  b <1
-     |.else
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
-     |   lfdx f0, BASE, RA
--    |  lwzx TMP1, BASE, RD
-+    |  lwzx TMP1, BASE_HI, RD
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |   lfdx f1, BASE, RD
-@@ -3015,15 +3333,23 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = op == BC_ISEQV;
-     |  // RA = src1*8, RD = src2*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, BASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  .if ENDIAN_LE
-+    |    lwzx TMP1, BASE_HI, RD
-+    |  .else
-+    |    lwzux TMP1, RD, BASE_HI
-+    |  .endif
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |  checknum cr1, TMP1
-     |    decode_RD4 TMP2, TMP2
--    |   lwz CARG3, 4(RD)
-+    |  .if ENDIAN_LE
-+    |   lwzux CARG3, RD, BASE_LO
-+    |  .else
-+    |   lwz CARG3, WORD_LO(RD)
-+    |  .endif
-     |  cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     if (vk) {
-@@ -3032,14 +3358,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |  ble cr7, ->BC_ISNEN_Z
-     }
-     |.else
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |   lwz TMP2, 0(PC)
--    |    lfd f0, 0(RA)
-+    |    lfdx f0, BASE, RA
-     |   addi PC, PC, 4
--    |  lwzux TMP1, RD, BASE
-+    |  lwzx TMP1, BASE_HI, RD
-     |  checknum cr0, TMP0
-     |   decode_RD4 TMP2, TMP2
--    |    lfd f1, 0(RD)
-+    |    lfdx f1, BASE, RD
-     |  checknum cr1, TMP1
-     |   addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     |  bge cr0, >5
-@@ -3057,8 +3383,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.endif
-     |5:  // Either or both types are not numbers.
-     |.if not DUALNUM
--    |    lwz CARG2, 4(RA)
--    |    lwz CARG3, 4(RD)
-+    |    lwzx CARG2, BASE_LO, RA
-+    |    lwzx CARG3, BASE_LO, RD
-     |.endif
-     |.if FFI
-     |  cmpwi cr7, TMP0, LJ_TCDATA
-@@ -3074,10 +3400,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.if FFI
-     |  beq cr7, ->vmeta_equal_cd
-     |.endif
-+    |.if P64
-+    |   cmplwi cr7, TMP3, ~LJ_TUDATA		// Avoid 64 bit lightuserdata.
-+    |.endif
-     |    cmplw cr5, CARG2, CARG3
-     |  crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt	// 2: Same type and primitive.
-     |  crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq	// 1: Same tv or different type.
-     |  crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq	// 0: Same type and same tv.
-+    |.if P64
-+    |   cror 4*cr6+lt, 4*cr6+lt, 4*cr7+gt
-+    |.endif
-     |   mr SAVE0, PC
-     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt	// 0 or 2.
-     |  cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt	// 1 or 2.
-@@ -3116,9 +3448,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISEQS: case BC_ISNES:
-     vk = op == BC_ISEQS;
-     |  // RA = src*8, RD = str_const*8 (~), JMP with RD = target
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |   srwi RD, RD, 1
--    |  lwz STR:TMP3, 4(RA)
-+    |  lwzx STR:TMP3, BASE_LO, RA
-     |    lwz TMP2, 0(PC)
-     |   subfic RD, RD, -4
-     |    addi PC, PC, 4
-@@ -3150,15 +3482,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = op == BC_ISEQN;
-     |  // RA = src*8, RD = num_const*8, JMP with RD = target
-     |.if DUALNUM
--    |  lwzux TMP0, RA, BASE
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
--    |   lwz CARG2, 4(RA)
--    |  lwzux TMP1, RD, KBASE
-+    |   lwzx CARG2, BASE_LO, RA
-+    |  lwzux2 TMP1, CARG3, RD, KBASE
-     |  checknum cr0, TMP0
-     |    lwz TMP2, -4(PC)
-     |  checknum cr1, TMP1
-     |    decode_RD4 TMP2, TMP2
--    |   lwz CARG3, 4(RD)
-     |    addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
-     if (vk) {
-       |->BC_ISEQN_Z:
-@@ -3175,7 +3506,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     } else {
-       |->BC_ISNEN_Z:  // Dummy label.
-     }
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |    addi PC, PC, 4
-     |   lfdx f0, BASE, RA
-     |    lwz TMP2, -4(PC)
-@@ -3213,7 +3544,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |7:  // RA is not an integer.
-     |  bge cr0, <3
-     |  // RA is a number.
--    |   lfd f0, 0(RA)
-+    |   lfdx f0, BASE, RA
-     |  blt cr1, >1
-     |  // RA is a number, RD is an integer.
-     |  tonum_i f1, CARG3
-@@ -3232,7 +3563,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISEQP: case BC_ISNEP:
-     vk = op == BC_ISEQP;
-     |  // RA = src*8, RD = primitive_type*8 (~), JMP with RD = target
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |   srwi TMP1, RD, 3
-     |    lwz TMP2, 0(PC)
-     |   not TMP1, TMP1
-@@ -3262,7 +3593,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF:
-     |  // RA = dst*8 or unused, RD = src*8, JMP with RD = target
--    |  lwzx TMP0, BASE, RD
-+    |  lwzx TMP0, BASE_HI, RD
-     |   lwz INS, 0(PC)
-     |   addi PC, PC, 4
-     if (op == BC_IST || op == BC_ISF) {
-@@ -3297,7 +3628,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_ISTYPE:
-     |  // RA = src*8, RD = -type*8
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |  srwi TMP1, RD, 3
-     |  ins_next1
-     |.if not PPE and not GPR64
-@@ -3311,7 +3642,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_ISNUM:
-     |  // RA = src*8, RD = -(TISNUM-1)*8
--    |  lwzx TMP0, BASE, RA
-+    |  lwzx TMP0, BASE_HI, RA
-     |  ins_next1
-     |  checknum TMP0
-     |  bge ->vmeta_istype
-@@ -3330,17 +3661,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_NOT:
-     |  // RA = dst*8, RD = src*8
-     |  ins_next1
--    |  lwzx TMP0, BASE, RD
-+    |  lwzx TMP0, BASE_HI, RD
-     |  .gpr64 extsw TMP0, TMP0
-     |  subfic TMP1, TMP0, LJ_TTRUE
-     |  adde TMP0, TMP0, TMP1
--    |  stwx TMP0, BASE, RA
-+    |  stwx TMP0, BASE_HI, RA
-     |  ins_next2
-     break;
-   case BC_UNM:
-     |  // RA = dst*8, RD = src*8
--    |  lwzux TMP1, RD, BASE
--    |   lwz TMP0, 4(RD)
-+    |  lwzx TMP1, BASE_HI, RD
-+    |   lwzx TMP0, BASE_LO, RD
-+    |.if DUALNUM and not GPR64
-+    |  mtxer ZERO
-+    |.endif
-     |  checknum TMP1
-     |.if DUALNUM
-     |  bne >5
-@@ -3352,18 +3686,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |.else
-     |  nego. TMP0, TMP0
-     |  bso >4
--    |1:
-     |.endif
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |   stw TMP0, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |   stwx TMP0, BASE_LO, RA
-     |3:
-     |  ins_next2
-     |4:
--    |.if not GPR64
--    |  // Potential overflow.
--    |  checkov TMP1, <1			// Ignore unrelated overflow.
--    |.endif
-     |  lus TMP1, 0x41e0			// 2^31.
-     |  li TMP0, 0
-     |  b >7
-@@ -3373,8 +3702,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  xoris TMP1, TMP1, 0x8000
-     |7:
-     |  ins_next1
--    |  stwux TMP1, RA, BASE
--    |   stw TMP0, 4(RA)
-+    |  stwx TMP1, BASE_HI, RA
-+    |   stwx TMP0, BASE_LO, RA
-     |.if DUALNUM
-     |  b <3
-     |.else
-@@ -3383,15 +3712,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_LEN:
-     |  // RA = dst*8, RD = src*8
--    |  lwzux TMP0, RD, BASE
--    |   lwz CARG1, 4(RD)
-+    |  lwzx TMP0, BASE_HI, RD
-+    |   lwzx CARG1, BASE_LO, RD
-     |  checkstr TMP0; bne >2
-     |  lwz CRET1, STR:CARG1->len
-     |1:
-     |.if DUALNUM
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |   stw CRET1, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |   stwx CRET1, BASE_LO, RA
-     |.else
-     |  tonum_u f0, CRET1		// Result is a non-negative integer.
-     |  ins_next1
-@@ -3426,9 +3755,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
-     ||switch (vk) {
-     ||case 0:
--    |   lwzx TMP1, BASE, RB
-+    |   .if ENDIAN_LE and DUALNUM
-+    |     addi TMP2, RC, 4
-+    |   .endif
-+    |   lwzx TMP1, BASE_HI, RB
-     |   .if DUALNUM
--    |     lwzx TMP2, KBASE, RC
-+    |     .if ENDIAN_LE
-+    |       lwzx TMP2, KBASE, TMP2
-+    |     .else
-+    |       lwzx TMP2, KBASE, RC
-+    |     .endif
-     |   .endif
-     |    lfdx f14, BASE, RB
-     |    lfdx f15, KBASE, RC
-@@ -3442,9 +3778,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   .endif
-     ||  break;
-     ||case 1:
--    |   lwzx TMP1, BASE, RB
-+    |   .if ENDIAN_LE and DUALNUM
-+    |     addi TMP2, RC, 4
-+    |   .endif
-+    |   lwzx TMP1, BASE_HI, RB
-     |   .if DUALNUM
--    |     lwzx TMP2, KBASE, RC
-+    |     .if ENDIAN_LE
-+    |       lwzx TMP2, KBASE, TMP2
-+    |     .else
-+    |       lwzx TMP2, KBASE, RC
-+    |     .endif
-     |   .endif
-     |    lfdx f15, BASE, RB
-     |    lfdx f14, KBASE, RC
-@@ -3458,8 +3801,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   .endif
-     ||  break;
-     ||default:
--    |   lwzx TMP1, BASE, RB
--    |   lwzx TMP2, BASE, RC
-+    |   lwzx TMP1, BASE_HI, RB
-+    |   lwzx TMP2, BASE_HI, RC
-     |    lfdx f14, BASE, RB
-     |    lfdx f15, BASE, RC
-     |   checknum cr0, TMP1
-@@ -3514,41 +3857,62 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
-     ||switch (vk) {
-     ||case 0:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, KBASE
--    |    lwz CARG1, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG2, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzux CARG2, RC, KBASE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |      lwz TMP2, 4(RC)
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG1, RB, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, KBASE
-+    |      lwz CARG1, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG2, 4(RC)
-+    |   .endif
-     ||  break;
-     ||case 1:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, KBASE
--    |    lwz CARG2, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG1, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzux CARG1, RC, KBASE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |      lwz TMP2, 4(RC)
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG2, RB, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, KBASE
-+    |      lwz CARG2, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG1, 4(RC)
-+    |   .endif
-     ||  break;
-     ||default:
--    |   lwzux TMP1, RB, BASE
--    |   lwzux TMP2, RC, BASE
--    |    lwz CARG1, 4(RB)
--    |   checknum cr0, TMP1
--    |    lwz CARG2, 4(RC)
-+    |   .if ENDIAN_LE
-+    |     lwzx TMP1, RB, BASE_HI
-+    |     lwzx TMP2, RC, BASE_HI
-+    |      lwzux CARG1, RB, BASE
-+    |     checknum cr0, TMP1
-+    |      lwzux CARG2, RC, BASE
-+    |   .else
-+    |     lwzux TMP1, RB, BASE
-+    |     lwzux TMP2, RC, BASE
-+    |      lwz CARG1, 4(RB)
-+    |     checknum cr0, TMP1
-+    |      lwz CARG2, 4(RC)
-+    |   .endif
-     ||  break;
-     ||}
-+    |  mtxer ZERO
-     |  checknum cr1, TMP2
-     |  bne >5
-     |  bne cr1, >5
-     |  intins CARG1, CARG1, CARG2
--    |  bso >4
--    |1:
-+    |  ins_arithfallback bso
-     |  ins_next1
--    |  stwux TISNUM, RA, BASE
--    |  stw CARG1, 4(RA)
-+    |  stwx TISNUM, BASE_HI, RA
-+    |  stwx CARG1, BASE_LO, RA
-     |2:
-     |  ins_next2
--    |4:  // Overflow.
--    |  checkov TMP0, <1			// Ignore unrelated overflow.
--    |  ins_arithfallback b
-     |5:  // FP variant.
-     ||if (vk == 1) {
-     |  lfd f15, 0(RB)
-@@ -3620,9 +3984,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_POW:
-     |  // NYI: (partial) integer arithmetic.
--    |  lwzx TMP1, BASE, RB
-+    |  lwzx TMP1, BASE_HI, RB
-     |   lfdx FARG1, BASE, RB
--    |  lwzx TMP2, BASE, RC
-+    |  lwzx TMP2, BASE_HI, RC
-     |   lfdx FARG2, BASE, RC
-     |  checknum cr0, TMP1
-     |  checknum cr1, TMP2
-@@ -3648,6 +4012,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Returns NULL (finished) or TValue * (metamethod).
-     |  cmplwi CRET1, 0
-     |   lp BASE, L->base
-+    |   addi BASEP4, BASE, 4
-     |  bne ->vmeta_binop
-     |  ins_next1
-     |  lfdx f0, BASE, SAVE0		// Copy result from RB to RA.
-@@ -3664,8 +4029,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  ins_next1
-     |  lwzx TMP0, KBASE, TMP1		// KBASE-4-str_const*4
-     |  li TMP2, LJ_TSTR
--    |  stwux TMP2, RA, BASE
--    |  stw TMP0, 4(RA)
-+    |  stwx TMP2, BASE_HI, RA
-+    |  stwx TMP0, BASE_LO, RA
-     |  ins_next2
-     break;
-   case BC_KCDATA:
-@@ -3676,8 +4041,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  ins_next1
-     |  lwzx TMP0, KBASE, TMP1		// KBASE-4-cdata_const*4
-     |  li TMP2, LJ_TCDATA
--    |  stwux TMP2, RA, BASE
--    |  stw TMP0, 4(RA)
-+    |  stwx TMP2, BASE_HI, RA
-+    |  stwx TMP0, BASE_LO, RA
-     |  ins_next2
-     |.endif
-     break;
-@@ -3687,14 +4052,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  slwi RD, RD, 13
-     |  srawi RD, RD, 16
-     |  ins_next1
--    |   stwux TISNUM, RA, BASE
--    |   stw RD, 4(RA)
-+    |   stwx TISNUM, BASE_HI, RA
-+    |   stwx RD, BASE_LO, RA
-     |  ins_next2
-     |.else
-     |  // The soft-float approach is faster.
-     |  slwi RD, RD, 13
-     |  srawi TMP1, RD, 31
-     |  xor TMP2, TMP1, RD
-+    |  .gpr64 extsw RD, RD
-     |  sub TMP2, TMP2, TMP1		// TMP2 = abs(x)
-     |  cntlzw TMP3, TMP2
-     |  subfic TMP1, TMP3, 0x40d		// TMP1 = exponent-1
-@@ -3706,8 +4072,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   add RD, RD, TMP1		// hi = hi + exponent-1
-     |    and RD, RD, TMP0		// hi = x == 0 ? 0 : hi
-     |  ins_next1
--    |    stwux RD, RA, BASE
--    |    stw ZERO, 4(RA)
-+    |    stwx RD, BASE_HI, RA
-+    |    stwx ZERO, BASE_LO, RA
-     |  ins_next2
-     |.endif
-     break;
-@@ -3723,15 +4089,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  srwi TMP1, RD, 3
-     |  not TMP0, TMP1
-     |  ins_next1
--    |  stwx TMP0, BASE, RA
-+    |  stwx TMP0, BASE_HI, RA
-     |  ins_next2
-     break;
-   case BC_KNIL:
-     |  // RA = base*8, RD = end*8
--    |  stwx TISNIL, BASE, RA
-+    |  stwx TISNIL, BASE_HI, RA
-     |   addi RA, RA, 8
-     |1:
--    |  stwx TISNIL, BASE, RA
-+    |  stwx TISNIL, BASE_HI, RA
-     |  cmpw RA, RD
-     |   addi RA, RA, 8
-     |  blt <1
-@@ -3763,10 +4129,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   lwz CARG2, UPVAL:RB->v
-     |  andix. TMP3, TMP3, LJ_GC_BLACK	// isblack(uv)
-     |    lbz TMP0, UPVAL:RB->closed
--    |   lwz TMP2, 0(RD)
-+    |   lwz TMP2, WORD_HI(RD)
-     |   stfd f0, 0(CARG2)
-     |    cmplwi cr1, TMP0, 0
--    |   lwz TMP1, 4(RD)
-+    |   lwz TMP1, WORD_LO(RD)
-     |  cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
-     |   subi TMP2, TMP2, (LJ_TNUMX+1)
-     |  bne >2				// Upvalue is closed and black?
-@@ -3799,8 +4165,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   lbz TMP3, STR:TMP1->marked
-     |   lbz TMP2, UPVAL:RB->closed
-     |   li TMP0, LJ_TSTR
--    |   stw STR:TMP1, 4(CARG2)
--    |   stw TMP0, 0(CARG2)
-+    |   stw STR:TMP1, WORD_LO(CARG2)
-+    |   stw TMP0, WORD_HI(CARG2)
-     |  bne >2
-     |1:
-     |  ins_next
-@@ -3837,7 +4203,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  lwzx UPVAL:RB, LFUNC:RB, RA
-     |  ins_next1
-     |  lwz TMP1, UPVAL:RB->v
--    |  stw TMP0, 0(TMP1)
-+    |  stw TMP0, WORD_HI(TMP1)
-     |  ins_next2
-     break;
- 
-@@ -3852,6 +4218,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   add CARG2, BASE, RA
-     |  bl extern lj_func_closeuv	// (lua_State *L, TValue *level)
-     |  lp BASE, L->base
-+    |  addi BASEP4, BASE, 4
-     |1:
-     |  ins_next
-     break;
-@@ -3870,8 +4237,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Returns GCfuncL *.
-     |  lp BASE, L->base
-     |   li TMP0, LJ_TFUNC
--    |  stwux TMP0, RA, BASE
--    |  stw LFUNC:CRET1, 4(RA)
-+    |  addi BASEP4, BASE, 4
-+    |  stwx TMP0, BASE_HI, RA
-+    |  stwx LFUNC:CRET1, BASE_LO, RA
-     |  ins_next
-     break;
- 
-@@ -3904,8 +4272,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |  lp BASE, L->base
-     |   li TMP0, LJ_TTAB
--    |  stwux TMP0, RA, BASE
--    |  stw TAB:CRET1, 4(RA)
-+    |  addi BASEP4, BASE, 4
-+    |  stwx TMP0, BASE_HI, RA
-+    |  stwx TAB:CRET1, BASE_LO, RA
-     |  ins_next
-     if (op == BC_TNEW) {
-       |3:
-@@ -3938,13 +4307,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_TGETV:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  lwzux CARG1, RB, BASE
--    |  lwzux CARG2, RC, BASE
--    |   lwz TAB:RB, 4(RB)
-+    |  lwzx CARG1, BASE_HI, RB
-+    |  lwzx CARG2, BASE_HI, RC
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |.if DUALNUM
--    |   lwz RC, 4(RC)
-+    |   lwzx RC, BASE_LO, RC
-     |.else
--    |   lfd f0, 0(RC)
-+    |   lfdx f0, BASE, RC
-     |.endif
-     |  checktab CARG1
-     |   checknum cr1, CARG2
-@@ -3971,8 +4340,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   slwi TMP2, TMP2, 3
-     |.endif
-     |  ble ->vmeta_tgetv		// Integer key and in array part?
--    |  lwzx TMP0, TMP1, TMP2
--    |   lfdx f14, TMP1, TMP2
-+    |  .if ENDIAN_LE
-+    |    lfdux f14, TMP1, TMP2
-+    |    lwz TMP0, WORD_HI(TMP1)
-+    |  .else
-+    |    lwzx TMP0, TMP1, TMP2
-+    |    lfdx f14, TMP1, TMP2
-+    |  .endif
-     |  checknil TMP0; beq >2
-     |1:
-     |  ins_next1
-@@ -3991,15 +4365,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:
-     |  checkstr CARG2; bne ->vmeta_tgetv
-     |.if not DUALNUM
--    |  lwz STR:RC, 4(RC)
-+    |  lwzx STR:RC, BASE_LO, RC
-     |.endif
-     |  b ->BC_TGETS_Z			// String key?
-     break;
-   case BC_TGETS:
-     |  // RA = dst*8, RB = table*8, RC = str_const*8 (~)
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP1, RC, 1
--    |    lwz TAB:RB, 4(RB)
-+    |    lwzx TAB:RB, BASE_LO, RB
-     |   subfic TMP1, TMP1, -4
-     |  checktab CARG1
-     |   lwzx STR:RC, KBASE, TMP1	// KBASE-4-str_const*4
-@@ -4015,16 +4389,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  sub TMP1, TMP0, TMP1
-     |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-     |1:
--    |  lwz CARG1, NODE:TMP2->key
--    |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--    |    lwz CARG2, NODE:TMP2->val
--    |     lwz TMP1, 4+offsetof(Node, val)(NODE:TMP2)
-+    |  lwz CARG1, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+    |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+    |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-+    |     lwz TMP1, WORD_LO+offsetof(Node, val)(NODE:TMP2)
-     |  checkstr CARG1; bne >4
-     |   cmpw TMP0, STR:RC; bne >4
-     |    checknil CARG2; beq >5		// Key found, but nil value?
-     |3:
--    |    stwux CARG2, RA, BASE
--    |     stw TMP1, 4(RA)
-+    |    stwx CARG2, BASE_HI, RA
-+    |     stwx TMP1, BASE_LO, RA
-     |  ins_next
-     |
-     |4:  // Follow hash chain.
-@@ -4045,15 +4419,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TGETB:
-     |  // RA = dst*8, RB = table*8, RC = index*8
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP0, RC, 3
--    |   lwz TAB:RB, 4(RB)
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |  checktab CARG1; bne ->vmeta_tgetb
-     |  lwz TMP1, TAB:RB->asize
-     |   lwz TMP2, TAB:RB->array
-     |  cmplw TMP0, TMP1; bge ->vmeta_tgetb
--    |  lwzx TMP1, TMP2, RC
--    |   lfdx f0, TMP2, RC
-+    |  .if ENDIAN_LE
-+    |    lfdux f0, TMP2, RC
-+    |    lwz TMP1, WORD_HI(TMP2)
-+    |  .else
-+    |    lwzx TMP1, TMP2, RC
-+    |    lfdx f0, TMP2, RC
-+    |  .endif
-     |  checknil TMP1; beq >5
-     |1:
-     |  ins_next1
-@@ -4071,12 +4450,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TGETR:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  add RB, BASE, RB
--    |  lwz TAB:CARG1, 4(RB)
-+    |  lwzx TAB:CARG1, BASE_LO, RB
-     |.if DUALNUM
--    |  add RC, BASE, RC
-     |  lwz TMP0, TAB:CARG1->asize
--    |  lwz CARG2, 4(RC)
-+    |  lwzx CARG2, BASE_LO, RC
-     |   lwz TMP1, TAB:CARG1->array
-     |.else
-     |  lfdx f0, BASE, RC
-@@ -4096,13 +4473,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- 
-   case BC_TSETV:
-     |  // RA = src*8, RB = table*8, RC = key*8
--    |  lwzux CARG1, RB, BASE
--    |  lwzux CARG2, RC, BASE
--    |   lwz TAB:RB, 4(RB)
-+    |  lwzx CARG1, BASE_HI, RB
-+    |  lwzx CARG2, BASE_HI, RC
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |.if DUALNUM
--    |   lwz RC, 4(RC)
-+    |   lwzx RC, BASE_LO, RC
-     |.else
--    |   lfd f0, 0(RC)
-+    |   lfdx f0, BASE, RC
-     |.endif
-     |  checktab CARG1
-     |   checknum cr1, CARG2
-@@ -4129,7 +4506,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   slwi TMP0, TMP2, 3
-     |.endif
-     |  ble ->vmeta_tsetv		// Integer key and in array part?
-+    |  .if ENDIAN_LE
-+    |   addi TMP2, TMP1, 4
-+    |   lwzx TMP2, TMP2, TMP0
-+    |  .else
-     |   lwzx TMP2, TMP1, TMP0
-+    |  .endif
-     |  lbz TMP3, TAB:RB->marked
-     |    lfdx f14, BASE, RA
-     |   checknil TMP2; beq >3
-@@ -4152,7 +4534,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:
-     |  checkstr CARG2; bne ->vmeta_tsetv
-     |.if not DUALNUM
--    |  lwz STR:RC, 4(RC)
-+    |  lwzx STR:RC, BASE_LO, RC
-     |.endif
-     |  b ->BC_TSETS_Z			// String key?
-     |
-@@ -4162,9 +4544,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETS:
-     |  // RA = src*8, RB = table*8, RC = str_const*8 (~)
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP1, RC, 1
--    |    lwz TAB:RB, 4(RB)
-+    |    lwzx TAB:RB, BASE_LO, RB
-     |   subfic TMP1, TMP1, -4
-     |  checktab CARG1
-     |   lwzx STR:RC, KBASE, TMP1	// KBASE-4-str_const*4
-@@ -4183,9 +4565,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    lbz TMP3, TAB:RB->marked
-     |  add NODE:TMP2, NODE:TMP2, TMP1	// node = tab->node + (idx*32-idx*8)
-     |1:
--    |  lwz CARG1, NODE:TMP2->key
--    |   lwz TMP0, 4+offsetof(Node, key)(NODE:TMP2)
--    |    lwz CARG2, NODE:TMP2->val
-+    |  lwz CARG1, WORD_HI+offsetof(Node, key)(NODE:TMP2)
-+    |   lwz TMP0, WORD_LO+offsetof(Node, key)(NODE:TMP2)
-+    |    lwz CARG2, WORD_HI+offsetof(Node, val)(NODE:TMP2)
-     |     lwz NODE:TMP1, NODE:TMP2->next
-     |  checkstr CARG1; bne >5
-     |   cmpw TMP0, STR:RC; bne >5
-@@ -4225,13 +4607,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  beq ->vmeta_tsets		// 'no __newindex' flag NOT set: check.
-     |6:
-     |  li TMP0, LJ_TSTR
--    |   stw STR:RC, 4(CARG3)
-+    |   stw STR:RC, WORD_LO(CARG3)
-     |   mr CARG2, TAB:RB
--    |  stw TMP0, 0(CARG3)
-+    |  stw TMP0, WORD_HI(CARG3)
-     |  bl extern lj_tab_newkey		// (lua_State *L, GCtab *t, TValue *k)
-     |  // Returns TValue *.
-     |  lp BASE, L->base
-     |  stfd f14, 0(CRET1)
-+    |   addi BASEP4, BASE, 4
-     |  b <3				// No 2nd write barrier needed.
-     |
-     |7:  // Possible table write barrier for the value. Skip valiswhite check.
-@@ -4240,9 +4623,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETB:
-     |  // RA = src*8, RB = table*8, RC = index*8
--    |  lwzux CARG1, RB, BASE
-+    |  lwzx CARG1, BASE_HI, RB
-     |   srwi TMP0, RC, 3
--    |   lwz TAB:RB, 4(RB)
-+    |   lwzx TAB:RB, BASE_LO, RB
-     |  checktab CARG1; bne ->vmeta_tsetb
-     |  lwz TMP1, TAB:RB->asize
-     |   lwz TMP2, TAB:RB->array
-@@ -4250,7 +4633,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  cmplw TMP0, TMP1
-     |   lfdx f14, BASE, RA
-     |  bge ->vmeta_tsetb
--    |  lwzx TMP1, TMP2, RC
-+    |  .if ENDIAN_LE
-+    |    addi TMP1, TMP2, 4
-+    |    lwzx TMP1, TMP1, RC
-+    |  .else
-+    |    lwzx TMP1, TMP2, RC
-+    |  .endif
-     |  checknil TMP1; beq >5
-     |1:
-     |  andix. TMP0, TMP3, LJ_GC_BLACK	// isblack(table)
-@@ -4274,13 +4662,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_TSETR:
-     |  // RA = dst*8, RB = table*8, RC = key*8
--    |  add RB, BASE, RB
--    |  lwz TAB:CARG2, 4(RB)
-+    |  lwzx TAB:CARG2, BASE_LO, RB
-     |.if DUALNUM
--    |  add RC, BASE, RC
-     |    lbz TMP3, TAB:CARG2->marked
-     |  lwz TMP0, TAB:CARG2->asize
--    |  lwz CARG3, 4(RC)
-+    |  lwzx CARG3, BASE_LO, RC
-     |   lwz TMP1, TAB:CARG2->array
-     |.else
-     |  lfdx f0, BASE, RC
-@@ -4311,9 +4697,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  add RA, BASE, RA
-     |1:
-     |   add TMP3, KBASE, RD
--    |  lwz TAB:CARG2, -4(RA)		// Guaranteed to be a table.
-+    |  lwz TAB:CARG2, WORD_LO-8(RA)	// Guaranteed to be a table.
-     |    addic. TMP0, MULTRES, -8
--    |   lwz TMP3, 4(TMP3)		// Integer constant is in lo-word.
-+    |   lwz TMP3, WORD_LO(TMP3)		// Integer constant is in lo-word.
-     |    srwi CARG3, TMP0, 3
-     |    beq >4				// Nothing to copy?
-     |  add CARG3, CARG3, TMP3
-@@ -4362,8 +4748,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_CALL:
-     |  // RA = base*8, (RB = (nresults+1)*8,) RC = (nargs+1)*8
-     |  mr TMP2, BASE
--    |  lwzux TMP0, BASE, RA
--    |   lwz LFUNC:RB, 4(BASE)
-+    |  lwzux2 TMP0, LFUNC:RB, BASE, RA
-     |    subi NARGS8:RC, NARGS8:RC, 8
-     |   addi BASE, BASE, 8
-     |  checkfunc TMP0; bne ->vmeta_call
-@@ -4377,8 +4762,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     break;
-   case BC_CALLT:
-     |  // RA = base*8, (RB = 0,) RC = (nargs+1)*8
--    |  lwzux TMP0, RA, BASE
--    |   lwz LFUNC:RB, 4(RA)
-+    |  lwzux2 TMP0, LFUNC:RB, RA, BASE
-     |    subi NARGS8:RC, NARGS8:RC, 8
-     |    lwz TMP1, FRAME_PC(BASE)
-     |  checkfunc TMP0
-@@ -4430,12 +4814,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // RA = base*8, (RB = (nresults+1)*8, RC = (nargs+1)*8 ((2+1)*8))
-     |  mr TMP2, BASE
-     |  add BASE, BASE, RA
--    |  lwz TMP1, -24(BASE)
--    |   lwz LFUNC:RB, -20(BASE)
-+    |  lwz TMP1, WORD_HI-24(BASE)
-+    |   lwz LFUNC:RB, WORD_LO-24(BASE)
-     |    lfd f1, -8(BASE)
-     |    lfd f0, -16(BASE)
--    |  stw TMP1, 0(BASE)		// Copy callable.
--    |   stw LFUNC:RB, 4(BASE)
-+    |  stw TMP1, WORD_HI(BASE)		// Copy callable.
-+    |   stw LFUNC:RB, WORD_LO(BASE)
-     |  checkfunc TMP1
-     |    stfd f1, 16(BASE)		// Copy control var.
-     |     li NARGS8:RC, 16		// Iterators get 2 arguments.
-@@ -4450,8 +4834,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // NYI: add hotloop, record BC_ITERN.
-     |.endif
-     |  add RA, BASE, RA
--    |  lwz TAB:RB, -12(RA)
--    |  lwz RC, -4(RA)			// Get index from control var.
-+    |  lwz TAB:RB, WORD_LO-16(RA)
-+    |  lwz RC, WORD_LO-8(RA)		// Get index from control var.
-     |  lwz TMP0, TAB:RB->asize
-     |  lwz TMP1, TAB:RB->array
-     |   addi PC, PC, 4
-@@ -4459,14 +4843,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  cmplw RC, TMP0
-     |   slwi TMP3, RC, 3
-     |  bge >5				// Index points after array part?
--    |  lwzx TMP2, TMP1, TMP3
--    |   lfdx f0, TMP1, TMP3
-+    |  lfdux f0, TMP3, TMP1
-+    |   lwz TMP2, WORD_HI(TMP3)
-     |  checknil TMP2
-     |     lwz INS, -4(PC)
-     |  beq >4
-     |.if DUALNUM
--    |   stw RC, 4(RA)
--    |   stw TISNUM, 0(RA)
-+    |   stw RC, WORD_LO(RA)
-+    |   stw TISNUM, WORD_HI(RA)
-     |.else
-     |   tonum_u f1, RC
-     |.endif
-@@ -4474,7 +4858,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |     addis TMP3, PC, -(BCBIAS_J*4 >> 16)
-     |  stfd f0, 8(RA)
-     |     decode_RD4 TMP1, INS
--    |    stw RC, -4(RA)			// Update control var.
-+    |    stw RC, WORD_LO-8(RA)		// Update control var.
-     |     add PC, TMP1, TMP3
-     |.if not DUALNUM
-     |   stfd f1, 0(RA)
-@@ -4496,9 +4880,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgty <3
-     |   slwi RB, RC, 3
-     |   sub TMP3, TMP3, RB
--    |  lwzx RB, TMP2, TMP3
--    |  lfdx f0, TMP2, TMP3
--    |   add NODE:TMP3, TMP2, TMP3
-+    |  lfdux f0, TMP3, TMP2
-+    |  lwz RB, WORD_HI(TMP3)
-     |  checknil RB
-     |     lwz INS, -4(PC)
-     |  beq >7
-@@ -4510,7 +4893,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |   stfd f1, 0(RA)
-     |    addi RC, RC, 1
-     |     add PC, TMP1, TMP2
--    |    stw RC, -4(RA)			// Update control var.
-+    |    stw RC, WORD_LO-8(RA)		// Update control var.
-     |  b <3
-     |
-     |7:  // Skip holes in hash part.
-@@ -4521,10 +4904,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_ISNEXT:
-     |  // RA = base*8, RD = target (points to ITERN)
-     |  add RA, BASE, RA
--    |  lwz TMP0, -24(RA)
--    |  lwz CFUNC:TMP1, -20(RA)
--    |   lwz TMP2, -16(RA)
--    |    lwz TMP3, -8(RA)
-+    |  lwz TMP0, WORD_HI-24(RA)
-+    |  lwz CFUNC:TMP1, WORD_LO-24(RA)
-+    |   lwz TMP2, WORD_HI-16(RA)
-+    |    lwz TMP3, WORD_HI-8(RA)
-     |   cmpwi cr0, TMP2, LJ_TTAB
-     |  cmpwi cr1, TMP0, LJ_TFUNC
-     |    cmpwi cr6, TMP3, LJ_TNIL
-@@ -4538,17 +4921,25 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bne cr0, >5
-     |  lus TMP1, 0xfffe
-     |  ori TMP1, TMP1, 0x7fff
--    |  stw ZERO, -4(RA)			// Initialize control var.
--    |  stw TMP1, -8(RA)
-+    |  stw ZERO, WORD_LO-8(RA)		// Initialize control var.
-+    |  stw TMP1, WORD_HI-8(RA)
-     |    addis PC, TMP3, -(BCBIAS_J*4 >> 16)
-     |1:
-     |  ins_next
-     |5:  // Despecialize bytecode if any of the checks fail.
-     |  li TMP0, BC_JMP
-     |   li TMP1, BC_ITERC
-+    |  .if ENDIAN_LE
-+    |  stb TMP0, -4(PC)
-+    |  .else
-     |  stb TMP0, -1(PC)
-+    |  .endif
-     |    addis PC, TMP3, -(BCBIAS_J*4 >> 16)
-+    |  .if ENDIAN_LE
-+    |   stb TMP1, 0(PC)
-+    |  .else
-     |   stb TMP1, 3(PC)
-+    |  .endif
-     |  b <1
-     break;
- 
-@@ -4582,7 +4973,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    addi RA, RA, 8
-     |   blt cr1, <1			// More vararg slots?
-     |2:  // Fill up remainder with nil.
--    |  stw TISNIL, 0(RA)
-+    |  stw TISNIL, WORD_HI(RA)
-     |  cmplw RA, TMP2
-     |   addi RA, RA, 8
-     |  blt <2
-@@ -4619,6 +5010,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  add RA, BASE, RA
-     |  add RC, BASE, SAVE0
-     |  subi TMP3, BASE, 8
-+    |  addi BASEP4, BASE, 4
-     |  b <6
-     break;
- 
-@@ -4667,13 +5059,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgt >6
-     |   sub BASE, TMP2, RA
-     |  lwz LFUNC:TMP1, FRAME_FUNC(BASE)
-+    |  addi BASEP4, BASE, 4
-     |  ins_next1
-     |  lwz TMP1, LFUNC:TMP1->pc
-     |  lwz KBASE, PC2PROTO(k)(TMP1)
-     |  ins_next2
-     |
-     |6:  // Fill up results with nil.
--    |  subi TMP1, RD, 8
-+    |  addi TMP1, RD, WORD_HI-8
-     |   addi RD, RD, 8
-     |  stwx TISNIL, TMP2, TMP1
-     |  b <5
-@@ -4709,13 +5102,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  bgt >6
-     |   sub BASE, TMP2, RA
-     |  lwz LFUNC:TMP1, FRAME_FUNC(BASE)
-+    |  addi BASEP4, BASE, 4
-     |  ins_next1
-     |  lwz TMP1, LFUNC:TMP1->pc
-     |  lwz KBASE, PC2PROTO(k)(TMP1)
-     |  ins_next2
-     |
-     |6:  // Fill up results with nil.
--    |  subi TMP1, RD, 8
-+    |  addi TMP1, RD, WORD_HI-8
-     |   addi RD, RD, 8
-     |  stwx TISNIL, TMP2, TMP1
-     |  b <5
-@@ -4741,11 +5135,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     vk = (op == BC_IFORL || op == BC_JFORL);
-     |.if DUALNUM
-     |  // Integer loop.
--    |  lwzux TMP1, RA, BASE
--    |   lwz CARG1, FORL_IDX*8+4(RA)
-+    |  lwzux2 TMP1, CARG1, RA, BASE
-+    if (vk) {
-+      |  mtxer ZERO
-+    }
-     |  cmplw cr0, TMP1, TISNUM
-     if (vk) {
--      |   lwz CARG3, FORL_STEP*8+4(RA)
-+      |   lwz CARG3, FORL_STEP*8+WORD_LO(RA)
-       |  bne >9
-       |.if GPR64
-       |  // Need to check overflow for (a<<32) + (b<<32).
-@@ -4757,15 +5153,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |  addo. CARG1, CARG1, CARG3
-       |.endif
-       |    cmpwi cr6, CARG3, 0
--      |   lwz CARG2, FORL_STOP*8+4(RA)
--      |  bso >6
-+      |   lwz CARG2, FORL_STOP*8+WORD_LO(RA)
-+      |  bso >2
-       |4:
--      |  stw CARG1, FORL_IDX*8+4(RA)
-+      |  stw CARG1, FORL_IDX*8+WORD_LO(RA)
-     } else {
--      |  lwz TMP3, FORL_STEP*8(RA)
--      |   lwz CARG3, FORL_STEP*8+4(RA)
--      |  lwz TMP2, FORL_STOP*8(RA)
--      |   lwz CARG2, FORL_STOP*8+4(RA)
-+      |  lwz TMP3, FORL_STEP*8+WORD_HI(RA)
-+      |   lwz CARG3, FORL_STEP*8+WORD_LO(RA)
-+      |  lwz TMP2, FORL_STOP*8+WORD_HI(RA)
-+      |   lwz CARG2, FORL_STOP*8+WORD_LO(RA)
-       |  cmplw cr7, TMP3, TISNUM
-       |  cmplw cr1, TMP2, TISNUM
-       |  crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
-@@ -4776,11 +5172,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |    blt cr6, >5
-     |  cmpw CARG1, CARG2
-     |1:
--    |   stw TISNUM, FORL_EXT*8(RA)
-+    |   stw TISNUM, FORL_EXT*8+WORD_HI(RA)
-     if (op != BC_JFORL) {
-       |  srwi RD, RD, 1
-     }
--    |   stw CARG1, FORL_EXT*8+4(RA)
-+    |   stw CARG1, FORL_EXT*8+WORD_LO(RA)
-     if (op != BC_JFORL) {
-       |  add RD, PC, RD
-     }
-@@ -4800,11 +5196,6 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |5:  // Invert check for negative step.
-     |  cmpw CARG2, CARG1
-     |  b <1
--    if (vk) {
--      |6:  // Potential overflow.
--      |  checkov TMP0, <4		// Ignore unrelated overflow.
--      |  b <2
--    }
-     |.endif
-     if (vk) {
-       |.if DUALNUM
-@@ -4815,14 +5206,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-       |.endif
-       |  lfd f3, FORL_STEP*8(RA)
-       |  lfd f2, FORL_STOP*8(RA)
--      |   lwz TMP3, FORL_STEP*8(RA)
-+      |   lwz TMP3, FORL_STEP*8+WORD_HI(RA)
-       |  fadd f1, f1, f3
-       |  stfd f1, FORL_IDX*8(RA)
-     } else {
-       |.if DUALNUM
-       |9:  // FP loop.
-       |.else
-+      |.if ENDIAN_LE
-+      |  lwzx TMP1, RA, BASE_LO
-+      |  add RA, RA, BASE
-+      |.else
-       |  lwzux TMP1, RA, BASE
-+      |.endif
-       |  lwz TMP3, FORL_STEP*8(RA)
-       |  lwz TMP2, FORL_STOP*8(RA)
-       |  cmplw cr0, TMP1, TISNUM
-@@ -4903,17 +5299,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
- #endif
-   case BC_IITERL:
-     |  // RA = base*8, RD = target
--    |  lwzux TMP1, RA, BASE
--    |   lwz TMP2, 4(RA)
-+    |  lwzux2 TMP1, TMP2, RA, BASE
-     |  checknil TMP1; beq >1		// Stop if iterator returned nil.
-     if (op == BC_JITERL) {
--      |  stw TMP1, -8(RA)
--      |   stw TMP2, -4(RA)
-+      |  stw TMP1, WORD_HI-8(RA)
-+      |   stw TMP2, WORD_LO-8(RA)
-       |  b =>BC_JLOOP
-     } else {
-       |  branch_RD			// Otherwise save control var + branch.
--      |  stw TMP1, -8(RA)
--      |   stw TMP2, -4(RA)
-+      |  stw TMP1, WORD_HI-8(RA)
-+      |   stw TMP2, WORD_LO-8(RA)
-     }
-     |1:
-     |  ins_next
-@@ -4942,7 +5337,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  // Traces on PPC don't store the trace number, so use 0.
-     |   stw ZERO, DISPATCH_GL(vmstate)(DISPATCH)
-     |  lwzx TRACE:TMP2, TMP1, RD
--    |  clrso TMP1
-+    |  mtxer ZERO
-     |  lp TMP2, TRACE:TMP2->mcode
-     |   stw BASE, DISPATCH_GL(jit_base)(DISPATCH)
-     |  mtctr TMP2
-@@ -4994,7 +5389,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     }
-     |
-     |3:  // Clear missing parameters.
--    |  stwx TISNIL, BASE, NARGS8:RC
-+    |  stwx TISNIL, BASE_HI, NARGS8:RC
-     |  addi NARGS8:RC, NARGS8:RC, 8
-     |  b <2
-     break;
-@@ -5011,11 +5406,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  lwz TMP2, L->maxstack
-     |   add TMP1, BASE, RC
-     |  add TMP0, RA, RC
--    |   stw LFUNC:RB, 4(TMP1)		// Store copy of LFUNC.
-+    |   stw LFUNC:RB, WORD_LO(TMP1)	// Store copy of LFUNC.
-     |   addi TMP3, RC, 8+FRAME_VARG
-     |    lwz KBASE, -4+PC2PROTO(k)(PC)
-     |  cmplw TMP0, TMP2
--    |   stw TMP3, 0(TMP1)		// Store delta + FRAME_VARG.
-+    |   stw TMP3, WORD_HI(TMP1)		// Store delta + FRAME_VARG.
-     |  bge ->vm_growstack_l
-     |  lbz TMP2, -4+PC2PROTO(numparams)(PC)
-     |   mr RA, BASE
-@@ -5026,18 +5421,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-     |  beq >3
-     |1:
-     |  cmplw RA, RC			// Less args than parameters?
--    |   lwz TMP0, 0(RA)
--    |   lwz TMP3, 4(RA)
-+    |   lwz TMP0, WORD_HI(RA)
-+    |   lwz TMP3, WORD_LO(RA)
-     |  bge >4
--    |    stw TISNIL, 0(RA)		// Clear old fixarg slot (help the GC).
-+    |    stw TISNIL, WORD_HI(RA)	// Clear old fixarg slot (help the GC).
-     |    addi RA, RA, 8
-     |2:
-     |  addic. TMP2, TMP2, -1
--    |   stw TMP0, 8(TMP1)
--    |   stw TMP3, 12(TMP1)
-+    |   stw TMP0, WORD_HI+8(TMP1)
-+    |   stw TMP3, WORD_LO+8(TMP1)
-     |    addi TMP1, TMP1, 8
-     |  bne <1
-     |3:
-+    |  addi BASEP4, BASE, 4
-     |  ins_next2
-     |
-     |4:  // Clear missing parameters.
-@@ -5049,35 +5445,35 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
-   case BC_FUNCCW:
-     |  // BASE = new base, RA = BASE+framesize*8, RB = CFUNC, RC = nargs*8
-     if (op == BC_FUNCC) {
--      |  lp RD, CFUNC:RB->f
-+      |  lp FUNCREG, CFUNC:RB->f
-     } else {
--      |  lp RD, DISPATCH_GL(wrapf)(DISPATCH)
-+      |  lp FUNCREG, DISPATCH_GL(wrapf)(DISPATCH)
-     }
-     |   add TMP1, RA, NARGS8:RC
-     |   lwz TMP2, L->maxstack
--    |  .toc lp TMP3, 0(RD)
-+    |  .opd lp TMP3, 0(FUNCREG)
-     |    add RC, BASE, NARGS8:RC
-     |   stp BASE, L->base
-     |   cmplw TMP1, TMP2
-     |    stp RC, L->top
-     |     li_vmstate C
--    |.if TOC
-+    |.if OPD
-     |  mtctr TMP3
-     |.else
--    |  mtctr RD
-+    |  mtctr FUNCREG
-     |.endif
-     if (op == BC_FUNCCW) {
-       |  lp CARG2, CFUNC:RB->f
-     }
-     |  mr CARG1, L
-     |   bgt ->vm_growstack_c		// Need to grow stack.
--    |  .toc lp TOCREG, TOC_OFS(RD)
--    |  .tocenv lp ENVREG, ENV_OFS(RD)
-+    |  .opd lp TOCREG, TOC_OFS(FUNCREG)
-+    |  .opdenv lp ENVREG, ENV_OFS(FUNCREG)
-     |     st_vmstate
-     |  bctrl				// (lua_State *L [, lua_CFunction f])
-+    |  .toc lp TOCREG, SAVE_TOC
-     |  // Returns nresults.
-     |  lp BASE, L->base
--    |  .toc ld TOCREG, SAVE_TOC
-     |   slwi RD, CRET1, 3
-     |  lp TMP1, L->top
-     |    li_vmstate INTERP
-@@ -5128,7 +5524,11 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.byte 0x1\n"
- 	"\t.string \"\"\n"
- 	"\t.uleb128 0x1\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.sleb128 -8\n"
-+#else
- 	"\t.sleb128 -4\n"
-+#endif
- 	"\t.byte 65\n"
- 	"\t.byte 0xc\n\t.uleb128 1\n\t.uleb128 0\n"
- 	"\t.align 2\n"
-@@ -5141,14 +5541,24 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long .Lbegin\n"
- 	"\t.long %d\n"
- 	"\t.byte 0xe\n\t.uleb128 %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+	"\t.byte 0x11\n\t.uleb128 70\n\t.sleb128 -1\n",
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
- 	"\t.byte 0x5\n\t.uleb128 70\n\t.uleb128 55\n",
-+#endif
- 	fcofs, CFRAME_SIZE);
-     for (i = 14; i <= 31; i++)
-       fprintf(ctx->fp,
- 	"\t.byte %d\n\t.uleb128 %d\n"
- 	"\t.byte %d\n\t.uleb128 %d\n",
--	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i));
-+#if LJ_ARCH_PPC32ON64
-+	0x80+i, 19+(31-i), 0x80+32+i, 1+(31-i)
-+#else
-+	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i)
-+#endif
-+      );
-     fprintf(ctx->fp,
- 	"\t.align 2\n"
- 	".LEFDE0:\n\n");
-@@ -5164,8 +5574,12 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long lj_vm_ffi_call\n"
- #endif
- 	"\t.long %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
--	"\t.byte 0x8e\n\t.uleb128 2\n"
-+#endif
-+	"\t.byte 0x8e\n\t.uleb128 1\n"
- 	"\t.byte 0xd\n\t.uleb128 0xe\n"
- 	"\t.align 2\n"
- 	".LEFDE1:\n\n", (int)ctx->codesz - fcofs);
-@@ -5180,7 +5594,11 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.byte 0x1\n"
- 	"\t.string \"zPR\"\n"
- 	"\t.uleb128 0x1\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.sleb128 -8\n"
-+#else
- 	"\t.sleb128 -4\n"
-+#endif
- 	"\t.byte 65\n"
- 	"\t.uleb128 6\n"			/* augmentation length */
- 	"\t.byte 0x1b\n"			/* pcrel|sdata4 */
-@@ -5198,14 +5616,24 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long %d\n"
- 	"\t.uleb128 0\n"			/* augmentation length */
- 	"\t.byte 0xe\n\t.uleb128 %d\n"
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+	"\t.byte 0x11\n\t.uleb128 70\n\t.sleb128 -1\n",
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
- 	"\t.byte 0x5\n\t.uleb128 70\n\t.uleb128 55\n",
-+#endif
- 	fcofs, CFRAME_SIZE);
-     for (i = 14; i <= 31; i++)
-       fprintf(ctx->fp,
- 	"\t.byte %d\n\t.uleb128 %d\n"
- 	"\t.byte %d\n\t.uleb128 %d\n",
--	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i));
-+#if LJ_ARCH_PPC32ON64
-+	0x80+i, 19+(31-i), 0x80+32+i, 1+(31-i)
-+#else
-+	0x80+i, 37+(31-i), 0x80+32+i, 2+2*(31-i)
-+#endif
-+      );
-     fprintf(ctx->fp,
- 	"\t.align 2\n"
- 	".LEFDE2:\n\n");
-@@ -5233,8 +5661,12 @@ static void emit_asm_debug(BuildCtx *ctx)
- 	"\t.long lj_vm_ffi_call-.\n"
- 	"\t.long %d\n"
- 	"\t.uleb128 0\n"			/* augmentation length */
-+#if LJ_ARCH_PPC32ON64
-+	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -2\n"
-+#else
- 	"\t.byte 0x11\n\t.uleb128 65\n\t.sleb128 -1\n"
--	"\t.byte 0x8e\n\t.uleb128 2\n"
-+#endif
-+	"\t.byte 0x8e\n\t.uleb128 1\n"
- 	"\t.byte 0xd\n\t.uleb128 0xe\n"
- 	"\t.align 2\n"
- 	".LEFDE3:\n\n", (int)ctx->codesz - fcofs);
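Note on the removed PowerPC patch above: it replaces hard-coded `0(...)` and `4(...)` displacements with `WORD_HI` and `WORD_LO`, so the same VM source can address the tag word and the payload word of an 8-byte stack slot on either byte order. The definitions of those two symbols are not part of this excerpt. The sketch below is a minimal illustration in plain C, assuming the usual layout (tag word at offset 0 on big-endian, at offset 4 on little-endian), which is what the replaced displacements and the `.if ENDIAN_LE` branches suggest. All helper names are hypothetical and are not LuaJIT APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not LuaJIT code): byte offsets of the high (tag)
 * and low (payload) 32-bit words inside one 8-byte slot.
 * Assumption: big-endian puts the high word first, little-endian last. */
static unsigned word_hi_offset(int little_endian) { return little_endian ? 4u : 0u; }
static unsigned word_lo_offset(int little_endian) { return little_endian ? 0u : 4u; }

/* Detect the host byte order at run time by inspecting the first byte. */
static int host_is_little_endian(void)
{
  const uint16_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Load one 32-bit word from a 64-bit slot at byte offset 0 or 4. */
static uint32_t load_word(const uint64_t *slot, unsigned ofs)
{
  uint32_t w;
  assert(ofs == 0u || ofs == 4u);
  memcpy(&w, (const unsigned char *)slot + ofs, sizeof w);
  return w;
}

/* Read the high and low words of a slot independent of host byte order. */
static uint32_t slot_hi(uint64_t slot)
{
  return load_word(&slot, word_hi_offset(host_is_little_endian()));
}

static uint32_t slot_lo(uint64_t slot)
{
  return load_word(&slot, word_lo_offset(host_is_little_endian()));
}
```

With these helpers, `slot_hi(v)` yields the upper 32 bits of `v` and `slot_lo(v)` the lower 32 bits on either kind of host, which is the property the `WORD_HI`/`WORD_LO` substitutions appear to aim for.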
diff --git a/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch b/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
deleted file mode 100644
index f4e760b738361..0000000000000
--- a/srcpkgs/LuaJIT/files/patches/ppc64/fix-vm-jit-ppc64.patch
+++ /dev/null
@@ -1,11 +0,0 @@
---- a/src/vm_ppc.dasc	2019-06-03 19:41:50.214671731 +0200
-+++ b/src/vm_ppc.dasc	2019-06-03 19:44:40.229686143 +0200
-@@ -2774,7 +2774,7 @@
-   |
-   |->vm_exit_handler:
-   |.if JIT
--  |  addi sp, TMP0, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-+  |  addi sp, sp, -(EXIT_OFFSET+32*8+32*PSIZE)
-   |  saver 3 // CARG1
-   |  saver 4 // CARG2
-   |  saver 5 // CARG3
diff --git a/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch b/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
deleted file mode 100644
index 487a1cd1ca787..0000000000000
--- a/srcpkgs/LuaJIT/patches/aarch64-Fix-exit-stub-patching.patch
+++ /dev/null
@@ -1,231 +0,0 @@
-commit 9da06535092d6d9dec442641a26c64bce5574322
-Author: Mike Pall <mike>
-Date:   Sun Jun 24 14:08:59 2018 +0200
-
-    ARM64: Fix exit stub patching.
-    
-    Contributed by Javier Guerra Giraldez.
-
-diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
-index cbb186d3..baafa21a 100644
---- a/src/lj_asm_arm64.h
-+++ b/src/lj_asm_arm64.h
-@@ -56,11 +56,11 @@ static void asm_exitstub_setup(ASMState *as, ExitNo nexits)
-     asm_mclimit(as);
-   /* 1: str lr,[sp]; bl ->vm_exit_handler; movz w0,traceno; bl <1; bl <1; ... */
-   for (i = nexits-1; (int32_t)i >= 0; i--)
--    *--mxp = A64I_LE(A64I_BL|((-3-i)&0x03ffffffu));
--  *--mxp = A64I_LE(A64I_MOVZw|A64F_U16(as->T->traceno));
-+    *--mxp = A64I_LE(A64I_BL | A64F_S26(-3-i));
-+  *--mxp = A64I_LE(A64I_MOVZw | A64F_U16(as->T->traceno));
-   mxp--;
--  *mxp = A64I_LE(A64I_BL|(((MCode *)(void *)lj_vm_exit_handler-mxp)&0x03ffffffu));
--  *--mxp = A64I_LE(A64I_STRx|A64F_D(RID_LR)|A64F_N(RID_SP));
-+  *mxp = A64I_LE(A64I_BL | A64F_S26(((MCode *)(void *)lj_vm_exit_handler-mxp)));
-+  *--mxp = A64I_LE(A64I_STRx | A64F_D(RID_LR) | A64F_N(RID_SP));
-   as->mctop = mxp;
- }
- 
-@@ -77,7 +77,7 @@ static void asm_guardcc(ASMState *as, A64CC cc)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_cond_branch(as, cc^1, p-1);
-     return;
-   }
-@@ -91,7 +91,7 @@ static void asm_guardtnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_tnb(as, ai^0x01000000u, r, bit, p-1);
-     return;
-   }
-@@ -105,7 +105,7 @@ static void asm_guardcnb(ASMState *as, A64Ins ai, Reg r)
-   MCode *p = as->mcp;
-   if (LJ_UNLIKELY(p == as->invmcp)) {
-     as->loopinv = 1;
--    *p = A64I_B | ((target-p) & 0x03ffffffu);
-+    *p = A64I_B | A64F_S26(target-p);
-     emit_cnb(as, ai^0x01000000u, r, p-1);
-     return;
-   }
-@@ -1850,7 +1850,7 @@ static void asm_loop_fixup(ASMState *as)
-     p[-2] |= ((uint32_t)delta & mask) << 5;
-   } else {
-     ptrdiff_t delta = target - (p - 1);
--    p[-1] = A64I_B | ((uint32_t)(delta) & 0x03ffffffu);
-+    p[-1] = A64I_B | A64F_S26(delta);
-   }
- }
- 
-@@ -1919,7 +1919,7 @@ static void asm_tail_fixup(ASMState *as, TraceNo lnk)
-   }
-   /* Patch exit branch. */
-   target = lnk ? traceref(as->J, lnk)->mcode : (MCode *)lj_vm_exit_interp;
--  p[-1] = A64I_B | (((target-p)+1)&0x03ffffffu);
-+  p[-1] = A64I_B | A64F_S26((target-p)+1);
- }
- 
- /* Prepare tail of code. */
-@@ -1982,40 +1982,50 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target)
- {
-   MCode *p = T->mcode;
-   MCode *pe = (MCode *)((char *)p + T->szmcode);
--  MCode *cstart = NULL, *cend = p;
-+  MCode *cstart = NULL;
-   MCode *mcarea = lj_mcode_patch(J, p, 0);
-   MCode *px = exitstub_trace_addr(T, exitno);
-+  /* Note: this assumes a trace exit is only ever patched once. */
-   for (; p < pe; p++) {
-     /* Look for exitstub branch, replace with branch to target. */
-+    ptrdiff_t delta = target - p;
-     MCode ins = A64I_LE(*p);
-     if ((ins & 0xff000000u) == 0x54000000u &&
- 	((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) {
--      /* Patch bcc exitstub. */
--      *p = A64I_LE((ins & 0xff00001fu) | (((target-p)<<5) & 0x00ffffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch bcc, if within range. */
-+      if (A64F_S_OK(delta, 19)) {
-+	*p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta));
-+	if (!cstart) cstart = p;
-+      }
-     } else if ((ins & 0xfc000000u) == 0x14000000u &&
- 	       ((ins ^ (px-p)) & 0x03ffffffu) == 0) {
--      /* Patch b exitstub. */
--      *p = A64I_LE((ins & 0xfc000000u) | ((target-p) & 0x03ffffffu));
--      cend = p+1;
-+      /* Patch b. */
-+      lua_assert(A64F_S_OK(delta, 26));
-+      *p = A64I_LE((ins & 0xfc000000u) | A64F_S26(delta));
-       if (!cstart) cstart = p;
-     } else if ((ins & 0x7e000000u) == 0x34000000u &&
- 	       ((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) {
--      /* Patch cbz/cbnz exitstub. */
--      *p = A64I_LE((ins & 0xff00001f) | (((target-p)<<5) & 0x00ffffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch cbz/cbnz, if within range. */
-+      if (A64F_S_OK(delta, 19)) {
-+	*p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta));
-+	if (!cstart) cstart = p;
-+      }
-     } else if ((ins & 0x7e000000u) == 0x36000000u &&
- 	       ((ins ^ ((px-p)<<5)) & 0x0007ffe0u) == 0) {
--      /* Patch tbz/tbnz exitstub. */
--      *p = A64I_LE((ins & 0xfff8001fu) | (((target-p)<<5) & 0x0007ffe0u));
--      cend = p+1;
--      if (!cstart) cstart = p;
-+      /* Patch tbz/tbnz, if within range. */
-+      if (A64F_S_OK(delta, 14)) {
-+	*p = A64I_LE((ins & 0xfff8001fu) | A64F_S14(delta));
-+	if (!cstart) cstart = p;
-+      }
-     }
-   }
--  lua_assert(cstart != NULL);
--  lj_mcode_sync(cstart, cend);
-+  {  /* Always patch long-range branch in exit stub itself. */
-+    ptrdiff_t delta = target - px;
-+    lua_assert(A64F_S_OK(delta, 26));
-+    *px = A64I_B | A64F_S26(delta);
-+    if (!cstart) cstart = px;
-+  }
-+  lj_mcode_sync(cstart, px+1);
-   lj_mcode_patch(J, mcarea, 1);
- }
- 
-diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
-index 6da4c7d4..1001b1d8 100644
---- a/src/lj_emit_arm64.h
-+++ b/src/lj_emit_arm64.h
-@@ -241,7 +241,7 @@ static void emit_loadk(ASMState *as, Reg rd, uint64_t u64, int is64)
- #define mcpofs(as, k) \
-   ((intptr_t)((uintptr_t)(k) - (uintptr_t)(as->mcp - 1)))
- #define checkmcpofs(as, k) \
--  ((((mcpofs(as, k)>>2) + 0x00040000) >> 19) == 0)
-+  (A64F_S_OK(mcpofs(as, k)>>2, 19))
- 
- static Reg ra_allock(ASMState *as, intptr_t k, RegSet allow);
- 
-@@ -312,7 +312,7 @@ static void emit_cond_branch(ASMState *as, A64CC cond, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x40000) >> 19) == 0);
-+  lua_assert(A64F_S_OK(delta, 19));
-   *p = A64I_BCC | A64F_S19(delta) | cond;
- }
- 
-@@ -320,24 +320,24 @@ static void emit_branch(ASMState *as, A64Ins ai, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x02000000) >> 26) == 0);
--  *p = ai | ((uint32_t)delta & 0x03ffffffu);
-+  lua_assert(A64F_S_OK(delta, 26));
-+  *p = ai | A64F_S26(delta);
- }
- 
- static void emit_tnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(bit < 63 && ((delta + 0x2000) >> 14) == 0);
-+  lua_assert(bit < 63 && A64F_S_OK(delta, 14));
-   if (bit > 31) ai |= A64I_X;
--  *p = ai | A64F_BIT(bit & 31) | A64F_S14((uint32_t)delta & 0x3fffu) | r;
-+  *p = ai | A64F_BIT(bit & 31) | A64F_S14(delta) | r;
- }
- 
- static void emit_cnb(ASMState *as, A64Ins ai, Reg r, MCode *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = target - p;
--  lua_assert(((delta + 0x40000) >> 19) == 0);
-+  lua_assert(A64F_S_OK(delta, 19));
-   *p = ai | A64F_S19(delta) | r;
- }
- 
-@@ -347,8 +347,8 @@ static void emit_call(ASMState *as, void *target)
- {
-   MCode *p = --as->mcp;
-   ptrdiff_t delta = (char *)target - (char *)p;
--  if ((((delta>>2) + 0x02000000) >> 26) == 0) {
--    *p = A64I_BL | ((uint32_t)(delta>>2) & 0x03ffffffu);
-+  if (A64F_S_OK(delta>>2, 26)) {
-+    *p = A64I_BL | A64F_S26(delta>>2);
-   } else {  /* Target out of range: need indirect call. But don't use R0-R7. */
-     Reg r = ra_allock(as, i64ptr(target),
- 		      RSET_RANGE(RID_X8, RID_MAX_GPR)-RSET_FIXED);
-diff --git a/src/lj_target_arm64.h b/src/lj_target_arm64.h
-index 520023ae..a207a2ba 100644
---- a/src/lj_target_arm64.h
-+++ b/src/lj_target_arm64.h
-@@ -132,9 +132,9 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define A64F_IMMR(x)	((x) << 16)
- #define A64F_U16(x)	((x) << 5)
- #define A64F_U12(x)	((x) << 10)
--#define A64F_S26(x)	(x)
-+#define A64F_S26(x)	(((uint32_t)(x) & 0x03ffffffu))
- #define A64F_S19(x)	(((uint32_t)(x) & 0x7ffffu) << 5)
--#define A64F_S14(x)	((x) << 5)
-+#define A64F_S14(x)	(((uint32_t)(x) & 0x3fffu) << 5)
- #define A64F_S9(x)	((x) << 12)
- #define A64F_BIT(x)	((x) << 19)
- #define A64F_SH(sh, x)	(((sh) << 22) | ((x) << 10))
-@@ -145,6 +145,9 @@ static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno)
- #define A64F_LSL16(x)	(((x) / 16) << 21)
- #define A64F_BSH(sh)	((sh) << 10)
- 
-+/* Check for valid field range. */
-+#define A64F_S_OK(x, b)	((((x) + (1 << (b-1))) >> (b)) == 0)
-+
- typedef enum A64Ins {
-   A64I_S = 0x20000000,
-   A64I_X = 0x80000000,
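Note on the removed ARM64 patch above: it centralizes two steps that were previously open-coded at each branch site. `A64F_S_OK(x, b)` checks that a signed displacement fits in a `b`-bit two's-complement field, and `A64F_S26`, `A64F_S19` and `A64F_S14` mask the value and shift it into position. The following is a minimal standalone sketch of the same idea, not the LuaJIT macros themselves. The helper names are hypothetical, and the range check uses plain comparisons rather than the macro's add-and-shift so that it does not depend on how a negative value is right-shifted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if x fits in a b-bit signed field,
 * i.e. -2^(b-1) <= x < 2^(b-1). Supports 1 <= b <= 32. */
static int fits_signed(int64_t x, unsigned b)
{
  int64_t half;
  assert(b >= 1 && b <= 32);
  half = (int64_t)1 << (b - 1);
  return x >= -half && x < half;
}

/* Hypothetical helper: truncate x to its low b bits (two's complement)
 * and shift the result to the field's position in the instruction word. */
static uint32_t encode_field(int64_t x, unsigned b, unsigned shift)
{
  uint32_t mask = (b == 32) ? 0xffffffffu : ((1u << b) - 1u);
  assert(fits_signed(x, b));
  return ((uint32_t)x & mask) << shift;
}
```

For example, `encode_field(-1, 19, 5)` gives `0x00ffffe0`, the same bit pattern as the `0x00ffffe0u` mask used for the conditional-branch and cbz/cbnz cases in the patch, and `encode_field(-1, 14, 5)` gives `0x0007ffe0`, matching the tbz/tbnz mask.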
diff --git a/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch b/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
deleted file mode 100644
index c30264786755f..0000000000000
--- a/srcpkgs/LuaJIT/patches/aarch64-register-allocation-bug-fix.patch
+++ /dev/null
@@ -1,29 +0,0 @@
-From: Jason Teplitz <jason@tensyr.com>
-Date: Mon, 9 Oct 2017 23:03:09 +0000
-Subject: Fix register allocation bug in arm64
-
----
- src/lj_asm_arm64.h | 3 +--
- 1 file changed, 1 insertion(+), 2 deletions(-)
-
-diff --git src/lj_asm_arm64.h src/lj_asm_arm64.h
-index 8fd92e7..549f8a6 100644
---- a/src/lj_asm_arm64.h
-+++ b/src/lj_asm_arm64.h
-@@ -871,7 +871,7 @@ static void asm_hrefk(ASMState *as, IRIns *ir)
-   int bigofs = !emit_checkofs(A64I_LDRx, ofs);
-   RegSet allow = RSET_GPR;
-   Reg dest = (ra_used(ir) || bigofs) ? ra_dest(as, ir, RSET_GPR) : RID_NONE;
--  Reg node = ra_alloc1(as, ir->op1, allow);
-+  Reg node = ra_alloc1(as, ir->op1, ra_hasreg(dest) ? rset_clear(allow, dest) : allow);
-   Reg key = ra_scratch(as, rset_clear(allow, node));
-   Reg idx = node;
-   uint64_t k;
-@@ -879,7 +879,6 @@ static void asm_hrefk(ASMState *as, IRIns *ir)
-   rset_clear(allow, key);
-   if (bigofs) {
-     idx = dest;
--    rset_clear(allow, dest);
-     kofs = (int32_t)offsetof(Node, key);
-   } else if (ra_hasreg(dest)) {
-     emit_opk(as, A64I_ADDx, dest, node, ofs, allow);
diff --git a/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch b/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
deleted file mode 100644
index a217866c392cf..0000000000000
--- a/srcpkgs/LuaJIT/patches/e9af1abec542e6f9851ff2368e7f196b6382a44c.patch
+++ /dev/null
@@ -1,562 +0,0 @@
-From e9af1abec542e6f9851ff2368e7f196b6382a44c Mon Sep 17 00:00:00 2001
-From: Mike Pall <mike>
-Date: Wed, 30 Sep 2020 01:31:27 +0200
-Subject: [PATCH] Add support for full-range 64 bit lightuserdata.
-
----
- doc/status.html   | 11 ---------
- src/jit/dump.lua  |  4 +++-
- src/lib_debug.c   | 12 +++++-----
- src/lib_jit.c     | 14 ++++++------
- src/lib_package.c |  8 +++----
- src/lib_string.c  |  2 +-
- src/lj_api.c      | 40 +++++++++++++++++++++++++++++----
- src/lj_ccall.c    |  2 +-
- src/lj_cconv.c    |  2 +-
- src/lj_crecord.c  |  6 ++---
- src/lj_dispatch.c |  2 +-
- src/lj_ir.c       |  6 +++--
- src/lj_obj.c      |  5 +++--
- src/lj_obj.h      | 57 ++++++++++++++++++++++++++++++-----------------
- src/lj_snap.c     |  9 +++++++-
- src/lj_state.c    |  6 +++++
- src/lj_strfmt.c   |  2 +-
- 17 files changed, 121 insertions(+), 67 deletions(-)
-
-#diff --git a/doc/status.html b/doc/status.html
-#index 0aafe13a2..fd0ae8bae 100644
-#--- a/doc/status.html
-#+++ b/doc/status.html
-#@@ -91,17 +91,6 @@ <h2>Current Status</h2>
-# <tt>lua_atpanic</tt> on x64. This issue will be fixed with the new
-# garbage collector.
-# </li>
-#-<li>
-#-LuaJIT on 64 bit systems provides a <b>limited range</b> of 47 bits for the
-#-<b>legacy <tt>lightuserdata</tt></b> data type.
-#-This is only relevant on x64 systems which use the negative part of the
-#-virtual address space in user mode, e.g. Solaris/x64, and on ARM64 systems
-#-configured with a 48 bit or 52 bit VA.
-#-Avoid using <tt>lightuserdata</tt> to hold pointers that may point outside
-#-of that range, e.g. variables on the stack. In general, avoid this data
-#-type for new code and replace it with (much more performant) FFI bindings.
-#-FFI cdata pointers can address the full 64 bit range.
-#-</li>
-# </ul>
-# <br class="flush">
-# </div>
-Index: luajit/src/jit/dump.lua
-===================================================================
---- luajit.orig/src/jit/dump.lua
-+++ luajit/src/jit/dump.lua
-@@ -315,7 +315,9 @@
-   local tn = type(k)
-   local s
-   if tn == "number" then
--    if band(sn or 0, 0x30000) ~= 0 then
-+    if t < 12 then
-+      s = k == 0 and "NULL" or format("[0x%08x]", k)
-+    elseif band(sn or 0, 0x30000) ~= 0 then
-       s = band(sn, 0x20000) ~= 0 and "contpc" or "ftsz"
-     elseif k == 2^52+2^51 then
-       s = "bias"
-Index: luajit/src/lib_debug.c
-===================================================================
---- luajit.orig/src/lib_debug.c
-+++ luajit/src/lib_debug.c
-@@ -231,8 +231,8 @@
-   int32_t n = lj_lib_checkint(L, 2) - 1;
-   if ((uint32_t)n >= fn->l.nupvalues)
-     lj_err_arg(L, 2, LJ_ERR_IDXRNG);
--  setlightudV(L->top-1, isluafunc(fn) ? (void *)gcref(fn->l.uvptr[n]) :
--					(void *)&fn->c.upvalue[n]);
-+  lua_pushlightuserdata(L, isluafunc(fn) ? (void *)gcref(fn->l.uvptr[n]) :
-+					   (void *)&fn->c.upvalue[n]);
-   return 1;
- }
- 
-@@ -283,13 +283,13 @@
- 
- /* ------------------------------------------------------------------------ */
- 
--#define KEY_HOOK	((void *)0x3004)
-+#define KEY_HOOK	(U64x(80000000,00000000)|'h')
- 
- static void hookf(lua_State *L, lua_Debug *ar)
- {
-   static const char *const hooknames[] =
-     {"call", "return", "line", "count", "tail return"};
--  lua_pushlightuserdata(L, KEY_HOOK);
-+  (L->top++)->u64 = KEY_HOOK;
-   lua_rawget(L, LUA_REGISTRYINDEX);
-   if (lua_isfunction(L, -1)) {
-     lua_pushstring(L, hooknames[(int)ar->event]);
-@@ -334,7 +334,7 @@
-     count = luaL_optint(L, arg+3, 0);
-     func = hookf; mask = makemask(smask, count);
-   }
--  lua_pushlightuserdata(L, KEY_HOOK);
-+  (L->top++)->u64 = KEY_HOOK;
-   lua_pushvalue(L, arg+1);
-   lua_rawset(L, LUA_REGISTRYINDEX);
-   lua_sethook(L, func, mask, count);
-@@ -349,7 +349,7 @@
-   if (hook != NULL && hook != hookf) {  /* external hook? */
-     lua_pushliteral(L, "external hook");
-   } else {
--    lua_pushlightuserdata(L, KEY_HOOK);
-+    (L->top++)->u64 = KEY_HOOK;
-     lua_rawget(L, LUA_REGISTRYINDEX);   /* get hook */
-   }
-   lua_pushstring(L, unmakemask(mask, buff));
-Index: luajit/src/lib_jit.c
-===================================================================
---- luajit.orig/src/lib_jit.c
-+++ luajit/src/lib_jit.c
-@@ -540,15 +540,15 @@
- 
- /* Not loaded by default, use: local profile = require("jit.profile") */
- 
--static const char KEY_PROFILE_THREAD = 't';
--static const char KEY_PROFILE_FUNC = 'f';
-+#define KEY_PROFILE_THREAD	(U64x(80000000,00000000)|'t')
-+#define KEY_PROFILE_FUNC	(U64x(80000000,00000000)|'f')
- 
- static void jit_profile_callback(lua_State *L2, lua_State *L, int samples,
- 				 int vmstate)
- {
-   TValue key;
-   cTValue *tv;
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   tv = lj_tab_get(L, tabV(registry(L)), &key);
-   if (tvisfunc(tv)) {
-     char vmst = (char)vmstate;
-@@ -575,9 +575,9 @@
-   lua_State *L2 = lua_newthread(L);  /* Thread that runs profiler callback. */
-   TValue key;
-   /* Anchor thread and function in registry. */
--  setlightudV(&key, (void *)&KEY_PROFILE_THREAD);
-+  key.u64 = KEY_PROFILE_THREAD;
-   setthreadV(L, lj_tab_set(L, registry, &key), L2);
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   setfuncV(L, lj_tab_set(L, registry, &key), func);
-   lj_gc_anybarriert(L, registry);
-   luaJIT_profile_start(L, mode ? strdata(mode) : "",
-@@ -592,9 +592,9 @@
-   TValue key;
-   luaJIT_profile_stop(L);
-   registry = tabV(registry(L));
--  setlightudV(&key, (void *)&KEY_PROFILE_THREAD);
-+  key.u64 = KEY_PROFILE_THREAD;
-   setnilV(lj_tab_set(L, registry, &key));
--  setlightudV(&key, (void *)&KEY_PROFILE_FUNC);
-+  key.u64 = KEY_PROFILE_FUNC;
-   setnilV(lj_tab_set(L, registry, &key));
-   lj_gc_anybarriert(L, registry);
-   return 0;
-Index: luajit/src/lib_package.c
-===================================================================
---- luajit.orig/src/lib_package.c
-+++ luajit/src/lib_package.c
-@@ -398,7 +398,7 @@
- 
- /* ------------------------------------------------------------------------ */
- 
--#define sentinel	((void *)0x4004)
-+#define KEY_SENTINEL	(U64x(80000000,00000000)|'s')
- 
- static int lj_cf_package_require(lua_State *L)
- {
-@@ -408,7 +408,7 @@
-   lua_getfield(L, LUA_REGISTRYINDEX, "_LOADED");
-   lua_getfield(L, 2, name);
-   if (lua_toboolean(L, -1)) {  /* is it there? */
--    if (lua_touserdata(L, -1) == sentinel)  /* check loops */
-+    if ((L->top-1)->u64 == KEY_SENTINEL)  /* check loops */
-       luaL_error(L, "loop or previous error loading module " LUA_QS, name);
-     return 1;  /* package is already loaded */
-   }
-@@ -431,14 +431,14 @@
-     else
-       lua_pop(L, 1);
-   }
--  lua_pushlightuserdata(L, sentinel);
-+  (L->top++)->u64 = KEY_SENTINEL;
-   lua_setfield(L, 2, name);  /* _LOADED[name] = sentinel */
-   lua_pushstring(L, name);  /* pass name as argument to module */
-   lua_call(L, 1, 1);  /* run loaded module */
-   if (!lua_isnil(L, -1))  /* non-nil return? */
-     lua_setfield(L, 2, name);  /* _LOADED[name] = returned value */
-   lua_getfield(L, 2, name);
--  if (lua_touserdata(L, -1) == sentinel) {   /* module did not set a value? */
-+  if ((L->top-1)->u64 == KEY_SENTINEL) {   /* module did not set a value? */
-     lua_pushboolean(L, 1);  /* use true as result */
-     lua_pushvalue(L, -1);  /* extra copy to be returned */
-     lua_setfield(L, 2, name);  /* _LOADED[name] = true */
-Index: luajit/src/lib_string.c
-===================================================================
---- luajit.orig/src/lib_string.c
-+++ luajit/src/lib_string.c
-@@ -714,7 +714,7 @@
- 	lj_strfmt_putfchar(sb, sf, lj_lib_checkint(L, arg));
- 	break;
-       case STRFMT_PTR:  /* No formatting. */
--	lj_strfmt_putptr(sb, lj_obj_ptr(L->base+arg-1));
-+	lj_strfmt_putptr(sb, lj_obj_ptr(G(L), L->base+arg-1));
- 	break;
-       default:
- 	lua_assert(0);
-Index: luajit/src/lj_api.c
-===================================================================
---- luajit.orig/src/lj_api.c
-+++ luajit/src/lj_api.c
-@@ -595,7 +595,7 @@
-   if (tvisudata(o))
-     return uddata(udataV(o));
-   else if (tvislightud(o))
--    return lightudV(o);
-+    return lightudV(G(L), o);
-   else
-     return NULL;
- }
-@@ -608,7 +608,7 @@
- 
- LUA_API const void *lua_topointer(lua_State *L, int idx)
- {
--  return lj_obj_ptr(index2adr(L, idx));
-+  return lj_obj_ptr(G(L), index2adr(L, idx));
- }
- 
- /* -- Stack setters (object creation) ------------------------------------- */
-@@ -694,9 +694,38 @@
-   incr_top(L);
- }
- 
-+#if LJ_64
-+static void *lightud_intern(lua_State *L, void *p)
-+{
-+  global_State *g = G(L);
-+  uint64_t u = (uint64_t)p;
-+  uint32_t up = lightudup(u);
-+  uint32_t *segmap = mref(g->gc.lightudseg, uint32_t);
-+  MSize segnum = g->gc.lightudnum;
-+  if (segmap) {
-+    MSize seg;
-+    for (seg = 0; seg <= segnum; seg++)
-+      if (segmap[seg] == up)  /* Fast path. */
-+	return (void *)(((uint64_t)seg << LJ_LIGHTUD_BITS_LO) | lightudlo(u));
-+    segnum++;
-+  }
-+  if (!((segnum-1) & segnum) && segnum != 1) {
-+    if (segnum >= (1 << LJ_LIGHTUD_BITS_SEG)) lj_err_msg(L, LJ_ERR_BADLU);
-+    lj_mem_reallocvec(L, segmap, segnum, segnum ? 2*segnum : 2u, uint32_t);
-+    setmref(g->gc.lightudseg, segmap);
-+  }
-+  g->gc.lightudnum = segnum;
-+  segmap[segnum] = up;
-+  return (void *)(((uint64_t)segnum << LJ_LIGHTUD_BITS_LO) | lightudlo(u));
-+}
-+#endif
-+
- LUA_API void lua_pushlightuserdata(lua_State *L, void *p)
- {
--  setlightudV(L->top, checklightudptr(L, p));
-+#if LJ_64
-+  p = lightud_intern(L, p);
-+#endif
-+  setrawlightudV(L->top, p);
-   incr_top(L);
- }
- 
-@@ -1138,7 +1167,10 @@
-   fn->c.f = func;
-   setfuncV(L, top++, fn);
-   if (LJ_FR2) setnilV(top++);
--  setlightudV(top++, checklightudptr(L, ud));
-+#if LJ_64
-+  ud = lightud_intern(L, ud);
-+#endif
-+  setrawlightudV(top++, ud);
-   cframe_nres(L->cframe) = 1+0;  /* Zero results. */
-   L->top = top;
-   return top-1;  /* Now call the newly allocated C function. */
-Index: luajit/src/lj_ccall.c
-===================================================================
---- luajit.orig/src/lj_ccall.c
-+++ luajit/src/lj_ccall.c
-@@ -1314,7 +1314,7 @@
-     lj_vm_ffi_call(&cc);
-     if (cts->cb.slot != ~0u) {  /* Blacklist function that called a callback. */
-       TValue tv;
--      setlightudV(&tv, (void *)cc.func);
-+      tv.u64 = ((uintptr_t)(void *)cc.func >> 2) | U64x(800000000, 00000000);
-       setboolV(lj_tab_set(L, cts->miscmap, &tv), 1);
-     }
-     ct = (CType *)((intptr_t)ct+(intptr_t)cts->tab);  /* May be reallocated. */
-Index: luajit/src/lj_cconv.c
-===================================================================
---- luajit.orig/src/lj_cconv.c
-+++ luajit/src/lj_cconv.c
-@@ -611,7 +611,7 @@
-     if (ud->udtype == UDTYPE_IO_FILE)
-       tmpptr = *(void **)tmpptr;
-   } else if (tvislightud(o)) {
--    tmpptr = lightudV(o);
-+    tmpptr = lightudV(cts->g, o);
-   } else if (tvisfunc(o)) {
-     void *p = lj_ccallback_new(cts, d, funcV(o));
-     if (p) {
-Index: luajit/src/lj_crecord.c
-===================================================================
---- luajit.orig/src/lj_crecord.c
-+++ luajit/src/lj_crecord.c
-@@ -643,8 +643,7 @@
-     }
-   } else if (tref_islightud(sp)) {
- #if LJ_64
--    sp = emitir(IRT(IR_BAND, IRT_P64), sp,
--		lj_ir_kint64(J, U64x(00007fff,ffffffff)));
-+    lj_trace_err(J, LJ_TRERR_NYICONV);
- #endif
-   } else {  /* NYI: tref_istab(sp). */
-     IRType t;
-@@ -1209,8 +1208,7 @@
-     TRef tr;
-     TValue tv;
-     /* Check for blacklisted C functions that might call a callback. */
--    setlightudV(&tv,
--		cdata_getptr(cdataptr(cd), (LJ_64 && tp == IRT_P64) ? 8 : 4));
-+    tv.u64 = ((uintptr_t)cdata_getptr(cdataptr(cd), (LJ_64 && tp == IRT_P64) ? 8 : 4) >> 2) | U64x(800000000, 00000000);
-     if (tvistrue(lj_tab_get(J->L, cts->miscmap, &tv)))
-       lj_trace_err(J, LJ_TRERR_BLACKL);
-     if (ctype_isvoid(ctr->info)) {
-Index: luajit/src/lj_dispatch.c
-===================================================================
---- luajit.orig/src/lj_dispatch.c
-+++ luajit/src/lj_dispatch.c
-@@ -302,7 +302,7 @@
-       if (idx != 0) {
- 	cTValue *tv = idx > 0 ? L->base + (idx-1) : L->top + idx;
- 	if (tvislightud(tv))
--	  g->wrapf = (lua_CFunction)lightudV(tv);
-+	  g->wrapf = (lua_CFunction)lightudV(g, tv);
- 	else
- 	  return 0;  /* Failed. */
-       } else {
-Index: luajit/src/lj_ir.c
-===================================================================
---- luajit.orig/src/lj_ir.c
-+++ luajit/src/lj_ir.c
-@@ -386,8 +386,10 @@
-   case IR_KPRI: setpriV(tv, irt_toitype(ir->t)); break;
-   case IR_KINT: setintV(tv, ir->i); break;
-   case IR_KGC: setgcV(L, tv, ir_kgc(ir), irt_toitype(ir->t)); break;
--  case IR_KPTR: case IR_KKPTR: setlightudV(tv, ir_kptr(ir)); break;
--  case IR_KNULL: setlightudV(tv, NULL); break;
-+  case IR_KPTR: case IR_KKPTR:
-+    setnumV(tv, (lua_Number)(uintptr_t)ir_kptr(ir));
-+    break;
-+  case IR_KNULL: setintV(tv, 0); break;
-   case IR_KNUM: setnumV(tv, ir_knum(ir)->n); break;
- #if LJ_HASFFI
-   case IR_KINT64: {
-Index: luajit/src/lj_obj.c
-===================================================================
---- luajit.orig/src/lj_obj.c
-+++ luajit/src/lj_obj.c
-@@ -34,12 +34,13 @@
- }
- 
- /* Return pointer to object or its object data. */
--const void * LJ_FASTCALL lj_obj_ptr(cTValue *o)
-+const void * LJ_FASTCALL lj_obj_ptr(global_State *g, cTValue *o)
- {
-+  UNUSED(g);
-   if (tvisudata(o))
-     return uddata(udataV(o));
-   else if (tvislightud(o))
--    return lightudV(o);
-+    return lightudV(g, o);
-   else if (LJ_HASFFI && tviscdata(o))
-     return cdataptr(cdataV(o));
-   else if (tvisgcv(o))
-Index: luajit/src/lj_obj.h
-===================================================================
---- luajit.orig/src/lj_obj.h
-+++ luajit/src/lj_obj.h
-@@ -232,7 +232,7 @@
- **                  ---MSW---.---LSW---
- ** primitive types |  itype  |         |
- ** lightuserdata   |  itype  |  void * |  (32 bit platforms)
--** lightuserdata   |ffff|    void *    |  (64 bit platforms, 47 bit pointers)
-+** lightuserdata   |ffff|seg|    ofs   |  (64 bit platforms)
- ** GC objects      |  itype  |  GCRef  |
- ** int (LJ_DUALNUM)|  itype  |   int   |
- ** number           -------double------
-@@ -245,7 +245,8 @@
- **
- **                     ------MSW------.------LSW------
- ** primitive types    |1..1|itype|1..................1|
--** GC objects/lightud |1..1|itype|-------GCRef--------|
-+** GC objects         |1..1|itype|-------GCRef--------|
-+** lightuserdata      |1..1|itype|seg|------ofs-------|
- ** int (LJ_DUALNUM)   |1..1|itype|0..0|-----int-------|
- ** number              ------------double-------------
- **
-@@ -285,6 +286,12 @@
- #define LJ_GCVMASK		(((uint64_t)1 << 47) - 1)
- #endif
- 
-+#if LJ_64
-+/* To stay within 47 bits, lightuserdata is segmented. */
-+#define LJ_LIGHTUD_BITS_SEG	8
-+#define LJ_LIGHTUD_BITS_LO	(47 - LJ_LIGHTUD_BITS_SEG)
-+#endif
-+
- /* -- String object ------------------------------------------------------- */
- 
- /* String object header. String payload follows. */
-@@ -576,7 +583,11 @@
-   uint8_t currentwhite;	/* Current white color. */
-   uint8_t state;	/* GC state. */
-   uint8_t nocdatafin;	/* No cdata finalizer called. */
--  uint8_t unused2;
-+#if LJ_64
-+  uint8_t lightudnum;	/* Number of lightuserdata segments - 1. */
-+#else
-+  uint8_t unused1;
-+#endif
-   MSize sweepstr;	/* Sweep position in string table. */
-   GCRef root;		/* List of all collectable objects. */
-   MRef sweep;		/* Sweep position in root list. */
-@@ -588,6 +599,9 @@
-   GCSize estimate;	/* Estimate of memory actually in use. */
-   MSize stepmul;	/* Incremental GC step granularity. */
-   MSize pause;		/* Pause between successive GC cycles. */
-+#if LJ_64
-+  MRef lightudseg;	/* Upper bits of lightuserdata segments. */
-+#endif
- } GCState;
- 
- /* Global state, shared by all threads of a Lua universe. */
-@@ -795,10 +809,23 @@
- #endif
- #define boolV(o)	check_exp(tvisbool(o), (LJ_TFALSE - itype(o)))
- #if LJ_64
--#define lightudV(o) \
--  check_exp(tvislightud(o), (void *)((o)->u64 & U64x(00007fff,ffffffff)))
-+#define lightudseg(u) \
-+  (((u) >> LJ_LIGHTUD_BITS_LO) & ((1 << LJ_LIGHTUD_BITS_SEG)-1))
-+#define lightudlo(u) \
-+  ((u) & (((uint64_t)1 << LJ_LIGHTUD_BITS_LO) - 1))
-+#define lightudup(p) \
-+  ((uint32_t)(((p) >> LJ_LIGHTUD_BITS_LO) << (LJ_LIGHTUD_BITS_LO-32)))
-+static LJ_AINLINE void *lightudV(global_State *g, cTValue *o)
-+{
-+  uint64_t u = o->u64;
-+  uint64_t seg = lightudseg(u);
-+  uint32_t *segmap = mref(g->gc.lightudseg, uint32_t);
-+  lua_assert(tvislightud(o));
-+  lua_assert(seg <= g->gc.lightudnum);
-+  return (void *)(((uint64_t)segmap[seg] << 32) | lightudlo(u));
-+}
- #else
--#define lightudV(o)	check_exp(tvislightud(o), gcrefp((o)->gcr, void))
-+#define lightudV(g, o)	check_exp(tvislightud(o), gcrefp((o)->gcr, void))
- #endif
- #define gcV(o)		check_exp(tvisgcv(o), gcval(o))
- #define strV(o)		check_exp(tvisstr(o), &gcval(o)->str)
-@@ -824,7 +851,7 @@
- #define setpriV(o, i)		(setitype((o), (i)))
- #endif
- 
--static LJ_AINLINE void setlightudV(TValue *o, void *p)
-+static LJ_AINLINE void setrawlightudV(TValue *o, void *p)
- {
- #if LJ_GC64
-   o->u64 = (uint64_t)p | (((uint64_t)LJ_TLIGHTUD) << 47);
-@@ -835,24 +862,14 @@
- #endif
- }
- 
--#if LJ_64
--#define checklightudptr(L, p) \
--  (((uint64_t)(p) >> 47) ? (lj_err_msg(L, LJ_ERR_BADLU), NULL) : (p))
--#else
--#define checklightudptr(L, p)	(p)
--#endif
--
--#if LJ_FR2
-+#if LJ_FR2 || LJ_32
- #define contptr(f)		((void *)(f))
- #define setcont(o, f)		((o)->u64 = (uint64_t)(uintptr_t)contptr(f))
--#elif LJ_64
-+#else
- #define contptr(f) \
-   ((void *)(uintptr_t)(uint32_t)((intptr_t)(f) - (intptr_t)lj_vm_asm_begin))
- #define setcont(o, f) \
-   ((o)->u64 = (uint64_t)(void *)(f) - (uint64_t)lj_vm_asm_begin)
--#else
--#define contptr(f)		((void *)(f))
--#define setcont(o, f)		setlightudV((o), contptr(f))
- #endif
- 
- #define tvchecklive(L, o) \
-@@ -978,6 +995,6 @@
- 
- /* Compare two objects without calling metamethods. */
- LJ_FUNC int LJ_FASTCALL lj_obj_equal(cTValue *o1, cTValue *o2);
--LJ_FUNC const void * LJ_FASTCALL lj_obj_ptr(cTValue *o);
-+LJ_FUNC const void * LJ_FASTCALL lj_obj_ptr(global_State *g, cTValue *o);
- 
- #endif
-Index: luajit/src/lj_snap.c
-===================================================================
---- luajit.orig/src/lj_snap.c
-+++ luajit/src/lj_snap.c
-@@ -626,7 +626,12 @@
-   IRType1 t = ir->t;
-   RegSP rs = ir->prev;
-   if (irref_isk(ref)) {  /* Restore constant slot. */
--    lj_ir_kvalue(J->L, o, ir);
-+    if (ir->o == IR_KPTR) {
-+      o->u64 = (uint64_t)(uintptr_t)ir_kptr(ir);
-+    } else {
-+      lua_assert(!(ir->o == IR_KKPTR || ir->o == IR_KNULL));
-+      lj_ir_kvalue(J->L, o, ir);
-+    }
-     return;
-   }
-   if (LJ_UNLIKELY(bloomtest(rfilt, ref)))
-Index: luajit/src/lj_state.c
-===================================================================
---- luajit.orig/src/lj_state.c
-+++ luajit/src/lj_state.c
-@@ -171,6 +171,12 @@
-   lj_mem_freevec(g, g->strhash, g->strmask+1, GCRef);
-   lj_buf_free(g, &g->tmpbuf);
-   lj_mem_freevec(g, tvref(L->stack), L->stacksize, TValue);
-+#if LJ_64
-+  if (mref(g->gc.lightudseg, uint32_t)) {
-+    MSize segnum = g->gc.lightudnum ? (2 << lj_fls(g->gc.lightudnum)) : 2;
-+    lj_mem_freevec(g, mref(g->gc.lightudseg, uint32_t), segnum, uint32_t);
-+  }
-+#endif
-   lua_assert(g->gc.total == sizeof(GG_State));
- #ifndef LUAJIT_USE_SYSMALLOC
-   if (g->allocf == lj_alloc_f)
-Index: luajit/src/lj_strfmt.c
-===================================================================
---- luajit.orig/src/lj_strfmt.c
-+++ luajit/src/lj_strfmt.c
-@@ -393,7 +393,7 @@
-       p = lj_buf_wmem(p, "builtin#", 8);
-       p = lj_strfmt_wint(p, funcV(o)->c.ffid);
-     } else {
--      p = lj_strfmt_wptr(p, lj_obj_ptr(o));
-+      p = lj_strfmt_wptr(p, lj_obj_ptr(G(L), o));
-     }
-     return lj_str_new(L, buf, (size_t)(p - buf));
-   }
diff --git a/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch b/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
deleted file mode 100644
index 96e4c9106acf9..0000000000000
--- a/srcpkgs/LuaJIT/patches/enable-debug-symbols.patch
+++ /dev/null
@@ -1,24 +0,0 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Tue, 17 Nov 2015 16:27:11 +0100
-Subject: Enable debugging symbols in the build
-
----
- src/Makefile | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git src/Makefile src/Makefile
-index 8a38efd..6b73a89 100644
---- a/src/Makefile
-+++ b/src/Makefile
-@@ -54,9 +54,9 @@ CCOPT_arm64=
- CCOPT_ppc=
- CCOPT_mips=
- #
--CCDEBUG=
-+#CCDEBUG=
- # Uncomment the next line to generate debug information:
--#CCDEBUG= -g
-+CCDEBUG= -g
- #
- CCWARN= -Wall
- # Uncomment the next line to enable more warnings:
diff --git a/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch b/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
deleted file mode 100644
index f53e211071063..0000000000000
--- a/srcpkgs/LuaJIT/patches/fix-bcsave-ppc64.patch
+++ /dev/null
@@ -1,33 +0,0 @@
---- a/src/jit/bcsave.lua	2018-12-17 19:06:27.215042417 +0100
-+++ b/src/jit/bcsave.lua	2018-12-17 19:17:12.982477945 +0100
-@@ -64,7 +64,7 @@
- 
- local map_arch = {
-   x86 = true, x64 = true, arm = true, arm64 = true, arm64be = true,
--  ppc = true, mips = true, mipsel = true,
-+  ppc = true, ppc64 = true, ppc64le = true, mips = true, mipsel = true,
- }
- 
- local map_os = {
-@@ -200,9 +200,10 @@
- ]]
-   local symname = LJBC_PREFIX..ctx.modname
-   local is64, isbe = false, false
--  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" then
-+  if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" or ctx.arch == "ppc64" or ctx.arch == "ppc64le" then
-     is64 = true
--  elseif ctx.arch == "ppc" or ctx.arch == "mips" then
-+  end
-+  if ctx.arch == "ppc" or ctx.arch == "ppc64" or ctx.arch == "mips" then
-     isbe = true
-   end
- 
-@@ -237,7 +238,7 @@
-   hdr.eendian = isbe and 2 or 1
-   hdr.eversion = 1
-   hdr.type = f16(1)
--  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, mips=8, mipsel=8 })[ctx.arch])
-+  hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, ppc64=21, ppc64le=21, mips=8, mipsel=8 })[ctx.arch])
-   if ctx.arch == "mips" or ctx.arch == "mipsel" then
-     hdr.flags = f32(0x50001006)
-   end
diff --git a/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch b/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
index 59e1ee72fcbb8..2527bbef29961 100644
--- a/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
+++ b/srcpkgs/LuaJIT/patches/get-rid-of-luajit-version-sym.patch
@@ -1,18 +1,8 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Thu, 19 Nov 2015 16:29:02 +0200
-Subject: Get rid of LUAJIT_VERSION_SYM that changes ABI on every patch release
-
----
- src/lj_dispatch.c | 5 -----
- src/luajit.c      | 2 --
- src/luajit.h      | 3 ---
- 3 files changed, 10 deletions(-)
-
-diff --git src/lj_dispatch.c src/lj_dispatch.c
-index 5d6795f..e865a78 100644
+diff --git a/src/lj_dispatch.c b/src/lj_dispatch.c
+index b9748bba..d09238f8 100644
 --- a/src/lj_dispatch.c
 +++ b/src/lj_dispatch.c
-@@ -319,11 +319,6 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
+@@ -318,11 +318,6 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
    return 1;  /* OK. */
  }
  
@@ -24,28 +14,28 @@ index 5d6795f..e865a78 100644
  /* -- Hooks --------------------------------------------------------------- */
  
  /* This function can be called asynchronously (e.g. during a signal). */
-diff --git src/luajit.c src/luajit.c
-index 1ca2430..ccf425e 100644
+diff --git a/src/luajit.c b/src/luajit.c
+index 73e29d44..31fdba18 100644
 --- a/src/luajit.c
 +++ b/src/luajit.c
-@@ -516,8 +516,6 @@ static int pmain(lua_State *L)
+@@ -515,7 +515,6 @@ static int pmain(lua_State *L)
+   int argn;
+   int flags = 0;
    globalL = L;
-   if (argv[0] && argv[0][0]) progname = argv[0];
- 
 -  LUAJIT_VERSION_SYM();  /* Linker-enforced version check. */
--
+ 
    argn = collectargs(argv, &flags);
    if (argn < 0) {  /* Invalid args? */
-     print_usage();
-diff --git src/luajit.h src/luajit.h
-index 708a5a1..35ae02c 100644
---- a/src/luajit.h
-+++ b/src/luajit.h
-@@ -73,7 +73,4 @@ LUA_API void luaJIT_profile_stop(lua_State *L);
+diff --git a/src/luajit_rolling.h b/src/luajit_rolling.h
+index 2d04402c..5ab4167d 100644
+--- a/src/luajit_rolling.h
++++ b/src/luajit_rolling.h
+@@ -73,8 +73,5 @@ LUA_API void luaJIT_profile_stop(lua_State *L);
  LUA_API const char *luaJIT_profile_dumpstack(lua_State *L, const char *fmt,
  					     int depth, size_t *len);
  
 -/* Enforce (dynamic) linker error for version mismatches. Call from main. */
 -LUA_API void LUAJIT_VERSION_SYM(void);
 -
+ #error "DO NOT USE luajit_rolling.h -- only include build-generated luajit.h"
  #endif
diff --git a/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch b/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch
deleted file mode 100644
index aedaacbaaea46..0000000000000
--- a/srcpkgs/LuaJIT/patches/unpollute-global-namespace.patch
+++ /dev/null
@@ -1,21 +0,0 @@
-From: =?utf-8?q?Ond=C5=99ej_Sur=C3=BD?= <ondrej@sury.org>
-Date: Wed, 11 Oct 2017 08:42:41 +0000
-Subject: Make ccall_copy_struct static to unpollute global library namespace
-
----
- src/lj_ccall.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git src/lj_ccall.c src/lj_ccall.c
-index b891591..a7dcc1b 100644
---- a/src/lj_ccall.c
-+++ b/src/lj_ccall.c
-@@ -960,7 +960,7 @@ noth:  /* Not a homogeneous float/double aggregate. */
-   return 0;  /* Struct is in GPRs. */
- }
- 
--void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft)
-+static void ccall_copy_struct(CCallState *cc, CType *ctr, void *dp, void *sp, int ft)
- {
-   if (LJ_ABI_SOFTFP ? ft :
-       ((ft & 3) == FTYPE_FLOAT || (ft >> 2) == FTYPE_FLOAT)) {
diff --git a/srcpkgs/LuaJIT/template b/srcpkgs/LuaJIT/template
index 85449ac3d6f73..85baebb074f52 100644
--- a/srcpkgs/LuaJIT/template
+++ b/srcpkgs/LuaJIT/template
@@ -1,71 +1,56 @@
 # Template file for 'LuaJIT'
 pkgname=LuaJIT
-version=2.1.0beta3
-revision=2
-_so_version=2.1.0
-_dist_version=${_so_version}-beta3
+# set last number to contents of .relver at repo root
+version=2.1.1707061634
+revision=1
+_commit_hash=0d313b243194a0b8d2399d8b549ca5a0ff234db5
+build_style=gnu-makefile
+make_build_target=amalg
 hostmakedepends="lua52-BitOp"
 short_desc="Just-In-Time Compiler for Lua"
 maintainer="Orphaned <orphan@voidlinux.org>"
 license="MIT"
 homepage="http://www.luajit.org"
-distfiles="http://luajit.org/download/${pkgname}-${_dist_version}.tar.gz"
-checksum=1ad2e34b111c802f9d0cdf019e986909123237a28c746b21295b63c9e785d9c3
+distfiles="https://repo.or.cz/luajit-2.0.git/snapshot/${_commit_hash}.tar.gz"
+checksum=4b29f310d9e982f8bfa18f0dcd4d979a23666943e690714bef0750d56b2cd64b
 
 build_options="lua52compat"
+desc_option_lua52compat="higher compatibility with lua 5.2"
 
-_cross_cc="cc"
-if [ "$XBPS_WORDSIZE" != "$XBPS_TARGET_WORDSIZE" ]; then
-	if [ "${XBPS_MACHINE/-musl/}" = "x86_64" ]; then
-		hostmakedepends+=" cross-i686-linux-musl"
-		_cross_cc="i686-linux-musl-gcc -static"
-	else
-		broken="Host and target wordsize must match"
+_host_cc="cc"
+if [ -n "$CROSS_BUILD" ]; then
+	if [ "$XBPS_WORDSIZE" != "$XBPS_TARGET_WORDSIZE" ]; then
+		if [ "${XBPS_MACHINE%%-*}" = "x86_64" ]; then
+			hostmakedepends+=" cross-i686-linux-musl"
+			_host_cc="i686-linux-musl-gcc -static"
+		else
+			broken="Host and target wordsize must match when not on x86_64"
+		fi
 	fi
-fi
-
-# the ppc64 patchset subtly breaks ppc, needs investigation; for
-# now apply patches conditionally, separately for ppc64 and ppc
-post_patch() {
-	local patchdir
 
-	case "$XBPS_TARGET_MACHINE" in
-		ppc64*) patchdir="ppc64";;
-		ppc*) patchdir="ppc";;
-		*) return;;
-	esac
+	make_build_args+=" CROSS=${XBPS_CROSS_TRIPLET}-"
+fi
 
-	for i in ${FILESDIR}/patches/${patchdir}/*.patch; do
-		msg_normal "patching: $i\n"
-		patch -sNp1 -i ${i}
-	done
+pre_build() {
+	if [ "$build_option_lua52compat" ]; then
+		make_build_args+=" XCFLAGS=-DLUAJIT_ENABLE_LUA52COMPAT"
+	fi
 }
 
 do_build() {
+	# if we don't unset, the build fails to cross-compile
+	# due to confliction with the makefile macros
 	local _cflags=$CFLAGS
 	local _ldflags=$LDFLAGS
-
-	if [ "$CROSS_BUILD" ]; then
-		local cross="CROSS=${XBPS_CROSS_TRIPLET}-"
-	fi
-
-	if [ "$build_option_lua52compat" ]; then
-		local _xcflags="XCFLAGS=-DLUAJIT_ENABLE_LUA52COMPAT"
-	fi
-
 	unset CFLAGS LDFLAGS
-	make ${makejobs} PREFIX=/usr HOST_LUA=lua5.2 HOST_CC="${_cross_cc}" \
+
+	make ${makejobs} PREFIX=/usr HOST_LUA=lua5.2 \
 		HOST_CFLAGS="$XBPS_CFLAGS" HOST_LDFLAGS="$XBPS_LDFLAGS" \
 		TARGET_CFLAGS="${_cflags}" TARGET_LDFLAGS="${_ldflags}" \
-		${_xcflags} ${cross}
+		HOST_CC="${_host_cc}" ${make_build_args}
 }
 
-do_install() {
-	make DPREFIX=${DESTDIR}/usr DESTDIR=${DESTDIR} \
-		INSTALL_SHARE=${DESTDIR}/usr/share PREFIX=/usr install
-
-	mv ${DESTDIR}/usr/bin/luajit-* ${DESTDIR}/usr/bin/luajit
-	ln -fs libluajit-5.1.so.${_so_version} ${DESTDIR}/usr/lib/libluajit-5.1.so.2
+post_install() {
 	vlicense COPYRIGHT
 }
 
diff --git a/srcpkgs/LuaJIT/update b/srcpkgs/LuaJIT/update
index 15613910677c9..96bbbd453c32c 100644
--- a/srcpkgs/LuaJIT/update
+++ b/srcpkgs/LuaJIT/update
@@ -1 +1 @@
-site="http://luajit.org/download.html"
+disabled="impossible to determine given release style and versioning scheme"


* Re: LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (21 preceding siblings ...)
  2024-02-19  5:35 ` [PR PATCH] [Updated] " yoshiyoshyosh
@ 2024-02-19 12:21 ` Calandracas606
  2024-02-24  2:07 ` yoshiyoshyosh
  2024-02-24 18:15 ` Calandracas606
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-19 12:21 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 298 bytes --]

New comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#issuecomment-1952336424

Comment:
I think the remaining patch for luajit_rolling.h can be dropped too: https://github.com/LuaJIT/LuaJIT/commit/14987af80ab583514f19ef36d1023655324fc757


* Re: LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (22 preceding siblings ...)
  2024-02-19 12:21 ` Calandracas606
@ 2024-02-24  2:07 ` yoshiyoshyosh
  2024-02-24 18:15 ` Calandracas606
  24 siblings, 0 replies; 26+ messages in thread
From: yoshiyoshyosh @ 2024-02-24  2:07 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 443 bytes --]

New comment by yoshiyoshyosh on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#issuecomment-1962215871

Comment:
> I think the remaining patch for luajit_rolling.h can be dropped too: [LuaJIT/LuaJIT@14987af](https://github.com/LuaJIT/LuaJIT/commit/14987af80ab583514f19ef36d1023655324fc757)

I'm not sure. I looked at the other distros, and they all seem to keep this patch even after updating past this commit.


* Re: LuaJIT: update to 2.1.1706708390, cleanup.
  2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
                   ` (23 preceding siblings ...)
  2024-02-24  2:07 ` yoshiyoshyosh
@ 2024-02-24 18:15 ` Calandracas606
  24 siblings, 0 replies; 26+ messages in thread
From: Calandracas606 @ 2024-02-24 18:15 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]

New comment by Calandracas606 on void-packages repository

https://github.com/void-linux/void-packages/pull/48453#issuecomment-1962452705

Comment:
> > I think the remaining patch for luajit_rolling.h can be dropped too: [LuaJIT/LuaJIT@14987af](https://github.com/LuaJIT/LuaJIT/commit/14987af80ab583514f19ef36d1023655324fc757)
> 
> I'm not sure. I looked at the other distros, and they all seem to keep this patch even after updating past this commit.

- Alpine
  - Not updated past this commit
- Chimera
  - LuaJIT not packaged
- Fedora
  - still has LUAJIT_VERSION_SYM patch
  - patch was added in 2015, well before LuaJIT moved to rolling

The Fedora patch says that they patched out `LUAJIT_VERSION_SYM` because it caused ABI breaks on updates, which is the whole point of the macro.

My understanding is that the patch exists because, historically, the macro "artificially" caused an ABI break even when there were no ABI-breaking changes.

Since the move to rolling, the macro has not changed: its value has remained `luaJIT_version_2_1_ROLLING`, which suggests that there have been no breaking changes. As long as the macro stays the same, the patch doesn't actually accomplish anything.

If the macro does change, presumably due to a real breaking change, patching it out could make that break hard to detect.

The way I see it, this patch doesn't actually accomplish anything meaningful, and I don't think keeping a pointless patch around is a good idea.

Again, it would be helpful to have another opinion on this.
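
For context, here is a minimal sketch of how a version symbol of this kind can enforce a match between library and client. It is not LuaJIT's code: every `demo_*` / `DEMO_*` name is hypothetical, and the export table and `demo_resolve` are toy stand-ins for the shared library's dynamic symbols and the dynamic linker. The only parts taken from the patch above are the idea of a macro that expands to a versioned function name, a declaration of that function in the header, and a call to it from `main`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LUAJIT_VERSION_SYM; all demo_* names are illustrative. */
#define DEMO_VERSION_SYM demo_version_2_1_ROLLING

/* Two-step stringification so the macro is expanded before it is quoted. */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2(x)

/* Library side: an empty function whose symbol name encodes the version. */
void DEMO_VERSION_SYM(void) {}

/* Toy export table standing in for the shared library's dynamic symbols. */
typedef void (*demo_fn)(void);
struct demo_export { const char *name; demo_fn fn; };
static const struct demo_export demo_exports[] = {
  { DEMO_STR(DEMO_VERSION_SYM), DEMO_VERSION_SYM },
};

/* Toy resolver standing in for the dynamic linker; NULL models "undefined symbol". */
demo_fn demo_resolve(const char *name)
{
  size_t i;
  for (i = 0; i < sizeof(demo_exports) / sizeof(demo_exports[0]); i++)
    if (strcmp(demo_exports[i].name, name) == 0)
      return demo_exports[i].fn;
  return NULL;
}

/* Client side: a program built against matching headers references the same name. */
void demo_client_check(void)
{
  demo_fn f = demo_resolve(DEMO_STR(DEMO_VERSION_SYM));
  assert(f != NULL);  /* Same name on both sides, so the lookup succeeds. */
  f();
}
```

In this toy model, a client built against headers that spell the name differently (say `demo_version_2_1_0_beta3`) would get `NULL` from `demo_resolve`, which corresponds to an unresolved symbol at link or load time. As long as the name stays fixed on both sides, the lookup always succeeds, which is the situation described above for the rolling branch.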

Thread overview: 26+ messages
2024-01-31  8:15 [PR PATCH] LuaJIT: update to 2.1.1692580715, cleanup yoshiyoshyosh
2024-01-31  8:17 ` [PR PATCH] [Updated] " yoshiyoshyosh
2024-01-31 17:41 ` [PR REVIEW] " Chocimier
2024-01-31 17:49 ` yoshiyoshyosh
2024-01-31 17:53 ` yoshiyoshyosh
2024-01-31 17:53 ` yoshiyoshyosh
2024-01-31 18:14 ` Chocimier
2024-01-31 18:18 ` yoshiyoshyosh
2024-01-31 18:29 ` [PR PATCH] [Updated] " yoshiyoshyosh
2024-02-15 13:38 ` [PR REVIEW] LuaJIT: update to 2.1.1706708390, cleanup Calandracas606
2024-02-16 16:25 ` Calandracas606
2024-02-16 16:55 ` [PR REVIEW] " Calandracas606
2024-02-16 17:01 ` Calandracas606
2024-02-16 17:18 ` Calandracas606
2024-02-16 17:24 ` Calandracas606
2024-02-16 18:21 ` Calandracas606
2024-02-16 18:29 ` Calandracas606
2024-02-19  5:00 ` yoshiyoshyosh
2024-02-19  5:06 ` yoshiyoshyosh
2024-02-19  5:06 ` yoshiyoshyosh
2024-02-19  5:20 ` yoshiyoshyosh
2024-02-19  5:24 ` yoshiyoshyosh
2024-02-19  5:35 ` [PR PATCH] [Updated] " yoshiyoshyosh
2024-02-19 12:21 ` Calandracas606
2024-02-24  2:07 ` yoshiyoshyosh
2024-02-24 18:15 ` Calandracas606
