From mboxrd@z Thu Jan 1 00:00:00 1970
From: w.f.j.mueller@retro11.de (Walter F.J. Mueller)
Date: Sat, 10 Jun 2017 14:58:43 +0200
Subject: [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
Message-ID:

Hi,

the kernel panic after tcsh here documents is understood. And fixed, at least on my system.

The essential hint was Johnny's observation that on his system he gets an "Illegal instruction - core dumped" and no kernel panic.

I'm using a self-built PDP 11/70 on an FPGA, see

    https://github.com/wfjm/w11/
    https://wfjm.github.io/home/w11/

which doesn't have a floating point unit. Therefore the kernel is built with floating point emulation, thus with

    FPSIM YES # floating point simulator

In a kernel with FPSIM activated, the trap handler trap(), see

    http://www.retro11.de/ouxr/211bsd/usr/src/sys/pdp/trap.c.html

calls fpsim() for each user mode illegal instruction trap. If it was a floating point instruction, fpsim() emulates it, returns 0, and trap() simply returns. If not, fpsim() returns the abort signal type, and trap() calls psignal() with this signal type, which in general will terminate the offending process.

The kernel panic is due to a coding error in mch_fpsim.s. Look in

    http://www.retro11.de/ouxr/211bsd/usr/src/sys/pdp/mch_fpsim.s.html

at the code after label badins:

    badins:                     / Illegal Instruction
        mov $SIGILL.,r0
        br 2b

The constant SIGILL is defined in assym.h as

    #define SIGILL 4

Thus after substitution the mov instruction is

    mov $4..,r0

with *two dots* !!! The 'as' assembler generates from this

    mov #160750,r0

So r0 will contain an invalid signal number, which is returned by fpsim() to trap(). This signal number is passed to psignal(), which starts with

    mask = sigmask(sig);
    prop = sigprop[sig];

The access to sigprop[sig] results in an address in IO space, causes a UNIBUS timeout, and in consequence the kernel panic.

After fixing the "$SIGILL."
to "$SIGILL" (removing the extraneous '.') and three similar cases, the kernel doesn't panic anymore; tcsh now crashes with an illegal instruction trap.

The remaining question is why tcsh runs into an illegal instruction. Now that I get a tcsh core dump, adb gives the answer

    adb tcsh tcsh.core
    $c
    0172774: _rscan(0176024,0174434) from ~heredoc+0246
    0176040: _heredoc(067676) from ~execute+0234
    0176126: _execute(067040,01512,0,0) from ~execute+03410
    0176222: _execute(066754,01512,0,0) from ~process+01224
    0176274: _process(01) from ~main+06030
    0177414: _main() from start+0104

heredoc(), which is located in OV1, calls rscan(), which is in OV6, with

    rscan(Dv, Dtestq);

where Dtestq is a function pointer to Dtestq(), which, like heredoc(), is in OV1. rscan(), which has the signature

    rscan(t, f)
        register Char **t;
        void (*f) ();

uses 'f' in the statement

    (*f) (*p++);

The problem is that
  - heredoc() and Dtestq() are in OV1
  - that's why in the end ~Dtestq is used as the function pointer, as for all overlay internal function invocations
  - rscan() is in OV6; when it's called, the overlay is switched OV1 -> OV6
  - this invalidates the function pointer, which now points to some random code location, which happens to hold '000045', causing a trap.

It is clear that in this context _Dtestq, the forwarder in the base, must be used and not ~Dtestq, the entry point in the overlay.

The generated code for 'rscan(Dv, Dtestq)' is

    ~heredoc+0230: mov $0174434,(sp)    # arg Dtestq: uses ~Dtestq
    ~heredoc+0234: mov r5,-(sp)
    ~heredoc+0236: add $0177764,(sp)    # arg Dv
    ~heredoc+0242: jsr pc,*$_rscan

Since rscan() is very small and only used by heredoc(), I simply moved the code of rscan() from sh.glob.c (OV6) to sh.dol.c, where heredoc() and Dtestq() are also defined.

After that tcsh works fine with here documents

    ./tcsh
    cat >x.x <