[TUHS] 211bsd: kernel panic after a 'here document' in tcsh

The Unix Heritage Society mailing list

* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
       [not found] <mailman.1.1496714401.14870.tuhs@minnie.tuhs.org>
@ 2017-06-06 19:15 ` Johnny Billquist
  0 siblings, 0 replies; 15+ messages in thread
From: Johnny Billquist @ 2017-06-06 19:15 UTC

On 2017-06-06 04:00, Michael Kjörling <michael at kjorling.se> wrote:
>
> On 5 Jun 2017 16:12 +0200, from w.f.j.mueller at retro11.de (Walter F.J. Mueller):
>> I'm using 211bsd (Version 447) and found that a 'here document' in tcsh
>> leads to a kernel panic. It's absolutely reproducible on my system, both
>> when I run it on my FPGA PDP-11 and in simh. Just doing
>>
>>   tcsh
>>   cat << EOF
> I'm curious whether the same thing happens if you try that in some
> other shell? (Not sure how widely here documents were supported back
> then, but I'm asking anyway.)

Not sure if any of the other shells have this. We're basically talking
csh, sh and ksh unless I remember wrong.
But it's a good question. If no one else has tried it by tomorrow, I
could check.

>> is enough, and I get
>>
>>     ka6 31333 aps 147472
>>     pc 161324 ps 30004
>>     ov 4
>>     cpuerr 20
>>     trap type 0
>>     panic: trap
>>     syncing disks... done
>>
>> looking at the crash dump gives
>>
>>   cd /etc/crash
>>   ./why 4
>>     Backtrace:
>>     0147372: _boot(05000,0100) from    ~panic+072
>>     0147414: _etext(011350) from ~trap+0350
>>     0147450: ~trap() from call+040
>>     0147516: _psignal(0101520,0160750) from ~trap+0364
>>     0147554: ~trap() from call+040
>>
>> so the crash is in psignal, which is afaik the kernel internal
>> mechanism to dispatch signals.
> The PC value in the panic report ("pc 161324") strikes me as high, but
> 161324 octal is 58068 decimal, so it's not excessively so, and perhaps
> in line with what one might expect to see with a kernel pinned near
> top of memory. Are the offsets in the backtrace constant, i.e. does it
> always crash on the same code?

161324 is way high. This is in kernel mode, and that is in the I/O page. 
Basically no code lives in the I/O page (some boot roms and hardware 
diagnostics excepted). This smells like corrupted memory (pointer or 
stack), or something else very funny.

> Not knowing what cpuerr 20 is specifically doesn't help, and at least
> http://www.retro11.de/ouxr/29bsd/usr/src/sys/sys/trap.c.html#n:112
> (which doesn't seem to be too far from what you are running) isn't
> terribly enlightening; CPUERR is simply a pointer into a memory-mapped
> register of some kind, as seen at
> http://www.retro11.de/ouxr/29bsd/usr/include/sys/iopage.h.html#m:CPUERR,
> and at least pdp11_cpumod.c from the simh source code at
> http://simh.trailing-edge.com/interim/pdp11_cpumod.c wasn't terribly
> enlightening, though of course I could be looking in entirely the
> wrong place.

Like others said - the cpu error register is documented in the processor 
handbook.

020 means Unibus Timeout, which is consistent with trying to access 
something in the I/O page, where there is no device configured to 
respond to that address.
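
For readers without the handbook at hand, the decoding can be sketched as below. This is only an illustration: the bit table is the layout usually given for the PDP-11/70 CPU error register and is an assumption here, and `decode_cpuerr` is a made-up helper name, not anything from 2.11BSD.

```python
# Bit layout assumed from the usual PDP-11/70 CPU error register description;
# verify it against the processor handbook for the CPU actually in use.
CPUERR_BITS = {
    0o200: "illegal HALT",
    0o100: "odd address error",
    0o040: "non-existent memory",
    0o020: "UNIBUS timeout",
    0o010: "yellow zone stack limit",
    0o004: "red zone stack limit",
}


def decode_cpuerr(value):
    """Return the names of all error bits set in a CPU error register value."""
    return [name for mask, name in CPUERR_BITS.items() if value & mask]


# The panic printout shows "cpuerr 20" (octal).
print(decode_cpuerr(0o20))
```

With the value from the panic report, only the UNIBUS timeout bit is set, which matches the reading above.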

I just tried the same thing on a simh system here, and I do not get a
crash. This is on 2.11BSD at patch level 449, running on an emulated 11/94.

I do however get tcsh to crash.

simh:/home/bqt> su -
Password:
erase, kill ^U, intr ^C
# tcsh
simh:/# cat << EOF
Illegal instruction - core dumped
#
Suspended (tty input)
simh:/home/bqt>
simh:/home/bqt> cat /VERSION
Current Patch Level: 448
Date: January 5, 2010


Yes, it says patch level 448, but it really is 449. This was the system 
where I worked together with Steven when doing the 449 patch set, but I 
never got around to actually updating the VERSION file itself.

Also, this was while running on the console.



Could you (Walter) try the latest version of 2.11BSD and see if you 
still get that crash?

	Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol



* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
@ 2017-06-25 16:25 Walter F.J. Mueller
  0 siblings, 0 replies; 15+ messages in thread
From: Walter F.J. Mueller @ 2017-06-25 16:25 UTC

Hi,

two remarks on the issues around FPSIM and tcsh:

I of course wondered why a line like

    mov     $4..,r0

is accepted by 'as'; I naively expected that this should cause an error.
I didn't locate the 211bsd 'as' manual, so I checked the 7th Edition manuals,
which can be found under

   https://wolfram.schneider.org/bsd/7thEdManVol2/

The assembler manual, see
   https://wolfram.schneider.org/bsd/7thEdManVol2/assembler/assembler.pdf

states

    6.1  Expression operators
         The operators are:
            (blank)     when there is no operator between operands,
                        the effect is exactly the same as if a
                        ‘+’ had appeared.

So the lexer sees two tokens

   $4.    --> number
   .      --> symbol for location counter

and, because the default operator is '+', interprets this as

    mov     $4. + . , r0

which ends up being a number in the 160000 to 177777 range.
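
A toy evaluator makes the effect concrete. It is only a sketch of the rule quoted above, not the real 'as' parser: it handles just octal numbers, decimal numbers with a trailing '.', the '.' symbol, and the implicit '+'. The function name `eval_as_operand` is made up, the '$' immediate prefix is left out, and the location counter value 0160744 is taken from the instruction trace in the 2017-06-10 message of this thread.

```python
def eval_as_operand(expr, location_counter):
    """Evaluate a tiny subset of 'as' expressions, truncated to 16 bits.

    Supported: octal numbers, decimal numbers marked by a trailing '.',
    the symbol '.' (location counter), and '+', which is also implied
    whenever two operands are adjacent.
    """
    total = 0
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch.isspace() or ch == "+":
            i += 1                              # explicit or implied '+'
        elif ch.isdigit():
            j = i
            while j < len(expr) and expr[j].isdigit():
                j += 1
            if j < len(expr) and expr[j] == ".":
                total += int(expr[i:j], 10)     # trailing '.' means decimal
                j += 1
            else:
                total += int(expr[i:j], 8)      # plain digits are octal
            i = j
        elif ch == ".":
            total += location_counter           # '.' is the location counter
            i += 1
        else:
            raise ValueError(f"unsupported character {ch!r}")
    return total & 0o177777


# "4.." is read as "4." followed by ".", that is 4 + location counter.
print(oct(eval_as_operand("4..", 0o160744)))
```

Under these assumptions the operand comes out as 0160750, the value seen in the trace, while "4." alone gives the intended 4.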

So 'as' is not to blame; it works as designed.

Noel Chiappa wrote:
 > I'm fairly amazed that apparently nobody has run across one of these 4 before!
 > (Or, at least, not bothered to report it.)
 > I wonder how long that bug has been in the code?

The answer is: this bug has been in 211bsd all along.
Steven Schultz told me that they simply didn't have a way to
test FPSIM because all machines had an FPP, and the only way of testing
would have been to physically remove the FP11 from an 11/70.


		With best regards,   Walter



* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
  2017-06-10 14:24 Noel Chiappa
@ 2017-06-12 15:26 ` Clem Cole
  0 siblings, 0 replies; 15+ messages in thread
From: Clem Cole @ 2017-06-12 15:26 UTC


It probably says that most people who ran UNIX purchased the
FP option and did not run the simulated FP, and those who did clearly
didn't use tcsh ;-)

Fred Brooks's bug curves really are amazing, but so true.  Some really
major bugs just take years to be discovered because no one ever looked, and
when you do, it can be like this one: 'wow!'

Clem

On Sat, Jun 10, 2017 at 10:24 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
wrote:

>     > From: "Walter F.J. Mueller"
>
>     > the kernel panic after tcsh here documents is understood.
>
> Very nice detective work!
>
>     > The kernel panic is due to a coding error in mch_fpsim.s. ...  After
>     > fixing the "$SIGILL." ... and three similar cases
>
> I'm fairly amazed that apparently nobody has run across one of these 4
> before!
> (Or, at least, not bothered to report it.)
>
> I wonder how long that bug has been in the code?
>
>      Noel
>

* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
       [not found] <mailman.1.1497146402.26080.tuhs@minnie.tuhs.org>
@ 2017-06-11 10:25 ` Johnny Billquist
  0 siblings, 0 replies; 15+ messages in thread
From: Johnny Billquist @ 2017-06-11 10:25 UTC

On 2017-06-11 04:00, "Walter F.J. Mueller" <w.f.j.mueller at retro11.de> wrote:

> Hi,
>
> the kernel panic after tcsh here documents is understood.
> And fixed, at least on my system.

Nice work. Looking forward to patch #450. And to respond to Noel's remark
about this being around for a long time without reports: this is in
FPSIM, and I believe the notes for 2.11BSD even say that this is an
untested piece of code, for which it is not even known whether it works,
so it's not something that has been used for ages. I'm in a way surprised
it even worked at all. I think I've seen somewhere that it was last tested
around 2.9BSD, and has not been officially tested since.

> The essential hint was Johnny's observation that on his system he gets
> an "Illegal instruction - core dumped" and no kernel panic.

Well, had I had FPP simulated, I would maybe not have gotten a kernel 
panic anyway. It would all depend on where the address ended up. With my 
current build, the kernel would have been able to read the address, 
since it pointed into the boot diagnostics rom. So it's a dicey error at 
best.

But it was very good that you also figured out the tcsh error. And I guess
it means we now have a known working FPSIM. :-)

	Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol


* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
@ 2017-06-10 14:24 Noel Chiappa
  2017-06-12 15:26 ` Clem Cole
  0 siblings, 1 reply; 15+ messages in thread
From: Noel Chiappa @ 2017-06-10 14:24 UTC


    > From: "Walter F.J. Mueller"

    > the kernel panic after tcsh here documents is understood.

Very nice detective work!

    > The kernel panic is due to a coding error in mch_fpsim.s. ...  After
    > fixing the "$SIGILL." ... and three similar cases

I'm fairly amazed that apparently nobody has run across one of these 4 before!
(Or, at least, not bothered to report it.)

I wonder how long that bug has been in the code?

     Noel



* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
@ 2017-06-10 12:58 Walter F.J. Mueller
  0 siblings, 0 replies; 15+ messages in thread
From: Walter F.J. Mueller @ 2017-06-10 12:58 UTC


Hi,

the kernel panic after tcsh here documents is understood.
And fixed, at least on my system.

The essential hint was Johnny's observation that on his system he gets
an "Illegal instruction - core dumped" and no kernel panic.

I'm using a self-built PDP 11/70 on an FPGA, see
   https://github.com/wfjm/w11/
   https://wfjm.github.io/home/w11/
which doesn't have a floating point unit. Therefore the kernel is built
with floating point emulation, thus with
   FPSIM   YES      # floating point simulator

In a kernel with FPSIM activated, the trap handler trap(), see
   http://www.retro11.de/ouxr/211bsd/usr/src/sys/pdp/trap.c.html
calls fpsim() for each user-mode illegal instruction trap. If it was
a floating point instruction, fpsim() emulates it and returns 0,
and trap() simply returns. If not, fpsim() returns the abort signal
type, and trap() calls psignal() with this signal type, which in
general will terminate the offending process.

The kernel panic is due to a coding error in mch_fpsim.s. Look in
   http://www.retro11.de/ouxr/211bsd/usr/src/sys/pdp/mch_fpsim.s.html
the code after label badins:

    badins:                         / Illegal Instruction
            mov     $SIGILL.,r0
            br      2b

The constant SIGILL is defined in assym.h as

    #define SIGILL 4.

Thus after substitution the mov instruction is

            mov     $4..,r0

with *two dots* !!! The 'as' assembler generates from this

            mov #160750,r0

So r0 will contain an invalid signal number, which is returned by fpsim() to
trap(). This signal number is passed to psignal(), which starts with

      mask = sigmask(sig);
      prop = sigprop[sig];

The access to sigprop[sig] results in an address in I/O space, which causes a
UNIBUS timeout and, in consequence, the kernel panic.
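
The address arithmetic can be checked with a few lines. This is a sketch under assumptions: the base address 013066 for sigprop is not stated anywhere in this mail; it is inferred by subtracting the bogus signal number from the faulting address 174036 in the instruction trace in the P.S. of this message, and sigprop is treated as a byte array because the trace shows a movb.

```python
SIG_BOGUS = 0o160750        # value loaded into r0 by "mov $4..,r0"
SIG_INTENDED = 4            # SIGILL, what "mov $4.,r0" would have loaded
SIGPROP_BASE = 0o013066     # assumed: inferred from the trace, not from nm
IO_PAGE_START = 0o160000    # top 8 kB of the 16-bit kernel D space


def element_address(base, index):
    """Address of base[index] for a byte array, wrapped to 16 bits."""
    return (base + index) & 0o177777


bad = element_address(SIGPROP_BASE, SIG_BOGUS)
good = element_address(SIGPROP_BASE, SIG_INTENDED)

print(oct(bad), bad >= IO_PAGE_START)    # lands in the I/O page
print(oct(good), good >= IO_PAGE_START)  # ordinary kernel data
```

With the correct signal number the access stays in ordinary kernel data; with the bogus one it lands at 174036 in the top 8 kB, where nothing responds on this system.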

After fixing the "$SIGILL." to "$SIGILL"  (removing the extraneous '.') and
three similar cases the kernel doesn't panic anymore; tcsh crashes with an
illegal instruction trap instead.

The question remains why tcsh runs into an illegal instruction. Now that
there is a tcsh core dump, adb gives the answer

   adb tcsh tcsh.core
     $c
       0172774: _rscan(0176024,0174434) from ~heredoc+0246
       0176040: _heredoc(067676) from ~execute+0234
       0176126: _execute(067040,01512,0,0) from ~execute+03410
       0176222: _execute(066754,01512,0,0) from ~process+01224
       0176274: _process(01) from ~main+06030
       0177414: _main() from start+0104

heredoc(), which is located in OV1, calls rscan(), which is in OV6, with

    rscan(Dv, Dtestq);

where Dtestq is a function pointer to Dtestq(), which, like heredoc(), is in OV1.
rscan(), which has the signature

      rscan(t, f)
           register Char **t;
           void    (*f) ();

uses 'f' in the statement

       (*f) (*p++);

The problem is that
   - heredoc() and Dtestq() are in OV1
   - that's why in the end ~Dtestq is used as the function pointer, as
     for all overlay-internal function invocations
   - rscan() is in OV6; when it's called, the overlay is switched OV1 -> OV6
   - this invalidates the function pointer, which now points to some random
     code location, which happens to hold '000045', causing a trap.
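
The failure mode in the list above can be modelled in a few lines. This is a toy model only, not how the 2.11BSD overlay loader is implemented: the class `Machine`, its methods, and the lambda standing in for Dtestq() are all invented for illustration. Only the address 0174434 and the overlay numbers come from the adb output above.

```python
DTESTQ_ADDR = 0o174434      # ~Dtestq, an address inside the overlay window


class Machine:
    """One overlay window; only the currently mapped overlay is visible."""

    def __init__(self):
        self.mapped = None
        self.overlays = {1: {}, 6: {}}   # overlay number -> {address: code}

    def call_raw(self, addr, arg):
        """Jump to an overlay-window address, whatever happens to be mapped."""
        target = self.overlays[self.mapped].get(addr)
        if target is None:
            raise RuntimeError("illegal instruction")
        return target(arg)

    def thunk(self, overlay, addr):
        """Base-segment forwarder: map the right overlay, call, restore."""
        def forward(arg):
            saved, self.mapped = self.mapped, overlay
            try:
                return self.call_raw(addr, arg)
            finally:
                self.mapped = saved
        return forward


m = Machine()
m.overlays[1][DTESTQ_ADDR] = lambda ch: ch.upper()   # stand-in for Dtestq()


def rscan(machine, items, f):
    machine.mapped = 6                   # rscan() itself lives in OV6
    return [f(item) for item in items]


raw = lambda ch: m.call_raw(DTESTQ_ADDR, ch)   # behaves like ~Dtestq
safe = m.thunk(1, DTESTQ_ADDR)                 # behaves like _Dtestq

print(rscan(m, ["a", "b"], safe))
try:
    rscan(m, ["a"], raw)
except RuntimeError as exc:
    print("raw overlay pointer:", exc)
```

In this model the raw address is only meaningful while OV1 is mapped, whereas the forwarder works no matter which overlay the caller runs in.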

It is clear that in this context _Dtestq, the forwarder in the base, must
be used and not ~Dtestq, the entry point in the overlay. The generated
code for 'rscan(Dv, Dtestq)' is

       ~heredoc+0230:  mov     $0174434,(sp)         # arg Dtestq: uses ~Dtestq
       ~heredoc+0234:  mov     r5,-(sp)
       ~heredoc+0236:  add     $0177764,(sp)         # arg Dv
       ~heredoc+0242:  jsr     pc,*$_rscan

Since rscan() is very small and only used by heredoc(), I simply moved the
code of rscan() from sh.glob.c (OV6) to sh.dol.c, where heredoc() and
Dtestq() are also defined.

After that tcsh works fine with here documents
   ./tcsh
   cat >x.x <<EOF
   1
   $TERM
   $PWD
   EOF

   cat x.x
     1
     vt100-long
     /usr/src/bin/tcsh

Bottom line
   - fpsim was broken all the time
   - tcsh was broken all the time

I'll convert this into proper patches and send them to Steven, but this will
take some time because I have to tidy up my system to be in a position
again to provide proper and clean patch sets.

             With best regards,       Walter


P.S.: debugging the kernel issue was quite easy because the w11a CPU has
three essential 'built into the CPU' debug tools:
- a 'cpu monitor', which records 144 bits of processor state for the last 256
   instructions or vector fetches, see
     https://github.com/wfjm/w11/blob/master/rtl/w11a/pdp11_dmcmon.vhd
- a 'breakpoint unit', which allows setting instruction or data breakpoints
- an 'ibus monitor', which records the last 512 ibus transactions
After setting a breakpoint on the trap 004/010 handler, an inspection of the
instruction trace gave the essential information. Below is a very condensed
and annotated excerpt

  nc ....pc cprptnzvc ..dsrc ..ddst ..dres      vmaddr vmdata
#
# the "(*f) (*p++)" in tcsh, running into an illegal instruction
#
  15 145210 uu00-.... 000105 173052 000105 w  d 173052 000105 mov r0,(sp)
  25 145212 uu00-.... 173050 174434 174434 w  d 173050 145216 jsr pc,@n(r5)
  19 174434 uu00-.... 000010 173064 000010 r  i 174434 000045 ?000045?
   1 174434 uu00-.... 000012 173064 000012 r  d 000010 000045 !VFETCH 010 RIT
#
# the "mov $SIGILL.,r0" in fpsim(), which loads 160750 instead of 000004
#
  17 160744 ku00-n..c 160750 000045 160750 r  i 160746 160750 mov #n,r0
  14 160750 ku00-n..c 160752 160750 160732 r  i 160750 000770 br .-14
#
# the "sigprop[sig]" access in psignal(), which accesses 174036
# which leads to an external bus (or UNIBUS) timeout and IIT trap
#
  23 161314 ku00-.z.. 000000 147500 000000 w  d 147500 000000 mov r1,n(r5)
   9 161320 ku00-.z.. 174036 000000 000000 Ebto 174036 013066 movb n(r3),r0
   3 161320 ku00-.z.. 000006 000000 000006 r  d 000004 013066 !VFETCH 004 IIT



* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
       [not found] <mailman.884.1496866451.3779.tuhs@minnie.tuhs.org>
@ 2017-06-08 22:29 ` Johnny Billquist
  0 siblings, 0 replies; 15+ messages in thread
From: Johnny Billquist @ 2017-06-08 22:29 UTC

On 2017-06-07 22:14, "Walter F.J. Mueller"<w.f.j.mueller at retro11.de> wrote:

> Hi,
> 
> a few remarks on the feedback on the kernel panic after a 'here document' in tcsh.
> 
> To Michael Kjörling's question:
>   > I'm curious whether the same thing happens if you try that in some
>   > other shell? (Not sure how widely here documents were supported back
>   > then, but I'm asking anyway.)
> And Johnny Billquist's remark:
>   > Not sure if any of the other shells have this.
> 
> 'here documents' are available and work fine in sh and csh.
> And they are in fact used; examples:

Ah. Thanks. Too lazy to check.

> To Michael Kjörling's remark
>   > The PC value in the panic report ("pc 161324") strikes me as high
> and Johnny Billquist's remark
>   > This is in kernel mode, and that is in the I/O page.
> 
> 211bsd uses split I/D space and uses all 64 kB I space for code.

D'oh! Color me stupid. I should have thought of that.

> The top 8 kB are in fact  the overlay area, and the crash happened
> in overlay 4 (as indicated by ov 4). With a simple
> 
>     nm /unix | sort | grep " 4"
> 
> one gets
> 
>     161254 t ~psignal 4
>     162302 t ~issignal 4
> 
> so the crash is just 050 bytes after the entry point of psignal. So the
> PC address is fine and not the problem. For psignal look at
> 
>     http://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/kern_sig.c.html#s:_psignal
> 
> the crash must be one of the first lines. psignal is an internal kernel
> function, called from
> 
>     http://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/kern_sig.c.html#xref:s:_psignal
> 
> and has nothing to do with the libc function psignal
> 
>     http://www.retro11.de/ouxr/211bsd/usr/man/cat3/psignal.0.html
>     http://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/gen/psignal.c.html

The libc function would be in user mode, so that one was pretty clear.

Ok. Digging through this a little for real then.

psignal gets called with a signal from the trap handler. The actual 
signal is weird. It would appear to be 0160750, which would be -7704 if 
I'm counting right. That does not make sense as a signal.

The psignal code pulls a value based on the signal number, which is the 
line:
         prop = sigprop[sig];

which uses the signal number as an index. With a random, weird signal
number, this accesses wherever that might end up. Which is when you get
the crash.

On my system, sigprop is at address 0012172, which, with a signal of 
-7704 ends up at address 0173142, which by (un)luck happens to be in the 
middle of the diagnostics bootstrap rom space. So I don't get a Unibus 
timeout error, while you do. Probably because sigprop is at a slightly 
different address in your kernel.
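
The numbers in the two paragraphs above can be reproduced as follows. This is just a check of the 16-bit arithmetic; the helper name `to_signed16` is made up, and the sigprop address is the one quoted for my kernel, so other builds will differ.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's complement signed integer."""
    return word - 0o200000 if word & 0o100000 else word


SIG_WORD = 0o160750          # second argument of _psignal in the backtrace
SIGPROP_HERE = 0o012172      # address of sigprop in this particular kernel

sig = to_signed16(SIG_WORD)
addr = (SIGPROP_HERE + sig) & 0o177777   # address of sigprop[sig], wrapped

print(sig)          # signed value of the bogus signal number
print(oct(addr))    # where the byte access ends up
```

Both the signed value and the resulting address agree with the figures given above.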

So, the real question is how trap can be calling psignal with such a 
broken signal number.

I might dig further into that question another day. But unless you
already got this far, I might have saved you a few minutes of digging. I
did start looking into the trap code, which is in pdp/trap.c, but this
is not entirely straightforward. It goes through a bunch of things
trying to decide what signal to send, before actually calling psignal.

> To Johnny Billquist's remark
>   > Could you (Walter) try the latest version of 2.11BSD and see if you
>   > still get that crash?
> 
> very interesting that you see a core dump of tcsh rather than a kernel panic.

Indeed.

> Whatever tcsh does, it should not lead to a kernel panic, and if it does,
> it is primarily a bug of the kernel. It looks like there are two issues,
> one in tcsh, and one in the kernel. I've a hunch where this might come from,
> but that will take a weekend or two to check on.

Agree that the kernel should not crash on this.

Also, tcsh should not really crash either, but it's a separate issue, 
even though one might have triggered the other here.
But yes, there are two bugs in here.
If you can recreate the kernel crash on the latest version, that would 
be good.

But it smells like trap.c has some path where it does not even set what
signal to deliver, and then calls psignal with whatever the variable i
got at the function start. Which would be some random stuff on the stack.

	Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol


* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
  2017-06-07 20:14 Walter F.J. Mueller
@ 2017-06-08  7:54 ` Michael Kjörling
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Kjörling @ 2017-06-08  7:54 UTC

On 7 Jun 2017 22:14 +0200, from w.f.j.mueller at retro11.de (Walter F.J. Mueller):
> To Michael Kjörling's remark
>> The PC value in the panic report ("pc 161324") strikes me as high
> and Johnny Billquist's remark
>> This is in kernel mode, and that is in the I/O page.
> 
> 211bsd uses split I/D space and uses all 64 kB I space for code.
> The top 8 kB are in fact  the overlay area, and the crash happened
> in overlay 4 (as indicated by ov 4). With a simple

Note what follows in the sentence which you snipped:

> ...but 161324 octal is 58068 decimal, so it's not excessively so,
> and perhaps in line with what one might expect to see...

-- 
Michael Kjörling • https://michael.kjorling.se • michael at kjorling.se
                 “People who think they know everything really annoy
                 those of us who know we don’t.” (Bjarne Stroustrup)



* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
@ 2017-06-07 20:14 Walter F.J. Mueller
  2017-06-08  7:54 ` Michael Kjörling
  0 siblings, 1 reply; 15+ messages in thread
From: Walter F.J. Mueller @ 2017-06-07 20:14 UTC

Hi,

a few remarks on the feedback on the kernel panic after a 'here document' in tcsh.

To Michael Kjörling's question:
 > I'm curious whether the same thing happens if you try that in some
 > other shell? (Not sure how widely here documents were supported back
 > then, but I'm asking anyway.)
And Johnny Billquist's remark:
 > Not sure if any of the other shells have this.

'here documents' are available and work fine in sh and csh.
And they are in fact used; examples:

   /usr/adm/daily     (a /bin/sh script)
     su uucp << EOF
           /etc/uucp/clean.daily
     EOF

   /usr/crash/why     (a /bin/csh script)
     adb -k {unix,core}.$1 << 'EOF'
     version/sn"Backtrace:"n
     $c
     'EOF'

To Michael Kjörling's remark
 > The PC value in the panic report ("pc 161324") strikes me as high
and Johnny Billquist's remark
 > This is in kernel mode, and that is in the I/O page.

211bsd uses split I/D space and uses all 64 kB I space for code.
The top 8 kB are in fact  the overlay area, and the crash happened
in overlay 4 (as indicated by ov 4). With a simple

   nm /unix | sort | grep " 4"

one gets

   161254 t ~psignal 4
   162302 t ~issignal 4

so the crash is just 050 bytes after the entry point of psignal. So the
PC address is fine and not the problem. For psignal look at

   http://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/kern_sig.c.html#s:_psignal

the crash must be one of the first lines. psignal is an internal kernel
function, called from

   http://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/kern_sig.c.html#xref:s:_psignal

and has nothing to do with the libc function psignal

   http://www.retro11.de/ouxr/211bsd/usr/man/cat3/psignal.0.html
   http://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/gen/psignal.c.html
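
The address arithmetic behind the statements above can be checked quickly. The sketch uses only numbers from the panic report and the nm output; the constant names are made up, and the overlay window bounds simply restate "the top 8 kB" of a 64 kB I space.

```python
PC = 0o161324                # "pc 161324" from the panic report
PSIGNAL = 0o161254           # "~psignal" in overlay 4, from nm
ISSIGNAL = 0o162302          # next symbol listed for overlay 4
OVERLAY_BASE = 0o160000      # start of the top 8 kB of I space
I_SPACE_END = 0o200000       # 64 kB

offset = PC - PSIGNAL
in_overlay_window = OVERLAY_BASE <= PC < I_SPACE_END
inside_psignal = PSIGNAL <= PC < ISSIGNAL

print(PC)                    # the PC in decimal
print(oct(offset))           # distance from the entry point of psignal
print(I_SPACE_END - OVERLAY_BASE, in_overlay_window, inside_psignal)
```

So the PC lies inside the overlay window and 050 bytes into psignal, before issignal starts.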

To Johnny Billquist's remark
 > Could you (Walter) try the latest version of 2.11BSD and see if you
 > still get that crash?

very interesting that you see a core dump of tcsh rather than a kernel panic.

Whatever tcsh does, it should not lead to a kernel panic, and if it does,
it is primarily a bug of the kernel. It looks like there are two issues,
one in tcsh, and one in the kernel. I've a hunch where this might come from,
but that will take a weekend or two to check on.


		With best regards,  Walter



* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
  2017-06-05 22:08     ` Jacob Ritorto
@ 2017-06-06 11:43       ` Ron Natalie
  0 siblings, 0 replies; 15+ messages in thread
From: Ron Natalie @ 2017-06-06 11:43 UTC

Chapter 3 of the PDP-11 processor handbook.

Crudely scanned copy here: http://bitsavers.trailing-edge.com/pdf/dec/pdp11/1170/EK-KB11C-TM-001_1170procMan.pdf

From: Jacob Ritorto [mailto:jacob.ritorto@gmail.com] 
Sent: Monday, June 5, 2017 6:08 PM
To: Ron Natalie
Cc: Michael Kjörling; tuhs at minnie.tuhs.org
Subject: Re: [TUHS] 211bsd: kernel panic after a 'here document' in tcsh

 

Nice snipe, Ron!  Where might one find the list of trap_types and cpuerrs?

 

On Mon, Jun 5, 2017 at 12:33 PM, Ron Natalie <ron at ronnatalie.com> wrote:

Trap type 0 is bus error.    The two causes of this are either addressing a memory location that does not respond (as opposed to being unmapped) or a word operation on an odd address.

Since you have a cpuerr print out you have a 44/45/70 CPU.  The 020 value indicates that it was the bus timeout case.

Something touched memory that was mapped in but didn't physically exist.



 


* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
@ 2017-06-05 23:05 Noel Chiappa
  0 siblings, 0 replies; 15+ messages in thread
From: Noel Chiappa @ 2017-06-05 23:05 UTC


    > From: Jacob Ritorto

    > Where might one find the list of trap_types

Look in:

  http://minnie.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/sys/pdp/scb.s

which maps from trap vector locations (built into the hardware; consult a
PDP-11 CPU manual for details) to trap type numbers, which are defined here:

  http://minnie.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/sys/pdp/trap.h

and handled here:

  http://minnie.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/sys/pdp/trap.c


    > and cpuerrs?

That just prints the contents of the CPU Error Register; see an appropriate
PDP-11 CPU manual - 11/70, /44, /73, /83 or /84 for what all the bits mean.
Also the "KDJ11-A CPU Module User's Guide", which also documents it.

In theory, there's also a KDJ11-B UG, but it's not online. If anyone has one,
can we please get it scanned? Thanks!

    Noel



* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
  2017-06-05 16:33   ` Ron Natalie
@ 2017-06-05 22:08     ` Jacob Ritorto
  2017-06-06 11:43       ` Ron Natalie
  0 siblings, 1 reply; 15+ messages in thread
From: Jacob Ritorto @ 2017-06-05 22:08 UTC


Nice snipe, Ron!  Where might one find the list of trap_types and cpuerrs?

On Mon, Jun 5, 2017 at 12:33 PM, Ron Natalie <ron at ronnatalie.com> wrote:

> Trap type 0 is bus error.    The two causes of this are either addressing
> a memory location that does not respond (as opposed to being unmapped) or a
> word operation on an odd address.
>
> Since you have a cpuerr print out you have a 44/45/70 CPU.  The 020 value
> indicates that it was the bus timeout case.
>
> Something touched memory that was mapped in but didn't physically exist.
>
>
>

* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
  2017-06-05 16:16 ` Michael Kjörling
@ 2017-06-05 16:33   ` Ron Natalie
  2017-06-05 22:08     ` Jacob Ritorto
  0 siblings, 1 reply; 15+ messages in thread
From: Ron Natalie @ 2017-06-05 16:33 UTC


Trap type 0 is bus error.    The two causes of this are either addressing a memory location that does not respond (as opposed to being unmapped) or a word operation on an odd address.

Since you have a cpuerr print out you have a 44/45/70 CPU.  The 020 value indicates that it was the bus timeout case.

Something touched memory that was mapped in but didn't physically exist.





* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
  2017-06-05 14:12 Walter F.J. Mueller
@ 2017-06-05 16:16 ` Michael Kjörling
  2017-06-05 16:33   ` Ron Natalie
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Kjörling @ 2017-06-05 16:16 UTC

On 5 Jun 2017 16:12 +0200, from w.f.j.mueller at retro11.de (Walter F.J. Mueller):
> I'm using 211bsd (Version 447) and found that a 'here document' in tcsh
> leads to a kernel panic. It's absolutely reproducible on my system, both
> when I run it on my FPGA PDP-11 and in simh. Just doing
> 
>   tcsh
>   cat << EOF

I'm curious whether the same thing happens if you try that in some
other shell? (Not sure how widely here documents were supported back
then, but I'm asking anyway.)


> is enough, and I get
> 
>     ka6 31333 aps 147472
>     pc 161324 ps 30004
>     ov 4
>     cpuerr 20
>     trap type 0
>     panic: trap
>     syncing disks... done
> 
> looking at the crash dump gives
> 
>   cd /etc/crash
>   ./why 4
>     Backtrace:
>     0147372: _boot(05000,0100) from    ~panic+072
>     0147414: _etext(011350) from ~trap+0350
>     0147450: ~trap() from call+040
>     0147516: _psignal(0101520,0160750) from ~trap+0364
>     0147554: ~trap() from call+040
> 
> so the crash is in psignal, which is afaik the kernel internal
> mechanism to dispatch signals.

The PC value in the panic report ("pc 161324") strikes me as high, but
161324 octal is 58068 decimal, so it's not excessively so, and perhaps
in line with what one might expect to see with a kernel pinned near
top of memory. Are the offsets in the backtrace constant, i.e. does it
always crash on the same code?

Not knowing what cpuerr 20 is specifically doesn't help, and at least
http://www.retro11.de/ouxr/29bsd/usr/src/sys/sys/trap.c.html#n:112
(which doesn't seem to be too far from what you are running) isn't
terribly enlightening; CPUERR is simply a pointer into a memory-mapped
register of some kind, as seen at
http://www.retro11.de/ouxr/29bsd/usr/include/sys/iopage.h.html#m:CPUERR,
and at least pdp11_cpumod.c from the simh source code at
http://simh.trailing-edge.com/interim/pdp11_cpumod.c wasn't terribly
enlightening, though of course I could be looking in entirely the
wrong place.

-- 
Michael Kjörling • https://michael.kjorling.se • michael at kjorling.se
                 “People who think they know everything really annoy
                 those of us who know we don’t.” (Bjarne Stroustrup)



* [TUHS] 211bsd: kernel panic after a 'here document' in tcsh
@ 2017-06-05 14:12 Walter F.J. Mueller
  2017-06-05 16:16 ` Michael Kjörling
  0 siblings, 1 reply; 15+ messages in thread
From: Walter F.J. Mueller @ 2017-06-05 14:12 UTC


Hi,

I'm using 211bsd (Version 447) and found that a 'here document' in tcsh
leads to a kernel panic. It's absolutely reproducible on my system, both
when I run it on my FPGA PDP-11 and in simh. Just doing

   tcsh
   cat << EOF

is enough, and I get

     ka6 31333 aps 147472
     pc 161324 ps 30004
     ov 4
     cpuerr 20
     trap type 0
     panic: trap
     syncing disks... done

looking at the crash dump gives

   cd /etc/crash
   ./why 4
     Backtrace:
     0147372: _boot(05000,0100) from    ~panic+072
     0147414: _etext(011350) from ~trap+0350
     0147450: ~trap() from call+040
     0147516: _psignal(0101520,0160750) from ~trap+0364
     0147554: ~trap() from call+040

so the crash is in psignal, which is afaik the kernel internal
mechanism to dispatch signals.

Questions:
   1. has anybody seen this before ?
   2. any idea what the reason could be ?


		With best regards, 	Walter



end of thread, other threads:[~2017-06-25 16:25 UTC | newest]

Thread overview: 15+ messages
     [not found] <mailman.1.1496714401.14870.tuhs@minnie.tuhs.org>
2017-06-06 19:15 ` [TUHS] 211bsd: kernel panic after a 'here document' in tcsh Johnny Billquist
2017-06-25 16:25 Walter F.J. Mueller
     [not found] <mailman.1.1497146402.26080.tuhs@minnie.tuhs.org>
2017-06-11 10:25 ` Johnny Billquist
  -- strict thread matches above, loose matches on Subject: below --
2017-06-10 14:24 Noel Chiappa
2017-06-12 15:26 ` Clem Cole
2017-06-10 12:58 Walter F.J. Mueller
     [not found] <mailman.884.1496866451.3779.tuhs@minnie.tuhs.org>
2017-06-08 22:29 ` Johnny Billquist
2017-06-07 20:14 Walter F.J. Mueller
2017-06-08  7:54 ` Michael Kjörling
2017-06-05 23:05 Noel Chiappa
2017-06-05 14:12 Walter F.J. Mueller
2017-06-05 16:16 ` Michael Kjörling
2017-06-05 16:33   ` Ron Natalie
2017-06-05 22:08     ` Jacob Ritorto
2017-06-06 11:43       ` Ron Natalie
