[TUHS] SYSTEM V R1 HELP

The Unix Heritage Society mailing list

* [TUHS] SYSTEM V R1 HELP
@ 2017-12-21 22:12 William Corcoran
  2017-12-21 22:30 ` Clem Cole
  0 siblings, 1 reply; 15+ messages in thread
From: William Corcoran @ 2017-12-21 22:12 UTC (permalink / raw)


Hello Team TUHS: 

I am having a problem with my PDP-11 SVR1 running under a recent SIMH build.  The problem occurs on both Mac OS X and FreeBSD.  

First, I created a six disk (RP06) and eight port TTY (DZ) kernel, with swap placed on drive 1.  The system behaves beautifully as FSCK reports clean.  Eight users can login with no problem.  

Second, I reverted to a pristine PDP-11 SVR1 with one drive (RP06) and no DZ and booted the default kernel (gdtm) and I see the same problem described below.

Third, when using the tape driver instead of /dev/null I get the same results.   

Next, here is the issue: 

cd / 
find . -print | cpio -ocvB > /dev/null 

It runs for a short while and then shitz a core: 
I am using /dev/null to take the tape driver out of the equation.  

Here is the backtrace for cpio:  

$c
__strout(053522,043,0,053012)
__doprnt(046652,0177606,053012)
_fprintf(053012,046652,053522)
~main(02,0177636)


Now, interestingly,  I run into a similar issue when using tar: 

cd  /usr
tar -cvf /dev/null . 

Again, this will run for a while, then drops a core.  Here is the backtrace for tar:  

$c
__strout(043123,02,0,045506)
__doprnt(043123,0167472,045506)
_fprintf(045506,043123,0170600)
~putfile(0170600,0170641)
~putfile(0171654,0171704)
~putfile(0172730,0172745)
~putfile(0174004,0174016)
~putfile(0175060,0175066)
~putfile(0176134,0176136)
~putfile(0177672,0177672)
~dorep(0177632)
~main(04,0177630)

This is really bugging me since my SVR1 is otherwise working flawlessly.  I was able to remake the entire system and custom kernels that boot with no problem.   
Also, I configured my main port to run inside the AWS Lightsail and now I have access to SVR1 from anywhere in the world!

I was also wondering if doing a CPIO or TAR on the entire system was overflowing some link tables, and whether this is expected behavior given the minimal resources of the PDP-11?

Thank you for any help.   

Would you expect tar or cpio to dump core if you attempted to copy large filesystems  (or the entire system) on a PDP-11? 
Note: All of my testing has been in single user mode.   


Truly, 

Bill Corcoran  











^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-21 22:12 [TUHS] SYSTEM V R1 HELP William Corcoran
@ 2017-12-21 22:30 ` Clem Cole
  2017-12-22  0:34   ` William Corcoran
  0 siblings, 1 reply; 15+ messages in thread
From: Clem Cole @ 2017-12-21 22:30 UTC (permalink / raw)


Bill, in the debugger, for both cpio and tar, follow the string ptrs in
fprintf and see if you can figure out where it's dying.  Also see what value
errno is, hopefully it has not been lost.  Neither program should be using
printf except for messages to the console, so I'm guessing this is an error
message trying to be output from an error the kernel returned.   See if you
can find the message from cpio/tar and the error code and that might give
you a hint to look in the kernel.

Could you be running out of open files, maybe.

Another thing to try is:

cd /
find . -print > /tmp/file.lst
cpio -ocvB > /dev/null < /tmp/file.lst

See if that changes anything.   It should remove some of the pressure on the
kernel tables.

Clem




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-21 22:30 ` Clem Cole
@ 2017-12-22  0:34   ` William Corcoran
  2017-12-22  1:51     ` William Corcoran
  2017-12-22  1:55     ` Random832
  0 siblings, 2 replies; 15+ messages in thread
From: William Corcoran @ 2017-12-22  0:34 UTC (permalink / raw)


Hello Clem,  

Well, I have some interesting results.  

First, your comments about fprintf made me think about turning off the output.   So: 

cd /
find . -print | cpio -ocB > /dev/null  

Works perfectly, the return code is a nice clean 0—No more core dump!  

However, when I enable the verbose option to cpio, it dumps core.  

Next, it dumps core using your example below.  However, if I remove the verbose option, cpio completes without error!!! 

Next, I got a super weird result when I try to print the error code:  

find .  -print | cpio -ocvB > /dev/null 

      files are displayed…. 
      Memory Fault: core dumped   

echo $?

1003

Now, I thought all return values were 8 bit?    What is that all about?  

I just wanted to pass along the update.  I will see if I can follow the string pointers in fprintf.  

Truly, 

Bill Corcoran   




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  0:34   ` William Corcoran
@ 2017-12-22  1:51     ` William Corcoran
  2017-12-22  1:54       ` William Corcoran
                         ` (2 more replies)
  2017-12-22  1:55     ` Random832
  1 sibling, 3 replies; 15+ messages in thread
From: William Corcoran @ 2017-12-22  1:51 UTC (permalink / raw)


Okay, I think I am on to something… 

Whenever tar or cpio dumps core, it is always when 32769 bytes have been written to stdout.  
I looked at fprintf and there is a register int called “count.” 

It looks like when it overflows it's clobbering a structure and then the kernel gets mad and kills the process.  

Is it possible this is a latent defect in SVR1 or is there something going on with SIMH?  

I can’t believe this is a latent defect.  

Thank you so much for all of your help TEAM TUHS!  

Much thanks to Clem’s help! 

Bill Corcoran  



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  1:51     ` William Corcoran
@ 2017-12-22  1:54       ` William Corcoran
  2017-12-22  2:55       ` Random832
  2017-12-22 15:04       ` Clem Cole
  2 siblings, 0 replies; 15+ messages in thread
From: William Corcoran @ 2017-12-22  1:54 UTC (permalink / raw)


Strike that... this is UNIX.   Please forgive my offensive post.   I should have said when count overflows it attempts to clobber the structure and the kernel gets mad and kills the process.   Please forgive my transgression.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  0:34   ` William Corcoran
  2017-12-22  1:51     ` William Corcoran
@ 2017-12-22  1:55     ` Random832
  1 sibling, 0 replies; 15+ messages in thread
From: Random832 @ 2017-12-22  1:55 UTC (permalink / raw)

On Thu, Dec 21, 2017, at 19:34, William Corcoran wrote:
> echo $?
> 
> 1003
> 
> Now, I thought all return values were 8 bit?    What is that all about?

In the Bourne shell, when a process exits on a signal, $? is set to sig|SIGFLG, where SIGFLG is, for some unknown reason, defined as 1000 (yes, decimal, as in 0x3E8; so 1000|11 == 1000|3 == 1003). No other version before or since SVR1 that I checked (at least not SysIII or SVR4) seems to have that value; every other one has 0200.

I can only guess that the reason was to make the signal number easier to read, and that whoever did it tested only with SIGINT and SIGQUIT; otherwise they would presumably have changed it to sig+SIGFLG. Then at some point before a later SysV release, it was reverted to 0200.

POSIX for its part only guarantees a value above 128, not any particular value or distinct values per signal.
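
The arithmetic above can be checked with a small C sketch. This is only an illustration, not the shell's actual source: the macro and function names are made up for the example, and only the two flag values (decimal 1000 and octal 0200) and the signal numbers 2, 3 and 11 come from the discussion.

```c
#include <assert.h>

/* Flag values quoted in the message; these macro names are illustrative only. */
#define SIGFLG_SVR1   1000   /* decimal 1000 == 0x3E8, as reported for SVR1 */
#define SIGFLG_OTHER  0200   /* octal 0200 == 128, as in the other versions */

/* Signal numbers used in the discussion. */
enum { SIG_INT = 2, SIG_QUIT = 3, SIG_SEGV = 11 };

/* Hypothetical helper: the value $? would take for a process killed by
 * signal `sig` when the shell ORs in the flag value `sigflg`. */
int status_for_signal(int sig, int sigflg)
{
    assert(sig > 0);
    return sig | sigflg;
}
```

With these definitions, status_for_signal(SIG_INT, SIGFLG_SVR1) is 1002 and status_for_signal(SIG_QUIT, SIGFLG_SVR1) is 1003, which look like 1000+sig. Signal 11 also yields 1003, because bit 3 is already set in 1000, so the OR cannot distinguish it from signal 3. With the 0200 flag, signal 11 gives 139.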

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  1:51     ` William Corcoran
  2017-12-22  1:54       ` William Corcoran
@ 2017-12-22  2:55       ` Random832
  2017-12-22  3:47         ` William Corcoran
  2017-12-22 15:04       ` Clem Cole
  2 siblings, 1 reply; 15+ messages in thread
From: Random832 @ 2017-12-22  2:55 UTC (permalink / raw)

On Thu, Dec 21, 2017, at 20:51, William Corcoran wrote:
> Okay, I think I am on to something… 
> 
> Whenever tar or cpio dumps core, it is always when 32769 bytes have been 
> written to stdout.  

Is that 32769 bytes to the tape (which is not stdout in your tar invocation), or 32769 bytes of printed output? Is it exactly 32769, or just some value above 32768 by some small amount?

> I looked at fprintf and there is a register int called “count.” 

Your crash is in _strout, two calls deep, before that variable is written. Since the adjust parameter is 0, the only thing of note in _strout is a putc loop. putc is a macro, so it won't appear in the stack trace; that _flsbuf does not appear in the stack trace means this is happening in the 'ordinary' buffered I/O case.

Strange, though: I can't see anything inside putc/_flsbuf that should behave any differently on the 32770th character than on the 514th or 1026th.

What are the strings being printed? (second argument to printf, and first argument to strout)?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  2:55       ` Random832
@ 2017-12-22  3:47         ` William Corcoran
  2017-12-22  9:09           ` Random832
  0 siblings, 1 reply; 15+ messages in thread
From: William Corcoran @ 2017-12-22  3:47 UTC (permalink / raw)

Hello Random832,  

Yes, I made a mistake; the issue is with STDERR.  

The pipeline: 

find . -print | cpio -ocvB >/dev/null 2>/tmp/ERRFILE

will drop core: 

ERRFILE will have exactly 32769 bytes (no more and no less). 

STDOUT is redirected to /dev/null and there is no problem with STDOUT.  

For example: 

find . -print | cpio -ocB > /dev/null 

Succeeds because nothing (or close to nothing) is written to STDERR before cpio successfully terminates.   

STDOUT is fine and CPIO and tar can write thousands of blocks cleanly on STDOUT.  Normally, as you know, STDOUT would be redirected to the tape device and not /dev/null.  

Here is my take: 

This may be a latent defect: early releases may not have anticipated the very large amounts of data that can be written to STDERR.  

In this case, each file that is backed up with cpio is represented as a single line in STDERR.  When STDERR exceeds 32768 bytes, it blows up.  

fprintf returns an int that is the number of bytes in the stream.  For some reason, when this overflows, it does not wrap around; instead it causes a segmentation violation.  

Could the issue be with fprintf or doprnt.c?  

I thought STDERR was unbuffered?   Please forgive me.  

Thank you for all of your help.  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  3:47         ` William Corcoran
@ 2017-12-22  9:09           ` Random832
  2017-12-22 12:42             ` William Corcoran
                               ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Random832 @ 2017-12-22  9:09 UTC (permalink / raw)

On Thu, Dec 21, 2017, at 22:47, William Corcoran wrote:
> Could the issue be with fprintf or doprnt.c?  
> 
> I thought STDERR was unbuffered?   Please forgive me.  

I've got it. The problem is with putc (actually _flsbuf), and it is precisely *because* stderr is unbuffered.

#define putc(x, p)	(--(p)->_cnt >= 0 ? \
			((int) (*(p)->_ptr++ = (unsigned char) (x))) : \
			_flsbuf((unsigned char) (x), (p)))

Under normal circumstances for an unbuffered (or line-buffered) stream, _cnt starts as 0, and therefore every character is passed through _flsbuf.

However, _cnt is still decremented (becoming -1, -2, etc) - _flsbuf should be resetting this to zero (and does in other versions, both earlier and later), but we can see in flsbuf.c it does not. So, after 32769 characters have been output it ticks from -32768 to +32767, and putc now thinks the stream is buffered.

You'll want to add iop->_cnt = 0 to the second if clause in _flsbuf.
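
The wraparound can be reproduced on a modern machine with a small simulation. This is a sketch under stated assumptions, not the SVR1 stdio source: the names dec16 and first_misrouted_call are hypothetical, the counter is modelled as a 16-bit two's-complement value standing in for a PDP-11 int, and the only logic taken from the macro is the `--cnt >= 0` test.

```c
#include <assert.h>
#include <stdint.h>

/* Decrement a 16-bit two's-complement counter with explicit wraparound.
 * The arithmetic is done on unsigned values so that nothing here depends
 * on signed overflow or implementation-defined narrowing. */
int16_t dec16(int16_t v)
{
    uint16_t u = (uint16_t)((uint16_t)v - 1u);
    return (u >= 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Simulate the putc test `--cnt >= 0` on a stream whose count starts at 0.
 * Returns the 1-based call on which the test first succeeds, i.e. the first
 * character that would be stored through the buffer pointer instead of being
 * handed to the flush routine, or -1 if that never happens within `limit`
 * calls. A nonzero `reset_in_flush` models a flush routine that sets the
 * count back to 0 each time it is called. */
long first_misrouted_call(int reset_in_flush, long limit)
{
    int16_t cnt = 0;
    long n;

    assert(limit > 0);
    for (n = 1; n <= limit; n++) {
        cnt = dec16(cnt);
        if (cnt >= 0)
            return n;        /* putc would take the "buffered" branch here */
        if (reset_in_flush)
            cnt = 0;         /* the one-line fix */
    }
    return -1;
}
```

In this model, first_misrouted_call(0, 100000L) returns 32769: the count goes from -32768 to +32767 on that call. With the reset enabled, first_misrouted_call(1, 100000L) returns -1, since the count never gets below -1.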

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  9:09           ` Random832
@ 2017-12-22 12:42             ` William Corcoran
  2017-12-22 23:48             ` William Corcoran
  2017-12-23  0:19             ` Dave Horsfall
  2 siblings, 0 replies; 15+ messages in thread
From: William Corcoran @ 2017-12-22 12:42 UTC (permalink / raw)


Brilliant Random832!

Very impressive!

Thank you for the early Christmas present!   Your initial analysis was equally spot on when you interpreted the backtrace.

This is very much appreciated!

Truly,

Bill Corcoran



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  1:51     ` William Corcoran
  2017-12-22  1:54       ` William Corcoran
  2017-12-22  2:55       ` Random832
@ 2017-12-22 15:04       ` Clem Cole
  2 siblings, 0 replies; 15+ messages in thread
From: Clem Cole @ 2017-12-22 15:04 UTC (permalink / raw)

On Thu, Dec 21, 2017 at 8:51 PM, William Corcoran <wlc at jctaylor.com> wrote:

>
>
> I can’t believe this is a latent defect.
>
Oh, I can... I suspect SVR1 was not run on 16-bit machines that much.  By
the time SVR1 came on the scene, the VAX and 68K were the primary UNIX
systems.  AT&T was pushing the 3B but, except for the Telcos, not having
much luck.

I'd look at the C runtime library.  I bet there is an overflow.  IIRC, the
BSD (and Research) compilers used a different buffering scheme than the
Summit folks did - Steve may remember the argument (I only remember because
I ran into that skirmish a few years earlier, when my thesis work was
causing a strange error in the BSD runtime - I found and fixed it and
mentioned it to Dennis, who had the same problem in the V8 compiler at that
time).

The point is that the 'standard' systems in Summit by this time were likely
to have been 3Bs and Vaxen (i.e. 32-bit), and if something in the runtime
was assuming a 32-bit int and it ran on a PDP-11, it could easily have gone
untested.
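
One way to relate this to the _cnt wraparound described elsewhere in the thread is to count how many decrements a signed counter needs before it wraps, as a function of the width of int. The sketch below is only arithmetic on two's-complement widths; the function name is hypothetical, and it says nothing about how any particular runtime was actually tested.

```c
#include <assert.h>
#include <stdint.h>

/* Number of decrements that take a two's-complement counter of `bits` bits
 * from 0 to a non-negative value again: 2^(bits-1) steps reach the most
 * negative value, and one more step wraps around to the most positive. */
uint64_t decrements_until_wrap(unsigned bits)
{
    assert(bits >= 2 && bits <= 63);
    return ((uint64_t)1 << (bits - 1)) + 1;
}
```

For a 16-bit int this gives 32769, a volume of output that is easy to reach. For a 32-bit int it gives 2147483649, over two billion characters on a single unbuffered stream, which would plausibly go unnoticed in ordinary use.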

Clem

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] SYSTEM V R1 HELP
  2017-12-22  9:09           ` Random832
  2017-12-22 12:42             ` William Corcoran
@ 2017-12-22 23:48             ` William Corcoran
  2017-12-23  0:19             ` Dave Horsfall
  2 siblings, 0 replies; 15+ messages in thread
From: William Corcoran @ 2017-12-22 23:48 UTC (permalink / raw)

Hello Team TUHS: 

This note confirms that Random832’s System V Release 1 fix for CPIO and TAR core dump works PERFECTLY!!!

			iop->_cnt=0;

Truly, 

Bill Corcoran   

On Dec 22, 2017, at 4:09 AM, Random832 <random832 at fastmail.com> wrote:

On Thu, Dec 21, 2017, at 22:47, William Corcoran wrote:
> Could the issue be with fprintf or doprnt.c?  
> 
> I thought STDERR was unbuffered?   Please forgive me.  

I've got it. The problem is with putc (actually _flsbuf), and it is precisely *because* stderr is unbuffered.

#define putc(x, p)	(--(p)->_cnt >= 0 ? \
			((int) (*(p)->_ptr++ = (unsigned char) (x))) : \
			_flsbuf((unsigned char) (x), (p)))

Under normal circumstances for an unbuffered (or line-buffered) stream, _cnt starts as 0, and therefore every character is passed through _flsbuf.

However, _cnt is still decremented (becoming -1, -2, etc) - _flsbuf should be resetting this to zero (and does in other versions, both earlier and later), but we can see in flsbuf.c it does not. So, on the 32769th character output, the 16-bit _cnt wraps from -32768 to +32767, and putc now thinks the stream is buffered and stores through _ptr instead of calling _flsbuf.

You'll want to add iop->_cnt = 0 to the second if clause in _flsbuf.


* [TUHS] SYSTEM V R1 HELP
  2017-12-22  9:09           ` Random832
  2017-12-22 12:42             ` William Corcoran
  2017-12-22 23:48             ` William Corcoran
@ 2017-12-23  0:19             ` Dave Horsfall
  2017-12-23  0:21               ` Larry McVoy
  2 siblings, 1 reply; 15+ messages in thread
From: Dave Horsfall @ 2017-12-23  0:19 UTC (permalink / raw)


On Fri, 22 Dec 2017, Random832 wrote:

> I've got it. The problem is with putc (actually _flsbuf), and it is 
> precisely *because* stderr is unbuffered.
>
> #define putc(x, p)	(--(p)->_cnt >= 0 ? \
> 			((int) (*(p)->_ptr++ = (unsigned char) (x))) : \
> 			_flsbuf((unsigned char) (x), (p)))

[...]

That, sir, is one brilliant piece of analysis; well done!  Of course, in 
hindsight it's bleedin' obvious :-)

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."



* [TUHS] SYSTEM V R1 HELP
  2017-12-23  0:19             ` Dave Horsfall
@ 2017-12-23  0:21               ` Larry McVoy
  2017-12-23 12:26                 ` Clem cole
  0 siblings, 1 reply; 15+ messages in thread
From: Larry McVoy @ 2017-12-23  0:21 UTC (permalink / raw)


On Sat, Dec 23, 2017 at 11:19:30AM +1100, Dave Horsfall wrote:
> On Fri, 22 Dec 2017, Random832 wrote:
> 
> >I've got it. The problem is with putc (actually _flsbuf), and it is
> >precisely *because* stderr is unbuffered.
> >
> >#define putc(x, p)	(--(p)->_cnt >= 0 ? \
> >			((int) (*(p)->_ptr++ = (unsigned char) (x))) : \
> >			_flsbuf((unsigned char) (x), (p)))
> 
> [...]
> 
> That, sir, is one brilliant piece of analysis; well done!  Of course, in
> hindsight it's bleedin' obvious :-)

I'm curious as to which release of System V fixed this.  The SVR4 was the
release that gained (some) traction, was it busted until then?



* [TUHS] SYSTEM V R1 HELP
  2017-12-23  0:21               ` Larry McVoy
@ 2017-12-23 12:26                 ` Clem cole
  0 siblings, 0 replies; 15+ messages in thread
From: Clem cole @ 2017-12-23 12:26 UTC (permalink / raw)


V1.  

Sent from my PDP-7 Running UNIX V0 expect things to be almost but not quite. 

> On Dec 22, 2017, at 7:21 PM, Larry McVoy <lm at mcvoy.com> wrote:
> 
>> On Sat, Dec 23, 2017 at 11:19:30AM +1100, Dave Horsfall wrote:
>>> On Fri, 22 Dec 2017, Random832 wrote:
>>> 
>>> I've got it. The problem is with putc (actually _flsbuf), and it is
>>> precisely *because* stderr is unbuffered.
>>> 
>>> #define putc(x, p)    (--(p)->_cnt >= 0 ? \
>>>            ((int) (*(p)->_ptr++ = (unsigned char) (x))) : \
>>>            _flsbuf((unsigned char) (x), (p)))
>> 
>> [...]
>> 
>> That, sir, is one brilliant piece of analysis; well done!  Of course, in
>> hindsight it's bleedin' obvious :-)
> 
> I'm curious as to which release of System V fixed this.  The SVR4 was the
> release that gained (some) traction, was it busted until then?



Thread overview: 15+ messages
2017-12-21 22:12 [TUHS] SYSTEM V R1 HELP William Corcoran
2017-12-21 22:30 ` Clem Cole
2017-12-22  0:34   ` William Corcoran
2017-12-22  1:51     ` William Corcoran
2017-12-22  1:54       ` William Corcoran
2017-12-22  2:55       ` Random832
2017-12-22  3:47         ` William Corcoran
2017-12-22  9:09           ` Random832
2017-12-22 12:42             ` William Corcoran
2017-12-22 23:48             ` William Corcoran
2017-12-23  0:19             ` Dave Horsfall
2017-12-23  0:21               ` Larry McVoy
2017-12-23 12:26                 ` Clem cole
2017-12-22 15:04       ` Clem Cole
2017-12-22  1:55     ` Random832
