The Unix Heritage Society mailing list
From: paul.winalski@gmail.com (Paul Winalski)
Subject: [TUHS] long lived programs
Date: Sat, 7 Apr 2018 16:41:28 -0400
Message-ID: <CABH=_VRr05LFd0xMHiqrHBT-u+3LP2uY+FvM79PqK0Hm-uRtkQ@mail.gmail.com>
In-Reply-To: <1522962186.9871.for-standards-violators@oclsc.org>

On 4/5/18, Norman Wilson <norman at oclsc.org> wrote:

[regarding streams implementation of pipes]
>
> But the System V folks were very nervous about it anyway, and
> wrote a planning document in which they proposed to create a
> new, different system call to make stream pipes.  pipe(2) would
> make an old-fashioned pipe; spipe(2) (or whatever it was called,
> I forget the name) had to be called to get a stream.  The document
> didn't really explain the justification for this.  To us in
> Research it just sounded crazy.

Sometimes critical code can have unintended dependencies on buggy or
undocumented behavior of system features.  I ran into something of
this sort in the Unix C runtime when I did the linker for VAX Fortran
for Ultrix.  The VAX Fortran runtime was written in several source
languages, and there was no common back end for the compilers for
these languages, so we decided that the easiest way to get the whole
mess ported to Ultrix was to port the VAX/VMS linker to Ultrix and to
teach it to understand a.out files and ar archives.  To prevent any
copyright or other IP problems, we did the project without reference
to the Unix sources or anything other than publicly published
documents.

All went well until we got to testing, where we got a curious test
failure.  The cause of the failure was the allocation of the C RTL's
iob structure, which is an array that holds the file descriptors
associated with stdin, stdout, and stderr.  The program was dying
because it was accessing iob[2] (stderr), but my Ultrix linker had
only allocated 8 bytes, not 12, for the iob array.  ld, on the other
hand, allocated 12 bytes.  All of the objects that participated in the
link had iob declared as an 8-byte common symbol, so I couldn't for
the life of me understand why ld allocated 12 bytes for it.

In desperation I looked at the source code for ld.  a.out common
symbols have the "external" bit specified, but unlike global reference
symbols they have a non-zero value field.  If there is a global
definition for the name, the common symbol is resolved against that
(i.e., it behaves like a global reference).  If not, the linker
allocates space in bss for the symbol, using the value field as the
number of bytes to allocate.  If common symbols of the same name from
different object files have different sizes, the linker allocates the
largest size.  The other significant feature of common symbols is that
if an archive member contains a common symbol that resolves a global
reference, that isn't enough to cause the archive member to be loaded,
as would be the case with a global definition.
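
The resolution rules above can be sketched in a few lines of Python.
This is only an illustrative model, not ld's code: the Symbol class,
its field names, and the resolve function are hypothetical, and a real
a.out symbol table entry carries more information than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    external: bool  # the "external" bit
    defined: bool   # True for a global definition (hypothetical field)
    value: int      # for a common symbol: its size in bytes


def is_common(sym):
    # External, not a definition, and a non-zero value field.
    return sym.external and not sym.defined and sym.value != 0


def resolve(symbols):
    """Map each external name to ("def", 0), ("bss", size) or ("undef", 0)."""
    result = {}
    for sym in symbols:
        if not sym.external:
            continue
        prev = result.get(sym.name, ("undef", 0))
        if sym.defined:
            # A global definition wins over any common symbol.
            result[sym.name] = ("def", 0)
        elif is_common(sym) and prev[0] != "def":
            # No definition so far: reserve bss, keeping the largest size.
            result[sym.name] = ("bss", max(prev[1], sym.value))
        else:
            result.setdefault(sym.name, prev)
    return result
```

With this model, two common symbols named iob of sizes 8 and 12 resolve
to a single 12-byte bss allocation, while a common symbol that meets a
global definition of the same name simply resolves to that definition.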

The root cause of my problem was a feature of the ranlib program.
When ranlib built the archive index of global symbols, it merely
looked at the "external" bit--it indexed common symbols as well as
global definitions.  So if a linker sees a name it's looking for in
the ranlib index, it has to actually process the module's symbol table
to make sure that it is a "hard" definition and not a common symbol.
My ported VMS linker was very careful to do a pre-scan of each module
before loading so as to prevent common symbols causing a load.  ld
took a different approach--it loaded the module and then processed its
symbol table.  If it found that a common symbol had provoked the load,
it said "oops" and unloaded the module.  But by then it had already
maximized the sizes of any common symbols that were in the module--the
new sizes didn't get backed out.  So for common symbols ld would
allocate not the largest size for the symbol in any module
participating in the link, but the largest size IN ANY MODULE THAT LD
SAW while doing the link!

It turned out that there was a module in the Unix C runtime that
declared iob as a two-element array, but the code accessed iob[2].  They got
away with this bug because ld always saw modules where iob was a
three-element array when it processed libc.a and thus always allocated
12 bytes for it.  My linker processed only the symbols in the modules
that actually were brought in from libc.a, and hence it ended up
allocating only 8 bytes.
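
The difference between the two linkers can be reproduced with a small
Python simulation.  It is a sketch under simplifying assumptions, not
either linker's real algorithm: the link function, the prescan flag,
and the archive contents (one module defining printf, one defining
fopen) are made up for illustration.  Only the sizes 8 and 12 come
from the story above.

```python
def link(program_commons, wanted, archive, prescan):
    """Simulate one pass over an archive.

    program_commons: {name: size} from the objects already in the link
    wanted:          set of names the linker is still looking for
    archive:         list of (defs, commons) pairs, one per module,
                     where defs is a set and commons is {name: size}
    prescan:         True  = check the symbol table before loading
                     False = load first, unload if only a common matched
    """
    commons = dict(program_commons)
    loaded = []
    for pos, (defs, mod_commons) in enumerate(archive):
        # The ranlib-style index lists commons as well as definitions.
        if not wanted & (defs | set(mod_commons)):
            continue
        hard = bool(wanted & defs)
        if prescan and not hard:
            continue  # only a common symbol matched: never load
        # Loading merges common sizes, keeping the largest seen.
        for name, size in mod_commons.items():
            commons[name] = max(commons.get(name, 0), size)
        if hard:
            loaded.append(pos)
        # Otherwise: "oops", unload, but the sizes are not backed out.
    return commons, loaded


# Hypothetical libc.a: module 0 is pulled in and declares iob as 8
# bytes; module 1 is never needed but declares iob as 12 bytes.
libc = [
    ({"printf"}, {"iob": 8}),
    ({"fopen"}, {"iob": 12}),
]
wanted = {"printf", "iob"}

print(link({"iob": 8}, wanted, libc, prescan=True))   # ({'iob': 8}, [0])
print(link({"iob": 8}, wanted, libc, prescan=False))  # ({'iob': 12}, [0])
```

Both runs load the same single module; only the size left behind for
iob differs, which is the undocumented side effect the runtime was
relying on.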

One of Murphy's laws of programming is that if a facility has
undocumented side effects, there will be an important program that
depends on them.  Hence the reluctance of many software engineers to
make radical changes to how a feature is implemented.

-Paul W.

