The Unix Heritage Society mailing list
* [TUHS] Question about early C behavior.
@ 2020-01-10 19:07 Dan Cross
  2020-01-10 20:24 ` Paul Winalski
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Dan Cross @ 2020-01-10 19:07 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

This question comes from a colleague, who works on compilers.

Given the definition `int x;` (without an initializer) in a source file the
corresponding object contains `x` in a "common" section. What this means is
that, at link time, if some object file explicitly allocates an 'x' (e.g.,
by specifying an initializer, so that 'x' appears in the data section for
that object file), use that; otherwise, allocate space for it at link time,
possibly in the BSS. If several source files contain such a declaration,
the linker allocates exactly one 'x' (or whatever identifier) as
appropriate. We've verified that this behavior was present as early as 6th
edition.
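
A minimal single-file sketch of the declarations in question, assuming a
hosted C compiler; the helper name `read_x` is made up for illustration.
Within one source file, C allows an uninitialized file-scope declaration
to be repeated (each is a "tentative definition"), and the object starts
out zero if no initializer ever appears. The cross-file merging described
above is the linker's doing and is only noted in comments here.

```c
/* Sketch only: `read_x` is a hypothetical helper, not from the thread. */

int x;   /* no initializer: a tentative definition */
int x;   /* repeating it in the same file is still legal C */

/*
 * If another source file also said `int x;`, a toolchain using the
 * common model would merge the two into one object at link time.
 * If another file said `int x = 1;`, that initialized definition
 * would be the one used.
 */

int read_x(void)
{
    return x;   /* zero unless some definition of x supplies an initializer */
}
```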

The question is, what is the origin of this concept and nomenclature?
FORTRAN, of course, has "common blocks": was that an inspiration for the
name? Where did the idea for the implicit behavior come from (FORTRAN
common blocks are explicit).

My colleague was particularly surprised that this seemed required: even at
this early stage, the `extern` keyword was present, so why bother with this
behavior? Why not, instead, make it a link-time error? Please note that if
two source files have initializers for these variables, then one gets a
multiple-definition link error. The 1988 ANSI standard made this an error
(or at least undefined behavior) but the functionality persists; GCC is
changing its default to prohibit it (my colleague works on clang).

Doug? Ken? Steve?

        - Dan C.

* Re: [TUHS] Question about early C behavior.
  2020-01-10 19:07 [TUHS] Question about early C behavior Dan Cross
@ 2020-01-10 20:24 ` Paul Winalski
  2020-01-10 20:55 ` Derek Fawcus
  2020-01-10 21:05 ` Clem Cole
  2 siblings, 0 replies; 5+ messages in thread
From: Paul Winalski @ 2020-01-10 20:24 UTC (permalink / raw)
  To: Dan Cross; +Cc: The Eunuchs Hysterical Society

On 1/10/20, Dan Cross <crossd@gmail.com> wrote:
>
> Given the definition `int x;` (without an initializer) in a source file the
> corresponding object contains `x` in a "common" section. What this means is
> that, at link time, if some object file explicitly allocates an 'x' (e.g.,
> by specifying an initializer, so that 'x' appears in the data section for
> that object file), use that; otherwise, allocate space for it at link time,
> possibly in the BSS. If several source files contain such a declaration,
> the linker allocates exactly one 'x' (or whatever identifier) as
> appropriate. We've verified that this behavior was present as early as 6th
> edition.

I think the situation you describe (common sections) is how this is
done in ELF.  a.out and COFF, as used on Unix, don't have common
sections.  Instead 'int x;' (without an initializer) becomes symbol
'x' in the object file's symbol table, with both the "external" and
"undefined" attribute bits set, and with the symbol's value being the
size of 'x' (typically 4 bytes, in your example).  It is the non-zero
symbol value that distinguishes common symbols from ordinary external
references, e.g., 'extern int x;' (without an initializer).
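
The distinction described above can be sketched as a predicate over a
symbol-table entry. The struct and flag names below are hypothetical and
do not reproduce the actual a.out or COFF layouts; only the rule itself
(external, undefined, non-zero value) comes from the text.

```c
#include <stdbool.h>

/* Hypothetical flag bits and entry layout, for illustration only. */
enum { FLAG_EXTERNAL = 0x1, FLAG_UNDEFINED = 0x2 };

struct sym_entry {
    const char   *name;
    unsigned      flags;   /* FLAG_* bits */
    unsigned long value;   /* for a common symbol: its size in bytes */
};

/* True for an external, undefined symbol, common or not. */
bool is_undefined_external(const struct sym_entry *s)
{
    return (s->flags & FLAG_EXTERNAL) && (s->flags & FLAG_UNDEFINED);
}

/* `int x;` : external + undefined + non-zero value (the size). */
bool is_common(const struct sym_entry *s)
{
    return is_undefined_external(s) && s->value != 0;
}

/* `extern int x;` : external + undefined + zero value. */
bool is_plain_reference(const struct sym_entry *s)
{
    return is_undefined_external(s) && s->value == 0;
}
```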

At link time, common symbols are handled differently from ordinary
external references:

[1] When the linker is searching libraries, an ordinary external
reference to 'x' will cause the linker to load an object that contains
an external definition for 'x'.  Common symbols do not trigger the
loading of an object from a library.

[2] After the linker has processed all of the files and libraries on
the command line, if there is an external definition for 'x', all
common symbol references to 'x' are treated as ordinary external
references to 'x' and resolved against the definition.  If no external
definition is found, the linker allocates 'x' in BSS, using the
maximum allocation size seen in any common symbol references to 'x'.
All common symbol references and ordinary external references to 'x'
are resolved to the newly-allocated space.
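
Step [2] can be sketched roughly as follows. This is a toy model under
stated assumptions, not code from any real linker: the type and function
names are invented, each record stands for one object file's view of the
same symbol name, and duplicate definitions are not diagnosed.

```c
#include <stddef.h>

/* Hypothetical classification of one object file's entry for a name. */
enum sym_kind { KIND_REFERENCE, KIND_COMMON, KIND_DEFINITION };

struct sym_record {
    enum sym_kind kind;
    size_t        size;   /* meaningful for KIND_COMMON only */
};

enum outcome { UNRESOLVED, RESOLVE_TO_DEFINITION, ALLOCATE_IN_BSS };

struct resolution {
    enum outcome outcome;
    size_t       bss_size;   /* non-zero only for ALLOCATE_IN_BSS */
};

/* Decide what happens to one name after all inputs have been read. */
struct resolution resolve_symbol(const struct sym_record *recs, size_t n)
{
    struct resolution r = { UNRESOLVED, 0 };
    size_t max_common = 0;
    int saw_common = 0;

    for (size_t i = 0; i < n; i++) {
        if (recs[i].kind == KIND_DEFINITION) {
            /* A real definition wins; commons become plain references. */
            r.outcome = RESOLVE_TO_DEFINITION;
            return r;
        }
        if (recs[i].kind == KIND_COMMON) {
            saw_common = 1;
            if (recs[i].size > max_common)
                max_common = recs[i].size;
        }
    }

    if (saw_common) {
        /* No definition anywhere: allocate the largest size seen. */
        r.outcome = ALLOCATE_IN_BSS;
        r.bss_size = max_common;
    }
    return r;
}
```

A name that appears only as an ordinary external reference stays
unresolved in this model, which corresponds to an undefined-symbol error.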

> The question is, what is the origin of this concept and nomenclature?
> FORTRAN, of course, has "common blocks": was that an inspiration for the
> name? Where did the idea for the implicit behavior come from (FORTRAN
> common blocks are explicit).

Yes, the concept, nomenclature, and semantics come from FORTRAN, and
they were included in a.out and COFF to support FORTRAN and other
languages (such as PL/I) that have COMMON block-type semantics.  I
don't know why 'int x;' (without an initializer) in C was implemented
as a common symbol.  I suspect it was done to allow C and FORTRAN
object modules linked together in the same executable to share
external data.

-Paul W.

* Re: [TUHS] Question about early C behavior.
  2020-01-10 19:07 [TUHS] Question about early C behavior Dan Cross
  2020-01-10 20:24 ` Paul Winalski
@ 2020-01-10 20:55 ` Derek Fawcus
  2020-01-10 21:02   ` Warner Losh
  2020-01-10 21:05 ` Clem Cole
  2 siblings, 1 reply; 5+ messages in thread
From: Derek Fawcus @ 2020-01-10 20:55 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Fri, Jan 10, 2020 at 02:07:53PM -0500, Dan Cross wrote:
> 
> My colleague was particularly surprised that this seemed required: even at
> this early stage, the `extern` keyword was present, so why bother with this
> behavior? Why not, instead, make it a link-time error? Please note that if
> two source files have initializers for these variables, then one gets a
> multiple-definition link error. The 1988 ANSI standard made this an error
> (or at least undefined behavior) but the functionality persists; GCC is
> changing its default to prohibit it (my colleague works on clang).

This behaviour differed between platforms: Unix used the common approach,
and some other platforms simply made it a (non-common) symbol in the BSS.

Having learnt C in its pre-ANSI form on Unix, I then ran into this behaviour
on DOS C compilers, none of which (that I came across) provided the 'common'
behaviour.
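
For code that has to build under both models, the usual approach is one
definition plus `extern` declarations everywhere else. A minimal sketch,
collapsed into a single file so that it compiles on its own; the names
`shared_count` and `get_shared_count` are invented for illustration.

```c
/* What would normally sit in a shared header: */
extern int shared_count;    /* declaration only; allocates nothing */

/* What would sit in exactly one source file: */
int shared_count = 3;       /* the single definition */

/* Any other file includes the header and simply uses the variable. */
int get_shared_count(void)
{
    return shared_count;
}
```

With GCC or clang, building with `-fno-common` makes leftover cross-file
`int x;` duplicates show up as multiple-definition errors at link time.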

DF

* Re: [TUHS] Question about early C behavior.
  2020-01-10 20:55 ` Derek Fawcus
@ 2020-01-10 21:02   ` Warner Losh
  0 siblings, 0 replies; 5+ messages in thread
From: Warner Losh @ 2020-01-10 21:02 UTC (permalink / raw)
  To: Derek Fawcus; +Cc: The Eunuchs Hysterical Society

On Fri, Jan 10, 2020, 1:55 PM Derek Fawcus <dfawcus+lists-tuhs@employees.org>
wrote:

> On Fri, Jan 10, 2020 at 02:07:53PM -0500, Dan Cross wrote:
> >
> > My colleague was particularly surprised that this seemed required: even at
> > this early stage, the `extern` keyword was present, so why bother with this
> > behavior? Why not, instead, make it a link-time error? Please note that if
> > two source files have initializers for these variables, then one gets a
> > multiple-definition link error. The 1988 ANSI standard made this an error
> > (or at least undefined behavior) but the functionality persists; GCC is
> > changing its default to prohibit it (my colleague works on clang).
>
> This behaviour differed between platforms: Unix used the common approach,
> and some other platforms simply made it a (non-common) symbol in the BSS.
>
> Having learnt C in its pre-ANSI form on Unix, I then ran into this behaviour
> on DOS C compilers, none of which (that I came across) provided the 'common'
> behaviour.
>

GCC offered warnings for this behavior in the early '90s, IIRC. I went
through a bunch of code in that time frame to remove the assumption...

Warner

* Re: [TUHS] Question about early C behavior.
  2020-01-10 19:07 [TUHS] Question about early C behavior Dan Cross
  2020-01-10 20:24 ` Paul Winalski
  2020-01-10 20:55 ` Derek Fawcus
@ 2020-01-10 21:05 ` Clem Cole
  2 siblings, 0 replies; 5+ messages in thread
From: Clem Cole @ 2020-01-10 21:05 UTC (permalink / raw)
  To: Dan Cross; +Cc: The Eunuchs Hysterical Society

On Fri, Jan 10, 2020 at 2:09 PM Dan Cross <crossd@gmail.com> wrote:

> The 1988 ANSI standard made this an error (or at least undefined behavior)
> but the functionality persists; GCC is changing its default to prohibit it
> (my colleague works on clang).
>
Lovely - let's break code because we can.

To quote our late friend and colleague dmr:

“I can't recall any difficulty in making the C language definition
completely open - any discussion on the matter tended to mention languages
whose inventors tried to keep tight control, and consequent ill fate”
<https://www.inspiringquotes.us/quotes/TkCZ_JSNjCihu>

“When I read commentary about suggestions for where C should go, I often
think back and give thanks that it wasn't developed under the advice of a
worldwide crowd.” <https://www.inspiringquotes.us/quotes/eDQR_hqwtHAC9>

“C is peculiar in a lot of ways, but it, like many other successful things,
has a certain unity of approach that stems from development in a small
group” <https://www.inspiringquotes.us/quotes/zjSl_37Fc1onj>
