The Unix Heritage Society mailing list
* [TUHS] origin of null-terminated strings
@ 2022-12-16  3:02 Douglas McIlroy
  2022-12-16  3:14 ` [TUHS] " Ken Thompson
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Douglas McIlroy @ 2022-12-16  3:02 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: TUHS main list

I think this cited quote from
https://www.joelonsoftware.com/2001/12/11/ is urban legend.

    Why do C strings [have a terminating NUL]? It’s because the PDP-7
microprocessor, on which UNIX and the C programming language were
invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
at the end.”

This assertion seems unlikely since neither C nor the library string
functions existed on the PDP-7. In fact the "terminating character" of
a string in the PDP-7 language B was the pair '*e'. A string was a
sequence of words, packed two characters per word. For odd-length
strings half of the final one-character word was effectively
NUL-padded as described below.

One might trace null termination to the original (1965) proposal for
ASCII, https://dl.acm.org/doi/10.1145/363831.363839. There the only
role specifically suggested for NUL is to "serve to accomplish time
fill or media fill." With character-addressable hardware (not the
PDP-7), it is only a small step from using NUL as terminal padding to
the convention of null termination in all cases.

Ken would probably know for sure whether there's any truth in the
attribution to ASCIZ.

Doug

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [TUHS] Re: origin of null-terminated strings
  2022-12-16  3:02 [TUHS] origin of null-terminated strings Douglas McIlroy
@ 2022-12-16  3:14 ` Ken Thompson
  2022-12-16  9:13   ` Dr Iain Maoileoin
  2022-12-16  3:17 ` Steve Nickolas
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Ken Thompson @ 2022-12-16  3:14 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: Alejandro Colomar, TUHS main list


asciz -- this is the first time i heard of it.
doug -- yes.


On Thu, Dec 15, 2022 at 7:04 PM Douglas McIlroy <
douglas.mcilroy@dartmouth.edu> wrote:

> I think this cited quote from
> https://www.joelonsoftware.com/2001/12/11/ is urban legend.
>
>     Why do C strings [have a terminating NUl]? It’s because the PDP-7
> microprocessor, on which UNIX and the C programming language were
> invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
> at the end.”
>
> This assertion seems unlikely since neither C nor the library string
> functions existed on the PDP-7. In fact the "terminating character" of
> a string in the PDP-7 language B was the pair '*e'. A string was a
> sequence of words, packed two characters per word. For odd-length
> strings half of the final one-character word was effectively
> NUL-padded as described below.
>
> One might trace null termination to the original (1965) proposal for
> ASCII,  https://dl.acm.org/doi/10.1145/363831.363839. There the only
> role specifically suggested for NUL is to "serve to accomplish time
> fill or media fill." With character-addressable hardware (not the
> PDP-7), it is only a small step from using NUL as terminal padding to
> the convention of null termination in all cases.
>
> Ken would probably know for sure whether there's any  truth in the
> attribution to ASCIZ.
>
> Doug
>


* [TUHS] Re: origin of null-terminated strings
  2022-12-16  3:02 [TUHS] origin of null-terminated strings Douglas McIlroy
  2022-12-16  3:14 ` [TUHS] " Ken Thompson
@ 2022-12-16  3:17 ` Steve Nickolas
  2022-12-16 17:24 ` John P. Linderman
       [not found] ` <6009124d-750d-365e-a424-ec7bb25922b9@gmail.com>
  3 siblings, 0 replies; 18+ messages in thread
From: Steve Nickolas @ 2022-12-16  3:17 UTC (permalink / raw)
  To: TUHS main list


On Thu, 15 Dec 2022, Douglas McIlroy wrote:

> I think this cited quote from
> https://www.joelonsoftware.com/2001/12/11/ is urban legend.
>
>    Why do C strings [have a terminating NUl]? It’s because the PDP-7
> microprocessor, on which UNIX and the C programming language were
> invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
> at the end.”
>
> This assertion seems unlikely since neither C nor the library string
> functions existed on the PDP-7. In fact the "terminating character" of
> a string in the PDP-7 language B was the pair '*e'. A string was a
> sequence of words, packed two characters per word. For odd-length
> strings half of the final one-character word was effectively
> NUL-padded as described below.
>
> One might trace null termination to the original (1965) proposal for
> ASCII,  https://dl.acm.org/doi/10.1145/363831.363839. There the only
> role specifically suggested for NUL is to "serve to accomplish time
> fill or media fill." With character-addressable hardware (not the
> PDP-7), it is only a small step from using NUL as terminal padding to
> the convention of null termination in all cases.
>
> Ken would probably know for sure whether there's any  truth in the
> attribution to ASCIZ.
>
> Doug
>

For what it's worth, when I code for the Apple //e (using 65C02 
assembler), I use C strings.  I can just do something like

prstr:   ldy       #$00
@1:      lda       msg, y
          beq       @2        ; string terminator
          ora       #$80      ; firmware wants high bit on
          jsr       $FDED     ; write char
          iny
          bne       @1
@2:      rts

msg:     .byte     "Hello, cruel world.", 13, 0

and using a NUL terminator just makes sense here because of how simple it 
is to check for (BEQ and BNE check the 6502's zero flag, which LDA 
automatically sets).
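
For comparison, the same scan-until-zero loop can be sketched in Python over a byte buffer. This is an illustration only, not Apple II code: `print_cstring` and the `emit` callback are made-up names, with `emit` standing in for the output routine called at $FDED.

```python
def print_cstring(memory: bytes, start: int, emit) -> int:
    """Pass bytes from memory[start:] to emit until a zero byte is found.

    Returns the number of bytes emitted. Like the assembly version, it
    assumes a terminator is present; here a missing one raises IndexError.
    """
    i = start
    while memory[i] != 0:        # like BEQ: a zero byte ends the string
        emit(memory[i] | 0x80)   # like ORA #$80: set the high bit
        i += 1
    return i - start


msg = b"Hello, cruel world.\r\x00"   # 13 is carriage return, 0 is the terminator
screen = []                          # hypothetical stand-in for the output routine
count = print_cstring(msg, 0, screen.append)
```

No length has to be stored or passed; the only cost is that a zero byte can never appear inside the message itself.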

-uso.


* [TUHS] Re: origin of null-terminated strings
  2022-12-16  3:14 ` [TUHS] " Ken Thompson
@ 2022-12-16  9:13   ` Dr Iain Maoileoin
  2022-12-16 13:42     ` Dan Halbert
  2022-12-16 20:12     ` Dave Horsfall
  0 siblings, 2 replies; 18+ messages in thread
From: Dr Iain Maoileoin @ 2022-12-16  9:13 UTC (permalink / raw)
  To: Ken Thompson; +Cc: Douglas McIlroy, Alejandro Colomar, TUHS main list


ASCIZ
Lost in the mists of time in my mind.

I remember running into a .asciz directive in the 70s “somewhere”.
It was an assembler directive in one of the RT11 systems??? or perhaps the unix bootstrap and/or “.s” files - when I get some time I will go read some old code/manuals.

I

Yes, it put a null byte at the end of a string.

> On 16 Dec 2022, at 03:14, Ken Thompson <kenbob@gmail.com> wrote:
> 
> asciz -- this is the first time i heard of it.
> doug -- yes.
> 
> 
> On Thu, Dec 15, 2022 at 7:04 PM Douglas McIlroy <douglas.mcilroy@dartmouth.edu <mailto:douglas.mcilroy@dartmouth.edu>> wrote:
> I think this cited quote from
> https://www.joelonsoftware.com/2001/12/11/ <https://www.joelonsoftware.com/2001/12/11/> is urban legend.
> 
>     Why do C strings [have a terminating NUl]? It’s because the PDP-7
> microprocessor, on which UNIX and the C programming language were
> invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
> at the end.”
> 
> This assertion seems unlikely since neither C nor the library string
> functions existed on the PDP-7. In fact the "terminating character" of
> a string in the PDP-7 language B was the pair '*e'. A string was a
> sequence of words, packed two characters per word. For odd-length
> strings half of the final one-character word was effectively
> NUL-padded as described below.
> 
> One might trace null termination to the original (1965) proposal for
> ASCII,  https://dl.acm.org/doi/10.1145/363831.363839 <https://dl.acm.org/doi/10.1145/363831.363839>. There the only
> role specifically suggested for NUL is to "serve to accomplish time
> fill or media fill." With character-addressable hardware (not the
> PDP-7), it is only a small step from using NUL as terminal padding to
> the convention of null termination in all cases.
> 
> Ken would probably know for sure whether there's any  truth in the
> attribution to ASCIZ.
> 
> Doug



* [TUHS] Re: origin of null-terminated strings
  2022-12-16  9:13   ` Dr Iain Maoileoin
@ 2022-12-16 13:42     ` Dan Halbert
  2022-12-16 16:10       ` Dan Cross
  2022-12-16 20:12     ` Dave Horsfall
  1 sibling, 1 reply; 18+ messages in thread
From: Dan Halbert @ 2022-12-16 13:42 UTC (permalink / raw)
  To: tuhs


ASCIZ was an assembler directive used for a number of different DEC 
computers, and also the name for null-terminated strings. I learned it 
for the PDP-10, but I'm sure it existed on other machines. It is in some 
PDP-10 documentation I am looking at right now. Anyone who used DEC and 
did assembly programming would have known about it. Various system calls 
took ASCIZ strings.

On 12/16/22 04:13, Dr Iain Maoileoin wrote:
> ASCIZ
> Lost in the mists of time in my mind.
>
> I remember running into a .asciz directive n the 70s “somewhere”.
> It was an assembler directive in one of the RT11 systems??? or perhaps 
> the unix bootstrap and/or “.s” files - when I get some time I will go 
> read some old code/manuals.
>
> I
>
> Yes, it put a null byte at the end of a string.
>
>> On 16 Dec 2022, at 03:14, Ken Thompson <kenbob@gmail.com> wrote:
>>
>> asciz -- this is the first time i heard of it.
>> doug -- yes.
>>
>>
>> On Thu, Dec 15, 2022 at 7:04 PM Douglas McIlroy 
>> <douglas.mcilroy@dartmouth.edu> wrote:
>>
>>     I think this cited quote from
>>     https://www.joelonsoftware.com/2001/12/11/ is urban legend.
>>
>>         Why do C strings [have a terminating NUl]? It’s because the PDP-7
>>     microprocessor, on which UNIX and the C programming language were
>>     invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z
>>     (zero)
>>     at the end.”
>>
>>     This assertion seems unlikely since neither C nor the library string
>>     functions existed on the PDP-7. In fact the "terminating
>>     character" of
>>     a string in the PDP-7 language B was the pair '*e'. A string was a
>>     sequence of words, packed two characters per word. For odd-length
>>     strings half of the final one-character word was effectively
>>     NUL-padded as described below.
>>
>>     One might trace null termination to the original (1965) proposal for
>>     ASCII, https://dl.acm.org/doi/10.1145/363831.363839. There the only
>>     role specifically suggested for NUL is to "serve to accomplish time
>>     fill or media fill." With character-addressable hardware (not the
>>     PDP-7), it is only a small step from using NUL as terminal padding to
>>     the convention of null termination in all cases.
>>
>>     Ken would probably know for sure whether there's any truth in the
>>     attribution to ASCIZ.
>>
>>     Doug
>>
>


* [TUHS] Re: origin of null-terminated strings
  2022-12-16 13:42     ` Dan Halbert
@ 2022-12-16 16:10       ` Dan Cross
  2022-12-16 16:22         ` Tom Lyon
  2022-12-16 16:29         ` Jon Steinhart
  0 siblings, 2 replies; 18+ messages in thread
From: Dan Cross @ 2022-12-16 16:10 UTC (permalink / raw)
  To: Dan Halbert; +Cc: tuhs

On Fri, Dec 16, 2022 at 8:42 AM Dan Halbert <halbert@halwitz.org> wrote:
> ASCIZ was an assembler directive used for a number of different DEC computers, and also the name for null-terminated strings. I learned it for the PDP-10, but I'm sure it existed on other machines. It is in some PDP-10 documentation I am looking at right now. Anyone who used DEC and did assembly programming would have known about it. Various system calls took ASCIZ strings.

This raises something I've always been curious about. To what extent were
the Unix folks at Bell Labs already familiar with DEC systems before the PDP-7?

It strikes me that much of the published work was centered around IBM and GE
systems (e.g., Ken's wonderful paper on regular expressions, and of course the
Multics work). Were there other Digital machines floating around? I know a
proposal was written to get a PDP-10 for operating systems research, but it
wasn't approved.

Relatedly, was any thought given to trying to get a 360 system?

On 12/16/22 04:13, Dr Iain Maoileoin wrote:
> ASCIZ
> Lost in the mists of time in my mind.

Origin, perhaps, but it exists in contemporary assemblers. Like most
sane people I try to avoid being in assembler for too long, but when you're
first turning on a machine it is useful to be able to squirt a message
out of the UART if something goes dramatically wrong, and the directive
is handy for that.

It seems to have made its way into Research assembler via BSD; it's in
locore.s in 8th Edition, for instance, but doesn't appear before that.  The
"UNIX Assembler Manual" describes "String Statements" for the 7th
Edition assembler; strings are sequences of ASCII characters between
'<' and '>'.  But it doesn't say that they're NUL terminated, and they are
not: adding the terminator was manual via the familiar `\0` escape
sequence.
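
The difference between a plain string statement and an `.asciz`-style directive comes down to one byte. A rough Python sketch, assuming nothing about how any real assembler parses its input; `ascii_bytes` and `asciz_bytes` are hypothetical names that model only the bytes emitted:

```python
def ascii_bytes(text: str) -> bytes:
    """Bytes for a string statement with no implicit terminator."""
    return text.encode("ascii")


def asciz_bytes(text: str) -> bytes:
    """Bytes for an .asciz-style directive: the same data plus one zero byte."""
    return text.encode("ascii") + b"\x00"


# Writing the terminator by hand gives the same bytes as the directive.
manual = ascii_bytes("panic\0")
automatic = asciz_bytes("panic")
same = manual == automatic
```

Seen this way, the directive is a convenience; the convention itself only needs an escape for the zero byte.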

        - Dan C.


> I remember running into a .asciz directive n the 70s “somewhere”.
> It was an assembler directive in one of the RT11 systems??? or perhaps the unix bootstrap and/or “.s” files - when I get some time I will go read some old code/manuals.
>
> I
>
> Yes, it put a null byte at the end of a string.
>
> On 16 Dec 2022, at 03:14, Ken Thompson <kenbob@gmail.com> wrote:
>
> asciz -- this is the first time i heard of it.
> doug -- yes.
>
>
> On Thu, Dec 15, 2022 at 7:04 PM Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
>>
>> I think this cited quote from
>> https://www.joelonsoftware.com/2001/12/11/ is urban legend.
>>
>>     Why do C strings [have a terminating NUl]? It’s because the PDP-7
>> microprocessor, on which UNIX and the C programming language were
>> invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
>> at the end.”
>>
>> This assertion seems unlikely since neither C nor the library string
>> functions existed on the PDP-7. In fact the "terminating character" of
>> a string in the PDP-7 language B was the pair '*e'. A string was a
>> sequence of words, packed two characters per word. For odd-length
>> strings half of the final one-character word was effectively
>> NUL-padded as described below.
>>
>> One might trace null termination to the original (1965) proposal for
>> ASCII,  https://dl.acm.org/doi/10.1145/363831.363839. There the only
>> role specifically suggested for NUL is to "serve to accomplish time
>> fill or media fill." With character-addressable hardware (not the
>> PDP-7), it is only a small step from using NUL as terminal padding to
>> the convention of null termination in all cases.
>>
>> Ken would probably know for sure whether there's any  truth in the
>> attribution to ASCIZ.
>>
>> Doug
>
>
>


* [TUHS] Re: origin of null-terminated strings
  2022-12-16 16:10       ` Dan Cross
@ 2022-12-16 16:22         ` Tom Lyon
  2022-12-16 16:29         ` Jon Steinhart
  1 sibling, 0 replies; 18+ messages in thread
From: Tom Lyon @ 2022-12-16 16:22 UTC (permalink / raw)
  To: Dan Cross; +Cc: tuhs


Re: getting a 360 - IBM and AT&T really hated each other, so 360s were
avoided for strategic reasons. That said, they could not be practically
avoided; Holmdel had a large installation:
https://www.youtube.com/watch?v=HMYiktO0D64&ab_channel=AT%26TTechChannel

When Amdahl and UTS/UNIX came along, the Bell System was by far the biggest
customer.

On Fri, Dec 16, 2022 at 8:12 AM Dan Cross <crossd@gmail.com> wrote:

> On Fri, Dec 16, 2022 at 8:42 AM Dan Halbert <halbert@halwitz.org> wrote:
> > ASCIZ was an assembler directive used for a number of different DEC
> computers, and also the name for null-terminated strings. I learned it for
> the PDP-10, but I'm sure it existed on other machines. It is in some PDP-10
> documentation I am looking at right now. Anyone who used DEC and did
> assembly programming would have known about it. Various system calls took
> ASCIZ strings.
>
> This raises something I've always been curious about. To what extent were
> the Unix folks at Bell Labs already familiar with DEC systems before the
> PDP-7?
>
> It strikes me that much of the published work was centered around IBM and
> GE
> systems (e.g., Ken's wonderful paper on regular expressions, and of course
> the
> Multics work). Were there other Digital machines floating around? I know a
> proposal was written to get a PDP-10 for operating systems research, but it
> wasn't approved.
>
> Relatedly, was any thought given to trying to get a 360 system?
>
> On 12/16/22 04:13, Dr Iain Maoileoin wrote:
> > ASCIZ
> > Lost in the mists of time in my mind.
>
> Origin, perhaps, but it exists in contemporary assemblers. Like most
> sane people I try to avoid being in assembler for too long, when you're
> first turning on a machine it is useful to be able to squirt a message
> out of the UART if something goes dramatically wrong, and the directive
> is handy for that.
>
> It seems to have made its way into Research assembler via BSD; it's in
> locore.s in 8th Edition, for instance, but doesn't appear before that.  The
> "UNIX Assembler Manual" describes "String Statements" for the 7th
> Edition assembler; strings are sequences of ASCII characters between
> '<' and '>'.  But it doesn't say that they're NUL terminated, and they are
> not: adding the terminator was manual via the familiar, `\0` escape
> sequence.
>
>         - Dan C.
>
>
> > I remember running into a .asciz directive n the 70s “somewhere”.
> > It was an assembler directive in one of the RT11 systems??? or perhaps
> the unix bootstrap and/or “.s” files - when I get some time I will go read
> some old code/manuals.
> >
> > I
> >
> > Yes, it put a null byte at the end of a string.
> >
> > On 16 Dec 2022, at 03:14, Ken Thompson <kenbob@gmail.com> wrote:
> >
> > asciz -- this is the first time i heard of it.
> > doug -- yes.
> >
> >
> > On Thu, Dec 15, 2022 at 7:04 PM Douglas McIlroy <
> douglas.mcilroy@dartmouth.edu> wrote:
> >>
> >> I think this cited quote from
> >> https://www.joelonsoftware.com/2001/12/11/ is urban legend.
> >>
> >>     Why do C strings [have a terminating NUl]? It’s because the PDP-7
> >> microprocessor, on which UNIX and the C programming language were
> >> invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
> >> at the end.”
> >>
> >> This assertion seems unlikely since neither C nor the library string
> >> functions existed on the PDP-7. In fact the "terminating character" of
> >> a string in the PDP-7 language B was the pair '*e'. A string was a
> >> sequence of words, packed two characters per word. For odd-length
> >> strings half of the final one-character word was effectively
> >> NUL-padded as described below.
> >>
> >> One might trace null termination to the original (1965) proposal for
> >> ASCII,  https://dl.acm.org/doi/10.1145/363831.363839. There the only
> >> role specifically suggested for NUL is to "serve to accomplish time
> >> fill or media fill." With character-addressable hardware (not the
> >> PDP-7), it is only a small step from using NUL as terminal padding to
> >> the convention of null termination in all cases.
> >>
> >> Ken would probably know for sure whether there's any  truth in the
> >> attribution to ASCIZ.
> >>
> >> Doug
> >
> >
> >
>


* [TUHS] Re: origin of null-terminated strings
  2022-12-16 16:10       ` Dan Cross
  2022-12-16 16:22         ` Tom Lyon
@ 2022-12-16 16:29         ` Jon Steinhart
  1 sibling, 0 replies; 18+ messages in thread
From: Jon Steinhart @ 2022-12-16 16:29 UTC (permalink / raw)
  To: tuhs

Dan Cross writes:
>
> This raises something I've always been curious about. To what extent were
> the Unix folks at Bell Labs already familiar with DEC systems before the PDP-7?

Well, I recall that there was a PDP-8 in the keypunch room on the 5th floor of
building 2.  I believe that it was hooked to a card reader and printer so that
one could get a listing of a deck of cards without having to use the computer
center.  But that's probably not what you're asking about.

Jon


* [TUHS] Re: origin of null-terminated strings
  2022-12-16  3:02 [TUHS] origin of null-terminated strings Douglas McIlroy
  2022-12-16  3:14 ` [TUHS] " Ken Thompson
  2022-12-16  3:17 ` Steve Nickolas
@ 2022-12-16 17:24 ` John P. Linderman
       [not found] ` <6009124d-750d-365e-a424-ec7bb25922b9@gmail.com>
  3 siblings, 0 replies; 18+ messages in thread
From: John P. Linderman @ 2022-12-16 17:24 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: Alejandro Colomar, TUHS main list


Suppose you have two strings of 8-bit bytes, and you'd like
to compare them lexicographically (left to right, byte by byte).
An oracle tells you the length of the strings, so maybe you have

3 ram
4 ramp

You can just do an strncmp on the two strings, using the minimum
of the two lengths (3 in the example). If they differ (they didn't),
you are done. If the strings are of the same length (they aren't),
they are equal. Otherwise, the shorter (a prefix of the longer)
compares low. Ho-hum.
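
A minimal Python sketch of that comparison, with the "oracle" played by `len()`. The name `compare_counted` is hypothetical; the slice comparison stands in for strncmp over the shorter length.

```python
def compare_counted(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1 for two byte strings whose lengths are known."""
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        # The strings differ within the common length: that decides it.
        return -1 if a[:n] < b[:n] else 1
    # Equal over the common length: the shorter string (a prefix) is low.
    return (len(a) > len(b)) - (len(a) < len(b))


ram_vs_ramp = compare_counted(b"ram", b"ramp")
```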

Suppose each comparand is a sequence of such strings, and you
want to break ties on initial components of such sequences
using subsequent components (if any). But you have to combine
them as a single string and the oracle only tells you the total length.
You can't just concatenate them together, or

(3 ram, 4 part) => 7 rampart
(4 ramp, 3 art) => 7 rampart
(7 rampart) => 7 rampart

and they all look equal, but they're not supposed to be.
The problem is that some components are proper prefixes
of the corresponding component. We can sneak past the end
of one component and compare bytes from different components,
something that cannot be allowed. A collection of components
is said to have "the prefix property" if no component is
a proper prefix of any other component. If there is some
byte that cannot occur in some component, we can tack that
byte onto the component, and the components will then have
the prefix property. Newline terminated lines have the
prefix property, but we cannot just concatenate such
components and do an strncmp, because newlines compare low
to most bytes, but high to some legitimate bytes, like tabs,
so they don't break ties correctly.

Null terminators to the rescue! These induce the prefix property,
and compare low to everything else. Using blanks to stand in for
hard-to-parse null bytes, we get

(4 ram , 5 part ) => 9 ram part
(5 ramp , 4 art ) => 9 ramp art
(8 rampart )      => 8 rampart

The strncmp on the results establishes the desired order.
The null terminator on the final component is optional.
The oracular length determines how much of the component
is significant. But you have to be consistent. The final
component must always, or never, have the "optional" terminator.
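
The effect is easy to check with a small Python sketch. The helper names `plain_key` and `terminated_key` are made up for illustration, and Python's byte-string comparison stands in for strncmp with the oracle's length.

```python
def plain_key(components) -> bytes:
    """Concatenate components with nothing between them."""
    return b"".join(components)


def terminated_key(components, terminator: bytes = b"\x00") -> bytes:
    """Concatenate components, appending the terminator to each one."""
    for part in components:
        if terminator in part:
            raise ValueError("component contains the terminator byte")
    return b"".join(part + terminator for part in components)


records = [(b"rampart",), (b"ramp", b"art"), (b"ram", b"part")]

# Plain concatenation collapses all three records to the same key.
all_plain_equal = len({plain_key(r) for r in records}) == 1

# Null terminators restore the component-by-component order.
by_nul = sorted(records, key=terminated_key)

# Newline as terminator misorders a prefix against a tab (9 < 10).
short, longer = (b"a",), (b"a\tb",)
newline_ok = terminated_key(short, b"\n") < terminated_key(longer, b"\n")
nul_ok = terminated_key(short) < terminated_key(longer)
```

Here `by_nul` comes out in the same order as comparing the tuples component by component, while `newline_ok` is false because the newline byte compares high to the tab.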

Null-terminated strings (which cannot, themselves, contain a null)
are not the only components having the prefix property.
Fixed length strings cannot, by definition, include proper prefixes,
and they are free to contain any byte. UTF-8 code-points
have the prefix property (I suspect this was no accident),
so the null-terminated concatenation of non-null UTF-8
code-points has the prefix property and breaks ties appropriately
(assuming that the code-points themselves establish the
correct order for what is being compared).
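
The UTF-8 claim can be spot-checked in Python. This is only a sketch over a hand-picked sample of code points, not a proof; `utf8` and `is_prefix_free` are hypothetical helper names.

```python
from itertools import permutations


def utf8(code_point: int) -> bytes:
    """UTF-8 encoding of a single code point."""
    return chr(code_point).encode("utf-8")


def is_prefix_free(encodings) -> bool:
    """True if no item is a proper prefix of a different item."""
    return not any(
        a != b and b.startswith(a) for a, b in permutations(encodings, 2)
    )


# Sample spanning 1-, 2-, 3- and 4-byte encodings, in code-point order.
sample = [0x41, 0x7F, 0x80, 0xE9, 0x7FF, 0x800, 0x20AC, 0xFFFF, 0x10000, 0x1F600]
encoded = [utf8(cp) for cp in sample]

prefix_free = is_prefix_free(encoded)
# Byte-wise order of the encodings matches code-point order.
order_preserved = sorted(encoded) == [utf8(cp) for cp in sorted(sample)]
```

By contrast, `is_prefix_free([b"ram", b"ramp"])` is false, which is the situation the terminator is there to repair.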

I doubt that this explains the use of null terminated strings,
but for those of us who spend too much time thinking about sorting,
they sure work well.

On Thu, Dec 15, 2022 at 10:04 PM Douglas McIlroy <
douglas.mcilroy@dartmouth.edu> wrote:

> I think this cited quote from
> https://www.joelonsoftware.com/2001/12/11/ is urban legend.
>
>     Why do C strings [have a terminating NUl]? It’s because the PDP-7
> microprocessor, on which UNIX and the C programming language were
> invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
> at the end.”
>
> This assertion seems unlikely since neither C nor the library string
> functions existed on the PDP-7. In fact the "terminating character" of
> a string in the PDP-7 language B was the pair '*e'. A string was a
> sequence of words, packed two characters per word. For odd-length
> strings half of the final one-character word was effectively
> NUL-padded as described below.
>
> One might trace null termination to the original (1965) proposal for
> ASCII,  https://dl.acm.org/doi/10.1145/363831.363839. There the only
> role specifically suggested for NUL is to "serve to accomplish time
> fill or media fill." With character-addressable hardware (not the
> PDP-7), it is only a small step from using NUL as terminal padding to
> the convention of null termination in all cases.
>
> Ken would probably know for sure whether there's any  truth in the
> attribution to ASCIZ.
>
> Doug
>


* [TUHS] Re: origin of null-terminated strings
  2022-12-16  9:13   ` Dr Iain Maoileoin
  2022-12-16 13:42     ` Dan Halbert
@ 2022-12-16 20:12     ` Dave Horsfall
  2022-12-16 21:02       ` Warner Losh
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Horsfall @ 2022-12-16 20:12 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society


On Fri, 16 Dec 2022, Dr Iain Maoileoin wrote:

> I remember running into a .asciz directive n the 70s “somewhere”. It was 
> an assembler directive in one of the RT11 systems??? or perhaps the unix 
> bootstrap and/or “.s” files - when I get some time I will go read some 
> old code/manuals.

MACRO-11 on RSX-11D seems to ring a bell...

-- Dave


* [TUHS] Re: origin of null-terminated strings
  2022-12-16 20:12     ` Dave Horsfall
@ 2022-12-16 21:02       ` Warner Losh
  2022-12-16 21:13         ` Clem Cole
                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Warner Losh @ 2022-12-16 21:02 UTC (permalink / raw)
  To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society


On Fri, Dec 16, 2022, 1:12 PM Dave Horsfall <dave@horsfall.org> wrote:

> On Fri, 16 Dec 2022, Dr Iain Maoileoin wrote:
>
> > I remember running into a .asciz directive n the 70s “somewhere”. It was
> > an assembler directive in one of the RT11 systems??? or perhaps the unix
> > bootstrap and/or “.s” files - when I get some time I will go read some
> > old code/manuals.
>
> MACRO-11 on RSX-11D seems to ring a bell...
>

I first encountered it on RSTS/E 6C in the MACRO-11 it had... But the v6
macro assembler from DEC via Harvard that eventually wound up in 2BSD is
older and dates to 1977 or so.

Warner

>


* [TUHS] Re: origin of null-terminated strings
  2022-12-16 21:02       ` Warner Losh
@ 2022-12-16 21:13         ` Clem Cole
  2022-12-16 21:49           ` Clem Cole
  2022-12-16 21:18         ` Luther Johnson
  2022-12-16 21:20         ` Dan Halbert
  2 siblings, 1 reply; 18+ messages in thread
From: Clem Cole @ 2022-12-16 21:13 UTC (permalink / raw)
  To: Warner Losh; +Cc: The Eunuchs Hysterical Society


So I went to the oracle on much of DEC history ... -- this explains why Ken
never heard it.



---------- Forwarded message ---------
From: Timothe Litt
Date: Fri, Dec 16, 2022 at 3:40 PM
Subject: Re: Origin of ASCIZ / null terminated char arrays.
To: Clem Cole <clemc@ccc.com>

On 16-Dec-22 15:04, Clem Cole wrote:
Do either of you know when it showed up in DEC assemblers?  I  remember it
in Macro11 and Macro10, but I have to believe it was in the earlier
machines?  So far I have not found a reference to it in any of my PDP-8
stuff (which is small) and I never had the docs for 6, 7 or 9 -- I assume
Al K. has them on bitsavers - so I'm going to go poking around - but I
thought I'd ask you two if you knew.


Ken Thompson says he had never heard of it before, but he never used the
DEC assemblers -- (he wrote their own on the Honeywell originally I
believe). FWIW: B did not use null-terminated char arrays originally, but
by the time dmr morphed B into newB then C, they had become standard.  Like
many, I had always thought Dennis picked them from the DEC assembler, but
as Ken says - they never really used it.


I was trying to figure out when they (null-terminated char arrays) started
to become more standard, and specifically the pseudo-op ASCIZ used to create them.


Tx
Clem

It depends on if you require ASCII, or just character strings terminated by
a stop code...
The -11 has .asciz (as does VMS Macro,...); the -10 has ASCIZ.  SIXBIT 0 is
a space, so you needed to know the length, oftentimes in words, so strip
trailing 00s.
The basic 8 assembler (PAL) didn't even have ASCII data.
http://www.bitsavers.org/pdf/dec/pdp8/software/DEC-08-ASAC-D_PAL-III_Symbolic_Assembler_Programming_Manual.pdf

Macro-8 does; the TEXT pseudo-op uses 00 as a stop code.  (It also uses a
6-bit ASCII code).  " is a single character ASCII constant, but not used
for strings.
https://www.grc.com/pdp-8/docs/macro-8_programming_manual.pdf

The -15 has .ASCII and .SIXBIT, but no .ASCIZ.

http://bitsavers.informatik.uni-stuttgart.de/pdf/dec/pdp15/DEC-15-AMZA-D_MACRO15.pdf

Probably of most interest to the Unix history, the PDP-7 assembler's TEXT
pseudo-op: "in order to separate the string from other data following it, a
termination code determined by the character mode is inserted automatically
after the last character code of the string"...

http://www.bitsavers.org/pdf/dec/pdp7/PDP-7_AsmMan.pdf
I don't remember and/or didn't use the earlier assemblers, but many of the
manuals are on bitsavers.
Both NUL and RUBOUT (a.k.a. DELETE) were used as fill characters to cover
the time teletypes take to execute <CR> and <LF>.  You couldn't represent
the NUL version with ASCIZ, and RUBOUT was picked for the ability to
overpunch paper tape typos.  Neither function, nor the use of NUL as an end
of string marker, is in the ASCII standard, IIRC.



* [TUHS] Re: origin of null-terminated strings
  2022-12-16 21:02       ` Warner Losh
  2022-12-16 21:13         ` Clem Cole
@ 2022-12-16 21:18         ` Luther Johnson
  2022-12-16 21:20         ` Dan Halbert
  2 siblings, 0 replies; 18+ messages in thread
From: Luther Johnson @ 2022-12-16 21:18 UTC (permalink / raw)
  To: tuhs


I used RT-11 versions 4 and 5, and I seem to remember the MACRO-11 there 
had .ASCIZ.

On 12/16/2022 02:02 PM, Warner Losh wrote:
>
>
> On Fri, Dec 16, 2022, 1:12 PM Dave Horsfall <dave@horsfall.org 
> <mailto:dave@horsfall.org>> wrote:
>
>     On Fri, 16 Dec 2022, Dr Iain Maoileoin wrote:
>
>     > I remember running into a .asciz directive n the 70s
>     “somewhere”. It was
>     > an assembler directive in one of the RT11 systems??? or perhaps
>     the unix
>     > bootstrap and/or “.s” files - when I get some time I will go
>     read some
>     > old code/manuals.
>
>     MACRO-11 on RSX-11D seems to ring a bell...
>
>
> I first encountered it on RSTS/E 6C in the MACRO-11 it had... But the 
> v6 macro assembler from DEC via Harvard that eventually wound up in 
> 2BSD is older and dates to 1977 or so.
>
> Warner


[-- Attachment #2: Type: text/html, Size: 1944 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [TUHS] Re: origin of null-terminated strings
  2022-12-16 21:02       ` Warner Losh
  2022-12-16 21:13         ` Clem Cole
  2022-12-16 21:18         ` Luther Johnson
@ 2022-12-16 21:20         ` Dan Halbert
  2 siblings, 0 replies; 18+ messages in thread
From: Dan Halbert @ 2022-12-16 21:20 UTC (permalink / raw)
  To: tuhs

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

On 12/16/22 16:02, Warner Losh wrote:
> On Fri, Dec 16, 2022, 1:12 PM Dave Horsfall <dave@horsfall.org> wrote:
>
>     On Fri, 16 Dec 2022, Dr Iain Maoileoin wrote:
>
>     > I remember running into a .asciz directive in the 70s
>     “somewhere”. It was
>     > an assembler directive in one of the RT11 systems??? or perhaps
>     the unix
>     > bootstrap and/or “.s” files - when I get some time I will go
>     read some
>     > old code/manuals.
>
>     MACRO-11 on RSX-11D seems to ring a bell...
>
>
> I first encountered it on RSTS/E 6C in the MACRO-11 it had... But the 
> v6 macro assembler from DEC via Harvard that eventually wound up in 
> 2BSD is older and dates to 1977 or so.
>
> Warner

The PDP-10 manual I spoke of is from 1971, and there were older 
editions. For the PDP-7, this manual from 1965, 
http://www.bitsavers.org/pdf/dec/pdp7/PDP-7_AsmMan.pdf, printed pages 
38-40, does not mention "ASCIZ" specifically, but talks about assembler 
directives "TELETYPE" and "ANALEX" that add a "termination code" of 00 
octal, for characters.

DEC also used SIXBIT, a truncated ASCII code that had printing 
characters but no control characters, so no newline, etc. In that 
scheme, 00 octal was SPACE. Table here: 
https://en.wikipedia.org/wiki/Six-bit_character_code#Examples_of_six-bit_ASCII_variants.

Dan H

[-- Attachment #2: Type: text/html, Size: 2677 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [TUHS] Re: origin of null-terminated strings
  2022-12-16 21:13         ` Clem Cole
@ 2022-12-16 21:49           ` Clem Cole
  2022-12-17  0:26             ` Phil Budne
  0 siblings, 1 reply; 18+ messages in thread
From: Clem Cole @ 2022-12-16 21:49 UTC (permalink / raw)
  To: Warner Losh; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 1002 bytes --]

More info WRT to historical DEC usage ...
---------- Forwarded message ---------
From: Bob Supnik
Date: Fri, Dec 16, 2022 at 4:39 PM
Subject: Re: Origin of ASCIZ / null terminated char arrays.
To: Clem Cole


It wasn't in the PDP8. The PDP8 mostly used sixbit, the ASCII subset
between 40 and 137. The character was simply masked by 077, so that 100
(@) became 0 and could be used as the delimiter. PAL8 (in OS8) does not
have a text generation pseudo-op.
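
A minimal sketch of that masking, in Python rather than PDP-8 code. The
function name to_sixbit is made up for illustration, and the range check is
an added assumption, not something the hardware enforced:

```python
def to_sixbit(ch):
    """Mask an ASCII character from the 040..0137 subset down to six bits."""
    code = ord(ch)
    if not 0o40 <= code <= 0o137:
        # Assumption for this sketch: reject anything outside the subset.
        raise ValueError(f"{ch!r} is outside the sixbit subset 040..0137")
    return code & 0o77


# '@' is 0100 in ASCII, so masking with 077 leaves 0, which is why it
# could double as the delimiter.
for ch in "HI@":
    print(ch, oct(to_sixbit(ch)))
```

Note that space (040) keeps the value 040 under this scheme; only '@' maps
to zero.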

The PDP7 had a TEXT pseudo-op that <did> fill an extra word with 0s if
the string was a multiple of 3 characters. It supported FIODEC, BAUDOT,
and ANALEX encodings, but not ASCII.
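
The fill rule can be sketched roughly as follows. This is Python, not
assembler output; pack_text is a hypothetical name, the codes are assumed to
be 6-bit values already (the FIODEC/BAUDOT/ANALEX tables are not modelled),
and packing from the high-order end of the word is an assumption:

```python
def pack_text(codes, per_word=3, bits=6):
    """Pack 6-bit character codes into 18-bit words, three per word.

    A short final word is zero-filled.  If the codes fill the last word
    exactly, one extra all-zero word is appended, so some zero always
    follows the text.
    """
    mask = (1 << bits) - 1
    words = []
    for i in range(0, len(codes), per_word):
        group = list(codes[i:i + per_word])
        group += [0] * (per_word - len(group))  # zero-fill a short final word
        word = 0
        for code in group:
            word = (word << bits) | (code & mask)
        words.append(word)
    if len(codes) % per_word == 0:
        words.append(0)  # extra terminating word
    return words


print([oct(w) for w in pack_text([1, 2, 3])])  # three codes: extra zero word
print([oct(w) for w in pack_text([1, 2])])     # two codes: zero-filled word
```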

The PDP9 has both .SXBIT and .ASCII. The latter used two 18-bit words to
hold five 7bit ASCII characters. In both cases, words were zero-filled,
but an extra (word) of 0s was not added if the string was a multiple of
2/multiple of 5 characters.

The PDP11 had .ASCIZ, starting with Macro11 in 1972.

Tim can comment on the PDP10.

[-- Attachment #2: Type: text/html, Size: 1647 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [TUHS] Terms for string, and similar character constructs (was: origin of null-terminated strings)
       [not found] ` <6009124d-750d-365e-a424-ec7bb25922b9@gmail.com>
@ 2022-12-16 22:30   ` Alejandro Colomar
  2022-12-16 22:51     ` [TUHS] " Dave Horsfall
  0 siblings, 1 reply; 18+ messages in thread
From: Alejandro Colomar @ 2022-12-16 22:30 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: TUHS main list

[Resend from my subscribed address, as the list is subscribers-only, it seems]

In C, most syscalls and libc functions use strings, that is, zero or more 
non-NUL characters followed by a NUL.

However, there are a few cases where other incompatible character constructs are 
used.  A few examples:

-  utmpx(5): Some of its fields use fixed-width char arrays which contain a 
sequence of non-NUL characters, and padding of NULs to fill the rest (although 
some systems only require a NUL to delimit the padding, which can then contain 
garbage).

-  Some programs use just a pointer and a length to determine sequences of 
characters.  No NULs involved.

-  abstract sockets:  On Linux, abstract Unix socket names are stored in a 
fixed-width array, and all bytes are meaningful (up to the specified size), even 
if they are NULs.  The only special thing is that the first byte is NUL.
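
To make the differences concrete, here is a small sketch that models the
constructs above as Python bytes objects instead of C char arrays.  All
function names are made up for this example, and the 8-byte field width is
arbitrary:

```python
def from_c_string(buf):
    """String: the bytes up to, but not including, the first NUL."""
    end = buf.index(b"\0")  # raises ValueError if there is no terminator
    return buf[:end]


def to_padded_field(text, width):
    """Fixed-width field: NUL-padded; no NUL at all if text fills it."""
    if len(text) > width:
        raise ValueError("text does not fit in the field")
    return text.ljust(width, b"\0")


def from_padded_field(field):
    """Read a fixed-width field: stop at the first NUL or the field's end."""
    return field.split(b"\0", 1)[0]


def from_counted(buf, length):
    """Pointer-and-length: every byte counts, NULs included."""
    return buf[:length]


print(from_c_string(b"root\0garbage"))                  # terminator ends it
print(to_padded_field(b"root", 8))                      # padded out to 8 bytes
print(from_padded_field(to_padded_field(b"12345678", 8)))  # full, no NUL
print(from_counted(b"\0name\0x", 7))                    # abstract-socket style
```

The full-width case is the one that bites: a field that is exactly filled
has no terminator, so passing it to a function that expects a string reads
past the end.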

Since those are only rare cases, those constructs don't seem to have a name; 
some programmers call them strings (quite confusingly).

Has there been any de-facto standard (or informal naming) to call those things, 
and differentiate them?


Thanks,

Alex

-- 
<http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [TUHS] Re: Terms for string, and similar character constructs (was: origin of null-terminated strings)
  2022-12-16 22:30   ` [TUHS] Terms for string, and similar character constructs (was: origin of null-terminated strings) Alejandro Colomar
@ 2022-12-16 22:51     ` Dave Horsfall
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Horsfall @ 2022-12-16 22:51 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Fri, 16 Dec 2022, Alejandro Colomar wrote:

> [Resend from my subscribed address, as the list is subscribers-only, it seems]

Of course; open lists are spam magnets...  Regretful, but true.

-- Dave

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [TUHS] Re: origin of null-terminated strings
  2022-12-16 21:49           ` Clem Cole
@ 2022-12-17  0:26             ` Phil Budne
  0 siblings, 0 replies; 18+ messages in thread
From: Phil Budne @ 2022-12-17  0:26 UTC (permalink / raw)
  To: tuhs

> From: Bob Supnik
> Tim can comment on the PDP10.

MACRO10 (the DEC PDP-10 assembler) had the ASCIZ directive.

I don't see it in the May 1964 MACRO6 (PDP-6 assembler) document at:

http://bitsavers.trailing-edge.com/pdf/dec/pdp6/F-64MAS_MACRO6_Assembly_Program_May64.pdf

Nor the February 1965 version:
http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-TP-MAC-LM-FP-ACT01_MACRO-6_Assembly_Language_Feb65.pdf

But it does appear in the May 1965 MACRO-6 manual:

http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-TP-MAC-LM-FP_ACT02_MACRO-6_Assembly_Language_May65.pdf

Which has the fully trifurcated character packings:

ASCII/ASCIZ:	7 bit bytes, with the low order bit left over
		(set at the start of lines in files to indicate a Line
		Sequence Number metadata for line number based editors)
SIXBIT		"6-bit ASCII" -- ASCII characters 040 thru 0137
		stored as 00 thru 077 in six six-bit bytes
RADIX50		6 characters from a 40 (050) character character set
		(plus four flag bits) used to store symbol tables
		https://en.wikipedia.org/wiki/DEC_RADIX_50#36-bit_systems
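
A rough Python sketch of the first two packings into 36-bit words (RADIX50
is left out).  The function names are invented for this example, and padding
a short SIXBIT word with spaces is an assumption; this is not taken from the
MACRO-6 manual:

```python
def pack_ascii_word(chars):
    """Five 7-bit characters, left-justified; the low-order bit stays 0."""
    if len(chars) > 5:
        raise ValueError("at most five characters fit in one word")
    word = 0
    for c in chars.ljust(5, b"\0"):
        if c > 0o177:
            raise ValueError("not a 7-bit character")
        word = (word << 7) | c
    return word << 1  # leave the low-order bit over


def asciz(text):
    """Like ASCII, but at least one NUL character always follows the text."""
    data = text + b"\0"
    return [pack_ascii_word(data[i:i + 5]) for i in range(0, len(data), 5)]


def pack_sixbit_word(text):
    """Six characters from 040..0137, each stored as its code minus 040."""
    if len(text) > 6:
        raise ValueError("at most six characters fit in one word")
    word = 0
    for ch in text.ljust(6):  # assumption: pad with spaces, which encode as 00
        code = ord(ch)
        if not 0o40 <= code <= 0o137:
            raise ValueError(f"{ch!r} is outside 040..0137")
        word = (word << 6) | (code - 0o40)
    return word


print([oct(w) for w in asciz(b"HELLO")])  # five characters need a second word
print(oct(pack_sixbit_word("AB")))
```

A five-character ASCIZ string costs two words, because the terminating NUL
no longer fits in the first.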

And ASCIZ is used in listings of the PDP-6 "T.S. Executive" version
1.4 dated 8-18-65:

http://bitsavers.trailing-edge.com/pdf/dec/pdp6/tsExec1.4/COMCON.pdf

COMCON is "COMmand CONtrol" -- the top level command interpreter built
into the monitor (the file name was retained into the later days of
TOPS-10), and messages output to the user use ASCIZ directives.

And to tie the thread back (closer) to the list subject, the "sub
title" headers in the above assembler listing file are "T. HASTINGS
8-2-65" (who I believe is Tom Hastings), which also appears in many
other files, including the job scheduler:

http://bitsavers.trailing-edge.com/pdf/dec/pdp6/tsExec1.4/CLKCSS.pdf

*AND* T. Hastings also appears as an author of the CTSS scheduler:

https://softwarehistory.csse.rose-hulman.edu/index.php/ctss-scheduler/
(in the "Full Code" section):

          :R******TIME SHARING SCHEDULING ALGORITHM***********
          :R    T. Hastings and R. Daley
          :R    Minor Modifications by G. Schroeder when NEW
          :R    I/O Package Installed....Summer, 1965

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-12-17  0:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-16  3:02 [TUHS] origin of null-terminated strings Douglas McIlroy
2022-12-16  3:14 ` [TUHS] " Ken Thompson
2022-12-16  9:13   ` Dr Iain Maoileoin
2022-12-16 13:42     ` Dan Halbert
2022-12-16 16:10       ` Dan Cross
2022-12-16 16:22         ` Tom Lyon
2022-12-16 16:29         ` Jon Steinhart
2022-12-16 20:12     ` Dave Horsfall
2022-12-16 21:02       ` Warner Losh
2022-12-16 21:13         ` Clem Cole
2022-12-16 21:49           ` Clem Cole
2022-12-17  0:26             ` Phil Budne
2022-12-16 21:18         ` Luther Johnson
2022-12-16 21:20         ` Dan Halbert
2022-12-16  3:17 ` Steve Nickolas
2022-12-16 17:24 ` John P. Linderman
     [not found] ` <6009124d-750d-365e-a424-ec7bb25922b9@gmail.com>
2022-12-16 22:30   ` [TUHS] Terms for string, and similar character constructs (was: origin of null-terminated strings) Alejandro Colomar
2022-12-16 22:51     ` [TUHS] " Dave Horsfall
