* [TUHS] origin of null-terminated strings
@ 2022-12-16 3:02 Douglas McIlroy
  2022-12-16 3:14 ` [TUHS] " Ken Thompson
  ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Douglas McIlroy @ 2022-12-16 3:02 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: TUHS main list

I think this cited quote from
https://www.joelonsoftware.com/2001/12/11/ is urban legend.

    Why do C strings [have a terminating NUL]? It’s because the PDP-7
    microprocessor, on which UNIX and the C programming language were
    invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
    at the end.”

This assertion seems unlikely since neither C nor the library string
functions existed on the PDP-7. In fact the "terminating character" of
a string in the PDP-7 language B was the pair '*e'. A string was a
sequence of words, packed two characters per word. For odd-length
strings half of the final one-character word was effectively
NUL-padded as described below.

One might trace null termination to the original (1965) proposal for
ASCII, https://dl.acm.org/doi/10.1145/363831.363839. There the only
role specifically suggested for NUL is to "serve to accomplish time
fill or media fill." With character-addressable hardware (not the
PDP-7), it is only a small step from using NUL as terminal padding to
the convention of null termination in all cases.

Ken would probably know for sure whether there's any truth in the
attribution to ASCIZ.

Doug

* [TUHS] Re: origin of null-terminated strings
From: Ken Thompson @ 2022-12-16 3:14 UTC
  To: Douglas McIlroy; +Cc: Alejandro Colomar, TUHS main list

asciz -- this is the first time i heard of it.
doug -- yes.

* [TUHS] Re: origin of null-terminated strings
From: Dr Iain Maoileoin @ 2022-12-16 9:13 UTC
  To: Ken Thompson; +Cc: Douglas McIlroy, Alejandro Colomar, TUHS main list

ASCIZ
Lost in the mists of time in my mind.

I remember running into a .asciz directive in the 70s “somewhere”.
It was an assembler directive in one of the RT11 systems??? or perhaps
the unix bootstrap and/or “.s” files - when I get some time I will go
read some old code/manuals.

Yes, it put a null byte at the end of a string.

> On 16 Dec 2022, at 03:14, Ken Thompson <kenbob@gmail.com> wrote:
>
> asciz -- this is the first time i heard of it.
> doug -- yes.

* [TUHS] Re: origin of null-terminated strings
From: Dan Halbert @ 2022-12-16 13:42 UTC
  To: tuhs

ASCIZ was an assembler directive used for a number of different DEC
computers, and also the name for null-terminated strings. I learned it
for the PDP-10, but I'm sure it existed on other machines. It is in some
PDP-10 documentation I am looking at right now. Anyone who used DEC and
did assembly programming would have known about it. Various system calls
took ASCIZ strings.

* [TUHS] Re: origin of null-terminated strings
From: Dan Cross @ 2022-12-16 16:10 UTC
  To: Dan Halbert; +Cc: tuhs

On Fri, Dec 16, 2022 at 8:42 AM Dan Halbert <halbert@halwitz.org> wrote:
> ASCIZ was an assembler directive used for a number of different DEC
> computers, and also the name for null-terminated strings. I learned it
> for the PDP-10, but I'm sure it existed on other machines. It is in some
> PDP-10 documentation I am looking at right now. Anyone who used DEC and
> did assembly programming would have known about it. Various system calls
> took ASCIZ strings.

This raises something I've always been curious about. To what extent were
the Unix folks at Bell Labs already familiar with DEC systems before the
PDP-7?

It strikes me that much of the published work was centered around IBM and
GE systems (e.g., Ken's wonderful paper on regular expressions, and of
course the Multics work). Were there other Digital machines floating
around? I know a proposal was written to get a PDP-10 for operating
systems research, but it wasn't approved.

Relatedly, was any thought given to trying to get a 360 system?

On 12/16/22 04:13, Dr Iain Maoileoin wrote:
> ASCIZ
> Lost in the mists of time in my mind.

Origin, perhaps, but it exists in contemporary assemblers. Like most
sane people I try to avoid being in assembler for too long, but when
you're first turning on a machine it is useful to be able to squirt a
message out of the UART if something goes dramatically wrong, and the
directive is handy for that.

It seems to have made its way into Research assembler via BSD; it's in
locore.s in 8th Edition, for instance, but doesn't appear before that. The
"UNIX Assembler Manual" describes "String Statements" for the 7th
Edition assembler; strings are sequences of ASCII characters between
'<' and '>'. But it doesn't say that they're NUL terminated, and they are
not: adding the terminator was manual, via the familiar `\0` escape
sequence.

        - Dan C.

* [TUHS] Re: origin of null-terminated strings
From: Tom Lyon @ 2022-12-16 16:22 UTC
  To: Dan Cross; +Cc: tuhs

Re: getting a 360 - IBM and AT&T really hated each other, so 360s were
avoided for strategic reasons.

That said, they could not be practically avoided; Holmdel had a large
installation:
https://www.youtube.com/watch?v=HMYiktO0D64&ab_channel=AT%26TTechChannel

When Amdahl and UTS/UNIX came along, the Bell System was by far the
biggest customer.

* [TUHS] Re: origin of null-terminated strings
From: Jon Steinhart @ 2022-12-16 16:29 UTC
  To: tuhs

Dan Cross writes:
>
> This raises something I've always been curious about. To what extent were
> the Unix folks at Bell Labs already familiar with DEC systems before the
> PDP-7?

Well, I recall that there was a PDP-8 in the keypunch room on the 5th floor
of building 2. I believe that it was hooked to a card reader and printer so
that one could get a listing of a deck of cards without having to use the
computer center.

But that's probably not what you're asking about.

Jon

* [TUHS] Re: origin of null-terminated strings
From: Dave Horsfall @ 2022-12-16 20:12 UTC
  To: The Eunuchs Hysterical Society

On Fri, 16 Dec 2022, Dr Iain Maoileoin wrote:

> I remember running into a .asciz directive in the 70s “somewhere”. It was
> an assembler directive in one of the RT11 systems??? or perhaps the unix
> bootstrap and/or “.s” files - when I get some time I will go read some
> old code/manuals.

MACRO-11 on RSX-11D seems to ring a bell...

-- Dave

* [TUHS] Re: origin of null-terminated strings
From: Warner Losh @ 2022-12-16 21:02 UTC
  To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

On Fri, Dec 16, 2022, 1:12 PM Dave Horsfall <dave@horsfall.org> wrote:

> On Fri, 16 Dec 2022, Dr Iain Maoileoin wrote:
>
> > I remember running into a .asciz directive in the 70s “somewhere”. It was
> > an assembler directive in one of the RT11 systems??? or perhaps the unix
> > bootstrap and/or “.s” files - when I get some time I will go read some
> > old code/manuals.
>
> MACRO-11 on RSX-11D seems to ring a bell...
>

I first encountered it on RSTS/E 6C in the MACRO-11 it had... But the v6
macro assembler from DEC via Harvard that eventually wound up in 2BSD is
older and dates to 1977 or so.

Warner

* [TUHS] Re: origin of null-terminated strings
From: Clem Cole @ 2022-12-16 21:13 UTC
  To: Warner Losh; +Cc: The Eunuchs Hysterical Society

So I went to the oracle on much of DEC history ... -- this explains why Ken
never heard it.

---------- Forwarded message ---------
From: Timothe Litt
Date: Fri, Dec 16, 2022 at 3:40 PM
Subject: Re: Origin of ASCIZ / null terminated char arrays.
To: Clem Cole <clemc@ccc.com>

On 16-Dec-22 15:04, Clem Cole wrote:

> Do either of you know when it showed up in DEC assemblers? I remember it
> in Macro11 and Macro10, but I have to believe it was in the earlier
> machines? So far I have not found a reference to it in any of my PDP-8
> stuff (which is small) and I never had the docs for 6, 7 or 9 -- I assume
> Al K. has them on bitsavers - so I'm going to go poking around - but I
> thought I'd ask you two if you knew.
>
> Ken Thompson says he had never heard of it before, but he never used the
> DEC assemblers -- (he wrote their own on the Honeywell originally I
> believe).
>
> FWIW: B did not use null-terminated char arrays originally, but by the
> time dmr morphed B into newB then C, they had become standard. Like many,
> I had always thought Dennis picked them from the DEC assembler, but as Ken
> says - they never really used it. I was trying to figure out when they
> (null-terminated char arrays) started to become more standard and
> specifically the pseudo-op ASCIZ to create them.
>
> Tx
> Clem

It depends on if you require ASCII, or just character strings terminated by
a stop code...

The -11 has .asciz (as does VMS Macro,...); the -10 has ASCIZ. SIXBIT 0 is
a space, so you needed to know the length, oftentimes in words, so strip
trailing 00s.

The basic 8 assembler (PAL) didn't even have ASCII data.
http://www.bitsavers.org/pdf/dec/pdp8/software/DEC-08-ASAC-D_PAL-III_Symbolic_Assembler_Programming_Manual.pdf

Macro-8 does; the TEXT pseudo-op uses 00 as a stop code. (It also uses a
6-bit ASCII code.) " is a single character ASCII constant, but not used for
strings.
https://www.grc.com/pdp-8/docs/macro-8_programming_manual.pdf

The -15 has .ASCII and .SIXBIT, but no .ASCIZ.
http://bitsavers.informatik.uni-stuttgart.de/pdf/dec/pdp15/DEC-15-AMZA-D_MACRO15.pdf

Probably of most interest to the Unix history, the PDP-7 assembler's TEXT
pseudo-op: "in order to separate the string from other data following it, a
termination code determined by the character mode is inserted automatically
after the last character code of the string"...
http://www.bitsavers.org/pdf/dec/pdp7/PDP-7_AsmMan.pdf

I don't remember and/or didn't use the earlier assemblers, but many of the
manuals are on bitsavers.

Both NUL and RUBOUT (a.k.a. DELETE) were used as fill characters to cover
the time teletypes take to execute <CR> and <LF>. You couldn't represent
the NUL version with ASCIZ, and RUBOUT was picked for the ability to
overpunch paper tape typos. Neither function, nor the use of NUL as an end
of string marker is in the ASCII standard, IIRC.

* [TUHS] Re: origin of null-terminated strings
From: Clem Cole @ 2022-12-16 21:49 UTC
  To: Warner Losh; +Cc: The Eunuchs Hysterical Society

More info WRT historical DEC usage ...

---------- Forwarded message ---------
From: Bob Supnik
Date: Fri, Dec 16, 2022 at 4:39 PM
Subject: Re: Origin of ASCIZ / null terminated char arrays.
To: Clem Cole

It wasn't in the PDP8. The PDP8 mostly used sixbit, the ASCII subset
between 40 and 137. The character was simply masked by 077, so that 100 (@)
became 0 and could be used as the delimiter. PAL8 (in OS8) does not have a
text generation pseudo-op.

The PDP7 had a TEXT pseudo-op that <did> fill an extra word with 0s if the
string was a multiple of 3 characters. It supported FIODEC, BAUDOT, and
ANALEX encodings, but not ASCII.

The PDP9 has both .SXBIT and .ASCII. The latter used two 18-bit words to
hold five 7-bit ASCII characters. In both cases, words were zero-filled,
but an extra (word) of 0s was not added if the string was a multiple of
2/multiple of 5 characters.

The PDP11 had .ASCIZ, starting with Macro11 in 1972.

Tim can comment on the PDP10.

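A minimal sketch of the PDP-8 masking trick described in Bob Supnik's note, assuming plain ASCII input. The function names `pdp8_sixbit` and `encode` are hypothetical and only illustrate the arithmetic; they do not correspond to any DEC tool.

```python
def pdp8_sixbit(ch):
    """Mask one ASCII character from the subset 040..0137 (octal) to six bits."""
    code = ord(ch)
    if not 0o40 <= code <= 0o137:
        raise ValueError(f"{ch!r} is outside the sixbit subset 040..0137")
    # Masking by 077 keeps the low six bits; '@' (0100) therefore becomes 0.
    return code & 0o77


def encode(text):
    """Encode a string character by character (hypothetical helper)."""
    return [pdp8_sixbit(ch) for ch in text]


if __name__ == "__main__":
    # '@' masks to 0, which is why it could serve as the delimiter.
    print([oct(c) for c in encode("HI@")])
```

Under this masking, space (040) keeps the value 040 and only '@' maps to zero, so a zero code can mark the end of text without colliding with any other character in the subset.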
* [TUHS] Re: origin of null-terminated strings
From: Phil Budne @ 2022-12-17 0:26 UTC
  To: tuhs

> From: Bob Supnik
> Tim can comment on the PDP10.

MACRO10 (the DEC PDP-10 assembler) had the ASCIZ directive. I don't
see it in the May 1964 MACRO6 (PDP-6 assembler) document at:
http://bitsavers.trailing-edge.com/pdf/dec/pdp6/F-64MAS_MACRO6_Assembly_Program_May64.pdf

Nor the February 1965 version:
http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-TP-MAC-LM-FP-ACT01_MACRO-6_Assembly_Language_Feb65.pdf

But it does appear in the May 1965 MACRO-6 manual:
http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-TP-MAC-LM-FP_ACT02_MACRO-6_Assembly_Language_May65.pdf

Which has the fully trifurcated character packings:

    ASCII/ASCIZ: 7 bit bytes, with the low order bit left over
            (set at the start of lines in files to indicate a
            Line Sequence Number metadata for line number based editors)
    SIXBIT  "6-bit ASCII" -- ASCII characters 040 thru 0137
            stored as 00 thru 077 in six six bit bytes
    RADIX50 6 characters from a 40 (050) character character set
            (plus four flag bits) used to store symbol tables
            https://en.wikipedia.org/wiki/DEC_RADIX_50#36-bit_systems

And ASCIZ is used in listings of the PDP-6 "T.S. Executive" version 1.4
dated 8-18-65:
http://bitsavers.trailing-edge.com/pdf/dec/pdp6/tsExec1.4/COMCON.pdf

COMCON is "COMmand CONtrol" -- the top level command interpreter
built into the monitor (the file name was retained into the later days
of TOPS-10), and messages output to the user use ASCIZ directives.

And to tie the thread back (closer) to the list subject, the "sub
title" headers in the above assembler listing file are "T. HASTINGS
8-2-65" (who I believe is Tom Hastings), which also appears in many
other files, including the job scheduler:
http://bitsavers.trailing-edge.com/pdf/dec/pdp6/tsExec1.4/CLKCSS.pdf

*AND* T. Hastings also appears as an author of the CTSS scheduler:
https://softwarehistory.csse.rose-hulman.edu/index.php/ctss-scheduler/
(in the "Full Code" section):

    :R******TIME SHARING SCHEDULING ALGORITHM***********
    :R T. Hastings and R. Daley
    :R Minor Modifications by G. Schroeder when NEW
    :R I/O Package Installed....Summer, 1965

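The ASCII/ASCIZ and SIXBIT packings listed in Phil Budne's message can be sketched in a few lines, treating a 36-bit word as a Python integer. The function names are hypothetical. The terminator handling is an assumption made for this sketch: a NUL is always appended before padding, so a string whose length is a multiple of five spills into an extra all-zero word. RADIX50 is left out.

```python
def pack_asciz(text):
    """Pack 7-bit characters five per 36-bit word, left-justified (sketch)."""
    codes = [ord(ch) for ch in text]
    if any(code == 0 or code > 0o177 for code in codes):
        raise ValueError("expects non-NUL 7-bit ASCII")
    codes.append(0)                  # assumed: the terminating NUL is always present
    while len(codes) % 5:
        codes.append(0)              # zero-fill the rest of the last word
    words = []
    for i in range(0, len(codes), 5):
        word = 0
        for code in codes[i:i + 5]:
            word = (word << 7) | code
        words.append(word << 1)      # 35 bits used; the low-order bit is left over
    return words


def pack_sixbit(text):
    """Pack ASCII 040..0137 as 00..077, six characters per 36-bit word (sketch)."""
    codes = [ord(ch) - 0o40 for ch in text]
    if any(not 0 <= code <= 0o77 for code in codes):
        raise ValueError("expects characters in the range 040..0137")
    while len(codes) % 6:
        codes.append(0)              # 00 is SPACE here, so padding reads as blanks
    words = []
    for i in range(0, len(codes), 6):
        word = 0
        for code in codes[i:i + 6]:
            word = (word << 6) | code
        words.append(word)
    return words


if __name__ == "__main__":
    print([f"{w:012o}" for w in pack_asciz("HELLO")])
    print([f"{w:012o}" for w in pack_sixbit("AB")])
```

Because SIXBIT stores space as 00, a zero code cannot double as an end marker there, which is consistent with the remark elsewhere in the thread that SIXBIT users needed to know the length.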
* [TUHS] Re: origin of null-terminated strings
From: Luther Johnson @ 2022-12-16 21:18 UTC
  To: tuhs

I used RT-11 versions 4 and 5, and I seem to remember the MACRO-11 there
had .ASCIZ.

* [TUHS] Re: origin of null-terminated strings
From: Dan Halbert @ 2022-12-16 21:20 UTC
  To: tuhs

On 12/16/22 16:02, Warner Losh wrote:
> I first encountered it on RSTS/E 6C in the MACRO-11 it had... But the
> v6 macro assembler from DEC via Harvard that eventually wound up in
> 2BSD is older and dates to 1977 or so.
>
> Warner

The PDP-10 manual I spoke of is from 1971, and there were older editions.

For the PDP-7, this manual from 1965,
http://www.bitsavers.org/pdf/dec/pdp7/PDP-7_AsmMan.pdf, printed pages
38-40, does not mention "ASCIZ" specifically, but talks about assembler
directives "TELETYPE" and "ANALEX" that add a "termination code" of 00
octal, for characters.

DEC also used SIXBIT, a truncated ASCII code that had printing characters
but no control characters, so no newline, etc. In that scheme, 00 octal was
SPACE. Table here:
https://en.wikipedia.org/wiki/Six-bit_character_code#Examples_of_six-bit_ASCII_variants.

Dan H

* [TUHS] Re: origin of null-terminated strings
From: Steve Nickolas @ 2022-12-16 3:17 UTC
  To: TUHS main list

For what it's worth, when I code for the Apple //e (using 65C02
assembler), I use C strings. I can just do something like

    prstr:  ldy     #$00
    @1:     lda     msg, y
            beq     @2              ; string terminator
            ora     #$80            ; firmware wants high bit on
            jsr     $FDED           ; write char
            iny
            bne     @1
    @2:     rts
    msg:    .byte   "Hello, cruel world.", 13, 0

and using a NUL terminator just makes sense here because of how simple it
is to check for (BEQ and BNE check the 6502's zero flag, which LDA
automatically sets).

-uso.

* [TUHS] Re: origin of null-terminated strings
  2022-12-16 3:02 [TUHS] origin of null-terminated strings Douglas McIlroy
  2022-12-16 3:14 ` [TUHS] " Ken Thompson
  2022-12-16 3:17 ` Steve Nickolas
@ 2022-12-16 17:24 ` John P. Linderman
  [not found] ` <6009124d-750d-365e-a424-ec7bb25922b9@gmail.com>
  3 siblings, 0 replies; 18+ messages in thread
From: John P. Linderman @ 2022-12-16 17:24 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: Alejandro Colomar, TUHS main list

[-- Attachment #1: Type: text/plain, Size: 4313 bytes --]

Suppose you have two strings of 8-bit bytes, and you'd like to compare them
lexicographically (left to right, byte by byte). An oracle tells you the
length of the strings, so maybe you have

3 ram
4 ramp

You can just do an strncmp on the two strings, using the minimum of the two
lengths (3 in the example). If they differ (they didn't), you are done. If
the strings are of the same length (they aren't), they are equal.
Otherwise, the shorter (a prefix of the longer) compares low. Ho-hum.

Suppose each comparand is a sequence of such strings, and you want to break
ties on initial components of such sequences using subsequent components
(if any). But you have to combine them as a single string and the oracle
only tells you the total length. You can't just concatenate them together,
or

(3 ram, 4 part) => 7 rampart
(4 ramp, 3 art) => 7 rampart
(7 rampart) => 7 rampart

and they all look equal, but they're not supposed to be. The problem is
that some components are proper prefixes of the corresponding component. We
can sneak past the end of one component and compare bytes from different
components, something that cannot be allowed.

A collection of components is said to have "the prefix property" if no
component is a proper prefix of any other component. If there is some byte
that cannot occur in some component, we can tack that byte onto the
component, and the components will then have the prefix property.
Newline terminated lines have the prefix property, but we cannot just
concatenate such components and do an strncmp, because newlines compare low
to most bytes, but high to some legitimate bytes, like tabs, so they don't
break ties correctly.

Null terminators to the rescue! These induce the prefix property, and
compare low to everything else. Using blanks to stand in for hard-to-parse
null bytes, we get

(4 ram , 5 part ) => 9 ram part 
(5 ramp , 4 art ) => 9 ramp art 
(8 rampart ) => 8 rampart 

The strncmp on the results establishes the desired order. The null
terminator on the final component is optional. The oracular length
determines how much of the component is significant. But you have to be
consistent. The final component must always, or never, have the "optional"
terminator.

Null-terminated strings (which cannot, themselves, contain a null) are not
the only components having the prefix property. Fixed length strings
cannot, by definition, include proper prefixes, and they are free to
contain any byte. UTF-8 code-points have the prefix property (I suspect
this was no accident), so the null-terminated concatenation of non-null
UTF-8 code-points has the prefix property and breaks ties appropriately
(assuming that the code-points themselves establish the correct order for
what is being compared).

I doubt that this explains the use of null terminated strings, but for
those of us who spend too much time thinking about sorting, they sure work
well.

On Thu, Dec 15, 2022 at 10:04 PM Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:

> I think this cited quote from
> https://www.joelonsoftware.com/2001/12/11/ is urban legend.
>
>    Why do C strings [have a terminating NUl]? It’s because the PDP-7
> microprocessor, on which UNIX and the C programming language were
> invented, had an ASCIZ string type. ASCIZ meant “ASCII with a Z (zero)
> at the end.”
>
> This assertion seems unlikely since neither C nor the library string
> functions existed on the PDP-7. In fact the "terminating character" of
> a string in the PDP-7 language B was the pair '*e'. A string was a
> sequence of words, packed two characters per word. For odd-length
> strings half of the final one-character word was effectively
> NUL-padded as described below.
>
> One might trace null termination to the original (1965) proposal for
> ASCII, https://dl.acm.org/doi/10.1145/363831.363839. There the only
> role specifically suggested for NUL is to "serve to accomplish time
> fill or media fill." With character-addressable hardware (not the
> PDP-7), it is only a small step from using NUL as terminal padding to
> the convention of null termination in all cases.
>
> Ken would probably know for sure whether there's any truth in the
> attribution to ASCIZ.
>
> Doug
>

[-- Attachment #2: Type: text/html, Size: 5367 bytes --]

^ permalink raw reply [flat|nested] 18+ messages in thread
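The composite-key scheme John describes can be sketched in C along the following lines. This is a minimal illustration under stated assumptions, not code from the thread: `build_key` and `compare_keys` are hypothetical names, and the comparison uses `memcmp` rather than the `strncmp` mentioned in the message, because C's `strncmp` stops comparing at the first NUL and so would never look past a tied first component.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append each component followed by a NUL terminator to out, and
   return the total length (the length the "oracle" would report).
   cap is the capacity of out; the assert is only a guard for this
   sketch. */
static size_t build_key(char *out, size_t cap,
                        const char *const *parts, size_t nparts)
{
    size_t len = 0;
    for (size_t i = 0; i < nparts; i++) {
        size_t n = strlen(parts[i]);
        assert(len + n + 1 <= cap);
        memcpy(out + len, parts[i], n);
        len += n;
        out[len++] = '\0';   /* terminator gives the prefix property */
    }
    return len;
}

/* Compare two composite keys byte by byte over the shorter length;
   if that prefix is equal, the shorter key compares low. */
static int compare_keys(const char *a, size_t alen,
                        const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int c = memcmp(a, b, n);
    if (c != 0)
        return c;
    return (alen > blen) - (alen < blen);
}
```

With the keys from the message, ("ram", "part") becomes the 9 bytes `ram\0part\0` and ("ramp", "art") becomes `ramp\0art\0`; the first difference is a NUL against `p`, so the first key compares low, as intended. This sketch always writes the terminator on the final component, which is one of the two consistent choices the message allows.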
[parent not found: <6009124d-750d-365e-a424-ec7bb25922b9@gmail.com>]
* [TUHS] Terms for string, and similar character constructs (was: origin of null-terminated strings)
  [not found] ` <6009124d-750d-365e-a424-ec7bb25922b9@gmail.com>
@ 2022-12-16 22:30 ` Alejandro Colomar
  2022-12-16 22:51 ` [TUHS] " Dave Horsfall
  0 siblings, 1 reply; 18+ messages in thread
From: Alejandro Colomar @ 2022-12-16 22:30 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: TUHS main list

[Resend from my subscribed address, as the list is subscribers-only, it seems]

In C, most syscalls and libc functions use strings, that is, zero or more
non-NUL characters followed by a NUL.

However, there are a few cases where other incompatible character constructs
are used. A few examples:

- utmpx(5): Some of its fields use fixed-width char arrays which contain a
  sequence of non-NUL characters, and padding of NULs to fill the rest
  (although some systems only require a NUL to delimit the padding, which can
  then contain garbage).

- Some programs use just a pointer and a length to determine sequences of
  characters. No NULs involved.

- abstract sockets: On Linux, abstract Unix socket names are stored in a
  fixed-width array, and all bytes are meaningful (up to the specified size),
  even if they are NULs. The only special thing is that the first byte is NUL.

Since those are only rare cases, those constructs don't seem to have a name;
some programmers call them strings (quite confusingly). Has there been any
de facto standard (or informal) name for those things, to differentiate them?

Thanks,

Alex

--
<http://www.alejandro-colomar.es/>

^ permalink raw reply [flat|nested] 18+ messages in thread
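To make the three constructs listed above concrete, here is a small C sketch. It is illustrative only: `FIELD_WIDTH`, `field_set`, `field_len`, `struct char_span`, `span_equal` and `abstract_addr` are hypothetical names chosen for this example, the field width of 8 is arbitrary, and the address helper only fills in a `struct sockaddr_un` without creating a socket.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* 1. Fixed-width, NUL-padded field (utmpx-style). The width is an
      arbitrary value for this sketch. */
enum { FIELD_WIDTH = 8 };

static void field_set(char field[FIELD_WIDTH], const char *s)
{
    /* strncpy pads the remainder with NULs, but writes no terminator
       when s has FIELD_WIDTH or more characters. */
    strncpy(field, s, FIELD_WIDTH);
}

static size_t field_len(const char field[FIELD_WIDTH])
{
    /* Never read past the field: a full field has no NUL at all. */
    const char *nul = memchr(field, '\0', FIELD_WIDTH);
    return nul ? (size_t)(nul - field) : (size_t)FIELD_WIDTH;
}

/* 2. Pointer plus length: NUL is just another byte. */
struct char_span {
    const char *ptr;
    size_t len;
};

static int span_equal(struct char_span a, struct char_span b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}

/* 3. Linux abstract socket name: sun_path[0] is NUL and the following
      name_len bytes are all significant. Returns the address length to
      pass to bind() or connect(), or 0 if the name does not fit. */
static socklen_t abstract_addr(struct sockaddr_un *addr,
                               const void *name, size_t name_len)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    if (name_len > sizeof addr->sun_path - 1)
        return 0;
    memcpy(addr->sun_path + 1, name, name_len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path)
                       + 1 + name_len);
}
```

None of the three can safely be passed to functions such as `strlen` or `strcmp`, which is the incompatibility the message is pointing at: the first may lack a terminator, and the other two may contain NULs in the middle.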
* [TUHS] Re: Terms for string, and similar character constructs (was: origin of null-terminated strings)
  2022-12-16 22:30 ` [TUHS] Terms for string, and similar character constructs (was: origin of null-terminated strings) Alejandro Colomar
@ 2022-12-16 22:51 ` Dave Horsfall
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Horsfall @ 2022-12-16 22:51 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Fri, 16 Dec 2022, Alejandro Colomar wrote:

> [Resend from my subscribed address, as the list is subscribers-only, it seems]

Of course; open lists are spam magnets... Regretful, but true.

-- Dave

^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-12-17 0:27 UTC | newest]

Thread overview: 18+ messages
2022-12-16 3:02 [TUHS] origin of null-terminated strings Douglas McIlroy
2022-12-16 3:14 ` [TUHS] " Ken Thompson
2022-12-16 9:13 ` Dr Iain Maoileoin
2022-12-16 13:42 ` Dan Halbert
2022-12-16 16:10 ` Dan Cross
2022-12-16 16:22 ` Tom Lyon
2022-12-16 16:29 ` Jon Steinhart
2022-12-16 20:12 ` Dave Horsfall
2022-12-16 21:02 ` Warner Losh
2022-12-16 21:13 ` Clem Cole
2022-12-16 21:49 ` Clem Cole
2022-12-17 0:26 ` Phil Budne
2022-12-16 21:18 ` Luther Johnson
2022-12-16 21:20 ` Dan Halbert
2022-12-16 3:17 ` Steve Nickolas
2022-12-16 17:24 ` John P. Linderman
[not found] ` <6009124d-750d-365e-a424-ec7bb25922b9@gmail.com>
2022-12-16 22:30 ` [TUHS] Terms for string, and similar character constructs (was: origin of null-terminated strings) Alejandro Colomar
2022-12-16 22:51 ` [TUHS] " Dave Horsfall