The Unix Heritage Society mailing list
* [TUHS] Line Terminators in Text Files
@ 2017-09-04 18:32 Doug McIlroy
  2017-09-05 13:02 ` Random832
  0 siblings, 1 reply; 10+ messages in thread
From: Doug McIlroy @ 2017-09-04 18:32 UTC (permalink / raw)



> When did LF replace CR/LF in UNIX?


Never. Unix always took LF as newline--an interpretation
blessed by the ASCII standard. The convention was
inherited from Multics.

Interpolation of CRs was the business of drivers, not file
formats.
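A minimal sketch of that division of labor, not code from any Unix driver: files hold bare LF, and a hypothetical terminal layer adds or maps CRs at the edge. The function names are invented for illustration; the behavior resembles what the modern termios flags ONLCR and ICRNL do.

```python
def to_terminal(data: bytes) -> bytes:
    """Output side: send CR LF to the terminal for every LF in the file."""
    return data.replace(b"\n", b"\r\n")


def from_terminal(data: bytes) -> bytes:
    """Input side: treat the CR typed at the keyboard as a newline (LF)."""
    return data.replace(b"\r", b"\n")


# The file on disk never contains a CR; only the wire to the terminal does.
stored = b"one\ntwo\n"
on_the_wire = to_terminal(stored)
```

The stored bytes stay the same whatever terminal is attached, which is the point of keeping CRs out of the file format.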

As far as I know, the only CR/LF terminal that original
Unix dealt with was the model 33 console. That was identified
by the fact that the login name was received in all caps.
IIRC the TTY 37 conformed to Multics practice on the advice
of Joe Ossanna.

Because of the model 33, login names were case-insensitive.
Come to think of it, I don't know whether they still are
in general, though they must be for email to be delivered by
login name.

Doug



* [TUHS] Line Terminators in Text Files
  2017-09-04 18:32 [TUHS] Line Terminators in Text Files Doug McIlroy
@ 2017-09-05 13:02 ` Random832
  2017-09-05 13:18   ` Steffen Nurpmeso
  2017-09-05 14:46   ` Clem Cole
  0 siblings, 2 replies; 10+ messages in thread
From: Random832 @ 2017-09-05 13:02 UTC (permalink / raw)


On Mon, Sep 4, 2017, at 14:32, Doug McIlroy wrote:
> Because of the model 33, login names were case-insensitive.
> Come to think of it, I don't know whether they still are
> in general, though they must be for email to be delivered by
> login name.

Looking at the code for getty in various versions, it looks like 4.2BSD
and System V changed this to "if the name was entered in all uppercase,
convert it to lowercase, else leave it alone" - 4.1cBSD and System III
(and earlier) had it unconditionally convert to lowercase, detecting
whether it was entered in all uppercase in order to set the tty mode. 
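The two behaviors can be sketched roughly as follows. This is a Python paraphrase of the description above, not the getty source; the function names are made up, and "all uppercase" is assumed to mean at least one uppercase letter and no lowercase ones.

```python
from typing import Tuple


def is_all_upper(name: str) -> bool:
    """Assumed test: some uppercase letter present, no lowercase letter."""
    return any(c.isupper() for c in name) and not any(c.islower() for c in name)


def older_getty(name: str) -> Tuple[str, bool]:
    """4.1cBSD / System III and earlier, as described: always lowercase the
    name; the all-caps test only decides the tty mode (second value)."""
    return name.lower(), is_all_upper(name)


def newer_getty(name: str) -> str:
    """4.2BSD / System V, as described: lowercase only an all-caps name,
    otherwise leave the name alone."""
    return name.lower() if is_all_upper(name) else name
```

Under this reading a mixed-case name such as "McIlroy" comes out as "mcilroy" from the older logic and unchanged from the newer one, while "DOUG" becomes "doug" in both.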

As for email, the email standards don't make any guarantee that the
username portion of addresses will be case insensitive. From what I can
find, sendmail by default converts all characters in the username
portion to lowercase (thus allowing email addresses to be
case-insensitive for delivery to local usernames that must be in all
lowercase), and has an option to treat it as case-sensitive instead.

(All of this can probably be rounded down to "Usernames should be
defined as all-lowercase")



* [TUHS] Line Terminators in Text Files
  2017-09-05 13:02 ` Random832
@ 2017-09-05 13:18   ` Steffen Nurpmeso
  2017-09-05 14:46   ` Clem Cole
  1 sibling, 0 replies; 10+ messages in thread
From: Steffen Nurpmeso @ 2017-09-05 13:18 UTC (permalink / raw)


Random832 <random832 at fastmail.com> wrote:
 |On Mon, Sep 4, 2017, at 14:32, Doug McIlroy wrote:
 |> Because of the model 33, login names were case-insensitive.
 |> Come to think of it, I don't know whether they still are
 |> in general, though they must be for email to be delivered by
 |> login name.
 |
 |Looking at the code for getty in various versions, it looks like 4.2BSD
 |and System V changed this to "if the name was entered in all uppercase,
 |convert it to lowercase, else leave it alone" - 4.1cBSD and System III
 |(and earlier) had it unconditionally convert to lowercase, detecting
 |whether it was entered in all uppercase in order to set the tty mode. 
 |
 |As for email, the email standards don't make any guarantee that the
 |username portion of addresses will be case insensitive. From what I can
 |find, sendmail by default converts all characters in the username
 |portion to lowercase (thus allowing email addresses to be
 |case-insensitive for delivery to local usernames that must be in all
 |lowercase), and has an option to treat it as case-sensitive instead.
 |
 |(All of this can probably be rounded down to "Usernames should be
 |defined as all-lowercase")

If i recall correctly the email standards go for case-preserving
all through, to leave it up to the destination to apply whatever
is appropriate locally.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



* [TUHS] Line Terminators in Text Files
  2017-09-05 13:02 ` Random832
  2017-09-05 13:18   ` Steffen Nurpmeso
@ 2017-09-05 14:46   ` Clem Cole
  2017-09-05 15:44     ` Random832
  1 sibling, 1 reply; 10+ messages in thread
From: Clem Cole @ 2017-09-05 14:46 UTC (permalink / raw)



“We're doomed to repeat the past no matter what. That's what it is to be
alive. It's pretty dense kids who haven't figured that out by the time
they're ten.... Most kids can't afford to go to Harvard and be
misinformed.”
― Kurt Vonnegut Jr.
<https://www.goodreads.com/author/show/2778055.Kurt_Vonnegut_Jr_>, Bluebeard
<https://www.goodreads.com/work/quotes/6582745>

On Tue, Sep 5, 2017 at 9:02 AM, Random832 <random832 at fastmail.com> wrote:

>
> As for email, the email standards don't make any guarantee that the
> username portion of addresses will be case insensitive.

Actually they do. RFC 733 was not quite as crisp as RFC 822, which replaced it,
but even with 733 the intention was pretty clear and examples described
"case independence of certain special atoms."   With 822, Crocker cleaned
up the language to be simply:

From RFC 822, Section 3.4.7 CASE INDEPENDENCE

Except as noted, alphabetic strings may be represented in any combination
of upper and lower case. The only syntactic units which requires
preservation of case information are:

   - text
   - qtext
   - dtext
   - ctext
   - quoted-pair
   - local-part, except "Postmaster"

   When matching any other syntactic unit, case is to be ignored.

* [TUHS] Line Terminators in Text Files
  2017-09-05 14:46   ` Clem Cole
@ 2017-09-05 15:44     ` Random832
  2017-09-05 16:05       ` Clem Cole
  0 siblings, 1 reply; 10+ messages in thread
From: Random832 @ 2017-09-05 15:44 UTC (permalink / raw)


On Tue, Sep 5, 2017, at 10:46, Clem Cole wrote:
> Except as noted, alphabetic strings may be represented in any combination
> of upper and lower case. The only syntactic units which requires
> preservation of case information are:
>
>    - local-part, except "Postmaster"

So... the username portion.



* [TUHS] Line Terminators in Text Files
  2017-09-05 15:44     ` Random832
@ 2017-09-05 16:05       ` Clem Cole
  2017-09-05 16:10         ` Random832
  0 siblings, 1 reply; 10+ messages in thread
From: Clem Cole @ 2017-09-05 16:05 UTC (permalink / raw)



On Tue, Sep 5, 2017 at 11:44 AM, Random832 <random832 at fastmail.com> wrote:

> On Tue, Sep 5, 2017, at 10:46, Clem Cole wrote:
> > Except as noted, alphabetic strings may be represented in any combination
> > of upper and lower case. The only syntactic units which requires
> > preservation of case information are:
> >
> >    - local-part, except "Postmaster"
>
> So... the username portion.
>
so the "postmaster" (username) does not preserve case by this rule.

It had to work that way, because CDC machines in particular in those days
had very funky character sets (lots of them actually).  IBM's were not much
better.   Remember, IBM was the primary driver behind ASCII (the System 360
was supposed to be IBM's first ASCII system).

Upper and lower case were very much a luxury because bits were expensive, not
just in registers, but in main memory and disk storage.

I think it's hard for modern users to really understand the extremes that
programmers went to in those days because so much was done to encode things in
small numeric codes.   This was just another example of it.

The 8-bit 'byte' is only so because Fred Brooks kept throwing Gene Amdahl
out of his office during the 360 project.   Gene thought anything over 6 bits
was a waste.   Fred said if it was not a power of 2 don't come back, he
could not program with it.

Clem

* [TUHS] Line Terminators in Text Files
  2017-09-05 16:05       ` Clem Cole
@ 2017-09-05 16:10         ` Random832
  0 siblings, 0 replies; 10+ messages in thread
From: Random832 @ 2017-09-05 16:10 UTC (permalink / raw)



On Tue, Sep 5, 2017, at 12:05, Clem Cole wrote:
> On Tue, Sep 5, 2017 at 11:44 AM, Random832 <random832 at fastmail.com>
> wrote:
> 
> > On Tue, Sep 5, 2017, at 10:46, Clem Cole wrote:
> > > Except as noted, alphabetic strings may be represented in any combination
> > > of upper and lower case. The only syntactic units which requires
> > > preservation of case information are:
> > >
> > >    - local-part, except "Postmaster"
> >
> > So... the username portion.
> >
> so the "postmaster" (username) does not preserve case by this rule.

Yes, but, say, "clemc" or "random832" or "tuhs" does (the endpoint
system might not actually care, but other systems are not free to
capitalize it in the assumption that it won't matter). Postmaster, in
particular, is an exception to the exception.
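One way to read the rule, sketched in Python: the domain compares without regard to case, the local-part compares exactly, and "Postmaster" is the one local-part that matches in any case. The function name and the example.com addresses are made up for illustration, and the sketch assumes each address contains an "@".

```python
def same_mailbox(a: str, b: str) -> bool:
    """Compare two addresses the way a relaying system would have to."""
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    # Domains are an ordinary syntactic unit: case is ignored.
    if domain_a.lower() != domain_b.lower():
        return False
    # "Postmaster" is the exception to the exception.
    if local_a.lower() == "postmaster" and local_b.lower() == "postmaster":
        return True
    # Any other local-part must be preserved exactly.
    return local_a == local_b
```

A destination that lowercases local names on delivery, as sendmail is described as doing by default, is making a local choice; this comparison only covers what an intermediate system may assume.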



* [TUHS] Line Terminators in Text Files
  2017-09-06 18:30 ` William Cheswick
@ 2017-09-07  9:02   ` Mutiny 
  0 siblings, 0 replies; 10+ messages in thread
From: Mutiny  @ 2017-09-07  9:02 UTC (permalink / raw)


From: William Cheswick <ches@cheswick.com>
Sent: Thu, 07 Sep 2017 00:01:35...
> My favorite line in the Model 33 TTY repair manual:
> “The resistor will act as a fuse.”

really? ;-)

BTW u made my day.

* [TUHS] Line Terminators in Text Files
  2017-09-06 15:24 Doug McIlroy
@ 2017-09-06 18:30 ` William Cheswick
  2017-09-07  9:02   ` Mutiny 
  0 siblings, 1 reply; 10+ messages in thread
From: William Cheswick @ 2017-09-06 18:30 UTC (permalink / raw)



I remember the TTYs quite distinctly: they were my first computer interfaces.  On the model
33 and 35 teletypes, a carriage return and a single rubout were sufficient padding, even if
the carriage was on the far right.  The rubout character was distinctive, and you could spot
line separators fairly easily on the punched paper tape.  (Where did all my paper tapes go?)

My favorite line in the Model 33 TTY repair manual:
“The resistor will act as a fuse.”

ches




* [TUHS] Line Terminators in Text Files
@ 2017-09-06 15:24 Doug McIlroy
  2017-09-06 18:30 ` William Cheswick
  0 siblings, 1 reply; 10+ messages in thread
From: Doug McIlroy @ 2017-09-06 15:24 UTC (permalink / raw)


> does anyone know anything about the 1961 DoD 8-bit
> character set standard it refers to?
>
> This does not appear to say anything about LF vs "Newline" (as either a
> name or a function), though the 1986 version of ASCII deprecates it, so
> was most likely acknowledged in versions between these in response to
> practices on OSes such as Multics. ECMA-6:1973 acknowledges it

I wouldn't say the "practices" of Multics influenced the recognition
of NL in the ASCII standard, for Multics didn't go into use until
1970, while NL was specified by 1965 (see below) with direct
reference to the properties of equipment, not operating systems.
Just what equipment, I don't know. IBM Selectric perhaps?

I recall Multics discussions that specifically cited the standard,
in particular Joe Ossanna's liaison between Multics and the TTY 37
design team at Western Electric, circa 1967. Thus it is my
understanding that Multics was an early adopter of ASCII's NL
convention, not an influencer of it.

Quotation from "Proposed revised American standard for information
interchange", CACM 8 (April 1965) 207-214:

  The controls CR and LF are intended for printer equipment
  which requires separate combinations to return the carriage
  and feed a line.

  As an alternative, for equipment which uses a single combination
  for a combined carriage-return and line-feed operation
  (called New-Line), NL will be coded at FE 2 [LF]. Then FE 5
  [CR] will be regarded as Backspace BS.

  If the latter type of equipment has to interwork with the
  former, it may be necessary to take steps to introduce the
  CR character.

One might read the preceding paragraph as advice not only to
writers of driver software but also to a future standards
committee to undo the curious notion of regarding CR
as BS. Unix effectively took it both ways, and kept the
original meaning of CR.

doug

