From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 29953 invoked from network); 18 May 2020 16:12:05 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 18 May 2020 16:12:05 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id 5A7DB9C1D8; Tue, 19 May 2020 02:12:00 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id 2747F9C160; Tue, 19 May 2020 02:11:41 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="fl8C435g"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id 6CA9C9C178; Tue, 19 May 2020 02:11:38 +1000 (AEST) Received: from mail-qk1-f181.google.com (mail-qk1-f181.google.com [209.85.222.181]) by minnie.tuhs.org (Postfix) with ESMTPS id DC2F99C15F for ; Tue, 19 May 2020 02:11:37 +1000 (AEST) Received: by mail-qk1-f181.google.com with SMTP id y22so10636007qki.3 for ; Mon, 18 May 2020 09:11:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=7zmH0uWAC4ftzZLYZNqqjphLB95+qyz/h1rud+mxslc=; b=fl8C435gGaDraP8SctCu5IEY0oI8BqXLPT8FlGBe5sI91wkCxl1g2mAxjuNo7mGbNG iczMa0t8nmlEKFa+cMeToIIgvDcGyv1uE1HjYNW/SeGEyHJAu74tFxMpHmgehB9rs50H Jkei2b6ET6a52D57UzrOrR9FC+cVwFBYUR9Ztxx54tuXWDFgPkm1X/jcAF8VKuXjVW12 8pVVZj4XdCtxa7zRWoF7pXQw+UvUwt+kg0acVzxQ4vHx0Gc5o54kdxuVVZbrdhlyRrbe r7miXxbGtk3fJTAMqlZjlypCsvmEuK9fPUHgd82zf41GzTXSmozrt7fG6wLdX4YUTVW8 TcPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=7zmH0uWAC4ftzZLYZNqqjphLB95+qyz/h1rud+mxslc=; b=bs+nfZ2WlbSL3jjYYHulO1Xo2iIw/qYxNxNxYyETmq7p96AXCvqa51UCJHCVg4MLx4 3MiLcNwZjQ+hUIZ86vZP3MZTKwlMwVqzjHDBEoE66VW85OCWmCQa0ve3qRQ8ehGo+YZr Yl/8q4nj+KAEUwtFblPjfouMLpL1k+xjPt88Z3KORluT8Z7TsDhg93epI+UfK1BVHRii /yeDPG2y3Et05KXnZqN5cJxlxmS+B0ZtGS+X+DDehpjEyl8+keKi1AUPk0GqbnswOy5k C8Do6SxTopBUMWEYuoBGeBioxk5zBNSdDFTqNIPYlongRFaudECw5JRItkBcgw3HDxmi MjLA== X-Gm-Message-State: AOAM531OpeI/D5VnSsffLWoJcGG8j20u0YTiaFAl4kBZP1CTJOQOqKi3 QdE9qA4vq63QMgdZ3khz9LbOSZqKdJUXH75NAEM= X-Google-Smtp-Source: ABdhPJxMkWLnxp5/xDk0/0PezcfpXjXCE3FTc+K+aTU/bKaPbrvWWWksgmUvTBOIYxiPGPdTzmauPyOO7C3SCFrrNgk= X-Received: by 2002:a05:620a:747:: with SMTP id i7mr17288163qki.346.1589818297025; Mon, 18 May 2020 09:11:37 -0700 (PDT) MIME-Version: 1.0 References: <20200515213138.8E0F72D2D71E@macaroni.inf.ed.ac.uk> <077a01d62b08$e696bee0$b3c43ca0$@ronnatalie.com> <20200515233427.31Vab%steffen@sdaoden.eu> <5DB09C5A-F5DA-4375-AAA5-0711FC6FB1D9@ronnatalie.com> <20200516232607.nLiIx%steffen@sdaoden.eu> In-Reply-To: From: Dan Cross Date: Mon, 18 May 2020 12:11:00 -0400 Message-ID: To: Paul Winalski Content-Type: multipart/alternative; boundary="000000000000da0b5105a5ee6af0" Subject: Re: [TUHS] v7 K&R C X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: The Eunuchs Hysterical Society , COFF Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" --000000000000da0b5105a5ee6af0 Content-Type: text/plain; charset="UTF-8" On Sun, May 17, 2020 at 12:24 PM Paul Winalski wrote: > On 5/16/20, Steffen Nurpmeso wrote: > > > > Why was there no byte or "mem" type? > > These days machine architecture has settled on the 8-bit byte as the > unit for addressing, but it wasn't always the case. The PDP-10 > addressed memory in 36-bit units. The character manipulating > instructions could deal with a variety of different byte lengths: you > could store six 6-bit BCD characters per machine word, Was this perhaps a typo for 9 4-bit BCD digits? I have heard that a reason for the 36-bit word size of computers of that era was that the main competition at the time was against mechanical calculator, which had 9-digit precision. 9*4=36, so 9 BCD digits could fit into a single word, for parity with the competition. 6x6-bit data would certainly hold BAUDOT data, and I thought the Univac/CDC machines supported a 6-bit character set? Does this live on in the Unisys 1100-series machines? I see some reference to FIELDATA online. I feel like this might be drifting into COFF territory now; Cc'ing there. or five ASCII > 7-bit characters (with a bit left over), or four 8-bit characters > (ASCII plus parity, with four bits left over), or four 9-bit > characters. > > Regarding a "mem" type, take a look at BLISS. The only data type that > language has is the machine word. > > > +getfield(buf) > > +char buf[]; > > +{ > > + int j; > > + char c; > > + > > + j = 0; > > + while((c = buf[j] = getc(iobuf)) >= 0) > > + if(c==':' || c=='\n') { > > + buf[j] =0; > > + return(1); > > + } else > > + j++; > > + return(0); > > +} > > > > so here the EOF was different and char was signed 7-bit it seems. > > That makes perfect sense if you're dealing with ASCII, which is a > 7-bit character set. To bring it back slightly to Unix, when Mary Ann and I were playing around with First Edition on the emulated PDP-7 at LCM+L during the Unix50 event last USENIX, I have a vague recollection that the B routine for reading a character from stdin was either `getchar` or `getc`. I had some impression that this did some magic necessary to extract a character from half of an 18-bit word (maybe it just zeroed the upper half of a word or something). If I had to guess, I imagine that the coincidence between "character" and "byte" in C is a quirk of this history, as opposed to any special hidden meaning regarding textual vs binary data, particularly since Unix makes no real distinction between the two: files are just unstructured bags of bytes, they're called 'char' because that was just the way things had always been. - Dan C. --000000000000da0b5105a5ee6af0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Sun, May 17, 2020 at 12:24 PM Paul Win= alski <paul.winalski@gmail.co= m> wrote:
On 5/16/20, Steffen Nurpmeso <steffen@sdaoden.eu> wrote:=
>
> Why was there no byte or "mem" type?

These days machine architecture has settled on the 8-bit byte as the
unit for addressing, but it wasn't always the case.=C2=A0 The PDP-10 addressed memory in 36-bit units.=C2=A0 The character manipulating
instructions could deal with a variety of different byte lengths:=C2=A0 you=
could store six 6-bit BCD characters per machine word,
Was this perhaps a typo for 9 4-bit BCD digits? I have heard th= at a reason for the 36-bit word size of computers of that era was that the = main competition at the time was against mechanical calculator, which had 9= -digit precision. 9*4=3D36, so 9 BCD digits could fit into a single word, f= or parity with the competition.

6x6-bit data would= certainly hold BAUDOT data, and I thought the Univac/CDC machines supporte= d a 6-bit character set?=C2=A0 Does this live on in the Unisys 1100-series = machines? I see some reference to FIELDATA online.

I feel like this might be drifting into COFF territory now; Cc'ing the= re.

o= r five ASCII
7-bit characters (with a bit left over), or four 8-bit characters
(ASCII plus parity, with four bits left over), or four 9-bit
characters.

Regarding a "mem" type, take a look at BLISS.=C2=A0 The only data= type that
language has is the machine word.

>=C2=A0 =C2=A0+getfield(buf)
>=C2=A0 =C2=A0+char buf[];
>=C2=A0 =C2=A0+{
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0int j;
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0char c;
>=C2=A0 =C2=A0+
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0j =3D 0;
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0while((c =3D buf[j] =3D getc(i= obuf)) >=3D 0)
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0if(c=3D=3D':' || c=3D= =3D'\n') {
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0bu= f[j] =3D0;
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0re= turn(1);
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0} else
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0j+= +;
>=C2=A0 =C2=A0+=C2=A0 =C2=A0 =C2=A0 =C2=A0return(0);
>=C2=A0 =C2=A0+}
>
> so here the EOF was different and char was signed 7-bit it seems.

That makes perfect sense if you're dealing with ASCII, which is a
7-bit character set.

To bring it back sligh= tly to Unix, when Mary Ann and I were playing around with First Edition on = the emulated PDP-7 at LCM+L during the Unix50 event last USENIX, I have a v= ague recollection that the B routine for reading a character from stdin was= either `getchar` or `getc`. I had some impression that this did some magic= necessary to extract a character from half of an 18-bit word (maybe it jus= t zeroed the upper half of a word or something). If I had to guess, I imagi= ne that the coincidence between "character" and "byte" = in C is a quirk of this history, as opposed to any special hidden meaning r= egarding textual vs binary data, particularly since Unix makes no real dist= inction between the two: files are just unstructured bags of bytes, they= 9;re called 'char' because that was just the way things had always = been.

=C2=A0 =C2=A0 =C2=A0 =C2=A0 - Dan C.

--000000000000da0b5105a5ee6af0--