[TUHS] origins of void* -- Apology!

The Unix Heritage Society mailing list

* [TUHS] origins of void* -- Apology!
@ 2017-11-08 16:07 Nemo
  2017-11-08 16:12 ` Warner Losh
  2017-11-08 20:28 ` [TUHS] NUXI Problem Warren Toomey
  0 siblings, 2 replies; 35+ messages in thread
From: Nemo @ 2017-11-08 16:07 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 390 bytes --]

On 6 November 2017 at 19:36, Ron Natalie <ron at ronnatalie.com> wrote:
> It’s worse than that.   “char” is defined as neither signed nor unsigned.
> The signedness is implementation defined.    This was why we have the inane
> “signed” keyword.

What was that story about porting an early UNIX to a machine with
different char polarity?  I dimly recall only a few problems.

N.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 16:07 [TUHS] origins of void* -- Apology! Nemo
@ 2017-11-08 16:12 ` Warner Losh
  2017-11-08 19:59   ` Ron Natalie
                     ` (2 more replies)
  2017-11-08 20:28 ` [TUHS] NUXI Problem Warren Toomey
  1 sibling, 3 replies; 35+ messages in thread
From: Warner Losh @ 2017-11-08 16:12 UTC (permalink / raw)


On Wed, Nov 8, 2017 at 9:07 AM, Nemo <cym224 at gmail.com> wrote:

> On 6 November 2017 at 19:36, Ron Natalie <ron at ronnatalie.com> wrote:
> > It’s worse than that.   “char” is defined as neither signed nor unsigned.
> > The signedness is implementation defined.    This was why we have the
> inane
> > “signed” keyword.
>
> What was that story about porting an early UNIX to a machine with
> different char polarity?  I dimly recall only a few problems.
>

Doesn't even have to be very early... There are lots of 'assume char is
signed' bugs in even modern code. So many that ARM gave up on the idea that
unsigned char was good (since the underlying ARM architecture supported it
better) and their modern ABIs are all signed char. The other thing that
EABI fixes is the crazy alignment rules, which were out of step with the
rest of the computer industry. They broke a lot of networking and storage
code on ARM, because structs that would otherwise describe a binary layout
were suddenly wrong. Yes, that is an implementation choice, just a poor one
that was eventually corrected.

When I was working on FreeBSD/arm only a decade ago, I'd routinely hit both
of these issues...

Warner
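
For readers who have not been bitten by this, here is a minimal sketch of
the 'assume char is signed' class of bug, assuming a hosted C99 compiler
with 8-bit chars. The function names are invented for illustration and do
not come from any code discussed in the thread.

```c
#include <limits.h>

/* Nonzero if plain char is signed on this implementation.
 * CHAR_MIN is 0 when plain char is unsigned, SCHAR_MIN when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The fragile form: a byte with the top bit set is widened to int
 * straight from plain char. The result is negative where char is
 * signed and in 128..255 where char is unsigned. */
int widen_plain(char c)
{
    return c;
}

/* The portable form: convert to unsigned char first, so the result is
 * always in 0..UCHAR_MAX whatever the signedness of plain char. */
int widen_portable(char c)
{
    return (unsigned char)c;
}
```

Code that uses a char as a table index, or compares it against a constant
above 127, behaves differently on the two kinds of platform unless it goes
through something like widen_portable() first.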

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 16:12 ` Warner Losh
@ 2017-11-08 19:59   ` Ron Natalie
  2017-11-08 23:33   ` Steffen Nurpmeso
  2017-11-09  1:35   ` Steve Johnson
  2 siblings, 0 replies; 35+ messages in thread
From: Ron Natalie @ 2017-11-08 19:59 UTC (permalink / raw)


The “story” was the reluctance to require something that MIGHT take additional instructions.     Again, this stems from yet another overload of char, that of a small numeric type (many languages do not treat characters as integers).     The idea was to let char be the regular character type, and if you were going to do math on it, you’d better explicitly state signed/unsigned. Of course, people get sloppy, leading to the bugs you noted.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] NUXI Problem
  2017-11-08 16:07 [TUHS] origins of void* -- Apology! Nemo
  2017-11-08 16:12 ` Warner Losh
@ 2017-11-08 20:28 ` Warren Toomey
  2017-11-08 20:38   ` Ron Natalie
                     ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: Warren Toomey @ 2017-11-08 20:28 UTC (permalink / raw)


On Wed, Nov 08, 2017 at 11:07:10AM -0500, Nemo wrote:
>What was that story about porting an early UNIX to a machine with
>different char polarity?  I dimly recall only a few problems.

The NUXI problem on the Interdata 7/32, when the University of Wollongong
did the port of Sixth Edition.

I can't find a link to the actual story, but from memory the system
printed "NUXI" out when it was first booted.

Cheers, Warren
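
As the story is usually told, the message was stored as 16-bit words and
the two bytes within each word came out in the opposite order on the new
machine. A minimal sketch of that effect in C (the function name is made
up for illustration, and this is not code from any of the ports mentioned):

```c
#include <stddef.h>

/* Swap the two bytes inside every 16-bit pair of buf, in place.
 * A trailing odd byte, if any, is left alone. */
void swap_byte_pairs(char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

Applied to the four bytes of "UNIX", this turns "UN" into "NU" and "IX"
into "XI".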


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] NUXI Problem
  2017-11-08 20:28 ` [TUHS] NUXI Problem Warren Toomey
@ 2017-11-08 20:38   ` Ron Natalie
  2017-11-08 20:39   ` Clem Cole
  2017-11-08 23:14   ` Warren Toomey
  2 siblings, 0 replies; 35+ messages in thread
From: Ron Natalie @ 2017-11-08 20:38 UTC (permalink / raw)

I seem to recall the story was about an early IBM Series/1 port, and that
they related it at a UNIX Users Group meeting.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] NUXI Problem
  2017-11-08 20:28 ` [TUHS] NUXI Problem Warren Toomey
  2017-11-08 20:38   ` Ron Natalie
@ 2017-11-08 20:39   ` Clem Cole
  2017-11-08 23:14   ` Warren Toomey
  2 siblings, 0 replies; 35+ messages in thread
From: Clem Cole @ 2017-11-08 20:39 UTC (permalink / raw)


Close; it was the Series/1 port, not the Interdata.

It was reported at the 1980 USENIX by the folks from Cleveland State that
ported it.   The Series/1 was not byte-swapped.

Clem

On Wed, Nov 8, 2017 at 8:28 PM, Warren Toomey <wkt at tuhs.org> wrote:

> On Wed, Nov 08, 2017 at 11:07:10AM -0500, Nemo wrote:
>
>> What was that story about porting an early UNIX to a machine with
>> different char polarity?  I dimly recall only a few problems.
>>
>
> The NUXI problem on the Interdata 7/32, when the University of Wollongong
> did the port of Sixth Edition.
>
> I can't find a link to the actual story, but from memory the system
> printed "NUXI" out when it was first booted.
>
> Cheers, Warren
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] NUXI Problem
  2017-11-08 20:28 ` [TUHS] NUXI Problem Warren Toomey
  2017-11-08 20:38   ` Ron Natalie
  2017-11-08 20:39   ` Clem Cole
@ 2017-11-08 23:14   ` Warren Toomey
  2 siblings, 0 replies; 35+ messages in thread
From: Warren Toomey @ 2017-11-08 23:14 UTC (permalink / raw)


On Thu, Nov 09, 2017 at 06:28:33AM +1000, Warren Toomey wrote:
>On Wed, Nov 08, 2017 at 11:07:10AM -0500, Nemo wrote:
>>What was that story about porting an early UNIX to a machine with
>>different char polarity?  I dimly recall only a few problems.
>
>The NUXI problem on the Interdata 7/32, when the University of Wollongong
>did the port of Sixth Edition.

Oops, I stand corrected (by private e-mails), it was the IBM Series/1.

I'll blame bit rot, or perhaps the Interdata also suffered from the NUXI
problem.

Thanks, Warren


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 16:12 ` Warner Losh
  2017-11-08 19:59   ` Ron Natalie
@ 2017-11-08 23:33   ` Steffen Nurpmeso
  2017-11-09  1:35   ` Steve Johnson
  2 siblings, 0 replies; 35+ messages in thread
From: Steffen Nurpmeso @ 2017-11-08 23:33 UTC (permalink / raw)



Warner Losh <imp at bsdimp.com> wrote:
 |On Wed, Nov 8, 2017 at 9:07 AM, Nemo <[1]cym224 at gmail.com[/1]> wrote:
 |On 6 November 2017 at 19:36, Ron Natalie <[2]ron at ronnatalie.com[/2]> wrote:
 |> It’s worse than that.   “char” is defined as neither signed nor unsigned.
 |> The signedness is implementation defined.    This was why we have \
 |> the inane
 |> “signed” keyword.
 ...
 |What was that story about porting an early UNIX to a machine with
 |different char polarity?  I dimly recall only a few problems.
 |
 |Doesn't even have to be very early... There's lots of 'assume char \
 |is signed bugs' in even modern code. So many that ARM gave up on the \
 |idea that 
 |unsigned char was good (since the underlying ARM architecture supported \
 |it better) and their modern ABIs are all signed char. The other thing \
 ..
 |When I was working on FreeBSD/arm only a decade ago, I'd routinely \
 |hit both of these issues...

I had one of those on Debian/arm64 (Bug#806300) as recently as
November 2015, very kindly reported as

 |This symptom and the pattern of failures is typical of programs that
 |assume that plain char is signed. Fortunately there's a warning in
 |the build log that tells you exactly where the bug is:

(in fact it had already been mentioned in some hidden archlinux forum,
and also to me in private by a Swede, but i failed to see it, and forgot,
hu-hu!!, all in March 2015).  It was introduced in December 2013 when
blindly fixing CC warnings (Many: fix gcc 4.8.2 -fstrict-overflow
-Wstrict-overflow=5).  Testing char not int against EOF is bad.
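
To spell out that last sentence, here is a small sketch, assuming a hosted
C99 compiler where EOF is -1 and chars are 8 bits. The function names are
hypothetical; this is not the code from the bug report.

```c
#include <stdio.h>

/* Returns nonzero if storing a getc() result r in a plain char and
 * then testing that char against EOF gives the wrong answer. */
int char_eof_test_is_wrong(int r)
{
    char c = (char)r;        /* the bug: truncating before the test */
    int wrong = (c == EOF);  /* what the buggy loop tests */
    int right = (r == EOF);  /* what it should test */
    return wrong != right;
}

/* The correct pattern keeps the result in an int until after the test. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

Where plain char is unsigned, the truncated value can never equal EOF, so
the buggy loop never sees end of file. Where plain char is signed, a data
byte of 0xFF looks like EOF and the loop stops early.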

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 16:12 ` Warner Losh
  2017-11-08 19:59   ` Ron Natalie
  2017-11-08 23:33   ` Steffen Nurpmeso
@ 2017-11-09  1:35   ` Steve Johnson
  2 siblings, 0 replies; 35+ messages in thread
From: Steve Johnson @ 2017-11-09  1:35 UTC (permalink / raw)


I don't think either Dennis or I ever thought that characters should
be signed.  It's true that PCC didn't specify it.  It's also true
that in those days the 8th bit in a char was pretty much unused
(except by the shell), so the issue never arose.  I believe the folks
at Bell Labs Holmdel who did the port to the Vax were the first ones
to come up with signed characters.  I think it was a real blot on
C.  For example, consider:
       struct {  ....
              char x:1;
       ...
       }

If characters are signed, the only legal values of x are 0 and -1 (!)
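
A minimal sketch of the same effect, using `signed int` and `unsigned int`
bit-fields because a bit-field of type char is only accepted as an
implementation extension. It assumes a two's complement compiler; the
struct and function names are invented for illustration.

```c
struct flags {
    signed int   s : 1;  /* one bit, signed: holds 0 or -1 */
    unsigned int u : 1;  /* one bit, unsigned: holds 0 or 1 */
};

/* Set the only bit of the signed field and read it back. */
int read_signed_bit(void)
{
    struct flags f = {0, 0};
    f.s = -1;
    return f.s;
}

/* Set the only bit of the unsigned field and read it back. */
int read_unsigned_bit(void)
{
    struct flags f = {0, 0};
    f.u = 1;
    return f.u;
}

/* The trap: a signed one-bit flag that is set never compares equal to 1. */
int signed_bit_equals_one(void)
{
    struct flags f = {0, 0};
    f.s = -1;
    return f.s == 1;
}
```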

Steve


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-09  7:14             ` Don Hopkins
@ 2017-11-09  7:44               ` Lars Brinkhoff
  0 siblings, 0 replies; 35+ messages in thread
From: Lars Brinkhoff @ 2017-11-09  7:44 UTC (permalink / raw)


Don Hopkins wrote:
> 1 bit bytes, the smallest addressable unit on the PDP-10, sounds kinda
> cool actually. Now would those be signed or unsigned?

There are only instructions to load unsigned bytes.  No sign extensions.

But sadly, this is getting off topic.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-09  6:37           ` Lars Brinkhoff
@ 2017-11-09  7:14             ` Don Hopkins
  2017-11-09  7:44               ` Lars Brinkhoff
  0 siblings, 1 reply; 35+ messages in thread
From: Don Hopkins @ 2017-11-09  7:14 UTC (permalink / raw)




> On 9 Nov 2017, at 07:37, Lars Brinkhoff <lars at nocrew.org> wrote:
> 
> Bakul Shah wrote:
>> I agree that `char' shouldn't do double duty as the smallest
>> addressable unit and I was suggesting uint8_t does that job.
> 
> There are still machines around where 8-bit bytes isn't a natural fit.

1 bit bytes, the smallest addressable unit on the PDP-10, sounds kinda cool actually. Now would those be signed or unsigned?

The PowerPC was great at smashing and swizzling bit fields and emulating other CPU instruction sets with different memory layouts, because it could rotate and mask very quickly! You could do byte reversal in three instructions: "rotate left" 8 to position two of the bytes, then two “rotate left word immediate then mask insert” instructions. 
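
The three-step byte reversal can be mirrored in portable C. This is only
a sketch of the idea (one rotate, then two rotate-and-insert-under-mask
steps), not PowerPC code, and a compiler may or may not turn it into those
three instructions. The function names are made up for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits, for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Reverse the four bytes of x. */
uint32_t byte_reverse(uint32_t x)
{
    uint32_t r = rotl32(x, 8);   /* second and fourth bytes now in place */
    uint32_t t = rotl32(x, 24);  /* first and third bytes in place here  */

    /* Insert the top byte of t into r, then the third byte of t. */
    r = (r & ~UINT32_C(0xFF000000)) | (t & UINT32_C(0xFF000000));
    r = (r & ~UINT32_C(0x0000FF00)) | (t & UINT32_C(0x0000FF00));
    return r;
}
```

For example, 0x11223344 rotated left by 8 is 0x22334411, rotated left by
24 is 0x44112233, and the two masked inserts combine them into 0x44332211.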

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.alangref/idalangref_32bit_rtate_shift.htm

http://sametwice.com/rlwinm

https://www.google.com/patents/US20140208067

PowerPC AltiVec was a really beautiful instruction set.

https://en.wikipedia.org/wiki/AltiVec

-Don



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 21:25         ` Bakul Shah
@ 2017-11-09  6:37           ` Lars Brinkhoff
  2017-11-09  7:14             ` Don Hopkins
  0 siblings, 1 reply; 35+ messages in thread
From: Lars Brinkhoff @ 2017-11-09  6:37 UTC (permalink / raw)


Bakul Shah wrote:
> I agree that `char' shouldn't do double duty as the smallest
> addressable unit and I was suggesting uint8_t does that job.

There are still machines around where 8-bit bytes isn't a natural fit.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 20:45             ` Warner Losh
@ 2017-11-09  6:33               ` Lars Brinkhoff
  0 siblings, 0 replies; 35+ messages in thread
From: Lars Brinkhoff @ 2017-11-09  6:33 UTC (permalink / raw)


Warner Losh wrote:
> Don Hopkins wrote:
>> The PDP-10 had arbitrarily sized byte pointers! Did anybody ever
>> implement a C compiler on that hardware?
>
> Yes. Several people did. We had a thread about it not too long ago. KCC, Kok
> Chen's C compiler for the PDP-10, can be found at https://github.com/PDP-10/kcc.

I know about four:

- C10 by Alan Snyder
- KCC by Kok Chen, improved by Ken Harrenstien
- GCC backend by myself, funded by XKL
- PCC backend by Anders Magnusson


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 17:44       ` Ralph Corderoy
  2017-11-08 19:56         ` Ron Natalie
@ 2017-11-08 21:25         ` Bakul Shah
  2017-11-09  6:37           ` Lars Brinkhoff
  1 sibling, 1 reply; 35+ messages in thread
From: Bakul Shah @ 2017-11-08 21:25 UTC (permalink / raw)

On Wed, 08 Nov 2017 17:44:50 +0000 Ralph Corderoy <ralph at inputplus.co.uk> wrote:
Ralph Corderoy writes:
> Hi Bakul,
> 
> > void* serves a different purpose. It says this is an untyped pointer
> > (or a ptr to an instance of any type) so no question of size being an
> > issue.
> 
> In C, ignoring POSIX, a void pointer is big enough to hold any pointer
> to data.  Pointers to data may be different sizes.  And a void pointer
> can't hold a function pointer, but all function pointers are defined to
> be the same size.  Thus `void (*)(void)' can be used as a generic
> function pointer type and cast to other ones when needed.

Yes, I was being sloppy, not mentioning the fn ptr exception.

I was saying `void *' represents a generic non-function
pointer. I was just separating it from what Ron wants, which,
if I understand right, is a pointer to the *smallest*
addressable memory unit. I agree that `char' shouldn't do
double duty as the smallest addressable unit and I was
suggesting uint8_t does that job. But that is not true either.
There are word addressable machines where you can't directly
address bytes (if they have 8 bit bytes). Nor would you want a
"byte pointer" to be a general pointer.

> > It shouldn't even have been "void*". I would've preferred _* and _
> > instead of void* and void. Much more appropriate for a concise
> > language like C!
> 
> That's awful.  Might as well say `return' occurs so often, it should
> have been `@'.  :-)

Fits right in with "e1 ? e2 : e3" :-) My thinking was that the
word void loses any meaning in "void *". It is not a pointer
to an empty space.  Seems people just didn't want to add a new
keyword so they reused void. _ is at least more mnemonic.
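
For reference, a minimal sketch of the generic function pointer idiom from
the quoted text. Standard C guarantees that a function pointer converted
to another function pointer type and back compares equal to the original,
provided the call is made through the original type. The names here are
hypothetical.

```c
/* Generic function pointer type; never called through this type. */
typedef void (*generic_fn)(void);

static int add_one(int x)
{
    return x + 1;
}

/* Store a specific function pointer in the generic type... */
generic_fn stash(void)
{
    return (generic_fn)add_one;
}

/* ...and cast it back to its real type before calling it. */
int call_stashed(generic_fn f, int arg)
{
    int (*real)(int) = (int (*)(int))f;
    return real(arg);
}
```

Doing the same through `void *' is not covered by the C standard, which
is the function pointer exception mentioned above.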

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 19:56         ` Ron Natalie
  2017-11-08 20:39           ` Don Hopkins
@ 2017-11-08 20:50           ` Steve Nickolas
  1 sibling, 0 replies; 35+ messages in thread
From: Steve Nickolas @ 2017-11-08 20:50 UTC (permalink / raw)

On Wed, 8 Nov 2017, Ron Natalie wrote:

> Ralph is right.  You don't have to go any further than the old x86 
> implementations to find machines where the function pointers are bigger 
> than the data pointers.

It could get pretty baroque, depending on your memory model: you had
16-bit pointers and 32-bit pointers (actually 20-bit addresses, because of
the hairy way segmentation worked), and what was used for what depended on
compiler switches or the nonstandard "near" and "far" keywords (e.g.,
char far *screen = (char far *)0xA0000000L;).
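
The "32-bit but really 20-bit" arithmetic can be sketched in portable C.
In real mode the linear address is the segment times 16 plus the offset,
so the far pointer value 0xA0000000 (segment A000, offset 0000) refers to
linear address 0xA0000. The function names are invented for illustration,
and the sketch does not model the wrap at 1 MB on an actual 8086.

```c
#include <stdint.h>

/* Real-mode address arithmetic: linear = segment * 16 + offset. */
uint32_t far_to_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Split a far pointer held as segment:offset in one 32-bit value. */
uint32_t packed_far_to_linear(uint32_t far_ptr)
{
    uint16_t segment = (uint16_t)(far_ptr >> 16);
    uint16_t offset  = (uint16_t)(far_ptr & 0xFFFFu);
    return far_to_linear(segment, offset);
}
```

One consequence is that many different segment:offset pairs name the same
byte, which is part of what made far pointer comparison so awkward.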

-uso.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 20:47               ` Don Hopkins
@ 2017-11-08 20:48                 ` Don Hopkins
  0 siblings, 0 replies; 35+ messages in thread
From: Don Hopkins @ 2017-11-08 20:48 UTC (permalink / raw)




> On 8 Nov 2017, at 21:47, Don Hopkins <SimHacker at gmail.com> wrote:
> 
> 
>> I don’t care what they say.
>> 36-bits is here to stay!
> 
> You know what they say: 64 bits is just 36 bits at 2.88% interest compounded monthly for two years! 
> 
> -Don
> 

That’s not what they say: it’s actually twenty years! ;) 

-Don



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 20:42             ` Ron Natalie
@ 2017-11-08 20:47               ` Don Hopkins
  2017-11-08 20:48                 ` Don Hopkins
  0 siblings, 1 reply; 35+ messages in thread
From: Don Hopkins @ 2017-11-08 20:47 UTC (permalink / raw)




> I don’t care what they say.
> 36-bits is here to stay!

You know what they say: 64 bits is just 36 bits at 2.88% interest compounded monthly for two years! 

-Don



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 20:39           ` Don Hopkins
                               ` (2 preceding siblings ...)
  2017-11-08 20:43             ` Clem Cole
@ 2017-11-08 20:45             ` Warner Losh
  2017-11-09  6:33               ` Lars Brinkhoff
  3 siblings, 1 reply; 35+ messages in thread
From: Warner Losh @ 2017-11-08 20:45 UTC (permalink / raw)


On Wed, Nov 8, 2017 at 1:39 PM, Don Hopkins <don at donhopkins.com> wrote:

>
> On 8 Nov 2017, at 20:56, Ron Natalie <ron at ronnatalie.com> wrote:
>
> Ralph is right.   You don't have to go any further than the old x86
> implementations to find machines where the function pointers are bigger
> than the data pointers.
>
> Further void* both by the standard and by practical matter MUST have the
> format of char*.   Any other type of pointer has to be convertible to
> void*/char* (as both must address the smallest addressable unit).
> Most machines, don't need to actually do any pointer conversion but more
> than a few do, mostly those that have word addressing as native.
>
> If I recall properly, the CRAY, which didn't really have byte addressing
> at all, natively, just had the byte offset into word encoded in high order
> bits.    The UNIVAC has a quite rich "partial word" format encoded in the
> pointers.    The HEP as well used the low order bits to switch the operand
> size as well as the offset into the word.
>
> This all works because conversion by normal means (a cast or an
> assignment) knows the type on both sides, so it can translate between
> void*/char* and whatever the other data pointer type is.
> The BSD kernels however were rife with what I call "conversion by union."
>    They would store a pointer of one type into a union and retrieve it
> as another.    Now this is officially undefined behavior
> (as is most use of sockaddr_t in the early days).    I remember spending a
> few days running around the kernel "fixing" this when doing the HEP port.
>

Ah, yes, the strict alias rule. Took FreeBSD, at least, about a decade to
excise it from the tree...


> The PDP-10 had arbitrarily sized byte pointers! Did anybody ever implement
> a C compiler on that hardware?
>

Yes. Several people did. We had a thread about it not too long ago. KCC, Kok
Chen's C compiler for the PDP-10, can be found at https://github.com/PDP-10/kcc.

Warner


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 20:39           ` Don Hopkins
  2017-11-08 20:42             ` Ron Natalie
  2017-11-08 20:43             ` Don Hopkins
@ 2017-11-08 20:43             ` Clem Cole
  2017-11-08 20:45             ` Warner Losh
  3 siblings, 0 replies; 35+ messages in thread
From: Clem Cole @ 2017-11-08 20:43 UTC (permalink / raw)



On Wed, Nov 8, 2017 at 8:39 PM, Don Hopkins <don at donhopkins.com> wrote:

>
>
> The PDP-10 had arbitrarily sized byte pointers! Did anybody ever implement
> a C compiler on that hardware?
>
Yes:  There were at least 2 that I knew about:
https://github.com/PDP-10


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 20:39           ` Don Hopkins
  2017-11-08 20:42             ` Ron Natalie
@ 2017-11-08 20:43             ` Don Hopkins
  2017-11-08 20:43             ` Clem Cole
  2017-11-08 20:45             ` Warner Losh
  3 siblings, 0 replies; 35+ messages in thread
From: Don Hopkins @ 2017-11-08 20:43 UTC (permalink / raw)



> The PDP-10 had arbitrarily sized byte pointers! Did anybody ever implement a C compiler on that hardware?
> 
> https://stackoverflow.com/questions/3153141/defining-a-byte-in-c
> 
> https://en.wikipedia.org/wiki/36-bit
> 
> As DIGEX teased the VAX weenies at DECUS:
> 
> “If you’re not playing with 36 bits, you’re not playing with a full DEC!"
> 
> -Don
> 


Re: PDP-10 backend for gcc
https://gcc.gnu.org/ml/gcc/2000-09/msg00073.html

ftp://kermit.columbia.edu/kermit/dec20/assembler-guide.txt
2.12. Byte Instructions

In the PDP-10 a "byte" is some number of contiguous bits within one word.  A
byte pointer is a word that describes the byte.  There are three parts to the
description of a byte: the word (i.e., address) in which the byte occurs, the
position of the byte within the word, and the length of the byte.

A byte pointer has the following format:


   Bit     000000 000011 1 1 1111 112222222222333333
 Position  012345 678901 2 3 4567 890123456789012345
           _________________________________________
          |      |      | | |    |                  |
          | POS  | SIZE |U|I| X  |        Y         |
          |______|______|_|_|____|__________________|

   - POS is the byte position: the number of bits remaining in the word
     to the right of the byte.

   - SIZE is the byte size in bits.

   - The U field is reserved for future use and must be zero.

   - I, X, and Y are the same as in an instruction.
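
The field layout above can be decoded with shifts and masks. The following
is a rough sketch in C that keeps a 36-bit word in the low 36 bits of a
uint64_t; it ignores indexing and indirection, and all type and function
names are assumptions made for illustration, not part of any PDP-10 C
compiler.

```c
#include <stdint.h>

/* A 36-bit PDP-10 word in the low 36 bits of a uint64_t. PDP-10 bit 0
 * is the most significant bit, so PDP-10 bit 35 is our bit 0. */
typedef uint64_t word36;

struct byte_pointer {
    unsigned pos;   /* bits 0-5: bits remaining to the right of the byte */
    unsigned size;  /* bits 6-11: byte size in bits */
    unsigned i;     /* bit 13: indirect */
    unsigned x;     /* bits 14-17: index register */
    uint32_t y;     /* bits 18-35: address */
};

/* Split a byte pointer word into its fields. */
struct byte_pointer decode_byte_pointer(word36 w)
{
    struct byte_pointer bp;
    bp.pos  = (unsigned)((w >> 30) & 077);
    bp.size = (unsigned)((w >> 24) & 077);
    bp.i    = (unsigned)((w >> 22) & 01);
    bp.x    = (unsigned)((w >> 18) & 017);
    bp.y    = (uint32_t)(w & 0777777);
    return bp;
}

/* Extract the byte described by pos and size from a data word.
 * Only meaningful when pos + size <= 36. */
word36 load_byte(word36 data, unsigned pos, unsigned size)
{
    word36 mask = (size >= 36) ? UINT64_C(0xFFFFFFFFF)
                               : ((UINT64_C(1) << size) - 1);
    return (data >> pos) & mask;
}
```

With POS = 29 and SIZE = 7 this picks out the leftmost 7-bit ASCII
character of a word, which is how five characters share one 36-bit word.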



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 20:39           ` Don Hopkins
@ 2017-11-08 20:42             ` Ron Natalie
  2017-11-08 20:47               ` Don Hopkins
  2017-11-08 20:43             ` Don Hopkins
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Ron Natalie @ 2017-11-08 20:42 UTC (permalink / raw)



> The PDP-10 had arbitrarily sized byte pointers! Did anybody ever implement a C compiler on that hardware?

 

I don’t recall a DEC 10 (or PDP-6) C compiler, but such certainly did exist on the UNIVAC 1100 which had a similar arbitrary partial word format.

The Univac never could decide what the character set was, be it ASCII (in 7, 8, or 9 bits) or FIELDDATA (six bits).

 

I don’t care what they say.
36-bits is here to stay!



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 19:56         ` Ron Natalie
@ 2017-11-08 20:39           ` Don Hopkins
  2017-11-08 20:42             ` Ron Natalie
                               ` (3 more replies)
  2017-11-08 20:50           ` Steve Nickolas
  1 sibling, 4 replies; 35+ messages in thread
From: Don Hopkins @ 2017-11-08 20:39 UTC (permalink / raw)




> On 8 Nov 2017, at 20:56, Ron Natalie <ron at ronnatalie.com> wrote:
> 
> Ralph is right.   You don't have to go any further than the old x86 implementations to find machines where the function pointers are bigger than the data pointers.  
> 
> Further void* both by the standard and by practical matter MUST have the format of char*.   Any other type of pointer has to be convertible to void*/char* (as both must address the smallest addressable unit).
> Most machines, don't need to actually do any pointer conversion but more than a few do, mostly those that have word addressing as native.
> 
> If I recall properly, the CRAY, which didn't really have byte addressing at all, natively, just had the byte offset into word encoded in high order bits.    The UNIVAC has a quite rich "partial word" format encoded in the pointers.    The HEP as well used the low order bits to switch the operand size as well as the offset into the word.
> 
> This all works because conversion by normal means (a cast or an assignment) knows the type on both sides, so it can translate between void*/char* and whatever the other data pointer type is.
> The BSD kernels however were rife with what I call "conversion by union."    They would store a pointer of one type into a union and retrieve it as another.    Now this is officially undefined behavior
> (as is most use of sockaddr_t in the early days).    I remember spending a few days running around the kernel "fixing" this when doing the HEP port.

The PDP-10 had arbitrarily sized byte pointers! Did anybody ever implement a C compiler on that hardware?

https://stackoverflow.com/questions/3153141/defining-a-byte-in-c

https://en.wikipedia.org/wiki/36-bit

As DIGEX teased the VAX weenies at DECUS:

“If you’re not playing with 36 bits, you’re not playing with a full DEC!”

-Don



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 17:44       ` Ralph Corderoy
@ 2017-11-08 19:56         ` Ron Natalie
  2017-11-08 20:39           ` Don Hopkins
  2017-11-08 20:50           ` Steve Nickolas
  2017-11-08 21:25         ` Bakul Shah
  1 sibling, 2 replies; 35+ messages in thread
From: Ron Natalie @ 2017-11-08 19:56 UTC (permalink / raw)


Ralph is right.   You don't have to go any further than the old x86 implementations to find machines where the function pointers are bigger than the data pointers.  

Further void* both by the standard and by practical matter MUST have the format of char*.   Any other type of pointer has to be convertible to void*/char* (as both must address the smallest addressable unit).
Most machines don't need to actually do any pointer conversion, but more than a few do, mostly those that have word addressing as native.

If I recall properly, the CRAY, which didn't really have byte addressing at all, natively, just had the byte offset into word encoded in high order bits.    The UNIVAC has a quite rich "partial word" format encoded in the pointers.    The HEP as well used the low order bits to switch the operand size as well as the offset into the word.

This all works because conversion by normal means (a cast or an assignment) goes from or to void*/char* and the other data pointer type, and the compiler knows the type on both sides of the conversion.
The BSD kernels, however, were rife with what I call "conversion by union."    It would store a pointer of one type into a union and retrieve it as a pointer of another type.    Now this is officially undefined behavior
(as is most use of sockaddr_t in the early days).    I remember spending a few days running around the kernel "fixing" this when doing the HEP port.
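
The difference between the two kinds of conversion can be sketched roughly as follows. This is only an illustration, not code from any BSD kernel: the union `any_ptr` and both function names are made up for the example, and the size check is merely the weakest sanity test one could add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union of the "conversion by union" kind: two pointer
 * types sharing the same storage. */
union any_ptr {
    char *as_char;
    int  *as_int;
};

/* Conversion by normal means: the compiler sees both types and can emit
 * whatever address adjustment the machine needs (none at all on
 * byte-addressed hardware). */
int *to_int_ptr_by_cast(void *p)
{
    return (int *)p;
}

/* Conversion by union: the char* bit pattern is stored and then read
 * back as an int*, so no conversion code is ever generated.  This only
 * works where both pointer types share one representation. */
int *to_int_ptr_by_union(char *p)
{
    union any_ptr u;

    /* Equal sizes are necessary but not sufficient for this to work. */
    assert(sizeof(char *) == sizeof(int *));
    u.as_char = p;
    return u.as_int;
}
```

On a byte-addressed machine both functions return the same address; on a word-addressed machine with a distinct char* format only the cast version would be expected to survive.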



-----Original Message-----
From: TUHS [mailto:tuhs-bounces@minnie.tuhs.org] On Behalf Of Ralph Corderoy
Sent: Wednesday, November 8, 2017 12:45 PM
To: tuhs at minnie.tuhs.org
Subject: Re: [TUHS] origins of void* -- Apology!

Hi Bakul,

> void* serves a different purpose. It says this is an untyped pointer 
> (or a ptr to an instance of any type) so no question of size being an 
> issue.

In C, ignoring POSIX, a void pointer is big enough to hold any pointer to data.  Pointers to data may be different sizes.  And a void pointer can't hold a function pointer, but all function pointers are defined to be the same size.  Thus `void (*)(void)' can be used as a generic function pointer type and cast to other ones when needed.

> It shouldn't even have been "void*". I would've preferred _* and _ 
> instead of void* and void. Much more appropriate for a concise 
> language like C!

That's awful.  Might as well say `return' occurs so often, it should have been `@'.  :-)

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-07  1:09     ` Bakul Shah
  2017-11-07  1:55       ` Ron Natalie
@ 2017-11-08 17:44       ` Ralph Corderoy
  2017-11-08 19:56         ` Ron Natalie
  2017-11-08 21:25         ` Bakul Shah
  1 sibling, 2 replies; 35+ messages in thread
From: Ralph Corderoy @ 2017-11-08 17:44 UTC (permalink / raw)


Hi Bakul,

> void* serves a different purpose. It says this is an untyped pointer
> (or a ptr to an instance of any type) so no question of size being an
> issue.

In C, ignoring POSIX, a void pointer is big enough to hold any pointer
to data.  Pointers to data may be different sizes.  And a void pointer
can't hold a function pointer, but all function pointers are defined to
be the same size.  Thus `void (*)(void)' can be used as a generic
function pointer type and cast to other ones when needed.
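
A minimal sketch of that idiom, assuming nothing beyond standard C. The names `generic_fn`, `add`, `stash_add` and `call_as_binary_int` are invented for the illustration:

```c
#include <assert.h>
#include <stddef.h>

/* The generic function pointer type: holds any function pointer, but
 * must not be called through directly. */
typedef void (*generic_fn)(void);

static int add(int a, int b)
{
    return a + b;
}

/* Store a function pointer of a specific type as the generic type. */
generic_fn stash_add(void)
{
    return (generic_fn)add;
}

/* Cast back to the function's real type before calling it.  Calling
 * through the wrong type would be undefined behaviour. */
int call_as_binary_int(generic_fn f, int a, int b)
{
    int (*typed)(int, int) = (int (*)(int, int))f;

    assert(typed != NULL);
    return typed(a, b);
}
```

The caller has to remember the real type by some other means (a tag, a table index); the generic type carries no such information.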

> It shouldn't even have been "void*". I would've preferred _* and _
> instead of void* and void. Much more appropriate for a concise
> language like C!

That's awful.  Might as well say `return' occurs so often, it should
have been `@'.  :-)

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 12:48 ` Tony Finch
  2017-11-08 13:36   ` Otto Moerbeek
@ 2017-11-08 16:03   ` Warner Losh
  1 sibling, 0 replies; 35+ messages in thread
From: Warner Losh @ 2017-11-08 16:03 UTC (permalink / raw)


On Wed, Nov 8, 2017 at 5:48 AM, Tony Finch <dot at dotat.at> wrote:

> Nelson H. F. Beebe <beebe at math.utah.edu> wrote:
> >
> >       % cat *.log | grep '^ char type is' | sort | uniq -c
> >           157         char type is          signed
> >             3         char type is          unsigned
> >
> > The sole outliers are
> >
> >       * Arch Linux ARM on armv7l
> >       * IBM CentOS Linux release 7.4.1708 on PowerPC-8
> >       * SGI IRIX 6.5 on MIPS R10000-SC
>
> Nice survey, thanks!
>
> I learned C using the Norcroft C compiler on early Acorn / ARM machines
> where char was unsigned. That is still the case, though ARM have switched
> from Norcroft to clang.
>
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0774h/kpr1493281322162.html
>
> (And I started learning about unix from reading articles about RISC iX,
> Acorn's 4.3BSD port to the Archimedes.)
>

ARM's pre-EABI ABIs dictated that char be unsigned. It's all dictated by
the ABI that's implemented, and less about which compiler is used. Now that
EABI is basically mainstream, unsigned characters on ARM has become a
historic oddity.

Warner


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-08 12:48 ` Tony Finch
@ 2017-11-08 13:36   ` Otto Moerbeek
  2017-11-08 16:03   ` Warner Losh
  1 sibling, 0 replies; 35+ messages in thread
From: Otto Moerbeek @ 2017-11-08 13:36 UTC (permalink / raw)


On Wed, Nov 08, 2017 at 12:48:47PM +0000, Tony Finch wrote:

> Nelson H. F. Beebe <beebe at math.utah.edu> wrote:
> >
> > 	% cat *.log | grep '^ char type is' | sort | uniq -c
> > 	    157         char type is          signed
> > 	      3         char type is          unsigned
> >
> > The sole outliers are
> >
> > 	* Arch Linux ARM on armv7l
> > 	* IBM CentOS Linux release 7.4.1708 on PowerPC-8
> > 	* SGI IRIX 6.5 on MIPS R10000-SC
> 
> Nice survey, thanks!
> 
> I learned C using the Norcroft C compiler on early Acorn / ARM machines
> where char was unsigned. That is still the case, though ARM have switched
> from Norcroft to clang.

Whether char is signed or unsigned is defined by the ABI of the
platform, not by the compiler (if the compiler builder respects the
ABI, which is of course a wise thing to do).


	-Otto


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-07 15:34 [TUHS] origins of void* -- Apology! Nelson H. F. Beebe
@ 2017-11-08 12:48 ` Tony Finch
  2017-11-08 13:36   ` Otto Moerbeek
  2017-11-08 16:03   ` Warner Losh
  0 siblings, 2 replies; 35+ messages in thread
From: Tony Finch @ 2017-11-08 12:48 UTC (permalink / raw)


Nelson H. F. Beebe <beebe at math.utah.edu> wrote:
>
> 	% cat *.log | grep '^ char type is' | sort | uniq -c
> 	    157         char type is          signed
> 	      3         char type is          unsigned
>
> The sole outliers are
>
> 	* Arch Linux ARM on armv7l
> 	* IBM CentOS Linux release 7.4.1708 on PowerPC-8
> 	* SGI IRIX 6.5 on MIPS R10000-SC

Nice survey, thanks!

I learned C using the Norcroft C compiler on early Acorn / ARM machines
where char was unsigned. That is still the case, though ARM have switched
from Norcroft to clang.

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0774h/kpr1493281322162.html

(And I started learning about unix from reading articles about RISC iX,
Acorn's 4.3BSD port to the Archimedes.)

Tony.
-- 
f.anthony.n.finch  <dot at dotat.at>  http://dotat.at/  -  I xn--zr8h punycode
Tyne, Dogger: Variable 3 or 4, becoming west or southwest 5 to 7, perhaps gale
8 later in north. Slight, becoming moderate. Occasional rain later. Good,
occasionally moderate.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
@ 2017-11-07 15:34 Nelson H. F. Beebe
  2017-11-08 12:48 ` Tony Finch
  0 siblings, 1 reply; 35+ messages in thread
From: Nelson H. F. Beebe @ 2017-11-07 15:34 UTC (permalink / raw)


Arthur Krewat <krewat at kilonet.net> writes on Mon, 6 Nov 2017 19:34:34 -0500

>> char (at least these days) is signed. So really, it's 7-bit ASCII.

I decided last night to investigate that statement, and updated my
C/C++ features tool to test the sign and range of char and wchar_t.  

I ran it in our test lab with physical and virtual machines
representing many different GNU/Hurd, GNU/Linux, *BSD, macOS, Minix,
Solaris, and other Unix family members, on ARM, MIPS, PowerPC, SPARC,
x86, and x86-64 CPU architectures.  Here is a summary:

	% cat *.log | grep '^ char type is' | sort | uniq -c
	    157         char type is          signed
	      3         char type is          unsigned

The sole outliers are 

	* Arch Linux ARM on armv7l
	* IBM CentOS Linux release 7.4.1708 on PowerPC-8
	* SGI IRIX 6.5 on MIPS R10000-SC

for which I found these log data:

	Character range and sign...
		CHAR_MIN                        =   +0
		CHAR_MAX                        = +255
		SCHAR_MIN                       = -128
		SCHAR_MAX                       = +127
		UCHAR_MAX                       = +255
		char type is          unsigned
		signed char type is   signed
		unsigned char type is unsigned

The last two lines are expected, but my program checked for an
incorrect result, and would have produced the string "WRONG!" in the
output; no system had that result.
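
The test itself is not shown here, but a check of this kind can be sketched with nothing more than <limits.h>. The following is an assumption about how such a test might look, not the actual features tool; all three function names are invented for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, else 0. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Plain char must have the same range as either signed char or
 * unsigned char; returns 1 if the limits agree with that rule. */
int char_range_is_consistent(void)
{
    if (char_is_signed())
        return CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX;
    return CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX;
}

/* The sanity check corresponding to the "WRONG!" case: signed char
 * really is signed, and unsigned char reaches beyond its range. */
void check_explicit_char_types(void)
{
    assert(SCHAR_MIN < 0);
    assert(UCHAR_MAX > SCHAR_MAX);
}
```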

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe at math.utah.edu  -
- 155 S 1400 E RM 233                       beebe at acm.org  beebe at computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void*  -- Apology!
  2017-11-07  1:09     ` Bakul Shah
@ 2017-11-07  1:55       ` Ron Natalie
  2017-11-08 17:44       ` Ralph Corderoy
  1 sibling, 0 replies; 35+ messages in thread
From: Ron Natalie @ 2017-11-07  1:55 UTC (permalink / raw)



> C has had uint8_t since C99.

uint8_t isn't the same thing as I was proposing.




^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void*  -- Apology!
  2017-11-07  0:25   ` Ron Natalie
  2017-11-07  0:34     ` Arthur Krewat
@ 2017-11-07  1:09     ` Bakul Shah
  2017-11-07  1:55       ` Ron Natalie
  2017-11-08 17:44       ` Ralph Corderoy
  1 sibling, 2 replies; 35+ messages in thread
From: Bakul Shah @ 2017-11-07  1:09 UTC (permalink / raw)





> On Nov 6, 2017, at 4:25 PM, Ron Natalie <ron at ronnatalie.com> wrote:
> 
> I believe one of C’s biggest failings is that they did not solve the schizophrenic definition of char*.
>  
> Char* as historically implemented and then  CODIFIED in the C and C++ standards is both the basic character type as well as the smallest addressable unit of storage.
> This was all peachy keen in the 8 bit ASCII days (and even earlier alternative character sets such as EBCDIC, and its predecessors and other historical character sets like UNIVAC’s fielddata), but fell apart when we started into the 16 bit and larger UNICODE.
>  
> We needed a basic memory type that had sizeof == 1 (which void* did not meet) and to release char from having to play double duty.

C has had uint8_t since C99.

void* serves a different purpose. It says this is an
untyped pointer (or a ptr to an instance of any type)
so no question of size being an issue. It shouldn't
even have been "void*". I would've preferred _* and
_ instead of void* and void. Much more appropriate for
a concise language like C!
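
As a rough illustration of the two roles being argued about: void* carries an address with no type attached, while the bytes behind it still have to be touched through a character type. The function name `copy_bytes` is made up for this sketch; it is essentially a naive memcpy.

```c
#include <assert.h>
#include <stddef.h>

/* Copies n bytes from src to dst.  Any object pointer converts to
 * void* without a cast, but void* cannot be dereferenced, so the
 * actual access goes through unsigned char*, the type with
 * sizeof == 1. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n--)
        *d++ = *s++;
    return dst;
}
```

A call such as copy_bytes(&b, &a, sizeof a) works for objects of any type, which is the "untyped pointer" role; the loop inside is the "smallest addressable unit" role that char still has to play.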


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-07  0:34     ` Arthur Krewat
@ 2017-11-07  0:36       ` Ron Natalie
  0 siblings, 0 replies; 35+ messages in thread
From: Ron Natalie @ 2017-11-07  0:36 UTC (permalink / raw)



It’s worse than that.   “char” is defined as neither signed nor unsigned.   The signedness is implementation defined.    This was why we have the inane “signed” keyword.

 

 

From: TUHS [mailto:tuhs-bounces@minnie.tuhs.org] On Behalf Of Arthur Krewat
Sent: Monday, November 6, 2017 7:35 PM
To: tuhs at minnie.tuhs.org
Subject: Re: [TUHS] origins of void* -- Apology!

 

char (at least these days) is signed. So really, it's 7-bit ASCII.

I've been bitten by the 7-bit ASCII thing when it comes to modern character sets. unsigned char gets tiresome ;)



On 11/6/2017 7:25 PM, Ron Natalie wrote:

I believe one of C’s biggest failings is that they did not solve the schizophrenic definition of char*.

 

Char* as historically implemented and then  CODIFIED in the C and C++ standards is both the basic character type as well as the smallest addressable unit of storage.

This was all peachy keen in the 8 bit ASCII days (and even earlier alternative character sets such as EBCDIC, and its predecessors and other historical character sets like UNIVAC’s fielddata), but fell apart when we started into the 16 bit and larger UNICODE.

 

We needed a basic memory type that had sizeof == 1 (which void* did not meet) and to release char from having to play double duty.

 

 

 

 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-07  0:25   ` Ron Natalie
@ 2017-11-07  0:34     ` Arthur Krewat
  2017-11-07  0:36       ` Ron Natalie
  2017-11-07  1:09     ` Bakul Shah
  1 sibling, 1 reply; 35+ messages in thread
From: Arthur Krewat @ 2017-11-07  0:34 UTC (permalink / raw)



char (at least these days) is signed. So really, it's 7-bit ASCII.

I've been bitten by the 7-bit ASCII thing when it comes to modern 
character sets. unsigned char gets tiresome ;)
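
A small sketch of the kind of bite meant here, under the assumption that the data is 8-bit text such as UTF-8. Where plain char is signed, a test like s[i] >= 0x80 is never true, because bytes above 0x7F read back as negative values; going through unsigned char avoids depending on the platform. The function name `count_high_bytes` is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the bytes with the top bit set in a buffer of n chars. */
size_t count_high_bytes(const char *s, size_t n)
{
    size_t count = 0;
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* Convert first: the comparison then behaves the same
         * whether plain char is signed or unsigned. */
        unsigned char b = (unsigned char)s[i];
        if (b >= 0x80)
            count++;
    }
    return count;
}
```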


On 11/6/2017 7:25 PM, Ron Natalie wrote:
>
> I believe one of C’s biggest failings is that they did not solve the 
> schizophrenic definition of char*.
>
> Char* as historically implemented and then  CODIFIED in the C and C++ 
> standards is both the basic character type as well as the smallest 
> addressable unit of storage.
>
> This was all peachy keen in the 8 bit ASCII days (and even earlier 
> alternative character sets such as EBCDIC, and its predecessors and 
> other historical character sets like UNIVAC’s fielddata), but fell 
> apart when we started into the 16 bit and larger UNICODE.
>
> We needed a basic memory type that had sizeof == 1 (which void* did 
> not meet) and to release char from having to play double duty.
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void*  -- Apology!
  2017-11-06 21:46 ` [TUHS] origins of void* -- Apology! Steve Johnson
  2017-11-06 22:18   ` Warner Losh
@ 2017-11-07  0:25   ` Ron Natalie
  2017-11-07  0:34     ` Arthur Krewat
  2017-11-07  1:09     ` Bakul Shah
  1 sibling, 2 replies; 35+ messages in thread
From: Ron Natalie @ 2017-11-07  0:25 UTC (permalink / raw)



I believe one of C’s biggest failings is that they did not solve the schizophrenic definition of char*.

 

Char* as historically implemented and then  CODIFIED in the C and C++ standards is both the basic character type as well as the smallest addressable unit of storage.

This was all peachy keen in the 8 bit ASCII days (and even earlier alternative character sets such as EBCDIC, and its predecessors and other historical character sets like UNIVAC’s fielddata), but fell apart when we started into the 16 bit and larger UNICODE.

 

We needed a basic memory type that had sizeof == 1 (which void* did not meet) and to release char from having to play double duty.

 

 

 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void* -- Apology!
  2017-11-06 21:46 ` [TUHS] origins of void* -- Apology! Steve Johnson
@ 2017-11-06 22:18   ` Warner Losh
  2017-11-07  0:25   ` Ron Natalie
  1 sibling, 0 replies; 35+ messages in thread
From: Warner Losh @ 2017-11-06 22:18 UTC (permalink / raw)



On Mon, Nov 6, 2017 at 2:46 PM, Steve Johnson <scj at yaccman.com> wrote:

> I had a senior moment when I sent out my posting about the origin of
> void *.   The person in question was Larry Rosler, not Charlie Roberts --
> apologies to both!
>
> Larry was active in the ANSI committee, so he had a bully pulpit for
> suggesting this change...
>
> About Bliss, there certainly was a bit of a competition for a while
> between C and Bliss, and Bliss wasn't such a bad language.  But the
> technology behind it was pretty ugly.  You had to compile PDP-11 programs
> on a Dec System 20, which shut out a large percentage of the people who
> might have been interested.   And they took a very relaxed stance on
> pointer aliasing -- basically, the compiler assumed that no two pointers
> pointed to the same thing unless you turned on a flag, in which case it
> assumed all pointers pointed to all other pointers.  This would not have
> worked well for system code...
>
> Pascal was a much more serious competitor -- it was much easier to teach
> 75 people in a room how to program in Pascal than in C, and P-code was a
> reasonable portability mechanism.  The differences have been much discussed
> in this forum so I won't restart that thread again...
>
> At one point about 15 years after C had pretty much won over Bliss, I gave
> a job interview to a programmer at DEC who was responsible for maintaining
> 50 million lines of Bliss.   I've rarely met anyone who was more determined
> to change jobs!
>

I've seen the signature "Bliss is ignorance" :)

Warner


> Steve
>
>
>
> ----- Original Message -----
> From: "Warner Losh" <imp at bsdimp.com>
> To: <arnold at skeeve.com>
> Cc: "TUHS main list" <tuhs at minnie.tuhs.org>, "Paul Ruizendaal" <pnr at planet.nl>
> Sent: Mon, 6 Nov 2017 08:02:53 -0700
> Subject: Re: [TUHS] origins of void*
>
> On Mon, Nov 6, 2017 at 12:24 AM, <arnold at skeeve.com> wrote:
>
>> Paul Ruizendaal <pnr at planet.nl> wrote:
>>
>> > >> In the 4BSD era there was caddr_t, which I think was used for pretty
>> > >> much the same purposes.
>> > >
>> > > Only for kernel code. I am pretty sure caddr_t wasn't used in
>> > > user-land code.
>> >
>> > Ah, thanks for pointing that out, I had not realised that and it helps
>> > explain some things. But why wasn’t caddr_t used for user-land code:
>> > usage in the signature of e.g. write() would have made sense, right?
>>
>> It's clear from K&R 1 that char* served as both pointer to string and
>> generic pointer to memory.  That's not unreasonable, since, in some sense,
>> "memory is just bytes".  So user-land code didn't need caddr_t.  I also
>> suspect that caddr_t came into being with the effort to port Unix off
>> the PDP-11 and the weight of Unix practice before then had been to make
>> do with char*.
>>
>> I think it helps to remember the evolutionary processes that were happening
>> in the '70s.  High level languages had caught on for application code
>> (FORTRAN and COBOL in the US, Algol in Europe) but the weight of existing
>> practice for *systems coding* (operating systems and utilities) had been
>> to use assembly language.  Multics proved that you could write an OS in
>> a high level language, but Multics itself (at that time) wasn't a success.
>>
>> So when C came along in the mid-'70s, strong typing had essentially been
>> absent from systems programming.  With time and experience, along with
>> the recognition in the general CS world that strong typing was valuable,
>> C also started to evolve in that direction.
>>
>
> I thought there'd also been some influences from BLISS... DEC did much of
> their system programming in BLISS along side the MACRO-{11,32,20}....  Not
> exactly a strongly typed language, but another entry in the higher level
> language category that C was competing against.
>
> Warner
>
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [TUHS] origins of void*  -- Apology!
  2017-11-06 15:02 [TUHS] origins of void* Warner Losh
@ 2017-11-06 21:46 ` Steve Johnson
  2017-11-06 22:18   ` Warner Losh
  2017-11-07  0:25   ` Ron Natalie
  0 siblings, 2 replies; 35+ messages in thread
From: Steve Johnson @ 2017-11-06 21:46 UTC (permalink / raw)


I had a senior moment when I sent out my posting about the origin of
void *.   The person in question was Larry Rosler, not Charlie
Roberts -- apologies to both!

Larry was active in the ANSI committee, so he had a bully pulpit for
suggesting this change...

About Bliss, there certainly was a bit of a competition for a while
between C and Bliss, and Bliss wasn't such a bad language.  But the
technology behind it was pretty ugly.  You had to compile PDP-11
programs on a Dec System 20, which shut out a large percentage of the
people who might have been interested.   And they took a very
relaxed stance on pointer aliasing -- basically, the compiler assumed
that no two pointers pointed to the same thing unless you turned on a
flag, in which case it assumed all pointers pointed to all other
pointers.  This would not have worked well for system code...

Pascal was a much more serious competitor -- it was much easier to
teach 75 people in a room how to program in Pascal than in C, and
P-code was a reasonable portability mechanism.  The differences have
been much discussed in this forum so I won't restart that thread
again...

At one point about 15 years after C had pretty much won over Bliss, I
gave a job interview to a programmer at DEC who was responsible for
maintaining 50 million lines of Bliss.   I've rarely met anyone who
was more determined to change jobs!

Steve

----- Original Message -----
From: "Warner Losh" <imp at bsdimp.com>
To: <arnold at skeeve.com>
Cc: "TUHS main list" <tuhs at minnie.tuhs.org>, "Paul Ruizendaal" <pnr at planet.nl>
Sent: Mon, 6 Nov 2017 08:02:53 -0700
Subject: Re: [TUHS] origins of void*

On Mon, Nov 6, 2017 at 12:24 AM, <arnold at skeeve.com> wrote:

> Paul Ruizendaal <pnr at planet.nl> wrote:
>
> > >> In the 4BSD era there was caddr_t, which I think was used for pretty
> > >> much the same purposes.
> > >
> > > Only for kernel code. I am pretty sure caddr_t wasn't used in
> > > user-land code.
> >
> > Ah, thanks for pointing that out, I had not realised that and it helps
> > explain some things. But why wasn’t caddr_t used for user-land code:
> > usage in the signature of e.g. write() would have made sense, right?
>
> It's clear from K&R 1 that char* served as both pointer to string and
> generic pointer to memory.  That's not unreasonable, since, in some sense,
> "memory is just bytes".  So user-land code didn't need caddr_t.  I also
> suspect that caddr_t came into being with the effort to port Unix off
> the PDP-11 and the weight of Unix practice before then had been to make
> do with char*.
>
> I think it helps to remember the evolutionary processes that were happening
> in the '70s.  High level languages had caught on for application code
> (FORTRAN and COBOL in the US, Algol in Europe) but the weight of existing
> practice for *systems coding* (operating systems and utilities) had been
> to use assembly language.  Multics proved that you could write an OS in
> a high level language, but Multics itself (at that time) wasn't a success.
>
> So when C came along in the mid-'70s, strong typing had essentially been
> absent from systems programming.  With time and experience, along with
> the recognition in the general CS world that strong typing was valuable,
> C also started to evolve in that direction.

I thought there'd also been some influences from BLISS... DEC did much
of their system programming in BLISS along side the
MACRO-{11,32,20}....  Not exactly a strongly typed language, but
another entry in the higher level language category that C was
competing against.

Warner


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2017-11-09  7:44 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-08 16:07 [TUHS] origins of void* -- Apology! Nemo
2017-11-08 16:12 ` Warner Losh
2017-11-08 19:59   ` Ron Natalie
2017-11-08 23:33   ` Steffen Nurpmeso
2017-11-09  1:35   ` Steve Johnson
2017-11-08 20:28 ` [TUHS] NUXI Problem Warren Toomey
2017-11-08 20:38   ` Ron Natalie
2017-11-08 20:39   ` Clem Cole
2017-11-08 23:14   ` Warren Toomey
  -- strict thread matches above, loose matches on Subject: below --
2017-11-07 15:34 [TUHS] origins of void* -- Apology! Nelson H. F. Beebe
2017-11-08 12:48 ` Tony Finch
2017-11-08 13:36   ` Otto Moerbeek
2017-11-08 16:03   ` Warner Losh
2017-11-06 15:02 [TUHS] origins of void* Warner Losh
2017-11-06 21:46 ` [TUHS] origins of void* -- Apology! Steve Johnson
2017-11-06 22:18   ` Warner Losh
2017-11-07  0:25   ` Ron Natalie
2017-11-07  0:34     ` Arthur Krewat
2017-11-07  0:36       ` Ron Natalie
2017-11-07  1:09     ` Bakul Shah
2017-11-07  1:55       ` Ron Natalie
2017-11-08 17:44       ` Ralph Corderoy
2017-11-08 19:56         ` Ron Natalie
2017-11-08 20:39           ` Don Hopkins
2017-11-08 20:42             ` Ron Natalie
2017-11-08 20:47               ` Don Hopkins
2017-11-08 20:48                 ` Don Hopkins
2017-11-08 20:43             ` Don Hopkins
2017-11-08 20:43             ` Clem Cole
2017-11-08 20:45             ` Warner Losh
2017-11-09  6:33               ` Lars Brinkhoff
2017-11-08 20:50           ` Steve Nickolas
2017-11-08 21:25         ` Bakul Shah
2017-11-09  6:37           ` Lars Brinkhoff
2017-11-09  7:14             ` Don Hopkins
2017-11-09  7:44               ` Lars Brinkhoff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).