[TUHS] Unix stories - Nick Downing

The Unix Heritage Society mailing list
 help / color / mirror / Atom feed

From: downing.nick@gmail.com (Nick Downing)
Subject: [TUHS] Unix stories
Date: Mon, 2 Jan 2017 17:25:16 +1100	[thread overview]
Message-ID: <CAH1jEzbyL=H7ePZks_0FPjNk01JwsbkeVngQn3zV0htRDwFiGw@mail.gmail.com> (raw)
In-Reply-To: <alpine.BSF.2.02.1701020058230.83335@frieza.hoshinet.org>

Yes, I agree, I want to do exactly that. And, I know my ideas are
probably also perceived as daft. But it's OK :)

Unfortunately a C interpreter does not really work in principle,
because of the fact you can jump into loops and other kinds of abuse.
So the correct execution model is a bytecode. This means the C
compiler is pretty much unchanged, so there are no real
simplifications there. Also, you can cast a pointer and access
individual bytes of a structure and that kind of abuse. So the data
model is pretty much unchanged from a real machine's too, and there
are no real simplifications there either. But in my opinion that's not
the end of the story.

What I would like to do is to implement pointer safety, I know this is
a pretty big ask and will break compatibility with a big number of C
applications out there. But it might be surprising how many actually
do use pointers in a safe way. So my idea was something like this:

Suppose there is "struct foo { char *p; int *q; long b; short a; };",
well the user wouldn't be allowed to access the individual bytes of
the pointers p and q. But they would be allowed to access the
individual bytes of a and b since this does not introduce anything
dangerous. Therefore the struct would be internally converted to
"struct  foo { char *p; int *q; char data[6] };" allowing 4 bytes for
the long and 2 for the short. Then, now suppose I call a function and
I pass it "&a". This would be internally converted to "data + 4". But
it would be passed as a triple (data, data + 4, data + 6) which gives
the bounds of the underlying array. Or suppose the virtual machine was
implemented in Java or Python or some such, it would be passed as a
pair (data, 4) where data is a 6-byte array and 4 is an integer index
into it.

I was then going to make the compiler recognize some idiomatic C
constructs like "struct foo *p = malloc(10 * sizeof(foo));" and
internally translate them to something like "struct foo *p = new
foo[10];". Hopefully there could eventually be discriminated unions so
that I could declare something like "union foo { char *p; int *q; long
b; short a; };" and this would be internally translated to "struct foo
{ int which; union { char *, int *, data[4] } };" and the field
"which" would be set to 0, 1 or 2 depending on whether a char *, an
int *, or plain old data had been stored there. That way, when you
access a given field of the union it can validate that the correct
thing was stored there earlier. Also, with a little bit of syntactic
sugar we could implement intptr_t as "union { void *p; int a; };" to
increase compatibility with C.

If we do it like this, then your P-system idea works quite well,
because the memory space doesn't exist anymore, it's an OO database.

cheers, Nick

On Mon, Jan 2, 2017 at 5:01 PM, Steve Nickolas <usotsuki at buric.co> wrote:
> On Mon, 2 Jan 2017, Nick Downing wrote:
>
>> What I want to do is go back to a REALLY SIMPLE unbloated system,
>> which is why I am very interested in 2.11BSD (you probably saw my
>> earlier posts about the 2.11BSD system and potential port to Z180 and
>> so on). And then I want to define the ONE TRUE WAY(TM) of doing each
>> thing. But before I do this I want to go right back to basics and look
>> at the object model used in the operating system itself. For instance
>> the stuff like the oft (open file table), inode table, the filesystem
>> table (I mean Sun VFS which isn't in 2.11BSD AFAIK but eventually
>> should be), the device table and so on. And also the user-visible
>> objects like files, sockets etc which map to kernel objects.
>>
>> So once I have sorted all this out and created an object model that
>> the kernel can use efficiently, with compiler support (like C++
>> without the bloat, like java with internal pointers and ability to get
>> at the bits and bytes of your objects, like C without all the void *
>> stuff and with automatic handling of stuff like vtables), and
>> converted the kernel and all drivers to use it, then I think I will
>> have an object model that is useful in practice. So my idea then is to
>> export it to userlevel, so that userland programs can be rewritten
>> into this new C-like object oriented language and calls like: count =
>> read(fd, buf, size) would change to: count = fd.read(buf, size) since
>> kernel objects are objects. There would also be a compatibility layer
>> that allows you to keep a table of integer fds and their kernel
>> objects in userspace for porting.
>>
>> After this I would start to look at popular libraries like the C
>> library or the X-Window system, and convert them to use the new object
>> model, while also providing the compatibility layer as I did for the
>> kernel interface. Ultimately the result would be a bit like Java, but
>> using all the familiar Unix objects and familiar Unix calling
>> conventions (such as argument passing by reference or malloc/memcpy
>> stuff that Java can't do). Also without any header files or
>> boilerplate of any description, which is one of my pet peeves with
>> Java, C, C++ etc.
>>
>> I really think that the solution to bloat is to go through and rewrite
>> everything to do things in a more standardized way with more reuse.
>> Also I think that the massive amount of bloat arises to some extent
>> because the environment lends itself to writing non maintainable code
>> (for example you have to write loads of boilerplate and synchronize
>> function definitions in various places, which discourages you from
>> changing a function definition even if that's the right thing to do in
>> a situation). So there's always the temptation to add another
>> compatibility layer rather than dealing with the bloat. Rewriting
>> things in a much more minimal and maintainable style is the answer.
>>
>> Another reason for bloat is that authors have to support millions of
>> slightly different systems. My idea is to totally standardize it, like
>> POSIX but much more drastically so. Think about Java, it defines a
>> strict virtual machine so there's nothing to change when you port your
>> code to another platform. I haven't totally decided how to handle
>> word-size issues in this context, but I am sure there is a way.
>
>
> This vaguely reminds me of an idea I had a couple years ago, but never
> mentioned because I thought it might be perceived as daft.
>
> I had the idea of writing a virtual machine specifically tailored to running
> C code, with a virtual operating system similar to Unix.  On the surface it
> would probably feel like running Unix under an emulator, but the design
> would probably be a bit less baroque, because everything would be designed
> to work together cleanly.
>
> Like the mutant child of Unix and UCSD Pascal, I suppose.
>
> -uso.

next prev parent reply	other threads:[~2017-01-02  6:25 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-01  5:00 Larry McVoy
2017-01-01  6:48 ` Nick Downing
2017-01-02  2:03   ` Steve Johnson
2017-01-02  2:42     ` Nick Downing
2017-01-02  6:01       ` Steve Nickolas
2017-01-02  6:21         ` Warren Toomey
2017-01-02  6:25         ` Nick Downing [this message]
2017-01-04  4:07           ` Steve Nickolas
2017-01-02  7:29       ` arnold
2017-01-02 22:52     ` Dave Horsfall
2017-01-02 22:56       ` Larry McVoy
2017-01-02 22:59         ` Ronald Natalie
2017-01-02 22:58       ` Ronald Natalie
2017-01-02 23:23     ` Tim Bradshaw
2017-01-03  0:49       ` Larry McVoy
2017-01-03 11:36         ` Joerg Schilling
2017-01-04 13:04         ` Steffen Nurpmeso
2017-01-04 14:07           ` Random832
2017-01-04 14:54             ` Ron Natalie
2017-01-04 15:59               ` Random832
2017-01-04 16:30                 ` Steffen Nurpmeso
2017-01-04 16:32                   ` Random832
2017-01-04 16:51                     ` Steffen Nurpmeso
2017-01-04 16:54                       ` Random832
2017-01-04 16:58                         ` Ron Natalie
2017-01-04 17:38                           ` Steffen Nurpmeso
2017-01-04 17:47                             ` Steffen Nurpmeso
2017-01-04 18:51                           ` Steve Johnson
2017-01-04 17:08                         ` Steffen Nurpmeso
2017-01-04 16:22             ` Steffen Nurpmeso
2017-01-04 16:35               ` Random832
2017-01-04 17:03                 ` Steffen Nurpmeso
2017-02-09 13:46                 ` Steffen Nurpmeso
2017-02-09 14:55                   ` Random832
2017-02-09 17:15                     ` Steffen Nurpmeso
2017-01-01 13:11 ` Ron Natalie
2017-01-01 16:50 Noel Chiappa
2017-01-01 21:45 ` Nemo
2017-01-02  2:53   ` Wesley Parish
2017-01-02 14:30 Doug McIlroy
2017-01-02 18:36 ` Dan Cross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAH1jEzbyL=H7ePZks_0FPjNk01JwsbkeVngQn3zV0htRDwFiGw@mail.gmail.com' \
    --to=downing.nick@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).