On Fri, Dec 15, 2023 at 10:51 AM Paul Winalski <paul.winalski@gmail.com> wrote:
For me, the term "system process" means either:

o A conventional, but perhaps privileged user-mode process that
performs a system function.  An example would be the output side of a
spooling system, or an operator communications process.

o A process, or at least an address space + execution thread, that
runs in privileged mode on the hardware and whose address space is in
the resident kernel.

Do Unix system processes participate in time-sliced scheduling the way
that user processes do?

Yes. At least on FreeBSD they do. They are just processes that get
scheduled. They may have different priorities, etc., but all of that factors
in, and those priorities let them compete with and/or preempt already
running processes, depending on a number of things. The only thing
special about kernel-only threads/processes is that they are optimized
knowing they never have a userland associated with them...
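
To make that concrete, here is a minimal sketch (not from the original mail,
and the daemon name and its work loop are invented) of what such a
kernel-only process looks like on FreeBSD, created with kproc_create(9):

/*
 * Sketch only: a FreeBSD kernel process with no userland image.  It sleeps
 * until there is work and competes for the CPU through the normal
 * scheduler, subject to its priority.  "example_worker" and the work it
 * does are hypothetical.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/proc.h>
#include <sys/kthread.h>

static struct proc *example_proc;

static void
example_worker(void *arg)
{
	for (;;) {
		/*
		 * Sleep until someone wakeup()s us (or a second passes).
		 * Needing to block like this is exactly why it gets its own
		 * kernel stack and cannot run from an interrupt handler.
		 */
		tsleep(&example_proc, PWAIT, "exwork", hz);
		/* ... do the deferred work here ... */
	}
}

static void
example_start(void *arg)
{
	/* flags and stack pages of 0 take the defaults */
	kproc_create(example_worker, NULL, &example_proc, 0, 0,
	    "example_worker");
}
SYSINIT(example, SI_SUB_KTHREAD_IDLE, SI_ORDER_ANY, example_start, NULL);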
 
On 12/14/23, Bakul Shah <bakul@iitbombay.org> wrote:
>
> Exactly! If blocking were not required, you could do the work in an
> interrupt handler. If blocking is required, you can't just use the
> stack of a random process (while in supervisor mode) unless you
> are doing some work specifically on its behalf.
>
>> Interestingly, other early systems don't seem to have thought of this
>> structuring technique.
>
> I suspect IBM operating systems probably did use them. At least TSO
> must have. Once you start *accounting* (and charging) for CPU time,
> this idea must fall out naturally. You don't want to charge a process
> for kernel time used for unrelated work!

The usual programming convention for IBM S/360/370 operating systems
(OS/360, OS/VS, TOS and DOS/360, DOS/VS) did not involve use of a
stack at all, unless one was writing a routine involving recursive
calls, and that was rare.  Addressing for both program and data was
done using a base register + offset.  PL/I is the only IBM HLL I know
that explicitly supported recursion.  I don't know how they
implemented automatic variables assigned to memory in recursive
routines.  It might have been a linked list rather than a stack.
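
Purely to illustrate that guess (this is not how IBM's PL/I actually did it,
which I don't know), here is a small C sketch in which each recursive call
allocates its own activation record and chains it to the caller's, so a
linked list plays the role of a stack:

/*
 * Sketch only: recursion without a hardware stack.  Each call allocates an
 * "activation record" holding its automatic variables plus a back-chain to
 * the caller's record, in the spirit of chained save areas; the details are
 * invented for illustration.
 */
#include <stdio.h>
#include <stdlib.h>

struct activation {
	struct activation *caller;	/* back-chain to caller's record */
	int n;				/* "automatic" variables live here */
};

static long
factorial(int n, struct activation *caller)
{
	struct activation *a = malloc(sizeof(*a));
	long r;

	a->caller = caller;
	a->n = n;
	r = (n <= 1) ? 1 : n * factorial(n - 1, a);
	free(a);			/* "pop" this activation */
	return r;
}

int
main(void)
{
	printf("%ld\n", factorial(5, NULL));
	return 0;
}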

I remember when I first went from the IBM world and started
programming VAX/VMS, I thought it was really weird to burn an entire
register just for a process stack.

> There was a race condition in the V7 swapping code. A colleague and I
> once spent two weeks of 16-hour debugging days on it!

I had a race condition in some multithreaded code I wrote.  I couldn't
find the bug.  I even resorted to getting machine code listings of
the whole program and marking the critical and non-critical sections
with green and red markers.  I eventually threw all of the code out
and rewrote it from scratch.  The second version didn't have the race
condition.

The award for my 'longest bug chased' goes to one that took around 3-4 years.
We had a product, based on an arm9 CPU (so armv4), that would sometimes
hang. Well, individual threads in it would hang waiting for a lock, and so
weird aspects of the program stopped working in unusual ways. But the
root cause was a stuck lock, or a missed wakeup. It took months to recreate
the problem. I tried all manner of debugging to accelerate its recurrence (no
luck) and to audit all locks/unlocks/wakeups to make sure there were no leaks
or subtle mismatches (there weren't, despite a 100MB log file). It went on
and on. I rewrote all the locking / sleeping / etc. code, but also no dice.
Then one day, by chance, I was talking to someone who asked me
about atomic operations. I blew them off at first, but then realized the atomic
ops weren't implemented in hardware, but in software with the support of
the kernel (there were no CPU-level atomic ops). Within an hour of realizing
this and auditing the code path, I had a fix for a race that was trivial to spot
once you looked at the code closely. My friend found the same race at about
the same time I was finishing up my fix (in which he then found yet another
race, go pair programming). With the corrected fix, the weird hanging went
away, only to be reported once again... in a unit that hadn't been updated
with the patch!
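
To make the failure mode concrete, here is a deliberately naive sketch (not
the actual code from that product) of a compare-and-swap emulated in
software. Unless the kernel guarantees the sequence restarts after preemption,
or preemption is blocked between the load and the store, two threads can both
see the old value and both "succeed" -- exactly the kind of window that
produces a stuck lock or missed wakeup:

/*
 * Illustration only: a "compare-and-swap" emulated in software on a CPU
 * with no atomic instructions (as on armv4).  If a thread is preempted
 * between the load and the store, two threads can both observe the old
 * value and both write, losing one update.  A correct implementation needs
 * kernel help (e.g. disabling preemption or restarting the sequence).
 */
static int
naive_cas(volatile int *ptr, int oldval, int newval)
{
	if (*ptr == oldval) {		/* load and compare ... */
		/* <-- preemption here lets another thread win the race */
		*ptr = newval;		/* ... then store, non-atomically */
		return 0;		/* "success" */
	}
	return 1;			/* "failure" */
}

/* A lock built on this primitive can end up held by two threads at once. */
static void
naive_lock(volatile int *lock)
{
	while (naive_cas(lock, 0, 1) != 0)
		;			/* spin */
}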

tl;dr: you never know what the root cause might be in weird, racy situations.

Warner