On Fri, Dec 15, 2023 at 10:51 AM Paul Winalski <paul.winalski@gmail.com> wrote:
For me, the term "system process" means either:

o A conventional, but perhaps privileged user-mode process that
performs a system function.  An example would be the output side of a
spooling system, or an operator communications process.

o A process, or at least an address space + execution thread, that
runs in privileged mode on the hardware and whose address space is in
the resident kernel.

Do Unix system processes participate in time-sliced scheduling the way
that user processes do?

Yes. At least on FreeBSD they do. They are just processes that get
scheduled. They may have different priorities, etc., but all of that factors
in, and those priorities let them compete with and/or preempt already
running processes, depending on a number of things. The only thing
special about kernel-only threads/processes is that they are optimized
knowing they never have a userland associated with them...
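
To make that concrete, here is a minimal sketch (not from the original mail,
and the daemon name and its work loop are invented) of what such a
kernel-only process looks like on FreeBSD, created with kproc_create(9):

/*
 * Sketch only: a FreeBSD kernel process with no userland image.  It sleeps
 * until there is work and competes for the CPU through the normal
 * scheduler, subject to its priority.  "example_worker" and the work it
 * does are hypothetical.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/proc.h>
#include <sys/kthread.h>

static struct proc *example_proc;

static void
example_worker(void *arg)
{
	for (;;) {
		/*
		 * Sleep until someone wakeup()s us (or a second passes).
		 * Needing to block like this is exactly why it gets its own
		 * kernel stack and cannot run from an interrupt handler.
		 */
		tsleep(&example_proc, PWAIT, "exwork", hz);
		/* ... do the deferred work here ... */
	}
}

static void
example_start(void *arg)
{
	/* flags and stack pages of 0 take the defaults */
	kproc_create(example_worker, NULL, &example_proc, 0, 0,
	    "example_worker");
}
SYSINIT(example, SI_SUB_KTHREAD_IDLE, SI_ORDER_ANY, example_start, NULL);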
 
On 12/14/23, Bakul Shah <bakul@iitbombay.org> wrote:
>
> Exactly! If blocking were not required, you could do the work in an
> interrupt handler. If blocking is required, you can't just use the
> stack of a random process (while in supervisor mode) unless you
> are doing some work specifically on its behalf.
>
>> Interestingly, other early systems don't seem to have thought of this
>> structuring technique.
>
> I suspect IBM operating systems probably did use them. At least TSO
> must have. Once you start *accounting* (and charging) for CPU time,
> this idea must fall out naturally. You don't want to charge a process
> for kernel time used for unrelated work!

The usual programming convention for IBM S/360/370 operating systems
(OS/360, OS/VS, TOS and DOS/360, DOS/VS) did not involve use of a
stack at all, unless one was writing a routine involving recursive
calls, and that was rare.  Addressing for both program and data was
done using a base register + offset.  PL/I is the only IBM HLL I know
that explicitly supported recursion.  I don't know how they
implemented automatic variables assigned to memory in recursive
routines.  It might have been a linked list rather than a stack.
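
Purely to illustrate that guess (this is not how IBM's PL/I actually did it,
which I don't know), here is a small C sketch in which each recursive call
allocates its own activation record and chains it to the caller's, so a
linked list plays the role of a stack:

/*
 * Sketch only: recursion without a hardware stack.  Each call allocates an
 * "activation record" holding its automatic variables plus a back-chain to
 * the caller's record, in the spirit of chained save areas; the details are
 * invented for illustration.
 */
#include <stdio.h>
#include <stdlib.h>

struct activation {
	struct activation *caller;	/* back-chain to caller's record */
	int n;				/* "automatic" variables live here */
};

static long
factorial(int n, struct activation *caller)
{
	struct activation *a = malloc(sizeof(*a));
	long r;

	a->caller = caller;
	a->n = n;
	r = (n <= 1) ? 1 : n * factorial(n - 1, a);
	free(a);			/* "pop" this activation */
	return r;
}

int
main(void)
{
	printf("%ld\n", factorial(5, NULL));
	return 0;
}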

I remember when I first went from the IBM world and started
programming VAX/VMS, I thought it was really weird to burn an entire
register just for a process stack.

> There was a race condition in the V7 swapping code. A colleague and I
> once spent two weeks of 16-hour debugging days on it!

I had a race condition in some multithreaded code I wrote.  I couldn't
find the bug.  I even resorted to getting machine code listings of
the whole program and marking the critical and non-critical sections
with green and red markers.  I eventually threw all of the code out
and rewrote it from scratch.  The second version didn't have the race
condition.

The award for my 'longest bug chased' goes to one that took around 3-4 years.
We had a product, based on an arm9 CPU (so armv4), that would sometimes
hang. Well, individual threads in it would hang waiting for a lock, and so
weird aspects of the program stopped working in unusual ways. But the
root cause was a stuck lock, or a missed wakeup. It took months to recreate
the problem. I tried all manner of debugging to accelerate its recurrence (no
luck) and to audit all locks/unlocks/wakeups to make sure there were no leaks
or subtle mismatches (there weren't, despite a 100MB log file). It went on
and on. I rewrote all the locking / sleeping / etc. code, but also no dice.
Then one day, by chance, I was talking to someone who asked me
about atomic operations. I blew them off at first, but then realized the atomic
ops weren't implemented in hardware, but in software with the support of
the kernel (there were no CPU-level atomic ops). Within an hour of realizing
this and auditing the code path, I had a fix for a race that was trivial to spot
once you looked at the code closely. My friend found the same race at about
the same time I was finishing up my fix (in which he then found yet another
race, go pair programming). With the corrected fix, the weird hanging went
away, only to be reported once again... in a unit that hadn't been updated
with the patch!
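
To make the failure mode concrete, here is a deliberately naive sketch (not
the actual code from that product) of a compare-and-swap emulated in
software. Unless the kernel guarantees the sequence restarts after preemption,
or preemption is blocked between the load and the store, two threads can both
see the old value and both "succeed" -- exactly the kind of window that
produces a stuck lock or missed wakeup:

/*
 * Illustration only: a "compare-and-swap" emulated in software on a CPU
 * with no atomic instructions (as on armv4).  If a thread is preempted
 * between the load and the store, two threads can both observe the old
 * value and both write, losing one update.  A correct implementation needs
 * kernel help (e.g. disabling preemption or restarting the sequence).
 */
static int
naive_cas(volatile int *ptr, int oldval, int newval)
{
	if (*ptr == oldval) {		/* load and compare ... */
		/* <-- preemption here lets another thread win the race */
		*ptr = newval;		/* ... then store, non-atomically */
		return 0;		/* "success" */
	}
	return 1;			/* "failure" */
}

/* A lock built on this primitive can end up held by two threads at once. */
static void
naive_lock(volatile int *lock)
{
	while (naive_cas(lock, 0, 1) != 0)
		;			/* spin */
}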

tl;dr: you never know what the root cause might be in weird, racy situations.

Warner