[TUHS] Another odd comment in V6

The Unix Heritage Society mailing list
 help / color / mirror / Atom feed

* [TUHS] Another odd comment in V6
@ 2017-02-14 14:14 Noel Chiappa
  2017-02-14 14:35 ` Paul Ruizendaal
  2017-02-14 15:48 ` Random832
  0 siblings, 2 replies; 15+ messages in thread
From: Noel Chiappa @ 2017-02-14 14:14 UTC (permalink / raw)


    > From: Paul Ruizendaal

    > There's an odd comment in V6, in tty.c, just above ttread():
    > ...
    > That comment is strange, because it does not describe what the code
    > does.

I can't actually find anyplace where the PC is backed up (except on a
segmentation fault, when extending the stack)?

So I suspect that the comment is a tombstone; it refers to what the code did
at one point, but no longer does.

    > The comment isn't there in V5 or V7.

Which is consistent with it documenting a temporary state of affairs...


    > I wonder if there is a link to the famous Gabriel paper

I suspect so. Perhaps they tried backing up the PC (in the case where a system
call is interrupted by a software interrupt in the user's process), and
decided it was too much work to do it 'right' in all instances, and punted.

The whole question of how to handle software interrupts while a process is
waiting on some event, while in the kernel, is non-trivial, especially in
systems which use the now-universal approach of i) writing in a higher-level
stack oriented language, and ii) 'suspending' with a sub-routine call chain on
the kernel stack.

Unix (at least, in V6 - I'm not familiar with the others) just trashes the
whole call stack (via the qsav thing), and uses the intflg mechanism to notify
the user that a system call was aborted. But on systems with e.g. locks, it
can get pretty complicated (try Googling Multics crawl-out). Many PhD theses
have looked at these issues...


    > Actually, research Unix does save the complete state of a process and
    > could back up the PC. The reason that it doesn't work is in the syscall
    > API design, using registers to pass values etc. If all values were
    > passed on the stack it would work.

Sorry, I don't follow this?

The problem with 'backing up the PC' is that you 'sort of' have to restore the
arguments to the state they were in at the time the system call was first
made. This is actually easier if the arguments are in registers.

I said 'sort of' because the hard issue is that there are system calls (like
terminal I/O) where the system call is potentially already partially executed
(e.g.  a read asking for 10 characters from the user's console may have
already gotten 5, and stored them in the user's buffer), so you can't just
simply completely 'back out' the call (i.e. restore the arguments to what they
were, and expect the system call to execute 'correctly' if retried - in the
example, those 5 characters would be lost).

Instead, you have to modify the arguments so that the re-tried call takes up
where it left off - in the example above, tries to read 5 characters, starting
5 bytes into the buffer). The hard part is that the return value (of the
number of characters actually read) has to count the 5 already read! Without
the proper design of the system call interface, this can be hard - how does
the system distinguish between the _first_ attempt at a system call (in which
the 'already done' count is 0), and a _later_ attempt? If the user passes in
the 'already done' count, it's pretty straightforward - otherwise, not so
much!

Alan Bawden wrote a good paper about PCLSR'ing which explores some of these
issues.

	Noel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14 14:14 [TUHS] Another odd comment in V6 Noel Chiappa
@ 2017-02-14 14:35 ` Paul Ruizendaal
  2017-02-14 15:48 ` Random832
  1 sibling, 0 replies; 15+ messages in thread
From: Paul Ruizendaal @ 2017-02-14 14:35 UTC (permalink / raw)

On 14 Feb 2017, at 15:14 , Noel Chiappa wrote:

>> From: Paul Ruizendaal
> 
>> Actually, research Unix does save the complete state of a process and
>> could back up the PC. The reason that it doesn't work is in the syscall
>> API design, using registers to pass values etc. If all values were
>> passed on the stack it would work.
> 
> Sorry, I don't follow this?
> 
> The problem with 'backing up the PC' is that you 'sort of' have to restore the
> arguments to the state they were in at the time the system call was first
> made. This is actually easier if the arguments are in registers.

Yeah, you're right. I was thinking of the 2.9BSD code which only does the backup
in certain cases and when stack parameter mode was used (the 0200 bit), but
stating it like I did is indeed incomplete to say the least.

Another difficulty in stock V6 would be code like this:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/source/s4/chmod.s
where the data at label 9: could be overwritten by a signal handler
and re-executing the sys call would not work as intended.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14 14:14 [TUHS] Another odd comment in V6 Noel Chiappa
  2017-02-14 14:35 ` Paul Ruizendaal
@ 2017-02-14 15:48 ` Random832
  2017-02-14 16:06   ` Dan Cross
  1 sibling, 1 reply; 15+ messages in thread
From: Random832 @ 2017-02-14 15:48 UTC (permalink / raw)

On Tue, Feb 14, 2017, at 09:14, Noel Chiappa wrote:
> Without the proper design of the system call interface, this can be hard - how
> does the system distinguish between the _first_ attempt at a system call (in
> which the 'already done' count is 0), and a _later_ attempt? If the user passes
> in the 'already done' count, it's pretty straightforward - otherwise, not so
> much!

You could return the address of the last character read, and let the
user code do the math.

I'm a bit confused though from a practical point of view where this
comes up. If the terminal is in raw/cbreak mode, the user code must
handle a "partial" read anyway, so returning five bytes is fine. If it's
in canonical mode, the system call does not copy characters into the
user buffer until they have pressed enter. Maybe there's some other case
other than reading from a terminal that it makes sense for, but I
couldn't think of any while writing this post.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14 15:48 ` Random832
@ 2017-02-14 16:06   ` Dan Cross
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Cross @ 2017-02-14 16:06 UTC (permalink / raw)

On Tue, Feb 14, 2017 at 10:48 AM, Random832 <random832 at fastmail.com> wrote:

> On Tue, Feb 14, 2017, at 09:14, Noel Chiappa wrote:
> > Without the proper design of the system call interface, this can be hard
> - how
> > does the system distinguish between the _first_ attempt at a system call
> (in
> > which the 'already done' count is 0), and a _later_ attempt? If the user
> passes
> > in the 'already done' count, it's pretty straightforward - otherwise,
> not so
> > much!
>
> You could return the address of the last character read, and let the
> user code do the math.
>
> I'm a bit confused though from a practical point of view where this
> comes up. If the terminal is in raw/cbreak mode, the user code must
> handle a "partial" read anyway, so returning five bytes is fine. If it's
> in canonical mode, the system call does not copy characters into the
> user buffer until they have pressed enter. Maybe there's some other case
> other than reading from a terminal that it makes sense for, but I
> couldn't think of any while writing this post.
>

Reading is sort of a bad example; the mechanism shines much more brightly
when one considers the write case.

Consider typing a file out to a (slow - recall these systems were designed
in the 70s when 300 BAUD terminals were considered fast and 1200 was
downright zippy) terminal device. The user may interrupt and suspend the
file-printing program in order to do something else for a time, and then
want to resume output where it left off. The beauty of the ITS PCLSR
mechanism is that it handles this case transparently: the system backs up
the PC and adjusts the system call arguments so that when the program is
resumed it automatically re-invokes the system call such that it continues
printing where it left off, with no need for the application to care.

An aside: The Gabriel papers elaborated on this, discussing the tradeoff
between the Unix approach and ITS and suggesting that the Unix approach has
better survivability characteristics. It's easier to get the Unix mechanism
"right", whereas ITS's implementation took many pages of assembly language
code (I recall him having a quip along the lines of, "and it probably isn't
all right").

One of the interesting things about Gabriel's writing is that he
acknowledges that the definition of "correct" varies and is subjective and
takes into account a good deal of taste and other "soft" characteristics.
The MIT folks who worked on ITS preferred their approach because they saw
it as being more obviously "correct", while the Unix folks felt the same
way, despite the obvious differences between the two.

        - Dan C.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://minnie.tuhs.org/pipermail/tuhs/attachments/20170214/499f1760/attachment.html>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
@ 2017-02-21 19:23 Norman Wilson
  2017-02-22  9:59 ` Tony Finch
  0 siblings, 1 reply; 15+ messages in thread
From: Norman Wilson @ 2017-02-21 19:23 UTC (permalink / raw)


Noel:

  Instead, you have to modify the arguments so that the re-tried call takes up
  where it left off - in the example above, tries to read 5 characters, starting
  5 bytes into the buffer). The hard part is that the return value (of the
  number of characters actually read) has to count the 5 already read! Without
  the proper design of the system call interface, this can be hard - how does
  the system distinguish between the _first_ attempt at a system call (in which
  the 'already done' count is 0), and a _later_ attempt? If the user passes in
  the 'already done' count, it's pretty straightforward - otherwise, not so
  much!

====

Sometime in the latter days of the Research system (somewhere
between when the 9/e and 10/e manuals were published), I had
an inspiration about that, and changed things as follows:

When a system call like read is interrupted by a signal:
-- If no characters have been copied into the user's
buffer yet, return -1 and set errno to EINTR (as would
always have been done in Heritage UNIX).
-- If some data has already been copied out, return the
number of characters copied.

So no data would be lost.  Programs that wanted to keep
reading into the same buffer (presumably until a certain
terminator character is encountered or the buffer is full
or EOF) would have to loop, but a program that didn't loop
in that case was broken anyway: it probably wouldn't work
right were its input coming from a pipe or a network connection.

I don't remember any programs breaking when I made that change,
but since it's approaching 30 years since I did it, I don't
know whether I can trust my memory.  Others on this list may
have clearer memories.

All this was a reaction to the messy (both in semantics and
in implementation) compromise that had come from BSD, to
have separate `restart the system call' and `interrupt the
system call' states.  I could see why they did it, but was
never satisfied with the result.  If only I'd had my inspiration
some years earlier, when there was a chance of it getting out
into the real world and influencing POSIX and so on.  Oh, well.

Norman Wilson
Toronto ON


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-21 19:23 Norman Wilson
@ 2017-02-22  9:59 ` Tony Finch
  0 siblings, 0 replies; 15+ messages in thread
From: Tony Finch @ 2017-02-22  9:59 UTC (permalink / raw)


Norman Wilson <norman at oclsc.org> wrote:
>
> Sometime in the latter days of the Research system (somewhere
> between when the 9/e and 10/e manuals were published), I had
> an inspiration about that, and changed things as follows:
>
> When a system call like read is interrupted by a signal:
> -- If no characters have been copied into the user's
> buffer yet, return -1 and set errno to EINTR (as would
> always have been done in Heritage UNIX).
> -- If some data has already been copied out, return the
> number of characters copied.

Weird, I thought what you describe was the traditional behaviour, e.g.

https://www.freebsd.org/cgi/man.cgi?query=read&sektion=2&manpath=4.3BSD+NET%2F2
https://www.freebsd.org/cgi/man.cgi?query=read&sektion=2&manpath=SunOS+4.1.3

both say you only get EINTR if nothing was read. (This detail seems to go
back to the 4.1c manual.) Was this a BSDism?

Tony.
-- 
f.anthony.n.finch  <dot at dotat.at>  http://dotat.at/  -  I xn--zr8h punycode
Thames, Dover, Wight, Portland, Plymouth: West or southwest 6 to gale 8.
Moderate or rough. Occasional drizzle. Moderate occasionally poor.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
@ 2017-02-14 16:17 Noel Chiappa
  0 siblings, 0 replies; 15+ messages in thread
From: Noel Chiappa @ 2017-02-14 16:17 UTC (permalink / raw)


    > From: Random832

    > You could return the address of the last character read, and let the
    > user code do the math.

Yes, but that's still 'design the system call to work with interrupted and
re-started system calls'.

    > If the terminal is in raw/cbreak mode, the user code must handle a
    > "partial" read anyway, so returning five bytes is fine.

As in, if a software interrupt happens after 5 characters are read in, just
terminate the read() call and have it return 5? Yeah, I suppose that would
work.

    > If it's in canonical mode, the system call does not copy characters into
    > the user buffer until they have pressed enter.

I didn't remember that; that TTY code makes my head hurt! I've had to read it
(to add 8-bit input and output), but I can't remember all the complicated
details unless I'm looking at it!


    > Maybe there's some other case other than reading from a terminal that it
    > makes sense for, but I couldn't think of any while writing this post.

As the Bawden paper points out, probably a better example is _output_ to a
slow device, such as a console. If the thing has already printed 5 characters,
you can't ask for them back! :-)

So one can neither i) roll the system call back to make it look like it hasn't
started yet (as one could do, with input, by stuffing the characters back into
the input buffer with kernel ungetc()), or ii) wait for it to complete (since
that will delay delivery of the software interrupt). One can only interrupt
the call (and show that it didn't complete, i.e. an error), or have
re-startability (i.e. argument modification).

	Noel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
@ 2017-02-14 15:40 Noel Chiappa
  0 siblings, 0 replies; 15+ messages in thread
From: Noel Chiappa @ 2017-02-14 15:40 UTC (permalink / raw)

    > From: Lars Brinkhoff

    > Nick Downing <downing.nick at gmail.com> writes:

    >> By contrast the MIT guy probably was working with a much smaller/more
    >> economical system that didn't maintain a kernel stack per process.

I'm not sure I'd call ITS 'smaller'... :-)

    > PCLSRing is a feature of MIT' ITS operating system, and it does have a
    > separate stack for the kernel.

I wasn't sure if there was a separate kernel stack for each process; I checked
the ITS source, and there is indeed a separate stack per process. There are
also three other stacks in the kernel that are used from time to time (look
for 'MOVE P,' for places where the SP is loaded).

Oddly enough, it doesn't seem to ever _save_ the SP - there are no 'MOVEM P,'
instructions that I could find!

	Noel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
@ 2017-02-14  8:46 Paul Ruizendaal
  2017-02-14 11:27 ` Nick Downing
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Ruizendaal @ 2017-02-14  8:46 UTC (permalink / raw)



There's an odd comment in V6, in tty.c, just above ttread():

/*
 * Called from device's read routine after it has
 * calculated the tty-structure given as argument.
 * The pc is backed up for the duration of this call.
 * In case of a caught interrupt, an RTI will re-execute.
 */

That comment is strange, because it does not describe what the code does. The comment isn't there in V5 or V7.

I wonder if there is a link to the famous Gabriel paper about "worse is better" (http://dreamsongs.com/RiseOfWorseIsBetter.html). In arguing its points, the paper includes this story:

---
Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called PC loser-ing because the PC is being coerced into loser mode, where loser is the affectionate name for user at MIT.

The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing.

The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex. Besides, programmers could easily insert this extra test and loop. The MIT guy pointed out that the implementation was simple but the interface to the functionality was complex. The New Jersey guy said that the right tradeoff has been selected in Unix -- namely, implementation simplicity was more important than interface simplicity.
---

Actually, research Unix does save the complete state of a process and could back up the PC. The reason that it doesn't work is in the syscall API design, using registers to pass values etc. If all values were passed on the stack it would work. As to whether it is the right thing to be stuck in a read() call waiting for terminal input after a signal was received...

I always thought that this story was entirely fictional, but now I wonder. The Unix guru referred to could be Ken Thompson (note how he is first referred to as "from Berkeley but working on Unix" and then as "the New Jersey guy").

Who can tell me more about this? Any of the old hands?

Paul



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14  8:46 Paul Ruizendaal
@ 2017-02-14 11:27 ` Nick Downing
  2017-02-14 12:27   ` Paul Ruizendaal
  2017-02-14 14:03   ` Lars Brinkhoff
  0 siblings, 2 replies; 15+ messages in thread
From: Nick Downing @ 2017-02-14 11:27 UTC (permalink / raw)

Well I don't know about this actual conversation in history so I can't
help with that. But I can describe how interrupted system calls work.

Firstly it depends on what you mean by an interrupt. The guy from MIT
might have meant, "what happens if the process is blocked in a system
call and a task switch occurs". This isn't a problem in the design of
unix, because a process continues to have a normal program counter
(PC) and stack pointer (SP) whilst executing a system call. A task
switch can only occur when executing inside the kernel, and what it
does is to save the kernel PC and SP of the process going to sleep,
and then restore the kernel PC and SP of the process waking up. So the
system call that was being executed when the waking-up process went to
sleep, is NOT restarted as a fresh system call. By contrast the MIT
guy probably was working with a much smaller/more economical system
that didn't maintain a kernel stack per process.

I now have to digress a bit to discuss interrupts. There are two kinds
of interrupts, one is a direct hardware interrupt that gets delivered
to the kernel, such as a timer interrupt or an I/O event, e.g. for an
RS232 TTY interface this could be receiver ready (character just came
in) or transmitter ready (character just went out, can now transmit
another one). The other is a simulated interrupt that gets delivered
to a user process. This latter is what we call a "signal", some
examples are SIGINT (Ctrl-C was pressed) or SIGALRM (process requested
a timer interrupt and the requested period has elapsed). Note that
signals are not real interrupts, they are a "fiction" created by the
kernel that appears to the receiving process like an interrupt. This
was chosen by Thompson and Ritchie because it's convenient and "makes
sense" -- for exactly the same reasons that hardware designers created
hardware interrupts, clever software designers created signals.

One slight complication with hardware interrupts is they can happen in
either kernel mode (current process is doing a system call) or user
mode (current process is doing some computation in between system
calls). The kernel mode case is the simplest -- it behaves as if the
kernel had executed a subroutine call to the interrupt service routine
(ISR). The ISR has to be careful to save and restore any registers it
uses. It also cannot redirect execution flow anywhere else, it has to
service the interrupting hardware device quickly and then return. On
the other hand, if the hardware interrupt occurs in user mode, it
behaves as if the user-mode program had executed a null system call --
which simply does nothing, does not alter any registers or memory, and
returns. But once inside the kernel, it then behaves as if the
kernel-mode program had executed a subroutine call to the ISR. Once
this finishes, the kernel executes basically its normal system call
return sequence (but without any status return), requiring checks (1)
should process be pre-empted and (2) should signals be delivered.

If process is going to be pre-empted (either at the conclusion of a
system call, or else before the "fake" system call associated with a
hardware interrupt returns) then the kernel does a normal in-kernel
task switch, basically it goes to sleep and another process wakes up
in kernel mode. Then when the pre-empted process eventually gets
scheduled again, it resumes in kernel mode and continues with its
system call return sequence. If signals are going to be delivered,
this is where things get a bit interesting. Conceptually all we do is
to take the user-mode PC we were going to return to, and put it on the
user-mode stack (decrementing the user-mode SP by one word) and then
load the user-mode PC with the address of the user-mode signal hander,
and then continue the system call return sequence. So, to the
user-mode program, it looks like the system call completed, and then
the user program magically did a subroutine call to the signal
handler. If the system call was a "fake" system call due to a hardware
interrupt, then this appears asynchronous to the user program.

One way of thinking about a signal, is that it's a "callback" from the
operating system to tell you that something's happened. This is
particularly useful in the context of long-running system calls, like
for instance suppose I've used the alarm() system call to ask for a
SIGALRM to be delivered to my process in 30 seconds, and then I've
used the read() system call to get some input from the TTY. If the
read() system call is still blocked after 30 seconds, the SIGALRM
signal has to be delivered to me, and conceptually we would like the
kernel to call my signal handler to tell me about the SIGALRM, and
then return to the blocked read() system call, which conceptually
should not return until there is some input available. But alas this
conceptual model is too dangerous to implement, because of the risk to
the kernel from badly behaved signal handlers. The kernel can't risk
calling user code directly, in case that user code never returns.

So, in early versions of unix what happened was that if SIGALRM or
some other signal had to be delivered during a blocking operation, the
blocking operation would terminate returning the error EINTR, and then
in the normal system call return sequence the signal would be
delivered, and after the signal handler executed, the user code that
made the system call had to process the EINTR. Basically, every call
to read() or similar, had to check for an EINTR return, and then
restart the system call. Very annoying, and a source of bugs since
there would be plenty of little-used read() sites in the program that
didn't have the required checks. So, in my understanding at least, a
new error code was reserved, I think by the BSD designers, called
ERESTART, and the C library's system call routines like read() and
write() were modified to always do the looping that restarts the
system call, if the system call returns an error and the error is
ERESTART. EINTR is thus (in my understanding) redundant, except for a
few rare system calls where restarting isn't desirable or EINTR can be
requested.

With this innovation we can have callbacks from the operating system
at any time, even though the operating system is executing a blocking
operation, and if the signal handler returns normally (to the C
library code just after the system call, i.e. to the ERESTART check),
then from the user-mode foreground program's point of view it is
completely transparent, the read() only returns when it has data. But
if the signal handler instead chooses to, say, abort the user-mode
foreground program by doing a longjmp() to an error recovery routine,
and then restarting the user-mode foreground program from a known
point such as its main loop or main menu, then this is also fine, the
blocking system call appears to have been aborted so that the
foreground program could do its error recovery. And the kernel just
doesn't care, since once it's done its system call return sequence and
diverted the user-mode execution appropriately, it washes its hands of
it all.

And now having explained all that, I can address your point about the
interface complexity. To keep things simple, unix doesn't have the
ability to actually undo or restart a system call. For instance, if I
was reading the TTY and the kernel delivered some keystrokes into my
buffer, it can't restore the previous contents of the buffer and move
those keystrokes back into the TTY driver. Some other operating
systems probably can do this. And, for instance, x86 processors can do
something like this: If a pagefault occurs partway through the
execution of an instruction, the x86 processor will actually restore
all register values to their values prior to the instruction starting,
so that the pagefault handler can resume the user program cleanly
after a pagefault. This is kind of a good feature (the 68010 added
this feature fixing a significant issue in the 68000 which prevented
its widespread adoption in unix workstations), but it's also a potent
source of bugs, for instance early 286 silicon didn't correctly
restart some instructions after a segment-not-present or similar
fault. It's clear to see why Thompson and Ritchie didn't adopt such a
design. Instead what happens is, the blocking system calls are
specified so that they can only return EINTR or ERESTART if no data
has been transferred. If data has been transferred, they return a
"short read" or "short write".

To make my discussion complete I also have to mention something about
signal trampolines. To understand these properly it's best to think
about hardware interrupts (signals are just an emulation of a hardware
interrupt). Hardware interrupts always disable further interrupts of
the same kind while they execute. For instance suppose a TTY is
receiving characters very fast, it would be unacceptable to have the
ISR get interrupted by a fresh character arriving halfway through
stashing the last character. So the hardware always provides a way to
enter the ISR with interrupts of the same or lower priority as the one
under service, disabled. The ISR then has to inform the hardware when
these can be re-enabled. Moreover, this last step has to be done
*atomically* with the interrupt return -- it can't be done BEFORE the
interrupt return, because fast-arriving interrupts could each push a
new return address on the stack until the stack overflows.

Thus the hardware provides intructions like IRET (x86) which
atomically re-enable interrupts and return. What unix provides is a
system call that *behaves* like IRET, it is called sigreturn(). What
this does conceptually, is to re-enable the signal that just got
delivered, and then make the user-mode process do a
return-from-subroutine. This neatly reverses the steps taken when
delivering the signal, i.e. it pops the user-mode PC from the
user-mode stack, increasing the user-mode SP by one word. To make this
whole process more user-friendly, the C library provides something
called the signal trampoline. When the kernel wants to deliver a
signal, it first masks off further delivery of signals of the same
kind. Then, instead of making the user-mode program issue a subroutine
call to the signal handler, it makes the user-mode program call the
signal trampoline. In turn the signal trampoline calls the user's
signal handler, and when the user's signal handler returns (which is
not compulsory, it is allowed to do a longjmp() to error recovery code
instead), the signal trampoline does the sigreturn().

cheers, Nick

On Tue, Feb 14, 2017 at 7:46 PM, Paul Ruizendaal <pnr at planet.nl> wrote:
>
> There's an odd comment in V6, in tty.c, just above ttread():
>
> /*
>  * Called from device's read routine after it has
>  * calculated the tty-structure given as argument.
>  * The pc is backed up for the duration of this call.
>  * In case of a caught interrupt, an RTI will re-execute.
>  */
>
> That comment is strange, because it does not describe what the code does. The comment isn't there in V5 or V7.
>
> I wonder if there is a link to the famous Gabriel paper about "worse is better" (http://dreamsongs.com/RiseOfWorseIsBetter.html). In arguing its points, the paper includes this story:
>
> ---
> Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called PC loser-ing because the PC is being coerced into loser mode, where loser is the affectionate name for user at MIT.
>
> The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing.
>
> The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex. Besides, programmers could easily insert this extra test and loop. The MIT guy pointed out that the implementation was simple but the interface to the functionality was complex. The New Jersey guy said that the right tradeoff has been selected in Unix -- namely, implementation simplicity was more important than interface simplicity.
> ---
>
> Actually, research Unix does save the complete state of a process and could back up the PC. The reason that it doesn't work is in the syscall API design, using registers to pass values etc. If all values were passed on the stack it would work. As to whether it is the right thing to be stuck in a read() call waiting for terminal input after a signal was received...
>
> I always thought that this story was entirely fictional, but now I wonder. The Unix guru referred to could be Ken Thompson (note how he is first referred to as "from Berkeley but working on Unix" and then as "the New Jersey guy").
>
> Who can tell me more about this? Any of the old hands?
>
> Paul
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14 11:27 ` Nick Downing
@ 2017-02-14 12:27   ` Paul Ruizendaal
  2017-02-14 12:46     ` Nick Downing
  2017-02-14 14:03   ` Lars Brinkhoff
  1 sibling, 1 reply; 15+ messages in thread
From: Paul Ruizendaal @ 2017-02-14 12:27 UTC (permalink / raw)

Hi Nick,

Many thanks for that background!

I think the quote from the Gabriel paper indeed refers to software
interrupts, i.e. signals -- it would not make sense otherwise. The
ITS system that the MIT guy referred to is 'large', it ran on PDP10
mainframes.

I understand how executing a signal handler is piggy-backed on the
return from kernel mode. However, when the signal handler is
finished it could either continue with the next instruction or
re-excute the system call trap instruction. See
http://minnie.tuhs.org/cgi-bin/utree.pl?file=2.9BSD/usr/src/sys/sys/trap.c
(towards end) for details how this is actually done in 2.9BSD.
I think you referred to that mechanism as well.

However, my question remains: why is that mysterious comment there,
above ttread() in V6, and is there a link with the Gabriel story?

Paul

On 14 Feb 2017, at 12:27 , Nick Downing wrote:

> Well I don't know about this actual conversation in history so I can't
> help with that. But I can describe how interrupted system calls work.
> [..more..]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14 12:27   ` Paul Ruizendaal
@ 2017-02-14 12:46     ` Nick Downing
  0 siblings, 0 replies; 15+ messages in thread
From: Nick Downing @ 2017-02-14 12:46 UTC (permalink / raw)

Yes, you are right, I had not paid attention to that pc=opc stuff, in
fact 2.9 has a comment saying it's backing the PC up but the other
BSDs do not, so I hadn't noticed that bit. :) I was probably thinking
of another unix that implements it in the C library not the kernel,
however it makes no difference conceptually. Interestingly, 2.11BSD
has ERESTART defined as an errno and does the pc=opc thing if ERESTART
was to have been returned as the errno. Whereas the other BSDs have
another variable eosys which has just a few possible values, where one
of those values (NORMALRETURN or some such) means that errno should be
checked as well. I like the 2.11BSD way. V7 does not have the pc=opc
thing and there is no mention of restarting, so I suppose EINTR just
aborts the interrupted system call.
cheers, Nick

On Tue, Feb 14, 2017 at 11:27 PM, Paul Ruizendaal <pnr at planet.nl> wrote:
> Hi Nick,
>
> Many thanks for that background!
>
> I think the quote from the Gabriel paper indeed refers to software
> interrupts, i.e. signals -- it would not make sense otherwise. The
> ITS system that the MIT guy referred to is 'large', it ran on PDP10
> mainframes.
>
> I understand how executing a signal handler is piggy-backed on the
> return from kernel mode. However, when the signal handler is
> finished it could either continue with the next instruction or
> re-excute the system call trap instruction. See
> http://minnie.tuhs.org/cgi-bin/utree.pl?file=2.9BSD/usr/src/sys/sys/trap.c
> (towards end) for details how this is actually done in 2.9BSD.
> I think you referred to that mechanism as well.
>
> However, my question remains: why is that mysterious comment there,
> above ttread() in V6, and is there a link with the Gabriel story?
>
> Paul
>
> On 14 Feb 2017, at 12:27 , Nick Downing wrote:
>
>> Well I don't know about this actual conversation in history so I can't
>> help with that. But I can describe how interrupted system calls work.
>> [..more..]
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14 11:27 ` Nick Downing
  2017-02-14 12:27   ` Paul Ruizendaal
@ 2017-02-14 14:03   ` Lars Brinkhoff
  2017-02-14 14:18     ` Nick Downing
  1 sibling, 1 reply; 15+ messages in thread
From: Lars Brinkhoff @ 2017-02-14 14:03 UTC (permalink / raw)


Nick Downing <downing.nick at gmail.com> writes:
> By contrast the MIT guy probably was working with a much smaller/more
> economical system that didn't maintain a kernel stack per process.

No.  PCLSRing is a feature of MIT' ITS operating system, and it does
have a separate stack for the kernel.

Here is a copy of Alan Bawdens paper about PCLSRing:
http://fare.tunes.org/tmp/emergent/pclsr.htm


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14 14:03   ` Lars Brinkhoff
@ 2017-02-14 14:18     ` Nick Downing
  2017-02-14 15:50       ` Random832
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Downing @ 2017-02-14 14:18 UTC (permalink / raw)

Excellent paper, well that makes it completely clear what the MIT guy
means by restarting a system call. It is interesting that in their
approach they restart a read() or write() call or whatever they call
it in their system, with the buffer advanced and the count reduced.
This is a bit like what would happen in x86 if it gets interrupted in
a REP MOVSB instruction, it returns into REP MOVSB with SI/DI advanced
and CX reduced. However it would not work exactly right in unix due to
the return value from read()/write() being wrong. Anyway, I like the
unix way, it is nice and simple. I have never found it to be a problem
when devices return a "short read" or "short write", although it is an
interesting semantic that if you opened the file yourself and you're
doing lseek on it, it cannot be a TTY, and short reads/writes do not
occur. Having a hypothetical system with a very slow disk (or fast
CPU) in which disk accesses are blocking, would break many programs.
cheers, Nick

On Wed, Feb 15, 2017 at 1:03 AM, Lars Brinkhoff <lars at nocrew.org> wrote:
> Nick Downing <downing.nick at gmail.com> writes:
>> By contrast the MIT guy probably was working with a much smaller/more
>> economical system that didn't maintain a kernel stack per process.
>
> No.  PCLSRing is a feature of MIT' ITS operating system, and it does
> have a separate stack for the kernel.
>
> Here is a copy of Alan Bawdens paper about PCLSRing:
> http://fare.tunes.org/tmp/emergent/pclsr.htm

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [TUHS] Another odd comment in V6
  2017-02-14 14:18     ` Nick Downing
@ 2017-02-14 15:50       ` Random832
  0 siblings, 0 replies; 15+ messages in thread
From: Random832 @ 2017-02-14 15:50 UTC (permalink / raw)

On Tue, Feb 14, 2017, at 09:18, Nick Downing wrote:
> Having a hypothetical system with a very slow disk (or fast
> CPU) in which disk accesses are blocking, would break many programs.
> cheers, Nick

That's not very hypothetical, but it is probably the reason disk access
(and even access to regular files on networked filesystems) isn't
interruptible, and non-blocking I/O useless in such situations, on some
modern systems.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-02-22  9:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-14 14:14 [TUHS] Another odd comment in V6 Noel Chiappa
2017-02-14 14:35 ` Paul Ruizendaal
2017-02-14 15:48 ` Random832
2017-02-14 16:06   ` Dan Cross
  -- strict thread matches above, loose matches on Subject: below --
2017-02-21 19:23 Norman Wilson
2017-02-22  9:59 ` Tony Finch
2017-02-14 16:17 Noel Chiappa
2017-02-14 15:40 Noel Chiappa
2017-02-14  8:46 Paul Ruizendaal
2017-02-14 11:27 ` Nick Downing
2017-02-14 12:27   ` Paul Ruizendaal
2017-02-14 12:46     ` Nick Downing
2017-02-14 14:03   ` Lars Brinkhoff
2017-02-14 14:18     ` Nick Downing
2017-02-14 15:50       ` Random832

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).