Re: [9fans] rfork(), getss() etc etc


* Re: [9fans] rfork(), getss() etc etc
@ 2000-09-02  9:49 nigel
  2000-09-02 10:52 ` Alexander Viro
  0 siblings, 1 reply; 9+ messages in thread
From: nigel @ 2000-09-02  9:49 UTC (permalink / raw)
  To: 9fans

>>	Check the examples of use. Really.

Linuxthreads is a very good example.

>>	Ferchrissake, you've explicitly asked for shared address space. It

Well, yes, but stacks are a special case. Each process has to take care
not to write outside of its own data, fine. This is called good programming
technique, and does not require any special coding.

Ensuring that the stack is not overflowed requires either compiler
assistance, or contortions in programming.

If I thought that splitting the stack was an unreasonable thing to ask for,
I wouldn't. The thing is, fork() does it so it can't be hard. Also, I have solid
examples of operating systems which provide a choice.

>>	(Thread_Data *) ((ESP & -Alignment) + Alignment - sizeof(Thread_Data))

Hadn't escaped my radar. We're getting into machine dependency here again,
but it is a solution that I had tried.

>>	consequences - you are welcome, just let's avoid imitating *.advocacy.

One man's advocacy is another man's technical discussion. I simply do not
buy the "clone() is perfect and cannot be changed" attitude, or for that
matter the "FreeBSD rfork() is perfect and cannot be changed" attitude either.
In both cases, the problem could be solved by adding a spot of functionality,
and taking away none.

So one could add this feature, break nothing, and aid a whole class of applications.
Why not?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [9fans] rfork(), getss() etc etc
  2000-09-02  9:49 [9fans] rfork(), getss() etc etc nigel
@ 2000-09-02 10:52 ` Alexander Viro
  2000-09-03  2:51   ` Scott Schwartz
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Viro @ 2000-09-02 10:52 UTC (permalink / raw)
  To: 9fans



On Sat, 2 Sep 2000 nigel@9fs.org wrote:

> >>	(Thread_Data *) ((ESP & -Alignment) + Alignment - sizeof(Thread_Data))
> 
> Hadn't escaped my radar. We're getting into machine dependency here again,
> but it is a solution that I had tried.
> 
> >>	consequences - you are welcome, just let's avoid imitating *.advocacy.
> 
> One man's advocacy is another man's technical discussion. I simply do not
> buy the "clone() is perfect and cannot be changed" attitude, or for that
> matter the "FreeBSD rfork() is perfect and cannot be changed" attitude either.
FreeBSD rfork() still can be used to panic the box ;-/

> In both cases, the problem could be solved by adding a spot of functionality,
> and taking away none.
> 
> So one could add this feature, break nothing, and aid a whole class of applications.
> Why not?

	OK, I'll try to describe the reasons. Let's hope that I'm awake
enough to do that...

	Splitting the stack means that we are getting two classes of 
pointers - stack and non-stack ones. E.g. if you are doing coroutines you
can't pass the pointers to auto variables even if their lifetimes are OK.
It's not nice, to put it mildly.
	On the kernel side we would have to use separate page tables for
every process. Even if they share VM context. It has a lot of interesting
implications. One of them is that unmapping becomes very expensive.
Another, and that's more serious, is that _every_ context switch leads to
complete TLB flush.
	You will not notice the effect simply benchmarking schedule(),
but you will get big slowdown spread over the userland. It's not a pure
theory - effect is quite visible.
	Trying to work around that would give serious mess in VM code. If
you have a clean way to do that - great. So far all proposals were
extremely messy.
	Moreover, we _have_ support for large amount of mappings. It makes
the situation very different - solution that works for Plan 9 will break
horribly on such types of use. "Don't do it, then" is a nice policy, but
it works both ways.
	Having the "same VM - same memory" policy simplifies the things
big way.

	IOW, mixing these things will require serious changes in the
kernel that will try it and I'm less than sure that it's worth the
trouble. Same goes for doing tons of segments on the Plan 9 side (bloated
kernel memory and serious slowdown  or  changing the data structures in
not-too-obvious ways).

	Features may be nice, but they must be doable in clean way.
Otherwise you end up with SVR4 on hands.

	One of the things that might make sense on our side would be
sharing a VMA (more or less equivalent to Plan 9 segment) between several
VMs. Right now I don't see how to do it without very bad behaviour in case
of VM with many areas. Hell knows, it might be doable. However, semantics
of mmap() becomes rather interesting with such change. And life without a
feature is IMO better than life with an ugly kludge.




* Re: [9fans] rfork(), getss() etc etc
  2000-09-02 10:52 ` Alexander Viro
@ 2000-09-03  2:51   ` Scott Schwartz
  2000-09-03  3:03     ` Boyd Roberts
  2000-09-05  5:32     ` Erik Theisen
  0 siblings, 2 replies; 9+ messages in thread
From: Scott Schwartz @ 2000-09-03  2:51 UTC (permalink / raw)
  To: 9fans

| 	You will not notice the effect simply benchmarking schedule(),
| but you will get big slowdown spread over the userland. It's not a pure
| theory - effect is quite visible.

Just out of curiosity, do any real linux programs use clone or pthreads
or whatever?  Say what you like about rfork, but lots of real programs
make good use of it.


* Re: [9fans] rfork(), getss() etc etc
  2000-09-03  2:51   ` Scott Schwartz
@ 2000-09-03  3:03     ` Boyd Roberts
  2000-09-05  5:32     ` Erik Theisen
  1 sibling, 0 replies; 9+ messages in thread
From: Boyd Roberts @ 2000-09-03  3:03 UTC (permalink / raw)
  To: 9fans

From: Scott Schwartz <schwartz@bio.cse.psu.edu>

> Just out of curiosity, do any real linux programs use clone or pthreads
> or whatever

i could believe that there is a set of linux programs, somewhere,
that use any random collection of linux braindamage, but it's just
a gut feeling :-)





* Re: [9fans] rfork(), getss() etc etc
  2000-09-03  2:51   ` Scott Schwartz
  2000-09-03  3:03     ` Boyd Roberts
@ 2000-09-05  5:32     ` Erik Theisen
  1 sibling, 0 replies; 9+ messages in thread
From: Erik Theisen @ 2000-09-05  5:32 UTC (permalink / raw)
  To: 9fans

on 02.09.00 22:51, Scott Schwartz at schwartz@bio.cse.psu.edu wrote:

> Just out of curiosity, do any real linux programs use clone or pthreads
> or whatever?  Say what you like about rfork, but lots of real programs
> make good use of it.
> 

I believe that the latest versions of StarOffice use clone().





* Re: [9fans] rfork(), getss() etc etc
  2000-09-02  9:31   ` Alexander Viro
@ 2000-09-02  9:39     ` Alexander Viro
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander Viro @ 2000-09-02  9:39 UTC (permalink / raw)
  To: 9fans



Arrgh. Sorry, I really need more coffee...

> 	register int (*fn)(void *) = _fn;
> 	register void *arg = _arg;
> 	register unsigned flags = _flags;
> 	register int pid;
> 
- 	/* new_sp ignored unless flags has CLONE_VM set */
+	/* new_sp ignored if it is 0 */
> 	pid = _syscall2(__NR_CLONE, flags, new_sp);
- 	if ((flags & CLONE_VM) && pid == 0)
+	if (pid == 0)
> 		exit((*fn)(arg));
> 	return pid;




* Re: [9fans] rfork(), getss() etc etc
  2000-09-02  8:57 ` Alexander Viro
@ 2000-09-02  9:31   ` Alexander Viro
  2000-09-02  9:39     ` Alexander Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Viro @ 2000-09-02  9:31 UTC (permalink / raw)
  To: 9fans



On Sat, 2 Sep 2000, I wrote:

> On Sat, 2 Sep 2000 nigel@9fs.org wrote:
> 
> > Now, what is the problem with this? Firstly, the only way to tell whether
> > you are parent or child after the split is to check the return result from
> > the system call. The inevitable conclusion is that assembly code is
> > required to establish a new stack. This is a retrograde step, and I am
> 
> 	Simply not true. Kernel is perfectly able to set the usermode ESP
> before returning to userland. Code that does transition from the kernel
> mode to user mode is in the kernel. Usually it's an assembler (check
> forkret() in /sys/src/9/pc/l.s for exact parallel). Picking the right ESP
> value happens in IRETL, same way on all systems in question. Nothing
> special in userland.

	Damn. Sorry, I've realized what you might mean right after sending
the reply ;-/ Yes, there is some userland trickery. Unlike other system
calls this beast does essentially

	register int (*fn)(void *) = _fn;
	register void *arg = _arg;
	register unsigned flags = _flags;
	register int pid;

	/* new_sp ignored unless flags has CLONE_VM set */
	pid = _syscall2(__NR_CLONE, flags, new_sp);
	if ((flags & CLONE_VM) && pid == 0)
		exit((*fn)(arg));
	return pid;

with some gcc-isms to _force_ these variables to be in registers.

Since the thing is in the library _and_ contains the assembler anyway (as
any other system call wrapper, be it on Linux, Plan 9 or FreeBSD -
INTR is hardly pure C ;-)... Yes, you have some point, but not too
serious one. Ironic, since the unusual (compared to other system calls)
part doesn't require any assembler...
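
For comparison, here is a minimal sketch of the same pattern using the
clone() wrapper that glibc exports, rather than the raw system call shown
above. It assumes Linux with glibc; the names child_main,
run_child_sharing_memory and CHILD_STACK_SIZE are made up for the
illustration, and the stack is simply a malloc'ed block that the kernel
neither manages nor protects.

```c
#define _GNU_SOURCE		/* for clone() and CLONE_VM in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)	/* hypothetical size, picked for the sketch */

static volatile int shared_value;	/* seen by both sides because of CLONE_VM */

/* Runs in the child, on the stack the parent allocated. */
static int
child_main(void *arg)
{
	shared_value = *(int *)arg;
	return 0;	/* becomes the child's exit status */
}

/* Start a child sharing our address space; return what it stored, or -1. */
static int
run_child_sharing_memory(int value)
{
	char *stack;
	pid_t pid;
	int status;

	stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;
	shared_value = 0;

	/* The stack grows down here, so pass the highest address of the block. */
	pid = clone(child_main, stack + CHILD_STACK_SIZE,
		    CLONE_VM | SIGCHLD, &value);
	if (pid == -1) {
		free(stack);
		return -1;
	}
	/* SIGCHLD in the flags lets a plain waitpid() reap the child. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;	/* child may still be running, so leak the stack */
	assert(WIFEXITED(status));
	free(stack);
	return shared_value;
}
```

On such a system run_child_sharing_memory(42) should return 42: the child
writes straight into the parent's memory, which is exactly the lack of
protection being argued about in this thread.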




* Re: [9fans] rfork(), getss() etc etc
  2000-09-02  7:50 nigel
@ 2000-09-02  8:57 ` Alexander Viro
  2000-09-02  9:31   ` Alexander Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Viro @ 2000-09-02  8:57 UTC (permalink / raw)
  To: 9fans



On Sat, 2 Sep 2000 nigel@9fs.org wrote:

> Now, what is the problem with this? Firstly, the only way to tell whether
> you are parent or child after the split is to check the return result from
> the system call. The inevitable conclusion is that assembly code is
> required to establish a new stack. This is a retrograde step, and I am

	Simply not true. Kernel is perfectly able to set the usermode ESP
before returning to userland. Code that does transition from the kernel
mode to user mode is in the kernel. Usually it's an assembler (check
forkret() in /sys/src/9/pc/l.s for exact parallel). Picking the right ESP
value happens in IRETL, same way on all systems in question. Nothing
special in userland.

> staggered to find open source systems promoting the use of assembly
> code by including system call variants which cannot be sensibly used
> without.

	Check the examples of use. Really.

> Secondly, the stack now established is not managed or protected by
> the kernel. Good grief, this is what we all criticise Win9x and MacOS
> for.

	Ferchrissake, you've explicitly asked for shared address space. It
might be stupid, but if you are asking for that, you _are_ getting what
you ask for. Come on, clone() may be good or bad, but lack of memory
protection is precisely what you are asking for. And getting on all
systems in question.

> Thirdly, you can only identify which process you are by calling getpid(),
> rather than referencing a data structure in your own stack. This is
> expensive, as it involves a system call, plus some form of mapping
> (?hash table?) from pid to per process data structure.
> 
> This might be why gettss() was used. It produces a number with
> a smaller range than getpid(), allowing a simple index to per process
> data.

	Sigh... I _really_ doubt that folks who had written it need your
protection. Or unable to speak themselves if they would want to. Not that
there was much to speak about - code relied on seriously non-portable hack
that was bound to break at some point. It happened. There are more
portable and clean solutions, but they rely on the thing that was missing
in manpages. So there are two things to fix: manpages on Linux and TSS
hack.

	There are good reasons behind both variants of semantics. There
are _very_ good reasons not to change either of them. There are relatively
simple ways to get thread-specific data with multiple-stacks one.
Example: map the areas on aligned boundary, put the thread-specific data
in the bottom of the area and use the equivalent of

(Thread_Data *) ((ESP & -Alignment) + Alignment - sizeof(Thread_Data))

for access to the current one. Turn that into inlined function and you've
	* got rid of hash/array/whatever - you have a function that
returns a pointer to the structure you need.
	* made it faster than it used to be (1 register->register
assignment + 2 binary operations with one of the arguments being constant
- cheaper than extracting TSS and accessing element of array;  definitely
cheaper than any games with hashes).
	* made it independent from the implementation of context switches
used in the kernel.

	IMO it's a win compared to TSS hack. If nothing else, it works ;-)
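
A rough sketch of that inlined function, under stated assumptions: every
thread stack is STACK_ALIGN bytes long and mapped on a STACK_ALIGN
boundary, and the fields of Thread_Data are invented for the example.
STACK_ALIGN, thread_data_from_sp and current_thread_data are hypothetical
names, and the address of a local variable stands in for reading ESP so
that the code stays in portable C. The arithmetic is done on integers
before the cast, so the alignment and the structure size are both counted
in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: size and alignment of every thread stack area (a power of two). */
#define STACK_ALIGN ((uintptr_t)1 << 16)

typedef struct Thread_Data {
	int id;			/* made-up fields for the illustration */
	void *private_data;
} Thread_Data;

/*
 * Given any address inside a thread's stack area, return the Thread_Data
 * stored at the high end of that area (the "bottom" of a stack that
 * grows down).
 */
static inline Thread_Data *
thread_data_from_sp(uintptr_t sp)
{
	uintptr_t base = sp & -STACK_ALIGN;	/* round down to the area start */

	return (Thread_Data *)(base + STACK_ALIGN - sizeof(Thread_Data));
}

/*
 * Only meaningful on a thread whose stack was mapped as described above.
 * The address of a local is used instead of reading ESP directly.
 */
static inline Thread_Data *
current_thread_data(void)
{
	char marker;

	return thread_data_from_sp((uintptr_t)&marker);
}
```

Any two addresses inside the same aligned area map to the same
Thread_Data, with no system call and no table lookup.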

	As far as I'm concerned that's it. If somebody wants to discuss
the reasons behind the semantics or compare implementations and their
consequences - you are welcome, just let's avoid imitating *.advocacy.




* [9fans] rfork(), getss() etc etc
@ 2000-09-02  7:50 nigel
  2000-09-02  8:57 ` Alexander Viro
  0 siblings, 1 reply; 9+ messages in thread
From: nigel @ 2000-09-02  7:50 UTC (permalink / raw)
  To: 9fans

Hmm.

clone() not splitting the stack is a feature of Linux, and probably not
worth wasting brain power over. The fact that rfork() in FreeBSD copied
clone() semantics, but the Plan 9 manual page, is hard to
believe. I did get some indication recently that the FreeBSD manual page
would be changed to own up to the fact that it was rfork() by name
but clone() by nature! FreeBSD does not split either because of the way
the VM works (reading between the lines, allegedly).

Now, what is the problem with this? Firstly, the only way to tell whether
you are parent or child after the split is to check the return result from
the system call. The inevitable conclusion is that assembly code is
required to establish a new stack. This is a retrograde step, and I am
staggered to find open source systems promoting the use of assembly
code by including system call variants which cannot be sensibly used
without.

Secondly, the stack now established is not managed or protected by
the kernel. Good grief, this is what we all criticise Win9x and MacOS
for.

Thirdly, you can only identify which process you are by calling getpid(),
rather than referencing a data structure in your own stack. This is
expensive, as it involves a system call, plus some form of mapping
(?hash table?) from pid to per process data structure.

This might be why gettss() was used. It produces a number with
a smaller range than getpid(), allowing a simple index to per process
data.
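
To make the cost in the third point concrete, here is a rough sketch of
the getpid() route: one system call, then a probe of a small
open-addressing table keyed by pid. Everything in it (Proc_Data, NSLOTS,
proc_data_lookup, current_proc_data) is invented for the illustration and
is not taken from any of the systems discussed; a real version shared
between processes would also need locking around the table.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define NSLOTS 64	/* hypothetical table size, must be a power of two */

typedef struct Proc_Data {
	pid_t pid;
	int in_use;
	int value;	/* stands in for the real per-process state */
} Proc_Data;

static Proc_Data slots[NSLOTS];

/* Find the entry for pid by linear probing; optionally claim a free slot. */
static Proc_Data *
proc_data_lookup(pid_t pid, int create)
{
	size_t start = (size_t)pid & (NSLOTS - 1);
	size_t i;

	for (i = 0; i < NSLOTS; i++) {
		Proc_Data *p = &slots[(start + i) & (NSLOTS - 1)];

		if (p->in_use && p->pid == pid)
			return p;
		if (!p->in_use) {
			if (!create)
				return NULL;
			p->in_use = 1;
			p->pid = pid;
			return p;
		}
	}
	return NULL;	/* table full */
}

/* The expensive path: a system call followed by the table probe. */
static Proc_Data *
current_proc_data(void)
{
	return proc_data_lookup(getpid(), 1);
}
```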

The Vita Nuova FreeBSD port of the Inferno emu uses rfork() and getpid().

