9fans - fans of the OS Plan 9 from Bell Labs
From: Linus Torvalds <torvalds@osdl.org>
To: Geoff Collyer <geoff@collyer.net>
Cc: 9fans@cse.psu.edu
Subject: Re: [9fans] Re: Threads: Sewing badges of honor onto a Kernel
Date: Fri, 27 Feb 2004 01:30:57 -0800
Message-ID: <Pine.LNX.4.58.0402270058480.2563@ppc970.osdl.org>
In-Reply-To: <a7ec3985c6aeabd43949005aa6af0f67@collyer.net>



On Fri, 27 Feb 2004, Geoff Collyer wrote:
>
> I think we're talking past each other due to different terminologies.
> Linus seems to use `thread' to mean `a process sharing address space
> (other than normal text segment sharing)', whereas in Plan 9, that's
> just a process; some share address space, some don't.  A Plan 9 thread
> is entirely a user-mode creation of the Plan 9 thread library, which
> doesn't implement POSIX threads.

Well, what Linux has is really what I privately call a "context of
execution".

The "clone()" system call in Linux just creates a new such "context of
execution", and you can choose to arbitrarily share pretty much any OS
state, by just saying which state you want to share in a bitmap. In
addition to the bitmap there are a few pointers you pass around; the full
required state is actually

	clone_flags: bitmap of how to create the new context of execution
	newsp: stack pointer of new context
	parent tidptr: pointer (in the parent) to the thread ID information
	child tidptr: pointer (in the child) to the thread ID information
	tls pointer: pointer to TLS (thread-local-storage) for the context

but not all of them are necessarily used (ie if you don't want to set TLS
or TID information, those pointers are obviously unused).

The bits you can control the context copy with are:

  CSIGNAL         	/* signal mask to be sent at exit */

  CLONE_VM        	/* set if VM shared between processes */
  CLONE_FS        	/* set if fs info shared between processes */
  CLONE_FILES     	/* set if open files shared between processes */
  CLONE_SIGHAND   	/* set if signal handlers and blocked signals shared */
  CLONE_IDLETASK  	/* set if new pid should be 0 (kernel only)*/
  CLONE_PTRACE    	/* set if we want to let tracing continue on the child too */
  CLONE_VFORK     	/* set if the parent wants the child to wake it up on mm_release */
  CLONE_PARENT    	/* set if we want to have the same parent as the cloner */
  CLONE_THREAD    	/* Same thread group? */
  CLONE_NEWNS     	/* New namespace group? */
  CLONE_SYSVSEM   	/* share system V SEM_UNDO semantics */
  CLONE_SETTLS    	/* create a new TLS for the child */
  CLONE_PARENT_SETTID   /* set the TID in the parent */
  CLONE_CHILD_CLEARTID  /* clear the TID in the child */
  CLONE_DETACHED        /* Unused, ignored */
  CLONE_UNTRACED        /* set if the tracing process can't force CLONE_PTRACE on this clone */
  CLONE_CHILD_SETTID    /* set the TID in the child */
  CLONE_STOPPED         /* Start in stopped state */

(CSIGNAL isn't a bit - it's the low 8 bits, and it specifies the signal
you want to send to your parent when you die).

So a "fork()" is literally really just a "clone(SIGCHLD)". We're saying
that we don't want to share anything, and that we want to send a SIGCHLD
at exit.
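A minimal sketch of that equivalence, assuming Linux with the glibc
"clone()" wrapper (which takes a function and a stack rather than
returning twice like fork). The names "fork_like", "exit_signal_of",
"child_fn" and the stack size are made up for illustration:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)	/* arbitrary size for this sketch */

static int counter = 0;

/* Runs in the new context; without CLONE_VM the write hits a private copy. */
static int child_fn(void *arg)
{
	(void)arg;
	counter = 42;
	return 7;	/* becomes the child's exit status */
}

/* Hypothetical helper: extract the exit signal from a clone flags word. */
static int exit_signal_of(int flags)
{
	return flags & CSIGNAL;	/* the low 8 bits */
}

/*
 * Hypothetical helper: create a fork-like context (nothing shared,
 * SIGCHLD at exit) and wait for it.
 * Returns the child's exit status, or -1 on failure.
 */
static int fork_like(void)
{
	/* SIGCHLD has to fit in the low 8 bits to be usable as exit signal. */
	assert(exit_signal_of(SIGCHLD) == SIGCHLD);

	char *stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	/* The stack grows down on most architectures, so pass its top. */
	pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, SIGCHLD, NULL);
	if (pid == -1) {
		free(stack);
		return -1;
	}

	int status = 0;
	pid_t reaped = waitpid(pid, &status, 0);
	free(stack);
	if (reaped != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling "fork_like()" should hand back the child's exit status while
leaving "counter" untouched in the parent, since nothing was shared.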

Setting the CLONE_VM bit says that the VM gets shared. That means that
instead of copying the page tables, we just copy the pointer to the
"struct mm_struct", which describes everything in the VM, and we increment
its reference count.
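To see the effect from user space, here is a small sketch, again
assuming Linux and the glibc "clone()" wrapper. The helper name
"value_seen_by_parent" and the variable "shared_value" are invented for
the example; the child stores to a global and the parent checks whether
the store is visible after the child has exited:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)	/* arbitrary size for this sketch */

/* volatile so the parent really re-reads memory after the child ran */
static volatile int shared_value;

static int child_fn(void *arg)
{
	(void)arg;
	shared_value = 1;
	return 0;
}

/*
 * Hypothetical helper: run child_fn in a context created with `flags`
 * and return the value of shared_value the parent sees afterwards
 * (or -1 on failure).
 */
static int value_seen_by_parent(int flags)
{
	/* Plain waitpid() only reaps children whose exit signal is SIGCHLD. */
	assert((flags & CSIGNAL) == SIGCHLD);

	char *stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	shared_value = 0;
	pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, NULL);
	if (pid == -1) {
		free(stack);
		return -1;
	}

	/* Only free the stack once the child is gone: with CLONE_VM it is
	 * the very same memory the child is running on. */
	pid_t reaped = waitpid(pid, NULL, 0);
	free(stack);
	if (reaped != pid)
		return -1;
	return shared_value;
}
```

With "CLONE_VM | SIGCHLD" the parent should see the child's store; with
plain "SIGCHLD" it should not, because the child wrote to its own copy.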

There is no "partial copy". If you say that you want to share the VM, you
get the WHOLE VM. Or you will get a totally private VM. Similarly, if you
say that you want to share the file descriptors (CLONE_FILES), they will
all be shared: one context doing an "open()" will have that fd be valid in
all other contexts that share it.

(The difference between CLONE_FILES and a regular fork() is that a
CLONE_FILES will increment just _one_ reference count: the reference count
for the whole array of pointers to files. In contrast, a fork-like
non-shared case will create a whole new array of pointers to files, and
then for each file increment the reference count for that file).

What most "unix people" call threads is something that is created with
pretty much all flags set - we share pretty much everything except for the
register state and the kernel stack between the contexts. And when I say
"share", I really mean share: most of the bits end up being copying a
kernel pointer and incrementing the reference count for that object.

Some of the bits are "administrative": the VFORK bit isn't about sharing,
it's about the parent waiting until the child releases the VM back to it
(btw, that uses a "completion" structure on the parent's stack). Similarly,
the SETTID/CLEARTID bits are about writing the TID ("thread ID" as opposed
to "process ID") to the VM space atomically with the creation (or in the
case of CLEARTID, teardown) of the thread. That ends up helping the thread
management (from user space) a _lot_.
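As a rough illustration of the SETTID side, the sketch below asks the
kernel to store the new TID into a variable in the parent, assuming
Linux and the glibc "clone()" wrapper, whose optional trailing
arguments are the parent TID pointer, the TLS pointer and the child TID
pointer. The helper name "parent_settid_matches" is hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)	/* arbitrary size for this sketch */

static int child_fn(void *arg)
{
	(void)arg;
	return 0;
}

/*
 * Hypothetical helper: create a context with CLONE_PARENT_SETTID and
 * compare the TID the kernel stored with the value clone() returned.
 * Returns 1 if they match, 0 if they differ, -1 on failure.
 */
static int parent_settid_matches(void)
{
	pid_t stored_tid = -1;	/* filled in by the kernel */

	char *stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	/* TLS and child TID pointers are unused here, so pass NULL. */
	pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
			  CLONE_PARENT_SETTID | SIGCHLD, NULL,
			  &stored_tid, NULL, NULL);
	if (pid == -1) {
		free(stack);
		return -1;
	}

	/* The store has already happened by the time clone() returned. */
	int matches = (stored_tid == pid);

	pid_t reaped = waitpid(pid, NULL, 0);
	assert(reaped == pid);
	free(stack);
	return matches;
}
```

A thread library can rely on this to learn the TID without a separate
call and without a window where the new context exists but its ID is
not yet recorded.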

(Tangential to this discussion is the TLS or "thread local storage" bit -
some architecture-specific way of indicating a small thread-specific
storage area. It's not the stack, it's just a regular allocation, and
different architectures have different ways of pointing to it. Usually
there's some architected register set aside for it).

And you can mix and match things. You can literally create a new context
that shares the file descriptors (so that one process doing an "open()"
will open files in the other one), but doesn't share the VM space.
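A sketch of exactly that mix, assuming Linux and the glibc "clone()"
wrapper. Because the VM is not shared, the child cannot pass the fd
number back through memory, so this example (with the made-up helper
"fd_visible_in_parent") smuggles it out through the exit status:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)	/* arbitrary size for this sketch */
#define OPEN_FAILED 255			/* exit status reserved for errors */

/* Opens a file and reports the fd number through the exit status. */
static int child_fn(void *arg)
{
	(void)arg;
	int fd = open("/dev/null", O_RDONLY);
	if (fd < 0 || fd >= OPEN_FAILED)
		return OPEN_FAILED;
	return fd;
}

/*
 * Hypothetical helper: run child_fn in a context created with `flags`.
 * Returns 1 if the fd the child opened is valid in the parent,
 * 0 if it is not, -1 on failure.
 */
static int fd_visible_in_parent(int flags)
{
	/* Plain waitpid() only reaps children whose exit signal is SIGCHLD. */
	assert((flags & CSIGNAL) == SIGCHLD);

	char *stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, NULL);
	if (pid == -1) {
		free(stack);
		return -1;
	}

	int status = 0;
	pid_t reaped = waitpid(pid, &status, 0);
	free(stack);
	if (reaped != pid || !WIFEXITED(status))
		return -1;

	int fd = WEXITSTATUS(status);
	if (fd == OPEN_FAILED)
		return -1;

	/* F_GETFD fails with EBADF if fd is not open in this context. */
	if (fcntl(fd, F_GETFD) == -1)
		return 0;

	close(fd);	/* the shared table outlives the child; clean up */
	return 1;
}
```

With "CLONE_FILES | SIGCHLD" the fd should still be open in the parent
after the child has exited; with plain "SIGCHLD" the child opened it in
its own copy of the table and the parent never sees it.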

Although some of them are interdependent - CLONE_THREAD (which is really
just "all signal state" despite the name - it has nothing to do with VM
per se) depends on CLONE_SIGHAND (which is just the set of signal handlers
associated with the context), which in turn depends on CLONE_VM (because
it doesn't make sense to be able to take a signal in different contexts
unless they share the same VM).
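These dependencies are visible from user space: the kernel refuses the
inconsistent combinations with EINVAL. A small sketch, assuming Linux
and the glibc "clone()" wrapper; "try_clone" is a hypothetical helper
that just reports what happened:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)	/* arbitrary size for this sketch */
#define FULL_THREAD (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)

static int child_fn(void *arg)
{
	(void)arg;
	return 0;
}

/*
 * Hypothetical helper: attempt a clone() with `flags`.
 * Returns 0 if the kernel accepted the combination (the child is then
 * reaped), otherwise the errno value clone() failed with.
 */
static int try_clone(int flags)
{
	/* A real CLONE_THREAD context cannot be reaped with waitpid(),
	 * so this sketch does not support creating one. */
	assert((flags & FULL_THREAD) != FULL_THREAD);

	char *stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return ENOMEM;

	pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, NULL);
	if (pid == -1) {
		int err = errno;
		free(stack);
		return err;
	}

	/* Wait before freeing: with CLONE_VM the child runs on this stack. */
	waitpid(pid, NULL, 0);
	free(stack);
	return 0;
}
```

Here "try_clone(CLONE_THREAD | CLONE_VM | SIGCHLD)" is missing
CLONE_SIGHAND and "try_clone(CLONE_SIGHAND | SIGCHLD)" is missing
CLONE_VM, so both should come back as EINVAL, while
"try_clone(CLONE_VM | CLONE_SIGHAND | SIGCHLD)" should succeed.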

This has gotten fairly far off the notion of stacks and VM. But I hope
it's clear to everybody that I heartily agree with rfork-like
functionality. It's just segmented/private stacks I can't understand.

			Linus

