supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* Update on the progress of slew development
@ 2019-03-17 13:25 Casper Ti. Vector
  2019-03-17 14:30 ` Oliver Schad
  2019-09-27 17:42 ` Update on the progress of slew development Casper Ti. Vector
  0 siblings, 2 replies; 15+ messages in thread
From: Casper Ti. Vector @ 2019-03-17 13:25 UTC (permalink / raw)
  To: supervision

Since the first announcement [1] of slew [2], a few people expressed
interest in the project, but I have received little feedback regarding
its technical contents.  Therefore, although I have successfully deployed
slew on a few real-life systems, it is still quite a slowly moving
personal hobby project.  However, there are a few changes which I think
might be interesting to some people here, which are briefly summarised in
this mail, 5 days before the project's one-year anniversary.

[1] <https://skarnet.org/cgi-bin/archive.cgi?2:mss:1945:201803:pdabbgogplcnfhcmpgkg>.
[2] <https://gitlab.com/CasperVector/slew>.

Previously, slew would only save /run/uncaught-logs/current to
/var/log/init (so rotated log files would be ignored) on shutdown (so
no saved logs if the system crashed).  Now the log-saving mechanism is
implemented in an s6-log rotation processor (`init/save_log.rc'), which
would do the task with best effort (if /var/log/init is unwritable for
too long and the catch-all logger is fed with a large stream, the head
of the stream might be discarded anyway).  The `local' oneshot and
`init/rc.halt' would trigger the mechanism by sending the logger
SIGALRM; currently, the remaining issue is that s6-log would not run
the processor upon SIGALRM if `current' is empty, so a temporary write
failure in /var/log/init plus an unfortunate amount of log (no new bytes
after the rotated and unsaved log) would result in discarded logs.
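
As a rough illustration of the rotation-processor idea (a sketch only,
not the actual `init/save_log.rc'): as far as s6-log's documentation
describes it, a processor receives the rotated log on its stdin, and
whatever it writes to stdout becomes the archived file.  The function
name `save_log' and its destination argument are assumptions made for
this example.

```shell
#!/bin/sh
# Hypothetical stand-in for a log-saving rotation processor.
# Reads the rotated log on stdin, appends a copy to the file named by $1,
# and passes the data through unchanged on stdout.
save_log() {
    # tee exits non-zero if the destination cannot be opened or written,
    # while still copying stdin to stdout.
    tee -a "$1"
}
```

A failed append shows up as a non-zero exit status, so the caller can
tell that this particular chunk of log was not saved.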

Other noteworthy changes:
* The issue about slew's fault tolerance mentioned in the original
  announcement has been largely solved.
* "Methods" for polymorphic services are supported: see `lib/fn' and
  `misc/wpa_supplicant/wpa_cli.rc' for an example.
* Information can be passed through the kernel command line to slew: see
  `lib/kcmd.rc', and `init/{load,save}_clock.rc' for an example.

Finally, as some people strongly complained [3] about this issue, I
would like to ask for your opinions about the naming convention: what do
you prefer, more "standardised" names like `wpa_supplicant.wlan0.log' or
easier-to-type names like `wpasup.wlan0.log'?  I can switch to a new
convention if you overwhelmingly support it or if I find a very
convincing argument for it, but I need to be really sure that I would
*not* need to change the convention *more than once*.

[3] <https://forums.gentoo.org/viewtopic-t-1079878-start-25.html>.

Suggestions and questions are welcome.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2020.10.19)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C




* Re: Update on the progress of slew development
  2019-03-17 13:25 Update on the progress of slew development Casper Ti. Vector
@ 2019-03-17 14:30 ` Oliver Schad
  2019-03-18 14:44   ` Casper Ti. Vector
  2019-03-19 12:42   ` Casper Ti. Vector
  2019-09-27 17:42 ` Update on the progress of slew development Casper Ti. Vector
  1 sibling, 2 replies; 15+ messages in thread
From: Oliver Schad @ 2019-03-17 14:30 UTC (permalink / raw)
  To: supervision


Hi Casper,

thanks for the project. I have to say that I didn't really understand
that the implementation of jobs is a goal of slew, because most jobs
didn't fit for us.

So I understood the whole project as a nice integration layer for s6,
but we changed every unit, changed reboot, halt, the build-script and
so on.

So in the end Slew was great for understanding how s6 could be integrated
as a pattern. But the units/scripts themselves didn't work for us.

https://gitlab-2.asag.io/snippets/7

So may I ask directly: is the plan to provide scripts/units for
everyone, which work almost out of the box?

Best Regards
Oli


-- 
Automatic-Server AG •••••
Oliver Schad
Geschäftsführer
Turnerstrasse 2
9000 St. Gallen | Schweiz

www.automatic-server.com | oliver.schad@automatic-server.com
Tel: +41 71 511 31 11 | Mobile: +41 76 330 03 47


* Re: Update on the progress of slew development
  2019-03-17 14:30 ` Oliver Schad
@ 2019-03-18 14:44   ` Casper Ti. Vector
  2019-03-19 12:10     ` Casper Ti. Vector
  2019-03-19 12:42   ` Casper Ti. Vector
  1 sibling, 1 reply; 15+ messages in thread
From: Casper Ti. Vector @ 2019-03-18 14:44 UTC (permalink / raw)
  To: supervision


On Sun, Mar 17, 2019 at 03:30:02PM +0100, Oliver Schad wrote:
> So in the end Slew was great for understanding how s6 could be integrated
> as a pattern. But the units/scripts themselves didn't work for us.

I personally use Alpine for servers and Void for desktops, and so did
not know what problems distributors might encounter in Debian/Ubuntu.  So
first of all I need to thank you for attempting to use slew on real-life
systems, which is exactly how the slew codebase can evolve to suit more
application scenarios.

> https://gitlab-2.asag.io/snippets/7

I constructed a slew-managed Ubuntu system with only essential services,
udhcpc on eth0, and sshd, reproducible with the following steps:
* Install Ubuntu on a VM with `ubuntu-18.04.2-server-amd64.iso'.
  (Using the US keymap, and with SSH server enabled).
* Build static execline, s6 and s6-rc using attached `ska-build.sh' (as
  root), and tailor slew for Ubuntu using attached `slew-build.sh'.
  (Better done on an Alpine VM because Ubuntu does not use musl.)
* Transfer the `pkgs' directory (with its contents, all produced in the
  step above) to the Ubuntu VM, run (as root) attached `slew-build.sh'
  in the directory where `pkgs' resides.

I personally find the changes fairly minor, except for these issues:
* Debian/Ubuntu do not package eudev, so I used `/sbin/udevd' from
  Devuan as a workaround; to ensure basic safety, you definitely need to
  package this yourself for your customised Debian/Ubuntu systems.
* One other nuisance is that while the slew-managed system uses ~32M
  memory after booting, the dracut-generated initramfs barely loads even
  with 256M, which is an important reason for avoiding Ubuntu.

> So may I ask directly: is the plan to provide scripts/units for
> everyone, which work almost out of the box?

Linux distros are too diverse for slew to fully accommodate, but slew
has been designed from the beginning with flexibility in mind: once you
successfully customise it for the expected average case of a distro, the
user-level customisations would be fairly easy.  And as you can see from
the attached scripts, the distro-level customisations are, while perhaps
non-trivial, quite manageable.



[-- Attachment #2: slew-ubuntu.tgz --]
[-- Type: application/x-gtar, Size: 1525 bytes --]


* Re: Update on the progress of slew development
  2019-03-18 14:44   ` Casper Ti. Vector
@ 2019-03-19 12:10     ` Casper Ti. Vector
  0 siblings, 0 replies; 15+ messages in thread
From: Casper Ti. Vector @ 2019-03-19 12:10 UTC (permalink / raw)
  To: supervision

On Mon, Mar 18, 2019 at 10:44:43PM +0800, Casper Ti. Vector wrote:
> * Transfer the `pkgs' directory (with its contents, all produced in the
>   step above) to the Ubuntu VM, run (as root) attached `slew-build.sh'
>   in the directory where `pkgs' resides.

Here I actually meant `ubuntu-conf.sh' instead of `slew-build.sh'.
Additionally, one thing `ubuntu-conf.sh' did not say clearly is that
Devuan's binary package for eudev should be downloaded *into the `pkgs'
directory*.  And in case you like smaller files, `ska-build.sh' makes
unstripped binaries; stripping them reduces the total size of execline,
s6 and s6-rc binary packages from ~5.5M to ~1M.


* Re: Update on the progress of slew development
  2019-03-17 14:30 ` Oliver Schad
  2019-03-18 14:44   ` Casper Ti. Vector
@ 2019-03-19 12:42   ` Casper Ti. Vector
  2019-03-19 15:25     ` Casper Ti. Vector
  2019-03-19 15:58     ` Oliver Schad
  1 sibling, 2 replies; 15+ messages in thread
From: Casper Ti. Vector @ 2019-03-19 12:42 UTC (permalink / raw)
  To: supervision

On Sun, Mar 17, 2019 at 03:30:02PM +0100, Oliver Schad wrote:
> https://gitlab-2.asag.io/snippets/7

A closer look at this snippet reveals that most changes therein are:
1. Customisations of `s6-log.rc', probably modifying the logging user.
2. Addition of unshipped services (eg. postfix).
3. Deletion of unused services (eg. dcron and busybox cron).
4. Other regular customisations similar to what I showed yesterday.

Regarding 1, if you (as I guessed) want to just use `nobody' for all
logs (which is, BTW, strongly discouraged!), I have just pushed commit
a1ffc647 so you can simply delete `s6-log.rc' of services for this
purpose, provided that you do not plan to change the $args or $logd of
these services.

Regarding 2 and 3, and noticing the presence of `db' and `prep.main'
(BTW, `prep.main' and `db/old.main' are supposed to be deleted after
successful `lib/build.rc'/`s6-rc-update' invocations), it is likely that
you directly cloned slew's git repository into /etc.  I personally think
this makes updating complicated; and since git ignores ownership,
permission bits (except for the `x' bit) and empty directories, the
structure of /etc/slew would drift from the expected status in certain
conditions.

Instead, I think a better way to distribute slew is to break it into
multiple packages for the intended distro, with the packaging script(s)
essentially performing the jobs of yesterday's `slew-build.sh' and
`ubuntu-conf.sh':
* A "base" package: including the `init' / `run' directories, absolutely
  essential services in `base', and a small `main' config (somewhat like
  the shipped one).
* Multiple packages for other services (eg. OpenVPN and wpa_supplicant):
  each including the necessary service definitions in `base', and
  corresponding ancillary files in `misc'.
* The most important ancillary files are preprocessing passes like
  `misc/openvpn/70-openvpn.rc', which should of course be installed into
  /etc/slew/lib/prep.  They are not directly put into `lib/prep' because
  unlike extra service definitions in `base', extra preprocessing passes
  result in actual overhead when `lib/prep.rc' is run.  (The user can
  disable preprocessing passes by removing the `x' bit from the
  corresponding files, cf. `lib/prep.rc').
* Patches like `misc/thinkfan/thinkfan-0.9.3-fglog.patch' are intended
  to be applied to distro packages to increase their compliance with
  s6's way of longrun management (usually about logging), and therefore
  can be omitted from the service packages.  Other ancillary files are
  intended to be installed into locations outside of /etc/slew, like
  `misc/wpa_supplicant/wpa_cli.rc' should be installed into
  /etc/wpa_supplicant, in accordance with `base/wpacli./run'.
Perhaps these intentions were not as clear from the codebase as I
implicitly thought they would be; I am sorry for that.
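
The convention of disabling a preprocessing pass by clearing its `x'
bit can be sketched as follows.  This is only an illustration under
assumptions, not the real `lib/prep.rc': the function name `run_passes'
and the rule "run executable regular files in lexical order, stop at the
first failure" are choices made for this example.

```shell
#!/bin/sh
# Hypothetical runner for numbered preprocessing passes in directory $1.
run_passes() {
    dir=$1
    # The glob expands in lexical order, so 10-foo runs before 70-bar.
    for pass in "$dir"/*; do
        # Skip anything that is not a regular file with the x bit set;
        # this is what makes "chmod -x" act as an off switch.
        [ -f "$pass" ] && [ -x "$pass" ] || continue
        "$pass" || return 1
    done
}
```

With this shape, a package can install its pass unconditionally and the
user decides whether it runs, without deleting any file.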


* Re: Update on the progress of slew development
  2019-03-19 12:42   ` Casper Ti. Vector
@ 2019-03-19 15:25     ` Casper Ti. Vector
  2019-03-19 15:58     ` Oliver Schad
  1 sibling, 0 replies; 15+ messages in thread
From: Casper Ti. Vector @ 2019-03-19 15:25 UTC (permalink / raw)
  To: supervision

On Tue, Mar 19, 2019 at 08:42:39PM +0800, Casper Ti. Vector wrote:
> * The most important ancillary files are preprocessing passes like
>   `misc/openvpn/70-openvpn.rc', which should of course be installed into
>   /etc/slew/lib/prep.  They are not directly put into `lib/prep' because
>   unlike extra service definitions in `base', extra preprocessing passes
>   result in actual overhead when `lib/prep.rc' is run.  (The user can
>   disable preprocessing passes by removing the `x' bit from the
>   corresponding files, cf. `lib/prep.rc').

A more important reason for not directly putting extra preprocessing
passes into `lib/prep': although I have always been carefully preventing
these passes from conflicting with each other, I cannot be sure that
conflicts would always be avoidable, especially with certain distro
packages that are mutually exclusive.


* Re: Update on the progress of slew development
  2019-03-19 12:42   ` Casper Ti. Vector
  2019-03-19 15:25     ` Casper Ti. Vector
@ 2019-03-19 15:58     ` Oliver Schad
       [not found]       ` <20190320051439.GA7636@caspervector>
  2019-03-20  5:14       ` Casper Ti. Vector
  1 sibling, 2 replies; 15+ messages in thread
From: Oliver Schad @ 2019-03-19 15:58 UTC (permalink / raw)
  To: supervision


On Tue, 19 Mar 2019 20:42:39 +0800
"Casper Ti. Vector" <caspervector@gmail.com> wrote:

> On Sun, Mar 17, 2019 at 03:30:02PM +0100, Oliver Schad wrote:
> > https://gitlab-2.asag.io/snippets/7  
> 
> A closer look at this snippet reveals that most changes therein are:
> 1. Customisations of `s6-log.rc', probably modifying the logging user.

Exactly - it doesn't make sense for us to have a separate logging user
for every service. So I defined a common log user.

> 4. Other regular customisations similar to what I showed yesterday.

I had to change some files in init/:
https://gitlab-2.asag.io/snippets/8 

Some changes in lib/ to have a working update:
https://gitlab-2.asag.io/snippets/9

killall5 didn't work at all; the "sync" was added just in case mounting
read-only doesn't work; emptyenv kills the container environment, which
is needed; and clock adjustment doesn't work inside a container.

Rest later ...

Best Regards
Oli


* Re: Update on the progress of slew development
  2019-03-19 15:58     ` Oliver Schad
       [not found]       ` <20190320051439.GA7636@caspervector>
@ 2019-03-20  5:14       ` Casper Ti. Vector
  2019-03-20 11:51         ` Casper Ti. Vector
  2019-05-04  6:07         ` Casper Ti. Vector
  1 sibling, 2 replies; 15+ messages in thread
From: Casper Ti. Vector @ 2019-03-20  5:14 UTC (permalink / raw)
  To: supervision

On Tue, Mar 19, 2019 at 04:58:53PM +0100, Oliver Schad wrote:
> Exactly - it doesn't make sense for us to have a separate logging user
> for every service. So I defined a common log user.

Using separate logging users is mainly due to security reasons: though
s6-log is very reliable, reasonably more privilege separation still seems
desirable; see the example service definitions shipped with s6/s6-rc, as
well as qmail's practice of privilege separation.  And it is also easy
to do this: just put `useradd'/`adduser' invocations in post-install
hooks of the slew-related packages.

On the other hand, if you really do not want to use separate logging
users, at least do not use `nobody' (sorry for not considering this
yesterday; I have just pushed commit d5cb508b to correct this): nowadays
too many programs use `nobody' as a privilege separation user (I know
the OpenBSD people consciously avoid this, but many Linux distro
developers do not seem aware), so all these services would be in danger
if `nobody' gets compromised, especially considering that these services
are usually much easier targets for attacks than s6-log is.

> https://gitlab-2.asag.io/snippets/9

`cd /etc/slew/db' seems unnecessary as the source filename does not need
to be resolvable in $CWD of ln(1).
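
A small sketch of why the `cd' is not needed, assuming the ln(1) call in
question creates a symbolic link: `ln -s' stores the target string
verbatim, and a relative target is resolved against the directory
containing the link, not against $CWD.  The function name
`link_compiled' and the example target name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: point "$1/compiled" at the relative target $2,
# regardless of the caller's current directory.
link_compiled() {
    db=$1
    target=$2
    # The target is not looked up at creation time; it is simply recorded.
    ln -s "$target" "$db/compiled"
}
```

For example, `link_compiled /etc/slew/db main.1' (with `main.1' as a
made-up directory name) could be run from any working directory.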

`s6-rc-update /etc/slew/db/compiled' is intentionally not invoked in
`lib/build.rc' because in certain conditions (eg. when some services
are renamed), the old-named service would be stopped and then the
new-named service would be started.  More seriously, since the effects
of `up' scripts for certain oneshots are reversed not in the
corresponding `down' scripts (eg. `base/fstab/pre/up' is reversed in
`init/rc.halt'), they are often not reentrant, so renaming of their
dependencies and naively invoking `s6-rc-update /etc/slew/db/compiled'
may result in service outage that must be fixed manually.

The `-f' option of `s6-rc-update' is intended exactly for this scenario,
and the intended way to use `lib/build.rc' is:
: # Perhaps write a conversion file `conv', according to the output.
: /etc/slew/lib/build.rc main
: s6-rc-update [-f conv] /etc/slew/db/compiled
: rm -rf /etc/slew/db/old.main /etc/slew/prep.main

> killall5 didn't work at all

Fixed (provided that sysvinit killall(8) is included in the container)
in commit 3f246b20 the day before yesterday :)

> the "sync" was added just in case mounting read-only doesn't work

pkill(1), killall(1) and killall5(8) all retrieve a process list and
kill them one by one, instead of calling kill(-1, signal), so a race
condition can happen that lets some process escape the final SIGKILL.
Since pkill(1) and killall(1) use regex matching, the probability for
the race can be significantly larger.

To be 100% sure no process (except for PID 1) escapes that signal, you
can use `s6-nuke' from s6-portable-utils.  `kill -signal -- -1' should
theoretically do similar things, but kill(1) from coreutils and busybox
do not seem to behave in this way.

> emptyenv kills the container environment, which is needed

You can replace
: /bin/emptyenv
: /bin/export PATH /usr/sbin:/usr/bin:/sbin:/bin
with something like
: /bin/importas PIA PIA
: /usr/bin/env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin PIA=${PIA}

> clock adjustment doesn't work inside a container.

Then it is simpler to set `clock=()' in /etc/slew/lib/slew.rc :)


* Re: Update on the progress of slew development
  2019-03-20  5:14       ` Casper Ti. Vector
@ 2019-03-20 11:51         ` Casper Ti. Vector
  2019-05-04  6:07         ` Casper Ti. Vector
  1 sibling, 0 replies; 15+ messages in thread
From: Casper Ti. Vector @ 2019-03-20 11:51 UTC (permalink / raw)
  To: supervision

On Wed, Mar 20, 2019 at 01:14:39PM +0800, Casper Ti. Vector wrote:
> Fixed (provided that sysvinit killall(8) is included in the container)
> in commit 3f246b20 the day before yesterday :)

I forgot to note that sending SIGHUP is unnecessary, and `rc.halt' did
this previously because of my misunderstanding of how the catch-all
logger was stopped.  `s6-svc -X /run/service/s6-svscan.log' lets the
corresponding `s6-supervise' process immediately close its fds 0/1/2 and
wait to exit after the catch-all logger exits, so the logger's input is
only connected to outputs of unlogged services (except for itself) not
stopped in `rc.fin' (which should not exist if `s6-rc -d change ...'
succeeded) and all stray processes that have inherited fds from parents
that directly wrote to the catch-all logger.

If no such processes remained at this moment, the SIGHUP (intended to
stop the catch-all logger because it is started with the `-b' option of
`s6-log') would have no effect; if they existed, sending SIGHUP
immediately after SIGTERM/SIGCONT would probably result in lost logs for
the final outputs of them, and simply letting the impending SIGKILL kill
the logger would be better.  Additionally, I have just realised that
these final logs were not considered by the current version of
`rc.halt', and then pushed commit 5911f892 to fix this.

> something like
> : /bin/importas PIA PIA
> : /usr/bin/env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin PIA=${PIA}

Note that here correctness of the environment is ensured whether $PIA is
non-empty, empty or unset, because in the last case `importas' would
simply delete `PIA=${PIA}' from the command line to be exec()ed.  (Kudos
to Laurent, of course :)
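
For comparison, a plain sh analogue of the same idea (a sketch, not part
of slew): in the shell, writing `PIA=$PIA' would pass an empty `PIA='
even when the variable is unset, so the `${PIA+...}' form is used to
drop the whole word in that case, mirroring what is described above for
`importas'.  The function name `clean_env_run' is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a scrubbed environment that
# keeps only a fixed PATH and, if it is set at all, the variable PIA.
clean_env_run() {
    # ${PIA+"PIA=$PIA"} expands to one word when PIA is set (even if
    # empty) and to no word at all when PIA is unset.
    env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin ${PIA+"PIA=$PIA"} "$@"
}
```

The three cases (non-empty, empty, unset) then all reach the child
process unchanged, while every other inherited variable is dropped.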


* Re: Update on the progress of slew development
  2019-03-20  5:14       ` Casper Ti. Vector
  2019-03-20 11:51         ` Casper Ti. Vector
@ 2019-05-04  6:07         ` Casper Ti. Vector
  2019-05-05  1:55           ` race condition in killall sysinit
  1 sibling, 1 reply; 15+ messages in thread
From: Casper Ti. Vector @ 2019-05-04  6:07 UTC (permalink / raw)
  To: supervision

On Wed, Mar 20, 2019 at 01:14:39PM +0800, Casper Ti. Vector wrote:
> pkill(1), killall(1) and killall5(8) all retrieve a process list and
> kill them one by one, instead of calling kill(-1, signal), so a race
> condition can happen that lets some process escape the final SIGKILL.
> Since pkill(1) and killall(1) use regex matching, the probability for
> the race can be significantly larger.
> 
> To be 100% sure no process (except for PID 1) escapes that signal, you
> can use `s6-nuke' from s6-portable-utils.  `kill -signal -- -1' should
> theoretically do similar things, but kill(1) from coreutils and busybox
> do not seem to behave in this way.

Reading recent posts on this mailing list, I have noticed that the
sentence about kill(1) was incorrect because:
* POSIX does not require `kill -signal -- -1', but just `kill -- -1' *in
  addition to* `kill -signal -1'.
* `kill -signal -1' does do the desired job, and I erroneously thought
  it did not, because what I tried was `{/bin/kill,busybox kill} -15 -1'
  in the shell of a test user, but the shell trapped SIGTERM (and Linux
  does not send the signal to the calling process of kill(-1, signal)).
* coreutils does not implement kill(1); busybox, util-linux and procps
  do.  Anyway, `kill -{signum,SIGNAME} -1' is required by POSIX.
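
The misleading observation can be reproduced safely on a single process.
The sketch below deliberately does *not* use PID -1 (which would signal
every process the user may signal); it only shows that a shell which
ignores SIGTERM survives a TERM aimed at it, while one with the default
disposition does not.  The function names are made up for this example.

```shell
#!/bin/sh
# A child shell that ignores SIGTERM keeps running after signalling itself.
survives_term() {
    sh -c 'trap "" TERM; kill -15 $$; echo survived'
}

# With the default disposition the child is terminated before the echo.
dies_on_term() {
    sh -c 'kill -15 $$; echo survived'
}
```

So a test shell that seemingly shrugs off `kill -15 -1' says nothing
about whether the other processes received the signal.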

slew has been updated (commit 593e6174) to use kill(1), and its users
no longer need to worry about the theoretical possibility of
comets [1] escaping the final SIGKILL.
[1] <https://turing.une.edu.au/~cosc330/lectures/lecture_03/lecture_03.html>


* race condition in killall
  2019-05-04  6:07         ` Casper Ti. Vector
@ 2019-05-05  1:55           ` sysinit
  2019-05-07 14:46             ` Casper Ti. Vector
  2019-05-11 18:29             ` Guillermo
  0 siblings, 2 replies; 15+ messages in thread
From: sysinit @ 2019-05-05  1:55 UTC (permalink / raw)
  To: supervision


> pkill(1), killall(1) and killall5(8) all retrieve a process list and
> kill them one by one, instead of calling kill(-1, signal), so a race
> condition can happen that lets some process escape the final SIGKILL.

interesting. i have not considered this at all.
looks like kill( -1, sig ) from process #1 ensures correctness here
in a cheap and simple way.
so splitting stage 3 into 2 parts seems to be a good approach.

> Since pkill(1) and killall(1) use regex matching, the probability for
> the race can be significantly larger.

since they do more work to select processes and hence need more time when
iterating the PID dirs in the procfs ?
though i doubt they use any matching at all when tasked with killing all
processes and probably behave like the killall5 utility in this situation.

OpenRC also provides a tool for that task btw:
/libexec/rc/bin/kill_all

it uses the kvm method to find running processes on the BSDs and the procfs
on Linux.




* Re: race condition in killall
  2019-05-05  1:55           ` race condition in killall sysinit
@ 2019-05-07 14:46             ` Casper Ti. Vector
  2019-05-11 18:29             ` Guillermo
  1 sibling, 0 replies; 15+ messages in thread
From: Casper Ti. Vector @ 2019-05-07 14:46 UTC (permalink / raw)
  To: supervision

On Sun, May 05, 2019 at 03:55:51AM +0200, sysinit@yandex.com wrote:
> since they do more work to select processes and hence need more time
> when iterating the PID dirs in the procfs?  though i doubt they use
> any matching at all when tasked with killing all processes and
> probably behave like the killall5 utility in this situation.

The original snippet on gitlab-2.asag.io used `pkill -SIG .'; and it
seemed that they had really encountered escaped processes, because they
said:
> the "sync" was added just in case mounting read-only doesn't work

> OpenRC also provides a tool for that task btw:
> /libexec/rc/bin/kill_all
> it uses the kvm method to find running processes on the BSDs and the
> procfs on Linux.

Or use chainloading and `kill -9 -1': both are simple and portable.


* Re: race condition in killall
  2019-05-05  1:55           ` race condition in killall sysinit
  2019-05-07 14:46             ` Casper Ti. Vector
@ 2019-05-11 18:29             ` Guillermo
  2019-05-11 19:26               ` Laurent Bercot
  1 sibling, 1 reply; 15+ messages in thread
From: Guillermo @ 2019-05-11 18:29 UTC (permalink / raw)
  To: Supervision

On Sat, 4 May 2019 at 22:55, sysinit wrote:
>
> > pkill(1), killall(1) and killall5(8) all retrieve a process list and
> > kill them one by one, instead of calling kill(-1, signal), so a race
> > condition can happen that lets some process escape the final SIGKILL.
>
> interesting. i have not considered this at all.
> looks like kill( -1, sig ) from process #1 ensures correctness here
> in a cheap and simple way.

I haven't looked at pkill or killall, but it should be noted that
killall5 is supposed to *not* send signals to everyone: process 1,
processes in the same session (in the POSIX sense), and processes
specified with the -o option, if given, are excluded. So it has to
retrieve a process list and classify. If the signal is SIGKILL and
killall5 is used in a shell script, the session thing generally allows
the shell process to continue execution after invocation of the
program. And, I suppose, it also allows the process that invoked the
script and maybe other ancestors, such as rc subsystem components, to
continue execution as well.

However, both sysvinit's and BusyBox's killall5 make a kill(-1,
SIGSTOP) call before going through the PID list and selectively
sending the requested signal (and I guess Linux does not deliver
SIGSTOP to the process that contains the call, or it would be
pointless), and make a kill(-1, SIGCONT) call when they are done, so
I'm not sure if there's actually a race condition.
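
The stop-the-world sequence described above can be written down as a dry
run.  The sketch only *prints* the kill(1) commands and never sends a
signal; the function name `killall5_plan' and its argument layout are
assumptions, and the exclusion of processes in the caller's session is
not modelled (only PID 1 and an explicit omit list are).

```shell
#!/bin/sh
# Hypothetical dry run: killall5_plan SIG "OMIT PIDS" PID...
# Prints the signalling sequence instead of executing it.
killall5_plan() {
    sig=$1
    omit=$2
    shift 2
    echo "kill -STOP -1"            # freeze everything first
    for pid in "$@"; do
        case " 1 $omit " in
            *" $pid "*) continue ;; # never signal PID 1 or omitted PIDs
        esac
        echo "kill -$sig $pid"
    done
    echo "kill -CONT -1"            # thaw whatever is left
}
```

Because nothing can fork between the STOP and the per-PID signals, the
list being walked cannot go stale, at the price of pausing the whole
system for the duration.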

But yeah, in a version 0.4.x.x s6-linux-init-style setup, where the
stage 3 init can just spawn a process that makes a kill(-1, sig) call,
all this is not needed, and just using 'kill -KILL -1' or some
equivalent is probably the simplest alternative. BTW, the kill program
from procps 3.3.15 segfaulted when I tried to use it with a -1 PID
argument :/ BusyBox's kill applet, as well as Bash's builtin kill
utility (i.e. sh -c 'kill -KILL -1') did work when used like this. I
haven't tried s6-nuke, but I'm assuming it works since
s6-linux-init-0.4.x.x relies on it, and haven't tried util-linux's and
GNU Coreutils' kill either.

> OpenRC also provides a tool for that task btw:
> /libexec/rc/bin/kill_all

Yeah, ${LIBEXECDIR}/bin/kill_all works like killall5. OpenRC used to
have a killall5 invocation in its 'killprocs' service script, which
meant a runtime dependency on a package that provided the program.
Probably not a problem in a sysvinit + OpenRC or BusyBox init + OpenRC
setup, but ugly in a 'pure' OpenRC setup (i.e. with openrc-init as
process 1).

G.



* Re: race condition in killall
  2019-05-11 18:29             ` Guillermo
@ 2019-05-11 19:26               ` Laurent Bercot
  0 siblings, 0 replies; 15+ messages in thread
From: Laurent Bercot @ 2019-05-11 19:26 UTC (permalink / raw)
  To: Supervision

>However, both sysvinit's and BusyBox's killall5 make a kill(-1,
>SIGSTOP) call before going through the PID list and selectively
>sending the requested signal (and I guess Linux does not deliver
>SIGSTOP to the process that contains the call, or it would be
>pointless), and make a kill(-1, SIGCONT) call when they are done, so
>I'm not sure if there's actually a race condition.

Right, I failed to mention that, sorry. It is true that killall5
doesn't have a real race condition... because it stops the world
before killing a set of processes, and restarts the world afterwards.
This is clearly not something I want to use on systems I manage, even
if it technically accomplishes its goal.

--
Laurent




* Re: Update on the progress of slew development
  2019-03-17 13:25 Update on the progress of slew development Casper Ti. Vector
  2019-03-17 14:30 ` Oliver Schad
@ 2019-09-27 17:42 ` Casper Ti. Vector
  1 sibling, 0 replies; 15+ messages in thread
From: Casper Ti. Vector @ 2019-09-27 17:42 UTC (permalink / raw)
  To: supervision

On Sun, Mar 17, 2019 at 09:25:32PM +0800, Casper Ti. Vector wrote:
> Since the first announcement [1] of slew [2], a few people expressed
> interest in the project, but I have received little feedback regarding
> its technical contents.  Therefore although I have successfully deployed
> slew on a few real-life systems, it is still quite a slowly moving
> personal hobby project.  However, there are a few changes which I think
> might be interesting to some people here, which is briefly summarised in
> this mail, 5 days before the project's one-year anniversary.

To facilitate distro packaging, I changed the slew repository structure,
so that all files specific to a package are put in the corresponding
subdirectory, eg. (with whitespace squeezed in the output):
> % ls pkg/wpa_supplicant/*
> pkg/wpa_supplicant/base: wpacli. wpasup.
> pkg/wpa_supplicant/lib: prep
> pkg/wpa_supplicant/misc: wpa_cli.rc
> % ls pkg/dhcpcd/*
> pkg/dhcpcd/base: dhcpcd.
> pkg/dhcpcd/lib: fn
Please see the updated manual for details:
<https://gitlab.com/CasperVector/slew/blob/master/Manual>.

