mailing list of musl libc
From: Martin Vajnar <martin.vajnar@gmail.com>
To: Rich Felker <dalias@libc.org>
Cc: musl@lists.openwall.com, Markus Wichmann <nullplan@gmx.net>,
	 Florian Weimer <fweimer@redhat.com>
Subject: Re: [musl] Backwards kernel compatibility
Date: Wed, 2 Jun 2021 09:38:08 +0200
Message-ID: <CAHHiRURfq=gY1jmoek1v7=hOayK3u51c7tgR-G24F5S50gptmA@mail.gmail.com>
In-Reply-To: <20210524220004.GD2546@brightrain.aerifal.cx>

Hi Rich,

thank you for such a detailed reply.

On Tue, 25 May 2021 at 0:00, Rich Felker <dalias@libc.org> wrote:
>
> On Mon, May 24, 2021 at 03:52:44PM +0200, Martin Vajnar wrote:
> > Hi, Markus,
> >
> > sorry for the late reply; it has been quite busy lately. You're describing
> > exactly the issue we are facing in our project. We need to use an old kernel
> > which we have only in binary form and have headers for it. At the same time
> > we would like to have the latest musl running on it.
> >
> > The problem we encounter is that for unsupported (or better said, not
> > supported yet) syscalls we get performance overhead because of the ENOSYS.
>
> Can you give some information on what syscalls these are and if/how
> you measured the performance overhead as being significant?


The main source of overhead is kernel 4.4, which on arm64 produces stack
traces whenever an unimplemented syscall is invoked:

    https://github.com/torvalds/linux/blob/afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc/arch/arm64/kernel/traps.c#L369

While the kernel is dumping, system response slows down noticeably, as a
dump sometimes lasts up to tens of milliseconds (in the example below,
dropbear is running in AArch32 mode):

    [90276.609777] dropbear[29686]: syscall 403
    [90276.611310] Code: 4620e02b f2404629 463a1393 df00461f (f1104617)
    [90276.615265] CPU: 2 PID: 29686 Comm: dropbear Tainted: P       4.4.60 #1
    [90276.621212] Hardware name: board.A
    [90276.628688] task: ffffffc029fc3700 ti: ffffffc029fc3700 task.ti: ffffffc029fc3700
    [90276.633277] PC is at 0x57168
    [90276.640748] LR is at 0x22223
    [90276.643683] pc : [<0000000000057168>] lr : [<0000000000022223>] pstate: 20000030
    [90276.646569] sp : 00000000ff94f118
    [90276.653933] x12: 0000000000000007
    [90276.660429] x11: 0000000000000000 x10: 0000000000000000
    [90276.665830] x9 : 0000000000000000 x8 : 0000000000000000
    [90276.678746] x7 : 0000000000000193 x6 : 0000000000010325
    [90276.684039] x5 : 00000000ff94f158 x4 : 0000000000000001
    [90276.689337] x3 : 0000000000000193 x2 : 00000000ff94f130
    [90276.694629] x1 : 00000000ff94f158 x0 : 0000000000000001
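
The "syscall 403" line and the register dump can be cross-checked: the
AArch32 EABI passes the syscall number in r7, which appears as x7 in this
dump, and the value printed there is hexadecimal. A minimal sketch of that
conversion (reg_from_dump is a hypothetical helper name, used only for
illustration):

```c
#include <stdlib.h>

/* Parse a register value as printed in the dump: hexadecimal digits
 * without a 0x prefix. reg_from_dump is a hypothetical helper. */
static unsigned long reg_from_dump(const char *hex)
{
    return strtoul(hex, NULL, 16);
}
```

Applied to the x7 value above, reg_from_dump("0000000000000193") gives 403,
the same number the kernel reports in the first line of the trace.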

> > We see 2 options to approach this:
> >
> >  1. remove the syscalls manually/alter the code to not invoke them (hacky)
> >  2. during musl compile time (maybe even configure-time), parse the
> > supplied kernel headers and based on availability of syscalls use defines
> > to steer the code execution (more universal)
> >
> > Would the 2nd case be something that musl community would be interested in,
> > should we choose to implement it for the project?
>
> No, but hopefully there's a third option: identify whatever place the
> fallback is actually a performance bottleneck and do what we can to
> mitigate it. If it's really bad, saving the result might be an option,
> but we've tried to avoid that both for complexity reasons and because
> it could preclude fixing serious problems (like Y2038 EOL) by
> live-migrating processes to a newer kernel with new syscalls that
> avoid the bug. A better approach is just using the "oldest" syscall
> that can actually do the job, which we already try to do in most
> places in musl, only relying on the newer one for inputs that require
> it. However this is not possible for functions that read back a time,
> since the input is external (e.g. the system clock or the filesystem)
> and it's not known in advance whether the old syscall could represent
> the result.

Yes, I think this is the case, as I only saw stack traces for syscalls
403 (__NR_clock_gettime64) and 397 (__NR_statx).
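
For comparison, the "oldest syscall that can do the job" rule works when the
time value is an input, because it can be range-checked before any syscall
is made. A minimal sketch of that selection, assuming a setter-style call;
the names below are hypothetical and this is not musl's actual code:

```c
#include <stdint.h>

/* Hypothetical labels for the two kernel entry points. */
enum which_syscall { USE_TIME32, USE_TIME64 };

/* True if the seconds value is representable in a 32-bit time_t. */
static int fits_time32(int64_t sec)
{
    return sec >= INT32_MIN && sec <= INT32_MAX;
}

/* Prefer the old syscall whenever the input allows it, so an old
 * kernel never sees the new syscall number for ordinary values. */
static enum which_syscall pick_settime_syscall(int64_t sec)
{
    return fits_time32(sec) ? USE_TIME32 : USE_TIME64;
}
```

No such check is possible for clock_gettime() or stat*(), since the value
comes from the kernel or the filesystem, which is why those two show up in
the traces.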

> It *might* be plausible to memorize the result "new syscall not
> available" but drop that memory whenever we see a result that
> indicates a failure due to use of the outdated syscall. We're kinda
> already doing that with the vdso clock_gettime -- cgt_time32_wrap
> disables itself if it ever sees a negative value for seconds.

Thanks for the suggestion. I think this would solve the issue I'm experiencing,
and it would definitely be cleaner than my original idea. It would
significantly decrease the number of traces produced.

I can prepare patches implementing this for stat*() and clock_gettime()
and see how it looks compared to the approach below.
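
A rough sketch of the "remember ENOSYS, but forget it on a suspicious
result" idea, only to make the control flow concrete. The kernel is
simulated with stub functions, all names are hypothetical, and the flag is a
plain int, so this is not thread-safe and not what an actual patch would
look like:

```c
#include <errno.h>
#include <stdint.h>

struct ts64 { int64_t sec; long nsec; };

/* Simulated kernel state (hypothetical, for illustration only). */
static int kernel_has_time64;   /* 0: old kernel, new syscall is ENOSYS */
static int64_t kernel_now;      /* current time in seconds */
static int time64_calls;        /* how often the new syscall was tried */

static int sys_clock_gettime64(struct ts64 *ts)
{
    time64_calls++;
    if (!kernel_has_time64) return -ENOSYS;
    ts->sec = kernel_now;
    ts->nsec = 0;
    return 0;
}

static int sys_clock_gettime32(struct ts64 *ts)
{
    /* The old syscall can only report 32 bits of seconds. */
    ts->sec = (int32_t)(uint32_t)kernel_now;
    ts->nsec = 0;
    return 0;
}

/* Remembered "new syscall not available" result. */
static int time64_missing;

static int sketch_clock_gettime(struct ts64 *ts)
{
    if (!time64_missing) {
        int r = sys_clock_gettime64(ts);
        if (r != -ENOSYS) return r;
        time64_missing = 1;     /* stop probing on every call */
    }
    int r = sys_clock_gettime32(ts);
    if (r == 0 && ts->sec < 0) {
        /* Negative seconds hint that the 32-bit value wrapped, so
         * drop the cached result and probe again on the next call. */
        time64_missing = 0;
    }
    return r;
}
```

On an old kernel the new syscall is attempted once and then skipped, so only
one trace is produced. If the process later ends up on a kernel with the new
syscall, the first wrapped result clears the flag and the following call
uses the 64-bit path again.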

> An alternative approach, especially if this is a matter of time64, to
> avoid nonstandard binaries that would be non-future-proof, might be to
> patch your kernel with a loadable module that adds dumb translation
> layers for the syscalls that are performance bottlenecks.
>
> Rich

Regards,
Martin
