From: Martin Vajnar
Date: Wed, 2 Jun 2021 09:38:08 +0200
To: Rich Felker
Cc: musl@lists.openwall.com, Markus Wichmann, Florian Weimer
Subject: Re: [musl] Backwards kernel compatibility
References: <20210510185837.GD2031@voyager> <20210524220004.GD2546@brightrain.aerifal.cx>
In-Reply-To: <20210524220004.GD2546@brightrain.aerifal.cx>

Hi Rich,

thank you for such a detailed reply.

On Tue, 25 May 2021 at 0:00, Rich Felker wrote:
>
> On Mon, May 24, 2021 at 03:52:44PM +0200, Martin Vajnar wrote:
> > Hi, Markus,
> >
> > sorry for the late reply it was quite busy lately. You're describing
> > exactly the issue, we are facing in our project. We need to use old kernel
> > which we have only in binary form and have headers for it. At the same time
> > we would like to have the latest musl running on it.
> >
> > The problem we encounter is that for unsupported (or better said, not
> > supported yet) syscalls we get performance overhead because of the ENOSYS.
>
> Can you give some information on what syscalls these are and if/how
> you measured the performance overhead as being significant?

The main source of overhead is kernel 4.4, which on arm64 prints a stack
trace whenever an unimplemented syscall is invoked:

https://github.com/torvalds/linux/blob/afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc/arch/arm64/kernel/traps.c#L369

While the kernel is dumping, there is a noticeable slowdown in system
response, as a dump sometimes takes up to tens of milliseconds (in the
example below, dropbear is running in AArch32 mode):

[90276.609777] dropbear[29686]: syscall 403
[90276.611310] Code: 4620e02b f2404629 463a1393 df00461f (f1104617)
[90276.615265] CPU: 2 PID: 29686 Comm: dropbear Tainted: P 4.4.60 #1
[90276.621212] Hardware name: board.A
[90276.628688] task: ffffffc029fc3700 ti: ffffffc029fc3700 task.ti: ffffffc029fc3700
[90276.633277] PC is at 0x57168
[90276.640748] LR is at 0x22223
[90276.643683] pc : [<0000000000057168>] lr : [<0000000000022223>] pstate: 20000030
[90276.646569] sp : 00000000ff94f118
[90276.653933] x12: 0000000000000007
[90276.660429] x11: 0000000000000000 x10: 0000000000000000
[90276.665830] x9 : 0000000000000000 x8 : 0000000000000000
[90276.678746] x7 : 0000000000000193 x6 : 0000000000010325
[90276.684039] x5 : 00000000ff94f158 x4 : 0000000000000001
[90276.689337] x3 : 0000000000000193 x2 : 00000000ff94f130
[90276.694629] x1 : 00000000ff94f158 x0 : 0000000000000001

> > We see 2 options to approach this:
> >
> > 1. remove the syscalls manually/alter the code to not invoke them (hacky)
> > 2. during musl compile time (maybe even configure-time), parse the
> > supplied kernel headers and based on availability of syscalls use defines
> > to steer the code execution (more universal)
> >
> > Would the 2nd case be something that musl community would be interested in,
> > should we choose to implement it for the project?
>
> No, but hopefully there's a third option: identify whatever place the
> fallback is actual a performance bottleneck and do what we can to
> mitigate it.
> If it's really bad, saving the result might be an option,
> but we've tried to avoid that both for complexity reasons and because
> it could preclude fixing serious problems (like Y2038 EOL) by
> live-migrating processes to a newer kernel with new syscalls that
> avoid the bug. A better approach is just using the "oldest" syscall
> that can actually do the job, which we already try to do in most
> places in musl, only relying on the newer one for inputs that require
> it. However this is not possible for functions that read back a time,
> since the input is external (e.g. the system clock or the filesystem)
> and it's not known in advance whether the old syscall could represent
> the result.

Yes, I think this is the case, as I only saw stack traces for syscalls
403 (__NR_clock_gettime64) and 397 (__NR_statx).

> It *might* be plausible to memorize the result "new syscall not
> available" but drop that memory whenever we see a result that
> indicates a failure due to use of the outdated syscall. We're kinda
> already doing that with the vdso clock_gettime -- cgt_time32_wrap
> disables itself if it ever sees a negative value for seconds.

Thanks for the suggestion. I think this would solve the issue I'm
experiencing, and it would definitely be cleaner than my original idea.
It would significantly decrease the number of traces produced. I can
prepare patches implementing this for stat*() and clock_gettime() and
see how it looks compared to the approach below.

> An alternative approach, especially if this is a matter of time64, to
> avoid nonstandard binaries that would be non-future-proof, might be to
> patch your kernel with a loadable module that adds dumb translation
> layers for the syscalls that are performance bottlenecks.
>
> Rich

Regards,
Martin
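
For illustration, the "remember that the new syscall is missing, but
forget it again on a suspicious result" idea quoted above could look
roughly like the sketch below. This is not musl code and not a proposed
patch. The fake_* functions and variables are hypothetical stand-ins for
the kernel, get_time_cached() and new_syscall_missing are made-up names,
and treating a negative 32-bit seconds value as the "suspicious result"
is an assumption borrowed from the cgt_time32_wrap remark. A real
implementation would also have to consider concurrent access to the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel (not real syscalls). */
static int fake_kernel_has_new;  /* 1 if the "new" 64-bit call exists */
static int fake_new_calls;       /* how often the new call was attempted */
static int64_t fake_new_sec;     /* value the 64-bit call would report */
static int32_t fake_old_sec;     /* value the 32-bit call would report */

static long fake_sys_time64(int64_t *sec)
{
    fake_new_calls++;
    if (!fake_kernel_has_new) return -ENOSYS;
    *sec = fake_new_sec;
    return 0;
}

static long fake_sys_time32(int32_t *sec)
{
    *sec = fake_old_sec;
    return 0;
}

/* Memoized "new syscall is missing" flag; 0 means "try the new one". */
static int new_syscall_missing;

static long get_time_cached(int64_t *out)
{
    int32_t sec32;
    long r;

    assert(out != NULL);

    if (!new_syscall_missing) {
        r = fake_sys_time64(out);
        if (r != -ENOSYS) return r;
        new_syscall_missing = 1;      /* stop paying for ENOSYS */
    }

    r = fake_sys_time32(&sec32);
    if (r) return r;

    if (sec32 < 0) {
        /* Result looks wrapped: drop the memo and probe the new call. */
        new_syscall_missing = 0;
        r = fake_sys_time64(out);
        if (r != -ENOSYS) return r;
        new_syscall_missing = 1;      /* still missing, keep old result */
    }

    *out = sec32;
    return 0;
}
```

With this shape, an old kernel sees one failed probe and then only the
old call, while a process that is later moved to a newer kernel picks up
the new call the first time the old one returns a negative value. The
cost is that on an old kernel with a wrapped clock every call probes
again.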