From: Rabbitstack <rabbitstack7@gmail.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: setrlimit hangs the process
Date: Tue, 25 Sep 2018 16:54:37 +0200
References: <20180925141551.GE10209@port70.net>
In-Reply-To: <20180925141551.GE10209@port70.net>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Sorry. Let me describe the problem in more detail.

The process only hangs when launched without root privileges on the host
(Arch Linux x64 with kernel 4.17.5-1) where the Alpine Docker container is
running. When launched with root privileges, it starts up correctly (but
this is expected, since it doesn't hit the setrlimit call). The odd part is
that on other hosts it hangs even when started as root. No error messages
so far. Strace output:

$ sudo strace -p 9285
futex(0x2cddfc0, FUTEX_WAIT_PRIVATE, 0, NULL

$ sudo strace -f -p 9285
.....
[pid 9287] getdents64(10, /* 14 entries */, 2048) = 336
[pid 9287] tgkill(9285, 9285, SIGRT_2) = 0
[pid 9287] futex(0x7efbff70008c, FUTEX_LOCK_PI_PRIVATE, {tv_sec=1537887068, tv_nsec=51442144}) = -1 ETIMEDOUT (Connection timed out)
[pid 9287] getdents64(10, /* 0 entries */, 2048) = 0
[pid 9287] lseek(10, 0, SEEK_SET) = 0
[pid 9287] getdents64(10, /* 14 entries */, 2048) = 336
[pid 9287] tgkill(9285, 9285, SIGRT_2) = 0
[pid 9287] futex(0x7efbff70008c, FUTEX_LOCK_PI_PRIVATE, {tv_sec=1537887068, tv_nsec=62384239}) = -1 ETIMEDOUT (Connection timed out)
[pid 9287] getdents64(10, /* 0 entries */, 2048) = 0
[pid 9287] lseek(10, 0, SEEK_SET) = 0
[pid 9287] getdents64(10, /* 14 entries */, 2048) = 336
[pid 9287] tgkill(9285, 9285, SIGRT_2) = 0
[pid 9287] futex(0x7efbff70008c, FUTEX_LOCK_PI_PRIVATE, {tv_sec=1537887068, tv_nsec=73251219}) = -1 ETIMEDOUT (Connection timed out)
[pid 9287] getdents64(10, /* 0 entries */, 2048) = 0
[pid 9287] lseek(10, 0, SEEK_SET) = 0
[pid 9287] getdents64(10, /* 14 entries */, 2048) = 336
[pid 9287] tgkill(9285, 9285, SIGRT_2) = 0
[pid 9287] futex(0x7efbff70008c, FUTEX_LOCK_PI_PRIVATE, {tv_sec=1537887068, tv_nsec=84458579}) = -1 ETIMEDOUT (Connection timed out)
[pid 9287] getdents64(10, /* 0 entries */, 2048) = 0
[pid 9287] lseek(10, 0, SEEK_SET) = 0
[pid 9287] getdents64(10, /* 14 entries */, 2048) = 336
[pid 9287] tgkill(9285, 9285, SIGRT_2) = 0
[pid 9287] futex(0x7efbff70008c, FUTEX_LOCK_PI_PRIVATE, {tv_sec=1537887068, tv_nsec=95098614}) = -1 ETIMEDOUT (Connection timed out)
[pid 9287] getdents64(10, /* 0 entries */, 2048) = 0
[pid 9287] lseek(10, 0, SEEK_SET) = 0
[pid 9287] getdents64(10, /* 14 entries */, 2048) = 336
[pid 9287] tgkill(9285, 9285, SIGRT_2) = 0
[pid 9287] futex(0x7efbff70008c, FUTEX_LOCK_PI_PRIVATE, {tv_sec=1537887068, tv_nsec=106005502}) = -1 ETIMEDOUT (Connection timed out)
[pid 9287] getdents64(10, /* 0 entries */, 2048) = 0
[pid 9287] lseek(10, 0, SEEK_SET) = 0
[pid 9287] getdents64(10, /* 14 entries */, 2048) = 336
[pid 9287] tgkill(9285, 9285, SIGRT_2) = 0
.....

I'll try to build a tiny example to isolate the problem and hopefully
provide more feedback.

Thanks

On Tue, Sep 25, 2018 at 4:15 PM Szabolcs Nagy <nsz@port70.net> wrote:

> * Rabbitstack <rabbitstack7@gmail.com> [2018-09-25 14:59:45 +0200]:
> > I'm using the latest golang:alpine Docker image to produce a
> > statically-linked Go binary. Even though I'm able to build the binary,
> > when I run it the process gets stuck during ebpf program loading. I've
> > investigated a bit and found the root cause is the call to setrlimit
> > (this is the offending line
> > https://github.com/iovisor/gobpf/blob/2e314be67b1854ad226f012f08a984e0e89b6da9/elf/elf.go#L105).
> > Are you aware of such behaviour in musl?
>
> well you could have described what goes wrong in more detail
> (error message, strace output, target platform, are you root, ...)
>
> i assume you are not running this on mips (since there is no
> alpine docker image for mips), which has the issue of
> SYSCALL_RLIM_INFINITY != RLIM_INFINITY
> the kernel side value is different from userspace so musl
> has to translate the value which may go wrong.
>
> nor on x32 (which may have various issues with the raw
> syscalls both in the go code and c code).
>
> increasing rlimit is not allowed by default, so you have to
> ensure you have permissions, musl should have no special
> behaviour with respect to RLIMIT_MEMLOCK, so it's more likely
> that you just don't have bpf and setrlimit permissions.
>
> instead of using a complex system like go + c code + elf loader,
> try a minimal c program to see if the bpf syscall succeeds at
> all in your docker environment.