musl libc mailing list
* aio_cancel segmentation fault for in progress write requests
@ 2018-12-07 12:52 Arkadiusz Sienkiewicz
  2018-12-07 15:44 ` Rich Felker
  0 siblings, 1 reply; 22+ messages in thread
From: Arkadiusz Sienkiewicz @ 2018-12-07 12:52 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 3306 bytes --]

Hi,

I'm experiencing a segmentation fault when I invoke aio_cancel on a request
that is in the EINPROGRESS state. This happens only with musl libc (used
version: 1.1.12-r8) and only on one (a dual Intel Xeon Gold 6128) of the few
computers I've tried it on; please let me know if you need more information
about that machine. Attached is a very simple program (aioWrite.cpp) that
reproduces the problem.

alpine-tmp-0:~$ ./aioWrite
Segmentation fault (core dumped)

The backtrace from gdb shows the problem is in aio_cancel.

(gdb) r
Starting program: ~/aioWrite
[New LWP 70321]

Program received signal ?, Unknown signal.
[Switching to LWP 70321]
__cp_end () at src/thread/x86_64/syscall_cp.s:29
29    src/thread/x86_64/syscall_cp.s: No such file or directory.
(gdb) bt
#0  __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1  0x00007ffff7dc6919 in __syscall_cp_c (nr=18, u=<optimized out>,
v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>,
z=0) at src/thread/pthread_cancel.c:37
#2  0x00007ffff7dcc0df in pwrite (fd=fd@entry=3, buf=buf@entry=0x7ffffff81900,
size=size@entry=512512, ofs=ofs@entry=0) at src/unistd/pwrite.c:7
#3  0x00007ffff7d8974e in io_thread_func (ctx=<optimized out>) at
src/aio/aio.c:240
#4  0x00007ffff7dc7293 in start (p=0x7ffff7ff4ab0) at
src/thread/pthread_create.c:145
#5  0x00007ffff7dc6072 in __clone () at src/thread/x86_64/clone.s:21
Backtrace stopped: frame did not save the PC
(gdb) info threads
  Id   Target Id         Frame
* 2    LWP 70321 "aioWrite" __cp_end () at src/thread/x86_64/syscall_cp.s:29
  1    LWP 70317 "aioWrite" __wait (addr=addr@entry=0x7ffff7ff49f8,
waiters=waiters@entry=0x0, val=val@entry=-1, priv=<optimized out>,
priv@entry=1) at src/thread/__wait.c:14
(gdb) thread 1
[Switching to thread 1 (LWP 70317)]
#0  __wait (addr=addr@entry=0x7ffff7ff49f8, waiters=waiters@entry=0x0,
val=val@entry=-1, priv=<optimized out>, priv@entry=1) at
src/thread/__wait.c:14
14    src/thread/__wait.c: No such file or directory.
(gdb) bt
#0  __wait (addr=addr@entry=0x7ffff7ff49f8, waiters=waiters@entry=0x0,
val=val@entry=-1, priv=<optimized out>, priv@entry=1) at
src/thread/__wait.c:14
#1  0x00007ffff7d89b30 in aio_cancel (fd=<optimized out>,
cb=0x7ffffff04640) at src/aio/aio.c:356
#2  0x0000000000400c54 in main () at aioWrite.cpp:45
(gdb)

In another application (whose code I cannot share) I was able to get a more
detailed trace for the main thread, narrowing the problem down to the
pthread_kill call.

Program received signal ?, Unknown signal.
[Switching to LWP 70293]
__cp_end () at src/thread/x86_64/syscall_cp.s:29
29    src/thread/x86_64/syscall_cp.s: No such file or directory.
(gdb) thread 1
[Switching to thread 1 (LWP 60762)]
#0  0x00007ffff7dc7ac4 in pthread_kill (t=t@entry=0x7ffff7fdeab0,
sig=sig@entry=33) at src/thread/pthread_kill.c:7
7    src/thread/pthread_kill.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7dc7ac4 in pthread_kill (t=t@entry=0x7ffff7fdeab0,
sig=sig@entry=33) at src/thread/pthread_kill.c:7
#1  0x00007ffff7dc69eb in pthread_cancel (t=0x7ffff7fdeab0) at
src/thread/pthread_cancel.c:99
#2  0x00007ffff7d89b1d in aio_cancel (fd=<optimized out>, cb=0xf4e180) at
src/aio/aio.c:355

Operating system is containerized alpine linux:
Linux alpine-tmp-0 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC
2018 x86_64 Linux

Best Regards,


[-- Attachment #2: aioWrite.cpp --]
[-- Type: text/x-c++src, Size: 1193 bytes --]

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <aio.h>

#define TNAME "aio_write/1-1.c"

int main() {
  char tmpfname[256];
  #define BUF_SIZE 512512
  char buf[BUF_SIZE];
  char check[BUF_SIZE+1];
  int fd;
  struct aiocb aiocb;
  int err;
  int ret;

  snprintf(tmpfname, sizeof(tmpfname), "pts_aio_write_1_1_%d", getpid());
  unlink(tmpfname);
  fd = open(tmpfname, O_CREAT | O_RDWR | O_EXCL, S_IRUSR | S_IWUSR);
  if (fd == -1) {
    printf(TNAME " Error at open(): %s\n", strerror(errno));
    exit(1);
  }

  unlink(tmpfname);

  memset(buf, 0xaa, BUF_SIZE);
  memset(&aiocb, 0, sizeof(struct aiocb));
  aiocb.aio_fildes = fd;
  aiocb.aio_buf = buf;
  aiocb.aio_nbytes = BUF_SIZE;

  if (aio_write(&aiocb) == -1) {
    printf(TNAME " Error at aio_write(): %s\n", strerror(errno));
    close(fd);
    exit(2);
  }

  int cancellationStatus = aio_cancel(fd, &aiocb);
  printf (TNAME " cancelationStatus : %d\n", cancellationStatus);

  /* Wait until completion */
  while (aio_error (&aiocb) == EINPROGRESS);

  close(fd);
  printf ("Test PASSED\n");
  return 0;
}


* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 12:52 aio_cancel segmentation fault for in progress write requests Arkadiusz Sienkiewicz
@ 2018-12-07 15:44 ` Rich Felker
  2018-12-07 16:04   ` Arkadiusz Sienkiewicz
  0 siblings, 1 reply; 22+ messages in thread
From: Rich Felker @ 2018-12-07 15:44 UTC (permalink / raw)
  To: Arkadiusz Sienkiewicz; +Cc: musl

On Fri, Dec 07, 2018 at 01:52:31PM +0100, Arkadiusz Sienkiewicz wrote:
> Hi,
> 
> I'm experiencing a segmentation fault when I invoke aio_cancel on a request
> that is in the EINPROGRESS state. This happens only with musl libc (used
> version: 1.1.12-r8) and only on one (a dual Intel Xeon Gold 6128) of the few
> computers I've tried it on; please let me know if you need more information
> about that machine. Attached is a very simple program (aioWrite.cpp) that
> reproduces the problem.
> 
> alpine-tmp-0:~$ ./aioWrite
> Segmentation fault (core dumped)
> 
> The backtrace from gdb shows the problem is in aio_cancel.

This is not correct:

> 
> (gdb) r
> Starting program: ~/aioWrite
> [New LWP 70321]
> 
> Program received signal ?, Unknown signal.
> [Switching to LWP 70321]

This just shows that the aio thread received the cancellation request.
It's not a crash or a problem. However, gdb's reporting of it as
"Unknown signal" and inability to pass it on correctly indicates that
something is wrong with the gdb on your system. I've hit this issue a
lot but it works on some systems and I don't recall what the
cause/difference behind it is. We should work to figure that out and
get an appropriate fix in distros that are affected.
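
If the unknown signal is musl's cancellation signal (the original report
shows pthread_kill with sig=33), telling gdb to pass it through untouched
may make such sessions usable; a hedged suggestion:

  (gdb) handle SIG33 nostop noprint pass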


> #include <stdio.h>
> #include <sys/types.h>
> #include <unistd.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <string.h>
> #include <errno.h>
> #include <stdlib.h>
> #include <aio.h>
> 
> #define TNAME "aio_write/1-1.c"
> 
> int main() {
>   char tmpfname[256];
>   #define BUF_SIZE 512512
>   char buf[BUF_SIZE];
>   char check[BUF_SIZE+1];
>   int fd;
>   struct aiocb aiocb;
>   int err;
>   int ret;
> 
>   snprintf(tmpfname, sizeof(tmpfname), "pts_aio_write_1_1_%d", getpid());
>   unlink(tmpfname);
>   fd = open(tmpfname, O_CREAT | O_RDWR | O_EXCL, S_IRUSR | S_IWUSR);
>   if (fd == -1) {
>     printf(TNAME " Error at open(): %s\n", strerror(errno));
>     exit(1);
>   }
> 
>   unlink(tmpfname);
> 
>   memset(buf, 0xaa, BUF_SIZE);
>   memset(&aiocb, 0, sizeof(struct aiocb));
>   aiocb.aio_fildes = fd;
>   aiocb.aio_buf = buf;
>   aiocb.aio_nbytes = BUF_SIZE;
> 
>   if (aio_write(&aiocb) == -1) {
>     printf(TNAME " Error at aio_write(): %s\n", strerror(errno));
>     close(fd);
>     exit(2);
>   }
> 
>   int cancellationStatus = aio_cancel(fd, &aiocb);
>   printf (TNAME " cancelationStatus : %d\n", cancellationStatus);
> 
>   /* Wait until completion */
>   while (aio_error (&aiocb) == EINPROGRESS);
> 
>   close(fd);
>   printf ("Test PASSED\n");
>   return 0;
> }

I just tried this test and it works for me on 32-bit x86. I'll try
some other systems and see if I can reproduce the issue. It could be a
bug in the test but I didn't see anything obviously wrong with it.

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 15:44 ` Rich Felker
@ 2018-12-07 16:04   ` Arkadiusz Sienkiewicz
  2018-12-07 16:52     ` Orivej Desh
  2018-12-07 16:52     ` Rich Felker
  0 siblings, 2 replies; 22+ messages in thread
From: Arkadiusz Sienkiewicz @ 2018-12-07 16:04 UTC (permalink / raw)
  To: dalias; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 3337 bytes --]

OK, maybe the stack trace is misleading due to some problem in gdb. However,
that doesn't explain why I get a segmentation fault when I execute the test
program without gdb. Also, commenting out the aio_cancel line will "fix" the
segfault, so that function is the most probable culprit.

On Fri, 7 Dec 2018 at 16:44, Rich Felker <dalias@libc.org> wrote:

> On Fri, Dec 07, 2018 at 01:52:31PM +0100, Arkadiusz Sienkiewicz wrote:
> > Hi,
> >
> > I'm experiencing a segmentation fault when I invoke aio_cancel on a request
> > that is in the EINPROGRESS state. This happens only with musl libc (used
> > version: 1.1.12-r8) and only on one (a dual Intel Xeon Gold 6128) of the few
> > computers I've tried it on; please let me know if you need more information
> > about that machine. Attached is a very simple program (aioWrite.cpp) that
> > reproduces the problem.
> >
> > alpine-tmp-0:~$ ./aioWrite
> > Segmentation fault (core dumped)
> >
> > The backtrace from gdb shows the problem is in aio_cancel.
>
> This is not correct:
>
> >
> > (gdb) r
> > Starting program: ~/aioWrite
> > [New LWP 70321]
> >
> > Program received signal ?, Unknown signal.
> > [Switching to LWP 70321]
>
> This just shows that the aio thread received the cancellation request.
> It's not a crash or a problem. However, gdb's reporting of it as
> "Unknown signal" and inability to pass it on correctly indicates that
> something is wrong with the gdb on your system. I've hit this issue a
> lot but it works on some systems and I don't recall what the
> cause/difference behind it is. We should work to figure that out and
> get an appropriate fix in distros that are affected.
>
>
> > #include <stdio.h>
> > #include <sys/types.h>
> > #include <unistd.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <string.h>
> > #include <errno.h>
> > #include <stdlib.h>
> > #include <aio.h>
> >
> > #define TNAME "aio_write/1-1.c"
> >
> > int main() {
> >   char tmpfname[256];
> >   #define BUF_SIZE 512512
> >   char buf[BUF_SIZE];
> >   char check[BUF_SIZE+1];
> >   int fd;
> >   struct aiocb aiocb;
> >   int err;
> >   int ret;
> >
> >   snprintf(tmpfname, sizeof(tmpfname), "pts_aio_write_1_1_%d", getpid());
> >   unlink(tmpfname);
> >   fd = open(tmpfname, O_CREAT | O_RDWR | O_EXCL, S_IRUSR | S_IWUSR);
> >   if (fd == -1) {
> >     printf(TNAME " Error at open(): %s\n", strerror(errno));
> >     exit(1);
> >   }
> >
> >   unlink(tmpfname);
> >
> >   memset(buf, 0xaa, BUF_SIZE);
> >   memset(&aiocb, 0, sizeof(struct aiocb));
> >   aiocb.aio_fildes = fd;
> >   aiocb.aio_buf = buf;
> >   aiocb.aio_nbytes = BUF_SIZE;
> >
> >   if (aio_write(&aiocb) == -1) {
> >     printf(TNAME " Error at aio_write(): %s\n", strerror(errno));
> >     close(fd);
> >     exit(2);
> >   }
> >
> >   int cancellationStatus = aio_cancel(fd, &aiocb);
> >   printf (TNAME " cancelationStatus : %d\n", cancellationStatus);
> >
> >   /* Wait until completion */
> >   while (aio_error (&aiocb) == EINPROGRESS);
> >
> >   close(fd);
> >   printf ("Test PASSED\n");
> >   return 0;
> > }
>
> I just tried this test and it works for me on 32-bit x86. I'll try
> some other systems and see if I can reproduce the issue. It could be a
> bug in the test but I didn't see anything obviously wrong with it.
>
> Rich
>



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 16:04   ` Arkadiusz Sienkiewicz
@ 2018-12-07 16:52     ` Orivej Desh
  2018-12-07 16:52     ` Rich Felker
  1 sibling, 0 replies; 22+ messages in thread
From: Orivej Desh @ 2018-12-07 16:52 UTC (permalink / raw)
  To: musl

I can't reproduce it either. Does it fail every time?
Could you attach the log from "strace -f -o strace.log ~/aioWrite"?
Do the other machines have the same kernel (4.15.0-20-generic)?
Have you tried running the binary built on a successful machine on
the problematic machine?



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 16:04   ` Arkadiusz Sienkiewicz
  2018-12-07 16:52     ` Orivej Desh
@ 2018-12-07 16:52     ` Rich Felker
  2018-12-07 17:31       ` A. Wilcox
  1 sibling, 1 reply; 22+ messages in thread
From: Rich Felker @ 2018-12-07 16:52 UTC (permalink / raw)
  To: Arkadiusz Sienkiewicz; +Cc: musl

On Fri, Dec 07, 2018 at 05:04:07PM +0100, Arkadiusz Sienkiewicz wrote:
> OK, maybe the stack trace is misleading due to some problem in gdb. However,

It's not just the backtrace that's misleading, but also the point of
crash. The point gdb has stopped at is prior to the crash. However...

> that doesn't explain why I get a segmentation fault when I execute the test
> program without gdb. Also, commenting out the aio_cancel line will "fix" the
> segfault, so that function is the most probable culprit.

it seems from your output, which lacks the message:

    aio_write/1-1.c cancelationStatus : 2

that the crash happened before the printf was reached. It's not clear
to me what could have caused it though. Calling close also performs
the equivalent of aio_cancel on the fd. Can you try running under
strace (with -f option) or anything else that might give further clues
as to where/why it crashed? valgrind might also be a good idea.

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 16:52     ` Rich Felker
@ 2018-12-07 17:31       ` A. Wilcox
  2018-12-07 18:26         ` Rich Felker
  0 siblings, 1 reply; 22+ messages in thread
From: A. Wilcox @ 2018-12-07 17:31 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 2008 bytes --]

awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
zsh: segmentation fault  ./aioWrite
awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
zsh: segmentation fault  ./aioWrite
awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
zsh: segmentation fault  ./aioWrite
awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
zsh: segmentation fault  ./aioWrite
awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
zsh: segmentation fault  ./aioWrite
awilcox on gwyn [pts/7 Fri 7 11:29] ~: gdb aioWrite
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64-foxkit-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from aioWrite...done.
(gdb) run
Starting program: /home/awilcox/aioWrite
[New LWP 60165]
[LWP 60165 exited]
aio_write/1-1.c cancelationStatus : 2
Test PASSED
[Inferior 1 (process 60162) exited normally]
(gdb) quit


Well, that's pretty interesting.


awilcox on gwyn [pts/7 Fri 7 11:29] ~: uname -a
Linux gwyn 4.14.76-mc11-easy-p8 #1 SMP Sat Nov 17 04:52:54 UTC 2018
ppc64 GNU/Linux
awilcox on gwyn [pts/7 Fri 7 11:30] ~: /lib/ld-musl-powerpc64.so.1
musl libc (powerpc64)
Version 1.1.20
Dynamic Program Loader
Usage: /lib/ld-musl-powerpc64.so.1 [options] [--] pathname [args]


Maybe this is a bug that has been fixed on master?


Best,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org




* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 17:31       ` A. Wilcox
@ 2018-12-07 18:26         ` Rich Felker
  2018-12-07 19:05           ` A. Wilcox
                             ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Rich Felker @ 2018-12-07 18:26 UTC (permalink / raw)
  To: musl

On Fri, Dec 07, 2018 at 11:31:01AM -0600, A. Wilcox wrote:
> awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
> zsh: segmentation fault  ./aioWrite
> awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
> zsh: segmentation fault  ./aioWrite
> awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
> zsh: segmentation fault  ./aioWrite
> awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
> zsh: segmentation fault  ./aioWrite
> awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
> zsh: segmentation fault  ./aioWrite
> awilcox on gwyn [pts/7 Fri 7 11:29] ~: gdb aioWrite
> GNU gdb (GDB) 8.2
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "powerpc64-foxkit-linux-musl".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>     <http://www.gnu.org/software/gdb/documentation/>.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from aioWrite...done.
> (gdb) run
> Starting program: /home/awilcox/aioWrite
> [New LWP 60165]
> [LWP 60165 exited]
> aio_write/1-1.c cancelationStatus : 2
> Test PASSED
> [Inferior 1 (process 60162) exited normally]
> (gdb) quit
> 
> 
> Well, that's pretty interesting.
> 
> 
> awilcox on gwyn [pts/7 Fri 7 11:29] ~: uname -a
> Linux gwyn 4.14.76-mc11-easy-p8 #1 SMP Sat Nov 17 04:52:54 UTC 2018
> ppc64 GNU/Linux
> awilcox on gwyn [pts/7 Fri 7 11:30] ~: /lib/ld-musl-powerpc64.so.1
> musl libc (powerpc64)
> Version 1.1.20
> Dynamic Program Loader
> Usage: /lib/ld-musl-powerpc64.so.1 [options] [--] pathname [args]
> 
> 
> Maybe this is a bug that has been fixed on master?

I don't think so. I'm concerned that it's a stack overflow, and that
somehow the kernel folks have managed to break the MINSIGSTKSZ ABI.
AIO threads use a PTHREAD_STACK_MIN-sized stack with no guard page
(because they don't run any application code, just a tiny stub
function) but this could overflow in kernelspace (and either crash or
clobber memory depending on memory layout and presence/absence of
ASLR) if the kernel is making a signal frame that's too big. Note that
this would have to be nearly twice MINSIGSTKSZ (on x86 at least) due
to rounding up to whole pages, so if the kernel is misbehaving here
it's *badly* misbehaving...
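
As a standalone illustration of that failure mode (a sketch, not musl's
code; whether it actually faults depends on the libc's PTHREAD_STACK_MIN
and on how large the kernel's signal frame is on the machine):

#include <pthread.h>
#include <signal.h>
#include <limits.h>
#include <unistd.h>

static void handler(int sig)
{
  /* By the time we get here, delivery has already pushed a full
     kernel signal frame onto the tiny stack. */
  static const char msg[] = "signal delivered without overflow\n";
  write(2, msg, sizeof msg - 1);
}

static void *tiny_stack_thread(void *arg)
{
  pause();  /* park until the main thread signals us */
  return 0;
}

int main(void)
{
  pthread_t td;
  pthread_attr_t attr;

  signal(SIGUSR1, handler);
  pthread_attr_init(&attr);
  pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);
  pthread_create(&td, &attr, tiny_stack_thread, 0);
  sleep(1);                   /* crude: let the thread reach pause() */
  pthread_kill(td, SIGUSR1);  /* signal frame lands on the minimal stack */
  pthread_join(td, 0);
  return 0;
}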

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 18:26         ` Rich Felker
@ 2018-12-07 19:05           ` A. Wilcox
  2018-12-07 20:07             ` Rich Felker
  2018-12-07 19:13           ` A. Wilcox
  2018-12-07 20:06           ` Florian Weimer
  2 siblings, 1 reply; 22+ messages in thread
From: A. Wilcox @ 2018-12-07 19:05 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 2695 bytes --]

On 12/07/18 12:26, Rich Felker wrote:
> On Fri, Dec 07, 2018 at 11:31:01AM -0600, A. Wilcox wrote:
>> awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
>> zsh: segmentation fault  ./aioWrite
>>
>> (gdb) run
>> Starting program: /home/awilcox/aioWrite
>> [New LWP 60165]
>> [LWP 60165 exited]
>> aio_write/1-1.c cancelationStatus : 2
>> Test PASSED
>> [Inferior 1 (process 60162) exited normally]
>> (gdb) quit
>>
> I don't think so. I'm concerned that it's a stack overflow, and that
> somehow the kernel folks have managed to break the MINSIGSTKSZ ABI.
> AIO threads use a PTHREAD_STACK_MIN-sized stack with no guard page
> (because they don't run any application code, just a tiny stub
> function) but this could overflow in kernelspace (and either crash or
> clobber memory depending on memory layout and presence/absence of
> ASLR) if the kernel is making a signal frame that's too big. Note that
> this would have to be nearly twice MINSIGSTKSZ (on x86 at least) due
> to rounding up to whole pages, so if the kernel is misbehaving here
> it's *badly* misbehaving...
> 
> Rich
> 


Note how for me, it runs correctly in gdb, but not bare.  I can
reproduce this behaviour in valgrind, too:

awilcox on gwyn [pts/7 Fri 7 13:03] ~: valgrind ./aioWrite
==47650== Memcheck, a memory error detector
==47650== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==47650== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==47650== Command: ./aioWrite
==47650==
--47650-- WARNING: unhandled ppc64be-linux syscall: 208
--47650-- You may be able to write your own handler.
--47650-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--47650-- Nevertheless we consider this a bug.  Please report
--47650-- it at http://valgrind.org/support/bug_reports.html.
aio_write/1-1.c cancelationStatus : 2
Test PASSED
==47650==
==47650== HEAP SUMMARY:
==47650==     in use at exit: 7,574 bytes in 5 blocks
==47650==   total heap usage: 6 allocs, 1 frees, 7,694 bytes allocated
==47650==
==47650== LEAK SUMMARY:
==47650==    definitely lost: 0 bytes in 0 blocks
==47650==    indirectly lost: 0 bytes in 0 blocks
==47650==      possibly lost: 0 bytes in 0 blocks
==47650==    still reachable: 7,168 bytes in 4 blocks
==47650==         suppressed: 406 bytes in 1 blocks
==47650== Rerun with --leak-check=full to see details of leaked memory
==47650==
==47650== For counts of detected and suppressed errors, rerun with: -v
==47650== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


(syscall 208 is tkill)

Best,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org




* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 18:26         ` Rich Felker
  2018-12-07 19:05           ` A. Wilcox
@ 2018-12-07 19:13           ` A. Wilcox
  2018-12-07 20:21             ` Rich Felker
  2018-12-07 20:35             ` Markus Wichmann
  2018-12-07 20:06           ` Florian Weimer
  2 siblings, 2 replies; 22+ messages in thread
From: A. Wilcox @ 2018-12-07 19:13 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 1374 bytes --]

Okay, it's a race of some kind:

awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so
musl libc (powerpc64)
Version 1.1.20-git-156-gb1c58cb9
Dynamic Program Loader
Usage: lib/libc.so [options] [--] pathname [args]
awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite

zsh: segmentation fault  lib/libc.so ~/aioWrite
awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
zsh: segmentation fault  lib/libc.so ~/aioWrite
awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
zsh: segmentation fault  lib/libc.so ~/aioWrite
awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
zsh: segmentation fault  lib/libc.so ~/aioWrite
awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
zsh: segmentation fault  lib/libc.so ~/aioWrite
awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
zsh: segmentation fault  lib/libc.so ~/aioWrite
awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
aio_write/1-1.c cancelationStatus : 2
Test PASSED
awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
zsh: segmentation fault  lib/libc.so ~/aioWrite


So, my best theory is that running inside a debugger (gdb, valgrind)
makes it slow enough that it no longer races.

Best,
--arw


-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org




* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 18:26         ` Rich Felker
  2018-12-07 19:05           ` A. Wilcox
  2018-12-07 19:13           ` A. Wilcox
@ 2018-12-07 20:06           ` Florian Weimer
  2018-12-07 20:14             ` Rich Felker
  2 siblings, 1 reply; 22+ messages in thread
From: Florian Weimer @ 2018-12-07 20:06 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> I don't think so. I'm concerned that it's a stack overflow, and that
> somehow the kernel folks have managed to break the MINSIGSTKSZ ABI.

Probably:

  <https://sourceware.org/bugzilla/show_bug.cgi?id=20305>
  <https://sourceware.org/bugzilla/show_bug.cgi?id=22636>

It's a nasty CPU backwards compatibility problem.  Some of the
suggestions I made to work around this are simply wrong; don't take them
too seriously.

Nowadays, the kernel has a way to disable the %zmm registers, but it
unfortunately does not reduce the save area size.

Thanks,
Florian



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 19:05           ` A. Wilcox
@ 2018-12-07 20:07             ` Rich Felker
  0 siblings, 0 replies; 22+ messages in thread
From: Rich Felker @ 2018-12-07 20:07 UTC (permalink / raw)
  To: musl

On Fri, Dec 07, 2018 at 01:05:53PM -0600, A. Wilcox wrote:
> On 12/07/18 12:26, Rich Felker wrote:
> > On Fri, Dec 07, 2018 at 11:31:01AM -0600, A. Wilcox wrote:
> >> awilcox on gwyn [pts/7 Fri 7 11:29] ~: ./aioWrite
> >> zsh: segmentation fault  ./aioWrite
> >>
> >> (gdb) run
> >> Starting program: /home/awilcox/aioWrite
> >> [New LWP 60165]
> >> [LWP 60165 exited]
> >> aio_write/1-1.c cancelationStatus : 2
> >> Test PASSED
> >> [Inferior 1 (process 60162) exited normally]
> >> (gdb) quit
> >>
> > I don't think so. I'm concerned that it's a stack overflow, and that
> > somehow the kernel folks have managed to break the MINSIGSTKSZ ABI.
> > AIO threads use a PTHREAD_STACK_MIN-sized stack with no guard page
> > (because they don't run any application code, just a tiny stub
> > function) but this could overflow in kernelspace (and either crash or
> > clobber memory depending on memory layout and presence/absence of
> > ASLR) if the kernel is making a signal frame that's too big. Note that
> > this would have to be nearly twice MINSIGSTKSZ (on x86 at least) due
> > to rounding up to whole pages, so if the kernel is misbehaving here
> > it's *badly* misbehaving...
> 
> Note how for me, it runs correctly in gdb, but not bare.  I can
> reproduce this behaviour in valgrind, too:
> 
> awilcox on gwyn [pts/7 Fri 7 13:03] ~: valgrind ./aioWrite
> ==47650== Memcheck, a memory error detector
> ==47650== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==47650== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> ==47650== Command: ./aioWrite
> ==47650==
> --47650-- WARNING: unhandled ppc64be-linux syscall: 208
> --47650-- You may be able to write your own handler.
> --47650-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
> --47650-- Nevertheless we consider this a bug.  Please report
> --47650-- it at http://valgrind.org/support/bug_reports.html.
> aio_write/1-1.c cancelationStatus : 2
> Test PASSED
> ==47650==
> ==47650== HEAP SUMMARY:
> ==47650==     in use at exit: 7,574 bytes in 5 blocks
> ==47650==   total heap usage: 6 allocs, 1 frees, 7,694 bytes allocated
> ==47650==
> ==47650== LEAK SUMMARY:
> ==47650==    definitely lost: 0 bytes in 0 blocks
> ==47650==    indirectly lost: 0 bytes in 0 blocks
> ==47650==      possibly lost: 0 bytes in 0 blocks
> ==47650==    still reachable: 7,168 bytes in 4 blocks
> ==47650==         suppressed: 406 bytes in 1 blocks
> ==47650== Rerun with --leak-check=full to see details of leaked memory
> ==47650==
> ==47650== For counts of detected and suppressed errors, rerun with: -v
> ==47650== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> 
> 
> (syscall 208 is tkill)

It runs OK for you under valgrind? It was messing up for me (crashing
with static linking, getting stuck badly with dynamic linking), and
that's what suggested stack overflow to me (since valgrind likely uses
a lot of stack when emulating things). This was on x86_64.

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 20:06           ` Florian Weimer
@ 2018-12-07 20:14             ` Rich Felker
  2018-12-08 16:18               ` Florian Weimer
  0 siblings, 1 reply; 22+ messages in thread
From: Rich Felker @ 2018-12-07 20:14 UTC (permalink / raw)
  To: musl

On Fri, Dec 07, 2018 at 09:06:18PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > I don't think so. I'm concerned that it's a stack overflow, and that
> > somehow the kernel folks have managed to break the MINSIGSTKSZ ABI.
> 
> Probably:
> 
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=20305>
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=22636>
> 
> It's a nasty CPU backwards compatibility problem.  Some of the
> suggestions I made to work around this are simply wrong; don't take them
> too seriously.
> 
> Nowadays, the kernel has a way to disable the %zmm registers, but it
> unfortunately does not reduce the save area size.

How large is the saved context with the %zmm junk? I measured just
~768 bytes on normal x86_64 without it, and since 2048 is rounded up
to a whole page (4096), overflow should not happen until the signal
context is something like 3.5k (allowing ~512 bytes for TCB (~128) and
2 simple call frames).
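
Spelled out, the budget for a 2048-byte stack request rounded up to one
page would be roughly:

   4096 bytes  thread stack after rounding up to a whole page
  - 512 bytes  TCB (~128) plus two simple call frames
  = ~3584 bytes (~3.5k) available for the signal frame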

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 19:13           ` A. Wilcox
@ 2018-12-07 20:21             ` Rich Felker
  2018-12-07 20:35             ` Markus Wichmann
  1 sibling, 0 replies; 22+ messages in thread
From: Rich Felker @ 2018-12-07 20:21 UTC (permalink / raw)
  To: musl

On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> Okay, it's a race of some kind:
> 
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so
> musl libc (powerpc64)
> Version 1.1.20-git-156-gb1c58cb9
> Dynamic Program Loader
> Usage: lib/libc.so [options] [--] pathname [args]
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> 
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> aio_write/1-1.c cancelationStatus : 2
> Test PASSED
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> 
> 
> So, my best theory is that running inside a debugger (gdb, valgrind)
> makes it slow enough that it no longer races.

OK, here's a theory. Based on my reply just now to Florian, the signal
context would have to get really big to make the expected code path
overflow -- io_thread_func() has a very small stack frame and so does
cleanup(). However, early in io_thread_func, it calls
__aio_get_queue(), which calls calloc() if the tables at each level
don't already exist, which is certainly the case for the first call.
During this call, the margin will be somewhat smaller, and maybe it's
enough to make kernels that break the MINSIGSTKSZ contract cause an
overflow.

The right action here is probably calling __aio_get_queue with the fd
number *before* calling pthread_create, so that it's guaranteed that
__aio_get_queue takes the fast path in the io thread and doesn't call
calloc. This is especially important in light of the newish allowance
that malloc be interposed, where we would be running
application-provided malloc code in a thread with tiny stack.

I'm still not sure this is the source of the reported crash but I
think it needs to be changed either way.
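
As a generic, self-contained sketch of that ordering (hypothetical names,
not musl's internals or the eventual patch; locking and error details
omitted for brevity):

#include <pthread.h>
#include <stdlib.h>

#define MAXFD 1024
static void *queue_tab[MAXFD];  /* hypothetical per-fd queue slots */

/* May allocate; called only from the submitting thread, which has a
   normal-sized stack. */
static void *get_queue(int fd)
{
  if (fd < 0 || fd >= MAXFD) return 0;
  if (!queue_tab[fd]) queue_tab[fd] = calloc(1, 64);
  return queue_tab[fd];
}

static void *worker(void *p)
{
  /* Fast path only: the queue already exists, so no allocator (and no
     interposed malloc) ever runs on this thread's minimal stack. */
  void *q = queue_tab[(int)(long)p];
  (void)q;
  /* ... perform the I/O, update the queue ... */
  return 0;
}

static int submit(int fd)
{
  if (!get_queue(fd)) return -1;  /* allocate *before* spawning */
  pthread_t td;
  if (pthread_create(&td, 0, worker, (void *)(long)fd)) return -1;
  pthread_detach(td);
  return 0;
}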

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 19:13           ` A. Wilcox
  2018-12-07 20:21             ` Rich Felker
@ 2018-12-07 20:35             ` Markus Wichmann
  2018-12-07 21:12               ` Rich Felker
  2018-12-07 22:51               ` A. Wilcox
  1 sibling, 2 replies; 22+ messages in thread
From: Markus Wichmann @ 2018-12-07 20:35 UTC (permalink / raw)
  To: musl

On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> So, my best theory is that running inside a debugger (gdb, valgrind)
> makes it slow enough that it no longer races.

Two ideas to investigate further. 1: Produce a coredump ("ulimit -c
unlimited"). That won't interfere with timing, but I have no clue if
coredumps work with multithreading.

2: Might I suggest installing a SIGSEGV handler? If you have libunwind,
you can create a backtrace from inside the handler. And even if not, you
can at least print the exception PC, which would help a ton already.

Only, don't return from that handler. Either _exit(), or better yet,
restore the default handler, then kill yourself with SIGSEGV.
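
A minimal sketch of that suggestion, without libunwind (printing only the
fault address; snprintf is not formally async-signal-safe, but is tolerable
in a one-shot crash reporter that re-raises immediately afterwards):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
  char msg[64];
  int n = snprintf(msg, sizeof msg, "SIGSEGV at %p\n", si->si_addr);
  if (n > 0) write(2, msg, n);
  signal(sig, SIG_DFL);   /* restore the default handler ... */
  raise(sig);             /* ... and die with SIGSEGV for the coredump */
}

int main(void)
{
  struct sigaction sa = {0};
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_SIGINFO;
  sigaction(SIGSEGV, &sa, 0);

  *(volatile int *)0 = 1;   /* demo: force a fault */
  return 0;
}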

Ciao,
Markus



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 20:35             ` Markus Wichmann
@ 2018-12-07 21:12               ` Rich Felker
  2018-12-07 22:51               ` A. Wilcox
  1 sibling, 0 replies; 22+ messages in thread
From: Rich Felker @ 2018-12-07 21:12 UTC (permalink / raw)
  To: musl

On Fri, Dec 07, 2018 at 09:35:32PM +0100, Markus Wichmann wrote:
> On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> > So, my best theory is that running inside a debugger (gdb, valgrind)
> > makes it slow enough that it no longer races.
> 
> Two ideas to investigate further. 1: Produce a coredump ("ulimit -c
> unlimited"). That won't interfere with timing, but I have no clue if
> coredumps work with multithreading.
> 
> 2: Might I suggest installing a SIGSEGV handler? If you have libunwind,
> you can create a backtrace from inside the handler. And even if not, you
> can at least print the exception PC, which would help a ton already.
> 
> Only, don't return from that handler. Either _exit(), or better yet,
> restore the default handler, then kill yourself with SIGSEGV.

The signal handler will not work if the problem is stack overflow on
the io thread. It runs with all signals blocked, and would need a
sigaltstack to have space to run anyway, but there's nowhere to
install a sigaltstack for the io thread.
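
(For reference, the normal sigaltstack pattern for a thread you do control
looks like the sketch below; the point above is that nothing equivalent can
be done for the internal io thread:)

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static void on_segv(int sig)
{
  static const char msg[] = "handled SIGSEGV on the alternate stack\n";
  write(2, msg, sizeof msg - 1);
  _exit(1);
}

int main(void)
{
  /* Give the calling thread an alternate signal stack... */
  stack_t ss = { .ss_sp = malloc(SIGSTKSZ), .ss_size = SIGSTKSZ };
  sigaltstack(&ss, 0);

  /* ...and install the handler with SA_ONSTACK so it gets used. */
  struct sigaction sa = { .sa_handler = on_segv, .sa_flags = SA_ONSTACK };
  sigaction(SIGSEGV, &sa, 0);

  *(volatile int *)0 = 1;  /* force a fault; handler runs on the alt stack */
  return 0;
}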

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 20:35             ` Markus Wichmann
  2018-12-07 21:12               ` Rich Felker
@ 2018-12-07 22:51               ` A. Wilcox
  2018-12-07 23:50                 ` Rich Felker
  1 sibling, 1 reply; 22+ messages in thread
From: A. Wilcox @ 2018-12-07 22:51 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 2211 bytes --]

On 12/07/18 14:35, Markus Wichmann wrote:
> On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
>> So, my best theory is that running inside a debugger (gdb, valgrind)
>> makes it slow enough that it no longer races.
> 
> Two ideas to investigate further. 1: Produce a coredump ("ulimit -c
> unlimited"). That won't interfere with timing, but I have no clue if
> coredumps work with multithreading.

Core was generated by `./aioWrite '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
32      src/thread/powerpc64/syscall_cp.s: No such file or directory.
[Current thread is 1 (LWP 5507)]
(gdb) bt
#0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
#1  0x00003fffa768f2a4 in __syscall_cp_c (nr=180, u=512512, v=0, w=0,
x=0, y=0, z=0) at src/thread/pthread_cancel.c:35
#2  0x00003fffa768e008 in __syscall_cp (nr=<optimized out>, u=<optimized
out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
y=<optimized out>, z=<optimized out>) at src/thread/__syscall_cp.c:20
#3  0x00003fffa76969f4 in pwrite (fd=<optimized out>, buf=<optimized
out>, size=<optimized out>, ofs=<optimized out>) at src/unistd/pwrite.c:7
#4  0x00003fffa763eddc in io_thread_func (ctx=<optimized out>) at
src/aio/aio.c:240
#5  0x00003fffa768f76c in start (p=0x3fffa76e8af8) at
src/thread/pthread_create.c:147
#6  0x00003fffa769b608 in __clone () at src/thread/powerpc64/clone.s:43
(gdb) thread 2
[Switching to thread 2 (LWP 5506)]
#0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221) at
./arch/powerpc64/syscall_arch.h:54
54      ./arch/powerpc64/syscall_arch.h: No such file or directory.
(gdb) bt
#0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221) at
./arch/powerpc64/syscall_arch.h:54
#1  __wait (addr=0x200, waiters=0x0, val=<optimized out>,
priv=<optimized out>) at src/thread/__wait.c:13
#2  0x00003fffa763f07c in aio_cancel (fd=<optimized out>,
cb=0x3fffffafd2b8) at src/aio/aio.c:356
#3  0x000000012034c044 in main ()


221 is SYS_futex.  Wow, that looks wrong.

Best,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org




* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 22:51               ` A. Wilcox
@ 2018-12-07 23:50                 ` Rich Felker
  0 siblings, 0 replies; 22+ messages in thread
From: Rich Felker @ 2018-12-07 23:50 UTC (permalink / raw)
  To: musl

On Fri, Dec 07, 2018 at 04:51:03PM -0600, A. Wilcox wrote:
> On 12/07/18 14:35, Markus Wichmann wrote:
> > On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> >> So, my best theory is that running inside a debugger (gdb, valgrind)
> >> makes it slow enough that it no longer races.
> > 
> > Two ideas to investigate further. 1: Produce a coredump ("ulimit -c
> > unlimited"). That won't interfere with timing, but I have no clue if
> > coredumps work with multithreading.
> 
> Core was generated by `./aioWrite '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
> 32      src/thread/powerpc64/syscall_cp.s: No such file or directory.
> [Current thread is 1 (LWP 5507)]
> (gdb) bt
> #0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
> #1  0x00003fffa768f2a4 in __syscall_cp_c (nr=180, u=512512, v=0, w=0,
> x=0, y=0, z=0) at src/thread/pthread_cancel.c:35
> #2  0x00003fffa768e008 in __syscall_cp (nr=<optimized out>, u=<optimized
> out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
> y=<optimized out>, z=<optimized out>) at src/thread/__syscall_cp.c:20
> #3  0x00003fffa76969f4 in pwrite (fd=<optimized out>, buf=<optimized
> out>, size=<optimized out>, ofs=<optimized out>) at src/unistd/pwrite.c:7
> #4  0x00003fffa763eddc in io_thread_func (ctx=<optimized out>) at
> src/aio/aio.c:240
> #5  0x00003fffa768f76c in start (p=0x3fffa76e8af8) at
> src/thread/pthread_create.c:147
> #6  0x00003fffa769b608 in __clone () at src/thread/powerpc64/clone.s:43
> (gdb) thread 2
> [Switching to thread 2 (LWP 5506)]
> #0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221) at
> ./arch/powerpc64/syscall_arch.h:54
> 54      ./arch/powerpc64/syscall_arch.h: No such file or directory.
> (gdb) bt
> #0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221) at
> ./arch/powerpc64/syscall_arch.h:54
> #1  __wait (addr=0x200, waiters=0x0, val=<optimized out>,
> priv=<optimized out>) at src/thread/__wait.c:13
> #2  0x00003fffa763f07c in aio_cancel (fd=<optimized out>,
> cb=0x3fffffafd2b8) at src/aio/aio.c:356
> #3  0x000000012034c044 in main ()
> 
> 
> 221 is SYS_futex.  Wow, that looks wrong.

I don't think thread 2 (odd numbering; it looks like the main thread)
is relevant to the crash; it's already proceeded past whatever was
happening when thread 1 (the io thread) started crashing.

I'm guessing it is stack overflow. Can you dump the registers (to see
the stack pointer value) and info about memory ranges? That should
show how much space is left on the stack at the point of crash. If the
crash is the signal handler trying to run, there will probably be some
space left but less than the size of a signal frame, and the kernel
will probably refrain from moving the stack pointer to include the
signal frame.
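
For example, against the core (assuming a reasonably recent gdb; on
powerpc64 the stack pointer is r1):

  (gdb) info registers       # stack pointer at the point of the crash
  (gdb) info proc mappings   # memory ranges; locate the io thread's stack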

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-07 20:14             ` Rich Felker
@ 2018-12-08 16:18               ` Florian Weimer
  2018-12-10  9:05                 ` Arkadiusz Sienkiewicz
  0 siblings, 1 reply; 22+ messages in thread
From: Florian Weimer @ 2018-12-08 16:18 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Fri, Dec 07, 2018 at 09:06:18PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > I don't think so. I'm concerned that it's a stack overflow, and that
>> > somehow the kernel folks have managed to break the MINSIGSTKSZ ABI.
>> 
>> Probably:
>> 
>>   <https://sourceware.org/bugzilla/show_bug.cgi?id=20305>
>>   <https://sourceware.org/bugzilla/show_bug.cgi?id=22636>
>> 
>> It's a nasty CPU backwards compatibility problem.  Some of the
>> suggestions I made to work around this are simply wrong; don't take them
>> too seriously.
>> 
>> Nowadays, the kernel has a way to disable the %zmm registers, but it
>> unfortunately does not reduce the save area size.
>
> How large is the saved context with the %zmm junk? I measured just
> ~768 bytes on normal x86_64 without it, and since 2048 is rounded up
> to a whole page (4096), overflow should not happen until the signal
> context is something like 3.5k (allowing ~512 bytes for TCB (~128) and
> 2 simple call frames).

I wrote a test to do some measurements:

  <https://sourceware.org/ml/libc-alpha/2018-12/msg00271.html>

The signal handler context is quite large on x86-64 with AVX-512F,
indeed around 3.5 KiB.  It is even larger on ppc64 and ppc64el
(~4.5 KiB), which I find somewhat surprising.
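
For reference, a minimal sketch of the measurement idea (not the actual
test linked above; it assumes a downward-growing stack, and the result
includes the handler's own small frame):

#include <signal.h>
#include <stdint.h>
#include <stdio.h>

static volatile uintptr_t handler_sp;

static void handler(int sig)
{
  char probe;
  handler_sp = (uintptr_t)&probe;   /* approximates sp inside the handler */
}

int main(void)
{
  char probe;
  signal(SIGUSR1, handler);
  raise(SIGUSR1);
  printf("approx. signal frame: %lu bytes\n",
         (unsigned long)((uintptr_t)&probe - handler_sp));
  return 0;
}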

The cancellation test also includes stack usage from the libgcc
unwinder.  Its stack usage likely differs between versions, so I should
have included that in the reported results.

Thanks,
Florian



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-08 16:18               ` Florian Weimer
@ 2018-12-10  9:05                 ` Arkadiusz Sienkiewicz
  2018-12-12  0:36                   ` Rich Felker
  0 siblings, 1 reply; 22+ messages in thread
From: Arkadiusz Sienkiewicz @ 2018-12-10  9:05 UTC (permalink / raw)
  To: musl; +Cc: dalias

[-- Attachment #1: Type: text/plain, Size: 5308 bytes --]

Here are answers to some questions directed to me earlier:

> Could you attach the log from "strace -f -o strace.log ~/aioWrite"?
Sorry, can't do that. strace is not installed and I don't have root access.
If this is still needed I will ask admin to add strace.

> Do the other machines have the same kernel (4.15.0-20-generic)?
No, other machines use kernel 4.15.0-39-generic.

> Have you tried running the binary built on a successful machine on
> the problematic machine?

Yes, same effect - segmentation fault. bt from gdb is identical too.

> valgrind might also be a good idea.

alpine-tmp-0:~$ strace -f ./aioWrite
-sh: strace: not found
alpine-tmp-0:~$ valgrind
valgrind            valgrind-di-server  valgrind-listener
alpine-tmp-0:~$ valgrind ./aioWrite
==70339== Memcheck, a memory error detector
==70339== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==70339== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==70339== Command: ./aioWrite
==70339==
==70339== Invalid free() / delete / delete[] / realloc()
==70339==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
==70339==    by 0x4020248: reclaim_gaps (dynlink.c:478)
==70339==    by 0x4020CD0: map_library (dynlink.c:674)
==70339==    by 0x4021818: load_library (dynlink.c:980)
==70339==    by 0x4022607: load_preload (dynlink.c:1075)
==70339==    by 0x4022607: __dls3 (dynlink.c:1585)
==70339==    by 0x4021EDB: __dls2 (dynlink.c:1389)
==70339==    by 0x401FC8E: ??? (in /lib/ld-musl-x86_64.so.1)
==70339==  Address 0x4e9a180 is in a rw- mapped file
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so segment
==70339==
==70339== Can't extend stack to 0x4087948 during signal delivery for thread
2:
==70339==   no stack segment
==70339==
==70339== Process terminating with default action of signal 11 (SIGSEGV):
dumping core
==70339==  Access not within mapped region at address 0x4087948
==70339==    at 0x4016834: __syscall3 (syscall_arch.h:29)
==70339==    by 0x4016834: __wake (pthread_impl.h:133)
==70339==    by 0x4016834: cleanup (aio.c:154)
==70339==    by 0x40167B0: io_thread_func (aio.c:255)
==70339==    by 0x4054292: start (pthread_create.c:145)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==  If you believe this happened as a result of a stack
==70339==  overflow in your program's main thread (unlikely but
==70339==  possible), you can try to increase the size of the
==70339==  main thread stack using the --main-stacksize= flag.
==70339==  The main thread stack size used in this run was 8388608.
==70339==
==70339== HEAP SUMMARY:
==70339==     in use at exit: 81,051 bytes in 9 blocks
==70339==   total heap usage: 9 allocs, 3 frees, 81,051 bytes allocated
==70339==
==70339== LEAK SUMMARY:
==70339==    definitely lost: 0 bytes in 0 blocks
==70339==    indirectly lost: 0 bytes in 0 blocks
==70339==      possibly lost: 0 bytes in 0 blocks
==70339==    still reachable: 81,051 bytes in 9 blocks
==70339==         suppressed: 0 bytes in 0 blocks
==70339== Rerun with --leak-check=full to see details of leaked memory
==70339==
==70339== For counts of detected and suppressed errors, rerun with: -v
==70339== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
Killed


On Sat, 8 Dec 2018 at 17:18, Florian Weimer <fweimer@redhat.com> wrote:

> * Rich Felker:
>
> > On Fri, Dec 07, 2018 at 09:06:18PM +0100, Florian Weimer wrote:
> >> * Rich Felker:
> >>
> >> > I don't think so. I'm concerned that it's a stack overflow, and that
> >> > somehow the kernel folks have managed to break the MINSIGSTKSZ ABI.
> >>
> >> Probably:
> >>
> >>   <https://sourceware.org/bugzilla/show_bug.cgi?id=20305>
> >>   <https://sourceware.org/bugzilla/show_bug.cgi?id=22636>
> >>
> >> It's a nasty CPU backwards compatibility problem.  Some of the
> >> suggestions I made to work around this are simply wrong; don't take them
> >> too seriously.
> >>
> >> Nowadays, the kernel has a way to disable the %zmm registers, but it
> >> unfortunately does not reduce the save area size.
> >
> > How large is the saved context with the %zmm junk? I measured just
> > ~768 bytes on normal x86_64 without it, and since 2048 is rounded up
> > to a whole page (4096), overflow should not happen until the signal
> > context is something like 3.5k (allowing ~512 bytes for TCB (~128) and
> > 2 simple call frames).
>
> I wrote a test to do some measurements:
>
>   <https://sourceware.org/ml/libc-alpha/2018-12/msg00271.html>
>
> The signal handler context is quite large on x86-64 with AVX-512F,
> indeed around 3.5 KiB.  It is even larger on ppc64 and ppc64el
> (~4.5 KiB), which I find somewhat surprising.
>
> The cancellation test also includes stack usage from the libgcc
> unwinder.  Its stack usage likely differs between versions, so I should
> have included that in the reported results.
>
> Thanks,
> Florian
>



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-10  9:05                 ` Arkadiusz Sienkiewicz
@ 2018-12-12  0:36                   ` Rich Felker
  2018-12-17 14:21                     ` Arkadiusz Sienkiewicz
  0 siblings, 1 reply; 22+ messages in thread
From: Rich Felker @ 2018-12-12  0:36 UTC (permalink / raw)
  To: musl

On Mon, Dec 10, 2018 at 10:05:05AM +0100, Arkadiusz Sienkiewicz wrote:
> Here are answers to some questions directed to me earlier:
> 
> > Could you attach the log from "strace -f -o strace.log ~/aioWrite"?
> Sorry, can't do that. strace is not installed and I don't have root access.
> If this is still needed I will ask admin to add strace.
> 
> > Do the other machines have the same kernel (4.15.0-20-generic)?
> No, other machines use kernel 4.15.0-39-generic.
> 
> > Have you tried running the binary built on a successful machine on
> > the problematic machine?
> 
> Yes, same effect - segmentation fault. bt from gdb is identical too.
> 
> > valgrind might also be a good idea.
> 
> alpine-tmp-0:~$ strace -f ./aioWrite
> -sh: strace: not found
> alpine-tmp-0:~$ valgrind
> valgrind            valgrind-di-server  valgrind-listener
> alpine-tmp-0:~$ valgrind ./aioWrite
> ==70339== Memcheck, a memory error detector
> ==70339== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==70339== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
> ==70339== Command: ./aioWrite
> ==70339==
> ==70339== Invalid free() / delete / delete[] / realloc()
> ==70339==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> ==70339==    by 0x4020248: reclaim_gaps (dynlink.c:478)
> ==70339==    by 0x4020CD0: map_library (dynlink.c:674)
> ==70339==    by 0x4021818: load_library (dynlink.c:980)
> ==70339==    by 0x4022607: load_preload (dynlink.c:1075)
> ==70339==    by 0x4022607: __dls3 (dynlink.c:1585)
> ==70339==    by 0x4021EDB: __dls2 (dynlink.c:1389)
> ==70339==    by 0x401FC8E: ??? (in /lib/ld-musl-x86_64.so.1)
> ==70339==  Address 0x4e9a180 is in a rw- mapped file
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so segment
> ==70339==
> ==70339== Can't extend stack to 0x4087948 during signal delivery for thread
> 2:
> ==70339==   no stack segment
> ==70339==
> ==70339== Process terminating with default action of signal 11 (SIGSEGV):
> dumping core
> ==70339==  Access not within mapped region at address 0x4087948
> ==70339==    at 0x4016834: __syscall3 (syscall_arch.h:29)
> ==70339==    by 0x4016834: __wake (pthread_impl.h:133)
> ==70339==    by 0x4016834: cleanup (aio.c:154)
> ==70339==    by 0x40167B0: io_thread_func (aio.c:255)
> ==70339==    by 0x4054292: start (pthread_create.c:145)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==    by 0x4053071: ??? (clone.s:21)
> ==70339==  If you believe this happened as a result of a stack
> ==70339==  overflow in your program's main thread (unlikely but
> ==70339==  possible), you can try to increase the size of the
> ==70339==  main thread stack using the --main-stacksize= flag.
> ==70339==  The main thread stack size used in this run was 8388608.
> ==70339==
> ==70339== HEAP SUMMARY:
> ==70339==     in use at exit: 81,051 bytes in 9 blocks
> ==70339==   total heap usage: 9 allocs, 3 frees, 81,051 bytes allocated
> ==70339==
> ==70339== LEAK SUMMARY:
> ==70339==    definitely lost: 0 bytes in 0 blocks
> ==70339==    indirectly lost: 0 bytes in 0 blocks
> ==70339==      possibly lost: 0 bytes in 0 blocks
> ==70339==    still reachable: 81,051 bytes in 9 blocks
> ==70339==         suppressed: 0 bytes in 0 blocks
> ==70339== Rerun with --leak-check=full to see details of leaked memory
> ==70339==
> ==70339== For counts of detected and suppressed errors, rerun with: -v
> ==70339== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
> Killed

Based on discussions in the other branches of this thread and on IRC,
I'm reasonably sure the cause of your crash is that your combination
of kernel and cpu model produces very large signal frames that
overflow the stack on the io thread. I have committed a solution to
the problem which I plan to push soon, along with some additional
improvements in this area.

Rich



* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-12  0:36                   ` Rich Felker
@ 2018-12-17 14:21                     ` Arkadiusz Sienkiewicz
  2018-12-17 17:29                       ` Rich Felker
  0 siblings, 1 reply; 22+ messages in thread
From: Arkadiusz Sienkiewicz @ 2018-12-17 14:21 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 4640 bytes --]

Great, thank you for the fix.
Will it be available only in version 1.1.21 onward? Or will you also
backport it to older versions?

On Wed, 12 Dec 2018 at 01:36, Rich Felker <dalias@libc.org> wrote:

> On Mon, Dec 10, 2018 at 10:05:05AM +0100, Arkadiusz Sienkiewicz wrote:
> > Here are answers to some questions directed to me earlier:
> >
> > > Could you attach the log from "strace -f -o strace.log ~/aioWrite"?
> > Sorry, can't do that. strace is not installed and I don't have root
> access.
> > If this is still needed I will ask admin to add strace.
> >
> > > Do the other machines have the same kernel (4.15.0-20-generic)?
> > No, other machines use kernel 4.15.0-39-generic.
> >
> > > Have you tried running the binary built on a successful machine on
> > > the problematic machine?
> >
> > Yes, same effect - segmentation fault. bt from gdb is identical too.
> >
> > > valgrind might also be a good idea.
> >
> > alpine-tmp-0:~$ strace -f ./aioWrite
> > -sh: strace: not found
> > alpine-tmp-0:~$ valgrind
> > valgrind            valgrind-di-server  valgrind-listener
> > alpine-tmp-0:~$ valgrind ./aioWrite
> > ==70339== Memcheck, a memory error detector
> > ==70339== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> > ==70339== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright
> info
> > ==70339== Command: ./aioWrite
> > ==70339==
> > ==70339== Invalid free() / delete / delete[] / realloc()
> > ==70339==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> > ==70339==    by 0x4020248: reclaim_gaps (dynlink.c:478)
> > ==70339==    by 0x4020CD0: map_library (dynlink.c:674)
> > ==70339==    by 0x4021818: load_library (dynlink.c:980)
> > ==70339==    by 0x4022607: load_preload (dynlink.c:1075)
> > ==70339==    by 0x4022607: __dls3 (dynlink.c:1585)
> > ==70339==    by 0x4021EDB: __dls2 (dynlink.c:1389)
> > ==70339==    by 0x401FC8E: ??? (in /lib/ld-musl-x86_64.so.1)
> > ==70339==  Address 0x4e9a180 is in a rw- mapped file
> > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so segment
> > ==70339==
> > ==70339== Can't extend stack to 0x4087948 during signal delivery for
> thread
> > 2:
> > ==70339==   no stack segment
> > ==70339==
> > ==70339== Process terminating with default action of signal 11 (SIGSEGV):
> > dumping core
> > ==70339==  Access not within mapped region at address 0x4087948
> > ==70339==    at 0x4016834: __syscall3 (syscall_arch.h:29)
> > ==70339==    by 0x4016834: __wake (pthread_impl.h:133)
> > ==70339==    by 0x4016834: cleanup (aio.c:154)
> > ==70339==    by 0x40167B0: io_thread_func (aio.c:255)
> > ==70339==    by 0x4054292: start (pthread_create.c:145)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==    by 0x4053071: ??? (clone.s:21)
> > ==70339==  If you believe this happened as a result of a stack
> > ==70339==  overflow in your program's main thread (unlikely but
> > ==70339==  possible), you can try to increase the size of the
> > ==70339==  main thread stack using the --main-stacksize= flag.
> > ==70339==  The main thread stack size used in this run was 8388608.
> > ==70339==
> > ==70339== HEAP SUMMARY:
> > ==70339==     in use at exit: 81,051 bytes in 9 blocks
> > ==70339==   total heap usage: 9 allocs, 3 frees, 81,051 bytes allocated
> > ==70339==
> > ==70339== LEAK SUMMARY:
> > ==70339==    definitely lost: 0 bytes in 0 blocks
> > ==70339==    indirectly lost: 0 bytes in 0 blocks
> > ==70339==      possibly lost: 0 bytes in 0 blocks
> > ==70339==    still reachable: 81,051 bytes in 9 blocks
> > ==70339==         suppressed: 0 bytes in 0 blocks
> > ==70339== Rerun with --leak-check=full to see details of leaked memory
> > ==70339==
> > ==70339== For counts of detected and suppressed errors, rerun with: -v
> > ==70339== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
> > Killed
>
> Based on discussions in the other branches of this thread and on IRC,
> I'm reasonably sure the cause of your crash is that your combination
> of kernel and cpu model produces very large signal frames that
> overflow the stack on the io thread. I have committed a solution to
> the problem which I plan to push soon, along with some additional
> improvements in this area.
>
> Rich
>

[-- Attachment #2: Type: text/html, Size: 5457 bytes --]


* Re: aio_cancel segmentation fault for in progress write requests
  2018-12-17 14:21                     ` Arkadiusz Sienkiewicz
@ 2018-12-17 17:29                       ` Rich Felker
  0 siblings, 0 replies; 22+ messages in thread
From: Rich Felker @ 2018-12-17 17:29 UTC (permalink / raw)
  To: musl

On Mon, Dec 17, 2018 at 03:21:53PM +0100, Arkadiusz Sienkiewicz wrote:
> Great, thank you for the fix.
> Will it be available only in version 1.1.21 onward? Or will you also
> backport it to older versions?

Yes, it will be in 1.1.21. Generally I don't backport bugfixes within
a release series. At first glance I thought it would be trivial to
apply the relevant commit:

https://git.musl-libc.org/cgit/musl/commit/?id=1a6d6f131bd60ec2a858b34100049f0c042089f2

and this one it depends on:

https://git.musl-libc.org/cgit/musl/commit/?id=836022b1c3655f82cfc8ed5fc62006526ec73b8b

However, it also depends on the following commit, which in turn depends
on the internal header wrappers and hidden-visibility framework that
are new in this release cycle:

https://git.musl-libc.org/cgit/musl/commit/?id=26c66c485c04fa782b8c6f7450bf008f4457b5a8

So some further tweaks would be needed if you really want to do this
on older versions. If you can't do this yourself there are probably
people in the community who'd be happy to do it as short contract
work.
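
For anyone attempting that, the mechanical part might look roughly like
the sketch below (assuming a checkout of the v1.1.20 tag from the
official clone URL; the hashes are the two commits linked above, and
resolving the fallout from the missing header/visibility framework is
the real work):

$ git clone git://git.musl-libc.org/musl && cd musl
$ git checkout -b aio-stack-backport v1.1.20
$ git cherry-pick 836022b1c3655f82cfc8ed5fc62006526ec73b8b
$ git cherry-pick 1a6d6f131bd60ec2a858b34100049f0c042089f2
# the commit that needs the new internal headers (26c66c48...) will
# not apply cleanly and would need manual adaptation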

Rich


> Wed, 12 Dec 2018 at 01:36 Rich Felker <dalias@libc.org> wrote:
> 
> > On Mon, Dec 10, 2018 at 10:05:05AM +0100, Arkadiusz Sienkiewicz wrote:
> > > Here are answers to some questions directed to me earlier:
> > >
> > > > Could you attach the log from "strace -f -o strace.log ~/aioWrite"?
> > > Sorry, can't do that. strace is not installed and I don't have root
> > > access. If this is still needed I will ask the admin to add strace.
> > >
> > > > Do the other machines have the same kernel (4.15.0-20-generic)?
> > > No, other machines use kernel 4.15.0-39-generic.
> > >
> > > > Have you tried running the binary built on a successful machine on
> > > the problematic machine?
> > >
> > > Yes, same effect - segmentation fault. bt from gdb is identical too.
> > >
> > > > valgrind might also be a good idea.
> > >
> > > alpine-tmp-0:~$ strace -f ./aioWrite
> > > -sh: strace: not found
> > > alpine-tmp-0:~$ valgrind
> > > valgrind            valgrind-di-server  valgrind-listener
> > > alpine-tmp-0:~$ valgrind ./aioWrite
> > > ==70339== Memcheck, a memory error detector
> > > ==70339== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> > > ==70339== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
> > > ==70339== Command: ./aioWrite
> > > ==70339==
> > > ==70339== Invalid free() / delete / delete[] / realloc()
> > > ==70339==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> > > ==70339==    by 0x4020248: reclaim_gaps (dynlink.c:478)
> > > ==70339==    by 0x4020CD0: map_library (dynlink.c:674)
> > > ==70339==    by 0x4021818: load_library (dynlink.c:980)
> > > ==70339==    by 0x4022607: load_preload (dynlink.c:1075)
> > > ==70339==    by 0x4022607: __dls3 (dynlink.c:1585)
> > > ==70339==    by 0x4021EDB: __dls2 (dynlink.c:1389)
> > > ==70339==    by 0x401FC8E: ??? (in /lib/ld-musl-x86_64.so.1)
> > > ==70339==  Address 0x4e9a180 is in a rw- mapped file
> > > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so segment
> > > ==70339==
> > > ==70339== Can't extend stack to 0x4087948 during signal delivery for thread 2:
> > > ==70339==   no stack segment
> > > ==70339==
> > > ==70339== Process terminating with default action of signal 11 (SIGSEGV):
> > > dumping core
> > > ==70339==  Access not within mapped region at address 0x4087948
> > > ==70339==    at 0x4016834: __syscall3 (syscall_arch.h:29)
> > > ==70339==    by 0x4016834: __wake (pthread_impl.h:133)
> > > ==70339==    by 0x4016834: cleanup (aio.c:154)
> > > ==70339==    by 0x40167B0: io_thread_func (aio.c:255)
> > > ==70339==    by 0x4054292: start (pthread_create.c:145)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==    by 0x4053071: ??? (clone.s:21)
> > > ==70339==  If you believe this happened as a result of a stack
> > > ==70339==  overflow in your program's main thread (unlikely but
> > > ==70339==  possible), you can try to increase the size of the
> > > ==70339==  main thread stack using the --main-stacksize= flag.
> > > ==70339==  The main thread stack size used in this run was 8388608.
> > > ==70339==
> > > ==70339== HEAP SUMMARY:
> > > ==70339==     in use at exit: 81,051 bytes in 9 blocks
> > > ==70339==   total heap usage: 9 allocs, 3 frees, 81,051 bytes allocated
> > > ==70339==
> > > ==70339== LEAK SUMMARY:
> > > ==70339==    definitely lost: 0 bytes in 0 blocks
> > > ==70339==    indirectly lost: 0 bytes in 0 blocks
> > > ==70339==      possibly lost: 0 bytes in 0 blocks
> > > ==70339==    still reachable: 81,051 bytes in 9 blocks
> > > ==70339==         suppressed: 0 bytes in 0 blocks
> > > ==70339== Rerun with --leak-check=full to see details of leaked memory
> > > ==70339==
> > > ==70339== For counts of detected and suppressed errors, rerun with: -v
> > > ==70339== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
> > > Killed
> >
> > Based on discussions in the other branches of this thread and on IRC,
> > I'm reasonably sure the cause of your crash is that your combination
> > of kernel and cpu model produces very large signal frames that
> > overflow the stack on the io thread. I have committed a solution to
> > the problem which I plan to push soon, along with some additional
> > improvements in this area.
> >
> > Rich
> >



end of thread, other threads:[~2018-12-17 17:29 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-07 12:52 aio_cancel segmentation fault for in progress write requests Arkadiusz Sienkiewicz
2018-12-07 15:44 ` Rich Felker
2018-12-07 16:04   ` Arkadiusz Sienkiewicz
2018-12-07 16:52     ` Orivej Desh
2018-12-07 16:52     ` Rich Felker
2018-12-07 17:31       ` A. Wilcox
2018-12-07 18:26         ` Rich Felker
2018-12-07 19:05           ` A. Wilcox
2018-12-07 20:07             ` Rich Felker
2018-12-07 19:13           ` A. Wilcox
2018-12-07 20:21             ` Rich Felker
2018-12-07 20:35             ` Markus Wichmann
2018-12-07 21:12               ` Rich Felker
2018-12-07 22:51               ` A. Wilcox
2018-12-07 23:50                 ` Rich Felker
2018-12-07 20:06           ` Florian Weimer
2018-12-07 20:14             ` Rich Felker
2018-12-08 16:18               ` Florian Weimer
2018-12-10  9:05                 ` Arkadiusz Sienkiewicz
2018-12-12  0:36                   ` Rich Felker
2018-12-17 14:21                     ` Arkadiusz Sienkiewicz
2018-12-17 17:29                       ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).