mailing list of musl libc
* LMDB test failures under musl on mips
@ 2014-02-13 20:50 Martin Lucina
  2014-02-13 21:39 ` Rich Felker
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Martin Lucina @ 2014-02-13 20:50 UTC
  To: musl

Hi,

I'm currently using musl libc and LMDB [1] in a new project. When
developing on a Debian x86_64 host everything works fine, but when building
for a target device (OpenWRT mips or mipsel, I've tried both) with static
linking my LMDB code starts failing with assertions and/or segfaults inside
LMDB itself.

Cross-compiling to statically linked musl on x86_64 does not have the
problem.

It's possible that the problem is LMDB itself; I can ask on the OpenLDAP
lists but I'd like to check here first if someone else has encountered this
problem?

You can reproduce the problem fairly easily by building the mtest* programs
that come with LMDB. Running mtest a few times (after creating ./testdb)
reliably gives either a segfault or various assertion failures in LMDB.

Note that I'm using the prebuilt toolchains from musl.codu.org (thanks!),
and the 0.9.15 release.

Any ideas?

Thanks,

Martin

[1] http://symas.com/mdb/



* Re: LMDB test failures under musl on mips
  2014-02-13 20:50 LMDB test failures under musl on mips Martin Lucina
@ 2014-02-13 21:39 ` Rich Felker
  2014-02-13 21:42 ` Szabolcs Nagy
  2014-02-13 23:26 ` Szabolcs Nagy
  2 siblings, 0 replies; 7+ messages in thread
From: Rich Felker @ 2014-02-13 21:39 UTC
  To: musl

Hi,

Thank you for the report. Can you provide some more details on where
it's failing such as the specific tests/assertions? I'll see if
someone with a mips setup can look at it.

Rich


On Thu, Feb 13, 2014 at 09:50:40PM +0100, Martin Lucina wrote:
> Hi,
> 
> I'm currently using musl libc and LMDB [1] in a new project. When
> developing on a Debian x86_64 host everything works fine, but when building
> for a target device (OpenWRT mips or mipsel, I've tried both) with static
> linking my LMDB code starts failing with assertions and/or segfaults inside
> LMDB itself.
> 
> Cross-compiling to statically linked musl on x86_64 does not have the
> problem.
> 
> It's possible that the problem is LMDB itself; I can ask on the OpenLDAP
> lists but I'd like to check here first if someone else has encountered this
> problem?
> 
> You can reproduce the problem fairly easily by building the mtest* programs
> that come with LMDB. Running mtest a few times (after creating ./testdb)
> reliably gives either a segfault or various assertion failures in LMDB.
> 
> Note that I'm using the prebuilt toolchains from musl.codu.org (thanks!),
> and the 0.9.15 release.
> 
> Any ideas?
> 
> Thanks,
> 
> Martin
> 
> [1] http://symas.com/mdb/



* Re: LMDB test failures under musl on mips
  2014-02-13 20:50 LMDB test failures under musl on mips Martin Lucina
  2014-02-13 21:39 ` Rich Felker
@ 2014-02-13 21:42 ` Szabolcs Nagy
  2014-02-13 23:26 ` Szabolcs Nagy
  2 siblings, 0 replies; 7+ messages in thread
From: Szabolcs Nagy @ 2014-02-13 21:42 UTC
  To: musl

* Martin Lucina <martin@lucina.net> [2014-02-13 21:50:40 +0100]:
> I'm currently using musl libc and LMDB [1] in a new project. When
> developing on a Debian x86_64 host everything works fine, but when building
> for a target device (OpenWRT mips or mipsel, I've tried both) with static
> linking my LMDB code starts failing with assertions and/or segfaults inside
> LMDB itself.
> 
> Cross-compiling to statically linked musl on x86_64 does not have the
> problem.
> 
> It's possible that the problem is LMDB itself; I can ask on the OpenLDAP
> lists but I'd like to check here first if someone else has encountered this
> problem?
> 

mips was not nearly as extensively tested as x86 targets and
it has a lot of arch specific syscall quirks

so it may be a musl bug, if you have strace on the target
then please send a strace log

what is the pagesize used on the target?
(iirc some mips can be non 4k)
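
For anyone who wants to answer the page size question on their own target, a minimal sketch in C follows. It assumes only POSIX sysconf(); the helper names query_page_size() and report_page_size() are made up for this illustration and are not part of musl or LMDB.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns the runtime page size in bytes, or -1 if sysconf() fails. */
long query_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Prints the page size and notes whether it differs from 4 KiB.
 * Returns 0 on success, -1 if the page size could not be determined. */
int report_page_size(void)
{
    long ps = query_page_size();
    if (ps <= 0) {
        perror("sysconf(_SC_PAGESIZE)");
        return -1;
    }
    /* Sanity check: a page size is expected to be a power of two. */
    assert((ps & (ps - 1)) == 0);
    printf("page size: %ld bytes%s\n", ps, ps == 4096 ? "" : " (not 4k)");
    return 0;
}
```

Calling report_page_size() from main() in a statically linked test binary on the device should be enough to confirm or rule out a non-4k configuration.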

> You can reproduce the problem fairly easily by building the mtest* programs
> that come with LMDB. Running mtest a few times (after creating ./testdb)
> reliably gives either a segfault or various assertion failures in LMDB.
> 
> Note that I'm using the prebuilt toolchains from musl.codu.org (thanks!),
> and the 0.9.15 release.
> 
> Any ideas?
> 
> Thanks,
> 
> Martin
> 
> [1] http://symas.com/mdb/



* Re: LMDB test failures under musl on mips
  2014-02-13 20:50 LMDB test failures under musl on mips Martin Lucina
  2014-02-13 21:39 ` Rich Felker
  2014-02-13 21:42 ` Szabolcs Nagy
@ 2014-02-13 23:26 ` Szabolcs Nagy
  2014-02-14  9:31   ` Martin Lucina
  2 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2014-02-13 23:26 UTC
  To: musl

[-- Attachment #1: Type: text/plain, Size: 488 bytes --]

* Martin Lucina <martin@lucina.net> [2014-02-13 21:50:40 +0100]:
> You can reproduce the problem fairly easily by building the mtest* programs
> that come with LMDB. Running mtest a few times (after creating ./testdb)
> reliably gives either a segfault or various assertion failures in LMDB.

ok i could reproduce it
i got the following assertion failure:

mdb.c:2001: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()

(it's on real hw without debugger, but i have strace now)


[-- Attachment #2: mtest.txt --]
[-- Type: text/plain, Size: 3608 bytes --]

execve("./mtest", ["./mtest"], [/* 11 vars */]) = 0
clock_gettime(CLOCK_REALTIME, {1392333473, 890526084}) = 0
getpid()                                = 28121
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2abe1000
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac62000
open("./testdb/lock.mdb", O_RDWR|O_CREAT|O_LARGEFILE|0x80000, 0664) = 3
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 16) = 0
set_thread_area(0x2abe7d68)             = 0
set_tid_address(0x2abe0cb8)             = 28121
fcntl64(3, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}, 0x7f9ee820) = 0
_llseek(3, 0, [8192], SEEK_END)         = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x2abce000
open("./testdb/data.mdb", O_RDWR|O_CREAT|O_LARGEFILE, 0664) = 4
pread(4, "\0\0\0\0\0\0\0\10\0\0\0\0\276\357\300\336\0\0\0\1*\326@\0\0\240\0\0\0\0\20\0"..., 92, 0) = 92
pread(4, "\0\0\0\1\0\0\0\10\0\0\0\0\276\357\300\336\0\0\0\1*\326@\0\0\240\0\0\0\0\20\0"..., 92, 4096) = 92
mmap(0x2ad64000, 10485760, PROT_READ, MAP_SHARED, 4, 0) = 0x2ad64000
open("./testdb/data.mdb", O_RDWR|O_SYNC|O_LARGEFILE) = 5
fcntl64(3, F_SETLK64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=1}, 0x7f9ee878) = 0
brk(0)                                  = 0x423000
brk(0x425000)                           = 0x425000
ioctl(1, TIOCNXCL, 0x7f9ee6e0)          = -1 ENOTTY (Inappropriate ioctl for device)
writev(1, [{"Adding 75", 9}, {" values\n", 8}], 2) = 17
brk(0x427000)                           = 0x427000
brk(0x429000)                           = 0x429000
pwrite(4, "\0\0\0\2\0\0\0\1\0\20\17\354\17\370\17\354\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 8192) = 4096
_llseek(4, 40960, [40960], SEEK_SET)    = 0
writev(4, [{"\0\0\0\n\0\0\0\2\0\212\5,\17\324\17\250\17|\17P\17$\16\370\16\314\7\354\16\240\16t"..., 4096}, {"\0\0\0\v\0\0\0\2\0\20\17\310\17\344\17\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096}], 2) = 8192
fdatasync(4)                            = 0
pwrite(5, "\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0\2\0\0"..., 58, 34) = 58
writev(1, [{"74 duplicates skipped\nkey: 0x2ad"..., 1021}, {"097 ", 4}], 2) = 1025
writev(1, [{", data: 0x2ad67da4 097 151 foo b"..., 1024}, {"\n", 1}], 2) = 1025
writev(1, [{"key: 0x2ad67b64 114 , data: 0x2a"..., 1020}, {"2ad67640", 8}], 2) = 1028
writev(1, [{" 1a3 419 foo bar\nkey: 0x2ad67820"..., 1023}, {"258 ", 4}], 2) = 1027
writev(1, [{", data: 0x2ad68f04 258 600 foo b"..., 1024}, {"\n", 1}], 2) = 1025
writev(1, [{"key: 0x2ad68c98 327 , data: 0x2a"..., 1020}, {"2ad68ab8", 8}], 2) = 1028
writev(1, [{" 377 887 foo bar\nkey: 0x2ad68a88"..., 1024}, {"3f4 ", 4}], 2) = 1028
_llseek(4, 16384, [16384], SEEK_SET)    = 0
writev(4, [{"\0\0\0\4\0\0\0\1\0\20\17\354\17\370\17\354\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096}, {"\0\0\0\5\0\0\0\2\0\210\5X\17\324\17\250\17|\17P\17$\16\370\16\314\7\354\16\240\16t"..., 4096}, {"\0\0\0\6\0\0\0\2\0\20\17\310\17\344\17\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096}], 3) = 12288
fdatasync(4)                            = 0
pwrite(5, "\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\0\0\0\6\0\0\0\0\0\0\0\2\0\0"..., 58, 4130) = 58
writev(2, [{"mdb.c:2001: Assertion 'mp->mp_pg"..., 71}, {NULL, 0}], 2) = 71
mdb.c:2001: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()
rt_sigprocmask(SIG_BLOCK, ~[RT_0 RT_1 RT_2], [], 16) = 0
gettid()                                = 28121
getpid()                                = 28121
tgkill(28121, 28121, SIGIOT)            = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 16) = 0
--- SIGIOT (Aborted) @ 0 (0) ---
+++ killed by SIGIOT +++
Aborted


* Re: LMDB test failures under musl on mips
  2014-02-13 23:26 ` Szabolcs Nagy
@ 2014-02-14  9:31   ` Martin Lucina
  2014-02-14 10:26     ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Lucina @ 2014-02-14  9:31 UTC
  To: musl

nsz@port70.net said:
> * Martin Lucina <martin@lucina.net> [2014-02-13 21:50:40 +0100]:
> > You can reproduce the problem fairly easily by building the mtest* programs
> > that come with LMDB. Running mtest a few times (after creating ./testdb)
> > reliably gives either a segfault or various assertion failures in LMDB.
> 
> ok i could reproduce it
> i got the following assertion failure:
> 
> mdb.c:2001: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()
> 
> (it's on real hw without debugger, but i have strace now)

That's what I get, and also these:

mdb.c:5176: Assertion 'IS_LEAF(mp)' failed in mdb_cursor_next()

or

mdb.c:1713: Assertion 'rc == 0' failed in mdb_page_dirty()

etc.

mtest is somewhat fickle, it uses random() to decide exactly what it's
doing. I have a hunch that I can provoke this with a simpler test program,
going to try that now.

Do you still want those strace logs from me?

Both of the targets (ASUS RT-N66u running Tomato, TP-Link TL-WDR4300
running OpenWRT trunk) I tried have 4k page size, so nothing out of the
ordinary there.

One thing I'd like to try is building against the normal OpenWRT/uClibc
toolchain (or even a plain glibc one) to see if anything changes.
Unfortunately the snapshot binaries they provide require at least glibc
2.14 which I don't have on my machines running Debian stable. I tried
using a toolchain built from source using the OpenWRT buildroot but get
random link errors :-/

Martin




* Re: LMDB test failures under musl on mips
  2014-02-14  9:31   ` Martin Lucina
@ 2014-02-14 10:26     ` Szabolcs Nagy
  2014-02-14 15:53       ` Martin Lucina
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2014-02-14 10:26 UTC
  To: musl

* Martin Lucina <martin@lucina.net> [2014-02-14 10:31:56 +0100]:
> That's what I get, and also these:
> 
> mdb.c:5176: Assertion 'IS_LEAF(mp)' failed in mdb_cursor_next()
> 
> or
> 
> mdb.c:1713: Assertion 'rc == 0' failed in mdb_page_dirty()
> 
> etc.
> 
> mtest is somewhat fickle, it uses random() to decide exactly what it's
> doing. I have a hunch that I can provoke this with a simpler test program,
> going to try that now.

i removed the srandom(time(NULL)) and disabled ASLR and it's still fickle
i haven't looked further
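
As an illustration of why dropping srandom(time(NULL)) should remove one source of run-to-run variation: with a fixed seed, random() is expected to return the same sequence every time, so any remaining fickleness has to come from somewhere else. The sketch below is not taken from mtest; fill_from_seed() and same_seed_repeats() are hypothetical names used only here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: seed random() with a fixed value and record
 * the first n results in out[]. */
void fill_from_seed(unsigned int seed, long *out, size_t n)
{
    assert(out != NULL);
    srandom(seed);
    for (size_t i = 0; i < n; i++)
        out[i] = random();
}

/* Returns 1 if two runs with the same seed give identical sequences,
 * 0 otherwise. */
int same_seed_repeats(unsigned int seed)
{
    long a[16], b[16];

    fill_from_seed(seed, a, 16);
    fill_from_seed(seed, b, 16);
    for (size_t i = 0; i < 16; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

If same_seed_repeats() returns 1 on the target, the random number generator itself can be ruled out as the source of the nondeterminism.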

> 
> Do you still want those strace logs from me?
> 

no, i think strace does not help here
(at least i didn't see anything obvious)

i don't quite understand the nondeterministic behaviour

it seems to do reads/writes and mmap through two different fds to the same
underlying file, but it does fdatasync on one and O_SYNC on the other
so i think the behaviour should be deterministic
(i'd need to know more about mdb and see the mmap accesses as well
to figure out what's going on..)
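
The access pattern described above (writes through one descriptor, reads through a shared mapping of the same file) can be exercised in isolation. The following is a simplified sketch under stated assumptions, not LMDB's actual code: it maps a scratch file read-only with MAP_SHARED through one fd, writes a pattern with pwrite() through a second fd opened with O_SYNC, and checks that the mapping shows the new data. The function name, the CHECK_LEN constant and the scratch path are invented for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define CHECK_LEN 4096  /* assumed size: one 4k page, as on the targets in this thread */

/* Writes patterns through an O_SYNC descriptor and compares them with
 * what a read-only MAP_SHARED mapping of the same file shows.
 * Returns 1 if every write was visible through the mapping,
 * 0 if the mapping showed stale data, -1 on a setup or I/O error.
 * The scratch file at `path` is created and removed again. */
int write_then_read_via_mmap(const char *path)
{
    unsigned char buf[CHECK_LEN];
    int result = -1;

    assert(path != NULL);

    int map_fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (map_fd < 0)
        return -1;
    int wr_fd = open(path, O_RDWR | O_SYNC);
    if (wr_fd < 0) {
        close(map_fd);
        unlink(path);
        return -1;
    }

    if (ftruncate(map_fd, CHECK_LEN) == 0) {
        unsigned char *map = mmap(NULL, CHECK_LEN, PROT_READ, MAP_SHARED,
                                  map_fd, 0);
        if (map != MAP_FAILED) {
            result = 1;
            for (int round = 1; round <= 8 && result == 1; round++) {
                memset(buf, round, sizeof buf);
                if (pwrite(wr_fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf)
                    result = -1;   /* short write or error */
                else if (memcmp(map, buf, sizeof buf) != 0)
                    result = 0;    /* mapping did not reflect the write */
            }
            munmap(map, CHECK_LEN);
        }
    }

    close(wr_fd);
    close(map_fd);
    unlink(path);
    return result;
}
```

A return value of 0 on the target would suggest that writes and the mapping disagree there. A return value of 1 does not prove the opposite, since a short loop like this may simply not hit the problematic case.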

> Both of the targets (ASUS RT-N66u running Tomato, TP-Link TL-WDR4300
> running OpenWRT trunk) I tried have 4k page size, so nothing out of the
> ordinary there.
> 

i tried it on a wrt160nl with old openwrt image
(Atheros AR9130 cpu)



* Re: LMDB test failures under musl on mips
  2014-02-14 10:26     ` Szabolcs Nagy
@ 2014-02-14 15:53       ` Martin Lucina
  0 siblings, 0 replies; 7+ messages in thread
From: Martin Lucina @ 2014-02-14 15:53 UTC
  To: musl

nsz@port70.net said:
> no, i think strace does not help here
> (at least i didn't see anything obvious)

Same here.

> i don't quite understand the nondeterministic behaviour

Me neither :/ Something to do with magic memory accesses, most likely. I've
gotten in touch with the author via the openldap lists and we are looking
into it.

I built a toolchain for mips-unknown-linux-gnu, using GCC 4.8.1
(crosstool-NG 1.19.0) and configured with eglibc 2.17. mtest still fails
with the same symptoms - so it looks like we can rule out musl as a cause
for the time being.

Will follow up once I have more information.

Thanks for your help,

Martin

