From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: dlsym returning unresolved symbol address instead of dependency library symbol address
Date: Sat, 10 Aug 2019 12:42:52 -0400
Message-ID: <20190810164252.GL9017@brightrain.aerifal.cx>
References: <20190810101111.GH22009@port70.net>
In-Reply-To: <20190810101111.GH22009@port70.net>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Cc: Luiz Angelo Daros de Luca

On Sat, Aug 10, 2019 at 12:11:11PM +0200, Szabolcs Nagy wrote:
> * Luiz Angelo Daros de Luca [2019-08-10 05:16:19 -0300]:
> > I'm ruby maintainer in OpenWrt 18.06 (musl 1.1.19). I got a bug report (
> > https://github.com/openwrt/packages/issues/9297) related to musl in mipsel
> > 32bit.
> >
> > When ruby loads a module (.so), it checks if that module was built for the
> > same ruby that is loading it. Ruby loads libruby at startup, which exports
> > ruby_xmalloc sym. So, the check consists on loading the module, searching
> > for ruby_xmalloc in the module context and comparing with global
> > ruby_xmalloc address. If they do not match, the module is using a different
> > libruby. Something like this:
> >
> > handle = (void*)dlopen(file, RTLD_LAZY|RTLD_GLOBAL)
> > void *ex = dlsym(handle, EXTERNAL_PREFIX"ruby_xmalloc");
> > if (ex && ex != ruby_xmalloc) {
> >   // module is incompatible!
> > }
> >
> > The first time a module is loaded, it simply works as expected.
> > I debugged and musl is working nicely. At do_dlsym(struct dso *p, const
> > char *s, void *ra), it correctly fails to find the symbol with:
> >
> > sym = sysv_lookup(s, h, p)
> >
> > and correctly find it with:
> >
> > sysv_lookup(s, h, p->deps[0])
> >
> > Now, when the second module is loaded, it find "ruby_xmalloc" already with:
> >
> > sym = sysv_lookup(s, h, p)
> >
> > However, sym now points to the address of the undefined symbol in the
> > second library (sym->st_shndx is NULL) instead of searching for it in
> > dependencies. It seems that do_dlsym() only checks for undefined symbol
> > (sym->shndx==NULL) when DL_FDPIC is 1 and DL_FDPIC is 0 in my case.
> >
> > Does it make any sense to return an undefined symbol from dlsym()?
> > Or does it make sense to return an undefined symbol from sysv_lookup()?
> > Or is there any other arch specific issue that happened before, when
> > library was loaded?
>
> yes, if the search involves the main executable then
> st_shndx==0 && st_value!=0 symbols must be included
> because it's a plt in the exe and that's how function
> addresses work.. on most targets except mips.
>
> undef syms have st_value==0 in shared libs, maybe
> not in mips?
> can you post the readelf -aW output of
> the module that has st_shndx==0 && st_value!=0 entry
> in its dynamic symbol table
>
> i think this was going to be fixed by
> https://www.openwall.com/lists/musl/2017/02/16/1/2
> but that was never applied.

I brought it up a few times after that, asking what should be done
since it no longer cleanly applies. The concept of that patch is
probably still right but a localized fix now followed by deduplication
later is probably preferable. Do you know if the TLS and STB_LOCAL
issues described there still exist too?

> > I created a simple patch that skips a symbol if it is undefined.
> > https://raw.githubusercontent.com/luizluca/openwrt/b9674d528513c7c93205fa000fed7c0d3c6bb2e7/toolchain/musl/patches/020-dlsym_donot_return_address_from_undef_sym.patch

This patch is wrong (on non-MIPS and on MIPS with PLT); it will result
in wrong values for dlsym of a

> i think the find_sym logic should be copied
> because mips behaves differently from other targets:
>
> http://git.musl-libc.org/cgit/musl/commit/?id=2d8cc92a7cb4a3256ed07d86843388ffd8a882b1

Yes. Conceptually, compared to find_sym, need_def is always false for
dlsym (dlsym must return PLT thunk and copy relocation definitions),
and STT_TLS was already checked as a special case above to look up the
thread-local copy of the object, so the only additional check needed
here is !ARCH_SYM_REJECT_UND(sym). Does that sound correct to you?

Rich
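
For illustration only, the acceptance rule discussed in this thread can
be modelled in a few lines of C. This is a sketch under stated
assumptions, not musl's code: struct model_sym, model_reject_und() and
dlsym_accepts() are hypothetical names, the 0x8 marker bit is assumed to
correspond to STO_MIPS_PLT from common elf.h headers, and the STT_TLS
case is left out because it is handled separately before this point.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for an ELF dynamic symbol entry. */
struct model_sym {
    uint16_t st_shndx;  /* 0 means the symbol is undefined in this object */
    uint32_t st_value;  /* nonzero on an undefined symbol can denote a PLT thunk */
    uint8_t  st_other;  /* arch-specific flag bits */
};

/* Assumption: bit marking a MIPS PLT entry (STO_MIPS_PLT in common elf.h). */
#define MODEL_STO_MIPS_PLT 0x8

/* Model of a per-arch "reject undefined symbol" hook:
 * on MIPS, reject undefined symbols unless they carry the PLT marker;
 * on other targets, never reject on this basis. */
static int model_reject_und(const struct model_sym *s, int is_mips)
{
    return is_mips && !(s->st_other & MODEL_STO_MIPS_PLT);
}

/* Returns 1 if a dlsym-style lookup should accept this entry, 0 to keep
 * searching (for example in the object's dependencies). */
static int dlsym_accepts(const struct model_sym *s, int is_mips)
{
    /* Undefined symbol that the arch says is not a usable PLT thunk. */
    if (!s->st_shndx && model_reject_und(s, is_mips))
        return 0;
    /* No address at all: nothing to return (TLS is out of scope here). */
    if (!s->st_value)
        return 0;
    return 1;
}
```

In this model, an undefined entry with a nonzero st_value is accepted on
non-MIPS targets (the PLT-in-executable case) but skipped on MIPS unless
it carries the PLT marker, which is the behaviour difference that
skipping every undefined symbol would lose.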