From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED autolearn=ham autolearn_force=no version=3.4.4 Received: from second.openwall.net (second.openwall.net [193.110.157.125]) by inbox.vuxu.org (Postfix) with SMTP id 3DA0A29C25 for ; Thu, 26 Sep 2024 21:36:20 +0200 (CEST) Received: (qmail 22061 invoked by uid 550); 26 Sep 2024 19:36:14 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com x-ms-reactions: disallow Received: (qmail 22012 invoked from network); 26 Sep 2024 19:36:13 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; t=1727379365; x=1727984165; darn=lists.openwall.com; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=p6+TGpH/+vZYaoUx/eCSlnXPd7SxbgI/4MhaiQTy1Q8=; b=flaUg61We0BF4h6F9d4izMk0/PNqGEG1wL5AorGs7qV4svjwS7DZ18fuqID9dbSQcl mTJd1i9YCNh36+p6vLnXAERvD4Zo+Au8Kwc9N3JHHP0ilCCDSEcpguUnB8syP7PsK6yD EwuwDf5ywW9Y9fSjKI3/mY4DvfKLGQXvkksWg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727379365; x=1727984165; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=p6+TGpH/+vZYaoUx/eCSlnXPd7SxbgI/4MhaiQTy1Q8=; b=ts1UJzhUH8zIap6McihDOx4HCT/etTwhYy3qA7eZ4G/5DsYoKQYO1f3MIpic2DZLJv tYBSBe+qSWV2eiPfvgluh0pEutVf/i9zjDe22AUQUnM1WDTLe12qZ8lqpYC+NuYWuzCS nzGyLjgU6ilYD1bxNBXU7r83G7aqKRAdMvJa8XW5wGC0jcq26tfXTM7lryScSEWvSOjd HoPG8nTsmDpE3H+bBl1Sasr/Xao5ym0nC9rew64XLgy/N/mQk3ZCxZ4J5qUPLLhAJcdp 99hg7rEhdO7+u2VUfQo1lU434gwOhblxBmEmhTABGZm2tSrdI3xBsLullZIbWwurdInJ Hpnw== X-Forwarded-Encrypted: i=1; AJvYcCWZcIEyq+0/UB43hO8UvVObqHFeBZ/kqeZA/RREurJnBiw9PZOLn3pEGiladgpJdlWu/FUY@lists.openwall.com X-Gm-Message-State: AOJu0YwQc9KFDxGs/vHbisGuOcPlyAD98jF15S+kMP91KU/9TxAlIUd3 lZ6EoIIR9gqbO2XsrYHIIiuw4cUEmy5mBce+6gJLnHFARiAS9gMbpj2Hqx5drVjyoNFJC8jqH9x n9fsBkX0u2wHXfxoWQKY/4JMg5PRlhfiHMEc9 X-Google-Smtp-Source: AGHT+IHc2BIoXd64Xh6//UJcC062AAA+NZ+qetjhBD0XdBVWtotuQhqBw1mgpC9EykcWq3exH5Qvg/B4u4BTBuRZ5UM= X-Received: by 2002:a17:90b:190b:b0:2da:5347:b0fa with SMTP id 98e67ed59e1d1-2e0b8b177b4mr817698a91.18.1727379364825; Thu, 26 Sep 2024 12:36:04 -0700 (PDT) MIME-Version: 1.0 References: <20240705211232.2718532-1-mmayer@broadcom.com> <20240705211232.2718532-2-mmayer@broadcom.com> <20240723210849.GU3766212@port70.net> In-Reply-To: <20240723210849.GU3766212@port70.net> From: Markus Mayer Date: Thu, 26 Sep 2024 12:35:52 -0700 Message-ID: To: Markus Mayer , Musl Mailing List , Colin Cross , Colin Cross Content-Type: text/plain; charset="UTF-8" Subject: Re: [musl] [PATCH 1/1] Ignore incorrect elf architecture libraries On Tue, 23 Jul 2024 at 14:08, Szabolcs Nagy wrote: Thanks for reviewing the proposed changes. It's been a little while since the submission. I found some time to work on this again, and I was able to incorporate most of the proposed changes. Everything is still working as intended. Also, Colin reached out to me following my re-submission of his proposal. He told me he wasn't going to have time to work on it for the foreseeable future and that I was welcome to continue the work and to take over authorship in that case. Here's a link to the current state of affairs.[1] I am not posting the new patch yet and just posting a link, because work isn't quite finished. Also, this makes it easier to continue the already existing thread. I am providing my thoughts and responses below. > > Modify the loader to continue searching when a file is found that is > > a valid elf file but for the an elf machine or class that differs from > > that of the loader itself. > > fwiw i think the proposed feature is reasonable, i added review > comments that may help inclusion into musl. > > i'd note that skipping wrong e_machine and e_ident[EI_CLASS] is > required by the elf spec: > [...] > > if we do this change i think e_ident[EI_DATA] should be checked > too for le/be and e_flags for abi variants like arm hf. I agree with regards to EI_DATA. It makes a lot of sense checking that as well while we are at it, and it is now already implemented in the current revision of this proposed change. Checking the e_flags also sounds very sensible, especially in the context of ARM. However, I am not sure if it is feasible. The specification says that e_flags are architecture-specific, so interpreting and checking the flags would become a per-architecture endeavour. That would mean the loader has to know a lot about the different architectures and then decide at runtime which check to run. Ideally, we would probably need conditional compilation to only build in the checks for the architecture in question. Otherwise the code gets bloated with routines that will never be called. It sounds rather involved and a little messy. Also, there doesn't seem to be a precedent for architecture-specific code in the loader. To get an idea, see [2] for what readelf is getting up to in order to parse the ELF flags for ARM. (Spoiler alert, it's over 200 lines of code.) That being said, if there is a non-messy way to implement checks for ARM soft- and hard-float, I am all for it. > e_type, e_version, EI_OSABI and EI_ABIVERSION are unlikely to be > useful (the latter may be used in the future to bump musl abi) > i don't have a strong opinion about these. Too many checks might also lead to false positives, i.e. binaries being rejected that would have run just fine. All it would take is for some obscure ELF header fields to be set improperly at build time and nobody notices, because the fields are ignored by almost everyone. Alternatively, our checks could end up being too strict even when the fields are all set correctly. > https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#shobj_dependencies > > > --- > > ldso/dynlink.c | 77 +++++++++++++++++++++++++++++++++++++++----------- > > 1 file changed, 61 insertions(+), 16 deletions(-) > > > > diff --git a/ldso/dynlink.c b/ldso/dynlink.c > > index 324aa85919f0..2f1ac7e9d089 100644 > > --- a/ldso/dynlink.c > > +++ b/ldso/dynlink.c > > @@ -68,6 +68,8 @@ struct dso { > > size_t *dynv; > > struct dso *next, *prev; > > > > + int elfmachine; > > + int elfclass; > > i don't think we need this in every dso. i'd make them global > > e.g. ldso_ehdr (or just the ldso_e_machine and ldso_e_class) Yes, I did that. At first I created individual, global variables, but ultimately I decided to encapsulate them in a struct to make the code a little cleaner and introduce only one global variable rather than four. The struct now looks a bit like it is duplicating Ehdr, but there are far fewer fields inside. > > Phdr *phdr; > > int phnum; > > size_t phentsize; > > @@ -684,6 +686,19 @@ static void unmap_library(struct dso *dso) > > } > > } > > > > +static int verify_elf_magic(const Ehdr* eh) { > > + return eh->e_ident[0] == ELFMAG0 && > > + eh->e_ident[1] == ELFMAG1 && > > + eh->e_ident[2] == ELFMAG2 && > > + eh->e_ident[3] == ELFMAG3; > > +} > > + > > +/* Verifies that an elf header's machine and class match the loader */ > > +static int verify_elf_arch(const Ehdr* eh) { > > + return eh->e_machine == ldso.elfmachine && > > + eh->e_ident[EI_CLASS] == ldso.elfclass; > > +} > > i'd do more checks and drop the comment. Done. > > + > > static void *map_library(int fd, struct dso *dso) > > { > > Ehdr buf[(896+sizeof(Ehdr))/sizeof(Ehdr)]; > > @@ -706,6 +721,10 @@ static void *map_library(int fd, struct dso *dso) > > if (l<0) return 0; > > if (le_type != ET_DYN && eh->e_type != ET_EXEC)) > > goto noexec; > > + if (!verify_elf_magic(eh)) goto noexec; > > + if (!verify_elf_arch(eh)) goto noexec; > > ok. > > > + dso->elfmachine = eh->e_machine; > > + dso->elfclass = eh->e_ident[EI_CLASS]; > > i don't think this is needed. Indeed. Removed. > > phsize = eh->e_phentsize * eh->e_phnum; > > if (phsize > sizeof buf - sizeof *eh) { > > allocated_buf = malloc(phsize); > > @@ -870,29 +889,53 @@ error: > > return 0; > > } > > > > -static int path_open(const char *name, const char *s, char *buf, size_t buf_size) > > +static int path_open_library(const char *name, const char *s, char *buf, size_t buf_size) > > no need to rename. Done. > > { > > size_t l; > > int fd; > > + const char *p; > > for (;;) { > > s += strspn(s, ":\n"); > > + p = s; > > l = strcspn(s, ":\n"); > > i would keep the orig logic and add new stuff to > > if ((fd = ...) >= 0) { > // here > } else switch (errno) { > // this is orig logic > } > > then the diff is easier to review. Done. However, there's an indentation change nonetheless, so the diff still looks quite large. The unchanged "else" branch shows up in the diff because of whitespace. Using "git diff -w" for this section helps a bit. > > if (l-1 >= INT_MAX) return -1; > > - if (snprintf(buf, buf_size, "%.*s/%s", (int)l, s, name) < buf_size) { > > - if ((fd = open(buf, O_RDONLY|O_CLOEXEC))>=0) return fd; > > - switch (errno) { > > - case ENOENT: > > - case ENOTDIR: > > - case EACCES: > > - case ENAMETOOLONG: > > - break; > > - default: > > - /* Any negative value but -1 will inhibit > > - * futher path search. */ > > + s += l; > > + if (snprintf(buf, buf_size, "%.*s/%s", (int)l, p, name) < buf_size) { > > + fd = open(buf, O_RDONLY|O_CLOEXEC); > > + if (fd < 0) { > > + switch (errno) { > > + case ENOENT: > > + case ENOTDIR: > > + case EACCES: > > + case ENAMETOOLONG: > > + /* Keep searching in path list. */ > > + continue; > > + default: > > + /* Any negative value but -1 will > > + * inhibit further path search in > > + * load_library. */ > > + return -2; > > + } > > + } > > + Ehdr eh; > > + ssize_t n = pread(fd, &eh, sizeof(eh), 0); > > i'd use read instead of pread I tried. It doesn't work. The reason is that read() advances the file pointer whereas pread() does not. In other words, pread() doesn't affect subsequent read() calls, whereas using read() here throws off a subsequent read that was expecting to read the beginning of the file. I didn't spend the time to narrow down where exactly it is tripping up when using read() here. However, I tried read() followed by lseek() to reset the file pointer to the beginning to verify my theory. Doing this works, and therefore proves that the advanced file pointer is indeed the problem. To summarize: - using pread() works - using read() fails - using read() followed by lseek(fd, 0, SEEK_SET) works, but adds an extra syscall As a result, the code is continuing to use pread(). > this adds an extra syscall per dso that uses path lookup. > this is the main cost of the patch, so i'd note this in > the commit msg. Done. For now, at least. I still need to address Rich's feedback about the caller-provided buffer. > > + /* If the elf file is invalid return -2 to inhibit > > + * further path search in load_library. */ > > + if (n < 0 || > > + n != sizeof eh || > > + !verify_elf_magic(&eh)) { > > + close(fd); > > return -2; > > } > > ok. > > > + /* If the elf file has a valid header but is for the > > + * wrong architecture ignore it and keep searching the > > + * path list. */ > > + if (!verify_elf_arch(&eh)) { > > + close(fd); > > + continue; > > + } > > + return fd; > > > that's a weird way to end a loop. i'd do > > if (verify_elf_arch(&eh)) > return fd; > close(fd); Done. Regards, -Markus [1] https://github.com/mmayer/musl-libc/compare/master...ld_so-squashed [2] https://sourceware.org/git?p=binutils-gdb.git;a=blob;f=binutils/readelf.c;h=0f8dc1b9716ed5c0ba13ececfc012ed59f8ba270;hb=HEAD#l3511