mailing list of musl libc
* [musl] unlink on NFS volume fails silently
@ 2025-07-06  6:25 Stephen Von Takach
  2025-07-07 18:26 ` Markus Wichmann
  2025-07-08  1:08 ` Rich Felker
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen Von Takach @ 2025-07-06  6:25 UTC (permalink / raw)
  To: musl; +Cc: Viv Briffa


Hi,

We recently had to move a service from an Alpine Linux build to a Debian
build because we were getting silent failures when deleting a directory
containing many files on an NFS volume. Specifically, this call to unlink
did not raise an error when the file failed to delete:
https://github.com/crystal-lang/crystal/blob/master/src/crystal/system/unix/file.cr#L129

We replicated the issue in an Alpine container with rm -rf
/nfs_mount/git_repo_to_delete. It also failed to delete all the files,
although it did raise an error. I assume it checked that the file was
removed before continuing, but I'm not entirely sure.

Both of these operations succeed with glibc on Debian.
It looks a bit like this issue:
https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960


Stephen von Takach Dukai

Engineering Lead

PlaceOS

Australia, Hong Kong, London, New York

p: +61 408 419 954

e: steve@place.technology


* Re: unlink on NFS volume fails silently
  2025-07-06  6:25 [musl] unlink on NFS volume fails silently Stephen Von Takach
@ 2025-07-07 18:26 ` Markus Wichmann
  2025-07-08  1:08 ` Rich Felker
  1 sibling, 0 replies; 12+ messages in thread
From: Markus Wichmann @ 2025-07-07 18:26 UTC (permalink / raw)
  To: musl; +Cc: Viv Briffa, Stephen Von Takach

Am Sun, Jul 06, 2025 at 04:25:31PM +1000 schrieb Stephen Von Takach:
> Hi,
> 
> We recently had to move a service from being built on alpine linux to
> debian linux as we were getting silent failures when deleting a directory
> with many files on an NFS volume. Basically this call to unlink was not
> raising an error if the file failed to delete
> https://github.com/crystal-lang/crystal/blob/master/src/crystal/system/unix/file.cr#L129
> 
> We replicated the issue in an alpine container with rm -rf
> /nfs_mount/git_repo_to_delete and it also failed to successfully delete all
> the files, it did raise an error though (I assume it checked the file was
> removed before continuing) not entirely sure.
> 
> Both these operations succeed with glibc when using debian.
> Looks a bit like this issue:
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960
> 

Neither unlink() nor readdir() is appreciably different between musl
and glibc, so I don't know what is going on. Could you strace the test
cases to see where the differences are?

Ciao,
Markus



* Re: unlink on NFS volume fails silently
  2025-07-06  6:25 [musl] unlink on NFS volume fails silently Stephen Von Takach
  2025-07-07 18:26 ` Markus Wichmann
@ 2025-07-08  1:08 ` Rich Felker
  2025-07-09  3:59   ` [musl] " Stephen Von Takach
  1 sibling, 1 reply; 12+ messages in thread
From: Rich Felker @ 2025-07-08  1:08 UTC (permalink / raw)
  To: Stephen Von Takach; +Cc: musl, Viv Briffa

On Sun, Jul 06, 2025 at 04:25:31PM +1000, Stephen Von Takach wrote:
> Hi,
> 
> We recently had to move a service from being built on alpine linux to
> debian linux as we were getting silent failures when deleting a directory
> with many files on an NFS volume. Basically this call to unlink was not
> raising an error if the file failed to delete
> https://github.com/crystal-lang/crystal/blob/master/src/crystal/system/unix/file.cr#L129
> 
> We replicated the issue in an alpine container with rm -rf
> /nfs_mount/git_repo_to_delete and it also failed to successfully delete all
> the files, it did raise an error though (I assume it checked the file was
> removed before continuing) not entirely sure.
> 
> Both these operations succeed with glibc when using debian.
> Looks a bit like this issue:
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960

Assuming you actually traced and saw the unlink syscall succeed, the
root cause here is the filesystem/kernel lying about that. Standard
"NFS Considered Harmful" stuff.

But the fact that you're seeing different behavior on Alpine is almost
surely a matter of busybox rm vs GNU coreutils differences in how they
behave under faulty kernel behavior. The easy solution is probably
installing the coreutils package. Otherwise, investigate what busybox
is doing differently and if there's a way it could be made more
reliable in this situation.
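
One way to tell "unlink reported success but the name is still there"
apart from "the name was never visited" is to check the name right after
the call. A minimal sketch in Python, not the Crystal code from the
original report; the helper name unlink_and_verify is made up for
illustration:

```python
import os
import tempfile


def unlink_and_verify(path):
    """Unlink path, then confirm the name is really gone.

    Returns True if the name no longer exists after a successful unlink,
    False if unlink reported success but the name is still visible.
    Raises OSError if unlink itself fails.
    """
    os.unlink(path)  # raises OSError (e.g. ENOENT, EACCES) on failure
    # lexists() uses lstat(), so a dangling symlink still counts as present.
    return not os.path.lexists(path)


if __name__ == "__main__":
    # Local demonstration only; point this at a file on the NFS mount to
    # check the behaviour described in this thread.
    with tempfile.TemporaryDirectory() as d:
        victim = os.path.join(d, "victim")
        open(victim, "w").close()
        print(unlink_and_verify(victim))
```

On NFS the follow-up lstat may itself be answered from client-side
caches, so a result of this kind is a diagnostic hint rather than proof.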

Rich




* Re: [musl] unlink on NFS volume fails silently
  2025-07-08  1:08 ` Rich Felker
@ 2025-07-09  3:59   ` Stephen Von Takach
  2025-07-09 18:41     ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Von Takach @ 2025-07-09  3:59 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl, Viv Briffa


Yes, we traced this.
The libc unlink function on musl returned 0 for a filename on an NFS mount
that wasn't deleted.
https://www.gnu.org/software/libc/manual/html_node/Deleting-Files.html

The same call to unlink on glibc returned 0 and actually removed the file.

The issue occurs when a high volume of files is being removed.



* Re: unlink on NFS volume fails silently
  2025-07-09  3:59   ` [musl] " Stephen Von Takach
@ 2025-07-09 18:41     ` Rich Felker
  2025-07-09 23:01       ` [musl] " Stephen Von Takach
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2025-07-09 18:41 UTC (permalink / raw)
  To: Stephen Von Takach; +Cc: musl, Viv Briffa

On Wed, Jul 09, 2025 at 01:59:18PM +1000, Stephen Von Takach wrote:
> Yes we traced this.
> The libc unlink function on musl returned 0 for a filename on an NFS mount
> that wasn't deleted.
> https://www.gnu.org/software/libc/manual/html_node/Deleting-Files.html
> 
> The same call to unlink on glibc returned 0 and actually removed the file.

They are not doing anything different. If you're getting a different
result, it's something else different about the systems, possibly as
random as timing differences between the program using musl and the
one using glibc. But I would also check for different kernel versions
or configurations, mount setups, etc.

> The issue occurs when there is a high volume of files being removed

Sounds like a timing-dependent bug in the NFS implementation.

Rich



* Re: [musl] unlink on NFS volume fails silently
  2025-07-09 18:41     ` Rich Felker
@ 2025-07-09 23:01       ` Stephen Von Takach
  2025-07-10  0:03         ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Von Takach @ 2025-07-09 23:01 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl, Viv Briffa


We're using Docker containers running on the same kernel with the same
mount setup.
It works on Debian and does not work on Alpine.

The difference is at the libc interface.



* Re: unlink on NFS volume fails silently
  2025-07-09 23:01       ` [musl] " Stephen Von Takach
@ 2025-07-10  0:03         ` Rich Felker
  2025-07-10  4:58           ` Stephen Von Takach
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2025-07-10  0:03 UTC (permalink / raw)
  To: Stephen Von Takach; +Cc: musl, Viv Briffa

On Thu, Jul 10, 2025 at 09:01:57AM +1000, Stephen Von Takach wrote:
> We're using docker containers running on the same kernel using the same
> mount setup.
> Works on debian, does not work on alpine.
> 
> The difference is at the libc interface.

Unless I'm missing something, musl and glibc are doing exactly the
same thing here. There is no userspace code for unlink; it's just a
syscall. You can compare the glibc code at:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/unlink.c;hb=glibc-2.41

with the musl code at:

https://git.musl-libc.org/cgit/musl/tree/src/unistd/unlink.c?id=v1.2.5

If you're seeing different behavior, something else is the cause. It's
almost surely what I said, a timing-dependent NFS bug.

Rich



* Re: unlink on NFS volume fails silently
  2025-07-10  0:03         ` Rich Felker
@ 2025-07-10  4:58           ` Stephen Von Takach
  2025-07-10 15:44             ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Von Takach @ 2025-07-10  4:58 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl, Viv Briffa


Yeah, I see your point, and this was closed as a kernel issue:
https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960

We're running these two containers on the same kernel and seeing the same
behaviour as that Alpine issue.
We're happy to continue working around the issue by using a Debian
userspace to build our service.

It does seem crazy that there is clearly an issue, possibly a kernel issue,
that is being handwaved away by all parties.



* Re: unlink on NFS volume fails silently
  2025-07-10  4:58           ` Stephen Von Takach
@ 2025-07-10 15:44             ` Rich Felker
  2025-07-10 17:01               ` Nathan McSween
  2025-07-10 21:25               ` Stephen Von Takach
  0 siblings, 2 replies; 12+ messages in thread
From: Rich Felker @ 2025-07-10 15:44 UTC (permalink / raw)
  To: Stephen Von Takach; +Cc: musl, Viv Briffa

On Thu, Jul 10, 2025 at 02:58:30PM +1000, Stephen Von Takach wrote:
> Yeah I see your point and this was closed as a kernel issue:
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960

OK, is your issue unlink falsely succeeding, or readdir skipping
entries? The latter is a known bug in the kernel NFS client. One of my
comments on the tracker suggests:

  "The nordirplus option mentioned in one of those tracker threads
  might be a workaround."

I'm not sure if this is the case, but it might be worth trying.
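
Before trying that, it helps to know which NFS mounts currently lack the
option. A small sketch in Python that scans text in the /proc/mounts
format; the function name is hypothetical, and it assumes the kernel
lists nordirplus among the displayed mount options when it is set:

```python
def nfs_mounts_without_nordirplus(mounts_text):
    """Return mount points of NFS mounts whose options lack 'nordirplus'.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4") and "nordirplus" not in options.split(","):
            result.append(mount_point)
    return result
```

On a Linux host this could be fed the contents of /proc/mounts. The mount
points it returns are the candidates for remounting with nordirplus.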

Note that it's *expected* that an already-in-progress iteration of a
directory may return entries that were already deleted. The
unacceptable thing is the opposite: when it skips some entries that
have not been deleted as a consequence of other things being deleted.

> We're running these two containers on the same kernel and seeing the same
> behaviour as that alpine issue.
> Happy to continue working around the issue by using debian userspace to
> build our service.
> 
> It does seems crazy that there is clearly an issue, possibly a kernel issue
> that is being handwaved away by all parties

It's not "handwaved away" by us. We have determined that there is a
bug in a component we have no control over, and for which we have no
sound means of working around.

I'm happy to work together on tracking down the cause to get it fixed,
but that requires cooperation from someone who's able to reproduce it,
documenting the exact circumstances under which it occurs (NFS server
vendor/version, NFS mount options) and either producing a minimal test
program to reproduce the issue under those conditions, or being
willing to run a proposed test by someone else.
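
A minimal test program of that kind could look like the Python sketch
below. It is an illustration only: the function name, the file naming
scheme and the default file count are arbitrary choices, the directory
is assumed to start out empty, and it has not been run against an
affected NFS mount.

```python
import os
import tempfile


def count_skipped_entries(directory, n_files=5000):
    """Create n_files files, delete them while iterating, count leftovers.

    Returns the number of entries still present afterwards, i.e. files the
    in-progress iteration never returned. The directory must be empty on
    entry. On a conforming filesystem the expected result is 0.
    """
    for i in range(n_files):
        open(os.path.join(directory, f"file-{i:06d}"), "w").close()

    # Delete each entry as the iteration returns it, like a naive rm -r.
    with os.scandir(directory) as it:
        for entry in it:
            try:
                os.unlink(entry.path)
            except FileNotFoundError:
                pass  # stale entry for an already-deleted file; permitted

    return len(os.listdir(directory))


if __name__ == "__main__":
    # Local run; replace the temporary directory with an empty directory
    # on the NFS mount to test the reported scenario.
    with tempfile.TemporaryDirectory() as d:
        print(count_skipped_entries(d, n_files=1000))
```

A nonzero result on the NFS mount, together with the server
vendor/version and the mount options, would be the kind of report asked
for above.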

Even if using Debian/glibc *seems* to make things work for you, I
think it would be beneficial for you to try to get to the root cause
of the problem and get it fixed. What we previously found on the
above-linked ticket was that glibc is not doing anything special that
should rule out that bug, only that the particular filename
sizes/counts in the test didn't trigger the bug with glibc.

Again, I don't know if this is the same bug you're hitting (this is
the first time in the thread you've mentioned readdir if I'm not
mistaken, as opposed to just unlink) or if there's a second bug in
play here. If you could at least clarify that, it would be a big help
to anyone investigating it in the future.

Rich



* Re: unlink on NFS volume fails silently
  2025-07-10 15:44             ` Rich Felker
@ 2025-07-10 17:01               ` Nathan McSween
  2025-07-10 17:11                 ` Rich Felker
  2025-07-10 21:25               ` Stephen Von Takach
  1 sibling, 1 reply; 12+ messages in thread
From: Nathan McSween @ 2025-07-10 17:01 UTC (permalink / raw)
  To: musl


See https://github.com/Azure/AKS/issues/1325#issuecomment-713372369. Does
the behavior happen with coreutils?


* Re: unlink on NFS volume fails silently
  2025-07-10 17:01               ` Nathan McSween
@ 2025-07-10 17:11                 ` Rich Felker
  0 siblings, 0 replies; 12+ messages in thread
From: Rich Felker @ 2025-07-10 17:11 UTC (permalink / raw)
  To: Nathan McSween; +Cc: musl

On Thu, Jul 10, 2025 at 10:01:50AM -0700, Nathan McSween wrote:
> https://github.com/Azure/AKS/issues/1325#issuecomment-713372369, does the
> behavior happen with coreutils?

That thread looks like it has a lot of misinformation. In particular,
the comment you linked makes an erroneous claim:

    "If a file is removed from or added to the directory after the
    most recent call to opendir() or rewinddir(), whether a subsequent
    call to readdir() returns an entry for that file is unspecified.

    So the different filesystems are left free to choose their own
    behaviour when this happens. cifs.ko (the Linux SMB client) makes

The first part above is true. Implementations are free to choose
whether to show stale (already deleted) entries or do the extra work
to suppress them. However...

    sure that it's not returning stale data, at the cost of missing
    some entries for this particular use case"

...that does NOT give them license to break conformance by "missing
some entries". If your mitigation for showing stale file entries
involves failure to show some other non-stale ones, it's broken, and
needs to be removed.
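
On the application side, the two cases can be treated differently: a
stale entry is harmless and can be ignored, while a missed entry can only
be caught by listing the directory again. The Python sketch below does
both. The function name and the max_passes limit are invented for
illustration, and whether this sidesteps the NFS client bug is an
untested assumption.

```python
import os
import tempfile


def remove_tree_rescanning(root, max_passes=100):
    """Remove a directory tree, re-listing each directory until it is empty.

    Each pass takes a full snapshot of the directory before deleting
    anything, so no directory stream is open while entries are removed.
    If a later listing still shows entries, another pass picks them up.
    max_passes is an arbitrary safety limit.
    """
    for _ in range(max_passes):
        names = os.listdir(root)  # complete snapshot; stream closed on return
        if not names:
            os.rmdir(root)
            return
        for name in names:
            path = os.path.join(root, name)
            try:
                if os.path.isdir(path) and not os.path.islink(path):
                    remove_tree_rescanning(path, max_passes)
                else:
                    os.unlink(path)
            except FileNotFoundError:
                # Stale entry: already gone, which is allowed and harmless.
                pass
    raise OSError(f"{root!r} still not empty after {max_passes} passes")


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo, "sub"))
    open(os.path.join(demo, "sub", "file"), "w").close()
    remove_tree_rescanning(demo)
    print(os.path.exists(demo))
```

Errors other than a missing file, such as permission failures, still
propagate to the caller, so nothing fails silently.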

Rich



* Re: unlink on NFS volume fails silently
  2025-07-10 15:44             ` Rich Felker
  2025-07-10 17:01               ` Nathan McSween
@ 2025-07-10 21:25               ` Stephen Von Takach
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Von Takach @ 2025-07-10 21:25 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl, Viv Briffa


Given that the unlink code is exactly the same, it must be an issue with
readdir and not with unlink, as per
https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960
So the issue on musl is that we're not attempting to remove all the files.


