On Mon, Jun 13, 2022 at 11:41:57AM -0400, Waldek Kozaczuk wrote:
> Hi,
>
> Very recently we implemented minimal rtnetlink support on the OSv side,
> which allowed us to finally switch to the netlink-based implementation of
> getifaddrs() and if_nameindex().
>
> However, I noticed that the function __netlink_enumerate() in
> https://github.com/ifduyue/musl/blob/master/src/network/netlink.c uses
> the MSG_DONTWAIT flag when calling recv(), which may fail with EAGAIN or
> EWOULDBLOCK, and there is no error or retry handling for that case. I have
> actually seen both functions fail occasionally on OSv.
>
> One way to fix this is to add the missing error handling. Another, simpler
> solution is to stop using MSG_DONTWAIT altogether and force recv() to
> block. In other words, the line:
>
> r = recv(fd, u.buf, sizeof(u.buf), MSG_DONTWAIT);
>
> should change to:
>
> r = recv(fd, u.buf, sizeof(u.buf), 0);
>
> For the time being, we are applying a header trick on the OSv side to
> redefine MSG_DONTWAIT as 0 when compiling those specific musl sources.

Thanks! I'll try to track this down. One concern is that I'm not sure
how MSG_DONTWAIT is supposed to interact with "short reads" -- is it
needed (for netlink) to prevent blocking when some data has been read
but there is still buffer space for more?

On a related issue, I'm pretty sure the netlink API doesn't allow for
partial reads with some data remaining buffered on the kernel side,
but we should probably verify that too.
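
For reference, the "add the missing error handling" option might look
roughly like the sketch below. recv_retry() is a hypothetical helper, not
code from musl: it keeps MSG_DONTWAIT on the recv() call and, when that
reports EAGAIN or EWOULDBLOCK, waits in poll() and tries again. It has not
been tested against netlink and it does not settle the short-read question
above.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not part of musl: receive one message without
 * blocking inside recv() itself; if nothing is queued yet, wait in
 * poll() and retry instead of reporting EAGAIN to the caller. */
static ssize_t recv_retry(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t r = recv(fd, buf, len, MSG_DONTWAIT);
        if (r >= 0)
            return r;                 /* data, or 0 on orderly shutdown */
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                /* genuine error, errno is set */

        /* Nothing available yet: block until fd becomes readable. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
}
```

The call in question would then read r = recv_retry(fd, u.buf, sizeof(u.buf));
whether that is preferable to simply passing 0 as the flags depends on the
answer to the short-read question.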

Rich