public inbox for discuss@lists.illumos.org (since 2011-08)
* [discuss] Mellanox LACP aggr fails without snoop if using vlan tags
@ 2024-10-31 20:47 Josh Coombs
  2024-10-31 21:23 ` Alex Wilson via illumos-discuss
  0 siblings, 1 reply; 2+ messages in thread
From: Josh Coombs @ 2024-10-31 20:47 UTC (permalink / raw)
  To: illumos-discuss

I've got a brand new r151050e install using a Mellanox CNX5 card, dual 25Gb
ports paired up in an aggr to a Juniper EX4650 cluster. It will only work
if I start snoop on the aggr; without that, it won't pass traffic. I
ran into this back in 2019 with bnx devices after upgrading to r151030 and
was never able to find a fix. On that box I ended up switching to Intel NICs
to get around the problem.

It also works if I run snoop -P -d aggr0 (non-promiscuous capture), so it
may not be promiscuous mode itself that's 'fixing' things?

If I rebuild the setup without VLAN tagging, it appears to work as
expected, which is different from my prior bnx troubles. I never got
around to testing those with VLANs on OmniOS.

When it isn't working, I can see my aggr's MAC address upstream on the
switches. I haven't tried manually firing packets out an interface, but
based on last time I expect that will work. The issue seems to be on the
receive side in OmniOS: for some reason, starting a snoop suddenly allows
OmniOS to accept the packets fully.

I don't have separate production and lab setups. Sadly, my test box is in
an unmanned datacenter an hour away, so I'm limited in what I can do
at the moment. I've got a little bit of time before I need to lock the conf
for production use, but can only be onsite once or twice a week at best. I
can live without VLAN tagging, but if this is an unexpected failure I'd like
to gather what data I can to help before shutting the door on testing.

Joshua Coombs

------------------------------------------
illumos: illumos-discuss
Permalink: https://illumos.topicbox.com/groups/discuss/T608dab80e5db30f6-Mf17e7261eeb23be4da6bab83
Delivery options: https://illumos.topicbox.com/groups/discuss/subscription

* Re: [discuss] Mellanox LACP aggr fails without snoop if using vlan tags
  2024-10-31 20:47 [discuss] Mellanox LACP aggr fails without snoop if using vlan tags Josh Coombs
@ 2024-10-31 21:23 ` Alex Wilson via illumos-discuss
  0 siblings, 0 replies; 2+ messages in thread
From: Alex Wilson via illumos-discuss @ 2024-10-31 21:23 UTC (permalink / raw)
  To: discuss; +Cc: rm

Hi Josh,

On 1/11/24 06:47, Josh Coombs wrote:
> I've got a brand new r151050e install using a Mellanox CNX5 card, dual
> 25Gb ports paired up in an aggr to a Juniper EX4650 cluster. It will
> only work if I start snoop on the aggr; without that, it won't pass
> traffic. I ran into this back in 2019 with bnx devices after upgrading
> to r151030 and was never able to find a fix. On that box I ended up
> switching to Intel NICs to get around the problem.
> 
> It also works if I run snoop -P -d aggr0 (non-promiscuous capture), so
> it may not be promiscuous mode itself that's 'fixing' things?
> 

I've seen this bug as well. When I dtrace the calls into mlxcx, what I
see is that aggr never gives the driver any VLAN tag filters for the
default group (but it does give MAC filters), so no traffic other than
on the default tag ends up being received.

If you perturb the MAC state of the aggr enough, it will switch to
explicit VLAN tag filters and work fine (e.g. if you add a VNIC as well
as the VLAN interface, the existence of the VNIC will fix it, since that
causes MAC to add an explicit VLAN tag filter for the VLAN datalink).

I think this is a semantic bug -- I suspect MAC is assuming that if
it adds just MAC filters and no VLAN filters to a NIC, all tagged
traffic for that MAC should be matched, not just untagged traffic.
Unfortunately the documentation (mac_capab_rings.9e and mac.9e) is not
very clear on this point, and some drivers (I can definitely speak for
mlxcx, since I wrote most of it) have interpreted it differently.
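
To make the two readings concrete, here is a toy Python model of the
filter matching described above. It is only a sketch under stated
assumptions: RxGroup, accepts_permissive and accepts_strict are
hypothetical names, the MAC address is made up, and none of this mirrors
the actual MAC framework or mlxcx code. It also assumes that once explicit
VLAN filters exist, both readings match only the listed tags.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class RxGroup:
    """Toy receive group: the filters MAC has handed to the driver.

    Hypothetical structure for illustration, not a real kernel type.
    """
    mac_filters: Set[str] = field(default_factory=set)
    vlan_filters: Set[int] = field(default_factory=set)


def accepts_permissive(group: RxGroup, dst_mac: str,
                       vlan: Optional[int]) -> bool:
    """Reading MAC is suspected to assume: no VLAN filters = any tag matches."""
    if dst_mac not in group.mac_filters:
        return False
    if not group.vlan_filters:
        return True  # tagged and untagged frames both accepted
    return vlan in group.vlan_filters  # assumption: only listed tags match


def accepts_strict(group: RxGroup, dst_mac: str,
                   vlan: Optional[int]) -> bool:
    """Reading attributed to mlxcx: no VLAN filters = only untagged matches."""
    if dst_mac not in group.mac_filters:
        return False
    if not group.vlan_filters:
        return vlan is None  # vlan=None stands for an untagged frame
    return vlan in group.vlan_filters  # assumption: only listed tags match


if __name__ == "__main__":
    mac = "02:00:00:00:00:01"  # made-up address standing in for the aggr
    group = RxGroup(mac_filters={mac})
    # Only a MAC filter installed: the two readings disagree on VLAN 100.
    print(accepts_permissive(group, mac, 100), accepts_strict(group, mac, 100))
    # An explicit VLAN filter (as adding a VNIC is said to trigger) aligns them.
    group.vlan_filters.add(100)
    print(accepts_permissive(group, mac, 100), accepts_strict(group, mac, 100))
```

With only a MAC filter installed, a frame tagged with VLAN 100 is accepted
under the first reading and dropped under the second. Once an explicit
VLAN filter is present, both readings accept it, which matches the
observation that adding a VNIC makes the aggr work.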

For now I've just been always using VNICs, since those always generate
explicit filters and work fine. But we should get this fixed up, and
probably adjust the documentation to spell it out more clearly so no
other new drivers make the same mistake going forward.

------------------------------------------
illumos: illumos-discuss
Permalink: https://illumos.topicbox.com/groups/discuss/T608dab80e5db30f6-M7a4bf3d343ab2df1faf1eed8
Delivery options: https://illumos.topicbox.com/groups/discuss/subscription
