To support a bunch of ongoing performance work in Oxide's packet transformation stack, we initially considered using flows with custom callback functions. However, we found that there are a lot of places where flows in general fall short today. I've put together an IPD outlining where most of the negative interactions with performance optimisations occur, what conceptual similarities DLS bypass shares with flows, and how we can unify flow_tab and subflow_tab into a single tree to iron out those issues.
The main goal here is to make flows a more first-class part of the MAC datapath, such that subflows and flows can make identical use of polling, softrings, and DLS bypass. A secondary goal is to allow `dls`/`ip` to delegate traffic processing, early in the stack, to modules that offer to do so. This should also pave the way for dedicating hardware resources (Rx/Tx rings) to certain traffic classes.
I'd really appreciate any feedback.