This is only a minor comment, not addressing the main question, but I was surprised by Christoph's assumption that folding over a tree (repeatedly calling Set.add) would be more efficient than a divide-and-conquer merge approach (repeatedly calling Set.union), at least with a balanced-search-tree implementation of sets. I would rather have assumed that a divide-and-conquer merge may be faster, on the vague hunch that merging whole trees instead of single leaves (adding a key is like merging a leaf) allows reusing information between comparing the root of a tree and merging its subtrees.
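For concreteness, here is a minimal sketch of the two strategies I have in mind, over a toy leaf-labeled tree type and the standard library's Set; the names (tree, IntSet, collect_fold, collect_merge) are my own for illustration, not taken from Christoph's code:

    module IntSet = Set.Make (Int)

    type tree =
      | Leaf of int
      | Node of tree * tree

    (* Fold-based: thread a set accumulator through the tree,
       calling Set.add once per leaf. *)
    let collect_fold t =
      let rec go acc = function
        | Leaf k -> IntSet.add k acc
        | Node (l, r) -> go (go acc l) r
      in
      go IntSet.empty t

    (* Merge-based: build a set for each subtree, then combine the
       two sub-results with Set.union (divide-and-conquer). *)
    let rec collect_merge = function
      | Leaf k -> IntSet.singleton k
      | Node (l, r) -> IntSet.union (collect_merge l) (collect_merge r)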
I wrote a quick benchmark ( https://gitlab.com/gasche-snippets/tree-fold-merge-benchmark ) that collects the leaves of a binary tree into a set, and it turns out that which approach is faster depends on the key ordering. If the keys are completely random, the fold-based code is twice as fast. If the keys are mostly sorted, the merge-based code is twice as fast. The performance of the fold-based code seems independent of the key distribution; it is the merge-based code that goes from twice as slow to twice as fast depending on the scenario.
My proposed explanation is that merging two trees where all the keys of one are larger than almost all the keys of the other is fast (this is the case where little rebalancing is needed), so mostly-sorted keys hit a good case where the tree form indeed reuses comparison information. With random keys, on the contrary, merging subtrees is no faster than just calling add on every leaf.
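To make the two scenarios concrete, here is a hypothetical way to generate them (again my own naming, not the benchmark's): a tree whose leaves are already in key order, so that every union in the merge-based code joins two sets with disjoint key ranges (the cheap, little-rebalancing case), versus a tree with pseudo-random keys:

    (* Leaves carry the keys lo, lo+1, ..., hi-1 in left-to-right
       order, so sibling subtrees always have disjoint key ranges. *)
    let rec sorted_tree lo hi =
      if hi - lo <= 1 then Leaf lo
      else
        let mid = (lo + hi) / 2 in
        Node (sorted_tree lo mid, sorted_tree mid hi)

    (* Pseudo-random keys: no relation between sibling subtrees. *)
    let rec random_tree depth =
      if depth = 0 then Leaf (Random.bits ())
      else Node (random_tree (depth - 1), random_tree (depth - 1))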
(I would guess that, in terms of asymptotic complexity, tree union is always as good as repeated add or better, for all key distributions, but that the constant factors of the current implementation are higher because we manipulate slightly more complex data structures.)
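A back-of-the-envelope check of that guess, assuming the classical bound that the union of two balanced trees of sizes m <= n costs O(m * log(n/m + 1)) comparisons (I have not verified that the stdlib implementation achieves this bound): in the divide-and-conquer merge, a union of two sub-results of size s then costs O(s), there are n / (2s) such unions at each level of the recursion, so each of the log n levels costs O(n) and the total is O(n log n), matching n repeated adds. And if a union of sets with disjoint key ranges degenerates to an O(log n) join, the sorted scenario costs only O(n) in total. So under these assumptions the merge is indeed never asymptotically worse.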