From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.0 required=5.0 tests=T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 3874 invoked from network); 6 Mar 2022 18:40:35 -0000 Received: from 4ess.inri.net (216.126.196.42) by inbox.vuxu.org with ESMTPUTF8; 6 Mar 2022 18:40:35 -0000 Received: from mimir.eigenstate.org ([206.124.132.107]) by 4ess; Sun Mar 6 13:33:26 -0500 2022 Received: from abbatoir.myfiosgateway.com (pool-74-108-56-225.nycmny.fios.verizon.net [74.108.56.225]) by mimir.eigenstate.org (OpenSMTPD) with ESMTPSA id d110aebf (TLSv1.2:ECDHE-RSA-AES256-SHA:256:NO) for <9front@9front.org>; Sun, 6 Mar 2022 10:33:05 -0800 (PST) Message-ID: To: 9front@9front.org Date: Sun, 06 Mar 2022 13:33:03 -0500 From: ori@eigenstate.org In-Reply-To: <2XMT5L9VLBDA3.2ZSPGPD38FJJR@mforney.org> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit List-ID: <9front.9front.org> List-Help: X-Glyph: ➈ X-Bullshit: anonymous compliant ORM cache STM generator Subject: Re: [9front] git lca bug Reply-To: 9front@9front.org Precedence: bulk Quoth Michael Forney : > ori@eigenstate.org wrote: > > How do you define 'lowest' in this case? especially > > without walking the whole graph from the initial > > commit, which is very slow in large repos? > > Yeah, it's a bit tricky, but I think it's fair to say that if A is > reachable through parent pointers from B, then B is lower than A > (but the converse is not necessarily true). > > Interestingly, I found that when I do `B A @` instead of `A B @`, > it *does* end up walking the whole graph to the initial commit. > > What happens here is that the Drop painting finds I first, and then > when we get to I through Keep painting. I is recorded as the best, > but then the loop continues (`if(... || ncolor == Drop) continue;`). > Then we keep painting the rest of the graph as Drop, finally returning > I when we run out of nodes. > > > Not yet. Tweaking repainting to sum the weight when > > colors meet seems like it may fix it, but I haven't > > thought it through yet. > > I also haven't thought this through fully, but here are some thoughts > I had: > - I don't think any nodes traversed by repaint() should be considered > as candidates for the LCA. repaint() traverses starting at a > common ancestor, so any nodes we traverse will be higher > on the graph. yes, though > - Instead of repaint() painting nodes from Keep to Drop, maybe there > should be a third color (Skip?). It could take a starting point > of either color, repainting everything it finds as Skip, stopping > when it repaints Blank nodes. If this happens to find our previous > best common ancestor, then we use the repaint starting point as > the new best. We may need to resplit the code, if we do that-- it's also used for pushing and pulling, where the goal is somewhat different; it wants to find all nodes that are in one graph but not the other, so we can put them into a pack file. With the current code: In theory, we should always be returning the node on the boundary between Keep and Drop with the shortest number of steps from both parents. Regardless of keep/drop, we'll need to explore more or less the full boundary, and ideally not much more. I think that this current approach should be more or less correct, though possibly not 100% optimal -- there's just a bug in the stop or selection condition. > - Then, when paint() pops a node painted Skip, it just skips it and > continues in the loop, since the node was an ancestor of a common > ancestor. > - paint() should be symmetric in its arguments. It can continue > if e.color == ncolor, paint it if ncolor == Blank, and repaint > to Skip otherwise, potentially choosing a new common ancestor. > > I'll try those changes in a bit to see if that approach is promising. > I *think* it should at least guarantee that the node it returns is > not reachable from any other common ancestor of A and B. > > There are still a few edge cases like the following that I'm not > sure how it would handle, but I don't know how often those come up > in practice. > > A B > |\ /| > | \ | > |/ \| > C D > \ | > E | > \| > F > | > ... In this case, torvalds git will return a list of {c,d} as the set of ancestors. merging will be done recursively, creating a virtual commit that is merge(c,d) -- and if that creates a cross branch, then the merge will be repeated. for us, I'd be happy if we pick an arbitrary parent. Though, in this case, thinking about it, we can return *all* minimums in the reachability boundary if we want to do that. (I don't think we want to)