From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=0.2 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4 Received: (qmail 28326 invoked from network); 5 Mar 2022 19:45:34 -0000 Received: from 4ess.inri.net (216.126.196.42) by inbox.vuxu.org with ESMTPUTF8; 5 Mar 2022 19:45:34 -0000 Received: from mail-pl1-f181.google.com ([209.85.214.181]) by 4ess; Sat Mar 5 14:39:13 -0500 2022 Received: by mail-pl1-f181.google.com with SMTP id n2so1043558plf.4 for <9front@9front.org>; Sat, 05 Mar 2022 11:39:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mforney.org; s=google; h=message-id:date:from:to:subject:mime-version :content-transfer-encoding; bh=krLpgvVWhcD5myBrx5QzK2OzhVtF2ndKU9Bpm/lMJ4E=; b=dbPubST8RONHj3I+K0waVFXxRPQ7MGWTjM/FxPN25kFMpkY2DeGQrmhiLLiAA3WsbD uuWpe+VH9FLXRxFEZ19rBXqAoslxzWsDzj8ZHz9JAqbAMVS10wUcUL63LO6u16lPfm7e nbur1Q4E/YYVNrfLSW07f6D0vfSQiojRVjJApxKKo1XP660HraTQPq+NOedkG4ZK/goq IJlj3grS4gYkLqE4yX98IebAFgNQG2EcFWYcZEPlsmKidzKM2KJCSGhLoJMiQKmNaW96 dLYD1t2khuYOB15nJ8hBRLMoUBIwgfTWBAi/NFvCt+zlGlG2ihGLZpAXZ3VZvJ6ot7QD eLIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:from:to:subject:mime-version :content-transfer-encoding; bh=krLpgvVWhcD5myBrx5QzK2OzhVtF2ndKU9Bpm/lMJ4E=; b=wMEA5DJpd4gKY2WJKgxaNN85gT4QCImGr7gW6SohGA2yeh+YfPaerAzWowALoAJR9z yQMEzWrt4k103CIk4AH0FnOAlBDtVeBDgO2Bysz4gTiBG7NJNukiVNQuLwIXDn6BVftM tpX9bsJVzkLIBlragkPsiUP9+u7SXBw+ZgqNi1Z9N13bnD9gRcs7/9PtOqnZVPcw9erg qG40YyaKa2IgAg55o4dPM+Z1RlVSOAsxtcLIR1KREVGX8gj+DOTLsKIx4IjHW+M140CJ W4STUYGFl7WhW/ILhwGDqsxaJaAWoGpIOysfk7jmcpgbBTdPRcYbgnW0AJA2jEsEh1aI 5xRA== X-Gm-Message-State: AOAM531hNNXsWg75fIB5/moyIMG2GnLwT+jJXM2m8YtmWDzSGlVIYn+l vnjjYLeP88Pw6zapmn/nmRqQjfqvqVcXdRa1S/w= X-Google-Smtp-Source: ABdhPJygsBfJAay0vXEL0sKhygK4/FrbkeO7wkWCi9UOSug0m9Tdu8xQykN4JQ0UJ43uZV95TKg1rQ== X-Received: by 2002:aa7:96db:0:b0:4f6:a83a:784e with SMTP id h27-20020aa796db000000b004f6a83a784emr5145882pfq.3.1646508728020; Sat, 05 Mar 2022 11:32:08 -0800 (PST) Return-Path: Received: from localhost ([98.45.152.168]) by smtp.gmail.com with ESMTPSA id rm4-20020a17090b3ec400b001bf2f7e86b1sm4314345pjb.4.2022.03.05.11.32.06 for <9front@9front.org> (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Sat, 05 Mar 2022 11:32:07 -0800 (PST) Message-ID: <6223bab7.1c69fb81.6fc89.a14b@mx.google.com> Date: Sat, 05 Mar 2022 11:32:05 -0800 From: Michael Forney To: 9front@9front.org MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable List-ID: <9front.9front.org> List-Help: X-Glyph: ➈ X-Bullshit: flexible abstract event CMS optimizer Subject: [9front] git lca bug Reply-To: 9front@9front.org Precedence: bulk I recently encountered a bug where git/query returns the wrong result when calculating LCA(A, B) in the following graph: A | B | | | C | | | D | | | E | | | F | | | G | | | H | / |/ I | J | K | L |\ | \ M N | /=20 |/ O | P | ... It returns M rather than I. The LCA algorithm was rewritten not too long ago in http://git.9front.org/plan9front/plan9front/c7dcc82b0be805717efbe77c98eaadf= 3ee1e31af/commit.html However, I don't understand that commit message. The definition of LCA(b, g) that I've read is "the lowest node that has both b and g as descendents". In the graph <--a--b--c--d--e--f--g \ / +-----h------- the lowest node that fits this definition is b. a is not the LCA, since b is a descendent of a, and therefore lower. I'm not sure what is meant by strict LCA, but maybe there is some other definition of LCA that finds the ancestor whose distances between the two nodes is minimized in some way? What metric was used here? Perhaps sum of distances? It seems to me that what we actually want to find is a common ancestor such that there is no other common ancestor that descends from it. I don't think the distances between the LCA arguments and the ancestor are relevant here, except maybe to break ties when there are multiple LCAs. Here's a debug log with commit hashes replaced with the letter in the problematic graph: term% git/query -dd A B init: keep A init: drop B finding twixt commits found best (dist 7 < 1073741824): I found best (dist 5 < 7): M found ancestor M term% term% git/query -dd A B init: keep A init: drop B finding twixt commits enqueue: keep I enqueue: drop C enqueue: keep J enqueue: keep K enqueue: keep L enqueue: keep M enqueue: keep N enqueue: keep O enqueue: keep P enqueue: keep Q enqueue: keep R enqueue: keep S enqueue: keep T enqueue: keep U enqueue: keep V enqueue: keep W enqueue: keep X enqueue: keep Y enqueue: keep Z enqueue: keep AA enqueue: keep AB enqueue: keep AC enqueue: keep AD enqueue: drop D enqueue: drop E enqueue: drop F enqueue: drop G enqueue: drop H enqueue: drop I found best (dist 7 < 1073741824): I repaint: blank =3D> drop AD repaint: blank =3D> drop M found best (dist 5 < 7): M found ancestor M term% Here's what I think is going wrong: - We paint commits reachable from A as Keep, and eventually we get to a commit with timestamp earlier than B. The remaining ancestors (M and AD) are left on the queue. - We start painting commits reachable from B as Drop and eventually find I, marking that as the current best with distance 7 (from B). - We repaint the commits reachable from I as Drop, until we get to the ancestors not yet marked (M and AD). - We continue with what was left on the queue, M and AD. M is painted Drop, and was queued as Keep, so this is a new common ancestor. - M's distance (from A) is 5, which is lower than 7, so we use that in preference to I (even though I only had distance 1 from A). Any idea on how best to fix this?