From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.4 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 29865 invoked from network); 26 Mar 2022 01:43:52 -0000 Received: from zero.zsh.org (2a02:898:31:0:48:4558:7a:7368) by inbox.vuxu.org with ESMTPUTF8; 26 Mar 2022 01:43:52 -0000 ARC-Seal: i=1; cv=none; a=rsa-sha256; d=zsh.org; s=rsa-20210803; t=1648259032; b=YffQ+L2W04bk/gBD1Dq3+86Suce5PigB1Se198tspwDSZhI79ka5kL8UNDpmRjmm2Frsm4iXkq KvnfBh53zmJC97XYczl2M+6V9NTYiejbJHgYsTExp3IdA2rUyBjxz7UtvRC8O1Yb6J1z49UD+b czjx16nyg6sDWgyYrC9kWYR5PcdZfF+BcEFLQV9Hi5cHbyj/57drIYQ+oeJ3vGwcqEI/81v+4t bXPmSw+DrSmah2iAgbwm0oCjpqAjxIXlwW6fvsl0+WiBvEx9aS7OymYqB2z8MLzTEsjlmZ1jS5 5sCPOd8VqtU2K1kTBWjCN0nFMsfLbZVEAX9jI7zlf/j1cA==; ARC-Authentication-Results: i=1; zsh.org; iprev=pass (mail-lf1-f52.google.com) smtp.remote-ip=209.85.167.52; dkim=pass header.d=gmail.com header.s=20210112 header.a=rsa-sha256; dmarc=pass header.from=gmail.com; arc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed; d=zsh.org; s=rsa-20210803; t=1648259032; bh=Lrnvc0ZsbpUG0CxWAHC/dO+T/SAPWFarGQz3DTWEXos=; h=List-Archive:List-Owner:List-Post:List-Unsubscribe:List-Subscribe:List-Help: List-Id:Sender:References:In-Reply-To:Message-ID:Date:Subject:To:From: DKIM-Signature:DKIM-Signature; b=IkKrUY6sVaTeyl5j5cXkehHZ9KPq5ca3/aJ7usg2/oKkYuPMSy9mjiWT2U4PrailSh9Mzfv92Z wTqCXbKfyyE7sfqLleAVvPTaURyWxh4SPAmVS4LA9X6wzb0RfvdOYgjfx0Rt9/raWVpX3I4LAA uhlNUrrByrphpzz/lV1Ogzmdvn9sGi9Uy/4Nmsn2Kkrc06c5EVGAG3J7HJPLREDF0v6LJ3nRMg 9FdfZWp4WzvVx/IVn7/9eT4QEMHc1Na5SfCC9z+zJQK7dWGi7xjayORBtiLBOKr6j6SpbcsWNg 6996llcNC4jCUVB4rUl3OEP+0qVlNZMkVsJRAooMhlgRPQ==; DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=zsh.org; s=rsa-20210803; h=List-Archive:List-Owner:List-Post:List-Unsubscribe: List-Subscribe:List-Help:List-Id:Sender:References:In-Reply-To:Message-Id: Date:Subject:To:From:Reply-To:Cc:MIME-Version:Content-Type: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID; bh=lk4JEyfaeF+p4L4R7uoVkg6a28wjBrcfZ/rkJ4pymck=; b=jh288mtsBfqKjGz5I6866mzeaU NX1yQ7GnW1UMSVDXtyWWxuX0rSwSRbPSV39ctC7K7x3FERyFB03H4iNmdeS75juAnW2OH37+1W7Si a/HwstOlgzBLmebVm6+bIqmaRvjDtVFLHntmhdlanS/FvM++H5HCVgMg0/iFkTmT3yjUMZezrUemY uiyDNs2giqBiRuznj5t9Z9UvGCAEHCrTLH3eQa7uhMM71Hhv2o6Ydzq6qjcgp5mkiOyOQakbV/61X maq3nqS8h/WSdu2b2td1agEEJ0v64axGE/dfY6aQ8PQLP2Dw7f3R/Ul/OW9j/jnqF4LvoLcA19kfx bMhv95qQ==; Received: from authenticated user by zero.zsh.org with local id 1nXvT5-0008FL-Iv; Sat, 26 Mar 2022 01:43:51 +0000 Authentication-Results: zsh.org; iprev=pass (mail-lf1-f52.google.com) smtp.remote-ip=209.85.167.52; dkim=pass header.d=gmail.com header.s=20210112 header.a=rsa-sha256; dmarc=pass header.from=gmail.com; arc=none Received: from mail-lf1-f52.google.com ([209.85.167.52]:34472) by zero.zsh.org with esmtps (TLS1.3:TLS_AES_128_GCM_SHA256:128) id 1nXvSq-0007u1-Oq; Sat, 26 Mar 2022 01:43:37 +0000 Received: by mail-lf1-f52.google.com with SMTP id 5so16100729lfp.1 for ; Fri, 25 Mar 2022 18:43:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:subject:date:message-id:in-reply-to:references; bh=lk4JEyfaeF+p4L4R7uoVkg6a28wjBrcfZ/rkJ4pymck=; b=FPl9BbKK0iKLnmKJ8+AE/40hV98Ks60MsnQ8/nVRe0cUWrQLYkS4w4W7PBtPh2noW0 vPXgbfvHOiniXK70PphrxZdDCdCAxMLB4mbXk/+zUCJ8a3Y7G4BczRPdCESDipdwy8e2 vCVZ5z/Cd85SIRA+dNgdEmZ21khlNkiha0vlbOdFopPNnu710OGIaqxJCXDXcniaaqHP YP/UPqTPxvSePNASRH7T0qNciwXfcAw8fRfR8Hz/KNhSfzJkQcj5N1YyROHRdZ7bJaT3 yUrlC9EHGZyh7dmFdVV4hTz+137RlqnPsNds7b6fXolLnlZ1g8D1HhvpT557g1X9LfbU 9UAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=lk4JEyfaeF+p4L4R7uoVkg6a28wjBrcfZ/rkJ4pymck=; b=Rmk/znu0A40ZUlpoHR2VfcZjMlAGBUsAhLNNrVkqXxk06yebmWQBYOxBzmFNkvcEEf p2UW65JhZzqtcCNJ+BJEhhwW9QsZjAeXHFMu9U5NlcnoOg4ENcARXpmEwhG7um9+yj0T e3UTn5bGo/o+982BNIhcTqusK18xqHbeJ1R7MY2inboMT1VMg5L0mCYm8NhdFC6RFCs+ QrCuCXPDW84RpEOlcPnni1ZEStXghuEe7axFw1NIvA8ntygdwC2PZCWq424qTk9a2DaP aeyrPVXPdAKgmdI0ZsheUwvn0jNjchWDNb4Zf8+Jod6uiMC6EomxlFHvML3EIuY6gPzc P+2A== X-Gm-Message-State: AOAM530sOkR+hbMpMbgBW+7Cjnlf/uBxcfk/gAYlg1iHtD6JhUeABemm oTPpzK7E70tl3jgpnAGcfetxQrsP3yk= X-Google-Smtp-Source: ABdhPJymH1ct+qWQ8gk6Y2h+gVEogutUu1ymtd9e7iPUK95lpisEgGGijA0yeuUCuUF4VA8n5wKMFg== X-Received: by 2002:ac2:4c47:0:b0:44a:151f:a76e with SMTP id o7-20020ac24c47000000b0044a151fa76emr10203590lfk.121.1648259015982; Fri, 25 Mar 2022 18:43:35 -0700 (PDT) Received: from localhost.localdomain (h-212-85-88-110.A230.priv.bahnhof.se. [212.85.88.110]) by smtp.gmail.com with ESMTPSA id n17-20020a19ef11000000b0044a37ab9754sm880782lfh.39.2022.03.25.18.43.35 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 25 Mar 2022 18:43:35 -0700 (PDT) From: Mikael Magnusson To: zsh-workers@zsh.org Subject: PATCH 2/2: [WIP] Efficient dedup for unsorted completions Date: Sat, 26 Mar 2022 02:43:30 +0100 Message-Id: <20220326014330.3050-1-mikachu@gmail.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20220326014030.2802-1-mikachu@gmail.com> References: <20220326014030.2802-1-mikachu@gmail.com> X-Seq: 49894 Archived-At: X-Loop: zsh-workers@zsh.org Errors-To: zsh-workers-owner@zsh.org Precedence: list Precedence: bulk Sender: zsh-workers-request@zsh.org X-no-archive: yes List-Id: List-Help: List-Subscribe: List-Unsubscribe: List-Post: List-Owner: List-Archive: This implements my idea for sorting a temporary array and then using that for deduplication. This is fast enough that -2 isn't needed in _path_files, and will also help for potential other cases other than file completion. PS I realized a bit late that Cmatch is typedeffed to a pointer so the double pointer shenanigans are a bit pointless, but I'll leave reworking that until another evening... --- Src/Zle/comp.h | 1 + Src/Zle/compcore.c | 66 +++++++++++++++++++++++++++++++++++++----------------- 2 files changed, 46 insertions(+), 21 deletions(-) diff --git a/Src/Zle/comp.h b/Src/Zle/comp.h index a8480c2bac..2ca779fe53 100644 --- a/Src/Zle/comp.h +++ b/Src/Zle/comp.h @@ -140,6 +140,7 @@ struct cmatch { #define CMF_ALL (1<<13) /* a match representing all other matches */ #define CMF_DUMMY (1<<14) /* unselectable dummy match */ #define CMF_MORDER (1<<15) /* order by matches, not display strings */ +#define CMF_DELETE (1<<16) /* used for deduplication of unsorted matches, don't set */ /* Stuff for completion matcher control. */ diff --git a/Src/Zle/compcore.c b/Src/Zle/compcore.c index a9ace5587b..126cdd3ae9 100644 --- a/Src/Zle/compcore.c +++ b/Src/Zle/compcore.c @@ -3191,6 +3191,13 @@ matchcmp(Cmatch *a, Cmatch *b) matchorder & CGF_NUMSORT) ? SORTIT_NUMERICALLY : 0)); } +/**/ +static int +matchcmp_pointer(Cmatch **a, Cmatch **b) +{ + return matchcmp(*a, *b); +} + /* This tests whether two matches are equal (would produce the same * strings on the command line). */ @@ -3284,30 +3291,47 @@ makearray(LinkList l, int type, int flags, int *np, int *nlp, int *llp) } /* used -O nosort or -V, don't sort */ } else { - /* didn't use -1 or -2, so remove all duplicates (inefficient) */ + /* didn't use -1 or -2, so remove all duplicates (efficient) */ if (!(flags & CGF_UNIQALL) && !(flags & CGF_UNIQCON)) { - int dup; - - for (ap = rp; *ap; ap++) { - for (bp = cp = ap + 1; *bp; bp++) { - if (!matcheq(*ap, *bp)) - *cp++ = *bp; - else - n--; + int dup, i; + + /* To avoid O(n^2) here, sort a temporary list of pointers to the real array */ + /* TODO: this can probably just be a copy of the array, i forgot Cmatch is typedef to pointer */ + matchorder = flags; + Cmatch **sp, **asp; + sp = (Cmatch **) zhalloc((n + 1) * sizeof(Cmatch *)); + asp = sp; + for (i = 0; i < n; i++) + *asp++ = rp + i; + *asp = NULL; + qsort((void *) sp, n, sizeof(Cmatch *), + (int (*) _((const void *, const void *)))matchcmp_pointer); + for (asp = sp + 1; *asp; asp++) { + Cmatch *ap = asp[-1], *bp = asp[0]; + if (matcheq(*ap, *bp)) { + bp[0]->flags = CMF_DELETE; + } else if (!ap[0]->disp) { + /* Mark those, that would show the same string in the list. */ + /* Mikael: I haven't tested this other than commenting out matcheq above */ + Cmatch **bsp = sp; + for (dup = 0; bp[0] && !(bp[0])->disp && + !strcmp((*ap)->str, (bp[0])->str); bp = *++sp) { + (bp[0])->flags |= CMF_MULT; + dup = 1; + } + if (dup) + (*ap)->flags |= CMF_FMULT; } - *cp = NULL; - if (!(*ap)->disp) { - for (dup = 0, bp = ap + 1; *bp; bp++) - if (!(*bp)->disp && - !((*bp)->flags & CMF_MULT) && - !strcmp((*ap)->str, (*bp)->str)) { - (*bp)->flags |= CMF_MULT; - dup = 1; - } - if (dup) - (*ap)->flags |= CMF_FMULT; - } } + int n_orig = n; + for (bp = rp, ap = rp; bp < rp + n_orig; ap++, bp++) { + while (bp[0]->flags & CMF_DELETE) { + bp++; + n--; + } + *ap = *bp; + } + *ap = NULL; /* passed -1 but not -2, so remove consecutive duplicates (efficient) */ } else if (!(flags & CGF_UNIQCON)) { int dup; -- 2.15.1