From: "Jason A. Donenfeld"
Date: Wed, 12 Jan 2022 23:00:48 +0100
Subject: Re: [PATCH crypto 1/2] lib/crypto: blake2s-generic: reduce code size on small systems
To: David Laight
Cc: Eric Biggers, Linux Crypto Mailing List, Netdev, WireGuard mailing list, LKML, bpf, Geert Uytterhoeven, "Theodore Ts'o", Greg Kroah-Hartman, Jean-Philippe Aumasson, Ard Biesheuvel, Herbert Xu
References: <20220111134934.324663-1-Jason@zx2c4.com> <20220111134934.324663-2-Jason@zx2c4.com>

Hi David,

On 1/12/22, David Laight wrote:
> I think you mentioned in another thread that the buffers (eg for IPv6
> addresses) are actually often quite short.
>
> For short buffers the 'rolled-up' loop may be of similar performance
> to the unrolled one because of the time taken to read all the instructions
> into the I-cache and decode them.
> If the loop ends up small enough it will fit into the 'decoded loop
> buffer' of modern Intel x86 cpu and won't even need decoding on
> each iteration.
>
> I really suspect that the heavily unrolled loop is only really fast
> for big buffers and/or when it is already in the I-cache.
> In real life I wonder how often that actually happens?
> Especially for the uses the kernel is making of the code.
>
> You need to benchmark single executions of the function
> (doable on x86 with the performance monitor cycle counter)
> to get typical/best clocks/byte figures rather than a
> big average for repeated operation on a long buffer.
>
> David

This patch has been dropped entirely from future revisions. The latest
as of writing is at:

https://lore.kernel.org/linux-crypto/20220111220506.742067-1-Jason@zx2c4.com/

If you'd like to do something with blake2s, by all means submit a
patch and include various rationale and metrics and benchmarks. I do
not intend to do that myself and do not think my particular patch here
should be merged. But if you'd like to do something, feel free to CC
me for a review. However, as mentioned, I don't think much needs to be
done here.

Again, v3 is here:

https://lore.kernel.org/linux-crypto/20220111220506.742067-1-Jason@zx2c4.com/

Thanks,
Jason
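
For readers following the thread, the per-call measurement David describes in
the quoted text could be sketched roughly as below. This is only an
illustration under stated assumptions, not code from the patch or from anyone
in this thread: toy_hash(), now_ns(), struct timing and time_single_calls()
are hypothetical names, toy_hash() is a small FNV-1a loop standing in for the
function under test (it is not BLAKE2s), and clock_gettime() is used as a
portable stand-in for the x86 performance monitor cycle counter, so the
figures are nanoseconds rather than clocks.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std= */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the function under test (FNV-1a, not BLAKE2s). */
static uint32_t toy_hash(const uint8_t *buf, size_t len)
{
    uint32_t h = 2166136261u;           /* FNV-1a 32-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;                 /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Monotonic time in nanoseconds; assumed substitute for a cycle counter. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

struct timing {
    uint64_t best_ns;   /* fastest single call */
    uint64_t mean_ns;   /* average over all calls */
};

/* Time every call on its own instead of one long run over a big buffer. */
static struct timing time_single_calls(const uint8_t *buf, size_t len,
                                       unsigned runs)
{
    struct timing t = { 0, 0 };
    uint64_t best = UINT64_MAX, total = 0;
    volatile uint32_t sink = 0;         /* keeps the calls from being elided */

    if (runs == 0)
        return t;

    for (unsigned i = 0; i < runs; i++) {
        uint64_t start = now_ns();
        sink ^= toy_hash(buf, len);
        uint64_t elapsed = now_ns() - start;

        if (elapsed < best)
            best = elapsed;
        total += elapsed;
    }
    (void)sink;

    t.best_ns = best;
    t.mean_ns = total / runs;
    return t;
}
```

Calling time_single_calls() with a short buffer (say 16 bytes, about the size
of an IPv6 address) and comparing best_ns with mean_ns gives the kind of
typical/best figure the quoted text asks for. Note that back-to-back calls
leave the code warm in the I-cache, so a cold-cache number would need
unrelated work between calls, and a nanosecond clock is coarse for a function
this small.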