Message-ID: <5B9E186DD7F2C4E2910241F9FDEB0C09@wopr.sciops.net>
Date: Mon, 17 Oct 2022 08:34:06 +0200
From: qwx@sciops.net
To: 9front@9front.org
In-Reply-To: <5e9c64f4-bad0-ce74-7e15-e8883598fcc2@posixcafe.org>
List-ID: <9front.9front.org>
Subject: Re: [9front] [PATCH] introduce code points above BMP to /lib/unicode
Reply-To: 9front@9front.org

On Mon Oct 17 07:25:49 +0200 2022, moody@mail.posixcafe.org wrote:
> Our /lib/unicode is a bit out of date, this updates our stripped down
> version of UnicodeData.txt that we keep in /lib to cover characters
> and code ranges above the Basic Multilingual Plane.
>
> This does balloon the file a bit compared to the ~200k original.
> ; 800k /lib/unicode
>
> The full patch is attached. Of note the non-zero padding of the BMP
> range is replicated in the upstream UnicodeData.txt, I would be open
> to zero padding ours but this would change the results of existing
> scripts that use look(1) with /lib/unicode.
>
> Not sure how much use others get out of /lib/unicode, but wanted
> to ask if people thought it was worth the size to update.
>
> Thanks,
> moody

I'm definitely in favor, thanks for doing this.

Maybe a problem on my end, but I can't gunzip the attached patch.

Cheers,
qwx