From: John Cowan
Date: Mon, 20 Mar 2023 18:01:23 -0400
To: arnold@skeeve.com, robpike@gmail.com, ralph@inputplus.co.uk, tuhs@tuhs.org
Subject: [TUHS] Re: Bell Foreign-Language UNIX Efforts
List-Id: The Unix Heritage Society mailing list

On Mon, Mar 20, 2023 at 4:48 PM Steffen Nurpmeso <steffen@sdaoden.eu> wrote:

> However note that even something like "uppercase this string"
> cannot be done the right way, because a truly Unicode aware
> operation needs to look at the entire string (sentence), because
> there may be interdependencies that modify the result.

If you are talking about downcasing Greek Σ, then it's true that always downcasing Σ to σ is inadequate. Unicode specifies that if the Σ appears before a space or punctuation mark, it downcases to ς instead. But this is not always correct.

For example, if the string "ΦΙΛΟΣ." is the word "φιλος" (meaning 'beloved' or 'friend') at the end of a sentence, "φιλος." is the correct downcasing. But if it is the abbreviation for "φιλοσοφία", meaning "philosophy", then the correct downcasing is "φιλοσ." So getting this right is an AI-complete problem which neither Unicode nor ICU can solve.
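
To make the two readings concrete, here is a minimal sketch in Python. It assumes CPython 3.3 or later, whose built-in str.lower() applies the Unicode Final_Sigma rule. The helper downcase() and its is_abbreviation flag are hypothetical names invented for this illustration; the flag stands in for knowledge about meaning that the string itself does not carry.

```python
FINAL_SIGMA = "\u03c2"   # ς, used at the end of a word
MEDIAL_SIGMA = "\u03c3"  # σ, used everywhere else


def downcase(text: str, is_abbreviation: bool = False) -> str:
    """Lowercase a single Greek token.

    Hypothetical helper: the caller must say whether the token is an
    abbreviation, because nothing in the string reveals it.
    """
    # str.lower() picks ς for a Σ that follows a cased letter and is not
    # followed by one (the Final_Sigma context).
    lowered = text.lower()
    if is_abbreviation:
        # The word was cut short, so its last sigma is not really final.
        lowered = lowered.replace(FINAL_SIGMA, MEDIAL_SIGMA)
    return lowered


if __name__ == "__main__":
    print(downcase("ΦΙΛΟΣ."))                        # φιλος.
    print(downcase("ΦΙΛΟΣ.", is_abbreviation=True))  # φιλοσ.
```

The rule-based result is the same for both readings because the input is the same; only the extra flag, supplied from outside, separates them.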