From: Alastair Houghton
Date: Wed, 19 Jul 2023 18:10:52 +0100
To: musl@lists.openwall.com
Subject: Re: [musl] setlocale() behaviour
Message-id: <5E20CC9B-57C5-47D5-88F1-5CD8727B421F@apple.com>
References: <92D1805E-DEAE-4BB4-94F0-38EB24F2EE33@apple.com>

On 19 Jul 2023, at 17:51, Markus Wichmann wrote:
>
> Well, you must not depend on implementation internals. According to
> POSIX, the form of the locale environment variables and the strings to
> be plugged into setlocale() (except for "POSIX", "C", "", and the null
> pointer) are implementation-defined, and musl defines that absolutely
> any name is supported and is a copy of C.UTF-8 (again, except for
> "POSIX" and "C"). The name handed in must be returned back out again for
> gettext to work.

As was pointed out in the previous thread, POSIX does say, in the Rationale section:

> If the string does not correspond to a valid locale, setlocale() shall return a null pointer and the international environment is not changed.

And while I think musl probably does get out of it on the technicality that the “contents of this string are implementation-defined”, so it can claim that any old string “correspond[s] to a valid locale” that just happens to be C.UTF-8 unless there’s a data file installed, I think the current behaviour is very much not in the spirit of what was intended here.

But I don’t actually care about the standards lawyering here. As Rich Felker noted previously about this behaviour:

> Unfortunately this turns out to have been something of a tradeoff,
> since there's no way for applications (and, as it turns out,
> especially tests/test suites) to query whether a particular locale is
> "really" available. I've been asked to change the behavior to fail on
> unknown locale names, but of course that's not a working option in
> light of the above.

He then went on in a subsequent message to the list to suggest

> 1. setlocale(cat, explicit_locale_name) - succeeds if the locale
> actually has a definition file, fails and returns a null pointer
> otherwise.
>
> 2. setlocale(cat, "") - always succeeds, honoring the environment
> variable for the category if a locale definition file by that name
> exists, but otherwise (the unspecified behavior) treating it as if
> it were C.UTF-8.

Which would work just fine for libc++. What it’s trying to do is to check whether e.g. fr_FR is supported, so that it can enable additional tests that rely on the French localisation being present. I appreciate that, per the POSIX standard, you don’t *technically* know what fr_FR means, but in practice that isn’t really true: an implementation that had a locale installed called fr_FR that *wasn’t* French would be pretty silly.
Unfortunately, because of musl’s current behaviour, it *looks like* such an implementation, even though it actually isn’t (and it totally could have a genuine fr_FR locale if you had the right data in the right place).

I can, of course, check whether setlocale(LC_ALL, "something ridiculous") succeeds and returns "something ridiculous", and, if so, disable all locales except for POSIX, C and C.UTF-8. That will work around the current musl behaviour without causing trouble with other C libraries, but libc++’s maintainer isn’t terribly keen on it and would rather we explore the possibility of musl changing its implementation.

TL;DR: What I’m really enquiring about here is the fact that there was discussion about changing it to work in a more useful manner, but nothing changed (and I don’t see anything in that email thread to explain that it was decided not to make the proposed change; but maybe I missed it?)

Kind regards,

Alastair.
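
P.S. For concreteness, the workaround described above might look roughly like the sketch below. This is only an illustration and not what libc++ actually does: the function names (probe_locale, libc_accepts_any_locale_name, locale_usable_for_tests) are made up for this example, and it assumes that a C library which echoes back an obviously bogus name accepts every name.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Sentinel name taken from the text above; assumed not to be a real locale. */
static const char BOGUS_LOCALE[] = "something ridiculous";

/* Try to select `name` for LC_ALL, then restore the previous locale.
 * Returns true if setlocale() succeeded; *echoed reports whether the
 * string it returned was exactly `name`. */
static bool probe_locale(const char *name, bool *echoed)
{
    /* The string returned by setlocale() may be overwritten by a later
     * call, so take a private copy before changing anything. */
    const char *current = setlocale(LC_ALL, NULL);
    char *saved = NULL;
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }

    const char *result = setlocale(LC_ALL, name);
    bool ok = (result != NULL);
    *echoed = ok && strcmp(result, name) == 0;

    if (saved != NULL) {
        setlocale(LC_ALL, saved);
        free(saved);
    }
    return ok;
}

/* True if the C library accepts the bogus name and hands it back
 * unchanged, which is how current musl behaves per the discussion above. */
bool libc_accepts_any_locale_name(void)
{
    bool echoed = false;
    return probe_locale(BOGUS_LOCALE, &echoed) && echoed;
}

/* Hypothetical test-suite helper: should tests needing `name` be enabled? */
bool locale_usable_for_tests(const char *name)
{
    assert(name != NULL);

    bool builtin = strcmp(name, "POSIX") == 0 ||
                   strcmp(name, "C") == 0 ||
                   strcmp(name, "C.UTF-8") == 0;

    /* If any name is accepted, success tells us nothing, so only the
     * three built-in names are treated as available. */
    if (!builtin && libc_accepts_any_locale_name())
        return false;

    /* Otherwise trust setlocale(): a null return means "not available". */
    bool echoed = false;
    return probe_locale(name, &echoed);
}
```

A test suite would then call something like locale_usable_for_tests("fr_FR") before enabling its French tests. On a C library that rejects unknown names this reduces to an ordinary setlocale() check, so it should not cause trouble there.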