From: Gavin Smith
Date: Tue, 21 Jan 2025 20:43:16 +0000
To: Rich Felker
Cc: musl@lists.openwall.com, Patrice Dumas
References: <20250112045105.GI10433@brightrain.aerifal.cx>
In-Reply-To: <20250112045105.GI10433@brightrain.aerifal.cx>
Subject: Re: [musl] gettext LC_MESSAGES differences from other libc

On Sat, Jan 11, 2025 at 11:51:05PM -0500, Rich Felker wrote:
> > Recently in the Texinfo project, we found this incompatibility with musl
> > for translations of strings to be placed in output files. The gettext
> > API (neither musl or glibc/other) is not a perfect match for Texinfo
> > needs as much assumes that the target language is that of the user, of
> > the person sitting in front of the computer, whereas the appropriate
> > translation language is that of the input document.
> > For example, somebody
> > could be generating documentation in Italian to be posted to a website,
> > while they don't speak Italian themselves and do not have an Italian
> > locale installed.
>
> This sounds like locale is not the right tool for processing it.
>
> > The only way we can support this with glibc is to set LC_MESSAGES and/or
> > LC_ALL to a locale that is not "C" or "POSIX", and then to set the LANGUAGE
> > variable for the actual target language. This is a nuisance, as sometimes
> > it is a struggle to actually find such a locale. The assumption when this
> > API was designed was that a user with only a "C" locale does not need
> > translations, but this is false when they are generating them for somebody
> > else. libc appears to offer no way just to open an arbitrary .mo file (the
> > file with the translated strings in it) to get the translations, forcing
> > you to go through the locale system.
>
> If you just want to process .mo files without going thru the locale
> system, the necessary code is about 42 source lines/329 machine code
> bytes that's MIT-licensed in musl that you're free to copy. This
> probably makes the most sense.

Thanks for the suggestion. It is possible that we will end up doing
this, if the current approach has more problems.

I noticed that your implementation at:

https://git.musl-libc.org/cgit/musl/tree/src/locale/__mo_lookup.c

does not refer to a hashing table section of the .mo file. This could
make it slower.

I'm not sure if there is a relevant standard for the format for .mo
files. At
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/msgfmt.html,
it says:

"The format of the created messages object files is unspecified."

The GNU gettext manual gives some documentation on the file format, but
does not document the format of the hashing table:

"The precise hashing algorithm used is fairly dependent on GNU gettext
code, and is not documented here."
https://www.gnu.org/software/gettext/manual/html_node/MO-Files.html

Apart from the hashing table issue, using libc gettext handles some
other things that we would have to recreate ourselves, on top of the .mo
file format. Ones I can think of are translation contexts, character
encodings, and regional language variants (e.g. pt and pt_BR). Another
issue could be plural variants of translations.

We could probably reimplement all of this without a huge amount of
difficulty, if we really wanted to, although as the code is in the libc
already it would seem simpler if we could access it.
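
For illustration, a lookup that skips the hashing table can be sketched
as below. This is not the musl code. It is a rough Python sketch (chosen
for brevity) based on the header layout described in the GNU gettext
manual, and it assumes the original strings are stored sorted bytewise,
so that a binary search is enough. The names build_mo and mo_lookup are
hypothetical; build_mo only exists to produce a small made-up catalog to
try the lookup on. The sketch also assumes UTF-8 strings and ignores
contexts, plural forms, and the charset declared in the catalog's header
entry.

```python
import struct

MO_MAGIC = 0x950412DE          # magic number as read in the file's own byte order
MO_MAGIC_SWAPPED = 0xDE120495  # same magic seen with the opposite byte order


def build_mo(catalog):
    """Build a minimal little-endian .mo image without a hash table.

    Hypothetical helper, only used to create sample data for mo_lookup().
    """
    items = sorted((k.encode(), v.encode()) for k, v in catalog.items())
    n = len(items)
    orig_tab = 28                  # the header is seven 32-bit words
    trans_tab = orig_tab + 8 * n   # each table entry is (length, offset)
    data_start = trans_tab + 8 * n
    entries, blob = [], b""
    for text in [k for k, _ in items] + [v for _, v in items]:
        entries.append(struct.pack("<2I", len(text), data_start + len(blob)))
        blob += text + b"\0"       # strings are NUL-terminated in the file
    # Hash table size and offset are left at 0: no hash table present.
    header = struct.pack("<7I", MO_MAGIC, 0, n, orig_tab, trans_tab, 0, 0)
    return header + b"".join(entries) + blob


def mo_lookup(mo, msgid):
    """Return the translation of msgid from the .mo image, or None."""
    if len(mo) < 28:
        return None
    magic = struct.unpack_from("<I", mo, 0)[0]
    if magic == MO_MAGIC:
        order = "<"
    elif magic == MO_MAGIC_SWAPPED:
        order = ">"
    else:
        return None
    _rev, n, orig_tab, trans_tab = struct.unpack_from(order + "4I", mo, 4)
    if orig_tab + 8 * n > len(mo) or trans_tab + 8 * n > len(mo):
        return None                # tables would run past the end of the file

    key = msgid.encode()           # assumption: catalog strings are UTF-8
    lo, hi = 0, n
    while lo < hi:                 # binary search over the sorted originals
        mid = (lo + hi) // 2
        length, off = struct.unpack_from(order + "2I", mo, orig_tab + 8 * mid)
        if off + length > len(mo):
            return None
        cand = mo[off:off + length]
        if cand == key:
            tlen, toff = struct.unpack_from(order + "2I", mo, trans_tab + 8 * mid)
            if toff + tlen > len(mo):
                return None
            return mo[toff:toff + tlen].decode()
        if cand < key:
            lo = mid + 1
        else:
            hi = mid
    return None
```

With N strings in the catalog, the search needs roughly log2(N) string
comparisons per lookup, so doing without the hashing table has a cost,
but probably a modest one for catalogs of a few hundred messages.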