From: Érico Nogueira
Cc: "Samuel Holland", "Dong Brett"
Date: Mon, 30 Nov 2020 14:14:15 -0300
In-Reply-To: <20201130153503.GP534@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
Subject: Re: [musl] Question on C++ locale

On Mon Nov 30, 2020 at 12:35 PM -03, Rich Felker wrote:
> On Mon, Nov 30, 2020 at 12:12:50PM -0300, Érico Nogueira wrote:
> > On Mon Nov 30, 2020 at 11:39 AM -03, Samuel Holland wrote:
> > > On 11/30/20 7:44 AM, Érico Nogueira
> > > wrote:
> > > > On Mon Nov 30, 2020 at 8:35 AM -03, Szabolcs Nagy wrote:
> > > >> * Dong Brett [2020-11-30 18:41:33 +0800]:
> > > >>> However, the following C++ code does not work (our software uses
> > > >>> std::locale in C++ standard library for locale related stuff):
> > > >>> #include
> > > >>> #include
> > > >>> #include
> > > >>> using namespace std;
> > > >>> int main()
> > > >>> {
> > > >>>     std::locale::global(locale(""));
> > > >>>     initscr();
> > > >>>     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
> > > >>>     printw("C++ locale: %s\n", locale().name().c_str());
> > > >>>     printw("CODESET: %s\n", nl_langinfo(CODESET));
> > > >>>     printw("Hello, world!\n");
> > > >>>     printw("你好，世界!\n");
> > > >>>     refresh();
> > > >>>     getch();
> > > >>>     endwin();
> > > >>>     return 0;
> > > >>> }
> > > >>
> > > >> fwiw for me even the first line fails.
> > > >> i don't know how c++ locales are supposed to work.
> > > >
> > > > From [1], it seems that C++ locales are supposed to affect the global
> > > > locale as well, so they should call setlocale() when appropriate.
> > > >
> > > > - [1] https://www.cplusplus.com/reference/locale/locale/
> > > >
> > > > Unfortunately, I assume libstdc++ uses their generic locale support on
> > > > musl... From gcc-10.2.0/libstdc++-v3/config/locale/generic/c_locale.cc:
> > > >
> > > > void
> > > > locale::facet::_S_create_c_locale(__c_locale& __cloc, const char* __s,
> > > >                                   __c_locale)
> > > > {
> > > >   // Currently, the generic model only supports the "C" locale.
> > > >   // See http://gcc.gnu.org/ml/libstdc++/2003-02/msg00345.html
> > > >   __cloc = 0;
> > > >   if (strcmp(__s, "C"))
> > > >     __throw_runtime_error(__N("locale::facet::_S_create_c_locale "
> > > >                               "name not valid"));
> > > > }
> > > >
> > >
> > > I don't know for sure that it's the right thing to do, but I have been
> > > patching out that error for the last several years[1] and so far I have
> > > not noticed any negative effects. Adelie, which is very thorough about
> > > testing, has also carried the patch for a while[2].
> > >
> > > Samuel
> > >
> > > [1]:
> > > https://github.com/smaeul/portage/blob/c744774a/patches/sys-devel/gcc/gcc-5.4.0-locale.patch
> > > [2]: https://code.foxkit.us/adelie/packages/-/commit/d09b437d
> >
> > Are those patches correct in functionality? The GNU version is:
> >
> > void
> > locale::facet::_S_create_c_locale(__c_locale& __cloc, const char* __s,
> >                                   __c_locale __old)
> > {
> >   __cloc = __newlocale(1 << LC_ALL, __s, __old);
> >   if (!__cloc)
> >     {
> >       // This named locale is not supported by the underlying OS.
> >       __throw_runtime_error(__N("locale::facet::_S_create_c_locale "
> >                                 "name not valid"));
> >     }
> > }
> >
> > It tries to create a locale object, which the generic code doesn't do.
> > In the generic case, _S_create_c_locale is basically a noop, and I'd
> > assume localization wouldn't work, even if it does avoid the runtime
> > abort.
> >
> > I will try it out locally when I get the time.
>
> The code there in the GNU version is correct (the one without
> newlocale isn't correct) aside from having the __ prefix, but other
> parts of the GNU version are wrong in that they poke at glibc
> internals to "optimize" useless byte-based ctype functions (useless
> because they can't operate on the only characters whose properties
> could vary by locale, the non-ASCII ones). There should probably be a
> new "posix" directory here based on the GNU one but with all the
> GNUisms removed.
> If it's not hard to backport that to older GCC
> versions maybe we should do that.

C++ is a bit mysterious to me; do you think there's a chance that
changing the libstdc++ locale implementation could break programs built
for the old version? I also wonder what the configure script should look
for in order to choose which version to use.

From a really quick look at _S_create_c_locale, the dragonfly version
might be usable for this purpose, although it uses some non-standard
headers.

>
> One thing: I think in order for std::locale::global to be able to
> work, the locale creation code also needs to store the name (string)
> passed to locale() constructor, since there's no way to setlocale to a
> locale_t. Instead you need to remember the name so you can setlocale()
> to the same name. Perhaps NL_LOCALE_NAME would suffice, but I don't
> think it can easily give the exact same behavior since it's
> per-category.
>
> Rich
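
To make Rich's last point concrete, here is a minimal sketch of keeping the
name next to the locale_t so that it can later be handed to setlocale().
This is not libstdc++ code: the class named_c_locale and its members are
hypothetical names made up for illustration, and it assumes a system where
<locale.h> declares the POSIX.1-2008 newlocale()/freelocale() interfaces.

```cpp
#include <locale.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libstdc++): pairs a POSIX locale_t with
// the name it was created from, because setlocale() only accepts a name.
class named_c_locale {
public:
    explicit named_c_locale(const char* name)
        : name_(name),
          loc_(::newlocale(LC_ALL_MASK, name, (locale_t)0))
    {
        // newlocale() returns a null locale_t if the name is not supported.
        if (!loc_)
            throw std::runtime_error("named_c_locale: name not valid");
    }

    ~named_c_locale() { ::freelocale(loc_); }

    // Non-copyable: the locale_t is owned by exactly one object.
    named_c_locale(const named_c_locale&) = delete;
    named_c_locale& operator=(const named_c_locale&) = delete;

    locale_t get() const { return loc_; }
    const std::string& name() const { return name_; }

    // Install the remembered name as the process-wide C locale, which is
    // roughly what std::locale::global would need to do.
    bool make_global() const
    {
        return ::setlocale(LC_ALL, name_.c_str()) != nullptr;
    }

private:
    std::string name_;  // the string originally passed to the constructor
    locale_t loc_;      // the locale object created from that string
};
```

With something like this, named_c_locale("").make_global() would pick up
the environment's locale, and the same stored string stays available for
reporting the locale name afterwards. It stores one name for all
categories, so it does not address the per-category concern about
NL_LOCALE_NAME.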