mailing list of musl libc
* How to set UTF-8 as default
@ 2016-06-02 11:05 Remko Tronçon
  2016-06-02 14:50 ` FRIGN
  2016-06-02 15:01 ` Rich Felker
  0 siblings, 2 replies; 7+ messages in thread
From: Remko Tronçon @ 2016-06-02 11:05 UTC (permalink / raw)
  To: musl

Hi,

When I call `nl_langinfo`, it returns "ASCII" by default. I can call
`setlocale(LC_CTYPE, "C.UTF-8")` to make it return "UTF-8", but I was
wondering if there was a way through environment variables to make C.UTF-8
be the default.

I tried setting LC_CTYPE and LC_ALL to C.UTF-8, but this doesn't seem to
get picked up by `nl_langinfo` (or by `setlocale(LC_CTYPE, NULL)`).

thanks!
Remko

PS: I'm trying this on Alpine.


* Re: How to set UTF-8 as default
  2016-06-02 11:05 How to set UTF-8 as default Remko Tronçon
@ 2016-06-02 14:50 ` FRIGN
  2016-06-02 15:03   ` Rich Felker
  2016-06-02 15:01 ` Rich Felker
  1 sibling, 1 reply; 7+ messages in thread
From: FRIGN @ 2016-06-02 14:50 UTC (permalink / raw)
  To: musl

On Thu, 2 Jun 2016 13:05:40 +0200
Remko Tronçon <remko@el-tramo.be> wrote:

Hey Remko,

> I tried setting LC_CTYPE and LC_ALL to C.UTF-8, but this doesn't seem to
> get picked up by `nl_langinfo` (or by `setlocale(LC_CTYPE, NULL)`).

yeah, because that's wrong. You either use the C-locale (ASCII international)
or use the locales standard definition using e.g. en_US.UTF-8.
To get a "clean" UTF-8 environment, use (provided you have locales)

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

Cheers

FRIGN

-- 
FRIGN <dev@frign.de>



* Re: How to set UTF-8 as default
  2016-06-02 11:05 How to set UTF-8 as default Remko Tronçon
  2016-06-02 14:50 ` FRIGN
@ 2016-06-02 15:01 ` Rich Felker
  2016-06-02 15:40   ` Remko Tronçon
  1 sibling, 1 reply; 7+ messages in thread
From: Rich Felker @ 2016-06-02 15:01 UTC (permalink / raw)
  To: Remko Tronçon; +Cc: musl

On Thu, Jun 02, 2016 at 01:05:40PM +0200, Remko Tronçon wrote:
> Hi,
> 
> When I call `nl_langinfo`, it returns "ASCII" by default. I can call
> `setlocale(LC_CTYPE, "C.UTF-8")` to make it return "UTF-8", but I was
> wondering if there was a way through environment variables to make C.UTF-8
> be the default.

Just call setlocale(LC_CTYPE, ""). This is the only way a correct
program should ever use setlocale. Specifying the locale name yourself
is not portable and generally contrary to the user's intent and
expectations. Passing "" requests the "default" locale, which is
implementation-defined by C. POSIX defines it in terms of the LC_* and
LANG environment variables if they are set, and an
implementation-defined default otherwise. musl's
implementation-defined default is C.UTF-8, so everything works fine
even if no env vars are set.

> I tried setting LC_CTYPE and LC_ALL to C.UTF-8, but this doesn't seem to
> get picked up by `nl_langinfo` (or by `setlocale(LC_CTYPE, NULL)`).

NULL is not the same thing as "". Passing NULL as the second argument
to setlocale simply queries the current locale name. It does not set up
the locale.

Rich



* Re: How to set UTF-8 as default
  2016-06-02 14:50 ` FRIGN
@ 2016-06-02 15:03   ` Rich Felker
  0 siblings, 0 replies; 7+ messages in thread
From: Rich Felker @ 2016-06-02 15:03 UTC (permalink / raw)
  To: musl

On Thu, Jun 02, 2016 at 04:50:13PM +0200, FRIGN wrote:
> On Thu, 2 Jun 2016 13:05:40 +0200
> Remko Tronçon <remko@el-tramo.be> wrote:
> 
> Hey Remko,
> 
> > I tried setting LC_CTYPE and LC_ALL to C.UTF-8, but this doesn't seem to
> > get picked up by `nl_langinfo` (or by `setlocale(LC_CTYPE, NULL)`).
> 
> yeah, because that's wrong. You either use the C-locale (ASCII international)
> or use the locales standard definition using e.g. en_US.UTF-8.
> To get a "clean" UTF-8 environment, use (provided you have locales)
> 
> export LC_ALL=en_US.UTF-8
> export LANG=en_US.UTF-8
> export LANGUAGE=en_US.UTF-8

This isn't necessary and will possibly do more than the user wants in
the future, once LC_MONETARY and LC_COLLATE have non-stub
functionality. Wanting UTF-8 to work does not mean you want US dollars
or English collation order, etc.

Rich



* Re: How to set UTF-8 as default
  2016-06-02 15:01 ` Rich Felker
@ 2016-06-02 15:40   ` Remko Tronçon
  2016-06-02 17:10     ` Markus Wichmann
  0 siblings, 1 reply; 7+ messages in thread
From: Remko Tronçon @ 2016-06-02 15:40 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

Hi Rich,

Thanks for your explanation.

> Just call setlocale(LC_CTYPE, ""). This is the only way a correct
> program should ever use setlocale.


So, if I understand correctly, any program that expects `nl_langinfo`
to return the locale set through environment variables (or other
platform-specific ways) should call setlocale(LC_*, "") before
querying the locale; libc/musl/... will not trigger this registration
itself.

> NULL is not the same thing as "". Passing NULL as the second argument
> to setlocale simply queries the current locale name. It does not set up
> the locale.

Right, I was just pointing out that the locale I tried to set in env
variables wasn't being reported as the current locale by setlocale().

thanks!
Remko


* Re: How to set UTF-8 as default
  2016-06-02 15:40   ` Remko Tronçon
@ 2016-06-02 17:10     ` Markus Wichmann
  2016-06-02 17:32       ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Markus Wichmann @ 2016-06-02 17:10 UTC (permalink / raw)
  To: musl

On Thu, Jun 02, 2016 at 05:40:54PM +0200, Remko Tronçon wrote:
> So, if I understand correctly, any program that expects `nl_langinfo`
> to return the locale set through environment variables (or other
> platform-specific ways) should call setlocale(LC_*, "") before
> querying the locale; libc/musl/... will not trigger this registration
> itself.
> 

A libc that did that wouldn't be ISO-C compliant, because ISO-C says
that the initial locale has to be "C".

I will often just call setlocale(LC_ALL, ""); at the start of the
program, as that saves me the headache of selecting all the correct
locale categories for my program.

> thanks!
> Remko

Ciao,
Markus



* Re: How to set UTF-8 as default
  2016-06-02 17:10     ` Markus Wichmann
@ 2016-06-02 17:32       ` Rich Felker
  0 siblings, 0 replies; 7+ messages in thread
From: Rich Felker @ 2016-06-02 17:32 UTC (permalink / raw)
  To: musl

On Thu, Jun 02, 2016 at 07:10:01PM +0200, Markus Wichmann wrote:
> On Thu, Jun 02, 2016 at 05:40:54PM +0200, Remko Tronçon wrote:
> > So, if I understand correctly, any program that expects `nl_langinfo`
> > to return the locale set through environment variables (or other
> > platform-specific ways) should call setlocale(LC_*, "") before
> > querying the locale; libc/musl/... will not trigger this registration
> > itself.
> 
> A libc that did that wouldn't be ISO-C compliant, because ISO-C says
> that the initial locale has to be "C".

Well, per ISO C, a libc could define the C locale that way (like musl
used to), but doing so is subject to some restrictions that musl did
not follow, making it slightly non-conforming. POSIX, however, is going
to require a byte-based C locale, and already has further requirements
on character classes that made it hard to conform without just adopting
the future requirement.

> I will often just call setlocale(LC_ALL, ""); at the start of the
> program, as that saves me the headache of selecting all the correct
> locale categories for my program.

Yes, this is a good practice, although often you want to avoid setting
LC_NUMERIC or explicitly set it back to "C" so that floating point
number parsing/printing is not broken.

Rich


