List for cgit developers and users
From: john at keeping.me.uk (John Keeping)
Subject: [PATCH] filter: set environment variable PYTHONIOENCODING to utf-8
Date: Fri, 17 Mar 2017 20:04:55 +0000
Message-ID: <20170317200455.GO2102@john.keeping.me.uk>
In-Reply-To: <CAHmME9pmyNFWUaOntZgejYDu7nTk+PFF76=fKpY3+kgA=bw+Vg@mail.gmail.com>

On Fri, Mar 17, 2017 at 07:07:02PM +0100, Jason A. Donenfeld wrote:
> On Sun, Mar 12, 2017 at 6:51 PM, John Keeping <john at keeping.me.uk> wrote:
> > While I'm inclined to agree with this, in this particular case we
> > explicitly encode pages as UTF-8 so there is an argument that we should
> > be telling child processes that UTF-8 is the correct encoding.
> 
> That's a compelling argument, actually.
> 
> >
> > Maybe we should be looking to change LANG instead, but I'm not sure how
> > reliably we can do that.
> 
> I'm more on board with that. Does changing LANG influence the PYTHON
> variable implicitly?

Yes, if there is no explicit encoding requested then Python derives it
from the locale.

However, it only works if the locale actually exists on the system; for
example on my system I get:

$ LANG=en_GB.UTF-8 python2 -c 'import sys; print(sys.stdin.encoding)'
UTF-8
$ LANG=en_GB.ISO-8859-1 python2 -c 'import sys; print(sys.stdin.encoding)' 
ISO-8859-1

but I don't have C.UTF-8, so:

$ LC_ALL=C.UTF-8 python2 -c 'import sys; print(sys.stdin.encoding)'
ANSI_X3.4-1968

There's an open glibc bug [1] to support C.UTF-8 but for now it looks
like it's only available on Debian and derivatives.
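
A minimal sketch of how a filter script could probe for a locale before
relying on it, using only Python's standard locale module. The function
name locale_available is hypothetical and not part of cgit; the check
simply asks setlocale() whether it accepts the name, which is the same
condition that decides between UTF-8 and the ASCII fallback
(ANSI_X3.4-1968) in the commands above.

```python
import locale


def locale_available(name):
    """Return True if setlocale() accepts `name` for LC_CTYPE here.

    Hypothetical helper for illustration only.
    """
    # Calling setlocale() with no second argument queries the current value.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # The C library rejected the name (locale not installed).
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)
```

Calling locale_available("C.UTF-8") would then report whether that locale
can be used on a given machine; the result depends on which locales are
installed.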

> > Is it safe to do something like:
> >
> >         const char *lang = getenv("LANG");
> >         struct strbuf sb = STRBUF_INIT;
> >
> >         if (!lang)
> >                 lang = "C";
> >         strbuf_addf(&sb, "%.*s.UTF-8",
> >                     (int) (strchrnul(lang, '.') - lang), lang);
> >         setenv("LANG", sb.buf, 1);
> 
> That's probably not too bad, though I wonder if we could get away with
> just explicitly setting a more generic UTF-8 instead of trying to read
> the user's language preferences.

Other people have already found that it's not quite that simple [2] if
we want it to work on all systems.
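
To illustrate the trade-off, here is a rough Python sketch of the two
ideas discussed above: rewriting the codeset of LANG to UTF-8 (the same
transformation as the quoted C snippet) and falling back through a list
of candidates. The names utf8_variant and first_available are
hypothetical, the candidate list is an assumption, and the availability
check is passed in as a predicate so the sketch does not depend on any
particular system.

```python
def utf8_variant(lang):
    """Replace everything from the first '.' in `lang` with '.UTF-8'.

    Unlike the quoted C snippet, an empty value is also treated as "C".
    Any '@modifier' after the codeset is dropped, as in the C version.
    """
    if not lang:
        lang = "C"
    base = lang.split(".", 1)[0]
    return base + ".UTF-8"


def first_available(candidates, is_available):
    """Return the first candidate accepted by `is_available`, else None."""
    for name in candidates:
        if is_available(name):
            return name
    return None


def choose_utf8_locale(lang, is_available):
    # Assumed preference order: the user's language first, then generic
    # UTF-8 locales that may or may not exist on a given system.
    candidates = [utf8_variant(lang), "C.UTF-8", "en_US.UTF-8"]
    return first_available(candidates, is_available)
```

If none of the candidates exists, choose_utf8_locale returns None and the
caller still needs some other way to tell the child process which
encoding to use, which is where setting PYTHONIOENCODING directly comes
back in.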

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=17318
[2] https://github.com/commercialhaskell/stack/issues/856

