From mboxrd@z Thu Jan 1 00:00:00 1970
From: john at keeping.me.uk (John Keeping)
Date: Fri, 17 Mar 2017 20:04:55 +0000
Subject: [PATCH] filter: set environment variable PYTHONIOENCODING to utf-8
In-Reply-To:
References: <20170223154823.18206-1-roy@marples.name>
 <20170304123521.GC2102@john.keeping.me.uk>
 <20170309001002.GF2102@john.keeping.me.uk>
 <20170312175153.GM2102@john.keeping.me.uk>
Message-ID: <20170317200455.GO2102@john.keeping.me.uk>

On Fri, Mar 17, 2017 at 07:07:02PM +0100, Jason A. Donenfeld wrote:
> On Sun, Mar 12, 2017 at 6:51 PM, John Keeping wrote:
> > While I'm inclined to agree with this, in this particular case we
> > explicitly encode pages as UTF-8 so there is an argument that we should
> > be telling child processes that UTF-8 is the correct encoding.
>
> That's a compelling argument, actually.
>
> >
> > Maybe we should be looking to change LANG instead, but I'm not sure how
> > reliably we can do that.
>
> I'm more onboard with that. Does changing LANG influence the PYTHON
> variable implicitly?

Yes, if there is no explicit encoding requested then Python derives it
from the locale.  However, it only works if the locale actually exists
on the system; for example on my system I get:

    $ LANG=en_GB.UTF-8 python2 -c 'import sys; print(sys.stdin.encoding)'
    UTF-8
    $ LANG=en_GB.ISO-8859-1 python2 -c 'import sys; print(sys.stdin.encoding)'
    ISO-8859-1

but I don't have C.UTF-8, so:

    $ LC_ALL=C.UTF-8 python2 -c 'import sys; print(sys.stdin.encoding)'
    ANSI_X3.4-1968

There's an open glibc bug [1] to support C.UTF-8 but for now it looks
like it's only available on Debian and derivatives.

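The shell checks above can also be scripted, which makes it easier to
compare LANG against an explicit PYTHONIOENCODING.  This is only a rough
sketch, not cgit code: the helper name child_stdin_encoding is made up
for illustration, and it runs whatever interpreter sys.executable points
at.  On Python 3.7 and later the interpreter may coerce the C locale to a
UTF-8 one, so the locale-derived results can differ from the python2
output shown above.

```python
import os
import subprocess
import sys

# Variables that influence the encoding Python picks for stdin/stdout.
_ENCODING_VARS = ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONIOENCODING", "PYTHONUTF8")


def child_stdin_encoding(**env_overrides):
    """Return sys.stdin.encoding as reported by a child Python process.

    Hypothetical helper: the parent's encoding-related variables are
    dropped first, so only the overrides passed in affect the child.
    """
    env = {k: v for k, v in os.environ.items() if k not in _ENCODING_VARS}
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        env=env,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Locale-derived: depends on whether the locale exists on this system.
    print("LANG=en_GB.UTF-8 ->", child_stdin_encoding(LANG="en_GB.UTF-8"))
    # Explicit request: does not depend on any locale being installed.
    print("PYTHONIOENCODING=utf-8 ->",
          child_stdin_encoding(PYTHONIOENCODING="utf-8"))
```

The second call is the behaviour the patch in the subject line relies on:
an explicit PYTHONIOENCODING takes effect whether or not a matching
locale is installed.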
> > Is it safe to do something like:
> >
> >     const char *lang = getenv("LANG");
> >     struct strbuf sb = STRBUF_INIT;
> >
> >     if (!lang)
> >         lang = "C";
> >     strbuf_addf(&sb, "%.*s.UTF-8",
> >                 (int) (strchrnul(lang, '.') - lang), lang);
> >     setenv("LANG", sb.buf, 1);
>
> That's probably not too bad, though I wonder if we could get away with
> just explicitly setting a more generic UTF-8 instead of trying to read
> the user's language preferences.

Other people have already found that it's not quite that simple [2] if
we want it to work on all systems.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=17318
[2] https://github.com/commercialhaskell/stack/issues/856
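
To make the trade-off concrete, here is a rough Python sketch of the
LANG rewrite quoted above, combined with a check that the resulting
locale actually exists.  It is an illustration under assumptions, not a
proposed implementation: the function names are made up, the fallback
list (the rewritten LANG, then C.UTF-8) is arbitrary, and a successful
setlocale() call is taken to mean "available", which may not hold on
every libc.

```python
import locale
import os


def utf8_variant(lang):
    """Mirror the "%.*s.UTF-8" rewrite: keep the text before the first
    '.', default to "C" when LANG is unset, and append ".UTF-8"."""
    if lang is None:
        lang = "C"
    return lang.split(".", 1)[0] + ".UTF-8"


def locale_available(name):
    """Return True if setlocale() accepts the name on this system."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore
    return True


def pick_utf8_lang(environ=os.environ):
    """Return the first usable UTF-8 locale name, or None if none works.

    The candidate order is an assumption made for this sketch.
    """
    for name in (utf8_variant(environ.get("LANG")), "C.UTF-8"):
        if locale_available(name):
            return name
    return None


if __name__ == "__main__":
    print(utf8_variant("en_GB.ISO-8859-1"))  # en_GB.UTF-8
    print(pick_utf8_lang())
```

A value with a modifier but no codeset, such as "sr_RS@latin", would be
rewritten to "sr_RS@latin.UTF-8", which is not a well-formed locale
name, so even the string handling needs more care than the sketch shows.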