From mboxrd@z Thu Jan  1 00:00:00 1970
From: john at keeping.me.uk (John Keeping)
Date: Sun, 12 Mar 2017 17:51:53 +0000
Subject: [PATCH] filter: set environment variable PYTHONIOENCODING to utf-8
In-Reply-To:
References: <20170223154823.18206-1-roy@marples.name> <20170304123521.GC2102@john.keeping.me.uk> <20170309001002.GF2102@john.keeping.me.uk>
Message-ID: <20170312175153.GM2102@john.keeping.me.uk>

On Sun, Mar 12, 2017 at 10:01:10AM -0700, Jason A. Donenfeld wrote:
> Sorry for the delay. I'm currently on the road traveling and won't be
> properly back at my desk until the end of next week.
>
> However, my initial reaction is that hard coding various
> interpreter-specific environment variables in cgit itself is not
> correct, and that this is something better left to the CGI environment
> as it sees fit. However, we may benefit from explicit script level
> configuration of unicode stuff.

While I'm inclined to agree with this, in this particular case we
explicitly encode pages as UTF-8, so there is an argument that we should
be telling child processes that UTF-8 is the correct encoding.

Maybe we should be looking to change LANG instead, but I'm not sure how
reliably we can do that. Is it safe to do something like:

	const char *lang = getenv("LANG");
	struct strbuf sb = STRBUF_INIT;

	if (!lang)
		lang = "C";
	/* Keep everything before the codeset, then force UTF-8. */
	strbuf_addf(&sb, "%.*s.UTF-8",
		    (int) (strchrnul(lang, '.') - lang), lang);
	setenv("LANG", sb.buf, 1);
	strbuf_release(&sb);

?
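
For anyone wanting to try the idea outside the cgit tree, here is a
minimal standalone sketch of the same LANG rewrite using only POSIX
calls. The helper names utf8_lang() and force_utf8_lang() are
hypothetical and not part of cgit or git. As a simplification it cuts at
'@' as well as '.', so any locale modifier is dropped. Whether the
resulting locale (for example "C.UTF-8") actually exists on the target
system is an open assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* for setenv() under strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of cgit): write "lang" with its codeset
 * replaced by UTF-8 into buf.  A NULL or empty lang is treated as "C".
 * Returns 0 on success, -1 if buf is too small.
 */
static int utf8_lang(const char *lang, char *buf, size_t len)
{
	int base, n;

	if (!lang || !*lang)
		lang = "C";
	/* Length of the part before any codeset or modifier. */
	base = (int)strcspn(lang, ".@");
	n = snprintf(buf, len, "%.*s.UTF-8", base, lang);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: overwrite LANG in the current environment so
 * that child processes started afterwards inherit the UTF-8 variant.
 */
static int force_utf8_lang(void)
{
	char buf[64];

	if (utf8_lang(getenv("LANG"), buf, sizeof(buf)) < 0)
		return -1;
	return setenv("LANG", buf, 1);
}
```

Calling force_utf8_lang() before spawning a filter would be the intended
use. Note that LC_ALL and LC_CTYPE take precedence over LANG, so this
alone has no effect if the CGI environment sets either of them.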