Date: Thu, 14 Nov 2013 00:18:59 -0800
From: Phil Pennock
To: Martin Vaeth
Cc: zsh-workers@zsh.org
Subject: Re: [PATCH] helpfiles: Also accept 'UTF-8' as an encoding name.
Message-ID: <20131114081859.GA64798@redoubt.spodhuis.org>
References: <20131112101139.31d67b73@pwslap01u.europe.root.pri>
 <35BD8D7E-01D5-469D-95DD-3030251D22AB@kba.biglobe.ne.jp>
 <131113092737.ZM11794@torch.brasslantern.com>
 <20131114000658.GA52075@redoubt.spodhuis.org>
X-Seq: 31980

On 2013-11-14 at 07:50 +0000, Martin Vaeth wrote:
> Phil Pennock wrote:
> >> perhaps also be checked) there are besides de_* also fy_DE and hsb_DE on
> >> a Debian installation at my institute (though I do not know what
> >> they mean).
> >
> > ISO 639 language code, followed by ISO 3166 region tag identifying a
> > regional dialect.
>
> Thanks for the information. That it is some sort of German dialect
> spoken somewhere else was clear to me, that's why I think it *might*
> be a possible fallback.

Back to the point being raised, which I was addressing without being too
direct in an attempt to be a little tactful: Bart is right, the language
code does need to be anchored ("^en"). The fact that "de" can appear in
two places in the examples you cite does not mean it should be _accepted_
in either position; "de_" means "German" (the language), while "_DE"
means "Germany" (the region).

Looking more closely, I'm confused: why do we care which character sets
are available, and why are we not just setting `LC_ALL=C` during the
generation? We're making plain-text files, not stored in a
charset-dependent hierarchy, from a source under our control which has a
limited set of possible outputs.

As far as I can see, the only output likely to change with the character
set is the choice among the various hyphen/dash characters. There we
almost certainly want to force the traditional ASCII HYPHEN-MINUS, so
that options shown can be copied and pasted without worrying about
man/groff macro variants and which dash might have been chosen.

The commit message and ChangeLog additions are rather short on detail:
they only say that the change causes the helpfiles to be generated, and
mention nothing about the added locale fiddling.

-Phil
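A minimal sketch of the anchoring point, for illustration only. The locale
names are sample data assumed to resemble `locale -a` output on a Debian
system like the one mentioned above, and `sample_locales` and
`pick_en_utf8` are hypothetical helpers, not code from the zsh tree.

```shell
#!/bin/sh
# Illustration only: sample locale names standing in for `locale -a` output.
sample_locales() {
  printf '%s\n' de_DE.UTF-8 de_AT.utf8 fy_DE.UTF-8 hsb_DE.UTF-8 en_US.UTF-8
}

# Unanchored, case-insensitive match: "de" is also found in the *region*
# tag "_DE", so the Frisian (fy) and Upper Sorbian (hsb) locales slip in.
echo "unanchored 'de':"
sample_locales | grep -i 'de'

# Anchored on the language code: only names beginning with "de_" match.
echo "anchored '^de_':"
sample_locales | grep '^de_'

# Hypothetical helper for the English case: anchored language code, and
# either spelling of the codeset ("UTF-8" or "utf8") accepted.
pick_en_utf8() {
  grep -E '^en_[A-Za-z]+\.(UTF-8|utf8)$'
}
echo "English, UTF-8:"
sample_locales | pick_en_utf8
```

The unanchored pattern matches all four German-region locales, because
fy_DE and hsb_DE carry "_DE" as their region tag; the anchored pattern
keeps only de_DE.UTF-8 and de_AT.utf8.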