Date: Sun, 9 Dec 2007 18:01:27 +0000
From: Peter Stephenson
To: 451382@bugs.debian.org, zsh-workers@sunsite.dk
Subject: Re: Bug#451382: i18n is NOT so easy!
Message-Id: <20071209180127.d955eb4f.p.w.stephenson@ntlworld.com>
In-Reply-To: <200712071726.lB7HQv76016517@news01.csr.com>

On Fri, 07 Dec 2007 17:26:57 +0000 Peter Stephenson wrote:
> Clint Adams wrote:
> > On Fri, Dec 07, 2007 at 02:11:41PM +0000, Peter Stephenson wrote:
> > > Found it: see thread around
> > >
> > > http://www.zsh.org/mla/workers/2006/msg00753.html
> >
> > I think it would be easier to do something like bash's $"" interface to gettext and co-opt that for completion translations.
>
> As far as I understood it (it doesn't seem to be well documented) that only does translations which are pre-compiled into the shell (or rather its libraries). We need something which can be updated with completion functions. It's OK if the definitions are in another file (though we could presumably have an interface which adds translations from the completion function itself) but it needs to be added at run time.
>
> Possibly we can still do this with $"...", but I don't like the idea that if you change the original message you can no longer find the translation, which seems to me to be asking for trouble.

Further thoughts after groping through the gettext documentation for a bit...
this is not a definitive answer (though it's rather closer to one than when I started writing it two hours and counting ago), but unless I post it now I'll forget it.

In summary, I believe we can use the internationalization functions in the library behind gettext() to avoid reinventing the wheel and to maintain some compatibility, but it'll take a bit more care to get this right than simply $"" plus gettext("").

I think we have two basic problems with the simplest $"..." / gettext() interface.

1. The problem in the last paragraph quoted. I'm convinced this is a real problem: unlike with C programmes, the urge to tinker with strings in shell functions is strong, and if there's no visual cue that this has bad side effects then the interface is, in my view, fundamentally broken. To put it another way, only programmers tinker with C programmes, while users are actively encouraged to tinker with shell functions, so the whole nature of the interface needs to be rethought to make it clear and robust rather than minimal.

However, this isn't insuperable. The "msgid" is the original string only by convention and could be anything; the convention was designed to keep things simple when there are many calls to gettext() throughout a programme. As we essentially have only one point of entry for translations in shell functions (the shell's C code is a separate and much simpler problem, since it isn't fundamentally different from any other C programme), we can do it however we like. We can, for example, have translation strings like:

  $"_mount_nfs_access_acregmin:specify cached file attributes minimum hold time"

and have the following rule:

- If the string consists of identifier characters followed by ":" and then arbitrary text (we might need to make this more complicated eventually), first attempt look-up with the identifier characters alone. If the lookup doesn't simply return the string it was given (i.e. a translation exists), this is the text we want.
- Otherwise look up with the whole string. This is for compatibility; use of this form in zsh functions would be deprecated.
- If it still returns the original string (no translation) but there is an identifier part, return the string after the ":".
- Maybe we want some rule about aliasing; it's not clear (we can leave it until a use becomes obvious).

This scheme has various merits:
(i) it is robust against changes to the English text;
(ii) the explicit msgid serves as a visual cue that there's something here that shouldn't be monkeyed with without good reason (and that even if you change the English text it should still mean the same thing);
(iii) the msgid in the catalogues is compact.

2. Unfortunately there's also the problem of finding message catalogues. For the same reason that it's designed for simplicity with pre-compiled programmes, gettext() itself appears to require them to be in a particular hierarchy, the top of which is determined at compile time. This isn't good enough in our case. We have functions that are installed at different places in the function path. The path can change, and the only clean way of finding message catalogues is to use the same path.

We *could* collect all translations at shell installation and simply shrug our shoulders saying "that's your lot", but in my view this is too botched to consider. (As far as I can tell this is what happens in bash.) It's a key part of the way the completion system works that people can customize it themselves just by writing functions, and even if adding translations to your own functions is unusual, I still don't think being limited to a predefined set is acceptable. I don't mind users (which includes administrators) having to run some utility to add, or add to, a message catalogue, but I do mind them having to modify the shell configuration and reinstall; even updating the shell libraries with something like one of Clint's out-of-tree modules seems a bit over the top.

However, it seems like we can get something better by interfacing to the library at a lower level, in particular to catopen() (strictly this is a different family of interfaces).
That accepts an absolute path to a catalogue and also uses the environment variable NLSPATH to search for files. It's currently unclear to me how to mix use of a shell-specified directory (determined, in ways we'll need to discuss, from $fpath) with a user-specified language (since I presume the library has an intelligent system of fallbacks we don't want to have to imitate). Unfortunately it looks like absolute paths aren't portable either. If the worst comes to the worst, we may need to alter the environment variable directly: for example, temporarily either appending or prepending the zsh directories to it. (I don't think requiring the user to modify NLSPATH as well as $fpath is a good idea; I think the shell should "just find" the right catalogues associated with functions, as with .zwc files.)

Comments on this are obviously welcome.

To proceed I think we need the following. The second and third parts should wait until after 4.3.5 (which I'll make before Christmas, despite the open bugs, since I haven't seen anything which is obviously worse than in 4.3.4). They should also wait until after the first part is reasonably clear.

I. Design:
- finalize the rule for $"..." (or equivalent)
- invent rules for finding the catalogue, which should probably be flexible, ideally allowing both per-fpath-directory and per-autoloadable-function files while still allowing the user to have all their own translations collected in one place. For the last case it would probably be OK to fall back on NLSPATH. (I'm not implying people will use all the mechanisms, just that at this stage we should plan on flexibility.)
- decide if we want strings in the source to use a similar scheme or (perhaps better) just normal gettext() rules.

II. Shell source:
- add parsing for $"..."
- add config support for locating libraries for language catalogues and (where necessary) determining their abilities
- also (a separate job) we should prepare the C code for use of gettext(); as I said, this is conceptually simpler but still a lot of work. Someone needs to look at gettextize: this is really part of the previous point except that we won't want to rely just on the GNU version; a quick look suggests it assumes a bit too much of a standard GNU interface in some areas, but I haven't gone into any detail.
- add some trial mechanism behind $"..." using catopen() / catgets() / catclose(). This is where we're going to need the most fiddling to get the interface right.

III. Shell functions etc.:
- add a few trial translation files for the completion system and possibly other files to test the water
- ditto translations for strings in the shell's source code
- write a whole set of utilities that
  - create bare catalogues
  - update catalogues with untranslated strings
  - check for uniqueness of the zsh msgid (needs some subtlety since obviously reuse is a good thing: presumably we need to check that the English text after the colon is the same in both cases)
  - install catalogues
  - manipulate (e.g. agglomerate) catalogues
  - list or query what translations are available
  - check catalogues for redundant translations

This is probably the biggest chunk of work. It would be OK at least initially to rely on the gettext utilities where possible, but I suspect that in many areas we're on our own: it looks like this hasn't been done before in a way that takes end-user requirements adequately into account (obviously I'd be interested in hearing otherwise).

-- 
Peter Stephenson
Web page now at http://homepage.ntlworld.com/p.w.stephenson/