X-Seq: 27903
Date: Sun, 25 Apr 2010 06:19:44 -0700
From: Phil Pennock
To: Frank Terbeck
Cc: zsh-workers@zsh.org
Subject: Re: vcs_info and locales
Message-ID: <20100425131944.GA55789@redoubt.spodhuis.org>
References: <20100424234017.776ae0ea@coriolan> <87aassncyk.fsf@ft.bewatermyfriend.org>
In-Reply-To: <87aassncyk.fsf@ft.bewatermyfriend.org>

On 2010-04-25 at 10:38 +0200, Frank Terbeck wrote:
> Anyway, could you try the following patch for the locale problem? I
> think it should solve the issue once and for all.

I have one concern, which leads to the question: is it really necessary
to set LC_ALL instead of LC_MESSAGES?

The main problem is that when you override LC_CTYPE to C, you lose any
potential UTF-8 support, unless the tool just passes through the binary
data.

I think the safest algorithm is not to set LC_ALL but instead:
 * set LC_MESSAGES=C
 * if LC_ALL is set and is not C, set LANG=$LC_ALL, unset LC_ALL

Make sense?

Rest of this email is just some exploration and skippable.

I don't have NLS support on my main box, or I could do more testing
myself; with { svn log }, where most of my UTF-8 shows, LC_CTYPE=C leads
to expressing the content with escapes instead of cleanly. I know
VCS_Info doesn't use that, I mention it by way of example.

{ svn info } by contrast always percent-encodes those characters; this
works anyway, because VCS_Info walks back up the dir-tree to find the
svn co dir, so has the relative info by comparing the FS realpath'd root
of the repo to the current dir.

  URL: https://svn.spodhuis.org/ksvn/scratch/Fran%C3%A7ois
  -> VCS_Info %S == François
(yes, I picked the OP's name as example testdata)

For experimentation, I created a repo with a UTF-8 character in its
name. Apache/mod_dav_svn won't serve it:
  (20014)Internal error: Can't convert string from 'UTF-8' to native encoding: [...]
but I can use file:/// access instead. A repo named foo-☺ appears in my
prompt as ().

And still VCS_Info works:

  URL: file:///home/pdp/tmp/T/ROOT/foo-%E2%98%BA/fred
  Repository Root: file:///home/pdp/tmp/T/ROOT/foo-%E2%98%BA
  -> VCS_Info %S == fred
  pwd -> ..../T/foo-☺/fred

  URL: file:///home/pdp/tmp/T/ROOT/foo-%E2%98%BA/%E2%99%A1
  Repository Root: file:///home/pdp/tmp/T/ROOT/foo-%E2%98%BA
  -> VCS_Info %S == ♡
  pwd -> ..../T/foo-☺/♡

I ♡ VCS_Info for just working, but it's still juju. It also works as an
accidental artifact of the VCS_INFO_get_data_svn implementation. I get
to say "accidental" because apparently I wrote that code. *scratches
head*

(Through all this, cd gets interesting when xtitle updates to iTerm
silently drop the UTF-8 characters through to the display)
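
For concreteness, the two-step locale algorithm proposed near the top of
this mail (set LC_MESSAGES=C; demote a non-C LC_ALL to LANG) could be
sketched roughly as below. This is only an illustration under my own
assumptions, not what vcs_info does: the function name
run_with_c_messages is hypothetical, and the subshell is there just so
the caller's environment is left untouched.

```shell
# Hypothetical helper, not part of vcs_info: run a command with
# untranslated (C) messages while keeping the user's character encoding.
run_with_c_messages() {
    (
        # LC_ALL overrides every other locale variable, LC_MESSAGES
        # included, so a non-C LC_ALL has to be demoted to LANG first.
        if [ -n "${LC_ALL:-}" ] && [ "$LC_ALL" != C ]; then
            LANG=$LC_ALL
            export LANG
            unset LC_ALL
        fi
        # Only the message catalogue is forced to C; LC_CTYPE is left
        # to whatever LANG (or an explicit LC_CTYPE) says.
        LC_MESSAGES=C
        export LC_MESSAGES
        "$@"
    )
}
```

Usage would be something like { run_with_c_messages svn info }. One
caveat of the approach: since LANG has the lowest precedence, any LC_*
variable that was set individually and previously masked by LC_ALL takes
effect again once LC_ALL is unset.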