zsh-users
* filename completion with umlauts (again)
@ 2011-01-06 23:27 Andy Spiegl
  2011-01-07  0:05 ` Mikael Magnusson
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Spiegl @ 2011-01-06 23:27 UTC
  To: zsh-users

Hi, back in 2003 (zsh 4.0.4) there was a completion bug with filenames
that have German umlauts in them.

 Date: Mon, 20 Jan 2003 10:29:23 -0500
 From: Andy Spiegl <zsh.Andy@spiegl.de>
 To: ZSH User List <zsh-users@sunsite.dk>
 Subject: How to do filename completion with umlauts?

It was fixed in 4.0.6 but somehow came back now.  I can't tell exactly
when but probably when I switched from latin1 to utf-8 (LANG=de_DE.UTF-8).

Here is the full description:

 $ zstyle ":completion:*" matcher-list '' 'm:{A-ZÄÖÜa-zäöü}={a-zäöüA-ZÄÖÜ}'
 $ touch foo füü
 $ ls fO<TAB>
 $ ls foo
but
 $ ls fÜ<TAB>
just beeps.

Ctrl-x h instead of TAB shows:
tags in context :completion::complete:ls::
    argument-rest options  (_arguments _ls (eval))
tags in context :completion::complete:ls:argument-rest:
    all-files  (_files _arguments _ls (eval))

Back in 2003, Oliver Kiddle asked me to try this:
> Can you try just using a simple function like this:
> _foo () {
> compadd -M 'm:{A-Zöäüa-zÖÄÜ}={a-zÖÄÜA-Zöäü}' - Ö123 Ä123 A567 Ü987
> }
> compdef _foo foo
> foo ä<tab>
I tried the same thing today but "foo ä<tab>" just beeps instead of
giving me "foo Ä123".


My locale settings are:
$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=


Thanks for any hint,
 Andy.


-- 
 Why do we put suits in a garment bag and put garments in a suitcase?



* Re: filename completion with umlauts (again)
  2011-01-06 23:27 filename completion with umlauts (again) Andy Spiegl
@ 2011-01-07  0:05 ` Mikael Magnusson
  2011-01-07  9:44   ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Mikael Magnusson @ 2011-01-07  0:05 UTC
  To: zsh-users

On 7 January 2011 00:27, Andy Spiegl <zsh.Andy@spiegl.de> wrote:
> Hi, back in 2003 (zsh 4.0.4) there was a completion bug with filenames
> that have German umlauts in them.
>
>  Date: Mon, 20 Jan 2003 10:29:23 -0500
>  From: Andy Spiegl <zsh.Andy@spiegl.de>
>  To: ZSH User List <zsh-users@sunsite.dk>
>  Subject: How to do filename completion with umlauts?
>
> It was fixed in 4.0.6 but somehow came back now.  I can't tell exactly
> when but probably when I switched from latin1 to utf-8 (LANG=de_DE.UTF-8).

The answer is unfortunately very simple: matcher-list doesn't support
multibyte encodings yet.

-- 
Mikael Magnusson



* Re: filename completion with umlauts (again)
  2011-01-07  0:05 ` Mikael Magnusson
@ 2011-01-07  9:44   ` Peter Stephenson
  2011-01-07 23:35     ` Andy Spiegl
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Stephenson @ 2011-01-07  9:44 UTC
  To: zsh-users

On Fri, 7 Jan 2011 01:05:24 +0100
Mikael Magnusson <mikachu@gmail.com> wrote:
> On 7 January 2011 00:27, Andy Spiegl <zsh.Andy@spiegl.de> wrote:
> > Hi, back in 2003 (zsh 4.0.4) there was a completion bug with
> > filenames that have German umlauts in them.
> >
> >  Date: Mon, 20 Jan 2003 10:29:23 -0500
> >  From: Andy Spiegl <zsh.Andy@spiegl.de>
> >  To: ZSH User List <zsh-users@sunsite.dk>
> >  Subject: How to do filename completion with umlauts?
> >
> > It was fixed in 4.0.6 but somehow came back now.  I can't tell
> > exactly when but probably when I switched from latin1 to utf-8
> > (LANG=de_DE.UTF-8).
> 
> The answer is unfortunately very simple: matcher-list doesn't support
> multibyte encodings yet.

Yes, I'm afraid that's the last big reason why I haven't released the
multibyte-enabled shell as a stable version yet.  However, it's a big
job in underdocumented code, so I'm not going to wait any more.
Possibly someone will have time and patience for it one day.

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK


Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom



* Re: filename completion with umlauts (again)
  2011-01-07  9:44   ` Peter Stephenson
@ 2011-01-07 23:35     ` Andy Spiegl
  2011-01-08  7:10       ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Spiegl @ 2011-01-07 23:35 UTC
  To: zsh-users

On 2011-01-07, 09:44, Peter Stephenson wrote:
> Yes, I'm afraid that's the last big reason why I haven't released the
> multibyte-enabled shell as a stable version yet.  However, it's a big
> job in underdocumented code, so I'm not going to wait any more.
> Possibly someone will have time and patience for it one day.

Uhm, too bad.  I am wondering whether case insensitivity in the
matcher could be achieved with a different trick?

Maybe something like this: Bart Schaefer once sent me a nifty function
that replaces a leading 7 with a slash (a common typing mistake on German
keyboards):

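function _7slash {
   # (#b) turns on backreferences, so the text after the leading 7 ends up
   # in $match[1]; (#e) anchors the match at the end of the word.  Needs
   # extendedglob, which the completion system normally enables.
   if [[ $words[CURRENT] = 7(#b)(*)(#e) ]]
   then
     # -U adds the candidate without matching it against the typed word,
     # -X sets the explanation text, -f marks the candidate as a filename.
     compadd -U -X 'Correct leading 7 to /' -f /$match[1]
   fi
}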

But no, I guess case insensitivity is a lot more complex, right?

Thanks,
 Andy.

-- 
 Whenever you meet yourself you're in a time loop or in front of a mirror.



* Re: filename completion with umlauts (again)
  2011-01-07 23:35     ` Andy Spiegl
@ 2011-01-08  7:10       ` Bart Schaefer
  2011-01-08 20:21         ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2011-01-08  7:10 UTC
  To: zsh-users

On Jan 8, 12:35am, Andy Spiegl wrote:
}
} Uhm, too bad.  I am wondering whether case insensitivity in the
} matcher could be achieved with a different trick?

As I understand it, the problem isn't case insensitivity.  The problem
is (a) representing each set of characters in a manageable syntax and
(b) efficiently constructing a mapping between the two sets.

This is a tractable problem for single byte characters because there
is a single fixed ordering and no more than 256 values in each set; for
multibyte characters, not only is the number of values much larger,
but also the user-expected collating order is not always the same as
the numeric order of the underlying encoding.
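
To illustrate the collating-order point, here is a small sketch, assuming
a POSIX sort and printf.  The helper name sort_by_bytes is made up for
this example, and the octal escapes \303\244 spell "ä" in UTF-8 so that
the snippet does not depend on how this message is encoded.  In the C
locale sort compares raw byte values, so "ä" lands after "z", whereas a
German locale would normally place it next to "a":

```shell
# Hypothetical helper: sort three one-letter lines by raw byte value.
# \303\244 is the UTF-8 encoding of "ä".
sort_by_bytes() {
  printf 'z\n\303\244\na\n' | LC_ALL=C sort
}

sort_by_bytes   # prints a, then z, then ä, one per line
```

A range such as {a-zäöü} therefore has no single obvious ordering once
the characters are no longer one byte each.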

(And now I fully expect someone to point out that I've got that entirely
wrong and the trouble really is something else.)

} But no, I guess case insensitivity is a lot more complex, right?

If all you want is bidirectional case insensitivity, you might be
able to construct a completer based on _approximate that inserts a
(#i) into the value of PREFIX in the way _approximate does for (#a).
Then add that completer to your completers zstyle.

This is considerably less powerful than what matcher-list does.



* Re: filename completion with umlauts (again)
  2011-01-08  7:10       ` Bart Schaefer
@ 2011-01-08 20:21         ` Peter Stephenson
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Stephenson @ 2011-01-08 20:21 UTC
  To: zsh-users

On Fri, 07 Jan 2011 23:10:48 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jan 8, 12:35am, Andy Spiegl wrote:
> }
> } Uhm, too bad.  I am wondering whether case insensitivity in the
> } matcher could be achieved with a different trick?
> 
> As I understand it, the problem isn't case insensitivity.  The problem
> is (a) representing each set of characters in a manageable syntax and
> (b) efficiently constructing a mapping between the two sets.
> 
> This is a tractable problem for single byte characters because there
> is a single fixed ordering and no more than 256 values in each set; for
> multibyte characters, not only is the number of values much larger,
> but also the user-expected collating order is not always the same as
> the numeric order of the underlying encoding.
> 
> (And now I fully expect someone to point out that I've got that entirely
> wrong and the trouble really is something else.)

The remaining problem is the multibyte one; the matcher code is heavily
tied to one character per array position in a way that doesn't make it
easy to turn multibyte into wide characters and back (and that doesn't
always make it obvious what the @*!@! it's actually doing with the
array).
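
As a rough illustration of why "one character per array position" breaks
down, the sketch below counts the bytes used for "ü" in each encoding.
It assumes only POSIX printf, wc and tr; the helper name byte_count is
invented for this example and has nothing to do with the zsh source.

```shell
# Hypothetical helper: count the bytes printf emits for an escape sequence.
byte_count() {
  printf "$1" | wc -c | tr -d ' '
}

byte_count '\374'       # "ü" in Latin-1: prints 1
byte_count '\303\274'   # "ü" in UTF-8:   prints 2
```

In Latin-1 every array slot holds one whole character; in UTF-8 the same
letter is spread over two slots, so code that indexes by byte no longer
lines up with what the user typed.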

The collating order might be potentially a problem if you use literal
characters, but that's already fixed in a general way by allowing the
syntax:

  m:{[:upper:][:lower:]}={[:lower:][:upper:]}

and similar --- basically, any use of {...} allows matching lower and
upper characters generically.

This already works for single byte locales using future-proof library
calls (i.e. things like iswupper() that operate on wide characters);
hence I'm reasonably confident that once we fix the multibyte problem
(if ever) the rest should fall naturally into place.
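
For reference, a minimal sketch of how that class-based spec might be
used in a startup file, reusing the matcher-list style from the original
report.  The ZSH_VERSION guard is an addition for this example so that
the fragment is harmless if another shell sources it.  Whether it also
covers ä/Ä in a UTF-8 locale depends on the multibyte problem described
above being fixed in the zsh version in use.

```shell
# ~/.zshrc fragment (sketch only).
if [ -n "${ZSH_VERSION:-}" ]; then
  # Try an exact match first (''), then fall back to matching upper
  # against lower case and vice versa via character classes instead of
  # literal ranges.
  zstyle ':completion:*' matcher-list '' \
    'm:{[:upper:][:lower:]}={[:lower:][:upper:]}'
fi
```

Because the classes are resolved through the locale, no literal umlauts
need to appear in the spec at all.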

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


