* printf %<n>s in UTF-8 is not always POSIX-compliant
From: Vincent Lefevre @ 2012-02-15  2:15 UTC
To: zsh-workers

Hi,

I've reported the following bug:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=659932

In UTF-8 locales:

xvii% printf ".%2s.\n" é
. é.
xvii% emulate sh
xvii% printf ".%2s.\n" é
.é.
xvii% emulate ksh
xvii% printf ".%2s.\n" é
. é.

It is correct in sh mode (according to POSIX[*]), but not in ksh mode,
which should also follow the POSIX behavior. What about zsh mode?

[*] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
and http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05
for %<n>s.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
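The byte-based reading of %<n>s described in the report can be illustrated without relying on any particular shell's printf builtin. The following is only a minimal sketch in POSIX sh, not code from zsh or from the bug report: the helper names byte_len and pad_bytes are hypothetical, and é is built from its two UTF-8 bytes (octal 303 251) so that the result does not depend on the encoding of the script file.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of zsh, ksh or POSIX.

# byte_len STRING: number of bytes in STRING, whatever the locale is.
byte_len() {
    # wc -c counts bytes; tr strips the padding some wc versions print.
    printf '%s' "$1" | wc -c | tr -d ' '
}

# pad_bytes WIDTH STRING: right-justify STRING in a field of WIDTH bytes,
# i.e. the byte-counting interpretation of %<WIDTH>s.
pad_bytes() {
    _width=$1
    _str=$2
    _len=$(byte_len "$_str")
    while [ "$_len" -lt "$_width" ]; do
        _str=" $_str"
        _len=$((_len + 1))
    done
    printf '%s' "$_str"
}

e_acute=$(printf '\303\251')    # the two UTF-8 bytes of é

printf '.%s.\n' "$(pad_bytes 2 "$e_acute")"   # prints .é.   (already 2 bytes wide)
printf '.%s.\n' "$(pad_bytes 4 "$e_acute")"   # prints .  é. (two bytes of padding)
```

With a width of 2 no padding is added, which matches the sh-mode output shown in the report; a character-counting implementation would add one space instead.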
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Bart Schaefer @ 2012-02-15  8:14 UTC
To: zsh-workers

On Feb 15,  3:15am, Vincent Lefevre wrote:
}
} In UTF-8 locales:
}
} xvii% printf ".%2s.\n" é
} .é.

Am I understanding correctly that the intent here is that é is a
two-byte character so %2s should print the two literal bytes, rather
than print the single logical character in a field two logical
characters wide?

The reason it's different for "emulate sh" is that sh emulation turns
off all support for multibyte characters (unsetopt multibyte). If you
were to do
    emulate sh -c 'setopt multibyte; printf ".%2s.\n" é'
then I believe you'd see the same behavior as with "emulate ksh".

As to whether it's correct ... I think I'd prefer the logical rather
than literal interpretation, but it'll be difficult [or a hack that
requires looking at the global emulation state, so it won't be possible
to reproduce it with plain setopts] to turn off multibyte processing in
printf for ksh emulation but not native zsh.
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Vincent Lefevre @ 2012-02-15  9:10 UTC
To: zsh-workers

On 2012-02-15 00:14:12 -0800, Bart Schaefer wrote:
> On Feb 15,  3:15am, Vincent Lefevre wrote:
> }
> } In UTF-8 locales:
> }
> } xvii% printf ".%2s.\n" é
> } .é.
>
> Am I understanding correctly that the intent here is that é is a two-
> byte character so %2s should print the two literal bytes, rather than
> print the single logical character in a field two logical characters
> wide?

Yes, the number is the size in bytes, not in characters. I think that
the intent is to deal with internal structures (e.g. with file formats
where some fields have a fixed or limited size, and the same syntax
can be used in C to avoid buffer overflows).

Note that there's the same problem with:

xvii% printf ".%.3s.\n" éabcd
.éab.
xvii% emulate ksh
xvii% printf ".%.3s.\n" éabcd
.éab.
xvii% emulate sh
xvii% printf ".%.3s.\n" éabcd
.éa.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
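The precision case (%.<n>s) can be sketched the same way. The snippet below is an illustration under stated assumptions, not the way any shell implements printf: trunc_bytes is a hypothetical helper, it uses cut -b to keep the first N bytes, it assumes N is at least 1 and that the string contains no newline, and é is again built from its UTF-8 bytes (octal 303 251).

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of zsh, ksh or POSIX.

# trunc_bytes N STRING: keep at most the first N bytes of STRING,
# i.e. the byte-counting interpretation of %.<N>s.
# Assumes N >= 1 and a single-line STRING (cut works line by line).
trunc_bytes() {
    printf '%s\n' "$2" | cut -b "1-$1"
}

s=$(printf '\303\251abcd')      # "éabcd": 5 characters, 6 bytes in UTF-8

printf '.%s.\n' "$(trunc_bytes 3 "$s")"   # prints .éa.
```

Three bytes cover é (two bytes) plus a, which is the sh-mode result quoted above; counting characters instead would give éab. Note that a byte limit can also fall in the middle of a multibyte character (for example a limit of 1 here), leaving an incomplete sequence in the output.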
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Peter Stephenson @ 2012-02-15 11:05 UTC
To: zsh-workers

On Wed, 15 Feb 2012 00:14:12 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> The reason it's different for "emulate sh" is that sh emulation turns
> off all support for multibyte characters (unsetopt multibyte). If you
> were to do
>     emulate sh -c 'setopt multibyte; printf ".%2s.\n" é'
> then I believe you'd see the same behavior as with "emulate ksh".
>
> As to whether it's correct ... I think I'd prefer the logical rather
> than literal interpretation, but it'll be difficult [or a hack that
> requires looking at the global emulation state, so it won't be possible
> to reproduce it with plain setopts] to turn off multibyte processing in
> printf for ksh emulation but not native zsh.

This sounds correct... We've never promised ksh mode would be a complete
representation of ksh anyway. I realise that, for historical reasons
related to standards rather than zsh, you'd expect ksh mode to be POSIX
compatible, but actually we don't tend to bother because ksh mode isn't
that widely used and so doesn't get a lot of attention (I certainly
never use it). If you really want compatibility native zsh mode or sh
mode are the sensible choices.

So probably the fix is to spread fear, uncertainty and doubt about ksh
mode. I'll start right now.

If there is a hard-core ksh mode user who'd like to maintain it, of
course, that's another story.

--
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK

Member of the CSR plc group of companies. CSR plc registered in England
and Wales, registered number 4187346, registered office Churchill House,
Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom
More information can be found at www.csr.com. Follow CSR on Twitter at
http://twitter.com/CSR_PLC and read our blog at www.csr.com/blog
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Vincent Lefevre @ 2012-02-15 11:53 UTC
To: zsh-workers

On 2012-02-15 11:05:19 +0000, Peter Stephenson wrote:
> This sounds correct... We've never promised ksh mode would be a complete
> representation of ksh anyway. I realise that, for historical reasons
> related to standards rather than zsh, you'd expect ksh mode to be POSIX
> compatible, but actually we don't tend to bother because ksh mode isn't
> that widely used and so doesn't get a lot of attention (I certainly
> never use it). If you really want compatibility native zsh mode or sh
> mode are the sensible choices.

The problem is that on some machines, one has a symlink ksh -> zsh.
If I type ksh or run a script with #!/usr/bin/ksh, I expect this to
behave as a real ksh.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Frank Terbeck @ 2012-02-15 12:09 UTC
To: zsh-workers

Vincent Lefevre wrote:
> On 2012-02-15 11:05:19 +0000, Peter Stephenson wrote:
>> This sounds correct... We've never promised ksh mode would be a complete
>> representation of ksh anyway.
[...]
> The problem is that on some machines, one has a symlink ksh -> zsh.
> If I type ksh or run a script with #!/usr/bin/ksh, I expect this to
> behave as a real ksh.

Frankly, that would be the vendor's fault then. There are many *MANY*
ksh implementations, that make for a reasonable link target (ksh93,
pdksh or mksh - to name just a few). Zsh is not one of them.

IMHO, ksh-emulation is a little bit like csh emulation: It's meant to
make users with ksh background feel more "at home", not as a strict
bug-for-bug emulation.

Regards, Frank
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Peter Stephenson @ 2012-02-15 12:23 UTC
To: zsh-workers

On Wed, 15 Feb 2012 13:09:17 +0100
Frank Terbeck <ft@bewatermyfriend.org> wrote:
> IMHO, ksh-emulation is a little bit like csh emulation: It's meant to
> make users with ksh background feel more "at home", not as a strict
> bug-for-bug emulation.

That's basically how I see it.

It doesn't mean we can't do better --- but I don't think we can do
better by people who don't really use the mode initiating random tweaks
in the hope that the world becomes a better place. We really would need
someone who is in a position to take a global view of how changes to
the mode affect the emulation.

This is a rather different case from POSIX emulation, where there's
(i) a standard (ii) quite a lot of visibility of what the effect of
changes are.

--
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Vincent Lefevre @ 2012-02-15 12:42 UTC
To: zsh-workers

On 2012-02-15 13:09:17 +0100, Frank Terbeck wrote:
> Frankly, that would be the vendor's fault then. There are many *MANY*
> ksh implementations, that make for a reasonable link target (ksh93,
> pdksh or mksh - to name just a few). Zsh is not one of them.

OK, bug reported.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Oliver Kiddle @ 2012-02-15 14:42 UTC
To: zsh-workers

Bart wrote:
>
> Am I understanding correctly that the intent here is that é is a two-
> byte character so %2s should print the two literal bytes, rather than
> print the single logical character in a field two logical characters
> wide?

That's correct. The POSIX definition uses bytes. For multibyte
behaviour, there is an L modifier. I don't really see the sense in it
myself: I don't want to write low-level stuff in the shell.

Frank Terbeck wrote:
> Frankly, that would be the vendor's fault then. There are many *MANY*
> ksh implementations, that make for a reasonable link target (ksh93,
> pdksh or mksh - to name just a few). Zsh is not one of them.

The fact that zsh is far from a perfect emulation doesn't stop it from
being useful. I don't necessarily want to install a separate ksh package
and zsh will run ksh scripts at least as well as pdksh.

Oliver
* Re: printf %<n>s in UTF-8 is not always POSIX-compliant
From: Vincent Lefevre @ 2012-02-15 14:56 UTC
To: zsh-workers

On 2012-02-15 15:42:15 +0100, Oliver Kiddle wrote:
> Bart wrote:
> > Am I understanding correctly that the intent here is that é is a two-
> > byte character so %2s should print the two literal bytes, rather than
> > print the single logical character in a field two logical characters
> > wide?
>
> That's correct. The POSIX definition uses bytes. For multibyte
> behaviour, there is an L modifier. I don't really see the sense in it
> myself: I don't want to write low-level stuff in the shell.

I think that's for consistency with C. Also, the shell could then be
used as a front-end to test string-related things.

> Frank Terbeck wrote:
> > Frankly, that would be the vendor's fault then. There are many *MANY*
> > ksh implementations, that make for a reasonable link target (ksh93,
> > pdksh or mksh - to name just a few). Zsh is not one of them.
>
> The fact that zsh is far from a perfect emulation doesn't stop it from
> being useful. I don't necessarily want to install a separate ksh package
> and zsh will run ksh scripts at least as well as pdksh.

But then the emulation should be correct.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)