* zsh/pcre has errors with unicode bytes
@ 2020-07-13 2:53 정누리
2020-07-13 14:02 ` Daniel Shahaf
2020-07-15 7:43 ` Jun T
0 siblings, 2 replies; 3+ messages in thread
From: 정누리 @ 2020-07-13 2:53 UTC (permalink / raw)
To: zsh-workers
Hi,
It looks like the zsh/pcre module in the current release (5.8) has a bug related to Unicode bytes.
When the locale is set to 'C' and a Unicode string is processed byte by byte, e.g.,
$ LC_ALL='C'
$ str='Hi😊'
$ for (( i = 1; i <= ${#str}; ++i )); do
    byte="$str[i]"
    ord=$(( [##16] #byte ))
    echo $ord
  done
>> 48
69
F0
9F
98
8A
$ for (( i = 1; i <= ${#str}; ++i )); do
    byte="$str[i]"
    [[ $byte -regex-match [a-zA-Z0-9] ]] && echo $byte || echo 'no match'
  done
>> H
i
no match
no match
no match
no match
$ for (( i = 1; i <= ${#str}; ++i )); do
    byte="$str[i]"
    [[ $byte -pcre-match [a-zA-Z0-9] ]] && echo $byte || echo 'no match'
  done
>> H
i
zsh: pcre_exec() error [-10]
no match
zsh: pcre_exec() error [-10]
no match
zsh: pcre_exec() error [-10]
no match
zsh: pcre_exec() error [-10]
no match
Thanks for reading.
* Re: zsh/pcre has errors with unicode bytes
From: Daniel Shahaf @ 2020-07-13 14:02 UTC (permalink / raw)
To: 정누리; +Cc: zsh-workers
정누리 wrote on Mon, 13 Jul 2020 11:53 +0900:
> $ LC_ALL='C'
> $ str='Hi😊'
> $ for (( i = 1; i <= ${#str}; ++i )); do
> byte="$str[i]"
> [[ $byte -pcre-match [a-zA-Z0-9] ]] && echo $byte || echo 'no match'
> done
> >> H
> i
> zsh: pcre_exec() error [-10]
From /usr/include/pcre.h on my system:
#define PCRE_ERROR_BADUTF8 (-10) /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF16 (-10) /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF32 (-10) /* Same for 8/16/32 */
So pcre expects its input to be valid UTF-8, despite the locale.
Actually, wait. We don't know what the locale is. I don't build PCRE,
but could you try that again with «export LC_ALL='C'» at the start?
If that doesn't force it to use ASCII, try unsetting the MULTIBYTE
option. See zpcre_utf8_enabled() (in Src/Modules/pcre.c).
Cheers,
Daniel
> no match
> zsh: pcre_exec() error [-10]
> no match
> zsh: pcre_exec() error [-10]
> no match
> zsh: pcre_exec() error [-10]
> no match
>
> Thanks for reading.
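To see why error -10 shows up for exactly the last four iterations, it may help to dump the bytes outside zsh and mark which ones are plain ASCII. The following is a minimal sketch using only POSIX tools; `hex_bytes` is a hypothetical helper name, and the string is built from octal escapes so the snippet does not depend on the editor's encoding. The lead byte F0 needs three continuation bytes after it, and 9F, 98 and 8A are continuation bytes, so none of the four is valid UTF-8 when handed to the matcher alone.

```shell
# Print each byte of the argument as uppercase hex, one per line.
# hex_bytes is an illustrative helper, not part of zsh or PCRE.
hex_bytes() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | sed '/^$/d' | tr 'abcdef' 'ABCDEF'
}

# 'Hi' followed by the four UTF-8 bytes of the emoji from the report.
str=$(printf 'Hi\360\237\230\212')

for b in $(hex_bytes "$str"); do
    case $b in
        [0-7]?) echo "$b ASCII" ;;
        *)      echo "$b not valid UTF-8 on its own" ;;
    esac
done
```

The hex values agree with the first loop in the original report: 48 and 69 are ASCII, and the remaining four bytes are the ones that trigger the error.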
* Re: zsh/pcre has errors with unicode bytes
From: Jun T @ 2020-07-15 7:43 UTC (permalink / raw)
To: zsh-workers
> 2020/07/13 11:53, 정누리 <jnooree@gmail.com> wrote:
>
> When the locale is set to 'C' and trying to process a unicode string byte-by-byte,
You also need to unset the multibyte option:
setopt nomultibyte
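Putting the two suggestions from this thread together, a test run might look like the sketch below. It is only a sketch: the wrapper function `pcre_byte_check` is a hypothetical name, the script is fed to a fresh `zsh -f` so it can be pasted into any POSIX shell, and it assumes a zsh built with the zsh/pcre module (it skips otherwise). Whether this silences error -10 on a given build has not been verified here, so no expected output is shown.

```shell
# Hypothetical wrapper: run the thread's suggestions in a clean zsh.
pcre_byte_check() {
    command -v zsh >/dev/null 2>&1 || { echo 'zsh not installed; skipping'; return 0; }
    zsh -f <<'EOF'
export LC_ALL=C        # exported, as Daniel suggested
setopt nomultibyte     # as Jun T suggested: count and index bytes
zmodload zsh/pcre 2>/dev/null || { print 'zsh/pcre not available; skipping'; exit 0 }
str=$'Hi\xf0\x9f\x98\x8a'   # the bytes of the string from the report
for (( i = 1; i <= ${#str}; ++i )); do
    byte=$str[i]
    if [[ $byte -pcre-match '[a-zA-Z0-9]' ]]; then
        print -r -- $byte
    else
        print 'no match'
    fi
done
EOF
}

pcre_byte_check
```

With MULTIBYTE unset, zpcre_utf8_enabled() in Src/Modules/pcre.c is the function to read to confirm whether the UTF-8 flag is still passed to PCRE.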