zsh-workers
* zsh/pcre has errors with unicode bytes
@ 2020-07-13  2:53 정누리
  2020-07-13 14:02 ` Daniel Shahaf
  2020-07-15  7:43 ` Jun T
  0 siblings, 2 replies; 3+ messages in thread
From: 정누리 @ 2020-07-13  2:53 UTC (permalink / raw)
  To: zsh-workers

Hi,

There appears to be a bug in the zsh/pcre module of the current release (5.8) when it is given non-ASCII bytes.
It shows up when the locale is set to 'C' and a Unicode string is processed byte by byte, e.g.:

$ LC_ALL='C'
$ str='Hi😊'
$ for (( i = 1; i <= ${#str}; ++i )); do                     
      byte="$str[i]"
      ord=$(( [##16] #byte ))                           
      echo $ord
  done
>> 48
   69
   F0
   9F
   98
   8A
$ for (( i = 1; i <= ${#str}; ++i )); do                     
      byte="$str[i]"                  
      [[ $byte -regex-match [a-zA-Z0-9] ]] && echo $byte || echo 'no match'
  done
>> H
   i
   no match
   no match
   no match
   no match
$ for (( i = 1; i <= ${#str}; ++i )); do                     
      byte="$str[i]"                  
      [[ $byte -pcre-match [a-zA-Z0-9] ]] && echo $byte || echo 'no match'
  done
>> H
   i
   zsh: pcre_exec() error [-10]
   no match
   zsh: pcre_exec() error [-10]
   no match
   zsh: pcre_exec() error [-10]
   no match
   zsh: pcre_exec() error [-10]
   no match
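
As a cross-check outside zsh, the byte values printed by the first loop can be reproduced with standard tools. A minimal sketch, assuming a POSIX printf and od; hex_bytes is a hypothetical helper name, and the emoji is spelled as octal escapes so the result does not depend on the terminal encoding:

```shell
# Print every byte read from standard input as two uppercase hex digits,
# one byte per line.
hex_bytes() {
    od -An -v -tx1 | tr -s ' \n' '\n' | sed '/^$/d' | tr 'a-f' 'A-F'
}

# 'Hi' followed by the UTF-8 encoding of U+1F60A (F0 9F 98 8A).
printf 'Hi\360\237\230\212' | hex_bytes
```

The emoji contributes four of the six bytes, so a byte-wise loop hands the last four to the matcher one at a time.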

Thanks for reading.


* Re: zsh/pcre has errors with unicode bytes
  2020-07-13  2:53 zsh/pcre has errors with unicode bytes 정누리
@ 2020-07-13 14:02 ` Daniel Shahaf
  2020-07-15  7:43 ` Jun T
  1 sibling, 0 replies; 3+ messages in thread
From: Daniel Shahaf @ 2020-07-13 14:02 UTC (permalink / raw)
  To: 정누리; +Cc: zsh-workers

정누리 wrote on Mon, 13 Jul 2020 11:53 +0900:
> $ LC_ALL='C'
> $ str='Hi😊'
> $ for (( i = 1; i <= ${#str}; ++i )); do                     
>       byte="$str[i]"                  
>       [[ $byte -pcre-match [a-zA-Z0-9] ]] && echo $byte || echo 'no match'
>   done
> >> H  
>    i
>    zsh: pcre_exec() error [-10]

From /usr/include/pcre.h on my system:

#define PCRE_ERROR_BADUTF8         (-10)  /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF16        (-10)  /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF32        (-10)  /* Same for 8/16/32 */

So PCRE expects the subject string to be valid UTF-8, despite the locale.
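
Error -10 is consistent with each byte of the emoji being tested on its own: F0 only starts a four-byte sequence, and 9F, 98 and 8A are continuation bytes, so none of them is a complete UTF-8 string. A rough illustration, assuming iconv is installed and using it only as a stand-in for a UTF-8 validity check (it is not what PCRE calls internally); is_valid_utf8 is a hypothetical helper:

```shell
# Succeeds only if standard input is well-formed UTF-8.
# Assumption: an iconv binary is available on PATH.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# 'H', a lone continuation byte (0x9F), and the full emoji (F0 9F 98 8A),
# written as octal escapes that printf expands.
for seq in 'H' '\237' '\360\237\230\212'; do
    if printf "$seq" | is_valid_utf8; then
        echo 'valid'
    else
        echo 'invalid'
    fi
done
```

The lone continuation byte should be rejected while the complete four-byte sequence is accepted, which matches the pattern of errors in the report.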

Actually, wait.  We don't know what the locale is.  I don't build PCRE,
but could you try that again with «export LC_ALL='C'» at the start?

If that doesn't force it to use ASCII, try unsetting the MULTIBYTE
option.  See zpcre_utf8_enabled() (in Src/Modules/pcre.c).

Cheers,

Daniel


>    no match
>    zsh: pcre_exec() error [-10]
>    no match
>    zsh: pcre_exec() error [-10]
>    no match
>    zsh: pcre_exec() error [-10]
>    no match
> 
> Thanks for reading.


* Re: zsh/pcre has errors with unicode bytes
  2020-07-13  2:53 zsh/pcre has errors with unicode bytes 정누리
  2020-07-13 14:02 ` Daniel Shahaf
@ 2020-07-15  7:43 ` Jun T
  1 sibling, 0 replies; 3+ messages in thread
From: Jun T @ 2020-07-15  7:43 UTC (permalink / raw)
  To: zsh-workers


> 2020/07/13 11:53, 정누리 <jnooree@gmail.com> wrote:
> 
> When the locale is set to 'C' and trying to process a unicode string byte-by-byte,

You also need to unset the multibyte option:

setopt nomultibyte


