From: Peter Stephenson <Peter.Stephenson@csr.com>
To: <zsh-workers@zsh.org>
Subject: Re: PATCH: PCRE support for embedded NUL characters
Date: Tue, 18 Sep 2012 09:51:45 +0100
Message-ID: <20120918095145.76dabc4b@pwslap01u.europe.root.pri>
In-Reply-To: <20120917190422.GA41017@redoubt.spodhuis.org>

On Mon, 17 Sep 2012 15:04:23 -0400
Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> Yeah, but correlating offsets in unmetafied strings to the metafied
> strings for then counting is non-trivial (or so it seems to me).

It's not so difficult: we already do most of this conversion for other
similar cases of pattern matching, where we need to convert offsets in
octets to characters, the only difference being the metafication which
just means the loop over the characters is slightly different.  In fact,
if anything it's marginally easier, since the metafication is a pure zsh
invention.
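
As a rough illustration of the offset correlation, here is a minimal sketch,
not code from the zsh tree.  It assumes zsh's usual convention that a
metafied byte is stored as the marker 0x83 followed by the original byte
XORed with 32; the names META_BYTE and unmeta_to_meta_offset() are
hypothetical.

```c
#include <stddef.h>

/* Assumption: the Meta marker byte is 0x83, followed by (byte ^ 32). */
#define META_BYTE 0x83

/*
 * Hypothetical helper: given a metafied, NUL-terminated string and a byte
 * offset into its unmetafied form, return the matching byte offset in the
 * metafied string.  Stops early if the string ends first.
 */
static size_t
unmeta_to_meta_offset(const char *meta, size_t unmeta_off)
{
    const char *p = meta;

    while (unmeta_off > 0 && *p) {
        if ((unsigned char)*p == META_BYTE && p[1])
            p++;        /* skip the marker; the next byte is the payload */
        p++;            /* one unmetafied byte consumed */
        unmeta_off--;
    }
    return (size_t)(p - meta);
}
```

For example, with the metafied string "a\x83 b" (an 'a', a metafied NUL, and
a 'b'), unmetafied offset 2 corresponds to metafied offset 3.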

> And wcwidth() tells how many display cells are needed for a given
> character, assuming a monospace layout.  For this, instead, mblen() is
> needed, on a character-by-character basis.  Given that mblen() is C99, I
> opted to avoid it, and implement this just for UTF-8 with bit-pattern
> examination to quickly count past characters.  We only initialise PCRE
> for wide characters with UTF-8.  I've no idea how much effort we want to
> put into supporting non-UTF-8 wide-character PCRE across multiple OSes.

Doing it just for UTF-8 is incompatible with the rest of the shell.  It
should be possible to do it similarly to mb_metastrlen() in utils.c.
Basically the only difference is using an explicit length rather than null
termination, plus not having an internal test for Meta characters.
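
A minimal sketch of what such a counter might look like, assuming an
unmetafied buffer with an explicit length and the standard mbrtowc()
interface.  The name mb_countlen() is hypothetical and the error handling
is simplified; this is not the code in utils.c.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the multibyte characters in the first len
 * bytes of an unmetafied buffer, using the current locale.  There is no
 * NUL termination and no test for Meta characters.
 */
static size_t
mb_countlen(const char *ptr, size_t len)
{
    mbstate_t mbs;
    size_t nchars = 0;

    memset(&mbs, 0, sizeof(mbs));
    while (len > 0) {
        wchar_t wc;
        size_t ret = mbrtowc(&wc, ptr, len, &mbs);

        if (ret == (size_t)-1 || ret == (size_t)-2) {
            /* Invalid or incomplete sequence: count one byte, reset state. */
            memset(&mbs, 0, sizeof(mbs));
            ret = 1;
        } else if (ret == 0) {
            /* An embedded NUL; assumed here to occupy a single byte. */
            ret = 1;
        }
        ptr += ret;
        len -= ret;
        nchars++;
    }
    return nchars;
}
```

Because the loop is driven by len rather than by a terminator, an embedded
NUL is counted like any other character.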

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK



