Date: Wed, 19 Sep 2012 14:49:26 -0400
From: Phil Pennock
To: zsh-workers@zsh.org
Subject: Re: PATCH: PCRE support for embedded NUL characters
Message-ID: <20120919184926.GA72712@redoubt.spodhuis.org>
In-Reply-To: <20120919192459.30480644@pws-pc.ntlworld.com>

On 2012-09-19 at 19:24 +0100, Peter Stephenson wrote:
> On Mon, 17 Sep 2012 01:59:25 -0400
> Phil Pennock wrote:
> > Anyone have any opinions of what ZPCRE_OP should count?
> >
> > It's currently documented as counting bytes, which is certainly
> > accurate, but seems not very useful given that you can't index strings
> > by byte offsets in zsh (or can you, with some option I've not noticed
> > before?)
>
> Ideally, it's bytes with MULTIBYTE unset, and code points with it set,
> i.e. what the mb/wc functions consider a "character".

Okay, good; when I get back to this, I'll harmonise all the outputs to be
by character, including that one, and update the docs.  Probably this
weekend, unless I need a mental code-switch break before then.

-Phil
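
To make the byte-versus-character distinction above concrete, here is a rough sketch in Python (not zsh's C code, and not how the pcre module computes ZPCRE_OP). It assumes UTF-8 input and byte offsets that fall on character boundaries; the helper name byte_to_char_offset is made up for illustration.

```python
def byte_to_char_offset(data: bytes, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset into `data` to a code-point offset.

    Hypothetical helper: assumes `byte_offset` lies on a character
    boundary, otherwise decode() raises UnicodeDecodeError.
    """
    return len(data[:byte_offset].decode(encoding))


subject = "naïve café"
raw = subject.encode("utf-8")
needle = "café".encode("utf-8")

# Offsets as a byte-oriented matcher would report them.
start_b = raw.index(needle)
end_b = start_b + len(needle)

# The same match expressed in code points, which is what string
# indexing by "character" needs.
start_c = byte_to_char_offset(raw, start_b)
end_c = byte_to_char_offset(raw, end_b)

print(start_b, end_b)            # 7 12
print(start_c, end_c)            # 6 10
print(subject[start_c:end_c])    # café
```

The byte offsets (7, 12) overshoot when used as character indices because "ï" and "é" each take two bytes in UTF-8; the converted offsets (6, 10) slice the string correctly.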