From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu,  2 May 2013 17:08:29 +0200
From: tlaronde@polynum.com
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20130502150829.GA435@polynum.com>
References: <20130502123825.GA1975@polynum.com>
	<f0b15ae0cf8846283eb0f5e513ef8684@kw.quanstro.net>
	<20130502132556.GA2653@polynum.com>
	<d2252bd65147f2caa38a56333366f813@9srv.net>
Mime-Version: 1.0
In-Reply-To: <d2252bd65147f2caa38a56333366f813@9srv.net>
User-Agent: Mutt/1.4.2.3i
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [9fans] Octets regexp
Topicbox-Message-UUID: 50c797ec-ead8-11e9-9d60-3106f5b1d025

On Thu, May 02, 2013 at 10:58:30AM -0400, a@9srv.net wrote:
>
> i think the answer is just "no, there's no way to do that today".
> and i'd strongly advise keeping that tool as far away from any
> discussion of localization or character sets or runes or the like.
> there oughtn't be any mode switching or the like: it's utf-8
> encoded unicode runes, or it's binary, not characters at all.
>

But that is exactly my point: to keep localization far from regexp.
Regexp would simply take a string of bytes and match strings of bytes.
(The main advantage of UTF-8 is not, for me, Unicode (UTF-8 could
survive as an encoding for something other than Unicode), but
precisely that it is still strings of octets, and that the system
can be left alone, far from localization.)

It is a side effect of tools not being Unicode (UTF-8) aware that they
can be used with any string of bytes, since no interpretation is done.
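
To illustrate what matching on plain octets looks like, here is a
minimal sketch in Python (not a Plan 9 tool; Python's re module happens
to accept bytes patterns and then does no decoding). The sample data and
the name find_octets are made up for the example.

```python
import re

# Hypothetical sample: octets that are not valid UTF-8
# (0xff and 0xfe never occur in a UTF-8 stream).
data = b"\xff\xfeabc\x00\x80\x81abc\xff"

# A bytes pattern makes the re module match octet by octet,
# with no decoding step and no notion of characters.
pattern = re.compile(rb"abc[\x00-\x7f][\x80-\xff]+")

def find_octets(pat, buf):
    """Return the (start, end) offsets of every match of pat in buf."""
    return [m.span() for m in pat.finditer(buf)]

print(find_octets(pattern, data))  # prints [(2, 8)]
```

The second "abc" is not reported because it is followed by 0xff, which
falls outside the [\x00-\x7f] class.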

One could even imagine using regexp to find a pattern in an image (even
a sed-like program, trying to match a first-row pattern, and then
checking whether the following rows match their patterns too).
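
A rough sketch of that row-by-row idea, again in Python and under
assumptions of my own: the image is a list of rows with one octet per
pixel, and find_block is a hypothetical helper, not an existing program.

```python
import re

def find_block(rows, row_patterns):
    """Return (row, col) positions where row_patterns[0] matches in a row
    and each following pattern matches at the same column in the rows below."""
    compiled = [re.compile(p) for p in row_patterns]
    hits = []
    for r in range(len(rows) - len(compiled) + 1):
        # Try every column, so overlapping candidates are not skipped.
        for c in range(len(rows[r])):
            if all(compiled[i].match(rows[r + i], c)
                   for i in range(len(compiled))):
                hits.append((r, c))
    return hits

# Hypothetical 4x6 grayscale image, one octet per pixel.
image = [
    b"\x00\x00\x00\x00\x00\x00",
    b"\x00\xff\xff\x00\x00\x00",
    b"\x00\xff\x10\x00\x00\x00",
    b"\x00\x00\x00\x00\x00\x00",
]

# First row: two white pixels; row below: a white pixel, then a dark one.
patterns = [rb"\xff\xff", rb"\xff[\x00-\x7f]"]

print(find_block(image, patterns))  # prints [(1, 1)]
```

Anchoring the later rows at the column of the first-row match is one
possible choice; a looser search could accept any column in the rows
below.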

--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C