From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 2 May 2013 17:08:29 +0200
From: tlaronde@polynum.com
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20130502150829.GA435@polynum.com>
References: <20130502123825.GA1975@polynum.com> <20130502132556.GA2653@polynum.com>
Mime-Version: 1.0
In-Reply-To:
User-Agent: Mutt/1.4.2.3i
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [9fans] Octets regexp
Topicbox-Message-UUID: 50c797ec-ead8-11e9-9d60-3106f5b1d025

On Thu, May 02, 2013 at 10:58:30AM -0400, a@9srv.net wrote:
>
> i think the answer is just "no, there's no way to do that today".
> and i'd strongly advise keeping that tool as far away from any
> discussion of localization or character sets or runes or the like.
> there's oughtn't be any mode switching or the like: it's utf-8
> encoded unicode runes, or it's binary, not characters at all.
>

But that is exactly my point: to keep localization far from regexp,
with regexp simply taking a string of bytes and matching strings of
bytes. (For me, the main advantage of UTF-8 is not Unicode (UTF-8 could
survive as an encoding for something other than Unicode), but precisely
that it is still strings of octets, so that the system can be left
alone, far from localization.)

A side effect of tools not being Unicode (UTF-8) aware is that they can
be used on any string of bytes, since no interpretation is done. One
could even imagine using regexp to find a pattern in an image (even a
sed-like program, trying to match a first row pattern, and then checking
whether the following rows match some patterns too).
-- 
        Thierry Laronde
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95  FD89 250D 52B1 AE95 6006 F40C
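
To make the image idea concrete, here is a minimal sketch in Python (not Plan 9 C, and not an existing tool), assuming the image is already available as a list of raw byte rows. The name `find_block` and the sample data are hypothetical. Python's `re` module accepts byte patterns, so nothing is decoded as UTF-8 or Unicode at any point:

```python
import re

def find_block(rows, row_patterns):
    """Return (row, column) of the first position where consecutive rows
    match the given byte patterns at the same column, or None.

    Hypothetical helper: rows is a list of bytes objects, row_patterns a
    list of byte regexps, one per consecutive row.
    """
    # DOTALL so that the byte 0x0a is not treated specially by '.'
    compiled = [re.compile(p, re.DOTALL) for p in row_patterns]
    first, rest = compiled[0], compiled[1:]
    for r in range(len(rows) - len(compiled) + 1):
        pos = 0
        while pos <= len(rows[r]):
            m = first.search(rows[r], pos)
            if m is None:
                break
            col = m.start()
            # The following rows must match at the same column.
            if all(p.match(rows[r + 1 + i], col) for i, p in enumerate(rest)):
                return (r, col)
            pos = col + 1
    return None

# Sample "image": none of these rows is valid UTF-8, and it does not matter.
rows = [
    b"\x00\x00\xff\xfe\x00",
    b"\x00\xff\x80\x00\xff",
    b"\x00\x00\xff\xfe\x00",
    b"\x00\x00\x80\x80\x00",
]
print(find_block(rows, [rb"\xff\xfe", rb"\x80{2}"]))
```

The first row pattern is searched for anywhere in a row; the remaining patterns are only tried at the same column in the rows below, which is the "first row, then following rows" scheme described above.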