From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Thu, 2 May 2013 08:48:06 -0400
To: 9fans@9fans.net
Message-ID:
In-Reply-To: <20130502123825.GA1975@polynum.com>
References: <20130502123825.GA1975@polynum.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] Octets regexp
Topicbox-Message-UUID: 505bc13e-ead8-11e9-9d60-3106f5b1d025

> Regexp(6) handles "characters" that are runes.

perhaps the man page is misleading.  rune in this context means
utf-8.  see regexp(2).  all the functions take char*s.

> I wonder if Plan9 developers, when trying to design a way towards some
> localization, have ever thought of bytes (octets) regexp, that is using
> regexp with not rune but octets strings (maybe UTF-8 as is) allowing to
> use regexp with binary too, not only newline terminated chunks etc.?

one of the points of plan 9 was to standardize on one character set,
utf-8.  imho, localization and character set aren't related unless
one is dealing with 8859-x overlays or some other character set
insufficient to represent the range of languages.

however, sam and acme allow for structured regular expressions, and
are generally not line oriented:

	http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf

and iirc, cinap has written a cifs bit that uses a bit of binary
matching.

- erik
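
[editor's sketch, not part of the original message] the difference between rune-oriented and octet-oriented matching discussed above can be illustrated roughly as follows. this uses python's `re` module rather than plan 9's regexp(2), so it is only an analogy: the sample strings and variable names are made up for illustration, and nothing here describes how the plan 9 library is implemented.

```python
import re

# Hypothetical sample text; 'ï' is one rune but two octets in UTF-8 (0xC3 0xAF).
text = "naïve"
data = text.encode("utf-8")

# Rune-oriented matching: '.' consumes one code point.
rune_match = re.search(r"na.ve", text)

# Octet-oriented matching: '.' consumes one byte, so a single '.'
# cannot span the two-byte encoding of 'ï'.
octet_match_one = re.search(rb"na.ve", data)
octet_match_two = re.search(rb"na..ve", data)

# Octet patterns also work on arbitrary binary data that is not valid UTF-8.
blob = b"ab\x00\xffcd"
binary_match = re.search(rb"\x00\xff", blob)

print(rune_match is not None)       # rune pattern matches the text
print(octet_match_one is None)      # one-byte wildcard is too short
print(octet_match_two is not None)  # two-byte wildcard matches
print(binary_match.start())         # offset of the binary sequence
```

the point of the sketch is only that the same pattern means different things depending on whether the unit of matching is a rune or an octet, which is the distinction the question was asking about.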