From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu,  2 May 2013 18:16:14 +0200
From: tlaronde@polynum.com
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20130502161614.GA1437@polynum.com>
References: <20130502123825.GA1975@polynum.com>
Mime-Version: 1.0
In-Reply-To: <20130502123825.GA1975@polynum.com>
User-Agent: Mutt/1.4.2.3i
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [9fans] Octets regexp
Topicbox-Message-UUID: 511657ec-ead8-11e9-9d60-3106f5b1d025

On Thu, May 02, 2013 at 02:38:25PM +0200, tlaronde@polynum.com wrote:
> Regexp(6) handles "characters" that are runes.
>

Answering myself: regexp deals with entities called "characters".
Some regexp constructs ('.', ranges, classes, etc.) apply to
"characters".

This means that the size of a character has to be known. One cannot,
for example, process UTF-8 input directly while ignoring that it is
UTF-8: '.' then matches a variable-size byte sequence, and where that
sequence starts depends on the bytes that came before it.

So a libregexp handling more than runes would be possible, but it would
need to be told the fixed size of the characters, i.e. the "encoding"
of the input (this has nothing to do with localization, only with what
counts as an elementary entity).

--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C