* [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
@ 2007-02-22 22:16 Folkert van Heusden
2007-02-22 23:17 ` Alberto Cortés
2007-02-22 23:21 ` William Josephson
0 siblings, 2 replies; 12+ messages in thread
From: Folkert van Heusden @ 2007-02-22 22:16 UTC (permalink / raw)
To: 9fans
Hi,
A user of a program of mine (http://www.vanheusden.com/multitail/) tries
to use plan9 regexps under linux and doesn't succeed.
Am I right that plan9 regular expressions are not compatible with the
ones of "regular" unix?
Folkert van Heusden
--
www.vanheusden.com/multitail - multitail is tail on steroids. multiple
windows, filtering, coloring, anything you can think of
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
2007-02-22 22:16 [9fans] regular expressions in plan9 different from the ones in unix? (at least linux) Folkert van Heusden
@ 2007-02-22 23:17 ` Alberto Cortés
2007-02-22 23:21 ` William Josephson
1 sibling, 0 replies; 12+ messages in thread
From: Alberto Cortés @ 2007-02-22 23:17 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Folkert van Heusden said:
> Hi,
>
> A user of a program of mine (http://www.vanheusden.com/multitail/) tries
> to use plan9 regexps under linux and doesn't succeed.
> Am I right that plan9 regular expressions are not compatible with the
> ones of "regular" unix?
They are different. I am not sure what you mean by "regular"
UNIX regexps; as far as I know, on Linux each command seems to use
a different set of regexps.
As for plan9, you can read regexp(6) at:
http://plan9.bell-labs.com/magic/man2html/6/regexp
Sam also supports structural regexps:
http://plan9.bell-labs.com/sources/contrib/uriel/mirror/se.pdf
--
Alberto Cortés
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
2007-02-22 22:16 [9fans] regular expressions in plan9 different from the ones in unix? (at least linux) Folkert van Heusden
2007-02-22 23:17 ` Alberto Cortés
@ 2007-02-22 23:21 ` William Josephson
2007-02-22 23:48 ` Russ Cox
1 sibling, 1 reply; 12+ messages in thread
From: William Josephson @ 2007-02-22 23:21 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Thu, Feb 22, 2007 at 11:16:26PM +0100, Folkert van Heusden wrote:
> A user of a program of mine (http://www.vanheusden.com/multitail/) tries
> to use plan9 regexps under linux and doesn't succeed.
> Am I right that plan9 regular expressions are not compatible with the
> ones of "regular" unix?
Many unix programs don't use ``extended'' regular expressions by
default. See regexp(6) on Plan 9 or try egrep/grep -E under Unix.
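A minimal sketch of the basic-versus-extended distinction, assuming a POSIX system with regcomp(3) from <regex.h> (this is not the Plan 9 library, and the helper name re_match is made up for illustration). The same pattern text is compiled once in basic syntax and once with REG_EXTENDED:

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text contains a match for pattern, 0 if it does not,
 * and -1 if the pattern fails to compile.
 * cflags is 0 for POSIX basic syntax or REG_EXTENDED for egrep-style syntax. */
int re_match(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, re_match("ab+c", REG_EXTENDED, "abbc") reports a match, while re_match("ab+c", 0, "abbc") does not: in basic syntax the + is an ordinary character, so the pattern only matches the literal text "ab+c". The same holds for | in "cat|dog".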
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
2007-02-22 23:21 ` William Josephson
@ 2007-02-22 23:48 ` Russ Cox
2007-02-23 6:27 ` Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?) Joel Salomon
2007-02-23 11:19 ` [9fans] regular expressions in plan9 different from the ones in unix? (at least linux) Gorka Guardiola
0 siblings, 2 replies; 12+ messages in thread
From: Russ Cox @ 2007-02-22 23:48 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> Many unix programs don't use ``extended'' regular expressions by
> default. See regexp(6) on Plan 9 or try egrep/grep -E under Unix.
The Plan 9 regexp library matches the old Unix egrep command.
Any regexp you'd try under Plan 9 should work with new egreps,
though not vice versa -- new egreps tend to have newfangled
additions like [:upper:] and \w and {4,6} for repetition.
Russ
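A sketch of what "not vice versa" means in practice, assuming POSIX regcomp(3) with REG_EXTENDED as a stand-in for a "new egrep" (the names ere_match, NEWER and OLDER are illustrative only). NEWER uses two of the additions mentioned above, a bracketed [:upper:] class and {4,6} repetition; OLDER spells out the same language using only brackets, ? and anchors, which is the kind of subset the old egrep syntax offers. \w is left out because it is not part of POSIX.

```c
#include <regex.h>
#include <stddef.h>

/* Uses newer additions: a named class and a bounded repetition. */
const char *const NEWER = "^[[:upper:]]{4,6}$";
/* Same language written with plain ranges and optional items only. */
const char *const OLDER = "^[A-Z][A-Z][A-Z][A-Z][A-Z]?[A-Z]?$";

/* Return 1 on match, 0 on no match, -1 if the pattern does not compile. */
int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

In the default "C" locale both patterns accept "ABCDE" and reject "ABC" and "ABCDEFG". Only the OLDER spelling would be a candidate for moving between the two worlds unchanged.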
* Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
2007-02-22 23:48 ` Russ Cox
@ 2007-02-23 6:27 ` Joel Salomon
2007-02-23 6:54 ` William K. Josephson
2007-02-23 17:33 ` Russ Cox
2007-02-23 11:19 ` [9fans] regular expressions in plan9 different from the ones in unix? (at least linux) Gorka Guardiola
1 sibling, 2 replies; 12+ messages in thread
From: Joel Salomon @ 2007-02-23 6:27 UTC (permalink / raw)
To: 9fans
On 2/22/07, Russ Cox <rsc@swtch.com> wrote:
> The Plan 9 regexp library matches the old Unix egrep command.
> Any regexp you'd try under Plan 9 should work with new egreps,
> though not vice versa -- new egreps tend to have newfangled
> additions like [:upper:] and \w and {4,6} for repetition.
This came up as I was implementing my C lexer for the compilers class
I'm taking. How hard would it be to allow access to regcomp(2)'s
internals, so I could build up a regexp part by part, à la lex?
For example, to recognize C99 hexadecimal floating-point constants, I
wrote a second program that builds up the regexp piece by piece using
smprint(2), then compiles the whole thing:
	char *decdig = "([0-9])",
		*hexdig = "([0-9A-Fa-f])",
		*sign = "([+\\-])",
		*dot = "(\\.)",
		*dseq, *dexp, *dfrac, *decflt,
		*hseq, *bexp, *hfrac, *hexflt;

	// decimal floating constants
	dseq = smprint("(%s+)", decdig);
	dexp = smprint("([Ee]%s?%s)", sign, dseq);
	dfrac = smprint("((%s?%s%s)|(%s%s))", dseq, dot, dseq, dseq, dot);
	decflt = smprint("(%s%s?)|(%s%s)", dfrac, dexp, dseq, dexp);
	regcomp(decflt); // make sure it compiles
	print("decfloat: %s\n", decflt);

	// hexadecimal floating constants; the binary exponent is mandatory
	hseq = smprint("(%s+)", hexdig);
	bexp = smprint("([Pp]%s?%s)", sign, dseq);
	hfrac = smprint("((%s?%s%s)|(%s%s))", hseq, dot, hseq, hseq, dot);
	hexflt = smprint("0[Xx](%s|%s)%s", hfrac, hseq, bexp);
	regcomp(hexflt); // make sure it compiles
	print("hexfloat: %s\n", hexflt);
I know that regcomp builds up the Reprog by combining subprograms with
catenation and alternation &c., but I’d be loath to try tinkering
there directly without a much better understanding of the algorithm.
I’ve glanced through the documents at swtch.com/????? and the regcomp
source code, just haven’t had the time for an in-depth study.
Would such a project be a worthwhile use of time? (Might it develop
into the asteroid to kill the dinosaur waiting for it?)
--Joel
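For readers without a Plan 9 system, the same piece-by-piece composition can be sketched with snprintf and POSIX regcomp(3). This is an assumption-laden port, not the code above: the function names build_hexfloat and is_hexfloat are made up, the pattern is anchored with ^ and $ so it has to cover the whole string, and suffixes such as f or L are not handled.

```c
#include <regex.h>
#include <stdio.h>

/* Compose an anchored POSIX ERE for C99 hexadecimal floating constants.
 * Returns 0 on success, -1 if the result did not fit in out. */
int build_hexfloat(char *out, size_t n)
{
    const char *hexdig = "[0-9A-Fa-f]";
    const char *decdig = "[0-9]";
    const char *sign = "[+-]";
    const char *dot = "\\.";
    char hseq[64], dseq[64], bexp[128], hfrac[256];
    int len;

    snprintf(hseq, sizeof hseq, "(%s+)", hexdig);
    snprintf(dseq, sizeof dseq, "(%s+)", decdig);
    /* binary exponent: P or p, optional sign, decimal digits */
    snprintf(bexp, sizeof bexp, "([Pp]%s?%s)", sign, dseq);
    /* fraction: digits optional before the dot, or digits then a bare dot */
    snprintf(hfrac, sizeof hfrac, "((%s?%s%s)|(%s%s))",
             hseq, dot, hseq, hseq, dot);
    len = snprintf(out, n, "^0[Xx](%s|%s)%s$", hfrac, hseq, bexp);
    return (len > 0 && (size_t)len < n) ? 0 : -1;
}

/* Return 1 if s is a hex float by the pattern above, 0 if not, -1 on error. */
int is_hexfloat(const char *s)
{
    char pattern[256];
    regex_t re;
    int rc;

    if (build_hexfloat(pattern, sizeof pattern) != 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Strings such as "0x1.8p3", "0X.8P-2" and "0x1p+4" are accepted, while "0x1.8" is rejected because the exponent part is required. The composition still happens on pattern text, not on compiled programs, which is exactly the limitation the question is about.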
* Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
2007-02-23 6:27 ` Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?) Joel Salomon
@ 2007-02-23 6:54 ` William K. Josephson
2007-02-23 13:34 ` Joel C. Salomon
2007-02-23 17:33 ` Russ Cox
1 sibling, 1 reply; 12+ messages in thread
From: William K. Josephson @ 2007-02-23 6:54 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Fri, Feb 23, 2007 at 01:27:56AM -0500, Joel Salomon wrote:
> Would such a project be a worthwhile use of time? (Might it develop
> into the asteroid to kill the dinosaur waiting for it?)
Why go to the trouble? For C, the lexer is easy
enough to just write by hand.
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
2007-02-22 23:48 ` Russ Cox
2007-02-23 6:27 ` Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?) Joel Salomon
@ 2007-02-23 11:19 ` Gorka Guardiola
2007-02-23 12:12 ` erik quanstrom
1 sibling, 1 reply; 12+ messages in thread
From: Gorka Guardiola @ 2007-02-23 11:19 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Also, I am not sure whether you can use expressions with big Unicode
characters in Unix; last time I looked, with sed, you could not.
On 2/23/07, Russ Cox <rsc@swtch.com> wrote:
> > Many unix programs don't use ``extended'' regular expressions by
> > default. See regexp(6) on Plan 9 or try egrep/grep -E under Unix.
>
> The Plan 9 regexp library matches the old Unix egrep command.
> Any regexp you'd try under Plan 9 should work with new egreps,
> though not vice versa -- new egreps tend to have newfangled
> additions like [:upper:] and \w and {4,6} for repetition.
>
> Russ
>
--
- curiosity sKilled the cat
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
2007-02-23 11:19 ` [9fans] regular expressions in plan9 different from the ones in unix? (at least linux) Gorka Guardiola
@ 2007-02-23 12:12 ` erik quanstrom
2007-02-23 12:17 ` Gorka Guardiola
0 siblings, 1 reply; 12+ messages in thread
From: erik quanstrom @ 2007-02-23 12:12 UTC (permalink / raw)
To: 9fans
utf-8 encoding will "just work" unless the gnu folk are
rearranging characters with the bucky bit set, or
the result depends on knowing the width of a character,
e.g. in
a) a character class
b) matching a single character with ".".
for example for a file "fu" with these lines
α0
β0
α1
(no leading tab) i get these results with no
locale settings at all.
; grep δ fu
δ0
works because as far as grep is concerned, the string
i asked for 03 b4 is in there. this works, too
; egrep '(ε|δ)0' fu
ε0
δ0
and this works because there is a character before
"0" on the line:
; egrep '.0' fu
ε0
δ0
but this doesn't
; egrep '[αβ]0' fu
; egrep '^.0' fu
this is for gnu grep version
; egrep --version
egrep (GNU grep) 2.5.1
- erik
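A small sketch of why the results above come out that way, under the assumption that the matcher treats every byte as one character (which is what these results suggest for an unset locale). δ is U+03B4, which UTF-8 encodes as the two bytes CE B4, so the line "δ0" is three bytes long. The function names are illustrative and no regexp library is involved:

```c
#include <stddef.h>
#include <string.h>

/* "δ0" as raw bytes; the literal is split so that the hex escape
 * does not swallow the following '0'. */
const char DELTA0[] = "\xce\xb4" "0";

/* Count UTF-8 code points: every byte that is not a 10xxxxxx continuation. */
size_t utf8_runes(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Literal search is byte-transparent: the needle's bytes are in the line or not. */
int contains(const char *line, const char *needle)
{
    return strstr(line, needle) != NULL;
}

/* Byte-oriented ".0": some single byte followed by '0', anywhere in the line. */
int byte_dot_zero_anywhere(const char *line)
{
    for (size_t i = 1; line[0] != '\0' && line[i] != '\0'; i++)
        if (line[i] == '0')
            return 1;
    return 0;
}

/* Byte-oriented "^.0": exactly one byte at the start, then '0'. */
int byte_dot_zero_at_start(const char *line)
{
    return line[0] != '\0' && line[1] == '0';
}
```

For DELTA0, contains(DELTA0, "\xce\xb4") and byte_dot_zero_anywhere(DELTA0) both succeed, because the byte B4 sits right before the 0. byte_dot_zero_at_start(DELTA0) fails: the dot consumes CE and the next byte is B4, not 0.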
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
2007-02-23 12:12 ` erik quanstrom
@ 2007-02-23 12:17 ` Gorka Guardiola
2007-02-23 13:02 ` erik quanstrom
0 siblings, 1 reply; 12+ messages in thread
From: Gorka Guardiola @ 2007-02-23 12:17 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
If it doesn't for one case, then it doesn't.
On 2/23/07, erik quanstrom <quanstro@coraid.com> wrote:
> ; egrep '[αβ]0' fu
> ; egrep '^.0' fu
>
--
- curiosity sKilled the cat
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
2007-02-23 12:17 ` Gorka Guardiola
@ 2007-02-23 13:02 ` erik quanstrom
0 siblings, 0 replies; 12+ messages in thread
From: erik quanstrom @ 2007-02-23 13:02 UTC (permalink / raw)
To: 9fans
i don't think that sort of absolutist thinking really works.
i used gnu grep (and all the other gnu tools) on utf-8 stuff
from the time of the first sam release for unix till i stopped using
linux for much development. i never had a problem with
g(ed|sed|awk|e?grep) tripping on utf-8 when the locale was
unset or "C". i did keep in mind that . wasn't going to match
"☺", though.
we all know the limitations of our tools. that doesn't make
them broken.
just because plan 9 does bad things if you exceed NPROCS,
doesn't make it broken.
- erik
On 2/23/07, erik quanstrom <quanstro@coraid.com> wrote:
> ; egrep '[αβ]0' fu
> ; egrep '^.0' fu
>
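The remark about . and "☺" can be made concrete with a short sketch. ☺ is U+263A, three bytes in UTF-8 (E2 98 BA), so a matcher that works byte by byte needs three dots to step over it. The helpers below are hypothetical and do not validate their input; they only read the length from the lead byte:

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence announced by lead byte c,
 * or 0 if c is a continuation byte or not a lead byte at all. */
int utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/* Write as many dots into out as the first character of ch has bytes,
 * i.e. the pattern a byte-oriented matcher needs to skip that character.
 * Returns the number of dots, or -1 if ch is malformed or out is too small. */
int dots_for(const char *ch, char *out, size_t outsz)
{
    int len = utf8_seq_len((unsigned char)ch[0]);

    if (len == 0 || (size_t)len >= outsz)
        return -1;
    memset(out, '.', (size_t)len);
    out[len] = '\0';
    return len;
}
```

Calling dots_for("\xe2\x98\xba", buf, sizeof buf) fills buf with "...", while an ASCII character yields a single dot.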
* Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
2007-02-23 6:54 ` William K. Josephson
@ 2007-02-23 13:34 ` Joel C. Salomon
0 siblings, 0 replies; 12+ messages in thread
From: Joel C. Salomon @ 2007-02-23 13:34 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On 2/23/07, William K. Josephson <jkw@eecs.harvard.edu> wrote:
> On Fri, Feb 23, 2007 at 01:27:56AM -0500, Joel Salomon wrote:
> > Would such a project be a worthwhile use of time? (Might it develop
> > into the asteroid to kill the dinosaur waiting for it?)
>
> Why go to the trouble? For C, the lexer is easy
> enough to just write by hand.
For a useful and significant subset of C, the lexer is easy enough to
just write by hand. I was trying for full C99 (what were those ISO
guys drinking?). I spent far too much time on it to call the task
"easy".
I have what I believe is a pretty complete C lexer
(http://www.tip9ug.jp/who/chesky/comp/lex.c). It still is far from
being integrated into a full grammar, but it scans cpp(1) output
nicely. I tested it against some of the odder "features" of C99—UCNs,
hex floats, &c.—and it seems to work.
Some parts were easy, some less so, and some looked easy until they
turned out to be subtly wrong. Recognizing whether the number seen is
an integer (in decimal, octal, or hex) or a real number was one of the
hard parts, and one I gladly handed off to a regexp. The way I
generated the regexp may not be ideal, as someone pointed out to me
off-list, but hand-generated code that recognizes what sort of number
was seen would be exactly equivalent to the regexp, and less readable.
--Joel
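For comparison with the regexp approach, here is one possible hand-written classifier for the same decision. It is a simplified sketch, not the code in lex.c: suffixes (u, l, f) are ignored, the names numkind and classify are invented, and the input is assumed to be a complete candidate token.

```c
#include <ctype.h>
#include <stddef.h>

enum numkind { NUM_BAD, NUM_DEC, NUM_OCT, NUM_HEX, NUM_FLOAT };

/* Skip an exponent body: optional sign, then one or more decimal digits.
 * Returns the position after it, or NULL if no digit follows. */
static const char *skip_exp(const char *p)
{
    if (*p == '+' || *p == '-')
        p++;
    if (!isdigit((unsigned char)*p))
        return NULL;
    while (isdigit((unsigned char)*p))
        p++;
    return p;
}

enum numkind classify(const char *s)
{
    const char *p = s;
    int hex = (p[0] == '0' && (p[1] == 'x' || p[1] == 'X'));
    int ndig = 0, dot = 0, has_exp = 0, octal = 1;

    if (hex)
        p += 2;
    /* digit sequence with at most one dot */
    for (;; p++) {
        unsigned char c = (unsigned char)*p;
        if (hex ? isxdigit(c) : isdigit(c)) {
            ndig++;
            if (!dot && c >= '8')
                octal = 0;      /* only consulted for non-hex integers */
        } else if (c == '.' && !dot) {
            dot = 1;
        } else {
            break;
        }
    }
    if (ndig == 0)
        return NUM_BAD;
    /* exponent: p/P for hex, e/E for decimal */
    if (hex ? (*p == 'p' || *p == 'P') : (*p == 'e' || *p == 'E')) {
        p = skip_exp(p + 1);
        if (p == NULL)
            return NUM_BAD;
        has_exp = 1;
    }
    if (*p != '\0')
        return NUM_BAD;
    if (hex)    /* a hex constant with a dot must carry an exponent */
        return has_exp ? NUM_FLOAT : (dot ? NUM_BAD : NUM_HEX);
    if (dot || has_exp)
        return NUM_FLOAT;
    if (s[0] == '0')
        return octal ? NUM_OCT : NUM_BAD;
    return NUM_DEC;
}
```

The subtle cases mentioned above show up quickly: "09" is rejected as a malformed octal, yet "09.5" is a valid floating constant, and "0x1.8" is rejected for lacking its exponent. Whether this reads better or worse than the composed regexp is a matter of taste.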
* Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
2007-02-23 6:27 ` Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?) Joel Salomon
2007-02-23 6:54 ` William K. Josephson
@ 2007-02-23 17:33 ` Russ Cox
1 sibling, 0 replies; 12+ messages in thread
From: Russ Cox @ 2007-02-23 17:33 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Lex has three benefits:
1) You don't have to write the lexer directly.
2) What you do have to write is fairly concise.
3) The resulting lexer is fairly efficient.
It has two main drawbacks:
4) The input model does not always match your
own program's input model, creating a messy interface.
5) Once you need more than regular expressions,
lexers written with state variables and such can get
very opaque very fast.
Many on this list would argue that (1) and (2) do not
outweigh (4) and (5), instead suggesting that writing a
lexer by hand is not too difficult and ends up being
more maintainable than a lex spec in the long run.
And of course, for a well-written by-hand lexer,
you get to keep (3).
Creating new entry hooks in the regexp library doesn't
preserve (1), (2), or (3). And if much of your time is
spent in lexical analysis (as Ken claimed was true for
the Plan 9 compilers), losing (3) is a big deal.
So that seems like not a very good replacement for lex.
All that said, lex has been used to write a lot of C
compilers, and can be used in that context without
running into much of (4) or (5). Why not just use lex here?
Russ
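To give a feel for the by-hand alternative discussed above, here is a deliberately tiny scanner sketch. It is an illustration only, not how the Plan 9 compilers or any lex output work: it knows identifiers, decimal integers and single-character punctuation, and the token type and function name are invented.

```c
#include <ctype.h>
#include <stddef.h>

enum tok { T_EOF, T_IDENT, T_INT, T_PUNCT };

struct token {
    enum tok kind;
    const char *start;  /* points into the caller's input buffer */
    size_t len;
};

/* Scan one token starting at *pp and advance *pp past it. */
struct token next_token(const char **pp)
{
    const char *p = *pp;
    struct token t;

    while (isspace((unsigned char)*p))
        p++;
    t.start = p;
    if (*p == '\0') {
        t.kind = T_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = T_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = T_INT;
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        t.kind = T_PUNCT;   /* any other single byte */
        p++;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

The input model is whatever the caller wants it to be (here a NUL-terminated buffer and a cursor), which is the flexibility point (4) is about; the cost is that every new token class means more code of this shape.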