* [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
From: Folkert van Heusden @ 2007-02-22 22:16 UTC
To: 9fans

Hi,

A user of a program of mine (http://www.vanheusden.com/multitail/) tries
to use plan9 regexps under linux and doesn't succeed.
Am I right that plan9 regular expressions are not compatible with the
ones of "regular" unix?


Folkert van Heusden

-- 
www.vanheusden.com/multitail - multitail is tail on steroids. multiple
               windows, filtering, coloring, anything you can think of
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
From: Alberto Cortés @ 2007-02-22 23:17 UTC
To: Fans of the OS Plan 9 from Bell Labs

Folkert van Heusden said:
> Hi,
>
> A user of a program of mine (http://www.vanheusden.com/multitail/) tries
> to use plan9 regexps under linux and doesn't succeed.
> Am I right that plan9 regular expressions are not compatible with the
> ones of "regular" unix?

They are different.

I am not very sure what you mean by "regular" UNIX regexps; as far as I
know, in Linux each command seems to use a different set of regexps.

As for plan9, you can read regexp(6) at:

http://plan9.bell-labs.com/magic/man2html/6/regexp

Sam also supports structural regexps:

http://plan9.bell-labs.com/sources/contrib/uriel/mirror/se.pdf

-- 
Alberto Cortés
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
From: William Josephson @ 2007-02-22 23:21 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Thu, Feb 22, 2007 at 11:16:26PM +0100, Folkert van Heusden wrote:
> A user of a program of mine (http://www.vanheusden.com/multitail/) tries
> to use plan9 regexps under linux and doesn't succeed.
> Am I right that plan9 regular expressions are not compatible with the
> ones of "regular" unix?

Many unix programs don't use ``extended'' regular expressions by
default.  See regexp(7) on Plan 9 or try egrep/grep -E under Unix.
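William's point also shows up in the POSIX C library, not just the command-line tools: regcomp(3) parses a pattern as a basic regular expression (the grep default) unless REG_EXTENDED is passed, and in a basic regular expression the characters ( ) and + are ordinary characters rather than grouping and repetition operators. The following is a minimal sketch assuming a POSIX <regex.h> such as glibc's; posix_match is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * posix_match (hypothetical helper): compile pattern with the given
 * cflags (0 for a basic regular expression, REG_EXTENDED for an
 * egrep-style one) and report whether it matches anywhere in text.
 * Returns 1 for a match, 0 for no match, -1 if compilation fails.
 */
int posix_match(const char *pattern, const char *text, int cflags)
{
    regex_t re;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, posix_match("(ab)+", "abab", REG_EXTENDED) reports a match, while the same call with cflags 0 does not: the basic parser looks for the literal text "(ab)+".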
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
From: Russ Cox @ 2007-02-22 23:48 UTC
To: Fans of the OS Plan 9 from Bell Labs

> Many unix programs don't use ``extended'' regular expressions by
> default.  See regexp(7) on Plan 9 or try egrep/grep -E under Unix.

The Plan 9 regexp library matches the old Unix egrep command.
Any regexp you'd try under Plan 9 should work with new egreps,
though not vice versa -- new egreps tend to have newfangled
additions like [:upper:] and \w and {4,6} for repetition.

Russ
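Of the newfangled additions Russ lists, bounded repetition is the easiest to translate by hand, because x{m,n} matches exactly the strings matched by m copies of x followed by n-m optional copies. Below is a minimal sketch in portable C (not Plan 9 C); expand_bounded is a hypothetical helper, and it assumes the atom is already a single regexp unit such as [0-9] or a parenthesized group.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * expand_bounded (hypothetical helper): rewrite atom{min,max} using only
 * concatenation, grouping and '?', i.e. min copies of atom followed by
 * (max - min) copies of (atom)?.  Returns a malloc'd string that the
 * caller frees, or NULL on bad bounds or allocation failure.
 */
char *expand_bounded(const char *atom, int min, int max)
{
    assert(atom != NULL);
    if (min < 0 || max < min)
        return NULL;

    size_t alen = strlen(atom);
    /* each optional copy adds 3 bytes: '(' before it and ")?" after it */
    size_t need = (size_t)min * alen + (size_t)(max - min) * (alen + 3) + 1;
    char *out = malloc(need);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (int i = 0; i < min; i++) {        /* mandatory copies */
        memcpy(p, atom, alen);
        p += alen;
    }
    for (int i = min; i < max; i++) {      /* optional copies */
        *p++ = '(';
        memcpy(p, atom, alen);
        p += alen;
        *p++ = ')';
        *p++ = '?';
    }
    *p = '\0';
    return out;
}
```

With this sketch, expand_bounded("[0-9]", 2, 4) yields [0-9][0-9]([0-9])?([0-9])?, which uses only operators the old egrep syntax already had. An open-ended x{m,} would instead become m copies followed by x*, a case the sketch does not handle.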
* Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
From: Joel Salomon @ 2007-02-23 6:27 UTC
To: 9fans

On 2/22/07, Russ Cox <rsc@swtch.com> wrote:
> The Plan 9 regexp library matches the old Unix egrep command.
> Any regexp you'd try under Plan 9 should work with new egreps,
> though not vice versa -- new egreps tend to have newfangled
> additions like [:upper:] and \w and {4,6} for repetition.

This came up as I was implementing my C lexer for the compilers class
I'm taking.  How hard would it be to allow access to regcomp(2)'s
internals, so I could build up a regexp part by part, a la lex?

For example, to recognize C99 hexadecimal floating-point constants, I
wrote a second program that builds up the regexp piece by piece using
smprint(2), then compiles the whole thing:

	#include <u.h>
	#include <libc.h>
	#include <regexp.h>

	void
	main(int argc, char *argv[])
	{
		char	*decdig = "([0-9])",
			*hexdig = "([0-9A-Fa-f])",
			*sign = "([+\\-])",
			*dot = "(\\.)",
			*dseq, *dexp, *dfrac, *decflt,
			*hseq, *bexp, *hfrac, *hexflt;

		/* decimal floating constants */
		dseq = smprint("(%s+)", decdig);
		dexp = smprint("([Ee]%s?%s)", sign, dseq);
		dfrac = smprint("((%s?%s%s)|(%s%s))", dseq, dot, dseq, dseq, dot);
		decflt = smprint("(%s%s?)|(%s%s)", dfrac, dexp, dseq, dexp);
		regcomp(decflt);	// make sure it compiles
		print("decfloat: %s\n", decflt);

		/* hexadecimal floating constants; the exponent is decimal */
		hseq = smprint("(%s+)", hexdig);
		bexp = smprint("([Pp]%s?%s)", sign, dseq);
		hfrac = smprint("((%s?%s%s)|(%s%s))", hseq, dot, hseq, hseq, dot);
		hexflt = smprint("0[Xx](%s|%s)%s", hfrac, hseq, bexp);
		regcomp(hexflt);	// make sure it compiles
		print("hexfloat: %s\n", hexflt);

		exits(nil);
	}

I know that regcomp builds up the Reprog by combining subprograms with
catenation and alternation &c., but I'd be loath to try tinkering there
directly without a much better understanding of the algorithm.  I've
glanced through the documents at swtch.com/????? and the regcomp source
code; I just haven't had the time for an in-depth study.

Would such a project be a worthwhile use of time?  (Might it develop
into the asteroid to kill the dinosaur waiting for it?)

--Joel
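Joel's program relies on Plan 9's smprint(2) and regcomp(2). For readers on Unix, the same piece-by-piece composition of the hex-float pattern can be sketched in portable C with vsnprintf and POSIX regcomp(3). This is a sketch under stated assumptions, not Joel's code: fmtdup and is_hexfloat are hypothetical names, the pattern is anchored with ^...$ so the whole string must match, and the sign class is written [-+] because a POSIX bracket expression treats a backslash as an ordinary character.

```c
#include <assert.h>
#include <regex.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* fmtdup (hypothetical): a stand-in for Plan 9's smprint(2) that
 * formats into a freshly malloc'd string. */
static char *fmtdup(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(NULL, 0, fmt, ap);   /* measure first */
    va_end(ap);
    assert(n >= 0);
    char *s = malloc((size_t)n + 1);
    assert(s != NULL);
    va_start(ap, fmt);
    vsnprintf(s, (size_t)n + 1, fmt, ap);  /* then format for real */
    va_end(ap);
    return s;
}

/* Build the C99 hex-float pattern from smaller pieces, as in Joel's
 * program, anchored so that regexec tests the whole string. */
static char *build_hexfloat(void)
{
    const char *decdig = "([0-9])", *hexdig = "([0-9A-Fa-f])";
    const char *sign = "([-+])";           /* '\' is literal inside POSIX brackets */
    const char *dot = "(\\.)";
    char *dseq = fmtdup("(%s+)", decdig);
    char *hseq = fmtdup("(%s+)", hexdig);
    char *bexp = fmtdup("([Pp]%s?%s)", sign, dseq);
    char *hfrac = fmtdup("((%s?%s%s)|(%s%s))", hseq, dot, hseq, hseq, dot);
    char *hexflt = fmtdup("^0[Xx](%s|%s)%s$", hfrac, hseq, bexp);
    free(dseq);
    free(hseq);
    free(bexp);
    free(hfrac);
    return hexflt;
}

/* is_hexfloat: 1 if s is exactly one hex floating constant, 0 if not,
 * -1 if the composed pattern fails to compile. */
int is_hexfloat(const char *s)
{
    char *pat = build_hexfloat();
    regex_t re;
    int rc = regcomp(&re, pat, REG_EXTENDED | REG_NOSUB);
    free(pat);
    if (rc != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Because the pieces are ordinary strings, each intermediate (dseq, hfrac, and so on) can be printed or compiled on its own while debugging, which is the convenience Joel was after.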
* Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
From: William K. Josephson @ 2007-02-23 6:54 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Fri, Feb 23, 2007 at 01:27:56AM -0500, Joel Salomon wrote:
> Would such a project be a worthwhile use of time?  (Might it develop
> into the asteroid to kill the dinosaur waiting for it?)

Why go to the trouble?  For C, the lexer is easy
enough to just write by hand.
* Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
From: Joel C. Salomon @ 2007-02-23 13:34 UTC
To: Fans of the OS Plan 9 from Bell Labs

On 2/23/07, William K. Josephson <jkw@eecs.harvard.edu> wrote:
> On Fri, Feb 23, 2007 at 01:27:56AM -0500, Joel Salomon wrote:
> > Would such a project be a worthwhile use of time?  (Might it develop
> > into the asteroid to kill the dinosaur waiting for it?)
>
> Why go to the trouble?  For C, the lexer is easy
> enough to just write by hand.

For a useful and significant subset of C, the lexer is easy enough to
just write by hand.  I was trying for full C99 (what were those ISO
guys drinking?).  I spent far too much time on it to call the task
"easy".

I have what I believe is a pretty complete C lexer
(http://www.tip9ug.jp/who/chesky/comp/lex.c).  It is still far from
being integrated into a full grammar, but it scans cpp(1) output
nicely.  I tested it against some of the odder "features" of C99
(UCNs, hex floats, &c.), and it seems to work.

Some parts were easy, some less so, and some looked easy until they
turned out to be subtly wrong.  Recognizing whether the number seen is
an integer (in decimal, octal, or hex) or a real number was one of the
hard parts, and one I gladly handed off to a regexp.  The way I
generated the regexp may not be ideal, as someone pointed out to me
off-list, but hand-generated code that recognizes what sort of number
was seen would be exactly equivalent to the regexp, and less readable.

--Joel
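Joel's claim that hand-written code for classifying numbers would be equivalent to the regexp but less readable is easy to check against an actual attempt. The following is a hypothetical hand-coded classifier (classify_number is not taken from his lex.c) for decimal, octal and hexadecimal integers and for decimal and hexadecimal floats. It assumes integer and floating suffixes (u, l, f, ...) have already been stripped, and, following the C grammar, it treats a lone 0 as octal.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum numkind { NUM_BAD, NUM_DEC, NUM_OCT, NUM_HEX, NUM_FLOAT };

/* skip: advance p past every character satisfying pred */
static const char *skip(const char *p, int (*pred)(int))
{
    while (*p != '\0' && pred((unsigned char)*p))
        p++;
    return p;
}

/* classify_number (hypothetical): hand-written counterpart of the
 * number-recognizing regexp; suffixes are not handled. */
enum numkind classify_number(const char *s)
{
    const char *p, *q;
    int hex = 0, sawdot = 0, sawexp = 0;
    size_t intdigits, fracdigits = 0;

    assert(s != NULL);
    p = s;
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        hex = 1;
        p += 2;
    }
    int (*digit)(int) = hex ? isxdigit : isdigit;

    q = skip(p, digit);                    /* integer part */
    intdigits = (size_t)(q - p);
    p = q;
    if (*p == '.') {                       /* optional fraction */
        sawdot = 1;
        q = skip(p + 1, digit);
        fracdigits = (size_t)(q - (p + 1));
        p = q;
    }
    if (intdigits + fracdigits == 0)
        return NUM_BAD;                    /* no digits at all, e.g. "0x" or "." */
    if (hex ? (*p == 'p' || *p == 'P') : (*p == 'e' || *p == 'E')) {
        sawexp = 1;
        p++;
        if (*p == '+' || *p == '-')
            p++;
        q = skip(p, isdigit);              /* exponent digits are decimal, even for hex */
        if (q == p)
            return NUM_BAD;
        p = q;
    }
    if (*p != '\0')
        return NUM_BAD;                    /* trailing junk */
    if (sawdot || sawexp)                  /* C99 hex floats require a p exponent */
        return (hex && !sawexp) ? NUM_BAD : NUM_FLOAT;
    if (hex)
        return NUM_HEX;
    if (s[0] == '0') {                     /* leading 0: octal, digits 0-7 only */
        for (p = s; *p != '\0'; p++)
            if (*p > '7')
                return NUM_BAD;
        return NUM_OCT;
    }
    return NUM_DEC;
}
```

Whether this reads better or worse than the composed regexp is the judgment call in the thread. Note the cases it has to get right by hand: 09.5 is a valid decimal float even though 09 is not a valid octal integer, and 0x1e5 is a hex integer, not a float.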
* Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
From: Russ Cox @ 2007-02-23 17:33 UTC
To: Fans of the OS Plan 9 from Bell Labs

Lex has three benefits:

 1) You don't have to write the lexer directly.
 2) What you do have to write is fairly concise.
 3) The resulting lexer is fairly efficient.

It has two main drawbacks:

 4) The input model does not always match your own
    program's input model, creating a messy interface.
 5) Once you need more than regular expressions,
    lexers written with state variables and such can
    get very opaque very fast.

Many on this list would argue that (1) and (2) do not
outweigh (4) and (5), instead suggesting that writing
a lexer by hand is not too difficult and ends up being
more maintainable than a lex spec in the long run.
And of course, for a well-written by-hand lexer, you
get to keep (3).

Creating new entry hooks in the regexp library doesn't
preserve (1), (2), or (3).  And if much of your time is
spent in lexical analysis (as Ken claimed was true for
the Plan 9 compilers), losing (3) is a big deal.
So that seems like not a very good replacement for lex.

All that said, lex has been used to write a lot of
C compilers, and can be used in that context without
running into much of (4) or (5).  Why not just use lex here?

Russ
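To make the by-hand alternative concrete, here is a minimal sketch of a direct-coded scanner of the kind Russ describes. It reads straight from the caller's own buffer, so there is no separate input model to reconcile (drawback 4), and its only state is the current position. The struct lexer, lex_next and the token names are hypothetical; comments, string literals, multi-character operators and exponent signs such as 1e+5 are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum tok { T_EOF, T_IDENT, T_NUMBER, T_PUNCT };

struct lexer {
    const char *p;     /* next unread character in the caller's buffer */
    char text[64];     /* spelling of the most recent token */
};

/* lex_next (hypothetical): return the kind of the next token and copy
 * its spelling into lx->text, truncated to fit. */
enum tok lex_next(struct lexer *lx)
{
    const char *start;
    enum tok kind;

    assert(lx != NULL && lx->p != NULL);
    while (isspace((unsigned char)*lx->p))
        lx->p++;
    start = lx->p;
    if (*lx->p == '\0') {
        lx->text[0] = '\0';
        return T_EOF;
    }
    if (isalpha((unsigned char)*lx->p) || *lx->p == '_') {
        while (isalnum((unsigned char)*lx->p) || *lx->p == '_')
            lx->p++;
        kind = T_IDENT;
    } else if (isdigit((unsigned char)*lx->p)) {
        /* take letters, digits and dots; a later step classifies them */
        while (isalnum((unsigned char)*lx->p) || *lx->p == '.')
            lx->p++;
        kind = T_NUMBER;
    } else {
        lx->p++;                           /* single-character punctuation */
        kind = T_PUNCT;
    }
    size_t n = (size_t)(lx->p - start);
    if (n >= sizeof lx->text)
        n = sizeof lx->text - 1;
    memcpy(lx->text, start, n);
    lx->text[n] = '\0';
    return kind;
}
```

Scanning "x1 = 42;" with this sketch yields an identifier x1, the punctuation =, the number 42, the punctuation ;, and then end of input.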
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
From: Gorka Guardiola @ 2007-02-23 11:19 UTC
To: Fans of the OS Plan 9 from Bell Labs

Also, I am not sure if you can use expressions with big Unicode
characters in Unix; the last time I looked, with sed, you could not.

On 2/23/07, Russ Cox <rsc@swtch.com> wrote:
> > Many unix programs don't use ``extended'' regular expressions by
> > default.  See regexp(7) on Plan 9 or try egrep/grep -E under Unix.
>
> The Plan 9 regexp library matches the old Unix egrep command.
> Any regexp you'd try under Plan 9 should work with new egreps,
> though not vice versa -- new egreps tend to have newfangled
> additions like [:upper:] and \w and {4,6} for repetition.
>
> Russ
>

-- 
- curiosity sKilled the cat
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
From: erik quanstrom @ 2007-02-23 12:12 UTC
To: 9fans

utf-8 encoding will "just work" unless the gnu folk are rearranging
characters with the bucky bit set, or unless the result depends on
knowing the width of a character, e.g. in
a) a character class
b) matching a single character with ".".

for example, for a file "fu" with these lines

	α0
	β0
	α1

(no leading tab) i get these results with no locale settings at all.

	; grep δ fu
	δ0

works because as far as grep is concerned, the string i asked for,
03 b4, is in there.  this works, too

	; egrep '(ε|δ)0' fu
	ε0
	δ0

and this works because there is a character before "0" on the line:

	; egrep '.0' fu
	ε0
	δ0

but this doesn't

	; egrep '[αβ]0' fu
	; egrep '^.0' fu

this is for gnu grep version

	; egrep --version
	egrep (GNU grep) 2.5.1

- erik
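Erik's distinction comes down to bytes versus characters. In a byte-oriented matcher, which is how he describes GNU grep 2.5.1 behaving with no locale set, "." consumes exactly one byte, so for a line beginning with δ (bytes CE B4 in UTF-8) the pattern ^.0 looks for '0' in the second byte and fails. The sketch below contrasts that with a matcher whose "." consumes one whole UTF-8 character, as the Plan 9 regexp library does with runes. The helper names are hypothetical, and valid UTF-8 input is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* runecount (hypothetical): number of UTF-8 characters in s, counting
 * every byte that is not a continuation byte (10xxxxxx). */
size_t runecount(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}

/* bytewise_dot0_at_start: "^.0" where '.' consumes one byte,
 * so the byte right after the first one must be '0'. */
int bytewise_dot0_at_start(const char *s)
{
    return s[0] != '\0' && s[0] != '\n' && s[1] == '0';
}

/* runewise_dot0_at_start: "^.0" where '.' consumes one whole UTF-8
 * character, so all continuation bytes of the first character are skipped. */
int runewise_dot0_at_start(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    if (*p == '\0' || *p == '\n')
        return 0;
    p++;
    while ((*p & 0xC0) == 0x80)            /* rest of the first character */
        p++;
    return *p == '0';
}
```

The same reasoning covers [αβ]0: read byte by byte, the class [αβ] is the set of bytes CE, B1 and B2, not a set of two characters, so whether a line matches depends on which single byte happens to precede the 0.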
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
From: Gorka Guardiola @ 2007-02-23 12:17 UTC
To: Fans of the OS Plan 9 from Bell Labs

If it doesn't work in one case, then it doesn't work.

On 2/23/07, erik quanstrom <quanstro@coraid.com> wrote:
> ; egrep '[αβ]0' fu
> ; egrep '^.0' fu
>

-- 
- curiosity sKilled the cat
* Re: [9fans] regular expressions in plan9 different from the ones in unix? (at least linux)
From: erik quanstrom @ 2007-02-23 13:02 UTC
To: 9fans

i don't think that sort of absolutist thinking really works.

i used gnu grep (and all the other gnu tools) on utf-8 stuff from
the time of the first sam release for unix till i stopped using linux
for much development.  i never had a problem with g(ed|sed|awk|e?grep)
tripping on utf-8 when the locale was unset or "C".  i did keep in mind
that . wasn't going to match "☺", though.

we all know the limitations of our tools.  that doesn't make them broken.
just because plan 9 does bad things if you exceed NPROCS doesn't make
it broken.

- erik

On 2/23/07, erik quanstrom <quanstro@coraid.com> wrote:
> ; egrep '[αβ]0' fu
> ; egrep '^.0' fu
>