9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Plan9 awk vs. GNU awk
@ 2009-10-24  2:20 Dimitry Golubovsky
  2009-10-24  2:45 ` michael block
  0 siblings, 1 reply; 5+ messages in thread
From: Dimitry Golubovsky @ 2009-10-24  2:20 UTC (permalink / raw)
  To: 9fans

Hi,

Working through the autoconf stuff I came across another issue: Plan 9
awk does not seem to recognize a regexp that GNU awk does. As a
result, config.status does not properly create the config.h file from
configure's findings. The regexp in the test program was extracted
from config.status. Can anybody see anything wrong with this regexp?

The test program, named `rxtest' (the long regexp should match a #define
or #undef statement; if it matches, the whole line is printed):

#!/bin/sh
# (or /bin/rc for Plan9)

awk '

/^[\t ]*#[\t ]*(define|undef)[\t ]+[_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ][_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]*([\t (]|$)/ {
  line = $0
  print line
}
'
The source file (config.h.in), top portion of it:

/* src/config.h.in.  Generated from configure.ac by autoheader.  */

/* platform-specific defines */
#include "platform.h"

/* Define to 1 if you want to use the primitives which let you examine Hugs
   bytecodes (requires INTERNAL_PRIMS). */
#undef BYTECODE_PRIMS

/* Define to 1 to use a Char encoding determined by the locale. */
#undef CHAR_ENCODING_LOCALE

/* Define to 1 to use the UTF-8 Char encoding. */
#undef CHAR_ENCODING_UTF8

/* Define to 1 if you want to perform runtime tag-checks as an internal
   consistency check. This makes Hugs run very slowly - but is very effective
   at detecting and locating subtle bugs. */
#undef CHECK_TAGS

/* Define to one of `_getb67', `GETB67', `getb67' for Cray-2 and Cray-YMP
   systems. This function is required for `alloca.c' support on those systems.
   */
#undef CRAY_STACKSEG_END

/* Define to 1 if using `alloca.c'. */
#undef C_ALLOCA

/* Define to 1 if debugging generated bytecodes or the bytecode interpreter.
   */
#undef DEBUG_CODE

Result under Linux (top portion):

$ cat src/config.h.in | rxtest | head
#undef BYTECODE_PRIMS
#undef CHAR_ENCODING_LOCALE
#undef CHAR_ENCODING_UTF8
#undef CHECK_TAGS
#undef CRAY_STACKSEG_END
#undef C_ALLOCA
#undef DEBUG_CODE

that is, all lines matched.

awk used was:

awk --version
GNU Awk 3.1.6
Copyright (C) 1989, 1991-2007 Free Software Foundation.


Result under Plan9:

term% cat src/config.h.in | rxtest
term%

Nothing: Plan 9 awk does not match any of the source file's lines against this regexp.

Not sure which version Plan9 awk is, the binary is dated:

term% ls -l /bin/awk
total 1000
-rwxr-xr-x 1 dima users 327504 Feb  4  2009 /bin/awk

/sys/src/cmd/awk/main.c contains:

char	*version = "version 19990602";

Thanks.

--
Dimitry Golubovsky

Anywhere on the Web




* Re: [9fans] Plan9 awk vs. GNU awk
  2009-10-24  2:20 [9fans] Plan9 awk vs. GNU awk Dimitry Golubovsky
@ 2009-10-24  2:45 ` michael block
  2009-10-24 17:32   ` Russ Cox
  0 siblings, 1 reply; 5+ messages in thread
From: michael block @ 2009-10-24  2:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

try using [_a-zA-Z][_a-zA-Z0-9]* rather than the expanded form. i
didn't track it down to be sure, but plan 9 awk seems to have a limit
of 34 characters inside square brackets. a rather odd number

cpu% echo hello | awk '/^[abcdefghijklmnopqrstuvwxyzABCDEFGH].+/ { line = $0 ; print line }'
cpu% echo hello | awk '/^[abcdefghijklmnopqrstuvwxyzABCDEFG].+/ { line = $0 ; print line }'
hello




* Re: [9fans] Plan9 awk vs. GNU awk
  2009-10-24  2:45 ` michael block
@ 2009-10-24 17:32   ` Russ Cox
  2009-10-24 18:34     ` Anthony Sorace
  0 siblings, 1 reply; 5+ messages in thread
From: Russ Cox @ 2009-10-24 17:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Back when I still ported software to (instead of from) Plan 9,
I lost many hours to this kind of fiddling with making Plan 9
behave the way configure expects.  It's not worth it.

Start porting any piece of software by rm configure* auto*.
Usually a good start is to write a mkfile that lists every .c
file as a .$O, and then go from there.  You can try to compile
the files and tweak config.h.in (copied to config.h) as needed
to make them compile.  This is far less work, and at the
end you have a Plan 9 mkfile that fits better into the
build process.

In the absolute worst case you can run configure on a Linux
system and adapt the resulting Makefiles.

Configure is supposed to adapt to the systems it runs on
(that's the whole reason it exists!), not the other way around.
If you choose to spend time on this problem, it would be better
spent by fixing autoconf to handle running on Plan 9,
submitting patches, and then following through to make sure
those patches get accepted.  This is probably the same amount
of work as adapting Plan 9 but could have a more lasting effect.

Russ



* Re: [9fans] Plan9 awk vs. GNU awk
  2009-10-24 17:32   ` Russ Cox
@ 2009-10-24 18:34     ` Anthony Sorace
  0 siblings, 0 replies; 5+ messages in thread
From: Anthony Sorace @ 2009-10-24 18:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

along that line, you might check out steve's
mkmk (contrib/install steve/mkmk). i've only
used it lightly, but it often generates a useful
starting point for turning a chunk of files into
a mkfile.




* Re: [9fans] Plan9 awk vs. GNU awk
       [not found] <<3df301650910231945n522c4592ib25cef16e9830202@mail.gmail.com>
@ 2009-10-27 17:42 ` erik quanstrom
  0 siblings, 0 replies; 5+ messages in thread
From: erik quanstrom @ 2009-10-27 17:42 UTC (permalink / raw)
  To: 9fans

On Fri Oct 23 22:47:46 EDT 2009, michaelmuffin@gmail.com wrote:
> try using [_a-zA-Z][_a-zA-Z0-9]* rather than the expanded form. i
> didn't track it down to be sure, but plan 9 awk seems to have a limit
> of 34 characters inside square brackets. a rather odd number
>
> cpu% echo hello | awk '/^[abcdefghijklmnopqrstuvwxyzABCDEFGH].+/ { line = $0 ; print line }'
> cpu% echo hello | awk '/^[abcdefghijklmnopqrstuvwxyzABCDEFG].+/ { line = $0 ; print line }'
> hello

there are two bits to fixing this.  steve simon
submitted an excellent patch, which i applied.
but i think there is still a bug.  34 is, as noted, a
really weird number.

a bit of background:  the way cclasses work is
that each member of a class takes 2 slots: a start
and an end.  for a-b the start is a and the end is b.
for just a, the start and end are both a.  since the
original code has 64 slots, one would expect a
maximum of 32 chars per class.  however,
/sys/src/ape/lib/regexp/regcomp.h has
	/* max rune ranges per character class */
	#define NCCRUNE	(sizeof(Reclass)/sizeof(wchar_t))
which is wrong since there is more stuff in Reclass
than an array of wchar_t's (the end pointer is
counted too).  this is a big problem since it will
scribble 4 bytes past the end of the Reclass.

these are the diffs i came up with.  they attack
the problem a bit differently, but i think it's a
bit cleaner and worth a few extra lines of diff:

; 9diff regcomp.h
/n/sources/plan9//sys/src/ape/lib/regexp/regcomp.h:1,23 - regcomp.h:1,17
  /*
   *  substitution list
   */
+ enum {
+ 	NSUBEXP	= 32,
+ 	LISTINCREMENT	= 8,
+ };
+
  typedef struct Resublist	Resublist;
  struct	Resublist
  {
- 	Resub	m[32];
+ 	Resub	m[NSUBEXP];
  };

- /* max subexpressions per program */
- Resublist ReSuBlIsT;
- #define NSUBEXP (sizeof(ReSuBlIsT.m)/sizeof(Resub))
-
- /* max character classes per program */
- Reprog	RePrOg;
- #define	NCLASS	(sizeof(RePrOg.class)/sizeof(Reclass))
-
- /* max rune ranges per character class */
- #define NCCRUNE	(sizeof(Reclass)/sizeof(wchar_t))
-
  /*
   * Actions and Tokens (Reinst types)
   *
/n/sources/plan9//sys/src/ape/lib/regexp/regcomp.h:46,52 - regcomp.h:40,45
  /*
   *  regexec execution lists
   */
- #define LISTINCREMENT 8
  typedef struct Relist	Relist;
  struct Relist
  {
; 9diff regexp.h
/n/sources/plan9//sys/include/ape/regexp.h:35,43 - regexp.h:35,50
  /*
   *	character class, each pair of rune's defines a range
   */
+ enum{
+ 	NCCRUNE	= 256,
+ 	NCLASS	= 16,
+ 	NINST		= 5,
+
+ };
+
  struct Reclass{
  	wchar_t	*end;
- 	wchar_t	spans[64];
+ 	wchar_t	spans[NCCRUNE];
  };

  /*
/n/sources/plan9//sys/include/ape/regexp.h:62,69 - regexp.h:69,76
   */
  struct Reprog{
  	Reinst	*startinst;	/* start pc */
- 	Reclass	class[16];	/* .data */
- 	Reinst	firstinst[5];	/* .text */
+ 	Reclass	class[NCLASS];	/* .data */
+ 	Reinst	firstinst[NINST];	/* .text */
  };

  extern Reprog	*regcomp(char*);

- erik


