[-- Attachment #1: Type: text/plain, Size: 98 bytes --]

this one comes in handy for building bulk rename scripts.
patch for git/import attached.

- james

[-- Attachment #2.1: Type: text/plain, Size: 327 bytes --]

from postmaster@1ess:
The following attachment had content that we can't
prove to be harmless. To avoid possible automatic
execution, we changed the content headers.
The original header was:

Content-Disposition: attachment;filename="awk.patch"
Content-Type: text/x-patch; name="awk.patch"
Content-Transfer-Encoding: BASE64

[-- Attachment #2.2: awk.patch.suspect --]
[-- Type: application/octet-stream, Size: 1509 bytes --]

From: james palmer <james@biobuf.link>
Date: Tue, 03 Aug 2021 14:33:24 +0000
Subject: [PATCH] awk: add %q format for quoted strings (see quote(2))

---
diff 223daf6104b5fd73a6214fc3b2fcbd237ffbe666 0a401c48ef82bbb8bdafc93a6107cc3a35ada4ee
--- a/sys/src/cmd/awk/main.c Sat Jul 31 19:29:39 2021
+++ b/sys/src/cmd/awk/main.c Tue Aug 3 15:33:24 2021
@@ -60,6 +60,8 @@
Binit(&stdout, 1, OWRITE);
Binit(&stderr, 2, OWRITE);

+ quotefmtinstall();
+
cmdname = argv[0];
if (argc == 1) {
Bprint(&stderr, "usage: %s [-F fieldsep] [-d] [-mf n] [-mr n] [-safe] [-v var=value] [-f programfile | 'program'] [file ...]\n", cmdname);
--- a/sys/src/cmd/awk/run.c Sat Jul 31 19:29:39 2021
+++ b/sys/src/cmd/awk/run.c Tue Aug 3 15:33:24 2021
@@ -836,7 +836,7 @@
int format(char **pbuf, int *pbufsize, char *s, Node *a) /* printf-like conversions */
{
char *fmt;
- char *p, *t, *os;
+ char *p, *t, *os, *tmp;
Cell *x;
int flag, n;
int fmtwd; /* format width */
@@ -915,6 +915,9 @@
case 'c':
flag = 5;
break;
+ case 'q':
+ flag = 6;
+ break;
default:
WARNING("weird printf conversion %s", fmt);
flag = 0;
@@ -964,6 +967,14 @@
p++;
*p = '\0';
}
+ break;
+ case 6:
+ t = getsval(x);
+ tmp = t;
+ while(*tmp++) { if(*tmp == '\'') { n++; } n++; }
+ if(!adjbuf(&buf, &bufsize, 3+n+p-buf, recsize, &p, 0))
+ FATAL("huge string/format (%d chars) in printf %.30s... ran format() out of memory", n, t);
+ sprint(p, fmt, t);
break;
}
if (istemp(x))
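
For readers without a Plan 9 libc to hand, here is a rough portable sketch of the quoting that %q is expected to produce, and of the buffer size it needs. The names needsquote, quotedlen and rcquote are hypothetical and are not part of awk or libc. The rule shown (quote when the string is empty or contains a space, a control character or a single quote; double any embedded quote) is an assumption based on quote(2), not a copy of the library code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: does s need rc-style quoting?
 * Assumed rule: empty strings, and strings containing a space,
 * a control character or a single quote, get quoted.
 */
int needsquote(const char *s)
{
	if (*s == '\0')
		return 1;
	for (; *s != '\0'; s++)
		if ((unsigned char)*s <= ' ' || *s == '\'')
			return 1;
	return 0;
}

/* Bytes needed for the quoted form of s, not counting the trailing NUL. */
size_t quotedlen(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	if (!needsquote(s))
		return strlen(s);
	for (; *s != '\0'; s++)
		n += (*s == '\'') ? 2 : 1;	/* embedded quotes are doubled */
	return n + 2;				/* opening and closing quote */
}

/* Return a malloc'd quoted copy of s, or NULL if allocation fails. */
char *rcquote(const char *s)
{
	char *buf, *p;
	int q;

	assert(s != NULL);
	q = needsquote(s);
	buf = malloc(quotedlen(s) + 1);
	if (buf == NULL)
		return NULL;
	p = buf;
	if (q)
		*p++ = '\'';
	for (; *s != '\0'; s++) {
		if (*s == '\'')
			*p++ = '\'';	/* write the extra quote first */
		*p++ = *s;
	}
	if (q)
		*p++ = '\'';
	*p = '\0';
	return buf;
}
```

Under these assumptions a name such as it's a file comes out as 'it''s a file', which rc reads back as a single word. That is the property a bulk rename script depends on.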
i see why you would want this but the argument against it was always to
keep the plan9 version of awk compatible with Brian’s one-true-awk so
awk code is truly portable.

how much hassle is portability worth? good question.

-Steve
Quoth james palmer <james@biobuf.link>:
> this one comes in handy for building bulk rename scripts.
> patch for git/import attached.
Neat. If it is applied, could the man page be updated to reflect the
change, too?
Quoth unobe@cpan.org:
> Quoth james palmer <james@biobuf.link>:
> > this one comes in handy for building bulk rename scripts.
> > patch for git/import attached.
>
> Neat. If it is applied, could the man page be updated to reflect the
> change, too?
which man page? i don't see a list of format codes in awk(1). it refers to fprintf(2) which hasn't been changed. (surely it should be print(2) ?)
should i just add a sentence saying that the quote format has been installed?
- james
Quoth steve@quintile.net:
>
> i see why you would want this but the argument against it was always to
> keep the plan9 version of awk compatible with Brian’s one-true-awk so
> awk code is truly portable.
>
> how much hassle is portability worth? good question.
>
> -Steve
yeah i suppose that's a sensible argument to make.
i don't think it is that big of a deal. perhaps it should be documented as non-standard in the man page?
- james
Quoth james palmer <james@biobuf.link>:
> this one comes in handy for building bulk rename scripts.
> patch for git/import attached.
>
> - james
>
I think it makes sense to add this, though I'm
not sure how much we want to diverge from upstream.
Ape would get this by accident, but it's a superset
of what posix needs.
going the other way would also be neat:
ls -l | awk -Q '{print $NF}'
Quoth ori@eigenstate.org:
> Quoth james palmer <james@biobuf.link>:
> > this one comes in handy for building bulk rename scripts.
> > patch for git/import attached.
> >
> > - james
> >
>
> I think it makes sense to add this, though I'm
> not sure how much we want to diverge from upstream.
>
> Ape would get this by accident, but it's a superset
> of what posix needs.
>
> going the other way would also be neat:
>
> ls -l | awk -Q '{print $NF}'
>
Here's a proof of concept -- though, I don't like how it
totally ignores FS.
diff fb2e0a1987b33083e3e08fa0659f99534c56d6aa uncommitted
--- a/sys/src/cmd/awk/awk.h
+++ b/sys/src/cmd/awk/awk.h
@@ -49,6 +49,7 @@
extern int donefld; /* 1 if record broken into fields */
extern int donerec; /* 1 if record is valid (no fld has changed */
extern char inputFS[]; /* FS at time of input, for field splitting */
+extern int quotefld; /* if we use quotes instead of FS for splitting */
extern int dbg;
--- a/sys/src/cmd/awk/lib.c
+++ b/sys/src/cmd/awk/lib.c
@@ -38,6 +38,7 @@
Cell **fldtab; /* pointers to Cells */
char inputFS[100] = " ";
+int quotefld;
#define MAXFLD 200
int nfields = MAXFLD; /* last allocated slot for $i */
@@ -242,7 +243,49 @@
dprint( ("command line set %s to |%s|\n", s, p) );
}
+char *unquoted(char *t, char **et)
+{
+ int quoting;
+ char *h, *s;
+ quoting = 0;
+ /* unquoting only shrinks s */
+ while (*t == ' ' || *t == '\t' || *t == '\n')
+ t++;
+ s = strdup(t);
+ h = s;
+ if (s == nil)
+ FATAL("out of space in tostring on %s", t);
+ while(*t!='\0'){
+ if(!quoting && (*t == ' ' || *t == '\t' || *t == '\n'))
+ break;
+ if(*t != '\''){
+ *s++ = *t++;
+ continue;
+ }
+ /* *t is a quote */
+ if(!quoting){
+ quoting = 1;
+ t++;
+ continue;
+ }
+ /* quoting and we're on a quote */
+ if(t[1] != '\''){
+ /* end of quoted section; absorb closing quote */
+ t++;
+ quoting = 0;
+ continue;
+ }
+ /* doubled quote; fold two quotes into one */
+ t++;
+ *s++ = *t++;
+
+ }
+ *s = 0;
+ *et = t;
+ return h;
+}
+
void fldbld(void) /* create fields from current record */
{
/* this relies on having fields[] the same length as $0 */
@@ -265,7 +308,21 @@
}
fr = fields;
i = 0; /* number of fields accumulated here */
- if (strlen(inputFS) > 1) { /* it's a regular expression */
+ if (quotefld){ /* it's quoted text */
+ for (i = 0; ; ) {
+ while (*r == ' ' || *r == '\t' || *r == '\n')
+ r++;
+ if (*r == 0)
+ break;
+ i++;
+ if (i > nfields)
+ growfldtab(i);
+ if (freeable(fldtab[i]))
+ xfree(fldtab[i]->sval);
+ fldtab[i]->sval = unquoted(r, &r);
+ fldtab[i]->tval = FLD | STR;
+ }
+ }else if (strlen(inputFS) > 1) { /* it's a regular expression */
i = refldbld(r, inputFS);
} else if (*inputFS == ' ') { /* default whitespace */
for (i = 0; ; ) {
--- a/sys/src/cmd/awk/main.c
+++ b/sys/src/cmd/awk/main.c
@@ -51,7 +51,7 @@
void main(int argc, char *argv[])
{
- char *fs = nil, *marg;
+ char *fs = nil, qs = 0, *marg;
int temp;
setfcr(getfcr() & ~FPINVAL);
@@ -103,6 +103,9 @@
if (fs == nil || *fs == '\0')
WARNING("field separator FS is empty");
break;
+ case 'Q':
+ qs = 1;
+ break;
case 'v': /* -v a=1 to be done NOW. one -v for each */
if (argv[1][2] == '\0' && --argc > 1 && isclvar((++argv)[1]))
setclvar(argv[1]);
@@ -158,6 +161,8 @@
dprint( ("argc=%d, argv[0]=%s\n", argc, argv[0]) );
arginit(argc, argv);
yyparse();
+ if (qs)
+ quotefld = 1;
if (fs)
*FS = qstring(fs, '\0');
dprint( ("exitstatus=%s\n", exitstatus) );
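
To try the field-splitting idea outside awk, the following is a minimal portable sketch of the same unquoting loop. It assumes the rules of the proof of concept above: fields are separated by space, tab or newline, a single quote opens or closes a quoted section, and a doubled quote inside a quoted section stands for one literal quote. The names isfieldsep and nextfield are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the separators the proof of concept skips. */
int isfieldsep(int c)
{
	return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: return a malloc'd, unquoted copy of the next
 * field in t and set *end to the first byte after it. Returns NULL
 * when there are no more fields or when allocation fails.
 */
char *nextfield(const char *t, const char **end)
{
	char *buf, *s;
	int quoting = 0;

	assert(t != NULL && end != NULL);
	while (isfieldsep(*t))
		t++;
	*end = t;
	if (*t == '\0')
		return NULL;
	buf = malloc(strlen(t) + 1);	/* unquoting only shrinks the text */
	if (buf == NULL)
		return NULL;
	s = buf;
	while (*t != '\0') {
		if (!quoting && isfieldsep(*t))
			break;			/* unquoted separator ends the field */
		if (*t != '\'') {
			*s++ = *t++;		/* ordinary byte */
		} else if (!quoting) {
			quoting = 1;		/* opening quote */
			t++;
		} else if (t[1] == '\'') {
			*s++ = '\'';		/* doubled quote: keep one */
			t += 2;
		} else {
			quoting = 0;		/* closing quote */
			t++;
		}
	}
	*s = '\0';
	*end = t;
	return buf;
}
```

Calling nextfield in a loop, feeding *end back in as t, walks a record one field at a time. Like the proof of concept, this sketch ignores FS entirely.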
Would a separate program be sufficient? A simple main wrapper around quote(). It'd keep awk's artwork intact, and having separate programs is what the Unix Way is all about, or so I heard.