From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp1.rz.uni-karlsruhe.de (Debian-exim@smtp1.rz.uni-karlsruhe.de [129.13.185.217])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id oAMNTwfO008410
	for ; Mon, 22 Nov 2010 18:30:01 -0500 (EST)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
	by smtp1.rz.uni-karlsruhe.de with esmtp (Exim 4.63 #1)
	id 1PKfpc-0007Yl-OD; Tue, 23 Nov 2010 00:29:56 +0100
Received: from donnerwolke.usta.de ([172.24.96.3])
	by hekate.usta.de with esmtp (Exim 4.72) (envelope-from )
	id 1PKfpc-0001td-Kz; Tue, 23 Nov 2010 00:29:56 +0100
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
	by donnerwolke.usta.de with esmtp (Exim 4.69) (envelope-from )
	id 1PKfpc-0002zt-GV; Tue, 23 Nov 2010 00:29:56 +0100
Received: from schwarze by usta.de with local (Exim 4.72) (envelope-from )
	id 1PKfpc-0007Bk-2q; Tue, 23 Nov 2010 00:29:56 +0100
Date: Tue, 23 Nov 2010 00:29:55 +0100
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Cc: jmc@openbsd.org
Subject: implement .de
Message-ID: <20101122232955.GA17247@iris.usta.de>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi,

as i hoped, i did find a bit of time to finish the first step
of the .de implementation that i started on the train back from
Budapest.

I'm including Jason because this is relevant new functionality,
even though we don't need it for base.  In case we digress into
implementation technicalities, we can probably drop him from the Cc:.

The patch against OpenBSD shown below implements:

 - defining roff macros using
     .de mymacro
     definition
     ..
   The definition may be one or more lines.
   It may or may not contain macros; typically, it will.
   Custom end macros are also supported.

 - using roff macros as in
     .mymacro
   Variable macro arguments are not yet supported.
   When this goes in, variable arguments will be the next step.

 - defining roff strings using
     .ds mystring replacement
   This is not new, but still works.

 - using roff strings as in
     \*[mystring]
   This is not new, but still works.

 - using macros as strings as in
     \*[mymacro]
   For example,

     .de mine
     .I italic
     .B bold
     roman
     ..
     .TH DSMULTI 1 "October 31, 2010"
     .SH NAME
     dsmulti \- multi-line string usage
     .SH DESCRIPTION
     prefix text (\*[mine]) postfix test
     .nf
     prefix text (\*[mine]) postfix test
     .fi

 - using strings as macros as in
     .mystring
   Note that the next line will be appended to the expanded value.
   Here is my favourite:

     .ds mine .S
     .TH DSMACRO 1 "October 31, 2010"
     .mine
     H NAME
     dsmacro \- abusing strings as macros
     .mine
     H DESCRIPTION
     text

Changes to roff.c:

 * Handling of macro definitions:
   - struct roffnode needs a new member *name for .de nodes:
     The name must be fed into roffnode_push(),
     and it must be freed in roffnode_pop().
   - roff_block[|_sub|_text]() get handling code for .de.
   - roff_setstr() needs an additional argument to distinguish
     between single-line strings (.ds) and multi-line strings (.de).
   - roff_strdup() becomes insufficient and obsolete
     and can be integrated into roff_setstr().

 * Handling of user-defined macros:
   - roff_hash_find() needs an additional size parameter
     because user-defined macros can be of arbitrary length,
     so we cannot use a static mac[] buffer any longer.
   - enum rofft needs a new entry ROFF_USERDEF
     and we need a new handler function roff_userdef().
   - struct roff needs a new member *current_string
     because it is set in roff_parse() and used in roff_userdef();
     for the same reason, roff_parse() needs access to struct roff.

Changes to main.c:

 * Function parsebuf() was split out of pdesc()
   because ROFF_REPARSE needs to call this part recursively.
 * Handling of the roff_parseln() result becomes a switch
   because there are now many possibilities.

You may wonder why we need so many different reprocessing flags
instead of just ROFF_RERUN in the past.
Indeed, all three are substantially different and all three are
actually needed, and i fail to see how i could subsume any of them
by either of the two others:

ROFF_REPARSE must be used:
 - for \* string inclusion because strings defined with .de
   may contain multiple lines;
 - when calling user-defined macros defined with .de
   because these may contain multiple lines.

ROFF_APPEND must be used:
 - when calling user-defined macros defined with .ds.
   As these are defined with .ds, they never contain multiple lines,
   so not using ROFF_REPARSE is OK; as the next line must be appended,
   calling parsebuf() is not an option.

ROFF_RERUN must be used:
 - on user-defined end macros;
 - after evaluating conditionals.
   In both cases, ROFF_REPARSE is not needed because we know there
   is just a single line, and ROFF_APPEND would be wrong because
   we do not want to append the next line.

Yours,
  Ingo


Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.54
diff -u -p -r1.54 main.c
--- main.c	26 Oct 2010 23:34:38 -0000	1.54
+++ main.c	22 Nov 2010 23:19:18 -0000
@@ -190,6 +190,7 @@ static const char * const mandocerrs[MAN
 	"static buffer exhausted",
 };
 
+static void parsebuf(struct curparse *, struct buf, int);
 static void pdesc(struct curparse *);
 static void fdesc(struct curparse *);
 static void ffile(const char *, struct curparse *);
@@ -552,20 +553,8 @@ fdesc(struct curparse *curp)
 static void
 pdesc(struct curparse *curp)
 {
-	struct buf ln, blk;
-	int i, pos, lnn, lnn_start, with_mmap, of;
-	enum rofferr re;
-	unsigned char c;
-	struct man *man;
-	struct mdoc *mdoc;
-	struct roff *roff;
-
-	memset(&ln, 0, sizeof(struct buf));
-
-	/*
-	 * Two buffers: ln and buf.  buf is the input file and may be
-	 * memory mapped.  ln is a line buffer and grows on-demand.
-	 */
+	struct buf blk;
+	int with_mmap;
 
 	if ( ! read_whole_file(curp, &blk, &with_mmap)) {
 		exit_status = MANDOCLEVEL_SYSERR;
@@ -575,14 +564,39 @@ pdesc(struct curparse *curp)
 	if (NULL == curp->roff)
 		curp->roff = roff_alloc(&curp->regs, curp, mmsg);
 	assert(curp->roff);
-	roff = curp->roff;
-	mdoc = curp->mdoc;
+
+	parsebuf(curp, blk, 1);
+
+	if (with_mmap)
+		munmap(blk.buf, blk.sz);
+	else
+		free(blk.buf);
+}
+
+static void
+parsebuf(struct curparse *curp, struct buf blk, int start)
+{
+	struct buf ln;
+	int i, pos, lnn, lnn_start, of;
+	unsigned char c;
+	struct man *man;
+	struct mdoc *mdoc;
+	struct roff *roff;
+
 	man = curp->man;
+	mdoc = curp->mdoc;
+	roff = curp->roff;
 
-	for (i = 0, lnn = 1; i < (int)blk.sz;) {
-		pos = 0;
-		lnn_start = lnn;
-		while (i < (int)blk.sz) {
+	memset(&ln, 0, sizeof(struct buf));
+
+	lnn = 1; /* line number in the blk buffer */
+	pos = 0; /* byte number in the ln buffer */
+
+	for (i = 0; i < (int)blk.sz;) {
+		if (start)
+			lnn_start = lnn;
+
+		while (i < (int)blk.sz && (start || '\0' != blk.buf[i])) {
 			if ('\n' == blk.buf[i]) {
 				++i;
 				++lnn;
@@ -661,21 +675,32 @@ pdesc(struct curparse *curp)
 		 */
 
 		of = 0;
-		do {
-			re = roff_parseln(roff, lnn_start,
-					&ln.buf, &ln.sz, of, &of);
-		} while (ROFF_RERUN == re);
-
-		if (ROFF_IGN == re) {
+rerun:
+		switch (roff_parseln(roff, lnn_start, &ln.buf, &ln.sz,
+		    of, &of)) {
+		case (ROFF_REPARSE):
+			parsebuf(curp, ln, 0);
+			pos = 0;
 			continue;
-		} else if (ROFF_ERR == re) {
+		case (ROFF_APPEND):
+			pos = strlen(ln.buf);
+			continue;
+		case (ROFF_RERUN):
+			goto rerun;
+		case (ROFF_IGN):
+			pos = 0;
+			continue;
+		case (ROFF_ERR):
 			assert(MANDOCLEVEL_FATAL <= exit_status);
 			break;
-		} else if (ROFF_SO == re) {
-			if (pfile(ln.buf + of, curp, lnn_start))
+		case (ROFF_SO):
+			if (pfile(ln.buf + of, curp, lnn_start)) {
+				pos = 0;
 				continue;
-			else
+			} else
 				break;
+		case (ROFF_CONT):
+			break;
 		}
 
 		/*
@@ -698,13 +723,16 @@ pdesc(struct curparse *curp)
 			assert(MANDOCLEVEL_FATAL <= exit_status);
 			break;
 		}
+
+		/* Temporary buffers typically are not full. */
+		if (0 == start && '\0' == blk.buf[i])
+			break;
+
+		/* Start the next input line. */
+		pos = 0;
 	}
 
 	free(ln.buf);
-	if (with_mmap)
-		munmap(blk.buf, blk.sz);
-	else
-		free(blk.buf);
 }
 
 
Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.15
diff -u -p -r1.15 roff.c
--- roff.c	26 Oct 2010 23:34:38 -0000	1.15
+++ roff.c	22 Nov 2010 23:19:18 -0000
@@ -7,9 +7,9 @@
  * purpose with or without fee is hereby granted, provided that the above
  * copyright notice and this permission notice appear in all copies.
  *
- * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHORS DISCLAIM ALL WARRANTIES
  * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
- * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR
  * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
  * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
@@ -54,6 +54,7 @@ enum rofft {
 	ROFF_tr,
 	ROFF_cblock,
 	ROFF_ccond, /* FIXME: remove this. */
+	ROFF_USERDEF,
 	ROFF_MAX
 };
 
@@ -76,7 +77,8 @@ struct roff {
 	enum roffrule rstack[RSTACK_MAX]; /* stack of !`ie' rules */
 	int rstackpos; /* position in rstack */
 	struct regset *regs; /* read/writable registers */
-	struct roffstr *first_string;
+	struct roffstr *first_string; /* user-defined strings & macros */
+	const char *current_string; /* value of last called user macro */
 };
 
 struct roffnode {
@@ -84,6 +86,7 @@ struct roffnode {
 	struct roffnode *parent; /* up one in stack */
 	int line; /* parse line */
 	int col; /* parse col */
+	char *name; /* node name, e.g. macro name */
 	char *end; /* end-rules: custom token */
 	int endspan; /* end-rules: next-line or infty */
 	enum roffrule rule; /* current evaluation rule */
@@ -128,9 +131,9 @@ static enum rofferr roff_nr(ROFF_ARGS);
 static int roff_res(struct roff *,
 			char **, size_t *, int);
 static void roff_setstr(struct roff *,
-			const char *, const char *);
+			const char *, const char *, int);
 static enum rofferr roff_so(ROFF_ARGS);
-static char *roff_strdup(const char *);
+static enum rofferr roff_userdef(ROFF_ARGS);
 
 /* See roff_hash_find() */
 
@@ -158,16 +161,17 @@ static struct roffmac roffs[ROFF_MAX] =
 	{ "tr", roff_line, NULL, NULL, 0, NULL },
 	{ ".", roff_cblock, NULL, NULL, 0, NULL },
 	{ "\\}", roff_ccond, NULL, NULL, 0, NULL },
+	{ NULL, roff_userdef, NULL, NULL, 0, NULL },
 };
 
 static void roff_free1(struct roff *);
-static enum rofft roff_hash_find(const char *);
+static enum rofft roff_hash_find(const char *, size_t);
 static void roff_hash_init(void);
 static void roffnode_cleanscope(struct roff *);
-static void roffnode_push(struct roff *,
-			enum rofft, int, int);
+static void roffnode_push(struct roff *, enum rofft,
+			const char *, int, int);
 static void roffnode_pop(struct roff *);
-static enum rofft roff_parse(const char *, int *);
+static enum rofft roff_parse(struct roff *, const char *, int *);
 static int roff_parse_nat(const char *, unsigned int *);
 
 /* See roff_hash_find() */
@@ -179,7 +183,7 @@ roff_hash_init(void)
 	struct roffmac *n;
 	int buc, i;
 
-	for (i = 0; i < (int)ROFF_MAX; i++) {
+	for (i = 0; i < (int)ROFF_USERDEF; i++) {
 		assert(roffs[i].name[0] >= ASCII_LO);
 		assert(roffs[i].name[0] <= ASCII_HI);
 
@@ -200,7 +204,7 @@ roff_hash_init(void)
  * the nil-terminated string name could be found.
  */
 static enum rofft
-roff_hash_find(const char *p)
+roff_hash_find(const char *p, size_t s)
 {
 	int buc;
 	struct roffmac *n;
@@ -220,7 +224,7 @@ roff_hash_find(const char *p)
 	if (NULL == (n = hash[buc]))
 		return(ROFF_MAX);
 	for ( ; n; n = n->next)
-		if (0 == strcmp(n->name, p))
+		if (0 == strncmp(n->name, p, s) && '\0' == n->name[(int)s])
 			return((enum rofft)(n - roffs));
 
 	return(ROFF_MAX);
@@ -244,8 +248,8 @@ roffnode_pop(struct roff *r)
 		r->rstackpos--;
 
 	r->last = r->last->parent;
-	if (p->end)
-		free(p->end);
+	free(p->name);
+	free(p->end);
 	free(p);
 }
 
@@ -255,12 +259,15 @@ roffnode_pop(struct roff *r)
  * removed with roffnode_pop().
  */
 static void
-roffnode_push(struct roff *r, enum rofft tok, int line, int col)
+roffnode_push(struct roff *r, enum rofft tok, const char *name,
+		int line, int col)
 {
 	struct roffnode *p;
 
 	p = mandoc_calloc(1, sizeof(struct roffnode));
 	p->tok = tok;
+	if (name)
+		p->name = mandoc_strdup(name);
 	p->parent = r->last;
 	p->line = line;
 	p->col = col;
@@ -392,7 +399,7 @@ roff_parseln(struct roff *r, int ln, cha
 	 */
 
 	if (r->first_string && ! roff_res(r, bufp, szp, pos))
-		return(ROFF_RERUN);
+		return(ROFF_REPARSE);
 
 	/*
 	 * First, if a scope is open and we're not a macro, pass the
@@ -429,7 +436,7 @@ roff_parseln(struct roff *r, int ln, cha
 	 */
 
 	ppos = pos;
-	if (ROFF_MAX == (t = roff_parse(*bufp, &pos)))
+	if (ROFF_MAX == (t = roff_parse(r, *bufp, &pos)))
 		return(ROFF_CONT);
 
 	assert(roffs[t].proc);
@@ -455,35 +462,28 @@ roff_endparse(struct roff *r)
  * form of ".foo xxx" in the usual way.
  */
 static enum rofft
-roff_parse(const char *buf, int *pos)
+roff_parse(struct roff *r, const char *buf, int *pos)
 {
-	int j;
-	char mac[5];
+	const char *mac;
+	size_t maclen;
 	enum rofft t;
 
 	assert(ROFF_CTL(buf[*pos]));
 	(*pos)++;
 
-	while (buf[*pos] && (' ' == buf[*pos] || '\t' == buf[*pos]))
+	while (' ' == buf[*pos] || '\t' == buf[*pos])
 		(*pos)++;
 
 	if ('\0' == buf[*pos])
 		return(ROFF_MAX);
 
-	for (j = 0; j < 4; j++, (*pos)++)
-		if ('\0' == (mac[j] = buf[*pos]))
-			break;
-		else if (' ' == buf[*pos] || (j && '\\' == buf[*pos]))
-			break;
-
-	if (j == 4 || j < 1)
-		return(ROFF_MAX);
-
-	mac[j] = '\0';
+	mac = buf + *pos;
+	maclen = strcspn(mac, " \\\t\0");
 
-	if (ROFF_MAX == (t = roff_hash_find(mac)))
-		return(t);
+	t = (r->current_string = roff_getstrn(r, mac, maclen))
+	    ? ROFF_USERDEF : roff_hash_find(mac, maclen);
 
+	*pos += maclen;
 	while (buf[*pos] && ' ' == buf[*pos])
 		(*pos)++;
 
@@ -617,19 +617,32 @@ roff_block(ROFF_ARGS)
 {
 	int sv;
 	size_t sz;
+	char *name;
 
-	if (ROFF_ig != tok && '\0' == (*bufp)[pos]) {
-		if ( ! (*r->msg)(MANDOCERR_NOARGS, r->data, ln, ppos, NULL))
-			return(ROFF_ERR);
-		return(ROFF_IGN);
-	} else if (ROFF_ig != tok) {
+	name = NULL;
+
+	if (ROFF_ig != tok) {
+		if ('\0' == (*bufp)[pos]) {
+			(*r->msg)(MANDOCERR_NOARGS, r->data, ln, ppos, NULL);
+			return(ROFF_IGN);
+		}
+		if (ROFF_de == tok)
+			name = *bufp + pos;
 		while ((*bufp)[pos] && ' ' != (*bufp)[pos])
 			pos++;
 		while (' ' == (*bufp)[pos])
-			pos++;
+			(*bufp)[pos++] = '\0';
 	}
 
-	roffnode_push(r, tok, ln, ppos);
+	roffnode_push(r, tok, name, ln, ppos);
+
+	/*
+	 * At the beginning of a `de' macro, clear the existing string
+	 * with the same name, if there is one.  New content will be
+	 * added from roff_block_text() in multiline mode.
+	 */
+	if (ROFF_de == tok)
+		roff_setstr(r, name, NULL, 0);
 
 	if ('\0' == (*bufp)[pos])
 		return(ROFF_IGN);
@@ -696,7 +709,7 @@ roff_block_sub(ROFF_ARGS)
 		roffnode_pop(r);
 		roffnode_cleanscope(r);
 
-		if (ROFF_MAX != roff_parse(*bufp, &pos))
+		if (ROFF_MAX != roff_parse(r, *bufp, &pos))
 			return(ROFF_RERUN);
 		return(ROFF_IGN);
 	}
@@ -708,11 +721,17 @@ roff_block_sub(ROFF_ARGS)
 	 */
 
 	ppos = pos;
-	t = roff_parse(*bufp, &pos);
+	t = roff_parse(r, *bufp, &pos);
 
-	/* If we're not a comment-end, then throw it away. */
-	if (ROFF_cblock != t)
+	/*
+	 * Macros other than block-end are only significant
+	 * in `de' blocks; elsewhere, simply throw them away.
+	 */
+	if (ROFF_cblock != t) {
+		if (ROFF_de == tok)
+			roff_setstr(r, r->last->name, *bufp + ppos, 1);
 		return(ROFF_IGN);
+	}
 
 	assert(roffs[t].proc);
 	return((*roffs[t].proc)(r, t, bufp, szp,
@@ -725,6 +744,9 @@ static enum rofferr
 roff_block_text(ROFF_ARGS)
 {
 
+	if (ROFF_de == tok)
+		roff_setstr(r, r->last->name, *bufp + pos, 1);
+
 	return(ROFF_IGN);
 }
 
@@ -746,7 +768,7 @@ roff_cond_sub(ROFF_ARGS)
 
 	roffnode_cleanscope(r);
 
-	if (ROFF_MAX == (t = roff_parse(*bufp, &pos))) {
+	if (ROFF_MAX == (t = roff_parse(r, *bufp, &pos))) {
 		if ('\\' == (*bufp)[pos] && '}' == (*bufp)[pos + 1])
 			return(roff_ccond
 				(r, ROFF_ccond, bufp, szp,
@@ -880,7 +902,7 @@ roff_cond(ROFF_ARGS)
 		return(ROFF_ERR);
 	}
 
-	roffnode_push(r, tok, ln, ppos);
+	roffnode_push(r, tok, NULL, ln, ppos);
 
 	r->last->rule = rule;
 
@@ -967,7 +989,7 @@ roff_ds(ROFF_ARGS)
 		string++;
 
 	/* The rest is the value. */
-	roff_setstr(r, name, string);
+	roff_setstr(r, name, string, 0);
 
 	return(ROFF_IGN);
 }
@@ -1030,48 +1052,86 @@ roff_so(ROFF_ARGS)
 }
 
 
-static char *
-roff_strdup(const char *name)
+/* ARGSUSED */
+static enum rofferr
+roff_userdef(ROFF_ARGS)
 {
-	char *namecopy, *sv;
 
-	/*
-	 * This isn't a nice simple mandoc_strdup() because we must
-	 * handle roff's stupid double-escape rule.
-	 */
-	sv = namecopy = mandoc_malloc(strlen(name) + 1);
-	while (*name) {
-		if ('\\' == *name && '\\' == *(name + 1))
-			name++;
-		*namecopy++ = *name++;
-	}
+	free(*bufp);
+	*bufp = mandoc_strdup(r->current_string);
+	*szp = strlen(*bufp);
 
-	*namecopy = '\0';
-	return(sv);
+	return(*szp && '\n' == (*bufp)[(int)*szp - 1] ?
+	    ROFF_REPARSE : ROFF_APPEND);
 }
 
-
+/*
+ * Store *string into the user-defined string called *name.
+ * In multiline mode, append to an existing entry and append '\n';
+ * else replace the existing entry, if there is one.
+ * To clear an existing entry, call with (*r, *name, NULL, 0).
+ */
 static void
-roff_setstr(struct roff *r, const char *name, const char *string)
+roff_setstr(struct roff *r, const char *name, const char *string,
+	int multiline)
 {
 	struct roffstr *n;
-	char *namecopy;
+	char *c;
+	size_t oldch, newch;
 
+	/* Search for an existing string with the same name. */
 	n = r->first_string;
 	while (n && strcmp(name, n->name))
 		n = n->next;
 
 	if (NULL == n) {
-		namecopy = mandoc_strdup(name);
+		/* Create a new string table entry. */
 		n = mandoc_malloc(sizeof(struct roffstr));
-		n->name = namecopy;
+		n->name = mandoc_strdup(name);
+		n->string = NULL;
 		n->next = r->first_string;
 		r->first_string = n;
-	} else
+	} else if (0 == multiline) {
+		/* In multiline mode, append; else replace. */
 		free(n->string);
+		n->string = NULL;
+	}
+
+	if (NULL == string)
+		return;
+
+	/*
+	 * One additional byte for the '\n' in multiline mode,
+	 * and one for the terminating '\0'.
+	 */
+	newch = strlen(string) + (multiline ? 2 : 1);
+	if (NULL == n->string) {
+		n->string = mandoc_malloc(newch);
+		*n->string = '\0';
+		oldch = 0;
+	} else {
+		oldch = strlen(n->string);
+		n->string = mandoc_realloc(n->string, oldch + newch);
+	}
+
+	/* Skip existing content in the destination buffer. */
+	c = n->string + oldch;
+
+	/* Append new content to the destination buffer. */
+	while (*string) {
+		/*
+		 * Rudimentary roff copy mode:
+		 * Handle escaped backslashes.
+		 */
+		if ('\\' == *string && '\\' == *(string + 1))
+			string++;
+		*c++ = *string++;
+	}
 
-	/* Don't use mandoc_strdup: clean out double-escapes. */
-	n->string = string ? roff_strdup(string) : NULL;
+	/* Append terminating bytes. */
+	if (multiline)
+		*c++ = '\n';
+	*c = '\0';
 }
 
 
Index: roff.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.h,v
retrieving revision 1.5
diff -u -p -r1.5 roff.h
--- roff.h	26 Oct 2010 22:28:57 -0000	1.5
+++ roff.h	22 Nov 2010 23:19:18 -0000
@@ -20,6 +20,8 @@
 enum rofferr {
 	ROFF_CONT, /* continue processing line */
 	ROFF_RERUN, /* re-run roff interpreter with offset */
+	ROFF_APPEND, /* re-run main parser, appending next line */
+	ROFF_REPARSE, /* re-run main parser on the result */
 	ROFF_SO, /* include another file */
 	ROFF_IGN, /* ignore current line */
 	ROFF_ERR /* badness: puke and stop */
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
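
For anyone who wants to try the storage rules outside of mandoc, here
is a minimal standalone sketch; it is not part of the patch.  It mirrors
the roff_setstr() behaviour described above (replace in single-line
mode, append plus '\n' in multiline mode, collapse \\ to \) and the
roff_userdef() rule that a trailing newline selects ROFF_REPARSE over
ROFF_APPEND.  The names struct sketch_str, sketch_setstr(),
sketch_dispatch() and the SKETCH_* constants are hypothetical, the
table is reduced to a single entry, and plain realloc(3) stands in for
the mandoc_* allocation wrappers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-entry stand-in for the roffstr list in struct roff. */
struct sketch_str {
	char *string;	/* accumulated value, or NULL when cleared */
};

/*
 * Store or append `string` into *n.
 * multiline == 0: replace the old value (NULL string just clears it).
 * multiline != 0: append to the old value and add a trailing '\n'.
 * Returns 0 on success, -1 if allocation fails (an assumption; the
 * patch uses mandoc_malloc/mandoc_realloc instead).
 */
static int
sketch_setstr(struct sketch_str *n, const char *string, int multiline)
{
	char *c, *grown;
	size_t oldch, newch;

	assert(n != NULL);

	if (!multiline) {
		free(n->string);
		n->string = NULL;
	}
	if (string == NULL)
		return 0;

	/* One byte for '\n' in multiline mode, one for '\0'. */
	newch = strlen(string) + (multiline ? 2 : 1);
	oldch = n->string ? strlen(n->string) : 0;

	/* realloc(NULL, size) behaves like malloc(size). */
	grown = realloc(n->string, oldch + newch);
	if (grown == NULL)
		return -1;
	n->string = grown;

	/* Copy after the existing content, collapsing "\\" to "\". */
	c = n->string + oldch;
	while (*string) {
		if (string[0] == '\\' && string[1] == '\\')
			string++;
		*c++ = *string++;
	}
	if (multiline)
		*c++ = '\n';
	*c = '\0';
	return 0;
}

/* Hypothetical mirror of the two results roff_userdef() can return. */
enum sketch_action { SKETCH_APPEND, SKETCH_REPARSE };

/* A value ending in '\n' came from .de and needs a full reparse. */
static enum sketch_action
sketch_dispatch(const char *value)
{
	size_t len = strlen(value);

	return (len && value[len - 1] == '\n') ?
	    SKETCH_REPARSE : SKETCH_APPEND;
}
```

Feeding ".I italic" and ".B bold" in multiline mode builds a value that
ends in a newline, so sketch_dispatch() asks for a reparse; storing ".S"
in single-line mode leaves no newline, so the caller would append the
next input line instead.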