From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Cc: jmc@openbsd.org
Subject: implement .de
Date: Tue, 23 Nov 2010 00:29:55 +0100 [thread overview]
Message-ID: <20101122232955.GA17247@iris.usta.de> (raw)
Hi,
as i hoped, i did find a bit of time to finish the first step of the
.de implementation that i started on the train back from Budapest.
I'm including Jason because this is relevant new functionality,
even though we don't need it for base. In case we digress into
implementation technicalities, we can probably drop him from the Cc:.
The patch against OpenBSD shown below implements:
- defining roff macros using
.de mymacro
definition
..
The definition may be one or more lines.
It may or may not contain macros; typically, it will.
Custom end macros are also supported.
- using roff macros as in
.mymacro
Variable macro arguments are not yet supported.
When this goes in, variable arguments will be the next step.
- defining roff strings using
.ds mystring replacement
This is not new, but still works.
- using roff strings as in
\*[mystring]
This is not new, but still works.
- using macros as strings as in
\*[mymacro]
For example,
.de mine
.I italic
.B bold
roman
..
.TH DSMULTI 1 "October 31, 2010"
.SH NAME
dsmulti \- multi-line string usage
.SH DESCRIPTION
prefix text (\*[mine]) postfix test
.nf
prefix text (\*[mine]) postfix test
.fi
- using strings as macros as in
.mystring
Note that the next line will be appended to the expanded value.
Here is my favourite:
.ds mine .S
.TH DSMACRO 1 "October 31, 2010"
.mine
H NAME
dsmacro \- abusing strings as macros
.mine
H DESCRIPTION
text
Changes to roff.c:
* Handling of macro definitions:
- struct roffnode needs a new member *name for .de nodes:
The name must be fed into roffnode_push(),
and it must be freed in roffnode_pop().
- roff_block[|_sub|_text]() get handling code for .de.
- roff_setstr() needs an additional argument to distinguish
between single-line strings (.ds) and multi-line strings (.de).
- roff_strdup() becomes insufficient and obsolete
and can be integrated into roff_setstr().
* Handling of user-defined macros:
- roff_hash_find() needs an additional size parameter
because user-defined macros can be of arbitrary length,
so we cannot use a static mac[] buffer any longer.
- enum rofft needs a new entry ROFF_USERDEF
and we need a new handler function roff_userdef().
- struct roff needs a new member *current_string
because it is set in roff_parse() and used in roff_userdef();
for the same reason, roff_parse() needs access to struct roff.
Changes to main.c:
* Function parsebuf() was split out of pdesc()
because ROFF_REPARSE needs to call this part recursively.
* Handling of the roff_parseln() result becomes a switch
because there are now many possibilities.
You may wonder why we need so many different reprocessing flags
instead of just ROFF_RERUN in the past. Indeed, all three are
substantially different and all three are actually needed,
and i fail to see how i could subsume any of them
by either of the two others:
ROFF_REPARSE must be used:
- for \* string inclusion
because strings defined with .de may contain multiple lines;
- when calling user-defined macros defined with .de
because these may contain multiple lines.
ROFF_APPEND must be used:
- when calling user-defined macros defined with .ds.
As these are defined with .ds, they never contain multiple lines,
so not using ROFF_REPARSE is OK; as the next line must be appended,
calling parsebuf() is not an option.
ROFF_RERUN must be used:
- on user-defined end macros;
- after evaluating conditionals.
- In both cases, ROFF_REPARSE is not needed because we know there
is just a single line, and ROFF_APPEND would be wrong because we
do not want to append the next line.
Yours,
Ingo
Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.54
diff -u -p -r1.54 main.c
--- main.c 26 Oct 2010 23:34:38 -0000 1.54
+++ main.c 22 Nov 2010 23:19:18 -0000
@@ -190,6 +190,7 @@ static const char * const mandocerrs[MAN
"static buffer exhausted",
};
+static void parsebuf(struct curparse *, struct buf, int);
static void pdesc(struct curparse *);
static void fdesc(struct curparse *);
static void ffile(const char *, struct curparse *);
@@ -552,20 +553,8 @@ fdesc(struct curparse *curp)
static void
pdesc(struct curparse *curp)
{
- struct buf ln, blk;
- int i, pos, lnn, lnn_start, with_mmap, of;
- enum rofferr re;
- unsigned char c;
- struct man *man;
- struct mdoc *mdoc;
- struct roff *roff;
-
- memset(&ln, 0, sizeof(struct buf));
-
- /*
- * Two buffers: ln and buf. buf is the input file and may be
- * memory mapped. ln is a line buffer and grows on-demand.
- */
+ struct buf blk;
+ int with_mmap;
if ( ! read_whole_file(curp, &blk, &with_mmap)) {
exit_status = MANDOCLEVEL_SYSERR;
@@ -575,14 +564,39 @@ pdesc(struct curparse *curp)
if (NULL == curp->roff)
curp->roff = roff_alloc(&curp->regs, curp, mmsg);
assert(curp->roff);
- roff = curp->roff;
- mdoc = curp->mdoc;
+
+ parsebuf(curp, blk, 1);
+
+ if (with_mmap)
+ munmap(blk.buf, blk.sz);
+ else
+ free(blk.buf);
+}
+
+static void
+parsebuf(struct curparse *curp, struct buf blk, int start)
+{
+ struct buf ln;
+ int i, pos, lnn, lnn_start, of;
+ unsigned char c;
+ struct man *man;
+ struct mdoc *mdoc;
+ struct roff *roff;
+
man = curp->man;
+ mdoc = curp->mdoc;
+ roff = curp->roff;
- for (i = 0, lnn = 1; i < (int)blk.sz;) {
- pos = 0;
- lnn_start = lnn;
- while (i < (int)blk.sz) {
+ memset(&ln, 0, sizeof(struct buf));
+
+ lnn = 1; /* line number in the blk buffer */
+ pos = 0; /* byte number in the ln buffer */
+
+ for (i = 0; i < (int)blk.sz;) {
+ if (start)
+ lnn_start = lnn;
+
+ while (i < (int)blk.sz && (start || '\0' != blk.buf[i])) {
if ('\n' == blk.buf[i]) {
++i;
++lnn;
@@ -661,21 +675,32 @@ pdesc(struct curparse *curp)
*/
of = 0;
- do {
- re = roff_parseln(roff, lnn_start,
- &ln.buf, &ln.sz, of, &of);
- } while (ROFF_RERUN == re);
-
- if (ROFF_IGN == re) {
+rerun:
+ switch (roff_parseln(roff, lnn_start, &ln.buf, &ln.sz,
+ of, &of)) {
+ case (ROFF_REPARSE):
+ parsebuf(curp, ln, 0);
+ pos = 0;
continue;
- } else if (ROFF_ERR == re) {
+ case (ROFF_APPEND):
+ pos = strlen(ln.buf);
+ continue;
+ case (ROFF_RERUN):
+ goto rerun;
+ case (ROFF_IGN):
+ pos = 0;
+ continue;
+ case (ROFF_ERR):
assert(MANDOCLEVEL_FATAL <= exit_status);
break;
- } else if (ROFF_SO == re) {
- if (pfile(ln.buf + of, curp, lnn_start))
+ case (ROFF_SO):
+ if (pfile(ln.buf + of, curp, lnn_start)) {
+ pos = 0;
continue;
- else
+ } else
break;
+ case (ROFF_CONT):
+ break;
}
/*
@@ -698,13 +723,16 @@ pdesc(struct curparse *curp)
assert(MANDOCLEVEL_FATAL <= exit_status);
break;
}
+
+ /* Temporary buffers typically are not full. */
+ if (0 == start && '\0' == blk.buf[i])
+ break;
+
+ /* Start the next input line. */
+ pos = 0;
}
free(ln.buf);
- if (with_mmap)
- munmap(blk.buf, blk.sz);
- else
- free(blk.buf);
}
Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.15
diff -u -p -r1.15 roff.c
--- roff.c 26 Oct 2010 23:34:38 -0000 1.15
+++ roff.c 22 Nov 2010 23:19:18 -0000
@@ -7,9 +7,9 @@
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
- * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHORS DISCLAIM ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
- * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
@@ -54,6 +54,7 @@ enum rofft {
ROFF_tr,
ROFF_cblock,
ROFF_ccond, /* FIXME: remove this. */
+ ROFF_USERDEF,
ROFF_MAX
};
@@ -76,7 +77,8 @@ struct roff {
enum roffrule rstack[RSTACK_MAX]; /* stack of !`ie' rules */
int rstackpos; /* position in rstack */
struct regset *regs; /* read/writable registers */
- struct roffstr *first_string;
+ struct roffstr *first_string; /* user-defined strings & macros */
+ const char *current_string; /* value of last called user macro */
};
struct roffnode {
@@ -84,6 +86,7 @@ struct roffnode {
struct roffnode *parent; /* up one in stack */
int line; /* parse line */
int col; /* parse col */
+ char *name; /* node name, e.g. macro name */
char *end; /* end-rules: custom token */
int endspan; /* end-rules: next-line or infty */
enum roffrule rule; /* current evaluation rule */
@@ -128,9 +131,9 @@ static enum rofferr roff_nr(ROFF_ARGS);
static int roff_res(struct roff *,
char **, size_t *, int);
static void roff_setstr(struct roff *,
- const char *, const char *);
+ const char *, const char *, int);
static enum rofferr roff_so(ROFF_ARGS);
-static char *roff_strdup(const char *);
+static enum rofferr roff_userdef(ROFF_ARGS);
/* See roff_hash_find() */
@@ -158,16 +161,17 @@ static struct roffmac roffs[ROFF_MAX] =
{ "tr", roff_line, NULL, NULL, 0, NULL },
{ ".", roff_cblock, NULL, NULL, 0, NULL },
{ "\\}", roff_ccond, NULL, NULL, 0, NULL },
+ { NULL, roff_userdef, NULL, NULL, 0, NULL },
};
static void roff_free1(struct roff *);
-static enum rofft roff_hash_find(const char *);
+static enum rofft roff_hash_find(const char *, size_t);
static void roff_hash_init(void);
static void roffnode_cleanscope(struct roff *);
-static void roffnode_push(struct roff *,
- enum rofft, int, int);
+static void roffnode_push(struct roff *, enum rofft,
+ const char *, int, int);
static void roffnode_pop(struct roff *);
-static enum rofft roff_parse(const char *, int *);
+static enum rofft roff_parse(struct roff *, const char *, int *);
static int roff_parse_nat(const char *, unsigned int *);
/* See roff_hash_find() */
@@ -179,7 +183,7 @@ roff_hash_init(void)
struct roffmac *n;
int buc, i;
- for (i = 0; i < (int)ROFF_MAX; i++) {
+ for (i = 0; i < (int)ROFF_USERDEF; i++) {
assert(roffs[i].name[0] >= ASCII_LO);
assert(roffs[i].name[0] <= ASCII_HI);
@@ -200,7 +204,7 @@ roff_hash_init(void)
* the nil-terminated string name could be found.
*/
static enum rofft
-roff_hash_find(const char *p)
+roff_hash_find(const char *p, size_t s)
{
int buc;
struct roffmac *n;
@@ -220,7 +224,7 @@ roff_hash_find(const char *p)
if (NULL == (n = hash[buc]))
return(ROFF_MAX);
for ( ; n; n = n->next)
- if (0 == strcmp(n->name, p))
+ if (0 == strncmp(n->name, p, s) && '\0' == n->name[(int)s])
return((enum rofft)(n - roffs));
return(ROFF_MAX);
@@ -244,8 +248,8 @@ roffnode_pop(struct roff *r)
r->rstackpos--;
r->last = r->last->parent;
- if (p->end)
- free(p->end);
+ free(p->name);
+ free(p->end);
free(p);
}
@@ -255,12 +259,15 @@ roffnode_pop(struct roff *r)
* removed with roffnode_pop().
*/
static void
-roffnode_push(struct roff *r, enum rofft tok, int line, int col)
+roffnode_push(struct roff *r, enum rofft tok, const char *name,
+ int line, int col)
{
struct roffnode *p;
p = mandoc_calloc(1, sizeof(struct roffnode));
p->tok = tok;
+ if (name)
+ p->name = mandoc_strdup(name);
p->parent = r->last;
p->line = line;
p->col = col;
@@ -392,7 +399,7 @@ roff_parseln(struct roff *r, int ln, cha
*/
if (r->first_string && ! roff_res(r, bufp, szp, pos))
- return(ROFF_RERUN);
+ return(ROFF_REPARSE);
/*
* First, if a scope is open and we're not a macro, pass the
@@ -429,7 +436,7 @@ roff_parseln(struct roff *r, int ln, cha
*/
ppos = pos;
- if (ROFF_MAX == (t = roff_parse(*bufp, &pos)))
+ if (ROFF_MAX == (t = roff_parse(r, *bufp, &pos)))
return(ROFF_CONT);
assert(roffs[t].proc);
@@ -455,35 +462,28 @@ roff_endparse(struct roff *r)
* form of ".foo xxx" in the usual way.
*/
static enum rofft
-roff_parse(const char *buf, int *pos)
+roff_parse(struct roff *r, const char *buf, int *pos)
{
- int j;
- char mac[5];
+ const char *mac;
+ size_t maclen;
enum rofft t;
assert(ROFF_CTL(buf[*pos]));
(*pos)++;
- while (buf[*pos] && (' ' == buf[*pos] || '\t' == buf[*pos]))
+ while (' ' == buf[*pos] || '\t' == buf[*pos])
(*pos)++;
if ('\0' == buf[*pos])
return(ROFF_MAX);
- for (j = 0; j < 4; j++, (*pos)++)
- if ('\0' == (mac[j] = buf[*pos]))
- break;
- else if (' ' == buf[*pos] || (j && '\\' == buf[*pos]))
- break;
-
- if (j == 4 || j < 1)
- return(ROFF_MAX);
-
- mac[j] = '\0';
+ mac = buf + *pos;
+ maclen = strcspn(mac, " \\\t\0");
- if (ROFF_MAX == (t = roff_hash_find(mac)))
- return(t);
+ t = (r->current_string = roff_getstrn(r, mac, maclen))
+ ? ROFF_USERDEF : roff_hash_find(mac, maclen);
+ *pos += maclen;
while (buf[*pos] && ' ' == buf[*pos])
(*pos)++;
@@ -617,19 +617,32 @@ roff_block(ROFF_ARGS)
{
int sv;
size_t sz;
+ char *name;
- if (ROFF_ig != tok && '\0' == (*bufp)[pos]) {
- if ( ! (*r->msg)(MANDOCERR_NOARGS, r->data, ln, ppos, NULL))
- return(ROFF_ERR);
- return(ROFF_IGN);
- } else if (ROFF_ig != tok) {
+ name = NULL;
+
+ if (ROFF_ig != tok) {
+ if ('\0' == (*bufp)[pos]) {
+ (*r->msg)(MANDOCERR_NOARGS, r->data, ln, ppos, NULL);
+ return(ROFF_IGN);
+ }
+ if (ROFF_de == tok)
+ name = *bufp + pos;
while ((*bufp)[pos] && ' ' != (*bufp)[pos])
pos++;
while (' ' == (*bufp)[pos])
- pos++;
+ (*bufp)[pos++] = '\0';
}
- roffnode_push(r, tok, ln, ppos);
+ roffnode_push(r, tok, name, ln, ppos);
+
+ /*
+ * At the beginning of a `de' macro, clear the existing string
+ * with the same name, if there is one. New content will be
+ * added from roff_block_text() in multiline mode.
+ */
+ if (ROFF_de == tok)
+ roff_setstr(r, name, NULL, 0);
if ('\0' == (*bufp)[pos])
return(ROFF_IGN);
@@ -696,7 +709,7 @@ roff_block_sub(ROFF_ARGS)
roffnode_pop(r);
roffnode_cleanscope(r);
- if (ROFF_MAX != roff_parse(*bufp, &pos))
+ if (ROFF_MAX != roff_parse(r, *bufp, &pos))
return(ROFF_RERUN);
return(ROFF_IGN);
}
@@ -708,11 +721,17 @@ roff_block_sub(ROFF_ARGS)
*/
ppos = pos;
- t = roff_parse(*bufp, &pos);
+ t = roff_parse(r, *bufp, &pos);
- /* If we're not a comment-end, then throw it away. */
- if (ROFF_cblock != t)
+ /*
+ * Macros other than block-end are only significant
+ * in `de' blocks; elsewhere, simply throw them away.
+ */
+ if (ROFF_cblock != t) {
+ if (ROFF_de == tok)
+ roff_setstr(r, r->last->name, *bufp + ppos, 1);
return(ROFF_IGN);
+ }
assert(roffs[t].proc);
return((*roffs[t].proc)(r, t, bufp, szp,
@@ -725,6 +744,9 @@ static enum rofferr
roff_block_text(ROFF_ARGS)
{
+ if (ROFF_de == tok)
+ roff_setstr(r, r->last->name, *bufp + pos, 1);
+
return(ROFF_IGN);
}
@@ -746,7 +768,7 @@ roff_cond_sub(ROFF_ARGS)
roffnode_cleanscope(r);
- if (ROFF_MAX == (t = roff_parse(*bufp, &pos))) {
+ if (ROFF_MAX == (t = roff_parse(r, *bufp, &pos))) {
if ('\\' == (*bufp)[pos] && '}' == (*bufp)[pos + 1])
return(roff_ccond
(r, ROFF_ccond, bufp, szp,
@@ -880,7 +902,7 @@ roff_cond(ROFF_ARGS)
return(ROFF_ERR);
}
- roffnode_push(r, tok, ln, ppos);
+ roffnode_push(r, tok, NULL, ln, ppos);
r->last->rule = rule;
@@ -967,7 +989,7 @@ roff_ds(ROFF_ARGS)
string++;
/* The rest is the value. */
- roff_setstr(r, name, string);
+ roff_setstr(r, name, string, 0);
return(ROFF_IGN);
}
@@ -1030,48 +1052,86 @@ roff_so(ROFF_ARGS)
}
-static char *
-roff_strdup(const char *name)
+/* ARGSUSED */
+static enum rofferr
+roff_userdef(ROFF_ARGS)
{
- char *namecopy, *sv;
- /*
- * This isn't a nice simple mandoc_strdup() because we must
- * handle roff's stupid double-escape rule.
- */
- sv = namecopy = mandoc_malloc(strlen(name) + 1);
- while (*name) {
- if ('\\' == *name && '\\' == *(name + 1))
- name++;
- *namecopy++ = *name++;
- }
+ free(*bufp);
+ *bufp = mandoc_strdup(r->current_string);
+ *szp = strlen(*bufp);
- *namecopy = '\0';
- return(sv);
+ return(*szp && '\n' == (*bufp)[(int)*szp - 1] ?
+ ROFF_REPARSE : ROFF_APPEND);
}
-
+/*
+ * Store *string into the user-defined string called *name.
+ * In multiline mode, append to an existing entry and append '\n';
+ * else replace the existing entry, if there is one.
+ * To clear an existing entry, call with (*r, *name, NULL, 0).
+ */
static void
-roff_setstr(struct roff *r, const char *name, const char *string)
+roff_setstr(struct roff *r, const char *name, const char *string,
+ int multiline)
{
struct roffstr *n;
- char *namecopy;
+ char *c;
+ size_t oldch, newch;
+ /* Search for an existing string with the same name. */
n = r->first_string;
while (n && strcmp(name, n->name))
n = n->next;
if (NULL == n) {
- namecopy = mandoc_strdup(name);
+ /* Create a new string table entry. */
n = mandoc_malloc(sizeof(struct roffstr));
- n->name = namecopy;
+ n->name = mandoc_strdup(name);
+ n->string = NULL;
n->next = r->first_string;
r->first_string = n;
- } else
+ } else if (0 == multiline) {
+ /* In multiline mode, append; else replace. */
free(n->string);
+ n->string = NULL;
+ }
+
+ if (NULL == string)
+ return;
+
+ /*
+ * One additional byte for the '\n' in multiline mode,
+ * and one for the terminating '\0'.
+ */
+ newch = strlen(string) + (multiline ? 2 : 1);
+ if (NULL == n->string) {
+ n->string = mandoc_malloc(newch);
+ *n->string = '\0';
+ oldch = 0;
+ } else {
+ oldch = strlen(n->string);
+ n->string = mandoc_realloc(n->string, oldch + newch);
+ }
+
+ /* Skip existing content in the destination buffer. */
+ c = n->string + oldch;
+
+ /* Append new content to the destination buffer. */
+ while (*string) {
+ /*
+ * Rudimentary roff copy mode:
+ * Handle escaped backslashes.
+ */
+ if ('\\' == *string && '\\' == *(string + 1))
+ string++;
+ *c++ = *string++;
+ }
- /* Don't use mandoc_strdup: clean out double-escapes. */
- n->string = string ? roff_strdup(string) : NULL;
+ /* Append terminating bytes. */
+ if (multiline)
+ *c++ = '\n';
+ *c = '\0';
}
Index: roff.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.h,v
retrieving revision 1.5
diff -u -p -r1.5 roff.h
--- roff.h 26 Oct 2010 22:28:57 -0000 1.5
+++ roff.h 22 Nov 2010 23:19:18 -0000
@@ -20,6 +20,8 @@
enum rofferr {
ROFF_CONT, /* continue processing line */
ROFF_RERUN, /* re-run roff interpreter with offset */
+ ROFF_APPEND, /* re-run main parser, appending next line */
+ ROFF_REPARSE, /* re-run main parser on the result */
ROFF_SO, /* include another file */
ROFF_IGN, /* ignore current line */
ROFF_ERR /* badness: puke and stop */
--
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
reply other threads:[~2010-11-22 23:30 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101122232955.GA17247@iris.usta.de \
--to=schwarze@usta.de \
--cc=jmc@openbsd.org \
--cc=tech@mdocml.bsd.lv \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).