tech@mandoc.bsd.lv
* implement .de, now with argument expansion
From: Ingo Schwarze @ 2010-11-24  0:31 UTC (permalink / raw)
  To: tech; +Cc: matthieu, jmc

Hi,

here is an updated version of the patch i sent yesterday.

I'm also cc:ing matthieu@ because this addresses one of the most
glaring issues we still have with the Xenocara manuals.
For example, it fixes XtVaCreateArgsList(3) containing stuff like

  .de ZN
  .ie t \fB\^\\$1\^\fR\\$2
  .el \fI\^\\$1\^\fP\\$2
  ..

and then

  .ZN XtVaNestedList .

to produce the following man(7) input code:

  \fI\^XtVaNestedList\^\fP.
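
As a rough illustration of the substitution step, here is a standalone
sketch.  It is not the code from the patch: expand_args(), arg_index()
and their parameters are made-up names, and unlike roff_userdef() it
sizes the result in one pass, copies in a second pass, and does not
rescan substituted text.  Arguments that were not supplied expand to
nothing.

```c
#include <stdlib.h>
#include <string.h>

/* Return the argument index for "\$N" at p (N in 1..9), else -1. */
static int
arg_index(const char *p)
{
	if (p[0] == '\\' && p[1] == '$' && p[2] >= '1' && p[2] <= '9')
		return p[2] - '1';
	return -1;
}

/*
 * Hypothetical helper: return a malloc'd copy of body in which each
 * \$1..\$9 is replaced by args[0..8].  Only the first nargs entries
 * of args are valid.  Returns NULL if allocation fails.
 */
char *
expand_args(const char *body, const char *const *args, int nargs)
{
	const char *p;
	char *out, *q;
	size_t len = 0;
	int i;

	/* First pass: measure the expanded text. */
	for (p = body; *p != '\0'; ) {
		if ((i = arg_index(p)) >= 0) {
			if (i < nargs)
				len += strlen(args[i]);
			p += 3;
		} else {
			len++;
			p++;
		}
	}

	if ((out = malloc(len + 1)) == NULL)
		return NULL;

	/* Second pass: copy plain text, substitute arguments. */
	for (p = body, q = out; *p != '\0'; ) {
		if ((i = arg_index(p)) >= 0) {
			if (i < nargs) {
				size_t n = strlen(args[i]);
				memcpy(q, args[i], n);
				q += n;
			}
			p += 3;
		} else
			*q++ = *p++;
	}
	*q = '\0';
	return out;
}
```

Called with the stored .el branch of ZN and the two arguments
"XtVaNestedList" and ".", this yields the man(7) line shown above.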

Mandoc still builds base and i don't see regressions (though more eyes
can't hurt).  Unless problems turn up, i'm planning to commit this
to OpenBSD on Thursday.

Changes since the first version of this patch:
To roff.c:
 - Implement argument expansion in roff_userdef().
To main.c (fixes for two regressions caused by the first patch):
 - Fix a bug that could make parsebuf() loop endlessly.
   Leave the for() loop when we hit a \0 byte in the blk buffer
   and the ln buffer has been fully handled as well.
 - Move lnn_start, which was local to parsebuf(), into
   struct curparse as the new member line.  The old logic was
   wrong, and the variable could end up being used uninitialized.

Yours,
  Ingo


Here is a copy of my explanation of the first version of the patch,
and after that, the current version of the patch itself:


----- Forwarded message from Ingo Schwarze <schwarze@usta.de> -----

From: Ingo Schwarze <schwarze@usta.de>
Date: Tue, 23 Nov 2010 00:29:55 +0100
To: tech@mdocml.bsd.lv
Cc: jmc@openbsd.org
Subject: implement .de

Hi,

as i hoped, i did find a bit of time to finish the first step of the
.de implementation that i started on the train back from Budapest.

I'm including Jason because this is relevant new functionality,
even though we don't need it for base.  In case we digress into
implementation technicalities, we can probably drop him from the Cc:.


The patch against OpenBSD shown below implements:

 - defining roff macros using

   .de mymacro
   definition
   ..

   The definition may be one or more lines.
   It may or may not contain macros; typically, it will.
   Custom end macros are also supported.

 - using roff macros as in

   .mymacro

   Variable macro arguments are not yet supported.
   When this goes in, variable arguments will be the next step.

 - defining roff strings using

   .ds mystring replacement

   This is not new, but still works.

 - using roff strings as in

   \*[mystring]

   This is not new, but still works.

 - using macros as strings as in

   \*[mymacro]

   For example,

.de mine
.I italic
.B bold
roman
..
.TH DSMULTI 1 "October 31, 2010"
.SH NAME
dsmulti \- multi-line string usage
.SH DESCRIPTION
prefix text (\*[mine]) postfix test
.nf
prefix text (\*[mine]) postfix test
.fi

 - using strings as macros as in

   .mystring

   Note that the next line will be appended to the expanded value.
   Here is my favourite:

.ds mine .S
.TH DSMACRO 1 "October 31, 2010"
.mine
H NAME
dsmacro \- abusing strings as macros
.mine
H DESCRIPTION
text
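
To make the "next line will be appended" behaviour concrete, here is a
tiny standalone sketch.  append_next_line() is a made-up name and not
part of the patch; it only shows which text the parser ends up seeing
when a single-line string is called as a macro.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: join the expanded value of a single-line
 * string with the following input line.  Returns malloc'd memory,
 * or NULL if allocation fails.
 */
char *
append_next_line(const char *expansion, const char *next)
{
	size_t elen = strlen(expansion);
	size_t nlen = strlen(next);
	char *out = malloc(elen + nlen + 1);

	if (out == NULL)
		return NULL;
	memcpy(out, expansion, elen);
	memcpy(out + elen, next, nlen + 1);	/* copies the '\0' too */
	return out;
}
```

With the example above, append_next_line(".S", "H NAME") gives
".SH NAME", which is then parsed as an ordinary macro line.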


Changes to roff.c:
 * Handling of macro definitions:
    - struct roffnode needs a new member *name for .de nodes:
      The name must be fed into roffnode_push(),
      and it must be freed in roffnode_pop().
    - roff_block[|_sub|_text]() get handling code for .de.
    - roff_setstr() needs an additional argument to distinguish
      between single-line strings (.ds) and multi-line strings (.de).
    - roff_strdup() becomes insufficient and obsolete
      and can be integrated into roff_setstr().
 * Handling of user-defined macros:
    - roff_hash_find() needs an additional size parameter
      because the names of user-defined macros can be of
      arbitrary length, so we cannot use a static mac[] buffer
      any longer.
    - enum rofft needs a new entry ROFF_USERDEF
      and we need a new handler function roff_userdef().
    - struct roff needs a new member *current_string
      because it is set in roff_parse() and used in roff_userdef();
      for the same reason, roff_parse() needs access to struct roff.
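
The multi-line storage described above can be sketched on its own as
follows.  This is a simplified illustration, not the patch's
roff_setstr(): append_copy_mode() is a made-up name, it handles only
the value and not the string table, and it always appends, so a caller
wanting .ds semantics would free and reset the value first.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: append src to *dst (which may be NULL),
 * collapsing each escaped backslash pair into a single backslash,
 * and add a '\n' if multiline is nonzero.
 * Returns 0 on success, -1 if allocation fails.
 */
int
append_copy_mode(char **dst, const char *src, int multiline)
{
	size_t oldlen = (*dst != NULL) ? strlen(*dst) : 0;
	/* Room for '\n' in multiline mode plus the terminating '\0'. */
	size_t need = oldlen + strlen(src) + (multiline ? 2 : 1);
	char *buf = realloc(*dst, need);
	char *c;

	if (buf == NULL)
		return -1;
	*dst = buf;
	c = buf + oldlen;	/* skip the existing content */

	while (*src != '\0') {
		/* Rudimentary copy mode: "\\" becomes "\". */
		if (src[0] == '\\' && src[1] == '\\')
			src++;
		*c++ = *src++;
	}
	if (multiline)
		*c++ = '\n';
	*c = '\0';
	return 0;
}
```

Feeding the body lines of a .de block through this one at a time
leaves a value in which every line ends in '\n', which is what later
distinguishes it from a .ds value.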

Changes to main.c:
 * Function parsebuf() was split out of pdesc()
   because ROFF_REPARSE needs to call this part recursively.
 * Handling of the roff_parseln() result becomes a switch
   because there are now many possibilities.

You may wonder why we now need three different reprocessing flags
instead of just ROFF_RERUN as in the past.  All three are
substantially different and all three are actually needed,
and i fail to see how any of them could be subsumed
by either of the other two:

ROFF_REPARSE must be used:
 - for \* string inclusion
   because strings defined with .de may contain multiple lines;
 - when calling user-defined macros defined with .de
   because these may contain multiple lines.

ROFF_APPEND must be used:
 - when calling user-defined macros defined with .ds.
   As these are defined with .ds, they never contain multiple lines,
   so not using ROFF_REPARSE is OK; as the next line must be appended,
   calling parsebuf() is not an option.

ROFF_RERUN must be used:
 - on user-defined end macros;
 - after evaluating conditionals.
In both cases, ROFF_REPARSE is not needed because we know there
is just a single line, and ROFF_APPEND would be wrong because we
do not want to append the next line.
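
For a called user-defined macro, the choice between the first two
flags comes down to whether the stored value ends in a newline.  A
minimal sketch of that test, using made-up names (enum sketch_action
and classify_expansion() are not in the patch; the real return values
are the ROFF_* members of enum rofferr):

```c
#include <string.h>

/* Hypothetical stand-ins for ROFF_REPARSE and ROFF_APPEND. */
enum sketch_action {
	SKETCH_REPARSE,	/* value holds complete lines: parse them all */
	SKETCH_APPEND	/* single-line value: next input line follows */
};

/*
 * Values built from .de blocks end in '\n' because each body line is
 * stored with one; values set with .ds never do.
 */
enum sketch_action
classify_expansion(const char *value)
{
	size_t len = strlen(value);

	if (len > 0 && value[len - 1] == '\n')
		return SKETCH_REPARSE;
	return SKETCH_APPEND;
}
```

So a value like ".I italic\n.B bold\nroman\n" is reparsed line by
line, while ".S" is left waiting for the next input line.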

Yours,
  Ingo

----- End forwarded message -----


Index: roff.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.h,v
retrieving revision 1.5
diff -u -p -r1.5 roff.h
--- roff.h	26 Oct 2010 22:28:57 -0000	1.5
+++ roff.h	23 Nov 2010 23:55:16 -0000
@@ -20,6 +20,8 @@
 enum	rofferr {
 	ROFF_CONT, /* continue processing line */
 	ROFF_RERUN, /* re-run roff interpreter with offset */
+	ROFF_APPEND, /* re-run main parser, appending next line */
+	ROFF_REPARSE, /* re-run main parser on the result */
 	ROFF_SO, /* include another file */
 	ROFF_IGN, /* ignore current line */
 	ROFF_ERR /* badness: puke and stop */
Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.15
diff -u -p -r1.15 roff.c
--- roff.c	26 Oct 2010 23:34:38 -0000	1.15
+++ roff.c	23 Nov 2010 23:55:17 -0000
@@ -7,9 +7,9 @@
  * purpose with or without fee is hereby granted, provided that the above
  * copyright notice and this permission notice appear in all copies.
  *
- * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHORS DISCLAIM ALL WARRANTIES
  * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
- * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR
  * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
  * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
@@ -54,6 +54,7 @@ enum	rofft {
 	ROFF_tr,
 	ROFF_cblock,
 	ROFF_ccond, /* FIXME: remove this. */
+	ROFF_USERDEF,
 	ROFF_MAX
 };
 
@@ -76,7 +77,8 @@ struct	roff {
 	enum roffrule	 rstack[RSTACK_MAX]; /* stack of !`ie' rules */
 	int		 rstackpos; /* position in rstack */
 	struct regset	*regs; /* read/writable registers */
-	struct roffstr	*first_string;
+	struct roffstr	*first_string; /* user-defined strings & macros */
+	const char	*current_string; /* value of last called user macro */
 };
 
 struct	roffnode {
@@ -84,6 +86,7 @@ struct	roffnode {
 	struct roffnode	*parent; /* up one in stack */
 	int		 line; /* parse line */
 	int		 col; /* parse col */
+	char		*name; /* node name, e.g. macro name */
 	char		*end; /* end-rules: custom token */
 	int		 endspan; /* end-rules: next-line or infty */
 	enum roffrule	 rule; /* current evaluation rule */
@@ -128,9 +131,9 @@ static	enum rofferr	 roff_nr(ROFF_ARGS);
 static	int		 roff_res(struct roff *, 
 				char **, size_t *, int);
 static	void		 roff_setstr(struct roff *,
-				const char *, const char *);
+				const char *, const char *, int);
 static	enum rofferr	 roff_so(ROFF_ARGS);
-static	char		*roff_strdup(const char *);
+static	enum rofferr	 roff_userdef(ROFF_ARGS);
 
 /* See roff_hash_find() */
 
@@ -158,16 +161,17 @@ static	struct roffmac	 roffs[ROFF_MAX] =
 	{ "tr", roff_line, NULL, NULL, 0, NULL },
 	{ ".", roff_cblock, NULL, NULL, 0, NULL },
 	{ "\\}", roff_ccond, NULL, NULL, 0, NULL },
+	{ NULL, roff_userdef, NULL, NULL, 0, NULL },
 };
 
 static	void		 roff_free1(struct roff *);
-static	enum rofft	 roff_hash_find(const char *);
+static	enum rofft	 roff_hash_find(const char *, size_t);
 static	void		 roff_hash_init(void);
 static	void		 roffnode_cleanscope(struct roff *);
-static	void		 roffnode_push(struct roff *, 
-				enum rofft, int, int);
+static	void		 roffnode_push(struct roff *, enum rofft,
+				const char *, int, int);
 static	void		 roffnode_pop(struct roff *);
-static	enum rofft	 roff_parse(const char *, int *);
+static	enum rofft	 roff_parse(struct roff *, const char *, int *);
 static	int		 roff_parse_nat(const char *, unsigned int *);
 
 /* See roff_hash_find() */
@@ -179,7 +183,7 @@ roff_hash_init(void)
 	struct roffmac	 *n;
 	int		  buc, i;
 
-	for (i = 0; i < (int)ROFF_MAX; i++) {
+	for (i = 0; i < (int)ROFF_USERDEF; i++) {
 		assert(roffs[i].name[0] >= ASCII_LO);
 		assert(roffs[i].name[0] <= ASCII_HI);
 
@@ -200,7 +204,7 @@ roff_hash_init(void)
  * the nil-terminated string name could be found.
  */
 static enum rofft
-roff_hash_find(const char *p)
+roff_hash_find(const char *p, size_t s)
 {
 	int		 buc;
 	struct roffmac	*n;
@@ -220,7 +224,7 @@ roff_hash_find(const char *p)
 	if (NULL == (n = hash[buc]))
 		return(ROFF_MAX);
 	for ( ; n; n = n->next)
-		if (0 == strcmp(n->name, p))
+		if (0 == strncmp(n->name, p, s) && '\0' == n->name[(int)s])
 			return((enum rofft)(n - roffs));
 
 	return(ROFF_MAX);
@@ -244,8 +248,8 @@ roffnode_pop(struct roff *r)
 			r->rstackpos--;
 
 	r->last = r->last->parent;
-	if (p->end)
-		free(p->end);
+	free(p->name);
+	free(p->end);
 	free(p);
 }
 
@@ -255,12 +259,15 @@ roffnode_pop(struct roff *r)
  * removed with roffnode_pop().
  */
 static void
-roffnode_push(struct roff *r, enum rofft tok, int line, int col)
+roffnode_push(struct roff *r, enum rofft tok, const char *name,
+		int line, int col)
 {
 	struct roffnode	*p;
 
 	p = mandoc_calloc(1, sizeof(struct roffnode));
 	p->tok = tok;
+	if (name)
+		p->name = mandoc_strdup(name);
 	p->parent = r->last;
 	p->line = line;
 	p->col = col;
@@ -392,7 +399,7 @@ roff_parseln(struct roff *r, int ln, cha
 	 */
 
 	if (r->first_string && ! roff_res(r, bufp, szp, pos))
-		return(ROFF_RERUN);
+		return(ROFF_REPARSE);
 
 	/*
 	 * First, if a scope is open and we're not a macro, pass the
@@ -429,7 +436,7 @@ roff_parseln(struct roff *r, int ln, cha
 	 */
 
 	ppos = pos;
-	if (ROFF_MAX == (t = roff_parse(*bufp, &pos)))
+	if (ROFF_MAX == (t = roff_parse(r, *bufp, &pos)))
 		return(ROFF_CONT);
 
 	assert(roffs[t].proc);
@@ -455,35 +462,28 @@ roff_endparse(struct roff *r)
  * form of ".foo xxx" in the usual way.
  */
 static enum rofft
-roff_parse(const char *buf, int *pos)
+roff_parse(struct roff *r, const char *buf, int *pos)
 {
-	int		 j;
-	char		 mac[5];
+	const char	*mac;
+	size_t		 maclen;
 	enum rofft	 t;
 
 	assert(ROFF_CTL(buf[*pos]));
 	(*pos)++;
 
-	while (buf[*pos] && (' ' == buf[*pos] || '\t' == buf[*pos]))
+	while (' ' == buf[*pos] || '\t' == buf[*pos])
 		(*pos)++;
 
 	if ('\0' == buf[*pos])
 		return(ROFF_MAX);
 
-	for (j = 0; j < 4; j++, (*pos)++)
-		if ('\0' == (mac[j] = buf[*pos]))
-			break;
-		else if (' ' == buf[*pos] || (j && '\\' == buf[*pos]))
-			break;
-
-	if (j == 4 || j < 1)
-		return(ROFF_MAX);
+	mac = buf + *pos;
+	maclen = strcspn(mac, " \\\t\0");
 
-	mac[j] = '\0';
-
-	if (ROFF_MAX == (t = roff_hash_find(mac)))
-		return(t);
+	t = (r->current_string = roff_getstrn(r, mac, maclen))
+	    ? ROFF_USERDEF : roff_hash_find(mac, maclen);
 
+	*pos += maclen;
 	while (buf[*pos] && ' ' == buf[*pos])
 		(*pos)++;
 
@@ -617,19 +617,32 @@ roff_block(ROFF_ARGS)
 {
 	int		sv;
 	size_t		sz;
+	char		*name;
 
-	if (ROFF_ig != tok && '\0' == (*bufp)[pos]) {
-		if ( ! (*r->msg)(MANDOCERR_NOARGS, r->data, ln, ppos, NULL))
-			return(ROFF_ERR);
-		return(ROFF_IGN);
-	} else if (ROFF_ig != tok) {
+	name = NULL;
+
+	if (ROFF_ig != tok) {
+		if ('\0' == (*bufp)[pos]) {
+			(*r->msg)(MANDOCERR_NOARGS, r->data, ln, ppos, NULL);
+			return(ROFF_IGN);
+		}
+		if (ROFF_de == tok)
+			name = *bufp + pos;
 		while ((*bufp)[pos] && ' ' != (*bufp)[pos])
 			pos++;
 		while (' ' == (*bufp)[pos])
-			pos++;
+			(*bufp)[pos++] = '\0';
 	}
 
-	roffnode_push(r, tok, ln, ppos);
+	roffnode_push(r, tok, name, ln, ppos);
+
+	/*
+	 * At the beginning of a `de' macro, clear the existing string
+	 * with the same name, if there is one.  New content will be
+	 * added from roff_block_text() in multiline mode.
+	 */
+	if (ROFF_de == tok)
+		roff_setstr(r, name, NULL, 0);
 
 	if ('\0' == (*bufp)[pos])
 		return(ROFF_IGN);
@@ -696,7 +709,7 @@ roff_block_sub(ROFF_ARGS)
 			roffnode_pop(r);
 			roffnode_cleanscope(r);
 
-			if (ROFF_MAX != roff_parse(*bufp, &pos))
+			if (ROFF_MAX != roff_parse(r, *bufp, &pos))
 				return(ROFF_RERUN);
 			return(ROFF_IGN);
 		}
@@ -708,11 +721,17 @@ roff_block_sub(ROFF_ARGS)
 	 */
 
 	ppos = pos;
-	t = roff_parse(*bufp, &pos);
+	t = roff_parse(r, *bufp, &pos);
 
-	/* If we're not a comment-end, then throw it away. */
-	if (ROFF_cblock != t)
+	/*
+	 * Macros other than block-end are only significant
+	 * in `de' blocks; elsewhere, simply throw them away.
+	 */
+	if (ROFF_cblock != t) {
+		if (ROFF_de == tok)
+			roff_setstr(r, r->last->name, *bufp + ppos, 1);
 		return(ROFF_IGN);
+	}
 
 	assert(roffs[t].proc);
 	return((*roffs[t].proc)(r, t, bufp, szp, 
@@ -725,6 +744,9 @@ static enum rofferr
 roff_block_text(ROFF_ARGS)
 {
 
+	if (ROFF_de == tok)
+		roff_setstr(r, r->last->name, *bufp + pos, 1);
+
 	return(ROFF_IGN);
 }
 
@@ -746,7 +768,7 @@ roff_cond_sub(ROFF_ARGS)
 
 	roffnode_cleanscope(r);
 
-	if (ROFF_MAX == (t = roff_parse(*bufp, &pos))) {
+	if (ROFF_MAX == (t = roff_parse(r, *bufp, &pos))) {
 		if ('\\' == (*bufp)[pos] && '}' == (*bufp)[pos + 1])
 			return(roff_ccond
 				(r, ROFF_ccond, bufp, szp,
@@ -880,7 +902,7 @@ roff_cond(ROFF_ARGS)
 		return(ROFF_ERR);
 	}
 
-	roffnode_push(r, tok, ln, ppos);
+	roffnode_push(r, tok, NULL, ln, ppos);
 
 	r->last->rule = rule;
 
@@ -967,7 +989,7 @@ roff_ds(ROFF_ARGS)
 		string++;
 
 	/* The rest is the value. */
-	roff_setstr(r, name, string);
+	roff_setstr(r, name, string, 0);
 	return(ROFF_IGN);
 }
 
@@ -1030,48 +1052,135 @@ roff_so(ROFF_ARGS)
 }
 
 
-static char *
-roff_strdup(const char *name)
+/* ARGSUSED */
+static enum rofferr
+roff_userdef(ROFF_ARGS)
 {
-	char		*namecopy, *sv;
+	const char	 *arg[9];
+	char		 *cp, *n1, *n2;
+	int		  i;
 
-	/* 
-	 * This isn't a nice simple mandoc_strdup() because we must
-	 * handle roff's stupid double-escape rule. 
+	/*
+	 * Collect pointers to macro argument strings
+	 * and null-terminate them.
 	 */
-	sv = namecopy = mandoc_malloc(strlen(name) + 1);
-	while (*name) {
-		if ('\\' == *name && '\\' == *(name + 1))
-			name++;
-		*namecopy++ = *name++;
+	cp = *bufp + pos;
+	for (i = 0; i < 9; i++) {
+		arg[i] = cp;
+		while ('\0' != *cp && ' ' != *cp)
+			cp++;
+		if ('\0' == *cp)
+			continue;
+		*cp++ = '\0';
+		while (' ' == *cp)
+			cp++;
 	}
 
-	*namecopy = '\0';
-	return(sv);
-}
+	/*
+	 * Expand macro arguments.
+	 */
+	*szp = 0;
+	n1 = cp = mandoc_strdup(r->current_string);
+	while (NULL != (cp = strstr(cp, "\\$"))) {
+		i = cp[2] - '1';
+		if (0 > i || 8 < i) {
+			/* Not an argument invocation. */
+			cp += 2;
+			continue;
+		}
 
+		*szp = strlen(n1) - 3 + strlen(arg[i]) + 1;
+		n2 = mandoc_malloc(*szp);
 
+		strlcpy(n2, n1, (size_t)(cp - n1 + 1));
+		strlcat(n2, arg[i], *szp);
+		strlcat(n2, cp + 3, *szp);
+
+		cp = n2 + (cp - n1);
+		free(n1);
+		n1 = n2;
+	}
+
+	/*
+	 * Replace the macro invocation
+	 * by the expanded macro.
+	 */
+	free(*bufp);
+	*bufp = n1;
+	if (0 == *szp)
+		*szp = strlen(*bufp) + 1;
+
+	return(*szp && '\n' == (*bufp)[(int)*szp - 2] ?
+	   ROFF_REPARSE : ROFF_APPEND);
+}
+
+/*
+ * Store *string into the user-defined string called *name.
+ * In multiline mode, append to an existing entry and append '\n';
+ * else replace the existing entry, if there is one.
+ * To clear an existing entry, call with (*r, *name, NULL, 0).
+ */
 static void
-roff_setstr(struct roff *r, const char *name, const char *string)
+roff_setstr(struct roff *r, const char *name, const char *string,
+	int multiline)
 {
 	struct roffstr	 *n;
-	char		 *namecopy;
+	char		 *c;
+	size_t		  oldch, newch;
 
+	/* Search for an existing string with the same name. */
 	n = r->first_string;
 	while (n && strcmp(name, n->name))
 		n = n->next;
 
 	if (NULL == n) {
-		namecopy = mandoc_strdup(name);
+		/* Create a new string table entry. */
 		n = mandoc_malloc(sizeof(struct roffstr));
-		n->name = namecopy;
+		n->name = mandoc_strdup(name);
+		n->string = NULL;
 		n->next = r->first_string;
 		r->first_string = n;
-	} else
+	} else if (0 == multiline) {
+		/* In multiline mode, append; else replace. */
 		free(n->string);
+		n->string = NULL;
+	}
+
+	if (NULL == string)
+		return;
+
+	/*
+	 * One additional byte for the '\n' in multiline mode,
+	 * and one for the terminating '\0'.
+	 */
+	newch = strlen(string) + (multiline ? 2 : 1);
+	if (NULL == n->string) {
+		n->string = mandoc_malloc(newch);
+		*n->string = '\0';
+		oldch = 0;
+	} else {
+		oldch = strlen(n->string);
+		n->string = mandoc_realloc(n->string, oldch + newch);
+	}
+
+	/* Skip existing content in the destination buffer. */
+	c = n->string + oldch;
+
+	/* Append new content to the destination buffer. */
+	while (*string) {
+		/*
+		 * Rudimentary roff copy mode:
+		 * Handle escaped backslashes.
+		 */
+		if ('\\' == *string && '\\' == *(string + 1))
+			string++;
+		*c++ = *string++;
+	}
 
-	/* Don't use mandoc_strdup: clean out double-escapes. */
-	n->string = string ? roff_strdup(string) : NULL;
+	/* Append terminating bytes. */
+	if (multiline)
+		*c++ = '\n';
+	*c = '\0';
 }
 
 
Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.54
diff -u -p -r1.54 main.c
--- main.c	26 Oct 2010 23:34:38 -0000	1.54
+++ main.c	23 Nov 2010 23:55:17 -0000
@@ -64,6 +64,7 @@ enum	outt {
 struct	curparse {
 	const char	 *file;		/* Current parse. */
 	int		  fd;		/* Current parse. */
+	int		  line;		/* Line number in the file. */
 	enum mandoclevel  wlevel;	/* Ignore messages below this. */
 	int		  wstop;	/* Stop after a file with a warning. */
 	enum intt	  inttype;	/* which parser to use */
@@ -190,10 +191,11 @@ static	const char * const	mandocerrs[MAN
 	"static buffer exhausted",
 };
 
+static	void		  parsebuf(struct curparse *, struct buf, int);
 static	void		  pdesc(struct curparse *);
 static	void		  fdesc(struct curparse *);
 static	void		  ffile(const char *, struct curparse *);
-static	int		  pfile(const char *, struct curparse *, int);
+static	int		  pfile(const char *, struct curparse *);
 static	int		  moptions(enum intt *, char *);
 static	int		  mmsg(enum mandocerr, void *, 
 				int, int, const char *);
@@ -320,7 +322,7 @@ ffile(const char *file, struct curparse 
 }
 
 static int
-pfile(const char *file, struct curparse *curp, int ln)
+pfile(const char *file, struct curparse *curp)
 {
 	const char	*savefile;
 	int		 fd, savefd;
@@ -552,20 +554,8 @@ fdesc(struct curparse *curp)
 static void
 pdesc(struct curparse *curp)
 {
-	struct buf	 ln, blk;
-	int		 i, pos, lnn, lnn_start, with_mmap, of;
-	enum rofferr	 re;
-	unsigned char	 c;
-	struct man	*man;
-	struct mdoc	*mdoc;
-	struct roff	*roff;
-
-	memset(&ln, 0, sizeof(struct buf));
-
-	/*
-	 * Two buffers: ln and buf.  buf is the input file and may be
-	 * memory mapped.  ln is a line buffer and grows on-demand.
-	 */
+	struct buf	 blk;
+	int		 with_mmap;
 
 	if ( ! read_whole_file(curp, &blk, &with_mmap)) {
 		exit_status = MANDOCLEVEL_SYSERR;
@@ -575,14 +565,42 @@ pdesc(struct curparse *curp)
 	if (NULL == curp->roff) 
 		curp->roff = roff_alloc(&curp->regs, curp, mmsg);
 	assert(curp->roff);
-	roff = curp->roff;
-	mdoc = curp->mdoc;
+
+	curp->line = 1;
+	parsebuf(curp, blk, 1);
+
+	if (with_mmap)
+		munmap(blk.buf, blk.sz);
+	else
+		free(blk.buf);
+}
+
+static void
+parsebuf(struct curparse *curp, struct buf blk, int start)
+{
+	struct buf	 ln;
+	int		 i, pos, lnn, of;
+	unsigned char	 c;
+	struct man	*man;
+	struct mdoc	*mdoc;
+	struct roff	*roff;
+
 	man  = curp->man;
+	mdoc = curp->mdoc;
+	roff = curp->roff;
 
-	for (i = 0, lnn = 1; i < (int)blk.sz;) {
-		pos = 0;
-		lnn_start = lnn;
-		while (i < (int)blk.sz) {
+	memset(&ln, 0, sizeof(struct buf));
+
+	lnn = curp->line;  /* line number in the real file */
+	pos = 0;  /* byte number in the ln buffer */
+
+	for (i = 0; i < (int)blk.sz;) {
+		if (0 == pos && '\0' == blk.buf[i])
+			break;
+		if (start)
+			curp->line = lnn;
+
+		while (i < (int)blk.sz && (start || '\0' != blk.buf[i])) {
 			if ('\n' == blk.buf[i]) {
 				++i;
 				++lnn;
@@ -601,7 +619,7 @@ pdesc(struct curparse *curp)
 			c = (unsigned char) blk.buf[i];
 			if ( ! (isascii(c) && (isgraph(c) || isblank(c)))) {
 				mmsg(MANDOCERR_BADCHAR, curp, 
-				    lnn_start, pos, "ignoring byte");
+				    curp->line, pos, "ignoring byte");
 				i++;
 				continue;
 			}
@@ -661,21 +679,32 @@ pdesc(struct curparse *curp)
 		 */
 
 		of = 0;
-		do {
-			re = roff_parseln(roff, lnn_start, 
-					&ln.buf, &ln.sz, of, &of);
-		} while (ROFF_RERUN == re);
-
-		if (ROFF_IGN == re) {
+rerun:
+		switch (roff_parseln(roff, curp->line, &ln.buf, &ln.sz,
+		    of, &of)) {
+		case (ROFF_REPARSE):
+			parsebuf(curp, ln, 0);
+			pos = 0;
 			continue;
-		} else if (ROFF_ERR == re) {
+		case (ROFF_APPEND):
+			pos = strlen(ln.buf);
+			continue;
+		case (ROFF_RERUN):
+			goto rerun;
+		case (ROFF_IGN):
+			pos = 0;
+			continue;
+		case (ROFF_ERR):
 			assert(MANDOCLEVEL_FATAL <= exit_status);
 			break;
-		} else if (ROFF_SO == re) {
-			if (pfile(ln.buf + of, curp, lnn_start))
+		case (ROFF_SO):
+			if (pfile(ln.buf + of, curp)) {
+				pos = 0;
 				continue;
-			else
+			} else
 				break;
+		case (ROFF_CONT):
+			break;
 		}
 
 		/*
@@ -690,21 +719,24 @@ pdesc(struct curparse *curp)
 
 		/* Lastly, push down into the parsers themselves. */
 
-		if (man && ! man_parseln(man, lnn_start, ln.buf, of)) {
+		if (man && ! man_parseln(man, curp->line, ln.buf, of)) {
 			assert(MANDOCLEVEL_FATAL <= exit_status);
 			break;
 		}
-		if (mdoc && ! mdoc_parseln(mdoc, lnn_start, ln.buf, of)) {
+		if (mdoc && ! mdoc_parseln(mdoc, curp->line, ln.buf, of)) {
 			assert(MANDOCLEVEL_FATAL <= exit_status);
 			break;
 		}
+
+		/* Temporary buffers typically are not full. */
+		if (0 == start && '\0' == blk.buf[i])
+			break;
+
+		/* Start the next input line. */
+		pos = 0;
 	}
 
 	free(ln.buf);
-	if (with_mmap)
-		munmap(blk.buf, blk.sz);
-	else
-		free(blk.buf);
 }
 
 