tech@mandoc.bsd.lv
* PATCH: mdoc_ptext rewrite
@ 2010-05-24 13:19 Ingo Schwarze
From: Ingo Schwarze @ 2010-05-24 13:19 UTC
  To: tech

Hi Kristaps,

As part of the resync, I would like to commit the following to bsd.lv:

> * Ingo has rewritten the main mdoc text parser, mdoc_ptext(),
>   making it easier to understand and fixing various bugs.
>   It now correctly strips white space from the end of text
>   lines (in literal mode, tab characters as well), and it
>   issues consistent warnings about trailing spaces and tabs
>   on text lines.  Besides, escaped backslashes no longer
>   escape the following character.
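To illustrate these rules, here is a hedged, stand-alone sketch of the scanning loop from the patch below, lifted out of mdoc_ptext(). The helper name strip_trailing() and its int literal parameter (standing in for the MDOC_LITERAL test on m->flags) are hypothetical and not part of mandoc; issuing the warnings is left to the caller, which would report MANDOCERR_EOLNSPACE at the returned position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the loop in mdoc_ptext().
 * Truncates buf in place after the last character to be output
 * and returns a pointer to the start of unescaped trailing
 * whitespace, or NULL if there is none.  "literal" stands in
 * for (MDOC_LITERAL & m->flags).
 */
static char *
strip_trailing(char *buf, int literal)
{
	char	*c, *ws, *end;

	assert(NULL != buf);
	ws = NULL;
	for (c = end = buf; *c; c++) {
		switch (*c) {
		case ' ':
			/* Remember where a run of spaces starts. */
			if (NULL == ws)
				ws = c;
			continue;
		case '\t':
			/* Always warn about tabs ... */
			if (NULL == ws)
				ws = c;
			/* ... but only strip them in literal mode. */
			if (literal)
				continue;
			break;
		case '\\':
			/* Consume the escaped character, if any, so an
			 * escaped backslash escapes nothing else. */
			if (c[1])
				c++;
			/* FALLTHROUGH */
		default:
			ws = NULL;
			break;
		}
		end = c + 1;	/* last character to be output */
	}
	*end = '\0';
	return(ws);
}
```

For the input line `foo\\ `, an escaped backslash followed by a space, the sketch returns a pointer to the space and truncates the line to `foo\\`. The old code saw a backslash right before the space, treated the space as escaped, and neither warned nor stripped it.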

The patch included below has been updated to work with all changes
that happened in both trees.  It has been tested in the OpenBSD tree
and compiles in the bsd.lv tree, and the output of the binary built
there looks good to me, too.

All the same, as this is somewhat intrusive, I would appreciate
your explicit testing and OK before I commit it to bsd.lv.

This patch is also available from
  /usr/vhosts/mdocml.bsd.lv/patch/schwarze/04.mdoc_ptext-rewrite.patch

Thanks,
  Ingo


--- mdoc.c	Tue May 18 02:05:38 2010
+++ mdoc.c	Mon May 24 00:45:00 2010
@@ -542,7 +538,7 @@ mdoc_node_delete(struct mdoc *m, struct mdoc_node *p)
 static int
 mdoc_ptext(struct mdoc *m, int line, char *buf, int offs)
 {
-	int		 i;
+	char		*c, *ws, *end;
 
 	/* Ignore bogus comments. */
 
@@ -556,18 +552,51 @@ mdoc_ptext(struct mdoc *m, int line, char *buf, int of
 	if (SEC_NONE == m->lastnamed)
 		return(mdoc_pmsg(m, line, offs, MANDOCERR_NOTEXT));
 
-	/* Literal just gets pulled in as-is. */
-	
-	if (MDOC_LITERAL & m->flags)
-		return(mdoc_word_alloc(m, line, offs, buf + offs));
+	/*
+	 * Search for the beginning of unescaped trailing whitespace (ws)
+	 * and for the first character not to be output (end).
+	 */
+	ws = NULL;
+	for (c = end = buf + offs; *c; c++) {
+		switch (*c) {
+		case ' ':
+			if (NULL == ws)
+				ws = c;
+			continue;
+		case '\t':
+			/*
+			 * Always warn about trailing tabs,
+			 * even outside literal context,
+			 * where they should be put on the next line.
+			 */
+			if (NULL == ws)
+				ws = c;
+			/*
+			 * Strip trailing tabs in literal context only;
+			 * outside, they affect the next line.
+			 */
+			if (MDOC_LITERAL & m->flags)
+				continue;
+			break;
+		case '\\':
+			/* Skip the escaped character, too, if any. */
+			if (c[1])
+				c++;
+			/* FALLTHROUGH */
+		default:
+			ws = NULL;
+			break;
+		}
+		end = c + 1;
+	}
+	*end = '\0';
 
-	/* Check for a blank line, which may also consist of spaces. */
+	if (ws)
+		if ( ! mdoc_pmsg(m, line, (int)(ws-buf), MANDOCERR_EOLNSPACE))
+			return(0);
 
-	for (i = offs; ' ' == buf[i]; i++)
-		/* Skip to first non-space. */ ;
-
-	if ('\0' == buf[i]) {
-		if ( ! mdoc_pmsg(m, line, offs, MANDOCERR_NOBLANKLN))
+	if ('\0' == buf[offs] && ! (MDOC_LITERAL & m->flags)) {
+		if ( ! mdoc_pmsg(m, line, (int)(c-buf), MANDOCERR_NOBLANKLN))
 			return(0);
 
 		/*
@@ -582,41 +611,21 @@ mdoc_ptext(struct mdoc *m, int line, char *buf, int of
 		return(1);
 	}
 
-	/* 
-	 * Warn if the last un-escaped character is whitespace. Then
-	 * strip away the remaining spaces (tabs stay!).   
-	 */
-
-	i = (int)strlen(buf);
-	assert(i);
-
-	if (' ' == buf[i - 1] || '\t' == buf[i - 1]) {
-		if (i > 1 && '\\' != buf[i - 2])
-			if ( ! mdoc_pmsg(m, line, i - 1, MANDOCERR_EOLNSPACE))
-				return(0);
-
-		for (--i; i && ' ' == buf[i]; i--)
-			/* Spin back to non-space. */ ;
-
-		/* Jump ahead of escaped whitespace. */
-		i += '\\' == buf[i] ? 2 : 1;
-
-		buf[i] = '\0';
-	}
-
-	/* Allocate the whole word. */
-
-	if ( ! mdoc_word_alloc(m, line, offs, buf + offs))
+	if ( ! mdoc_word_alloc(m, line, offs, buf+offs))
 		return(0);
 
+	if (MDOC_LITERAL & m->flags)
+		return(1);
+
 	/*
 	 * End-of-sentence check.  If the last character is an unescaped
 	 * EOS character, then flag the node as being the end of a
 	 * sentence.  The front-end will know how to interpret this.
 	 */
 
-	assert(i);
-	if (mandoc_eos(buf, (size_t)i))
+	assert(buf < end);
+
+	if (mandoc_eos(buf+offs, (size_t)(end-buf-offs)))
 		m->last->flags |= MDOC_EOS;
 
 	return(1);
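The last hunk now hands mandoc_eos() the start of the text (buf+offs) and its length after trailing whitespace has been cut off (end-buf-offs), so the sentence check looks at the last character that is actually output. mandoc_eos() itself is not part of this patch. The hypothetical ends_sentence() below is only a simplified sketch of such a check, not the real function, assuming that '.', '!' and '?' end a sentence, that closing delimiters such as ')' may follow them, and that an odd number of preceding backslashes escapes the punctuation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified end-of-sentence test; NOT mandoc_eos().
 * Returns 1 if the first len bytes of p end in '.', '!' or '?',
 * optionally followed by closing delimiters, and that punctuation
 * is not escaped by an odd number of preceding backslashes.
 */
static int
ends_sentence(const char *p, size_t len)
{
	size_t	 i, bs;

	/* Only look at bytes that belong to the string. */
	assert(len <= strlen(p));

	/* Skip trailing closing delimiters such as ")" or "\"". */
	while (len > 0 && NULL != strchr(")]\"'", p[len - 1]))
		len--;

	if (0 == len || NULL == strchr(".!?", p[len - 1]))
		return(0);

	/* Count backslashes directly before the punctuation. */
	for (bs = 0, i = len - 1; i > 0 && '\\' == p[i - 1]; i--)
		bs++;

	/* An odd count means the punctuation itself is escaped. */
	return(0 == bs % 2);
}
```

For example, ends_sentence("Hello. ", 6) returns 1 while ends_sentence("Hello. ", 7) returns 0, which is why passing the length of the stripped text matters.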

Thread overview: 6+ messages
2010-05-24 13:19 PATCH: mdoc_ptext rewrite Ingo Schwarze
2010-05-24 13:38 ` Kristaps Dzonsons
2010-05-24 13:59   ` Kristaps Dzonsons
2010-05-24 14:12     ` Ingo Schwarze
2010-05-24 14:18       ` Kristaps Dzonsons
2010-05-24 14:27         ` Ingo Schwarze
