tech@mandoc.bsd.lv
* PATCH: mdoc_ptext rewrite
@ 2010-05-24 13:19 Ingo Schwarze
  2010-05-24 13:38 ` Kristaps Dzonsons
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Schwarze @ 2010-05-24 13:19 UTC (permalink / raw)
  To: tech

Hi Kristaps,

as part of the resync, i should like to commit the following to bsd.lv:

> * Ingo has rewritten the main mdoc text parser, mdoc_ptext(),
>   making it easier to understand and fixing various bugs.
>   It now correctly strips whitespace from the end of text
>   lines (in literal mode, tab characters as well), and it
>   issues consistent warnings about trailing spaces and tabs
>   on text lines.  In addition, escaped backslashes no longer
>   escape the following character.
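
To make the trailing-whitespace rules in this changelog entry concrete, here is a stand-alone sketch that mirrors the scanning loop from the patch. It is an illustration, not the committed code: strip_trailing_ws() is a hypothetical name, the literal argument stands in for (MDOC_LITERAL & m->flags), and the mdoc_pmsg() warning is reduced to returning the offset where unescaped trailing whitespace begins (or -1).

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the scanning loop from the
 * patched mdoc_ptext().  It truncates buf right after the last
 * character to be output and returns the offset at which unescaped
 * trailing whitespace begins, or -1 if there is none.  A non-zero
 * "literal" stands in for (MDOC_LITERAL & m->flags); a caller would
 * emit the trailing-whitespace warning when the result is >= 0.
 */
static int
strip_trailing_ws(char *buf, int literal)
{
	char	*c, *ws, *end;

	assert(NULL != buf);

	ws = NULL;
	for (c = end = buf; *c; c++) {
		switch (*c) {
		case ' ':
			/* Remember where a run of whitespace starts. */
			if (NULL == ws)
				ws = c;
			continue;
		case '\t':
			if (NULL == ws)
				ws = c;
			/* Tabs are stripped in literal mode only. */
			if (literal)
				continue;
			break;
		case '\\':
			/* An escape also consumes the next character. */
			if (c[1])
				c++;
			/* FALLTHROUGH */
		default:
			ws = NULL;
			break;
		}
		/* Keep everything up to and including this character. */
		end = c + 1;
	}
	*end = '\0';
	return NULL == ws ? -1 : (int)(ws - buf);
}
```

With this sketch, "foo  " is cut to "foo" and reports offset 3; "foo\t" keeps its tab outside literal mode but still reports offset 3; and "foo\\\\ " (an escaped backslash followed by a space) loses the space, because the second backslash no longer escapes it.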

The patch included below has been updated to work with all changes
that happened in both trees, has been tested in the OpenBSD tree,
compiles in the bsd.lv tree, and the output of the binary from the
latter looks good to me, too.

All the same, as this is somewhat intrusive, i would appreciate
your explicit testing and OK before committing to bsd.lv.

This patch is also available from
  /usr/vhosts/mdocml.bsd.lv/patch/schwarze/04.mdoc_ptext-rewrite.patch

Thanks,
  Ingo


--- mdoc.c	Tue May 18 02:05:38 2010
+++ mdoc.c	Mon May 24 00:45:00 2010
@@ -542,7 +538,7 @@ mdoc_node_delete(struct mdoc *m, struct mdoc_node *p)
 static int
 mdoc_ptext(struct mdoc *m, int line, char *buf, int offs)
 {
-	int		 i;
+	char		*c, *ws, *end;
 
 	/* Ignore bogus comments. */
 
@@ -556,18 +552,51 @@ mdoc_ptext(struct mdoc *m, int line, char *buf, int of
 	if (SEC_NONE == m->lastnamed)
 		return(mdoc_pmsg(m, line, offs, MANDOCERR_NOTEXT));
 
-	/* Literal just gets pulled in as-is. */
-	
-	if (MDOC_LITERAL & m->flags)
-		return(mdoc_word_alloc(m, line, offs, buf + offs));
+	/*
+	 * Search for the beginning of unescaped trailing whitespace (ws)
+	 * and for the first character not to be output (end).
+	 */
+	ws = NULL;
+	for (c = end = buf + offs; *c; c++) {
+		switch (*c) {
+		case ' ':
+			if (NULL == ws)
+				ws = c;
+			continue;
+		case '\t':
+			/*
+			 * Always warn about trailing tabs,
+			 * even outside literal context,
+			 * where they should be put on the next line.
+			 */
+			if (NULL == ws)
+				ws = c;
+			/*
+			 * Strip trailing tabs in literal context only;
+			 * outside, they affect the next line.
+			 */
+			if (MDOC_LITERAL & m->flags)
+				continue;
+			break;
+		case '\\':
+			/* Skip the escaped character, too, if any. */
+			if (c[1])
+				c++;
+			/* FALLTHROUGH */
+		default:
+			ws = NULL;
+			break;
+		}
+		end = c + 1;
+	}
+	*end = '\0';
 
-	/* Check for a blank line, which may also consist of spaces. */
+	if (ws)
+		if ( ! mdoc_pmsg(m, line, (int)(ws-buf), MANDOCERR_EOLNSPACE))
+			return(0);
 
-	for (i = offs; ' ' == buf[i]; i++)
-		/* Skip to first non-space. */ ;
-
-	if ('\0' == buf[i]) {
-		if ( ! mdoc_pmsg(m, line, offs, MANDOCERR_NOBLANKLN))
+	if ('\0' == buf[offs] && ! (MDOC_LITERAL & m->flags)) {
+		if ( ! mdoc_pmsg(m, line, (int)(c-buf), MANDOCERR_NOBLANKLN))
 			return(0);
 
 		/*
@@ -582,41 +611,21 @@ mdoc_ptext(struct mdoc *m, int line, char *buf, int of
 		return(1);
 	}
 
-	/* 
-	 * Warn if the last un-escaped character is whitespace. Then
-	 * strip away the remaining spaces (tabs stay!).   
-	 */
-
-	i = (int)strlen(buf);
-	assert(i);
-
-	if (' ' == buf[i - 1] || '\t' == buf[i - 1]) {
-		if (i > 1 && '\\' != buf[i - 2])
-			if ( ! mdoc_pmsg(m, line, i - 1, MANDOCERR_EOLNSPACE))
-				return(0);
-
-		for (--i; i && ' ' == buf[i]; i--)
-			/* Spin back to non-space. */ ;
-
-		/* Jump ahead of escaped whitespace. */
-		i += '\\' == buf[i] ? 2 : 1;
-
-		buf[i] = '\0';
-	}
-
-	/* Allocate the whole word. */
-
-	if ( ! mdoc_word_alloc(m, line, offs, buf + offs))
+	if ( ! mdoc_word_alloc(m, line, offs, buf+offs))
 		return(0);
 
+	if (MDOC_LITERAL & m->flags)
+		return(1);
+
 	/*
 	 * End-of-sentence check.  If the last character is an unescaped
 	 * EOS character, then flag the node as being the end of a
 	 * sentence.  The front-end will know how to interpret this.
 	 */
 
-	assert(i);
-	if (mandoc_eos(buf, (size_t)i))
+	assert(buf < end);
+
+	if (mandoc_eos(buf+offs, (size_t)(end-buf-offs)))
 		m->last->flags |= MDOC_EOS;
 
 	return(1);
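
The end-of-sentence check at the end of the patch calls mandoc_eos(), whose body is not part of this diff. As an illustration only, here is a hypothetical stand-in, ends_sentence(), assuming that "unescaped" means "not preceded by an odd number of backslashes"; the real function may differ, for instance in how it treats closing punctuation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the mandoc_eos() call in the patch:
 * returns 1 if the last of the first sz characters of p is '.', '!'
 * or '?' and is not escaped, 0 otherwise.  A character counts as
 * escaped when an odd number of backslashes immediately precedes it,
 * which matches the left-to-right escape pairing of the new loop.
 * Closing delimiters such as ')' and multi-character escapes are
 * not handled.
 */
static int
ends_sentence(const char *p, size_t sz)
{
	size_t	 i, nbs;

	assert(NULL != p);
	if (0 == sz)
		return 0;

	switch (p[sz - 1]) {
	case '.':
	case '!':
	case '?':
		break;
	default:
		return 0;
	}

	/* Count the backslashes directly before the final character. */
	nbs = 0;
	for (i = sz - 1; i > 0 && '\\' == p[i - 1]; i--)
		nbs++;

	/* An even count means those backslashes only escape each other. */
	return 0 == nbs % 2;
}
```

Under this sketch, "end." ends a sentence, "end\." does not, and "end\\." does again; appending \& after a period also suppresses the flag, since the last character is then '&'.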
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv


* Re: PATCH: mdoc_ptext rewrite
  2010-05-24 13:19 PATCH: mdoc_ptext rewrite Ingo Schwarze
@ 2010-05-24 13:38 ` Kristaps Dzonsons
  2010-05-24 13:59   ` Kristaps Dzonsons
  0 siblings, 1 reply; 6+ messages in thread
From: Kristaps Dzonsons @ 2010-05-24 13:38 UTC (permalink / raw)
  To: tech

> as part of the resync, i should like to commit the following to bsd.lv:
> 
>> * Ingo has rewritten the main mdoc text parser, mdoc_ptext(),
>>   making it easier to understand and fixing various bugs.
>>   It now correctly strips whitespace from the end of text
>>   lines (in literal mode, tab characters as well), and it
>>   issues consistent warnings about trailing spaces and tabs
>>   on text lines.  In addition, escaped backslashes no longer
>>   escape the following character.
> 
> The patch included below has been updated to work with all changes
> that happened in both trees, has been tested in the OpenBSD tree,
> compiles in the bsd.lv tree, and the output of the binary from the
> latter looks good to me, too.
> 
> All the same, as this is somewhat intrusive, i would appreciate
> your explicit testing and OK before committing to bsd.lv.

Tested (only cursorily, checking the visual output) and looked over.
Commit, please.

Thanks,

Kristaps

* Re: PATCH: mdoc_ptext rewrite
  2010-05-24 13:38 ` Kristaps Dzonsons
@ 2010-05-24 13:59   ` Kristaps Dzonsons
  2010-05-24 14:12     ` Ingo Schwarze
  0 siblings, 1 reply; 6+ messages in thread
From: Kristaps Dzonsons @ 2010-05-24 13:59 UTC (permalink / raw)
  To: tech

Replying to myself... can you pop in some regression tests that test 
against free-form text tab/space inclusions?

Also, is this behaviour different in -man?

* Re: PATCH: mdoc_ptext rewrite
  2010-05-24 13:59   ` Kristaps Dzonsons
@ 2010-05-24 14:12     ` Ingo Schwarze
  2010-05-24 14:18       ` Kristaps Dzonsons
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Schwarze @ 2010-05-24 14:12 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, May 24, 2010 at 03:59:49PM +0200:

> Replying to myself... can you pop in some regression tests that test
> against free-form text tab/space inclusions?

After syncing, i need to do a two-way sync of regression tests
anyway.  Yes, certainly it is possible to test for this.

> Also, is this behaviour different in -man?

Frankly, i have no idea.

Yours,
  Ingo

* Re: PATCH: mdoc_ptext rewrite
  2010-05-24 14:12     ` Ingo Schwarze
@ 2010-05-24 14:18       ` Kristaps Dzonsons
  2010-05-24 14:27         ` Ingo Schwarze
  0 siblings, 1 reply; 6+ messages in thread
From: Kristaps Dzonsons @ 2010-05-24 14:18 UTC (permalink / raw)
  To: tech

> Kristaps Dzonsons wrote on Mon, May 24, 2010 at 03:59:49PM +0200:
> 
>> Replying to myself... can you pop in some regression tests that test
>> against free-form text tab/space inclusions?
> 
> After syncing, i need to do a two-way sync of regression tests
> anyway.  Yes, certainly it is possible to test for this.
> 
>> Also, is this behaviour different in -man?
> 
> Frankly, i have no idea.

Let's just copy over the free-form text tests to have a -man header and 
catch fallout from there.

* Re: PATCH: mdoc_ptext rewrite
  2010-05-24 14:18       ` Kristaps Dzonsons
@ 2010-05-24 14:27         ` Ingo Schwarze
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Schwarze @ 2010-05-24 14:27 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, May 24, 2010 at 04:18:36PM +0200:
> >Kristaps Dzonsons wrote on Mon, May 24, 2010 at 03:59:49PM +0200:

>>> Replying to myself... can you pop in some regression tests that test
>>> against free-form text tab/space inclusions?

>> After syncing, i need to do a two-way sync of regression tests
>> anyway.  Yes, certainly it is possible to test for this.

>>> Also, is this behaviour different in -man?

>> Frankly, i have no idea.

> Let's just copy over the free-form text tests

cp: No such file or directory, yet.  :)

> to have a -man header and catch fallout from there.

Apart from that minor issue, sure.

Yours,
  Ingo
