From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Subject: PATCH: mdoc_ptext rewrite
Date: Mon, 24 May 2010 15:19:52 +0200
Message-ID: <20100524131952.GK13544@iris.usta.de>
Hi Kristaps,
As part of the resync, I would like to commit the following to bsd.lv:
> * Ingo has rewritten the main mdoc text parser, mdoc_ptext(),
> making it easier to understand and fixing various bugs.
> It now correctly strips whitespace from the end of text
> lines, including tab characters in literal mode, and it
> issues consistent warnings about trailing spaces and tabs
> on text lines. In addition, escaped backslashes no longer
> escape the following character.
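To make the new trailing-whitespace rules concrete, here is a minimal stand-alone sketch of the scanning loop from the patch, written for illustration only. The function name strip_trailing_ws() and its literal parameter (standing in for MDOC_LITERAL & m->flags) are hypothetical; the mandoc warning calls are left out, and the caller is assumed to issue MANDOCERR_EOLNSPACE whenever a non-NULL pointer comes back.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the scanning loop in
 * mdoc_ptext().  "literal" stands in for (MDOC_LITERAL & m->flags).
 * The line is truncated after the last character to be output;
 * the return value points to the first character of unescaped
 * trailing whitespace (the caller would warn about it), or is NULL.
 */
static char *
strip_trailing_ws(char *line, int literal)
{
	char	*c, *ws, *end;

	assert(NULL != line);

	ws = NULL;
	for (c = end = line; *c; c++) {
		switch (*c) {
		case ' ':
			/* Remember where a run of trailing spaces begins. */
			if (NULL == ws)
				ws = c;
			continue;
		case '\t':
			/* Trailing tabs are always reported... */
			if (NULL == ws)
				ws = c;
			/* ...but only stripped in literal mode. */
			if (literal)
				continue;
			break;
		case '\\':
			/* An escape consumes the next character, if any. */
			if (c[1])
				c++;
			/* FALLTHROUGH */
		default:
			/* Anything else ends the whitespace run seen so far. */
			ws = NULL;
			break;
		}
		/* Keep everything up to and including this character. */
		end = c + 1;
	}
	*end = '\0';
	return(ws);
}
```

With literal set to 0, the text line `foo\\ ` (an escaped backslash followed by a space) is cut to `foo\\` and the returned pointer marks the stripped space, while `foo\ ` (an escaped space) is left intact and NULL is returned. A trailing tab outside literal mode is reported but kept, since it affects the next line.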
The patch included below has been updated to apply on top of all
changes made in both trees. It has been tested in the OpenBSD tree
and compiles in the bsd.lv tree, and the output of the resulting
bsd.lv binary looks good to me, too.
All the same, as this is somewhat intrusive, I would appreciate
your explicit testing and OK before I commit it to bsd.lv.
This patch is also available from
/usr/vhosts/mdocml.bsd.lv/patch/schwarze/04.mdoc_ptext-rewrite.patch
Thanks,
Ingo
--- mdoc.c Tue May 18 02:05:38 2010
+++ mdoc.c Mon May 24 00:45:00 2010
@@ -542,7 +538,7 @@ mdoc_node_delete(struct mdoc *m, struct mdoc_node *p)
 static int
 mdoc_ptext(struct mdoc *m, int line, char *buf, int offs)
 {
-	int i;
+	char *c, *ws, *end;
 	/* Ignore bogus comments. */
@@ -556,18 +552,51 @@ mdoc_ptext(struct mdoc *m, int line, char *buf, int of
 	if (SEC_NONE == m->lastnamed)
 		return(mdoc_pmsg(m, line, offs, MANDOCERR_NOTEXT));
-	/* Literal just gets pulled in as-is. */
-
-	if (MDOC_LITERAL & m->flags)
-		return(mdoc_word_alloc(m, line, offs, buf + offs));
+	/*
+	 * Search for the beginning of unescaped trailing whitespace (ws)
+	 * and for the first character not to be output (end).
+	 */
+	ws = NULL;
+	for (c = end = buf + offs; *c; c++) {
+		switch (*c) {
+		case ' ':
+			if (NULL == ws)
+				ws = c;
+			continue;
+		case '\t':
+			/*
+			 * Always warn about trailing tabs,
+			 * even outside literal context,
+			 * where they should be put on the next line.
+			 */
+			if (NULL == ws)
+				ws = c;
+			/*
+			 * Strip trailing tabs in literal context only;
+			 * outside, they affect the next line.
+			 */
+			if (MDOC_LITERAL & m->flags)
+				continue;
+			break;
+		case '\\':
+			/* Skip the escaped character, too, if any. */
+			if (c[1])
+				c++;
+			/* FALLTHROUGH */
+		default:
+			ws = NULL;
+			break;
+		}
+		end = c + 1;
+	}
+	*end = '\0';
-	/* Check for a blank line, which may also consist of spaces. */
+	if (ws)
+		if ( ! mdoc_pmsg(m, line, (int)(ws-buf), MANDOCERR_EOLNSPACE))
+			return(0);
-	for (i = offs; ' ' == buf[i]; i++)
-		/* Skip to first non-space. */ ;
-
-	if ('\0' == buf[i]) {
-		if ( ! mdoc_pmsg(m, line, offs, MANDOCERR_NOBLANKLN))
+	if ('\0' == buf[offs] && ! (MDOC_LITERAL & m->flags)) {
+		if ( ! mdoc_pmsg(m, line, (int)(c-buf), MANDOCERR_NOBLANKLN))
 			return(0);
 		/*
@@ -582,41 +611,21 @@ mdoc_ptext(struct mdoc *m, int line, char *buf, int of
 		return(1);
 	}
-	/*
-	 * Warn if the last un-escaped character is whitespace. Then
-	 * strip away the remaining spaces (tabs stay!).
-	 */
-
-	i = (int)strlen(buf);
-	assert(i);
-
-	if (' ' == buf[i - 1] || '\t' == buf[i - 1]) {
-		if (i > 1 && '\\' != buf[i - 2])
-			if ( ! mdoc_pmsg(m, line, i - 1, MANDOCERR_EOLNSPACE))
-				return(0);
-
-		for (--i; i && ' ' == buf[i]; i--)
-			/* Spin back to non-space. */ ;
-
-		/* Jump ahead of escaped whitespace. */
-		i += '\\' == buf[i] ? 2 : 1;
-
-		buf[i] = '\0';
-	}
-
-	/* Allocate the whole word. */
-
-	if ( ! mdoc_word_alloc(m, line, offs, buf + offs))
+	if ( ! mdoc_word_alloc(m, line, offs, buf+offs))
 		return(0);
+	if (MDOC_LITERAL & m->flags)
+		return(1);
+
 	/*
 	 * End-of-sentence check. If the last character is an unescaped
 	 * EOS character, then flag the node as being the end of a
 	 * sentence. The front-end will know how to interpret this.
 	 */
-	assert(i);
-	if (mandoc_eos(buf, (size_t)i))
+	assert(buf < end);
+
+	if (mandoc_eos(buf+offs, (size_t)(end-buf-offs)))
 		m->last->flags |= MDOC_EOS;
 	return(1);
--
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
Thread overview: 6+ messages
2010-05-24 13:19 Ingo Schwarze [this message]
2010-05-24 13:38 ` Kristaps Dzonsons
2010-05-24 13:59 ` Kristaps Dzonsons
2010-05-24 14:12 ` Ingo Schwarze
2010-05-24 14:18 ` Kristaps Dzonsons
2010-05-24 14:27 ` Ingo Schwarze