From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Subject: [PATCH] deal with bad block nesting
Date: Mon, 14 Jun 2010 04:47:11 +0200
Message-ID: <20100614024711.GI14228@iris.usta.de>
[-- Attachment #1: Type: text/plain, Size: 16869 bytes --]
Hi!
Here is the most difficult patch I have done for mandoc so far.
Even though I already had the make_pending() algorithm before Rostock,
designing and implementing the remaining algorithms took me both the
train ride to Rostock and the ride back, and debugging took another
day today. The debugging resulted in considerable parts of the patch
being reorganized, because the code got stuck in various edge cases.
All in all, about a week of work has gone into this...
So, badly nested blocks occur in quite a few manuals -
but what are they? In a nutshell, blocks are badly nested
if one block ends while at least one of its sub blocks is
still open.
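As a rough illustration of that definition (a sketch, not mandoc code),
assume each macro pair is reduced to one letter: upper case for the
opening macro ('A' for .Ao) and lower case for the closing one ('a' for
.Ac). Counting the close events that end a block while a sub block is
still open then looks like this; implicit blocks ending at the end of
a line are not modeled:

```c
#include <assert.h>

/*
 * Hypothetical encoding: an upper-case letter opens a block and the
 * matching lower-case letter closes it.  Returns the number of close
 * events that end a block while one of its sub blocks is still open.
 */
static int
count_bad_nesting(const char *ev)
{
	char	 stack[32];	/* currently open blocks, outermost first */
	int	 depth = 0, bad = 0, i, j;

	for (; *ev != '\0'; ev++) {
		if (*ev >= 'A' && *ev <= 'Z') {
			assert(depth < (int)sizeof(stack));
			stack[depth++] = *ev;
			continue;
		}
		/* Find the innermost open block this event closes. */
		for (i = depth - 1; i >= 0; i--)
			if (stack[i] == *ev - 'a' + 'A')
				break;
		if (i < 0)
			continue;	/* not open: ignored in this sketch */
		if (i < depth - 1)
			bad++;		/* sub blocks above it are still open */
		/* Drop the closed block; the broken ones stay open. */
		for (j = i; j < depth - 1; j++)
			stack[j] = stack[j + 1];
		depth--;
	}
	return bad;
}
```

With this encoding, case 1 below is "ABab" and yields 1, while the
double breaks of cases 5 and 6 ("ABSbas", "ABSabs") yield 2.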
Here are the easy cases:
1) Partial explicit block ending while a partial explicit sub block
is still open - briefly, explicit (A) breaking explicit (B):
.Ao ao
.Bo bo ac
.Ac bc
.Bc end
-> renders as "<ao [bo ac> bc] end"
2) Partial implicit block ending while a partial explicit sub block
is still open - briefly, implicit (A) breaking explicit (B):
.Aq aq Bo bo eol
bc
.Bc end
-> renders as "<aq [bo eol>bc] end"
3) Partial explicit block ending while a partial implicit sub block
is still open - briefly, explicit (A) breaking implicit (B):
.Ao ao
.Bq bq ac Ac eol
end
-> renders as "<ao [bq ac> eol] end"
4) Full block header extended by a partial explicit sub block -
briefly, full (It) extended by explicit (B), or just It extended:
.Bl -tag -width Ds
.It it Bo
.No bo bc
.Bc
text
.El
Actually, the last one is not new; it already works with the
so-called .Xo support in our trees.
In all cases so far, when A breaks B, B must remember to close out A
after closing itself.
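A minimal sketch of that bookkeeping, using a hypothetical two-field
struct blk instead of struct mdoc_node: breaking stores the breaker in
the broken block's pending link, and closing a block walks that link.
The assert marks the limitation the harder cases run into: each block
has room for only one pending breaker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for struct mdoc_node. */
struct blk {
	char		 name;		/* 'A' for .Ao/.Ac, 'B' for .Bo/.Bc */
	struct blk	*pending;	/* breaker to close after this block */
};

/* A breaks B: B remembers to close A once B itself is closed. */
static void
break_blk(struct blk *broken, struct blk *breaker)
{
	assert(broken->pending == NULL);	/* room for one breaker only */
	broken->pending = breaker;
}

/*
 * Close a block, then every breaker pending on it, recording the
 * order in which the scopes end (compare the loop over n->pending
 * that the patch adds to rew_sub()).
 */
static const char *
close_blk(struct blk *n, char *out, size_t sz)
{
	size_t	 len = 0;

	for (; n != NULL; n = n->pending) {
		assert(len + 1 < sz);
		out[len++] = n->name;
	}
	out[len] = '\0';
	return out;
}
```

For case 1, break_blk(&b, &a) at .Ac followed by closing B at .Bc
ends the scopes in the order B, then A.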
Now, you want something more difficult?
You can re-break a block that is already broken,
using purely explicit blocks:
5) Double-break, inner break before outer break.
This one is tricky because by the time A breaks S,
S has already been broken by B, so S already remembers to
close B and cannot remember to close A as well.
Thus, S must delegate closing A to B.
.Ao ao
.Bo bo
.So so bc
.Bc ac
.Ac sc
.Sc end
-> renders as "<ao [bo `so bc] ac> sc' end"
6) Double-break, outer break before inner.
This one is even trickier because when B breaks S,
it is not an option for S to delegate closing B to A
as in case 5, because B must be closed before A.
Thus, B must first take over the task of closing A from S,
and only then can S remember to close B.
.Ao ao
.Bo bo
.So so ac
.Ac bc
.Bc sc
.Sc end
-> renders as "<ao [bo `so ac> bc] sc' end"
7) Broken breaker.
Even a block that was broken itself can break another block.
.Ao ao
.Bo bo ac
.Ac middle
.So so bc
.Bc sc
.Sc end
-> renders as "<ao [bo ac> middle `so bc] sc' end"
8) Broken double-breaker.
Actually, the broken breaker is easier than it may seem,
but it gets really tough when the broken block does
a case-6-style inner double break: a broken block already
remembers to close its breaker and cannot, as case 6
requires, take over the outer breaker from the block it is
breaking. Instead, it must hand over the breaker of
the block it is breaking to the block it was broken by,
and only then can the block it is breaking remember it.
.Ao ao
.Bo bo
.So so bc
.Bc middle
.Do do ac
.Ac sc
.Sc dc
.Dc end
-> renders as "<ao [bo `so bc] middle ``do ac> sc' dc'' end"
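Cases 5 to 8 can be condensed into a small C model of the pending
logic, loosely following make_pending() and the rew_sub() loop in the
patch below. It is only a sketch under simplifying assumptions: explicit
blocks only, no separate head/body/end-marker nodes, and hypothetical
names (struct blk, struct parser, open_blk(), close_blk(), run()).
Events use one letter per macro, upper case to open and lower case to
close; run() returns the order in which the block scopes end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXBLK	16

/* Hypothetical stand-in for struct mdoc_node, explicit blocks only. */
struct blk {
	char		 name;		/* 'A' for .Ao/.Ac, 'B' for .Bo/.Bc */
	int		 ended;		/* close macro seen, scope still open */
	struct blk	*parent;	/* enclosing block */
	struct blk	*pending;	/* block to rewind right after this one */
};

struct parser {
	struct blk	 pool[MAXBLK];
	int		 used;
	struct blk	*last;		/* innermost block whose scope is open */
	char		 out[MAXBLK + 1]; /* order in which the scopes end */
	int		 nout;
};

/* Loosely follows make_pending(): *breaker ends while *broken is open. */
static void
make_pending(struct blk *broken, struct blk *breaker)
{
	struct blk	*n, *taker;

	/* Case 5: if we pass the block that broke *broken, delegate to it. */
	for (n = broken->parent; n != breaker; n = n->parent) {
		assert(n != NULL);	/* breaker must enclose broken */
		if (n == broken->pending)
			broken = n;
	}
	/*
	 * Cases 6 and 8: an outer breaker is already pending on *broken;
	 * append it to the end of the new breaker's chain instead of
	 * overwriting the link.
	 */
	if (broken->pending != NULL) {
		for (taker = breaker; taker->pending != NULL;
		    taker = taker->pending)
			continue;
		taker->pending = broken->pending;
	}
	broken->pending = breaker;
}

static void
open_blk(struct parser *p, char name)
{
	struct blk	*b;

	assert(p->used < MAXBLK);
	b = &p->pool[p->used++];
	b->name = name;
	b->ended = 0;
	b->parent = p->last;
	b->pending = NULL;
	p->last = b;
}

static void
close_blk(struct parser *p, char name)
{
	struct blk	*n, *later = NULL;

	/* Search outward, remembering the innermost open sub block. */
	for (n = p->last; n != NULL; n = n->parent) {
		if (!n->ended && n->name == name)
			break;
		if (later == NULL)
			later = n;
	}
	assert(n != NULL);		/* the block must be open */
	if (later != NULL) {
		/* Bad nesting: defer until *later is closed. */
		make_pending(later, n);
		n->ended = 1;
		return;
	}
	/* Like rew_sub(): rewind the block, then all blocks pending on it. */
	for (; n != NULL; n = n->pending) {
		p->last = n->parent;
		p->out[p->nout++] = n->name;
	}
	p->out[p->nout] = '\0';
}

/* Upper case opens a block, lower case closes it, e.g. "ABab". */
static const char *
run(struct parser *p, const char *ev)
{
	memset(p, 0, sizeof(*p));
	for (; *ev != '\0'; ev++) {
		if (*ev >= 'A' && *ev <= 'Z')
			open_blk(p, *ev);
		else
			close_blk(p, (char)(*ev - 'a' + 'A'));
	}
	return p->out;
}
```

For example, case 8 (.Ao .Bo .So .Bc .Do .Ac .Sc .Dc) is
run(&p, "ABSbDasd") and ends the scopes in the order "DSBA", while
cases 5, 6 and 7 all end in the order "SBA".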
So far, this was all about explicit blocks. Additional aspects come
into play when implicit blocks are involved.
9) When both explicit and implicit blocks are open, you must
always break the last explicit block first, and everything
will just work:
.Ao ao Bq bq So so Dq dq ac Ac sc Sc eol
end
.br
.Ao ao Bq bq So so Dq dq ac Ac eol
sc
.Sc end
-> renders as "<ao [bq `so ``dq ac> sc' eol''] end"
"<ao [bq `so ``dq ac> eol'']sc' end"
10) But if only implicit blocks are open, you must break
the *first* implicit block right away:
.Ao ao Bq bq Sq sq ac Ac eol
end
-> renders as "<ao [bq `sq ac> eol'] end"
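The selection rule of cases 9 and 10 corresponds to the "later"
variable in blk_exp_close(). A stand-alone sketch, assuming a
hypothetical array of the open sub blocks ordered from the innermost
outward, and ignoring the MDOC_CALLABLE check the patch also does:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one open sub block. */
struct open_blk {
	char	 name;		/* 'B' for .Bq or .Bo, ... */
	int	 is_explicit;	/* 1 for .Xo-style, 0 for .Xq-style blocks */
};

/*
 * path[0] is the innermost open sub block, path[n - 1] the outermost.
 * Return the index of the innermost explicit block if there is one,
 * else of the outermost implicit block, or -1 if nothing is open.
 */
static int
pick_later(const struct open_blk *path, size_t n)
{
	int	 later = -1;
	size_t	 i;

	for (i = 0; i < n; i++) {
		/* Once an explicit block has been chosen, keep it. */
		if (later >= 0 && path[later].is_explicit)
			break;
		/* Otherwise, move outward to the next candidate. */
		later = (int)i;
	}
	return later;
}
```

For case 9, the array is Dq, So, Bq (implicit, explicit, implicit)
and the sketch picks So; for case 10, it is Sq, Bq and the sketch
picks Bq.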
So, here is the patch for you guys to play with.
I fear I will need to read it again carefully in the light of day:
Even though it is now passing all tests and building the OpenBSD
tree all right, there might still be some lines of code that are
now obsolete and can be removed.
I'm also attaching the test cases cited above, put together
in a single file. You may want to run them through -Ttree.
Have fun,
Ingo
Index: mdoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.h,v
retrieving revision 1.27
diff -u -p -r1.27 mdoc.h
--- mdoc.h 6 Jun 2010 20:30:08 -0000 1.27
+++ mdoc.h 14 Jun 2010 00:55:10 -0000
@@ -278,6 +278,7 @@ struct mdoc_node {
#define MDOC_ACTED (1 << 1) /* has been acted upon */
#define MDOC_EOS (1 << 2) /* at sentence boundary */
#define MDOC_LINE (1 << 3) /* first macro/text on line */
+#define MDOC_ENDED (1 << 4) /* rendering has been ended */
enum mdoc_type type; /* AST node type */
enum mdoc_sec sec; /* current named section */
struct mdoc_arg *args; /* BLOCK/ELEM */
@@ -286,6 +287,7 @@ struct mdoc_node {
struct mdoc_node *body; /* BLOCK */
struct mdoc_node *tail; /* BLOCK */
char *string; /* TEXT */
+ int end; /* BODY */
union {
enum mdoc_list list; /* for `Bl' nodes */
Index: mdoc_macro.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_macro.c,v
retrieving revision 1.46
diff -u -p -r1.46 mdoc_macro.c
--- mdoc_macro.c 6 Jun 2010 20:30:08 -0000 1.46
+++ mdoc_macro.c 14 Jun 2010 00:55:10 -0000
@@ -46,6 +46,8 @@ static int append_delims(struct mdoc
int, int *, char *);
static enum mdoct lookup(enum mdoct, const char *);
static enum mdoct lookup_raw(const char *);
+static int make_pending(struct mdoc_node *, enum mdoct,
+ struct mdoc *, int, int);
static int phrase(struct mdoc *, int, int, char *);
static enum mdoct rew_alt(enum mdoct);
static int rew_dobreak(enum mdoct,
@@ -406,7 +408,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
/* FALLTHROUGH */
case (MDOC_Vt):
assert(MDOC_TAIL != type);
- if (type == p->type && tok == p->tok)
+ if (tok != p->tok)
+ break;
+ if (p->end)
+ return(REWIND_HALT);
+ if (type == p->type)
return(REWIND_REWIND);
break;
case (MDOC_It):
@@ -460,7 +466,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
case (MDOC_So):
/* FALLTHROUGH */
case (MDOC_Xo):
- if (type == p->type && tok == p->tok)
+ if (tok != p->tok)
+ break;
+ if (p->end)
+ return(REWIND_HALT);
+ if (type == p->type)
return(REWIND_REWIND);
break;
/* Multi-line explicit scope close. */
@@ -495,7 +505,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
case (MDOC_Sc):
/* FALLTHROUGH */
case (MDOC_Xc):
- if (type == p->type && rew_alt(tok) == p->tok)
+ if (rew_alt(tok) != p->tok)
+ break;
+ if (p->end)
+ return(REWIND_HALT);
+ if (type == p->type)
return(REWIND_REWIND);
break;
default:
@@ -522,6 +536,8 @@ rew_dobreak(enum mdoct tok, const struct
return(1);
if (MDOC_VALID & p->flags)
return(1);
+ if (MDOC_BODY == p->type && p->end)
+ return(1);
switch (tok) {
case (MDOC_It):
@@ -572,6 +588,81 @@ rew_elem(struct mdoc *mdoc, enum mdoct t
}
+/*
+ * We are trying to close a block identified by tok,
+ * but the child block *broken is still open.
+ * Thus, postpone closing the tok block
+ * until the rew_sub call closing *broken.
+ */
+static int
+make_pending(struct mdoc_node *broken, enum mdoct tok,
+ struct mdoc *m, int line, int ppos)
+{
+ struct mdoc_node *breaker;
+
+ /*
+ * Iterate backwards, searching for the block matching tok,
+ * that is, the block breaking the *broken block.
+ */
+ for (breaker = broken->parent; breaker; breaker = breaker->parent) {
+
+ /*
+ * If the *broken block had already been broken before
+ * and we encounter its breaker, make the tok block
+ * pending on the inner breaker.
+ * Graphically, "[A breaker=[B broken=[C->B B] tok=A] C]"
+ * becomes "[A broken=[B [C->B B] tok=A] C]"
+ * and finally "[A [B->A [C->B B] A] C]".
+ */
+ if (breaker == broken->pending) {
+ broken = breaker;
+ continue;
+ }
+
+ if (REWIND_REWIND != rew_dohalt(tok, MDOC_BLOCK, breaker))
+ continue;
+ if (MDOC_BODY == broken->type)
+ broken = broken->parent;
+
+ /*
+ * Found the breaker.
+ * If another, outer breaker is already pending on
+ * the *broken block, we must not clobber the link
+ * to the outer breaker, but make it pending on the
+ * new, now inner breaker.
+ * Graphically, "[A breaker=[B broken=[C->A A] tok=B] C]"
+ * becomes "[A breaker=[B->A broken=[C A] tok=B] C]"
+ * and finally "[A [B->A [C->B A] B] C]".
+ */
+ if (broken->pending) {
+ struct mdoc_node *taker;
+
+ /*
+ * If the breaker had also been broken before,
+ * it cannot take on the outer breaker itself,
+ * but must hand it on to its own breakers.
+ * Graphically, this is the following situation:
+ * "[A [B breaker=[C->B B] broken=[D->A A] tok=C] D]"
+ * "[A taker=[B->A breaker=[C->B B] [D->C A] C] D]"
+ */
+ taker = breaker;
+ while (taker->pending)
+ taker = taker->pending;
+ taker->pending = broken->pending;
+ }
+ broken->pending = breaker;
+ return(1);
+ }
+
+ /*
+ * Found no matching block for tok.
+ * Are you trying to close a block that is not open?
+ * Report failure.
+ */
+ mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
+ return(0);
+}
+
static int
rew_sub(enum mdoc_type t, struct mdoc *m,
enum mdoct tok, int line, int ppos)
@@ -583,7 +674,7 @@ rew_sub(enum mdoc_type t, struct mdoc *m
for (n = m->last; n; n = n->parent) {
c = rew_dohalt(tok, t, n);
if (REWIND_HALT == c) {
- if (MDOC_BLOCK != t)
+ if (n->end || MDOC_BLOCK != t)
return(1);
if ( ! (MDOC_EXPLICIT & mdoc_macros[tok].flags))
return(1);
@@ -597,6 +688,7 @@ rew_sub(enum mdoc_type t, struct mdoc *m
continue;
if ( ! swarn(m, t, line, ppos, n))
return(0);
+ return make_pending(n, tok, m, line, ppos);
}
assert(n);
@@ -604,15 +696,14 @@ rew_sub(enum mdoc_type t, struct mdoc *m
return(0);
/*
- * The current block extends an enclosing block beyond a line
- * break. Now that the current block ends, close the enclosing
- * block, too.
+ * The current block extends an enclosing block.
+ * Now that the current block ends, close the enclosing block, too.
*/
- if (NULL != (n = n->pending)) {
- assert(MDOC_HEAD == n->type);
+ while (NULL != (n = n->pending)) {
if ( ! rew_last(m, n))
return(0);
- if ( ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
+ if (MDOC_HEAD == n->type &&
+ ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
return(0);
}
return(1);
@@ -667,9 +758,13 @@ append_delims(struct mdoc *m, int line,
static int
blk_exp_close(MACRO_PROT_ARGS)
{
+ struct mdoc_node *body = NULL; /* Our own body. */
+ struct mdoc_node *later = NULL; /* A sub-block starting later. */
+ struct mdoc_node *n; /* For searching backwards. */
+
int j, lastarg, maxargs, flushed, nl;
enum margserr ac;
- enum mdoct ntok;
+ enum mdoct atok, ntok;
char *p;
nl = MDOC_NEWLINE & m->flags;
@@ -683,6 +778,70 @@ blk_exp_close(MACRO_PROT_ARGS)
break;
}
+ /*
+ * Search backwards for beginnings of blocks,
+ * both of our own and of pending sub-blocks.
+ */
+ atok = rew_alt(tok);
+ for (n = m->last; n; n = n->parent) {
+ if (MDOC_VALID & n->flags)
+ continue;
+
+ /* Remember the start of our own body. */
+ if (MDOC_BODY == n->type && atok == n->tok) {
+ if (0 == n->end)
+ body = n;
+ continue;
+ }
+
+ if (MDOC_BLOCK != n->type)
+ continue;
+ if (atok == n->tok) {
+ assert(body);
+
+ /*
+ * Found the start of our own block.
+ * When there is no pending sub block,
+ * just proceed to closing out.
+ */
+ if (NULL == later)
+ break;
+
+ /*
+ * When there is a pending sub block,
+ * postpone closing out the current block
+ * until the rew_sub() closing out the sub-block.
+ */
+ if ( ! make_pending(later, tok, m, line, ppos))
+ return(0);
+
+ /*
+ * Mark the place where the formatting - but not
+ * the scope - of the current block ends.
+ */
+ if ( ! mdoc_elem_alloc(m, line, ppos, atok, NULL))
+ return(0);
+ m->last->type = MDOC_BODY;
+ m->last->end = 1; /* Ask for normal spacing. */
+ m->last->pending = body;
+ m->next = MDOC_NEXT_SIBLING;
+ break;
+ }
+
+ /*
+ * When finding an open sub block, remember the last
+ * open explicit block, or, in case there are only
+ * implicit ones, the first open implicit block.
+ */
+ if (later &&
+ MDOC_EXPLICIT & mdoc_macros[later->tok].flags)
+ continue;
+ if (MDOC_CALLABLE & mdoc_macros[n->tok].flags) {
+ assert( ! (MDOC_ACTED & n->flags));
+ later = n;
+ }
+ }
+
if ( ! (MDOC_CALLABLE & mdoc_macros[tok].flags)) {
/* FIXME: do this in validate */
if (buf[*pos])
@@ -697,7 +856,7 @@ blk_exp_close(MACRO_PROT_ARGS)
if ( ! rew_sub(MDOC_BODY, m, tok, line, ppos))
return(0);
- if (maxargs > 0)
+ if (NULL == later && maxargs > 0)
if ( ! mdoc_tail_alloc(m, line, ppos, rew_alt(tok)))
return(0);
@@ -1249,20 +1408,38 @@ blk_part_imp(MACRO_PROT_ARGS)
body->parent->flags |= MDOC_EOS;
}
+ /*
+ * If there is an open sub-block requiring explicit close-out,
+ * postpone closing out the current block
+ * until the rew_sub() call closing out the sub-block.
+ */
+ for (n = m->last; n && n != body && n != blk->parent; n = n->parent) {
+ if (MDOC_BLOCK == n->type &&
+ MDOC_EXPLICIT & mdoc_macros[n->tok].flags &&
+ ! (MDOC_VALID & n->flags)) {
+ assert( ! (MDOC_ACTED & n->flags));
+ if ( ! make_pending(n, tok, m, line, ppos))
+ return(0);
+ if ( ! mdoc_elem_alloc(m, line, ppos, tok, NULL))
+ return(0);
+ m->last->type = MDOC_BODY;
+ m->last->end = 2; /* Ask for TERMP_NOSPACE. */
+ m->last->pending = body;
+ m->next = MDOC_NEXT_SIBLING;
+ return(1);
+ }
+ }
+
/*
* If we can't rewind to our body, then our scope has already
* been closed by another macro (like `Oc' closing `Op'). This
* is ugly behaviour nodding its head to OpenBSD's overwhelming
* crufty use of `Op' breakage.
*/
- for (n = m->last; n; n = n->parent)
- if (body == n)
- break;
-
- if (NULL == n && ! mdoc_nmsg(m, body, MANDOCERR_SCOPE))
+ if (n != body && ! mdoc_nmsg(m, body, MANDOCERR_SCOPE))
return(0);
- if (n && ! rew_last(m, body))
+ if (n && ! rew_sub(MDOC_BODY, m, tok, line, ppos))
return(0);
/* Standard appending of delimiters. */
@@ -1272,7 +1449,7 @@ blk_part_imp(MACRO_PROT_ARGS)
/* Rewind scope, if applicable. */
- if (n && ! rew_last(m, blk))
+ if (n && ! rew_sub(MDOC_BLOCK, m, tok, line, ppos))
return(0);
return(1);
Index: mdoc_term.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_term.c,v
retrieving revision 1.87
diff -u -p -r1.87 mdoc_term.c
--- mdoc_term.c 10 Jun 2010 22:50:10 -0000 1.87
+++ mdoc_term.c 14 Jun 2010 00:55:11 -0000
@@ -321,20 +321,37 @@ print_mdoc_node(DECL_ARGS)
memset(&npair, 0, sizeof(struct termpair));
npair.ppair = pair;
- if (MDOC_TEXT != n->type) {
- if (termacts[n->tok].pre)
- chld = (*termacts[n->tok].pre)(p, &npair, m, n);
- } else
+ if (MDOC_TEXT == n->type)
term_word(p, n->string);
+ else if (termacts[n->tok].pre && !n->end)
+ chld = (*termacts[n->tok].pre)(p, &npair, m, n);
if (chld && n->child)
print_mdoc_nodelist(p, &npair, m, n->child);
term_fontpopq(p, font);
- if (MDOC_TEXT != n->type)
- if (termacts[n->tok].post)
- (*termacts[n->tok].post)(p, &npair, m, n);
+ if (MDOC_TEXT != n->type &&
+ termacts[n->tok].post &&
+ ! (MDOC_ENDED & n->flags)) {
+ (*termacts[n->tok].post)(p, &npair, m, n);
+
+ /*
+ * Explicit end tokens not only call the post
+ * handler, but also tell the respective block
+ * that it must not call the post handler again.
+ */
+ if (n->end)
+ n->pending->flags |= MDOC_ENDED;
+
+ /*
+ * End of line terminating an implicit block
+ * while an explicit block is still open.
+ * Continue the explicit block without spacing.
+ */
+ if (1 < n->end)
+ p->flags |= TERMP_NOSPACE;
+ }
if (MDOC_EOS & n->flags)
p->flags |= TERMP_SENTENCE;
Index: tree.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/tree.c,v
retrieving revision 1.7
diff -u -p -r1.7 tree.c
--- tree.c 23 May 2010 22:45:01 -0000 1.7
+++ tree.c 14 Jun 2010 00:55:11 -0000
@@ -70,7 +70,10 @@ print_mdoc(const struct mdoc_node *n, in
t = "block-head";
break;
case (MDOC_BODY):
- t = "block-body";
+ if (n->end)
+ t = "body-end";
+ else
+ t = "block-body";
break;
case (MDOC_TAIL):
t = "block-tail";
[-- Attachment #2: badnest.in --]
[-- Type: text/plain, Size: 894 bytes --]
.Dd $Mdocdate: February 17 2010 $
.Dt BLOCK-BADNEST 1
.Os
.Sh NAME
.Nm block-badnest
.Nd badly nested blocks
.Sh DESCRIPTION
exp breaking exp:
.Ao ao
.Bo bo ac
.Ac bc
.Bc end
.Pp
imp breaking exp:
.Aq aq Bo bo eol
bc
.Bc end
.Pp
exp breaking imp:
.Ao ao
.Bq bq ac Ac eol
end
.Pp
It extended:
.Bl -tag -width Ds
.It it Bo
.No bo bc
.Bc
text
.El
.Pp
Double-break, inner break before outer:
.Ao ao
.Bo bo
.So so bc
.Bc ac
.Ac sc
.Sc end
.Pp
Double-break, outer break before inner:
.Ao ao
.Bo bo
.So so ac
.Ac bc
.Bc sc
.Sc end
.Pp
Broken breaker:
.Ao ao
.Bo bo ac
.Ac middle
.So so bc
.Bc sc
.Sc end
.Pp
Broken double-breaker:
.Ao ao
.Bo bo
.So so bc
.Bc middle
.Do do ac
.Ac sc
.Sc dc
.Dc end
.Pp
Break the last explicit block:
.br
.Ao ao Bq bq So so Dq dq ac Ac sc Sc eol
end
.br
.Ao ao Bq bq So so Dq dq ac Ac eol
sc
.Sc end
.Pp
Break multiple implicit blocks:
.Ao ao Bq bq Sq sq ac Ac eol
end