[PATCH] deal with bad block nesting

tech@mandoc.bsd.lv
 help / color / mirror / Atom feed

* [PATCH] deal with bad block nesting
@ 2010-06-14  2:47 Ingo Schwarze
  2010-06-14 20:46 ` Kristaps Dzonsons
  0 siblings, 1 reply; 3+ messages in thread
From: Ingo Schwarze @ 2010-06-14  2:47 UTC (permalink / raw)
  To: tech

[-- Attachment #1: Type: text/plain, Size: 16869 bytes --]

Hi!

Here is the most difficult patch i have done for mandoc so far.
Even though i already had the make_pending() algorithm before Rostock,
it cost me both the train ride to Rostock and back to design and
implement the remaining algorithms, and another day today for
debugging, which resulted in considerable parts of the patch to be
reorganized, because the code got stuck in various edge cases.
All in all, about a week of work has gone into this...

So, badly nested blocks occur in quite a few manuals -
but what are they?  In a nutshell, blocks are badly nested
if one block ends while at least one of its sub blocks is
still open.

Here are the easy cases:

 1) Partial explicit block ending while a partial explicit sub block
    is still open - briefly, explicit (A) breaking explicit (B):

.Ao ao
.Bo bo ac
.Ac bc
.Bc end

    -> renders as "<ao [bo ac> bc] end"

 2) Partial implicit block ending while a partial explicit sub block
    is still open - briefly, implicit (A) breaking explicit (B):

.Aq aq Bo bo eol
bc
.Bc end

    -> renders as "<aq [bo eol>bc] end"

 3) Partial explicit block ending while a partial implicit sub block
    is still open - briefly, explicit (A) breaking implicit (B):

.Ao ao
.Bq bq ac Ac eol
end

    -> renders as "<ao [bq ac> eol] end"

 4) Full block header extended by a partial explicit sub block -
    briefly, full (It) extended by explicit (B), or just It extended:

.Bl -tag -width Ds
.It it Bo
.No bo bc
.Bc
text
.El

    Actually, the last one is not new, it already works with the
    so-called .Xo-support in our trees.

In all cases so far, when A breaks B, B must remeber to close out A
after closing itself.


Now, you want something more difficult?
You can re-break a block that is already broken,
using purely explicit blocks:

 5) Double-break, inner break before outer break.
    This one is tricky because by the time A breaks S,
    S was already broken by B, so S already remembers to
    close B and can't remeber to close A any more.
    Thus, S must delegate closing A to B.

.Ao ao
.Bo bo
.So so bc
.Bc ac
.Ac sc
.Sc end

    -> renders as "<ao [bo `so bc] ac> sc' end"

 6) Double-break, outer break before inner.
    This one is even trickier because when B breaks S,
    it is not an option for S to delegate closing B to A
    like in case 5, because B must be closed before A.
    Thus, B must first take over the task to close A from S,
    and then S can remember to close B.

.Ao ao
.Bo bo
.So so ac
.Ac bc
.Bc sc
.Sc end

    -> renders as "<ao [bo `so ac> bc] sc' end"

 7) Broken breaker.
    Even a block that was broken itself can break another block.

.Ao ao
.Bo bo ac
.Ac middle
.So so bc
.Bc sc
.Sc end

    -> renders as "<ao [bo ac> middle `so bc] sc' end"

 8) Broken double-breaker.
    Actually, the broken breaker is easier than may seem;
    but it gets really tough when the broken block does
    a style-6 inner double break, because a broken block already
    remebers to close its breaker and cannot - as required by
    case 6 - take over the outer breaker from the block it is
    breaking.  Instead, it must hand over the breaker of
    the block it is breaking to the block it was broken by,
    and only then can the block it is breaking remember it.

.Ao ao
.Bo bo
.So so bc
.Bc middle
.Do do ac
.Ac sc
.Sc dc
.Dc end

    -> renders as "<ao [bo `so bc] middle ``do ac> sc' dc'' end"


So far, this was all about explicit blocks.  Additional aspects come
into play when implicit blocks are involved.

 9) When both explicit and implicit blocks are open, you must
    always break the last explicit block first, and everything
    will just work:

.Ao ao Bq bq So so Dq dq ac Ac sc Sc eol
end
.br
.Ao ao Bq bq So so Dq dq ac Ac eol
sc
.Sc end

    -> renders as "<ao [bq `so ``dq ac> sc' eol''] end"
                  "<ao [bq `so ``dq ac> eol'']sc' end"

 10) But if only implicit blocks are open, you must break
     the *first* implicit block right away:

.Ao ao Bq bq Sq sq ac Ac eol
end

    -> renders as "<ao [bq `sq ac> eol'] end"

So, here is the patch for you guys to play with.
I fear i will need to read it again carefully by the light of day:
Even though it is now passing all tests and building the OpenBSD
tree all right, there might still be some lines of code that are
now obsolete and can be removed.

I'm also attaching the test cases cited above, put together
in a single file.  You may want to run them through -Ttree.

Have fun,
  Ingo


Index: mdoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.h,v
retrieving revision 1.27
diff -u -p -r1.27 mdoc.h
--- mdoc.h	6 Jun 2010 20:30:08 -0000	1.27
+++ mdoc.h	14 Jun 2010 00:55:10 -0000
@@ -278,6 +278,7 @@ struct	mdoc_node {
 #define	MDOC_ACTED	 (1 << 1) /* has been acted upon */
 #define	MDOC_EOS	 (1 << 2) /* at sentence boundary */
 #define	MDOC_LINE	 (1 << 3) /* first macro/text on line */
+#define	MDOC_ENDED	 (1 << 4) /* rendering has been ended */
 	enum mdoc_type	  type; /* AST node type */
 	enum mdoc_sec	  sec; /* current named section */
 	struct mdoc_arg	 *args; 	/* BLOCK/ELEM */
@@ -286,6 +287,7 @@ struct	mdoc_node {
 	struct mdoc_node *body;		/* BLOCK */
 	struct mdoc_node *tail;		/* BLOCK */
 	char		 *string;	/* TEXT */
+	int		  end;		/* BODY */
 
 	union {
 		enum mdoc_list list; /* for `Bl' nodes */
Index: mdoc_macro.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_macro.c,v
retrieving revision 1.46
diff -u -p -r1.46 mdoc_macro.c
--- mdoc_macro.c	6 Jun 2010 20:30:08 -0000	1.46
+++ mdoc_macro.c	14 Jun 2010 00:55:10 -0000
@@ -46,6 +46,8 @@ static	int	  	append_delims(struct mdoc 
 				int, int *, char *);
 static	enum mdoct	lookup(enum mdoct, const char *);
 static	enum mdoct	lookup_raw(const char *);
+static	int		make_pending(struct mdoc_node *, enum mdoc_type,
+				struct mdoc *, int, int);
 static	int	  	phrase(struct mdoc *, int, int, char *);
 static	enum mdoct 	rew_alt(enum mdoct);
 static	int	  	rew_dobreak(enum mdoct, 
@@ -406,7 +408,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 		/* FALLTHROUGH */
 	case (MDOC_Vt):
 		assert(MDOC_TAIL != type);
-		if (type == p->type && tok == p->tok)
+		if (tok != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	case (MDOC_It):
@@ -460,7 +466,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 	case (MDOC_So):
 		/* FALLTHROUGH */
 	case (MDOC_Xo):
-		if (type == p->type && tok == p->tok)
+		if (tok != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	/* Multi-line explicit scope close. */
@@ -495,7 +505,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 	case (MDOC_Sc):
 		/* FALLTHROUGH */
 	case (MDOC_Xc):
-		if (type == p->type && rew_alt(tok) == p->tok)
+		if (rew_alt(tok) != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	default:
@@ -522,6 +536,8 @@ rew_dobreak(enum mdoct tok, const struct
 		return(1);
 	if (MDOC_VALID & p->flags)
 		return(1);
+	if (MDOC_BODY == p->type && p->end)
+		return(1);
 
 	switch (tok) {
 	case (MDOC_It):
@@ -572,6 +588,81 @@ rew_elem(struct mdoc *mdoc, enum mdoct t
 }
 
 
+/*
+ * We are trying to close a block identified by tok,
+ * but the child block *broken is still open.
+ * Thus, postpone closing the tok block
+ * until the rew_sub call closing *broken.
+ */
+static int
+make_pending(struct mdoc_node *broken, enum mdoct tok,
+		struct mdoc *m, int line, int ppos)
+{
+	struct mdoc_node *breaker;
+
+	/*
+	 * Iterate backwards, searching for the block matching tok,
+	 * that is, the block breaking the *broken block.
+	 */
+	for (breaker = broken->parent; breaker; breaker = breaker->parent) {
+
+		/*
+		 * If the *broken block had already been broken before
+		 * and we encounter its breaker, make the tok block
+		 * pending on the inner breaker.
+		 * Graphically, "[A breaker=[B broken=[C->B B] tok=A] C]"
+		 * becomes "[A broken=[B [C->B B] tok=A] C]"
+		 * and finally "[A [B->A [C->B B] A] C]".
+		 */
+		if (breaker == broken->pending) {
+			broken = breaker;
+			continue;
+		}
+
+		if (REWIND_REWIND != rew_dohalt(tok, MDOC_BLOCK, breaker))
+			continue;
+		if (MDOC_BODY == broken->type)
+			broken = broken->parent;
+
+		/*
+		 * Found the breaker.
+		 * If another, outer breaker is already pending on
+		 * the *broken block, we must not clobber the link
+		 * to the outer breaker, but make it pending on the
+		 * new, now inner breaker.
+		 * Graphically, "[A breaker=[B broken=[C->A A] tok=B] C]"
+		 * becomes "[A breaker=[B->A broken=[C A] tok=B] C]"
+		 * and finally "[A [B->A [C->B A] B] C]".
+		 */
+		if (broken->pending) {
+			struct mdoc_node *taker;
+
+			/*
+			 * If the breaker had also been broken before,
+			 * it cannot take on the outer breaker itself,
+			 * but must hand it on to its own breakers.
+			 * Graphically, this is the following situation:
+			 * "[A [B breaker=[C->B B] broken=[D->A A] tok=C] D]"
+			 * "[A taker=[B->A breaker=[C->B B] [D->C A] C] D]"
+			 */
+			taker = breaker;
+			while (taker->pending)
+				taker = taker->pending;
+			taker->pending = broken->pending;
+		}
+		broken->pending = breaker;
+		return(1);
+	}
+
+	/*
+	 * Found no matching block for tok.
+	 * Are you trying to close a block that is not open?
+	 * Report failure.
+	 */
+	mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
+	return(0);
+}
+
 static int
 rew_sub(enum mdoc_type t, struct mdoc *m, 
 		enum mdoct tok, int line, int ppos)
@@ -583,7 +674,7 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 	for (n = m->last; n; n = n->parent) {
 		c = rew_dohalt(tok, t, n);
 		if (REWIND_HALT == c) {
-			if (MDOC_BLOCK != t)
+			if (n->end || MDOC_BLOCK != t)
 				return(1);
 			if ( ! (MDOC_EXPLICIT & mdoc_macros[tok].flags))
 				return(1);
@@ -597,6 +688,7 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 			continue;
 		if ( ! swarn(m, t, line, ppos, n))
 			return(0);
+		return make_pending(n, tok, m, line, ppos);
 	}
 
 	assert(n);
@@ -604,15 +696,14 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 		return(0);
 
 	/*
-	 * The current block extends an enclosing block beyond a line
-	 * break.  Now that the current block ends, close the enclosing
-	 * block, too.
+	 * The current block extends an enclosing block.
+	 * Now that the current block ends, close the enclosing block, too.
 	 */
-	if (NULL != (n = n->pending)) {
-		assert(MDOC_HEAD == n->type);
+	while (NULL != (n = n->pending)) {
 		if ( ! rew_last(m, n))
 			return(0);
-		if ( ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
+		if (MDOC_HEAD == n->type &&
+		    ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
 			return(0);
 	}
 	return(1);
@@ -667,9 +758,13 @@ append_delims(struct mdoc *m, int line, 
 static int
 blk_exp_close(MACRO_PROT_ARGS)
 {
+	struct mdoc_node *body = NULL;	/* Our own body. */
+	struct mdoc_node *later = NULL;	/* A sub-block starting later. */
+	struct mdoc_node *n;		/* For searching backwards. */
+
 	int	 	 j, lastarg, maxargs, flushed, nl;
 	enum margserr	 ac;
-	enum mdoct	 ntok;
+	enum mdoct	 atok, ntok;
 	char		*p;
 
 	nl = MDOC_NEWLINE & m->flags;
@@ -683,6 +778,70 @@ blk_exp_close(MACRO_PROT_ARGS)
 		break;
 	}
 
+	/*
+	 * Search backwards for beginnings of blocks,
+	 * both of our own and of pending sub-blocks.
+	 */
+	atok = rew_alt(tok);
+	for (n = m->last; n; n = n->parent) {
+		if (MDOC_VALID & n->flags)
+			continue;
+
+		/* Remember the start of our own body. */
+		if (MDOC_BODY == n->type && atok == n->tok) {
+			if (0 == n->end)
+				body = n;
+			continue;
+		}
+
+		if (MDOC_BLOCK != n->type)
+			continue;
+		if (atok == n->tok) {
+			assert(body);
+
+			/*
+			 * Found the start of our own block.
+			 * When there is no pending sub block,
+			 * just proceed to closing out.
+			 */
+			if (NULL == later)
+				break;
+
+			/* 
+			 * When there is a pending sub block,
+			 * postpone closing out the current block
+			 * until the rew_sub() closing out the sub-block.
+			 */
+			if ( ! make_pending(later, tok, m, line, ppos))
+				return(0);
+
+			/*
+			 * Mark the place where the formatting - but not
+			 * the scope - of the current block ends.
+			 */
+			if ( ! mdoc_elem_alloc(m, line, ppos, atok, NULL))
+				return(0);
+			m->last->type = MDOC_BODY;
+			m->last->end = 1;  /* Ask for normal spacing. */
+			m->last->pending = body;
+			m->next = MDOC_NEXT_SIBLING;
+			break;
+		}
+
+		/*
+		 * When finding an open sub block, remember the last
+		 * open explicit block, or, in case there are only
+		 * implicit ones, the first open implicit block.
+		 */
+		if (later &&
+		    MDOC_EXPLICIT & mdoc_macros[later->tok].flags)
+			continue;
+		if (MDOC_CALLABLE & mdoc_macros[n->tok].flags) {
+			assert( ! (MDOC_ACTED & n->flags));
+			later = n;
+		}
+	}
+
 	if ( ! (MDOC_CALLABLE & mdoc_macros[tok].flags)) {
 		/* FIXME: do this in validate */
 		if (buf[*pos]) 
@@ -697,7 +856,7 @@ blk_exp_close(MACRO_PROT_ARGS)
 	if ( ! rew_sub(MDOC_BODY, m, tok, line, ppos))
 		return(0);
 
-	if (maxargs > 0) 
+	if (NULL == later && maxargs > 0) 
 		if ( ! mdoc_tail_alloc(m, line, ppos, rew_alt(tok)))
 			return(0);
 
@@ -1249,20 +1408,38 @@ blk_part_imp(MACRO_PROT_ARGS)
 		body->parent->flags |= MDOC_EOS;
 	}
 
+	/*
+	 * If there is an open sub-block requiring explicit close-out,
+	 * postpone closing out the current block
+	 * until the rew_sub() call closing out the sub-block.
+	 */
+	for (n = m->last; n && n != body && n != blk->parent; n = n->parent) {
+		if (MDOC_BLOCK == n->type &&
+		    MDOC_EXPLICIT & mdoc_macros[n->tok].flags &&
+		    ! (MDOC_VALID & n->flags)) {
+			assert( ! (MDOC_ACTED & n->flags));
+			if ( ! make_pending(n, tok, m, line, ppos))
+				return(0);
+			if ( ! mdoc_elem_alloc(m, line, ppos, tok, NULL))
+				return(0);
+			m->last->type = MDOC_BODY;
+			m->last->end = 2;  /* Ask for TERMP_NOSPACE. */
+			m->last->pending = body;
+			m->next = MDOC_NEXT_SIBLING;
+			return(1);
+		}
+	}
+
 	/* 
 	 * If we can't rewind to our body, then our scope has already
 	 * been closed by another macro (like `Oc' closing `Op').  This
 	 * is ugly behaviour nodding its head to OpenBSD's overwhelming
 	 * crufty use of `Op' breakage.
 	 */
-	for (n = m->last; n; n = n->parent)
-		if (body == n)
-			break;
-
-	if (NULL == n && ! mdoc_nmsg(m, body, MANDOCERR_SCOPE))
+	if (n != body && ! mdoc_nmsg(m, body, MANDOCERR_SCOPE))
 		return(0);
 
-	if (n && ! rew_last(m, body))
+	if (n && ! rew_sub(MDOC_BODY, m, tok, line, ppos))
 		return(0);
 
 	/* Standard appending of delimiters. */
@@ -1272,7 +1449,7 @@ blk_part_imp(MACRO_PROT_ARGS)
 
 	/* Rewind scope, if applicable. */
 
-	if (n && ! rew_last(m, blk))
+	if (n && ! rew_sub(MDOC_BLOCK, m, tok, line, ppos))
 		return(0);
 
 	return(1);
Index: mdoc_term.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_term.c,v
retrieving revision 1.87
diff -u -p -r1.87 mdoc_term.c
--- mdoc_term.c	10 Jun 2010 22:50:10 -0000	1.87
+++ mdoc_term.c	14 Jun 2010 00:55:11 -0000
@@ -321,20 +321,37 @@ print_mdoc_node(DECL_ARGS)
 	memset(&npair, 0, sizeof(struct termpair));
 	npair.ppair = pair;
 
-	if (MDOC_TEXT != n->type) {
-		if (termacts[n->tok].pre)
-			chld = (*termacts[n->tok].pre)(p, &npair, m, n);
-	} else 
+	if (MDOC_TEXT == n->type)
 		term_word(p, n->string); 
+	else if (termacts[n->tok].pre && !n->end)
+		chld = (*termacts[n->tok].pre)(p, &npair, m, n);
 
 	if (chld && n->child)
 		print_mdoc_nodelist(p, &npair, m, n->child);
 
 	term_fontpopq(p, font);
 
-	if (MDOC_TEXT != n->type)
-		if (termacts[n->tok].post)
-			(*termacts[n->tok].post)(p, &npair, m, n);
+	if (MDOC_TEXT != n->type &&
+	    termacts[n->tok].post &&
+	    ! (MDOC_ENDED & n->flags)) {
+		(*termacts[n->tok].post)(p, &npair, m, n);
+
+		/*
+		 * Explicit end tokens not only call the post
+		 * handler, but also tell the respective block
+		 * that it must not call the post handler again.
+		 */
+		if (n->end)
+			n->pending->flags |= MDOC_ENDED;
+
+		/*
+		 * End of line terminating an implicit block
+		 * while an explicit block is still open.
+		 * Continue the explicit block without spacing.
+		 */
+		if (1 < n->end)
+			p->flags |= TERMP_NOSPACE;
+	}
 
 	if (MDOC_EOS & n->flags)
 		p->flags |= TERMP_SENTENCE;
Index: tree.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/tree.c,v
retrieving revision 1.7
diff -u -p -r1.7 tree.c
--- tree.c	23 May 2010 22:45:01 -0000	1.7
+++ tree.c	14 Jun 2010 00:55:11 -0000
@@ -70,7 +70,10 @@ print_mdoc(const struct mdoc_node *n, in
 		t = "block-head";
 		break;
 	case (MDOC_BODY):
-		t = "block-body";
+		if (n->end)
+			t = "body-end";
+		else
+			t = "block-body";
 		break;
 	case (MDOC_TAIL):
 		t = "block-tail";


[-- Attachment #2: badnest.in --]
[-- Type: text/plain, Size: 894 bytes --]

.Dd $Mdocdate: February 17 2010 $
.Dt BLOCK-BADNEST 1
.Os
.Sh NAME
.Nm block-badnest
.Nd badly nested blocks
.Sh DESCRIPTION
exp breaking exp:
.Ao ao
.Bo bo ac
.Ac bc
.Bc end
.Pp
imp breaking exp:
.Aq aq Bo bo eol
bc
.Bc end
.Pp
exp breaking imp:
.Ao ao
.Bq bq ac Ac eol
end
.Pp
It extended:
.Bl -tag -width Ds
.It it Bo
.No bo bc
.Bc
text
.El
.Pp
Double-break, inner break before outer:
.Ao ao
.Bo bo
.So so bc
.Bc ac
.Ac sc
.Sc end
.Pp
Double-break, outer break before inner:
.Ao ao
.Bo bo
.So so ac
.Ac bc
.Bc sc
.Sc end
.Pp
Broken breaker:
.Ao ao
.Bo bo ac
.Ac middle
.So so bc
.Bc sc
.Sc end
.Pp
Broken double-breaker:
.Ao ao
.Bo bo
.So so bc
.Bc middle
.Do do ac
.Ac sc
.Sc dc
.Dc end
.Pp
Break the last explicit block:
.br
.Ao ao Bq bq So so Dq dq ac Ac sc Sc eol
end
.br
.Ao ao Bq bq So so Dq dq ac Ac eol
sc
.Sc end
.Pp
Break multiple implicit blocks:
.Ao ao Bq bq Sq sq ac Ac eol
end

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] deal with bad block nesting
  2010-06-14  2:47 [PATCH] deal with bad block nesting Ingo Schwarze
@ 2010-06-14 20:46 ` Kristaps Dzonsons
  2010-06-15  0:01   ` Ingo Schwarze
  0 siblings, 1 reply; 3+ messages in thread
From: Kristaps Dzonsons @ 2010-06-14 20:46 UTC (permalink / raw)
  To: tech

Ingo,

First I'm going to tag 1.10.2, which features improved -Tps and some 
performance/simplicity improvements (the cached data points), then we 
can take a more in-depth look at this with a stable roll-back point.

My first impression is that this makes our nice and regular AST into 
shit soup.  Don't get me wrong--it's a wonderful piece of work, but I 
think we can do it better in a completely different way.

I propose (dum dum duuuum):

If the offending macros are all Ao/Ac Eo/Ec blk_part_exp(), which it 
seems from the code, why don't we just make these into in_line_argv(1) 
macros and completely avoid this complexity?

My motivation is that none of the Ao/Ac/etc. macros are semantically or 
syntatically valuable.  They're just eye-candy.  They don't affect their 
contents, just the point at which the Ao or Ac is invoked.

The only thing this doesn't apply to is Oo/Oc, which has semantic 
meaning in the SYNOPSIS.  Even this failure can be reduced if we 
special-case Oo/Oc within the SYNOPSIS (we already do this with Vt) to 
be a "real" block and the rest to be some shitty in_line_eoln().

Ingo, what are your thoughts on such an approach?  Of course, my 
arguments hinge on the primary offenders for this behaviour...

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] deal with bad block nesting
  2010-06-14 20:46 ` Kristaps Dzonsons
@ 2010-06-15  0:01   ` Ingo Schwarze
  0 siblings, 0 replies; 3+ messages in thread
From: Ingo Schwarze @ 2010-06-15  0:01 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, Jun 14, 2010 at 10:46:39PM +0200:

> First I'm going to tag 1.10.2, which features improved -Tps and some
> performance/simplicity improvements (the cached data points), then
> we can take a more in-depth look at this with a stable roll-back
> point.

Sounds good.

> My first impression is that this makes our nice and regular AST into
> shit soup.

In a way, yes, no doubt.

The open questions are whether we can do better, how much effeort
that will cause, and which will be the best point in time to try.

> Don't get me wrong--it's a wonderful piece of work, but
> I think we can do it better in a completely different way.
> 
> I propose (dum dum duuuum):
> 
> If the offending macros are all Ao/Ac Eo/Ec blk_part_exp(), which it
> seems from the code,

Well, when talking about bad nesting, we always talk about *two* macros:
One that is breaking the other, and the other one that is broken.

The former, the breaking macro, can be any PARSED block macro.
Here ist the complete list of parsed block macros:

 * blk_full: It  (implicit)
 * blk_part_exp: all (Ao Bo Do Po Qo So Xo Oo Bro Eo)
 * blk_part_imp: all (D1 Dl Op Aq Bq Dq Pq Ql Qq Sq Brq)

The latter, the broken macro, can be any CALLABLE block macro.
Here ist the complete list of callable block macros:

 * blk_full: Fo  (explicit)
 * blk_part_exp: all (Ao Bo Do Po Qo So Xo Oo Bro Eo)
 * blk_part_imp: Op Aq Bq Dq Pq Ql Qq Sq Brq

The relevant combinations are somewhat restricted because implicit
macros cannot break each other: Two subsequent implicit macros
implicitely end at the same place, that is, without one breaking
the other.  Thus, we get the following breakings:

 * It broken by blk_part_exp
 * It broken by Fo  (just found that case right now! oops!)
 * blk_part_exp broken by blk_part_exp
 * blk_part_exp broken by Fo  (again, oops, as above)
 * blk_part_exp broken by blk_part_imp
 * blk_part_imp broken by blk_part_exp

The new case It/Fo

.Bl -tag -width Ds
.It it Fo sin
.Fa "double x"
.Fc
The sine function.
.El

gives in old groff:

     it sin(
             double x) The sine function.

in new groff:

             sin( The sine function.

in mandoc:

     it sin(double x)
             The sine function.

So i guess we don't need to worry about that one,
groff pukes, and not even consistently.

> why don't we just make these into
> in_line_argv(1) macros and completely avoid this complexity?

Which cases does this address?

 * It broken by blk_part_exp           => MADE WORSE
 * It broken by Fo                     => NOT SOLVED, but WEIRD ANYWAY
 * blk_part_exp broken by blk_part_exp => SOLVED
 * blk_part_exp broken by Fo           => SOLVED
 * blk_part_exp broken by blk_part_imp => SOLVED
 * blk_part_imp broken by blk_part_exp => SOLVED

The first case is made worse because blk_part_exp (say, Ao) does
not only break It, but extends it.  So, the Ac macro must close
the It HEAD block.  As long as Ac closes an Ao block, and that
block remembers that it was broken by the It, it is relatively
easy for the Ac to search its corresponding Ao (because that's
a block, it can only be in the direct line of parents, not
hidden somewhere as a child) and close out the pending It HEAD.
Now, if the Ao is just in_line, it can be hidden anywhere, and
the whole point of the change is that the It does not even
realize breaking anything.  So, how is the It going to realize
that it is being extended?  And how is the Ac going to find out
that it needs to perform the postponed It close-out?

This will need completely new algorithms.  For example, a global
pointer and a global stack, the pointer normally being NULL, the
stack empty.  When entering an It HEAD, you let the pointer point
to the new It.  When parsing a block-opening in_line (Ao) while
the pointer is non-NULL, you push a note onto the stack.  When
parsing a block-closing in_line (Ac) while the pointer is non-NULL,
you remove the latest matching note from the stack.  When, at
the end of the It line, the stack is not empty, you extend the
It HEAD scope to the next line.  Then, you don't add new notes
to the stack any more, but continue removing old ones.  As soon
as the stack becomes empty, the It scope ends.

But is that really better than what I sent?

> My motivation is that none of the Ao/Ac/etc. macros are semantically
> or syntatically valuable.  They're just eye-candy.  They don't
> affect their contents, just the point at which the Ao or Ac is
> invoked.

Maybe, but you still want to warn when they don't pair up, right?
How do you check that when they are blocks no longer?
OK, you can set up a different algorithm, for example a global
counter for each type, and when that counter goes negative,
you warn ("block not open"), and when it is still non-zero at
the end of a scope (e.g., at Ss, Sh, Ed, El, It or end of file),
you warn as well ("unclosed block").

Again, new stuff to be designed and implemented.

> The only thing this doesn't apply to is Oo/Oc, which has semantic
> meaning in the SYNOPSIS.  Even this failure can be reduced if we
> special-case Oo/Oc within the SYNOPSIS (we already do this with Vt)
> to be a "real" block and the rest to be some shitty in_line_eoln().

Except that Oo/Oc and Op are most notorious for bad nesting,
in particular in the SYNOPSIS.  So, are we then right back to the
original problem exactly where it is going to hurt us most?
If Oo/Oc is going to stay a block in the SYNOPSIS, how are we going
to handle bad O* nesting in the SYNOPSIS?  It is not only O*, O* may
and does mix with Xo and might even mix with Fo (not sure),
in particular in the SYNOPSIS.

> Ingo, what are your thoughts on such an approach?  Of course, my
> arguments hinge on the primary offenders for this behaviour...

If your approach turns out to be able to deal with all cases (as i'm
quite convinced mine does, because it is implemented, i see that it
works, and have a good feeling now how difficult fixing any remaining
bugs might be in case any show up), and in case your approach turns
out to be simpler and cleaner than mine, we should certainly take
yours, because then i would consider it the better one.

The above questions do not look completely trivial, though, and there
may be hidden obstacles along the way that i overlooked (there were
many with my approach that i found and solved one by one).  So, who
is going to implement your suggestion, and work out the remaining
design decisions, such that we can compare and decide which solution
is the better one?

I'm not particularly keen on doing that *right now* (or during the
hackathon), even if it is better.  There are still some other
critical problems to be solved, and personally, i would rather
take the solution i have for one problem, at least for now, and
move on to the next one instead of starting over - and taking the
risk to get stuck with the same problem a second time.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2010-06-15  0:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-14  2:47 [PATCH] deal with bad block nesting Ingo Schwarze
2010-06-14 20:46 ` Kristaps Dzonsons
2010-06-15  0:01   ` Ingo Schwarze

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).