tech@mandoc.bsd.lv
* Back-block nesting patch.
       [not found] <20100627001247.GB2850@iris.usta.de>
@ 2010-06-27 23:47 ` Kristaps Dzonsons
  2010-06-28 19:02   ` Ingo Schwarze
  2010-06-29  5:06   ` rew_dohalt reorg and bad block-nesting bug Ingo Schwarze
  0 siblings, 2 replies; 6+ messages in thread
From: Kristaps Dzonsons @ 2010-06-27 23:47 UTC
  To: tech

Ingo, some comments in-line before I crash and sleep.  This is CC'd to 
tech@mdocml so that my plan is known.

The following comments (in-line + snippage) are mechanical; I won't be 
catching any logic bugs until this is merged and I can look through it 
as part of the code.

On that note, here's what I suggest.  Let's get fully BSD.lv-OpenBSD 
sync'd, then I'll tag the tree.  Then we merge, clean up, and EVERYBODY 
ON THIS LIST test the hell out of it.  I'll post to discuss@ as well. 
Ingo's put a lot of work into this and it looks good, but the more eyes 
in different local manual trees, the better.

My first general comment: I want a big fat non-fatal "you're an idiot 
for making this manual" warning for each and every break (this may 
already be implemented).

My second is that the documentation must be precise for this, because 
it's quite confusing.  I mean mdoc.3, which documents the AST.  I'd 
rather mdoc.7 know nothing about this.  This documentation should go in 
with the first raft of commits.

> +       int               end;          /* BODY */

Can you make this into an enum to reduce ambiguity?
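
A rough sketch of the idea follows. The type and constant names are
invented for illustration and are not anything in the tree; only the
values 0, 1 and 2 are taken from how the patch uses the field
(1 asks for normal spacing, 2 asks for TERMP_NOSPACE).

```c
/*
 * Hypothetical names; the values mirror the plain ints
 * that the patch stores in the "end" field.
 */
enum body_end {
	BODY_END_NONE = 0,	/* ordinary BODY node, not an end marker */
	BODY_END_SPACE,		/* end marker, normal spacing afterwards */
	BODY_END_NOSPACE	/* end marker, suppress the next space */
};

/* Return 1 if the value marks an end-of-formatting node. */
static int
body_is_end(enum body_end end)
{
	return(BODY_END_NONE != end);
}

/* Return 1 if the renderer should suppress the following space. */
static int
body_end_nospace(enum body_end end)
{
	return(BODY_END_NOSPACE == end);
}
```

With names like these, a test such as "1 < n->end" in the renderer
could be spelled as a comparison against a named constant.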

> +/*
> + * We are trying to close a block identified by tok,
> + * but the child block *broken is still open.
> + * Thus, postpone closing the tok block
> + * until the rew_sub call closing *broken.
> + */
> +static int
> +make_pending(struct mdoc_node *broken, enum mdoct tok,
> +               struct mdoc *m, int line, int ppos)
> +{
> +       struct mdoc_node *breaker;
> +
> +       /*
> +        * Iterate backwards, searching for the block matching tok,
> +        * that is, the block breaking the *broken block.
> +        */
> +       for (breaker = broken->parent; breaker; breaker = breaker->parent) {
> +
> +               /*
> +                * If the *broken block had already been broken before
> +                * and we encounter its breaker, make the tok block
> +                * pending on the inner breaker.
> +                * Graphically, "[A breaker=[B broken=[C->B B] tok=A] C]"
> +                * becomes "[A broken=[B [C->B B] tok=A] C]"
> +                * and finally "[A [B->A [C->B B] A] C]".
> +                */
> +               if (breaker == broken->pending) {
> +                       broken = breaker;
> +                       continue;
> +               }
> +
> +               if (REWIND_REWIND != rew_dohalt(tok, MDOC_BLOCK, breaker))
> +                       continue;
> +               if (MDOC_BODY == broken->type)
> +                       broken = broken->parent;
> +
> +               /*
> +                * Found the breaker.
> +                * If another, outer breaker is already pending on
> +                * the *broken block, we must not clobber the link
> +                * to the outer breaker, but make it pending on the
> +                * new, now inner breaker.
> +                * Graphically, "[A breaker=[B broken=[C->A A] tok=B] C]"
> +                * becomes "[A breaker=[B->A broken=[C A] tok=B] C]"
> +                * and finally "[A [B->A [C->B A] B] C]".
> +                */
> +               if (broken->pending) {
> +                       struct mdoc_node *taker;
> +
> +                       /*
> +                        * If the breaker had also been broken before,
> +                        * it cannot take on the outer breaker itself,
> +                        * but must hand it on to its own breakers.
> +                        * Graphically, this is the following situation:
> +                        * "[A [B breaker=[C->B B] broken=[D->A A] tok=C] D]"
> +                        * "[A taker=[B->A breaker=[C->B B] [D->C A] C] D]"
> +                        */
> +                       taker = breaker;
> +                       while (taker->pending)
> +                               taker = taker->pending;
> +                       taker->pending = broken->pending;
> +               }
> +               broken->pending = breaker;
> +               mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos, "%s breaks %s",
> +                   mdoc_macronames[tok], mdoc_macronames[broken->tok]);
> +               return(1);
> +       }
> +
> +       /*
> +        * Found no matching block for tok.
> +        * Are you trying to close a block that is not open?
> +        * Report failure and abort the parser.
> +        */
> +       mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
> +       return(0);
> +}

The comments in-line here are great.  I just wanted to say.  Good work!
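
For anyone following along, the hand-over of an already pending outer
breaker can be tried in isolation.  The following is only a sketch:
"struct node" is a cut-down stand-in for struct mdoc_node and
link_pending() is a made-up name covering just the last few lines of
make_pending(), not the search loop.

```c
#include <stddef.h>

/* Stand-in for struct mdoc_node; only the link used here. */
struct node {
	struct node *pending;	/* block to close once this one closes */
};

/*
 * Record that breaker must be closed when broken closes.
 * If an outer breaker is already pending on broken, do not
 * overwrite that link: hand it on to the end of breaker's
 * own pending chain first.
 */
static void
link_pending(struct node *broken, struct node *breaker)
{
	struct node *taker;

	if (broken->pending) {
		taker = breaker;
		while (taker->pending)
			taker = taker->pending;
		taker->pending = broken->pending;
	}
	broken->pending = breaker;
}
```

Starting from C->A and linking breaker B gives C->B and B->A, which is
the "[A [B->A [C->B A] B] C]" picture from the comment.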

> +               return make_pending(n, tok, m, line, ppos);

Nit: I always parenthesise my returns, e.g.,

       return(make_pending(n, tok, m, line, ppos));

Functional background, what can I say.  Same with case statements.

>         }
> 
>         assert(n);
> @@ -604,15 +647,14 @@ rew_sub(enum mdoc_type t, struct mdoc *m
>                 return(0);
> 
>         /*
> -        * The current block extends an enclosing block beyond a line
> -        * break.  Now that the current block ends, close the enclosing
> -        * block, too.
> +        * The current block extends an enclosing block.
> +        * Now that the current block ends, close the enclosing block, too.
>          */
> -       if (NULL != (n = n->pending)) {
> -               assert(MDOC_HEAD == n->type);
> +       while (NULL != (n = n->pending)) {
>                 if ( ! rew_last(m, n))
>                         return(0);
> -               if ( ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
> +               if (MDOC_HEAD == n->type &&
> +                   ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
>                         return(0);
>         }
>         return(1);
> @@ -667,9 +709,13 @@ append_delims(struct mdoc *m, int line,
>  static int
>  blk_exp_close(MACRO_PROT_ARGS)
>  {
> +       struct mdoc_node *body = NULL;  /* Our own body. */
> +       struct mdoc_node *later = NULL; /* A sub-block starting later. */

Another nit: don't initialise in declarations (this is in OpenBSD's 
style(9) as well).

> +       struct mdoc_node *n;            /* For searching backwards. */

Note: these sorts of comments are really quite useful.  I use them all 
the time in term.c (as I always mix up my vbl's and vis's), so I 
heartily endorse their use for complex bits.

> +
>         int              j, lastarg, maxargs, flushed, nl;
>         enum margserr    ac;
> -       enum mdoct       ntok;
> +       enum mdoct       atok, ntok;
>         char            *p;
> 
>         nl = MDOC_NEWLINE & m->flags;
> @@ -683,6 +729,70 @@ blk_exp_close(MACRO_PROT_ARGS)
>                 break;
>         }
> 
> +       /*
> +        * Search backwards for beginnings of blocks,
> +        * both of our own and of pending sub-blocks.
> +        */
> +       atok = rew_alt(tok);
> +       for (n = m->last; n; n = n->parent) {
> +               if (MDOC_VALID & n->flags)
> +                       continue;
> +
> +               /* Remember the start of our own body. */
> +               if (MDOC_BODY == n->type && atok == n->tok) {
> +                       if (0 == n->end)
> +                               body = n;
> +                       continue;
> +               }
> +
> +               if (MDOC_BLOCK != n->type)
> +                       continue;
> +               if (atok == n->tok) {
> +                       assert(body);

I love assertions!  Use them as often as you'd like (within reason).

> +
> +                       /*
> +                        * Found the start of our own block.
> +                        * When there is no pending sub block,
> +                        * just proceed to closing out.
> +                        */
> +                       if (NULL == later)
> +                               break;
> +
> +                       /*
> +                        * When there is a pending sub block,
> +                        * postpone closing out the current block
> +                        * until the rew_sub() closing out the sub-block.
> +                        */
> +                       if ( ! make_pending(later, tok, m, line, ppos))
> +                               return(0);
> +
> +                       /*
> +                        * Mark the place where the formatting - but not
> +                        * the scope - of the current block ends.
> +                        */
> +                       if ( ! mdoc_elem_alloc(m, line, ppos, atok, NULL))
> +                               return(0);
> +                       m->last->type = MDOC_BODY;
> +                       m->last->end = 1;  /* Ask for normal spacing. */
> +                       m->last->pending = body;
> +                       m->next = MDOC_NEXT_SIBLING;
> +                       break;

Please make a mdoc_endbody_alloc() in mdoc.c to make this fully clear.
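
Roughly the shape meant here, as a sketch only.  The node structure
below is a simplified stand-in, and endbody_alloc() with its
parent/child bookkeeping is hypothetical; the real function would go
through node_alloc() and node_append() and take the struct mdoc.

```c
#include <stdlib.h>

/* Simplified stand-ins for enum mdoc_type and struct mdoc_node. */
enum ntype { NODE_BLOCK, NODE_BODY, NODE_TEXT };

struct node {
	enum ntype	 type;
	int		 tok;		/* generating macro */
	int		 end;		/* non-zero for an end marker */
	struct node	*pending;	/* end marker: the BODY being ended */
	struct node	*parent;
	struct node	*child;		/* first child */
	struct node	*last;		/* last child */
	struct node	*next;		/* next sibling */
};

/*
 * Allocate an end marker for body and append it as the last
 * child of parent.  Returns NULL when out of memory.
 */
static struct node *
endbody_alloc(struct node *parent, int tok, struct node *body, int end)
{
	struct node *p;

	if (NULL == (p = calloc(1, sizeof(struct node))))
		return(NULL);

	p->type = NODE_BODY;
	p->tok = tok;
	p->end = end;
	p->pending = body;
	p->parent = parent;

	if (parent->last)
		parent->last->next = p;
	else
		parent->child = p;
	parent->last = p;
	return(p);
}
```

Keeping the type, end and pending assignments in one place means the
callers in blk_exp_close() and blk_part_imp() cannot set one of them
and forget another.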

> +               }
> +
> +               /*
> +                * When finding an open sub block, remember the last
> +                * open explicit block, or, in case there are only
> +                * implicit ones, the first open implicit block.
> +                */
> +               if (later &&
> +                   MDOC_EXPLICIT & mdoc_macros[later->tok].flags)
> +                       continue;
> +               if (MDOC_CALLABLE & mdoc_macros[n->tok].flags) {
> +                       assert( ! (MDOC_ACTED & n->flags));
> +                       later = n;
> +               }
> +       }
> +
>         if ( ! (MDOC_CALLABLE & mdoc_macros[tok].flags)) {
>                 /* FIXME: do this in validate */
>                 if (buf[*pos])
> @@ -697,7 +807,7 @@ blk_exp_close(MACRO_PROT_ARGS)
>         if ( ! rew_sub(MDOC_BODY, m, tok, line, ppos))
>                 return(0);
> 
> -       if (maxargs > 0)
> +       if (NULL == later && maxargs > 0)
>                 if ( ! mdoc_tail_alloc(m, line, ppos, rew_alt(tok)))
>                         return(0);
> 
> @@ -1122,6 +1232,10 @@ blk_full(MACRO_PROT_ARGS)
>                                 MDOC_EXPLICIT & mdoc_macros[n->tok].flags &&
>                                 ! (MDOC_VALID & n->flags)) {
>                         assert( ! (MDOC_ACTED & n->flags));
> +                       mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos,
> +                           "%s header extended by %s",
> +                           mdoc_macronames[tok],
> +                           mdoc_macronames[n->tok]);
>                         n->pending = head;
>                         return(1);
>                 }
> @@ -1249,20 +1363,39 @@ blk_part_imp(MACRO_PROT_ARGS)
>                 body->parent->flags |= MDOC_EOS;
>         }
> 
> +       /*
> +        * If there is an open sub-block requiring explicit close-out,
> +        * postpone closing out the current block
> +        * until the rew_sub() call closing out the sub-block.
> +        */
> +       for (n = m->last; n && n != body && n != blk->parent; n = n->parent) {
> +               if (MDOC_BLOCK == n->type &&
> +                   MDOC_EXPLICIT & mdoc_macros[n->tok].flags &&
> +                   ! (MDOC_VALID & n->flags)) {
> +                       assert( ! (MDOC_ACTED & n->flags));
> +                       if ( ! make_pending(n, tok, m, line, ppos))
> +                               return(0);
> +                       if ( ! mdoc_elem_alloc(m, line, ppos, tok, NULL))
> +                               return(0);
> +                       m->last->type = MDOC_BODY;
> +                       m->last->end = 2;  /* Ask for TERMP_NOSPACE. */
> +                       m->last->pending = body;
> +                       m->next = MDOC_NEXT_SIBLING;
> +                       return(1);

Same as above for mdoc_endbody_alloc().

> +               }
> +       }
> +
>         /*
>          * If we can't rewind to our body, then our scope has already
>          * been closed by another macro (like `Oc' closing `Op').  This
>          * is ugly behaviour nodding its head to OpenBSD's overwhelming
>          * crufty use of `Op' breakage.
>          */
> -       for (n = m->last; n; n = n->parent)
> -               if (body == n)
> -                       break;
> -
> -       if (NULL == n && ! mdoc_nmsg(m, body, MANDOCERR_SCOPE))
> +       if (n != body && ! mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos,
> +           "%s broken", mdoc_macronames[tok]))
>                 return(0);
> 
> -       if (n && ! rew_last(m, body))
> +       if (n && ! rew_sub(MDOC_BODY, m, tok, line, ppos))
>                 return(0);
> 
>         /* Standard appending of delimiters. */
> @@ -1272,7 +1405,7 @@ blk_part_imp(MACRO_PROT_ARGS)
> 
>         /* Rewind scope, if applicable. */
> 
> -       if (n && ! rew_last(m, blk))
> +       if (n && ! rew_sub(MDOC_BLOCK, m, tok, line, ppos))
>                 return(0);
> 
>         return(1);
> Index: mdoc_term.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/mdoc_term.c,v
> retrieving revision 1.89
> diff -u -p -r1.89 mdoc_term.c
> --- mdoc_term.c 26 Jun 2010 19:08:00 -0000      1.89
> +++ mdoc_term.c 27 Jun 2010 00:08:58 -0000
> @@ -318,20 +318,37 @@ print_mdoc_node(DECL_ARGS)
>         memset(&npair, 0, sizeof(struct termpair));
>         npair.ppair = pair;
> 
> -       if (MDOC_TEXT != n->type) {
> -               if (termacts[n->tok].pre)
> -                       chld = (*termacts[n->tok].pre)(p, &npair, m, n);
> -       } else
> +       if (MDOC_TEXT == n->type)
>                 term_word(p, n->string);
> +       else if (termacts[n->tok].pre && !n->end)
> +               chld = (*termacts[n->tok].pre)(p, &npair, m, n);
> 
>         if (chld && n->child)
>                 print_mdoc_nodelist(p, &npair, m, n->child);
> 
>         term_fontpopq(p, font);
> 
> -       if (MDOC_TEXT != n->type)
> -               if (termacts[n->tok].post)
> -                       (*termacts[n->tok].post)(p, &npair, m, n);
> +       if (MDOC_TEXT != n->type &&
> +           termacts[n->tok].post &&
> +           ! (MDOC_ENDED & n->flags)) {
> +               (*termacts[n->tok].post)(p, &npair, m, n);
> +
> +               /*
> +                * Explicit end tokens not only call the post
> +                * handler, but also tell the respective block
> +                * that it must not call the post handler again.
> +                */
> +               if (n->end)
> +                       n->pending->flags |= MDOC_ENDED;
> +
> +               /*
> +                * End of line terminating an implicit block
> +                * while an explicit block is still open.
> +                * Continue the explicit block without spacing.
> +                */
> +               if (1 < n->end)
> +                       p->flags |= TERMP_NOSPACE;
> +       }

This obviously needs some sort of analogue in mdoc_html.c.
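
The bookkeeping itself does not depend on the output format, so it
could be shared.  A minimal sketch, assuming a stripped-down node and a
made-up helper post_once(); NODE_ENDED merely stands in for MDOC_ENDED,
and nothing below is actual mdoc_html.c code.

```c
#include <stddef.h>

#define	NODE_ENDED	(1 << 0)	/* stand-in for MDOC_ENDED */

/* Stripped-down stand-in for struct mdoc_node. */
struct node {
	int		 end;		/* non-zero for an end marker */
	int		 flags;
	struct node	*pending;	/* end marker: the BODY being ended */
};

/*
 * Decide whether the post handler should run for n.
 * An end marker also flags the node it ends, so that the
 * post handler of that node is skipped later on.
 * Returns 1 to run the handler, 0 to skip it.
 */
static int
post_once(struct node *n)
{
	if (NODE_ENDED & n->flags)
		return(0);
	if (n->end && n->pending)
		n->pending->flags |= NODE_ENDED;
	return(1);
}
```

Whether running the post handler early can produce sensible HTML at
all is a separate question; the sketch only shows the flag handling.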

> 
>         if (MDOC_EOS & n->flags)
>                 p->flags |= TERMP_SENTENCE;
> Index: tree.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/tree.c,v
> retrieving revision 1.7
> diff -u -p -r1.7 tree.c
> --- tree.c      23 May 2010 22:45:01 -0000      1.7
> +++ tree.c      27 Jun 2010 00:08:58 -0000
> @@ -70,7 +70,10 @@ print_mdoc(const struct mdoc_node *n, in
>                 t = "block-head";
>                 break;
>         case (MDOC_BODY):
> -               t = "block-body";
> +               if (n->end)
> +                       t = "body-end";
> +               else
> +                       t = "block-body";
>                 break;
>         case (MDOC_TAIL):
>                 t = "block-tail";



* Re: Back-block nesting patch.
  2010-06-27 23:47 ` Back-block nesting patch Kristaps Dzonsons
@ 2010-06-28 19:02   ` Ingo Schwarze
  2010-06-28 19:26     ` Kristaps Dzonsons
  2010-06-29  5:06   ` rew_dohalt reorg and bad block-nesting bug Ingo Schwarze
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Schwarze @ 2010-06-28 19:02 UTC
  To: tech

Hi Kristaps,

here is the updated patch taking your comments into account.
I shall now merge your latest commit; we are nearly in sync
already.

Yours,
  Ingo


Index: libmdoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/libmdoc.h,v
retrieving revision 1.38
diff -u -p -r1.38 libmdoc.h
--- libmdoc.h	27 Jun 2010 21:54:41 -0000	1.38
+++ libmdoc.h	28 Jun 2010 18:56:18 -0000
@@ -109,6 +109,9 @@ int		  mdoc_block_alloc(struct mdoc *, i
 int		  mdoc_head_alloc(struct mdoc *, int, int, enum mdoct);
 int		  mdoc_tail_alloc(struct mdoc *, int, int, enum mdoct);
 int		  mdoc_body_alloc(struct mdoc *, int, int, enum mdoct);
+int		  mdoc_endbody_alloc(struct mdoc *m, int line, int pos,
+			enum mdoct tok, struct mdoc_node *body,
+			enum mdoc_endbody end);
 void		  mdoc_node_delete(struct mdoc *, struct mdoc_node *);
 void		  mdoc_hash_init(void);
 enum mdoct	  mdoc_hash_find(const char *);
Index: mdoc.3
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.3,v
retrieving revision 1.9
diff -u -p -r1.9 mdoc.3
--- mdoc.3	27 Jun 2010 21:54:42 -0000	1.9
+++ mdoc.3	28 Jun 2010 18:56:25 -0000
@@ -217,10 +217,14 @@ and
 fields), its position in the tree (the
 .Va parent ,
 .Va child ,
+.Va nchild ,
 .Va next
 and
 .Va prev
-fields) and some type-specific data.
+fields) and some type-specific data, in particular, for nodes generated
+from macros, the generating macro in the
+.Va tok
+field.
 .Pp
 The tree itself is arranged according to the following normal form,
 where capitalised non-terminals represent nodes.
@@ -235,11 +239,11 @@ where capitalised non-terminals represen
 .It ELEMENT
 \(<- TEXT*
 .It HEAD
-\(<- mnode+
+\(<- mnode*
 .It BODY
-\(<- mnode+
+\(<- mnode* [ENDBODY mnode*]
 .It TAIL
-\(<- mnode+
+\(<- mnode*
 .It TEXT
 \(<- [[:printable:],0x1e]*
 .El
@@ -253,6 +257,65 @@ an empty line will produce a zero-length
 Multiple body parts are only found in invocations of
 .Sq \&Bl \-column ,
 where a new body introduces a new phrase.
+.Ss Badly nested blocks
+A special kind of node is available to end the formatting
+associated with a given block before the physical end of that block.
+Such an ENDBODY node has a non-null
+.Va end
+field, is of the BODY
+.Va type ,
+has the same
+.Va tok
+as the BLOCK it is ending, and has a
+.Va pending
+field pointing to that BLOCK's BODY node.
+It is an indirect child of that BODY node
+and has no children of its own.
+.Pp
+An ENDBODY node is generated when a block ends while one of its child
+blocks is still open, like in the following example:
+.Bd -literal -offset indent
+\&.Ao ao
+\&.Bo bo ac
+\&.Ac bc
+\&.Bc end
+.Ed
+.Pp
+This example results in the following block structure:
+.Bd -literal -offset indent
+BLOCK Ao
+	HEAD Ao
+	BODY Ao
+		TEXT ao
+		BLOCK Bo, pending -> Ao
+			HEAD Bo
+			BODY Bo
+				TEXT bo
+				TEXT ac
+				ENDBODY Ao, pending -> Ao
+				TEXT bc
+TEXT end
+.Ed
+.Pp
+Here, the formatting of the Ao block extends from TEXT ao to TEXT ac,
+while the formatting of the Bo block extends from TEXT bo to TEXT bc,
+rendering like this in
+.Fl T Ns Cm ascii
+mode:
+.Dl <ao [bo ac> bc] end
+Support for badly nested blocks is only provided for backward
+compatibility with some older
+.Xr mdoc 7
+implementations.
+Using them in new code is strongly discouraged:
+Some frontends, in particular
+.Fl T Ns Cm html ,
+are unable to render them in any meaningful way,
+many other
+.Xr mdoc 7
+implementations do not support them, and even for those that do,
+the behaviour is not well-defined, in particular when using multiple
+levels of badly nested blocks.
 .Sh EXAMPLES
 The following example reads lines from stdin and parses them, operating
 on the finished parse tree with
Index: mdoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.c,v
retrieving revision 1.58
diff -u -p -r1.58 mdoc.c
--- mdoc.c	27 Jun 2010 21:54:42 -0000	1.58
+++ mdoc.c	28 Jun 2010 18:56:30 -0000
@@ -328,6 +328,8 @@ node_append(struct mdoc *mdoc, struct md
 		p->parent->tail = p;
 		break;
 	case (MDOC_BODY):
+		if (p->end)
+			break;
 		assert(MDOC_BLOCK == p->parent->type);
 		p->parent->body = p;
 		break;
@@ -427,6 +429,22 @@ mdoc_body_alloc(struct mdoc *m, int line
 	if ( ! node_append(m, p))
 		return(0);
 	m->next = MDOC_NEXT_CHILD;
+	return(1);
+}
+
+
+int
+mdoc_endbody_alloc(struct mdoc *m, int line, int pos, enum mdoct tok,
+		struct mdoc_node *body, enum mdoc_endbody end)
+{
+	struct mdoc_node *p;
+
+	p = node_alloc(m, line, pos, tok, MDOC_BODY);
+	p->pending = body;
+	p->end = end;
+	if ( ! node_append(m, p))
+		return(0);
+	m->next = MDOC_NEXT_SIBLING;
 	return(1);
 }
 
Index: mdoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.h,v
retrieving revision 1.29
diff -u -p -r1.29 mdoc.h
--- mdoc.h	27 Jun 2010 21:54:42 -0000	1.29
+++ mdoc.h	28 Jun 2010 18:56:30 -0000
@@ -249,6 +249,12 @@ struct 	mdoc_arg {
 	unsigned int	  refcnt;
 };
 
+enum	mdoc_endbody {
+	ENDBODY_NOT = 0,
+	ENDBODY_SPACE,
+	ENDBODY_NOSPACE,
+};
+
 enum	mdoc_list {
 	LIST__NONE = 0,
 	LIST_bullet,
@@ -302,6 +308,7 @@ struct	mdoc_node {
 #define	MDOC_EOS	 (1 << 2) /* at sentence boundary */
 #define	MDOC_LINE	 (1 << 3) /* first macro/text on line */
 #define	MDOC_SYNPRETTY	 (1 << 4) /* SYNOPSIS-style formatting */
+#define	MDOC_ENDED	 (1 << 5) /* rendering has been ended */
 	enum mdoc_type	  type; /* AST node type */
 	enum mdoc_sec	  sec; /* current named section */
 	/* FIXME: these can be union'd to shave a few bytes. */
@@ -311,6 +318,7 @@ struct	mdoc_node {
 	struct mdoc_node *body;		/* BLOCK */
 	struct mdoc_node *tail;		/* BLOCK */
 	char		 *string;	/* TEXT */
+	enum mdoc_endbody end;		/* BODY */
 
 	union {
 		struct mdoc_bl Bl;
Index: mdoc_html.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_html.c,v
retrieving revision 1.23
diff -u -p -r1.23 mdoc_html.c
--- mdoc_html.c	27 Jun 2010 21:54:42 -0000	1.23
+++ mdoc_html.c	28 Jun 2010 18:57:04 -0000
@@ -433,7 +433,7 @@ print_mdoc_node(MDOC_ARGS)
 		print_text(h, n->string);
 		return;
 	default:
-		if (mdocs[n->tok].pre)
+		if (mdocs[n->tok].pre && !n->end)
 			child = (*mdocs[n->tok].pre)(m, n, h);
 		break;
 	}
@@ -449,7 +449,7 @@ print_mdoc_node(MDOC_ARGS)
 		mdoc_root_post(m, n, h);
 		break;
 	default:
-		if (mdocs[n->tok].post)
+		if (mdocs[n->tok].post && !n->end)
 			(*mdocs[n->tok].post)(m, n, h);
 		break;
 	}
Index: mdoc_macro.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_macro.c,v
retrieving revision 1.46
diff -u -p -r1.46 mdoc_macro.c
--- mdoc_macro.c	6 Jun 2010 20:30:08 -0000	1.46
+++ mdoc_macro.c	28 Jun 2010 18:57:09 -0000
@@ -46,6 +46,8 @@ static	int	  	append_delims(struct mdoc 
 				int, int *, char *);
 static	enum mdoct	lookup(enum mdoct, const char *);
 static	enum mdoct	lookup_raw(const char *);
+static	int		make_pending(struct mdoc_node *, enum mdoct,
+				struct mdoc *, int, int);
 static	int	  	phrase(struct mdoc *, int, int, char *);
 static	enum mdoct 	rew_alt(enum mdoct);
 static	int	  	rew_dobreak(enum mdoct, 
@@ -57,8 +59,6 @@ static	int	  	rew_last(struct mdoc *, 
 				const struct mdoc_node *);
 static	int	  	rew_sub(enum mdoc_type, struct mdoc *, 
 				enum mdoct, int, int);
-static	int	  	swarn(struct mdoc *, enum mdoc_type, int, 
-				int, const struct mdoc_node *);
 
 const	struct mdoc_macro __mdoc_macros[MDOC_MAX] = {
 	{ in_line_argn, MDOC_CALLABLE | MDOC_PARSED }, /* Ap */
@@ -188,53 +188,6 @@ const	struct mdoc_macro __mdoc_macros[MD
 const	struct mdoc_macro * const mdoc_macros = __mdoc_macros;
 
 
-static int
-swarn(struct mdoc *mdoc, enum mdoc_type type, 
-		int line, int pos, const struct mdoc_node *p)
-{
-	const char	*n, *t, *tt;
-	enum mandocerr	 ec;
-
-	n = t = "<root>";
-	tt = "block";
-
-	switch (type) {
-	case (MDOC_BODY):
-		tt = "multi-line";
-		break;
-	case (MDOC_HEAD):
-		tt = "line";
-		break;
-	default:
-		break;
-	}
-
-	switch (p->type) {
-	case (MDOC_BLOCK):
-		n = mdoc_macronames[p->tok];
-		t = "block";
-		break;
-	case (MDOC_BODY):
-		n = mdoc_macronames[p->tok];
-		t = "multi-line";
-		break;
-	case (MDOC_HEAD):
-		n = mdoc_macronames[p->tok];
-		t = "line";
-		break;
-	default:
-		break;
-	}
-
-	ec = (MDOC_IGN_SCOPE & mdoc->pflags) ?
-		MANDOCERR_SCOPE : MANDOCERR_SYNTSCOPE;
-
-	return(mdoc_vmsg(mdoc, ec, line, pos, 
-				"%s scope breaks %s of %s", 
-				tt, t, n));
-}
-
-
 /*
  * This is called at the end of parsing.  It must traverse up the tree,
  * closing out open [implicit] scopes.  Obviously, open explicit scopes
@@ -406,7 +359,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 		/* FALLTHROUGH */
 	case (MDOC_Vt):
 		assert(MDOC_TAIL != type);
-		if (type == p->type && tok == p->tok)
+		if (tok != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	case (MDOC_It):
@@ -460,7 +417,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 	case (MDOC_So):
 		/* FALLTHROUGH */
 	case (MDOC_Xo):
-		if (type == p->type && tok == p->tok)
+		if (tok != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	/* Multi-line explicit scope close. */
@@ -495,7 +456,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 	case (MDOC_Sc):
 		/* FALLTHROUGH */
 	case (MDOC_Xc):
-		if (type == p->type && rew_alt(tok) == p->tok)
+		if (rew_alt(tok) != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	default:
@@ -522,6 +487,8 @@ rew_dobreak(enum mdoct tok, const struct
 		return(1);
 	if (MDOC_VALID & p->flags)
 		return(1);
+	if (MDOC_BODY == p->type && p->end)
+		return(1);
 
 	switch (tok) {
 	case (MDOC_It):
@@ -572,6 +539,83 @@ rew_elem(struct mdoc *mdoc, enum mdoct t
 }
 
 
+/*
+ * We are trying to close a block identified by tok,
+ * but the child block *broken is still open.
+ * Thus, postpone closing the tok block
+ * until the rew_sub call closing *broken.
+ */
+static int
+make_pending(struct mdoc_node *broken, enum mdoct tok,
+		struct mdoc *m, int line, int ppos)
+{
+	struct mdoc_node *breaker;
+
+	/*
+	 * Iterate backwards, searching for the block matching tok,
+	 * that is, the block breaking the *broken block.
+	 */
+	for (breaker = broken->parent; breaker; breaker = breaker->parent) {
+
+		/*
+		 * If the *broken block had already been broken before
+		 * and we encounter its breaker, make the tok block
+		 * pending on the inner breaker.
+		 * Graphically, "[A breaker=[B broken=[C->B B] tok=A] C]"
+		 * becomes "[A broken=[B [C->B B] tok=A] C]"
+		 * and finally "[A [B->A [C->B B] A] C]".
+		 */
+		if (breaker == broken->pending) {
+			broken = breaker;
+			continue;
+		}
+
+		if (REWIND_REWIND != rew_dohalt(tok, MDOC_BLOCK, breaker))
+			continue;
+		if (MDOC_BODY == broken->type)
+			broken = broken->parent;
+
+		/*
+		 * Found the breaker.
+		 * If another, outer breaker is already pending on
+		 * the *broken block, we must not clobber the link
+		 * to the outer breaker, but make it pending on the
+		 * new, now inner breaker.
+		 * Graphically, "[A breaker=[B broken=[C->A A] tok=B] C]"
+		 * becomes "[A breaker=[B->A broken=[C A] tok=B] C]"
+		 * and finally "[A [B->A [C->B A] B] C]".
+		 */
+		if (broken->pending) {
+			struct mdoc_node *taker;
+
+			/*
+			 * If the breaker had also been broken before,
+			 * it cannot take on the outer breaker itself,
+			 * but must hand it on to its own breakers.
+			 * Graphically, this is the following situation:
+			 * "[A [B breaker=[C->B B] broken=[D->A A] tok=C] D]"
+			 * "[A taker=[B->A breaker=[C->B B] [D->C A] C] D]"
+			 */
+			taker = breaker;
+			while (taker->pending)
+				taker = taker->pending;
+			taker->pending = broken->pending;
+		}
+		broken->pending = breaker;
+		mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos, "%s breaks %s",
+		    mdoc_macronames[tok], mdoc_macronames[broken->tok]);
+		return(1);
+	}
+
+	/*
+	 * Found no matching block for tok.
+	 * Are you trying to close a block that is not open?
+	 * Report failure and abort the parser.
+	 */
+	mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
+	return(0);
+}
+
 static int
 rew_sub(enum mdoc_type t, struct mdoc *m, 
 		enum mdoct tok, int line, int ppos)
@@ -583,7 +627,7 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 	for (n = m->last; n; n = n->parent) {
 		c = rew_dohalt(tok, t, n);
 		if (REWIND_HALT == c) {
-			if (MDOC_BLOCK != t)
+			if (n->end || MDOC_BLOCK != t)
 				return(1);
 			if ( ! (MDOC_EXPLICIT & mdoc_macros[tok].flags))
 				return(1);
@@ -595,8 +639,7 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 			break;
 		else if (rew_dobreak(tok, n))
 			continue;
-		if ( ! swarn(m, t, line, ppos, n))
-			return(0);
+		return(make_pending(n, tok, m, line, ppos));
 	}
 
 	assert(n);
@@ -604,15 +647,14 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 		return(0);
 
 	/*
-	 * The current block extends an enclosing block beyond a line
-	 * break.  Now that the current block ends, close the enclosing
-	 * block, too.
+	 * The current block extends an enclosing block.
+	 * Now that the current block ends, close the enclosing block, too.
 	 */
-	if (NULL != (n = n->pending)) {
-		assert(MDOC_HEAD == n->type);
+	while (NULL != (n = n->pending)) {
 		if ( ! rew_last(m, n))
 			return(0);
-		if ( ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
+		if (MDOC_HEAD == n->type &&
+		    ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
 			return(0);
 	}
 	return(1);
@@ -667,9 +709,13 @@ append_delims(struct mdoc *m, int line, 
 static int
 blk_exp_close(MACRO_PROT_ARGS)
 {
+	struct mdoc_node *body;		/* Our own body. */
+	struct mdoc_node *later;	/* A sub-block starting later. */
+	struct mdoc_node *n;		/* For searching backwards. */
+
 	int	 	 j, lastarg, maxargs, flushed, nl;
 	enum margserr	 ac;
-	enum mdoct	 ntok;
+	enum mdoct	 atok, ntok;
 	char		*p;
 
 	nl = MDOC_NEWLINE & m->flags;
@@ -683,6 +729,68 @@ blk_exp_close(MACRO_PROT_ARGS)
 		break;
 	}
 
+	/*
+	 * Search backwards for beginnings of blocks,
+	 * both of our own and of pending sub-blocks.
+	 */
+	atok = rew_alt(tok);
+	body = later = NULL;
+	for (n = m->last; n; n = n->parent) {
+		if (MDOC_VALID & n->flags)
+			continue;
+
+		/* Remember the start of our own body. */
+		if (MDOC_BODY == n->type && atok == n->tok) {
+			if ( ! n->end)
+				body = n;
+			continue;
+		}
+
+		if (MDOC_BLOCK != n->type)
+			continue;
+		if (atok == n->tok) {
+			assert(body);
+
+			/*
+			 * Found the start of our own block.
+			 * When there is no pending sub block,
+			 * just proceed to closing out.
+			 */
+			if (NULL == later)
+				break;
+
+			/* 
+			 * When there is a pending sub block,
+			 * postpone closing out the current block
+			 * until the rew_sub() closing out the sub-block.
+			 */
+			if ( ! make_pending(later, tok, m, line, ppos))
+				return(0);
+
+			/*
+			 * Mark the place where the formatting - but not
+			 * the scope - of the current block ends.
+			 */
+			if ( ! mdoc_endbody_alloc(m, line, ppos,
+			    atok, body, ENDBODY_SPACE))
+				return(0);
+			break;
+		}
+
+		/*
+		 * When finding an open sub block, remember the last
+		 * open explicit block, or, in case there are only
+		 * implicit ones, the first open implicit block.
+		 */
+		if (later &&
+		    MDOC_EXPLICIT & mdoc_macros[later->tok].flags)
+			continue;
+		if (MDOC_CALLABLE & mdoc_macros[n->tok].flags) {
+			assert( ! (MDOC_ACTED & n->flags));
+			later = n;
+		}
+	}
+
 	if ( ! (MDOC_CALLABLE & mdoc_macros[tok].flags)) {
 		/* FIXME: do this in validate */
 		if (buf[*pos]) 
@@ -697,7 +805,7 @@ blk_exp_close(MACRO_PROT_ARGS)
 	if ( ! rew_sub(MDOC_BODY, m, tok, line, ppos))
 		return(0);
 
-	if (maxargs > 0) 
+	if (NULL == later && maxargs > 0) 
 		if ( ! mdoc_tail_alloc(m, line, ppos, rew_alt(tok)))
 			return(0);
 
@@ -1249,20 +1357,36 @@ blk_part_imp(MACRO_PROT_ARGS)
 		body->parent->flags |= MDOC_EOS;
 	}
 
+	/*
+	 * If there is an open sub-block requiring explicit close-out,
+	 * postpone closing out the current block
+	 * until the rew_sub() call closing out the sub-block.
+	 */
+	for (n = m->last; n && n != body && n != blk->parent; n = n->parent) {
+		if (MDOC_BLOCK == n->type &&
+		    MDOC_EXPLICIT & mdoc_macros[n->tok].flags &&
+		    ! (MDOC_VALID & n->flags)) {
+			assert( ! (MDOC_ACTED & n->flags));
+			if ( ! make_pending(n, tok, m, line, ppos))
+				return(0);
+			if ( ! mdoc_endbody_alloc(m, line, ppos,
+			    tok, body, ENDBODY_NOSPACE))
+				return(0);
+			return(1);
+		}
+	}
+
 	/* 
 	 * If we can't rewind to our body, then our scope has already
 	 * been closed by another macro (like `Oc' closing `Op').  This
 	 * is ugly behaviour nodding its head to OpenBSD's overwhelming
 	 * crufty use of `Op' breakage.
 	 */
-	for (n = m->last; n; n = n->parent)
-		if (body == n)
-			break;
-
-	if (NULL == n && ! mdoc_nmsg(m, body, MANDOCERR_SCOPE))
+	if (n != body && ! mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos,
+	    "%s broken", mdoc_macronames[tok]))
 		return(0);
 
-	if (n && ! rew_last(m, body))
+	if (n && ! rew_sub(MDOC_BODY, m, tok, line, ppos))
 		return(0);
 
 	/* Standard appending of delimiters. */
@@ -1272,7 +1396,7 @@ blk_part_imp(MACRO_PROT_ARGS)
 
 	/* Rewind scope, if applicable. */
 
-	if (n && ! rew_last(m, blk))
+	if (n && ! rew_sub(MDOC_BLOCK, m, tok, line, ppos))
 		return(0);
 
 	return(1);
Index: mdoc_term.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_term.c,v
retrieving revision 1.92
diff -u -p -r1.92 mdoc_term.c
--- mdoc_term.c	27 Jun 2010 21:54:42 -0000	1.92
+++ mdoc_term.c	28 Jun 2010 18:57:33 -0000
@@ -321,20 +321,37 @@ print_mdoc_node(DECL_ARGS)
 	memset(&npair, 0, sizeof(struct termpair));
 	npair.ppair = pair;
 
-	if (MDOC_TEXT != n->type) {
-		if (termacts[n->tok].pre)
-			chld = (*termacts[n->tok].pre)(p, &npair, m, n);
-	} else 
+	if (MDOC_TEXT == n->type)
 		term_word(p, n->string); 
+	else if (termacts[n->tok].pre && !n->end)
+		chld = (*termacts[n->tok].pre)(p, &npair, m, n);
 
 	if (chld && n->child)
 		print_mdoc_nodelist(p, &npair, m, n->child);
 
 	term_fontpopq(p, font);
 
-	if (MDOC_TEXT != n->type)
-		if (termacts[n->tok].post)
-			(*termacts[n->tok].post)(p, &npair, m, n);
+	if (MDOC_TEXT != n->type &&
+	    termacts[n->tok].post &&
+	    ! (MDOC_ENDED & n->flags)) {
+		(*termacts[n->tok].post)(p, &npair, m, n);
+
+		/*
+		 * Explicit end tokens not only call the post
+		 * handler, but also tell the respective block
+		 * that it must not call the post handler again.
+		 */
+		if (n->end)
+			n->pending->flags |= MDOC_ENDED;
+
+		/*
+		 * End of line terminating an implicit block
+		 * while an explicit block is still open.
+		 * Continue the explicit block without spacing.
+		 */
+		if (ENDBODY_NOSPACE == n->end)
+			p->flags |= TERMP_NOSPACE;
+	}
 
 	if (MDOC_EOS & n->flags)
 		p->flags |= TERMP_SENTENCE;
Index: tree.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/tree.c,v
retrieving revision 1.8
diff -u -p -r1.8 tree.c
--- tree.c	27 Jun 2010 21:54:42 -0000	1.8
+++ tree.c	28 Jun 2010 18:57:33 -0000
@@ -71,7 +71,10 @@ print_mdoc(const struct mdoc_node *n, in
 		t = "block-head";
 		break;
 	case (MDOC_BODY):
-		t = "block-body";
+		if (n->end)
+			t = "body-end";
+		else
+			t = "block-body";
 		break;
 	case (MDOC_TAIL):
 		t = "block-tail";
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv


* Re: Back-block nesting patch.
  2010-06-28 19:02   ` Ingo Schwarze
@ 2010-06-28 19:26     ` Kristaps Dzonsons
  0 siblings, 0 replies; 6+ messages in thread
From: Kristaps Dzonsons @ 2010-06-28 19:26 UTC (permalink / raw)
  To: tech

> here is the updated patch taking your comments into account.
> I shall now merge your latest commit; we are nearly in sync
> already.

Note that I'm bug-busting a patch that actually uses the font metrics in 
term_ps.c.  I should have it committed in the next few hours.

* rew_dohalt reorg and bad block-nesting bug
  2010-06-27 23:47 ` Back-block nesting patch Kristaps Dzonsons
  2010-06-28 19:02   ` Ingo Schwarze
@ 2010-06-29  5:06   ` Ingo Schwarze
  2010-06-29  9:18     ` Kristaps Dzonsons
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Schwarze @ 2010-06-29  5:06 UTC (permalink / raw)
  To: tech

Hi,

just to let you know what happened here...

After brushing up the badly nested blocks patch according to Kristaps'
comments, i went over to the .Nm stuff, SYNOPSIS indentation.
As you know, this is still somewhat broken because there are still
bugs in the rewind rules.  Trying to fix these, i lost patience
with rew_dohalt and rew_dobreak and decided to clean them up.

So i wrote a +46 -171 patch that does not change functionality, but
makes the rewind rules a lot easier to work on.  That patch was just
ready when i realized that it conflicts really badly with my badly
nested blocks patch.

So i ported the rew_dohalt reorg over to apply on top of the
badly nested blocks patch, where it turns out to be +66 -200.
But while doing that, i found a really nasty bug in the badly
nested blocks patch.  It fails miserably when one block is broken
twice by two blocks of the same type, e.g. ".Ao Ao Bo Ac Ac Bc".
In this situation, the first Ac breaks Bo, sets the second Ao
pending on Bo, and adds an Ao end marker.  The second Ac
finds that end marker and doesn't close out its own block at all,
wrongly concluding that it is already pending.  Finally, the Bc
only closes Bo and the second Ao, but not the first one;
the first Ao stays open and makes the parser fail.

Now this is terribly hard to fix because rew_sub only gets
type and tok arguments, but it has no idea which block it is
trying to rewind.  It cannot tell whether the Ao end marker
belongs to the block being rewound or to another open block.

This is also relevant because ".Op Op Oo\n.Oc" triggers the same
bug and does occur in practice.
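
To make the ambiguity concrete, here is a toy model in C.  It is only
a sketch and not the mandoc code: struct toy_node, find_by_tok(),
find_for_body() and second_ac_is_confused() are hypothetical names
invented for illustration, and the owner pointer is merely an
assumption about what a rewind would need to know to tell the two
Ao blocks apart.

```c
#include <stddef.h>

enum toy_tok { TOY_AO, TOY_BO };

/* Toy stand-in for struct mdoc_node; only what the argument needs. */
struct toy_node {
	enum toy_tok		 tok;
	int			 end;	 /* nonzero for an end marker */
	const struct toy_node	*owner;	 /* end marker: body it ends */
	const struct toy_node	*parent;
};

/*
 * Token-only search, like a rewind that knows just tok:
 * the first node with a matching token wins, end marker or not.
 */
static const struct toy_node *
find_by_tok(const struct toy_node *last, enum toy_tok tok)
{
	const struct toy_node *n;

	for (n = last; n != NULL; n = n->parent)
		if (tok == n->tok)
			return(n);
	return(NULL);
}

/*
 * Search that knows which body it is closing:
 * end markers owned by other bodies are skipped.
 */
static const struct toy_node *
find_for_body(const struct toy_node *last, const struct toy_node *body)
{
	const struct toy_node *n;

	for (n = last; n != NULL; n = n->parent) {
		if (n == body)
			return(n);
		if (n->end && n->owner == body)
			return(n);
	}
	return(NULL);
}

/*
 * Model the state after ".Ao Ao Bo Ac" and ask what the second Ac
 * would find.  Returns 1 if the token-only search stops at the end
 * marker of the inner Ao while the body-aware search still reaches
 * the outer Ao.
 */
int
second_ac_is_confused(void)
{
	struct toy_node ao1 = { TOY_AO, 0, NULL, NULL };
	struct toy_node ao2 = { TOY_AO, 0, NULL, &ao1 };
	struct toy_node bo = { TOY_BO, 0, NULL, &ao2 };
	/* The first Ac left this marker for the inner Ao inside Bo. */
	struct toy_node mark = { TOY_AO, 1, &ao2, &bo };

	return(find_by_tok(&mark, TOY_AO) == &mark &&
	    find_for_body(&mark, &ao1) == &ao1);
}
```

In this model the token-only search cannot get past the marker left by
the first Ac, which mirrors the early return described above; whether
the real fix should carry a pointer like owner is left open here.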

Currently, i'm not quite sure in which order to proceed.  Maybe
 1) sync
 2) tag
 3) merge badly nested blocks without the bugfix and reorg
 4) merge the reorg without the bugfix
 5) finish the .Nm/SYNOPSIS indentation stuff
 6) look at the double-breaking bug

I guess i will have to look at this once again tomorrow
morning.  I had better sleep now.

Right now, i'm just sending the reorg so that you can have
a look.  It must be applied on top of the latest badly nested
blocks patch, otherwise it won't apply at all.

Yours,
  Ingo
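
The shape of the reorg in the diff below, one classification function
driving one loop, can be sketched as follows.  This is a simplified
illustration under stated assumptions, not the patch itself: the
toy_* names and the struct toy_blk fields are hypothetical, the
REWIND_NONE case is left out, and the real rew_dohalt() looks at node
types and macro flags that are not modelled here.

```c
#include <stddef.h>

/* Reduced set of outcomes; the patch has REWIND_NONE as well. */
enum toy_rew { TOY_THIS, TOY_MORE, TOY_LATER, TOY_ERROR };

/* Hypothetical stand-in for an open scope in the parse tree. */
struct toy_blk {
	int		 tok;
	int		 is_root;
	int		 is_explicit;	/* needs a closing macro */
	int		 is_closed;	/* already closed out */
	struct toy_blk	*parent;
};

/* Rewinding to tok, how do we have to handle *p? */
static enum toy_rew
toy_classify(int tok, const struct toy_blk *p)
{
	if (p->is_root)
		return(TOY_ERROR);	/* no tok block is open */
	if (p->is_closed)
		return(TOY_MORE);	/* nothing to do, look further */
	if (tok == p->tok)
		return(TOY_THIS);	/* found the block to rewind */
	/* Open explicit blocks must not be closed behind their back. */
	return(p->is_explicit ? TOY_LATER : TOY_MORE);
}

/*
 * Walk up from last.  Returns the block to rewind to, or NULL.
 * On NULL, *later is the open explicit block that forces
 * postponement, or NULL if tok was not open at all.
 */
struct toy_blk *
toy_rewind(struct toy_blk *last, int tok, struct toy_blk **later)
{
	struct toy_blk *n;

	*later = NULL;
	n = last;
	while (n != NULL) {
		switch (toy_classify(tok, n)) {
		case TOY_THIS:
			return(n);
		case TOY_MORE:
			n = n->parent;
			continue;
		case TOY_LATER:
			*later = n;
			return(NULL);
		case TOY_ERROR:
			return(NULL);
		}
	}
	return(NULL);
}

/* Closing block 1 while explicit block 2 is open inside it. */
int
toy_demo_postponed(void)
{
	struct toy_blk root = { 0, 1, 0, 0, NULL };
	struct toy_blk outer = { 1, 0, 1, 0, &root };
	struct toy_blk inner = { 2, 0, 1, 0, &outer };
	struct toy_blk *later;

	return(NULL == toy_rewind(&inner, 1, &later) && later == &inner);
}
```

The point of the single switch is that every node gets exactly one
verdict, so there is no second predicate, like rew_dobreak(), whose
answer has to be kept consistent with the first.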


diff -Naurp mandoc-block-nest/mdoc_macro.c mandoc-bn-rew/mdoc_macro.c
--- mandoc-block-nest/mdoc_macro.c	Mon Jun 28 10:57:36 2010
+++ mandoc-bn-rew/mdoc_macro.c	Mon Jun 28 22:35:02 2010
@@ -25,10 +25,12 @@
 #include "libmdoc.h"
 #include "libmandoc.h"
 
-enum	rew {
-	REWIND_REWIND,
-	REWIND_NOHALT,
-	REWIND_HALT
+enum	rew {	/* see rew_dohalt() */
+	REWIND_NONE,
+	REWIND_THIS,
+	REWIND_MORE,
+	REWIND_LATER,
+	REWIND_ERROR,
 };
 
 static	int	  	blk_full(MACRO_PROT_ARGS);
@@ -50,8 +52,6 @@ static	int		make_pending(struct mdoc_node *, enum mdoc
 				struct mdoc *, int, int);
 static	int	  	phrase(struct mdoc *, int, int, char *);
 static	enum mdoct 	rew_alt(enum mdoct);
-static	int	  	rew_dobreak(enum mdoct, 
-				const struct mdoc_node *);
 static	enum rew  	rew_dohalt(enum mdoct, enum mdoc_type, 
 				const struct mdoc_node *);
 static	int	  	rew_elem(struct mdoc *, enum mdoct);
@@ -271,8 +271,8 @@ rew_last(struct mdoc *mdoc, const struct mdoc_node *to
 
 
 /*
- * Return the opening macro of a closing one, e.g., `Ec' has `Eo' as its
- * matching pair.
+ * For a block closing macro, return the corresponding opening one.
+ * Otherwise, return the macro itself.
  */
 static enum mdoct
 rew_alt(enum mdoct tok)
@@ -311,18 +311,20 @@ rew_alt(enum mdoct tok)
 	case (MDOC_Xc):
 		return(MDOC_Xo);
 	default:
-		break;
+		return(tok);
 	}
-	abort();
 	/* NOTREACHED */
 }
 
 
-/* 
- * Rewind rules.  This indicates whether to stop rewinding
- * (REWIND_HALT) without touching our current scope, stop rewinding and
- * close our current scope (REWIND_REWIND), or continue (REWIND_NOHALT).
- * The scope-closing and so on occurs in the various rew_* routines.
+/*
+ * Rewinding to tok, how do we have to handle *p?
+ * REWIND_NONE: *p would delimit tok, but no tok scope is open
+ *   inside *p, so there is no need to rewind anything at all.
+ * REWIND_THIS: *p matches tok, so rewind *p and nothing else.
+ * REWIND_MORE: *p is implicit, rewind it and keep searching for tok.
+ * REWIND_LATER: *p is explicit and still open, postpone rewinding.
+ * REWIND_ERROR: No tok block is open at all.
  */
 static enum rew
 rew_dohalt(enum mdoct tok, enum mdoc_type type, 
@@ -330,197 +332,57 @@ rew_dohalt(enum mdoct tok, enum mdoc_type type, 
 {
 
 	if (MDOC_ROOT == p->type)
-		return(REWIND_HALT);
-	if (MDOC_VALID & p->flags)
-		return(REWIND_NOHALT);
+		return(MDOC_BLOCK == type &&
+		    MDOC_EXPLICIT & mdoc_macros[tok].flags ?
+		    REWIND_ERROR : REWIND_NONE);
+	if (MDOC_TEXT == p->type || MDOC_VALID & p->flags)
+		return(REWIND_MORE);
 
+	tok = rew_alt(tok);
+	if (tok == p->tok)
+		return(p->end ? REWIND_NONE :
+		    type == p->type ? REWIND_THIS : REWIND_MORE);
+
+	if (MDOC_ELEM == p->type)
+		return(REWIND_MORE);
+
 	switch (tok) {
-	case (MDOC_Aq):
-		/* FALLTHROUGH */
-	case (MDOC_Bq):
-		/* FALLTHROUGH */
-	case (MDOC_Brq):
-		/* FALLTHROUGH */
-	case (MDOC_D1):
-		/* FALLTHROUGH */
-	case (MDOC_Dl):
-		/* FALLTHROUGH */
-	case (MDOC_Dq):
-		/* FALLTHROUGH */
-	case (MDOC_Op):
-		/* FALLTHROUGH */
-	case (MDOC_Pq):
-		/* FALLTHROUGH */
-	case (MDOC_Ql):
-		/* FALLTHROUGH */
-	case (MDOC_Qq):
-		/* FALLTHROUGH */
-	case (MDOC_Sq):
-		/* FALLTHROUGH */
-	case (MDOC_Vt):
-		assert(MDOC_TAIL != type);
-		if (tok != p->tok)
-			break;
-		if (p->end)
-			return(REWIND_HALT);
-		if (type == p->type)
-			return(REWIND_REWIND);
+	case (MDOC_Bl):
+		if (MDOC_It == p->tok)
+			return(REWIND_MORE);
 		break;
 	case (MDOC_It):
-		assert(MDOC_TAIL != type);
-		if (type == p->type && tok == p->tok)
-			return(REWIND_REWIND);
 		if (MDOC_BODY == p->type && MDOC_Bl == p->tok)
-			return(REWIND_HALT);
+			return(REWIND_NONE);
 		break;
-	case (MDOC_Sh):
-		if (type == p->type && tok == p->tok)
-			return(REWIND_REWIND);
+	/*
+	 * XXX Badly nested block handling still fails badly
+	 * when one block is breaking two blocks of the same type.
+	 * This is an incomplete and extremely ugly workaround,
+	 * required to let the OpenBSD tree build.
+	 */
+	case (MDOC_Oo):
+		if (MDOC_Op == p->tok)
+			return(REWIND_MORE);
 		break;
 	case (MDOC_Nd):
 		/* FALLTHROUGH */
 	case (MDOC_Ss):
-		assert(MDOC_TAIL != type);
-		if (type == p->type && tok == p->tok)
-			return(REWIND_REWIND);
 		if (MDOC_BODY == p->type && MDOC_Sh == p->tok)
-			return(REWIND_HALT);
-		break;
-	case (MDOC_Ao):
+			return(REWIND_NONE);
 		/* FALLTHROUGH */
-	case (MDOC_Bd):
-		/* FALLTHROUGH */
-	case (MDOC_Bf):
-		/* FALLTHROUGH */
-	case (MDOC_Bk):
-		/* FALLTHROUGH */
-	case (MDOC_Bl):
-		/* FALLTHROUGH */
-	case (MDOC_Bo):
-		/* FALLTHROUGH */
-	case (MDOC_Bro):
-		/* FALLTHROUGH */
-	case (MDOC_Do):
-		/* FALLTHROUGH */
-	case (MDOC_Eo):
-		/* FALLTHROUGH */
-	case (MDOC_Fo):
-		/* FALLTHROUGH */
-	case (MDOC_Oo):
-		/* FALLTHROUGH */
-	case (MDOC_Po):
-		/* FALLTHROUGH */
-	case (MDOC_Qo):
-		/* FALLTHROUGH */
-	case (MDOC_Rs):
-		/* FALLTHROUGH */
-	case (MDOC_So):
-		/* FALLTHROUGH */
-	case (MDOC_Xo):
-		if (tok != p->tok)
-			break;
-		if (p->end)
-			return(REWIND_HALT);
-		if (type == p->type)
-			return(REWIND_REWIND);
-		break;
-	/* Multi-line explicit scope close. */
-	case (MDOC_Ac):
-		/* FALLTHROUGH */
-	case (MDOC_Bc):
-		/* FALLTHROUGH */
-	case (MDOC_Brc):
-		/* FALLTHROUGH */
-	case (MDOC_Dc):
-		/* FALLTHROUGH */
-	case (MDOC_Ec):
-		/* FALLTHROUGH */
-	case (MDOC_Ed):
-		/* FALLTHROUGH */
-	case (MDOC_Ek):
-		/* FALLTHROUGH */
-	case (MDOC_El):
-		/* FALLTHROUGH */
-	case (MDOC_Fc):
-		/* FALLTHROUGH */
-	case (MDOC_Ef):
-		/* FALLTHROUGH */
-	case (MDOC_Oc):
-		/* FALLTHROUGH */
-	case (MDOC_Pc):
-		/* FALLTHROUGH */
-	case (MDOC_Qc):
-		/* FALLTHROUGH */
-	case (MDOC_Re):
-		/* FALLTHROUGH */
-	case (MDOC_Sc):
-		/* FALLTHROUGH */
-	case (MDOC_Xc):
-		if (rew_alt(tok) != p->tok)
-			break;
-		if (p->end)
-			return(REWIND_HALT);
-		if (type == p->type)
-			return(REWIND_REWIND);
-		break;
-	default:
-		abort();
-		/* NOTREACHED */
-	}
-
-	return(REWIND_NOHALT);
-}
-
-
-/*
- * See if we can break an encountered scope (the rew_dohalt has returned
- * REWIND_NOHALT). 
- */
-static int
-rew_dobreak(enum mdoct tok, const struct mdoc_node *p)
-{
-
-	assert(MDOC_ROOT != p->type);
-	if (MDOC_ELEM == p->type)
-		return(1);
-	if (MDOC_TEXT == p->type)
-		return(1);
-	if (MDOC_VALID & p->flags)
-		return(1);
-	if (MDOC_BODY == p->type && p->end)
-		return(1);
-
-	switch (tok) {
-	case (MDOC_It):
-		return(MDOC_It == p->tok);
-	case (MDOC_Nd):
-		return(MDOC_Nd == p->tok);
-	case (MDOC_Ss):
-		return(MDOC_Ss == p->tok);
 	case (MDOC_Sh):
-		if (MDOC_Nd == p->tok)
-			return(1);
-		if (MDOC_Ss == p->tok)
-			return(1);
-		return(MDOC_Sh == p->tok);
-	case (MDOC_El):
-		if (MDOC_It == p->tok)
-			return(1);
+		if (MDOC_Nd == p->tok || MDOC_Ss == p->tok ||
+		    MDOC_Sh == p->tok)
+			return(REWIND_MORE);
 		break;
-	case (MDOC_Oc):
-		if (MDOC_Op == p->tok)
-			return(1);
-		break;
 	default:
 		break;
 	}
 
-	if (MDOC_EXPLICIT & mdoc_macros[tok].flags) 
-		return(p->tok == rew_alt(tok));
-	else if (MDOC_BLOCK == p->type)
-		return(1);
-
-	return(tok == p->tok);
+	return(p->end || MDOC_BLOCK == p->type &&
+	    ! (MDOC_EXPLICIT & mdoc_macros[tok].flags) ?
+	    REWIND_MORE : REWIND_LATER);
 }
 
 
@@ -570,7 +432,7 @@ make_pending(struct mdoc_node *broken, enum mdoct tok,
 			continue;
 		}
 
-		if (REWIND_REWIND != rew_dohalt(tok, MDOC_BLOCK, breaker))
+		if (REWIND_THIS != rew_dohalt(tok, MDOC_BLOCK, breaker))
 			continue;
 		if (MDOC_BODY == broken->type)
 			broken = broken->parent;
@@ -610,36 +472,37 @@ make_pending(struct mdoc_node *broken, enum mdoct tok,
 	/*
 	 * Found no matching block for tok.
 	 * Are you trying to close a block that is not open?
-	 * Report failure and abort the parser.
+	 * XXX Make this non-fatal.
 	 */
 	mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
 	return(0);
 }
 
+
 static int
 rew_sub(enum mdoc_type t, struct mdoc *m, 
 		enum mdoct tok, int line, int ppos)
 {
 	struct mdoc_node *n;
-	enum rew	  c;
 
-	/* LINTED */
-	for (n = m->last; n; n = n->parent) {
-		c = rew_dohalt(tok, t, n);
-		if (REWIND_HALT == c) {
-			if (n->end || MDOC_BLOCK != t)
-				return(1);
-			if ( ! (MDOC_EXPLICIT & mdoc_macros[tok].flags))
-				return(1);
-			/* FIXME: shouldn't raise an error */
-			mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
-			return(0);
-		}
-		if (REWIND_REWIND == c)
+	n = m->last;
+	while (n) {
+		switch (rew_dohalt(tok, t, n)) {
+		case (REWIND_NONE):
+			return(1);
+		case (REWIND_THIS):
 			break;
-		else if (rew_dobreak(tok, n))
+		case (REWIND_MORE):
+			n = n->parent;
 			continue;
-		return(make_pending(n, tok, m, line, ppos));
+		case (REWIND_LATER):
+			return(make_pending(n, tok, m, line, ppos));
+		case (REWIND_ERROR):
+			/* XXX Make this non-fatal. */
+			mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
+			return(0);
+		}
+		break;
 	}
 
 	assert(n);

* Re: rew_dohalt reorg and bad block-nesting bug
  2010-06-29  5:06   ` rew_dohalt reorg and bad block-nesting bug Ingo Schwarze
@ 2010-06-29  9:18     ` Kristaps Dzonsons
  2010-06-29 18:33       ` Ingo Schwarze
  0 siblings, 1 reply; 6+ messages in thread
From: Kristaps Dzonsons @ 2010-06-29  9:18 UTC (permalink / raw)
  To: tech

Ingo, you beat me to cleaning out the rew functions--good.  Those are 
throwbacks from when parse rules were still changing under my feet.

> Currently, i'm not quite sure in which order to proceed.  Maybe
>  1) sync
>  2) tag

Let me know when this is done; I'll hold off on further changes until 
then.  I expect you awake at 06:00 to respond. ;-)

>  3) merge badly nested blocks without the bugfix and reorg
>  4) merge the reorg without the bugfix
>  5) finish the .Nm/SYNOPSIS indentation stuff
>  6) look at the double-breaking bug

That's fine.  My biggest concern is (3) and (4).  Get them in and we'll 
test them thoroughly.  Then proceed from there.

Kristaps

* Re: rew_dohalt reorg and bad block-nesting bug
  2010-06-29  9:18     ` Kristaps Dzonsons
@ 2010-06-29 18:33       ` Ingo Schwarze
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Schwarze @ 2010-06-29 18:33 UTC (permalink / raw)
  To: tech

Hi,

Kristaps Dzonsons wrote on Tue, Jun 29, 2010 at 11:18:41AM +0200:
> Ingo Schwarze wrote:

>>Currently, i'm not quite sure in which order to proceed.  Maybe
>> 1) sync
>> 2) tag

> Let me know when this is done;

It is.

>> 3) merge badly nested blocks without the bugfix and reorg
>> 4) merge the reorg without the bugfix
>> 5) finish the .Nm/SYNOPSIS indentation stuff
>> 6) look at the double-breaking bug

> That's fine.  My biggest concern is (3) and (4).  Get them in and

I have committed them to the OpenBSD tree.
Here is the merge diff for (3), tested in the bsd.lv tree.
OK to commit that?

Once that is in, http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/mandoc/mdoc_macro.c.diff?r1=1.47;r2=1.48
applies cleanly on top of it, too, and it works.
OK to commit that, as well?

Yours,
  Ingo


Index: libmdoc.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/libmdoc.h,v
retrieving revision 1.57
diff -u -p -r1.57 libmdoc.h
--- libmdoc.h	27 Jun 2010 16:18:13 -0000	1.57
+++ libmdoc.h	29 Jun 2010 18:23:00 -0000
@@ -109,6 +109,9 @@ int		  mdoc_block_alloc(struct mdoc *, i
 int		  mdoc_head_alloc(struct mdoc *, int, int, enum mdoct);
 int		  mdoc_tail_alloc(struct mdoc *, int, int, enum mdoct);
 int		  mdoc_body_alloc(struct mdoc *, int, int, enum mdoct);
+int		  mdoc_endbody_alloc(struct mdoc *m, int line, int pos,
+			enum mdoct tok, struct mdoc_node *body,
+			enum mdoc_endbody end);
 void		  mdoc_node_delete(struct mdoc *, struct mdoc_node *);
 void		  mdoc_hash_init(void);
 enum mdoct	  mdoc_hash_find(const char *);
Index: mdoc.3
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc.3,v
retrieving revision 1.44
diff -u -p -r1.44 mdoc.3
--- mdoc.3	27 Jun 2010 16:18:13 -0000	1.44
+++ mdoc.3	29 Jun 2010 18:23:00 -0000
@@ -217,10 +217,14 @@ and
 fields), its position in the tree (the
 .Va parent ,
 .Va child ,
+.Va nchild ,
 .Va next
 and
 .Va prev
-fields) and some type-specific data.
+fields) and some type-specific data, in particular, for nodes generated
+from macros, the generating macro in the
+.Va tok
+field.
 .Pp
 The tree itself is arranged according to the following normal form,
 where capitalised non-terminals represent nodes.
@@ -235,11 +239,11 @@ where capitalised non-terminals represen
 .It ELEMENT
 \(<- TEXT*
 .It HEAD
-\(<- mnode+
+\(<- mnode*
 .It BODY
-\(<- mnode+
+\(<- mnode* [ENDBODY mnode*]
 .It TAIL
-\(<- mnode+
+\(<- mnode*
 .It TEXT
 \(<- [[:printable:],0x1e]*
 .El
@@ -253,6 +257,65 @@ an empty line will produce a zero-length
 Multiple body parts are only found in invocations of
 .Sq \&Bl \-column ,
 where a new body introduces a new phrase.
+.Ss Badly nested blocks
+A special kind of node is available to end the formatting
+associated with a given block before the physical end of that block.
+Such an ENDBODY node has a non-zero
+.Va end
+field, is of the BODY
+.Va type ,
+has the same
+.Va tok
+as the BLOCK it is ending, and has a
+.Va pending
+field pointing to that BLOCK's BODY node.
+It is an indirect child of that BODY node
+and has no children of its own.
+.Pp
+An ENDBODY node is generated when a block ends while one of its child
+blocks is still open, like in the following example:
+.Bd -literal -offset indent
+\&.Ao ao
+\&.Bo bo ac
+\&.Ac bc
+\&.Bc end
+.Ed
+.Pp
+This example results in the following block structure:
+.Bd -literal -offset indent
+BLOCK Ao
+	HEAD Ao
+	BODY Ao
+		TEXT ao
+		BLOCK Bo, pending -> Ao
+			HEAD Bo
+			BODY Bo
+				TEXT bo
+				TEXT ac
+				ENDBODY Ao, pending -> Ao
+				TEXT bc
+TEXT end
+.Ed
+.Pp
+Here, the formatting of the Ao block extends from TEXT ao to TEXT ac,
+while the formatting of the Bo block extends from TEXT bo to TEXT bc,
+rendering like this in
+.Fl T Ns Cm ascii
+mode:
+.Dl <ao [bo ac> bc] end
+Support for badly nested blocks is only provided for backward
+compatibility with some older
+.Xr mdoc 7
+implementations.
+Using them in new code is strongly discouraged:
+Some frontends, in particular
+.Fl T Ns Cm html ,
+are unable to render them in any meaningful way,
+many other
+.Xr mdoc 7
+implementations do not support them, and even for those that do,
+the behaviour is not well-defined, in particular when using multiple
+levels of badly nested blocks.
 .Sh EXAMPLES
 The following example reads lines from stdin and parses them, operating
 on the finished parse tree with
Index: mdoc.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc.c,v
retrieving revision 1.151
diff -u -p -r1.151 mdoc.c
--- mdoc.c	27 Jun 2010 16:36:22 -0000	1.151
+++ mdoc.c	29 Jun 2010 18:23:00 -0000
@@ -332,6 +332,8 @@ node_append(struct mdoc *mdoc, struct md
 		p->parent->tail = p;
 		break;
 	case (MDOC_BODY):
+		if (p->end)
+			break;
 		assert(MDOC_BLOCK == p->parent->type);
 		p->parent->body = p;
 		break;
@@ -431,6 +433,22 @@ mdoc_body_alloc(struct mdoc *m, int line
 	if ( ! node_append(m, p))
 		return(0);
 	m->next = MDOC_NEXT_CHILD;
+	return(1);
+}
+
+
+int
+mdoc_endbody_alloc(struct mdoc *m, int line, int pos, enum mdoct tok,
+		struct mdoc_node *body, enum mdoc_endbody end)
+{
+	struct mdoc_node *p;
+
+	p = node_alloc(m, line, pos, tok, MDOC_BODY);
+	p->pending = body;
+	p->end = end;
+	if ( ! node_append(m, p))
+		return(0);
+	m->next = MDOC_NEXT_SIBLING;
 	return(1);
 }
 
Index: mdoc.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc.h,v
retrieving revision 1.94
diff -u -p -r1.94 mdoc.h
--- mdoc.h	27 Jun 2010 16:18:13 -0000	1.94
+++ mdoc.h	29 Jun 2010 18:23:00 -0000
@@ -249,6 +249,12 @@ struct 	mdoc_arg {
 	unsigned int	  refcnt;
 };
 
+enum	mdoc_endbody {
+	ENDBODY_NOT = 0,
+	ENDBODY_SPACE,
+	ENDBODY_NOSPACE,
+};
+
 enum	mdoc_list {
 	LIST__NONE = 0,
 	LIST_bullet,
@@ -302,6 +308,7 @@ struct	mdoc_node {
 #define	MDOC_EOS	 (1 << 2) /* at sentence boundary */
 #define	MDOC_LINE	 (1 << 3) /* first macro/text on line */
 #define	MDOC_SYNPRETTY	 (1 << 4) /* SYNOPSIS-style formatting */
+#define	MDOC_ENDED	 (1 << 5) /* rendering has been ended */
 	enum mdoc_type	  type; /* AST node type */
 	enum mdoc_sec	  sec; /* current named section */
 	/* FIXME: these can be union'd to shave a few bytes. */
@@ -311,6 +318,7 @@ struct	mdoc_node {
 	struct mdoc_node *body;		/* BLOCK */
 	struct mdoc_node *tail;		/* BLOCK */
 	char		 *string;	/* TEXT */
+	enum mdoc_endbody end;		/* BODY */
 
 	union {
 		struct mdoc_bl Bl;
Index: mdoc_html.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc_html.c,v
retrieving revision 1.87
diff -u -p -r1.87 mdoc_html.c
--- mdoc_html.c	27 Jun 2010 16:18:13 -0000	1.87
+++ mdoc_html.c	29 Jun 2010 18:23:00 -0000
@@ -437,7 +437,7 @@ print_mdoc_node(MDOC_ARGS)
 		print_text(h, n->string);
 		return;
 	default:
-		if (mdocs[n->tok].pre)
+		if (mdocs[n->tok].pre && !n->end)
 			child = (*mdocs[n->tok].pre)(m, n, h);
 		break;
 	}
@@ -453,7 +453,7 @@ print_mdoc_node(MDOC_ARGS)
 		mdoc_root_post(m, n, h);
 		break;
 	default:
-		if (mdocs[n->tok].post)
+		if (mdocs[n->tok].post && !n->end)
 			(*mdocs[n->tok].post)(m, n, h);
 		break;
 	}
Index: mdoc_macro.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc_macro.c,v
retrieving revision 1.82
diff -u -p -r1.82 mdoc_macro.c
--- mdoc_macro.c	27 Jun 2010 15:52:41 -0000	1.82
+++ mdoc_macro.c	29 Jun 2010 18:23:00 -0000
@@ -50,6 +50,8 @@ static	int	  	append_delims(struct mdoc 
 				int, int *, char *);
 static	enum mdoct	lookup(enum mdoct, const char *);
 static	enum mdoct	lookup_raw(const char *);
+static	int		make_pending(struct mdoc_node *, enum mdoct,
+				struct mdoc *, int, int);
 static	int	  	phrase(struct mdoc *, int, int, char *);
 static	enum mdoct 	rew_alt(enum mdoct);
 static	int	  	rew_dobreak(enum mdoct, 
@@ -61,8 +63,6 @@ static	int	  	rew_last(struct mdoc *, 
 				const struct mdoc_node *);
 static	int	  	rew_sub(enum mdoc_type, struct mdoc *, 
 				enum mdoct, int, int);
-static	int	  	swarn(struct mdoc *, enum mdoc_type, int, 
-				int, const struct mdoc_node *);
 
 const	struct mdoc_macro __mdoc_macros[MDOC_MAX] = {
 	{ in_line_argn, MDOC_CALLABLE | MDOC_PARSED }, /* Ap */
@@ -192,53 +192,6 @@ const	struct mdoc_macro __mdoc_macros[MD
 const	struct mdoc_macro * const mdoc_macros = __mdoc_macros;
 
 
-static int
-swarn(struct mdoc *mdoc, enum mdoc_type type, 
-		int line, int pos, const struct mdoc_node *p)
-{
-	const char	*n, *t, *tt;
-	enum mandocerr	 ec;
-
-	n = t = "<root>";
-	tt = "block";
-
-	switch (type) {
-	case (MDOC_BODY):
-		tt = "multi-line";
-		break;
-	case (MDOC_HEAD):
-		tt = "line";
-		break;
-	default:
-		break;
-	}
-
-	switch (p->type) {
-	case (MDOC_BLOCK):
-		n = mdoc_macronames[p->tok];
-		t = "block";
-		break;
-	case (MDOC_BODY):
-		n = mdoc_macronames[p->tok];
-		t = "multi-line";
-		break;
-	case (MDOC_HEAD):
-		n = mdoc_macronames[p->tok];
-		t = "line";
-		break;
-	default:
-		break;
-	}
-
-	ec = (MDOC_IGN_SCOPE & mdoc->pflags) ?
-		MANDOCERR_SCOPE : MANDOCERR_SYNTSCOPE;
-
-	return(mdoc_vmsg(mdoc, ec, line, pos, 
-				"%s scope breaks %s of %s", 
-				tt, t, n));
-}
-
-
 /*
  * This is called at the end of parsing.  It must traverse up the tree,
  * closing out open [implicit] scopes.  Obviously, open explicit scopes
@@ -410,7 +363,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 		/* FALLTHROUGH */
 	case (MDOC_Vt):
 		assert(MDOC_TAIL != type);
-		if (type == p->type && tok == p->tok)
+		if (tok != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	case (MDOC_It):
@@ -464,7 +421,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 	case (MDOC_So):
 		/* FALLTHROUGH */
 	case (MDOC_Xo):
-		if (type == p->type && tok == p->tok)
+		if (tok != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	/* Multi-line explicit scope close. */
@@ -499,7 +460,11 @@ rew_dohalt(enum mdoct tok, enum mdoc_typ
 	case (MDOC_Sc):
 		/* FALLTHROUGH */
 	case (MDOC_Xc):
-		if (type == p->type && rew_alt(tok) == p->tok)
+		if (rew_alt(tok) != p->tok)
+			break;
+		if (p->end)
+			return(REWIND_HALT);
+		if (type == p->type)
 			return(REWIND_REWIND);
 		break;
 	default:
@@ -526,6 +491,8 @@ rew_dobreak(enum mdoct tok, const struct
 		return(1);
 	if (MDOC_VALID & p->flags)
 		return(1);
+	if (MDOC_BODY == p->type && p->end)
+		return(1);
 
 	switch (tok) {
 	case (MDOC_It):
@@ -576,6 +543,83 @@ rew_elem(struct mdoc *mdoc, enum mdoct t
 }
 
 
+/*
+ * We are trying to close a block identified by tok,
+ * but the child block *broken is still open.
+ * Thus, postpone closing the tok block
+ * until the rew_sub call closing *broken.
+ */
+static int
+make_pending(struct mdoc_node *broken, enum mdoct tok,
+		struct mdoc *m, int line, int ppos)
+{
+	struct mdoc_node *breaker;
+
+	/*
+	 * Iterate backwards, searching for the block matching tok,
+	 * that is, the block breaking the *broken block.
+	 */
+	for (breaker = broken->parent; breaker; breaker = breaker->parent) {
+
+		/*
+		 * If the *broken block had already been broken before
+		 * and we encounter its breaker, make the tok block
+		 * pending on the inner breaker.
+		 * Graphically, "[A breaker=[B broken=[C->B B] tok=A] C]"
+		 * becomes "[A broken=[B [C->B B] tok=A] C]"
+		 * and finally "[A [B->A [C->B B] A] C]".
+		 */
+		if (breaker == broken->pending) {
+			broken = breaker;
+			continue;
+		}
+
+		if (REWIND_REWIND != rew_dohalt(tok, MDOC_BLOCK, breaker))
+			continue;
+		if (MDOC_BODY == broken->type)
+			broken = broken->parent;
+
+		/*
+		 * Found the breaker.
+		 * If another, outer breaker is already pending on
+		 * the *broken block, we must not clobber the link
+		 * to the outer breaker, but make it pending on the
+		 * new, now inner breaker.
+		 * Graphically, "[A breaker=[B broken=[C->A A] tok=B] C]"
+		 * becomes "[A breaker=[B->A broken=[C A] tok=B] C]"
+		 * and finally "[A [B->A [C->B A] B] C]".
+		 */
+		if (broken->pending) {
+			struct mdoc_node *taker;
+
+			/*
+			 * If the breaker had also been broken before,
+			 * it cannot take on the outer breaker itself,
+			 * but must hand it on to its own breakers.
+			 * Graphically, this is the following situation:
+			 * "[A [B breaker=[C->B B] broken=[D->A A] tok=C] D]"
+			 * "[A taker=[B->A breaker=[C->B B] [D->C A] C] D]"
+			 */
+			taker = breaker;
+			while (taker->pending)
+				taker = taker->pending;
+			taker->pending = broken->pending;
+		}
+		broken->pending = breaker;
+		mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos, "%s breaks %s",
+		    mdoc_macronames[tok], mdoc_macronames[broken->tok]);
+		return(1);
+	}
+
+	/*
+	 * Found no matching block for tok.
+	 * Are you trying to close a block that is not open?
+	 * Report failure and abort the parser.
+	 */
+	mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
+	return(0);
+}
+
 static int
 rew_sub(enum mdoc_type t, struct mdoc *m, 
 		enum mdoct tok, int line, int ppos)
@@ -587,7 +631,7 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 	for (n = m->last; n; n = n->parent) {
 		c = rew_dohalt(tok, t, n);
 		if (REWIND_HALT == c) {
-			if (MDOC_BLOCK != t)
+			if (n->end || MDOC_BLOCK != t)
 				return(1);
 			if ( ! (MDOC_EXPLICIT & mdoc_macros[tok].flags))
 				return(1);
@@ -599,8 +643,7 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 			break;
 		else if (rew_dobreak(tok, n))
 			continue;
-		if ( ! swarn(m, t, line, ppos, n))
-			return(0);
+		return(make_pending(n, tok, m, line, ppos));
 	}
 
 	assert(n);
@@ -608,15 +651,14 @@ rew_sub(enum mdoc_type t, struct mdoc *m
 		return(0);
 
 	/*
-	 * The current block extends an enclosing block beyond a line
-	 * break.  Now that the current block ends, close the enclosing
-	 * block, too.
+	 * The current block extends an enclosing block.
+	 * Now that the current block ends, close the enclosing block, too.
 	 */
-	if (NULL != (n = n->pending)) {
-		assert(MDOC_HEAD == n->type);
+	while (NULL != (n = n->pending)) {
 		if ( ! rew_last(m, n))
 			return(0);
-		if ( ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
+		if (MDOC_HEAD == n->type &&
+		    ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
 			return(0);
 	}
 
@@ -672,9 +714,13 @@ append_delims(struct mdoc *m, int line, 
 static int
 blk_exp_close(MACRO_PROT_ARGS)
 {
+	struct mdoc_node *body;		/* Our own body. */
+	struct mdoc_node *later;	/* A sub-block starting later. */
+	struct mdoc_node *n;		/* For searching backwards. */
+
 	int	 	 j, lastarg, maxargs, flushed, nl;
 	enum margserr	 ac;
-	enum mdoct	 ntok;
+	enum mdoct	 atok, ntok;
 	char		*p;
 
 	nl = MDOC_NEWLINE & m->flags;
@@ -688,6 +734,68 @@ blk_exp_close(MACRO_PROT_ARGS)
 		break;
 	}
 
+	/*
+	 * Search backwards for beginnings of blocks,
+	 * both of our own and of pending sub-blocks.
+	 */
+	atok = rew_alt(tok);
+	body = later = NULL;
+	for (n = m->last; n; n = n->parent) {
+		if (MDOC_VALID & n->flags)
+			continue;
+
+		/* Remember the start of our own body. */
+		if (MDOC_BODY == n->type && atok == n->tok) {
+			if ( ! n->end)
+				body = n;
+			continue;
+		}
+
+		if (MDOC_BLOCK != n->type)
+			continue;
+		if (atok == n->tok) {
+			assert(body);
+
+			/*
+			 * Found the start of our own block.
+			 * When there is no pending sub block,
+			 * just proceed to closing out.
+			 */
+			if (NULL == later)
+				break;
+
+			/* 
+			 * When there is a pending sub block,
+			 * postpone closing out the current block
+			 * until the rew_sub() closing out the sub-block.
+			 */
+			if ( ! make_pending(later, tok, m, line, ppos))
+				return(0);
+
+			/*
+			 * Mark the place where the formatting - but not
+			 * the scope - of the current block ends.
+			 */
+			if ( ! mdoc_endbody_alloc(m, line, ppos,
+			    atok, body, ENDBODY_SPACE))
+				return(0);
+			break;
+		}
+
+		/*
+		 * When finding an open sub block, remember the last
+		 * open explicit block, or, in case there are only
+		 * implicit ones, the first open implicit block.
+		 */
+		if (later &&
+		    MDOC_EXPLICIT & mdoc_macros[later->tok].flags)
+			continue;
+		if (MDOC_CALLABLE & mdoc_macros[n->tok].flags) {
+			assert( ! (MDOC_ACTED & n->flags));
+			later = n;
+		}
+	}
+
 	if ( ! (MDOC_CALLABLE & mdoc_macros[tok].flags)) {
 		/* FIXME: do this in validate */
 		if (buf[*pos]) 
@@ -702,7 +810,7 @@ blk_exp_close(MACRO_PROT_ARGS)
 	if ( ! rew_sub(MDOC_BODY, m, tok, line, ppos))
 		return(0);
 
-	if (maxargs > 0) 
+	if (NULL == later && maxargs > 0) 
 		if ( ! mdoc_tail_alloc(m, line, ppos, rew_alt(tok)))
 			return(0);
 
@@ -1255,22 +1363,36 @@ blk_part_imp(MACRO_PROT_ARGS)
 		body->parent->flags |= MDOC_EOS;
 	}
 
+	/*
+	 * If there is an open sub-block requiring explicit close-out,
+	 * postpone closing out the current block
+	 * until the rew_sub() call closing out the sub-block.
+	 */
+	for (n = m->last; n && n != body && n != blk->parent; n = n->parent) {
+		if (MDOC_BLOCK == n->type &&
+		    MDOC_EXPLICIT & mdoc_macros[n->tok].flags &&
+		    ! (MDOC_VALID & n->flags)) {
+			assert( ! (MDOC_ACTED & n->flags));
+			if ( ! make_pending(n, tok, m, line, ppos))
+				return(0);
+			if ( ! mdoc_endbody_alloc(m, line, ppos,
+			    tok, body, ENDBODY_NOSPACE))
+				return(0);
+			return(1);
+		}
+	}
+
 	/* 
 	 * If we can't rewind to our body, then our scope has already
 	 * been closed by another macro (like `Oc' closing `Op').  This
 	 * is ugly behaviour nodding its head to OpenBSD's overwhelming
 	 * crufty use of `Op' breakage.
-	 *
-	 * FIXME - this should be ifdef'd OpenBSD?
 	 */
-	for (n = m->last; n; n = n->parent)
-		if (body == n)
-			break;
-
-	if (NULL == n && ! mdoc_nmsg(m, body, MANDOCERR_SCOPE))
+	if (n != body && ! mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos,
+	    "%s broken", mdoc_macronames[tok]))
 		return(0);
 
-	if (n && ! rew_last(m, body))
+	if (n && ! rew_sub(MDOC_BODY, m, tok, line, ppos))
 		return(0);
 
 	/* Standard appending of delimiters. */
@@ -1280,7 +1402,7 @@ blk_part_imp(MACRO_PROT_ARGS)
 
 	/* Rewind scope, if applicable. */
 
-	if (n && ! rew_last(m, blk))
+	if (n && ! rew_sub(MDOC_BLOCK, m, tok, line, ppos))
 		return(0);
 
 	return(1);
Index: mdoc_term.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc_term.c,v
retrieving revision 1.161
diff -u -p -r1.161 mdoc_term.c
--- mdoc_term.c	27 Jun 2010 17:53:27 -0000	1.161
+++ mdoc_term.c	29 Jun 2010 18:23:01 -0000
@@ -325,20 +325,37 @@ print_mdoc_node(DECL_ARGS)
 	memset(&npair, 0, sizeof(struct termpair));
 	npair.ppair = pair;
 
-	if (MDOC_TEXT != n->type) {
-		if (termacts[n->tok].pre)
-			chld = (*termacts[n->tok].pre)(p, &npair, m, n);
-	} else 
+	if (MDOC_TEXT == n->type)
 		term_word(p, n->string); 
+	else if (termacts[n->tok].pre && !n->end)
+		chld = (*termacts[n->tok].pre)(p, &npair, m, n);
 
 	if (chld && n->child)
 		print_mdoc_nodelist(p, &npair, m, n->child);
 
 	term_fontpopq(p, font);
 
-	if (MDOC_TEXT != n->type)
-		if (termacts[n->tok].post)
-			(*termacts[n->tok].post)(p, &npair, m, n);
+	if (MDOC_TEXT != n->type &&
+	    termacts[n->tok].post &&
+	    ! (MDOC_ENDED & n->flags)) {
+		(*termacts[n->tok].post)(p, &npair, m, n);
+
+		/*
+		 * Explicit end tokens not only call the post
+		 * handler, but also tell the respective block
+		 * that it must not call the post handler again.
+		 */
+		if (n->end)
+			n->pending->flags |= MDOC_ENDED;
+
+		/*
+		 * End of line terminating an implicit block
+		 * while an explicit block is still open.
+		 * Continue the explicit block without spacing.
+		 */
+		if (ENDBODY_NOSPACE == n->end)
+			p->flags |= TERMP_NOSPACE;
+	}
 
 	if (MDOC_EOS & n->flags)
 		p->flags |= TERMP_SENTENCE;
Index: tree.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/tree.c,v
retrieving revision 1.22
diff -u -p -r1.22 tree.c
--- tree.c	26 Jun 2010 15:36:37 -0000	1.22
+++ tree.c	29 Jun 2010 18:23:01 -0000
@@ -75,7 +75,10 @@ print_mdoc(const struct mdoc_node *n, in
 		t = "block-head";
 		break;
 	case (MDOC_BODY):
-		t = "block-body";
+		if (n->end)
+			t = "body-end";
+		else
+			t = "block-body";
 		break;
 	case (MDOC_TAIL):
 		t = "block-tail";
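
For anyone reading the diff without the tree at hand, the chaining done in make_pending() and the while loop in rew_sub() can be modelled in isolation. The following is only a rough sketch, not the mdocml code: struct node, postpone() and close_with_pending() are hypothetical names, and it assumes that the condition guarding the "taker" loop (not visible in this part of the patch) is that the broken block already has something pending on it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mdoc_node; only what the sketch needs. */
struct node {
	const char	*name;		/* macro name, for illustration only */
	struct node	*pending;	/* block to close once this one closes */
	int		 closed;	/* set when the block has been closed */
};

/*
 * Postpone closing `breaker' until `broken' gets closed.
 * Assumption: if `broken' already has a block pending on it, that
 * older link must not be clobbered, so it is moved to the tail of
 * the chain starting at `breaker'.
 */
static void
postpone(struct node *broken, struct node *breaker)
{
	struct node	*taker;

	assert(broken != NULL && breaker != NULL);

	if (broken->pending != NULL) {
		taker = breaker;
		while (taker->pending != NULL)
			taker = taker->pending;
		taker->pending = broken->pending;
	}
	broken->pending = breaker;
}

/*
 * Close `n' and then every block pending on it, in chain order,
 * in the spirit of the loop over n->pending in rew_sub().
 * Returns the number of blocks closed.
 */
static int
close_with_pending(struct node *n)
{
	int	 count;

	count = 0;
	for ( ; n != NULL; n = n->pending) {
		n->closed = 1;
		count++;
	}
	return(count);
}
```

With blocks A, C and D, calling postpone(&d, &a) and then postpone(&d, &c) leaves the chain D, C, A, so closing D also closes C and A in that order. The real code additionally has to locate the breaker by walking up the parent links and has to allocate a body when a pending node is a HEAD; neither is modelled here.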
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

end of thread, other threads:[~2010-06-29 18:33 UTC | newest]

Thread overview: 6+ messages
     [not found] <20100627001247.GB2850@iris.usta.de>
2010-06-27 23:47 ` Back-block nesting patch Kristaps Dzonsons
2010-06-28 19:02   ` Ingo Schwarze
2010-06-28 19:26     ` Kristaps Dzonsons
2010-06-29  5:06   ` rew_dohalt reorg and bad block-nesting bug Ingo Schwarze
2010-06-29  9:18     ` Kristaps Dzonsons
2010-06-29 18:33       ` Ingo Schwarze