tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
From: Kristaps Dzonsons <kristaps@bsd.lv>
To: "tech@mdocml.bsd.lv" <tech@mdocml.bsd.lv>
Subject: Back-block nesting patch.
Date: Mon, 28 Jun 2010 01:47:18 +0200	[thread overview]
Message-ID: <4C27E306.6070605@bsd.lv> (raw)
In-Reply-To: <20100627001247.GB2850@iris.usta.de>

Ingo, some comments in-line before I crash and sleep.  This is CC'd to 
tech@mdocml so that my plan is known.

The following comments (in-line + snippage) are mechanical; I won't be 
catching any logic bugs until this is merged and I can look through it 
as part of the code.

On that note, here's what I suggest.  Let's get fully BSD.lv-OpenBSD 
sync'd, then I'll tag the tree.  Then we merge, clean up, and EVERYBODY 
ON THIS LIST test the hell out of it.  I'll post to discuss@ as well. 
Ingo's put a lot of work into this and it looks good, but the more eyes 
in different local manual trees, the better.

My first general comment: I want a big fat non-fatal "you're an idiot 
for making this manual" warning for each and every break (this may 
already be implemented).

My second is that the documentation must be precise for this, because 
it's quite confusing.  I mean mdoc.3, which documents the AST.  I'd 
rather mdoc.7 know nothing about this.  This documentation should go in 
with the first raft of commits.

> +       int               end;          /* BODY */

Can you make this into an enum to reduce ambiguity?

> +/*
> + * We are trying to close a block identified by tok,
> + * but the child block *broken is still open.
> + * Thus, postpone closing the tok block
> + * until the rew_sub call closing *broken.
> + */
> +static int
> +make_pending(struct mdoc_node *broken, enum mdoct tok,
> +               struct mdoc *m, int line, int ppos)
> +{
> +       struct mdoc_node *breaker;
> +
> +       /*
> +        * Iterate backwards, searching for the block matching tok,
> +        * that is, the block breaking the *broken block.
> +        */
> +       for (breaker = broken->parent; breaker; breaker = breaker->parent) {
> +
> +               /*
> +                * If the *broken block had already been broken before
> +                * and we encounter its breaker, make the tok block
> +                * pending on the inner breaker.
> +                * Graphically, "[A breaker=[B broken=[C->B B] tok=A] C]"
> +                * becomes "[A broken=[B [C->B B] tok=A] C]"
> +                * and finally "[A [B->A [C->B B] A] C]".
> +                */
> +               if (breaker == broken->pending) {
> +                       broken = breaker;
> +                       continue;
> +               }
> +
> +               if (REWIND_REWIND != rew_dohalt(tok, MDOC_BLOCK, breaker))
> +                       continue;
> +               if (MDOC_BODY == broken->type)
> +                       broken = broken->parent;
> +
> +               /*
> +                * Found the breaker.
> +                * If another, outer breaker is already pending on
> +                * the *broken block, we must not clobber the link
> +                * to the outer breaker, but make it pending on the
> +                * new, now inner breaker.
> +                * Graphically, "[A breaker=[B broken=[C->A A] tok=B] C]"
> +                * becomes "[A breaker=[B->A broken=[C A] tok=B] C]"
> +                * and finally "[A [B->A [C->B A] B] C]".
> +                */
> +               if (broken->pending) {
> +                       struct mdoc_node *taker;
> +
> +                       /*
> +                        * If the breaker had also been broken before,
> +                        * it cannot take on the outer breaker itself,
> +                        * but must hand it on to its own breakers.
> +                        * Graphically, this is the following situation:
> +                        * "[A [B breaker=[C->B B] broken=[D->A A] tok=C] D]"
> +                        * "[A taker=[B->A breaker=[C->B B] [D->C A] C] D]"
> +                        */
> +                       taker = breaker;
> +                       while (taker->pending)
> +                               taker = taker->pending;
> +                       taker->pending = broken->pending;
> +               }
> +               broken->pending = breaker;
> +               mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos, "%s breaks %s",
> +                   mdoc_macronames[tok], mdoc_macronames[broken->tok]);
> +               return(1);
> +       }
> +
> +       /*
> +        * Found no matching block for tok.
> +        * Are you trying to close a block that is not open?
> +        * Report failure and abort the parser.
> +        */
> +       mdoc_pmsg(m, line, ppos, MANDOCERR_SYNTNOSCOPE);
> +       return(0);
> +}

The comments in-line here are great.  I just wanted to say.  Good work!

> +               return make_pending(n, tok, m, line, ppos);

Nit: I always parenthesise my returns, e.g.,

       return(make_pending(n, tok, m, line, ppos));

Functional background, what can I say.  Same with case statements.

>         }
> 
>         assert(n);
> @@ -604,15 +647,14 @@ rew_sub(enum mdoc_type t, struct mdoc *m
>                 return(0);
> 
>         /*
> -        * The current block extends an enclosing block beyond a line
> -        * break.  Now that the current block ends, close the enclosing
> -        * block, too.
> +        * The current block extends an enclosing block.
> +        * Now that the current block ends, close the enclosing block, too.
>          */
> -       if (NULL != (n = n->pending)) {
> -               assert(MDOC_HEAD == n->type);
> +       while (NULL != (n = n->pending)) {
>                 if ( ! rew_last(m, n))
>                         return(0);
> -               if ( ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
> +               if (MDOC_HEAD == n->type &&
> +                   ! mdoc_body_alloc(m, n->line, n->pos, n->tok))
>                         return(0);
>         }
>         return(1);
> @@ -667,9 +709,13 @@ append_delims(struct mdoc *m, int line,
>  static int
>  blk_exp_close(MACRO_PROT_ARGS)
>  {
> +       struct mdoc_node *body = NULL;  /* Our own body. */
> +       struct mdoc_node *later = NULL; /* A sub-block starting later. */

Another nit: don't initialise in declarations (this is in OpenBSD's 
style(9) as well).

> +       struct mdoc_node *n;            /* For searching backwards. */

Note: these sorts of comments are really quite useful.  I use them all 
the time in term.c (as I always mix up my vbl's and vis's), so I 
heartily endorse their use for complex bits.

> +
>         int              j, lastarg, maxargs, flushed, nl;
>         enum margserr    ac;
> -       enum mdoct       ntok;
> +       enum mdoct       atok, ntok;
>         char            *p;
> 
>         nl = MDOC_NEWLINE & m->flags;
> @@ -683,6 +729,70 @@ blk_exp_close(MACRO_PROT_ARGS)
>                 break;
>         }
> 
> +       /*
> +        * Search backwards for beginnings of blocks,
> +        * both of our own and of pending sub-blocks.
> +        */
> +       atok = rew_alt(tok);
> +       for (n = m->last; n; n = n->parent) {
> +               if (MDOC_VALID & n->flags)
> +                       continue;
> +
> +               /* Remember the start of our own body. */
> +               if (MDOC_BODY == n->type && atok == n->tok) {
> +                       if (0 == n->end)
> +                               body = n;
> +                       continue;
> +               }
> +
> +               if (MDOC_BLOCK != n->type)
> +                       continue;
> +               if (atok == n->tok) {
> +                       assert(body);

I love assertions!  Use them as often as you'd like (within reason).

> +
> +                       /*
> +                        * Found the start of our own block.
> +                        * When there is no pending sub block,
> +                        * just proceed to closing out.
> +                        */
> +                       if (NULL == later)
> +                               break;
> +
> +                       /*
> +                        * When there is a pending sub block,
> +                        * postpone closing out the current block
> +                        * until the rew_sub() closing out the sub-block.
> +                        */
> +                       if ( ! make_pending(later, tok, m, line, ppos))
> +                               return(0);
> +
> +                       /*
> +                        * Mark the place where the formatting - but not
> +                        * the scope - of the current block ends.
> +                        */
> +                       if ( ! mdoc_elem_alloc(m, line, ppos, atok, NULL))
> +                               return(0);
> +                       m->last->type = MDOC_BODY;
> +                       m->last->end = 1;  /* Ask for normal spacing. */
> +                       m->last->pending = body;
> +                       m->next = MDOC_NEXT_SIBLING;
> +                       break;

Please make a mdoc_endbody_alloc() in mdoc.c to make this fully clear.

> +               }
> +
> +               /*
> +                * When finding an open sub block, remember the last
> +                * open explicit block, or, in case there are only
> +                * implicit ones, the first open implicit block.
> +                */
> +               if (later &&
> +                   MDOC_EXPLICIT & mdoc_macros[later->tok].flags)
> +                       continue;
> +               if (MDOC_CALLABLE & mdoc_macros[n->tok].flags) {
> +                       assert( ! (MDOC_ACTED & n->flags));
> +                       later = n;
> +               }
> +       }
> +
>         if ( ! (MDOC_CALLABLE & mdoc_macros[tok].flags)) {
>                 /* FIXME: do this in validate */
>                 if (buf[*pos])
> @@ -697,7 +807,7 @@ blk_exp_close(MACRO_PROT_ARGS)
>         if ( ! rew_sub(MDOC_BODY, m, tok, line, ppos))
>                 return(0);
> 
> -       if (maxargs > 0)
> +       if (NULL == later && maxargs > 0)
>                 if ( ! mdoc_tail_alloc(m, line, ppos, rew_alt(tok)))
>                         return(0);
> 
> @@ -1122,6 +1232,10 @@ blk_full(MACRO_PROT_ARGS)
>                                 MDOC_EXPLICIT & mdoc_macros[n->tok].flags &&
>                                 ! (MDOC_VALID & n->flags)) {
>                         assert( ! (MDOC_ACTED & n->flags));
> +                       mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos,
> +                           "%s header extended by %s",
> +                           mdoc_macronames[tok],
> +                           mdoc_macronames[n->tok]);
>                         n->pending = head;
>                         return(1);
>                 }
> @@ -1249,20 +1363,39 @@ blk_part_imp(MACRO_PROT_ARGS)
>                 body->parent->flags |= MDOC_EOS;
>         }
> 
> +       /*
> +        * If there is an open sub-block requiring explicit close-out,
> +        * postpone closing out the current block
> +        * until the rew_sub() call closing out the sub-block.
> +        */
> +       for (n = m->last; n && n != body && n != blk->parent; n = n->parent) {
> +               if (MDOC_BLOCK == n->type &&
> +                   MDOC_EXPLICIT & mdoc_macros[n->tok].flags &&
> +                   ! (MDOC_VALID & n->flags)) {
> +                       assert( ! (MDOC_ACTED & n->flags));
> +                       if ( ! make_pending(n, tok, m, line, ppos))
> +                               return(0);
> +                       if ( ! mdoc_elem_alloc(m, line, ppos, tok, NULL))
> +                               return(0);
> +                       m->last->type = MDOC_BODY;
> +                       m->last->end = 2;  /* Ask for TERMP_NOSPACE. */
> +                       m->last->pending = body;
> +                       m->next = MDOC_NEXT_SIBLING;
> +                       return(1);

Same as above for mdoc_endbody_alloc().

> +               }
> +       }
> +
>         /*
>          * If we can't rewind to our body, then our scope has already
>          * been closed by another macro (like `Oc' closing `Op').  This
>          * is ugly behaviour nodding its head to OpenBSD's overwhelming
>          * crufty use of `Op' breakage.
>          */
> -       for (n = m->last; n; n = n->parent)
> -               if (body == n)
> -                       break;
> -
> -       if (NULL == n && ! mdoc_nmsg(m, body, MANDOCERR_SCOPE))
> +       if (n != body && ! mdoc_vmsg(m, MANDOCERR_SCOPE, line, ppos,
> +           "%s broken", mdoc_macronames[tok]))
>                 return(0);
> 
> -       if (n && ! rew_last(m, body))
> +       if (n && ! rew_sub(MDOC_BODY, m, tok, line, ppos))
>                 return(0);
> 
>         /* Standard appending of delimiters. */
> @@ -1272,7 +1405,7 @@ blk_part_imp(MACRO_PROT_ARGS)
> 
>         /* Rewind scope, if applicable. */
> 
> -       if (n && ! rew_last(m, blk))
> +       if (n && ! rew_sub(MDOC_BLOCK, m, tok, line, ppos))
>                 return(0);
> 
>         return(1);
> Index: mdoc_term.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/mdoc_term.c,v
> retrieving revision 1.89
> diff -u -p -r1.89 mdoc_term.c
> --- mdoc_term.c 26 Jun 2010 19:08:00 -0000      1.89
> +++ mdoc_term.c 27 Jun 2010 00:08:58 -0000
> @@ -318,20 +318,37 @@ print_mdoc_node(DECL_ARGS)
>         memset(&npair, 0, sizeof(struct termpair));
>         npair.ppair = pair;
> 
> -       if (MDOC_TEXT != n->type) {
> -               if (termacts[n->tok].pre)
> -                       chld = (*termacts[n->tok].pre)(p, &npair, m, n);
> -       } else
> +       if (MDOC_TEXT == n->type)
>                 term_word(p, n->string);
> +       else if (termacts[n->tok].pre && !n->end)
> +               chld = (*termacts[n->tok].pre)(p, &npair, m, n);
> 
>         if (chld && n->child)
>                 print_mdoc_nodelist(p, &npair, m, n->child);
> 
>         term_fontpopq(p, font);
> 
> -       if (MDOC_TEXT != n->type)
> -               if (termacts[n->tok].post)
> -                       (*termacts[n->tok].post)(p, &npair, m, n);
> +       if (MDOC_TEXT != n->type &&
> +           termacts[n->tok].post &&
> +           ! (MDOC_ENDED & n->flags)) {
> +               (*termacts[n->tok].post)(p, &npair, m, n);
> +
> +               /*
> +                * Explicit end tokens not only call the post
> +                * handler, but also tell the respective block
> +                * that it must not call the post handler again.
> +                */
> +               if (n->end)
> +                       n->pending->flags |= MDOC_ENDED;
> +
> +               /*
> +                * End of line terminating an implicit block
> +                * while an explicit block is still open.
> +                * Continue the explicit block without spacing.
> +                */
> +               if (1 < n->end)
> +                       p->flags |= TERMP_NOSPACE;
> +       }

This obviously needs some sort of analogue in mdoc_html.c.

> 
>         if (MDOC_EOS & n->flags)
>                 p->flags |= TERMP_SENTENCE;
> Index: tree.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/tree.c,v
> retrieving revision 1.7
> diff -u -p -r1.7 tree.c
> --- tree.c      23 May 2010 22:45:01 -0000      1.7
> +++ tree.c      27 Jun 2010 00:08:58 -0000
> @@ -70,7 +70,10 @@ print_mdoc(const struct mdoc_node *n, in
>                 t = "block-head";
>                 break;
>         case (MDOC_BODY):
> -               t = "block-body";
> +               if (n->end)
> +                       t = "body-end";
> +               else
> +                       t = "block-body";
>                 break;
>         case (MDOC_TAIL):
>                 t = "block-tail";

--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

       reply	other threads:[~2010-06-27 23:47 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20100627001247.GB2850@iris.usta.de>
2010-06-27 23:47 ` Kristaps Dzonsons [this message]
2010-06-28 19:02   ` Ingo Schwarze
2010-06-28 19:26     ` Kristaps Dzonsons
2010-06-29  5:06   ` rew_dohalt reorg and bad block-nesting bug Ingo Schwarze
2010-06-29  9:18     ` Kristaps Dzonsons
2010-06-29 18:33       ` Ingo Schwarze

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C27E306.6070605@bsd.lv \
    --to=kristaps@bsd.lv \
    --cc=tech@mdocml.bsd.lv \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).