From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp1.rz.uni-karlsruhe.de (Debian-exim@smtp1.rz.uni-karlsruhe.de [129.13.185.217])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id o5F01N75012889
	for <tech@mdocml.bsd.lv>; Mon, 14 Jun 2010 20:01:25 -0400 (EDT)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
	by smtp1.rz.uni-karlsruhe.de with esmtp (Exim 4.63 #1)
	id 1OOJai-00034N-Nf; Tue, 15 Jun 2010 02:01:21 +0200
Received: from donnerwolke.usta.de ([172.24.96.3])
	by hekate.usta.de with esmtp (Exim 4.71)
	(envelope-from <schwarze@usta.de>)
	id 1OOJai-0007Yd-MO
	for tech@mdocml.bsd.lv; Tue, 15 Jun 2010 02:01:20 +0200
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
	by donnerwolke.usta.de with esmtp (Exim 4.69)
	(envelope-from <schwarze@usta.de>)
	id 1OOJai-0007mY-L5
	for tech@mdocml.bsd.lv; Tue, 15 Jun 2010 02:01:20 +0200
Received: from schwarze by usta.de with local (Exim 4.71)
	(envelope-from <schwarze@usta.de>)
	id 1OOJai-00039e-9a
	for tech@mdocml.bsd.lv; Tue, 15 Jun 2010 02:01:20 +0200
Date: Tue, 15 Jun 2010 02:01:20 +0200
From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Subject: Re: [PATCH] deal with bad block nesting
Message-ID: <20100615000119.GB30818@iris.usta.de>
References: <20100614024711.GI14228@iris.usta.de>
 <4C16952F.1060906@bsd.lv>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4C16952F.1060906@bsd.lv>
User-Agent: Mutt/1.5.20 (2009-06-14)

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, Jun 14, 2010 at 10:46:39PM +0200:

> First I'm going to tag 1.10.2, which features improved -Tps and some
> performance/simplicity improvements (the cached data points), then
> we can take a more in-depth look at this with a stable roll-back
> point.

Sounds good.

> My first impression is that this makes our nice and regular AST into
> shit soup.

In a way, yes, no doubt.

The open questions are whether we can do better, how much effort
that will take, and which will be the best point in time to try.

> Don't get me wrong--it's a wonderful piece of work, but
> I think we can do it better in a completely different way.
> 
> I propose (dum dum duuuum):
> 
> If the offending macros are all Ao/Ac Eo/Ec blk_part_exp(), which it
> seems from the code,

Well, when talking about bad nesting, we always talk about *two* macros:
One that is breaking the other, and the other one that is broken.

The former, the breaking macro, can be any PARSED block macro.
Here is the complete list of parsed block macros:

 * blk_full: It  (implicit)
 * blk_part_exp: all (Ao Bo Do Po Qo So Xo Oo Bro Eo)
 * blk_part_imp: all (D1 Dl Op Aq Bq Dq Pq Ql Qq Sq Brq)

The latter, the broken macro, can be any CALLABLE block macro.
Here is the complete list of callable block macros:

 * blk_full: Fo  (explicit)
 * blk_part_exp: all (Ao Bo Do Po Qo So Xo Oo Bro Eo)
 * blk_part_imp: Op Aq Bq Dq Pq Ql Qq Sq Brq

The relevant combinations are somewhat restricted because implicit
macros cannot break each other: Two subsequent implicit macros
implicitly end at the same place, that is, without one breaking
the other.  Thus, we get the following breakings:

 * It broken by blk_part_exp
 * It broken by Fo  (just found that case right now! oops!)
 * blk_part_exp broken by blk_part_exp
 * blk_part_exp broken by Fo  (again, oops, as above)
 * blk_part_exp broken by blk_part_imp
 * blk_part_imp broken by blk_part_exp

The new case It/Fo

.Bl -tag -width Ds
.It it Fo sin
.Fa "double x"
.Fc
The sine function.
.El

gives in old groff:

     it sin(
             double x) The sine function.

in new groff:

             sin( The sine function.

in mandoc:

     it sin(double x)
             The sine function.

So i guess we don't need to worry about that one,
groff pukes, and not even consistently.

> why don't we just make these into
> in_line_argv(1) macros and completely avoid this complexity?

Which cases does this address?

 * It broken by blk_part_exp           => MADE WORSE
 * It broken by Fo                     => NOT SOLVED, but WEIRD ANYWAY
 * blk_part_exp broken by blk_part_exp => SOLVED
 * blk_part_exp broken by Fo           => SOLVED
 * blk_part_exp broken by blk_part_imp => SOLVED
 * blk_part_imp broken by blk_part_exp => SOLVED

The first case is made worse because blk_part_exp (say, Ao) does
not only break It, but extends it.  So, the Ac macro must close
the It HEAD block.  As long as Ac closes an Ao block, and that
block remembers that it was broken by the It, it is relatively
easy for the Ac to search its corresponding Ao (because that's
a block, it can only be in the direct line of parents, not
hidden somewhere as a child) and close out the pending It HEAD.
Now, if the Ao is just in_line, it can be hidden anywhere, and
the whole point of the change is that the It does not even
realize breaking anything.  So, how is the It going to realize
that it is being extended?  And how is the Ac going to find out
that it needs to perform the postponed It close-out?

This will need completely new algorithms.  For example, a global
pointer and a global stack, the pointer normally being NULL, the
stack empty.  When entering an It HEAD, you let the pointer point
to the new It.  When parsing a block-opening in_line (Ao) while
the pointer is non-NULL, you push a note onto the stack.  When
parsing a block-closing in_line (Ac) while the pointer is non-NULL,
you remove the latest matching note from the stack.  When, at
the end of the It line, the stack is not empty, you extend the
It HEAD scope to the next line.  Then, you don't add new notes
to the stack any more, but continue removing old ones.  As soon
as the stack becomes empty, the It scope ends.
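
A minimal sketch of that bookkeeping, in Python for brevity rather
than mandoc's C.  Everything named here (ItHeadTracker, CLOSER_OF,
the choice of four macro pairs) is hypothetical and exists nowhere
in mandoc; the sketch only shows the pointer-and-stack idea from the
paragraph above, not a worked-out design.

```python
# Hypothetical opener -> closer pairs; a small subset for illustration.
CLOSER_OF = {"Ao": "Ac", "Bo": "Bc", "Oo": "Oc", "Xo": "Xc"}
OPENER_OF = {closer: opener for opener, closer in CLOSER_OF.items()}


class ItHeadTracker:
    """Tracks in-line openers that keep an It HEAD open past its own line."""

    def __init__(self):
        self.current_it = None  # the "global pointer"; None outside an It HEAD
        self.notes = []         # the "global stack" of pending openers
        self.extended = False   # True once the It line itself has ended

    def enter_it(self, it_node):
        """Entering an It HEAD: point at the new It, start with no notes."""
        self.current_it = it_node
        self.notes = []
        self.extended = False

    def macro(self, name):
        """Process one macro; return True if it ends the It HEAD scope."""
        if self.current_it is None:
            return False
        if name in CLOSER_OF and not self.extended:
            # Block-opening in_line on the It line itself: push a note.
            self.notes.append(name)
        elif name in OPENER_OF:
            # Block-closing in_line: remove the latest matching note, if any.
            opener = OPENER_OF[name]
            for i in range(len(self.notes) - 1, -1, -1):
                if self.notes[i] == opener:
                    del self.notes[i]
                    break
            if self.extended and not self.notes:
                self.current_it = None  # postponed It close-out happens here
                return True
        return False

    def end_of_line(self):
        """Call at each end of line; return True if the It HEAD ends here."""
        if self.current_it is None:
            return False
        if not self.extended:
            if not self.notes:
                self.current_it = None  # ordinary case: HEAD ends with the line
                return True
            self.extended = True        # pending openers: extend the HEAD
        return False
```

For ".It Ao" followed by ".Ac" on a later line, enter_it(), macro("Ao")
and end_of_line() leave the HEAD open, and the later macro("Ac")
returns True.  Openers seen after the It line add no notes, as
described above.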

But is that really better than what I sent?

> My motivation is that none of the Ao/Ac/etc. macros are semantically
> or syntactically valuable.  They're just eye-candy.  They don't
> affect their contents, just the point at which the Ao or Ac is
> invoked.

Maybe, but you still want to warn when they don't pair up, right?
How do you check that when they are no longer blocks?
OK, you can set up a different algorithm, for example a global
counter for each type, and when that counter goes negative,
you warn ("block not open"), and when it is still non-zero at
the end of a scope (e.g., at Ss, Sh, Ed, El, It or end of file),
you warn as well ("unclosed block").
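
Roughly like this, as a sketch only, again in Python rather than C.
The names check_pairing, CLOSER_TO_OPENER and SCOPE_ENDS are
hypothetical, and only four macro pairs are listed.  One deviation
from the description above: instead of letting a counter go negative,
the sketch warns and leaves it at zero, so that one stray closer does
not swallow a later opener.

```python
from collections import Counter

# Hypothetical tables; a small subset of the macros for illustration.
CLOSER_TO_OPENER = {"Ac": "Ao", "Bc": "Bo", "Oc": "Oo", "Xc": "Xo"}
OPENERS = set(CLOSER_TO_OPENER.values())
SCOPE_ENDS = {"Ss", "Sh", "Ed", "El", "It"}


def check_pairing(macros):
    """Return a list of (position, macro, message) warnings."""
    counts = Counter()  # one counter per opener type
    warnings = []

    def flush(pos):
        # End of a scope: anything still counted as open was never closed.
        # This reports one warning per opener type, not per occurrence.
        for opener in sorted(counts):
            if counts[opener] > 0:
                warnings.append((pos, opener, "unclosed block"))
        counts.clear()

    for pos, name in enumerate(macros):
        if name in OPENERS:
            counts[name] += 1
        elif name in CLOSER_TO_OPENER:
            opener = CLOSER_TO_OPENER[name]
            if counts[opener] == 0:
                warnings.append((pos, name, "block not open"))
            else:
                counts[opener] -= 1
        elif name in SCOPE_ENDS:
            flush(pos)
    flush(len(macros))  # end of file also ends the scope
    return warnings
```

Called on the macro sequence of a file, e.g.
check_pairing(["Oo", "Sh", "Ao"]), it reports the Oo as unclosed at
the Sh and the Ao as unclosed at end of file.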

Again, new stuff to be designed and implemented.

> The only thing this doesn't apply to is Oo/Oc, which has semantic
> meaning in the SYNOPSIS.  Even this failure can be reduced if we
> special-case Oo/Oc within the SYNOPSIS (we already do this with Vt)
> to be a "real" block and the rest to be some shitty in_line_eoln().

Except that Oo/Oc and Op are most notorious for bad nesting,
in particular in the SYNOPSIS.  So, are we then right back to the
original problem exactly where it is going to hurt us most?
If Oo/Oc is going to stay a block in the SYNOPSIS, how are we going
to handle bad O* nesting in the SYNOPSIS?  It is not only O*; O* may
and does mix with Xo and might even mix with Fo (not sure),
in particular in the SYNOPSIS.

> Ingo, what are your thoughts on such an approach?  Of course, my
> arguments hinge on the primary offenders for this behaviour...

If your approach turns out to be able to deal with all cases (as i'm
quite convinced mine does, because it is implemented, i see that it
works, and have a good feeling now how difficult fixing any remaining
bugs might be in case any show up), and in case your approach turns
out to be simpler and cleaner than mine, we should certainly take
yours, because then i would consider it the better one.

The above questions do not look completely trivial, though, and there
may be hidden obstacles along the way that i overlooked (there were
many with my approach that i found and solved one by one).  So, who
is going to implement your suggestion, and work out the remaining
design decisions, such that we can compare and decide which solution
is the better one?

I'm not particularly keen on doing that *right now* (or during the
hackathon), even if it is better.  There are still some other
critical problems to be solved, and personally, i would rather
take the solution i have for one problem, at least for now, and
move on to the next one instead of starting over and risking
getting stuck on the same problem a second time.

Yours,
  Ingo