From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Subject: Re: mdocml: Unify mdoc and man enums and structs into mandoc.h.
Date: Mon, 4 Oct 2010 00:36:47 +0200
Message-ID: <20101003223647.GA20734@iris.usta.de>
In-Reply-To: <4CA8B41C.7020300@bsd.lv>

Hi Kristaps,

> Thoughts?

Many, and conflicting ones; so i cannot present final solutions,
but some thoughts indeed.

> I want to make a simple mandoc.h and libmandoc.a that has all the
> ingredients for writing front-ends, such as a fancier makewhatis and
> apropos, or man.cgi or whatnot.

Regarding libmandoc.a, sure.  Actually, i don't see much of a point
in having libraries at all in this context; i doubt that anybody will
ever want to use the parsers outside the mandoc program, or, put
the other way round, all functionality that can reasonably be based
on the mdoc language can probably just as well be included in the
mandoc binary itself.

Regarding makewhatis, apropos and man.cgi, i do not have much hope.
Remember that those must be able to work on the -Tascii output,
at least in OpenBSD, because that's the only version of the manuals
getting installed, and there is next to no hope to have that changed,
based on what Theo and Bob say.  Besides, i don't really see a need
to install manual source code either.  On a typical production
system, you don't need manual source code, just as you don't need
program source code; after all, the src.tar.gz tarball is readily
available for each release, and anonymous CVS is not rocket science
either, in case you need the sources for some reason.

Regarding mandoc.h, actually, i still don't see the point.
Why should a file like mdoc_macro.c, or even mdoc_term.c,
be forced to include man data structures and function prototypes?
In the current implementation, there is not a single file
including both man.h and mdoc.h or both libman.h and libmdoc.h,
except main.c and tree.c.  And even if there were one or two
such files:  What is the advantage of a frontend file including
just mandoc.h instead of man.h and mdoc.h?

On the contrary:  In the frontends, i think it is good to keep
the following parts separate:
 1. language-independent output code
    e.g. doing things like indentation, line breaking, filling,
    hyphenation - term.c being a typical example
 2. language-dependent output code common to man and mdoc
    e.g. character translating tables like in chars.c
 3. language-specific AST-interpretation code
    e.g. deciding how much indentation .Bd needs - mdoc_term.c
Here, 1 and 2 do not need any language-dependent headers (though
probably language-independent ones like mandoc.h), while 3
needs headers for *one* language (but not two).
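
A minimal sketch of that separation in C, collapsed into one file for
brevity.  All names here (struct out, out_word, mdoc_render, the TOK_
constants) are hypothetical and not taken from the mandoc sources; the
4-column display indent is likewise just an assumption.  The point is
only that the output half never mentions mdoc, and the mdoc half talks
to the output half through a language-independent interface.

```c
#include <stddef.h>

/* --- part 1: language-independent output code (hypothetical out.h) --- */
struct out {
	char	buf[128];	/* rendered text, NUL-terminated */
	size_t	len;		/* bytes used in buf */
	size_t	col;		/* current output column */
	size_t	indent;		/* left margin in columns */
};

static void
out_putc(struct out *o, char c)
{
	if (o->len + 1 < sizeof(o->buf)) {
		o->buf[o->len++] = c;
		o->buf[o->len] = '\0';
	}
}

/* Emit one word: indent at line start, else separate with a space. */
static void
out_word(struct out *o, const char *w)
{
	if (o->col == 0) {
		while (o->col < o->indent) {
			out_putc(o, ' ');
			o->col++;
		}
	} else {
		out_putc(o, ' ');
		o->col++;
	}
	for (; *w != '\0'; w++) {
		out_putc(o, *w);
		o->col++;
	}
}

static void
out_newline(struct out *o)
{
	out_putc(o, '\n');
	o->col = 0;
}

/* --- part 3: mdoc-specific AST interpretation (needs only mdoc types) --- */
enum mdoc_tok { TOK_TEXT, TOK_BD };

struct mdoc_node {
	enum mdoc_tok	 tok;
	const char	*text;
};

/* The mdoc-specific decision: a display gets extra indentation. */
static void
mdoc_render(struct out *o, const struct mdoc_node *n)
{
	size_t saved = o->indent;

	if (n->tok == TOK_BD)
		o->indent += 4;	/* assumed display indent */
	out_word(o, n->text);
	out_newline(o);
	o->indent = saved;
}
```

A man renderer would sit next to mdoc_render, include its own node
types, and call the same out_word and out_newline.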

> To begin with: roff.h, mdoc.h, and man.h -> mandoc.h; libmdoc.a,
> libroff.a, and libman.a (and associated stuff) being merged into a
> single libmandoc.a.  Then libmdoc.h, libman.h, and libroff.h being
> merged into libmandoc.h, used internally within libmandoc.a.
> 
> This will reduce structural complexity that's been bothering me for a while.
> 
> Once this is done, I will abstract and push the fdesc() function
> into the library: it implements parts of the grammar (such as
> escaped newlines) that should be internal to the library.
> 
> Another push is to get the escape routines in one place; right now,
> the functionality is duplicated.  Restructuring is a necessary
> precondition before I do so.

Wouldn't that suggest a structure like the following?
Admittedly, i'm just drawing a big picture, and a somewhat vague one.
Non-trivial design devils will certainly hide in the details...

1. A common lower layer, including:
   1.1. utilities used everywhere
        like memory management, error handling...
   1.2. roff parser, including fdesc() and escape parsing
   1.3. roff output, including escape rendering
   1.4. language-independent output handling (see 1. above)

2. Two middle layers for two languages, man and mdoc:
   2.2. macro parsers, producing ASTs, using 1.2.
   2.3. AST renderers, using 1.3 and 1.4

3. Upper layer:
   The main program tying 2.2. and 2.3. together for both backends
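
To make layer 1.2 a bit more concrete, here is a rough sketch of a
single escape scanner that both middle layers could call instead of
each carrying its own copy.  The function names are hypothetical, and
the syntax handled is a deliberately small subset assumed for
illustration (\c, \(cc, \[name], and the same three name forms after
\f, \* and \n); real roff escape syntax has many more cases.

```c
#include <stddef.h>

/*
 * Lower layer: length in bytes of the escape sequence starting at
 * p[0] == '\\', or 0 if it is malformed or not an escape at all.
 */
static size_t
roff_esc_len(const char *p)
{
	size_t i = 1;

	if (p[0] != '\\')
		return 0;
	/* \f, \* and \n are followed by a name in one of three forms. */
	if (p[i] == 'f' || p[i] == '*' || p[i] == 'n')
		i++;
	switch (p[i]) {
	case '\0':
		return 0;
	case '(':	/* two-character name */
		if (p[i + 1] == '\0' || p[i + 2] == '\0')
			return 0;
		return i + 3;
	case '[':	/* bracketed name of arbitrary length */
		for (i++; p[i] != ']'; i++)
			if (p[i] == '\0')
				return 0;
		return i + 1;
	default:	/* one-character name */
		return i + 1;
	}
}

/*
 * Helper that a man and an mdoc renderer could share: count the
 * bytes of s that are not part of an escape sequence.  A malformed
 * escape is counted as ordinary text.
 */
static size_t
roff_plain_len(const char *s)
{
	size_t n = 0, len;

	while (*s != '\0') {
		if (*s == '\\' && (len = roff_esc_len(s)) > 0) {
			s += len;
			continue;
		}
		s++;
		n++;
	}
	return n;
}
```

With this in the common layer, neither the man nor the mdoc code needs
to know how long an escape is; they only ask.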


That said, here the conflicting thoughts i mentioned at the
beginning will show up: There IS a reason to bind man and mdoc
closer together.  Both languages include features of a third
language, roff.  And it is not only escapes which are common
to both: There are also common macros.  Here is a list of roff
macros that *might* be relevant to mandoc - this list is definitely
incomplete, some of these are already implemented in both mandoc
backends, some only in one, some in libroff, some not at all:

 .ad        - adjust output lines left, center, right...
 .bp        - eject current page
 .br        - break line
 .break     - break out of repeated execution
 .char      - define character to string
 .continue  - start next cycle of repeated execution
 .de        - define macro
 .di        - divert output to macro
 .ds        - define string
 .el        - else clause for conditional execution
 .fi        - fill output lines
 .hy        - enable hyphenation
 .ie        - conditional execution allowing else clause
 .if        - conditional execution
 .ig        - ignore following input
 .in        - indent
 .length    - store the length of a string into a register
 .ll        - set line length
 .nf        - do not fill output lines
 .nh        - disable hyphenation
 .nm        - output line numbering
 .nr        - define and set number register
 .ns        - no-space mode
 .os        - output saved vertical distance
 .papersize - set the paper size (think of -Tps)
 .pl        - set page length in lines
 .rm        - remove request, macro or string
 .rn        - rename request, macro or string
 .rr        - remove register
 .rs        - restore spacing mode
 .sentchar  - define sentence-ending characters
 .sp        - vertical space
 .substring - replace string by a substring
 .sv        - save vertical distance
 .ta        - tab settings
 .tl        - three part title
 .tm        - print string on terminal (stderr)
 .tr        - translate characters on output
 .ul        - underline
 .while     - repeated execution

Besides, the distinction between macros and escapes is fuzzy.
Here is a list of a few roff escapes actually behaving more
like macros, that is, not just producing one output character,
but having non-local effects on the parsing process:

 \" - start a comment
 \* - interpolate a string
 \d - half vertical space (oops - similar to .sp)
 \f - switch font (oops - similar to .Em)
 \n - interpolate number register
 \p - break output line (oops - similar to .br)
 \R - set number register (oops - similar to .nr)
 \s - set font size

So, in the very long term a need might arise to

 1. Handle roff macros in a common module, and be able to intermix
    them with high level, in particular man, macros
 2. Handle roff escape sequences in a way similar to macros,
    such that they create elements (\n, \p) or even blocks (\f, \s)
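
As a rough illustration of point 2, the following C sketch splits one
input line into a flat list of nodes, turning \fX and \p into nodes of
their own instead of leaving them buried in the text.  The node types
and the function name are hypothetical; only one-letter font names are
handled, and every other escape is simply left inside the text run.
Nesting the text after a \f node into a real block is not attempted
here.

```c
#include <stddef.h>

enum ntype { N_TEXT, N_FONT, N_BREAK };

struct node {
	enum ntype	 type;
	const char	*start;	/* N_TEXT: first byte of the run */
	size_t		 len;	/* N_TEXT: length of the run */
	char		 font;	/* N_FONT: one-letter font name */
};

/*
 * Split s into nodes, writing at most max of them to out.
 * Returns the number of nodes written.
 */
static size_t
split_escapes(const char *s, struct node *out, size_t max)
{
	const char *run = s;	/* start of the pending text run */
	size_t n = 0;

	for (;;) {
		int font = s[0] == '\\' && s[1] == 'f' && s[2] != '\0';
		int brk = s[0] == '\\' && s[1] == 'p';

		if (*s != '\0' && !font && !brk) {
			/* Ordinary byte or unhandled escape: keep in run. */
			s += (s[0] == '\\' && s[1] != '\0') ? 2 : 1;
			continue;
		}
		/* Flush the pending run before an escape or at the end. */
		if (s > run && n < max) {
			out[n].type = N_TEXT;
			out[n].start = run;
			out[n].len = (size_t)(s - run);
			out[n].font = '\0';
			n++;
		}
		if (*s == '\0' || n >= max)
			break;
		out[n].type = font ? N_FONT : N_BREAK;
		out[n].start = NULL;
		out[n].len = 0;
		out[n].font = font ? s[2] : '\0';
		n++;
		s += font ? 3 : 2;
		run = s;
	}
	return n;
}
```

For the input "a \fBbold\fR b\pc" this yields seven nodes: text, font
B, text, font R, text, break, text.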

Note that not all of these macros can be handled well by a preprocessor:
for example, .bp .br .sp are clearly elements and .ad .fi .in .ul are
clearly blocks.  Besides, even some of the stuff that, at first
sight, can be handled by a preprocessor actually cannot, e.g. .ds:
strings may be set dynamically, deleted and reset, and they may
interpolate registers influenced by high-level macros.
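
The .ds problem can be seen with a toy string table in C.  This is
only a sketch under assumptions: the names (struct strtab, str_set,
str_rm, str_get) are made up, the table is fixed-size, names longer
than 7 bytes are truncated, and returning an empty string for an
undefined name is a choice made for this sketch.  What matters is that
the value seen by \*[name] depends on which .ds and .rm requests have
run so far, so the lookup has to happen at the right point in the
parse, not in a separate pass up front.

```c
#include <stdio.h>
#include <string.h>

#define NSTR 8

struct rstr {
	char	name[8];
	char	val[32];
	int	used;
};

struct strtab {
	struct rstr e[NSTR];	/* zero-initialise before use */
};

static struct rstr *
str_find(struct strtab *t, const char *name)
{
	int i;

	for (i = 0; i < NSTR; i++)
		if (t->e[i].used && strcmp(t->e[i].name, name) == 0)
			return &t->e[i];
	return NULL;
}

/* .ds name val: define or redefine; 0 on success, -1 if table full. */
static int
str_set(struct strtab *t, const char *name, const char *val)
{
	struct rstr *r = str_find(t, name);
	int i;

	for (i = 0; r == NULL && i < NSTR; i++)
		if (!t->e[i].used)
			r = &t->e[i];
	if (r == NULL)
		return -1;
	snprintf(r->name, sizeof(r->name), "%s", name);
	snprintf(r->val, sizeof(r->val), "%s", val);
	r->used = 1;
	return 0;
}

/* .rm name: forget the string, if defined. */
static void
str_rm(struct strtab *t, const char *name)
{
	struct rstr *r = str_find(t, name);

	if (r != NULL)
		r->used = 0;
}

/* \*[name]: current value, or "" if undefined (sketch's choice). */
static const char *
str_get(struct strtab *t, const char *name)
{
	struct rstr *r = str_find(t, name);

	return r != NULL ? r->val : "";
}
```

Calling str_get for the same name before a .ds, after it, after a
second .ds and after a .rm gives four different answers in turn.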

On top of that, i have seen stray man macros, for example .B,
used in mdoc documents.  Taking all that together, it *might* make
sense in the distant future to have a common macro table for roff,
man and mdoc.  Or perhaps that's overkill and it might not, i'm
not sure.  Even if we don't go for a full common table, some way
to include the same roff macros into both man and mdoc ASTs might
turn out to be useful, without implementing them twice.  And some
way to handle at least some escape sequences as elements and blocks.
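
One possible shape for such a common table, sketched in C.  Everything
here is hypothetical: the enum, the struct, the function names, and in
particular which macro is flagged for which language are assumptions
for illustration, not a statement about what mandoc accepts.

```c
#include <stddef.h>
#include <string.h>

enum lang {
	LANG_ROFF = 1 << 0,
	LANG_MAN  = 1 << 1,
	LANG_MDOC = 1 << 2
};

struct macro {
	const char	*name;
	unsigned	 langs;	/* bitmask of languages accepting it */
};

/* One table for all three languages; roff requests are shared. */
static const struct macro macros[] = {
	{ "br", LANG_ROFF | LANG_MAN | LANG_MDOC },
	{ "sp", LANG_ROFF | LANG_MAN | LANG_MDOC },
	{ "ds", LANG_ROFF | LANG_MAN | LANG_MDOC },
	{ "B",  LANG_MAN },
	{ "Bd", LANG_MDOC },
};

static const struct macro *
macro_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(macros) / sizeof(macros[0]); i++)
		if (strcmp(macros[i].name, name) == 0)
			return &macros[i];
	return NULL;
}

/* Non-zero if the macro exists and is accepted in the given language. */
static int
macro_allowed(const char *name, enum lang lang)
{
	const struct macro *m = macro_find(name);

	return m != NULL && (m->langs & (unsigned)lang) != 0;
}
```

In this layout, tolerating a stray .B in an mdoc document would mean
adding LANG_MDOC to one table entry rather than implementing the macro
a second time.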

Now, this is certainly inconsistent - just some thoughts.

Yours,
  Ingo
