From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp1.rz.uni-karlsruhe.de (Debian-exim@smtp1.rz.uni-karlsruhe.de [129.13.185.217])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id o93ManXH002273
	for ; Sun, 3 Oct 2010 18:36:51 -0400 (EDT)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
	by smtp1.rz.uni-karlsruhe.de with esmtp (Exim 4.63 #1)
	id 1P2XAl-0004Qd-Nl; Mon, 04 Oct 2010 00:36:47 +0200
Received: from donnerwolke.usta.de ([172.24.96.3])
	by hekate.usta.de with esmtp (Exim 4.71)
	(envelope-from )
	id 1P2XAl-00025j-Mf
	for tech@mdocml.bsd.lv; Mon, 04 Oct 2010 00:36:47 +0200
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
	by donnerwolke.usta.de with esmtp (Exim 4.69)
	(envelope-from )
	id 1P2XAl-0001AS-Ln
	for tech@mdocml.bsd.lv; Mon, 04 Oct 2010 00:36:47 +0200
Received: from schwarze by usta.de with local (Exim 4.71)
	(envelope-from )
	id 1P2XAl-0000XY-Cm
	for tech@mdocml.bsd.lv; Mon, 04 Oct 2010 00:36:47 +0200
Date: Mon, 4 Oct 2010 00:36:47 +0200
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: Re: mdocml: Unify mdoc and man enums and structs into mandoc.h.
Message-ID: <20101003223647.GA20734@iris.usta.de>
References: <201010021014.o92AEcOr023027@krisdoz.my.domain>
	<20101002175621.GB19515@iris.usta.de>
	<4CA8B41C.7020300@bsd.lv>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4CA8B41C.7020300@bsd.lv>
User-Agent: Mutt/1.5.20 (2009-06-14)

Hi Kristaps,

> Thoughts?

Many, and conflicting ones; so i cannot present final solutions,
but some thoughts indeed.

> I want to make a simple mandoc.h and libmandoc.a that has all the
> ingredients for writing front-ends, such as a fancier makewhatis and
> apropos, or man.cgi or whatnot.

Regarding libmandoc.a, sure.
Actually, i don't see much of a point in having libraries at all in
this context; i doubt that anybody will ever want to use the parsers
outside the mandoc program. Put the other way round, all functionality
that can reasonably be based on the mdoc language can probably
reasonably be included in the mandoc binary itself.

Regarding makewhatis, apropos and man.cgi, i do not have much hope.
Remember that those must be able to work on the -Tascii output, at
least in OpenBSD, because that's the only version of the manuals
getting installed, and there is next to no hope of having that
changed, based on what Theo and Bob say.

Besides, i don't really see a need to install manual source code
either. On a typical production system, you don't need manual source
code, just as you don't need program source code; moreover, the
src.tar.gz ball is readily available for each release, and anonymous
CVS is not rocket science either, in case you need the sources for
some reason.

Regarding mandoc.h, actually, i still don't see the point.
Why should a file like mdoc_macro.c, or even mdoc_term.c, be forced
to include man data structures and function prototypes? In the
current implementation, not a single file includes both man.h and
mdoc.h, or both libman.h and libmdoc.h, except main.c and tree.c.
And even if there were one or two such files: what is the advantage
of a frontend file including just mandoc.h instead of man.h and
mdoc.h?

On the contrary: in the frontends, i think it is good to keep the
following parts separate:

 1. language-independent output code,
    e.g. indentation, line breaking, filling, hyphenation;
    term.c being a typical example
 2. language-dependent output code common to man and mdoc,
    e.g. character translation tables like in chars.c
 3. language-specific AST-interpretation code,
    e.g. deciding how much indentation .Bd needs - mdoc_term.c

Here, 1. and 2. do not need any language-dependent headers (but
probably language-independent headers like mandoc.h), while 3.
needs headers for *one* language (but not two).

> To begin with: roff.h, mdoc.h, and man.h -> mandoc.h; libmdoc.a,
> libroff.a, and libman.a (and associated stuff) being merged into a
> single libmandoc.a. Then libmdoc.h, libman.h, and libroff.h being
> merged into libmandoc.h, used internally within libmandoc.a.
>
> This will reduce structural complexity that's been bothering me for a while.
>
> Once this is done, I will abstract and push the fdesc() function
> into the library: it implements parts of the grammar (such as
> escaped newlines) that should be internal to the library.
>
> Another push is to get the escape routines in one place; right now,
> the functionality is duplicated. Restructuring is a necessary
> precondition before I do so.

Wouldn't that suggest a structure like the following?
Admittedly, i'm just drawing a big picture, and a somewhat vague one.
Non-trivial design devils will certainly hide in the details...

 1. A common lower layer, including:
    1.1. utilities used everywhere, like memory management,
         error handling...
    1.2. the roff parser, including fdesc() and escape parsing
    1.3. roff output, including escape rendering
    1.4. language-independent output handling (see 1. above)
 2. Two middle layers for the two languages, man and mdoc:
    2.1. macro parsers, producing ASTs, using 1.2.
    2.2. AST renderers, using 1.3. and 1.4.
 3. An upper layer: the main program tying 2.1. and 2.2. together
    for both backends

That said, here is where the conflicting thoughts i mentioned at the
beginning show up: There IS a reason to bind man and mdoc closer
together. Both languages include features of a third language, roff.
And it is not only escapes that are common to both: there are also
common macros.
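A minimal C sketch of how the layered structure above might let both languages share roff requests without implementing them twice: the lower layer owns one table of roff requests, and each language layer searches its own macro table first and falls back to the shared one. All names here (struct macro, roff_macros, man_macros, mdoc_macros, macro_find) are hypothetical and not part of the mdocml code base; the tables are tiny excerpts, and a real parser would attach handlers rather than descriptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro descriptor, usable for roff, man and mdoc alike. */
struct macro {
	const char	*name;	/* macro name without the leading dot */
	const char	*desc;	/* short description */
};

/* Lower layer: roff requests, implemented only once. */
const struct macro roff_macros[] = {
	{ "br", "break line" },
	{ "sp", "vertical space" },
	{ "fi", "fill output lines" },
	{ "nf", "do not fill output lines" },
	{ NULL, NULL }
};

/* Middle layers: tiny excerpts of the man and mdoc macro sets. */
const struct macro man_macros[] = {
	{ "B", "bold text" },
	{ "SH", "section header" },
	{ NULL, NULL }
};

const struct macro mdoc_macros[] = {
	{ "Bd", "begin display" },
	{ "Em", "emphasis" },
	{ NULL, NULL }
};

/* Linear search in one NULL-terminated table. */
static const struct macro *
table_find(const struct macro *tab, const char *name)
{
	for ( ; tab->name != NULL; tab++)
		if (strcmp(tab->name, name) == 0)
			return tab;
	return NULL;
}

/*
 * A language layer tries its own macros first and then falls back
 * to the shared roff table, so .br resolves to the same entry
 * whether it appears in a man or in an mdoc document.
 */
const struct macro *
macro_find(const struct macro *lang, const char *name)
{
	const struct macro *m;

	assert(name != NULL);
	if ((m = table_find(lang, name)) != NULL)
		return m;
	return table_find(roff_macros, name);
}
```

With this layout, macro_find(man_macros, "br") and macro_find(mdoc_macros, "br") return the same roff entry, while macro_find(mdoc_macros, "SH") returns NULL. Merging all three arrays into one would amount to a fully common macro table for roff, man and mdoc.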
Here is a list of roff macros that *might* be relevant to mandoc.
This list is definitely incomplete; some of these are already
implemented in both mandoc backends, some only in one, some in
libroff, some not at all:

.ad        - adjust output lines left, center, right...
.bp        - eject current page
.br        - break line
.break     - break out of repeated execution
.char      - define character as string
.continue  - start next cycle of repeated execution
.de        - define macro
.di        - divert output to macro
.ds        - define string
.el        - else clause for conditional execution
.fi        - fill output lines
.hy        - enable hyphenation
.ie        - conditional execution allowing else clause
.if        - conditional execution
.ig        - ignore following input
.in        - indent
.length    - store the length of a string in a register
.ll        - set line length
.nf        - do not fill output lines
.nh        - disable hyphenation
.nm        - output line numbering
.nr        - define and set number register
.ns        - enable no-space mode
.os        - output saved vertical distance
.papersize - set the paper size (think of -Tps)
.pl        - set page length in lines
.rm        - remove request, macro or string
.rn        - rename request, macro or string
.rr        - remove register
.rs        - restore spacing mode
.sentchar  - define sentence-ending characters
.sp        - vertical space
.substring - replace string by a substring
.sv        - save vertical distance
.ta        - tab settings
.tl        - three part title
.tm        - print string on terminal (stderr)
.tr        - translate characters on output
.ul        - underline
.while     - repeated execution

Besides, the distinction between macros and escapes is fuzzy.
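To make that fuzziness concrete, a hypothetical C sketch that classifies an escape sequence by the kind of node it would produce, following the split into elements (\n, \p) and blocks (\f, \s) suggested in this mail. Treating \d and \R as elements, by analogy with .sp and .nr, is an assumption, as are all names (enum esc_kind, esc_classify). The function only inspects the character after the backslash and does not parse escape arguments such as the font name in \fB.

```c
#include <assert.h>

/* Hypothetical: what an escape sequence would turn into. */
enum esc_kind {
	ESC_GLYPH,	/* ordinary escape producing one character, e.g. \(em */
	ESC_COMMENT,	/* \" : start of a comment */
	ESC_STRING,	/* \* : string interpolation */
	ESC_ELEM,	/* element, like .br or .sp */
	ESC_BLOCK	/* block, like .ad or .fi */
};

/* esc points at the backslash; only the following character is examined. */
enum esc_kind
esc_classify(const char *esc)
{
	assert(esc[0] == '\\');

	switch (esc[1]) {
	case '"':
		return ESC_COMMENT;
	case '*':
		return ESC_STRING;
	case 'n':	/* interpolate number register */
	case 'p':	/* break output line, cf. .br */
	case 'd':	/* assumption: element, cf. .sp */
	case 'R':	/* assumption: element, cf. .nr */
		return ESC_ELEM;
	case 'f':	/* switch font, cf. .Em */
	case 's':	/* set font size */
		return ESC_BLOCK;
	default:
		return ESC_GLYPH;
	}
}
```

For example, \fB classifies as ESC_BLOCK and \p as ESC_ELEM, while an ordinary character escape such as \(em falls through to ESC_GLYPH.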
Here is a list of a few roff escapes actually behaving more like
macros, that is, not just producing one output character, but having
non-local effects on the parsing process:

\"  - start a comment
\*  - interpolate a string
\d  - half vertical space (oops - similar to .sp)
\f  - switch font (oops - similar to .Em)
\n  - interpolate number register
\p  - break output line (oops - similar to .br)
\R  - set number register (oops - similar to .nr)
\s  - set font size

So, in the very long term, a need might arise to

 1. handle roff macros in a common module, and be able to intermix
    them with high-level macros, in particular man macros;
 2. handle roff escape sequences in a way similar to macros, such
    that they create elements (\n, \p) or even blocks (\f, \s).

Note that not all of the macros can be handled well by a
preprocessor; for example, .bp .br .sp are clearly elements and
.ad .fi .in .ul are clearly blocks. Besides, even part of the stuff
that, at first sight, can be handled by a preprocessor actually
cannot, e.g. .ds: strings may be set dynamically, deleted and reset,
and may then interpolate registers influenced by high-level macros.

On top of that, i have seen stray man macros, for example .B, used
in mdoc documents.

Taking all that together, it *might* make sense in the distant future
to have a common macro table for roff, man and mdoc. Or perhaps that's
overkill and it might not; i'm not sure. Even if we don't go for a
full common table, some way to include the same roff macros in both
man and mdoc ASTs without implementing them twice might turn out to be
useful, as might some way to handle at least some escape sequences as
elements and blocks.

Now, this is certainly inconsistent - just some thoughts.

Yours,
  Ingo

--
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv