Date: Tue, 22 Mar 2011 03:20:30 +0100
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: Re: [PATCH] Massive restructuring into mandoc.h/libmandoc.a.
Message-ID: <20110322022030.GB16603@iris.usta.de>
References: <4D878CF0.2060306@bsd.lv> <20110321215744.GA16603@iris.usta.de> <4D87D261.300@bsd.lv>
In-Reply-To: <4D87D261.300@bsd.lv>

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, Mar 21, 2011 at 11:34:09PM +0100:

> Ingo, keep me posted (I hope you don't mind my cross-posting back to
> tech@, as this includes some motivations for the recent changes).

No, i don't mind, my mail wasn't private, it just hadn't enough
substance for posting it, but your reply has.

This is still not final feedback, i'm still trying to understand
the details of what is happening, so here i'm only trying to describe
my first impression what might need to be looked at.

> In terms of being overdone, consider:
>
> (1) Collapsing libmdoc, libman, and libroff into libmandoc.
>
> I'm motivated by the wrongness of the existing approach: maintaining
> separable libmdoc, libman, and libroff compilers ignores the reality
> that Real Manuals are always mixing them. The more we work on
> libroff, in particular, the more it's going to have to work with the
> underlying compilers.

Yes. I agree with that.
I don't see much value in separable compilers.

What i'm wondering about is code layering, and how that is expressed
in headers. So i'm talking about code organization, and nomenclature,
not about sub-libraries.

I think we have the following logical layers:

 1. The main program, parsing options and iterating files.
 2. The reader code, reading one document and dispatching to parsers.
 3. The high-level languages mdoc(7) and man(7).
 4. The low-level roff(7) language.
 5. The roff(7) plugins like tbl(7) and eqn(7).
 6. Specific output utilities.
 7. Generic output utilities.
 8. General purpose utilities.

Each higher level can use each lower level.
Each level can consist of one or more functional units,
maybe using the same lower level units,
maybe used by the same higher level units,
but not using each other.

Levels seem to be subdivided in this way:

 1. (main.c)
 2. (read.c)
 3. a) mdoc parser (mdoc*.c)
    b) man parser (man*.c)
    c) mdoc terminal renderer (mdoc_term.c)
    d) mdoc HTML renderer (mdoc_html.c)
    e) man terminal renderer (man_term.c)
    f) man HTML renderer (man_html.c)
    g) tree renderers (tree.c)
 4. (roff.c)
 5. a) tbl parser (tbl*.c)
    b) tbl terminal renderer (tbl_term.c)
    c) tbl HTML renderer (tbl_html.c)
    d) eqn parser (eqn.c)
 6. a) terminal output utilities (term*.c)
    b) HTML output utilities (html.c)
    c) tree output utilities (tree.c)
 7. (out.c, chars.c)
 8. (mandoc.c)

Each unit should define an interface that can be used at the higher
levels. These interfaces should be defined in headers.
That doesn't mean they need to be packaged as libraries.
Besides, each unit can have an internal header, not for use by
higher levels (currently called lib*.h).

Here is an overview of the interfaces:

 3. acd) mdoc.h, main.h (+ private libmdoc.h for a)
    bef) man.h, main.h (+ private libman.h for b)
    g)   main.h
 4. roff.h
    mandoc.h (registers)
 5. abc) mandoc.h (tbl) (+ private libroff.h for a)
    d)   mandoc.h (eqn) (+ private libroff.h)
 6. a) term.h, main.h
    b) html.h, main.h
 7. out.h, chars.h
 8. libmandoc.h
    mandoc.h (errors)

So, the overall structure is not bad, except for two messy points:

 * main.h is a layering violation by its sheer existence;
   parts belong in mdoc.h, man.h, term.h, html.h.
 * mandoc.h is even worse; registers belong in roff.h;
   proper tbl.h and eqn.h would be cleaner;
   the rest is lowest layer, together with libmandoc.h.

> Consider, if you will, how to handle in-line equations without
> calling into the EQN parser from libmdoc or libman
> (".Qq $a+b$" comes to mind). This will only get worse.

No problem there, just use layer 5 from layer 3.

> (2) Collapsing main.c parsing into read.c and thus libmandoc.
>
> The notion of "libmdoc" and "libman" as standalone parsers is a
> horrible lie. A significant amount of parse complexity, such as
> pasting together CPP-escaped lines and validating ASCIIness, occurred
> in main.c. Not even to mention (1), and the reality that mdoc and
> man rarely occur on their own, making the roff_parsln() and
> mdoc/man_parse dance part of the document syntax itself.

Sure. I tend to like the main.c / read.c split, whatever we call
the interface. Probably mandoc.h is a good choice of name for
the level 2 interface, and the level 8 interface should just keep
the name libmandoc.h for now, even though having the top and bottom
layer a [lib]*.h pair is a bit confusing when all the other [lib]*.h
pairs live on the same level.

> (3) Collapsing mdoc.h and man.h into libmandoc.h/mandoc.h.
>
> Two things. First off, see (2). The parse() functions in both of
> these headers were lies.
> Second of all, and allowing for that, what
> do we get by having both mdoc.h and man.h in terms of their type
> definitions? All this allows is for the two pairs, mdoc_XXXX.c and
> man_XXXX.c, to have their own inclusions. Big f. deal: everybody
> else imports both anyway! It's artificial to split them and,
> although this isn't OpenBSD's problem, it puts an unnecessary burden
> on distributing libmandoc.a (requiring mdoc.h and man.h hangers-on)
> for use by other utilities.

Hm. I guess here is my gripe.
I still don't see the point of clobbering everything into mandoc.h.
What's wrong with saying

  #include "man.h"
  #include "mdoc.h"
  #include "mandoc.h"

in a program using libmandoc, if it really uses both parsers and
the main reader? It's neither better nor worse than just

  #include "mandoc.h"

which includes everything; the difference is just keeping code
organized by topic, keeping code for one topic in one file,
which is easier to read and maintain.

And potentially, not having one header for everything also helps
layering, doesn't it?

> That's pretty much all these patches accomplish. The rest is the
> Makefile being re-written, which I've wanted to do for ages.

Well, i don't really worry about the Makefile, it is nice if it
becomes shorter, but it won't come down to OpenBSD shortness
any time soon. ;-)

Good night for now,
  Ingo