From: Alexis
To: supervision@list.skarnet.org
Subject: Coda to the discussion on converting the HTML s6 documentation
Date: Wed, 02 Sep 2020 19:59:10 +1000
Message-ID: <87o8mowlch.fsf@ada>

Hi all,

i've received an email offlist asking some clarifying questions
about automating the conversion of the current HTML s6
documentation, and i thought it might be useful to post some of the
things i noted in my reply.

The issue isn't that the HTML is unparseable (it's not). A tool like
`pandoc` can be used to convert the pages into other formats,
including roff. Over at Void, we recently tried to make use of
`pandoc` to create a man page for Érico's neat `void-docs` script,
which allows viewing the Void Handbook locally in a number of
formats. What i found is that the output of pandoc produced roff
that was fine visually, but which relied on presentational markup,
rather than semantic markup. i'll return to this issue below.

The issue is twofold:

* Things like bare "" tags (i.e. without a 'class' attribute
  describing their contents) are used in the HTML to convey multiple
  types of information that mdoc/roff distinguishes. Sometimes an ""
  is used for an argument (Ar in mdoc), sometimes it's simply used
  for emphasis (Em in mdoc). Similarly, bare "" tags are used for a
  path (Pa in mdoc), function types (Ft in mdoc), functions (Fn in
  mdoc), libraries (which could have a man page that should be
  cross-referenced with an Xr macro), and so on.
  A human is needed to decide the semantics involved (e.g. for
  Casper's putative IL), based on context.

* Many things /simply aren't marked up at all/. The example i gave
  in my earlier post was environment variables: again, a human is
  needed to decide whether something in ALLCAPS is an env var, a cpp
  macro, or something else altogether (like a reference to the
  'TAI64' concept).

The question might be asked: "Well, who cares? Why care about
semantic markup? As long as the visual output is the same, what's
the issue?" Two things:

* Having the documentation source use semantic markup as much as
  possible facilitates conversion between formats. `mandoc(1)`
  doesn't only output man pages from mdoc source: it can also
  produce HTML (used on man.voidlinux.org, with some custom CSS for
  Void theming), PDF, PostScript, Markdown and plain ASCII. So if
  things like flags, arguments, paths, environment variables,
  variable types, variables, function types, functions etc. are
  marked up in the mdoc source, a PDF (for example) can be styled
  appropriately for each case.

* Additionally, extensive semantic markup has a direct benefit to
  end-users: the ability to use the functionality of `apropos` to
  find appropriate content. For example, say one wished to find all
  uses of the 'GID' env var in the s6 man pages. One could use
  `apropos 'Ev=GID' | grep s6-`. (This sort of use-case is part of
  why i've made sure the names of all the man pages i'm creating are
  prefixed with "s6-".) Similarly, one could search for all mentions
  of the 'notification-fd' file with
  `apropos 'Pa~.*notification-fd'`, with the '~' indicating an
  extended regular expression. However, this won't work without the
  relevant markup in the sources.
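
To make the dependence on markup concrete, here is a minimal Python
sketch. It is not how `apropos` is implemented; it only illustrates
that a macro-keyed search can see what the source marks up and
nothing else. The mdoc fragment in `SAMPLE` and the helper name
`macro_args` are hypothetical, made up for illustration, and the
helper only looks at the single token following the macro.

```python
# Hypothetical mdoc fragment for illustration; not taken from the s6 man pages.
SAMPLE = """\
.Sh ENVIRONMENT
.Bl -tag -width Ds
.It Ev GID
The numeric GID to switch to.
.It Ev UID
The numeric UID to switch to.
.El
.Sh FILES
.Bl -tag -width Ds
.It Pa notification-fd
Names the descriptor used for readiness notification.
.El
"""


def macro_args(source, macro):
    """Return the first argument of every `macro` call on mdoc control lines.

    Simplification (an assumption of this sketch): only the token directly
    after the macro name is taken, and quoting is ignored.
    """
    found = []
    for line in source.splitlines():
        # Only control lines (leading '.') carry macros; skip roff comments.
        if not line.startswith(".") or line.startswith('.\\"'):
            continue
        tokens = line[1:].split()
        # tokens[:-1] so that a macro at the end of a line has no argument.
        for i, tok in enumerate(tokens[:-1]):
            if tok == macro:
                found.append(tokens[i + 1])
    return found


if __name__ == "__main__":
    print(macro_args(SAMPLE, "Ev"))
    print(macro_args(SAMPLE, "Pa"))
```

The plain-text mentions of GID and UID on the text lines are not
reported: without an `Ev` macro around them, a search keyed on `Ev`
has nothing to match.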

Fwiw, my suggestion, for those interested in converting the
documentation to One True Format as decided by Laurent, would be to
leverage my efforts to use semantic markup extensively in the man
pages. Once the s6-man-pages repo is ready, use `mandoc -T html` to
convert the pages to HTML, which will contain consistent semantic
markup (e.g. ''). That HTML can then be parsed and converted to the
One True Format, an authoritative source from which man pages and
HTML can be produced.


Alexis.
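
The parsing step suggested above could start from something like the
following Python sketch, which uses only the standard library. It
assumes that the HTML generated by `mandoc -T html` tags elements
with a 'class' attribute named after the mdoc macro; the exact
elements emitted depend on the mandoc version, so `SAMPLE_HTML` is a
hypothetical snippet, and `SEMANTIC`, `SemanticSpans` and
`semantic_spans` are names invented for this example. Emitting the
One True Format is left out, since that format is not decided here.

```python
from html.parser import HTMLParser

# Hypothetical snippet in the style of mandoc's HTML output (an assumption).
SAMPLE_HTML = (
    '<p class="Pp">If <code class="Ev">GID</code> is set, the daemon reads '
    '<span class="Pa">notification-fd</span> and accepts '
    '<var class="Ar">prog</var>.</p>'
)

# mdoc macro names assumed to show up as class values.
SEMANTIC = {"Ar", "Ev", "Fl", "Fn", "Ft", "Pa", "Xr"}


class SemanticSpans(HTMLParser):
    """Collect (macro, text) pairs for elements whose class names a macro."""

    def __init__(self):
        super().__init__()
        self.spans = []  # finished (macro, text) pairs, in closing order
        self._open = []  # stack of [tag, macro or None, list of text parts]

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        macro = next((c for c in classes if c in SEMANTIC), None)
        self._open.append([tag, macro, []])

    def handle_data(self, data):
        # Text belongs to every semantic element that is currently open.
        for entry in self._open:
            if entry[1] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        # Ignore stray end tags that match nothing on the stack.
        if not any(entry[0] == tag for entry in self._open):
            return
        # Pop up to and including the matching start tag; this also
        # discards unclosed elements such as <br> nested inside it.
        while self._open:
            opened, macro, parts = self._open.pop()
            if macro is not None:
                self.spans.append((macro, "".join(parts).strip()))
            if opened == tag:
                break


def semantic_spans(html):
    """Parse `html` and return the collected (macro, text) pairs."""
    parser = SemanticSpans()
    parser.feed(html)
    parser.close()
    return parser.spans


if __name__ == "__main__":
    for macro, text in semantic_spans(SAMPLE_HTML):
        print(macro, text)
```

Each pair carries enough information to emit the matching construct
in whatever target format is chosen, which is exactly what bare,
class-less tags cannot provide.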