Coda to the discussion on converting the HTML s6 documentation

supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit

From: Alexis @ 2020-09-02  9:59 UTC
  To: supervision

Hi all,

I've received an email off-list asking some clarifying questions
about automating the conversion of the current HTML s6
documentation, and I thought it might be useful to post some of
the points I made in my reply.

The issue isn't that the HTML is unparseable (it's not). A tool
like `pandoc` can convert the pages into other formats, including
roff. Over at Void, we recently tried using `pandoc` to create a
man page for Érico's neat `void-docs` script, which allows viewing
the Void Handbook locally in a number of formats. What I found is
that pandoc produced roff that looked fine, but which relied on
presentational markup rather than semantic markup. I'll return to
this issue below.

Rather, the issue is twofold:

* Bare "<em>" tags (i.e. without a 'class' attribute describing
  their contents) are used in the HTML to convey multiple types of
  information that mdoc/roff distinguishes. Sometimes an "<em>"
  marks an argument (Ar in mdoc), sometimes it's simply emphasis
  (Em in mdoc). Similarly, bare "<tt>" tags are used for paths (Pa
  in mdoc), function types (Ft in mdoc), functions (Fn in mdoc),
  libraries (which could have a man page that should be
  cross-referenced with an Xr macro), and so on. A human is needed
  to decide, based on context, which semantics apply (e.g. for
  Casper's putative IL).

* Many things /simply aren't marked up at all/. The example I gave
  in my earlier post was environment variables: again, a human is
  needed to decide whether something in ALLCAPS is an env var, a
  cpp macro, or something else altogether (like a reference to the
  'TAI64' concept).
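To make the first point concrete, here's a minimal sketch, assuming
Python's standard `html.parser`, of what an automated pass can and
can't do. It finds every bare "<em>"/"<tt>" and lists the mdoc
macros from the list above that it *might* map to; choosing among
them is exactly the step that still needs a human. The names
`CANDIDATES`, `BareTagScanner` and `scan` are hypothetical, and the
mapping is illustrative, not exhaustive.

```python
from html.parser import HTMLParser

# Candidate mdoc macros for each bare tag, taken from the list above.
# Illustrative, not exhaustive.
CANDIDATES = {
    "em": ["Ar", "Em"],
    "tt": ["Pa", "Ft", "Fn", "Xr"],
}


class BareTagScanner(HTMLParser):
    """Report <em>/<tt> elements that lack a class attribute."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open <em>/<tt> elements: [tag, is_bare, text parts]
        self.findings = []  # (tag, text, candidate macros), in closing order

    def handle_starttag(self, tag, attrs):
        if tag in CANDIDATES:
            is_bare = not any(name == "class" for name, _ in attrs)
            self.stack.append([tag, is_bare, []])

    def handle_data(self, data):
        # Text belongs to every tracked element currently open.
        for entry in self.stack:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            _, is_bare, parts = self.stack.pop()
            if is_bare:
                text = "".join(parts).strip()
                self.findings.append((tag, text, CANDIDATES[tag]))


def scan(html):
    """Return every bare <em>/<tt> with the mdoc macros it might map to."""
    scanner = BareTagScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings


if __name__ == "__main__":
    sample = ("Run <tt>s6-svscan</tt> on <em>scandir</em>; "
              "this is <em>not</em> optional.")
    for tag, text, macros in scan(sample):
        print(f"<{tag}> {text!r}: {' or '.join(macros)}? (needs a human)")
```

Note that the tool can flag 'scandir' and 'not' identically; only
context tells you the first is an Ar and the second an Em. And the
second point is worse: unmarked ALLCAPS text never shows up here at
all.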

The question might be asked: "Well, who cares? Why care about 
semantic markup? As long as the visual output is the same, what's 
the issue?" Two things:

* Having the documentation source use semantic markup as much as
  possible facilitates conversion between formats. `mandoc(1)`
  doesn't just output man pages from mdoc source: it can also
  produce HTML (used on man.voidlinux.org, with some custom CSS
  for Void theming), PDF, PostScript, Markdown and plain ASCII. So
  if things like flags, arguments, paths, environment variables,
  variable types, variables, function types, functions etc. are
  marked up in the mdoc source, a PDF (for example) can style
  each of them appropriately.

* Additionally, extensive semantic markup has a direct benefit to
  end-users: it lets them use `apropos` to find relevant content.
  For example, say one wanted to find every s6 man page that
  mentions the 'GID' env var. One could use
  `apropos 'Ev=GID' | grep s6-`. (This sort of use-case is part of
  why I've made sure the names of all the man pages I'm creating
  are prefixed with "s6-".) Similarly, one could search for all
  mentions of the 'notification-fd' file with
  `apropos 'Pa~.*notification-fd'`, where the '~' indicates an
  extended regular expression. However, none of this works without
  the relevant markup in the sources.
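Why the markup matters here can be shown with a toy model. This is
a minimal sketch, not how mandoc's database is actually stored: it
assumes an index of (page, macro, text) entries and evaluates
queries of the form 'Key=val' (substring) or 'Key~regex', roughly
mirroring the two apropos operators above. The page names and
index contents are made up, and Python's `re` is only an
approximation of POSIX extended regular expressions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    page: str   # man page the text appears in
    macro: str  # mdoc macro it was marked up with, e.g. "Ev" or "Pa"
    value: str  # the marked-up text itself


# Hypothetical index: page names and contents are made up for illustration.
INDEX = [
    Entry("s6-example-a", "Ev", "GID"),
    Entry("s6-example-b", "Em", "GID"),  # same text, marked as emphasis
    Entry("s6-example-c", "Pa", "notification-fd"),
    Entry("other-tool", "Ev", "GID"),
]


def search(index, query):
    """Evaluate 'Key=val' (substring) or 'Key~regex' against the index."""
    m = re.fullmatch(r"([A-Za-z]+)([=~])(.*)", query)
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    key, op, val = m.groups()
    hits = set()
    for e in index:
        if e.macro != key:
            continue  # text marked up with another macro is never considered
        if (op == "=" and val in e.value) or (op == "~" and re.search(val, e.value)):
            hits.add(e.page)
    return sorted(hits)


if __name__ == "__main__":
    # Roughly: apropos 'Ev=GID' | grep s6-
    print([p for p in search(INDEX, "Ev=GID") if p.startswith("s6-")])
    # Roughly: apropos 'Pa~.*notification-fd'
    print(search(INDEX, "Pa~.*notification-fd"))
```

's6-example-b' mentions the same string, but since it's marked as
emphasis rather than Ev, an Ev query correctly skips it; and text
that was never marked up at all never gets an entry in the first
place.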

FWIW, my suggestion for those interested in converting the
documentation to the One True Format, as decided by Laurent, would
be to leverage my efforts to use semantic markup extensively in the
man pages. Once the s6-man-pages repo is ready, use
`mandoc -T html` to convert the pages to HTML, which will contain
consistent semantic markup (e.g. '<h1 class="Sh" id="DESCRIPTION">').
That HTML can then be parsed and converted to the One True Format,
which would serve as the authoritative source from which both man
pages and HTML can be produced.
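The parsing step could start out something like the following
minimal sketch, again using Python's standard `html.parser`. It
assumes, as the '<h1 class="Sh" ...>' example suggests, that each
semantic element in mandoc's HTML carries its mdoc macro name in
its 'class' attribute; the specific element types in the sample
(`<code class="Ev">`, `<span class="Pa">`) are hand-written
assumptions, not verified mandoc output. `WANTED`,
`MdocClassExtractor` and `extract` are hypothetical names, and
mapping the resulting pairs onto the One True Format is left open.

```python
from html.parser import HTMLParser

# mdoc macros to carry over (an illustrative subset; extend as needed).
WANTED = {"Sh", "Ss", "Nm", "Fl", "Ar", "Ev", "Pa", "Ft", "Fn", "Xr", "Em"}
# HTML void elements never get an end tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MdocClassExtractor(HTMLParser):
    """Collect (macro, text) pairs from elements whose class names a macro."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements: [tag, macro or None, text parts]
        self.items = []  # (macro, text), in the order elements close

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        macro = next((c for c in classes if c in WANTED), None)
        self.stack.append([tag, macro, []])

    def handle_data(self, data):
        for entry in self.stack:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.stack):
            return  # stray end tag (or a void element): ignore it
        # Pop back to the matching element, tolerating unclosed children.
        while self.stack:
            open_tag, macro, parts = self.stack.pop()
            if macro is not None:
                self.items.append((macro, "".join(parts).strip()))
            if open_tag == tag:
                break


def extract(html):
    parser = MdocClassExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    sample = ('<h1 class="Sh" id="ENVIRONMENT">'
              '<a class="permalink" href="#ENVIRONMENT">ENVIRONMENT</a></h1>'
              '<p>Reads <code class="Ev">GID</code>.<br>'
              'Writes to <span class="Pa">notification-fd</span>.</p>')
    for macro, text in extract(sample):
        print(macro, text)
```

The design choice is to key on the class attribute alone and ignore
which element mandoc happened to use, so the extractor doesn't
depend on presentational details that might change between mandoc
versions.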

Alexis.
