9front - general discussion about 9front
From: sl@stanleylieber.com
To: 9front@9front.org
Date: Thu, 9 Aug 2018 21:17:28 -0400
Message-ID: <20180810011728.YSmSUVpRrWMNNwmdbRZFXWxOuLPbdZ9qrmUB9j1zv4Q@z>

WARNING

I'm posting this to the 9front list because only spammers
are subscribed to the werc list.

BACKGROUND

werc/apps/wman/app.rc prints man pages as HTML:

	fn wman_page_gen {
	    #troff -manhtml $1| troff2html -t 'Plan 9 from User Space'
	    troff -N -m$wman_tmac $1 | wman_out_filter
	}

The function wman_default_out_filter then performs some magic to
transform the resulting plain text into markdown, which is in turn
processed by the standard werc handlers.  This produces minimal
HTML (as opposed to the original, commented-out troff pipeline,
which produces hard-to-read HTML containing tables and other
relatively complex structures).

Recently, someone pointed out that wman botches HREF links when troff
-N wraps a line at a hyphen, leaving half of the page name on each line:

	; hget http://man.9front.org/8/venti | grep fmt | sed -n 2,3p
	          were formatted with fmtarenas or fmtisect (see venti-
	          <a href="../8/fmt">fmt(8)</a>). In particular, only the configuration needs to be
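
The link rule runs once per input line, so it never sees the
"venti-" that was left on the previous line.  A rough reproduction,
assuming POSIX sh and a sed that accepts -E (not Plan 9 sed); the
function name link_refs and the hard-coded category list (8) are
stand-ins for illustration, not werc code:

```shell
#!/bin/sh
# Hypothetical stand-in for the link step in wman_default_out_filter.
# The category list is hard-coded to "8" here; werc builds it from
# $wman_cat_list.
link_refs() {
    sed -E 's!([.a-zA-Z0-9-]+)\((8)\)!<a href="../\2/\1">&</a>!g'
}

# Two lines, as troff -N emits them after breaking at the hyphen.
printf '%s\n' '(see venti-' 'fmt(8)). In particular' | link_refs
```

The first line passes through untouched and the second gets a link
to ../8/fmt, which is the breakage shown above.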

FIX

Currently, I have instituted a medium- to low-quality fix
on the running system by inserting a ssam(1) line into the
wman_default_out_filter function:

	fn wman_default_out_filter {
		# col -x syntax is the same for UNIX and Plan 9.
	    escape_html \
	    | ssam 'x/[a-z]+-\n[ ]+[a-z]+\([0-9]\)/s/\n[ ]+//g' \
	    | sed 's!([\.\-a-zA-Z0-9]+)\(('^`{echo $wman_cat_list|tr ' ' '|'}^')\)!<a href="../\2/\1">&</a>!g' \
	    | awk '/^$/ {if(n != 1) print; n=1; next} /./ {n=0; print}' \
	    | col -x
	}

Now we get:

	; hget http://man.9front.org/8/venti | grep fmt | sed -n 2,3p
	          were formatted with fmtarenas or fmtisect (see <a href="../8/venti-fmt">venti-fmt(8)</a>). In particular, only the configuration needs to be
	                       fmtarenas.

WHINING

This sucks for a couple of reasons:

	- ssam(1) creates a temporary file on disk.

	- page formatting is now dicked-up, as we remove
	a newline every time we fix a link.

Plan 9 sed(1) and awk(1) do not recognize the \n shorthand for newline
that is available in sam(1).

It should be possible to address this with awk(1), but I'm out of
time for today.
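
One possible shape for an awk(1) version, as an untested-on-Plan-9
sketch: hold each line back by one, and when the held line ends in
"word-" and the current line starts with indentation plus "word(N)",
move the fragment down to the current line instead of deleting the
newline.  That keeps the line count the same and needs no temporary
file.  The function name rejoin_refs is made up for this example, and
the wrapper is POSIX sh rather than rc; only match(), RSTART, RLENGTH,
substr() and sub() are used, on the assumption that Plan 9 awk
provides them as well.

```shell
#!/bin/sh
# Hypothetical helper, not part of werc: would take the place of the
# ssam line, ahead of the sed link step.
rejoin_refs() {
    awk '
    NR > 1 {
        # Held line ends in "word-" and this line is indentation
        # followed by "word(N)": the same shape the ssam pattern matches.
        if (match(prev, /[a-z]+-$/) && $0 ~ /^[ ]+[a-z]+\([0-9]\)/) {
            frag = substr(prev, RSTART)         # e.g. "venti-"
            prev = substr(prev, 1, RSTART - 1)  # held line minus fragment
            sub(/[ ]+$/, "", prev)              # drop trailing blanks
            match($0, /^[ ]+/)                  # measure the indentation
            $0 = substr($0, 1, RLENGTH) frag substr($0, RLENGTH + 1)
        }
        print prev
    }
    { prev = $0 }                   # hold this line for the next round
    END { if (NR > 0) print prev }  # flush the last held line
    '
}

printf '%s\n' 'fmtisect (see venti-' '    fmt(8)). In particular' | rejoin_refs
```

With that input the first line comes out as "fmtisect (see" and the
second as "    venti-fmt(8)). In particular", so the link rule sees
the whole page name on one line.  The cost is a short line where the
fragment used to be.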

Suggestions welcome.

sl

