From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 9 Aug 2018 21:17:28 -0400
From: sl@stanleylieber.com
To: 9front@9front.org
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
List-ID: <9front.9front.org>
List-Help:
X-Glyph: ➈
X-Bullshit: property module element NoSQL interface
Message-ID: <20180810011728.YSmSUVpRrWMNNwmdbRZFXWxOuLPbdZ9qrmUB9j1zv4Q@z>

WARNING

I'm posting this to the 9front list because only spammers are
subscribed to the werc list.

BACKGROUND

werc/apps/wman/app.rc prints man pages as HTML:

	fn wman_page_gen {
		#troff -manhtml $1| troff2html -t 'Plan 9 from User Space'
		troff -N -m$wman_tmac $1 | wman_out_filter
	}

The function wman_default_out_filter then performs some magic to
transform the resulting plain text into markdown, which is in turn
processed by the standard werc handlers. This produces minimal HTML
(as opposed to the commented-out, original troff pipeline, which
produces hard-to-read HTML containing tables and other relatively
complex structures).

Recently, someone pointed out that wman botches HREF links when
troff -N automatically linewraps because of a dash:

	; hget http://man.9front.org/8/venti | grep fmt | sed -n 2,3p
	were formatted with fmtarenas or fmtisect (see venti-
	fmt(8)). In particular, only the configuration needs to be

FIX

Currently, I have instituted a medium- to low-quality fix on the
running system by inserting a ssam(1) line into the
wman_default_out_filter function:

	fn wman_default_out_filter {
		# col -x syntax is the same for UNIX and Plan 9.
		escape_html \
		| ssam 'x/[a-z]+-\n[ ]+[a-z]+\([0-9]\)/s/\n[ ]+//g' \
		| sed 's!([\.\-a-zA-Z0-9]+)\(('^`{echo $wman_cat_list|tr ' ' '|'}^')\)!&!g' \
		| awk '/^$/ {if(n != 1) print; n=1; next} /./ {n=0; print}' \
		| col -x
	}

Now we get:

	; hget http://man.9front.org/8/venti | grep fmt | sed -n 2,3p
	were formatted with fmtarenas or fmtisect (see venti-fmt(8)). In particular, only the configuration needs to be
	fmtarenas.

WHINING

This sucks for a couple of reasons:

- ssam(1) creates a temporary file on disk.

- page formatting is now dicked-up, as we remove a newline every
  time we fix a link.

Plan 9 sed(1) and awk(1) do not recognize the \n shorthand for
newline that is available in sam(1). It should be possible to address
this with awk(1), but I'm out of time for today.

Suggestions welcome.

sl
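
One possible awk(1) approach, offered as a rough sketch rather than a
drop-in fix: hold each line back by one, and when the held line ends in
a "word-" fragment and the current line starts with indentation followed
by something like fmt(8), move the fragment down to the current line
instead of joining the two lines. This needs no \n in any pattern, uses
no temporary file, and keeps the number of output lines unchanged,
although the second line grows by the length of the fragment. The
function name fix_wrapped_refs is hypothetical, the patterns simply
mirror the [a-z]+ of the ssam(1) line above, and the wrapper is written
in POSIX sh for testing; in app.rc the same awk program would sit in the
rc pipeline in place of the ssam line. It has not been run against Plan
9 awk, though it relies only on match, substr and sub.

```shell
# Hypothetical helper, not part of werc: repair man page references
# that troff -N wrapped after a dash, without changing the line count.
fix_wrapped_refs() {
	awk '
	{
		line = $0
		if (NR > 1) {
			# Does the held line end in a "word-" fragment?
			if (match(prev, /[a-z]+-$/)) {
				start = RSTART
				frag = substr(prev, start)
				# Does this line begin with the rest of a reference?
				if (match(line, /^ +[a-z]+\([0-9]\)/)) {
					# Measure the indentation of this line.
					match(line, /^ +/)
					# Put the fragment right after the indentation.
					line = substr(line, 1, RLENGTH) frag substr(line, RLENGTH + 1)
					# Drop the fragment and trailing blanks from the held line.
					prev = substr(prev, 1, start - 1)
					sub(/ +$/, "", prev)
				}
			}
			print prev
		}
		prev = line
	}
	END { if (NR > 0) print prev }
	'
}
```

Fed the two lines from the venti(8) example, this should leave "(see" at
the end of the first line and start the second with "venti-fmt(8))". If
the fragment were the only text on its line, the held line would come
out empty, which the blank-line squeezing awk stage further down the
pipeline might then swallow.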