discuss@mandoc.bsd.lv
* scdoc2mdoc - Alpha release and a call for help
@ 2019-07-05  3:44 Stephen Gregoratto
  2019-07-06 15:32 ` Ingo Schwarze
From: Stephen Gregoratto @ 2019-07-05  3:44 UTC (permalink / raw)
  To: discuss

[-- Attachment #1: Type: text/plain, Size: 3515 bytes --]

Last month I talked about scdoc, a program that converts a
Markdown-style presentational markup to man(7). I talked to the creator
about rewriting it to output mdoc(7) instead, but he explicitly didn't
want that. So, I've decided to fork[1] it.

I've implemented most of the (simple) changes needed, but there are a
couple problems:
- Text blocks in scdoc are separated by blank lines, like Markdown.
  Currently, the program prints out a ".Pp" for each one, even if it
  isn't needed (e.g. Pp before Sh, Bd etc.). I'll have to track if I'm
  in a paragraph and handle closing it somehow.

- Tables in scdoc are set out cell-by-cell, one per line. You can also
  change the text alignment for each cell (why you'd want to in a
  manpage I wouldn't know). Currently, the program reads the table
  completely and allocates a linked-list of rows, which then contain a
  linked list of cells. At the end of parsing, the program spits out the
  appropriate tbl(7) representation, printing the alignment for each
  cell (even if they don't change) and wrapping each cell in a "T{ ...
  }T" block. This is done in the parse_table() function.

  I'd like to rewrite this to emit a .Bl -column list instead, but I'm
  unsure of the proper way to do this. In my testing, it seems like
  doing something like the following would work:

    .Bl -column 
    .It row-start
    .Ta second cell
    .It row-start
    .Ta second cell
    .Ta third-cell
    .El

  BUT, mandoc -Tlint gives me warnings about this (first macro on line:
  Ta).  Also note that text in cells can be set bold or italic, further
  complicating things. That docbook2mdoc doesn't even try to do this
  doesn't inspire much confidence.

- Bold and italic text are formatted using the \fB/\fI escapes
  (parse_format()). I'll have to change this to keep track of the
  current text "format" and print a "Sy/Em" for each new line.

- There is a bug where the program emits a stray "Ed" after parsing a
  list. This is because the program parses the indentation level
  (parse_indent()) for each line and prints this if it gets lower.  This
  is a holdover from how the program implemented indented
  lists/paragraphs: by chanting some roff voodoo to physically indent
  the blocks by a certain size.
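
One way to handle the first item (a ".Pp" printed for every blank line)
is to defer the paragraph break: a blank line only records that a break
is wanted, and the ".Pp" is written when more text actually follows. A
block macro such as Sh or Bd drops the pending break instead. The
following is a minimal sketch of that idea, not scdoc's code. The struct
and function names are hypothetical, and it assumes the text lines are
already escaped for roff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical emitter state; none of these names come from scdoc. */
struct emitter {
	FILE *out;
	bool pending_pp;	/* blank line seen, .Pp not written yet */
	bool in_text;		/* text written since the last block macro */
};

/* A blank line only requests a paragraph break; nothing is printed. */
static void
emit_blank(struct emitter *e)
{
	assert(e != NULL);
	if (e->in_text)
		e->pending_pp = true;
}

/* Text flushes a pending break first, then prints the line itself. */
static void
emit_text(struct emitter *e, const char *line)
{
	assert(e != NULL && e->out != NULL && line != NULL);
	if (e->pending_pp)
		fputs(".Pp\n", e->out);
	e->pending_pp = false;
	fprintf(e->out, "%s\n", line);
	e->in_text = true;
}

/* Block macros (e.g. ".Sh NAME", ".Bd -literal") discard the break. */
static void
emit_block_macro(struct emitter *e, const char *macro_line)
{
	assert(e != NULL && e->out != NULL && macro_line != NULL);
	e->pending_pp = false;
	e->in_text = false;
	fprintf(e->out, "%s\n", macro_line);
}
```

With this, a blank line directly after Sh or directly before the next Sh
produces no ".Pp", while a blank line between two text blocks still
does. What should happen after a closing macro such as Ed is left open
here.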

Even though I understand these problems, I don't feel that I have the
technical ability to solve them effectively, which is why I've come here
for help. Any comments/patches are greatly appreciated. I've implemented
kristaps@'s oconfigure script, so there should be no problem building on
other platforms (tested on Linux, OpenBSD and Solaris 10).
I've attached my latest changes and a collection of scdoc documents
found in the wild for testing.

As for why I'm doing this, there is one big reason. The creator of scdoc
also maintains a popular Wayland compositor library (wlroots) and the
most popular Wayland compositor behind GNOME and KDE (sway). He's
considered a major developer in Wayland development circles. He also
enforces the use of scdoc for every new project he makes. The result:
many other Wayland projects are starting to use scdoc for their manpages
because of this influence.

Eventually, when Wayland compositors/programs are ported to OpenBSD,
scdoc is going to be included. My hope is to "fix" scdoc to output in a
sane markup language suitable for OpenBSD, similar to pod/docbook2mdoc.

[1] https://git.sgregoratto.me/The-King-of-Toasters/scdoc
-- 
Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B

[-- Attachment #2: scdoc-2.0.0-ALPHA.tgz --]
[-- Type: application/gzip, Size: 17022 bytes --]

[-- Attachment #3: scdoc-documents.tgz --]
[-- Type: application/gzip, Size: 45230 bytes --]


* Re: scdoc2mdoc - Alpha release and a call for help
  2019-07-05  3:44 scdoc2mdoc - Alpha release and a call for help Stephen Gregoratto
@ 2019-07-06 15:32 ` Ingo Schwarze
From: Ingo Schwarze @ 2019-07-06 15:32 UTC (permalink / raw)
  To: Stephen Gregoratto; +Cc: discuss

Hi Stephen,

Stephen Gregoratto wrote on Fri, Jul 05, 2019 at 01:44:15PM +1000:

> Last month I talked about scdoc, a program that converts a
> Markdown-style presentational markup to man(7). I talked to the creator
> about rewriting it to output mdoc(7) instead,

That's a terrible idea.  If upstream is willing to use a good markup
language, they should maintain their documentation directly in
mdoc(7).  That results in great markup power while keeping the
documents easy to maintain.

When they insist on using a bad language for maintaining their
documents (like Markdown or DocBook), converting such bad input
into mdoc(7) will not help much.  The markup power of the result
will still be poor, and the documentation will still be painful
to maintain given the poor source language.

> but he explicitly didn't
> want that. So, I've decided to fork[1] it.
[...]
> - Tables in scdoc are set out cell-by-cell, one per line. You can also
>   change the text alignment for each cell (why you'd want to in a
>   manpage I wouldn't know). Currently, the program reads the table
>   completely and allocates a linked-list of rows, which then contain a
>   linked list of cells. At the end of parsing, the program spits out the
>   appropriate tbl(7) representation, printing the alignment for each
>   cell (even if they don't change) and wrapping each cell in a "T{ ...
>   }T" block. This is done in the parse_table() function.
> 
>   I'd like to rewrite this to output this as a .Bl -column list,

Why?

The .Bl -column does not have more semantic markup power than tbl(7).

If the document as a whole is written in mdoc(7), then using
.Bl -column is more portable and robust than using tbl(7).

But when the overall document quality is very poor in the first
place - which it will unavoidably always be when starting from
some Markdown variant - then such minor benefits in robustness
are irrelevant; the whole thing will always be somewhat fragile
either way.

My suggestion would be to save yourself the work and just stick
to the tbl(7) code you already have.
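
For reference, should the -column route be pursued anyway: the
"first macro on line: Ta" warning quoted earlier in the thread can be
avoided by writing each row on a single .It line with the cells
separated by Ta, which is the form mdoc(7) describes for -column lists.
A minimal sketch in C follows. The helper name emit_column_row() is
hypothetical and not part of scdoc. It assumes the cell text is already
escaped for roff and contains no words that mdoc(7) would parse as
macro names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of scdoc: write one row of a
 * .Bl -column list on a single line, so that Ta never appears
 * as the first macro on a line.
 * Assumption: cells are pre-escaped and contain no macro names.
 */
static void
emit_column_row(FILE *out, const char *const *cells, size_t ncells)
{
	size_t i;

	assert(out != NULL);
	assert(cells != NULL || ncells == 0);

	fputs(".It", out);
	for (i = 0; i < ncells; i++) {
		/* Every cell after the first is introduced by Ta. */
		fprintf(out, "%s %s", i > 0 ? " Ta" : "", cells[i]);
	}
	fputc('\n', out);
}
```

Called with the three cells "row-start", "second cell" and "third-cell",
this writes ".It row-start Ta second cell Ta third-cell". The enclosing
.Bl -column line still needs one width string per column, which is not
shown here.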

> - Bold and italic text are formatted using the \fB/\fI escapes
>   (parse_format()). I'll have to change this to keep track of the
>   current text "format" and print a "Sy/Em" for each new line.

Why?

The macros .Sy and .Em have almost no semantic power, they are
explicitly documented as "physical markup".

Of course, if you write a manual page by hand, don't use font
escapes.  But that hardly applies to an automatic converter.

If you could guess with heuristics (like pod2mdoc(1) does)
which semantic macro to use, *that* would be a real improvement,
but .Sy and .Em don't reach that goal.

If your point were to help half-automatic, half-manual upgrades
from Markdown to mdoc(7) for the result to be maintained afterwards
and the original input discarded, writing .Sy and .Em might make
sense because they are easier to replace manually by the correct
macros.  But when manual postprocessing is not intended and the
original Markdown documents keep being maintained, what's the point
in writing .Sy and .Em?

> Even though I understand these problems, I don't feel that I have the
> technical ability to solve them effectively, which is why I've come here
> for help.

I hope you find help.

Myself, i might look at some point, but not right now, so don't
hold your breath.

If the project comes close to being usable in practice, i shall
list it on the mandoc.bsd.lv frontpage (like docbook2mdoc).
If i forget, poke me when you think it is usable, at the
latest when you make a release.

[...]
> As for why I'm doing this, there is one big reason. The creator of scdoc
> also maintains a popular Wayland compositor library (wlroots) and the
> most popular Wayland compositor behind GNOME and KDE (sway). He's
> considered a major developer in Wayland development circles. He also
> enforces the use of scdoc for every new project he makes.

Yikes.

> The result: many other Wayland projects are starting to use scdoc
> for their manpages because of this influence.
> 
> Eventually, when Wayland compositors/programs are ported to OpenBSD,
> scdoc is going to be included. My hope is to "fix" scdoc to output in a
> sane markup language suitable for OpenBSD, similar to pod/docbook2mdoc.

That might make sense, for similar reasons as Xenocara uses docbook2mdoc.

For now, even if Wayland uses it, scdoc appears to be even more of
a niche product than DocBook, and let's really hope that it will
never become widespread.

Yours,
  Ingo
