discuss@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: Humm <hummsmith42@gmail.com>
Cc: discuss@mandoc.bsd.lv
Subject: Re: \h in PostScript and PDF output
Date: Mon, 10 Jan 2022 08:37:40 +0100
Message-ID: <YdviRErxwF9Bn5n7@asta-kit.de>
In-Reply-To: <YdrqPTQzT+5F+E1+@beryllium.local>

Hello,

Humm wrote on Sun, Jan 09, 2022 at 01:59:25PM +0000:

> Consider the input
> 
> 	.TH A 1
> 	.SH S
> 	a\h'1u'b
> 
> For ASCII and UTF-8 output, in the output there is
> 
> 	ab
> 
> In PDF and PostScript output, the “b” is way too far to the right.  

Thank you for your report.  That is indeed a bug.

I believe that the patch appended below fixes the bug; it survived
my testing without regressions, and I intend to commit it to
openbsd.org and bsd.lv shortly.  Does it work for you, too?

Do you want to be credited in the commit message and in the next release
notes for reporting the bug?  If so, please send me a complete real name.
If you do not send a complete real name, I shall credit you as
  Humm <hummsmith42 at gmail dot com>
in the commit message but probably not mention you at all
in the release notes.  The release notes do not contain mail addresses,
and "Humm" alone isn't really helpful to identify anybody.

> The PostScript for it is
> 
> 	87.274 687.599(a)s
> 	166.408(b)c
> 
> That scales up: With \h'1n', the “b” is already off the page and thus 
> not visible.  For `a\h'1n'b c`, there is a line break before “c”.
> 
> roff(7) does mention: (COMPATIBILITY)
> 
>> Support for explicit movement requests and escapes is limited.

Yes, I think that is still true.  Not exactly non-existent,
but still limited.

Your report was quite useful in an additional respect.  Roff programs
juggle many different units representing lengths, and mandoc is no
different.  For that reason, when first looking at your report and
the related code, I failed to see right away what exactly is going on.
So I decided to draft a table detailing which units are used by which
variables and functions, and in the end, it turned out your bug is
indeed a bug related to unit conversions.  These are the fundamental
units used:

 * struct roffsu (double value with an explicit unit identifier)
   used for example by a2roffsu(), SCALE_HS_INIT(), and others
 * Adobe Font Metrics units (AFM)
   used for example by glyph.wx, ps_advance(), termp_ps.*, and others
 * en units (width of the 'n' glyph == 1)
   used for example by ascii_advance(), html*, manoutput.*, and others
 * AFM or en units in a context-dependent manner
   used for example by a2width(), roffcol.*, term_hen(), term_len(),
   term_strlen(), termp.advance(), termp.width(), and many others
 * units of 1/24 AFM
   used by ps_hspan() and ps_setwidth()
 * terminal basic units (= 1/24 en)
   used by ascii_hspan() and ascii_setwidth()
 * AFM/24 or en/24 units in a context-dependent manner
   used by term_hspan() and term_setwidth()
 * all other units have a fixed relationship to each other;
   for ASCII: 1i = 2.54c = 6v = 6P = 10m = 10n = 72p = 240u = 1000M
   but for PS/PDF: 1v = 1400 AFM, 1m = 778 AFM, 1n = 500 AFM;
   the rest is regular: 1i = 6545 AFM, ..., 1u = 27.27 AFM

I'm not going to bore you with the full table; it is significantly
more complicated.

While researching this table, I found nineteen (19!) additional
candidate places where I strongly suspect that a call to term_len()
is either missing or this function is called incorrectly, resulting in
more or less wrong positioning of PostScript and PDF output (ASCII,
UTF-8, and HTML output are not wrong in these cases).  I'll have
to further inspect, fix, and test those nineteen places one by one.
Uh oh...

I freely admit that while mandoc PostScript and PDF output is usable
for many simple purposes, its quality is significantly below the
quality of ASCII, UTF-8, and HTML output in several respects:
some of these shortcomings are conceptual, and bugs are also more
numerous for PostScript and PDF.

> I’m using Alpine Linux’s build of mandoc 1.14.6.

In general, thank you for mentioning that.  In this case,
fortunately, the problem was easily reproducible on -current.

> The context is

Thank you for mentioning the context; it often provides information
relevant for development, and sometimes for choosing priorities.

> the way the (-)man page generator scdoc
> ( https://git.sr.ht/~sircmpwn/scdoc )

Yikes, that one yet again.  While it is not quite as atrocious
as DocBook, it is well-known to be in the lower regions of the
quality scale of man(7) code generators.  So please avoid using
it if you can (of course, when porting third-party software to
Alpine, you may not get any chance to avoid it).

> handles lists: A list item
> 
> 	- a
> 
> becomes
> 
> 	.RS 4
> 	.ie n \{\
> 	\h'-04'\(bu\h'+03'\c
> 	.\}
> 	.el \{\
> 	.IP \(bu 4
> 	.\}
> 	a
> 	.RE
> 
> apparently for the looks.

Right, that's incredibly stupid.  I hate it when people who do not
understand the man(7) language go ahead and write man(7) code
generators anyway.  In this particular case, the man(7) language
provides the .TP macro for just that purpose, so mucking around
with .RS and rather fragile low-level roff(7) code is quite absurd.

That said, that's of course no excuse for mandoc mishandling \h
in PostScript and PDF output mode.

Yours,
  Ingo


Index: term.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/term.c,v
retrieving revision 1.144
diff -u -p -r1.144 term.c
--- term.c	4 Oct 2021 18:56:24 -0000	1.144
+++ term.c	10 Jan 2022 06:35:29 -0000
@@ -1,6 +1,6 @@
 /* $OpenBSD: term.c,v 1.144 2021/10/04 18:56:24 schwarze Exp $ */
 /*
- * Copyright (c) 2010-2021 Ingo Schwarze <schwarze@openbsd.org>
+ * Copyright (c) 2010-2022 Ingo Schwarze <schwarze@openbsd.org>
  * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -634,12 +634,14 @@ term_word(struct termp *p, const char *w
 			if (a2roffsu(seq, &su, SCALE_EM) == NULL)
 				continue;
 			uc += term_hen(p, &su);
-			if (uc > 0)
-				while (uc-- > 0)
+			if (uc > 0) {
+				while (uc > 0) {
 					bufferc(p, ASCII_NBRSP);
-			else if (p->col > (size_t)(-uc))
+					uc -= term_len(p, 1);
+				}
+			} else if (p->col > (size_t)(-uc)) {
 				p->col += uc;
-			else {
+			} else {
 				uc += p->col;
 				p->col = 0;
 				if (p->tcol->offset > (size_t)(-uc)) {


Thread overview: 4+ messages
2022-01-09 13:59 Humm
2022-01-10  7:37 ` Ingo Schwarze [this message]
2022-01-10 16:45   ` Humm
2022-01-10 18:50     ` Ingo Schwarze
