discuss@mandoc.bsd.lv
* \h in PostScript and PDF output
@ 2022-01-09 13:59 Humm
  2022-01-10  7:37 ` Ingo Schwarze
  0 siblings, 1 reply; 4+ messages in thread
From: Humm @ 2022-01-09 13:59 UTC (permalink / raw)
  To: discuss

Consider the input

	.TH A 1
	.SH S
	a\h'1u'b

For ASCII and UTF-8 output, in the output there is

	ab

In PDF and PostScript output, the “b” is way too far to the right.  
The PostScript for it is

	87.274 687.599(a)s
	166.408(b)c

That scales up: With \h'1n', the “b” is already off the page and thus 
not visible.  For `a\h'1n'b c`, there is a line break before “c”.
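
As a rough cross-check of the two x coordinates above, the sketch below assumes things the message does not state: a font size of 11 points, 1000 AFM units per em, and Times-Roman glyph widths of 444 AFM units for "a" and 250 for a space. The helper names are hypothetical and not part of mandoc. Under those assumptions, the observed gap of 166.408 - 87.274 = 79.134 points would correspond to the "a" followed by 27 spaces, which fits the figure of 1u = 27.27 AFM given in the reply below.

```c
#include <assert.h>

#define FONT_SIZE_PT	11.0	/* assumption: font size in points */
#define AFM_PER_EM	1000.0	/* AFM units per em */
#define WX_A		444.0	/* assumption: Times-Roman width of "a" */
#define WX_SPACE	250.0	/* assumption: Times-Roman width of a space */

/* Hypothetical helper: convert a width in AFM units to points. */
static double
afm_to_points(double afm)
{
	assert(afm >= 0.0);
	return afm * FONT_SIZE_PT / AFM_PER_EM;
}

/*
 * Hypothetical helper: distance in points from the start of "a"
 * to the start of "b" when nspaces spaces sit between them.
 */
static double
gap_points(int nspaces)
{
	assert(nspaces >= 0);
	return afm_to_points(WX_A + (double)nspaces * WX_SPACE);
}
```

Calling gap_points(27) gives about 79.134, the same as the difference between the two coordinates in the PostScript output.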

roff(7) does mention: (COMPATIBILITY)

>Support for explicit movement requests and escapes is limited.

I’m using Alpine Linux’s build of mandoc 1.14.6.

---

The context is the way the (-)man page generator scdoc
( https://git.sr.ht/~sircmpwn/scdoc ) handles lists: A list item

	- a

becomes

	.RS 4
	.ie n \{\
	\h'-04'\(bu\h'+03'\c
	.\}
	.el \{\
	.IP \(bu 4
	.\}
	a
	.RE

apparently for the looks.

-- 
Humm
--
 To unsubscribe send an email to discuss+unsubscribe@mandoc.bsd.lv



* Re: \h in PostScript and PDF output
  2022-01-09 13:59 \h in PostScript and PDF output Humm
@ 2022-01-10  7:37 ` Ingo Schwarze
  2022-01-10 16:45   ` Humm
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Schwarze @ 2022-01-10  7:37 UTC (permalink / raw)
  To: Humm; +Cc: discuss

Hello,

Humm wrote on Sun, Jan 09, 2022 at 01:59:25PM +0000:

> Consider the input
> 
> 	.TH A 1
> 	.SH S
> 	a\h'1u'b
> 
> For ASCII and UTF-8 output, in the output there is
> 
> 	ab
> 
> In PDF and PostScript output, the “b” is way too far to the right.  

Thank you for your report.  That is indeed a bug.

I believe that the patch appended below fixes the bug; it survived
my testing without finding regressions, and i intend to commit it to
openbsd.org and bsd.lv shortly.  Does it work for you, too?

Do you want to be credited in the commit message and in the next release
notes for reporting the bug?  If so, please send me a complete real name.
If you do not send a complete real name, i shall credit you as
  Humm <hummsmith42 at gmail dot com>
in the commit message but probably not mention you at all
in the release notes.  The release notes do not contain mail addresses,
and "Humm" alone isn't really helpful to identify anybody.

> The PostScript for it is
> 
> 	87.274 687.599(a)s
> 	166.408(b)c
> 
> That scales up: With \h'1n', the “b” is already off the page and thus 
> not visible.  For `a\h'1n'b c`, there is a line break before “c”.
> 
> roff(7) does mention: (COMPATIBILITY)
> 
>> Support for explicit movement requests and escapes is limited.

Yes, i think that is still true.  Not exactly non-existent,
but still limited.

Your report was quite useful in an additional respect.  Roff programs
juggle many different units representing lengths, and mandoc is no
different.  For that reason, when first looking at your report and
the related code, i failed to see right away what exactly is going on.
So i decided to draft a table detailing which units are used by which
variables and functions, and in the end, it turned out your bug is
indeed a bug related to unit conversions.  These are the fundamental
units used:

 * struct roffsu (double value with an explicit unit identifier)
   used for example by a2roffsu(), SCALE_HS_INIT(), and others
 * Adobe Font Metrics units (AFM)
   used for example by glyph.wx, ps_advance(), termp_ps.*, and others
 * en units (width of the 'n' glyph == 1)
   used for example by ascii_advance(), html*, manoutput.*, and others
 * AFM or en units in a context-dependent manner
   used for example by a2width(), roffcol.*, term_hen(), term_len(),
   term_strlen(), termp.advance(), termp.width(), and many others
 * units of 1/24 AFM
   used by ps_hspan() and ps_setwidth()
 * terminal basic units (= 1/24 en)
   used by ascii_hspan() and ascii_setwidth()
 * AFM/24 or en/24 units in a context-dependent manner
   used by term_hspan() and term_setwidth()
 * all other units have a fixed relationship to each other;
   for ASCII: 1i = 2.54c = 6v = 6P = 10m = 10n = 72p = 240u = 1000M
   but for PS/PDF: 1v = 1400 AFM, 1m = 778 AFM, 1n = 500 AFM;
   the rest is regular: 1i = 6545 AFM, ..., 1u = 27.27 AFM
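
The "regular" PS/PDF relationships in the last item can be sketched as a small conversion. This is only an illustration built from the figures quoted above (1i = 2.54c = 6P = 72p = 240u and 1i = 6545 AFM); the enum and the function to_afm() are hypothetical names and do not exist in mandoc. The units v, m, n, and M are left out, v, m, and n because they have their own AFM values in PS/PDF mode and M because the list does not say how it maps.

```c
#include <assert.h>

/* Units assumed to keep a fixed ratio to the inch in PS/PDF mode. */
enum fixed_unit { UNIT_I, UNIT_C, UNIT_PICA, UNIT_P, UNIT_U };

#define AFM_PER_INCH	6545.0	/* rounded figure quoted in the list */

/* Hypothetical helper, not mandoc code: convert a length to AFM units. */
static double
to_afm(double value, enum fixed_unit unit)
{
	/* How many of each unit make up one inch: i, c, P, p, u. */
	static const double per_inch[] = { 1.0, 2.54, 6.0, 72.0, 240.0 };

	assert((int)unit >= 0 && (int)unit <= (int)UNIT_U);
	return value * AFM_PER_INCH / per_inch[unit];
}
```

For example, to_afm(1.0, UNIT_U) comes out at roughly 27.27, matching the last line of the list.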

I'm not going to bore you with the full table; it is significantly
more complicated.

While researching this table, i found nineteen (19!) additional
candidate places where i strongly suspect that a call to term_len()
is either missing or this function is called incorrectly, resulting in
more or less wrong positioning of PostScript and PDF output (ASCII,
UTF-8, and HTML output are not wrong in these cases).  I'll have
to further inspect, fix, and test those nineteen places one by one.
Uh oh...

I freely admit that while mandoc PostScript and PDF output is usable
for many simple purposes, its quality is significantly below the
quality of ASCII, UTF-8, and HTML output, in several respects -
some of these respects are conceptual, and bugs are also more
numerous for PostScript and PDF.

> I’m using Alpine Linux’s build of mandoc 1.14.6.

In general, thank you for mentioning that.  In this case,
fortunately, the problem was easily reproducible on -current.

> The context is

Thank you for mentioning the context; that often provides information
relevant for development, and sometimes for choosing priorities.

> the way the (-)man page generator scdoc
> ( https://git.sr.ht/~sircmpwn/scdoc )

Yikes, that one yet again.  While it is not quite as atrocious
as DocBook, it is well-known to be in the lower regions of the
quality scale of man(7) code generators.  So please avoid using
it if you can (of course, when porting third-party software to
Alpine, you may not get any chance to avoid it).

> handles lists: A list item
> 
> 	- a
> 
> becomes
> 
> 	.RS 4
> 	.ie n \{\
> 	\h'-04'\(bu\h'+03'\c
> 	.\}
> 	.el \{\
> 	.IP \(bu 4
> 	.\}
> 	a
> 	.RE
> 
> apparently for the looks.

Right, that's incredibly stupid.  I hate it when people who do not
understand the man(7) language go ahead and write man(7) code
generators anyway.  In this particular case, the man(7) language
provides the .TP macro for just that purpose, so mucking around
with .RS and rather fragile low-level roff(7) code is quite absurd.

That said, that's of course no excuse for mandoc mishandling \h
in PostScript and PDF output mode.

Yours,
  Ingo


Index: term.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/term.c,v
retrieving revision 1.144
diff -u -p -r1.144 term.c
--- term.c	4 Oct 2021 18:56:24 -0000	1.144
+++ term.c	10 Jan 2022 06:35:29 -0000
@@ -1,6 +1,6 @@
 /* $OpenBSD: term.c,v 1.144 2021/10/04 18:56:24 schwarze Exp $ */
 /*
- * Copyright (c) 2010-2021 Ingo Schwarze <schwarze@openbsd.org>
+ * Copyright (c) 2010-2022 Ingo Schwarze <schwarze@openbsd.org>
  * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -634,12 +634,14 @@ term_word(struct termp *p, const char *w
 			if (a2roffsu(seq, &su, SCALE_EM) == NULL)
 				continue;
 			uc += term_hen(p, &su);
-			if (uc > 0)
-				while (uc-- > 0)
+			if (uc > 0) {
+				while (uc > 0) {
 					bufferc(p, ASCII_NBRSP);
-			else if (p->col > (size_t)(-uc))
+					uc -= term_len(p, 1);
+				}
+			} else if (p->col > (size_t)(-uc)) {
 				p->col += uc;
-			else {
+			} else {
 				uc += p->col;
 				p->col = 0;
 				if (p->tcol->offset > (size_t)(-uc)) {
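
The effect of the changed loop can be sketched in isolation. The two functions below are hypothetical stand-ins, not mandoc code: they only count how many non-breaking spaces each version of the loop would buffer. The parameter en_width stands in for term_len(p, 1), which is assumed here to be 1 for terminal output and 500 for PostScript and PDF output, following the figure 1n = 500 AFM given above.

```c
#include <assert.h>

/* Old loop: treats uc as a count of character cells. */
static int
spaces_before(int uc)
{
	int n = 0;

	while (uc-- > 0)
		n++;		/* stands in for bufferc(p, ASCII_NBRSP) */
	return n;
}

/* Patched loop: each buffered space uses up one en of the distance. */
static int
spaces_after(int uc, int en_width)
{
	int n = 0;

	assert(en_width > 0);
	while (uc > 0) {
		n++;		/* stands in for bufferc(p, ASCII_NBRSP) */
		uc -= en_width;	/* stands in for term_len(p, 1) */
	}
	return n;
}
```

Assuming term_hen() yields about 27 for \h'1u' in PostScript mode (1u = 27.27 AFM), spaces_before(27) is 27 while spaces_after(27, 500) is 1. With en_width set to 1, both functions return the same count, which is consistent with terminal output being unaffected by the patch.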

* Re: \h in PostScript and PDF output
  2022-01-10  7:37 ` Ingo Schwarze
@ 2022-01-10 16:45   ` Humm
  2022-01-10 18:50     ` Ingo Schwarze
  0 siblings, 1 reply; 4+ messages in thread
From: Humm @ 2022-01-10 16:45 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: discuss

Quoth Ingo Schwarze:
>I believe that the patch appended below fixes the bug; it survived
>my testing without finding regressions, and i intend to commit it to
>openbsd.org and bsd.lv shortly.  Does it work for you, too?

It does.

>Do you want to be credited in the commit message and in the next release
>notes for reporting the bug?  If so, please send me a complete real name.
>If you do not send a complete real name, i shall credit you as
>  Humm <hummsmith42 at gmail dot com>
>in the commit message but probably not mention you at all
>in the release notes.  The release notes do not contain mail addresses,
>and "Humm" alone isn't really helpful to identify anybody.

Sure, Lennart Jablonka it is then.

>While researching this table, i found nineteen (19!) additional
>candidate places where i strongly suspect that a call to term_len()
>is either missing or this function is called incorrectly, resulting in
>more or less wrong positioning of PostScript and PDF output (ASCII,
>UTF-8, and HTML output are not wrong in these cases).  I'll have
>to further inspect, fix, and test those nineteen places one by one.
>Uh oh...

Good luck have fun!

>I freely admit that while mandoc PostScript and PDF output is usable
>for many simple purposes, its quality is significantly below the
>quality of ASCII, UTF-8, and HTML output, in several respects -
>some of these respects are conceptual, and bugs are also more
>numerous for PostScript and PDF.

Yeah, the PDF and PostScript output doesn’t look as beautiful as I have 
come to expect from computerized typesetting.

>Right, that's incredibly stupid.  I hate it when people who do not
>understand the man(7) language go ahead and write man(7) code
>generators anyway.  In this particular case, the man(7) language
>provides the .TP macro for just that purpose, so mucking around
>with .RS and rather fragile low-level roff(7) code is quite absurd.

I agree.  .TP introduces vertical space between paragraphs.  For some 
reason, the author cares about that/doesn’t want it.

-- 
Humm

* Re: \h in PostScript and PDF output
  2022-01-10 16:45   ` Humm
@ 2022-01-10 18:50     ` Ingo Schwarze
  0 siblings, 0 replies; 4+ messages in thread
From: Ingo Schwarze @ 2022-01-10 18:50 UTC (permalink / raw)
  To: Humm; +Cc: discuss

Hello Lennart,

Humm wrote on Mon, Jan 10, 2022 at 04:45:39PM +0000:

> Quoth Ingo Schwarze:

>> I believe that the patch appended below fixes the bug; it survived
>> my testing without finding regressions, and i intend to commit it to
>> openbsd.org and bsd.lv shortly.  Does it work for you, too?

> It does.

Thank you for testing!

Committed, see below.

[...]
>> I'll have to further inspect, fix, and test those nineteen places
>> one by one.  Uh oh...

> Good luck have fun!

Heh, thanks, but i could imagine less tedious work.  :-|
In particular since i'm not aware of any reasonable way
to do regression testing of PostScript or PDF output...

[...]
> Yeah, the PDF and PostScript output doesn’t look as beautiful
> as I have come to expect from computerized typesetting.

Exactly.  Mandoc is not a typesetting system but a manual page
formatter focussing on terminal and HTML output.  When i want
real typesetting of manual pages to PostScript or PDF, even i use
groff.

[...]
> I agree.  .TP introduces vertical space between paragraphs.  For some 
> reason, the author cares about that/doesn’t want it.

That's what the (reasonably portable) .PD macro is designed for.
It's mildly discouraged because it is a purely presentational macro,
but nothing much is wrong with using it - after all, the whole man(7)
language mostly provides presentational rather than semantic markup.

   $ man -T ascii -O width=72 -s 7 man
     PD   Specify the vertical space to be inserted before each new
          paragraph.
          The syntax is as follows:

                .PD [height]

          The height argument is a roff(7) scaling width.  It defaults
          to 1v.  If the unit is omitted, v is assumed.

          This macro affects the spacing before any subsequent instances
          of HP, IP, LP, P, PP, SH, SS, SY, and TP.

Or alternatively, if you don't care that much about portability,
are content with groff and mandoc as supported formatters, and want
better readability of the man(7) source code, you could use:

     TQ   Like TP, except that no vertical spacing is inserted before
          the paragraph.  This is a non-standard GNU extension and very
          rarely used even in GNU manual pages.

Resorting to low-level roff(7) is just nuts and even more fragile
than using a GNU extension like .TQ.

Yours,
  Ingo


Log Message:
-----------
When rendering the \h (horizontal motion) low-level roff(7) escape 
sequence in -T ps and -T pdf output mode, use an appropriate
horizontal distance by correctly using the term_len() utility
function.  Output from the -T ascii, -T utf8, and -T html modes
was already correct and remains unchanged.

Lennart Jablonka <hummsmith42 at gmail dot com> found and reported 
this unit conversion bug (misinterpreting AFM units as if they were 
en units) when rendering scdoc-generated manuals (which is a low
quality generator, but that's no excuse for mandoc misformatting \h) 
on Alpine Linux.  Lennart also tested this patch.

Modified Files:
--------------
    mandoc:
        term.c

Revision Data
-------------
Index: term.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/term.c,v
retrieving revision 1.284
retrieving revision 1.285
diff -Lterm.c -Lterm.c -u -p -r1.284 -r1.285
--- term.c
+++ term.c
@@ -1,6 +1,6 @@
 /* $Id$ */
 /*
- * Copyright (c) 2010-2021 Ingo Schwarze <schwarze@openbsd.org>
+ * Copyright (c) 2010-2022 Ingo Schwarze <schwarze@openbsd.org>
  * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -636,12 +636,14 @@ term_word(struct termp *p, const char *w
 			if (a2roffsu(seq, &su, SCALE_EM) == NULL)
 				continue;
 			uc += term_hen(p, &su);
-			if (uc > 0)
-				while (uc-- > 0)
+			if (uc > 0) {
+				while (uc > 0) {
 					bufferc(p, ASCII_NBRSP);
-			else if (p->col > (size_t)(-uc))
+					uc -= term_len(p, 1);
+				}
+			} else if (p->col > (size_t)(-uc)) {
 				p->col += uc;
-			else {
+			} else {
 				uc += p->col;
 				p->col = 0;
 				if (p->tcol->offset > (size_t)(-uc)) {