* \h in PostScript and PDF output
@ 2022-01-09 13:59 Humm
2022-01-10 7:37 ` Ingo Schwarze
0 siblings, 1 reply; 4+ messages in thread
From: Humm @ 2022-01-09 13:59 UTC (permalink / raw)
To: discuss
Consider the input
.TH A 1
.SH S
a\h'1u'b
For ASCII and UTF-8 output, in the output there is
ab
In PDF and PostScript output, the “b” is way too far to the right.
The PostScript for it is
87.274 687.599(a)s
166.408(b)c
That scales up: With \h'1n', the “b” is already off the page and thus
not visible. For `a\h'1n'b c`, there is a line break before “c”.
roff(7) does mention: (COMPATIBILITY)
>Support for explicit movement requests and escapes is limited.
I’m using Alpine Linux’s build of mandoc 1.14.6.
---
The context is the way the (-)man page generator scdoc
( https://git.sr.ht/~sircmpwn/scdoc ) handles lists: A list item
- a
becomes
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.IP \(bu 4
.\}
a
.RE
apparently for the looks.
--
Humm
--
To unsubscribe send an email to discuss+unsubscribe@mandoc.bsd.lv
* Re: \h in PostScript and PDF output
2022-01-09 13:59 \h in PostScript and PDF output Humm
@ 2022-01-10 7:37 ` Ingo Schwarze
2022-01-10 16:45 ` Humm
0 siblings, 1 reply; 4+ messages in thread
From: Ingo Schwarze @ 2022-01-10 7:37 UTC (permalink / raw)
To: Humm; +Cc: discuss

Hello,

Humm wrote on Sun, Jan 09, 2022 at 01:59:25PM +0000:

> Consider the input
>
> .TH A 1
> .SH S
> a\h'1u'b
>
> For ASCII and UTF-8 output, in the output there is
>
> ab
>
> In PDF and PostScript output, the “b” is way too far to the right.

Thank you for your report. That is indeed a bug.

I believe that the patch appended below fixes the bug; it survived
my testing without finding regressions, and i intend to commit it to
openbsd.org and bsd.lv shortly. Does it work for you, too?

Do you want to be credited in the commit message and in the next release
notes for reporting the bug? If so, please send me a complete real name.
If you do not send a complete real name, i shall credit you as
  Humm <hummsmith42 at gmail dot com>
in the commit message but probably not mention you at all
in the release notes. The release notes do not contain mail addresses,
and "Humm" alone isn't really helpful to identify anybody.

> The PostScript for it is
>
> 87.274 687.599(a)s
> 166.408(b)c
>
> That scales up: With \h'1n', the “b” is already off the page and thus
> not visible. For `a\h'1n'b c`, there is a line break before “c”.
>
> roff(7) does mention: (COMPATIBILITY)
>
>> Support for explicit movement requests and escapes is limited.

Yes, i think that is still true.
Not exactly non-existent, but still limited.

Your report was quite useful in an additional respect.
Roff programs juggle many different units representing lengths,
and mandoc is no different. For that reason, when first looking
at your report and the related code, i failed to see right away
what exactly is going on.
So i decided to draft a table detailing which units are used by which
variables and functions, and in the end, it turned out your bug is
indeed a bug related to unit conversions.

These are the fundamental units used:

 * struct roffsu (double value with an explicit unit identifier)
   used for example by a2roffsu(), SCALE_HS_INIT(), and others
 * Adobe Font Metrics units (AFM)
   used for example by glyph.wx, ps_advance(), termp_ps.*, and others
 * en units (width of the 'n' glyph == 1)
   used for example by ascii_advance(), html*, manoutput.*, and others
 * AFM or en units in a context-dependent manner
   used for example by a2width(), roffcol.*, term_hen(), term_len(),
   term_strlen(), termp.advance(), termp.width(), and many others
 * units of 1/24 AFM
   used by ps_hspan() and ps_setwidth()
 * terminal basic units (= 1/24 en)
   used by ascii_hspan() and ascii_setwidth()
 * AFM/24 or en/24 units in a context-dependent manner
   used by term_hspan() and term_setwidth()
 * all other units have a fixed relationship to each other;
   for ASCII: 1i = 2.54c = 6v = 6P = 10m = 10n = 72p = 240u = 1000M
   but for PS/PDF: 1v = 1400 AFM, 1m = 778 AFM, 1n = 500 AFM;
   the rest is regular: 1i = 6545 AFM, ..., 1u = 27.27 AFM

I'm not going to bore you with the full table; it is significantly
more complicated.

While researching this table, i found nineteen (19!) additional
candidate places where i strongly suspect that a call to term_len()
is either missing or this function is called incorrectly, resulting in
more or less wrong positioning of PostScript and PDF output (ASCII,
UTF-8, and HTML output are not wrong in these cases). I'll have
to further inspect, fix, and test those nineteen places one by one.
Uh oh...

I freely admit that while mandoc PostScript and PDF output is usable
for many simple purposes, its quality is significantly below the
quality of ASCII, UTF-8, and HTML output, in several respects -
some of these respects are conceptual, and bugs are also more
numerous for PostScript and PDF.
> I’m using Alpine Linux’s build of mandoc 1.14.6.

In general, thank you for mentioning that.
In this case, fortunately, the problem was easily reproducible
on -current.

> The context is

Thank you for mentioning the context; that often provides information
relevant for development, and sometimes for choosing priorities.

> the way the (-)man page generator scdoc
> ( https://git.sr.ht/~sircmpwn/scdoc )

Yikes, that one yet again. While it is not quite as atrocious as
DocBook, it is well-known to be in the lower regions of the quality
scale of man(7) code generators. So please avoid using it if you can
(of course, when porting third-party software to Alpine, you may
not get any chance to avoid it).

> handles lists: A list item
>
> - a
>
> becomes
>
> .RS 4
> .ie n \{\
> \h'-04'\(bu\h'+03'\c
> .\}
> .el \{\
> .IP \(bu 4
> .\}
> a
> .RE
>
> apparently for the looks.

Right, that's incredibly stupid. I hate it when people who do not
understand the man(7) language go ahead and write man(7) code
generators anyway. In this particular case, the man(7) language
provides the .TP macro for just that purpose, so mucking around
with .RS and rather fragile low-level roff(7) code is quite absurd.

That said, that's of course no excuse for mandoc mishandling \h
in PostScript and PDF output mode.
Yours,
  Ingo


Index: term.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/term.c,v
retrieving revision 1.144
diff -u -p -r1.144 term.c
--- term.c	4 Oct 2021 18:56:24 -0000	1.144
+++ term.c	10 Jan 2022 06:35:29 -0000
@@ -1,6 +1,6 @@
 /* $OpenBSD: term.c,v 1.144 2021/10/04 18:56:24 schwarze Exp $ */
 /*
- * Copyright (c) 2010-2021 Ingo Schwarze <schwarze@openbsd.org>
+ * Copyright (c) 2010-2022 Ingo Schwarze <schwarze@openbsd.org>
  * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -634,12 +634,14 @@ term_word(struct termp *p, const char *w
 			if (a2roffsu(seq, &su, SCALE_EM) == NULL)
 				continue;
 			uc += term_hen(p, &su);
-			if (uc > 0)
-				while (uc-- > 0)
+			if (uc > 0) {
+				while (uc > 0) {
 					bufferc(p, ASCII_NBRSP);
-			else if (p->col > (size_t)(-uc))
+					uc -= term_len(p, 1);
+				}
+			} else if (p->col > (size_t)(-uc)) {
 				p->col += uc;
-			else {
+			} else {
 				uc += p->col;
 				p->col = 0;
 				if (p->tcol->offset > (size_t)(-uc)) {
* Re: \h in PostScript and PDF output
2022-01-10 7:37 ` Ingo Schwarze
@ 2022-01-10 16:45 ` Humm
2022-01-10 18:50 ` Ingo Schwarze
0 siblings, 1 reply; 4+ messages in thread
From: Humm @ 2022-01-10 16:45 UTC (permalink / raw)
To: Ingo Schwarze; +Cc: discuss

Quoth Ingo Schwarze:
>I believe that the patch appended below fixes the bug; it survived
>my testing without finding regressions, and i intend to commit it to
>openbsd.org and bsd.lv shortly. Does it work for you, too?

It does.

>Do you want to be credited in the commit message and in the next release
>notes for reporting the bug? If so, please send me a complete real name.
>If you do not send a complete real name, i shall credit you as
>  Humm <hummsmith42 at gmail dot com>
>in the commit message but probably not mention you at all
>in the release notes. The release notes do not contain mail addresses,
>and "Humm" alone isn't really helpful to identify anybody.

Sure, Lennart Jablonka it is then.

>While researching this table, i found nineteen (19!) additional
>candidate places where i strongly suspect that a call to term_len()
>is either missing or this function is called incorrectly, resulting in
>more or less wrong positioning of PostScript and PDF output (ASCII,
>UTF-8, and HTML output are not wrong in these cases). I'll have
>to further inspect, fix, and test those nineteen places one by one.
>Uh oh...

Good luck have fun!

>I freely admit that while mandoc PostScript and PDF output is usable
>for many simple purposes, its quality is significantly below the
>quality of ASCII, UTF-8, and HTML output, in several respects -
>some of these respects are conceptual, and bugs are also more
>numerous for PostScript and PDF.

Yeah, the PDF and PostScript output doesn’t look as beautiful as I have
come to expect from computerized typesetting.

>Right, that's incredibly stupid. I hate it when people who do not
>understand the man(7) language go ahead and write man(7) code
>generators anyway.
>In this particular case, the man(7) language
>provides the .TP macro for just that purpose, so mucking around
>with .RS and rather fragile low-level roff(7) code is quite absurd.

I agree. .TP introduces vertical space between paragraphs. For some
reason, the author cares about that/doesn’t want it.

--
Humm
* Re: \h in PostScript and PDF output
2022-01-10 16:45 ` Humm
@ 2022-01-10 18:50 ` Ingo Schwarze
0 siblings, 0 replies; 4+ messages in thread
From: Ingo Schwarze @ 2022-01-10 18:50 UTC (permalink / raw)
To: Humm; +Cc: discuss

Hello Lennart,

Humm wrote on Mon, Jan 10, 2022 at 04:45:39PM +0000:
> Quoth Ingo Schwarze:

>> I believe that the patch appended below fixes the bug; it survived
>> my testing without finding regressions, and i intend to commit it to
>> openbsd.org and bsd.lv shortly. Does it work for you, too?

> It does.

Thank you for testing! Committed, see below.

[...]
>> I'll have to further inspect, fix, and test those nineteen places
>> one by one. Uh oh...

> Good luck have fun!

Heh, thanks, but i could imagine less tedious work. :-|
In particular since i'm not aware of any reasonable way to do
regression testing of PostScript or PDF output...

[...]
> Yeah, the PDF and PostScript output doesn’t look as beautiful
> as I have come to expect from computerized typesetting.

Exactly. Mandoc is not a typesetting system but a manual page
formatter focussing on terminal and HTML output. When i want
real typesetting of manual pages to PostScript or PDF, even i
use groff.

[...]
> I agree. .TP introduces vertical space between paragraphs. For some
> reason, the author cares about that/doesn’t want it.

That's what the (reasonably portable) .PD macro is designed for.
It's mildly discouraged because it is a purely presentational macro,
but nothing much is wrong with using it - after all, the whole man(7)
language mostly provides presentational rather than semantic markup.

   $ man -T ascii -O width=72 -s 7 man

     PD      Specify the vertical space to be inserted before each new
             paragraph. The syntax is as follows:

                   .PD [height]

             The height argument is a roff(7) scaling width. It
             defaults to 1v. If the unit is omitted, v is assumed.

             This macro affects the spacing before any subsequent
             instances of HP, IP, LP, P, PP, SH, SS, SY, and TP.
Or alternatively, if you don't care that much about portability,
are content with groff and mandoc as supported formatters, and
want better readability of the man(7) source code, you could use:

     TQ      Like TP, except that no vertical spacing is inserted before
             the paragraph. This is a non-standard GNU extension and
             very rarely used even in GNU manual pages.

Resorting to low-level roff(7) is just nuts and even more fragile
than using a GNU extension like .TQ.

Yours,
  Ingo


Log Message:
-----------
When rendering the \h (horizontal motion) low-level roff(7) escape
sequence in -T ps and -T pdf output mode, use an appropriate
horizontal distance by correctly using the term_len() utility function.
Output from the -T ascii, -T utf8, and -T html modes was already correct
and remains unchanged.

Lennart Jablonka <hummsmith42 at gmail dot com> found and reported
this unit conversion bug (misinterpreting AFM units as if they were
en units) when rendering scdoc-generated manuals (which is a low
quality generator, but that's no excuse for mandoc misformatting \h)
on Alpine Linux. Lennart also tested this patch.

Modified Files:
--------------
    mandoc:
        term.c

Revision Data
-------------
Index: term.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/term.c,v
retrieving revision 1.284
retrieving revision 1.285
diff -Lterm.c -Lterm.c -u -p -r1.284 -r1.285
--- term.c
+++ term.c
@@ -1,6 +1,6 @@
 /* $Id$ */
 /*
- * Copyright (c) 2010-2021 Ingo Schwarze <schwarze@openbsd.org>
+ * Copyright (c) 2010-2022 Ingo Schwarze <schwarze@openbsd.org>
  * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -636,12 +636,14 @@ term_word(struct termp *p, const char *w
 			if (a2roffsu(seq, &su, SCALE_EM) == NULL)
 				continue;
 			uc += term_hen(p, &su);
-			if (uc > 0)
-				while (uc-- > 0)
+			if (uc > 0) {
+				while (uc > 0) {
 					bufferc(p, ASCII_NBRSP);
-			else if (p->col > (size_t)(-uc))
+					uc -= term_len(p, 1);
+				}
+			} else if (p->col > (size_t)(-uc)) {
 				p->col += uc;
-			else {
+			} else {
 				uc += p->col;
 				p->col = 0;
 				if (p->tcol->offset > (size_t)(-uc)) {