Hi Wesley, Wesley Moore wrote on Wed, Nov 15, 2023 at 06:29:15PM +1000: > On Wed, 15 Nov 2023, at 11:26 AM, Ingo Schwarze wrote: [...] >> * It uses the mandoc(1) parsing and formatting engine by default >> since the same date. > Yes (pretty much). It looks like it was added a day later > https://github.com/chimera-linux/cports/commit/3c493733f7f0df345816cc69e273fca32fe70830 Ah, being unfamiliar with the Chimera Linux packaging system, i did not find that commit. I just added HTML comments referencing the two commit hashes to https://mandoc.bsd.lv/ports.html . It appears the dates are already correct, though, Github lists both commits as "Oct 30, 2021". [...] >> So i expect most users won't feel offended if installing new >> software takes a few seconds. > Perhaps that is the case but it bothered me enough to go to the > effort of writing these patches... :-) > [snip] >> So only 10-15% of the time is spent in roff_getstrn(), which cannot >> possibly result in the 25-35% overall speedup you report, so something >> looks fishy here. Then again, even 25-35% is not a large gain. > Those are the numbers I saw from my basic benchmarking measuring the > wall clock time of makewhatis -Tutf -n across three runs and averaging > the results for each machine. The raw data from my notes is available here: > > https://gist.github.com/wezm/4043bde0dc2974c88b7706c60b58f900 The first "Before" lines for OpenBSD and Raspberry are apparently off to the upper side, possibly because the pages were not yet in the buffer cache and had to be physically read from disk. But even excluding these, we have OpenBSD 3.89 -> 2.71 -30% Raspberry 7.18 -> 5.91 -18% Torrent 2.65 -> 1.90 -38% which is still way above what might be possible according to my measurements. I'm not accusing you of lying, btw. :-) This could be due to bugs in mandoc, bugs in your patch, mistakes in your or my measurement methodology, differences in the samples of manual page trees we used, differences in the operating systems, C libraries, or compilers we used, or differences in CFLAGS. None of these look particularly likely causes to me. But that only means i have to be extra careful to not commit anything that breaks parsing or formatting, given that we are clearly not fully understanding what is going on. > [snip] >> One other thing. I hate patch series, don't ever send them. >> They are usually a symptom of poor development methodology, >> and usually i reject them without even inspecting them. >> Here, i made an exception because your idea seems to have some merit. >> >> Every patch needs to >> * perform one well-defined task and >> * make the code better even if nothing else is ever committed >> building on it. >> >> Splitting a patch like you did it here only adds obfuscation >> If a patch becomes too big for review, then the *task* it performs >> needs to be split into logically separate steps, rather than >> splitting code changes into mutiple patch files that essentially >> all save the same purpose and are similar and closely related in >> their content. Such *logical* splitting can become tricky, but >> i doubt that is needed here. > Sorry about that. While I've been programming for a couple of decades > the email based patch workflow is very new to me and different to all > my experience to date. Sure, when working for hire, it also happened to me that i had to cope with dubious methodologies like patch series (which encourage sloppy design, sloppy implementation, very sloppy code review or no code review at all) or feature branches (which encourage bit rot, logical conflicts, integration mishaps, and sloppy and delayed merging and testing on top of the problems of patch series). But even if most of the biomass in a state comes in the form of cows, that doesn't imply cows are particularly intelligent and everybody should strive for bovinity. ;-) > If I post any more changes I'll be sure to follow your suggestions. Thanks! >> I'm not yet completely sure what the best way forward is. >> Probably something like: >> 1. Identifying all places that do string lookup in tables or lists. >> Some can probably be left alone: >> arch.c, att.c, cgi.c, chars.c, eqn.c, mansearch.c, >> mdoc_argnames in mdoc.c, msec.c, st.c, regtab in roff.c, >> tbl_layout.c, tbl_opts.c >> But the others should probably be dealt with in a unified manner: >> reqtab, mdocmac, manmac on the one hand >> strtab, rentab, pretab, xmbtab on the other hand >> Not sure yet whether xtab can be included into xmbtab. >> 2. Design and implement a common handling for these using ohash >> that is as simple as possible. > When I have a moment I'll take a look through these and try to come up > with an approach/plan and post it for feedback before making code changes. After consulting my pillow, i believe the following may work: 1st step: Change strtab, rentab, xmbtab, roff_freestr, roff_setstrn, roff_getstrn, and roff_strdup to use ohash. That's very close to what you did in your first two patches but avoids any additional complication. It's one logical step because it changes the way struct roffkv is used. That likely results in the main performance gain. Once that is committed, we can optionally consider further steps, one by one, for example: 2nd step: Implement pretab using the facilities from step 1 similar to what you did in your third patch. That's independent because predefs[] is only used by roff_getstrn and does not use struct roffkv yet. 3rd step: Maybe unify reqtab, mdocmac, and manmac in order to resolve the two ugly roff_name[] loops in roff_getstrn. That's likely independent, too, because it won't use struct roffkv even in the final state of the code. Of course, i cannot guarantee that will work - to provide such a guarantee, i would have to do the work myself. ;-) But it looks promising to me, well in line with your goal, and similar enough to your original implementation. By the way, *if* this ends up in an amount of code that is non-trivial in the sense of Copyright, are you OK with having * Copyright (c) 2023 Wesley Moore <wes@wezm.net> Added to the relevant files and to the LICENSE file in the portable mandoc root directory? Yours, Ingo -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
Hi Ingo, Thanks for the detailed reply. On Wed, 15 Nov 2023, at 11:26 AM, Ingo Schwarze wrote: > > Can you confirm that these statements are accurate? > > * Chimera Linux includes mandoc in its base system since 2021 Oct 30. That's correct. > * There were no official nor unofficial packages of mandoc > for Chimera Linux before that. Yes that's right. > * It uses the mandoc(1) parsing and formatting engine by default > since the same date. Yes (pretty much). It looks like it was added a day later https://github.com/chimera-linux/cports/commit/3c493733f7f0df345816cc69e273fca32fe70830 > * The default apropos(1) program is mandoc since the same date. > * The default man(1) program is mandoc since the same date. > * Chimera Linux does not provide an officially supported > alternative for the man(1) program that users can select. > * Chimera Linux does not publish its manual pages on the web > on a site similar to https://manpages.debian.org/ > or https://man.openbsd.org/ or https://man.voidlinux.org/ . Yes all of above are correct. >> makewhatis -Tutf8 is run to rebuild the index >> whenever a package containing man pages is installed or updated. >> I noticed this took multiple seconds even on a Ryzen 9 7950X system. > > That may not be a major problem because installing an additional > package is not usually a fast operation (it usually requires both > network access and expensive database locking and management outside > the domain of manual pages) and it isn't done often either. > > So i expect most users won't feel offended if installing new > software takes a few seconds. Perhaps that is the case but it bothered me enough to go to the effort of writing these patches... [snip] > So only 10-15% of the time is spent in roff_getstrn(), which cannot > possibly result in the 25-35% overall speedup you report, so something > looks fishy here. Then again, even 25-35% is not a large gain. Those are the numbers I saw from my basic benchmarking measuring the wall clock time of makewhatis -Tutf -n across three runs and averaging the results for each machine. The raw data from my notes is available here: https://gist.github.com/wezm/4043bde0dc2974c88b7706c60b58f900 [snip] > One other thing. I hate patch series, don't ever send them. > They are usually a symptom of poor development methodology, > and usually i reject them without even inspecting them. > Here, i made an exception because your idea seems to have some merit. > > Every patch needs to > * perform one well-defined task and > * make the code better even if nothing else is ever committed > building on it. > > Splitting a patch like you did it here only adds obfuscation > If a patch becomes too big for review, then the *task* it performs > needs to be split into logically separate steps, rather than > splitting code changes into mutiple patch files that essentially > all save the same purpose and are similar and closely related in > their content. Such *logical* splitting can become tricky, but > i doubt that is needed here. Sorry about that. While I've been programming for a couple of decades the email based patch workflow is very new to me and different to all my experience to date. If I post any more changes I'll be sure to follow your suggestions. > I'm not yet completely sure what the best way forward is. > Probably something like: > 1. Identifying all places that do string lookup in tables or lists. > Some can probably be left alone: > arch.c, att.c, cgi.c, chars.c, eqn.c, mansearch.c, > mdoc_argnames in mdoc.c, msec.c, st.c, regtab in roff.c, > tbl_layout.c, tbl_opts.c > But the others should probably be dealt with in a unified manner: > reqtab, mdocmac, manmac on the one hand > strtab, rentab, pretab, xmbtab on the other hand > Not sure yet whether xtab can be included into xmbtab. > 2. Design and implement a common handling for these using ohash > that is as simple as possible. When I have a moment I'll take a look through these and try to come up with an approach/plan and post it for feedback before making code changes. > Not sure yet whether these are multiple steps or a single step, > but doing one separate step for every *tab is definitely not the > way. > > I guess that's enough for today... > > Yours, > Ingo Wes -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
Hi Wesley, Wesley Moore wrote on Mon, Nov 13, 2023 at 11:42:42AM +1000: > In this patch set I aimed to improve the performance of makewhatis. > On my Chimera Linux systems Interesting. I just added Chimera Linux to these two pages: https://mandoc.bsd.lv/ports.html https://mandoc.bsd.lv/porthistory.html Can you confirm that these statements are accurate? * Chimera Linux includes mandoc in its base system since 2021 Oct 30. * There were no official nor unofficial packages of mandoc for Chimera Linux before that. * It uses the mandoc(1) parsing and formatting engine by default since the same date. * The default apropos(1) program is mandoc since the same date. * The default man(1) program is mandoc since the same date. * Chimera Linux does not provide an officially supported alternative for the man(1) program that users can select. * Chimera Linux does not publish its manual pages on the web on a site similar to https://manpages.debian.org/ or https://man.openbsd.org/ or https://man.voidlinux.org/ . > makewhatis -Tutf8 is run to rebuild the index > whenever a package containing man pages is installed or updated. That's not particularly smart. Rebuilding the whole database from scratch is unavoidably slow. The makewhatis(8) options -d and -u are specifically provided to serve the needs of package management systems. Specifically, if a package management systems cares about performance, it can use -d to add just those manual page files it just installed to the existing database, rather than rebuilding the whole database from scratch, which obviously implies reading every manual page file in the whole system from disk and parsing all those files. > I noticed this took multiple seconds even on a Ryzen 9 7950X system. That may not be a major problem because installing an additional package is not usually a fast operation (it usually requires both network access and expensive database locking and management outside the domain of manual pages) and it isn't done often either. So i expect most users won't feel offended if installing new software takes a few seconds. > Profiling showed a lot of time spent in the loops of roff_getstrn. Hmm, even though i'm not convinced your usecase is particularly important, that's interesting, and i wouldn't have expected it. It may be worth looking into that. > To speed this up I made use of hash tables. I tested the changes on > several different systems: > > - OpenBSD amd64 in VM > - Linux on Raspberry Pi 4 > - Linux on Ryzen 9 7950X > > In each case the wall clock run time for makewhatis -Tutf8 -n showed > an reduction of between 25% and 35%. For makewhatis(8), i would say that's such a small gain that it likely isn't worth much complication. Then again, it also affects mandoc(1), so maybe looking into it is worthwhile after all. Here is what i found with gprof(1) on OpenBSD-current: share local 100% 100% main -> mandocdb 92% 95% (mandocdb) -> mpages_merge 66% 67% (mpages_merge) -> mparse_readfd -> mparse_buf_r 32% 26% (mparse_buf_r) -> roff_parseln 19% 21% MANY -> free -> ofree 16.3% 2.5% (mparse_buf_r) -> mdoc_parseln 6.8% 18.4% (mparse_buf_r) -> man_parseln 15.6% 14.7% MANY -> roff_word_alloc 14.9% 17.0% (_libc_malloc etc) -> omalloc 14.7% 10.9% (mpages_merge) -> mparse_reset 14.6% 10.6% (roff_parse, roff_expand) -> roff_getstrn The "share" column refers to /usr/share/man/, i.e. the OpenBSD base system manuals, a good example of a large, mdoc(7)-heavy collection of manual pages. The "local" column refers to /usr/local/man/, i.e. the optional third-party software i happen to have installed, which contains mostly man(7) manual pages. On each line, the two numbers give the time spent in the respective function including functions called from it, relative to the time spent in main() including all functions called from there. The last name on each line is the respective function. The names before the arrows specify from where it is usually called. So only 10-15% of the time is spent in roff_getstrn(), which cannot possibly result in the 25-35% overall speedup you report, so something looks fishy here. Then again, even 25-35% is not a large gain. In general, improving the structure is much more important than improving the performance because the code has grown historically, becoming more and more complicated and less and less uniform in structure, i.e. harder and harder to maintain. In particular, such a small gain is clearly not worth further complication. On the other hand, the critical code path for the optimization potential you found is: roff_parse -> roff_getstrn In that area, code structure is not particularly good. Parsing for macros and strings is done in various different ways: 1. The code path roff_parse -> roff_getstrn looks for user-defined and renamed macros using r->strtab and r->rentab. 2. The code path roff_expand -> roff_getstrn looks for predefined strings using a linear search in predefs[]. 3. The code paths roff_block(ROFF_am) -> roff_getstrn and roff_rn -> roff_getstrn look for standard high-level macros directly in the roff_name[] array. 4. The functions mdoc_pmacro() in mdoc.c and man_pmacro() in man.c look for standard high-level macros using local mdoc->mdocmac and man->manmac hash tables with roffhash_find(). So it is likely this can be better structured with less duplication and not gratuitiously using different data structures, while also using hash tables for macro and string lookup throughout. > The one caveat of this change is the roff/while/into regression test is > failing with my changes. I've tried to work out why but I don't know > roff well enough understand the intent of the test nor have I been able > to clearly determine why my changes affect it. I was hoping to get some > help with this. The roff/while/into test uses .de in a very strange way, and .de internally uses the string storage, so that's how it may be related, but i would have to look at the details. Then again, studying that is not urgent because your patch is definitely not ready for commit. It adds complication rather than removing some. In particular: * Do not touch roff_int.h. It is only intended for internal API features needed by more than one parser, see mandoc_headers(3) for details, but what you are doing here is local to the roff parser (roff.c). * struct roff_entry is essentially the same as struct roffkv. There should not be two data structures for the same purpose. * roff_free_entry() is trivial and only used at one place, so it should be inlined. * Renaming roff_setstrn to roff_setentry causes quite some churn, so don't do that. If that means using ohash for xmbtab, too, so be it. * roff_strhash_alloc is also next to trivial. Maybe it can be inlined - maybe not because it is called from a few more places. If we introduce it, roffhash_alloc should also use it. * The duplication between roffhash_free() and roff_strhash_free() and between roffhash_find() and roff_strhash_find() is rather ugly. Not yet sure what to do about that. * roff_strhash_insert() is trivial and called only from one place. Some ideas are also good: * Using struct ohash *strtab directly in struct roff is good, just like it is already done for struct ohash *reqtab. * Introducing r->pretab and clearing it in roff_free() rather than in the badly named roff_free1() - just like r->reqtab - is good. One other thing. I hate patch series, don't ever send them. They are usually a symptom of poor development methodology, and usually i reject them without even inspecting them. Here, i made an exception because your idea seems to have some merit. Every patch needs to * perform one well-defined task and * make the code better even if nothing else is ever committed building on it. Splitting a patch like you did it here only adds obfuscation If a patch becomes too big for review, then the *task* it performs needs to be split into logically separate steps, rather than splitting code changes into mutiple patch files that essentially all save the same purpose and are similar and closely related in their content. Such *logical* splitting can become tricky, but i doubt that is needed here. I'm not yet completely sure what the best way forward is. Probably something like: 1. Identifying all places that do string lookup in tables or lists. Some can probably be left alone: arch.c, att.c, cgi.c, chars.c, eqn.c, mansearch.c, mdoc_argnames in mdoc.c, msec.c, st.c, regtab in roff.c, tbl_layout.c, tbl_opts.c But the others should probably be dealt with in a unified manner: reqtab, mdocmac, manmac on the one hand strtab, rentab, pretab, xmbtab on the other hand Not sure yet whether xtab can be included into xmbtab. 2. Design and implement a common handling for these using ohash that is as simple as possible. Not sure yet whether these are multiple steps or a single step, but doing one separate step for every *tab is definitely not the way. I guess that's enough for today... Yours, Ingo -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
--- roff.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/roff.c b/roff.c index 74afaab..ff85a63 100644 --- a/roff.c +++ b/roff.c @@ -113,6 +113,7 @@ struct roff { struct mctx *mstack; /* stack of macro contexts */ int *rstack; /* stack of inverted `ie' values */ struct ohash *reqtab; /* request lookup table */ + struct ohash *pretab; /* predefined strings table */ struct roffreg *regtab; /* number registers */ struct ohash *strtab; /* user-defined strings & macros */ struct ohash *rentab; /* renamed strings & macros */ @@ -241,6 +242,7 @@ static int roff_parse_comment(struct roff *, struct buf *, int, int, char); static int roff_parsetext(struct roff *, struct buf *, int, int *); +static struct ohash *roff_pretab_alloc(void); static int roff_renamed(ROFF_ARGS); static int roff_req_or_macro(ROFF_ARGS); static int roff_return(ROFF_ARGS); @@ -700,6 +702,21 @@ roffhash_find(struct ohash *htab, const char *name, size_t sz) /* --- key-value table ---------------------------------------------------- */ +struct ohash * +roff_pretab_alloc(void) +{ + int i; + struct ohash *htab; + struct predef pre; + + htab = roff_strhash_alloc(); + for (i = 0; i < PREDEFS_MAX; i++) { + pre = predefs[i]; + roff_setentry(htab, pre.name, strlen(pre.name), pre.str, strlen(pre.str), 0); + } + return htab; +} + struct ohash * roff_strhash_alloc(void) { @@ -852,6 +869,7 @@ roff_free(struct roff *r) int i; roff_free1(r); + roff_strhash_free(r->pretab); for (i = 0; i < r->mstacksz; i++) free(r->mstack[i].argv); free(r->mstack); @@ -866,6 +884,7 @@ roff_alloc(int options) r = mandoc_calloc(1, sizeof(struct roff)); r->reqtab = roffhash_alloc(0, ROFF_RENAMED); + r->pretab = roff_pretab_alloc(); r->strtab = roff_strhash_alloc(); r->rentab = roff_strhash_alloc(); r->options = options | MPARSE_COMMENT; @@ -4318,8 +4337,8 @@ static const char * roff_getstrn(struct roff *r, const char *name, size_t len, int *deftype) { - int found, i; const struct roff_entry *entry; + int found; enum roff_tok tok; found = 0; @@ -4341,16 +4360,13 @@ roff_getstrn(struct roff *r, const char *name, size_t len, found = 1; } } - for (i = 0; i < PREDEFS_MAX; i++) { - if (strncmp(name, predefs[i].name, len) != 0 || - predefs[i].name[len] != '\0') - continue; + entry = roff_strhash_find(r->pretab, name, len); + if (entry != NULL) { if (*deftype & ROFFDEF_PRE) { *deftype = ROFFDEF_PRE; - return predefs[i].str; + return entry->val.p; } else { found = 1; - break; } } if (r->man->meta.macroset != MACROSET_MAN) { -- 2.42.1 -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
--- roff.c | 50 ++++++++++++++++++++++++-------------------------- 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/roff.c b/roff.c index 4e5bbf9..74afaab 100644 --- a/roff.c +++ b/roff.c @@ -114,8 +114,8 @@ struct roff { int *rstack; /* stack of inverted `ie' values */ struct ohash *reqtab; /* request lookup table */ struct roffreg *regtab; /* number registers */ - struct roffkv *rentab; /* renamed strings & macros */ struct ohash *strtab; /* user-defined strings & macros */ + struct ohash *rentab; /* renamed strings & macros */ struct roffkv *xmbtab; /* multi-byte trans table (`tr') */ struct roffstr *xtab; /* single-byte trans table (`tr') */ const char *current_string; /* value of last called user macro */ @@ -818,10 +818,10 @@ roff_free1(struct roff *r) r->regtab = NULL; roff_strhash_free(r->strtab); - roff_freestr(r->rentab); + roff_strhash_free(r->rentab); roff_freestr(r->xmbtab); - r->strtab = NULL; - r->rentab = r->xmbtab = NULL; + r->strtab = r->rentab = NULL; + r->xmbtab = NULL; if (r->xtab) for (i = 0; i < 128; i++) @@ -835,6 +835,7 @@ roff_reset(struct roff *r) { roff_free1(r); r->strtab = roff_strhash_alloc(); + r->rentab = roff_strhash_alloc(); r->options |= MPARSE_COMMENT; r->format = r->options & (MPARSE_MDOC | MPARSE_MAN); r->control = '\0'; @@ -866,6 +867,7 @@ roff_alloc(int options) r = mandoc_calloc(1, sizeof(struct roff)); r->reqtab = roffhash_alloc(0, ROFF_RENAMED); r->strtab = roff_strhash_alloc(); + r->rentab = roff_strhash_alloc(); r->options = options | MPARSE_COMMENT; r->format = options & (MPARSE_MDOC | MPARSE_MAN); r->mstackpos = -1; @@ -2084,8 +2086,8 @@ roff_parse(struct roff *r, char *buf, int *pos, int ln, int ppos) *pos = cp - buf; else if (deftype == ROFFDEF_UNDEF) { /* Using an undefined macro defines it to be empty. */ - roff_setstrn(&r->rentab, mac, maclen, NULL, 0, 0); roff_setentry(r->strtab, mac, maclen, "", 0, 0); + roff_setentry(r->rentab, mac, maclen, NULL, 0, 0); } return t; } @@ -2252,8 +2254,8 @@ roff_block(ROFF_ARGS) */ if (tok == ROFF_de || tok == ROFF_dei) { - roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); roff_setentry(r->strtab, name, namesz, "", 0, 0); + roff_setentry(r->rentab, name, namesz, NULL, 0, 0); } else if (tok == ROFF_am || tok == ROFF_ami) { deftype = ROFFDEF_ANY; value = roff_getstrn(r, iname, namesz, &deftype); @@ -2265,13 +2267,13 @@ roff_block(ROFF_ARGS) case ROFFDEF_REN: /* call original standard macro. */ csz = mandoc_asprintf(&call, ".%.*s \\$* \\\"\n", (int)strlen(value), value); - roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); roff_setentry(r->strtab, name, namesz, call, csz, 0); + roff_setentry(r->rentab, name, namesz, NULL, 0, 0); free(call); break; case ROFFDEF_STD: /* rename and call standard macro. */ rsz = mandoc_asprintf(&rname, "__%s_renamed", name); - roff_setstrn(&r->rentab, rname, rsz, name, namesz, 0); + roff_setentry(r->rentab, rname, rsz, name, namesz, 0); csz = mandoc_asprintf(&call, ".%.*s \\$* \\\"\n", (int)rsz, rname); roff_setentry(r->strtab, name, namesz, call, csz, 0); @@ -2852,7 +2854,7 @@ roff_ds(ROFF_ARGS) /* The rest is the value. */ roff_setentry(r->strtab, name, namesz, string, strlen(string), ROFF_as == tok); - roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); + roff_setentry(r->rentab, name, namesz, NULL, 0, 0); return ROFF_IGN; } @@ -3281,8 +3283,8 @@ roff_rm(ROFF_ARGS) while (*cp != '\0') { name = cp; namesz = roff_getname(r, &cp, ln, (int)(cp - buf->buf)); - roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); roff_setentry(r->strtab, name, namesz, NULL, 0, 0); + roff_setentry(r->rentab, name, namesz, NULL, 0, 0); if (name[namesz] == '\\' || name[namesz] == '\t') break; } @@ -3628,8 +3630,8 @@ roff_als(ROFF_ARGS) valsz = mandoc_asprintf(&value, ".%.*s \\$@\\\"\n", (int)oldsz, oldn); - roff_setstrn(&r->rentab, newn, newsz, NULL, 0, 0); roff_setentry(r->strtab, newn, newsz, value, valsz, 0); + roff_setentry(r->rentab, newn, newsz, NULL, 0, 0); free(value); return ROFF_IGN; } @@ -3912,26 +3914,26 @@ roff_rn(ROFF_ARGS) value = roff_getstrn(r, oldn, oldsz, &deftype); switch (deftype) { case ROFFDEF_USER: - roff_setstrn(&r->rentab, newn, newsz, NULL, 0, 0); roff_setentry(r->strtab, newn, newsz, value, strlen(value), 0); roff_setentry(r->strtab, oldn, oldsz, NULL, 0, 0); + roff_setentry(r->rentab, newn, newsz, NULL, 0, 0); break; case ROFFDEF_PRE: - roff_setstrn(&r->rentab, newn, newsz, NULL, 0, 0); roff_setentry(r->strtab, newn, newsz, value, strlen(value), 0); + roff_setentry(r->rentab, newn, newsz, NULL, 0, 0); break; case ROFFDEF_REN: - roff_setstrn(&r->rentab, newn, newsz, value, strlen(value), 0); - roff_setstrn(&r->rentab, oldn, oldsz, NULL, 0, 0); + roff_setentry(r->rentab, newn, newsz, value, strlen(value), 0); + roff_setentry(r->rentab, oldn, oldsz, NULL, 0, 0); roff_setentry(r->strtab, newn, newsz, NULL, 0, 0); break; case ROFFDEF_STD: - roff_setstrn(&r->rentab, newn, newsz, oldn, oldsz, 0); + roff_setentry(r->rentab, newn, newsz, oldn, oldsz, 0); roff_setentry(r->strtab, newn, newsz, NULL, 0, 0); break; default: - roff_setstrn(&r->rentab, newn, newsz, NULL, 0, 0); roff_setentry(r->strtab, newn, newsz, NULL, 0, 0); + roff_setentry(r->rentab, newn, newsz, NULL, 0, 0); break; } return ROFF_IGN; @@ -4167,7 +4169,7 @@ roff_setstr(struct roff *r, const char *name, const char *string, namesz = strlen(name); roff_setentry(r->strtab, name, namesz, string, string ? strlen(string) : 0, append); - roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); + roff_setentry(r->rentab, name, namesz, NULL, 0, 0); } static void @@ -4316,7 +4318,6 @@ static const char * roff_getstrn(struct roff *r, const char *name, size_t len, int *deftype) { - const struct roffkv *n; int found, i; const struct roff_entry *entry; enum roff_tok tok; @@ -4331,16 +4332,13 @@ roff_getstrn(struct roff *r, const char *name, size_t len, found = 1; } } - for (n = r->rentab; n != NULL; n = n->next) { - if (strncmp(name, n->key.p, len) != 0 || - n->key.p[len] != '\0' || n->val.p == NULL) - continue; + entry = roff_strhash_find(r->rentab, name, len); + if (entry != NULL && entry->val.p != NULL) { if (*deftype & ROFFDEF_REN) { *deftype = ROFFDEF_REN; - return n->val.p; + return entry->val.p; } else { found = 1; - break; } } for (i = 0; i < PREDEFS_MAX; i++) { @@ -4396,8 +4394,8 @@ roff_getstrn(struct roff *r, const char *name, size_t len, /* Using an undefined string defines it to be empty. */ - roff_setstrn(&r->rentab, name, len, NULL, 0, 0); roff_setentry(r->strtab, name, len, "", 0, 0); + roff_setentry(r->rentab, name, len, NULL, 0, 0); } *deftype = 0; -- 2.42.1 -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
--- roff.c | 190 +++++++++++++++++++++++++++++++++++++++++++++-------- roff_int.h | 6 ++ 2 files changed, 170 insertions(+), 26 deletions(-) diff --git a/roff.c b/roff.c index bdb0210..4e5bbf9 100644 --- a/roff.c +++ b/roff.c @@ -71,6 +71,14 @@ struct roffkv { struct roffkv *next; /* next in list */ }; +/* + * A key-value string pair used as an entry in an ohash. + */ +struct roff_entry { + struct roffstr val; + char key[]; +}; + /* * A single number register as part of a singly-linked list. */ @@ -106,8 +114,8 @@ struct roff { int *rstack; /* stack of inverted `ie' values */ struct ohash *reqtab; /* request lookup table */ struct roffreg *regtab; /* number registers */ - struct roffkv *strtab; /* user-defined strings & macros */ struct roffkv *rentab; /* renamed strings & macros */ + struct ohash *strtab; /* user-defined strings & macros */ struct roffkv *xmbtab; /* multi-byte trans table (`tr') */ struct roffstr *xtab; /* single-byte trans table (`tr') */ const char *current_string; /* value of last called user macro */ @@ -204,6 +212,7 @@ static void roff_expand_patch(struct buf *, int, static void roff_free1(struct roff *); static void roff_freereg(struct roffreg *); static void roff_freestr(struct roffkv *); +static void roff_free_entry(struct roff_entry *entry); static size_t roff_getname(struct roff *, char **, int, int); static int roff_getnum(const char *, int *, int *, int); static int roff_getop(const char *, int *, char *); @@ -244,6 +253,8 @@ static void roff_setstr(struct roff *, const char *, const char *, int); static void roff_setstrn(struct roffkv **, const char *, size_t, const char *, size_t, int); +static void roff_setentry(struct ohash *r, const char *name, + size_t namesz, const char *string, size_t stringsz, int append); static int roff_shift(ROFF_ARGS); static int roff_so(ROFF_ARGS); static int roff_tr(ROFF_ARGS); @@ -687,6 +698,55 @@ roffhash_find(struct ohash *htab, const char *name, size_t sz) return req == NULL ? TOKEN_NONE : req->tok; } +/* --- key-value table ---------------------------------------------------- */ + +struct ohash * +roff_strhash_alloc(void) +{ + struct ohash *htab; + + htab = mandoc_malloc(sizeof(*htab)); + mandoc_ohash_init(htab, 6, offsetof(struct roff_entry, key)); + return htab; +} + +void +roff_strhash_free(struct ohash *htab) +{ + struct roff_entry *entry; + unsigned int slot; + + if (htab == NULL) + return; + for (entry = ohash_first(htab, &slot); entry != NULL; + entry = ohash_next(htab, &slot)) + roff_free_entry(entry); + ohash_delete(htab); + free(htab); +} + +struct roff_entry * +roff_strhash_find(struct ohash *htab, const char *name, size_t sz) +{ + struct roff_entry *entry; + const char *end; + + if (sz) { + end = name + sz; + entry = ohash_find(htab, ohash_qlookupi(htab, name, &end)); + } else + entry = ohash_find(htab, ohash_qlookup(htab, name)); + return entry; +} + +void +roff_strhash_insert(struct ohash *htab, struct roff_entry *entry) { + unsigned int slot; + + slot = ohash_qlookup(htab, entry->key); + ohash_insert(htab, slot, entry); +} + /* --- stack of request blocks -------------------------------------------- */ /* @@ -757,10 +817,11 @@ roff_free1(struct roff *r) roff_freereg(r->regtab); r->regtab = NULL; - roff_freestr(r->strtab); + roff_strhash_free(r->strtab); roff_freestr(r->rentab); roff_freestr(r->xmbtab); - r->strtab = r->rentab = r->xmbtab = NULL; + r->strtab = NULL; + r->rentab = r->xmbtab = NULL; if (r->xtab) for (i = 0; i < 128; i++) @@ -773,6 +834,7 @@ void roff_reset(struct roff *r) { roff_free1(r); + r->strtab = roff_strhash_alloc(); r->options |= MPARSE_COMMENT; r->format = r->options & (MPARSE_MDOC | MPARSE_MAN); r->control = '\0'; @@ -803,6 +865,7 @@ roff_alloc(int options) r = mandoc_calloc(1, sizeof(struct roff)); r->reqtab = roffhash_alloc(0, ROFF_RENAMED); + r->strtab = roff_strhash_alloc(); r->options = options | MPARSE_COMMENT; r->format = options & (MPARSE_MDOC | MPARSE_MAN); r->mstackpos = -1; @@ -1458,7 +1521,7 @@ roff_expand(struct roff *r, struct buf *buf, int ln, int pos, char ec) if (iendarg - iarg == 2 && buf->buf[iarg] == '.' && buf->buf[iarg + 1] == 'T') { - roff_setstrn(&r->strtab, ".T", 2, NULL, 0, 0); + roff_setentry(r->strtab, ".T", 2, NULL, 0, 0); pos = iend; continue; } @@ -2021,8 +2084,8 @@ roff_parse(struct roff *r, char *buf, int *pos, int ln, int ppos) *pos = cp - buf; else if (deftype == ROFFDEF_UNDEF) { /* Using an undefined macro defines it to be empty. */ - roff_setstrn(&r->strtab, mac, maclen, "", 0, 0); roff_setstrn(&r->rentab, mac, maclen, NULL, 0, 0); + roff_setentry(r->strtab, mac, maclen, "", 0, 0); } return t; } @@ -2189,21 +2252,21 @@ roff_block(ROFF_ARGS) */ if (tok == ROFF_de || tok == ROFF_dei) { - roff_setstrn(&r->strtab, name, namesz, "", 0, 0); roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); + roff_setentry(r->strtab, name, namesz, "", 0, 0); } else if (tok == ROFF_am || tok == ROFF_ami) { deftype = ROFFDEF_ANY; value = roff_getstrn(r, iname, namesz, &deftype); switch (deftype) { /* Before appending, ... */ case ROFFDEF_PRE: /* copy predefined to user-defined. */ - roff_setstrn(&r->strtab, name, namesz, + roff_setentry(r->strtab, name, namesz, value, strlen(value), 0); break; case ROFFDEF_REN: /* call original standard macro. */ csz = mandoc_asprintf(&call, ".%.*s \\$* \\\"\n", (int)strlen(value), value); - roff_setstrn(&r->strtab, name, namesz, call, csz, 0); roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); + roff_setentry(r->strtab, name, namesz, call, csz, 0); free(call); break; case ROFFDEF_STD: /* rename and call standard macro. */ @@ -2211,7 +2274,7 @@ roff_block(ROFF_ARGS) roff_setstrn(&r->rentab, rname, rsz, name, namesz, 0); csz = mandoc_asprintf(&call, ".%.*s \\$* \\\"\n", (int)rsz, rname); - roff_setstrn(&r->strtab, name, namesz, call, csz, 0); + roff_setentry(r->strtab, name, namesz, call, csz, 0); free(call); free(rname); break; @@ -2787,7 +2850,7 @@ roff_ds(ROFF_ARGS) string++; /* The rest is the value. */ - roff_setstrn(&r->strtab, name, namesz, string, strlen(string), + roff_setentry(r->strtab, name, namesz, string, strlen(string), ROFF_as == tok); roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); return ROFF_IGN; @@ -3218,8 +3281,8 @@ roff_rm(ROFF_ARGS) while (*cp != '\0') { name = cp; namesz = roff_getname(r, &cp, ln, (int)(cp - buf->buf)); - roff_setstrn(&r->strtab, name, namesz, NULL, 0, 0); roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); + roff_setentry(r->strtab, name, namesz, NULL, 0, 0); if (name[namesz] == '\\' || name[namesz] == '\t') break; } @@ -3565,8 +3628,8 @@ roff_als(ROFF_ARGS) valsz = mandoc_asprintf(&value, ".%.*s \\$@\\\"\n", (int)oldsz, oldn); - roff_setstrn(&r->strtab, newn, newsz, value, valsz, 0); roff_setstrn(&r->rentab, newn, newsz, NULL, 0, 0); + roff_setentry(r->strtab, newn, newsz, value, valsz, 0); free(value); return ROFF_IGN; } @@ -3849,26 +3912,26 @@ roff_rn(ROFF_ARGS) value = roff_getstrn(r, oldn, oldsz, &deftype); switch (deftype) { case ROFFDEF_USER: - roff_setstrn(&r->strtab, newn, newsz, value, strlen(value), 0); - roff_setstrn(&r->strtab, oldn, oldsz, NULL, 0, 0); roff_setstrn(&r->rentab, newn, newsz, NULL, 0, 0); + roff_setentry(r->strtab, newn, newsz, value, strlen(value), 0); + roff_setentry(r->strtab, oldn, oldsz, NULL, 0, 0); break; case ROFFDEF_PRE: - roff_setstrn(&r->strtab, newn, newsz, value, strlen(value), 0); roff_setstrn(&r->rentab, newn, newsz, NULL, 0, 0); + roff_setentry(r->strtab, newn, newsz, value, strlen(value), 0); break; case ROFFDEF_REN: roff_setstrn(&r->rentab, newn, newsz, value, strlen(value), 0); roff_setstrn(&r->rentab, oldn, oldsz, NULL, 0, 0); - roff_setstrn(&r->strtab, newn, newsz, NULL, 0, 0); + roff_setentry(r->strtab, newn, newsz, NULL, 0, 0); break; case ROFFDEF_STD: roff_setstrn(&r->rentab, newn, newsz, oldn, oldsz, 0); - roff_setstrn(&r->strtab, newn, newsz, NULL, 0, 0); + roff_setentry(r->strtab, newn, newsz, NULL, 0, 0); break; default: - roff_setstrn(&r->strtab, newn, newsz, NULL, 0, 0); roff_setstrn(&r->rentab, newn, newsz, NULL, 0, 0); + roff_setentry(r->strtab, newn, newsz, NULL, 0, 0); break; } return ROFF_IGN; @@ -4102,7 +4165,7 @@ roff_setstr(struct roff *r, const char *name, const char *string, size_t namesz; namesz = strlen(name); - roff_setstrn(&r->strtab, name, namesz, string, + roff_setentry(r->strtab, name, namesz, string, string ? strlen(string) : 0, append); roff_setstrn(&r->rentab, name, namesz, NULL, 0, 0); } @@ -4179,25 +4242,93 @@ roff_setstrn(struct roffkv **r, const char *name, size_t namesz, n->val.sz = (int)(c - n->val.p); } +static void +roff_setentry(struct ohash *r, const char *name, size_t namesz, + const char *string, size_t stringsz, int append) +{ + struct roff_entry *entry; + char *c; + int i; + size_t oldch, newch; + + /* Search for an existing string with the same name. */ + entry = roff_strhash_find(r, name, namesz); + if (NULL == entry) { + /* Create a new string table entry. */ + entry = mandoc_malloc(sizeof(*entry) + namesz + 1); + memcpy(entry->key, name, namesz); + entry->key[namesz] = '\0'; + + entry->val.p = NULL; + entry->val.sz = 0; + roff_strhash_insert(r, entry); + } else if (0 == append) { + /* string is already in hash table, free it in + * preparation for updating. + */ + free(entry->val.p); + entry->val.p = NULL; + entry->val.sz = 0; + } + + if (NULL == string) + return; + + /* + * One additional byte for the '\n' in multiline mode, + * and one for the terminating '\0'. + */ + newch = stringsz + (1 < append ? 2u : 1u); + + if (NULL == entry->val.p) { + entry->val.p = mandoc_malloc(newch); + *entry->val.p = '\0'; + oldch = 0; + } else { + oldch = entry->val.sz; + entry->val.p = mandoc_realloc(entry->val.p, oldch + newch); + } + + /* Skip existing content in the destination buffer. */ + c = entry->val.p + (int)oldch; + + /* Append new content to the destination buffer. */ + i = 0; + while (i < (int)stringsz) { + /* + * Rudimentary roff copy mode: + * Handle escaped backslashes. + */ + if ('\\' == string[i] && '\\' == string[i + 1]) + i++; + *c++ = string[i++]; + } + + /* Append terminating bytes. */ + if (1 < append) + *c++ = '\n'; + + *c = '\0'; + entry->val.sz = (int)(c - entry->val.p); +} + static const char * roff_getstrn(struct roff *r, const char *name, size_t len, int *deftype) { const struct roffkv *n; int found, i; + const struct roff_entry *entry; enum roff_tok tok; found = 0; - for (n = r->strtab; n != NULL; n = n->next) { - if (strncmp(name, n->key.p, len) != 0 || - n->key.p[len] != '\0' || n->val.p == NULL) - continue; + entry = roff_strhash_find(r->strtab, name, len); + if (entry != NULL && entry->val.p != NULL) { if (*deftype & ROFFDEF_USER) { *deftype = ROFFDEF_USER; - return n->val.p; + return entry->val.p; } else { found = 1; - break; } } for (n = r->rentab; n != NULL; n = n->next) { @@ -4265,8 +4396,8 @@ roff_getstrn(struct roff *r, const char *name, size_t len, /* Using an undefined string defines it to be empty. */ - roff_setstrn(&r->strtab, name, len, "", 0, 0); roff_setstrn(&r->rentab, name, len, NULL, 0, 0); + roff_setentry(r->strtab, name, len, "", 0, 0); } *deftype = 0; @@ -4286,6 +4417,13 @@ roff_freestr(struct roffkv *r) } } +static void +roff_free_entry(struct roff_entry *entry) +{ + free(entry->val.p); + free(entry); +} + /* --- accessors and utility functions ------------------------------------ */ /* diff --git a/roff_int.h b/roff_int.h index a26afa9..b18a782 100644 --- a/roff_int.h +++ b/roff_int.h @@ -19,6 +19,7 @@ */ struct ohash; +struct roff_entry; struct roff_node; struct roff_meta; struct roff; @@ -82,6 +83,11 @@ struct ohash *roffhash_alloc(enum roff_tok, enum roff_tok); enum roff_tok roffhash_find(struct ohash *, const char *, size_t); void roffhash_free(struct ohash *); +struct ohash *roff_strhash_alloc(void); +struct roff_entry*roff_strhash_find(struct ohash *htab, const char *name, size_t sz); +void roff_strhash_insert(struct ohash *htab, struct roff_entry *entry); +void roff_strhash_free(struct ohash *htab); + enum mandoc_esc roff_escape(const char *, const int, const int, int *, int *, int *, int *, int *); void roff_state_reset(struct roff_man *); -- 2.42.1 -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
Hello, In this patch set I aimed to improve the performance of makewhatis. On my Chimera Linux systems makewhatis -Tutf8 is run to rebuild the index whenever a package containing man pages is installed or updated. I noticed this took multiple seconds even on a Ryzen 9 7950X system. Profiling showed a lot of time spent in the loops of roff_getstrn. To speed this up I made use of hash tables. I tested the changes on several different systems: - OpenBSD amd64 in VM - Linux on Raspberry Pi 4 - Linux on Ryzen 9 7950X In each case the wall clock run time for makewhatis -Tutf8 -n showed an reduction of between 25% and 35%. The one caveat of this change is the roff/while/into regression test is failing with my changes. I've tried to work out why but I don't know roff well enough understand the intent of the test nor have I been able to clearly determine why my changes affect it. I was hoping to get some help with this. Regards, Wes -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
On 10/23/23 14:23, Ingo Schwarze wrote:
> Does that patch work for you as well as your own?
Yes, thanks, it works for tzfile(5).
--
To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
Hi Paul, Paul Eggert wrote on Sun, Oct 22, 2023 at 05:41:28PM -0700: > On 2023-10-22 14:06, Ingo Schwarze wrote: >> mandoc only supports ASCII strings as arguments to \w, not escape >> sequences or formatting instructions. > For the TZDB man pages mandoc need not support all that, just \(bu. Thank you for identifying a subset of the functionality that is both useful and feasible. That's so much better than the TODO item for \w i have lying around - that one is much less important than what you suggested but horrifically difficult. > Just to make sure we're on the same page, I reproduced the problem by > running the command "mandoc -man -Tascii t.5", where t.5 contains the > following lines: > > .TH tzfile 5 > .SH NAME > .IP \(bu "\w'\(bu 'u" > xxx > .PP > yyy > > The output should contain two spaces between the bullet's "o" and the > "x", but with current mandoc it contains five spaces. Yes, that matches my understanding. > Proposed mandoc patch attached. This isn't a perfect emulation of groff, In this case, that's a virtue, even though in many other cases, compatibility with groff is indeed among the main goals. > nor have I tested with fancy constructs, Not a major roadblock; such testing becomes significantly easier with the existing test suite, and by adding to it. > but it should be good enough for tzfile(5). And likely for some other pages, too. Regarding the implementation: I did not like that you did most of the work in roff_escape.c. That function is called each and every time any parser or formatter wants to deal with any escape sequence. Your patch did additional work for almost every kind of escape sequence, even though \w is the only escape sequence needing that kind of work. On top of that, while changing the internal API mandoc_escape(3) is not unheard of and can be done when it is really important, the need for an API break did not seem very urgent to me here. Consequently, i chose to do all the work at one local place in the internal function roff_expand(). Even though that requires iterating the argument of every \w escape sequence twice, it causes less work regarding the grand total because \w is a rare escape sequence, occurring much less frequently than other sequences, and even for \w relevant extra work is only done if the argument contains embedded escape sequences, which will occur still less frequently. Encountering long, complicated, deeply nested esacpe sequences inside \w would definitely be unusual. Doing extra work for every parsing and every formatting of each and every escape sequence sounds clearly worse. Besides, roff_expand() is really the place where the logic belongs. The function roff_escape() is a parser. Its job is to figure out where the various syntax elements (sequence, name, argument) begin and end and what the class of the sequence is. Doing calculations bases on the content of the argument is out of scope there, and considering that it's a parser, doing calculations related to formatting even more so. Calculating the information needed for interpolation really belongs into the string interpolation function roff_expand(), and given that \w in particular is related to formatting, the normal internal ESCAPE_* API that the formatters use for such purposes should be used here, too. Potentially, that might also make future refinements of the functionality easier: the more this code already resenmbles typical formatter code, the better. To summarize, i committed the following patch. Does that patch work for you as well as your own? Yours, Ingo Log Message: ----------- Support some escape sequences, in particular character escape sequences, inside \w arguments, and skip most other escape sequences when measuring the output length in this way because most escape sequences contribute little or nothing to text width: for example, consider font escapes in terminal output. This implementation is very rudimentary. In particular, it assumes that every character has the same width. No attempt is made to detect double-width or zero-width Unicode characters or to take dependencies on output devices or fonts into account. These limitations are hard to avoid because mandoc has to interpolate \w at the parsing stage when the output device is not yet known. I really do not want the content of the syntax tree to depend on the output device. Feature requested by Paul <Eggert at cs dot ucla dot edu>, who also submitted a patch, but i chose to commit this very different patch with almost the same functionality. His input was still very valuable because complete support for \w is out of the question, and consequently, the main task is identifying subsets of the feature that are needed for real-world manual pages and can be supported without uprooting the whole forest. Modified Files: -------------- mandoc: roff.7 roff.c mandoc/regress/roff/esc: w.in w.out_ascii w.out_lint Revision Data ------------- Index: roff.7 =================================================================== RCS file: /home/cvs/mandoc/mandoc/roff.7,v retrieving revision 1.120 retrieving revision 1.121 diff -Lroff.7 -Lroff.7 -u -p -r1.120 -r1.121 --- roff.7 +++ roff.7 @@ -1,6 +1,6 @@ .\" $Id$ .\" -.\" Copyright (c) 2010-2019, 2022 Ingo Schwarze <schwarze@openbsd.org> +.\" Copyright (c) 2010-2019, 2022-2023 Ingo Schwarze <schwarze@openbsd.org> .\" Copyright (c) 2010, 2011, 2012 Kristaps Dzonsons <kristaps@bsd.lv> .\" .\" Permission to use, copy, modify, and distribute this software for any @@ -2224,7 +2224,8 @@ The .Xr mandoc 1 implementation assumes that after expansion of user-defined strings, the .Ar string -only contains normal characters, no escape sequences, and that each +only contains normal characters, characters expressed as escape sequences, +and zero-width escape sequences, and that each character has a width of 24 basic units. .It Ic \eX\(aq Ns Ar string Ns Ic \(aq Output Index: roff.c =================================================================== RCS file: /home/cvs/mandoc/mandoc/roff.c,v retrieving revision 1.398 retrieving revision 1.399 diff -Lroff.c -Lroff.c -u -p -r1.398 -r1.399 --- roff.c +++ roff.c @@ -1,6 +1,6 @@ /* $Id$ */ /* - * Copyright (c) 2010-2015, 2017-2022 Ingo Schwarze <schwarze@openbsd.org> + * Copyright (c) 2010-2015, 2017-2023 Ingo Schwarze <schwarze@openbsd.org> * Copyright (c) 2008-2012, 2014 Kristaps Dzonsons <kristaps@bsd.lv> * * Permission to use, copy, modify, and distribute this software for any @@ -1362,6 +1362,7 @@ roff_expand(struct roff *r, struct buf * const char *res; /* the string to be pasted */ const char *src; /* source for copying */ char *dst; /* destination for copying */ + enum mandoc_esc subtype; /* return value from roff_escape */ int iesc; /* index of leading escape char */ int inam; /* index of the escape name */ int iarg; /* index beginning the argument */ @@ -1551,8 +1552,34 @@ roff_expand(struct roff *r, struct buf * res = ubuf; break; case 'w': - (void)snprintf(ubuf, sizeof(ubuf), - "%d", (iendarg - iarg) * 24); + rsz = 0; + subtype = ESCAPE_UNDEF; + while (iarg < iendarg) { + asz = subtype == ESCAPE_SKIPCHAR ? 0 : 1; + if (buf->buf[iarg] != '\\') { + rsz += asz; + iarg++; + continue; + } + switch ((subtype = roff_escape(buf->buf, 0, + iarg, NULL, NULL, NULL, NULL, &iarg))) { + case ESCAPE_SPECIAL: + case ESCAPE_NUMBERED: + case ESCAPE_UNICODE: + case ESCAPE_OVERSTRIKE: + case ESCAPE_UNDEF: + break; + case ESCAPE_DEVICE: + asz *= 8; + break; + case ESCAPE_EXPAND: + abort(); + default: + continue; + } + rsz += asz; + } + (void)snprintf(ubuf, sizeof(ubuf), "%d", rsz * 24); res = ubuf; break; default: Index: w.out_ascii =================================================================== RCS file: /home/cvs/mandoc/mandoc/regress/roff/esc/w.out_ascii,v retrieving revision 1.3 retrieving revision 1.4 diff -Lregress/roff/esc/w.out_ascii -Lregress/roff/esc/w.out_ascii -u -p -r1.3 -r1.4 --- regress/roff/esc/w.out_ascii +++ regress/roff/esc/w.out_ascii @@ -8,6 +8,13 @@ D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN character: 24 blank: 24 text: 96 + special: 24 + numbered: 24 + Unicode: 24 + overstrike: 24 + undefined: 24 + zero-width: 0 + skipchar: 48 A\bAr\brg\bgu\bum\bme\ben\bnt\bt d\bde\bel\bli\bim\bmi\bit\bte\ber\brs\bs unsupported \r: 24u @@ -27,4 +34,4 @@ D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN overstrike: 24u unterminated: 72 -OpenBSD June 8, 2022 OpenBSD +OpenBSD October 23, 2023 OpenBSD Index: w.out_lint =================================================================== RCS file: /home/cvs/mandoc/mandoc/regress/roff/esc/w.out_lint,v retrieving revision 1.7 retrieving revision 1.8 diff -Lregress/roff/esc/w.out_lint -Lregress/roff/esc/w.out_lint -u -p -r1.7 -r1.8 --- regress/roff/esc/w.out_lint +++ regress/roff/esc/w.out_lint @@ -1,4 +1,5 @@ -mandoc: w.in:17:20: UNSUPP: unsupported escape sequence: \r -mandoc: w.in:17:23: UNSUPP: unsupported escape sequence: \r -mandoc: w.in:23:16: WARNING: undefined escape, printing literally: \G -mandoc: w.in:51:15: ERROR: incomplete escape sequence: \w'foo +mandoc: w.in:25:15: WARNING: undefined escape, printing literally: \G +mandoc: w.in:31:20: UNSUPP: unsupported escape sequence: \r +mandoc: w.in:31:23: UNSUPP: unsupported escape sequence: \r +mandoc: w.in:37:16: WARNING: undefined escape, printing literally: \G +mandoc: w.in:65:15: ERROR: incomplete escape sequence: \w'foo Index: w.in =================================================================== RCS file: /home/cvs/mandoc/mandoc/regress/roff/esc/w.in,v retrieving revision 1.3 retrieving revision 1.4 diff -Lregress/roff/esc/w.in -Lregress/roff/esc/w.in -u -p -r1.3 -r1.4 --- regress/roff/esc/w.in +++ regress/roff/esc/w.in @@ -1,4 +1,4 @@ -.\" $OpenBSD: w.in,v 1.4 2022/06/08 13:08:00 schwarze Exp $ +.\" $OpenBSD: w.in,v 1.5 2023/10/23 20:07:19 schwarze Exp $ .Dd $Mdocdate$ .Dt ESC-W 1 .Os @@ -13,6 +13,20 @@ character: \w'n' blank: \w' ' .br text: \w'text' +.br +special: \w'\(bu' +.br +numbered: \w'\N'100'' +.br +Unicode: \w'\[u2013]' +.br +overstrike: \w'\o'ab'' +.br +undefined: \w'\G' +.br +zero-width: \w'\fB\&\fP' +.br +skipchar: \w'a\zb\z\(buc' .Ss Argument delimiters unsupported \er: \w\rM\ru .br -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
Hi Paul,
Paul Eggert wrote on Sun, Oct 22, 2023 at 05:22:48PM -0700:
> There's a typo in mandoc_escape.3: it refers to an internal function
> that has been renamed. Proposed patch attached.
Thank you, committed.
Nice to see the internal documentation is actually being used. :-)
Yours,
Ingo
Log Message:
-----------
update the internal function name roff_res() to roff_expand(),
it was changed some time ago;
patch from Paul <Eggert at cs dot ucla dot edu>
Modified Files:
--------------
mandoc:
mandoc_escape.3
Revision Data
-------------
Index: mandoc_escape.3
===================================================================
RCS file: /home/cvs/mandoc/mandoc/mandoc_escape.3,v
retrieving revision 1.4
retrieving revision 1.5
diff -Lmandoc_escape.3 -Lmandoc_escape.3 -u -p -r1.4 -r1.5
--- mandoc_escape.3
+++ mandoc_escape.3
@@ -100,7 +100,7 @@ width measurements
and numerical expression control
.Ic \eB .
These are handled by
-.Fn roff_res ,
+.Fn roff_expand ,
a private preprocessor function called from
.Fn roff_parseln ,
see the file
--
To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
[-- Attachment #1: Type: text/plain, Size: 2119 bytes --] Hi Branden, On Mon, Oct 23, 2023 at 03:30:59AM -0500, G. Branden Robinson wrote: > Hi Paul, > > At 2023-10-22T17:41:28-0700, Paul Eggert wrote: > > On 2023-10-22 14:06, Ingo Schwarze wrote: > > > mandoc only supports ASCII strings as arguments to \w, not escape > > > sequences or formatting instructions. > > > > For the TZDB man pages mandoc need not support all that, just \(bu. > > > > Just to make sure we're on the same page, I reproduced the problem by > > running the command "mandoc -man -Tascii t.5", where t.5 contains the > > following lines: > > > > .TH tzfile 5 > > .SH NAME > > .IP \(bu "\w'\(bu 'u" > > xxx > > .PP > > yyy > > At the risk of being simplistic, why not just give `IP` an explicit > measurement as an argument? > > .IP \(bu 2n He feels that IP \(bu 3n is too long of a space in PDF. "\w'\(bu 'u" has the benefit of being 3n in terminals, but shorter in PDF. This was triggered after my suggestion of using 3[n] instead of 2[n] to clearly separate the bullet from the bulleted text, as docuemented in man-pages(7). Cheers, Alex > > (Or 3n, or 4n, or whatever looks best to you.) > > > The output should contain two spaces between the bullet's "o" and the > > "x", but with current mandoc it contains five spaces. > > If you're viewing on a terminal, `.IP \(bu 3n` should achieve this.[1] > > (Typesetters are a different story because how wide a bullet is depends > on the output device and the font.) > > I'm not saying that better mandoc(1) support for `\w` would be an awful > thing to have, but it doesn't seem necessary, to me, to solve this > specific problem. > > Regards, > Branden > > [1] Strictly, you can leave the "n" off, but I consider that slightly > sloppy, and I think that the explicit scaling unit is also helpful > as a reminder to the man page author that `IP`'s second argument, > unlike most arguments to man(7) macros, will _not_ be formatted as > text. I might take that. -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
[-- Attachment #1: Type: text/plain, Size: 781 bytes --] On 2023-10-22 14:06, Ingo Schwarze wrote: > mandoc only supports > ASCII strings as arguments to \w, not escape sequences or formatting > instructions. For the TZDB man pages mandoc need not support all that, just \(bu. Just to make sure we're on the same page, I reproduced the problem by running the command "mandoc -man -Tascii t.5", where t.5 contains the following lines: .TH tzfile 5 .SH NAME .IP \(bu "\w'\(bu 'u" xxx .PP yyy The output should contain two spaces between the bullet's "o" and the "x", but with current mandoc it contains five spaces. Proposed mandoc patch attached. This isn't a perfect emulation of groff, nor have I tested with fancy constructs, but it should be good enough for tzfile(5). [-- Attachment #2: mandoc-tzfile-fix.txt --] [-- Type: text/plain, Size: 5541 bytes --] Index: roff.c =================================================================== RCS file: /cvs/mandoc/roff.c,v retrieving revision 1.398 diff -u -r1.398 roff.c --- roff.c 22 Oct 2023 16:02:01 -0000 1.398 +++ roff.c 22 Oct 2023 20:59:52 -0000 @@ -1367,6 +1367,7 @@ int iarg; /* index beginning the argument */ int iendarg; /* index right after the argument */ int iend; /* index right after the sequence */ + int icols; /* output columns of sequence */ int isrc, idst; /* to reduce \\ and \. in names */ int deftype; /* type of definition to paste */ int argi; /* macro argument index */ @@ -1404,7 +1405,7 @@ */ if (roff_escape(buf->buf, ln, pos, &iesc, &inam, - &iarg, &iendarg, &iend) != ESCAPE_EXPAND) { + &iarg, &iendarg, &iend, &icols) != ESCAPE_EXPAND) { while (pos < iend) { if (buf->buf[pos] == ec) { buf->buf[pos] = '\\'; @@ -1552,7 +1553,7 @@ break; case 'w': (void)snprintf(ubuf, sizeof(ubuf), - "%d", (iendarg - iarg) * 24); + "%d", icols * 24); res = ubuf; break; default: @@ -4030,7 +4031,7 @@ if (cp[1] == '{' || cp[1] == '}') break; if (roff_escape(cp, 0, 0, NULL, &inam, - NULL, NULL, &iend) != ESCAPE_UNDEF) { + NULL, NULL, &iend, NULL) != ESCAPE_UNDEF) { mandoc_msg(MANDOCERR_NAMESC, ln, pos, "%.*s%.*s", namesz, name, iend, cp); cp += iend; Index: roff_escape.c =================================================================== RCS file: /cvs/mandoc/roff_escape.c,v retrieving revision 1.14 diff -u -r1.14 roff_escape.c --- roff_escape.c 8 Jun 2022 13:23:57 -0000 1.14 +++ roff_escape.c 22 Oct 2023 20:59:52 -0000 @@ -42,7 +42,7 @@ enum mandoc_esc rval; rval = roff_escape(--*rendarg, 0, 0, - NULL, NULL, &iarg, &iendarg, &iend); + NULL, NULL, &iarg, &iendarg, &iend, NULL); assert(rval != ESCAPE_EXPAND); if (rarg != NULL) *rarg = *rendarg + iarg; @@ -64,14 +64,16 @@ */ enum mandoc_esc roff_escape(const char *buf, const int ln, const int aesc, - int *resc, int *rnam, int *rarg, int *rendarg, int *rend) + int *resc, int *rnam, int *rarg, int *rendarg, int *rend, int *rcols) { int iesc; /* index of leading escape char */ int inam; /* index of escape name */ int iarg; /* index beginning the argument */ int iendarg; /* index right after the argument */ int iend; /* index right after the sequence */ - int sesc, snam, sarg, sendarg, send; /* for sub-escape */ + int icols; /* column width of sequence */ + int sesc, snam, sarg, sendarg, send, scols; + /* for sub-escape */ int escterm; /* whether term is escaped */ int maxl; /* expected length of the argument */ int argl; /* actual length of the argument */ @@ -98,6 +100,7 @@ */ iarg = iendarg = iend = inam + 1; + icols = 0; maxl = INT_MAX; term = '\0'; err = MANDOCERR_OK; @@ -141,11 +144,13 @@ case '\'': case '-': case '0': - case ':': case '_': case '`': case 'e': case '~': + icols++; + /* FALLTHROUGH */ + case ':': iarg--; argl = 1; rval = ESCAPE_SPECIAL; @@ -179,6 +184,7 @@ break; case '(': case '[': + icols++; rval = ESCAPE_SPECIAL; iendarg = iend = --iarg; break; @@ -208,6 +214,7 @@ term = '\b'; break; case 'C': + icols++; rval = ESCAPE_SPECIAL; term = '\b'; break; @@ -224,6 +231,7 @@ term = '\b'; break; case 'o': + icols++; rval = ESCAPE_OVERSTRIKE; term = '\b'; break; @@ -271,7 +279,7 @@ if ((term == '\b' || (term == '\0' && maxl == INT_MAX)) && buf[iarg] == buf[iesc]) { stype = roff_escape(buf, ln, iendarg, - &sesc, &snam, &sarg, &sendarg, &send); + &sesc, &snam, &sarg, &sendarg, &send, &scols); if (stype == ESCAPE_EXPAND) goto out_sub; } @@ -285,11 +293,13 @@ buf[snam]) != NULL) { err = MANDOCERR_ESC_DELIM; iend = send; + icols += scols; iarg = iendarg = sesc; goto out; } escterm = 1; iarg = send; + icols += scols; term = buf[snam]; } else if (strchr("BDHLRSvxNhl", buf[inam]) != NULL && strchr(" %&()*+-./0123456789:<=>", buf[iarg]) != NULL) { @@ -347,10 +357,11 @@ } if (buf[iendarg] == buf[iesc]) { stype = roff_escape(buf, ln, iendarg, - &sesc, &snam, &sarg, &sendarg, &send); + &sesc, &snam, &sarg, &sendarg, &send, &scols); if (stype == ESCAPE_EXPAND) goto out_sub; iend = send; + icols += scols; if (escterm == 1 && (buf[snam] == term || buf[inam] == 'N')) break; @@ -366,6 +377,8 @@ valid_A = 0; if (maxl != INT_MAX) maxl--; + if (term == '\'') + icols++; iend = ++iendarg; } } @@ -502,6 +515,7 @@ iarg = sarg; iendarg = sendarg; iend = send; + icols = scols; rval = ESCAPE_EXPAND; out: @@ -515,6 +529,8 @@ *rendarg = iendarg; if (rend != NULL) *rend = iend; + if (rcols != NULL) + *rcols = icols; if (ln == 0) return rval; Index: roff_int.h =================================================================== RCS file: /cvs/mandoc/roff_int.h,v retrieving revision 1.20 diff -u -r1.20 roff_int.h --- roff_int.h 2 Jun 2022 11:29:07 -0000 1.20 +++ roff_int.h 22 Oct 2023 20:59:52 -0000 @@ -83,7 +83,7 @@ void roffhash_free(struct ohash *); enum mandoc_esc roff_escape(const char *, const int, const int, - int *, int *, int *, int *, int *); + int *, int *, int *, int *, int *, int *); void roff_state_reset(struct roff_man *); void roff_validate(struct roff_man *);
[-- Attachment #1: Type: text/plain, Size: 117 bytes --] There's a typo in mandoc_escape.3: it refers to an internal function that has been renamed. Proposed patch attached. [-- Attachment #2: mandoc_escape.3.diff --] [-- Type: text/x-patch, Size: 483 bytes --] Index: mandoc_escape.3 =================================================================== RCS file: /cvs/mandoc/mandoc_escape.3,v retrieving revision 1.4 diff -u -r1.4 mandoc_escape.3 --- mandoc_escape.3 4 Jul 2017 23:40:01 -0000 1.4 +++ mandoc_escape.3 22 Oct 2023 20:59:51 -0000 @@ -100,7 +100,7 @@ and numerical expression control .Ic \eB . These are handled by -.Fn roff_res , +.Fn roff_expand , a private preprocessor function called from .Fn roff_parseln , see the file
[-- Attachment #1: Type: text/plain, Size: 21184 bytes --] Oops, I should have CCd tech@, not discuss@. On Sun, Oct 22, 2023 at 12:58:17PM +0200, Alejandro Colomar wrote: > Hi Paul, > > On Sat, Oct 21, 2023 at 05:36:03PM -0700, Paul Eggert wrote: > > On 2023-10-18 04:16, Alejandro Colomar wrote: > > > > > I guess using '\[bu]' (or '\(bu') might not be appropriate for the tz > > > project, as it may be less portable to old systems > > > > We should be ok with \(bu even with older nroff; I just checked with Solaris > > 10 nroff and it generated an ASCII 'o', same as GNU/Linux. > > > > > > > The change would be to use '.IP * 3' instead of '.IP * 2'. 3 is for the > > > width of '*' +2. > > > > 3 is equivalent to 3n, which is too much for PDF output. I installed the > > attached, which uses \w'\(bu 'u instead of 3. This is equivalent for to 3 > > for nroff, and has a better look for troff. > > Hmm, interesting trick. To me, 3 doesn't look so excessive; it's at the > verge of being it, but still acceptable, but can understand your > preference. > > The only downside I see to your approach (appart from making the source > more complex, but I'm willing to pay that for better output), is that > mandoc(1) doesn't seem to understand that, and falls back to the default > indentation, which is even more excessive. > > Maybe we can report the problem to the maintainers of mandoc(1), > although I'm not sure how much they'll enjoy such low level roff(7) > stuff. > > > > > This patch also cleans up some of the other indenting irregularities in TZDB > > man pages; for example, plain ".TP" is good enough, and we don't need ".TP > > 10". Hope it works for you. > > Thanks! LGTM. > > > From 4a577028c65c70f6a85726ee34d307ee4a51b24d Mon Sep 17 00:00:00 2001 > > From: Paul Eggert <eggert@cs.ucla.edu> > > Date: Sat, 21 Oct 2023 16:10:29 -0700 > > Subject: [PROPOSED] Be more systematic about man page indenting > > > > This responds to a suggestion by Alejandro Colomar in: > > https://mm.icann.org/pipermail/tz/2023-October/033116.html > > --- > > newtzset.3 | 6 ++-- > > time2posix.3 | 2 +- > > tzfile.5 | 86 +++++++++++++++++++++++++++++----------------------- > > zdump.8 | 5 ++- > > zic.8 | 48 ++++++++++++++--------------- > > 5 files changed, 78 insertions(+), 69 deletions(-) > > > > diff --git a/newtzset.3 b/newtzset.3 > > index 80617cd7..45ddbd24 100644 > > --- a/newtzset.3 > > +++ b/newtzset.3 > > @@ -115,7 +115,7 @@ it must have the following syntax (spaces inserted for clarity): > > .PP > > Where: > > .RS > > -.TP 15 > > +.TP > > .IR std " and " dst > > Three or more bytes that are the designation for the standard > > .RI ( std ) > > @@ -205,7 +205,7 @@ The format of > > .I date > > is one of the following: > > .RS > > -.TP 10 > > +.TP > > .BI J n > > The Julian day > > .I n > > @@ -238,7 +238,7 @@ first week in which the > > .IR d' th > > day occurs. Day zero is Sunday. > > .RE > > -.IP "" 15 > > +.IP > > The > > .I time > > has the same format as > > diff --git a/time2posix.3 b/time2posix.3 > > index f48402b9..6644060a 100644 > > --- a/time2posix.3 > > +++ b/time2posix.3 > > @@ -100,7 +100,7 @@ and back from, > > the POSIX representation over the leap second inserted at the end of June, > > 1993. > > .nf > > -.ta \w'93/06/30 'u +\w'23:59:59 'u +\w'A+0 'u +\w'X=time2posix(T) 'u > > +.ta \w'93/06/30\0'u +\w'23:59:59\0'u +\w'A+0\0'u +\w'X=time2posix(T)\0'u > > DATE TIME T X=time2posix(T) posix2time(X) > > 93/06/30 23:59:59 A+0 B+0 A+0 > > 93/06/30 23:59:60 A+1 B+1 A+1 or A+2 > > diff --git a/tzfile.5 b/tzfile.5 > > index 59d9f6ba..55280282 100644 > > --- a/tzfile.5 > > +++ b/tzfile.5 > > @@ -26,23 +26,24 @@ a signed binary integer is represented using two's complement, > > and a boolean is represented by a one-byte binary integer that is > > either 0 (false) or 1 (true). > > The format begins with a 44-byte header containing the following fields: > > -.IP * 2 > > +.RS "\w' 'u" > > +.IP \(bu "\w'\(bu 'u" > > The magic four-byte ASCII sequence > > .q "TZif" > > identifies the file as a timezone information file. > > -.IP * > > +.IP \(bu > > A byte identifying the version of the file's format > > (as of 2021, either an ASCII NUL, > > .q "2", > > .q "3", > > or > > .q "4" ). > > -.IP * > > +.IP \(bu > > Fifteen bytes containing zeros reserved for future use. > > -.IP * > > +.IP \(bu > > Six four-byte integer values, in the following order: > > -.RS > > -.TP > > +.RS "\w' \(bu 'u" > > +.TP "\w' 'u" > > .B tzh_ttisutcnt > > The number of UT/local indicators stored in the file. > > (UT is Universal Time.) > > @@ -68,14 +69,15 @@ stored in the file. > > .PP > > The above header is followed by the following fields, whose lengths > > depend on the contents of the header: > > -.IP * 2 > > +.RS "\w' 'u" > > +.IP \(bu "\w'\(bu 'u" > > .B tzh_timecnt > > four-byte signed integer values sorted in ascending order. > > These values are written in network byte order. > > Each is used as a transition time (as returned by > > .BR time (2)) > > at which the rules for computing local time change. > > -.IP * > > +.IP \(bu > > .B tzh_timecnt > > one-byte unsigned integer values; > > each one but the last tells which of the different types of local time types > > @@ -85,20 +87,20 @@ and continuing up to but not including the next transition time. > > (The last time type is present only for consistency checking with the > > POSIX-style TZ string described below.) > > These values serve as indices into the next field. > > -.IP * > > +.IP \(bu > > .B tzh_typecnt > > .B ttinfo > > entries, each defined as follows: > > -.in +.5i > > +.in +2 > > Hmm, I think I'll take this. The Linux man-pages currently use .in +4n > for examples, but maybe I can simplify to just +4 without the 'n'. I'll > check if the PDF isn't excessive. > > Cheers, > Alex > > > .sp > > .nf > > -.ta .5i +\w'unsigned char\0\0'u > > +.ta \w'\0\0\0\0'u +\w'unsigned char\0'u > > struct ttinfo { > > int32_t tt_utoff; > > unsigned char tt_isdst; > > unsigned char tt_desigidx; > > }; > > -.in -.5i > > +.in > > .fi > > .sp > > Each structure is written as a four-byte signed integer value for > > @@ -132,7 +134,8 @@ Also, in realistic applications > > is in the range [\-89999, 93599] (i.e., more than \-25 hours and less > > than 26 hours); this allows easy support by implementations that > > already support the POSIX-required range [\-24:59:59, 25:59:59]. > > -.IP * > > +.RS "\w' 'u" > > +.IP \(bu "\w'\(bu 'u" > > .B tzh_charcnt > > bytes that represent time zone designations, > > which are null-terminated byte strings, each indexed by the > > @@ -140,7 +143,7 @@ which are null-terminated byte strings, each indexed by the > > values mentioned above. > > The byte strings can overlap if one is a suffix of the other. > > The encoding of these strings is not specified. > > -.IP * > > +.IP \(bu > > .B tzh_leapcnt > > pairs of four-byte values, written in network byte order; > > the first value of each pair gives the nonnegative time > > @@ -167,18 +170,19 @@ otherwise, for timestamps before the first occurrence time, > > the leap-second correction is zero if the first pair's correction is 1 or \-1, > > and is unspecified otherwise (which can happen only in files > > truncated at the start). > > -.IP * > > +.IP \(bu > > .B tzh_ttisstdcnt > > standard/wall indicators, each stored as a one-byte boolean; > > they tell whether the transition times associated with local time types > > were specified as standard time or local (wall clock) time. > > -.IP * > > +.IP \(bu > > .B tzh_ttisutcnt > > UT/local indicators, each stored as a one-byte boolean; > > they tell whether the transition times associated with local time types > > were specified as UT or local time. > > If a UT/local indicator is set, the corresponding standard/wall indicator > > must also be set. > > +.RE > > .PP > > The standard/wall and UT/local indicators were designed for > > transforming a TZif file's transition times into transitions appropriate > > @@ -312,15 +316,17 @@ This section documents common problems in reading or writing TZif files. > > Most of these are problems in generating TZif files for use by > > older readers. > > The goals of this section are: > > -.IP * 2 > > +.RS "\w' 'u" > > +.IP \(bu "\w'\(bu 'u" > > to help TZif writers output files that avoid common > > pitfalls in older or buggy TZif readers, > > -.IP * > > +.IP \(bu > > to help TZif readers avoid common pitfalls when reading > > files generated by future TZif writers, and > > -.IP * > > +.IP \(bu > > to help any future specification authors see what sort of > > problems arise when the TZif format is changed. > > +.RE > > .PP > > When new versions of the TZif format have been defined, a > > design goal has been that a reader can successfully use a TZif > > @@ -335,21 +341,22 @@ workarounds, as well as to document other common bugs in > > readers. > > .PP > > Interoperability problems with TZif include the following: > > -.IP * 2 > > +.RS "\w' 'u" > > +.IP \(bu "\w'\(bu 'u" > > Some readers examine only version 1 data. > > As a partial workaround, a writer can output as much version 1 > > data as possible. > > However, a reader should ignore version 1 data, and should use > > version 2+ data even if the reader's native timestamps have only > > 32 bits. > > -.IP * > > +.IP \(bu > > Some readers designed for version 2 might mishandle > > timestamps after a version 3 or higher file's last transition, because > > they cannot parse extensions to POSIX in the TZ-like string. > > As a partial workaround, a writer can output more transitions > > than necessary, so that only far-future timestamps are > > mishandled by version 2 readers. > > -.IP * > > +.IP \(bu > > Some readers designed for version 2 do not support > > permanent daylight saving time with transitions after 24:00 > > \(en e.g., a TZ string > > @@ -367,22 +374,22 @@ for the next time zone east \(en e.g., > > .q "AST4" > > for permanent > > Atlantic Standard Time (\-04). > > -.IP * > > +.IP \(bu > > Some readers designed for version 2 or 3, and that require strict > > conformance to RFC 8536, reject version 4 files whose leap second > > tables are truncated at the start or that end in expiration times. > > -.IP * > > +.IP \(bu > > Some readers ignore the footer, and instead predict future > > timestamps from the time type of the last transition. > > As a partial workaround, a writer can output more transitions > > than necessary. > > -.IP * > > +.IP \(bu > > Some readers do not use time type 0 for timestamps before > > the first transition, in that they infer a time type using a > > heuristic that does not always select time type 0. > > As a partial workaround, a writer can output a dummy (no-op) > > first transition at an early time. > > -.IP * > > +.IP \(bu > > Some readers mishandle timestamps before the first > > transition that has a timestamp not less than \-2**31. > > Readers that support only 32-bit timestamps are likely to be > > @@ -391,11 +398,11 @@ more prone to this problem, for example, when they process > > bits. > > As a partial workaround, a writer can output a dummy > > transition at timestamp \-2**31. > > -.IP * > > +.IP \(bu > > Some readers mishandle a transition if its timestamp has > > the minimum possible signed 64-bit value. > > Timestamps less than \-2**59 are not recommended. > > -.IP * > > +.IP \(bu > > Some readers mishandle POSIX-style TZ strings that > > contain > > .q "<" > > @@ -407,11 +414,11 @@ or > > .q ">" > > for time zone abbreviations containing only alphabetic > > characters. > > -.IP * > > +.IP \(bu > > Many readers mishandle time zone abbreviations that contain > > non-ASCII characters. > > These characters are not recommended. > > -.IP * > > +.IP \(bu > > Some readers may mishandle time zone abbreviations that > > contain fewer than 3 or more than 6 characters, or that > > contain ASCII characters other than alphanumerics, > > @@ -419,7 +426,7 @@ contain ASCII characters other than alphanumerics, > > and > > .q "+". > > These abbreviations are not recommended. > > -.IP * > > +.IP \(bu > > Some readers mishandle TZif files that specify > > daylight-saving time UT offsets that are less than the UT > > offsets for the corresponding standard time. > > @@ -435,7 +442,7 @@ thus swapping standard and daylight saving time. > > Although this workaround misidentifies which part of the year > > uses daylight saving time, it records UT offsets and time zone > > abbreviations correctly. > > -.IP * > > +.IP \(bu > > Some readers generate ambiguous timestamps for positive leap seconds > > that occur when the UTC offset is not a multiple of 60 seconds. > > For example, in a timezone with UTC offset +01:23:45 and with > > @@ -446,38 +453,41 @@ instead of mapping the latter to 01:23:46, and they will map 78796815 to > > This has not yet been a practical problem, since no civil authority > > has observed such UTC offsets since leap seconds were > > introduced in 1972. > > +.RE > > .PP > > Some interoperability problems are reader bugs that > > are listed here mostly as warnings to developers of readers. > > -.IP * 2 > > +.RS "\w' 'u" > > +.IP \(bu "\w'\(bu 'u" > > Some readers do not support negative timestamps. > > Developers of distributed applications should keep this > > in mind if they need to deal with pre-1970 data. > > -.IP * > > +.IP \(bu > > Some readers mishandle timestamps before the first > > transition that has a nonnegative timestamp. > > Readers that do not support negative timestamps are likely to > > be more prone to this problem. > > -.IP * > > +.IP \(bu > > Some readers mishandle time zone abbreviations like > > .q "\*-08" > > that contain > > .q "+", > > .q "\*-", > > or digits. > > -.IP * > > +.IP \(bu > > Some readers mishandle UT offsets that are out of the > > traditional range of \-12 through +12 hours, and so do not > > support locations like Kiritimati that are outside this > > range. > > -.IP * > > +.IP \(bu > > Some readers mishandle UT offsets in the range [\-3599, \-1] > > seconds from UT, because they integer-divide the offset by > > 3600 to get 0 and then display the hour part as > > .q "+00". > > -.IP * > > +.IP \(bu > > Some readers mishandle UT offsets that are not a multiple > > of one hour, or of 15 minutes, or of 1 minute. > > +.RE > > .SH SEE ALSO > > .BR time (2), > > .BR localtime (3), > > diff --git a/zdump.8 b/zdump.8 > > index f77c0c79..c3f0bba6 100644 > > --- a/zdump.8 > > +++ b/zdump.8 > > @@ -152,10 +152,9 @@ tabbed columns line up.) > > .nf > > .sp > > .if \n(.g .ft CR > > -.if t .in +.5i > > -.if n .in +2 > > +.in +2 > > .nr w \w'1896-01-13 'u+\n(.i > > -.ta \w'1896-01-13 'u +\w'12:01:26 'u +\w'-103126 'u +\w'HWT 'u > > +.ta \w'1896-01-13\0\0'u +\w'12:01:26\0\0'u +\w'-103126\0\0'u +\w'HWT\0\0'u > > TZ="Pacific/Honolulu" > > - - -103126 LMT > > 1896-01-13 12:01:26 -1030 HST > > diff --git a/zic.8 b/zic.8 > > index 6bcef7ae..a958ddd1 100644 > > --- a/zic.8 > > +++ b/zic.8 > > @@ -95,7 +95,7 @@ as local time. > > .B zic > > will act as if the input contained a link line of the form > > .sp > > -.ti +.5i > > +.ti +2 > > .ta \w'Link\0\0'u +\w'\fItimezone\fP\0\0'u > > Link \fItimezone\fP localtime > > .sp > > @@ -118,7 +118,7 @@ TZ strings like "EET\*-2EEST" that lack transition rules. > > .B zic > > will act as if the input contained a link line of the form > > .sp > > -.ti +.5i > > +.ti +2 > > Link \fItimezone\fP posixrules > > .sp > > If > > @@ -330,19 +330,19 @@ abbreviation must be unambiguous in context. > > .PP > > A rule line has the form > > .nf > > -.ti +.5i > > +.ti +2 > > .ta \w'Rule\0\0'u +\w'NAME\0\0'u +\w'FROM\0\0'u +\w'1973\0\0'u +\w'\*-\0\0'u +\w'Apr\0\0'u +\w'lastSun\0\0'u +\w'2:00w\0\0'u +\w'1:00d\0\0'u > > .sp > > Rule NAME FROM TO \*- IN ON AT SAVE LETTER/S > > .sp > > For example: > > -.ti +.5i > > +.ti +2 > > .sp > > Rule US 1967 1973 \*- Apr lastSun 2:00w 1:00d D > > .sp > > .fi > > The fields that make up a rule line are: > > -.TP "\w'LETTER/S'u" > > +.TP > > .B NAME > > Gives the name of the rule set that contains this line. > > The name must start with a character that is neither > > @@ -404,7 +404,7 @@ Month names may be abbreviated. > > Gives the day on which the rule takes effect. > > Recognized forms include: > > .nf > > -.in +.5i > > +.in +2 > > .sp > > .ta \w'Sun<=25\0\0'u > > 5 the fifth of the month > > @@ -413,7 +413,7 @@ lastMon the last Monday in the month > > Sun>=8 first Sunday on or after the eighth > > Sun<=25 last Sunday on or before the 25th > > .fi > > -.in -.5i > > +.in > > .sp > > A weekday name (e.g., > > .BR "Sunday" ) > > @@ -440,7 +440,7 @@ Gives the time of day at which the rule takes effect, > > relative to 00:00, the start of a calendar day. > > Recognized forms include: > > .nf > > -.in +.5i > > +.in +2 > > .sp > > .ta \w'00:19:32.13\0\0'u > > 2 time in hours > > @@ -454,7 +454,7 @@ Recognized forms include: > > \*-2:30 2.5 hours before 00:00 > > \*- equivalent to 0 > > .fi > > -.in -.5i > > +.in > > .sp > > Although > > .B zic > > @@ -532,18 +532,18 @@ the variable part is null. > > A zone line has the form > > .sp > > .nf > > -.ti +.5i > > +.ti +2 > > .ta \w'Zone\0\0'u +\w'Asia/Amman\0\0'u +\w'STDOFF\0\0'u +\w'Jordan\0\0'u +\w'FORMAT\0\0'u > > Zone NAME STDOFF RULES FORMAT [UNTIL] > > .sp > > For example: > > .sp > > -.ti +.5i > > +.ti +2 > > Zone Asia/Amman 2:00 Jordan EE%sT 2017 Oct 27 01:00 > > .sp > > .fi > > The fields that make up a zone line are: > > -.TP "\w'STDOFF'u" > > +.TP > > .B NAME > > The name of the timezone. > > This is the name used in creating the time conversion information file for the > > @@ -663,15 +663,15 @@ For example: > > .br > > .ne 7 > > .nf > > -.in +2m > > +.in +2 > > .ta \w'# Rule\0\0'u +\w'NAME\0\0'u +\w'FROM\0\0'u +\w'2006\0\0'u +\w'\*-\0\0'u +\w'Oct\0\0'u +\w'lastSun\0\0'u +\w'2:00\0\0'u +\w'SAVE\0\0'u > > .sp > > # Rule NAME FROM TO \*- IN ON AT SAVE LETTER/S > > Rule US 1967 2006 - Oct lastSun 2:00 0 S > > Rule US 1967 1973 - Apr lastSun 2:00 1:00 D > > -.ta \w'Zone\0\0America/Menominee\0\0'u +\w'STDOFF\0\0'u +\w'RULES\0\0'u +\w'FORMAT\0\0'u > > -# Zone\0\0NAME STDOFF RULES FORMAT [UNTIL] > > -Zone\0\0America/Menominee \*-5:00 \*- EST 1973 Apr 29 2:00 > > +.ta \w'# Zone\0\0'u +\w'America/Menominee\0\0'u +\w'STDOFF\0\0'u +\w'RULES\0\0'u +\w'FORMAT\0\0'u > > +# Zone NAME STDOFF RULES FORMAT [UNTIL] > > +Zone America/Menominee \*-5:00 \*- EST 1973 Apr 29 2:00 > > \*-6:00 US C%sT > > .sp > > .in > > @@ -687,13 +687,13 @@ interprets this more sensibly as a single transition from 02:00 CST (\*-05) to > > A link line has the form > > .sp > > .nf > > -.ti +.5i > > +.ti +2 > > .ta \w'Link\0\0'u +\w'Europe/Istanbul\0\0'u > > Link TARGET LINK-NAME > > .sp > > For example: > > .sp > > -.ti +.5i > > +.ti +2 > > Link Europe/Istanbul Asia/Istanbul > > .sp > > .fi > > @@ -717,7 +717,7 @@ For example: > > .sp > > .ne 3 > > .nf > > -.in +2m > > +.in +2 > > .ta \w'Zone\0\0'u +\w'Greenwich\0\0'u > > Link Greenwich G_M_T > > Link Etc/GMT Greenwich > > @@ -737,13 +737,13 @@ The file that describes leap seconds can have leap lines and an > > expiration line. > > Leap lines have the following form: > > .nf > > -.ti +.5i > > +.ti +2 > > .ta \w'Leap\0\0'u +\w'YEAR\0\0'u +\w'MONTH\0\0'u +\w'DAY\0\0'u +\w'HH:MM:SS\0\0'u +\w'CORR\0\0'u > > .sp > > Leap YEAR MONTH DAY HH:MM:SS CORR R/S > > .sp > > For example: > > -.ti +.5i > > +.ti +2 > > .sp > > Leap 2016 Dec 31 23:59:60 + S > > .sp > > @@ -791,13 +791,13 @@ option is used. > > .PP > > The expiration line, if present, has the form: > > .nf > > -.ti +.5i > > +.ti +2 > > .ta \w'Expires\0\0'u +\w'YEAR\0\0'u +\w'MONTH\0\0'u +\w'DAY\0\0'u > > .sp > > Expires YEAR MONTH DAY HH:MM:SS > > .sp > > For example: > > -.ti +.5i > > +.ti +2 > > .sp > > Expires 2020 Dec 28 00:00:00 > > .sp > > @@ -816,7 +816,7 @@ Here is an extended example of > > .B zic > > input, intended to illustrate many of its features. > > .nf > > -.in +2m > > +.in +2 > > .ta \w'# Rule\0\0'u +\w'NAME\0\0'u +\w'FROM\0\0'u +\w'1973\0\0'u +\w'\*-\0\0'u +\w'Apr\0\0'u +\w'lastSun\0\0'u +\w'2:00\0\0'u +\w'SAVE\0\0'u > > .sp > > # Rule NAME FROM TO \*- IN ON AT SAVE LETTER/S > > -- > > 2.39.2 > > > > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
Hi Bapt, Ingo Schwarze wrote on Fri, Oct 13, 2023 at 12:57:37PM +0200: > Baptiste Daroussin wrote on Fri, Oct 13, 2023 at 09:06:56AM +0200: >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266882 >> I haven't had time yet to debug, do you actually need more information? I believe this is fixed by the commit appended below. I'm also going to update the FreeBSD bug report. [...] > Having a quick look at your reproducing input file, i suspect that > the following input confuses mandoc: > > .\" ouch! > .ds xx \\*(fP\fR(\fP\\*(lI*\\*(fP > > Notably, you already marked that line with "ouch" in the manual > page source code, presumably acknowledging that doing such low-level > gymnastics in a manual page is asking for trouble. To not handle such > madness gracefully is still a bug in mandoc though, i do not deny that. > > I suspect that using the extremely special escape sequence "\\" > inside .ds is confusing mandoc - that sequence is mostly intended > for being used inside .de. The pre-parser still sees the "\\" and > consequently sees no user-defined string replacement escape sequences. > But when the "xx" string is later used, the "\\*" likely gets resolved > to "\*", i.e. to a string replacement request, but at the point, the > pre-parser has already been run so the string does not get replaced > and makes it through to the formatters, and those cannot handle it. > Or something like that, i will have to investigate in detail. That turned out to be true, as far as it goes. Two aspects are missing from the explanation: 1. Not only is the "xx" string later used, but it is used (1) with delayed expansion (i.e. \\*(xx rather than \*(xx) and (2) inside a macro argument. 2. The macro argument parser already handled delayed expansion. What it failed to handle was recursive expansion where both the outer and the inner expansion were delayed. To summarize, as far as i can see, the assertion only triggers when a manual pages uses delayed expansion recursively inside a macro argument. Note that using expansion inside a manual page is already bad style, even if it is neither recursive nor delayed nor inside a macro argument... :-( Feedback whether the patch works for you is welcome. Thanks again for the report! Ingo Log Message: ----------- When parsing a macro argument results in delayed escape sequence expansion, re-check for all contained escape sequences whether they need delayed expansion, not just for the particular escape sequences that triggered delayed expansion in the first place. This is needed because delayed expansion can result in strings containing nested escape sequences recursively needing delayed expansion, too. This fixes an assertion failure in krb5_openlog(3), see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266882 Thanks to Wolfram Schneider <wosch at FreeBSD> for reporting the bug and to Baptiste Daroussin <bapt at FreeBSD> for forwarding the report. Modified Files: -------------- mandoc: mandoc.h roff.c Revision Data ------------- Index: mandoc.h =================================================================== RCS file: /home/cvs/mandoc/mandoc/mandoc.h,v retrieving revision 1.281 retrieving revision 1.282 diff -Lmandoc.h -Lmandoc.h -u -p -r1.281 -r1.282 --- mandoc.h +++ mandoc.h @@ -23,7 +23,6 @@ #define ASCII_NBRZW 30 /* non-breaking zero-width space */ #define ASCII_BREAK 29 /* breakable zero-width space */ #define ASCII_HYPH 28 /* breakable hyphen */ -#define ASCII_ESC 27 /* escape sequence from copy-in processing */ #define ASCII_TABREF 26 /* reset tab reference position */ /* Index: roff.c =================================================================== RCS file: /home/cvs/mandoc/mandoc/roff.c,v retrieving revision 1.396 retrieving revision 1.397 diff -Lroff.c -Lroff.c -u -p -r1.396 -r1.397 --- roff.c +++ roff.c @@ -1387,7 +1387,7 @@ roff_expand(struct roff *r, struct buf * */ if (buf->buf[pos] != ec) { - if (ec != ASCII_ESC && buf->buf[pos] == '\\') { + if (buf->buf[pos] == '\\') { roff_expand_patch(buf, pos, "\\e", pos + 1); pos++; } @@ -1632,12 +1632,7 @@ roff_getarg(struct roff *r, char **cpp, cp++; break; case '\\': - /* - * Signal to roff_expand() that an escape - * sequence resulted from copy-in processing - * and needs to be checked or interpolated. - */ - cp[-pairs] = ASCII_ESC; + cp[-pairs] = '\\'; newesc = 1; pairs++; cp++; @@ -1694,7 +1689,7 @@ roff_getarg(struct roff *r, char **cpp, buf.buf = start; buf.sz = strlen(start) + 1; buf.next = NULL; - if (roff_expand(r, &buf, ln, 0, ASCII_ESC) & ROFF_IGN) { + if (roff_expand(r, &buf, ln, 0, '\\') & ROFF_IGN) { free(buf.buf); buf.buf = mandoc_strdup(""); } -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
[-- Attachment #1: Type: text/plain, Size: 6564 bytes --] On Thu, Oct 19, 2023 at 06:19:23PM +0200, Ingo Schwarze wrote: > Hi Alejandro, > > Alejandro Colomar wrote on Thu, Oct 19, 2023 at 05:17:10PM +0200: > > > I had this gripe with man(7) some years ago. I thought of using the > > following instead, which slightly complicates the source code, but makes > > it more logical. > > > > $ cat nested_indent.man > > .TH nested_indent 7 2023-10-19 experiments > > .SH Ingo said: > > .TP > > Todo > > Currently, when formatting .TP or .IP with a non-empty head, > > [yada yada] > > .RS > > .PP > > When formatting .IP or .RS with an empty head, mandoc needs > > [yada yada] > > .RE > > > > As you can see, here the indentation is controlled by a single RS/RE > > pair, and everything within it uses PP as a normal paragraph separator. > > While that also generates correct terminal and typographical (PS, PDF) > output in the same purely presentational sense as .TP .IP .TP, it > does not help with respect to the semantic problem we are discussing > here. > > Look at the AST generated by mandoc(1): > > $ mandoc -T tree nested_indent.man > title = "nested_indent" > sec = "7" > vol = "Miscellaneous Information Manual" > os = "experiments" > date = "2023-10-19" > > SH (block) *2:2 > SH (head) 2:2 ID=HREF=Ingo_said: > Ingo (text) 2:5 > said: (text) 2:10 > SH (body) 2:2 > TP (block) *3:2 > TP (head) 3:2 ID=HREF > Todo (text) *4:1 > TP (body) 4:1 > Currently, when formatting .TP or .IP \ > with a nonempty head, (text) *5:1 > [yada yada] (text) *6:1 > RS (block) *7:2 > RS (head) 7:2 > RS (body) 7:2 > PP (block) *8:2 > PP (head) 8:2 > PP (body) 8:2 > When formatting .IP or .RS with an empty head, > mandoc needs (text) *9:1 > [yada yada] (text) *10:1 > TP (block) *12:2 > TP (head) 12:2 ID=HREF=final > final tag (text) *13:1 > TP (body) 13:1 > final body (text) *14:1 > > You see that the first .TP, the .RS, and the second .TP are all child > nodes of the top-level .SH. The .RS is not a child of the .TP but > a sibling. The two .TP nodes still aren't siblings of each other. > > Now on first sight, you might blame me for that and call it a mandoc > artifact, arguing that mandoc instead ought to treat the .RS as a > child of the first .TP. But no, that would be incorrect parsing > for the following reason: the .TP inmplies an indentation, and > the .RS also implies an indentation. If the .RS were a child of > the .TP, we would get double indentation. You can make that > argument even more convincing by adding a width argument to .RS > and varying that argument. That way, you see that the .RS is > indented relative to the .SH, not relative to the .TP. > > There are some cases where it is not completely clear whether one > man(7) node following another man(7) node is a child or a sibling. > mandoc(1) makes arbitrary choices in such ambiguous cases, usually > opting for sibling relations where possible and avoiding unnecessary > child relationships. But this is not an ambiguous case. Just like > the .IP, the .RS is definitely a sibling and not a child of the .TP. > As i said, no block can nest inside .TP. > > That's why i brought up .RS in my reply and developed rules > for handling it in a similar way as .IP, even though you did > not mention .RS before. > > > You could put the RS before the first paragraph, but then an unwanted > > line break appears after the tag. > > No matter where you put the .RS, it will never be a child of .TP. > > > (Maybe man(7) could be tweaked so > > that RS doesn't insert the line break after a TP.) > > Not really a useful idea because .RS doesn't help with the actual > problem in the first place. > > > In the end I didn't switch to that scheme, because IP just worked, but > > I might consider it if it proves to be useful. What do you think? > > As i said, i am not aware of a better solution than .TP .IP .TP. > In particular, .RS is not better because it causes exactly the > same trouble and potentially more trouble besides. > > But i also said that trying to define "good style" for man(7) > is a fool's errand. Because man(7) code is so exceedingly difficult > to write, man(7) code that is very clearly bad style is very often > found in the wild, so there is ample opportunity for saying "this > is bad style." In some cases, it is also possible to point out > better style, for example > > .BR "some word" . > > is clearly better style than > > .B some word\c > \&. > > even though both are correct man(7) code and even though there are > situations in man(7) where \c is unavoidable. > > But very frequently, situations arise where man(7) doesn't really > allow any good solution, and the best you can do is not making > the source gratuitiously worse than it needs to be. > > The .TP .IP .TP idiom is such an example. It's definitely ugly from > both semantic and stylistic points of view, but no good solution > is available. I'm willing to go further and claim that no better > solution can be designed even if you are willing to introduce a new > macro or change the way the .TP API is defined, even in incompatible > ways, because it's not this particular macro that is broken. What is > broken is the fundamental design of the language: the language not > only predates the concept of semantic markup, but it also predates > the concept of block nesting in markup languages. Yes, that is hard > to believe for people born after 1970 because those people have > essentially grown up with HTML and LaTeX and those two markup > languages have defined their concept of what a markup language is, > but let's face it, man(7) predates those fundamental concepts, > and it shows all over the place. > > As long as people are using the language, mandoc(1) needs to somehow > deal with the mess. I'm not happy with that because it is wasting a > lot of development time which could be spent in more productive ways, > but what can i do... > > Yours, > Ingo Hi Ingo, Hmmm. You convinced me (about the problems of man(7)), I think. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
Hi Alejandro, Alejandro Colomar wrote on Thu, Oct 19, 2023 at 05:17:10PM +0200: > I had this gripe with man(7) some years ago. I thought of using the > following instead, which slightly complicates the source code, but makes > it more logical. > > $ cat nested_indent.man > .TH nested_indent 7 2023-10-19 experiments > .SH Ingo said: > .TP > Todo > Currently, when formatting .TP or .IP with a non-empty head, > [yada yada] > .RS > .PP > When formatting .IP or .RS with an empty head, mandoc needs > [yada yada] > .RE > > As you can see, here the indentation is controlled by a single RS/RE > pair, and everything within it uses PP as a normal paragraph separator. While that also generates correct terminal and typographical (PS, PDF) output in the same purely presentational sense as .TP .IP .TP, it does not help with respect to the semantic problem we are discussing here. Look at the AST generated by mandoc(1): $ mandoc -T tree nested_indent.man title = "nested_indent" sec = "7" vol = "Miscellaneous Information Manual" os = "experiments" date = "2023-10-19" SH (block) *2:2 SH (head) 2:2 ID=HREF=Ingo_said: Ingo (text) 2:5 said: (text) 2:10 SH (body) 2:2 TP (block) *3:2 TP (head) 3:2 ID=HREF Todo (text) *4:1 TP (body) 4:1 Currently, when formatting .TP or .IP \ with a nonempty head, (text) *5:1 [yada yada] (text) *6:1 RS (block) *7:2 RS (head) 7:2 RS (body) 7:2 PP (block) *8:2 PP (head) 8:2 PP (body) 8:2 When formatting .IP or .RS with an empty head, mandoc needs (text) *9:1 [yada yada] (text) *10:1 TP (block) *12:2 TP (head) 12:2 ID=HREF=final final tag (text) *13:1 TP (body) 13:1 final body (text) *14:1 You see that the first .TP, the .RS, and the second .TP are all child nodes of the top-level .SH. The .RS is not a child of the .TP but a sibling. The two .TP nodes still aren't siblings of each other. Now on first sight, you might blame me for that and call it a mandoc artifact, arguing that mandoc instead ought to treat the .RS as a child of the first .TP. But no, that would be incorrect parsing for the following reason: the .TP inmplies an indentation, and the .RS also implies an indentation. If the .RS were a child of the .TP, we would get double indentation. You can make that argument even more convincing by adding a width argument to .RS and varying that argument. That way, you see that the .RS is indented relative to the .SH, not relative to the .TP. There are some cases where it is not completely clear whether one man(7) node following another man(7) node is a child or a sibling. mandoc(1) makes arbitrary choices in such ambiguous cases, usually opting for sibling relations where possible and avoiding unnecessary child relationships. But this is not an ambiguous case. Just like the .IP, the .RS is definitely a sibling and not a child of the .TP. As i said, no block can nest inside .TP. That's why i brought up .RS in my reply and developed rules for handling it in a similar way as .IP, even though you did not mention .RS before. > You could put the RS before the first paragraph, but then an unwanted > line break appears after the tag. No matter where you put the .RS, it will never be a child of .TP. > (Maybe man(7) could be tweaked so > that RS doesn't insert the line break after a TP.) Not really a useful idea because .RS doesn't help with the actual problem in the first place. > In the end I didn't switch to that scheme, because IP just worked, but > I might consider it if it proves to be useful. What do you think? As i said, i am not aware of a better solution than .TP .IP .TP. In particular, .RS is not better because it causes exactly the same trouble and potentially more trouble besides. But i also said that trying to define "good style" for man(7) is a fool's errand. Because man(7) code is so exceedingly difficult to write, man(7) code that is very clearly bad style is very often found in the wild, so there is ample opportunity for saying "this is bad style." In some cases, it is also possible to point out better style, for example .BR "some word" . is clearly better style than .B some word\c \&. even though both are correct man(7) code and even though there are situations in man(7) where \c is unavoidable. But very frequently, situations arise where man(7) doesn't really allow any good solution, and the best you can do is not making the source gratuitiously worse than it needs to be. The .TP .IP .TP idiom is such an example. It's definitely ugly from both semantic and stylistic points of view, but no good solution is available. I'm willing to go further and claim that no better solution can be designed even if you are willing to introduce a new macro or change the way the .TP API is defined, even in incompatible ways, because it's not this particular macro that is broken. What is broken is the fundamental design of the language: the language not only predates the concept of semantic markup, but it also predates the concept of block nesting in markup languages. Yes, that is hard to believe for people born after 1970 because those people have essentially grown up with HTML and LaTeX and those two markup languages have defined their concept of what a markup language is, but let's face it, man(7) predates those fundamental concepts, and it shows all over the place. As long as people are using the language, mandoc(1) needs to somehow deal with the mess. I'm not happy with that because it is wasting a lot of development time which could be spent in more productive ways, but what can i do... Yours, Ingo -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
[-- Attachment #1: Type: text/plain, Size: 9447 bytes --] On Thu, Oct 19, 2023 at 04:45:21PM +0200, Ingo Schwarze wrote: > Hi Alejandro, > > Alejandro Colomar wrote on Mon, Oct 16, 2023 at 07:22:11PM +0200: > > On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote: > >> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200: > > >>> groff -man -Thtml seems to never produce a blank line before a TP. > >> What do you mean by "blank line"? > > What my eyes experience as a relatively large inter-paragraph space. > > Heh. That's not a very useful definition when talking about HTML code, > given that the HTML language does not provide any non-deprecated syntax > or semantics related to paragraph spacing. :) I know. :) > > >>> mandoc -man -Thtml produces one in some cases, and I can't see a > >>> pattern. > >>> > >>> I found this bug while reading feature_test_macros(7) in the Debian > >>> online manpages: > >>> <https://manpages.debian.org/bullseye/manpages/ftm.7.en.html> > > >> You can see that particular page rendered here: > >> https://man.bsd.lv/Test/ftm.7 > > > I don't see the bug there. I'm going to guess it's just another case of > > a missing CSS file. > > Actually, there *is* a problem with the HTML code in that page, > even though you did not see it because the CSS file hides it. > > That page contains this HTML code: > > <dl class="Bl-tag"> > <dt><b>_ISOC11_SOURCE</b> (since glibc 2.16)</dt> > <dd>Exposes declarations consistent with the ISO C11 standard. > Defining this macro also enables C99 and C95 features > (like <b>_ISOC99_SOURCE</b>).</dd> > </dl> > <dl class="Bl-tag"> > <dt></dt> > <dd>Invoking the C compiler with the option <i>-std=c11</i> > produces the same effects as defining this macro.</dd> > </dl> > <dl class="Bl-tag"> > <dt><b>_LARGEFILE64_SOURCE</b></dt> > <dd>Expose definitions for the alternative API specified by the > LFS (Large File Summit) as a "transitional extension" > to the Single UNIX Specification. [...] > > That's of course quite nonsensical because _ISOC11_SOURCE and > _LARGEFILE64_SOURCE are intended by the page author as adjacent > entries in the same list, but mandoc(1) puts them into different > lists, with yet another single-item list in between. > > >> It is a long page, and i have been unable to figure out what exactly > >> you are talking about. > >> > >> Please point me to the precise position in the file where vertical > >> spacing before a .TP macro feels lacking or execessive to you, > > > In the Debian bullseye page, check the inter-paragraph space before the > > tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag > > _LARGEFILE_SOURCE. > > Now i see, thank you for pointing me to the specific place. > > The trouble is caused by the following man(7) idiom: > > .TP > first tag > first body > .IP > still in the first body > .TP > second tag > second body Hmm, that's an old enemy showing up. > > The author's intent here is that the two .TP macros mark up adjacent > items in the same list and the .IP marks up an ordinary paragraph > break within the item body of the first list entry. > > Now, using .IP for an ordinary paragraph break is no doubt surprising, > but it works (from a purely presentational point of view) because .IP > does assert the same vertical spacing as .PP would and because .IP > asserts the same indentation as the previous .TP did. > > So logically, what the author wanted was a list with two entries, > one containing two paragraphs of text, the other containing one > paragraph of text. Technically, what we got is three paragraphs of > text, all indented by the same amount, the first and last containing > a tag and the middle one having an empty tag, with no indication that > there is any relation between any of the paragraphs, let alone that > they form a list. Figuring out the logical list structure, if any, > is left as a guessing exercise to the formatter. > > Here the conceptual inadequacy of the man(7) language becomes > blatantly obvious. With very few exceptions, the language does not > provide any concept of block nesting. The only exception is that > various macros can be nested inside .SH, .SS, and .RS. But nothing > can be nested inside a list item body, neither a paragraph break, > nor .RS, nor another list. I had this gripe with man(7) some years ago. I thought of using the following instead, which slightly complicates the source code, but makes it more logical. $ cat nested_indent.man .TH nested_indent 7 2023-10-19 experiments .SH Ingo said: .TP Todo Currently, when formatting .TP or .IP with a non-empty head, the HTML formatter looks at the previous and at the following abstract syntax tree (AST) node to figure out whether the tagged paragraph is part of a list. If that previous or follwing AST node is .IP or .RS with an empty head, it will have to iterate until it finds an AST node that is neither .IP nor .RS or has a non-empty head, evaluating the properties of that node instead of the directly preceding or following node. .RS .PP When formatting .IP or .RS with an empty head, mandoc needs to iterate backwards, searching for an AST node that is neither \&.IP nor .RS or has a non-empty head, and figure out whether that node is a list item, which again, as explained above, requires iterating both forwards and backwards. If it turns out we are inside a list, interrupting the list must be prevented. Instead, .IP with an empty head must be formatted like .PP, and .RS with an empty head must be formatted somewhat like .br. .RE As you can see, here the indentation is controlled by a single RS/RE pair, and everything within it uses PP as a normal paragraph separator. You could put the RS before the first paragraph, but then an unwanted line break appears after the tag. (Maybe man(7) could be tweaked so that RS doesn't insert the line break after a TP.) In the end I didn't switch to that scheme, because IP just worked, but I might consider it if it proves to be useful. What do you think? [CC += Branden, in case he wants to add his opinion too] Cheers, Alex > > Consequently, i think the fundamental design of the man(7) language > is too weak to adequately express a list item containing more than a > single paragraph of text, and the crude presentational workaround of > splitting the item body into two unrelated paragraphs with different > introducing macros indeed looks like the best workaround available. > > The longer i look at the man(7) language, the more convinced i become > that it is so rotten to the core that trying to provide a style > guide to write good man(7) pages is nothing but a fool's errand, > and trying to add a few semantical macros to the man(7) language is > an even worse fool's errand because a few additions won't cure the > fundamental design flaws. If you put lipstick on a pig, it's still > not going to win any Miss Espana contest. > > Let me quote only one other example that i ran into just today. > The first real-world example of .MR usage i encountered required > a trailing \c escape sequence on the preceding line. Now how > ironic is that? A brand-new macro introduced to improve semantics, > but using is requires terribly arcane low-level presentational > markup. I'm progressively becoming convinced that the language > is irredeemable. > > > Consequently, the following needs to be done in mandoc: > > 1. Currently, when formatting .TP or .IP with a non-empty head, > the HTML formatter looks at the previous and at the following > abstract syntax tree (AST) node to figure out whether the > tagged paragraph is part of a list. > If that previous or follwing AST node is .IP or .RS with an > empty head, it will have to iterate until it finds an AST node > that is neither .IP nor .RS or has a non-empty head, evaluating > the properties of that node instead of the directly preceding > or following node. > > 2. When formatting .IP or .RS with an empty head, mandoc needs > to iterate backwards, searching for an AST node that is neither > .IP nor .RS or has a non-empty head, and figure out whether that > node is a list item, which again, as explained above, requires > iterating both forwards and backwards. > If it turns out we are inside a list, interrupting the list > must be prevented. Instead, .IP with an empty head must be > formatted like .PP, and .RS with an empty head must be formatted > somewhat like .br. > > Probably, doing all this in the HTML formatter module would be > over the top. I believe such complicated AST inspection should > be done by the validation module (man_validate.c), which should > set AST node flags similar to > > - this node starts a new list > - this node starts a new list item > - this node merely indicates a paragraph break > - this node ends a list > > which the fotmatters can then readily use. > > This required logic is so complicated that i won't code it right > away, there are more urgent matters to be taken care of. > Instead, i will add it to the mandoc TODO list. > > Yours, > Ingo -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
Hi Alejandro,
Ingo Schwarze wrote on Thu, Oct 19, 2023 at 04:45:21PM +0200:
> Consequently, the following needs to be done in mandoc:
>
> 1. Currently, when formatting .TP or .IP with a non-empty head,
> the HTML formatter looks at the previous and at the following
> abstract syntax tree (AST) node to figure out whether the
> tagged paragraph is part of a list.
> If that previous or follwing AST node is .IP or .RS with an
> empty head, it will have to iterate until it finds an AST node
> that is neither .IP nor .RS or has a non-empty head, evaluating
> the properties of that node instead of the directly preceding
> or following node.
>
> 2. When formatting .IP or .RS with an empty head, mandoc needs
> to iterate backwards, searching for an AST node that is neither
> .IP nor .RS or has a non-empty head, and figure out whether that
> node is a list item, which again, as explained above, requires
> iterating both forwards and backwards.
> If it turns out we are inside a list, interrupting the list
> must be prevented. Instead, .IP with an empty head must be
> formatted like .PP, and .RS with an empty head must be formatted
> somewhat like .br.
>
> Probably, doing all this in the HTML formatter module would be
> over the top. I believe such complicated AST inspection should
> be done by the validation module (man_validate.c), which should
> set AST node flags similar to
>
> - this node starts a new list
> - this node starts a new list item
> - this node merely indicates a paragraph break
> - this node ends a list
>
> which the fotmatters can then readily use.
>
> This required logic is so complicated that i won't code it right
> away, there are more urgent matters to be taken care of.
> Instead, i will add it to the mandoc TODO list.
FYI: Added, see the commit below.
Ingo
Log Message:
-----------
new entries: .MR and .TP .IP .TP
Modified Files:
--------------
mandoc:
TODO
Revision Data
-------------
Index: TODO
===================================================================
RCS file: /home/cvs/mandoc/mandoc/TODO,v
retrieving revision 1.329
retrieving revision 1.330
diff -LTODO -LTODO -u -p -r1.329 -r1.330
--- TODO
+++ TODO
@@ -239,6 +239,9 @@ are mere guesses, and some may be wrong.
--- missing man features -----------------------------------------------
+- groff_man(7) .MR
+ loc ** exist * algo * size * imp ***
+
- MANWIDTH
Markus Waldeck <waldeck at gmx dot de> 9 Jun 2015 05:49:56 +0200
Laura Morales <lauretas at mail dot com> 26 Apr 2018 08:15:55 +0200
@@ -504,6 +507,10 @@ are mere guesses, and some may be wrong.
loc ** exist ** algo ** size * imp **
--- HTML issues --------------------------------------------------------
+
+- support the idiom .TP .IP .TP for multi-paragraph list item bodies
+ to: Alejandro Colomar Thu, 19 Oct 2023 16:45:21 +0200
+ loc ** exist ** algo ** size ** imp **
- .Nm without an argument and .Bx cause premature </pre>
Nab Sun, 5 Jun 2022 18:30:09 +0200
--
To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
Hi Alejandro, Alejandro Colomar wrote on Mon, Oct 16, 2023 at 07:22:11PM +0200: > On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote: >> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200: >>> groff -man -Thtml seems to never produce a blank line before a TP. >> What do you mean by "blank line"? > What my eyes experience as a relatively large inter-paragraph space. Heh. That's not a very useful definition when talking about HTML code, given that the HTML language does not provide any non-deprecated syntax or semantics related to paragraph spacing. :) >>> mandoc -man -Thtml produces one in some cases, and I can't see a >>> pattern. >>> >>> I found this bug while reading feature_test_macros(7) in the Debian >>> online manpages: >>> <https://manpages.debian.org/bullseye/manpages/ftm.7.en.html> >> You can see that particular page rendered here: >> https://man.bsd.lv/Test/ftm.7 > I don't see the bug there. I'm going to guess it's just another case of > a missing CSS file. Actually, there *is* a problem with the HTML code in that page, even though you did not see it because the CSS file hides it. That page contains this HTML code: <dl class="Bl-tag"> <dt><b>_ISOC11_SOURCE</b> (since glibc 2.16)</dt> <dd>Exposes declarations consistent with the ISO C11 standard. Defining this macro also enables C99 and C95 features (like <b>_ISOC99_SOURCE</b>).</dd> </dl> <dl class="Bl-tag"> <dt></dt> <dd>Invoking the C compiler with the option <i>-std=c11</i> produces the same effects as defining this macro.</dd> </dl> <dl class="Bl-tag"> <dt><b>_LARGEFILE64_SOURCE</b></dt> <dd>Expose definitions for the alternative API specified by the LFS (Large File Summit) as a "transitional extension" to the Single UNIX Specification. [...] That's of course quite nonsensical because _ISOC11_SOURCE and _LARGEFILE64_SOURCE are intended by the page author as adjacent entries in the same list, but mandoc(1) puts them into different lists, with yet another single-item list in between. >> It is a long page, and i have been unable to figure out what exactly >> you are talking about. >> >> Please point me to the precise position in the file where vertical >> spacing before a .TP macro feels lacking or execessive to you, > In the Debian bullseye page, check the inter-paragraph space before the > tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag > _LARGEFILE_SOURCE. Now i see, thank you for pointing me to the specific place. The trouble is caused by the following man(7) idiom: .TP first tag first body .IP still in the first body .TP second tag second body The author's intent here is that the two .TP macros mark up adjacent items in the same list and the .IP marks up an ordinary paragraph break within the item body of the first list entry. Now, using .IP for an ordinary paragraph break is no doubt surprising, but it works (from a purely presentational point of view) because .IP does assert the same vertical spacing as .PP would and because .IP asserts the same indentation as the previous .TP did. So logically, what the author wanted was a list with two entries, one containing two paragraphs of text, the other containing one paragraph of text. Technically, what we got is three paragraphs of text, all indented by the same amount, the first and last containing a tag and the middle one having an empty tag, with no indication that there is any relation between any of the paragraphs, let alone that they form a list. Figuring out the logical list structure, if any, is left as a guessing exercise to the formatter. Here the conceptual inadequacy of the man(7) language becomes blatantly obvious. With very few exceptions, the language does not provide any concept of block nesting. The only exception is that various macros can be nested inside .SH, .SS, and .RS. But nothing can be nested inside a list item body, neither a paragraph break, nor .RS, nor another list. Consequently, i think the fundamental design of the man(7) language is too weak to adequately express a list item containing more than a single paragraph of text, and the crude presentational workaround of splitting the item body into two unrelated paragraphs with different introducing macros indeed looks like the best workaround available. The longer i look at the man(7) language, the more convinced i become that it is so rotten to the core that trying to provide a style guide to write good man(7) pages is nothing but a fool's errand, and trying to add a few semantical macros to the man(7) language is an even worse fool's errand because a few additions won't cure the fundamental design flaws. If you put lipstick on a pig, it's still not going to win any Miss Espana contest. Let me quote only one other example that i ran into just today. The first real-world example of .MR usage i encountered required a trailing \c escape sequence on the preceding line. Now how ironic is that? A brand-new macro introduced to improve semantics, but using is requires terribly arcane low-level presentational markup. I'm progressively becoming convinced that the language is irredeemable. Consequently, the following needs to be done in mandoc: 1. Currently, when formatting .TP or .IP with a non-empty head, the HTML formatter looks at the previous and at the following abstract syntax tree (AST) node to figure out whether the tagged paragraph is part of a list. If that previous or follwing AST node is .IP or .RS with an empty head, it will have to iterate until it finds an AST node that is neither .IP nor .RS or has a non-empty head, evaluating the properties of that node instead of the directly preceding or following node. 2. When formatting .IP or .RS with an empty head, mandoc needs to iterate backwards, searching for an AST node that is neither .IP nor .RS or has a non-empty head, and figure out whether that node is a list item, which again, as explained above, requires iterating both forwards and backwards. If it turns out we are inside a list, interrupting the list must be prevented. Instead, .IP with an empty head must be formatted like .PP, and .RS with an empty head must be formatted somewhat like .br. Probably, doing all this in the HTML formatter module would be over the top. I believe such complicated AST inspection should be done by the validation module (man_validate.c), which should set AST node flags similar to - this node starts a new list - this node starts a new list item - this node merely indicates a paragraph break - this node ends a list which the fotmatters can then readily use. This required logic is so complicated that i won't code it right away, there are more urgent matters to be taken care of. Instead, i will add it to the mandoc TODO list. Yours, Ingo -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
[-- Attachment #1: Type: text/plain, Size: 4249 bytes --] Hi Ingo, On Thu, Oct 19, 2023 at 01:59:22PM +0200, Ingo Schwarze wrote: > Hi Alejandro, > > Ingo Schwarze wrote on Wed, Oct 18, 2023 at 02:04:46AM +0200: > > > 1. change the Makefile to always install mandoc.css > > Done with the commit appended below. > > I believe all issues discovered in the thread "unwanted line break > after bullet" have now been fixed. In case i missed something, > please speak up! Thanks! Not as far as I can see. Cheers, Alex > > Yours, > Ingo > > > Log Message: > ----------- > Install mandoc.css by default even if man.cgi(8) is not built. > It matters because users of "mandoc -T html" typically need it. > > Issue found in a conversation with Alejandro Colomar <alx at kernel aot org>. > > Modified Files: > -------------- > mandoc: > Makefile > configure > configure.local.example > > Revision Data > ------------- > Index: configure > =================================================================== > RCS file: /home/cvs/mandoc/mandoc/configure,v > retrieving revision 1.82 > retrieving revision 1.83 > diff -Lconfigure -Lconfigure -u -p -r1.82 -r1.83 > --- configure > +++ configure > @@ -111,6 +111,7 @@ BIN_FROM_SBIN= > INCLUDEDIR= > LIBDIR= > MANDIR= > +MISCDIR= > READ_ALLOWED_PATH= > > WWWPREFIX="/var/www" > @@ -620,6 +621,7 @@ exec > Makefile.local > [ -z "${INCLUDEDIR}" ] && INCLUDEDIR="${PREFIX}/include/mandoc" > [ -z "${LIBDIR}" ] && LIBDIR="${PREFIX}/lib/mandoc" > [ -z "${MANDIR}" ] && MANDIR="${PREFIX}/man" > +[ -z "${MISCDIR}" ] && MISCDIR="${PREFIX}/share/misc" > > [ -z "${HTDOCDIR}" ] && HTDOCDIR="${WWWPREFIX}/htdocs" > [ -z "${CGIBINDIR}" ] && CGIBINDIR="${WWWPREFIX}/cgi-bin" > @@ -658,6 +660,7 @@ BIN_FROM_SBIN = ${BIN_FROM_SBIN} > INCLUDEDIR = ${INCLUDEDIR} > LIBDIR = ${LIBDIR} > MANDIR = ${MANDIR} > +MISCDIR = ${MISCDIR} > WWWPREFIX = ${WWWPREFIX} > HTDOCDIR = ${HTDOCDIR} > CGIBINDIR = ${CGIBINDIR} > Index: Makefile > =================================================================== > RCS file: /home/cvs/mandoc/mandoc/Makefile,v > retrieving revision 1.542 > retrieving revision 1.543 > diff -LMakefile -LMakefile -u -p -r1.542 -r1.543 > --- Makefile > +++ Makefile > @@ -416,6 +416,7 @@ base-install: mandoc demandoc soelim > mkdir -p $(DESTDIR)$(MANDIR)/man5 > mkdir -p $(DESTDIR)$(MANDIR)/man7 > mkdir -p $(DESTDIR)$(MANDIR)/man8 > + mkdir -p $(DESTDIR)$(MISCDIR) > $(INSTALL_PROGRAM) mandoc demandoc $(DESTDIR)$(BINDIR) > $(INSTALL_PROGRAM) soelim $(DESTDIR)$(BINDIR)/$(BINM_SOELIM) > cd $(DESTDIR)$(BINDIR) && $(LN) mandoc $(BINM_MAN) > @@ -438,6 +439,7 @@ base-install: mandoc demandoc soelim > $(INSTALL_MAN) mandoc_char.7 $(DESTDIR)$(MANDIR)/man7 > $(INSTALL_MAN) makewhatis.8 \ > $(DESTDIR)$(MANDIR)/man8/$(BINM_MAKEWHATIS).8 > + $(INSTALL_DATA) mandoc.css $(DESTDIR)$(MISCDIR) > > lib-install: libmandoc.a > mkdir -p $(DESTDIR)$(LIBDIR) > Index: configure.local.example > =================================================================== > RCS file: /home/cvs/mandoc/mandoc/configure.local.example,v > retrieving revision 1.44 > retrieving revision 1.45 > diff -Lconfigure.local.example -Lconfigure.local.example -u -p -r1.44 -r1.45 > --- configure.local.example > +++ configure.local.example > @@ -108,11 +108,15 @@ OSNAME="OpenBSD 7.0" > # there is no need to copy the whole block. > # Even if you set PREFIX to something else, the other variables > # pick it up without copying them all over. > +# MISCDIR is only used for installing the file mandoc.css. > +# That is important because users of "mandoc -T html" often need it > +# even if they are not using man.cgi(8), see mandoc(1) for details. > > PREFIX="/usr/local" > BINDIR="${PREFIX}/bin" > SBINDIR="${PREFIX}/sbin" > MANDIR="${PREFIX}/man" > +MISCDIR="${PREFIX}/share/misc" > > # If BINDIR and SBINDIR are not subdirectories of the same parent > # directory or if the basename(1) of BINDIR differs from "bin", > -- > To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv > -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
Hi Alejandro,
Ingo Schwarze wrote on Wed, Oct 18, 2023 at 02:04:46AM +0200:
> 1. change the Makefile to always install mandoc.css
Done with the commit appended below.
I believe all issues discovered in the thread "unwanted line break
after bullet" have now been fixed. In case i missed something,
please speak up!
Yours,
Ingo
Log Message:
-----------
Install mandoc.css by default even if man.cgi(8) is not built.
It matters because users of "mandoc -T html" typically need it.
Issue found in a conversation with Alejandro Colomar <alx at kernel aot org>.
Modified Files:
--------------
mandoc:
Makefile
configure
configure.local.example
Revision Data
-------------
Index: configure
===================================================================
RCS file: /home/cvs/mandoc/mandoc/configure,v
retrieving revision 1.82
retrieving revision 1.83
diff -Lconfigure -Lconfigure -u -p -r1.82 -r1.83
--- configure
+++ configure
@@ -111,6 +111,7 @@ BIN_FROM_SBIN=
INCLUDEDIR=
LIBDIR=
MANDIR=
+MISCDIR=
READ_ALLOWED_PATH=
WWWPREFIX="/var/www"
@@ -620,6 +621,7 @@ exec > Makefile.local
[ -z "${INCLUDEDIR}" ] && INCLUDEDIR="${PREFIX}/include/mandoc"
[ -z "${LIBDIR}" ] && LIBDIR="${PREFIX}/lib/mandoc"
[ -z "${MANDIR}" ] && MANDIR="${PREFIX}/man"
+[ -z "${MISCDIR}" ] && MISCDIR="${PREFIX}/share/misc"
[ -z "${HTDOCDIR}" ] && HTDOCDIR="${WWWPREFIX}/htdocs"
[ -z "${CGIBINDIR}" ] && CGIBINDIR="${WWWPREFIX}/cgi-bin"
@@ -658,6 +660,7 @@ BIN_FROM_SBIN = ${BIN_FROM_SBIN}
INCLUDEDIR = ${INCLUDEDIR}
LIBDIR = ${LIBDIR}
MANDIR = ${MANDIR}
+MISCDIR = ${MISCDIR}
WWWPREFIX = ${WWWPREFIX}
HTDOCDIR = ${HTDOCDIR}
CGIBINDIR = ${CGIBINDIR}
Index: Makefile
===================================================================
RCS file: /home/cvs/mandoc/mandoc/Makefile,v
retrieving revision 1.542
retrieving revision 1.543
diff -LMakefile -LMakefile -u -p -r1.542 -r1.543
--- Makefile
+++ Makefile
@@ -416,6 +416,7 @@ base-install: mandoc demandoc soelim
mkdir -p $(DESTDIR)$(MANDIR)/man5
mkdir -p $(DESTDIR)$(MANDIR)/man7
mkdir -p $(DESTDIR)$(MANDIR)/man8
+ mkdir -p $(DESTDIR)$(MISCDIR)
$(INSTALL_PROGRAM) mandoc demandoc $(DESTDIR)$(BINDIR)
$(INSTALL_PROGRAM) soelim $(DESTDIR)$(BINDIR)/$(BINM_SOELIM)
cd $(DESTDIR)$(BINDIR) && $(LN) mandoc $(BINM_MAN)
@@ -438,6 +439,7 @@ base-install: mandoc demandoc soelim
$(INSTALL_MAN) mandoc_char.7 $(DESTDIR)$(MANDIR)/man7
$(INSTALL_MAN) makewhatis.8 \
$(DESTDIR)$(MANDIR)/man8/$(BINM_MAKEWHATIS).8
+ $(INSTALL_DATA) mandoc.css $(DESTDIR)$(MISCDIR)
lib-install: libmandoc.a
mkdir -p $(DESTDIR)$(LIBDIR)
Index: configure.local.example
===================================================================
RCS file: /home/cvs/mandoc/mandoc/configure.local.example,v
retrieving revision 1.44
retrieving revision 1.45
diff -Lconfigure.local.example -Lconfigure.local.example -u -p -r1.44 -r1.45
--- configure.local.example
+++ configure.local.example
@@ -108,11 +108,15 @@ OSNAME="OpenBSD 7.0"
# there is no need to copy the whole block.
# Even if you set PREFIX to something else, the other variables
# pick it up without copying them all over.
+# MISCDIR is only used for installing the file mandoc.css.
+# That is important because users of "mandoc -T html" often need it
+# even if they are not using man.cgi(8), see mandoc(1) for details.
PREFIX="/usr/local"
BINDIR="${PREFIX}/bin"
SBINDIR="${PREFIX}/sbin"
MANDIR="${PREFIX}/man"
+MISCDIR="${PREFIX}/share/misc"
# If BINDIR and SBINDIR are not subdirectories of the same parent
# directory or if the basename(1) of BINDIR differs from "bin",
--
To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
[-- Attachment #1: Type: text/plain, Size: 1121 bytes --] Hi Ingo, On Wed, Oct 18, 2023 at 11:31:48PM +0200, Ingo Schwarze wrote: > Hi Alejandro, > > Alejandro Colomar wrote on Wed, Oct 18, 2023 at 10:09:56PM +0200: > > > As you know, I have plans to start using MR from GNU groff. But doing > > so without wide support from distros and formatters would be stupid. > [...] > > Do you have plans to add support for the MR macro? > > Yes, it will be added. It shouldn't bee too difficult because it's > essentially a clone of mdoc(7) .Xr. Also, it is somewhat urgent > because the groff manual pages already use it, so i have to > support it in mandoc(1) before i can update the OpenBSD groff port > to 1.23. Thanks! > > The MF string register will be ignored, though. mandoc(1) will > always set the topic (or to use a more precise term, the *name*) > of the manual page linked to in the R font. Sounds reasonable to me, for consistency with mandoc(1)'s formatting of mdoc(7). Cheers, Alex > > Yours, > Ingo > -- > To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv > -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
Hi Alejandro, Alejandro Colomar wrote on Wed, Oct 18, 2023 at 10:09:56PM +0200: > As you know, I have plans to start using MR from GNU groff. But doing > so without wide support from distros and formatters would be stupid. [...] > Do you have plans to add support for the MR macro? Yes, it will be added. It shouldn't bee too difficult because it's essentially a clone of mdoc(7) .Xr. Also, it is somewhat urgent because the groff manual pages already use it, so i have to support it in mandoc(1) before i can update the OpenBSD groff port to 1.23. The MF string register will be ignored, though. mandoc(1) will always set the topic (or to use a more precise term, the *name*) of the manual page linked to in the R font. Yours, Ingo -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
[-- Attachment #1: Type: text/plain, Size: 736 bytes --] Hi Ingo, I've asked you in other lists, but got no answer, so let's do it in the right list. :) As you know, I have plans to start using MR from GNU groff. But doing so without wide support from distros and formatters would be stupid. Regarding distros, I'm waiting until some distros such as Gentoo declare the 1.23.0 package as 'stable' (currently it shows as testing[1]). Regarding formatters, groff(1) of course provides support, but the other one that interests Linux distros is mandoc(1), to have support for void and other distros. Do you have plans to add support for the MR macro? Thanks, Alex [1] <https://packages.gentoo.org/packages/sys-apps/groff> -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]