* 1.11.7 minor issues
From: Ingo Schwarze @ 2011-09-18 23:39 UTC
To: tech

Hi,

after fixing the two larger problems, systematic comparisons
revealed two smaller issues that are new in 1.11.7:

> @@ -52,8 +52,8 @@
>       -center:off
>       -center=off
>       -center-
> -     Lynx recognizes "1", "+", "on" and "true" for true values, and "0",
> -     "-", "off" and "false" for false values. Other option-values are
> +     Lynx recognizes "1", "+", "on" and "true" for true values, and "0", "-
> +     ", "off" and "false" for false values. Other option-values are
>       ignored.

> @@ -109,8 +109,8 @@
>       Many folks attempt a simple-minded regular expression approach, like
>       "s/<.*?>//g", but that fails in many cases because the tags may
>       continue over line breaks, they may contain quoted angle-brackets, or
> -     HTML comment may be present. Plus, folks forget to convert
> -     entities--like "<" for example.
> +     HTML comment may be present. Plus, folks forget to convert entities--
> +     like "<" for example.
>
>       Here's one "simple-minded" approach, that works for most files:

I will think about those two tomorrow.

Yours,
  Ingo
--
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

* Re: 1.11.7 minor issues
From: Kristaps Dzonsons @ 2011-09-18 23:54 UTC
To: tech; +Cc: Ingo Schwarze

On 19/09/2011 01:39, Ingo Schwarze wrote:
> Hi,
>
> after fixing the two larger problems, systematic comparisons
> revealed two smaller issues that are new in 1.11.7:
>
>> @@ -52,8 +52,8 @@
>>       -center:off
>>       -center=off
>>       -center-
>> -     Lynx recognizes "1", "+", "on" and "true" for true values, and "0",
>> -     "-", "off" and "false" for false values. Other option-values are
>> +     Lynx recognizes "1", "+", "on" and "true" for true values, and "0", "-
>> +     ", "off" and "false" for false values. Other option-values are
>>       ignored.
>
>> @@ -109,8 +109,8 @@
>>       Many folks attempt a simple-minded regular expression approach, like
>>       "s/<.*?>//g", but that fails in many cases because the tags may
>>       continue over line breaks, they may contain quoted angle-brackets, or
>> -     HTML comment may be present. Plus, folks forget to convert
>> -     entities--like "<" for example.
>> +     HTML comment may be present. Plus, folks forget to convert entities--
>> +     like "<" for example.
>>
>>       Here's one "simple-minded" approach, that works for most files:
>
> I will think about those two tomorrow.

Ingo,

Around line 577 in roff.c is where mandoc_hyph ended up: the quotes
need to be added.

As for the second one, we should bring jmc@ in, no? I'd think that
double or triple-dashes would be broken. Unicode, for one,

http://www.cs.tut.fi/~jkorpela/dashes.html#linebreaks

stipulates that en and em dashes break the line.

Thoughts?

Kristaps

* Re: 1.11.7 minor issues
From: Ingo Schwarze @ 2011-09-19 7:58 UTC
To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, Sep 19, 2011 at 01:54:39AM +0200:
> On 19/09/2011 01:39, Ingo Schwarze wrote:

>> after fixing the two larger problems, systematic comparisons
>> revealed two smaller issues that are new in 1.11.7:

>>>@@ -52,8 +52,8 @@
>>>      -center:off
>>>      -center=off
>>>      -center-
>>>-     Lynx recognizes "1", "+", "on" and "true" for true values, and "0",
>>>-     "-", "off" and "false" for false values. Other option-values are
>>>+     Lynx recognizes "1", "+", "on" and "true" for true values, and "0", "-
>>>+     ", "off" and "false" for false values. Other option-values are
>>>      ignored.

>>>@@ -109,8 +109,8 @@
>>>      Many folks attempt a simple-minded regular expression approach, like
>>>      "s/<.*?>//g", but that fails in many cases because the tags may
>>>      continue over line breaks, they may contain quoted angle-brackets, or
>>>-     HTML comment may be present. Plus, folks forget to convert
>>>-     entities--like "<" for example.
>>>+     HTML comment may be present. Plus, folks forget to convert entities--
>>>+     like "<" for example.
>>>
>>>      Here's one "simple-minded" approach, that works for most files:

>> I will think about those two tomorrow.

> Around line 577 in roff.c is where mandoc_hyph ended up: the quotes
> need to be added.
>
> As for the second one, we should bring jmc@ in, no? I'd think that
> double or triple-dashes would be broken. Unicode, for one,
>
> http://www.cs.tut.fi/~jkorpela/dashes.html#linebreaks
>
> stipulates that en and em dashes break the line.
>
> Thoughts?

I think you are right that breaking at double dashes ought to be ok.
However, groff doesn't break there, i don't consider the point of
sufficient importance to deviate from groff, and not breaking at
double hyphens keeps the code simpler. I have checked for all
non-alpha ASCII characters that groff indeed doesn't break the line
if they precede or follow a dash.

So, here is what i have done for now - OK?


CVSROOT:	/cvs
Module name:	src
Changes by:	schwarze@cvs.openbsd.org	2011/09/19 01:53:54

Modified files:
	usr.bin/mandoc : roff.c

Log message:
Breaking the line at a hyphen is only allowed if the hyphen
is both preceded and followed by an alphabetic character.
This fixes about a dozen places in base.


Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.43
diff -u -p -r1.43 roff.c
--- roff.c	18 Sep 2011 23:26:18 -0000	1.43
+++ roff.c	19 Sep 2011 07:49:59 -0000
@@ -552,7 +552,6 @@ again:
 static enum rofferr
 roff_parsetext(char *p)
 {
-	char		 l, r;
 	size_t		 sz;
 	const char	*start;
 	enum mandoc_esc	 esc;
@@ -579,14 +578,8 @@ roff_parsetext(char *p)
 			continue;
 		}
 
-		l = *(p - 1);
-		r = *(p + 1);
-		if ('\\' != l &&
-		    '\t' != r && '\t' != l &&
-		    ' ' != r && ' ' != l &&
-		    '-' != r && '-' != l &&
-		    ! isdigit((unsigned char)l) &&
-		    ! isdigit((unsigned char)r))
+		if (isalpha((unsigned char)p[-1]) &&
+		    isalpha((unsigned char)p[1]))
 			*p = ASCII_HYPH;
 		p++;
 	}
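
The rule stated in the log message above can be tried out in isolation. What follows is only a minimal sketch under stated assumptions, not the mandoc code: the function name mark_breakable_hyphens and the marker BREAKABLE_HYPH are hypothetical stand-ins (the patch itself writes ASCII_HYPH), and unlike roff_parsetext() the sketch makes no attempt to skip roff escape sequences.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical marker for a breakable hyphen; the patch uses ASCII_HYPH. */
#define BREAKABLE_HYPH '\036'

/*
 * Replace each '-' that is both preceded and followed by an
 * alphabetic character with BREAKABLE_HYPH.
 * Returns the number of hyphens that were marked.
 * Assumption: plain text only; escape sequences are not handled.
 */
static size_t
mark_breakable_hyphens(char *s)
{
	size_t	 i, n;

	n = 0;
	for (i = 0; s[i] != '\0'; i++) {
		/* A hyphen at the very start has no predecessor. */
		if (s[i] != '-' || i == 0)
			continue;
		/* At the end of the string, s[i + 1] is '\0', not alpha. */
		if (isalpha((unsigned char)s[i - 1]) &&
		    isalpha((unsigned char)s[i + 1])) {
			s[i] = BREAKABLE_HYPH;
			n++;
		}
	}
	return n;
}
```

With this rule, the hyphen in "well-known" is marked as a break opportunity, while the quoted "-" from the lynx(1) page and both hyphens of "entities--like" are left alone, which is the outcome the two reported diffs ask for.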

* Re: 1.11.7 minor issues
From: Kristaps Dzonsons @ 2011-09-19 8:10 UTC
To: tech

On 19/09/2011 09:58, Ingo Schwarze wrote:
> Hi Kristaps,
>
> Kristaps Dzonsons wrote on Mon, Sep 19, 2011 at 01:54:39AM +0200:
>> On 19/09/2011 01:39, Ingo Schwarze wrote:
>
>>> after fixing the two larger problems, systematic comparisons
>>> revealed two smaller issues that are new in 1.11.7:
>
>>>> @@ -52,8 +52,8 @@
>>>>       -center:off
>>>>       -center=off
>>>>       -center-
>>>> -     Lynx recognizes "1", "+", "on" and "true" for true values, and "0",
>>>> -     "-", "off" and "false" for false values. Other option-values are
>>>> +     Lynx recognizes "1", "+", "on" and "true" for true values, and "0", "-
>>>> +     ", "off" and "false" for false values. Other option-values are
>>>>       ignored.
>
>>>> @@ -109,8 +109,8 @@
>>>>       Many folks attempt a simple-minded regular expression approach, like
>>>>       "s/<.*?>//g", but that fails in many cases because the tags may
>>>>       continue over line breaks, they may contain quoted angle-brackets, or
>>>> -     HTML comment may be present. Plus, folks forget to convert
>>>> -     entities--like "<" for example.
>>>> +     HTML comment may be present. Plus, folks forget to convert entities--
>>>> +     like "<" for example.
>>>>
>>>>       Here's one "simple-minded" approach, that works for most files:
>
>>> I will think about those two tomorrow.
>
>> Around line 577 in roff.c is where mandoc_hyph ended up: the quotes
>> need to be added.
>>
>> As for the second one, we should bring jmc@ in, no? I'd think that
>> double or triple-dashes would be broken. Unicode, for one,
>>
>> http://www.cs.tut.fi/~jkorpela/dashes.html#linebreaks
>>
>> stipulates that en and em dashes break the line.
>>
>> Thoughts?
>
> I think you are right that breaking at double dashes ought to be ok.
> However, groff doesn't break there, i don't consider the point of
> sufficient importance to deviate from groff, and not breaking at
> double hyphens keeps the code simpler. I have checked for all
> non-alpha ASCII characters that groff indeed doesn't break the line
> if they precede or follow a dash.
>
> So, here is what i have done for now - OK?

Ingo,

I'm fine with this.

Is there some place in {mdoc,man}(7) where we should be noting this?
It seems that a few words regarding hyphenation would be useful, to
wit, noting that hyphens will not break within a macro in mdoc(7),
but will in a regular text context, and the conditions for such
breakage.

Thoughts?

Kristaps

* Re: 1.11.7 minor issues
From: Ingo Schwarze @ 2011-09-19 8:41 UTC
To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, Sep 19, 2011 at 10:10:55AM +0200:

> I'm fine with this.

Thanks for looking, i have put it in.

> Is there some place in {mdoc,man}(7) where we should be noting this?

So far, we say very little about how specific input will be
physically formatted; we rather document what the input should look
like with respect to syntax and semantics.

That said, mentioning this might make sense, so i have taken a note
in my private TODO file (i don't think it warrants a public TODO
entry). The upcoming reordering should be done first.

Also, this is a typical case where mdoc(7) and man(7) and even other
macro packages behave the same way, so it belongs more in roff(7).
I still hope that we can move common stuff there, pointing to it
where required, to avoid duplication and make mdoc(7) and man(7)
shorter.

> It seems that a few words regarding
> hyphenation would be useful, to wit, noting that hyphens will not
> break within a macro in mdoc(7), but will in a regular text context,
> and the conditions for such breakage.

Yes, we should probably return to that at some point.

Yours,
  Ingo