tech@mandoc.bsd.lv
* 1.11.7 minor issues
@ 2011-09-18 23:39 Ingo Schwarze
  2011-09-18 23:54 ` Kristaps Dzonsons
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Schwarze @ 2011-09-18 23:39 UTC (permalink / raw)
  To: tech

Hi,

after fixing the two larger problems, systematic comparisons
revealed two smaller issues that are new in 1.11.7:

> @@ -52,8 +52,8 @@
>                -center:off
>                -center=off
>                -center-
> -       Lynx recognizes "1", "+", "on" and "true" for true values, and "0",
> -       "-", "off" and "false" for false values.  Other option-values are
> +       Lynx recognizes "1", "+", "on" and "true" for true values, and "0", "-
> +       ", "off" and "false" for false values.  Other option-values are
>         ignored.

> @@ -109,8 +109,8 @@
>         Many folks attempt a simple-minded regular expression approach, like
>         "s/<.*?>//g", but that fails in many cases because the tags may
>         continue over line breaks, they may contain quoted angle-brackets, or
> -       HTML comment may be present.  Plus, folks forget to convert
> -       entities--like "&lt;" for example.
> +       HTML comment may be present.  Plus, folks forget to convert entities--
> +       like "&lt;" for example.
>
>         Here's one "simple-minded" approach, that works for most files:

I will think about those two tomorrow.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv


* Re: 1.11.7 minor issues
  2011-09-18 23:39 1.11.7 minor issues Ingo Schwarze
@ 2011-09-18 23:54 ` Kristaps Dzonsons
  2011-09-19  7:58   ` Ingo Schwarze
  0 siblings, 1 reply; 5+ messages in thread
From: Kristaps Dzonsons @ 2011-09-18 23:54 UTC (permalink / raw)
  To: tech; +Cc: Ingo Schwarze

On 19/09/2011 01:39, Ingo Schwarze wrote:
> Hi,
>
> after fixing the two larger problems, systematic comparisons
> revealed two smaller issues that are new in 1.11.7:
>
>> @@ -52,8 +52,8 @@
>>                 -center:off
>>                 -center=off
>>                 -center-
>> -       Lynx recognizes "1", "+", "on" and "true" for true values, and "0",
>> -       "-", "off" and "false" for false values.  Other option-values are
>> +       Lynx recognizes "1", "+", "on" and "true" for true values, and "0", "-
>> +       ", "off" and "false" for false values.  Other option-values are
>>          ignored.
>
>> @@ -109,8 +109,8 @@
>>          Many folks attempt a simple-minded regular expression approach, like
>>          "s/<.*?>//g", but that fails in many cases because the tags may
>>          continue over line breaks, they may contain quoted angle-brackets, or
>> -       HTML comment may be present.  Plus, folks forget to convert
>> -       entities--like "&lt;" for example.
>> +       HTML comment may be present.  Plus, folks forget to convert entities--
>> +       like "&lt;" for example.
>>
>>          Here's one "simple-minded" approach, that works for most files:
>
> I will think about those two tomorrow.

Ingo,

Around line 577 in roff.c is where mandoc_hyph ended up: the quotes need 
to be added.

As for the second one, we should bring jmc@ in, no?  I'd think that
lines could be broken at double or triple dashes.  Unicode, for one,

http://www.cs.tut.fi/~jkorpela/dashes.html#linebreaks

stipulates that en and em dashes break the line.

Thoughts?

Kristaps

* Re: 1.11.7 minor issues
  2011-09-18 23:54 ` Kristaps Dzonsons
@ 2011-09-19  7:58   ` Ingo Schwarze
  2011-09-19  8:10     ` Kristaps Dzonsons
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Schwarze @ 2011-09-19  7:58 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, Sep 19, 2011 at 01:54:39AM +0200:
> On 19/09/2011 01:39, Ingo Schwarze wrote:

>> after fixing the two larger problems, systematic comparisons
>> revealed two smaller issues that are new in 1.11.7:

>>>@@ -52,8 +52,8 @@
>>>             -center:off
>>>             -center=off
>>>             -center-
>>>-    Lynx recognizes "1", "+", "on" and "true" for true values, and "0",
>>>-    "-", "off" and "false" for false values.  Other option-values are
>>>+    Lynx recognizes "1", "+", "on" and "true" for true values, and "0", "-
>>>+    ", "off" and "false" for false values.  Other option-values are
>>>      ignored.

>>>@@ -109,8 +109,8 @@
>>>     Many folks attempt a simple-minded regular expression approach, like
>>>     "s/<.*?>//g", but that fails in many cases because the tags may
>>>     continue over line breaks, they may contain quoted angle-brackets, or
>>>-    HTML comment may be present.  Plus, folks forget to convert
>>>-    entities--like "&lt;" for example.
>>>+    HTML comment may be present.  Plus, folks forget to convert entities--
>>>+    like "&lt;" for example.
>>>
>>>     Here's one "simple-minded" approach, that works for most files:

>> I will think about those two tomorrow.

> Around line 577 in roff.c is where mandoc_hyph ended up: the quotes
> need to be added.
> 
> As for the second one, we should bring jmc@ in, no?  I'd think that
> double or triple-dashes would be broken.  Unicode, for one,
> 
> http://www.cs.tut.fi/~jkorpela/dashes.html#linebreaks
> 
> stipulates that en and em dashes break the line.
> 
> Thoughts?

I think you are right that breaking at double dashes ought to be ok.
However, groff doesn't break there, i don't consider the point of
sufficient importance to deviate from groff, and not breaking at
double hyphens keeps the code simpler.  I have checked for every
non-alpha ASCII character that groff indeed doesn't break the line
if it precedes or follows a dash.

So, here is what i have done for now - OK?


CVSROOT:	/cvs
Module name:	src
Changes by:	schwarze@cvs.openbsd.org	2011/09/19 01:53:54

Modified files:
	usr.bin/mandoc : roff.c 

Log message:
Breaking the line at a hyphen is only allowed if the hyphen
is both preceded and followed by an alphabetic character.
This fixes about a dozen places in base.


Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.43
diff -u -p -r1.43 roff.c
--- roff.c	18 Sep 2011 23:26:18 -0000	1.43
+++ roff.c	19 Sep 2011 07:49:59 -0000
@@ -552,7 +552,6 @@ again:
 static enum rofferr
 roff_parsetext(char *p)
 {
-	char		 l, r;
 	size_t		 sz;
 	const char	*start;
 	enum mandoc_esc	 esc;
@@ -579,14 +578,8 @@ roff_parsetext(char *p)
 			continue;
 		}
 
-		l = *(p - 1);
-		r = *(p + 1);
-		if ('\\' != l &&
-				'\t' != r && '\t' != l &&
-				' ' != r && ' ' != l &&
-				'-' != r && '-' != l &&
-				! isdigit((unsigned char)l) &&
-				! isdigit((unsigned char)r))
+		if (isalpha((unsigned char)p[-1]) &&
+		    isalpha((unsigned char)p[1]))
 			*p = ASCII_HYPH;
 		p++;
 	}
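
For illustration only, the condition removed and the condition added
by the diff above can be put side by side as two small predicates.
The function names old_rule and new_rule are made up for this sketch
and do not exist in roff.c; l and r stand for the characters
immediately to the left and right of the hyphen.

```c
#include <ctype.h>

/*
 * Condition removed by the commit: the hyphen is a break point
 * unless a neighbour is a backslash (left side only), tab, space,
 * another hyphen or a digit.
 */
static int
old_rule(char l, char r)
{
	return '\\' != l &&
	    '\t' != r && '\t' != l &&
	    ' ' != r && ' ' != l &&
	    '-' != r && '-' != l &&
	    !isdigit((unsigned char)l) &&
	    !isdigit((unsigned char)r);
}

/*
 * Condition added by the commit: the hyphen is a break point
 * only if both neighbours are alphabetic.
 */
static int
new_rule(char l, char r)
{
	return isalpha((unsigned char)l) &&
	    isalpha((unsigned char)r);
}
```

For the quoted hyphen from the Lynx manual, where both neighbours are
double quotes, old_rule() returns 1 and new_rule() returns 0, which is
consistent with the first regression reported at the top of the
thread.  For an ordinary compound such as "well-known", both return 1.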


* Re: 1.11.7 minor issues
  2011-09-19  7:58   ` Ingo Schwarze
@ 2011-09-19  8:10     ` Kristaps Dzonsons
  2011-09-19  8:41       ` Ingo Schwarze
  0 siblings, 1 reply; 5+ messages in thread
From: Kristaps Dzonsons @ 2011-09-19  8:10 UTC (permalink / raw)
  To: tech

On 19/09/2011 09:58, Ingo Schwarze wrote:
> Hi Kristaps,
>
> Kristaps Dzonsons wrote on Mon, Sep 19, 2011 at 01:54:39AM +0200:
>> On 19/09/2011 01:39, Ingo Schwarze wrote:
>
>>> after fixing the two larger problems, systematic comparisons
>>> revealed two smaller issues that are new in 1.11.7:
>
>>>> @@ -52,8 +52,8 @@
>>>>              -center:off
>>>>              -center=off
>>>>              -center-
>>>> -    Lynx recognizes "1", "+", "on" and "true" for true values, and "0",
>>>> -    "-", "off" and "false" for false values.  Other option-values are
>>>> +    Lynx recognizes "1", "+", "on" and "true" for true values, and "0", "-
>>>> +    ", "off" and "false" for false values.  Other option-values are
>>>>       ignored.
>
>>>> @@ -109,8 +109,8 @@
>>>>      Many folks attempt a simple-minded regular expression approach, like
>>>>      "s/<.*?>//g", but that fails in many cases because the tags may
>>>>      continue over line breaks, they may contain quoted angle-brackets, or
>>>> -    HTML comment may be present.  Plus, folks forget to convert
>>>> -    entities--like "&lt;" for example.
>>>> +    HTML comment may be present.  Plus, folks forget to convert entities--
>>>> +    like "&lt;" for example.
>>>>
>>>>      Here's one "simple-minded" approach, that works for most files:
>
>>> I will think about those two tomorrow.
>
>> Around line 577 in roff.c is where mandoc_hyph ended up: the quotes
>> need to be added.
>>
>> As for the second one, we should bring jmc@ in, no?  I'd think that
>> double or triple-dashes would be broken.  Unicode, for one,
>>
>> http://www.cs.tut.fi/~jkorpela/dashes.html#linebreaks
>>
>> stipulates that en and em dashes break the line.
>>
>> Thoughts?
>
> I think you are right that breaking at double dashes ought to be ok.
> However, groff doesn't break there, i don't consider the point of
> sufficient importance to deviate from groff, and not breaking at
> double hyphens keeps the code simpler.  I have checked for every
> non-alpha ASCII character that groff indeed doesn't break the line
> if it precedes or follows a dash.
>
> So, here is what i have done for now - OK?

Ingo,

I'm fine with this.  Is there some place in {mdoc,man}(7) where we 
should be noting this?  It seems that a few words regarding hyphenation 
would be useful, to wit, noting that hyphens will not break within a 
macro in mdoc(7), but will in a regular text context, and the conditions 
for such breakage.
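
As a rough sketch of what "the conditions for such breakage" amount to
in a regular text context, the rule committed earlier in this thread
can be applied to a whole text line.  The helper mark_break_hyphens
and the BREAK_MARK character are hypothetical and chosen only to make
the result visible; mandoc itself writes ASCII_HYPH.  The sketch
ignores roff escape sequences, which roff_parsetext appears to handle
separately, and it does not model the macro case at all.

```c
#include <ctype.h>
#include <stddef.h>

#define BREAK_MARK '|'	/* hypothetical marker; mandoc uses ASCII_HYPH */

/*
 * Replace every hyphen that has an alphabetic character on both
 * sides with BREAK_MARK and return the number of replacements.
 * A hyphen at the very start of the buffer is never considered.
 */
static size_t
mark_break_hyphens(char *buf)
{
	size_t	 i, n = 0;

	if (buf[0] == '\0')
		return 0;
	for (i = 1; buf[i] != '\0'; i++) {
		/* buf[i + 1] is at worst the terminating NUL. */
		if (buf[i] == '-' &&
		    isalpha((unsigned char)buf[i - 1]) &&
		    isalpha((unsigned char)buf[i + 1])) {
			buf[i] = BREAK_MARK;
			n++;
		}
	}
	return n;
}
```

Called on a buffer holding "well-known", this leaves "well|known" and
returns 1.  Input such as "-center" or "entities--like" is left
unchanged, because in each case at least one neighbour of the hyphen
is not a letter.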

Thoughts?

Kristaps

* Re: 1.11.7 minor issues
  2011-09-19  8:10     ` Kristaps Dzonsons
@ 2011-09-19  8:41       ` Ingo Schwarze
  0 siblings, 0 replies; 5+ messages in thread
From: Ingo Schwarze @ 2011-09-19  8:41 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, Sep 19, 2011 at 10:10:55AM +0200:

> I'm fine with this.

Thanks for looking, i have put it in.

> Is there some place in {mdoc,man}(7) where we should be noting this?

So far, we say very little about how specific input will be physically
formatted; rather, we document what the input should look like with
respect to syntax and semantics.

That said, mentioning this might make sense, so i have taken a note
in my private TODO file (i don't think it warrants a public TODO
entry).

The upcoming reordering should be done first.  Also, this is a
typical case where mdoc(7) and man(7) and even other macro packages
behave the same way, so it belongs more in roff(7).  I still hope
that we can move common stuff there, pointing to it where required,
to avoid duplication and make mdoc(7) and man(7) shorter.

> It seems that a few words regarding
> hyphenation would be useful, to wit, noting that hyphens will not
> break within a macro in mdoc(7), but will in a regular text context,
> and the conditions for such breakage.

Yes, we should probably return to that at some point.

Yours,
  Ingo