tech@mandoc.bsd.lv
* line termination in manuals
@ 2011-01-22 19:56 Ingo Schwarze
  2011-01-22 20:05 ` Joerg Sonnenberger
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Schwarze @ 2011-01-22 19:56 UTC (permalink / raw)
  To: tech; +Cc: jmc

Hi,

deraadt@ pointed out that, if somebody compiled mandoc
on Windows, the C library would take care of different
conventions regarding line termination.

Besides, it goes without saying that text files on a UNIX
system use UNIX conventions, and UNIX tools working on them
expect that.  This isn't documented for other text files,
for example configuration files of various tools, either.

OK?

Yours,
  Ingo


Index: man.7
===================================================================
RCS file: /cvs/src/share/man/man7/man.7,v
retrieving revision 1.14
diff -u -r1.14 man.7
--- man.7	16 Jan 2011 02:56:47 -0000	1.14
+++ man.7	22 Jan 2011 19:46:09 -0000
@@ -53,9 +53,6 @@
 .Nm
 documents may contain only graphable 7-bit ASCII characters, the
 space character, and the tab character.
-All manuals must have
-.Ux
-line termination.
 .Pp
 Blank lines are acceptable; where found, the output will assert a
 vertical space.
Index: mdoc.7
===================================================================
RCS file: /cvs/src/share/man/man7/mdoc.7,v
retrieving revision 1.62
diff -u -r1.62 mdoc.7
--- mdoc.7	22 Jan 2011 14:05:28 -0000	1.62
+++ mdoc.7	22 Jan 2011 19:46:10 -0000
@@ -52,9 +52,6 @@
 .Nm
 documents may contain only graphable 7-bit ASCII characters, the space
 character, and, in certain circumstances, the tab character.
-All manuals must have
-.Ux
-line terminators.
 .Pp
 If the first character of a line is a space, that line is printed
 with a leading newline.
Index: roff.7
===================================================================
RCS file: /cvs/src/share/man/man7/roff.7,v
retrieving revision 1.8
diff -u -r1.8 roff.7
--- roff.7	9 Jan 2011 15:24:57 -0000	1.8
+++ roff.7	22 Jan 2011 19:46:10 -0000
@@ -57,10 +57,6 @@
 documented in the
 .Xr mandoc_char 7
 manual.
-.Pp
-All manuals must have
-.Ux
-line terminators.
 .Sh REQUEST SYNTAX
 A request or macro line consists of:
 .Pp
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv


* Re: line termination in manuals
  2011-01-22 19:56 line termination in manuals Ingo Schwarze
@ 2011-01-22 20:05 ` Joerg Sonnenberger
  2011-01-22 21:28   ` Ingo Schwarze
  0 siblings, 1 reply; 10+ messages in thread
From: Joerg Sonnenberger @ 2011-01-22 20:05 UTC (permalink / raw)
  To: tech

On Sat, Jan 22, 2011 at 08:56:56PM +0100, Ingo Schwarze wrote:
> deraadt@ pointed out that, if somebody compiled mandoc
> on Windows, the C library would take care of different
> conventions regarding line termination.

Actually, it wouldn't. We explicitly read the file as a whole or mmap it
and separate it at \n. I don't see a good reason why we couldn't just
drop \r before \n and accept both silently.

Joerg
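
A minimal sketch of the idea described above, for illustration only:
since the input is held in one buffer and split at \n, a \r that
directly precedes a \n can simply be dropped.  The helper name
strip_cr is hypothetical and is not mandoc code; the patches in this
thread skip the \r inside the existing read loop instead of rewriting
the buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of mandoc: drop every '\r' that is
 * immediately followed by '\n', in place, and return the new length.
 * A lone '\r' (including one at the very end) is kept untouched.
 */
size_t
strip_cr(char *buf, size_t sz)
{
	size_t	 in, out;

	for (in = out = 0; in < sz; in++) {
		/* Check the bound before looking one byte ahead. */
		if ('\r' == buf[in] && in + 1 < sz && '\n' == buf[in + 1])
			continue;
		buf[out++] = buf[in];
	}
	return out;
}
```

Working on a length-counted buffer rather than a C string matches the
situation in this thread, where the file contents are not guaranteed
to be null-terminated.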

* Re: line termination in manuals
  2011-01-22 20:05 ` Joerg Sonnenberger
@ 2011-01-22 21:28   ` Ingo Schwarze
  2011-01-22 21:35     ` Joerg Sonnenberger
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Schwarze @ 2011-01-22 21:28 UTC (permalink / raw)
  To: tech

Hi Joerg,

Joerg Sonnenberger wrote on Sat, Jan 22, 2011 at 09:05:17PM +0100:
> On Sat, Jan 22, 2011 at 08:56:56PM +0100, Ingo Schwarze wrote:

>> deraadt@ pointed out that, if somebody compiled mandoc
>> on Windows, the C library would take care of different
>> conventions regarding line termination.

> Actually, it wouldn't. We explicitly read the file as a whole or mmap it
> and separate it at \n. I don't see a good reason why we couldn't just
> drop \r before \n and accept both silently.

Good point.

The following works for me.

OK?
  Ingo


Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.69
diff -u -r1.69 main.c
--- main.c	20 Jan 2011 21:33:11 -0000	1.69
+++ main.c	22 Jan 2011 21:27:00 -0000
@@ -669,6 +669,8 @@
 		}
 
 		while (i < (int)blk.sz && (start || '\0' != blk.buf[i])) {
+			if ('\r' == blk.buf[i] && '\n' == blk.buf[i+1])
+				++i;
 			if ('\n' == blk.buf[i]) {
 				++i;
 				++lnn;
@@ -705,6 +707,8 @@
 
 			/* Found escape & at least one other char. */
 
+			if ('\r' == blk.buf[i+1] && '\n' == blk.buf[i+2])
+				++i;
 			if ('\n' == blk.buf[i + 1]) {
 				i += 2;
 				/* Escaped newlines are skipped over */

* Re: line termination in manuals
  2011-01-22 21:28   ` Ingo Schwarze
@ 2011-01-22 21:35     ` Joerg Sonnenberger
  2011-01-22 22:18       ` Ingo Schwarze
  0 siblings, 1 reply; 10+ messages in thread
From: Joerg Sonnenberger @ 2011-01-22 21:35 UTC (permalink / raw)
  To: tech

On Sat, Jan 22, 2011 at 10:28:14PM +0100, Ingo Schwarze wrote:
> The following works for me.

At least add a comment that this falls through to the next statement for
handling. Otherwise it looks fine.

Joerg

* Re: line termination in manuals
  2011-01-22 21:35     ` Joerg Sonnenberger
@ 2011-01-22 22:18       ` Ingo Schwarze
  2011-01-22 22:28         ` Kristaps Dzonsons
  2011-01-22 22:29         ` Joerg Sonnenberger
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Schwarze @ 2011-01-22 22:18 UTC (permalink / raw)
  To: tech

Hi Joerg,

Joerg Sonnenberger wrote on Sat, Jan 22, 2011 at 10:35:10PM +0100:
> On Sat, Jan 22, 2011 at 10:28:14PM +0100, Ingo Schwarze wrote:

> At least add a comment that this falls through to the next statement
> for handling.  Otherwise it looks fine.

Gah, writing comments is good.
It makes you think again about the code,
so you find bugs.

When the last character in blk.buf is '\r' (without a newline
at the end of the file), my first attempt would have overrun
the buffer by one byte.

Thus, I'm committing the following shortly.

Yours,
  Ingo


Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.69
diff -u -r1.69 main.c
--- main.c	20 Jan 2011 21:33:11 -0000	1.69
+++ main.c	22 Jan 2011 22:12:58 -0000
@@ -669,6 +669,15 @@
 		}
 
 		while (i < (int)blk.sz && (start || '\0' != blk.buf[i])) {
+
+			/*
+			 * When finding an unescaped newline character,
+			 * leave the character loop to process the line.
+			 * Skip a preceding carriage return, if any.
+			 */
+
+			if ('\r' == blk.buf[i] && '\n' == blk.buf[i+1])
+				++i;
 			if ('\n' == blk.buf[i]) {
 				++i;
 				++lnn;
@@ -703,11 +712,18 @@
 				continue;
 			}
 
-			/* Found escape & at least one other char. */
+			/*
+			 * Found escape and at least one other character.
+			 * When it's a newline character, skip it.
+			 * When there is a carriage return in between,
+			 * skip that one as well.
+			 */
 
+			if ('\r' == blk.buf[i + 1] && i + 1 < (int)blk.sz &&
+			    '\n' == blk.buf[i + 2])
+				++i;
 			if ('\n' == blk.buf[i + 1]) {
 				i += 2;
-				/* Escaped newlines are skipped over */
 				++lnn;
 				continue;
 			}

* Re: line termination in manuals
  2011-01-22 22:18       ` Ingo Schwarze
@ 2011-01-22 22:28         ` Kristaps Dzonsons
  2011-01-22 22:29         ` Joerg Sonnenberger
  1 sibling, 0 replies; 10+ messages in thread
From: Kristaps Dzonsons @ 2011-01-22 22:28 UTC (permalink / raw)
  To: tech; +Cc: Ingo Schwarze

>> At least add a comment that this falls through to the next statement
>> for handling.  Otherwise it looks fine.
>
> Gah, writing comments is good.
> It makes you think again about the code,
> so you find bugs.
>
> When the last character in blk.buf is '\r' (without a newline
> at the end of the file), my first attempt would have overrun
> the buffer by one byte.
>
> Thus, I'm committing the following shortly.

Has somebody checked whether groff accepts these in the same way?

Thanks,

Kristaps

* Re: line termination in manuals
  2011-01-22 22:18       ` Ingo Schwarze
  2011-01-22 22:28         ` Kristaps Dzonsons
@ 2011-01-22 22:29         ` Joerg Sonnenberger
  2011-01-22 22:59           ` Ingo Schwarze
  1 sibling, 1 reply; 10+ messages in thread
From: Joerg Sonnenberger @ 2011-01-22 22:29 UTC (permalink / raw)
  To: tech

On Sat, Jan 22, 2011 at 11:18:36PM +0100, Ingo Schwarze wrote:
> When the last character in blk.buf is '\r' (without a newline
> at the end of the file), my first attempt would have overrun
> the buffer by one byte.

Right. What about the same condition for the first if?

Joerg

* Re: line termination in manuals
  2011-01-22 22:29         ` Joerg Sonnenberger
@ 2011-01-22 22:59           ` Ingo Schwarze
  2011-01-24 23:16             ` Kristaps Dzonsons
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Schwarze @ 2011-01-22 22:59 UTC (permalink / raw)
  To: tech

Hi,

Kristaps Dzonsons wrote on Sat, Jan 22, 2011 at 11:28:11PM +0100:

> Has somebody checked whether groff accepts these in the same way?

It does.


Joerg Sonnenberger wrote on Sat, Jan 22, 2011 at 11:29:38PM +0100:
> On Sat, Jan 22, 2011 at 11:18:36PM +0100, Ingo Schwarze wrote:

>> When the last character in blk.buf is '\r' (without a newline
>> at the end of the file), my first attempt would have overrun
>> the buffer by one byte.

> Right. What about the same condition for the first if?

Has anybody mentioned that handling of null-terminated strings
is less error-prone than that of buffers with a length, because there
is the null at the end and you don't overrun that easily?

Joerg, you are right, and not only that, the bounds check was off
by one as well.  Checking i+1 < sz and then accessing buf[i+2]
is not smart.
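
The rule behind both fixes can be sketched as follows: before reading
buf[i + k], the test has to be i + k < sz for that same k.  The helper
names at() and crlf_at() are hypothetical and do not appear in main.c;
this is an illustration under that assumption, not the committed code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: report whether buf[pos] exists and equals c.
 * The bound is tested against the very index that is dereferenced,
 * so the check and the access cannot drift apart.
 */
static int
at(const char *buf, size_t sz, size_t pos, char c)
{
	return pos < sz && c == buf[pos];
}

/* Does a CR LF pair start at offset pos? */
int
crlf_at(const char *buf, size_t sz, size_t pos)
{
	return at(buf, sz, pos, '\r') && at(buf, sz, pos + 1, '\n');
}
```

With such a helper, a buffer whose last byte is '\r' is handled
without reading past the end, because the lookahead index fails the
bound test before any access happens.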

Maybe I should write poems instead, or something.

Sigh,
  Ingo


Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.69
diff -u -r1.69 main.c
--- main.c	20 Jan 2011 21:33:11 -0000	1.69
+++ main.c	22 Jan 2011 22:49:00 -0000
@@ -669,6 +669,16 @@
 		}
 
 		while (i < (int)blk.sz && (start || '\0' != blk.buf[i])) {
+
+			/*
+			 * When finding an unescaped newline character,
+			 * leave the character loop to process the line.
+			 * Skip a preceding carriage return, if any.
+			 */
+
+			if ('\r' == blk.buf[i] && i + 1 < (int)blk.sz &&
+			    '\n' == blk.buf[i + 1])
+				++i;
 			if ('\n' == blk.buf[i]) {
 				++i;
 				++lnn;
@@ -703,11 +713,18 @@
 				continue;
 			}
 
-			/* Found escape & at least one other char. */
+			/*
+			 * Found escape and at least one other character.
+			 * When it's a newline character, skip it.
+			 * When there is a carriage return in between,
+			 * skip that one as well.
+			 */
 
+			if ('\r' == blk.buf[i + 1] && i + 2 < (int)blk.sz &&
+			    '\n' == blk.buf[i + 2])
+				++i;
 			if ('\n' == blk.buf[i + 1]) {
 				i += 2;
-				/* Escaped newlines are skipped over */
 				++lnn;
 				continue;
 			}

* Re: line termination in manuals
  2011-01-22 22:59           ` Ingo Schwarze
@ 2011-01-24 23:16             ` Kristaps Dzonsons
  2011-01-25  0:09               ` Ingo Schwarze
  0 siblings, 1 reply; 10+ messages in thread
From: Kristaps Dzonsons @ 2011-01-24 23:16 UTC (permalink / raw)
  To: tech; +Cc: Ingo Schwarze

>> Has somebody checked whether groff accepts these in the same way?
>
> It does.
>
>
> Joerg Sonnenberger wrote on Sat, Jan 22, 2011 at 11:29:38PM +0100:
>> On Sat, Jan 22, 2011 at 11:18:36PM +0100, Ingo Schwarze wrote:
>
>>> When the last character in blk.buf is '\r' (without a newline
>>> at the end of the file), my first attempt would have overrun
>>> the buffer by one byte.
>
>> Right. What about the same condition for the first if?
>
> Has anybody mentioned that handling of null-terminated strings
> is less error-prone than that of buffers with a length, because there
> is the null at the end and you don't overrun that easily?
>
> Joerg, you are right, and not only that, the bounds check was off
> by one as well.  Checking i+1 < sz and then accessing buf[i+2]
> is not smart.
>
> Maybe I should write poems instead, or something.

.Dd $Mdocdate$
.Dt FOO 1
.Os
.Sh NAME
.Nm foo
.Nd a poem by pretend\-Ingo
.Sh SYNOPSIS
.Nm Roses
.Ar e red, violets
.Ar e blue...

Incidentally, whenever you get the \r\n logic ironed out, check it in. 
I've been wanting to have DOS file support for a while...

(Should we specify that the -Tascii output is always UNIX newline 
terminated?)

Thanks,

Kristaps

* Re: line termination in manuals
  2011-01-24 23:16             ` Kristaps Dzonsons
@ 2011-01-25  0:09               ` Ingo Schwarze
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Schwarze @ 2011-01-25  0:09 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Tue, Jan 25, 2011 at 12:16:19AM +0100:

> Incidentally, whenever you get the \r\n logic ironed out,
> check it in.

Done.

> (Should we specify that the -Tascii output is always UNIX newline
> terminated?)

Better not.
I have no idea what putchar('\n') will do on non-UNIX systems.

Besides, we are talking about a UNIX tool to handle UNIX manuals,
and none of the other UNIX manuals restates explicitly that it is
using UNIX conventions.  I consider this obvious without saying.

Or, if putchar('\n') is doing the wrong thing on some system,
and there is a more portable way, we might consider using said
more portable way instead, unless it is too cumbersome.
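
For background on the putchar('\n') question: standard C distinguishes
text streams, where the library may translate '\n' to the host's line
ending, from binary streams, where bytes are written as given, and
stdout is a text stream.  The following probe is a rough sketch for
checking what a given system does; the function name newline_bytes and
the scratch file path passed to it are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical probe: write one '\n' to a file opened with the given
 * fopen mode and return how many bytes ended up in the file, or -1
 * on error.  The path is any scratch file the caller chooses.
 */
long
newline_bytes(const char *path, const char *mode)
{
	FILE	*fp;
	long	 n;

	if (NULL == (fp = fopen(path, mode)))
		return -1;
	fputc('\n', fp);
	if (EOF == fclose(fp))
		return -1;

	/* Reopen in binary mode so the bytes are counted untranslated. */
	if (NULL == (fp = fopen(path, "rb")))
		return -1;
	n = 0;
	while (EOF != fgetc(fp))
		n++;
	fclose(fp);
	return n;
}
```

Mode "wb" should yield 1 everywhere.  Mode "w" is expected to yield 1
on UNIX systems as well, while a C library that translates text-mode
output to CR LF, as is typical on Windows, would yield 2.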

Yours,
  Ingo