tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
* exit_status persistence.
@ 2010-12-01 16:33 Kristaps Dzonsons
  2010-12-01 16:41 ` Kristaps Dzonsons
  2010-12-01 21:28 ` Ingo Schwarze
  0 siblings, 2 replies; 20+ messages in thread
From: Kristaps Dzonsons @ 2010-12-01 16:33 UTC (permalink / raw)
  To: tech

Ingo, I flushed out some peculiar behaviour in the new main.c.

If a FATAL parse error occurs (e.g., rxdebug.1) when multiple files are 
passed on the command-line, subsequent files are parsed but not 
outputted.  This occurs due to main.c:484.  I don't want to monkey with 
exit_status---can you verify that exit_status may somehow be saved or 
reset between parses so that passing multiple files doesn't cause 
truncated output?

Thanks,

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: exit_status persistence.
  2010-12-01 16:33 exit_status persistence Kristaps Dzonsons
@ 2010-12-01 16:41 ` Kristaps Dzonsons
  2010-12-01 21:28 ` Ingo Schwarze
  1 sibling, 0 replies; 20+ messages in thread
From: Kristaps Dzonsons @ 2010-12-01 16:41 UTC (permalink / raw)
  To: tech

> Ingo, I flushed out some peculiar behaviour in the new main.c.
>
> If a FATAL parse error occurs (e.g., rxdebug.1) when multiple files are
> passed on the command-line, subsequent files are parsed but not
> outputted. This occurs due to main.c:484. I don't want to monkey with
> exit_status---can you verify that exit_status may somehow be saved or
> reset between parses so that passing multiple files doesn't cause
> truncated output?

Note that except for the mandoc error-enum reordering (will do so later) 
and importing new error enums from `so' handling, main.c is fully 
up-to-date, so don't worry about clobbering any current changes on it...
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: exit_status persistence.
  2010-12-01 16:33 exit_status persistence Kristaps Dzonsons
  2010-12-01 16:41 ` Kristaps Dzonsons
@ 2010-12-01 21:28 ` Ingo Schwarze
  2010-12-02 10:51   ` exit_status persistence (now: roff.c question) Kristaps Dzonsons
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-01 21:28 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Wed, Dec 01, 2010 at 05:33:52PM +0100:

> Ingo, I flushed out some peculiar behaviour in the new main.c.
> 
> If a FATAL parse error occurs (e.g., rxdebug.1) when multiple files
> are passed on the command-line, subsequent files are parsed but not
> outputted.  This occurs due to main.c:484.  I don't want to monkey
> with exit_status---can you verify that exit_status may somehow be
> saved or reset between parses so that passing multiple files doesn't
> cause truncated output?

Looking at the code, i'd say you are right, this looks like a blatant
design error and needs to be fixed.
I'll probably split exit_status into two variables:
First, file_status will be initialized to MANDOCERR_OK for each
file, and that's the one mmsg will write to.
Then, at the end of each file, the contents must be moved to
exit_status, in case it's more severe than what is already there.

I'll cook up a patch after completing the bsd.lv -> OpenBSD merges.
Merge, then write code is usually better than the other way round.  :)

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: exit_status persistence (now: roff.c question).
  2010-12-01 21:28 ` Ingo Schwarze
@ 2010-12-02 10:51   ` Kristaps Dzonsons
  2010-12-02 13:29     ` Kristaps Dzonsons
  2010-12-02 20:54     ` Ingo Schwarze
  0 siblings, 2 replies; 20+ messages in thread
From: Kristaps Dzonsons @ 2010-12-02 10:51 UTC (permalink / raw)
  To: tech

>> Ingo, I flushed out some peculiar behaviour in the new main.c.
>>
>> If a FATAL parse error occurs (e.g., rxdebug.1) when multiple files
>> are passed on the command-line, subsequent files are parsed but not
>> outputted.  This occurs due to main.c:484.  I don't want to monkey
>> with exit_status---can you verify that exit_status may somehow be
>> saved or reset between parses so that passing multiple files doesn't
>> cause truncated output?
>
> Looking at the code, i'd say you are right, this looks like a blatant
> design error and needs to be fixed.
> I'll probably split exit_status into two variables:
> First, file_status will be initialized to MANDOCERR_OK for each
> file, and that's the one mmsg will write to.
> Then, at the end of each file, the contents must be moved to
> exit_status, in case it's more severe than what is already there.
>
> I'll cook up a patch after completing the bsd.lv ->  OpenBSD merges.
> Merge, then write code is usually better than the other way round.  :)

Another question---I'm about to check in roff.c as fully in sync after 
running some more tests---I notice that `de1' has been removed from the 
switch statement at roff.c:549.  I understand this is because `de1' is 
renamed at roff.c:636.  If this is the case, can you document this 
process a bit more?

There's also an infinite loop somewhere in these new roff.c changes 
that's causing the NetBSD manuals to blow up...

--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: exit_status persistence (now: roff.c question).
  2010-12-02 10:51   ` exit_status persistence (now: roff.c question) Kristaps Dzonsons
@ 2010-12-02 13:29     ` Kristaps Dzonsons
  2010-12-02 22:50       ` roff.c question Ingo Schwarze
  2010-12-02 20:54     ` Ingo Schwarze
  1 sibling, 1 reply; 20+ messages in thread
From: Kristaps Dzonsons @ 2010-12-02 13:29 UTC (permalink / raw)
  To: tech

[-- Attachment #1: Type: text/plain, Size: 1538 bytes --]

>>> Ingo, I flushed out some peculiar behaviour in the new main.c.
>>>
>>> If a FATAL parse error occurs (e.g., rxdebug.1) when multiple files
>>> are passed on the command-line, subsequent files are parsed but not
>>> outputted. This occurs due to main.c:484. I don't want to monkey
>>> with exit_status---can you verify that exit_status may somehow be
>>> saved or reset between parses so that passing multiple files doesn't
>>> cause truncated output?
>>
>> Looking at the code, i'd say you are right, this looks like a blatant
>> design error and needs to be fixed.
>> I'll probably split exit_status into two variables:
>> First, file_status will be initialized to MANDOCERR_OK for each
>> file, and that's the one mmsg will write to.
>> Then, at the end of each file, the contents must be moved to
>> exit_status, in case it's more severe than what is already there.
>>
>> I'll cook up a patch after completing the bsd.lv -> OpenBSD merges.
>> Merge, then write code is usually better than the other way round. :)
>
> Another question---I'm about to check in roff.c as fully in sync after
> running some more tests---I notice that `de1' has been removed from the
> switch statement at roff.c:549. I understand this is because `de1' is
> renamed at roff.c:636. If this is the case, can you document this
> process a bit more?
>
> There's also an infinite loop somewhere in these new roff.c changes
> that's causing the NetBSD manuals to blow up...

Replying again to myself...

The groff_char.man page causes the blow-up (enclosed).

[-- Attachment #2: groff_char.man --]
[-- Type: application/x-troff-man, Size: 35603 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-02 10:51   ` exit_status persistence (now: roff.c question) Kristaps Dzonsons
  2010-12-02 13:29     ` Kristaps Dzonsons
@ 2010-12-02 20:54     ` Ingo Schwarze
  1 sibling, 0 replies; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-02 20:54 UTC (permalink / raw)
  To: tech

Hi Kristaps,

>> I'll probably split exit_status into two variables:
>> First, file_status will be initialized to MANDOCERR_OK for each
>> file, and that's the one mmsg will write to.
>> Then, at the end of each file, the contents must be moved to
>> exit_status, in case it's more severe than what is already there.

That was so easy that i have just put it in.

> Another question---I'm about to check in roff.c as fully in sync
> after running some more tests---I notice that `de1' has been removed
> from the switch statement at roff.c:549.  I understand this is
> because `de1' is renamed at roff.c:636.

Exactly.

> If this is the case, can you document this process a bit more?

I guess i should do that when touching roff.c the next time, which
might happen soon...

> There's also an infinite loop somewhere in these new roff.c changes
> that's causing the NetBSD manuals to blow up...

... given that i'm now starting to look into that one.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-02 13:29     ` Kristaps Dzonsons
@ 2010-12-02 22:50       ` Ingo Schwarze
  2010-12-03 21:49         ` Ingo Schwarze
  2010-12-03 23:31         ` Ingo Schwarze
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-02 22:50 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Thu, Dec 02, 2010 at 02:29:41PM +0100:

> The groff_char.man page causes the blow-up (enclosed).

Typical GNU sabotage.  ;-(

Did you look at that code?
Take a seat first, lest you fall.

But seriously, how do you expect the following code to behave?

  .ds CH \\*[CH]
  \*(CH

Here is what groff does:

  EXAMPLES/loop.in:8: fatal error:
  input stack limit exceeded (probable infinite loop)

Apparently, the groff_char manual does not actually dereference this
self-referencing string, but the macro and if constructs are so
complicated that i guess i would need quite some time to disentangle
them and understand the reason.

In any case, this is all way above mandoc's head.
Young mandoc simply evaluates almost all conditions to false -
and runs straight into the trap.

I see two things this we can do here:

 1) roff_res() should not expand \\* or \\\\*,
    but it should expand \* and \\\*.

 2) We should implement some kind of stack limit
    and just bail out.

Planning to look into that tomorrow...

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-02 22:50       ` roff.c question Ingo Schwarze
@ 2010-12-03 21:49         ` Ingo Schwarze
  2010-12-05 15:15           ` Kristaps Dzonsons
  2010-12-03 23:31         ` Ingo Schwarze
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-03 21:49 UTC (permalink / raw)
  To: tech

Hi Kristaps,

>  1) roff_res() should not expand \\* or \\\\*,
>     but it should expand \* and \\\*.

OK for the following patch?

It fixes this test case:

  .TH bsbs 1 "December 3, 2010"
  .SH NAME
  bsbs - double backslash in a string definition
  .SH DESCRIPTION
  .ds right wrong
  .ds inner *[right]
  .ds outer \\*[inner]
  result: \*[outer]
  .PP
  The above line must be "result: *[right]",
  not "result: wrong".


Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.21
diff -u -p -r1.21 roff.c
--- roff.c	28 Nov 2010 19:35:33 -0000	1.21
+++ roff.c	3 Dec 2010 21:44:36 -0000
@@ -338,14 +338,19 @@ static int
 roff_res(struct roff *r, char **bufp, size_t *szp, int pos)
 {
 	const char	*cp, *cpp, *st, *res;
-	int		 i, maxl;
+	int		 i, bs, maxl;
 	size_t		 nsz;
 	char		*n;
 
-	/* LINTED */
-	for (cp = &(*bufp)[pos]; (cpp = strstr(cp, "\\*")); cp++) {
-		cp = cpp + 2;
-		switch (*cp) {
+	cp = *bufp + pos;
+	bs = 0;
+	while (*cp) {
+		cpp = cp++;
+		if ('\\' == *cp)
+			bs = !bs;
+		if ( ! (bs && '*' == *cp))
+			continue;
+		switch (*(++cp)) {
 		case ('('):
 			cp++;
 			maxl = 2;
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-02 22:50       ` roff.c question Ingo Schwarze
  2010-12-03 21:49         ` Ingo Schwarze
@ 2010-12-03 23:31         ` Ingo Schwarze
  2010-12-05 15:17           ` Kristaps Dzonsons
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-03 23:31 UTC (permalink / raw)
  To: tech

Hi Kristaps,

>  2) We should implement some kind of stack limit
>     and just bail out.

OK to commit the following patch, too?

The constant of 1000 is the same that new groff is using,
and i'm mostly indifferent regarding the question which number
to use.


Note though that groff_char.man still puts a terrible strain on mandoc
even with these two patches:

  9219 lines of output, including
  1416 ERROR: input stack limit exceeded, infinite loop?
  3328 ERROR: skipping unknown macro
    10 other errors and warnings
  4465 lines of completely garbled output on stdout

On my old, slow test box, mandoc groff_char.man runs
more than two minutes with these two patches.

But i don't really see the point in trying to improve this.
We won't get anything even semi-sensible out of this input
without implementing full roff.

Yours,
  Ingo


Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.60
diff -u -p -r1.60 main.c
--- main.c	2 Dec 2010 20:40:43 -0000	1.60
+++ main.c	3 Dec 2010 23:19:01 -0000
@@ -162,6 +162,7 @@ static	const char * const	mandocerrs[MAN
 
 	"generic error",
 
+	"input stack limit exceeded, infinite loop?",
 	"skipping bad character",
 	"skipping text before the first section header",
 	"skipping unknown macro",
@@ -631,6 +632,8 @@ parsebuf(struct curparse *curp, struct b
 {
 	struct buf	 ln;
 	enum rofferr	 rr;
+	static int	 reparse_count;
+#define REPARSE_LIMIT	 1000
 	int		 i, of, rc;
 	int		 pos; /* byte number in the ln buffer */
 	int		 lnn; /* line number in the real file */
@@ -651,8 +654,10 @@ parsebuf(struct curparse *curp, struct b
 		if (0 == pos && '\0' == blk.buf[i])
 			break;
 
-		if (start)
+		if (start) {
 			curp->line = lnn;
+			reparse_count = 0;
+		}
 
 		while (i < (int)blk.sz && (start || '\0' != blk.buf[i])) {
 			if ('\n' == blk.buf[i]) {
@@ -751,7 +756,11 @@ rerun:
 
 		switch (rr) {
 		case (ROFF_REPARSE):
-			parsebuf(curp, ln, 0);
+			if (REPARSE_LIMIT >= ++reparse_count)
+				parsebuf(curp, ln, 0);
+			else
+				mmsg(MANDOCERR_ROFFLOOP, curp, 
+				    curp->line, pos, NULL);
 			pos = 0;
 			continue;
 		case (ROFF_APPEND):
Index: mandoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc.h,v
retrieving revision 1.23
diff -u -p -r1.23 mandoc.h
--- mandoc.h	1 Dec 2010 22:02:29 -0000	1.23
+++ mandoc.h	3 Dec 2010 23:19:01 -0000
@@ -100,6 +100,7 @@ enum	mandocerr {
 	MANDOCERR_BADQUOTE, /* unterminated quoted string */
 
 	MANDOCERR_ERROR, /* ===== start of errors ===== */
+	MANDOCERR_ROFFLOOP, /* input stack limit exceeded, infinite loop? */
 	MANDOCERR_BADCHAR, /* skipping bad character */
 	MANDOCERR_NOTEXT, /* skipping text before the first section header */
 	MANDOCERR_MACRO, /* skipping unknown macro */
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-03 21:49         ` Ingo Schwarze
@ 2010-12-05 15:15           ` Kristaps Dzonsons
  2010-12-08  1:05             ` Ingo Schwarze
  0 siblings, 1 reply; 20+ messages in thread
From: Kristaps Dzonsons @ 2010-12-05 15:15 UTC (permalink / raw)
  To: tech

>>   1) roff_res() should not expand \\* or \\\\*,
>>      but it should expand \* and \\\*.
>
> OK for the following patch?
>
> It fixes this test case:

Ingo,

That's fine by me, but let's please put some roff.7 documentation (and 
in-line) in place regarding this sort of chicanery.  It confuses the 
hell out of me for one!

Thanks,

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-03 23:31         ` Ingo Schwarze
@ 2010-12-05 15:17           ` Kristaps Dzonsons
  2010-12-09 23:45             ` Ingo Schwarze
  0 siblings, 1 reply; 20+ messages in thread
From: Kristaps Dzonsons @ 2010-12-05 15:17 UTC (permalink / raw)
  To: tech

>>   2) We should implement some kind of stack limit
>>      and just bail out.
>
> OK to commit the following patch, too?
>
> The constant of 1000 is the same that new groff is using,
> and i'm mostly indifferent regarding the question which number
> to use.
>
>
> Note though that groff_char.man still puts a terrible strain on mandoc
> even with these two patches:
>
>    9219 lines of output, including
>    1416 ERROR: input stack limit exceeded, infinite loop?
>    3328 ERROR: skipping unknown macro
>      10 other errors and warnings
>    4465 lines of completely garbled output on stdout
>
> On my old, slow test box, mandoc groff_char.man runs
> more than two minutes with these two patches.
>
> But i don't really see the point in trying to improve this.
> We won't get anything even semi-sensible out of this input
> without implementing full roff.

Ingo, this is fine by me---conditional upon (1) making the variable be 
part of struct roff and not static, and (2) stating the existence of a 
limit in roff.7.  I realise this is putting a lot of technical detail in 
roff.7 (as per last patch response, too) but I'm really, really sick of 
groff's crappy, undocumented behaviour and would rather too much than 
too little.

Can you pop an entry into the TODO to the extent that more rigorous 
checking of looping constructs should be enacted?

Thanks again, Ingo!

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-05 15:15           ` Kristaps Dzonsons
@ 2010-12-08  1:05             ` Ingo Schwarze
  2010-12-10  9:40               ` Kristaps Dzonsons
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-08  1:05 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Sun, Dec 05, 2010 at 04:15:12PM +0100:
> Ingo wrote:

>> 1) roff_res() should not expand \\* or \\\\*,
>>    but it should expand \* and \\\*.
>>
>> OK for the following patch?

> That's fine by me,

Oh well, the patch was broken, fortunately i notived before
committing it.  The problem was that it didn't make sure that
the '*' immediately followed the '\\'.

> but let's please put some roff.7 documentation
> (and in-line) in place

Actually, that's what made me find the problem.
Documenting code is often a good debugging strategy.

> regarding this sort of chicanery.
> It confuses the hell out of me for one!

At some point, i guess we will have to implement proper
roff copy mode - but not right now, we still have more
elementary problems to deal with.  At that point, we
ought to improve roff(7) docs related to .ds and \*.

For now, i suggest to add just one sentence.


See below for an updated patch.
As it is quite different form the previous one,
i'm not committing it right away,
but sending it again for review.

It is not as long as it seems; that's just the comments!  ;-)
Actually, the code is more straightforward and easier than
my first try and uses one automatic variable less.

As my other patch depends on this one (not in the sense of
physical conflicts in the code, but logically), i need to
return to it when this has gone in.

Yours,
  Ingo


Index: usr.bin/mandoc/roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.22
diff -u -p -r1.22 roff.c
--- usr.bin/mandoc/roff.c	7 Dec 2010 00:08:52 -0000	1.22
+++ usr.bin/mandoc/roff.c	8 Dec 2010 00:49:53 -0000
@@ -337,15 +337,43 @@ roff_alloc(struct regset *regs, void *da
 static int
 roff_res(struct roff *r, char **bufp, size_t *szp, int pos)
 {
-	const char	*cp, *cpp, *st, *res;
+	const char	*stesc;	/* start of an escape sequence ('\\') */
+	const char	*stnam;	/* start of the name, after "[(*" */
+	const char	*cp;	/* end of the name, e.g. before ']' */
+	const char	*res;	/* the string to be substituted */
 	int		 i, maxl;
 	size_t		 nsz;
 	char		*n;
 
-	/* LINTED */
-	for (cp = &(*bufp)[pos]; (cpp = strstr(cp, "\\*")); cp++) {
-		cp = cpp + 2;
-		switch (*cp) {
+	/* String escape sequences have at least three characters. */
+
+	for (cp = *bufp + pos; cp[0] && cp[1] && cp[2]; cp++) {
+
+		/*
+		 * The first character must be a backslash.
+		 * Save a pointer to it.
+		 */
+
+		if ('\\' != *cp)
+			continue;
+		stesc = cp;
+
+		/*
+		 * The second character must be an asterisk.
+		 * If it isn't, skip it anyway:  It is escaped,
+		 * so it can't start another escape sequence.
+		 */
+
+		if ('*' != *(++cp))
+			continue;
+
+		/*
+		 * The third character decides the length
+		 * of the name of the string.
+		 * Save a pointer to the name.
+		 */
+
+		switch (*(++cp)) {
 		case ('('):
 			cp++;
 			maxl = 2;
@@ -358,8 +386,9 @@ roff_res(struct roff *r, char **bufp, si
 			maxl = 1;
 			break;
 		}
+		stnam = cp;
 
-		st = cp;
+		/* Advance to the end of the name. */
 
 		for (i = 0; 0 == maxl || i < maxl; i++, cp++) {
 			if ('\0' == *cp)
@@ -368,19 +397,24 @@ roff_res(struct roff *r, char **bufp, si
 				break;
 		}
 
-		res = roff_getstrn(r, st, (size_t)i);
+		/*
+		 * Retrieve the replacement string; if it is
+		 * undefined, resume searching for escapes.
+		 */
+
+		res = roff_getstrn(r, stnam, (size_t)i);
 
 		if (NULL == res) {
 			cp -= maxl ? 1 : 0;
 			continue;
 		}
 
+		/* Replace the escape sequence by the string. */
+
 		nsz = *szp + strlen(res) + 1;
 		n = mandoc_malloc(nsz);
 
-		*n = '\0';
-
-		strlcat(n, *bufp, (size_t)(cpp - *bufp + 1));
+		strlcpy(n, *bufp, (size_t)(stesc - *bufp + 1));
 		strlcat(n, res, nsz);
 		strlcat(n, cp + (maxl ? 0 : 1), nsz);
 
@@ -544,6 +578,7 @@ roff_cblock(ROFF_ARGS)
 	case (ROFF_am1):
 		/* FALLTHROUGH */
 	case (ROFF_de):
+		/* ROFF_de1 is remapped to ROFF_de in roff_block(). */
 		/* FALLTHROUGH */
 	case (ROFF_dei):
 		/* FALLTHROUGH */
Index: share/man/man7/roff.7
===================================================================
RCS file: /cvs/src/share/man/man7/roff.7,v
retrieving revision 1.5
diff -u -p -r1.5 roff.7
--- share/man/man7/roff.7	30 Nov 2010 22:40:13 -0000	1.5
+++ share/man/man7/roff.7	8 Dec 2010 00:49:53 -0000
@@ -269,6 +269,9 @@ for a
 of arbitrary length, or \e*(NN or \e*N if the length of
 .Ar name
 is two or one characters, respectively.
+Interpolation can be prevented by escaping the leading backslash;
+that is, an asterisk preceded by an even number of backslashes
+does not trigger string interpolation.
 .Pp
 Since user-defined strings and macros share a common string table,
 defining a string
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-05 15:17           ` Kristaps Dzonsons
@ 2010-12-09 23:45             ` Ingo Schwarze
  2010-12-10  9:32               ` Kristaps Dzonsons
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-09 23:45 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Sun, Dec 05, 2010 at 04:17:55PM +0100:
> Ingo wrote:

>> 2) We should implement some kind of stack limit
>>    and just bail out.
>>
>> OK to commit the following patch, too?

> Ingo, this is fine by me---conditional upon (1) making the variable
> be part of struct roff

Hmmm, that doesn't work, struct roff is defined locally in roff.c,
so in main.c it has incomplete type.

> and not static,

But that part does work, i have just made it a member of
struct curparse.

> and (2) stating the existence of a limit in roff.7.

Done, in all brevity.

> I realise this is putting a lot of technical
> detail in roff.7 (as per last patch response, too)

Well, the roff language is full of obscure technicalities,
so we can't help that.  Hopefully, manual authors will rarely
need information from roff(7) and get away with mdoc(7).

> but I'm really, really sick of groff's crappy, undocumented
> behaviour and would rather too much than too little.
> 
> Can you pop an entry into the TODO to the extent that more rigorous
> checking of looping constructs should be enacted?

Done.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-09 23:45             ` Ingo Schwarze
@ 2010-12-10  9:32               ` Kristaps Dzonsons
  0 siblings, 0 replies; 20+ messages in thread
From: Kristaps Dzonsons @ 2010-12-10  9:32 UTC (permalink / raw)
  To: tech

Ingo,

>>> 2) We should implement some kind of stack limit
>>>     and just bail out.
>>>
>>> OK to commit the following patch, too?
>
>> Ingo, this is fine by me---conditional upon (1) making the variable
>> be part of struct roff
>
> Hmmm, that doesn't work, struct roff is defined locally in roff.c,
> so in main.c it has incomplete type.
>
>> and not static,
>
> But that part does work, i have just made it a member of
> struct curparse.

Ehhh, you know what I mean. :-)  No function-scoped statics in the 
libraries!

>> and (2) stating the existence of a limit in roff.7.
>
> Done, in all brevity.
>
>> I realise this is putting a lot of technical
>> detail in roff.7 (as per last patch response, too)
>
> Well, the roff language is full of obscure technicalities,
> so we can't help that.  Hopefully, manual authors will rarely
> need information from roff(7) and get away with mdoc(7).
>
>> but I'm really, really sick of groff's crappy, undocumented
>> behaviour and would rather too much than too little.
>>
>> Can you pop an entry into the TODO to the extent that more rigorous
>> checking of looping constructs should be enacted?
>
> Done.

Great---check it in (I only see the TODO parts checked in in the commit 
mailings)!

Thanks and thanks,

Kristaps

--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-08  1:05             ` Ingo Schwarze
@ 2010-12-10  9:40               ` Kristaps Dzonsons
  2010-12-10 20:45                 ` Ingo Schwarze
  0 siblings, 1 reply; 20+ messages in thread
From: Kristaps Dzonsons @ 2010-12-10  9:40 UTC (permalink / raw)
  To: tech

On 08/12/2010 02:05, Ingo Schwarze wrote:
> Hi Kristaps,
>
> Kristaps Dzonsons wrote on Sun, Dec 05, 2010 at 04:15:12PM +0100:
>> Ingo wrote:
>
>>> 1) roff_res() should not expand \\* or \\\\*,
>>>     but it should expand \* and \\\*.
>>>
>>> OK for the following patch?
>
>> That's fine by me,
>
> Oh well, the patch was broken, fortunately i notived before
> committing it.  The problem was that it didn't make sure that
> the '*' immediately followed the '\\'.
>
>> but let's please put some roff.7 documentation
>> (and in-line) in place
>
> Actually, that's what made me find the problem.
> Documenting code is often a good debugging strategy.
>
>> regarding this sort of chicanery.
>> It confuses the hell out of me for one!
>
> At some point, i guess we will have to implement proper
> roff copy mode - but not right now, we still have more
> elementary problems to deal with.  At that point, we
> ought to improve roff(7) docs related to .ds and \*.
>
> For now, i suggest to add just one sentence.
>
>
> See below for an updated patch.
> As it is quite different form the previous one,
> i'm not committing it right away,
> but sending it again for review.
>
> It is not as long as it seems; that's just the comments!  ;-)
> Actually, the code is more straightforward and easier than
> my first try and uses one automatic variable less.
>
> As my other patch depends on this one (not in the sense of
> physical conflicts in the code, but logically), i need to
> return to it when this has gone in.

Ingo,

Much more readable.  I worry a bit about performance, as this is an 
often-called function: why the funny business looking for three non-nils 
then the escaped asterisk?

Seems (it pseudo-C)

   for (cp = buf + pos; cpp = strstr(cp, "\\*"); cp++)
      if ('\0' == cpp[2]) continue;
      ...

would be much clearer and faster.  No?

Thanks,

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-10  9:40               ` Kristaps Dzonsons
@ 2010-12-10 20:45                 ` Ingo Schwarze
  2010-12-10 20:52                   ` Joerg Sonnenberger
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-10 20:45 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Fri, Dec 10, 2010 at 10:40:52AM +0100:
>>> Ingo wrote:

>>>> 1) roff_res() should not expand \\* or \\\\*,
>>>>    but it should expand \* and \\\*.

> I worry a bit about performance, as this is an
> often-called function:

Yes, the function is called L+E times,
where L is the number of input lines and E is the number
of expanded strings and user-defined roff macro invocations.

But my algorithm is still O(N), where N is the number of
characters on the line, as long as no expansion is done.
So, if C is the total number of characters in the input file,
and e is the mean number of expansions per line,
then the total order of the search algorithm (without
doing the actual expansions) is O(C*(1+e)).

The work to do for the expansions is O(C*e), but that
didn't change.

All that is still nearly linear, which is hardly a bad order.

> why the funny business looking for three
> non-nils then the escaped asterisk?
> 
> Seems (it pseudo-C)
> 
>   for (cp = buf + pos; cpp = strstr(cp, "\\*"); cp++)
>      if ('\0' == cpp[2]) continue;
>      ...
> 
> would be much clearer and faster.  No?

No.

Faster, no.  That's O(C*(1+e)) as well.
So we are talking about constant factors here.
Maybe you are economizing two char comparisons
per loop cycle.  Assuming e=1, that will buy you
perhaps 400k char comparisons on a 100k input file.
That might be about a millisecond on a very old PC.

Clearer, no.  It's incorrect.  You cannot use strstr().
The number of backslashes before "\\*" is relevant.
It must be even, or you don't want to substitute.

If we want to go into micro-optimization, we could do this:

        for (cp = *bufp + pos; *cp; cp++) {
                if ('\\' != *cp)
                        continue;
                stesc = cp;
		if ('\0' == *(++cp))
			break;
		if ('*' != *cp)
			continue;
		if ('\0' == *(++cp))
			break;
		switch (*cp) {

deferring the tests until they are really needed (untested).
That's fewer tests because most of the time, the
continue branches will be taken.

But is that really better?  It is more code, and harder
to understand.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-10 20:45                 ` Ingo Schwarze
@ 2010-12-10 20:52                   ` Joerg Sonnenberger
  2010-12-10 21:10                     ` Ingo Schwarze
  0 siblings, 1 reply; 20+ messages in thread
From: Joerg Sonnenberger @ 2010-12-10 20:52 UTC (permalink / raw)
  To: tech

On Fri, Dec 10, 2010 at 09:45:13PM +0100, Ingo Schwarze wrote:
> If we want to go into micro-optimization, we could do this:
> 
>         for (cp = *bufp + pos; *cp; cp++) {
>                 if ('\\' != *cp)
>                         continue;


... or use a single strchr() call to start the search. For the typical
case of medium long lines and no \, strchr can operate mostly on
dwords...

Joerg
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-10 20:52                   ` Joerg Sonnenberger
@ 2010-12-10 21:10                     ` Ingo Schwarze
  2010-12-10 21:17                       ` Joerg Sonnenberger
  2010-12-10 23:12                       ` Ingo Schwarze
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-10 21:10 UTC (permalink / raw)
  To: tech

Hi Joerg,

Joerg Sonnenberger wrote on Fri, Dec 10, 2010 at 09:52:03PM +0100:
> On Fri, Dec 10, 2010 at 09:45:13PM +0100, Ingo Schwarze wrote:

>> If we want to go into micro-optimization, we could do this:
>> 
>>         for (cp = *bufp + pos; *cp; cp++) {
>>                 if ('\\' != *cp)
>>                         continue;

> ... or use a single strchr() call to start the search.

Sure, that would be correct, and not even obfuscation.

> For the typical case of medium long lines and no \, strchr
> can operate mostly on dwords...

Here is our implementation of strchr():

char *
strchr(const char *p, int ch)
{
	for (;; ++p) {
		if (*p == ch)
			return((char *)p);
		if (!*p)
			return((char *)NULL);
	}
	/* NOTREACHED */
}

You mean, the compiler will optimize that into dword
comparisons, but will not optimize this code:

	for (cp = *bufp + pos; *cp; cp++) {
		if ('\\' != *cp)
			continue;

And the difference will buy us enough to warrant the extra
function call?

I doubt any of this is relevant, but i'm tempted to do some
measurements...

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-10 21:10                     ` Ingo Schwarze
@ 2010-12-10 21:17                       ` Joerg Sonnenberger
  2010-12-10 23:12                       ` Ingo Schwarze
  1 sibling, 0 replies; 20+ messages in thread
From: Joerg Sonnenberger @ 2010-12-10 21:17 UTC (permalink / raw)
  To: tech

On Fri, Dec 10, 2010 at 10:10:20PM +0100, Ingo Schwarze wrote:
> Hi Joerg,
> 
> Joerg Sonnenberger wrote on Fri, Dec 10, 2010 at 09:52:03PM +0100:
> > On Fri, Dec 10, 2010 at 09:45:13PM +0100, Ingo Schwarze wrote:
> 
> >> If we want to go into micro-optimization, we could do this:
> >> 
> >>         for (cp = *bufp + pos; *cp; cp++) {
> >>                 if ('\\' != *cp)
> >>                         continue;
> 
> > ... or use a single strchr() call to start the search.
> 
> Sure, that would be correct, and not even obfuscation.
> 
> > For the typical case of medium long lines and no \, strchr
> > can operate mostly on dwords...
> 
> Here is our implementation of strchr():

If "our" you mean OpenBSD, it is not the version in libc on most
platforms, which is in assembler. But yes, the OpenBSD version e.g. for
i386 is pretty stupid.

Joerg
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: roff.c question
  2010-12-10 21:10                     ` Ingo Schwarze
  2010-12-10 21:17                       ` Joerg Sonnenberger
@ 2010-12-10 23:12                       ` Ingo Schwarze
  1 sibling, 0 replies; 20+ messages in thread
From: Ingo Schwarze @ 2010-12-10 23:12 UTC (permalink / raw)
  To: tech

Hi,

> i'm tempted to do some measurements...

I took the ksh(1) manual, which is rather large,
but a real-word manual with few user-defined strings,
concatenated it 20 times to iteslf (only one header,
of course), and added one .ds near the beginning,
such that roff_res() gets called.  The result is a bit
above 100k lines, two and a half MB total size.

Timing for mandoc -Tlint -Wfatal test.1 on my notebook is:

  three checks up front:  1.40s
  delayed checks:         1.39s
  with strchr:            1.39s

  calling roff_res N times instead of once:
  N=0:                    1.37s
  N=100, strchr:          3.25s
  N=100, up front:        4.45s

So, time spent in roff_res() is about 2.2 percent of
the total parsing time, and we can save nearly 40 percent
of these 2.2 percent.

Thus, on a typical mdoc(7) manual, we economize
between 0.8 and 0.9 percent parsing time with these
optimizations.  Rendering time is about 0.55s,
so we save about 0.6 percent total time.

Given that the rendering time for the ksh(1) manual
is 100 milliseconds on my notebook, we can save
600 microseconds in absolute numbers, on a large
manual.

Thus, here is a version using these optimizations, which
fortunately does not make the code more complicated.
I diffed against OpenBSD, because that makes the
diff easier to understand.

Yours,
  Ingo


Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.23
diff -u -p -r1.23 roff.c
--- roff.c	9 Dec 2010 20:56:30 -0000	1.23
+++ roff.c	10 Dec 2010 22:58:19 -0000
@@ -345,18 +345,11 @@ roff_res(struct roff *r, char **bufp, si
 	size_t		 nsz;
 	char		*n;
 
-	/* String escape sequences have at least three characters. */
+	/* Search for a leading backslash and save a pointer to it. */
 
-	for (cp = *bufp + pos; cp[0] && cp[1] && cp[2]; cp++) {
-
-		/*
-		 * The first character must be a backslash.
-		 * Save a pointer to it.
-		 */
-
-		if ('\\' != *cp)
-			continue;
-		stesc = cp;
+	cp = *bufp + pos;
+	while (NULL != (cp = strchr(cp, '\\'))) {
+		stesc = cp++;
 
 		/*
 		 * The second character must be an asterisk.
@@ -364,7 +357,9 @@ roff_res(struct roff *r, char **bufp, si
 		 * so it can't start another escape sequence.
 		 */
 
-		if ('*' != *(++cp))
+		if ('\0' == *cp)
+			return(1);
+		if ('*' != *cp++)
 			continue;
 
 		/*
@@ -373,7 +368,9 @@ roff_res(struct roff *r, char **bufp, si
 		 * Save a pointer to the name.
 		 */
 
-		switch (*(++cp)) {
+		switch (*cp) {
+		case ('\0'):
+			return(1);
 		case ('('):
 			cp++;
 			maxl = 2;
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-12-10 23:12 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-01 16:33 exit_status persistence Kristaps Dzonsons
2010-12-01 16:41 ` Kristaps Dzonsons
2010-12-01 21:28 ` Ingo Schwarze
2010-12-02 10:51   ` exit_status persistence (now: roff.c question) Kristaps Dzonsons
2010-12-02 13:29     ` Kristaps Dzonsons
2010-12-02 22:50       ` roff.c question Ingo Schwarze
2010-12-03 21:49         ` Ingo Schwarze
2010-12-05 15:15           ` Kristaps Dzonsons
2010-12-08  1:05             ` Ingo Schwarze
2010-12-10  9:40               ` Kristaps Dzonsons
2010-12-10 20:45                 ` Ingo Schwarze
2010-12-10 20:52                   ` Joerg Sonnenberger
2010-12-10 21:10                     ` Ingo Schwarze
2010-12-10 21:17                       ` Joerg Sonnenberger
2010-12-10 23:12                       ` Ingo Schwarze
2010-12-03 23:31         ` Ingo Schwarze
2010-12-05 15:17           ` Kristaps Dzonsons
2010-12-09 23:45             ` Ingo Schwarze
2010-12-10  9:32               ` Kristaps Dzonsons
2010-12-02 20:54     ` Ingo Schwarze

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).