tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
* [PATCH] uniform parsing of names in libroff
@ 2011-01-10 22:52 Ingo Schwarze
  2011-01-10 23:10 ` Kristaps Dzonsons
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Schwarze @ 2011-01-10 22:52 UTC (permalink / raw)
  To: tech

Hi,

i just started to implement .rm (remove macro) in libroff,
then realized that i would be introducing the third place
where names (that is, to be precise, roff macro and string names)
get parsed.

The following patch moves this parsing into its own function,
to be soon reused by roff_rm().

I put that function roff_getname() in libroff (and not in
libmandoc) because i do not expect any macro package to deal
with that kind of low-level stuff.

The interface is similar to mandoc_getarg(), but much simpler,
because quotes count as normal characters in this context,
space characters cannot be escaped, and backslashes are only
allowed in pairs.  Yes, parsing of names has its own rules,
and they are quite different from the rules for parsing of
arguments.  See, this is roff, right?

As a bonus:
 - Both roff_ds and roff_nr become much shorter.
 - We get an ERROR when there are escapes in a name.

OK?

Yours,
  Ingo

P.S.
Maybe we can use this for roff_block() as well, but that is not
as straighforward, so i haven't done it right away.


Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.65
diff -u -p -r1.65 main.c
--- main.c	9 Jan 2011 13:16:48 -0000	1.65
+++ main.c	10 Jan 2011 22:29:54 -0000
@@ -179,6 +179,7 @@ static	const char * const	mandocerrs[MAN
 
 	"input stack limit exceeded, infinite loop?",
 	"skipping bad character",
+	"escaped character not allowed in a name",
 	"skipping text before the first section header",
 	"skipping unknown macro",
 	"NOT IMPLEMENTED: skipping request",
Index: mandoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc.h,v
retrieving revision 1.28
diff -u -p -r1.28 mandoc.h
--- mandoc.h	9 Jan 2011 14:30:48 -0000	1.28
+++ mandoc.h	10 Jan 2011 22:29:54 -0000
@@ -116,6 +116,7 @@ enum	mandocerr {
 
 	MANDOCERR_ROFFLOOP, /* input stack limit exceeded, infinite loop? */
 	MANDOCERR_BADCHAR, /* skipping bad character */
+	MANDOCERR_NAMESC, /* escaped character not allowed in a name */
 	MANDOCERR_NOTEXT, /* skipping text before the first section header */
 	MANDOCERR_MACRO, /* skipping unknown macro */
 	MANDOCERR_REQUEST, /* NOT IMPLEMENTED: skipping request */
Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
retrieving revision 1.27
diff -u -p -r1.27 roff.c
--- roff.c	4 Jan 2011 22:28:17 -0000	1.27
+++ roff.c	10 Jan 2011 22:29:54 -0000
@@ -130,6 +130,7 @@ static	enum rofferr	 roff_cond_sub(ROFF_
 static	enum rofferr	 roff_ds(ROFF_ARGS);
 static	enum roffrule	 roff_evalcond(const char *, int *);
 static	void		 roff_freestr(struct roff *);
+static	char		*roff_getname(struct roff *, char **, int, int);
 static	const char	*roff_getstrn(const struct roff *, 
 				const char *, size_t);
 static	enum rofferr	 roff_line_ignore(ROFF_ARGS);
@@ -1052,25 +1053,13 @@ roff_ds(ROFF_ARGS)
 	 * will have `bar  "     ' as its value.
 	 */
 
-	name = *bufp + pos;
+	string = *bufp + pos;
+	name = roff_getname(r, &string, ln, pos);
 	if ('\0' == *name)
 		return(ROFF_IGN);
 
-	string = name;
-	/* Read until end of name. */
-	while (*string && ' ' != *string)
-		string++;
-
-	/* Nil-terminate name. */
-	if (*string)
-		*(string++) = '\0';
-	
-	/* Read past spaces. */
-	while (*string && ' ' == *string)
-		string++;
-
-	/* Read passed initial double-quote. */
-	if (*string && '"' == *string)
+	/* Read past initial double-quote. */
+	if ('"' == *string)
 		string++;
 
 	/* The rest is the value. */
@@ -1083,31 +1072,14 @@ roff_ds(ROFF_ARGS)
 static enum rofferr
 roff_nr(ROFF_ARGS)
 {
-	const char	*key, *val;
+	const char	*key;
+	char		*val;
 	struct reg	*rg;
 
-	key = &(*bufp)[pos];
+	val = *bufp + pos;
+	key = roff_getname(r, &val, ln, pos);
 	rg = r->regs->regs;
 
-	/* Parse register request. */
-	while ((*bufp)[pos] && ' ' != (*bufp)[pos])
-		pos++;
-
-	/*
-	 * Set our nil terminator.  Because this line is going to be
-	 * ignored anyway, we can munge it as we please.
-	 */
-	if ((*bufp)[pos])
-		(*bufp)[pos++] = '\0';
-
-	/* Skip whitespace to register token. */
-	while ((*bufp)[pos] && ' ' == (*bufp)[pos])
-		pos++;
-
-	val = &(*bufp)[pos];
-
-	/* Process register token. */
-
 	if (0 == strcmp(key, "nS")) {
 		rg[(int)REG_nS].set = 1;
 		if ( ! roff_parse_nat(val, &rg[(int)REG_nS].v.u))
@@ -1245,6 +1217,41 @@ roff_userdef(ROFF_ARGS)
 	return(*szp > 1 && '\n' == (*bufp)[(int)*szp - 2] ?
 	   ROFF_REPARSE : ROFF_APPEND);
 }
+
+
+static char *
+roff_getname(struct roff *r, char **cpp, int ln, int pos)
+{
+	char	 *name, *cp;
+
+	name = *cpp;
+	if ('\0' == *name)
+		return(name);
+
+	/* Read until end of name. */
+	for (cp = name; '\0' != *cp && ' ' != *cp; cp++) {
+		if ('\\' != *cp)
+			continue;
+		cp++;
+		if ('\\' == *cp)
+			continue;
+		(*r->msg)(MANDOCERR_NAMESC, r->data, ln, pos, NULL);
+		*cp = '\0';
+		name = cp;
+	}
+
+	/* Nil-terminate name. */
+	if ('\0' != *cp)
+		*(cp++) = '\0';
+
+	/* Read past spaces. */
+	while (' ' == *cp)
+		cp++;
+
+	*cpp = cp;
+	return(name);
+}
+
 
 /*
  * Store *string into the user-defined string called *name.

--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] uniform parsing of names in libroff
  2011-01-10 22:52 [PATCH] uniform parsing of names in libroff Ingo Schwarze
@ 2011-01-10 23:10 ` Kristaps Dzonsons
  2011-01-10 23:41   ` [PATCH] implement .rm Ingo Schwarze
  0 siblings, 1 reply; 4+ messages in thread
From: Kristaps Dzonsons @ 2011-01-10 23:10 UTC (permalink / raw)
  To: tech; +Cc: Ingo Schwarze

On 10/01/2011 23:52, Ingo Schwarze wrote:
> Hi,
>
> i just started to implement .rm (remove macro) in libroff,
> then realized that i would be introducing the third place
> where names (that is, to be precise, roff macro and string names)
> get parsed.
>
> The following patch moves this parsing into its own function,
> to be soon reused by roff_rm().
>
> I put that function roff_getname() in libroff (and not in
> libmandoc) because i do not expect any macro package to deal
> with that kind of low-level stuff.
>
> The interface is similar to mandoc_getarg(), but much simpler,
> because quotes count as normal characters in this context,
> space characters cannot be escaped, and backslashes are only
> allowed in pairs.  Yes, parsing of names has its own rules,
> and they are quite different from the rules for parsing of
> arguments.  See, this is roff, right?
>
> As a bonus:
>   - Both roff_ds and roff_nr become much shorter.
>   - We get an ERROR when there are escapes in a name.
>
> OK?

Ingo, I'm fine with this.

Let me know when your changes into libroff are finished---I want to hash 
setstr() stuff, as there are a lot of pod2man and man pages with lots of 
requests, and we can probably make a measurably impact on performance.

Thanks,

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] implement .rm
  2011-01-10 23:10 ` Kristaps Dzonsons
@ 2011-01-10 23:41   ` Ingo Schwarze
  2011-01-10 23:47     ` Kristaps Dzonsons
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Schwarze @ 2011-01-10 23:41 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Tue, Jan 11, 2011 at 12:10:18AM +0100:
> On 10/01/2011 23:52, Ingo Schwarze wrote:

>> i just started to implement .rm (remove macro) in libroff,
>> then realized that i would be introducing the third place
>> where names (that is, to be precise, roff macro and string names)
>> get parsed.

Having roff_getname(), this is now really easy.

> Ingo, I'm fine with this.

Thanks for looking, i'm putting that into both trees right now.

> Let me know when your changes into libroff are finished---

After the patch appended below.  ;-)

> I want to
> hash setstr() stuff, as there are a lot of pod2man and man pages
> with lots of requests, and we can probably make a measurably impact
> on performance.

Hmm, last time i did an optimization in libroff, it gave us
2% speedup on ksh(1).  Yes, that was measurable, though the
measurement was somewhat non-trivial.  :)

Thus, i will believe this gives us a measureable improvement
when i see the measurement!

Using hash tables (or RB-trees?) probably makes sense, but please
try to not hack up a third (or fourth?  i have lost count)
independent hash table implementation in mandoc.  ;-)

Yours,
  Ingo


diff -Napur mandoc.getname/roff.c mandoc/roff.c
--- mandoc.getname/roff.c	Mon Jan 10 15:24:03 2011
+++ mandoc/roff.c	Mon Jan 10 16:25:48 2011
@@ -134,10 +134,10 @@ static	char		*roff_getname(struct roff *, char **, int
 static	const char	*roff_getstrn(const struct roff *, 
 				const char *, size_t);
 static	enum rofferr	 roff_line_ignore(ROFF_ARGS);
-static	enum rofferr	 roff_line_error(ROFF_ARGS);
 static	enum rofferr	 roff_nr(ROFF_ARGS);
 static	int		 roff_res(struct roff *, 
 				char **, size_t *, int);
+static	enum rofferr	 roff_rm(ROFF_ARGS);
 static	void		 roff_setstr(struct roff *,
 				const char *, const char *, int);
 static	enum rofferr	 roff_so(ROFF_ARGS);
@@ -171,7 +171,7 @@ static	struct roffmac	 roffs[ROFF_MAX] = {
 	{ "ne", roff_line_ignore, NULL, NULL, 0, NULL },
 	{ "nh", roff_line_ignore, NULL, NULL, 0, NULL },
 	{ "nr", roff_nr, NULL, NULL, 0, NULL },
-	{ "rm", roff_line_error, NULL, NULL, 0, NULL },
+	{ "rm", roff_rm, NULL, NULL, 0, NULL },
 	{ "so", roff_so, NULL, NULL, 0, NULL },
 	{ "tr", roff_line_ignore, NULL, NULL, 0, NULL },
 	{ "TS", roff_TS, NULL, NULL, 0, NULL },
@@ -936,15 +936,6 @@ roff_line_ignore(ROFF_ARGS)
 
 /* ARGSUSED */
 static enum rofferr
-roff_line_error(ROFF_ARGS)
-{
-
-	(*r->msg)(MANDOCERR_REQUEST, r->data, ln, ppos, roffs[tok].name);
-	return(ROFF_IGN);
-}
-
-/* ARGSUSED */
-static enum rofferr
 roff_cond(ROFF_ARGS)
 {
 	int		 sv;
@@ -1086,6 +1077,22 @@ roff_nr(ROFF_ARGS)
 			rg[(int)REG_nS].v.u = 0;
 	}
 
+	return(ROFF_IGN);
+}
+
+/* ARGSUSED */
+static enum rofferr
+roff_rm(ROFF_ARGS)
+{
+	const char	 *name;
+	char		 *cp;
+
+	cp = *bufp + pos;
+	while ('\0' != *cp) {
+		name = roff_getname(r, &cp, ln, cp - *bufp);
+		if ('\0' != *name)
+			roff_setstr(r, name, NULL, 0);
+	}
 	return(ROFF_IGN);
 }
 
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] implement .rm
  2011-01-10 23:41   ` [PATCH] implement .rm Ingo Schwarze
@ 2011-01-10 23:47     ` Kristaps Dzonsons
  0 siblings, 0 replies; 4+ messages in thread
From: Kristaps Dzonsons @ 2011-01-10 23:47 UTC (permalink / raw)
  To: tech; +Cc: Ingo Schwarze

> Thus, i will believe this gives us a measureable improvement
> when i see the measurement!
>
> Using hash tables (or RB-trees?) probably makes sense, but please
> try to not hack up a third (or fourth?  i have lost count)
> independent hash table implementation in mandoc.  ;-)

Ingo,

You've a point in this.  It's true that all roff, mdoc, and man name 
lookups generally have 1-3 letters.  Let me look into a nice, unified 
interface for hashing these.  If it saves space and speed, I'll float a 
patch for review.

Thanks,

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-01-10 23:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-10 22:52 [PATCH] uniform parsing of names in libroff Ingo Schwarze
2011-01-10 23:10 ` Kristaps Dzonsons
2011-01-10 23:41   ` [PATCH] implement .rm Ingo Schwarze
2011-01-10 23:47     ` Kristaps Dzonsons

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).