tech@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Subject: Re: Can of worms: \h"..."
Date: Wed, 4 Jan 2012 01:54:30 +0100
Message-ID: <20120104005430.GF2607@iris.usta.de>
In-Reply-To: <4F02F264.2070407@bsd.lv>

Hi Kristaps,

Just a very quick answer: it's getting late already and I can't
study this in due detail right now.

Kristaps Dzonsons wrote on Tue, Jan 03, 2012 at 01:19:48PM +0100:

> On the verge of checking in a quick fix for the \h"..." TODO, it
> occurred to me that we either don't want to accommodate pod2man
> badness OR something more subtle's at work.  \h"..." is specifically
> disallowed by groff(1).  So I searched in the groff source.  Behold!
> 
> In groff's input.cpp, we see several escapes (h, H, N, S, v, x)
> directly condition their enclosing markers on the first character
> (see get_delim_number()) while others do so indirectly.  These set
> the end marker on the first character given that it satisfies the
> token::delimiter() method (or whatever is C++'s name for an object
> function).
> 
> The delimiter() function (also in input.cpp) allows any character
> but a certain ASCII subset and whitespace.  groff(7) mentions the
> apostrophe, but it can be much, much more.
> 
> Question is: do we want this behaviour?  I'd say we do,

If I understand correctly, I tend to say:
Yes, we should accept the same characters as delimiters as groff does.

> but as it's somewhat intrusive, I want some consensus before
> committing.  Either way, I do NOT suggest that we outwardly
> document this.

Indeed, documenting the apostrophe as a delimiter is enough;
everything else does not seem particularly sane.

> Note that this also fixes the situation where some non-\N escapes
> were being assigned the NUMERIC identifier, which is only used for
> \N.  I also removed the check for \N numbers, as this is done again
> later.

I haven't run it yet, but I suspect that part is wrong.
The point is: sure, we have found an explicit delimiting character.
But any other letter will terminate the escape sequence as well; see

  http://www.openbsd.org/cgi-bin/cvsweb/src/regress/usr.bin/mandoc/char/N/

Both the mdoc(7) input and groff(1) output are checked in.
See in particular the "mixed content" on line 18 of basic.in and
line 13 of basic.out_ascii.

Whatever you check in, please don't break that test.  :-)
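
For reference, the termination rule in question can be sketched as a
standalone function. This is only an illustration modelled on the \N
branch that the quoted patch removes. The name parse_numbered() and its
return convention are hypothetical, and it folds the ESCAPE_ERROR and
ESCAPE_IGNORE outcomes of the original into a single failure value.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical standalone sketch of the removed \N handling.
 * cp points just past "\N", i.e. at the opening delimiter.
 * Returns 1 and reports the digit run via start and sz on success,
 * 0 if nothing follows \N or the "delimiter" is itself a digit.
 * On success, *end points past the character that ended the number.
 */
int
parse_numbered(const char *cp, const char **start, size_t *sz,
    const char **end)
{
	const char *p;

	if (*cp == '\0' || isdigit((unsigned char)*cp))
		return 0;

	p = cp + 1;		/* skip the opening delimiter */
	*start = p;
	while (isdigit((unsigned char)*p))
		p++;
	*sz = (size_t)(p - *start);

	/*
	 * Whatever character follows the digits ends the escape and
	 * is consumed; it need not match the opening delimiter.
	 */
	if (*p != '\0')
		p++;
	*end = p;
	return 1;
}
```

With the input "'65abc" this sketch reports the two digits "65" and
leaves *end pointing at "bc": the 'a' ends the number even though the
opening delimiter was an apostrophe.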

> Thoughts?

The longish switch(numeric) could probably be replaced by something like

  strchr("0123456789+-/*%<>=&:().", numeric)
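
A minimal sketch of that idea, assuming a hypothetical helper named
is_delim() that is not part of the patch. One detail to watch: strchr()
also matches the terminating NUL of the string it searches, so a bare
strchr() call would reject '\0' as well, whereas the switch in the patch
lets '\0' fall through to the default case. The sketch rejects NUL
explicitly so that the behaviour is visible rather than accidental.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: nonzero if c may serve as the enclosure mark
 * of an escape such as \h'...'.  The rejected set is the one listed
 * in the patch's switch, plus whitespace and NUL.
 */
int
is_delim(char c)
{
	/* strchr() would find the string terminator, so test it first. */
	if (c == '\0')
		return 0;
	if (isspace((unsigned char)c))
		return 0;
	return strchr("0123456789+-/*%<>=&:().", c) == NULL;
}
```

Under these assumptions is_delim('\'') and is_delim('"') are true,
while is_delim('5'), is_delim('+') and is_delim(' ') are false.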

Yours,
  Ingo


> Index: mandoc.c
> ===================================================================
> RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 mandoc.c
> --- mandoc.c	3 Dec 2011 16:08:51 -0000	1.62
> +++ mandoc.c	3 Jan 2012 12:18:51 -0000
> @@ -209,9 +209,15 @@ mandoc_escape(const char **end, const ch
>  		break;
>  
>  	/*
> -	 * These escapes are of the form \X'N', where 'X' is the trigger
> -	 * and 'N' resolves to a numerical expression.
> +	 * These escapes accept most characters as enclosure marks
> +	 * (except for those listed in the switch).
> +	 * The enclosed materials are numbers, so run them through the
> +	 * numerical subexpression calculator after we process.
>  	 */
> +	case ('N'):
> +		/* Special case: numerical representation of char. */
> +		gly = ESCAPE_NUMBERED;
> +		/* FALLTHROUGH */
>  	case ('B'):
>  		/* FALLTHROUGH */
>  	case ('h'):
> @@ -221,7 +227,6 @@ mandoc_escape(const char **end, const ch
>  	case ('L'):
>  		/* FALLTHROUGH */
>  	case ('l'):
> -		gly = ESCAPE_NUMBERED;
>  		/* FALLTHROUGH */
>  	case ('S'):
>  		/* FALLTHROUGH */
> @@ -230,32 +235,62 @@ mandoc_escape(const char **end, const ch
>  	case ('w'):
>  		/* FALLTHROUGH */
>  	case ('x'):
> -		if (ESCAPE_ERROR == gly)
> +		if (ESCAPE_NUMBERED != gly)
>  			gly = ESCAPE_IGNORE;
> -		if ('\'' != cp[i++])
> +		numeric = term = cp[i++];
> +		switch (numeric) {
> +		case('0'):
> +			/* FALLTHROUGH */
> +		case('1'):
> +			/* FALLTHROUGH */
> +		case('2'):
> +			/* FALLTHROUGH */
> +		case('3'):
> +			/* FALLTHROUGH */
> +		case('4'):
> +			/* FALLTHROUGH */
> +		case('5'):
> +			/* FALLTHROUGH */
> +		case('6'):
> +			/* FALLTHROUGH */
> +		case('7'):
> +			/* FALLTHROUGH */
> +		case('8'):
> +			/* FALLTHROUGH */
> +		case('9'):
> +			/* FALLTHROUGH */
> +		case('+'):
> +			/* FALLTHROUGH */
> +		case('-'):
> +			/* FALLTHROUGH */
> +		case('/'):
> +			/* FALLTHROUGH */
> +		case('*'):
> +			/* FALLTHROUGH */
> +		case('%'):
> +			/* FALLTHROUGH */
> +		case('<'):
> +			/* FALLTHROUGH */
> +		case('>'):
> +			/* FALLTHROUGH */
> +		case('='):
> +			/* FALLTHROUGH */
> +		case('&'):
> +			/* FALLTHROUGH */
> +		case(':'):
> +			/* FALLTHROUGH */
> +		case('('):
> +			/* FALLTHROUGH */
> +		case(')'):
> +			/* FALLTHROUGH */
> +		case('.'):
>  			return(ESCAPE_ERROR);
> -		term = numeric = '\'';
> -		break;
> -
> -	/*
> -	 * Special handling for the numbered character escape.
> -	 * XXX Do any other escapes need similar handling?
> -	 */
> -	case ('N'):
> -		if ('\0' == cp[i])
> +		default:
> +			break;
> +		}
> +		if (isspace((unsigned char)numeric))
>  			return(ESCAPE_ERROR);
> -		*end = &cp[++i];
> -		if (isdigit((unsigned char)cp[i-1]))
> -			return(ESCAPE_IGNORE);
> -		while (isdigit((unsigned char)**end))
> -			(*end)++;
> -		if (start)
> -			*start = &cp[i];
> -		if (sz)
> -			*sz = *end - &cp[i];
> -		if ('\0' != **end)
> -			(*end)++;
> -		return(ESCAPE_NUMBERED);
> +		break;
>  
>  	/* 
>  	 * Sizes get a special category of their own.
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
