From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Subject: Re: Can of worms: \h"..."
Date: Wed, 4 Jan 2012 01:54:30 +0100
Message-ID: <20120104005430.GF2607@iris.usta.de>
In-Reply-To: <4F02F264.2070407@bsd.lv>
Hi Kristaps,

just a very quick answer - it's getting late already and i can't
study this in due detail right now.

Kristaps Dzonsons wrote on Tue, Jan 03, 2012 at 01:19:48PM +0100:

> On the verge of checking in a quick fix for the \h"..." TODO, it
> occurred to me that we either don't want to accommodate pod2man
> badness OR something more subtle's at work. \h"..." is specifically
> disallowed by groff(1). So I searched in the groff source. Behold!
>
> In groff's input.cpp, we see several escapes (h, H, N, S, v, x)
> directly condition their enclosing markers on the first character
> (see get_delim_number()) while others do so indirectly. These set
> the end marker on the first character given that it satisfies the
> token::delimiter() method (or whatever is C++'s name for an object
> function).
>
> The delimiter() function (also in input.cpp) allows any character
> but a certain ASCII subset and whitespace. groff(7) mentions the
> apostrophe, but it can be much, much more.
>
> Question is: do we want this behaviour? I'd say we do,

If i understand correctly, i tend to say:
Yes, we should accept the same characters as delimiters as groff.

> but as it's somewhat intrusive, I want some consensus before
> committing. Either way, I do NOT suggest that we outwardly
> document this.

Indeed, documenting the apostrophe as a delimiter is enough,
everything else does not seem particularly sane.

> Note that this also fixes the situation where some non-\N escapes
> were being assigned the NUMERIC identifier, which is only used for
> \N. I also removed the check for \N numbers, as this is done again
> later.
I didn't run it yet, but suspect that part to be wrong.
The point is: Sure, we have found an explicit delimiting character.
But any other letter will terminate the escape sequence as well, see
http://www.openbsd.org/cgi-bin/cvsweb/src/regress/usr.bin/mandoc/char/N/
Both the mdoc(7) input and groff(1) output are checked in.
See in particular the "mixed content" on line 18 of basic.in,
line 13 of basic.out_ascii.
Whatever you check in, please don't break that test. :-)
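
For reference, here is a minimal standalone sketch of what the removed
\N branch in the quoted diff does. It is not the mandoc.c code itself:
the name scan_numbered() and the 0 / -1 return values are hypothetical
(the original distinguishes ESCAPE_ERROR from ESCAPE_IGNORE), and cp is
assumed to point at the character right after "\N". It shows that the
digit scan stops at the first non-digit, whatever that character is,
and then steps over it.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical standalone version of the removed \N branch.
 * cp points at the character following "\N".
 * Returns 0 on success, -1 if the argument is missing or if the
 * opening character is itself a digit.
 */
static int
scan_numbered(const char *cp, const char **start, size_t *sz,
	const char **end)
{

	if ('\0' == cp[0] || isdigit((unsigned char)cp[0]))
		return(-1);

	/* Skip the opening delimiter, then collect the digits. */
	*start = cp + 1;
	*end = *start;
	while (isdigit((unsigned char)**end))
		(*end)++;
	*sz = (size_t)(*end - *start);

	/* Any non-digit ends the escape and is consumed, not only "'". */
	if ('\0' != **end)
		(*end)++;
	return(0);
}
```

With the input "'65xyz", the sketch reports the two digits "65" and
leaves *end pointing at "yz", the same as it would after a closing
apostrophe.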
> Thoughts?
The longish switch (numeric) could probably be replaced by something like

	strchr("0123456789+-/*%<>=&:().", numeric)

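
A minimal sketch of that idea, assuming a small helper is acceptable.
The name delim_ok() is hypothetical and does not exist in mandoc.c.
One caveat: strchr(3) also matches the terminating NUL of the set
string, so a NUL argument would count as a member of the set, which
the switch does not do. The sketch tests for NUL explicitly and
rejects it; that choice is an assumption, not something the patch
does.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if c may serve as the opening
 * delimiter of an escape argument, 0 if it must be rejected.
 */
static int
delim_ok(char c)
{

	/* strchr() would match the NUL terminator, so test it first. */
	if ('\0' == c)
		return(0);

	/* Characters that may begin a numerical expression. */
	if (NULL != strchr("0123456789+-/*%<>=&:().", c))
		return(0);

	/* Whitespace is rejected as well, as in the patch. */
	return(!isspace((unsigned char)c));
}
```

The apostrophe and the double quote both pass this check, while
digits, operators and blanks do not.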
Yours,
Ingo
> Index: mandoc.c
> ===================================================================
> RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 mandoc.c
> --- mandoc.c 3 Dec 2011 16:08:51 -0000 1.62
> +++ mandoc.c 3 Jan 2012 12:18:51 -0000
> @@ -209,9 +209,15 @@ mandoc_escape(const char **end, const ch
> break;
>
> /*
> - * These escapes are of the form \X'N', where 'X' is the trigger
> - * and 'N' resolves to a numerical expression.
> + * These escapes accept most characters as enclosure marks
> + * (except for those listed in the switch).
> + * The enclosed materials are numbers, so run them through the
> + * numerical subexpression calculator after we process.
> */
> + case ('N'):
> + /* Special case: numerical representation of char. */
> + gly = ESCAPE_NUMBERED;
> + /* FALLTHROUGH */
> case ('B'):
> /* FALLTHROUGH */
> case ('h'):
> @@ -221,7 +227,6 @@ mandoc_escape(const char **end, const ch
> case ('L'):
> /* FALLTHROUGH */
> case ('l'):
> - gly = ESCAPE_NUMBERED;
> /* FALLTHROUGH */
> case ('S'):
> /* FALLTHROUGH */
> @@ -230,32 +235,62 @@ mandoc_escape(const char **end, const ch
> case ('w'):
> /* FALLTHROUGH */
> case ('x'):
> - if (ESCAPE_ERROR == gly)
> + if (ESCAPE_NUMBERED != gly)
> gly = ESCAPE_IGNORE;
> - if ('\'' != cp[i++])
> + numeric = term = cp[i++];
> + switch (numeric) {
> + case('0'):
> + /* FALLTHROUGH */
> + case('1'):
> + /* FALLTHROUGH */
> + case('2'):
> + /* FALLTHROUGH */
> + case('3'):
> + /* FALLTHROUGH */
> + case('4'):
> + /* FALLTHROUGH */
> + case('5'):
> + /* FALLTHROUGH */
> + case('6'):
> + /* FALLTHROUGH */
> + case('7'):
> + /* FALLTHROUGH */
> + case('8'):
> + /* FALLTHROUGH */
> + case('9'):
> + /* FALLTHROUGH */
> + case('+'):
> + /* FALLTHROUGH */
> + case('-'):
> + /* FALLTHROUGH */
> + case('/'):
> + /* FALLTHROUGH */
> + case('*'):
> + /* FALLTHROUGH */
> + case('%'):
> + /* FALLTHROUGH */
> + case('<'):
> + /* FALLTHROUGH */
> + case('>'):
> + /* FALLTHROUGH */
> + case('='):
> + /* FALLTHROUGH */
> + case('&'):
> + /* FALLTHROUGH */
> + case(':'):
> + /* FALLTHROUGH */
> + case('('):
> + /* FALLTHROUGH */
> + case(')'):
> + /* FALLTHROUGH */
> + case('.'):
> return(ESCAPE_ERROR);
> - term = numeric = '\'';
> - break;
> -
> - /*
> - * Special handling for the numbered character escape.
> - * XXX Do any other escapes need similar handling?
> - */
> - case ('N'):
> - if ('\0' == cp[i])
> + default:
> + break;
> + }
> + if (isspace((unsigned char)numeric))
> return(ESCAPE_ERROR);
> - *end = &cp[++i];
> - if (isdigit((unsigned char)cp[i-1]))
> - return(ESCAPE_IGNORE);
> - while (isdigit((unsigned char)**end))
> - (*end)++;
> - if (start)
> - *start = &cp[i];
> - if (sz)
> - *sz = *end - &cp[i];
> - if ('\0' != **end)
> - (*end)++;
> - return(ESCAPE_NUMBERED);
> + break;
>
> /*
> * Sizes get a special category of their own.