Date: Wed, 4 Jan 2012 01:54:30 +0100
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: Re: Can of worms: \h"..."
Message-ID: <20120104005430.GF2607@iris.usta.de>
References: <4F02F264.2070407@bsd.lv>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4F02F264.2070407@bsd.lv>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Kristaps,

just a very quick answer - it's getting late already and i can't
study this in due detail right now.

Kristaps Dzonsons wrote on Tue, Jan 03, 2012 at 01:19:48PM +0100:

> On the verge of checking in a quick fix for the \h"..." TODO, it
> occurred to me that we either don't want to accommodate for pod2man
> badness OR something more subtle's at work. \h"..." is specifically
> disallowed by groff(1). So I searched in the groff source. Behold!
>
> In groff's input.cpp, we see several escapes (h, H, N, S, v, x)
> directly condition their enclosing markers on the first character
> (see get_delim_number()) while others do so indirectly. These set
> the end marker on the first character given that it satisfies the
> token::delimiter() method (or whatever is C++'s name for an object
> function).
>
> The delimiter() function (also in input.cpp) allows any character
> but a certain ASCII subset and whitespace. groff(7) mentions the
> apostrophe, but it can do much, much more.
>
> Question is: do we want this behaviour? I'd say we do,

If i understand correctly, i tend to say:
Yes, we should accept the same characters as delimiters as groff.

> but as it's somewhat intrusive, I want some consensus before
> committing. Either way, I do NOT suggest that we outwardly
> document this.

Indeed, documenting the apostrophe as a delimiter is enough,
everything else does not seem particularly sane.

> Note that this also fixes the situation where some non-\N escapes
> were being assigned the NUMERIC identifier, which is only used for
> \N. I also removed the check for \N numbers, as this is done again
> later.

I didn't run it yet, but suspect that part to be wrong.
The point is: Sure, we have found an explicit delimiting character.
But any other letter will terminate the escape sequence as well, see

  http://www.openbsd.org/cgi-bin/cvsweb/src/regress/usr.bin/mandoc/char/N/

Both the mdoc(7) input and groff(1) output are checked in.
See in particular the "mixed content" on line 18 of basic.in,
line 13 of basic.out_ascii.

Whatever you check in, please don't break that test. :-)

> Thoughts?

The longish switch(numeric) could probably be replaced
by something like

  strchr("0123456789+-/*%<>=&:().", numeric)

Yours,
  Ingo


> Index: mandoc.c
> ===================================================================
> RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandoc.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 mandoc.c
> --- mandoc.c	3 Dec 2011 16:08:51 -0000	1.62
> +++ mandoc.c	3 Jan 2012 12:18:51 -0000
> @@ -209,9 +209,15 @@ mandoc_escape(const char **end, const ch
>  		break;
>
>  	/*
> -	 * These escapes are of the form \X'N', where 'X' is the trigger
> -	 * and 'N' resolves to a numerical expression.
> +	 * These escapes accept most characters as enclosure marks
> +	 * (except for those listed in the switch).
> +	 * The enclosed materials are numbers, so run them through the
> +	 * numerical subexpression calculator after we process.
>  	 */
> +	case ('N'):
> +		/* Special case: numerical representation of char. */
> +		gly = ESCAPE_NUMBERED;
> +		/* FALLTHROUGH */
>  	case ('B'):
>  		/* FALLTHROUGH */
>  	case ('h'):
> @@ -221,7 +227,6 @@ mandoc_escape(const char **end, const ch
>  	case ('L'):
>  		/* FALLTHROUGH */
>  	case ('l'):
> -		gly = ESCAPE_NUMBERED;
>  		/* FALLTHROUGH */
>  	case ('S'):
>  		/* FALLTHROUGH */
> @@ -230,32 +235,62 @@ mandoc_escape(const char **end, const ch
>  	case ('w'):
>  		/* FALLTHROUGH */
>  	case ('x'):
> -		if (ESCAPE_ERROR == gly)
> +		if (ESCAPE_NUMBERED != gly)
>  			gly = ESCAPE_IGNORE;
> -		if ('\'' != cp[i++])
> +		numeric = term = cp[i++];
> +		switch (numeric) {
> +		case('0'):
> +			/* FALLTHROUGH */
> +		case('1'):
> +			/* FALLTHROUGH */
> +		case('2'):
> +			/* FALLTHROUGH */
> +		case('3'):
> +			/* FALLTHROUGH */
> +		case('4'):
> +			/* FALLTHROUGH */
> +		case('5'):
> +			/* FALLTHROUGH */
> +		case('6'):
> +			/* FALLTHROUGH */
> +		case('7'):
> +			/* FALLTHROUGH */
> +		case('8'):
> +			/* FALLTHROUGH */
> +		case('9'):
> +			/* FALLTHROUGH */
> +		case('+'):
> +			/* FALLTHROUGH */
> +		case('-'):
> +			/* FALLTHROUGH */
> +		case('/'):
> +			/* FALLTHROUGH */
> +		case('*'):
> +			/* FALLTHROUGH */
> +		case('%'):
> +			/* FALLTHROUGH */
> +		case('<'):
> +			/* FALLTHROUGH */
> +		case('>'):
> +			/* FALLTHROUGH */
> +		case('='):
> +			/* FALLTHROUGH */
> +		case('&'):
> +			/* FALLTHROUGH */
> +		case(':'):
> +			/* FALLTHROUGH */
> +		case('('):
> +			/* FALLTHROUGH */
> +		case(')'):
> +			/* FALLTHROUGH */
> +		case('.'):
>  			return(ESCAPE_ERROR);
> -		term = numeric = '\'';
> -		break;
> -
> -	/*
> -	 * Special handling for the numbered character escape.
> -	 * XXX Do any other escapes need similar handling?
> -	 */
> -	case ('N'):
> -		if ('\0' == cp[i])
> +		default:
> +			break;
> +		}
> +		if (isspace((unsigned char)numeric))
>  			return(ESCAPE_ERROR);
> -		*end = &cp[++i];
> -		if (isdigit((unsigned char)cp[i-1]))
> -			return(ESCAPE_IGNORE);
> -		while (isdigit((unsigned char)**end))
> -			(*end)++;
> -		if (start)
> -			*start = &cp[i];
> -		if (sz)
> -			*sz = *end - &cp[i];
> -		if ('\0' != **end)
> -			(*end)++;
> -		return(ESCAPE_NUMBERED);
> +		break;
>
>  	/*
>  	 * Sizes get a special category of their own.
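
To make the strchr() idea concrete, here is a rough, untested-in-tree
sketch of how the delimiter check might look as a standalone function.
The name delim_ok() is made up for illustration and is not part of
mandoc; the rejected character set is copied from the switch in the
quoted patch. One difference to be aware of: strchr() also matches the
terminating NUL of the set string, so this variant rejects '\0' as
well, which the switch in the patch does not do.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not in mandoc): return 1 if `c' may serve as
 * the opening delimiter of an escape argument, 0 otherwise.
 */
int
delim_ok(char c)
{
	/*
	 * Characters that can start a numerical expression are not
	 * delimiters.  Because strchr() considers the terminating NUL
	 * part of the string, c == '\0' is rejected here, too.
	 */
	if (NULL != strchr("0123456789+-/*%<>=&:().", c))
		return(0);

	/* Whitespace is rejected separately, as in the patch. */
	if (isspace((unsigned char)c))
		return(0);

	return(1);
}
```

With something like this, the body of the case would shrink to
reading the character, calling the helper, and returning
ESCAPE_ERROR when it says no. Whether rejecting '\0' at this point
is the right thing depends on how the remainder of mandoc_escape()
handles premature end of input, so treat that part as an assumption.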