Date: Tue, 29 May 2012 00:58:15 +0200
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: make mandoc_escape more readable
Message-ID: <20120528225815.GC26820@iris.usta.de>

Kill c, rlim, cp, and rstart - 'cause which is what!?
Also shortens the file by three lines, and replaces
ten more lines of code by comments.

Enjoy,
  Ingo

----- Forwarded message from Ingo Schwarze -----

From: Ingo Schwarze
Date: Mon, 28 May 2012 16:45:34 -0600 (MDT)
To: source-changes@cvs.openbsd.org

CVSROOT:	/cvs
Module name:	src
Changes by:	schwarze@cvs.openbsd.org	2012/05/28 16:45:34

Modified files:
	usr.bin/mandoc : mandoc.c
	regress/usr.bin/mandoc/roff/esc: Makefile
Added files:
	regress/usr.bin/mandoc/roff/esc: c.in c.out_ascii f.in f.out_ascii
	                                 ignore.in ignore.out_ascii
	                                 multi.in multi.out_ascii one.in
	                                 one.out_ascii two.in two.out_ascii

Log message:
While i already got my fingers dirty on mandoc_escape(),
profit of the occasion to pull out some spaghetti, that is,
three confusing variables and fourteen pointless assignments
among them; instead, always operate on the official pointers
**start, **end, and *sz, each of which conveys an obvious meaning.

No functional change intended, and the new tests confirm that
everything still (err...) "works", as far as that word can be
applied to the kind of roff(7) mock-up code i'm polishing here.

----- End forwarded message -----

Index: usr.bin/mandoc/mandoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc.c,v
retrieving revision 1.33
diff -u -p -r1.33 mandoc.c
--- usr.bin/mandoc/mandoc.c	28 May 2012 17:08:48 -0000	1.33
+++ usr.bin/mandoc/mandoc.c	28 May 2012 22:23:32 -0000
@@ -38,20 +38,33 @@ static char *time2a(time_t);
 enum mandoc_esc
 mandoc_escape(const char **end, const char **start, int *sz)
 {
-	char		 c, term;
-	int		 i, rlim;
-	const char	*cp, *rstart;
+	const char	*local_start;
+	int		 local_sz;
+	char		 term;
 	enum mandoc_esc	 gly;
 
-	cp = *end;
-	rstart = cp;
-	if (start)
-		*start = rstart;
-	i = rlim = 0;
+	/*
+	 * When the caller doesn't provide return storage,
+	 * use local storage.
+	 */
+
+	if (NULL == start)
+		start = &local_start;
+	if (NULL == sz)
+		sz = &local_sz;
+
+	/*
+	 * Beyond the backslash, at least one input character
+	 * is part of the escape sequence.  With one exception
+	 * (see below), that character won't be returned.
+	 */
+
 	gly = ESCAPE_ERROR;
+	*start = ++*end;
+	*sz = 0;
 	term = '\0';
 
-	switch ((c = cp[i++])) {
+	switch ((*start)[-1]) {
 	/*
 	 * First the glyphs.  There are several different forms of
 	 * these, but each eventually returns a substring of the glyph
@@ -59,7 +72,7 @@ mandoc_escape(const ch
 	 */
 	case ('('):
 		gly = ESCAPE_SPECIAL;
-		rlim = 2;
+		*sz = 2;
 		break;
 	case ('['):
 		gly = ESCAPE_SPECIAL;
@@ -69,14 +82,15 @@ mandoc_escape(const ch
 		 * Unicode codepoint.  Here, however, only check whether
 		 * it's not a zero-width escape.
 		 */
-		if ('u' == cp[i] && ']' != cp[i + 1])
+		if ('u' == (*start)[0] && ']' != (*start)[1])
 			gly = ESCAPE_UNICODE;
 		term = ']';
 		break;
 	case ('C'):
-		if ('\'' != cp[i])
+		if ('\'' != **start)
 			return(ESCAPE_ERROR);
 		gly = ESCAPE_SPECIAL;
+		*start = ++*end;
 		term = '\'';
 		break;
 
@@ -87,7 +101,6 @@ mandoc_escape(const ch
 	 * let us just skip the next character.
 	 */
 	case ('z'):
-		(*end)++;
 		return(ESCAPE_SKIPCHAR);
 
 	/*
@@ -114,21 +127,17 @@ mandoc_escape(const ch
 	case ('f'):
 		if (ESCAPE_ERROR == gly)
 			gly = ESCAPE_FONT;
-
-		rstart= &cp[i];
-		if (start)
-			*start = rstart;
-
-		switch (cp[i++]) {
+		switch (**start) {
 		case ('('):
-			rlim = 2;
+			*start = ++*end;
+			*sz = 2;
 			break;
 		case ('['):
+			*start = ++*end;
 			term = ']';
 			break;
 		default:
-			rlim = 1;
-			i--;
+			*sz = 1;
 			break;
 		}
 		break;
@@ -150,9 +159,10 @@ mandoc_escape(const ch
 	case ('X'):
 		/* FALLTHROUGH */
 	case ('Z'):
-		if ('\'' != cp[i++])
+		if ('\'' != **start)
 			return(ESCAPE_ERROR);
 		gly = ESCAPE_IGNORE;
+		*start = ++*end;
 		term = '\'';
 		break;
 
@@ -178,10 +188,11 @@ mandoc_escape(const ch
 	case ('w'):
 		/* FALLTHROUGH */
 	case ('x'):
+		if ('\'' != **start)
+			return(ESCAPE_ERROR);
 		if (ESCAPE_ERROR == gly)
 			gly = ESCAPE_IGNORE;
-		if ('\'' != cp[i++])
-			return(ESCAPE_ERROR);
+		*start = ++*end;
 		term = '\'';
 		break;
 
@@ -190,17 +201,17 @@ mandoc_escape(const ch
 	 * XXX Do any other escapes need similar handling?
 	 */
 	case ('N'):
-		if ('\0' == cp[i])
+		if ('\0' == **start)
 			return(ESCAPE_ERROR);
-		*end = &cp[++i];
-		if (isdigit((unsigned char)cp[i-1]))
+		(*end)++;
+		if (isdigit((unsigned char)**start)) {
+			*sz = 1;
 			return(ESCAPE_IGNORE);
+		}
+		(*start)++;
 		while (isdigit((unsigned char)**end))
 			(*end)++;
-		if (start)
-			*start = &cp[i];
-		if (sz)
-			*sz = *end - &cp[i];
+		*sz = *end - *start;
 		if ('\0' != **end)
 			(*end)++;
 		return(ESCAPE_NUMBERED);
@@ -211,54 +222,43 @@ mandoc_escape(const ch
 	case ('s'):
 		gly = ESCAPE_IGNORE;
 
-		rstart = &cp[i];
-		if (start)
-			*start = rstart;
-
 		/* See +/- counts as a sign. */
-		c = cp[i];
-		if ('+' == c || '-' == c || ASCII_HYPH == c)
-			++i;
+		if ('+' == **end || '-' == **end || ASCII_HYPH == **end)
+			(*end)++;
 
-		switch (cp[i++]) {
+		switch (**end) {
 		case ('('):
-			rlim = 2;
+			*start = ++*end;
+			*sz = 2;
 			break;
 		case ('['):
+			*start = ++*end;
 			term = ']';
 			break;
 		case ('\''):
+			*start = ++*end;
 			term = '\'';
 			break;
 		default:
-			rlim = 1;
-			i--;
+			*sz = 1;
 			break;
 		}
 
-		/* See +/- counts as a sign. */
-		c = cp[i];
-		if ('+' == c || '-' == c || ASCII_HYPH == c)
-			++i;
-
 		break;
 
 	/*
 	 * Anything else is assumed to be a glyph.
+	 * In this case, pass back the character after the backslash.
 	 */
 	default:
 		gly = ESCAPE_SPECIAL;
-		rlim = 1;
-		i--;
+		*start = --*end;
+		*sz = 1;
 		break;
 	}
 
 	assert(ESCAPE_ERROR != gly);
 
-	*end = rstart = &cp[i];
-	if (start)
-		*start = rstart;
-
 	/*
 	 * Read up to the terminating character,
 	 * paying attention to nested escapes.
@@ -280,15 +280,13 @@ mandoc_escape(const ch
 				break;
 			}
 		}
-		rlim = (*end)++ - rstart;
+		*sz = (*end)++ - *start;
 	} else {
-		assert(rlim > 0);
-		if ((size_t)rlim > strlen(rstart))
+		assert(*sz > 0);
+		if ((size_t)*sz > strlen(*start))
 			return(ESCAPE_ERROR);
-		*end += rlim;
+		*end += *sz;
 	}
-	if (sz)
-		*sz = rlim;
 
 	/* Run post-processors. */
 
@@ -298,12 +296,13 @@ mandoc_escape(const ch
 		 * Pretend that the constant-width font modes are the
 		 * same as the regular font modes.
 		 */
-		if (2 == rlim && 'C' == *rstart)
-			rstart++;
-		else if (1 != rlim)
+		if (2 == *sz && 'C' == **start) {
+			(*start)++;
+			(*sz)--;
+		} else if (1 != *sz)
 			break;
 
-		switch (*rstart) {
+		switch (**start) {
 		case ('3'):
 			/* FALLTHROUGH */
 		case ('B'):
@@ -325,9 +324,7 @@ mandoc_escape(const ch
 		}
 		break;
 	case (ESCAPE_SPECIAL):
-		if (1 != rlim)
-			break;
-		if ('c' == *rstart)
+		if (1 == *sz && 'c' == **start)
 			gly = ESCAPE_NOSPACE;
 		break;
 	default:
Index: regress/usr.bin/mandoc/roff/esc/Makefile
===================================================================
RCS file: /cvs/src/regress/usr.bin/mandoc/roff/esc/Makefile,v
retrieving revision 1.2
diff -u -p -r1.2 Makefile
--- regress/usr.bin/mandoc/roff/esc/Makefile	28 May 2012 17:08:48 -0000	1.2
+++ regress/usr.bin/mandoc/roff/esc/Makefile	28 May 2012 22:23:32 -0000
@@ -1,6 +1,6 @@
 # $OpenBSD: Makefile,v 1.2 2012/05/28 17:08:48 schwarze Exp $
 
-REGRESS_TARGETS=h z
+REGRESS_TARGETS=one two multi c f h z ignore
 
 # Postprocessing to remove "character backspace" sequences
 # unless they are foolowed by the same character again.
Index: regress/usr.bin/mandoc/roff/esc/c.in
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/c.in
diff -N regress/usr.bin/mandoc/roff/esc/c.in
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/c.in	28 May 2012 22:23:32 -0000
@@ -0,0 +1,13 @@
+.Dd May 28, 2012
+.Dt ESC-C 1
+.Os OpenBSD
+.Sh NAME
+.Nm esc-c
+.Nd the roff escape c sequence: remove trailing space
+.Sh DESCRIPTION
+No space between
+.Dq one
+and
+.Dq word :
+one\c
+word
Index: regress/usr.bin/mandoc/roff/esc/c.out_ascii
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/c.out_ascii
diff -N regress/usr.bin/mandoc/roff/esc/c.out_ascii
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/c.out_ascii	28 May 2012 22:23:32 -0000
@@ -0,0 +1,9 @@
+ESC-C(1) OpenBSD Reference Manual ESC-C(1)
+
+NNAAMMEE
+     eesscc--cc - the roff escape c sequence: remove trailing space
+
+DDEESSCCRRIIPPTTIIOONN
+     No space between ``one'' and ``word'': oneword
+
+OpenBSD May 28, 2012 OpenBSD
Index: regress/usr.bin/mandoc/roff/esc/f.in
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/f.in
diff -N regress/usr.bin/mandoc/roff/esc/f.in
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/f.in	28 May 2012 22:23:32 -0000
@@ -0,0 +1,12 @@
+.Dd May 28, 2012
+.Dt ESC-F 1
+.Os OpenBSD
+.Sh NAME
+.Nm esc-f
+.Nd the roff escape f sequence: font changes
+.Sh DESCRIPTION
+numbers: \f3bold\f2italic\f1roman
+.br
+letters: \fBbold\fIitalic\fPback\fRroman
+.br
+multiletter: \f[B]bold\f[I]italic\f[P]back\f[R]roman
Index: regress/usr.bin/mandoc/roff/esc/f.out_ascii
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/f.out_ascii
diff -N regress/usr.bin/mandoc/roff/esc/f.out_ascii
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/f.out_ascii	28 May 2012 22:23:32 -0000
@@ -0,0 +1,11 @@
+ESC-F(1) OpenBSD Reference Manual ESC-F(1)
+
+NNAAMMEE
+     eesscc--ff - the roff escape f sequence: font changes
+
+DDEESSCCRRIIPPTTIIOONN
+     numbers: bboolldd_i_t_a_l_i_croman
+     letters: bboolldd_i_t_a_l_i_cbbaacckkroman
+     multiletter: bboolldd_i_t_a_l_i_cbbaacckkroman
+
+OpenBSD May 28, 2012 OpenBSD
Index: regress/usr.bin/mandoc/roff/esc/ignore.in
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/ignore.in
diff -N regress/usr.bin/mandoc/roff/esc/ignore.in
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/ignore.in	28 May 2012 22:23:32 -0000
@@ -0,0 +1,12 @@
+.Dd May 28, 2012
+.Dt ESC-IGNORE 1
+.Os OpenBSD
+.Sh NAME
+.Nm esc-ignore
+.Nd ignored roff escape sequences
+.Sh DESCRIPTION
+multiform: a\kxb\k(xyc\k[xyz]d
+.br
+quoted: a\R'myreg 0'b\R'myreg \A'y'0'c
+.br
+sizes: a\s0b\s(12c\s[123]d\s'123'e\s'1\w'xy'2'f
Index: regress/usr.bin/mandoc/roff/esc/ignore.out_ascii
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/ignore.out_ascii
diff -N regress/usr.bin/mandoc/roff/esc/ignore.out_ascii
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/ignore.out_ascii	28 May 2012 22:23:32 -0000
@@ -0,0 +1,11 @@
+ESC-IGNORE(1) OpenBSD Reference Manual ESC-IGNORE(1)
+
+NNAAMMEE
+     eesscc--iiggnnoorree - ignored roff escape sequences
+
+DDEESSCCRRIIPPTTIIOONN
+     multiform: abcd
+     quoted: abc
+     sizes: abcdef
+
+OpenBSD May 28, 2012 OpenBSD
Index: regress/usr.bin/mandoc/roff/esc/multi.in
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/multi.in
diff -N regress/usr.bin/mandoc/roff/esc/multi.in
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/multi.in	28 May 2012 22:23:32 -0000
@@ -0,0 +1,10 @@
+.Dd May 28, 2012
+.Dt ESC-MULTI 1
+.Os OpenBSD
+.Sh NAME
+.Nm esc-multi
+.Nd roff multi-character escape sequences
+.Sh DESCRIPTION
+\[tno] \[t+-] \[tmu] \[tdi] \[12] \[14] \[34]
+.br
+\C'tno' \C't+-' \C'tmu' \C'tdi' \C'12' \C'14' \C'34'
Index: regress/usr.bin/mandoc/roff/esc/multi.out_ascii
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/multi.out_ascii
diff -N regress/usr.bin/mandoc/roff/esc/multi.out_ascii
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/multi.out_ascii	28 May 2012 22:23:32 -0000
@@ -0,0 +1,10 @@
+ESC-MULTI(1) OpenBSD Reference Manual ESC-MULTI(1)
+
+NNAAMMEE
+     eesscc--mmuullttii - roff multi-character escape sequences
+
+DDEESSCCRRIIPPTTIIOONN
+     ~ +- x -:- 1/2 1/4 3/4
+     ~ +- x -:- 1/2 1/4 3/4
+
+OpenBSD May 28, 2012 OpenBSD
Index: regress/usr.bin/mandoc/roff/esc/one.in
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/one.in
diff -N regress/usr.bin/mandoc/roff/esc/one.in
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/one.in	28 May 2012 22:23:32 -0000
@@ -0,0 +1,14 @@
+.Dd May 28, 2012
+.Dt ESC-ONE 1
+.Os OpenBSD
+.Sh NAME
+.Nm esc-one
+.Nd roff one-character escape sequences
+.Sh DESCRIPTION
+backslash: >\e<
+.br
+minus: >\-<
+.br
+acute: >\'<
+.br
+grave: >\`<
Index: regress/usr.bin/mandoc/roff/esc/one.out_ascii
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/one.out_ascii
diff -N regress/usr.bin/mandoc/roff/esc/one.out_ascii
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/one.out_ascii	28 May 2012 22:23:32 -0000
@@ -0,0 +1,12 @@
+ESC-ONE(1) OpenBSD Reference Manual ESC-ONE(1)
+
+NNAAMMEE
+     eesscc--oonnee - roff one-character escape sequences
+
+DDEESSCCRRIIPPTTIIOONN
+     backslash: >\<
+     minus: >-<
+     acute: >'<
+     grave: >`<
+
+OpenBSD May 28, 2012 OpenBSD
Index: regress/usr.bin/mandoc/roff/esc/two.in
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/two.in
diff -N regress/usr.bin/mandoc/roff/esc/two.in
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/two.in	28 May 2012 22:23:32 -0000
@@ -0,0 +1,75 @@
+.Dd May 28, 2012
+.Dt ESC-TWO 1
+.Os OpenBSD
+.Sh NAME
+.Nm esc-two
+.Nd roff two-character escape sequences
+.Sh DESCRIPTION
+lines: \(ba \(br \(ul \(bb \(sl \(rs
+.\" groff doesn't know \(rl
+.br
+markers: \(bu \(lz \(sq \(ps \(sc \(lh \(rh \(at \(sh \(CR
+.\" the circle \(ci differs
+.\" the daggers \(dd and \(dg use backspace
+.\" groff doesn't know \(OK
+.br
+legal: \(co \(rg \(tm
+.br
+punctuation: \(em \(en \(hy
+.\" the inverted punctuation \(r! and \(r? use backspace
+.br
+quotes: \(Bq \(bq \(oq \(cq \(aq \(dq \(Fo \(Fc \(fo \(fc
+.\" the double quotes \(lq and \(rq differ
+.br
+brackets: \(lB \(rB \(lC \(rC \(la \(ra \(bv \(lt \(lk \(rt \(rk \(rb
+.\" the left bottom \(lb differs
+.br
+arrows: \(<- \(-> \(lA \(rA \(hA
+.\" the left-right arrow \(<> differs
+.\" groff doesn't know \(va and \(vA
+.\" the vertical arrows \(da, \(ua, \(uA, \(dA use backspace
+.br
+logical: \(AN \(OR \(no \(te \(st \(tf \(3d \(or
+.\" the universal quantifier \(fa uses backspace
+.br
+mathematical: \(pl \(mi \(-+ \(+- \(pc \(mu \(di \(f/ \(**
+\(<= \(>= \(<< \(>> \(eq \(!= \(== \(ne \(=~ \(ap \(~~ \(~= \(pt
+\(es \(mo \(sb \(sp \(ca \(cu
+\(sr \(lc \(rc \(lf \(rf \(if \(Ah \(Im \(Re \(pd
+.\" groff doesn't know \(-~, \(nb, \(nc, \(-h
+.\" these differ: \(nm \(ib \(ip \(/_ \(pp \(gr
+.\" these use backspace: \(c* \(c+ \(is
+.br
+ligatures: \(ff \(fi \(fl \(Fi \(Fl \(AE \(ae \(OE \(oe \(IJ \(ij
+.\" the German eszett \(ss differs
+.br
+accents: \(a" \(a^ \(aa \(ga \(ac \(ad \(ah \(ao \(a~ \(ho \(ha \(ti
+.\" the macron \(a- differs
+.\" groff doesn't know \(a.
+.\" the breve \(ab uses backspace
+.br
+.\" accented and special letters all use backspace:
+.\" \('A \('E \('I \('O \('U \('a \('e \('i \('o \('u
+.\" \(`A \(`E \(`I \(`O \(`U \(`a \(`e \(`i \(`o \(`u
+.\" \(~A \(~N \(~O \(~a \(~n \(~o
+.\" \(:A \(:E \(:I \(:O \(:U \(:a \(:e \(:i \(:o \(:u \(:y
+.\" \(^A \(^E \(^I \(^O \(^U \(^a \(^e \(^i \(^o \(^u
+.\" \(,C \(,c \(/L \(/l \(/O \(/o \(oA \(oa
+.\" \(-D \(Sd \(TP \(Tp
+.\" except:
+special letter: \(.i
+.\" groff doesn't know \(.j
+.br
+currency: \(Do \(Eu \(eu \(Fo
+.\" these use backspace: \(ct \(Ye \(Po \(Cs
+.br
+units: \(de \(fm
+.\" groff doesn't know \(%O, and \(sd and \(mc differ
+.br
+greek letters: \(*A \(*B \(*E \(*Z \(*Y \(*I \(*K \(*L
+\(*M \(*N \(*O \(*P \(*R \(*T \(*U \(*X
+\(*a \(*b \(*g \(*d \(*e \(*y \(*i \(*k
+\(*n \(*o \(*r \(*u \(*x \(*w \(+e \(ts
+.\" these differ: \(*G \(*S \(*F
+.\" these use backspace: \(*D \(*H \(*C \(*Q \(*W
+.\" \(*z \(*h \(*l \(*m \(*c \(*p \(*s \(*t \(*f \(*q \(+h \(+f \+p
Index: regress/usr.bin/mandoc/roff/esc/two.out_ascii
===================================================================
RCS file: regress/usr.bin/mandoc/roff/esc/two.out_ascii
diff -N regress/usr.bin/mandoc/roff/esc/two.out_ascii
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/usr.bin/mandoc/roff/esc/two.out_ascii	28 May 2012 22:23:32 -0000
@@ -0,0 +1,25 @@
+ESC-TWO(1) OpenBSD Reference Manual ESC-TWO(1)
+
+NNAAMMEE
+     eesscc--ttwwoo - roff two-character escape sequences
+
+DDEESSCCRRIIPPTTIIOONN
+     lines: | | _ | / \
+     markers: o <> [] 9| S <= => @ # _|
+     legal: (C) (R) tm
+     punctuation: -- - -
+     quotes: ,, , ` ' ' " << >> < >
+     brackets: [ ] { } < > | ,- { -. } -'
+     arrows: <- -> <= => <=>
+     logical: ^ v ~ 3 -) .:. .:. |
+     mathematical: + - -+ +- . x -:- / * <= >= << >> = != == !== =~ ~ ~~ ~= oc
+     {} E (= =) (^) U \/ |~ ~| |_ _| oo N I R a
+     ligatures: ff fi fl ffi ffl AE ae OE oe IJ ij
+     accents: " ^ ' ` , " v o ~ , ^ ~
+     special letter: i
+     currency: $ EUR EUR <<
+     units: o '
+     greek letters: A B E Z H I K /\ M N O TT P T Y X a B y d e n i k v o p u
+     x w e s
+
+OpenBSD May 28, 2012 OpenBSD
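
For readers who want to try the calling convention outside of mandoc,
here is a minimal standalone sketch of the two idioms the patch relies
on: pointing NULL out-parameters at local storage, and then working
directly on *start, *end and *sz. It is not code from mandoc.c.
scan_name() and its 0/1 return convention are made up for illustration,
and it only handles the x, (xy and [xyz] argument forms, without nested
escapes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: parse a roff-style name
 * argument in one of the forms "x", "(xy" or "[xyz]".
 * On entry, *end points at the first character of the argument.
 * On success, *start points at the name, *sz holds its length,
 * *end points just past the whole argument, and 1 is returned.
 * Returns 0 if the argument is truncated or unterminated.
 */
static int
scan_name(const char **end, const char **start, int *sz)
{
	const char	*local_start;
	int		 local_sz;
	char		 term;

	assert(NULL != end && NULL != *end);

	/* Callers may pass NULL for results they do not need. */
	if (NULL == start)
		start = &local_start;
	if (NULL == sz)
		sz = &local_sz;

	*start = *end;
	*sz = 0;
	term = '\0';

	switch (**end) {
	case '(':
		/* Two-character name follows the parenthesis. */
		*start = ++*end;
		*sz = 2;
		break;
	case '[':
		/* Arbitrary-length name up to the closing bracket. */
		*start = ++*end;
		term = ']';
		break;
	default:
		/* Single-character name. */
		*sz = 1;
		break;
	}

	if ('\0' != term) {
		/* Terminated form: scan up to the closing character. */
		while ('\0' != **end && term != **end)
			(*end)++;
		if ('\0' == **end)
			return 0;
		*sz = (int)((*end)++ - *start);
	} else {
		/* Fixed-length form: make sure enough input remains. */
		if ((size_t)*sz > strlen(*start))
			return 0;
		*end += *sz;
	}
	return 1;
}
```

A caller that only wants to skip over the argument can pass NULL for
both start and sz; a caller that wants the name passes real storage and
reads *start and *sz afterwards. Either way the function body never has
to test the out-pointers again.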