Hi,

Both groff and Heirloom troff transform the following characters in
typeset output such as -Tpdf:

transforms ` into U+2018 (left single quote)
transforms ' into U+2019 (right single quote)
transforms ~ into U+02DC (small tilde)

(Plan 9 and mandoc only do the first two.)

These, particularly the second one, are desirable when used in prose.
But when the original ASCII character is meant (such as when listing
command-line input, or describing how to escape characters), it must
be escaped to provide correct output in all formats.

Index: apropos.1
===================================================================
RCS file: /cvs/mdocml/apropos.1,v
retrieving revision 1.37
diff -u -p -u -p -r1.37 apropos.1
--- apropos.1 16 Feb 2015 16:23:54 -0000 1.37
+++ apropos.1 21 Mar 2015 09:11:56 -0000
@@ -210,7 +210,7 @@ This has syntax
 .Sm off
 .Oo
 .Op Ar key Op , Ar key ...
-.Pq Cm = | ~
+.Pq Cm = | \(ti
 .Oc
 .Ar val ,
 .Sm on
@@ -227,7 +227,7 @@ for a list of available keys.
 Operator
 .Cm =
 evaluates a substring, while
-.Cm ~
+.Cm \(ti
 evaluates a regular expression.
 .It Fl i Ar term
 If
@@ -398,7 +398,7 @@ as well:
 .Pp
 Search in names and descriptions using a regular expression:
 .Pp
-.Dl $ apropos '~set.?[ug]id'
+.Dl $ apropos \(aq\(tiset.?[ug]id\(aq
 .Pp
 Search for manuals in the library section mentioning both the
 .Qq optind
@@ -413,15 +413,15 @@ Do exactly the same as calling
 with the argument
 .Qq ssh :
 .Pp
-.Dl $ apropos \-\- \-i 'Nm~[[:<:]]ssh[[:>:]]'
+.Dl $ apropos \-\- \-i \(aqNm\(ti[[:<:]]ssh[[:>:]]\(aq
 .Pp
 The following two invocations are equivalent:
 .Pp
 .D1 Li $ apropos -S Ar arch Li -s Ar section expression
 .Bd -ragged -offset indent
 .Li $ apropos \e( Ar expression Li \e)
-.Li -a arch~^( Ns Ar arch Ns Li |any)$
-.Li -a sec~^ Ns Ar section Ns Li $
+.Li -a arch\(ti^( Ns Ar arch Ns Li |any)$
+.Li -a sec\(ti^ Ns Ar section Ns Li $
 .Ed
 .Sh SEE ALSO
 .Xr man 1 ,
Index: eqn.7
===================================================================
RCS file: /cvs/mdocml/eqn.7,v
retrieving revision 1.34
diff -u -p -u -p -r1.34 eqn.7
--- eqn.7 9 Mar 2015 20:17:23 -0000 1.34
+++ eqn.7 21 Mar 2015 09:11:56 -0000
@@ -146,7 +146,7 @@ is used as the delimiter for the value
 .Ar val .
 This allows for arbitrary enclosure of terms (not just quotes), such as
 .Pp
-.D1 Cm define Ar foo 'bar baz'
+.D1 Cm define Ar foo \(aqbar baz\(aq
 .D1 Cm define Ar foo cbar bazc
 .Pp
 It is an error to have an empty
@@ -166,8 +166,8 @@ created.
 Definitions can create arbitrary strings, for example, the following is
 a legal construction.
 .Bd -literal -offset indent
-define foo 'define'
-foo bar 'baz'
+define foo \(aqdefine\(aq
+foo bar \(aqbaz\(aq
 .Ed
 .Pp
 Self-referencing definitions will raise an error.
Index: mandoc.1
===================================================================
RCS file: /cvs/mdocml/mandoc.1,v
retrieving revision 1.155
diff -u -p -u -p -r1.155 mandoc.1
--- mandoc.1 23 Feb 2015 13:31:03 -0000 1.155
+++ mandoc.1 21 Mar 2015 09:11:56 -0000
@@ -570,7 +570,7 @@ as the style-sheet:
 .Pp
 To check over a large set of manuals:
 .Pp
-.Dl $ mandoc \-Tlint `find /usr/src -name \e*\e.[1-9]`
+.Dl $ mandoc \-Tlint \`find /usr/src -name \e*\e.[1-9]\`
 .Pp
 To produce a series of PostScript manuals for A4 paper:
 .Pp
Index: mandoc_char.7
===================================================================
RCS file: /cvs/mdocml/mandoc_char.7,v
retrieving revision 1.59
diff -u -p -u -p -r1.59 mandoc_char.7
--- mandoc_char.7 20 Jan 2015 19:39:34 -0000 1.59
+++ mandoc_char.7 21 Mar 2015 09:11:56 -0000
@@ -196,7 +196,7 @@ Spacing:
 .Bl -column "Input" "Description" -offset indent -compact
 .It Em Input Ta Em Description
 .It Sq \e\ \& Ta unpaddable non-breaking space
-.It \e~ Ta paddable non-breaking space
+.It \e\(ti Ta paddable non-breaking space
 .It \e0 Ta unpaddable, breaking digit-width space
 .It \e| Ta one-sixth \e(em narrow space, zero width in nroff mode
 .It \e^ Ta one-twelfth \e(em half-narrow space, zero width in nroff
@@ -371,9 +371,9 @@ Mathematical:
 .It \e(ne Ta \(ne Ta not equivalent
 .It \e(ap Ta \(ap Ta tilde operator
 .It \e(|= Ta \(|= Ta asymptotically equal
-.It \e(=~ Ta \(=~ Ta approximately equal
-.It \e(~~ Ta \(~~ Ta almost equal
-.It \e(~= Ta \(~= Ta almost equal
+.It \e(=\(ti Ta \(=~ Ta approximately equal
+.It \e(\(ti\(ti Ta \(~~ Ta almost equal
+.It \e(\(ti= Ta \(~= Ta almost equal
 .It \e(pt Ta \(pt Ta proportionate
 .It \e(es Ta \(es Ta empty set
 .It \e(mo Ta \(mo Ta element
@@ -436,15 +436,15 @@ Accents:
 .It \e(a. Ta \(a. Ta dotted
 .It \e(a^ Ta \(a^ Ta circumflex
 .It \e(aa Ta \(aa Ta acute
-.It \e' Ta \' Ta acute
+.It \e\(aq Ta \' Ta acute
 .It \e(ga Ta \(ga Ta grave
-.It \e` Ta \` Ta grave
+.It \e\` Ta \` Ta grave
 .It \e(ab Ta \(ab Ta breve
 .It \e(ac Ta \(ac Ta cedilla
 .It \e(ad Ta \(ad Ta dieresis
 .It \e(ah Ta \(ah Ta caron
 .It \e(ao Ta \(ao Ta ring
-.It \e(a~ Ta \(a~ Ta tilde
+.It \e(a\(ti Ta \(a~ Ta tilde
 .It \e(ho Ta \(ho Ta ogonek
 .It \e(ha Ta \(ha Ta hat (text)
 .It \e(ti Ta \(ti Ta tilde (text)
@@ -453,32 +453,32 @@ Accents:
 Accented letters:
 .Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
-.It \e('A Ta \('A Ta acute A
-.It \e('E Ta \('E Ta acute E
-.It \e('I Ta \('I Ta acute I
-.It \e('O Ta \('O Ta acute O
-.It \e('U Ta \('U Ta acute U
-.It \e('a Ta \('a Ta acute a
-.It \e('e Ta \('e Ta acute e
-.It \e('i Ta \('i Ta acute i
-.It \e('o Ta \('o Ta acute o
-.It \e('u Ta \('u Ta acute u
-.It \e(`A Ta \(`A Ta grave A
-.It \e(`E Ta \(`E Ta grave E
-.It \e(`I Ta \(`I Ta grave I
-.It \e(`O Ta \(`O Ta grave O
-.It \e(`U Ta \(`U Ta grave U
-.It \e(`a Ta \(`a Ta grave a
-.It \e(`e Ta \(`e Ta grave e
-.It \e(`i Ta \(`i Ta grave i
-.It \e(`o Ta \(`i Ta grave o
-.It \e(`u Ta \(`u Ta grave u
-.It \e(~A Ta \(~A Ta tilde A
-.It \e(~N Ta \(~N Ta tilde N
-.It \e(~O Ta \(~O Ta tilde O
-.It \e(~a Ta \(~a Ta tilde a
-.It \e(~n Ta \(~n Ta tilde n
-.It \e(~o Ta \(~o Ta tilde o
+.It \e(\(aqA Ta \('A Ta acute A
+.It \e(\(aqE Ta \('E Ta acute E
+.It \e(\(aqI Ta \('I Ta acute I
+.It \e(\(aqO Ta \('O Ta acute O
+.It \e(\(aqU Ta \('U Ta acute U
+.It \e(\(aqa Ta \('a Ta acute a
+.It \e(\(aqe Ta \('e Ta acute e
+.It \e(\(aqi Ta \('i Ta acute i
+.It \e(\(aqo Ta \('o Ta acute o
+.It \e(\(aqu Ta \('u Ta acute u
+.It \e(\`A Ta \(`A Ta grave A
+.It \e(\`E Ta \(`E Ta grave E
+.It \e(\`I Ta \(`I Ta grave I
+.It \e(\`O Ta \(`O Ta grave O
+.It \e(\`U Ta \(`U Ta grave U
+.It \e(\`a Ta \(`a Ta grave a
+.It \e(\`e Ta \(`e Ta grave e
+.It \e(\`i Ta \(`i Ta grave i
+.It \e(\`o Ta \(`i Ta grave o
+.It \e(\`u Ta \(`u Ta grave u
+.It \e(\(tiA Ta \(~A Ta tilde A
+.It \e(\(tiN Ta \(~N Ta tilde N
+.It \e(\(tiO Ta \(~O Ta tilde O
+.It \e(\(tia Ta \(~a Ta tilde a
+.It \e(\(tin Ta \(~n Ta tilde n
+.It \e(\(tio Ta \(~o Ta tilde o
 .It \e(:A Ta \(:A Ta dieresis A
 .It \e(:E Ta \(:E Ta dieresis E
 .It \e(:I Ta \(:I Ta dieresis I
@@ -657,7 +657,7 @@ manual.
 .Sh UNICODE CHARACTERS
 The escape sequences
 .Pp
-.Dl \e[uXXXX] and \eC'uXXXX'
+.Dl \e[uXXXX] and \eC\(aquXXXX\(aq
 .Pp
 are interpreted as Unicode codepoints.
 The codepoint must be in the range above U+0080 and less than U+10FFFF.
@@ -685,7 +685,7 @@ escape sequence, inserting the character
 from the current character set into the output.
 Of course, this is inherently non-portable and is already marked
 as deprecated in the Heirloom roff manual.
-For example, do not use \eN'34', use \e(dq, or even the plain
+For example, do not use \eN\(aq34\(aq, use \e(dq, or even the plain
 .Sq \(dq
 character where possible.
 .Sh COMPATIBILITY
@@ -709,7 +709,7 @@ In
 .Fl T Ns Cm html
 and
 .Fl T Ns Cm xhtml ,
-the \e(~=, \e(nb, and \e(nc special characters render differently
+the \e(\(ti=, \e(nb, and \e(nc special characters render differently
 between mandoc and groff.
 .It
 The

--
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
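A toy model may make the problem concrete. The Python sketch below is hypothetical: the names TYPESET_GLYPHS, ASCII_ESCAPES and render_typeset are invented for illustration and are not groff, Heirloom or mandoc code. It assumes the groff/Heirloom typeset behaviour described above, where plain ` ' ~ become U+2018, U+2019 and U+02DC, while the escapes \(aq, \(ga (or \`), \(ti and \e yield the plain ASCII characters. All other escape sequences are simply passed through, not interpreted.

```python
# Toy model (hypothetical, not groff/Heirloom source) of the typeset-output
# substitutions described in the message above.

# Plain ASCII characters that typeset devices replace with typographic glyphs.
TYPESET_GLYPHS = {
    "`": "\u2018",  # LEFT SINGLE QUOTATION MARK
    "'": "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "~": "\u02dc",  # SMALL TILDE
}

# Escape sequences that request the literal ASCII character instead.
ASCII_ESCAPES = {
    "\\(aq": "'",   # apostrophe quote
    "\\(ga": "`",   # grave accent
    "\\`": "`",     # grave accent, short form used in the patch
    "\\(ti": "~",   # ASCII tilde
    "\\e": "\\",    # printable backslash
}


def render_typeset(text: str) -> str:
    """Render one line of roff text the way this toy model assumes a
    typeset device (such as -Tpdf) would."""
    out = []
    i = 0
    while i < len(text):
        for esc, literal in ASCII_ESCAPES.items():
            if text.startswith(esc, i):
                out.append(literal)  # escaped: emit the plain ASCII character
                i += len(esc)
                break
        else:
            ch = text[i]
            out.append(TYPESET_GLYPHS.get(ch, ch))  # unescaped: substitute glyph
            i += 1
    return "".join(out)
```

In this model the old apropos.1 line $ apropos '~set.?[ug]id' comes out as $ apropos ’˜set.?[ug]id’, which a reader cannot paste into a shell. The patched $ apropos \(aq\(tiset.?[ug]id\(aq comes out as $ apropos '~set.?[ug]id'.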
Hi Anthony,

Anthony J. Bentley wrote on Sat, Mar 21, 2015 at 03:30:53AM -0600:

> Both groff and Heirloom troff transform the following characters in
> typeset output such as -Tpdf:
>
> transforms ` into U+2018 (left single quote)
> transforms ' into U+2019 (right single quote)
> transforms ~ into U+02DC (small tilde)
>
> (Plan 9 and mandoc only do the first two.)
>
> These, particularly the second one, are desirable when used in prose.
> But when the original ASCII character is meant (such as when listing
> command-line input, or describing how to escape characters), it must
> be escaped to provide correct output in all formats.

I hate this. It is backwards, I consider it a bug in troff.

ASCII input should produce ASCII output. If people want fancy
special characters in the output, it is OK that fancy character
escape sequences are required in the input. But it is not OK to
require escape sequences in the input to get ASCII output - except
in cases where the language syntax requires it, like for \e or for
\(dq at the beginning of macro arguments.

As long as the problem is limited to PostScript and PDF output,
I'd rather ignore it than pester authors with such madness.

Yours,
  Ingo
Hi Ingo,
Ingo Schwarze writes:
> Hi Anthony,
>
> Anthony J. Bentley wrote on Sat, Mar 21, 2015 at 03:30:53AM -0600:
>
> > Both groff and Heirloom troff transform the following characters in
> > typeset output such as -Tpdf:
> >
> > transforms ` into U+2018 (left single quote)
> > transforms ' into U+2019 (right single quote)
> > transforms ~ into U+02DC (small tilde)
> >
> > (Plan 9 and mandoc only do the first two.)
> >
> > These, particularly the second one, are desirable when used in prose.
> > But when the original ASCII character is meant (such as when listing
> > command-line input, or describing how to escape characters), it must
> > be escaped to provide correct output in all formats.
>
> I hate this. It is backwards, I consider it a bug in troff.
I wouldn't call it a bug. Converting ' to an apostrophe is a very
natural thing for a typesetter to do, and troff has done so since the
mid-1970s if not earlier. The vast majority of uses of ', even in
manuals, are in prose, not code examples that need a literal ASCII '.
Escaping the few instances where it is necessary is not a huge burden.

I do care about PDF output. In fact, for at least five years I've often
referred to groff's PDF output when looking at manuals, because it
provides a greater visual distinction between different kinds of
semantic markup than terminal output, or even mandoc's HTML output.
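To show how small the burden is, here is a minimal sketch of the escaping the patch applies by hand. The helper name escape_literal and its table are hypothetical, not part of mandoc or groff. The mapping mirrors the replacements in the patch (' becomes \(aq, ` becomes \`, ~ becomes \(ti), plus \ becomes \e as the patch already does for find's -name argument.

```python
# Hypothetical helper, not part of mandoc or groff: escape a literal string
# (for example command-line input shown with .Dl) so that typeset output
# keeps the plain ASCII characters discussed in this thread.
LITERAL_ESCAPES = {
    "\\": "\\e",   # the roff escape character itself must always be escaped
    "'": "\\(aq",  # ASCII apostrophe rather than U+2019
    "`": "\\`",    # ASCII grave accent rather than U+2018
    "~": "\\(ti",  # ASCII tilde rather than U+02DC
}


def escape_literal(text: str) -> str:
    """Escape each character independently, so nothing is escaped twice."""
    return "".join(LITERAL_ESCAPES.get(ch, ch) for ch in text)
```

Applied to apropos '~set.?[ug]id', it yields apropos \(aq\(tiset.?[ug]id\(aq, which is the argument of the patched .Dl line. It deliberately ignores other roff-significant input. That includes hyphens (which the patch sometimes writes as \-), a leading dot or apostrophe that roff would read as a control character, and mdoc macro names appearing in arguments.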
--
Anthony J. Bentley