* FWD: man.conf mandoc -Tlocale
From: Ingo Schwarze @ 2014-02-14 13:06 UTC
To: discuss

Hi,

in OpenBSD, we are discussing to move to mandoc(1) default
from -Tascii to -Tlocale, see the mail on <tech@openbsd.org>
below.

How do you feel about that idea, in particular regarding other
operating systems like DragonFly, NetBSD, FreeBSD and from the
perspective of the pkgsrc packaging system?

Yours,
  Ingo

----- Forwarded message from Ingo Schwarze <schwarze@usta.de> -----

From: Ingo Schwarze <schwarze@usta.de>
Sender: owner-tech@openbsd.org
Date: Fri, 14 Feb 2014 14:02:26 +0100
To: Ted Unangst <tedu@tedunangst.com>
Cc: tech@openbsd.org
Subject: Re: man.conf mandoc -Tlocale

Hi Ted,

Ted Unangst wrote on Thu, Feb 13, 2014 at 09:22:04PM -0500:

> About 20 years after the invention of utf-8, I've decided to see what
> all the fuss is about and experiment with uxterm and whatnot.
> Naturally, this means I want to see sweet fancy quotes in all my man
> pages instead of the lame ``fake'' quotes. In order to convince mandoc
> to give me what I want, however, requires a command line option. But
> what about all those old school ascii only terminals I still sometimes
> use?
>
> mandoc fortunately has an option -Tlocale, which will pick between
> ascii and utf8 based on environment. Perfect! Let's use it.
>
> Tested to work as expected in uxterm. Tested to change nothing in a
> regular xterm by default (no LC_CTYPE set).

Even though i don't use it, i'm not opposed to your patch.
I think it makes sense.

I even considered switching the mandoc(1) default from -Tascii
to -Tlocale in general, but forgot about it again. If you like
the idea, that would be something to do after unlock; it might
require explicitly giving the -Tascii option in some build
system and similar contexts.

I think -Tlocale might be a saner default than -Tascii nowadays.
People who don't want UTF-8 shouldn't have it in their LC_CTYPE,
and it's hard to see why people who do want it and have it in
their LC_CTYPE should be forced to give -Tlocale or something
similar to each and every utility they call.

What do you think?
  Ingo

> Index: man.conf
> ===================================================================
> RCS file: /cvs/src/etc/man.conf,v
> retrieving revision 1.18
> diff -u -p -r1.18 man.conf
> --- man.conf 13 Jul 2013 20:21:52 -0000 1.18
> +++ man.conf 14 Feb 2014 02:14:29 -0000
> @@ -16,15 +16,15 @@ _subdir {cat,man}1 {cat,man}8 {cat,man}
>  _suffix .0
>  _build .0.Z /usr/bin/zcat %s
>  _build .0.gz /usr/bin/gzcat %s
> -_build .[1-9n] /usr/bin/mandoc %s
> -_build .[1-9n].Z /usr/bin/zcat %s | /usr/bin/mandoc
> -_build .[1-9n].gz /usr/bin/gzcat %s | /usr/bin/mandoc
> -_build .[1-9][a-z] /usr/bin/mandoc %s
> -_build .[1-9][a-z].Z /usr/bin/zcat %s | /usr/bin/mandoc
> -_build .[1-9][a-z].gz /usr/bin/gzcat %s | /usr/bin/mandoc
> -_build .tbl /usr/bin/mandoc %s
> -_build .tbl.Z /usr/bin/zcat %s | /usr/bin/mandoc
> -_build .tbl.gz /usr/bin/gzcat %s | /usr/bin/mandoc
> +_build .[1-9n] /usr/bin/mandoc -Tlocale %s
> +_build .[1-9n].Z /usr/bin/zcat %s | /usr/bin/mandoc -Tlocale
> +_build .[1-9n].gz /usr/bin/gzcat %s | /usr/bin/mandoc -Tlocale
> +_build .[1-9][a-z] /usr/bin/mandoc -Tlocale %s
> +_build .[1-9][a-z].Z /usr/bin/zcat %s | /usr/bin/mandoc -Tlocale
> +_build .[1-9][a-z].gz /usr/bin/gzcat %s | /usr/bin/mandoc -Tlocale
> +_build .tbl /usr/bin/mandoc -Tlocale %s
> +_build .tbl.Z /usr/bin/zcat %s | /usr/bin/mandoc -Tlocale
> +_build .tbl.gz /usr/bin/gzcat %s | /usr/bin/mandoc -Tlocale
>
>  # Sections and their directories.
>  # All paths ending in '/' are the equivalent of entries specifying that

----- End forwarded message -----
--
To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv
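Editorial illustration: the message above describes -Tlocale as picking between ascii and utf8 based on the environment. The sketch below only mimics that idea by applying the POSIX precedence of LC_ALL, LC_CTYPE and LANG to a dictionary of variables. It is not mandoc's code, and the function name pick_output_mode is made up for this example.

```python
def pick_output_mode(env):
    """Return "utf8" or "ascii" for a dict of environment variables.

    Hypothetical helper: it only looks at variable names and values,
    it does not consult the system's locale database.
    """
    # POSIX precedence for the character-type category:
    # LC_ALL overrides LC_CTYPE, which overrides LANG.
    # An empty value counts as unset.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            break
    else:
        # None of the three variables is set: plain ASCII.
        return "ascii"

    # Locale names look like language[_territory][.codeset][@modifier].
    codeset = value.partition(".")[2].partition("@")[0]
    normalized = codeset.replace("-", "").lower()
    return "utf8" if normalized == "utf8" else "ascii"


if __name__ == "__main__":
    print(pick_output_mode({"LC_CTYPE": "de_DE.UTF-8"}))
    print(pick_output_mode({}))
```

With this reading, a regular xterm without LC_CTYPE keeps ASCII output, while a uxterm session that exports a UTF-8 locale gets UTF-8, matching what Ted reports having tested.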
* Re: FWD: man.conf mandoc -Tlocale
From: Thomas Klausner @ 2014-02-15 8:43 UTC
To: discuss

On Fri, Feb 14, 2014 at 02:06:47PM +0100, Ingo Schwarze wrote:
> in OpenBSD, we are discussing to move to mandoc(1) default
> from -Tascii to -Tlocale, see the mail on <tech@openbsd.org>
> below.
>
> How do you feel about that idea, in particular regarding other
> operating systems like DragonFly, NetBSD, FreeBSD and from the
> perspective of the pkgsrc packaging system?

I've tried this on the NetBSD man ls(1) man page with
LC_CTYPE=de_DE.UTF-8 and didn't see a difference.

# man ls > ls.default
man: Formatting manual page...
# mandoc -Tlocale /usr/share/man/man1/ls.1 > ls.locale
# diff ls.*
#

Ideas why, or is this expected?

One thing I remember being broken at some point: Does this still allow
examples to be copied, or do we have to be extra careful about marking
them up then?

At some point (sorry, I don't remember details, not even if it was
mandoc or groff) I had the annoying state where 'man foo' replaced
dashes with some UTF-8 dash that the shell didn't accept as when
pasting it in a shell.
 Thomas
* Re: FWD: man.conf mandoc -Tlocale
From: Ingo Schwarze @ 2014-02-15 9:42 UTC
To: Thomas Klausner; +Cc: discuss

Hi Thomas,

Thomas Klausner wrote on Sat, Feb 15, 2014 at 09:43:09AM +0100:
> On Fri, Feb 14, 2014 at 02:06:47PM +0100, Ingo Schwarze wrote:

>> in OpenBSD, we are discussing to move to mandoc(1) default
>> from -Tascii to -Tlocale, see the mail on <tech@openbsd.org>
>> below.
>>
>> How do you feel about that idea, in particular regarding other
>> operating systems like DragonFly, NetBSD, FreeBSD and from the
>> perspective of the pkgsrc packaging system?

> I've tried this on the NetBSD man ls(1) man page with
> LC_CTYPE=de_DE.UTF-8 and didn't see a difference.
>
> # man ls > ls.default
> man: Formatting manual page...
> # mandoc -Tlocale /usr/share/man/man1/ls.1 > ls.locale
> # diff ls.*
> #
>
> Ideas why, or is this expected?

No, it is not expected.

I just retried this sequence of commands on OpenBSD.
It works as expected using any of the following mandoc binaries:

 - the one built from OpenBSD base
   (built using the OpenBSD build system, not including compat glue)
 - the one built from mdocml.bsd.lv HEAD
   (built using the portable build system, including compat glue)
 - the one built from the mdocml.bsd.lv VERSION_1_12 branch
   (built using the portable build system, including compat glue)

When running with -Tlocale, all three mandoc binaries produce
output where the "``" and "''" double quotes in the NetBSD ls(1)
manual page (checked out from NetBSD CVS, src/bin/ls/ls.1)
are rendered as a single UTF-8 character, specifically:

  $ locale
  LANG=
  LC_COLLATE="C"
  LC_CTYPE=de_DE.UTF-8
  LC_MONETARY="C"
  LC_NUMERIC="C"
  LC_TIME="C"
  LC_MESSAGES="C"
  LC_ALL=
  $ mandoc -Tlocale ls.1 | hexdump -C | grep -A1 digit
  00000470  63 20 64 69 67 69 74 20  e2 80 9c 6f 6e 65 e2 80  |c digit ...one..|
  00000480  9d 2e 29 20 46 6f 72 63  65 20 6f 75 74 70 75 74  |..) Force output|

To debug your problem, i'd suggest to first find out what exactly
is broken for you. Does -Tutf8 output differ from -Tascii output?

If output is not different even with -Tutf8, UTF-8 output itself
is likely to be broken, as opposed to locale detection.
In that case, i'd recommend to check what the value of the
USE_WCHAR preprocessor #define is while compiling the file
term_ascii.c.

If it is different, locale detection is likely to be broken.
In that case, i'd recommend to use gdb(1) to run "mandoc -Tlocale ls.1"
while LC_CTYPE=de_DE.UTF-8 is set and find out, in the file
term_ascii.c, function ascii_init(), what the value of the
local variable "v" is right after this function call:

  setlocale(LC_ALL, "")

> One thing I remember being broken at some point: Does this still allow
> examples to be copied, or do we have to be extra careful about marking
> them up then?

Yes. Plain '-' as an input character is rendered as an UTF-8 hyphen:

  $ mandoc -Tlocale ls.1 | hexdump -C | head -n 7 | tail -n 2
  00000050  4e 08 4e 41 08 41 4d 08  4d 45 08 45 0a 20 20 20  |N.NA.AM.ME.E.   |
  00000060  20 20 6c 08 6c 73 08 73  20 e2 80 93 20 6c 69 73  |  l.ls.s ... lis|

However, the input string "\-" is rendered as a plain ASCII minus sign,
even with -Tutf8:

  $ mandoc -Tlocale ls.1 | hexdump -C | head -n 70 | tail -n 3
  00000430  0a 0a 20 20 20 20 20 54  68 65 20 6f 70 74 69 6f  |..     The optio|
  00000440  6e 73 20 61 72 65 20 61  73 20 66 6f 6c 6c 6f 77  |ns are as follow|
  00000450  73 3a 0a 0a 20 20 20 20  20 2d 08 2d 31 08 31 20  |s:..     -.-1.1 |

If i understand correctly, that is usual typographical convention
in roff typesetting.

> At some point (sorry, I don't remember details, not even if it was
> mandoc or groff) I had the annoying state where 'man foo' replaced
> dashes with some UTF-8 dash that the shell didn't accept as when
> pasting it in a shell.

Yes, that can happen.
Actually, groff does exactly the same, and it does so by default:

  $ echo $LC_CTYPE
  de_DE.UTF-8
  $ nroff -mandoc -c ls.1 | hexdump -C | head -n 7 | tail -n 2
  00000050  4e 08 4e 41 08 41 4d 08  4d 45 08 45 0a 20 20 20  |N.NA.AM.ME.E.   |
  00000060  20 20 6c 08 6c 73 08 73  20 e2 80 93 20 6c 69 73  |  l.ls.s ... lis|
  $ nroff -mandoc -c ls.1 | hexdump -C | head -n 70 | tail -n 3
  00000430  0a 0a 20 20 20 20 20 54  68 65 20 6f 70 74 69 6f  |..     The optio|
  00000440  6e 73 20 61 72 65 20 61  73 20 66 6f 6c 6c 6f 77  |ns are as follow|
  00000450  73 3a 0a 0a 20 20 20 20  20 2d 08 2d 31 08 31 20  |s:..     -.-1.1 |

Yours,
  Ingo
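Editorial illustration: the hexdump excerpts in the message above show multi-byte sequences that the right-hand column prints only as dots. A small sketch, assuming nothing beyond the Python standard library, decodes those byte sequences and names the resulting characters. The helper name describe is invented for this example.

```python
import unicodedata


def describe(hex_bytes):
    """Decode space-separated hex bytes as UTF-8 and name each character.

    Only meant for printable characters; unicodedata.name() raises
    ValueError for control characters such as backspace (08).
    """
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    # Byte sequences copied from the hexdump output quoted above.
    for sample in ("e2 80 9c", "e2 80 9d", "e2 80 93", "2d"):
        print(sample, "->", describe(sample))
```

The first two sequences are the opening and closing double quotes around "one", the third is the character between "ls" and "list" in the NAME line, and 2d is the plain ASCII character in front of the option "1".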
* Re: FWD: man.conf mandoc -Tlocale
From: Ingo Schwarze @ 2014-02-16 20:56 UTC
To: Thomas Klausner; +Cc: discuss

Hi Thomas,

Dmitrij D. Czarkoff just pointed out to me in private mail that my
analysis wasn't quite right, so i reinvestigated, and i have to
correct this part:

Ingo Schwarze wrote on Sat, Feb 15, 2014 at 10:42:51AM +0100:
> Thomas Klausner wrote on Sat, Feb 15, 2014 at 09:43:09AM +0100:

>> One thing I remember being broken at some point: Does this still allow
>> examples to be copied, or do we have to be extra careful about marking
>> them up then?

> Yes. Plain '-' as an input character is rendered as an UTF-8 hyphen:

That is *not* true. Plain '-' always renders as plain '-'.

> $ mandoc -Tlocale ls.1 | hexdump -C | head -n 7 | tail -n 2
> 00000050  4e 08 4e 41 08 41 4d 08  4d 45 08 45 0a 20 20 20  |N.NA.AM.ME.E.   |
> 00000060  20 20 6c 08 6c 73 08 73  20 e2 80 93 20 6c 69 73  |  l.ls.s ... lis|

The reason for this is that we use \(en between .Nm and .Nd
in the NAME section, not a plain '-'.

> However, the input string "\-" is rendered as a plain ASCII minus sign,
> even with -Tutf8:
>
> $ mandoc -Tlocale ls.1 | hexdump -C | head -n 70 | tail -n 3
> 00000430  0a 0a 20 20 20 20 20 54  68 65 20 6f 70 74 69 6f  |..     The optio|
> 00000440  6e 73 20 61 72 65 20 61  73 20 66 6f 6c 6c 6f 77  |ns are as follow|
> 00000450  73 3a 0a 0a 20 20 20 20  20 2d 08 2d 31 08 31 20  |s:..     -.-1.1 |

That part is correct.

So, we have these mappings:

  input   output
  -----   ASCII   UTF-8
          -----   -----

  -       -       -
  \-      -       -
  \(hy    -       U+2010
  \(en    -       U+2013
  \(em    --      U+2014

See also these lines in chars.in:

  CHAR("-", "-", 45)
  CHAR("hy", "-", 8208)
  CHAR("en", "-", 8211)
  CHAR("em", "--", 8212)

So, unless people put \(hy, \(en, or \(em into their example code,
i would expect copy and paste to work just fine even in UTF-8 mode.

Yours,
  Ingo
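Editorial illustration: the mapping table in the message above can be expressed as a small lookup, with the three values of each entry copied from the quoted CHAR() lines. This is only a sketch of the table, not mandoc's implementation; the names CHARS and render are made up here.

```python
# name -> (ASCII rendering, Unicode code point),
# copied from the CHAR() lines quoted from chars.in above.
CHARS = {
    "-": ("-", 45),
    "hy": ("-", 8208),
    "en": ("-", 8211),
    "em": ("--", 8212),
}


def render(name, utf8):
    """Return the rendering of a character name in ASCII or UTF-8 mode."""
    ascii_text, codepoint = CHARS[name]
    return chr(codepoint) if utf8 else ascii_text


if __name__ == "__main__":
    for name in CHARS:
        print(name, repr(render(name, utf8=False)), repr(render(name, utf8=True)))
```

Only the entry named "-" yields the same ASCII character in both modes, which is why example code using it stays safe to copy and paste.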
* Re: FWD: man.conf mandoc -Tlocale
From: Ulrich Spörlein @ 2014-02-17 11:41 UTC
To: discuss; +Cc: Thomas Klausner

On Sun, 2014-02-16 at 21:56:55 +0100, Ingo Schwarze wrote:
> So, we have these mappings:
>
>   input   output
>   -----   ASCII   UTF-8
>           -----   -----
>
>   -       -       -
>   \-      -       -
>   \(hy    -       U+2010
>   \(en    -       U+2013
>   \(em    --      U+2014
>
> See also these lines in chars.in:
>
>   CHAR("-", "-", 45)
>   CHAR("hy", "-", 8208)
>   CHAR("en", "-", 8211)
>   CHAR("em", "--", 8212)
>
> So, unless people put \(hy, \(en, or \(em into their example code,
> i would expect copy and paste to work just fine even in UTF-8 mode.

I don't think hyphens will be the problem, but quotes, where people
might have used .Dq when they actually want the literal ASCII quotes ""
as they are to be used in some shell or other code.

In any case, -Tlocale should be the default and is the right thing to
do, IMHO.

Cheers,
Uli
* Re: FWD: man.conf mandoc -Tlocale
From: Anthony J. Bentley @ 2014-02-17 11:55 UTC
To: discuss

On Mon, Feb 17, 2014 at 4:41 AM, Ulrich Spörlein <uqs@spoerlein.net> wrote:
> On Sun, 2014-02-16 at 21:56:55 +0100, Ingo Schwarze wrote:
>> So, we have these mappings:
>>
>>   input   output
>>   -----   ASCII   UTF-8
>>           -----   -----
>>
>>   -       -       -
>>   \-      -       -
>>   \(hy    -       U+2010
>>   \(en    -       U+2013
>>   \(em    --      U+2014
>>
>> See also these lines in chars.in:
>>
>>   CHAR("-", "-", 45)
>>   CHAR("hy", "-", 8208)
>>   CHAR("en", "-", 8211)
>>   CHAR("em", "--", 8212)
>>
>> So, unless people put \(hy, \(en, or \(em into their example code,
>> i would expect copy and paste to work just fine even in UTF-8 mode.
>
> I don't think hyphens will be the problem, but quotes, where people
> might have used .Dq when they actually want the literal ASCII quotes ""
> as they are to be used in some shell or other code.

Thankfully, in my experience (using -Tlocale for well over a year)
this does not happen often--indeed, I haven't seen it at all. Any such
cases would be a bug in the manpage, but this is not a bug that would
be common enough to make -Tlocale an impractical default.

On the other hand, groff's hyphen substitution is extremely irritating
when it's turned on, since you'd be hard pressed to find a manual
*not* affected by it.

--
Anthony J. Bentley
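Editorial illustration: the concern discussed above is that rendered examples might contain typographic quotes or dashes that a shell will not accept when pasted. A coarse way to look for such cases is to scan rendered text for any non-ASCII character. The sketch below does that; the name paste_hazards is hypothetical, and the check will also flag legitimate non-ASCII text such as author names.

```python
def paste_hazards(rendered):
    """Return (line number, character, code point) for each non-ASCII character."""
    hazards = []
    for lineno, line in enumerate(rendered.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                hazards.append((lineno, ch, f"U+{ord(ch):04X}"))
    return hazards


if __name__ == "__main__":
    # Made-up sample text: the second line uses an en dash instead of '-'.
    sample = 'echo "ok"\nls \u2013l\n'
    for lineno, ch, codepoint in paste_hazards(sample):
        print(f"line {lineno}: {ch!r} ({codepoint})")
```

Running it over the text of an EXAMPLES section would give a manual page author a quick list of places worth checking by hand.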
* Re: FWD: man.conf mandoc -Tlocale
From: Thomas Klausner @ 2014-03-13 9:16 UTC
To: Ingo Schwarze; +Cc: discuss

Hi Ingo!

On Sat, Feb 15, 2014 at 10:42:51AM +0100, Ingo Schwarze wrote:
> > I've tried this on the NetBSD man ls(1) man page with
> > LC_CTYPE=de_DE.UTF-8 and didn't see a difference.
> >
> > # man ls > ls.default
> > man: Formatting manual page...
> > # mandoc -Tlocale /usr/share/man/man1/ls.1 > ls.locale
> > # diff ls.*
> > #
> >
> > Ideas why, or is this expected?
>
> No, it is not expected.

It was the simple problem that USE_WCHAR was not defined when building
mandoc on NetBSD. I've changed that now. The version from pkgsrc
worked fine before that.

So both -Tlocale and -Tutf8 now produce UTF-8 double quote characters
when run on ls(1).

Thanks,
 Thomas
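Editorial illustration: the debugging steps Ingo suggested earlier in the thread amount to a two-step comparison of the output of the three modes. The sketch below encodes that decision, assuming the three outputs were captured as bytes while a UTF-8 LC_CTYPE was set and that the page contains at least one character that differs between the modes. The function name diagnose and its return strings are invented for this example.

```python
def diagnose(ascii_out, utf8_out, locale_out):
    """Classify a -Tlocale problem from the captured output of three runs.

    Assumes the inputs are the outputs of -Tascii, -Tutf8 and -Tlocale
    for the same page, taken with a UTF-8 LC_CTYPE in the environment.
    """
    if utf8_out == ascii_out:
        # Even an explicit -Tutf8 gives ASCII: UTF-8 output itself is
        # broken; the thread suggests checking USE_WCHAR in that case.
        return "utf8-output"
    if locale_out == ascii_out:
        # -Tutf8 works but -Tlocale falls back to ASCII.
        return "locale-detection"
    return "ok"


if __name__ == "__main__":
    plain = b"digit ``one''."
    fancy = "digit \u201cone\u201d.".encode("utf-8")
    # The situation reported above: no difference at all between modes.
    print(diagnose(plain, plain, plain))
```

In the case reported above, the first branch applies, which is consistent with the missing USE_WCHAR definition that Thomas found.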