From: Ingo Schwarze <schwarze@usta.de>
To: Thomas Klausner <wiz@NetBSD.org>
Cc: discuss@mdocml.bsd.lv
Subject: Re: FWD: man.conf mandoc -Tlocale
Date: Sat, 15 Feb 2014 10:42:51 +0100
Message-ID: <20140215094251.GA24366@iris.usta.de>
In-Reply-To: <20140215084309.GA14964@danbala.tuwien.ac.at>
Hi Thomas,
Thomas Klausner wrote on Sat, Feb 15, 2014 at 09:43:09AM +0100:
> On Fri, Feb 14, 2014 at 02:06:47PM +0100, Ingo Schwarze wrote:
>> in OpenBSD, we are discussing changing the mandoc(1) default
>> from -Tascii to -Tlocale, see the mail on <tech@openbsd.org>
>> below.
>>
>> How do you feel about that idea, in particular regarding other
>> operating systems like DragonFly, NetBSD, FreeBSD and from the
>> perspective of the pkgsrc packaging system?
> I've tried this on the NetBSD ls(1) man page with
> LC_CTYPE=de_DE.UTF-8 and didn't see a difference.
>
> # man ls > ls.default
> man: Formatting manual page...
> # mandoc -Tlocale /usr/share/man/man1/ls.1 > ls.locale
> # diff ls.*
> #
>
> Ideas why, or is this expected?
No, it is not expected.
I just retried this sequence of commands on OpenBSD.
It works as expected using any of the following mandoc binaries:
 - the one built from OpenBSD base (built using the OpenBSD
   build system, not including compat glue)
 - the one built from mdocml.bsd.lv HEAD (built using the
   portable build system, including compat glue)
 - the one built from the mdocml.bsd.lv VERSION_1_12 branch
   (built using the portable build system, including compat glue)
When running with -Tlocale, all three mandoc binaries produce
output where the "``" and "''" double quotes in the NetBSD ls(1)
manual page (checked out from NetBSD CVS, src/bin/ls/ls.1)
are each rendered as a single UTF-8 character
(e2 80 9c and e2 80 9d in the hexdump):
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE=de_DE.UTF-8
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_MESSAGES="C"
LC_ALL=
$ mandoc -Tlocale ls.1 | hexdump -C | grep -A1 digit
00000470 63 20 64 69 67 69 74 20 e2 80 9c 6f 6e 65 e2 80 |c digit ...one..|
00000480 9d 2e 29 20 46 6f 72 63 65 20 6f 75 74 70 75 74 |..) Force output|
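To double-check which characters those bytes encode, here is a minimal
Python sketch. It only decodes the bytes copied from the two hexdump
lines above; the variable names are made up for illustration and
nothing here is part of mandoc.

```python
import unicodedata

# Bytes copied from the two hexdump lines above (offsets 0x470 and 0x480).
raw = bytes.fromhex(
    "63 20 64 69 67 69 74 20 e2 80 9c 6f 6e 65 e2 80 "
    "9d 2e 29 20 46 6f 72 63 65 20 6f 75 74 70 75 74"
)

text = raw.decode("utf-8")

# Collect every non-ASCII character with its code point and Unicode name.
non_ascii = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text if ord(ch) > 0x7F
]

for code_point, name in non_ascii:
    print(code_point, name)
```

The two three-byte sequences decode to the left and right double
quotation marks, one character each.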
To debug your problem, I'd suggest first finding out what exactly is
broken for you.

Does -Tutf8 output differ from -Tascii output?

If the output is the same even with -Tutf8, UTF-8 output itself
is likely to be broken, as opposed to locale detection.
In that case, I'd recommend checking what the value of the USE_WCHAR
preprocessor #define is while compiling the file term_ascii.c.

If -Tutf8 output does differ from -Tascii output, locale detection
is likely to be broken.
In that case, I'd recommend using gdb(1) to run "mandoc -Tlocale ls.1"
while LC_CTYPE=de_DE.UTF-8 is set and finding out, in the file term_ascii.c,
function ascii_init(), what the value of the local variable "v" is
right after this function call: setlocale(LC_ALL, "")
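The decision steps above can be sketched as a small Python helper.
This is only an illustration under assumptions: the function names and
the returned labels are hypothetical, the "ok" case is my reading of
what remains when neither failure applies, and the comparison is only
meaningful for a page such as ls.1 that contains characters rendered
differently in UTF-8. It uses nothing beyond the -Tascii, -Tutf8 and
-Tlocale output modes mentioned in this thread.

```python
import subprocess


def render(page: str, mode: str) -> bytes:
    """Run mandoc with the given -T output mode and return the raw output."""
    result = subprocess.run(
        ["mandoc", f"-T{mode}", page], check=True, capture_output=True
    )
    return result.stdout


def diagnose(ascii_out: bytes, utf8_out: bytes, locale_out: bytes) -> str:
    """Classify the failure from the three renderings of the same page."""
    if utf8_out == ascii_out:
        # Even forced UTF-8 output is plain ASCII: suspect USE_WCHAR.
        return "utf8-output-broken"
    if locale_out == ascii_out:
        # -Tutf8 works but -Tlocale falls back to ASCII: suspect setlocale().
        return "locale-detection-broken"
    return "ok"


def diagnose_page(page: str) -> str:
    """Render the page in all three modes and classify the result."""
    return diagnose(
        render(page, "ascii"), render(page, "utf8"), render(page, "locale")
    )
```

With mandoc installed and LC_CTYPE=de_DE.UTF-8 set, calling
diagnose_page("ls.1") would tell which of the two follow-up checks
to do.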
> One thing I remember being broken at some point: Does this still allow
> examples to be copied, or do we have to be extra careful about marking
> them up then?
Yes. Plain '-' as an input character is rendered as a UTF-8 dash
(e2 80 93, which is U+2013 EN DASH):
$ mandoc -Tlocale ls.1 | hexdump -C | head -n 7 | tail -n 2
00000050 4e 08 4e 41 08 41 4d 08 4d 45 08 45 0a 20 20 20 |N.NA.AM.ME.E. |
00000060 20 20 6c 08 6c 73 08 73 20 e2 80 93 20 6c 69 73 | l.ls.s ... lis|
However, the input string "\-" is rendered as a plain ASCII minus sign,
even with -Tutf8:
$ mandoc -Tlocale ls.1 | hexdump -C | head -n 70 | tail -n 3
00000430 0a 0a 20 20 20 20 20 54 68 65 20 6f 70 74 69 6f |.. The optio|
00000440 6e 73 20 61 72 65 20 61 73 20 66 6f 6c 6c 6f 77 |ns are as follow|
00000450 73 3a 0a 0a 20 20 20 20 20 2d 08 2d 31 08 31 20 |s:.. -.-1.1 |
If I understand correctly, that is the usual typographical convention
in roff typesetting.
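To make the difference between the two hexdumps easier to see, here is
a rough Python sketch that removes the backspace overstrikes used for
bold text (roughly what col -b does) and names the dash characters
that remain. The byte strings are copied from the mandoc hexdumps
above; strip_overstrike() is a hypothetical helper, not something
mandoc provides.

```python
import re
import unicodedata

# NAME section bytes (offsets 0x50 and 0x60 in the first hexdump).
name_bytes = bytes.fromhex(
    "4e 08 4e 41 08 41 4d 08 4d 45 08 45 0a 20 20 20 "
    "20 20 6c 08 6c 73 08 73 20 e2 80 93 20 6c 69 73"
)
# Start of the option list (offset 0x450 in the second hexdump).
option_bytes = bytes.fromhex("73 3a 0a 0a 20 20 20 20 20 2d 08 2d 31 08 31 20")


def strip_overstrike(data: bytes) -> str:
    """Decode UTF-8 and drop each 'character + backspace' pair."""
    return re.sub(r".\x08", "", data.decode("utf-8"))


name_text = strip_overstrike(name_bytes)
option_text = strip_overstrike(option_bytes)

# Names of all dash punctuation characters (Unicode category Pd) found.
dashes = [
    unicodedata.name(ch)
    for ch in name_text + option_text
    if unicodedata.category(ch) == "Pd"
]
print(dashes)
```

The dash in the NAME line comes out as EN DASH, while the one in front
of the option letter is the plain ASCII HYPHEN-MINUS, so the option
itself can still be copied into a shell.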
> At some point (sorry, I don't remember details, not even if it was
> mandoc or groff) I had the annoying state where 'man foo' replaced
> dashes with some UTF-8 dash that the shell didn't accept when
> pasting it into a shell.
Yes, that can happen.
Actually, groff does exactly the same, and it does so by default:
$ echo $LC_CTYPE
de_DE.UTF-8
$ nroff -mandoc -c ls.1 | hexdump -C | head -n 7 | tail -n 2
00000050 4e 08 4e 41 08 41 4d 08 4d 45 08 45 0a 20 20 20 |N.NA.AM.ME.E. |
00000060 20 20 6c 08 6c 73 08 73 20 e2 80 93 20 6c 69 73 | l.ls.s ... lis|
$ nroff -mandoc -c ls.1 | hexdump -C | head -n 70 | tail -n 3
00000430 0a 0a 20 20 20 20 20 54 68 65 20 6f 70 74 69 6f |.. The optio|
00000440 6e 73 20 61 72 65 20 61 73 20 66 6f 6c 6c 6f 77 |ns are as follow|
00000450 73 3a 0a 0a 20 20 20 20 20 2d 08 2d 31 08 31 20 |s:.. -.-1.1 |
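For anyone who wants to compare such excerpts mechanically rather than
by eye, a small Python sketch follows. parse_hexdump() is a
hypothetical helper written for this mail; it handles only plain
"hexdump -C" lines like the ones quoted here and simply skips the "*"
lines hexdump prints for repeated data.

```python
def parse_hexdump(lines):
    """Turn `hexdump -C` lines into the bytes they describe."""
    out = bytearray()
    for line in lines:
        hex_part = line.split("|", 1)[0]   # drop the ASCII column
        fields = hex_part.split()[1:]      # drop the leading offset
        out += bytes(int(field, 16) for field in fields)
    return bytes(out)


# NAME section excerpt from the mandoc -Tlocale run earlier in this mail.
mandoc_dump = [
    "00000050 4e 08 4e 41 08 41 4d 08 4d 45 08 45 0a 20 20 20 |N.NA.AM.ME.E. |",
    "00000060 20 20 6c 08 6c 73 08 73 20 e2 80 93 20 6c 69 73 | l.ls.s ... lis|",
]
# The same excerpt from the nroff -mandoc -c run above.
nroff_dump = [
    "00000050 4e 08 4e 41 08 41 4d 08 4d 45 08 45 0a 20 20 20 |N.NA.AM.ME.E. |",
    "00000060 20 20 6c 08 6c 73 08 73 20 e2 80 93 20 6c 69 73 | l.ls.s ... lis|",
]

mandoc_bytes = parse_hexdump(mandoc_dump)
nroff_bytes = parse_hexdump(nroff_dump)

same_bytes = mandoc_bytes == nroff_bytes
has_en_dash = b"\xe2\x80\x93" in nroff_bytes
print(same_bytes, has_en_dash)
```

For the excerpts quoted in this mail, both formatters emit the same
bytes, including the three-byte dash.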
Yours,
Ingo