From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp-2.sys.kth.se (smtp-2.sys.kth.se [130.237.32.160])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id o6BMcrFM000267
	for ; Sun, 11 Jul 2010 18:38:53 -0400 (EDT)
Received: from smtp-2.sys.kth.se (localhost [127.0.0.1])
	by smtp-2.sys.kth.se (Postfix) with ESMTP id 47DEB14F118
	for ; Mon, 12 Jul 2010 00:38:47 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at kth.se
Received: from smtp-2.sys.kth.se ([127.0.0.1])
	by smtp-2.sys.kth.se (smtp-2.sys.kth.se [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id AkSuU0LGMcMd for ; Mon, 12 Jul 2010 00:38:35 +0200 (CEST)
X-KTH-Auth: kristaps [85.8.60.253]
X-KTH-mail-from: kristaps@bsd.lv
X-KTH-rcpt-to: discuss@mdocml.bsd.lv
Received: from lappy.bsd.lv (h85-8-60-253.dynamic.se.alltele.net [85.8.60.253])
	by smtp-2.sys.kth.se (Postfix) with ESMTP id 36A5A14DC5F
	for ; Mon, 12 Jul 2010 00:38:34 +0200 (CEST)
Message-ID: <4C3A47E9.1080106@bsd.lv>
Date: Mon, 12 Jul 2010 00:38:33 +0200
From: Kristaps Dzonsons
User-Agent: Thunderbird 2.0.0.16 (X11/20080812)
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
To: discuss@mdocml.bsd.lv
Subject: Re: Raw UTF-8?
References: <4c33f0f0.0c87970a.3458.fffff43f@mx.google.com>
	<20100707185815.GA19725@iris.usta.de>
	<20100707191807.GA18154@britannica.bec.de>
	<20100707211212.GC19725@iris.usta.de>
	<20100707211725.GA29241@britannica.bec.de>
	<20100709210539.GA2465@roadrunner.spoerlein.net>
In-Reply-To: <20100709210539.GA2465@roadrunner.spoerlein.net>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

> This also works fine with FreeBSD's groff when rendering to UTF-8 aware
> terminals using -Tutf8 (and of course in -Tps and -Thtml mode).
>
> I really hope the sentiment expressed in this thread is in jest, as I
> would stop considering mandoc(1) a viable alternative for FreeBSD's man
> subsystem if it will never support UTF-8 output (and then render \(:o as
> ö like it should).
I think there's a little confusion here.  I see Ingo just wrote and
answered most questions.  Well, no point in wasting a response...

The state of affairs follows:

 - mandoc/groff accept and understand ASCII input
 - mandoc/groff [sometimes] accept but DO NOT understand non-ASCII input

That UTF-8 input renders on your screen is coincidence: you happen to
have a UTF-8 terminal and groff hasn't puked on the characters.  You
implicitly assume your readers' mediums have the same capabilities.

Now for the \[foo] syntax.  First, it exists.  Second, it covers most
European characters.  Is it general?  No.  Why let it stay?  Because it
lets \(:u be both "u" (my terminal) and ü (e.g. www output).  If you
don't use the \[foo] escapes, you're screwing readers.  Yes, we're
screwing non-western-European manual writers ("me") already, but this
is not a problem we need to solve right now.

Now for output and The Good Stuff.  -Tutf8 is not hard.  I think I can
manage this in coming releases without any negative effects.  In fact,
it will cut the binary size, as I'd key special chars as integers and
rewrite them on the fly into UTF-8, Latin-1, or whatever, for all
outputs.

Thanks,

Kristaps
--
 To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv
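For illustration only, the integer-keyed scheme mentioned above (special characters stored as integers and rewritten per output) might look roughly like the Python sketch below. The table contents, the ASCII fallbacks, the `render` function, and the "ascii" and "html" output names are all assumptions made for this example; none of it is mandoc's actual code.

```python
# Hypothetical table: roff special-character name -> Unicode code point.
# Only a few entries are shown; a real table would be much larger.
SPECIAL_CHARS = {
    ":a": 0x00E4,  # a with diaeresis
    ":o": 0x00F6,  # o with diaeresis
    ":u": 0x00FC,  # u with diaeresis
}

# Assumed ASCII approximations for devices that cannot show the glyph.
ASCII_FALLBACK = {
    0x00E4: "a",
    0x00F6: "o",
    0x00FC: "u",
}


def render(name, output):
    """Return the bytes for one special character on the given output."""
    cp = SPECIAL_CHARS[name]
    fallback = ASCII_FALLBACK.get(cp, "?").encode("ascii")
    if output == "utf8":
        return chr(cp).encode("utf-8")
    if output == "latin1":
        # Latin-1 maps code points 0x00-0xFF directly to single bytes.
        return bytes([cp]) if cp <= 0xFF else fallback
    if output == "ascii":
        return fallback
    if output == "html":
        # Numeric character reference, independent of page encoding.
        return ("&#%d;" % cp).encode("ascii")
    raise ValueError("unknown output: %r" % output)


if __name__ == "__main__":
    for out in ("ascii", "latin1", "utf8", "html"):
        print(out, render(":u", out))
```

With one integer per character, adding a new output only needs a new encoder branch rather than a new per-output string table, which is presumably where the size saving would come from.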