Message-ID: <4C3CF5E5.7060700@bsd.lv>
Date: Wed, 14 Jul 2010 01:25:25 +0200
From: Kristaps Dzonsons
To: discuss@mdocml.bsd.lv
Subject: Re: Raw UTF-8?
In-Reply-To: <20100713192341.GB25163@roadrunner.spoerlein.net>

>>> This also works fine with FreeBSD's groff when rendering to UTF-8 aware
>>> terminals using -Tutf8 (and of course in -Tps and -Thtml mode).
>>>
>>> I really hope the sentiment expressed in this thread is in jest, as I
>>> would stop considering mandoc(1) a viable alternative for FreeBSD's man
>>> subsystem if it will never support UTF-8 output (and then render \(:o as
>>> ö like it should).
>> I think there's a little confusion here. I see Ingo just wrote and
>> answered most questions. Well, no point in wasting a response...
>>
>> The state of affairs follows:
>>
>> - mandoc/groff accept and understand ASCII input
>> - mandoc/groff [sometimes] accept but DO NOT understand non-ASCII input
>>
>> That UTF-8 input renders on your screen is coincidence: you happen to
>> have a UTF-8 terminal and groff hasn't puked on the characters. You
>> implicitly assume your readers' mediums have the same capabilities.
>>
>> Now for the \[foo] syntax. First, it exists. Second, it covers most
>> European characters. Is it general? No. Why let it stay? Because it
>> lets \(:u be both "u" (my terminal) and ü (e.g. www output). If you
>> don't use the \[foo] escapes, you're screwing readers. Yes, we're
>> screwing non-western-European manual writers ("me") already, but this is
>> not a problem we need to solve right now.
>
> I completely agree here, there's nothing fancy we could or should do
> regarding input.

Yes. Note that the problem space lies entirely within -Tps, which for
now has hard-coded glyph widths.

>
>> Now for output and The Good Stuff.
>>
>> -Tutf8 is not hard. I think I can manage this in coming releases
>> without any negative effects. In fact, it will cut the binary size, as
>> I'd key special chars as integers and rewrite them on the fly into
>> UTF-8, Latin-1, or whatever, for all outputs.
>
> Sounds great, do you also plan on adding "special chars" support to -Tps
> (mostly for latin1 accents and umlauts)?

Yes. I want to roll it into the next release along with the chars.in
upgrade.

Thanks,

Kristaps
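
For illustration only, the "key special chars as integers and rewrite them on the fly" idea quoted above could be sketched roughly as follows. This is not mandoc's code: the table contents, the fallback strings, and the names SPECIAL_CHARS, ASCII_FALLBACK, and render are all assumptions made up for the example.

```python
# Hypothetical table: roff special-character name -> Unicode code point.
# Storing one integer per name is the "key special chars as integers" idea;
# the entries here are a tiny illustrative subset.
SPECIAL_CHARS = {
    ":o": 0x00F6,  # o with diaeresis
    ":u": 0x00FC,  # u with diaeresis
    "co": 0x00A9,  # copyright sign
    "bu": 0x2022,  # bullet (outside Latin-1)
}

# Hypothetical plain-ASCII fallbacks, keyed by the same integers.
ASCII_FALLBACK = {
    0x00F6: "o",
    0x00FC: "u",
    0x00A9: "(C)",
    0x2022: "o",
}


def render(name, output):
    """Rewrite one special-character name into bytes for an output device."""
    code = SPECIAL_CHARS[name]
    if output == "utf8":
        # Any code point can be encoded as UTF-8.
        return chr(code).encode("utf-8")
    if output == "latin1" and code <= 0xFF:
        # Latin-1 maps code points 0..255 directly onto single bytes.
        return bytes([code])
    # ASCII output, or a code point that Latin-1 cannot represent.
    return ASCII_FALLBACK.get(code, "?").encode("ascii")
```

With this sketch, render(":u", "ascii") gives a plain "u" while render(":u", "utf8") gives the two-byte UTF-8 encoding of ü, which mirrors the \(:u behaviour described in the quoted text. A code point outside Latin-1, such as the bullet, falls back to its ASCII form on a Latin-1 device.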