Message-ID: <4DDEA3D3.80707@bsd.lv>
Date: Thu, 26 May 2011 21:02:43 +0200
From: Kristaps Dzonsons
To: Ingo Schwarze, discuss@mdocml.bsd.lv
Subject: Re: mdocml: It's annoying that we don't have preconv, so throw together a
References: <201105260030.p4Q0UBe8004660@krisdoz.my.domain> <20110526165415.GA9429@iris.usta.de>
In-Reply-To: <20110526165415.GA9429@iris.usta.de>

>> It's annoying that we don't have preconv,
>
> All the more thanks for starting that!
>
> However, i see main() in there.
> That will end up with pipes and knobs.
>
> In the long run, i think this needs to be integrated into mandoc,
> with as few knobs as possible.  OK, maybe we will need a knob to
> specify the input encoding, but maybe even that can sometimes be
> guessed from the file, for example when there is a BOM.
>
> In a UTF-8 terminal,
>
>   mandoc foo.jp.1 | less
>
> ought to be enough for everyone.
>
> Of course, there is nothing wrong with developing new functionality
> stand-alone, it was quite successful with tbl.

Ingo, I hope you don't mind that I cross-post this to discuss@...

For those of you not reading source-changes, there's now a preconv
utility in mdocml for recoding multibyte manuals as mandoc input.  And
it's more or less finished, not started! ;)

I'm able to download the Japanese manuals and read through them just
fine.  Er... "look at them" just fine.  After running them through
iconv to UTF-8, of course.

I do agree with what you say.  When you (and others downstream) think
the time has come, it's ready to be migrated.  Until then, there's
plenty of catching up to do as it is. ;)

I'll put out a release in the next few days to get eyeballs on all this
locale stuff that's been checked in since BSDCan.  Once it's been in
the wild for a little bit, I'll be more prepared to muck around with
putting it directly into mandoc.

In terms of bloat, if we keep to Latin-1, US-ASCII, and UTF-8, there's
not very much overhead (as you can see in preconv.c, most of which is
read_whole_file, which will soon go away to compat.c where it belongs).

Thanks,

Kristaps
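
For reference, the BOM guess Ingo mentions can be sketched roughly as
below.  This is only an illustration under stated assumptions: it
checks for the UTF-8 byte-order mark (EF BB BF) and nothing else, and
the names guess_bom and enum enc are hypothetical, not taken from
preconv.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding tags; these names are not from preconv.c. */
enum enc {
	ENC_UNKNOWN,
	ENC_UTF_8
};

/*
 * Look for a UTF-8 byte-order mark (EF BB BF) at the start of buf.
 * On a match, store the BOM length in *skip so the caller can step
 * past it; otherwise *skip is set to 0.
 */
static enum enc
guess_bom(const unsigned char *buf, size_t sz, size_t *skip)
{
	static const unsigned char bom[] = { 0xEF, 0xBB, 0xBF };

	*skip = 0;
	if (sz >= sizeof(bom) && 0 == memcmp(buf, bom, sizeof(bom))) {
		*skip = sizeof(bom);
		return ENC_UTF_8;
	}
	return ENC_UNKNOWN;
}
```

When the result is ENC_UNKNOWN, a caller would still need some other
source for the encoding, such as the knob discussed above or a default.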