* Raw UTF-8?
From: Anthony J. Bentley @ 2010-07-07 3:13 UTC
To: discuss

Hey guys,

When using special characters in manpages, I use plain UTF-8 instead of
the escapes documented in mandoc_char(7), for a couple reasons. I'm just
wondering, is this practice discouraged in any way? Is there a chance
of this _not_ working in future versions of mandoc?

--
Thanks,
Anthony J. Bentley
--
To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv

* Re: Raw UTF-8?
From: Kristaps Dzonsons @ 2010-07-07 9:33 UTC
To: discuss

> When using special characters in manpages, I use plain UTF-8 instead of
> the escapes documented in mandoc_char(7), for a couple reasons. I'm just
> wondering, is this practice discouraged in any way? Is there a chance
> of this _not_ working in future versions of mandoc?

This is being discussed on tech@ right now.

Currently, once you use any non-ASCII encoding, the manual is no longer
accessible to all terminals. This is bad. Furthermore, -Tps will throw
away your input. This is more bad. In fact, only -Thtml will be ok with
what you do, which is only by dint of it using the same output encoding.

groff promises Unicode support in "the next major version". According
to their mailing lists, they plan on using \[uNNN] for a Unicode escape
and on-the-fly translate input UTF-8 into Unicode (effectively using
"int" instead of "char" for characters).

http://www.mail-archive.com/groff@gnu.org/msg01378.html

I think it's best for the time being to lift the input warnings and
document that non-ASCII characters will Balkanise the manual. I'm
flapping between warning about it and not warning.

What, by the way, are the reasons you have against using the mandoc_char
escapes?

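To make the translation Kristaps describes concrete, here is a minimal sketch in Python of decoding UTF-8 input into integer code points and rewriting every non-ASCII one as a Unicode escape. The function name `utf8_to_groff_escapes` is made up for illustration, and the four-digit uppercase hex form of the escape is an assumption about the final syntax (the message above only says `\[uNNN]`); this is not code from groff or mandoc.

```python
def utf8_to_groff_escapes(raw: bytes) -> str:
    """Decode UTF-8 and replace each non-ASCII code point with an escape.

    Hypothetical helper: ASCII passes through unchanged, everything else
    becomes \\[uXXXX] with at least four uppercase hex digits (assumed form).
    """
    out = []
    for ch in raw.decode("utf-8"):
        cp = ord(ch)  # the character as an integer, not a single byte
        if cp < 0x80:
            out.append(ch)
        else:
            out.append("\\[u%04X]" % cp)
    return "".join(out)


if __name__ == "__main__":
    print(utf8_to_groff_escapes("J\u00f6rg Sonnenberger".encode("utf-8")))
```

The point of the sketch is only that the parser downstream keeps seeing pure ASCII; the input encoding is dealt with once, at the front.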
* Re: Raw UTF-8?
From: Anthony J. Bentley @ 2010-07-07 14:39 UTC
To: discuss; +Cc: Kristaps Dzonsons

> What, by the way, are the reasons you have against using the mandoc_char
> escapes?

Mostly it's just less to memorize. Much easier to remember my compose
key's alt / o, as opposed to \(/o or \o or &oslash; or \u00F8 depending
on where I'm working at the moment.

mandoc_char(7) isn't much help either, as it's several pages long and
doesn't display the actual character in the terminal, so I have to take
a minute or so just to find the one character...

--
Thanks,
Anthony J. Bentley

* Re: Raw UTF-8?
From: Ingo Schwarze @ 2010-07-07 20:13 UTC
To: discuss

Hi Anthony,

Anthony J. Bentley wrote on Wed, Jul 07, 2010 at 08:39:09AM -0600:

> mandoc_char(7) isn't much help either, as it's several pages long and
> doesn't display the actual character in the terminal,

See the problem?
In the manual you write, \(/o won't be displayed either,
no matter how you try to input it.
The output is the problem here, not the input.

I think we should reorganize mandoc_char(7) in the following way.

1) First the sane escape sequences that render well everywhere
   and serve a real purpose.
   Examples: \~ \ \& \(ba \(em \(en \(hy \e \-
   The number of sane sequences is relatively small, which will also
   solve the problem you are pointing out above: mandoc_char(7) is
   unreasonably long. A typesetting system like TeX needs long
   character tables, and it needs the ability to print obscure
   characters. A manual page does not.

2) Then a sentence explaining that what follows is rarely needed,
   because mandoc(1) is not really intended for general purpose
   typesetting, and much less typesetting of mathematical formulas,
   but just for writing manuals, encouraging people to express their
   intention using text, not symbols. Still, all escape sequences
   in this section are guaranteed to render well and may be useful
   in uncommon situations.
   Examples: \(co \(rg \(-> \(rA \(+- \(<= \*(Pi

3) Then a sentence explaining that what follows is rarely needed,
   because mdoc(7) has alternative concepts handling the typical
   use case better. Note that writing new man(7) code is discouraged
   anyway. Still, all escape sequences in this section are guaranteed
   to render well and may be useful in very uncommon situations.
   Examples:
     \0  -- use .Dl or .Bd -literal
     \bu -- use .Bl -bullet
     \lq -- use .Qq or .Qo
     \lB -- use .Bq or .Bc

4) Then a sentence mildly discouraging the use of what follows,
   listing obsolete sequences that render well, but are not needed
   at all because equivalent recommended escape sequences exist.
   Examples: \^ \% \| \(at \(mi \(eq \*(Ba

5) Then a sentence strongly discouraging any use of the rest,
   listing those escape sequences that render badly.
   Examples: \(r! \(sr \(ss \('e \(`u \(~n \(:a \(^o \(,c \(oa \(eu
   Obviously, this is by far the largest group.

Yours,
  Ingo

* Re: Raw UTF-8?
From: Ingo Schwarze @ 2010-07-07 18:58 UTC
To: discuss

Hi Anthony,

> When using special characters in manpages,

I consider that a terrible idea.
In a nutshell, such manuals are useless on terminals.

If some piece of information is important, you should really encode
it such that all readers can see it. If it is unimportant, just
leave it out instead of obfuscating it, which will make some people
wonder whether they are missing anything.

We should probably add a warning to discourage people from using
characters needing more than ASCII on output, saying something like
"this manual is not portable and will not display correctly in some
environments".

From my point of view, non-ASCII-output escape sequences are only
supported for backward compatibility with legacy manuals, and
displaying something semi-sensible in their place is done on a
best-effort basis, knowing that it is ultimately unreliable.
Using such escape sequences in new mdoc(7) source code, you would
only show that you don't care about the usability of your manuals.

For the occasional proper name of an author, use transliteration
to ASCII. I consider using non-ASCII-output escape sequences in
there a discourtesy with respect to the author, because then some
people will not be able to read the name.

> I use plain UTF-8 instead of the escapes documented in mandoc_char(7),
> for a couple reasons. I'm just wondering, is this practice
> discouraged in any way?

Yes. Eight-bit characters in roff, man and mdoc source code are syntax
errors, just like they are in C and in any sane programming language.
The current implementation passes them through, but it could as well
throw them away, or abort the parser, subject to change without notice.

> Is there a chance of this _not_ working in future versions of mandoc?

If it works, that is by mere chance, but not portable in any way,
neither between output devices, nor between platforms, nor between
different versions of mandoc.

Yours,
  Ingo

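As an illustration of what a check for eight-bit input could look like, the following Python sketch scans manual source for bytes above 0x7F and reports where they occur. The function name `find_non_ascii` and the report format are assumptions for illustration only; this does not reproduce any actual mandoc diagnostic. The warning text in the usage part is the wording suggested in the message above.

```python
def find_non_ascii(source: bytes):
    """Return (line, column, byte) for every byte above 0x7F.

    Hypothetical helper: lines and columns are 1-based, columns count
    bytes (not characters), so a two-byte UTF-8 sequence yields two hits.
    """
    hits = []
    for lineno, line in enumerate(source.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((lineno, col, byte))
    return hits


if __name__ == "__main__":
    page = b".An J\xc3\xb6rg Sonnenberger\n.An Joerg Sonnenberger\n"
    for lineno, col, byte in find_non_ascii(page):
        print("line %d, column %d: byte 0x%02X: this manual is not portable "
              "and will not display correctly in some environments"
              % (lineno, col, byte))
```

Whether such a hit should be a warning, an error, or silently passed through is exactly the policy question debated in this thread; the sketch only shows the detection step.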
* Re: Raw UTF-8?
From: Joerg Sonnenberger @ 2010-07-07 19:18 UTC
To: discuss

On Wed, Jul 07, 2010 at 08:58:15PM +0200, Ingo Schwarze wrote:
> For the occasional proper name of an author, use transliteration
> to ASCII. I consider using non-ASCII-output escape sequences in
> there a discourtesy with respect to the author, because then some
> people will not be able to read the name.

Actually, I would consider the reverse the correct behavior.
The escape sequences should provide the transliteration depending on the
device capabilities. Consider my name -- I would strongly hope that
output devices with proper Latin1/Latin15/UTF-8 support to use the
diacrit, but fall back to the transliterated version otherwise.

> > I use plain UTF-8 instead of the escapes documented in mandoc_char(7),
> > for a couple reasons. I'm just wondering, is this practice
> > discouraged in any way?
>
> Yes. Eight-Bit characters in roff, man and mdoc source code are syntax
> errors, just like they are in C and in any sane programming language.
> The current implementation passes them through, but it could as well
> throw them away, or abort the parser, subject to change without notice.

You know that C99 just like many other modern languages (dialects)
allow full 8bit input?

The primary problem I have with using 8bit input for mandoc(1) (or groff
in general) is that it doesn't have a way to specify the input character
set. If that is addressed, the discussion would move to the more
interesting point of transliteration.

Joerg

* Re: Raw UTF-8?
From: Ingo Schwarze @ 2010-07-07 21:12 UTC
To: discuss

Hi Joerg,

Joerg Sonnenberger wrote on Wed, Jul 07, 2010 at 09:18:08PM +0200:

> Consider my name -- I would strongly hope that output devices with
> proper Latin1/Latin15/UTF-8 support to use the diacrit, but fall
> back to the transliterated version otherwise.

You hope in vain. Did you try?

Both old and new groff render that as 'J"\borg Sonnenberger',
which looks like "Jorg Sonnenberger" on a typical terminal.

Maybe the reason for using the unreliable backspace-encoding variant
instead of the transliteration "oe" is that more languages than just
German might use the "LATIN SMALL LETTER O WITH DIAERESIS", as Unicode
calls it, and who knows what a good transliteration from those
languages into ASCII might look like?

The point is, for correct results, you must transliterate before
encoding, when you still know the context, e.g. the language,
which is often required to figure out a correct transliteration.

Thus, you should really use

  .An Joerg Sonnenberger

and never

  .An J\(:org Sonnenberger

when documenting your programs.

> You know that C99 just like many other modern languages (dialects)
> allow full 8bit input?

I know that some do, and i have fought with Python code garbled in
that way, and all the more do i call it insane.

> The primary problem I have with using 8bit input for mandoc(1) (or groff
> in general) is that it doesn't have a way to specify the input character
> set. If that is addressed, the discussion would move to the more
> interesting point of transliteration.

In my experience, as soon as you start dealing with character sets,
chaos ensues. WTF has made matters worse, not better, because now
many people think it is OK to scatter crap all over the place.

In typesetting, the mentioned chaos is unfortunately unavoidable,
and you need to deal with it; but most of the time, it is also easier
to handle there because in most typesetting environments, you deal
with one language at a time, and you know beforehand with which one.

Unless we enjoy pain, bloat and code obfuscation *and* want to be
continuously distracted from serious development, we should keep
mandoc as far away from any kind of charset considerations as possible.

Yours,
  Ingo

* Re: Raw UTF-8?
From: Joerg Sonnenberger @ 2010-07-07 21:17 UTC
To: discuss

On Wed, Jul 07, 2010 at 11:12:12PM +0200, Ingo Schwarze wrote:
> Hi Joerg,
>
> Joerg Sonnenberger wrote on Wed, Jul 07, 2010 at 09:18:08PM +0200:
>
> > Consider my name -- I would strongly hope that output devices with
> > proper Latin1/Latin15/UTF-8 support to use the diacrit, but fall
> > back to the transliterated version otherwise.
>
> You hope in vain. Did you try?

Yes. lp(1) on NetBSD is such an example. It does the right thing with
groff. Depending on the output device (-Tlatin1 vs -Tascii), it will
either use the umlaut for \(:o or the oe transliteration.

Joerg

* Re: Raw UTF-8?
From: Ulrich Spörlein @ 2010-07-09 21:05 UTC
To: discuss

On Wed, 07.07.2010 at 23:17:25 +0200, Joerg Sonnenberger wrote:
> On Wed, Jul 07, 2010 at 11:12:12PM +0200, Ingo Schwarze wrote:
> > Hi Joerg,
> >
> > Joerg Sonnenberger wrote on Wed, Jul 07, 2010 at 09:18:08PM +0200:
> >
> > > Consider my name -- I would strongly hope that output devices with
> > > proper Latin1/Latin15/UTF-8 support to use the diacrit, but fall
> > > back to the transliterated version otherwise.
> >
> > You hope in vain. Did you try?
>
> Yes. lp(1) on NetBSD is such an example. It does the right thing with
> groff. Depending on the output device (-Tlatin1 vs -Tascii), it will
> either use the umlaut for \(:o or the oe transliteration.

This also works fine with FreeBSD's groff when rendering to UTF-8 aware
terminals using -Tutf8 (and of course in -Tps and -Thtml mode).

I really hope the sentiment expressed in this thread is in jest, as I
would stop considering mandoc(1) a viable alternative for FreeBSD's man
subsystem if it will never support UTF-8 output (and then render \(:o as
ö like it should).

Uli

* Re: Raw UTF-8?
From: J.C. Roberts @ 2010-07-10 18:11 UTC
To: discuss; +Cc: Ulrich Spörlein

On Fri, 9 Jul 2010 22:05:39 +0100 Ulrich Spörlein <uqs@spoerlein.net>
wrote:
>
> On Wed, 07.07.2010 at 23:17:25 +0200, Joerg Sonnenberger wrote:
> > On Wed, Jul 07, 2010 at 11:12:12PM +0200, Ingo Schwarze wrote:
> > > Hi Joerg,
> > >
> > > Joerg Sonnenberger wrote on Wed, Jul 07, 2010 at 09:18:08PM +0200:
> > >
> > > > Consider my name -- I would strongly hope that output devices
> > > > with proper Latin1/Latin15/UTF-8 support to use the diacrit,
> > > > but fall back to the transliterated version otherwise.
> > >
> > > You hope in vain. Did you try?
> >
> > Yes. lp(1) on NetBSD is such an example. It does the right thing
> > with groff. Depending on the output device (-Tlatin1 vs -Tascii),
> > it will either use the umlaut for \(:o or the oe transliteration.
>
> This also works fine with FreeBSD's groff when rendering to UTF-8
> aware terminals using -Tutf8 (and of course in -Tps and -Thtml mode).
>
> I really hope the sentiment expressed in this thread is in jest, as I
> would stop considering mandoc(1) a viable alternative for FreeBSD's
> man subsystem if it will never support UTF-8 output (and then render
> \(:o as ö like it should).
>
> Uli

I doubt Ingo was joking and I do understand his concerns, but I agree
UTF-8 support is very important and many consider it a "requirement"
these days.

Personally, I think the \(:o syntax is nonsense. It's an ancient and sad
work-around addressing only one of the countless transliterations and/or
translations needed for a complete solution. If we tried to create 7-bit
strings like this for every possible transliteration and/or translation
of every non-ascii character, the list would be absolutely humongous and
computationally intractable as well as still being incomplete and often
totally inaccurate. UTF-8 sucks less.

Since we now have UTF, it seems better to error out on the archaic \(:o
syntax to prompt change, rather than support it to prolong a bad idea
and yet another syntax everyone needs to learn.

More importantly, the real problem is the *idea* of automating
transliteration. If you think it through, you'll realize automated
transliteration cannot be completely solved. A complete solution would
require an accurate transliteration, or even translation, to ascii of
every non-ascii character, as well as doing so correctly for every
possible language/usage/context. In essence, you are asking for perfect
automated translation even when perfect manual translation can be
impossible in some situations.

Given the need to support ascii-only terminals/outputs, and given the
need to support non-ascii characters, and given fully automated
transliteration/translation is currently impossible, at first glance it
seems there is an irreconcilable conflict. Luckily, we can look at it
again. And there is a way to resolve it.

Since we cannot solve the problem of automated transliteration (and
hence, automated translation) for all cases, the idea itself is flawed.
The best thing to do is change the problem we're trying to solve.
Instead of trying to automate the transliteration/translation of
non-ascii characters, we can impose a simple requirement.

The most simple answer would be to allow non-ascii if and only if an
ascii equivalent is provided, otherwise error. This puts both the option
to use non-ascii characters as well as the responsibility of correct
transliteration/translation in the hands of the author.

I don't mean to pick on Joerg, but names are excellent examples as well
as one of the most compelling reasons to have proper support for
non-ascii characters. A format something like:

  {ascii, utf-8}

such as:

  J{oe, \u00F6;}rg
or
  {Joerg, J\u00F6rg}
or
  {Joerg Sonnenberger, J\u00F6rg Sonnenberger}

Ummm... no, the above is just an example, not a suggestion of syntax.
There's probably an existing IF-THEN-ELSE which could be leveraged
without undue overhead, but you get the main idea... --make the author
provide both the THEN and the ELSE.

In many situations, even when a terminal is capable of displaying the
UTF-8, it could still be beneficial to also display the ascii, possibly
in parentheses. There are plenty of idiots like me who do not know how
to pronounce or even type an "o" with a diacritic, so showing the ascii
transliterated/translated version really does help. If you saw a formal
name in Japanese, Arabic, Thai, Russian, or any language you don't know,
written in its native character set, could you pronounce or type it?

Worse yet, when it comes to the ascii-fication of non-ascii names of
people, there are tons of variations and different people have different
preferences, so the result is there is no "right" way to do it and the
best practice is to avoid offense by requiring a transcription or
translation from the person. And then there are the people who really
want to be unique (like everyone else) and intentionally (mis?)spell
their name in their own words...

http://pichaus.com/wattoom-zink-pubbawup-gazork-@911d3fd1ffb157c0e23066faca4cf751/

The state of California could not write out my full name correctly on my
driver's license, and the US Federal Government could not write out my
full name correctly on my passport, but at least the latter sent me a
nice "Sorry for the inconvenience" letter. Though I personally learned
not to care about it at an early age, most people are offended if you
get their name wrong.

Throwing an error if an ascii equivalent is not provided is fairly
harsh, but it is necessary to prevent hidden information on ascii-only
terminals/outputs and also to prevent offending anyone (by either
omission in ascii, or by misspelling the ascii equivalent).

jcr

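The "allow non-ascii if and only if an ascii equivalent is provided, otherwise error" rule from the message above can be sketched in a few lines of Python. Everything here is hypothetical: the names `render_pair` and `MissingAsciiFallback`, and the boolean `utf8_capable` standing in for whatever the output device reports. It is an illustration of the rule, not a proposal for mdoc(7) syntax.

```python
from typing import Optional


class MissingAsciiFallback(ValueError):
    """Raised when non-ASCII text comes without an author-supplied ASCII form."""


def render_pair(ascii_form: Optional[str], native_form: str,
                utf8_capable: bool) -> str:
    """Pick the native or the ASCII spelling, never inventing either.

    Hypothetical rule: non-ASCII text is accepted only together with an
    ASCII equivalent written by the author; no automatic transliteration.
    """
    if native_form.isascii():
        return native_form  # nothing to fall back from
    if ascii_form is None or not ascii_form.isascii():
        raise MissingAsciiFallback(
            "non-ASCII text %r needs an ASCII equivalent" % native_form)
    return native_form if utf8_capable else ascii_form


if __name__ == "__main__":
    print(render_pair("Joerg Sonnenberger", "J\u00f6rg Sonnenberger", False))
```

The design choice the sketch makes visible is that the transliteration decision stays with the author, who knows the language and the person's preference; the renderer only selects between two spellings it was given.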
* Re: Raw UTF-8?
From: Ingo Schwarze @ 2010-07-11 22:17 UTC
To: discuss

Hi Jonathan,

J.C. Roberts wrote on Sat, Jul 10, 2010 at 11:11:18AM -0700:

> UTF-8 support is very important and many consider it a "requirement"
> these days.

Not for manual pages, i just don't see the point.

That said, i wouldn't oppose a -Tlatin1 or -Tutf8 output mode
merely for groff compatibility, as long as it is not too intrusive
(which probably it need not be, implementation will be quite local
in one corner of the terminal output frontend).

But i have no plans to implement, use or maintain it, and i would
test it only in so far as it must not break anything else.
Also, i would continue urging people to not use it, as manual pages
relying on it would sacrifice portability for no good reason.

And, of course, i will strongly oppose 8-bit-character input.
I'm not willing to deal with multi-byte or wide character support
functions anywhere in mandoc's code.

> Personally, I think the \(:o syntax is nonsense.

Agreed, the reason being that there is no reliable way to render it.
Again, i consider it provided for backward compatibility.

> Since we now have UTF, it seems better to error out on the archaic \(:o
> syntax to prompt change, rather than support it to prolong a bad idea
> and yet another syntax everyone needs to learn.

No, it is used in too many places, and it would not be nice to
deny rendering just because some piece of mdoc(7) or man(7) source
code contains syntax we don't like. We are not defining new
standards right now, we are re-implementing an existing language.

> There's probably an existing IF-THEN-ELSE which could be leveraged
> without undue overhead, but you get the main idea... --make the author
> provide both the THEN and the ELSE.

You mean, like in

  http://www.openbsd.org/cgi-bin/cvsweb/src/share/man/man4/sppp.4#rev1.20

I don't consider that viable, and jmc and myself agree to remove
it from the tree when we find it (unless it comes from upstream).

I doubt that people will regularly provide alternatives.
Typing in special characters at all is already tedious,
providing alternatives will not get done. And even if it is done,
the result is incredibly ugly and hardly maintainable.

> In many situations, even when a terminal is capable of displaying the
> UTF-8, it could still be beneficial to also display the ascii, possibly
> in parenthesis.

In my opinion, you are vastly overestimating the importance of special
characters in manual pages (beware, i'm not talking about typesetting
of mathematical papers here!) and you are vastly underestimating the
importance of portability and simplicity.

Yours,
  Ingo

* Re: Raw UTF-8?
From: Kristaps Dzonsons @ 2010-07-11 22:38 UTC
To: discuss

> This also works fine with FreeBSD's groff when rendering to UTF-8 aware
> terminals using -Tutf8 (and of course in -Tps and -Thtml mode).
>
> I really hope the sentiment expressed in this thread is in jest, as I
> would stop considering mandoc(1) a viable alternative for FreeBSD's man
> subsystem if it will never support UTF-8 output (and then render \(:o as
> ö like it should).

I think there's a little confusion here. I see Ingo just wrote and
answered most questions. Well, no point in wasting a response...

The state of affairs follows:

 - mandoc/groff accept and understand ASCII input
 - mandoc/groff [sometimes] accept but DO NOT understand non-ASCII input

That UTF-8 input renders on your screen is coincidence: you happen to
have a UTF-8 terminal and groff hasn't puked on the characters. You
implicitly assume your readers' mediums have the same capabilities.

Now for the \[foo] syntax. First, it exists. Second, it covers most
European characters. Is it general? No. Why let it stay? Because it
lets \(:u be both "u" (my terminal) and ü (e.g. www output). If you
don't use the \[foo] escapes, you're screwing readers. Yes, we're
screwing non-western-European manual writers ("me") already, but this is
not a problem we need to solve right now.

Now for output and The Good Stuff.

-Tutf8 is not hard. I think I can manage this in coming releases
without any negative effects. In fact, it will cut the binary size, as
I'd key special chars as integers and rewrite them on the fly into
UTF-8, Latin-1, or whatever, for all outputs.

Thanks,

Kristaps

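The idea of keying special characters as integers and rewriting them per output can be sketched as follows, in Python rather than mandoc's C. The table contents, the names `SPECIAL`, `ASCII_FALLBACK` and `encode_special`, and the output names are all assumptions chosen for illustration; they are not taken from mandoc's chars.in, and the ASCII fallbacks shown are only plausible examples.

```python
# Hypothetical table: escape name -> Unicode code point (an integer).
SPECIAL = {
    ":o": 0x00F6,  # LATIN SMALL LETTER O WITH DIAERESIS
    ":u": 0x00FC,  # LATIN SMALL LETTER U WITH DIAERESIS
    "em": 0x2014,  # EM DASH, not representable in Latin-1
}

# Hypothetical ASCII renderings used when the output cannot show the glyph.
ASCII_FALLBACK = {":o": "o", ":u": "u", "em": "--"}


def encode_special(name: str, output: str) -> bytes:
    """Encode one special character for the given output on the fly."""
    cp = SPECIAL[name]
    if output == "utf8":
        return chr(cp).encode("utf-8")
    if output == "latin1" and cp <= 0xFF:
        return bytes([cp])  # Latin-1 is the first 256 code points
    if output in ("latin1", "ascii"):
        return ASCII_FALLBACK[name].encode("ascii")
    raise ValueError("unknown output: %r" % output)


if __name__ == "__main__":
    for out in ("ascii", "latin1", "utf8"):
        print(out, encode_special(":u", out))
```

One table of integers serves every output; only the last encoding step differs, which is presumably where the size saving mentioned above would come from.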
* Re: Raw UTF-8?
From: Ulrich Spörlein @ 2010-07-13 19:23 UTC
To: discuss

On Mon, 12.07.2010 at 00:38:33 +0200, Kristaps Džonsons wrote:
> > This also works fine with FreeBSD's groff when rendering to UTF-8 aware
> > terminals using -Tutf8 (and of course in -Tps and -Thtml mode).
> >
> > I really hope the sentiment expressed in this thread is in jest, as I
> > would stop considering mandoc(1) a viable alternative for FreeBSD's man
> > subsystem if it will never support UTF-8 output (and then render \(:o as
> > ö like it should).
>
> I think there's a little confusion here. I see Ingo just wrote and
> answered most questions. Well, no point in wasting a response...
>
> The state of affairs follows:
>
>  - mandoc/groff accept and understand ASCII input
>  - mandoc/groff [sometimes] accept but DO NOT understand non-ASCII input
>
> That UTF-8 input renders on your screen is coincidence: you happen to
> have a UTF-8 terminal and groff hasn't puked on the characters. You
> implicitly assume your readers' mediums have the same capabilities.
>
> Now for the \[foo] syntax. First, it exists. Second, it covers most
> European characters. Is it general? No. Why let it stay? Because it
> lets \(:u be both "u" (my terminal) and ü (e.g. www output). If you
> don't use the \[foo] escapes, you're screwing readers. Yes, we're
> screwing non-western-European manual writers ("me") already, but this is
> not a problem we need to solve right now.

I completely agree here, there's nothing fancy we could or should do
regarding input.

> Now for output and The Good Stuff.
>
> -Tutf8 is not hard. I think I can manage this in coming releases
> without any negative effects. In fact, it will cut the binary size, as
> I'd key special chars as integers and rewrite them on the fly into
> UTF-8, Latin-1, or whatever, for all outputs.

Sounds great, do you also plan on adding "special chars" support to -Tps
(mostly for latin1 accents and umlauts)?

Regards,
Uli

* Re: Raw UTF-8?
From: Kristaps Dzonsons @ 2010-07-13 23:25 UTC
To: discuss

>>> This also works fine with FreeBSD's groff when rendering to UTF-8 aware
>>> terminals using -Tutf8 (and of course in -Tps and -Thtml mode).
>>>
>>> I really hope the sentiment expressed in this thread is in jest, as I
>>> would stop considering mandoc(1) a viable alternative for FreeBSD's man
>>> subsystem if it will never support UTF-8 output (and then render \(:o as
>>> ö like it should).
>> I think there's a little confusion here. I see Ingo just wrote and
>> answered most questions. Well, no point in wasting a response...
>>
>> The state of affairs follows:
>>
>>  - mandoc/groff accept and understand ASCII input
>>  - mandoc/groff [sometimes] accept but DO NOT understand non-ASCII input
>>
>> That UTF-8 input renders on your screen is coincidence: you happen to
>> have a UTF-8 terminal and groff hasn't puked on the characters. You
>> implicitly assume your readers' mediums have the same capabilities.
>>
>> Now for the \[foo] syntax. First, it exists. Second, it covers most
>> European characters. Is it general? No. Why let it stay? Because it
>> lets \(:u be both "u" (my terminal) and ü (e.g. www output). If you
>> don't use the \[foo] escapes, you're screwing readers. Yes, we're
>> screwing non-western-European manual writers ("me") already, but this is
>> not a problem we need to solve right now.
>
> I completely agree here, there's nothing fancy we could or should do
> regarding input.

Yes. Note that the problem space lies entirely within -Tps, which for
now has hard-coded glyph widths.

>
>> Now for output and The Good Stuff.
>>
>> -Tutf8 is not hard. I think I can manage this in coming releases
>> without any negative effects. In fact, it will cut the binary size, as
>> I'd key special chars as integers and rewrite them on the fly into
>> UTF-8, Latin-1, or whatever, for all outputs.
>
> Sounds great, do you also plan on adding "special chars" support to -Tps
> (mostly for latin1 accents and umlauts)?

Yes. I want to roll it into the next release along with the chars.in
upgrade.

Thanks,

Kristaps

end of thread, other threads:[~2010-07-13 23:24 UTC | newest]

Thread overview: 14+ messages
2010-07-07  3:13 Raw UTF-8? Anthony J. Bentley
2010-07-07  9:33 ` Kristaps Dzonsons
2010-07-07 14:39   ` Anthony J. Bentley
2010-07-07 20:13     ` Ingo Schwarze
2010-07-07 18:58 ` Ingo Schwarze
2010-07-07 19:18   ` Joerg Sonnenberger
2010-07-07 21:12     ` Ingo Schwarze
2010-07-07 21:17       ` Joerg Sonnenberger
2010-07-09 21:05         ` Ulrich Spörlein
2010-07-10 18:11           ` J.C. Roberts
2010-07-11 22:17             ` Ingo Schwarze
2010-07-11 22:38           ` Kristaps Dzonsons
2010-07-13 19:23             ` Ulrich Spörlein
2010-07-13 23:25               ` Kristaps Dzonsons