* (Con)TeX(t), Unicode and accented characters
From: Mojca Miklavec @ 2004-12-20 20:02 UTC

Here's a short version of my question:

How do I enable Unicode-encoded characters (just normal accented Latin characters) to be typeset (in any font) in ConTeXt, like \usepackage[utf8]{inputenc} does in LaTeX?

And here is the long one:

************************************************************************

I don't really understand how accented characters are typeset in (Con)TeX(t). One of the main reasons for switching to LaTeX (maybe 8 years ago) that someone mentioned was: "You don't have to worry about accented characters. You can make any accented character and it will work all over the world." (We actually did have lots of problems with MS Word and web browsers at that time.) And it was true.

But when I switched to ConTeXt I came up against that problem again.

In LaTeX I used

\v{c}\v{s}\v{z}

at first, later

\usepackage{csz} ... "c"s"z

(which works pretty much the same as "a"o"u in German) and finally (when someone told me about that possibility)

\usepackage[utf8]{inputenc} ... čšž

As I didn't know how to use any other font, I always used CMR, the default, so I didn't have problems with exotic fonts either.

************************************************************************

But here we come to ConTeXt. For the German "Umlaut", \"{a}\"{o}\"{u} (äöü), this was satisfactory:

\useencoding[windows-1250]
\mainlanguage[de]

For \v{c}\v{s}\v{z} (čšž) this wasn't the case, so another ConTeXt user proposed this solution:

% output=pdf -translate-file=cp1250cs
\setupbodyfont
  [csr,ams,rm]

What I don't really understand: why did the Czech TUG have to design *their own font*, csr (or make changes to cmr), if accented characters already worked perfectly in plain TeX?

The second problem: this works under Windows when typesetting in code page 1250. How can I use accented characters if the text is typeset in Unicode (or latin2) under Linux?

The third problem: how do I typeset '\v{c}' in some other font? I do understand that it may not work in just any font, since someone has to tell the computer how the accented characters are built, but as long as \v{c} works, there's no reason why

\useencoding[utf8]

followed by Unicode-encoded characters shouldn't produce the desired result.

Thank you,
Mojca
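
To make the second problem concrete: the same three letters čšž are stored as different bytes depending on which input encoding the editor saves in, so whatever TeX is told about the input has to match the editor. The sketch below uses only Python's standard `cp1250`, `iso8859_2` (latin2) and `utf-8` codecs; it illustrates the bytes in the file and says nothing about how ConTeXt itself reads them.

```python
# Bytes on disk for "čšž" under the three input encodings mentioned in this thread.
# Illustration only, using Python's standard codecs; this is not ConTeXt code.
TEXT = "čšž"
CODECS = ("cp1250", "iso8859_2", "utf-8")

encoded = {codec: TEXT.encode(codec) for codec in CODECS}
for codec, raw in encoded.items():
    print(codec, raw.hex(" "))
# cp1250 e8 9a 9e
# iso8859_2 e8 b9 be
# utf-8 c4 8d c5 a1 c5 be

# Declaring the wrong input encoding garbles the text: latin2 bytes read as cp1250.
misread = encoded["iso8859_2"].decode("cp1250")
print(misread)  # čąľ
```

Only č happens to share a byte value in the two code pages; š and ž do not, and UTF-8 needs two bytes per letter.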
* Re: (Con)TeX(t), Unicode and accented characters
From: Hans Hagen @ 2004-12-20 20:52 UTC

Mojca Miklavec wrote:
> But when I switched to ConTeXt I came up against that problem again.
>
> In LaTeX I used
> \v{c}\v{s}\v{z}

this also works in context

> at first, later
> \usepackage{csz} ... "c"s"z

in this case, i assume that csz makes " active and such; if you really want that, we should make an enco-fcz, with definitions like:

\startlanguagespecifics[cz]
  \appendtoks
    \makecharacteractive "
  \to \everynormalcatcodes
  \installcompoundcharacter "c {\v{c}}
  \installcompoundcharacter "s {\v{s}}
  \installcompoundcharacter "z {\v{z}}
\stoplanguagespecifics

and alike; if you want utf, you should say (at the top of the file)

\enableregime[utf]

> As I didn't know how to use any other font, I always used CMR, the
> default, so I didn't have problems with exotic fonts either.

this should work with all fonts, since there are fallback definitions

> % output=pdf -translate-file=cp1250cs
> \setupbodyfont
>   [csr,ams,rm]

try to avoid code pages

> What I don't really understand: why did the Czech TUG have to design
> *their own font*, csr, (or made changes to cmr) if accented characters
> worked perfectly already in plain TeX?

in cmr \v{s} is actually two characters, while in csr it's one (composed) character (built from two characters but seen as one); therefore when you use csr fonts, you can get proper hyphenation (which is not the case in cmr, where the use of the \accent primitive spoils the game); next year, when i can assume that the new latin modern fonts are available everywhere, i will drop cmr cum suis as the default in favor of lmr (which has cmr, plr, csr, vnr, aer etc included)

> The second problem: This works under Windows when typesetting in code
> page 1250. How can I use accented characters if text is typeset in
> Unicode (or latin2) in Linux?

you probably need to configure your editor to use utf

> The third problem: How do I typeset '\v{c}' in some other font? I do
> understand that it may not function in just any font since someone has
> to tell the computer how the accented characters are built, but as long
> as \v{c} works, there's no reason for
> \useencoding[utf8]
> and then continuing with unicode encoded characters not to produce the
> desired result.

don't worry, other fonts work ok; if an encoding does not support the chars you need, a composed char is constructed; [font encodings have nothing to do with input encoding, but they do influence hyphenation]

if i'm right, ec, texnansi, and qx encoding all serve your purpose

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
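
Hans's point about cmr versus csr has a rough analogue in Unicode, where š can be stored either as one precomposed code point or as s followed by a combining caron. The toy sketch below is only an analogy (TeX's \accent primitive and its hyphenation patterns work differently), and `letter_runs` is a hypothetical helper, not part of any TeX tool: a scanner that treats only letters as word characters sees the precomposed word whole, but splits the decomposed one into fragments, roughly the way \accent spoils the word that hyphenation gets to see.

```python
import unicodedata


def letter_runs(text: str) -> list[str]:
    """Hypothetical helper: split text into maximal runs of alphabetic code points."""
    runs: list[str] = []
    current = ""
    for ch in text:
        if ch.isalpha():
            current += ch
        elif current:
            # A non-letter (here: a combining caron) ends the current run.
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


word = "češnja"  # Slovenian for "cherry"
precomposed = unicodedata.normalize("NFC", word)  # č and š as single code points
decomposed = unicodedata.normalize("NFD", word)   # c + U+030C, s + U+030C

print(len(precomposed), len(decomposed))  # 6 8
print(letter_runs(precomposed))  # ['češnja']
print(letter_runs(decomposed))   # ['c', 'es', 'nja']
```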
* Re: (Con)TeX(t), Unicode and accented characters
From: Mojca Miklavec @ 2004-12-20 21:35 UTC

Hans Hagen wrote:
> and alike; if you want utf, you should say (at the top of the file)
>
> \enableregime[utf]

Thanks for the many other pieces of advice as well, but especially for this one: I had probably already tried this out. Well, almost ;). Neither \enableregime[utf8] nor \enableregime[utf-8] gave the desired output. (I was used to always writing '8' after utf, since utf-16 and some others exist as well.)

Thank you,
Mojca
* Re: (Con)TeX(t), Unicode and accented characters
From: VnPenguin @ 2004-12-20 22:16 UTC

On Mon, 20 Dec 2004 21:52:17 +0100, Hans Hagen <pragma@wxs.nl> wrote:
> Mojca Miklavec wrote:
[...]
> > The second problem: This works under Windows when typesetting in code
> > page 1250. How can I use accented characters if text is typeset in
> > Unicode (or latin2) in Linux?
>
> you probably need to configure your editor to use utf

Under Linux I use vim/gvim, gedit, and gtk2edit for editing Vietnamese text in UTF-8 without any problem :)
* Re: (Con)TeX(t), Unicode and accented characters
From: r.ermers @ 2004-12-21 7:56 UTC

Mojca,

In reply to your question:

> I don't really understand how accented characters are typeset in
> (Con)TeX(t). One of the main reasons for switching to LaTeX (maybe 8
> years ago) someone mentioned was: "You don't have to worry about accented
> characters. You can make any accented character and it will work all over
> the world." (We actually did have lots of problems with MS Word and web
> browsers at that time.) And it was true.

You know that all characters in a font have a number. If you type a, the font mechanism makes sure that you see an a. In reality the font shows you the character that is placed at the numerical position of a. In the Dingbats font, for example, the character at that position is not an a, but a symbol.

In LaTeX the combination \"{a} can mean two things:

1. in most fonts: show the character at a given numerical position, which means that there is one character ä.

2. in some other fonts \"{a} means: combine " with a and make an ä. This means that " is combined with the character at the numerical position of a. TeX does this very well and thus constructs very acceptable diacritics like \"{q}, \d{o}, \v{o}, which do not exist in regular fonts.

If you have a font which contains \"{q}, \d{o} or some other special characters, you may instruct TeX not to construct the character, but rather to show the contents of a given numerical position in that font. That's what the .enc and .fd files under LaTeX are for. That's also the reason there are, or used to be, special fonts for Polish and Czech and other languages: they contain predefined characters in one single numerical position, e.g. \v{s} and \v{c}, that TeX does not have to create anew from two signs.

Kind regards,

Robert
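
Robert's two cases can be modelled in a few lines. In this toy model (not TeX's actual font machinery), a font encoding is a table from glyph names to slot numbers; the table names, glyph names, and slot numbers are made up for illustration and are not taken from real .enc or .tfm files. If the encoding has a slot for the precomposed character, one slot is used (case 1); otherwise the character is built from an accent slot and a base slot, roughly what the \accent primitive does (case 2).

```python
# Toy model of Robert's two cases. Slot numbers and table names are hypothetical,
# chosen for illustration only; they are not taken from real .enc or .tfm files.
PLAIN_FONT = {"c": 99, "s": 115, "caron": 20}       # no precomposed glyphs
CZECH_FONT = {"c": 99, "s": 115, "caron": 20,
              "ccaron": 232, "scaron": 185}         # precomposed glyphs present

# How each accented character decomposes into (accent, base).
COMPOSITION = {"ccaron": ("caron", "c"), "scaron": ("caron", "s")}


def slots_for(glyph: str, font: dict[str, int]) -> list[int]:
    """Return the font slots used to show one character."""
    if glyph in font:
        # Case 1: the font has this character at one numerical position.
        return [font[glyph]]
    # Case 2: build it from two glyphs, accent first and then base,
    # mirroring the argument order of TeX's \accent primitive.
    accent, base = COMPOSITION[glyph]
    return [font[accent], font[base]]


print(slots_for("scaron", CZECH_FONT))  # [185]
print(slots_for("scaron", PLAIN_FONT))  # [20, 115]
```

The one-slot result is what lets hyphenation treat š as an ordinary letter; the two-slot result is the composed fallback.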
* Re: (Con)TeX(t), Unicode and accented characters
From: Adam Lindsay @ 2004-12-21 10:01 UTC

r.ermers@hccnet.nl said this at Tue, 21 Dec 2004 08:56:40 +0100:

>In LaTeX the combination \"{a} can mean two things:
>1. in most fonts: show the character at a given numerical position,
>which means that there is one character ä.
>
>2. in some other fonts \"{a} means: combine " with a and make an ä. This
>means that " is combined with the character at the numerical position of
>a. TeX does this very well and thus constructs very acceptable diacritics
>like \"{q}, \d{o}, \v{o}, which do not exist in regular fonts.

Robert,

That's a helpful explanation. I'll try to expand on it for the ConTeXt case, in case people are curious or are led into thinking it's just the same:

In ConTeXt, the combination \"{a} means one thing: \adiaeresis (see enco-acc). This \adiaeresis can mean one of two things, depending on the encoding:

1. a numerical position, or
2. the fallback case (defined in enco-def), where a diaeresis/umlaut is placed atop an 'a' glyph. The hyphenation implications are as Hans described.

The interesting/helpful thing about ConTeXt is that internally that glyph is given a consistent name, no matter how it is input or output. So, if you type ä in your given input regime, and that encoding is properly set, that numerical ä (e.g., character #228 in the windows regime) is mapped to \adiaeresis.

Wanna know what happens in UTF-8? Here's my 'simplified' explanation:

In a UTF-8 byte stream, the character "ä" is represented by two bytes: 0xC3, 0xA4. The first byte triggers a conversion of both bytes into two different bytes, the actual Unicode number, 0x00 0xE4 (or: 0, 228). ConTeXt then looks into the internal hashes set up for this (in this case, the unic-000 vector), looks at the 228th element, and sees that it's \adiaeresis. Things then proceed as normal.
:)

(It's also interesting to note that for PostScript and TrueType fonts, that number > name > number (glyph) mapping happens yet again in the driver. But all that is outside of TeX proper, so to say any more would be confusing.)

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Adam T. Lindsay, Computing Dept.     atl@comp.lancs.ac.uk
Lancaster University, InfoLab21        +44(0)1524/510.514
Lancaster, LA1 4WA, UK             Fax:+44(0)1524/510.492
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
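
Adam's UTF-8 walk-through can be checked with a short Python sketch. It decodes the two-byte sequence 0xC3 0xA4 by hand, splits the resulting code point into a high byte (which vector to consult) and a low byte (the position in it), and looks that position up in a tiny table. The `UNIC_000` dictionary is a hypothetical stand-in for ConTeXt's unic-000 vector, which is written in TeX and covers far more characters; only the bit arithmetic reflects how UTF-8 itself is defined.

```python
# Hand-decode a two-byte UTF-8 sequence, following Adam's walk-through.
# UNIC_000 is a hypothetical stand-in for ConTeXt's unic-000 vector.
UNIC_000 = {228: "adiaeresis", 246: "odiaeresis", 252: "udiaeresis"}


def decode_two_byte_utf8(lead: int, cont: int) -> int:
    """Combine a 110xxxxx lead byte and a 10yyyyyy continuation byte."""
    if (lead & 0b1110_0000) != 0b1100_0000:
        raise ValueError("not a two-byte UTF-8 lead byte")
    if (cont & 0b1100_0000) != 0b1000_0000:
        raise ValueError("not a UTF-8 continuation byte")
    return ((lead & 0b0001_1111) << 6) | (cont & 0b0011_1111)


code_point = decode_two_byte_utf8(0xC3, 0xA4)
vector, position = code_point >> 8, code_point & 0xFF  # Adam's "0x00 0xE4"
print(hex(code_point), vector, position)  # 0xe4 0 228
print(UNIC_000[position])                 # adiaeresis

# Cross-check against Python's own UTF-8 decoder.
assert bytes([0xC3, 0xA4]).decode("utf-8") == chr(code_point) == "ä"
```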