ntg-context - mailing list for ConTeXt users
* i18n in ConTeXt (& some bugs)
From: Adam Lindsay @ 2004-10-21  7:15 UTC (permalink / raw)


Hi Hans, Victor, all.

I've been feeling the Unicode love with XeTeX, and I explored lang-* as a
major test case for internationalised text. I uncovered a few bugs along
the way. 

In OpenType fonts, \i doesn't compose well with accents, unlike normal
TeX fonts. Therefore a couple more definitions (at least) are needed in
enco-acc:
\defineaccent ' {\i} {\iacute}
\defineaccent ` {\i} {\igrave} % etc.
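
A minimal sketch for checking these definitions, assuming a ConTeXt run
on XeTeX with an OpenType font active and \iacute / \igrave provided by
the current encoding (the test file itself is hypothetical, not part of
the distribution):

```tex
% Hypothetical test file, not from the ConTeXt distribution.
\starttext
% Each accent command takes the dotless i as its argument. With the
% \defineaccent lines above in place, these should resolve to \iacute
% and \igrave instead of stacking an accent on top of \i.
\'\i\ and \`\i
\stoptext
```

With the definitions loaded, both letters should appear as precomposed
glyphs with correctly placed accents.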

I had the toughest time getting strings like \v!january to switch
language. I found that this change (from \currentmainlanguage) fixed the
inability to \ShowLanguageValues. In lang-lab:
\def\labellanguage{\defaultlanguage\currentlanguage}
\def\headlanguage {\defaultlanguage\currentlanguage}

Another bug in visualisation was no doubt brought on by the switch to
low-level English. In s-mod-00:
  \VL \ShowLabelText \subsection           \VL\MR
  \VL \ShowLabelText \subsubsection        \VL\MR
... changes to...
  \VL \ShowLabelText \v!subsection           \VL\MR
  \VL \ShowLabelText \v!subsubsection        \VL\MR


With that done, I noticed a few smaller bugs in the lang-* files. I am
not an expert in any of these languages, so all of my recommendations are
subject to verification by people who know what they're doing!

In lang-grk, a copy-paste error was propagated: 
 \s!fi => \s!gr

In lang-ura, the Hungarian word for abbreviations has a typo:
  R\"ovid\'\it\'esek => R\"ovid\'it\'esek (or R\"ovid\'\i t\'esek)

In lang-sla, the Polish word for part has an \ecedilla, which I'm
guessing should be an \eogonek...
  Ust\c{e}p => Ust\k{e}p  (probably!)

In lang-ita, the Catalan word for March is spelled with a \,. That's
probably a cedilla:
  mar\,c => mar\c{c}

Finally, I've done some work filling out enco-uc with characters used in
the other encodings and (especially) characters used in the lang-* files.
It's attached. I have a couple questions that remain for experts:

Greek: I defined \Greekleftquot as a guillemet, as a guess. Is that right?

Cyrillic: The Unicode codepoint for \cyrilicii completely baffled me for
a while. (I don't have a copy of the t2a/b fonts here with me to take a
look for myself!) According to Unicode, it should be the same as
\cyrillici, but that can't be right. I'm now guessing that since it
appears the same as the roman 'i' in enco-cyr, it corresponds with
U+0456: CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I. Is that right?
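
To compare the two candidates side by side, here is a small sketch. It
assumes XeTeX (where \char accepts code points above 255) and a current
font that actually contains Cyrillic glyphs; neither is guaranteed by a
default setup:

```tex
% Sketch only: requires XeTeX and a font with Cyrillic coverage.
\starttext
U+0438: \char"0438 \par % CYRILLIC SMALL LETTER I
U+0456: \char"0456 \par % CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
\stoptext
```

The second line should show the glyph that looks like a roman 'i'.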

The additions to enco-uc are attached. I would be ever so grateful if
they found their way into the distribution!

By the way, although these seem like complaints, I must say (again) that
the plumbing supporting arbitrary encodings, accents, and input regimes
in ConTeXt is absolutely fantastic. Making XeTeX work with ConTeXt is
really quite trivial compared to the efforts being expended on the
XeLaTeX side!

Best,
adam
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Adam T. Lindsay, Computing Dept.     atl@comp.lancs.ac.uk
 Lancaster University, InfoLab21        +44(0)1524/510.514
 Lancaster, LA1 4WA, UK             Fax:+44(0)1524/510.492
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

[-- Attachment #2: enco-uc-new.tex --]
[-- Type: application/x-tex, Size: 12222 bytes --]


* Re: i18n in ConTeXt (& some bugs)
From: Tobias Burnus @ 2004-10-21 18:03 UTC (permalink / raw)


Hello,

Adam Lindsay wrote:

>Greek: I defined \Greekleftquot as a guillemet, as a guess. Is that right?
Looking at the Oxford Guide to Style, I find
"Use double quotation marks, or in modern Greek guillemets; but in 
ancient Greek some scholars may dispense with quotation marks, perhaps 
using an initial capital instead."
Since you are probably talking of modern Greek, I'd use  «guillemets».

Regards,

Tobias

* Re: i18n in ConTeXt (& some bugs)
From: Adam Lindsay @ 2004-10-21 18:31 UTC (permalink / raw)


Adam Lindsay said this at Thu, 21 Oct 2004 08:15:14 +0100:

>I had the toughest time getting strings like \v!january to switch
>language. I found that this change (from \currentmainlanguage) fixed the
>inability to \ShowLanguageValues. In lang-lab:
>\def\labellanguage{\defaultlanguage\currentlanguage}
>\def\headlanguage {\defaultlanguage\currentlanguage}

On second thought, those probably aren't changes for lang-lab, but rather
to be redefined in s-mod-00 or something...


* Re: i18n in ConTeXt (& some bugs)
From: Hans Hagen @ 2004-10-22  9:55 UTC (permalink / raw)
  Cc: Victor Figurnov

Adam Lindsay wrote:

> In OpenType fonts, \i doesn't compose well with accents, unlike normal
> TeX fonts. Therefore a couple more definitions (at least) are needed in
> enco-acc:
> \defineaccent ' {\i} {\iacute}
> \defineaccent ` {\i} {\igrave} % etc.

ok, added

> I had the toughest time getting strings like \v!january to switch
> language. I found that this change (from \currentmainlanguage) fixed the
> inability to \ShowLanguageValues. In lang-lab:
> \def\labellanguage{\defaultlanguage\currentlanguage}
> \def\headlanguage {\defaultlanguage\currentlanguage}

dangerous and wrong; better:

\gdef\ShowHeadText#1{\tttf#1\VL\mainlanguage[\currentlanguage]\headtext{#1}\VisualizeLastSpace}
\gdef\ShowLabelText#1{\tttf#1\VL\mainlanguage[\currentlanguage]\labeltext{#1}\VisualizeLastSpace}

> Another bug in visualisation was no doubt brought on by the switch to
> low-level English. In s-mod-00:
>   \VL \ShowLabelText \subsection           \VL\MR
>   \VL \ShowLabelText \subsubsection        \VL\MR
> ... changes to...
>   \VL \ShowLabelText \v!subsection           \VL\MR
>   \VL \ShowLabelText \v!subsubsection        \VL\MR

corrected

> With that done, I noticed a few smaller bugs in the lang-* files. I am
> not an expert in any of these languages, so all of my recommendations are
> subject to verification by people who know what they're doing!
> 
> In lang-grk, a copy-paste error was propagated: 
>  \s!fi => \s!gr

corrected

> In lang-ura, the Hungarian word for abbreviations has a typo:
>   R\"ovid\'\it\'esek => R\"ovid\'it\'esek (or R\"ovid\'\i t\'esek)

corrected

> In lang-sla, the Polish word for part has an \ecedilla, which I'm
> guessing should be an \eogonek...
>   Ust\c{e}p => Ust\k{e}p  (probably!)

corrected

(some day i'll change all these into \namedglyphs)

> In lang-ita, the Catalan word for March is spelled with a \,. That's
> probably a cedilla:
>   mar\,c => mar\c{c}

ok

> Finally, I've done some work filling out enco-uc with characters used in
> the other encodings and (especially) characters used in the lang-* files.
> It's attached. I have a couple questions that remain for experts:

indeed

> Greek: I defined \Greekleftquot as a guillemet, as a guess. Is that right?
> 
> Cyrillic: The Unicode codepoint for \cyrilicii completely baffled me for
> a while. (I don't have a copy of the t2a/b fonts here with me to take a
> look for myself!) According to Unicode, it should be the same as
> \cyrillici, but that can't be right. I'm now guessing that since it
> appears the same as the roman 'i' in enco-cyr, it corresponds with
> U+0456: CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I. Is that right?
> 
> The additions to enco-uc are attached. I would be ever so grateful if
> they found their way into the distribution!

ok, added

btw, we also need to extend the utf (unic-*) files

> By the way, although these seem like complaints, I must say (again) that

hm, bugs are bugs, not complaints :-)

> the plumbing supporting arbitrary encodings, accents, and input regimes
> in ConTeXt is absolutely fantastic. Making XeTeX work with ConTeXt is
> really quite trivial compared to the efforts being expended on the
> XeLaTeX side!

ah, that's good news; i already got pessimistic seeing the stream of
patches needed for latex that pass by on the xetex list

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
      tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------
