Hello Hans,

On 8/24/07, Hans Hagen wrote:
> Hi,
>
> I uploaded a new version of mkiv (regular zip).

Thanks a lot!

> - case changing using attributes and node processing
>
> simple test file for spacing and casing:

I'm attaching a not-so-trivial test file for "casing", just to see how
well it works for Croatian.

A few observations:

- LM doesn't have any lj, nj, dz, dž, ... (probably another request
for the Polish guys)

- It would be great if MK IV did the transformation from digraphs to
normal letters in case those digraphs are not present in the font
itself (for ij, lj, nj, dz, dž, ... just as it would be great if
ccaron was automatically composed out of c and caron if the letter
wasn't present in that font).
Visually there is probably no difference in plain text, except in
exactly the cases for which you're sending the tests (that's casing
and spacing). See http://en.wikipedia.org/wiki/Gaj's_Latin_alphabet
for how the word "MJENJAČNICA" is split into letters.
Normal people still type n+j in text, not the digraph "ǌ" (nj), but in
case you get some text with those digraphs which are valid Unicode
letters, it would be nice if they were processed ...
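
To make the fallback idea concrete, here is a rough sketch in Python
(not ConTeXt or Lua code, and not how MK IV does or should do it). It
assumes a hypothetical set `available` holding the characters the font
provides, and relies on the Unicode decomposition data exposed by
unicodedata.normalize: the digraphs have compatibility decompositions
(ǌ -> n + j), while ccaron has a canonical one (c + combining caron).

```python
import unicodedata


def fallback_char(ch, available):
    """Return ch if the font has it, otherwise its Unicode decomposition.

    `available` is a hypothetical set of characters present in the font.
    """
    if ch in available:
        return ch
    # NFKC splits digraphs such as U+01CC (nj) into separate letters;
    # NFD splits precomposed letters such as ccaron into base + accent.
    for form in ("NFKC", "NFD"):
        replacement = unicodedata.normalize(form, ch)
        if replacement != ch:
            # Recurse so that e.g. the zcaron inside U+01C6 (dz with caron)
            # is decomposed further when zcaron itself is missing.
            return "".join(fallback_char(c, available) for c in replacement)
    # Nothing to decompose: keep the character as it is.
    return ch


def fallback(text, available):
    """Apply fallback_char to every character of text."""
    return "".join(fallback_char(ch, available) for ch in text)
```

Calling fallback("ǌ", set("nj")) gives the two letters n and j, while a
font that does contain the digraph leaves the text untouched. A real
implementation would work on glyph nodes and would also have to position
the combining caron, which this sketch ignores.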

> \starttext
>
> test: oeps {\setcharacterspacing[frenchpunctuation] x: xx \bfd x: xx}
> oeps: test
>
> test \WORD{test TEST \TeX} test
>
> test \word{test TEST \TeX} test
>
> test \Word{test TEST \TeX} test

Another few observations:
- \word doesn't work in XeTeX
- What exactly is \Words supposed to do (with non-first letters in a word)?
- ConTeXt with XeTeX outputs dozens of empty lines to the console.

An extra challenge would be to get this to work (but unless some Croats
ask you for that or unless you have too much time left, don't bother
about that - it needs slightly more than only lccode and uccode of a
letter since there are three forms: one for lowercase [ljubljana ->
lj], one for all-uppercase words [LJUBLJANA -> LJ] and one for the
first letter of a word starting with an uppercase [Ljubljana -> Lj]):

In Unicode:

\word{ǉubǉana} -> ǉubǉana
\Word{ǉubǉana} -> ǈubǉana
\WORD{ǉubǉana} -> ǇUBǇANA

\word{ǈubǉana} -> ǉubǉana
\Word{ǈubǉana} -> ǈubǉana
\WORD{ǈubǉana} -> ǇUBǇANA

\word{ǇUBǇANA} -> ǉubǉana
\Word{ǇUBǇANA} -> ǈubǉana
\WORD{ǇUBǇANA} -> ǇUBǇANA

In Latin transcription (in case you have problems seeing some Unicode letters):

\word{ljubljana} -> ljubljana
\Word{ljubljana} -> Ljubljana
\WORD{ljubljana} -> LJUBLJANA

\word{Ljubljana} -> ljubljana
\Word{Ljubljana} -> Ljubljana
\WORD{Ljubljana} -> LJUBLJANA

\word{LJUBLJANA} -> ljubljana
\Word{LJUBLJANA} -> Ljubljana
\WORD{LJUBLJANA} -> LJUBLJANA
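
For comparison, the three forms correspond to the lowercase, titlecase
and uppercase mappings in the Unicode character database. A small
sketch in Python (again only an illustration, not ConTeXt code; the
function names are made up here and merely stand in for \word, \Word
and \WORD applied to a single word):

```python
def lower_word(s):
    """Counterpart of \\word: every letter in its lowercase form."""
    return s.lower()


def title_word(s):
    """Counterpart of \\Word for a single word.

    The first letter gets its *titlecase* form, which differs from the
    uppercase form for the digraphs (U+01C8 instead of U+01C7).
    """
    return s[:1].title() + s[1:].lower()


def upper_word(s):
    """Counterpart of \\WORD: every letter in its uppercase form."""
    return s.upper()


# The three spellings of the same word from the examples above.
forms = ["\u01c9ub\u01c9ana", "\u01c8ub\u01c9ana", "\u01c7UB\u01c7ANA"]
for form in forms:
    print(lower_word(form), title_word(form), upper_word(form))
```

Each of the three input spellings yields the same lowercase, titlecase
and uppercase results, which is exactly the table above. A pair of
lccode and uccode values per letter cannot express this, because the
titlecase form is a third, separate character.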

See also:

http://unicode.org/cldr/data/common/collation/hr.xml
http://en.wikipedia.org/wiki/Gaj's_Latin_alphabet

> {\setcharacterkerning[extrakerning]\input zapf\endgraf }

(That could be "backported" to XeTeX. I think XeTeX already offers a
similar feature, but I should check.)

Mojca