ntg-context - mailing list for ConTeXt users
* Character names (was: Context 2005.12.19 released)
From: Mojca Miklavec @ 2005-12-21 14:19 UTC


Hans Hagen wrote:
> Mojca Miklavec wrote:
> >Taco Hoekwater wrote:
> >
> >>New features since 2005.12.18:
> >>
> >>* Support for the latin-9 regime (latin-1 + euro)
> >>
> >
> >There are some more (automatically generated) regime definitions at
> >http://pub.mojca.org/tex/enco/contextbase/
> >(only from the glyph names that I was able to extract from the
> >existing files, so it's only OK for some of the regimes mentioned
> >there).
> >
> >If possible, I would like to ask for core support for windows-1250
> >(perhaps other users may find some other regimes useful as well).
> >
> >
> just send me the files you feel confident with

(I'll send the good files soon.)

Except Celtic, Thai, Arabic and Hebrew (although the letter names for
Hebrew are almost completely defined), almost all the windows and ISO
regimes are OK, just some glyphs are missing (which are, or at least
were, missing in Unicode vectors as well). If anyone has suggestions
for names for the following characters, 6 additional regimes can be
fully supported:

windows-1251 and iso-8859-5
2116 NUMERO SIGN

windows-1253
0385 GREEK DIALYTIKA TONOS
2015 HORIZONTAL BAR
0384 GREEK TONOS

windows-1258
0300 COMBINING GRAVE ACCENT
0309 COMBINING HOOK ABOVE
0303 COMBINING TILDE
0301 COMBINING ACUTE ACCENT
0323 COMBINING DOT BELOW
20AB DONG SIGN

iso-8859-7
20AF DRACHMA SIGN
037A GREEK YPOGEGRAMMENI
2015 HORIZONTAL BAR
0384 GREEK TONOS
0385 GREEK DIALYTIKA TONOS

iso-8859-10
2015 HORIZONTAL BAR
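
The list above can be double-checked against Python's own code-page tables. This is a minimal sketch that assumes Python's codecs `cp1251`, `iso8859_5`, `cp1253`, `cp1258`, `iso8859_7` and `iso8859_10` match the corresponding regimes. `byte_for` is a hypothetical helper that searches a code page for the byte decoding to a given code point. Older Python versions may lack the 2003 additions to ISO 8859-7 (such as the drachma sign) and would report those as not found.

```python
import unicodedata

def byte_for(codec, code_point):
    """Return the byte (0x00-0xFF) that `codec` decodes to `code_point`,
    or None if the code page does not contain that character."""
    for byte in range(256):
        try:
            if bytes([byte]).decode(codec) == chr(code_point):
                return byte
        except UnicodeDecodeError:
            pass  # undefined position in this code page
    return None

# Characters still lacking glyph names, grouped by code page as in the list.
MISSING = {
    "cp1251": [0x2116],
    "iso8859_5": [0x2116],
    "cp1253": [0x0385, 0x2015, 0x0384],
    "cp1258": [0x0300, 0x0309, 0x0303, 0x0301, 0x0323, 0x20AB],
    "iso8859_7": [0x20AF, 0x037A, 0x2015, 0x0384, 0x0385],
    "iso8859_10": [0x2015],
}

for codec, code_points in MISSING.items():
    for cp in code_points:
        pos = byte_for(codec, cp)
        where = "not found" if pos is None else f"0x{pos:02X}"
        print(f"{codec:11} U+{cp:04X} {unicodedata.name(chr(cp))}: {where}")
```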

Mojca

* Re: Character names (was: Context 2005.12.19 released)
From: Taco Hoekwater @ 2005-12-21 16:17 UTC



Here's what I can come up with. At least a few are acceptable, like the
horizontal bar. \textnumero exists, but is only reachable in cyrillic
encodings (fixable, I guess?), and the greek & vietnamese accents
are also only usable in the correct encoding. I've used the \text...
versions of the accents, but perhaps the actual commands are more
correct (like \' and \~).

Cheers, Taco

\starttext
\definecharacter texthorizontalbar {{--\kern 0pt--}}
\definecharacter textdong          {\underbar{\dstroke}}

\starttabulate[|c|c|]
\NC 0300 COMBINING GRAVE ACCENT \NC \textgrave           \NC \NR
\NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove       \NC \NR
\NC 0303 COMBINING TILDE        \NC \texttilde           \NC \NR
\NC 0301 COMBINING ACUTE ACCENT \NC \textacute           \NC \NR
\NC 0323 COMBINING DOT BELOW    \NC \textbottomdot       \NC \NR
\NC 037A GREEK YPOGEGRAMMENI    \NC \unknownchar         \NC \NR  % prime?
\NC 0384 GREEK TONOS            \NC \greektonos          \NC \NR
\NC 0385 GREEK DIALYTIKA TONOS  \NC \greekdialytikatonos \NC \NR
\NC 2015 HORIZONTAL BAR         \NC \texthorizontalbar   \NC \NR
\NC 20AB DONG SIGN              \NC \textdong            \NC \NR
\NC 20AF DRACHMA SIGN           \NC \unknownchar         \NC \NR
\NC 2116 NUMERO SIGN            \NC \textnumero          \NC \NR
\stoptabulate
\stoptext


* Re: Character names (was: Context 2005.12.19 released)
From: Hans Hagen @ 2005-12-21 18:09 UTC


Taco Hoekwater wrote:

>
> \definecharacter texthorizontalbar {{--\kern 0pt--}}
> \definecharacter textdong          {\underbar{\dstroke}}

ok, i added those to enco-def.tex (end of file:)

\startencoding[\s!default]

\definecharacter texthorizontalbar {{--\kern\zeropoint--}}
\definecharacter textdong          {\underbar{\dstroke}}

\stopencoding

Hans

* Re: Character names (was: Context 2005.12.19 released)
From: Mojca Miklavec @ 2005-12-23  1:00 UTC


Taco Hoekwater wrote:
>
> Here's what I can come up with. At least a few are acceptable, like the
> horizontal bar. \textnumero exists, but is only reachable in cyrillic
> encodings (fixable, I guess?), and the greek & vietnamese accents
> are also only usable in the correct encoding. I've used the \text...
> versions of the accents, but perhaps the actual commands are more
> correct (like \' and \~).
>
> Cheers, Taco
>
> \starttext
> \definecharacter texthorizontalbar {{--\kern 0pt--}}
> \definecharacter textdong          {\underbar{\dstroke}}

Thanks for those ...

> \NC 0300 COMBINING GRAVE ACCENT \NC \textgrave           \NC \NR
> \NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove       \NC \NR
> \NC 0303 COMBINING TILDE        \NC \texttilde           \NC \NR
> \NC 0301 COMBINING ACUTE ACCENT \NC \textacute           \NC \NR
> \NC 0323 COMBINING DOT BELOW    \NC \textbottomdot       \NC \NR

I may be wrong, but aren't those used only in combination with other
characters? I don't know if TeX (ConTeXt) can handle this (at least
not yet). When I wrote the list a couple of days ago I forgot about
that fact. If the accent came before the character, this could
be replaced by "\buildtextaccent...", but here there's perhaps no
solution without some additional macros. (And since the Vietnamese
seem to be satisfied with viscii and utf for now, supporting cp1258 is
not crucial.)

I double-checked the differences between the existing regimes and the
ones that were automatically produced by a script. The list of regimes
that are "ripe" for supporting is thus:

cp125[ 0 | *1 | *2 | 3 | 4 | 7 ]
iso-8859-[ *1 | *2 | 3 | 4 | *5 | *7 | 9 | 13 | *15 | 16 ]
*viscii (with glyph names instead of \"\u\...)

(The ones marked with a star are already supported, perhaps with some
inconsistencies. Not supported: Hebrew, Arabic, Vietnamese? for cp125X
and Arabic, Thai and Celtic for iso-8859-X.)

I'll send the files (full content is already on my page), but I need
to know how to split/group them (I guess it would be a bad idea to
have one file for each encoding). Should there be one file for
iso-8859 and one for windows encodings? What about those regimes that
are already supported? I would like to move at least the "regi-win"
(with 8 wrong definitions anyway) to a "less discriminating" place,
but I don't know what to do with Greek and Cyrillic.

And another set of questions:
1. Can someone check for (in)consistencies for
greekupsilondiaeresis vs. greekupsilondialytika?
Looks like the same glyph named differently at different places
(functionality may break).

2. What to do with
{\cyrillicGJE}       {\'\cyrillicG} % 0403 CYRILLIC CAPITAL LETTER GJE
{\cyrillicgje}       {\'\cyrillicg} % 0453 CYRILLIC SMALL LETTER GJE
{\cyrillicKJE}       {\'\cyrillicK} % 040C CYRILLIC CAPITAL LETTER KJE
{\cyrillickje}       {\'\cyrillick} % 045C CYRILLIC SMALL LETTER KJE
{\cyrillicgheupturn} {\cyrillicgup} % 0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
Which variant is better?

Would it make sense to define
\definecharacter cyrillicGJE {\buildtextaccent\textacute\cyrillicG}
\defineaccent ' \cyrillicG {\cyrillicGJE}
and then use \cyrillicGJE consistently?

3.
PLEASE FIX:
in enco-def.tex replace \cdots by something (\dots, I suppose, but I'm not sure)
\definecharacter textellipsis     {\mathematics\cdots}
(I guess this "bug" was the reason for changing some definitions in
regimes/encodings elsewhere.)

Should \textellipsis be used for "2026 HORIZONTAL ELLIPSIS" or anything else?

4. \softhyphen, \hyphen or \- for "00AD SOFT HYPHEN"?

5. Urgently: what to do with quotations (without language
discriminations if possible)?

% 201A SINGLE LOW-9 QUOTATION MARK
\quotesinglebase vs. \lowerleftsingleninequote
% 201E DOUBLE LOW-9 QUOTATION MARK
\quotedblbase vs. \lowerleftdoubleninequote
% 2018 LEFT SINGLE QUOTATION MARK
\quoteleft vs. \upperleftsinglesixquote
% 2019 RIGHT SINGLE QUOTATION MARK
\quoteright vs. \upperrightsingleninequote

% 201C LEFT DOUBLE QUOTATION MARK
\quotedblleft vs. \upperleftdoublesixquote
% 201D RIGHT DOUBLE QUOTATION MARK
\quotedblright vs. \upperrightdoubleninequote

% 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
\guilsingleleft vs. \leftsubguillemot
% 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
\guilsingleright vs. \rightsubguillemot
% 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
\leftguillemot vs. \greekleftquot
(are Greek quotations treated specially or what is this doing in regi-grk?)
% 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
\rightguillemot vs. \greekrightquot vs. \prewordbreak\rightguillemot
(in my point of view the last one may be better, but not fair since
it's language dependent: may be OK for French, but not for German or
vice versa; perhaps a language-sensitive macro could be inserted at
this place?)

6. \textnumero, 0x2116 (and perhaps some other characters) should be
added to unicode vector 33.

7. files regi-il1 and regi-win have many inconsistencies. I would like
to suggest the following renamings:

windows -> cp1252
il1 -> iso-8859-1
il2 -> iso-8859-2
iso88595 -> iso-8859-5
grk -> iso-8859-7 (the new one)

and to add the following lines somewhere:

% or perhaps the other way around
\defineregimesynonym[utf-8][utf]
\defineregimesynonym[utf8][utf]

\defineregimesynonym[windows-1250][cp1250]
\defineregimesynonym[windows-1251][cp1251]
\defineregimesynonym[windows-1252][cp1252]
\defineregimesynonym[windows-1253][cp1253]
\defineregimesynonym[windows-1254][cp1254]
%defineregimesynonym[windows-1255][cp1255] % not supported yet (Hebrew)
%defineregimesynonym[windows-1256][cp1256] % not supported yet (Arabic)
\defineregimesynonym[windows-1257][cp1257]
%defineregimesynonym[windows-1258][cp1258] % not supported yet (Vietnamese)

% for historical reasons
\defineregimesynonym[windows][cp1252]

% 5 - Cyrillic
% 6 - Arabic (not supported)
% 7 - Greek
% 8 - Hebrew (3 signs missing)
% 11 - Thai (not supported)

\defineregimesynonym[il1][iso-8859-1]
\defineregimesynonym[il2][iso-8859-2]
\defineregimesynonym[il3][iso-8859-3]
\defineregimesynonym[il4][iso-8859-4]
\defineregimesynonym[il5][iso-8859-9]
\defineregimesynonym[il6][iso-8859-10]
\defineregimesynonym[il7][iso-8859-13]
%defineregimesynonym[il8][iso-8859-14] % not supported yet
\defineregimesynonym[il9][iso-8859-15]
\defineregimesynonym[il10][iso-8859-16]

\defineregimesynonym[latin1][iso-8859-1]
\defineregimesynonym[latin2][iso-8859-2]
\defineregimesynonym[latin3][iso-8859-3]
\defineregimesynonym[latin4][iso-8859-4]
\defineregimesynonym[latin5][iso-8859-9]
\defineregimesynonym[latin6][iso-8859-10]
\defineregimesynonym[latin7][iso-8859-13]
%defineregimesynonym[latin8][iso-8859-14] % not supported yet
\defineregimesynonym[latin9][iso-8859-15]
\defineregimesynonym[latin10][iso-8859-16]

% for historical reasons
\defineregimesynonym[iso88595][iso-8859-5]
\defineregimesynonym[grk][iso-8859-7]
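
The proposed synonyms amount to an alias table in which every alias should lead to exactly one canonical regime name. The following is a minimal Python sketch of such a lookup, using a subset of the lines above. `SYNONYMS` and `resolve_regime` are hypothetical names. It is not stated here whether ConTeXt's `\defineregimesynonym` follows chains of aliases, so the chain-following and the cycle check are assumptions. They would matter if some synonyms were defined "the other way around".

```python
# A subset of the proposed \defineregimesynonym lines: alias -> target.
SYNONYMS = {
    "utf-8": "utf", "utf8": "utf",
    "windows-1250": "cp1250", "windows-1252": "cp1252",
    "windows": "cp1252",
    "il1": "iso-8859-1", "latin1": "iso-8859-1",
    "il5": "iso-8859-9", "latin5": "iso-8859-9",
    "il9": "iso-8859-15", "latin9": "iso-8859-15",
    "iso88595": "iso-8859-5", "grk": "iso-8859-7",
}

def resolve_regime(name, synonyms=SYNONYMS):
    """Follow aliases until reaching a name with no further synonym.
    Raises ValueError if the aliases form a cycle."""
    seen = set()
    while name in synonyms:
        if name in seen:
            raise ValueError(f"synonym cycle involving {name!r}")
        seen.add(name)
        name = synonyms[name]
    return name

print(resolve_regime("latin9"))   # iso-8859-15
print(resolve_regime("windows"))  # cp1252
print(resolve_regime("cp1250"))   # cp1250 (already canonical)
```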

I can send the new files as soon as it gets clear how to group them.
If, additionally, the rest of the questions are answered, then the new files
can become more consistent without breaking anything.

Sorry for the long mail,
    Mojca

* Re: Character names
From: Hans Hagen @ 2005-12-23  9:44 UTC


Mojca Miklavec wrote:

>I'll send the files (full content is already on my page), but I need
>to know how to split/group them (I guess it would be a bad idea to
>have one file for each encoding). Should there be one file for
>iso-8859 and one for windows encodings? What about those regimes that
>are already supported? I would like to move at least the "regi-win"
>(with 8 wrong definitions anyway) to a "less discriminating" place,
>don't know what to do with Greek and Cyrillic.
>  
>
the problem with one file is that all of them will be loaded, which will
make memory and hash usage extreme, so best split them into separate files

>PLEASE FIX:
>in enco-def.tex replace \cdots by something (\dots, I suppose, but I'm not sure)
>\definecharacter textellipsis     {\mathematics\cdots}
>(I guess this "bug" was the reason for changing some definitions in
>regimes/encodings elsewhere.)
>
>Should \textellipsis be used for "2026 HORIZONTAL ELLIPSIS" or anything else?
>  
>
that's for taco to decide

>(are Greek quotations treated specially or what is this doing in regi-grk?)
>% 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
>\rightguillemot vs. \greekrightquot vs. \prewordbreak\rightguillemot
>(in my point of view the last one may be better, but not fair since
>it's language dependent: may be OK for French, but not for German or
>vice versa; perhaps a language-sensitive macro could be inserted at
>this place?)
>
>  
>
see core-mis, maybe using

  \symbol[\c!leftquotation]

helps

>6. \textnumero, 0x2116 (and perhaps some other characters) should be
>added to unicode vector 33.
>
>7. files regi-il1 and regi-win have many inconsistencies. I would like
>to suggest to do the following renamings:
>  
>
>% or perhaps the other way around
>\defineregimesynonym[utf-8][utf]
>\defineregimesynonym[utf8][utf]
>
>\defineregimesynonym[windows-1250][cp1250]
>\defineregimesynonym[windows-1251][cp1251]
>\defineregimesynonym[windows-1252][cp1252]
>\defineregimesynonym[windows-1253][cp1253]
>\defineregimesynonym[windows-1254][cp1254]
>%defineregimesynonym[windows-1255][cp1255] % not supported yet (Hebrew)
>%defineregimesynonym[windows-1256][cp1256] % not supported yet (Arabic)
>\defineregimesynonym[windows-1257][cp1257]
>%defineregimesynonym[windows-1258][cp1258] % not supported yet (Vietnamese)
>
>% for historical reasons
>\defineregimesynonym[windows][cp1252]
>
>  
>
needs some thought

>I can send the new files as soon as it gets clear how to group them.
>If, additionally, the rest of the questions are answered, then the new files
>can become more consistent without breaking anything.
>  
>
so ... split the files

Hans

* Re: Character names
From: Hans Hagen @ 2005-12-23 10:31 UTC


Mojca Miklavec wrote:

>\defineregimesynonym[windows-1250][cp1250]
>  
>

the synonym feature is already in the kernel; the following patch to
regi-ini will permit file name synonyms, so

\definefilesynonym[regi-win][...]

patch:

\def\douseregime#1% nearly identical to encoding
  {\doifundefined{\c!file\f!regimeprefix#1}%
     {\setvalue{\c!file\f!regimeprefix#1}{}%
      \makeshortfilename[\truefilename{\f!regimeprefix#1}]%
      \startreadingfile
        \readsysfile\shortfilename
          {\showmessage\m!encodings2{#1}}
          {\showmessage\m!encodings3{#1}}%
      \stopreadingfile}}

so, we can make (many) files called

regi-cp-1250

and then say

\definefilesynonym[regi-win][regi-cp-1250]

\defineregimesynonym[win][cp1250]

(of course the internals of regi-... should become cp1250 then)
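
The patch combines two steps. First, a guard (the `\doifundefined`/`\setvalue` pair) ensures each regime file is read at most once. Second, the file name is looked up through `\truefilename`; judging from the patch, this is where a `\definefilesynonym` mapping such as `regi-win` → `regi-cp-1250` takes effect. Below is a minimal Python analogue of that control flow. All names in it are hypothetical, and it mirrors only the logic, not ConTeXt's actual file handling.

```python
# Hypothetical analogue of \definefilesynonym: logical name -> real file name.
FILE_SYNONYMS = {"regi-win": "regi-cp-1250"}
REGIME_PREFIX = "regi-"  # plays the role of \f!regimeprefix

loaded = set()   # plays the role of the \c!file\f!regimeprefix#1 markers
read_log = []    # records which files were actually read

def true_filename(name):
    """Apply a file synonym if one is defined (cf. \\truefilename)."""
    return FILE_SYNONYMS.get(name, name)

def use_regime(regime):
    """Read the file for `regime` at most once, after synonym resolution."""
    if regime in loaded:
        return
    loaded.add(regime)  # mark first, as the patch does with \setvalue
    read_log.append(true_filename(REGIME_PREFIX + regime))

use_regime("win")
use_regime("win")   # second call is a no-op
print(read_log)     # ['regi-cp-1250']
```

As in the patch, the guard is keyed on the regime name rather than on the resolved file. Two different regime names that map to the same file would therefore each trigger a read.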

Hans

* Re: Character names
From: Taco Hoekwater @ 2005-12-23 12:29 UTC


Hans Hagen wrote:
> 
>> PLEASE FIX:
>> in enco-def.tex replace \cdots by something (\dots, I suppose, but I'm 
>> not sure)
>> \definecharacter textellipsis     {\mathematics\cdots}
>> (I guess this "bug" was the reason for changing some definitions in
>> regimes/encodings elsewhere.)

>> Should \textellipsis be used for "2026 HORIZONTAL ELLIPSIS" 

Yes. But on the baseline, so:

   \definecharacter textellipsis     {\periods\relax}

U+2024 (ONE-DOT LEADER):

   \definecharacter textonedotleader     {\doperiods[1]}

U+2025 (TWO-DOT LEADER):

   \definecharacter texttwodotleader     {\doperiods[2]}

I believe there is a four-dot leader in unicode as well, but I
can't find it right now.

Taco

* Re: Character names
From: Taco Hoekwater @ 2005-12-23 12:36 UTC


Mojca Miklavec wrote:
>>\NC 0300 COMBINING GRAVE ACCENT \NC \textgrave           \NC \NR
>>\NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove       \NC \NR
>>\NC 0303 COMBINING TILDE        \NC \texttilde           \NC \NR
>>\NC 0301 COMBINING ACUTE ACCENT \NC \textacute           \NC \NR
>>\NC 0323 COMBINING DOT BELOW    \NC \textbottomdot       \NC \NR
> 
> I may be wrong, but aren't those used only in combination with other
> characters? I don't know if TeX (ConTeXt) can handle this (at least
> not yet).

If the format was <accent> <char>, that would work, but unicode
specifies <char> <accent>, and that cannot be done without a special
font encoding that uses lots of ligatures.
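
The ordering point can be checked with Python's standard `unicodedata` module. A precomposed letter's canonical decomposition puts the base character first and the combining mark after it, and NFC normalization recombines the pair. The last lines show, assuming Python's `cp1258` codec matches the Windows Vietnamese regime, that such text really does arrive in this <char> <accent> order.

```python
import unicodedata

precomposed = "\u00E0"  # LATIN SMALL LETTER A WITH GRAVE
decomposed = unicodedata.normalize("NFD", precomposed)

# Unicode stores the base letter first and the combining mark after it.
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0061 LATIN SMALL LETTER A
# U+0300 COMBINING GRAVE ACCENT

# NFC normalization turns <char> <accent> back into one precomposed character.
print(unicodedata.normalize("NFC", "a\u0300") == precomposed)  # True

# In cp1258 (Windows Vietnamese), byte 0xCC is COMBINING GRAVE ACCENT,
# so the same two-character sequence arrives as two separate bytes.
print(b"a\xcc".decode("cp1258") == "a\u0300")  # True
```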

Taco

* Re: Character names
From: Mojca Miklavec @ 2005-12-23 12:48 UTC


Taco Hoekwater wrote:
>
> >> Should \textellipsis be used for "2026 HORIZONTAL ELLIPSIS"
>
> Yes. But on the baseline, so:

OK, thanks.

>    \definecharacter textellipsis     {\periods\relax}

So perhaps fix the unic-032.tex again then.

> I believe there is a four-dot leader in unicode as well, but I
> can't find it right now.

There are many dot characters in Unicode, for example in the 205x range
(U+2058 FOUR DOT PUNCTUATION).
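
The leaders and dot punctuation mentioned in this exchange can be listed from Python's Unicode database. This only prints official names and says nothing about how ConTeXt should typeset them. Within this short list, nothing is named a "four-dot leader". The nearest candidates are U+2058 FOUR DOT PUNCTUATION and U+205B FOUR DOT MARK.

```python
import unicodedata

# ONE DOT LEADER .. HORIZONTAL ELLIPSIS, plus dot punctuation near U+2058.
DOT_CHARS = [0x2024, 0x2025, 0x2026, 0x2056, 0x2058, 0x2059, 0x205A, 0x205B, 0x205E]

def dot_names(code_points=DOT_CHARS):
    """Map each code point to its official Unicode character name."""
    return {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, name in dot_names().items():
    print(f"U+{cp:04X} {name}")
```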

Mojca

* Re: Character names
From: Mojca Miklavec @ 2005-12-23 13:39 UTC


Taco Hoekwater wrote:
> Mojca Miklavec wrote:
> >>\NC 0300 COMBINING GRAVE ACCENT \NC \textgrave           \NC \NR
> >>\NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove       \NC \NR
> >>\NC 0303 COMBINING TILDE        \NC \texttilde           \NC \NR
> >>\NC 0301 COMBINING ACUTE ACCENT \NC \textacute           \NC \NR
> >>\NC 0323 COMBINING DOT BELOW    \NC \textbottomdot       \NC \NR
> >
> > I may be wrong, but aren't those used only in combination with other
> > characters? I don't know if TeX (ConTeXt) can handle this (at least
> > not yet).
>
> If the format was <accent> <char>, that would work, but unicode
> specifies <char> <accent>, and that cannot be done without a special
> font encoding that uses lots of ligatures.

I thought so. But the issue is not a matter of font designers, but of
underlying software. If TeX can't "unget" a character and replace it
with the accented one, you can't ask font designers to add dozens of
ligatures. Knuth didn't write TeX with Unicode conventions in mind, so
I can understand that, I only wonder if XeTeX, Aleph [and exTeX]
support such accents.

I'll consider this as "leave Windows Vietnamese encoding unsupported"
(they have two other encodings anyway).

Thanks,
    Mojca

* Re: Character names
From: Taco Hoekwater @ 2005-12-23 14:18 UTC




Mojca Miklavec wrote:
> I thought so. But the issue is not a matter of font designers, but of
> underlying software. If TeX can't "unget" a character and replace it
> with the accented one, you can't ask font designers to add dozens of
> ligatures. Knuth didn't write TeX with Unicode conventions in mind, so
> I can understand that, I only wonder if XeTeX, Aleph [and exTeX]
> support such accents.

Inside the TFM file, it is fairly straightforward to instruct TeX to
create a ligature from "a" followed by "`" to "à".

That is a job for texfont or fontinst, not for the font designers. But
it qualifies as a hack, not true unicode support. Anyway, no point
spending time on it if nobody is going to use it.
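
The TFM trick amounts to a pairwise rewrite: whenever a base letter is followed by a combining mark, the pair is replaced by one precomposed glyph. The following is a minimal Python simulation of that idea only. `LIGATURES` is a tiny, hypothetical hand-written table, and real TFM ligature programs are produced by tools such as fontinst, not written like this.

```python
# Illustrative ligature table: (base, combining mark) -> precomposed glyph.
LIGATURES = {
    ("a", "\u0300"): "\u00E0",  # a + COMBINING GRAVE ACCENT -> à
    ("e", "\u0301"): "\u00E9",  # e + COMBINING ACUTE ACCENT -> é
    ("o", "\u0303"): "\u00F5",  # o + COMBINING TILDE -> õ
}

def apply_ligatures(text, table=LIGATURES):
    """Scan left to right, replacing each (char, next char) pair found in
    `table` by its ligature; unmatched characters pass through unchanged."""
    out = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(apply_ligatures("a\u0300 la carte"))  # à la carte
```

Only pairs listed in the table are handled, which is why this qualifies as a hack rather than Unicode support: every base and mark combination needs its own entry.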

Cheers,

Taco

* Re: Character names
From: Hans Hagen @ 2005-12-23 14:39 UTC


Mojca Miklavec wrote:

>I'll consider this as "leave Windows Vietnamese encoding unsupported"
>(they have two other encodings anyway).
>  
>
indeed (also, i never heard vnpenguin ask for it :-)

Hans

* Re: Character names
From: VnPenguin @ 2005-12-23 21:33 UTC



On 12/23/05, Hans Hagen <pragma@wxs.nl> wrote:
> Mojca Miklavec wrote:
>
> >I'll consider this as "leave Windows Vietnamese encoding unsupported"
> >(they have two other encodings anyway).
> >
> >
> indeed (also, i never heard vnpenguin ask for it :-)
>
Hi all,
In fact, VnTeX supports UTF-8, VISCII, TCVN and VPS, so it would be nice
if ConTeXt could do the same :) In practice, UTF-8 is becoming the
standard in Vietnam, and the other charsets (VISCII, VPS, TCVN) are used
less and less. Personally, I always use UTF-8 for all my documents
(ConTeXt, OpenOffice, HTML, MySQL data, ...).

Thank you for your wonderful work,

Merry Christmas !

--
http://vnoss.org
Vietnamese Open Source Software Community


_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context

* Re: Character names
From: Mojca Miklavec @ 2005-12-25 23:48 UTC


Hans Hagen wrote:
> Mojca Miklavec wrote:
>
> >(are Greek quotations treated specially or what is this doing in regi-grk?)
> >% 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
> >\rightguillemot vs. \greekrightquot vs. \prewordbreak\rightguillemot
> >(in my point of view the last one may be better, but not fair since
> >it's language dependent: may be OK for French, but not for German or
> >vice versa; perhaps a language-sensitive macro could be inserted at
> >this place?)
> >
> >
> >
> see core-mis, maybe using
>
>   \symbol[\c!leftquotation]
>
> helps

The other way round: It's not "left quotation mark" that should turn
into right guillemot, but right guillemot that should be classified as
left or right quotation mark according to the current language in
order to guarantee proper line breaking.
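
The idea of classifying » as an opening or closing mark according to the current language can be sketched as a lookup table. The table below is an assumption covering two well-known conventions, French « ... » and German » ... «. `QUOTE_STYLES` and `quote_role` are hypothetical names, and this does not describe how ConTeXt's `\symbol` mechanism works.

```python
# Hypothetical per-language conventions: (opening mark, closing mark).
QUOTE_STYLES = {
    "fr": ("\u00AB", "\u00BB"),  # « ... »
    "de": ("\u00BB", "\u00AB"),  # » ... « (one of the German conventions)
}

def quote_role(mark, language):
    """Classify `mark` as 'open' or 'close' for `language`; None otherwise."""
    opening, closing = QUOTE_STYLES[language]
    if mark == opening:
        return "open"
    if mark == closing:
        return "close"
    return None

# The same character, U+00BB, plays opposite roles in the two languages.
print(quote_role("\u00BB", "fr"))  # close
print(quote_role("\u00BB", "de"))  # open
```

Once the role is known, a typesetter could keep a closing mark attached to the word before it and an opening mark attached to the word after it, whatever the glyph.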

> >I can send the new files as soon as it gets clear how to group them.
> >If, additionally, the rest of the questions are answered, then the new files
> >can become more consistent without breaking anything.

Sorry, it was Christmas in between, so "as soon" lasted a bit longer than
a moment :)

> so ... split the files

They're here:
http://pub.mojca.org/tex/enco/contextbase/

Especially for the Cyrillic one, some definitions should first be added to
the core in order to support some accented characters (\cyrillicGJE,
\cyrillicKJE, \cyrillicgheupturn ... - see one of my last mails)
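
Unicode's own decomposition data bears on the earlier GJE/KJE question. GJE and KJE are canonically G and K followed by a combining acute accent (U+0301). GHE WITH UPTURN has no decomposition and is a letter in its own right. This suggests that a `\buildtextaccent`-style definition fits GJE/KJE, while `\cyrillicgheupturn` needs a glyph of its own. A quick check with Python's `unicodedata`:

```python
import unicodedata

# GJE, gje, KJE, kje and ghe with upturn, as discussed in the earlier mail.
CYRILLIC = [0x0403, 0x0453, 0x040C, 0x045C, 0x0491]

def decompositions(code_points):
    """Map each code point to its Unicode decomposition ('' if it has none)."""
    return {cp: unicodedata.decomposition(chr(cp)) for cp in code_points}

for cp, decomp in decompositions(CYRILLIC).items():
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {decomp or '(none)'}")
```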

Any comments?

Mojca

Thread overview (14 messages):
2005-12-21 14:19 Character names (was: Context 2005.12.19 released) Mojca Miklavec
2005-12-21 16:17 ` Taco Hoekwater
2005-12-21 18:09   ` Hans Hagen
2005-12-23  1:00   ` Mojca Miklavec
2005-12-23  9:44     ` Character names Hans Hagen
2005-12-23 12:29       ` Taco Hoekwater
2005-12-23 12:48         ` Mojca Miklavec
2005-12-25 23:48       ` Mojca Miklavec
2005-12-23 10:31     ` Hans Hagen
2005-12-23 12:36     ` Taco Hoekwater
2005-12-23 13:39       ` Mojca Miklavec
2005-12-23 14:18         ` Taco Hoekwater
2005-12-23 14:39         ` Hans Hagen
2005-12-23 21:33           ` VnPenguin
