ntg-context - mailing list for ConTeXt users
* Re: Index sorting for other languages that English
@ 2006-05-24 10:11 Richard Gabriel
  2006-05-24 15:55 ` Hans Hagen
From: Richard Gabriel @ 2006-05-24 10:11 UTC

Thanks Hans, it works with my test file, 
unless I set up:

\setupregister[index][expansion=xml]

which I need for correct processing of the XML files.
If I simply add this command to the test TeX file (no XML), the Czech sorting stops working and all accented characters are placed under "A".
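
A test file of the kind described here might look like the minimal sketch below. The index entries are made up for illustration and are not the ones from the attached file; accents are written with TeX accent commands so that no input regime has to be assumed.

```tex
% Minimal sketch of a test file; the entries are illustrative only.
\mainlanguage[cz]   % needed for the Czech sort rules

% According to the report above, enabling the next line makes the
% accented entries end up under "A":
% \setupregister[index][expansion=xml]

\starttext

\index{auto}     auto
\index{\v{c}aj}  \v{c}aj
\index{cesta}    cesta
\index{\v{r}eka} \v{r}eka
\index{ryba}     ryba

\page
\placeindex

\stoptext
```

Running the file once with the \setupregister line commented out and once with it enabled should show whether the setting alone changes the sort order.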

Regarding the sorting itself (sort-lan.tex): 
I found the definition of the sorting quite strange or, let's say, incomplete. 
It makes no sense to separate ccaron while all other accented letters are placed under the unaccented ones.
I'll update the definitions, test them and send them to you.


-Richard



  _____  

From: Hans Hagen [mailto:pragma@wxs.nl]
To: mailing list for ConTeXt users [mailto:ntg-context@ntg.nl]
Sent: Tue, 23 May 2006 17:02:53 +0200
Subject: Re: [NTG-context] Index sorting for other languages that English

Richard Gabriel wrote:
  > Hello Hans,
  >
  > after an upgrade I noticed that the index sorting works even worse 
  > than before (tested on Czech, Chinese and Japanese, but probably 
  > related to non-ASCII characters in general).
  >
  > With TeXExec 5.4.3, all words beginning with national (accented) 
  > characters were put into a separate ("symbols") group and placed 
  > before "A". This was not good but more or less acceptable.
  > With TeXExec 6.2.0, words beginning with accented characters are 
  > placed under a certain unaccented letter. My colleague found out that 
  > these words are sorted according to the first unaccented letter. This 
  > is unacceptable and unusable.
  >
  > As a workaround, we try to avoid indexing words beginning with 
  > accented characters. But that is impossible in many cases.
  > I'd like to ask you to improve the index sorting. Could I help or 
  > contribute in some way?
  >
  > Attached is a test file, which creates 2 indexes from various Czech 
  > words (covering the Czech alphabet). The index should be sorted 
  > in exactly the order in which the terms are written in the file.
  >
  actually the new texexec implementation does czech sorting but it's not enabled yet in context itself (was experimental until now) 
  
  - download the latest version (i uploaded a version that enables it) 
  - don't forget \mainlanguage[cz] at the top of your document 
  - in sort-lan.tex you can see how czech sorting is defined 
  
  (context adds a lot of info to the tui file in order to get sorting done) 
  
  -----------------------------------------------------------------
                                            Hans Hagen | PRAGMA ADE
                Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                               | www.pragma-pod.nl
  -----------------------------------------------------------------
  
  _______________________________________________
  ntg-context mailing list
  ntg-context@ntg.nl
  http://www.ntg.nl/mailman/listinfo/ntg-context
    


* Re: Index sorting for other languages that English
  2006-05-24 10:11 Index sorting for other languages that English Richard Gabriel
@ 2006-05-24 15:55 ` Hans Hagen
  2006-05-30  6:43   ` Index sorting for other languages than English (2) R. Ermers
From: Hans Hagen @ 2006-05-24 15:55 UTC


Richard Gabriel wrote:
> Thanks Hans, it works with my test file,
> unless I set up:
>
> \setupregister[index][expansion=xml]
>
> which I need for correct processing of the XML files.
> If I simply add this command to the test TeX file (no XML), the 
> Czech sorting stops working and all accented characters are placed 
> under "A".
test file ...
>
> Regarding the sorting itself (sort-lan.tex):
> I found the definition of the sorting quite strange or, let's say, 
> incomplete.
> It makes no sense to separate ccaron while all other accented letters 
> are placed under the unaccented ones.
> I'll update the definitions, test them and send them to you.
ok 

Hans


* Index sorting for other languages than English (2)
  2006-05-24 15:55 ` Hans Hagen
@ 2006-05-30  6:43   ` R. Ermers
  2006-05-30  7:35     ` Hans Hagen
From: R. Ermers @ 2006-05-30  6:43 UTC


Hi all,

I have a document in Dutch (\mainlanguage[nl]) in which I quote Turkish 
items, which I want to collect in a separate index, like this:

"Enkele voorbeelden zijn: \quote{oudere zus} \turkish{abla}, 
\quote{jongere broer of zus} \turkish{karde\c{s}}, de \quote{zus van 
vader} (\quote{tante}) \turkish{hala}, \quote{de zus van moeder} 
\turkish{teyze}. Voor aangetrouwde familieleden gelden soms juist vagere 
termen dan in het Nederlands, bijv. \quote{aangetrouwde tante} en 
\quote{schoonzuster}, \turkish{yenge}."

The index, however, is based on Dutch (mainlanguage). This causes two 
problems:

1. words with accents, like s\"oz, are not sorted correctly by any standard:
S
söz kesmek 76
saygı 14
şeref 3, 14, 24, 27

2. letters with diacritics, like \c{s} (under which \c{s}eref is to be 
placed) are not included in the alphabetical listing in the index, which 
of course follows the Dutch alphabet.
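
For context, a separate register of this kind might be set up as in the sketch below. The register name turkishterm and this definition of \turkish are assumptions made for illustration; the message does not show how \turkish is actually defined. The sketch only collects the terms in their own register and does not by itself change the sort order.

```tex
% Hypothetical setup: neither the register name nor this definition
% of \turkish appears in the message.
\defineregister[turkishterm][turkishterms]

% Typeset the word in italics and record it in the separate register.
\def\turkish#1{{\it #1}\turkishterm{#1}}

\mainlanguage[nl]

\starttext

\quote{oudere zus} \turkish{abla},
\quote{jongere broer of zus} \turkish{karde\c{s}}

\page
\placeregister[turkishterm]

\stoptext
```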

Does anyone have a solution?

Regards,

Robert



* Re: Index sorting for other languages than English (2)
  2006-05-30  6:43   ` Index sorting for other languages than English (2) R. Ermers
@ 2006-05-30  7:35     ` Hans Hagen
From: Hans Hagen @ 2006-05-30  7:35 UTC


R. Ermers wrote:
> Hi all,
>
> I have a document in Dutch (\mainlanguage[nl]) in which I quote Turkish 
> items, which I want to collect in a separate index, like this:
>
> "Enkele voorbeelden zijn: \quote{oudere zus} \turkish{abla}, 
> \quote{jongere broer of zus} \turkish{karde\c{s}}, de \quote{zus van 
> vader} (\quote{tante}) \turkish{hala}, \quote{de zus van moeder} 
> \turkish{teyze}. Voor aangetrouwde familieleden gelden soms juist vagere 
> termen dan in het Nederlands, bijv. \quote{aangetrouwde tante} en 
> \quote{schoonzuster}, \turkish{yenge}."
>
> The index, however, is based on Dutch (mainlanguage). This causes two 
> problems:
>
> 1. words with accents, like s\"oz, are not sorted correctly by any standard:
> S
> söz kesmek 76
> saygı 14
> şeref 3, 14, 24, 27
>
> 2. letters with diacritics, like \c{s} (under which \c{s}eref is to be 
> placed) are not included in the alphabetical listing in the index, which 
> of course follows the Dutch alphabet.
>
> Does anyone have a solution?
>   
hm, so we need a mixed sorting mechanism

(in sort-lan.tex you can define a sort order for turkish, but it then 
applies to the whole doc)

Hans

* Re: Index sorting for other languages than English (2)
  2006-05-30  7:23 Richard Gabriel
@ 2006-05-30  8:00 ` Hans Hagen
From: Hans Hagen @ 2006-05-30  8:00 UTC


Richard Gabriel wrote:
> I'd suggest using the extended variant of the \index macro, which 
> lets you specify an ASCII equivalent of the word to be used for 
> sorting:
>
> \index[soz kesmek]{s\"oz kesmek}
> \index[seref]{\c seref}
actually, supporting multiple indexes with their own sort order is kind 
of prepared but never completed, so i'll have a look at it

Hans


* Re: Index sorting for other languages than English (2)
@ 2006-05-30  7:23 Richard Gabriel
  2006-05-30  8:00 ` Hans Hagen
From: Richard Gabriel @ 2006-05-30  7:23 UTC

I'd suggest using the extended variant of the \index macro, which lets you specify an ASCII equivalent of the word to be used for sorting:

\index[soz kesmek]{s\"oz kesmek}
\index[seref]{\c seref}
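
In a complete document, this might be used as in the sketch below. Only the two \index calls come from the suggestion above; the surrounding document is a placeholder. Note that a plain ASCII key such as seref files the entry among the ordinary s words, whereas the Turkish alphabet treats ş as a separate letter after s.

```tex
% Sketch only: the surrounding document is a placeholder.
\mainlanguage[nl]

\starttext

% The bracketed argument is the sort key;
% the braced argument is the text printed in the index.
\index[soz kesmek]{s\"oz kesmek} s\"oz kesmek
\index[seref]{\c seref}          \c seref

\page
\placeindex

\stoptext
```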

-Richard



Thread overview: 6+ messages
2006-05-24 10:11 Index sorting for other languages that English Richard Gabriel
2006-05-24 15:55 ` Hans Hagen
2006-05-30  6:43   ` Index sorting for other languages than English (2) R. Ermers
2006-05-30  7:35     ` Hans Hagen
2006-05-30  7:23 Richard Gabriel
2006-05-30  8:00 ` Hans Hagen
