ntg-context - mailing list for ConTeXt users
* Index sorting for other languages that English
@ 2006-05-23 10:22 Richard Gabriel
  2006-05-23 13:41 ` John R. Culleton
  2006-05-23 15:02 ` Hans Hagen
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Gabriel @ 2006-05-23 10:22 UTC

Hello Hans,

after an upgrade I noticed that index sorting works even worse than before (tested on Czech, Chinese and Japanese, but it probably affects non-ASCII characters in general).

With TeXExec 5.4.3, all words beginning with national (accented) characters were put into a separate ("symbols") group and placed before "A". This was not good, but more or less acceptable.
With TeXExec 6.2.0, words beginning with accented characters are placed under a certain unaccented letter. My colleague found out that these words are sorted according to the first unaccented letter. This is unacceptable and unusable.

As a workaround, we try to avoid indexing words beginning with accented characters, but that is impossible in many cases.
I'd like to ask you to improve the index sorting. Could I help or contribute in some way?

Attached is a test file that creates 2 indexes from various Czech words (covering the Czech alphabet). The index should be sorted in exactly the order in which the terms are written in the file.

Thanks,
Richard

[-- Attachment #2: test.tex --]
[-- Type: application/x-tex, Size: 1485 bytes --]

_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context

* Re: Index sorting for other languages that English
  2006-05-23 10:22 Index sorting for other languages that English Richard Gabriel
@ 2006-05-23 13:41 ` John R. Culleton
  2006-05-23 15:02 ` Hans Hagen
  1 sibling, 0 replies; 6+ messages in thread
From: John R. Culleton @ 2006-05-23 13:41 UTC

Try Xindy. It can sort according to arbitrary alphabetical
orders, including Czech. It fits into the workflow much as
makeindex does, and perhaps it could be adapted to a ConTeXt
run.

-- 
John Culleton

* Re: Index sorting for other languages that English
  2006-05-23 10:22 Index sorting for other languages that English Richard Gabriel
  2006-05-23 13:41 ` John R. Culleton
@ 2006-05-23 15:02 ` Hans Hagen
  1 sibling, 0 replies; 6+ messages in thread
From: Hans Hagen @ 2006-05-23 15:02 UTC

Actually, the new texexec implementation does Czech sorting, but it's not enabled yet in ConTeXt itself (it was experimental until now).

- download the latest version (I uploaded a version that enables it)
- don't forget \mainlanguage[cz] at the top of your document
- in sort-lan.tex you can see how Czech sorting is defined

(ConTeXt adds a lot of info to the tui file in order to get sorting done)
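
The steps above can be sketched as a minimal test document. This is only an illustration, not the attached test.tex: the index entries are made up, and it assumes a ConTeXt version in which Czech sorting is already enabled. The named glyph commands (\ccaron, \rcaron) are used so that the sketch does not depend on any input encoding setup.

```tex
% Minimal sketch with hypothetical entries (not the attached test.tex).
\mainlanguage[cz]          % select Czech as the main language, as advised above

\starttext

% A few entries whose first letters exercise the Czech rules.
\index{cesta}
\index{\ccaron aj}         % "čaj"
\index{host}
\index{chata}              % "ch" is a letter of its own in Czech
\index{ruka}
\index{\rcaron eka}        % "řeka"

Some text, so that the entries land on a real page.

\page
\placeindex                % typeset the sorted index

\stoptext
```

In Czech alphabetical order these entries read cesta, čaj, host, chata, ruka, řeka, so a misplaced "čaj", "chata" or "řeka" in the typeset index indicates that the language-specific sorting is not active.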

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------


* Re: Index sorting for other languages that English
@ 2006-05-26 10:07 Richard Gabriel
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Gabriel @ 2006-05-26 10:07 UTC

Here is the test file. 
If you remove the \setupregister command, or simply set expansion=no, the sorting will work perfectly.
With expansion=yes or expansion=xml, the accented letters are sorted under "A".
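
A file along the following lines should be enough to reproduce the behaviour described above. It is a sketch with made-up entries rather than the attached test.tex; only the \setupregister line and the \mainlanguage[cz] setting come from this thread.

```tex
% Reproduction sketch with hypothetical entries (not the attached test.tex).
\mainlanguage[cz]

% Comment this line out, or set expansion=no, and the sorting is reported
% to work. With expansion=yes or expansion=xml the accented entries are
% reported to end up under "A".
\setupregister[index][expansion=xml]

\starttext

\index{auto}
\index{zima}
\index{\zcaron ena}        % "žena", belongs after all "z" entries in Czech

Some text, so that the entries land on a real page.

\page
\placeindex

\stoptext
```

Switching only the expansion value between runs keeps everything else constant, which makes it easier to see that the register expansion setting, and not the sorting rules themselves, triggers the problem.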

Below are my updated sorting rules again...

-Richard


---
\def\czsortdivisionch{ch}
\def\czsortdivisionCh{Ch}

\startmode[sortorder-cz]
    \exportsortexpansion{aacute}{a+1}
    \exportsortexpansion{Aacute}{A+1}
    \exportsortexpansion{ccaron}{c+1}
    \exportsortexpansion{Ccaron}{C+1}
    \exportsortdivision{c+1}{ccaron}
    \exportsortexpansion{dcaron}{d+1}
    \exportsortexpansion{Dcaron}{D+1}
    \exportsortdivision{d+1}{dcaron}
    \exportsortexpansion{eacute}{e+1}
    \exportsortexpansion{Eacute}{E+1}
    \exportsortexpansion{ecaron}{e+2}
    \exportsortexpansion{Ecaron}{E+2}
    \exportsortreduction{ch}{h+1}
    \exportsortexpansion{ch}{h+1}
    \exportsortreduction{Ch}{h+1}
    \exportsortexpansion{Ch}{h+1}
    \exportsortdivision{h+1}{czsortdivisionch}
    \exportsortexpansion{iacute}{i+1}
    \exportsortexpansion{Iacute}{I+1}
    \exportsortexpansion{ncaron}{n+1}
    \exportsortexpansion{Ncaron}{N+1}
    \exportsortdivision{n+1}{ncaron}
    \exportsortexpansion{oacute}{o+1}
    \exportsortexpansion{Oacute}{O+1}
    \exportsortexpansion{rcaron}{r+1}
    \exportsortexpansion{Rcaron}{R+1}
    \exportsortdivision{r+1}{rcaron}
    \exportsortexpansion{scaron}{s+1}
    \exportsortexpansion{Scaron}{S+1}
    \exportsortdivision{s+1}{scaron}
    \exportsortexpansion{tcaron}{t+1}
    \exportsortexpansion{Tcaron}{T+1}
    \exportsortdivision{t+1}{tcaron}
    \exportsortexpansion{uacute}{u+1}
    \exportsortexpansion{Uacute}{U+1}
    \exportsortexpansion{uring}{u+2}
    \exportsortexpansion{Uring}{U+2}
    \exportsortexpansion{yacute}{y+1}
    \exportsortexpansion{Yacute}{Y+1}
    \exportsortexpansion{zcaron}{z+1}
    \exportsortexpansion{Zcaron}{Z+1}
    \exportsortdivision{z+1}{zcaron}
\stopmode




[-- Attachment #2: test.tex --]
[-- Type: application/x-tex, Size: 1673 bytes --]


* Re: Index sorting for other languages that English
  2006-05-24 10:11 Richard Gabriel
@ 2006-05-24 15:55 ` Hans Hagen
  0 siblings, 0 replies; 6+ messages in thread
From: Hans Hagen @ 2006-05-24 15:55 UTC


Richard Gabriel wrote:
> Thanks Hans, it works with my test file,
> unless I set up:
>
> \setupregister[index][expansion=xml]
>
> which I need for correct processing of the XML files.
> If I simply add this command to the test TeX file (no XML), the
> Czech sorting stops working and all accented characters are placed
> under "A".
test file ...
>
> Regarding the sorting itself (sort-lan.tex):
> I found the definition of the sorting quite strange or, let's say,
> incomplete.
> It makes no sense to separate ccaron while all other accented letters
> are placed under the unaccented ones.
> I'll update the definitions, test them and send them to you.
ok 

Hans


* Re: Index sorting for other languages that English
@ 2006-05-24 10:11 Richard Gabriel
  2006-05-24 15:55 ` Hans Hagen
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Gabriel @ 2006-05-24 10:11 UTC

Thanks Hans, it works with my test file, 
unless I set up:

\setupregister[index][expansion=xml]

which I need for correct processing of the XML files.
If I simply add this command to the test TeX file (no XML), the Czech sorting stops working and all accented characters are placed under "A".

Regarding the sorting itself (sort-lan.tex):
I found the definition of the sorting quite strange or, let's say, incomplete.
It makes no sense to separate ccaron while all other accented letters are placed under the unaccented ones.
I'll update the definitions, test them and send them to you.


-Richard



