ntg-context - mailing list for ConTeXt users
From: Hans Hagen <pragma@wxs.nl>
To: "Thomas A. Schmitz" <thomas.schmitz@uni-bonn.de>
Cc: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: two buglets
Date: Sun, 03 Oct 2010 17:10:02 +0200
Message-ID: <4CA89CCA.8020705@wxs.nl>
In-Reply-To: <1D533ABC-6546-40C3-9CD4-9659A86D9A5E@uni-bonn.de>

On 3-10-2010 12:58, Thomas A. Schmitz wrote:
>
> On Oct 3, 2010, at 12:29 PM, Hans Hagen wrote:
>
>> indeed, and in a nice obscure way ...
>>
>> \setuplayout[topspace=1cm,height=middle]
>>
>> \setupbodyfont[11pt]
>>
>> \starttext
>>
>> \def\Test#1%
>> {\vbox{{\bf#1}\blank\placeregister[index][language=cz,n=1,method={#1}]}\blank}
>>
>> wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank
>>
>> \startcolumns[n=3]
>>     \Test{mc,mm,uc} \Test{mc,zm,uc} \Test{mc,pm,uc}
>>     \Test{zc,mm,uc} \Test{zc,zm,uc} \Test{zc,pm,uc}
>>     \Test{pc,mm,uc} \Test{pc,zm,uc} \Test{pc,pm,uc}
>> \stopcolumns
>>
>> \page
>>
>> wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank
>>
>> \startcolumns[n=3]
>>     \Test{mm,mc,uc} \Test{zm,mc,uc} \Test{pm,mc,uc}
>>     \Test{mm,zc,uc} \Test{zm,zc,uc} \Test{pm,zc,uc}
>>     \Test{mm,pc,uc} \Test{zm,pc,uc} \Test{pm,pc,uc}
>> \stopcolumns
>>
>> \page
>>
>> \dorecurse {2} {
>>    \page \recurselevel:
>>         \index{oá}  \index{öb}  \index{Oč}  \index{Öď}
>>         \index{oo}  \index{öo}  \index{Oo}  \index{Öo}
>>         \index{Öq}  \index{öř}  \index{Oš}  \index{oů}
>>    done
>> }
>>
>> \stoptext
>
> Give me a chance to understand :-) I tried looking in sort-ini.lua, but I couldn't figure out what the different methods meant. What do the abbreviations stand for? Also, I seem to obtain the desired case-insensitive sorting with
> method=zm,pc,uc
> but I also get spurious empty lines in the index. I'll try and come up with a minimal example.

mm, zm, pm : use mapping order, add -1, 0 or +1 for different case, and
use shape info for missing entries (similar shapes)
mc, zc, pc : use mapping order, add -1, 0 or +1 for different case
uc : unicode order

so, you define a sequence of comparisons where for instance

U   -> order u +/- 1
\"u -> order of shape u +/- 1

etc ... a bit cryptic, I admit ... some combinations give the same result,
depending on the vectors used. (Jano promised to write something up.)

numbers are sorted in a special way

so, at some point we simplify characters and sort based on their shapes,
which of course leads to clashes, so in a next step we look at the
unicodes, and so on
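
To make the idea of a comparison sequence concrete, here is a minimal sketch that reuses only the \placeregister keys from the test file quoted above. The comments paraphrase the descriptions given here and assume that m, z and p select the -1, 0 and +1 offsets; what each step really does depends on the vectors of the chosen language, so treat this as an illustration, not as a reference.

```tex
\starttext

% four entries from the quoted test file that differ only in case and accent
\index{oo} \index{Oo} \index{öo} \index{Öo} some text

\page

% the method key lists the comparisons in the order in which they are tried:
%   zm : mapping order with shape info (assumed: offset 0 for case)
%   pc : mapping order                 (assumed: offset +1 for case)
%   uc : unicode order, consulted last
\placeregister[index][language=cz,n=1,method={zm,pc,uc}]

\stoptext
```

Swapping the first two items, or replacing zm by mm or pm, gives the other combinations exercised in the quoted test file.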

>>> 2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.
>>
>> Currently Jano and I are figuring out some details (as Jano does the testing with more complex multilingual indices).
>>
>> I have no preference ... we can configure each language independently using the method key in the entries in sort-lan.lua. As I seldom consult an index I have no clue what to expect or default to, so feel free to tell me what the defaults should be. We now have predefined:
>>
>> local predefinedmethods = {
>>     [variables.before] = "mm,mc,uc",
>>     [variables.after]  = "pm,mc,uc",
>>     [variables.first]  = "pc,mm,uc",
>>     [variables.last]   = "mc,mm,uc",
>> }
>
> Hmm, if this is easy to configure, it doesn't make much of a difference. Just as a default, for English and German, I would suggest having no case-sensitivity. In German, umlauts are somewhat contentious, but nowadays, most people would sort them just like normal letters. But this is something that others on the list or on the wiki should express their opinion on.

best would be to have a test file per language with the expected order
given in comments; such tests should also include foreign entries

for instance, how would you mix german and greek in your books; we
probably need some specialized vectors then, which is possible as the
sorting language can be configured independently of the text language
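
Such a test file could look like the sketch below, modelled on the Czech test quoted earlier in this message. The expected order is copied from the "wanted result" line of that test; the layout of the comment header is a made-up convention and the method is only a placeholder. A real test would add foreign entries as well.

```tex
% language : cz
% expected : oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů

\starttext

% the entries, deliberately not in the expected order
\index{oá} \index{öb} \index{Oč} \index{Öď}
\index{oo} \index{öo} \index{Oo} \index{Öo}
\index{Öq} \index{öř} \index{Oš} \index{oů}
entries

\page

% placeholder method; replace it by the sequence under test
\placeregister[index][language=cz,n=1,method={mm,mc,uc}]

\stoptext
```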

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------