ntg-context - mailing list for ConTeXt users
* two buglets
@ 2010-02-11 15:52 Thomas A. Schmitz
  2010-02-11 17:17 ` Hans Hagen
  2010-02-11 17:19 ` Hans Hagen
  0 siblings, 2 replies; 19+ messages in thread
From: Thomas A. Schmitz @ 2010-02-11 15:52 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi all,

working on a book project with index and bibliography, I discovered two small bugs (at least I think they are bugs):

1. index sorts uppercase letters after lowercase letters. Minimal example:

\starttext

\index{Aardvark}Aardvark

\index{azygous}azygous

\page

\setupregister[index][n=1]
\placeregister[index]

\stoptext

I would expect azygous to follow Aardvark, but it is sorted before.

2. (Maybe not a bug, but somewhat unfriendly behavior): When a \cite command refers to a non-existent key and sorttype=bbl, ConTeXt bombs out with a Lua error:

! LuaTeX error ...text/tex/texmf-context/tex/context/base/bibl-tra.lua:77: attempt to compare nil with number
stack traceback:
	...text/tex/texmf-context/tex/context/base/bibl-tra.lua:77: in function <...text/tex/texmf-context/tex/context/base/bibl-tra.lua:76>
	[C]: in function 'sort'
	...text/tex/texmf-context/tex/context/base/bibl-tra.lua:84: in function 'flush'
	<main ctx instance>:1: in main chunk.
\typesetpubslist ...hacks.flush("\@@pbsorttype ")}
                                                  \doendoflist 
\dodoplacepublications ...sttrue \typesetpubslist 
                                                  \inpublistfalse \endgroup ...
l.37 \placepublications[criterium=all]
                                      
minimal example (the typo \cite[clarke199] instead of \cite[clarke1999a] is there on purpose to demonstrate the problem):

\setuppublications[state=start,
                   sorttype=bbl,
                   refcommand=authornum,
                   numbering=yes]

\setuppublicationlist[samplesize={VSdK90},totalnumber=2]

\startpublication[k=champion2004,t=book,
a={{Champion}},y=2004,
n=10,s=Cha04]
\author[]{Craige~B.}[C.~B.]{}{Champion}
\pubyear{2004}
\title{Cultural Politics in Polybius's {\em Histories}}
\city{Berkeley}
\pubname{Univ. of California Pr.}
\stoppublication

\startpublication[k=clarke1999a,t=book,
a={{Clarke}},y=1999b,
n=9,s=Cla99b]
\author[]{Katherine}[K.]{}{Clarke}
\pubyear{1999\maybeyear{b}}
\title{Between Geography and History: Hellenistic Constructions of the Roman
  World}
\city{Oxford}
\pubname{Oxford UP}
\stoppublication

\starttext

\cite[champion2004]

\cite[clarke199]

\page

\placepublications[criterium=all]

\stoptext

Could this error be handled more gracefully, i.e. intercepted?

All best

Thomas
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



* Re: two buglets
  2010-02-11 15:52 two buglets Thomas A. Schmitz
@ 2010-02-11 17:17 ` Hans Hagen
  2010-02-11 17:35   ` Thomas A. Schmitz
                     ` (2 more replies)
  2010-02-11 17:19 ` Hans Hagen
  1 sibling, 3 replies; 19+ messages in thread
From: Hans Hagen @ 2010-02-11 17:17 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Thomas A. Schmitz

On 11-2-2010 16:52, Thomas A. Schmitz wrote:
> Hi all,
>
> working on a book project with index and bibliography, I discovered two small bugs (at least I think they are bugs):
>
> 1. index sorts uppercase letters after lowercase letters. Minimal example:
>
> \starttext
>
> \index{Aardvark}Aardvark
>
> \index{azygous}azygous
>
> \page
>
> \setupregister[index][n=1]
> \placeregister[index]
>
> \stoptext
>
> I would expect azygous to follow Aardvark, but it is sorted before.


are you sure that that's the convention for english? it's easy to change 
it ...

\startluacode
sorters.mappings['en'] = {
     ["a"] =  2, ["b"] =  4, ["c"] =  6, ["d"] =  8, ["e"] = 10,
     ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
     ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
     ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
     ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
     ["z"] = 52,
     ["A"] =  1, ["B"] =  3, ["C"] =  5, ["D"] =  7, ["E"] =  9,
     ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
     ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
     ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
     ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
     ["Z"] = 51,
}
\stopluacode

\starttext
     \index{Aardvark}Aardvark \par
     \index{azygous}azygous
     \placeregister[index][n=1]
\stoptext



-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
      tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: two buglets
  2010-02-11 15:52 two buglets Thomas A. Schmitz
  2010-02-11 17:17 ` Hans Hagen
@ 2010-02-11 17:19 ` Hans Hagen
  1 sibling, 0 replies; 19+ messages in thread
From: Hans Hagen @ 2010-02-11 17:19 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Thomas A. Schmitz

On 11-2-2010 16:52, Thomas A. Schmitz wrote:

> 2. (Maybe not a bug, but somewhat unfriendly behavior): When a \cite command refers to a non-existent key and sorttype=bbl, ConTeXt bombs out with a Lua error:

so what do you expect? to drop that entry? or else, what default key to 
use?

Hans


* Re: two buglets
  2010-02-11 17:17 ` Hans Hagen
@ 2010-02-11 17:35   ` Thomas A. Schmitz
  2010-02-11 19:29     ` Hans Hagen
  2010-02-11 18:14   ` David Rogers
  2010-10-03  8:24   ` Thomas A. Schmitz
  2 siblings, 1 reply; 19+ messages in thread
From: Thomas A. Schmitz @ 2010-02-11 17:35 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:

> are you sure that that's the convention for english? it's easy to change it ...
> 
> \startluacode
> sorters.mappings['en'] = {
>    ["a"] =  2, ["b"] =  4, ["c"] =  6, ["d"] =  8, ["e"] = 10,
>    ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
>    ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
>    ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
>    ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
>    ["z"] = 52,
>    ["A"] =  1, ["B"] =  3, ["C"] =  5, ["D"] =  7, ["E"] =  9,
>    ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
>    ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
>    ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
>    ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
>    ["Z"] = 51,
> }
> \stopluacode
> 
> \starttext
>    \index{Aardvark}Aardvark \par
>    \index{azygous}azygous
>    \placeregister[index][n=1]
> \stoptext

No, I'm not sure at all. All I can say is that a quick check in my scholarly books didn't bring up a single example where uppercase and lowercase were treated differently. If I apply your code, I will have the same problem with Azygous -> aardvark. How would I write the table so that lowercase and uppercase are not distinguished at all? I tried

\startluacode
sorters.mappings['en'] = {
   ["a"] =  1, ["b"] =  2, ["c"] =  3, ["d"] =  4, ["e"] = 5,
   ["f"] = 6, ["g"] = 7, ["h"] = 8, ["i"] = 9, ["j"] = 10,
   ["k"] = 11, ["l"] = 12, ["m"] = 13, ["n"] = 14, ["o"] = 15,
   ["p"] = 16, ["q"] = 17, ["r"] = 18, ["s"] = 19, ["t"] = 20,
   ["u"] = 21, ["v"] = 22, ["w"] = 23, ["x"] = 24, ["y"] = 25,
   ["z"] = 26,
}
\stopluacode

but that didn't work.

Thomas

* Re: two buglets
  2010-02-11 17:17 ` Hans Hagen
  2010-02-11 17:35   ` Thomas A. Schmitz
@ 2010-02-11 18:14   ` David Rogers
  2010-10-03  8:24   ` Thomas A. Schmitz
  2 siblings, 0 replies; 19+ messages in thread
From: David Rogers @ 2010-02-11 18:14 UTC (permalink / raw)
  To: mailing list for ConTeXt users

* Hans Hagen <pragma@wxs.nl> [2010-02-11 18:17]:

>are you sure that that's the convention for english? it's easy to 
>change it ...

I've never seen an ordinary English index that was sorted by case.
English indexes should definitely default to case-insensitive.

(Has anyone here ever been asked for an index in English sorted by
case?)


-- 
David

* Re: two buglets
  2010-02-11 17:35   ` Thomas A. Schmitz
@ 2010-02-11 19:29     ` Hans Hagen
  2010-02-11 21:27       ` Thomas A. Schmitz
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Hagen @ 2010-02-11 19:29 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Thomas A. Schmitz

On 11-2-2010 18:35, Thomas A. Schmitz wrote:
>
> On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:
>
>> are you sure that that's the convention for english? it's easy to change it ...
>>
>> \startluacode
>> sorters.mappings['en'] = {
>>     ["a"] =  2, ["b"] =  4, ["c"] =  6, ["d"] =  8, ["e"] = 10,
>>     ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
>>     ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
>>     ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
>>     ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
>>     ["z"] = 52,
>>     ["A"] =  1, ["B"] =  3, ["C"] =  5, ["D"] =  7, ["E"] =  9,
>>     ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
>>     ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
>>     ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
>>     ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
>>     ["Z"] = 51,
>> }
>> \stopluacode
>>
>> \starttext
>>     \index{Aardvark}Aardvark \par
>>     \index{azygous}azygous
>>     \placeregister[index][n=1]
>> \stoptext
>
> No, I'm not sure at all. All I can say is that a quick check in my scholarly books didn't bring up a single example where uppercase and lowercase were treated differently. If I apply your code, I will have the same problem with Azygous ->  aardvark. How would I write the table so that lowercase and uppercase are not distinguished at all? I tried
>
> \startluacode
> sorters.mappings['en'] = {
>     ["a"] =  1, ["b"] =  2, ["c"] =  3, ["d"] =  4, ["e"] = 5,
>     ["f"] = 6, ["g"] = 7, ["h"] = 8, ["i"] = 9, ["j"] = 10,
>     ["k"] = 11, ["l"] = 12, ["m"] = 13, ["n"] = 14, ["o"] = 15,
>     ["p"] = 16, ["q"] = 17, ["r"] = 18, ["s"] = 19, ["t"] = 20,
>     ["u"] = 21, ["v"] = 22, ["w"] = 23, ["x"] = 24, ["y"] = 25,
>     ["z"] = 26,
> }
> \stopluacode
>
> but that didn't work.

just give them the same code, so "A"=1, "a"=1

(we could make that an option: upper first, lower first, mixed)
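
A minimal sketch of the "same code" idea, assuming the sorters.mappings['en'] table from the example earlier in this thread is still what the running ConTeXt version consults (an October 2010 follow-up in this thread reports that setting the table this way no longer works in newer betas). The loop only saves typing out 52 entries; the local variable names are illustrative and not part of ConTeXt:

```tex
\startluacode
-- Assumption: sorters.mappings['en'] is the table used for English,
-- as in the February 2010 example in this thread.
local mapping = { }
for i = 1, 26 do
    local lower = string.char(96 + i) -- "a" .. "z"
    local upper = string.char(64 + i) -- "A" .. "Z"
    mapping[lower] = i
    mapping[upper] = i                -- same code as lowercase, so case is ignored
end
sorters.mappings['en'] = mapping
\stopluacode

\starttext
    \index{Aardvark}Aardvark \par
    \index{azygous}azygous
    \placeregister[index][n=1]
\stoptext
```

Because "A" and "a" share one code, an entry such as Azygous should no longer sort ahead of aardvark; how ties between identical keys are broken is left to the sorter.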

Hans


* Re: two buglets
  2010-02-11 19:29     ` Hans Hagen
@ 2010-02-11 21:27       ` Thomas A. Schmitz
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas A. Schmitz @ 2010-02-11 21:27 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Feb 11, 2010, at 8:29 PM, Hans Hagen wrote:

> just give them the same code, so "A"=1, "a"=1
> 
> (we could make that an option: upper first, lower first, mixed)
> 
> Hans

Thank you, Hans, that works nicely! It would be good to have this as an option. And I would vote for having the "mixed" setting as default. I wasn't even aware that there were indexes that sort according to case.

All best

Thomas

* Re: two buglets
  2010-02-11 17:17 ` Hans Hagen
  2010-02-11 17:35   ` Thomas A. Schmitz
  2010-02-11 18:14   ` David Rogers
@ 2010-10-03  8:24   ` Thomas A. Schmitz
  2010-10-03 10:29     ` Hans Hagen
  2 siblings, 1 reply; 19+ messages in thread
From: Thomas A. Schmitz @ 2010-10-03  8:24 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi all, Hans,


On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:

>> 1. index sorts uppercase letters after lowercase letters. Minimal example:
>> 
>> \starttext
>> 
>> \index{Aardvark}Aardvark
>> 
>> \index{azygous}azygous
>> 
>> \page
>> 
>> \setupregister[index][n=1]
>> \placeregister[index]
>> 
>> \stoptext
>> 
>> I would expect azygous to follow Aardvark, but it is sorted before.
> 
> 
> are you sure that that's the convention for english? it's easy to change it ...
> 
> \startluacode
> sorters.mappings['en'] = {
>    ["a"] =  2, ["b"] =  4, ["c"] =  6, ["d"] =  8, ["e"] = 10,
>    ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
>    ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
>    ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
>    ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
>    ["z"] = 52,
>    ["A"] =  1, ["B"] =  3, ["C"] =  5, ["D"] =  7, ["E"] =  9,
>    ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
>    ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
>    ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
>    ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
>    ["Z"] = 51,
> }
> \stopluacode
> 
> \starttext
>    \index{Aardvark}Aardvark \par
>    \index{azygous}azygous
>    \placeregister[index][n=1]
> \stoptext
> 

We had this pretty old thread about sorting in indexes. AFAICS, the latest beta defaults to case-sensitive sorting. Two quick questions:

1. Is there a setup command that will make index sorting case-insensitive? The code above doesn't work anymore, so maybe you made it user-configurable now?

2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.

All best

Thomas

* Re: two buglets
  2010-10-03  8:24   ` Thomas A. Schmitz
@ 2010-10-03 10:29     ` Hans Hagen
  2010-10-03 10:58       ` Thomas A. Schmitz
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Hagen @ 2010-10-03 10:29 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Thomas A. Schmitz

On 3-10-2010 10:24, Thomas A. Schmitz wrote:
> Hi all, Hans,
>
>
> On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:
>
>>> 1. index sorts uppercase letters after lowercase letters. Minimal example:
>>>
>>> \starttext
>>>
>>> \index{Aardvark}Aardvark
>>>
>>> \index{azygous}azygous
>>>
>>> \page
>>>
>>> \setupregister[index][n=1]
>>> \placeregister[index]
>>>
>>> \stoptext
>>>
>>> I would expect azygous to follow Aardvark, but it is sorted before.
>>
>>
>> are you sure that that's the convention for english? it's easy to change it ...
>>
>> \startluacode
>> sorters.mappings['en'] = {
>>     ["a"] =  2, ["b"] =  4, ["c"] =  6, ["d"] =  8, ["e"] = 10,
>>     ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
>>     ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
>>     ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
>>     ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
>>     ["z"] = 52,
>>     ["A"] =  1, ["B"] =  3, ["C"] =  5, ["D"] =  7, ["E"] =  9,
>>     ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
>>     ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
>>     ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
>>     ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
>>     ["Z"] = 51,
>> }
>> \stopluacode
>>
>> \starttext
>>     \index{Aardvark}Aardvark \par
>>     \index{azygous}azygous
>>     \placeregister[index][n=1]
>> \stoptext
>>
>
> we had this pretty old thread about sorting in indexes. AFAICS, the latest beta defaults to case-sensitive sorting. Two quick questions:
>
> 1. Is there a setup command that will make index sorting case-insensitive? The code above doesn't work anymore, so maybe you made it user-configurable now?

indeed, and in a nice obscure way ...

\setuplayout[topspace=1cm,height=middle]

\setupbodyfont[11pt]

\starttext

\def\Test#1%
  {\vbox{{\bf#1}\blank\placeregister[index][language=cz,n=1,method={#1}]}\blank}

wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank

\startcolumns[n=3]
     \Test{mc,mm,uc} \Test{mc,zm,uc} \Test{mc,pm,uc}
     \Test{zc,mm,uc} \Test{zc,zm,uc} \Test{zc,pm,uc}
     \Test{pc,mm,uc} \Test{pc,zm,uc} \Test{pc,pm,uc}
\stopcolumns

\page

wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank

\startcolumns[n=3]
     \Test{mm,mc,uc} \Test{zm,mc,uc} \Test{pm,mc,uc}
     \Test{mm,zc,uc} \Test{zm,zc,uc} \Test{pm,zc,uc}
     \Test{mm,pc,uc} \Test{zm,pc,uc} \Test{pm,pc,uc}
\stopcolumns

\page

\dorecurse {2} {
    \page \recurselevel:
         \index{oá}  \index{öb}  \index{Oč}  \index{Öď}
         \index{oo}  \index{öo}  \index{Oo}  \index{Öo}
         \index{Öq}  \index{öř}  \index{Oš}  \index{oů}
    done
}

\stoptext

> 2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.

Currently Jano and I are figuring out some details (as Jano does the 
testing with more complex multilingual indices).

I have no preference ... we can configure each language independently 
using the method key in the entries in sort-lan.lua. As I seldom consult 
an index I have no clue what to expect or default to, so feel free to 
tell me what the defaults should be. We now have predefined:

local predefinedmethods = {
     [variables.before] = "mm,mc,uc",
     [variables.after]  = "pm,mc,uc",
     [variables.first]  = "pc,mm,uc",
     [variables.last]   = "mc,mm,uc",
}

Hans


-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: two buglets
  2010-10-03 10:29     ` Hans Hagen
@ 2010-10-03 10:58       ` Thomas A. Schmitz
  2010-10-03 15:10         ` Hans Hagen
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas A. Schmitz @ 2010-10-03 10:58 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users


On Oct 3, 2010, at 12:29 PM, Hans Hagen wrote:

> indeed, and in a nice obscure way ...
> 
> \setuplayout[topspace=1cm,height=middle]
> 
> \setupbodyfont[11pt]
> 
> \starttext
> 
> \def\Test#1%
> {\vbox{{\bf#1}\blank\placeregister[index][language=cz,n=1,method={#1}]}\blank}
> 
> wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank
> 
> \startcolumns[n=3]
>    \Test{mc,mm,uc} \Test{mc,zm,uc} \Test{mc,pm,uc}
>    \Test{zc,mm,uc} \Test{zc,zm,uc} \Test{zc,pm,uc}
>    \Test{pc,mm,uc} \Test{pc,zm,uc} \Test{pc,pm,uc}
> \stopcolumns
> 
> \page
> 
> wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank
> 
> \startcolumns[n=3]
>    \Test{mm,mc,uc} \Test{zm,mc,uc} \Test{pm,mc,uc}
>    \Test{mm,zc,uc} \Test{zm,zc,uc} \Test{pm,zc,uc}
>    \Test{mm,pc,uc} \Test{zm,pc,uc} \Test{pm,pc,uc}
> \stopcolumns
> 
> \page
> 
> \dorecurse {2} {
>   \page \recurselevel:
>        \index{oá}  \index{öb}  \index{Oč}  \index{Öď}
>        \index{oo}  \index{öo}  \index{Oo}  \index{Öo}
>        \index{Öq}  \index{öř}  \index{Oš}  \index{oů}
>   done
> }
> 
> \stoptext

Give me a chance to understand :-) I tried looking in sort-ini.lua, but I couldn't figure out what the different methods meant. What do the abbreviations stand for? Also, I seem to obtain the desired case-insensitive sorting with 
method=zm,pc,uc
but I also get spurious empty lines in the index. I'll try and come up with a minimal example.

> 
>> 2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.
> 
> Currently Jano and I are figuring out some details (as Jano does the testing with more complex multilingual indices).
> 
> I have no preference ... we can configure each language independently using the method key in the entries in sort-lan.lua. As I seldom consult an index I have no clue what to expect or default to, so feel free to tell me what the defaults should be. We now have predefined:
> 
> local predefinedmethods = {
>    [variables.before] = "mm,mc,uc",
>    [variables.after]  = "pm,mc,uc",
>    [variables.first]  = "pc,mm,uc",
>    [variables.last]   = "mc,mm,uc",
> }

Hmm, if this is easy to configure, it doesn't make much of a difference. Just as a default, for English and German, I would suggest having no case-sensitivity. In German, umlauts are somewhat contentious, but nowadays, most people would sort them just like normal letters. But this is something that others on the list or on the wiki should express their opinion on.

Thanks, and all best

Thomas

* Re: two buglets
  2010-10-03 10:58       ` Thomas A. Schmitz
@ 2010-10-03 15:10         ` Hans Hagen
  2010-10-03 15:43           ` Thomas A. Schmitz
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Hagen @ 2010-10-03 15:10 UTC (permalink / raw)
  To: Thomas A. Schmitz; +Cc: mailing list for ConTeXt users

On 3-10-2010 12:58, Thomas A. Schmitz wrote:
>
> On Oct 3, 2010, at 12:29 PM, Hans Hagen wrote:
>
>> indeed, and in a nice obscure way ...
>>
>> \setuplayout[topspace=1cm,height=middle]
>>
>> \setupbodyfont[11pt]
>>
>> \starttext
>>
>> \def\Test#1%
>> {\vbox{{\bf#1}\blank\placeregister[index][language=cz,n=1,method={#1}]}\blank}
>>
>> wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank
>>
>> \startcolumns[n=3]
>>     \Test{mc,mm,uc} \Test{mc,zm,uc} \Test{mc,pm,uc}
>>     \Test{zc,mm,uc} \Test{zc,zm,uc} \Test{zc,pm,uc}
>>     \Test{pc,mm,uc} \Test{pc,zm,uc} \Test{pc,pm,uc}
>> \stopcolumns
>>
>> \page
>>
>> wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank
>>
>> \startcolumns[n=3]
>>     \Test{mm,mc,uc} \Test{zm,mc,uc} \Test{pm,mc,uc}
>>     \Test{mm,zc,uc} \Test{zm,zc,uc} \Test{pm,zc,uc}
>>     \Test{mm,pc,uc} \Test{zm,pc,uc} \Test{pm,pc,uc}
>> \stopcolumns
>>
>> \page
>>
>> \dorecurse {2} {
>>    \page \recurselevel:
>>         \index{oá}  \index{öb}  \index{Oč}  \index{Öď}
>>         \index{oo}  \index{öo}  \index{Oo}  \index{Öo}
>>         \index{Öq}  \index{öř}  \index{Oš}  \index{oů}
>>    done
>> }
>>
>> \stoptext
>
> Give me a chance to understand :-) I tried looking in sort-ini.lua, but I couldn't figure out what the different methods meant. What do the abbreviations stand for? Also, I seem to obtain the desired case-insensitive sorting with
> method=zm,pc,uc
> but I also get spurious empty lines in the index. I'll try and come up with a minimal example.

mm zm pm : use mapping order, add -1, 0, +1 (respectively) for a different
           case, and use shape info for missing entries (similar shapes)
mc zc pc : use mapping order, add -1, 0, +1 (respectively) for a different case
uc       : unicode order

so, you define a sequence of comparisons where for instance

U   -> order u +/- 1
\"u -> order of shape u +/- 1

etc. ... a bit cryptic, I admit ... some combinations give the same result 
depending on the vectors used. (Jano promised to write up something.)

numbers are sorted in a special way

so, at some point we simplify characters and start looking at shapes and 
sort based on shapes, which of course leads to clashes, so in a next step 
we look at unicodes, etc.
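
A small test file for trying one such sequence, offered as a sketch only. It assumes the method key is accepted by \placeregister as in the Czech test file earlier in this thread, and it uses the zm,pc,uc sequence that was reported in this thread to give case-insensitive results. The reading in the comments (zm compares by mapping order with a zero case offset and shape fallback, pc then separates entries that differ in case, uc settles what is left) is an interpretation of the description above, not confirmed behavior:

```tex
\starttext
    \index{Aardvark}Aardvark \par
    \index{azygous}azygous   \par
    \index{Azygous}Azygous   \par
    \index{aardvark}aardvark
    \page
    % method is a comma-separated sequence of comparisons (see above);
    % assumed reading: zm = mapping order, case offset 0, shape fallback
    %                  pc = mapping order, case offset +1
    %                  uc = unicode order
    \placeregister[index][n=1,method={zm,pc,uc}]
\stoptext
```

Swapping in other sequences, for instance mm,mc,uc or pc,mm,uc from the predefined list, and comparing the resulting registers is probably the quickest way to see what each step contributes.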

>>> 2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.
>>
>> Currently Jano and I are figuring out some details (as Jano does the testing with more complex multilingual indices).
>>
>> I have no preference ... we can configure each language independently using the method key in the entries in sort-lan.lua. As I seldom consult an index I have no clue what to expect or default to, so feel free to tell me what the defaults should be. We now have predefined:
>>
>> local predefinedmethods = {
>>     [variables.before] = "mm,mc,uc",
>>     [variables.after]  = "pm,mc,uc",
>>     [variables.first]  = "pc,mm,uc",
>>     [variables.last]   = "mc,mm,uc",
>> }
>
> Hmm, if this is easy to configure, it doesn't make much of a difference. Just as a default, for English and German, I would suggest having no case-sensitivity. In German, umlauts are somewhat contentious, but nowadays, most people would sort them just like normal letters. But this is something that others on the list or on the wiki should express their opinion on.

best would be to have a test file per language with in comments the 
expected order; such tests should also provide foreign entries

for instance, how would you mix german and greek in your books; we 
probably need some specialized vectors then, which is possible as the 
sorting language can be configured independent from the text language

Hans


* Re: two buglets
  2010-10-03 15:10         ` Hans Hagen
@ 2010-10-03 15:43           ` Thomas A. Schmitz
  2010-10-05 12:15             ` Philipp Gesang
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas A. Schmitz @ 2010-10-03 15:43 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users


On Oct 3, 2010, at 5:10 PM, Hans Hagen wrote:

> 
> mm zm pm : use mapping order, add -1, 0, +1 to different case and use shape info for missing entries (similar shapes)
> mc zc pc : use mapping order, add -1, 0, +1 to different case
> uc       : unicode order
> 
> so, you define a sequence of comparisons where for instance
> 
> U   -> order u +/- 1
> \"u -> order of shape u +/- 1
> 
> etc .. a bit cryptic I admit ... some combinations give the same result depending on the vectors used. (Jano promised to write up something.)
> 
> numbers are sorted in a special way
> 
> so, at some point we simplify characters and start looking at shapes and sort based on shapes which of course leads to clashes so in a next step we look at unicodes etc etc
> 

OK, that makes sense. I'll play with it, but having a few choice pages on the wiki would be great!

>>>> 
> 
> best would be to have a test file per language with in comments the expected order; such tests should also provide foreign entries
> 
> for instance, how would you mix german and greek in your books; we probably need some specialized vectors then, which is possible as the sorting language can be configured independent from the text language
> 
OK, I'll write something for German and English, but the thing is that we need more input on what users expect. For mixtures with foreign languages, there might not be generally accepted rules at all, so people will define something on an ad-hoc basis.

For Greek: I just looked at a dozen books here on my shelf. Most English books have a separate index for Greek terms; when they sort Greek terms with English words, they use transliteration. The problem with polytonic Greek is that so many different unicode characters need to have the same sort entry. If I ever see the necessity of setting this up, I'll be in touch off-list, but it's such an unusual thing that I think you shouldn't bother now.

All best

Thomas


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: two buglets
  2010-10-03 15:43           ` Thomas A. Schmitz
@ 2010-10-05 12:15             ` Philipp Gesang
  2010-10-05 12:39               ` Hans Hagen
  2010-10-05 13:29               ` Thomas A. Schmitz
  0 siblings, 2 replies; 19+ messages in thread
From: Philipp Gesang @ 2010-10-05 12:15 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On 2010-10-03 <17:43:21>, Thomas A. Schmitz wrote:
> OK, I'll write something for German and English, but the thing
> is that we need more input what users expect. For mixtures with
> foreign languages, there might not be generally accepted rules at
> all, so people will define something on an ad-hoc basis.

Hi Thomas and others,

technically speaking the problem is solved by ISO 14651.[1]

In praxi multilingual sorting depends on local rules, of
which “One index per script|language.” seems to be the most
common.

Some time ago I made an lpeg from the bnf in [1]. It matches the
collation rules from [2], but as I couldn’t figure out how to map
them onto context’s sorting mechanism I never got around to
actually capturing the information. As I won’t be having the time
to try it with the new structure of sort-lan I guess I’ll just
attach the peg grammar for anyone to use as a starting point.
Unicode collation would be great to have in context.

> transliteration. The problem with polytonic Greek is that so many
> different unicode characters need to have the same sort entry. If

Isn’t that just what the Greek rules in sort-lan.lua do? If not
then it would be a bug.

····startsnippet·················································

definitions["gr"] = {
    entries = {
        ["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
        ["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α", ["ἆ"] = "α",
        ["ἁ"] = "α", ["ἅ"] = "α", ["ἃ"] = "α", ["ἇ"] = "α", ["ᾁ"] = "α",
        ["ᾴ"] = "α", ["ᾲ"] = "α", ["ᾷ"] = "α", ["ᾄ"] = "α", ["ᾂ"] = "α",
        ["ᾅ"] = "α", ["ᾃ"] = "α", ["ᾆ"] = "α", ["ᾇ"] = "α", ["β"] = "β",

····stopsnippet··················································
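
To make the idea concrete, here is a small sketch of how an entries table of this kind collapses several polytonic variants onto one base letter. It is plain Lua, not the code in sort-lan.lua: the table is a tiny illustrative subset, and sortkey and the utf8char pattern are hypothetical names introduced only for this example.

```lua
-- Illustrative subset only; the real table in sort-lan.lua is much longer.
local entries = {
    ["α"] = "α", ["ἀ"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α",
    ["ε"] = "ε", ["ἐ"] = "ε", ["ὲ"] = "ε",
}

-- Matches one UTF-8 encoded character: a lead byte plus its continuation bytes.
local utf8char = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: replace every character by its base entry and
-- leave characters without an entry untouched.
local function sortkey(word)
    return (word:gsub(utf8char, function(c) return entries[c] or c end))
end

-- Both spellings end up with the same key, so they share a sort position.
print(sortkey("ἀνὰ") == sortkey("ανα"))
```

Characters that have no entry (here the ν) pass through unchanged, which is why every variant that should sort together needs its own line in the table.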

Always nice to have a decent discussion on sorting ;)

Philipp


[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip
[2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

[-- Attachment #1.1.2: iso14651-parser.lua --]
[-- Type: text/plain, Size: 3747 bytes --]

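-- Standalone Lua needs the module loaded explicitly; inside LuaTeX lpeg is built in.
local lpeg = require "lpeg"

local C, Cs, Ct, P, R, S, V, match = lpeg.C, lpeg.Cs, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S, lpeg.V, lpeg.match

-- Grammar for ISO 14651 collation tables; it only recognizes the syntax and captures nothing yet.
local rules = P{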
    [1] = "weight_table",

    -- Define collation tables as sequences of lines

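    -- tailored_table comes first: both alternatives can match the empty string,
    -- so with PEG ordered choice the second one is never tried, and
    -- tailored_table already accepts every simple_line.
    weight_table = V"tailored_table" + V"common_template_table",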
    common_template_table = V"simple_line"^0,
    tailored_table = V"table_line"^0,

    -- Define the line types

    simple_line = (V"symbol_definition" + V"collating_element" +
                   V"weight_assignment" + V"order_end")^-1 * V"line_completion" --/ function (first) io.write("simple: "..first) end
                   ,
    --table_line = V"simple_line" + V"tailoring_line",
    table_line = V"tailoring_line" + V"simple_line",
    tailoring_line = (V"reorder_after" + V"order_start" + V"reorder_end" +
                      V"section_definition" + V"reorder_section_after") *
                      V"line_completion" --/ function (first) io.write("tailoring: "..first) end
                      ,

    -- Define the basic syntax for collation weighting

    symbol_definition = P"collating-symbol" * V"space"^1 * V"symbol_element",
    symbol_element = V"symbol"-V"symbol_range" + V"symbol_range",
    symbol_range = V"symbol" * P".." * V"symbol",
    symbol = V"simple_symbol" + V"ucs_symbol",
    ucs_symbol = (P"<U"  * V"one_to_eight_digit_hex_string" * P">") +
                 (P"<U-" * V"one_to_eight_digit_hex_string" * P">"),
    simple_symbol = P"<" * V"identifier" * P">",
    collating_element = P"collating-element" * V"space"^1 * V"symbol" * V"space"^1 *
                        P"from" * V"space"^1 * V"quoted_symbol_sequence",
    quoted_symbol_sequence = P'"' * V"simple_weight"^1 * P'"',
    --weight_assignment = V"simple_weight" + V"symbol_weight",
    weight_assignment = V"symbol_weight" + V"simple_weight",
    simple_weight = V"symbol_element" + P"UNDEFINED",
    symbol_weight = V"symbol_element" * V"space"^1 * V"weight_list",
    weight_list = V"level_token" * (V"semicolon" * V"level_token")^0,
    level_token = V"symbol_group" + P"IGNORE",
    symbol_group = V"symbol_element" + V"quoted_symbol_sequence",
    order_end = P"order_end",

    -- Define the tailoring syntax

    reorder_after = P"reorder-after" * V"space"^1 * V"target_symbol",
    target_symbol = V"symbol",
    order_start = P"order_start" * V"space"^1 * V"multiple_level_direction",
    multiple_level_direction = V"direction" * (V"semicolon" * V"direction")^0 * P",position"^-1,
    direction = P"forward" + P"backward",
    reorder_end = P"reorder-end",
    section_definition = V"section_definition_simple" + V"section_definition_list",
    section_definition_simple = P"section" * V"space"^1 * V"section_identifier",
    section_identifier = V"identifier",
    section_definition_list = P"section" * V"space"^1 * V"section_identifier" * V"space"^1 * V"symbol_list",
    symbol_list = V"symbol_element" * (V"semicolon" * V"symbol_element")^0,
    reorder_section_after = P"reorder-section-after" * V"space"^1 * V"section_identifier" * V"space"^1 * V"target_symbol",

    -- Define low-level tokens used by the rest of the syntax

    identifier = (V"letter" + V"digit") * V"id_part"^0,
    id_part = V"letter" + V"digit" + S"-_",
    line_completion = V"space"^0 * V"comment"^-1 * V"EOL",
    comment = V"comment_char" * V"character"^0,
    one_to_eight_digit_hex_string = V"hex_upper" * V"hex_upper"^-7, -- at least one, at most eight digits
    hex_numeric_string = V"hex_upper"^1,
    space = S" \t",
    semicolon = P";",
    comment_char = P"%",
    digit = R"09",
    hex_upper = V"digit" + S"ABCDEF",
    letter = R"az" + R"AZ",
    EOL = P"\n",
    character = 1-V"EOL",
}

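-- Fail early with a readable message if the collation table is missing.
local f = assert(io.open("iso14651.txt", "r"))
local tab = f:read("*a")
f:close()

--rules:print()
-- match returns the position just past the last line the grammar could parse
print(rules:match(tab))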


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: two buglets
  2010-10-05 12:15             ` Philipp Gesang
@ 2010-10-05 12:39               ` Hans Hagen
  2010-10-05 13:29               ` Thomas A. Schmitz
  1 sibling, 0 replies; 19+ messages in thread
From: Hans Hagen @ 2010-10-05 12:39 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Philipp Gesang

On 5-10-2010 2:15, Philipp Gesang wrote:

> [1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip
> [2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt

I'll have a look at it when I've time for it (I didn't know that doc; 
it's more fun figuring it out oneself anyway).

Hans


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: two buglets
  2010-10-05 12:15             ` Philipp Gesang
  2010-10-05 12:39               ` Hans Hagen
@ 2010-10-05 13:29               ` Thomas A. Schmitz
  2010-10-05 21:17                 ` Philipp Gesang
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas A. Schmitz @ 2010-10-05 13:29 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Oct 5, 2010, at 2:15 PM, Philipp Gesang wrote:
> 
> Hi Thomas and others,
> 
> technically speaking the problem is solved by ISO 14651.[1]
> 
> In praxi multilingual sorting depends on local rules, of
> which “One index per script|language.” seems to be the most
> common.

Yes, that's what I was trying to say. In practice, hardly anyone will want an individual index for Spanish if they have just two Spanish words in an English book. And someone (me) might say that they want three Greek terms in their German index at logical places.

> 
> Some time ago I made an lpeg from the bnf in [1]. It matches the
> collation rules from [2], but as I couldn’t figure out how to map
> them onto context’s sorting mechanism I never got around to
> actually capturing the information. As I won’t be having the time
> to try it with the new structure of sort-lan I guess I’ll just
> attach the peg grammar for anyone to use as a starting point.
> Unicode collation would be great to have in context.
> 
>> transliteration. The problem with polytonic Greek is that so many
>> different unicode characters need to have the same sort entry. If
> 
> Isn’t that just what the Greek rules in sort-lan.lua do? If not
> then it would be a bug.
> 
Oh yes, you're right, I missed that. Thanks for pointing that out!

Thomas


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: two buglets
  2010-10-05 13:29               ` Thomas A. Schmitz
@ 2010-10-05 21:17                 ` Philipp Gesang
  2010-10-05 21:27                   ` Hans Hagen
  0 siblings, 1 reply; 19+ messages in thread
From: Philipp Gesang @ 2010-10-05 21:17 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On 2010-10-05 <15:29:38>, Thomas A. Schmitz wrote:
>                                            And someone (me) might
> say that they want three Greek terms in their German index at
> logical places. 

Try the definitions in the attachment. For three words only they
will be fine. But if the count increases you will soon run into a
situation where it’s not easy to determine where those “logical
places” are. E.g. would you want the letter “υ” under latin “y”
or “u”? Phonologically (might depend on your stance on historical
phonology -- could be a minefield) you might find it reasonable
to treat “ου” as “u” (or “ū” if that matters), but your audience
might expect it at the graphetic location, latin “ou”, instead.
As you can see in the example, when mapping both omega and
omicron onto Latin “o” the result is that “χρῶμα” will appear
before “Χρόνος”, which looks a bit odd.

This ad-hoc solution is troublesome when two words (a German and
a Greek one) occupy the same spot in the search order, like
“Polyneikes” and “Πολυνείκης”. My index output is:

Polyneikes 2
Πολυνείκης 2
Polyneikes 3
Πολυνείκης 3

which should rather be

Polyneikes 2, 3
Πολυνείκης 2, 3

I guess there is some testing going on in order to determine
whether to proceed with the current entry or switch to the next
one. The position is the same, however the comparison with the
last item fails and a new one is created instead. (Only
guessing.)

If you run into this problem you might have to ask Hans for
advice.

Hth,

Philipp


-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

[-- Attachment #1.1.2: greek-german.tex --]
[-- Type: text/x-tex, Size: 6773 bytes --]


\startluacode
sorters.definitions["de-gr"] = {
    method  = "before",
    replacements = {
        -- German
        { "ä", 'ae' }, { "Ä", 'Ae' },
        { "ö", 'oe' }, { "Ö", 'Oe' },
        { "ü", 'ue' }, { "Ü", 'Ue' },
        { "ß", 'ss' },

        -- Greek
        { "α", "a"  }, { "ά", "a"  }, { "ὰ", "a"  }, { "ᾶ", "a"  }, { "ᾳ", "a"  },
        { "ἀ", "a"  }, { "ἁ", "a"  }, { "ἄ", "a"  }, { "ἂ", "a"  }, { "ἆ", "a"  },
        { "ἁ", "a"  }, { "ἅ", "a"  }, { "ἃ", "a"  }, { "ἇ", "a"  }, { "ᾁ", "a"  },
        { "ᾴ", "a"  }, { "ᾲ", "a"  }, { "ᾷ", "a"  }, { "ᾄ", "a"  }, { "ᾂ", "a"  },
        { "ᾅ", "a"  }, { "ᾃ", "a"  }, { "ᾆ", "a"  }, { "ᾇ", "a"  }, { "β", "b"  },
        { "γ", "g"  }, { "δ", "d"  }, { "ε", "e"  }, { "έ", "e"  }, { "ὲ", "e"  },
        { "ἐ", "e"  }, { "ἔ", "e"  }, { "ἒ", "e"  }, { "ἑ", "e"  }, { "ἕ", "e"  },
        { "ἓ", "e"  }, { "ζ", "z"  }, { "η", "e"  }, { "η", "e"  }, { "ή", "e"  },
        { "ὴ", "e"  }, { "ῆ", "e"  }, { "ῃ", "e"  }, { "ἠ", "e"  }, { "ἤ", "e"  },
        { "ἢ", "e"  }, { "ἦ", "e"  }, { "ᾐ", "e"  }, { "ἡ", "e"  }, { "ἥ", "e"  },
        { "ἣ", "e"  }, { "ἧ", "e"  }, { "ᾑ", "e"  }, { "ῄ", "e"  }, { "ῂ", "e"  },
        { "ῇ", "e"  }, { "ᾔ", "e"  }, { "ᾒ", "e"  }, { "ᾕ", "e"  }, { "ᾓ", "e"  },
        { "ᾖ", "e"  }, { "ᾗ", "e"  }, { "θ", "th" }, { "ι", "i"  }, { "ί", "i"  },
        { "ὶ", "i"  }, { "ῖ", "i"  }, { "ἰ", "i"  }, { "ἴ", "i"  }, { "ἲ", "i"  },
        { "ἶ", "i"  }, { "ἱ", "i"  }, { "ἵ", "i"  }, { "ἳ", "i"  }, { "ἷ", "i"  },
        { "ϊ", "i"  }, { "ΐ", "i"  }, { "ῒ", "i"  }, { "ῗ", "i"  }, { "κ", "k"  },
        { "λ", "l"  }, { "μ", "m"  }, { "ν", "n"  }, { "ξ", "x"  }, { "ο", "o"  },
        { "ό", "o"  }, { "ὸ", "o"  }, { "ὀ", "o"  }, { "ὄ", "o"  }, { "ὂ", "o"  },
        { "ὁ", "o"  }, { "ὅ", "o"  }, { "ὃ", "o"  }, { "π", "p"  }, { "ρ", "r"  },
        { "ῤ", "r"  }, { "ῥ", "r"  }, { "σ", "s"  }, { "ς", "s"  }, { "τ", "t"  },
        { "υ", "y"  }, { "ύ", "y"  }, { "ὺ", "y"  }, { "ῦ", "y"  }, { "ὐ", "y"  },
        { "ὔ", "y"  }, { "ὒ", "y"  }, { "ὖ", "y"  }, { "ὑ", "y"  }, { "ὕ", "y"  },
        { "ὓ", "y"  }, { "ὗ", "y"  }, { "ϋ", "y"  }, { "ΰ", "y"  }, { "ῢ", "y"  },
        { "ῧ", "y"  }, { "φ", "ph" }, { "χ", "ch" }, { "ψ", "ps" }, { "ω", "o"  },
        { "ώ", "o"  }, { "ὼ", "o"  }, { "ῶ", "o"  }, { "ῳ", "o"  }, { "ὠ", "o"  },
        { "ὤ", "o"  }, { "ὢ", "o"  }, { "ὦ", "o"  }, { "ᾠ", "o"  }, { "ὡ", "o"  },
        { "ὥ", "o"  }, { "ὣ", "o"  }, { "ὧ", "o"  }, { "ᾡ", "o"  }, { "ῴ", "o"  },
        { "ῲ", "o"  }, { "ῷ", "o"  }, { "ᾤ", "o"  }, { "ᾢ", "o"  }, { "ᾥ", "o"  },
        { "ᾣ", "o"  }, { "ᾦ", "o"  }, { "ᾧ", "o"  },

        { "Α", "A"  }, { "Ά", "A"  }, { "Ὰ", "A"  }, { "ᾼ", "A"  }, { "Ἀ", "A"  },
        { "Ἁ", "A"  }, { "Ἄ", "A"  }, { "Ἂ", "A"  }, { "Ἆ", "A"  }, { "Ἁ", "A"  },
        { "Ἅ", "A"  }, { "Ἃ", "A"  }, { "Ἇ", "A"  }, { "ᾉ", "A"  }, { "ᾌ", "A"  },
        { "ᾊ", "A"  }, { "ᾍ", "A"  }, { "ᾋ", "A"  }, { "ᾎ", "A"  }, { "ᾏ", "A"  },
        { "Β", "B"  }, { "Γ", "G"  }, { "Δ", "D"  }, { "Ε", "E"  }, { "Έ", "E"  },
        { "Ὲ", "E"  }, { "Ἐ", "E"  }, { "Ἔ", "E"  }, { "Ἒ", "E"  }, { "Ἑ", "E"  },
        { "Ἕ", "E"  }, { "Ἓ", "E"  }, { "Ζ", "Z"  }, { "Η", "E"  }, { "Η", "E"  },
        { "Ή", "E"  }, { "Ὴ", "E"  }, { "ῌ", "E"  }, { "Ἠ", "E"  }, { "Ἤ", "E"  },
        { "Ἢ", "E"  }, { "Ἦ", "E"  }, { "ᾘ", "E"  }, { "Ἡ", "E"  }, { "Ἥ", "E"  },
        { "Ἣ", "E"  }, { "Ἧ", "E"  }, { "ᾙ", "E"  }, { "ᾜ", "E"  }, { "ᾚ", "E"  },
        { "ᾝ", "E"  }, { "ᾛ", "E"  }, { "ᾞ", "E"  }, { "ᾟ", "E"  }, { "Θ", "Th" },
        { "Ι", "I"  }, { "Ί", "I"  }, { "Ὶ", "I"  }, { "Ἰ", "I"  }, { "Ἴ", "I"  },
        { "Ἲ", "I"  }, { "Ἶ", "I"  }, { "Ἱ", "I"  }, { "Ἵ", "I"  }, { "Ἳ", "I"  },
        { "Ἷ", "I"  }, { "Ϊ", "I"  }, { "Κ", "K"  }, { "Λ", "L"  }, { "Μ", "M"  },
        { "Ν", "N"  }, { "Ξ", "X"  }, { "Ο", "O"  }, { "Ό", "O"  }, { "Ὸ", "O"  },
        { "Ὀ", "O"  }, { "Ὄ", "O"  }, { "Ὂ", "O"  }, { "Ὁ", "O"  }, { "Ὅ", "O"  },
        { "Ὃ", "O"  }, { "Π", "P"  }, { "Ρ", "R"  }, { "Ῥ", "R"  }, { "Σ", "S"  },
        { "Σ", "S"  }, { "Τ", "T"  }, { "Υ", "Y"  }, { "Ύ", "Y"  }, { "Ὺ", "Y"  },
        { "Ὑ", "Y"  }, { "Ὕ", "Y"  }, { "Ὓ", "Y"  }, { "Ὗ", "Y"  }, { "Ϋ", "Y"  },
        { "Φ", "Ph" }, { "Χ", "Ch" }, { "Ψ", "Ps" }, { "Ω", "O"  }, { "Ώ", "O"  },
        { "Ὼ", "O"  }, { "ῼ", "O"  }, { "Ὠ", "O"  }, { "Ὤ", "O"  }, { "Ὢ", "O"  },
        { "Ὦ", "O"  }, { "ᾨ", "O"  }, { "Ὡ", "O"  }, { "Ὥ", "O"  }, { "Ὣ", "O"  },
        { "Ὧ", "O"  }, { "ᾩ", "O"  }, { "ᾬ", "O"  }, { "ᾪ", "O"  }, { "ᾭ", "O"  },
        { "ᾫ", "O"  }, { "ᾮ", "O"  }, { "ᾯ", "O"  },
    },

    entries = {
        ["a"] = "a", ["b"] = "b", ["c"] = "c", ["d"] = "d", ["e"] = "e",
        ["f"] = "f", ["g"] = "g", ["h"] = "h", ["i"] = "i", ["j"] = "j",
        ["k"] = "k", ["l"] = "l", ["m"] = "m", ["n"] = "n", ["o"] = "o",
        ["p"] = "p", ["q"] = "q", ["r"] = "r", ["s"] = "s", ["t"] = "t",
        ["u"] = "u", ["v"] = "v", ["w"] = "w", ["x"] = "x", ["y"] = "y",
        ["z"] = "z",
    },
    orders = {
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
        "k", "l", "m", "n", "o", "p", "q", "r", "s", "t",
        "u", "v", "w", "x", "y", "z",
    },
}

\stopluacode

\unexpanded\def\ind#1{\index{#1} #1}

\setupbodyfont[cmu]

\starttext

\startcolumns[n=3]
  \placeregister[index][language=de-gr,method={zm,pc,uc}]
\stopcolumns

\dorecurse {2} {
  \page\title{Iteration No. \recurselevel}
    
    \ind{Adrastos}      \ind{Ἄδραστος}\par
    \ind{Amphiaraos}    \ind{Ἀμφιάραος}\par
    \ind{Hippomedon}    \ind{Ἱππομέδων}\par
    \ind{Kapaneus}      \ind{Καπανεύς}\par
    \ind{Parthenopaios} \ind{Παρθενοπαῖος}\par
    \ind{Polyneikes}    \ind{Πολυνείκης}\par
    \ind{Tydeus}        \ind{Τυδεύς}\par

    \ind{ἀναλύειν}    \ind{Ἀναλυτικὰ}
    \ind{analysiert}  \ind{Analytik}  \ind{analytisch}\par

    \ind{ψυχὴ} \ind{Psyche} \ind{psychisch} \par

    \ind{Χρόνος}  \ind{χρόνιος}     \ind{χρῶμα}
    \ind{Chronos} \ind{Chronometer} \ind{chronologisch} \ind{chronisch} 
    \ind{Chrom}   \ind{Chromatik}   \ind{chromatisch}
}

\stoptext



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: two buglets
  2010-10-05 21:17                 ` Philipp Gesang
@ 2010-10-05 21:27                   ` Hans Hagen
  2010-10-05 21:55                     ` Philipp Gesang
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Hagen @ 2010-10-05 21:27 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Philipp Gesang

On 5-10-2010 11:17, Philipp Gesang wrote:

> I guess there is some testing going on in order to determine
> whether to proceed with the current entry or switch to the next
> one. The position is the same, however the comparison with the
> last item fails and a new one is created instead. (Only
> guessing.)

it's a sequence of tests per comparison, like

Polyneikes
     polyneikes % lowercased
     polyneikes % shapes
     Polyneikes % unicode

Πολυνείκης
     Πολυνείκης % lowercased
     polyneikes % shapes
     Πολυνείκης % unicode

casing and shapes depend on the mapping vectors, and the order can be 
influenced; you can see this in action with

\enabletrackers[sorters.tests]
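
A toy model of such a sequence of tests, in plain Lua. It is not ConTeXt's implementation: the forms table simply copies the derived strings listed above, and the stage names lower, shape and unicode are hypothetical labels for this sketch, not the method codes used by the sorter.

```lua
-- Derived forms per entry, copied from the listing above (toy data).
local forms = {
    ["Polyneikes"] = { lower = "polyneikes", shape = "polyneikes", unicode = "Polyneikes" },
    ["Πολυνείκης"] = { lower = "Πολυνείκης", shape = "polyneikes", unicode = "Πολυνείκης" },
}

-- Compare a and b stage by stage; the first stage that differs decides.
-- Returns -1, 0 or 1, plus the name of the deciding stage (nil on a tie).
local function compare(a, b, stages)
    for _, stage in ipairs(stages) do
        local fa, fb = forms[a][stage], forms[b][stage]
        if fa ~= fb then
            return (fa < fb) and -1 or 1, stage
        end
    end
    return 0
end

-- Looking at shapes only, the two entries are indistinguishable.
print(compare("Polyneikes", "Πολυνείκης", { "shape" }))
-- Adding a later stage breaks the tie.
print(compare("Polyneikes", "Πολυνείκης", { "shape", "unicode" }))
```

The point of the sketch is only that two entries can tie on an early stage and still be told apart by a later one, so the chosen sequence determines both the order and whether entries are treated as the same.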

Hans



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: two buglets
  2010-10-05 21:27                   ` Hans Hagen
@ 2010-10-05 21:55                     ` Philipp Gesang
  2010-10-06  7:50                       ` Hans Hagen
  0 siblings, 1 reply; 19+ messages in thread
From: Philipp Gesang @ 2010-10-05 21:55 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On 2010-10-05 <23:27:33>, Hans Hagen wrote:
> On 5-10-2010 11:17, Philipp Gesang wrote:
> 
> >I guess there is some testing going on in order to determine
> >whether to proceed with the current entry or switch to the next
> >one. The position is the same, however the comparison with the
> >last item fails and a new one is created instead. (Only
> >guessing.)
> 
> it's a sequence of tests per comparison, like
> 
> Polyneikes
>     polyneikes % lowercased
>     polyneikes % shapes

I assume by “shapes” you mean the base symbol (all diacritics
stripped).

>     Polyneikes % unicode
> 
> Πολυνείκης
>     Πολυνείκης % lowercased
>     polyneikes % shapes
>     Πολυνείκης % unicode
> 
> casing and shapes depend on the mapping vectors, and the order can
> be influenced; you can see this in action with
> 
> \enabletrackers[sorters.tests]

Bingo! The tracker instantly revealed a really nasty flaw in the
German standard transcription for Greek: “υ” is normally
converted to Latin “y”, but is retained as Latin “u” in
diphthongs like “ευ” and “ηυ”. So with the sorting definition I
posted I get amongst the results:

sorters         > Kapaneys > Kapaneus

because all “υ” are lazily mapped to “y”. Thus, for those
occasional three words per book, determining the sorting position
by hand (e.g. “\index[Kapaneus]{Καπανεύς}”) might be less prone
to error.
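
For anyone who still wants an automatic mapping, the diphthong problem can at least be illustrated outside ConTeXt. The following plain-Lua sketch applies multi-character rules before single-letter ones; the rule list is a deliberately tiny, hypothetical subset and transliterate is an invented helper, not part of the sorting definition posted earlier.

```lua
-- Hypothetical, deliberately tiny rule set. Order matters: the diphthong
-- rules must run before the single-upsilon fallback.
local rules = {
    { "εύ", "eu" }, { "ευ", "eu" },   -- diphthongs keep Latin "u"
    { "ύ", "y" },   { "υ", "y" },     -- any remaining upsilon becomes "y"
    { "Κ", "K" }, { "Τ", "T" },
    { "α", "a" }, { "δ", "d" }, { "ε", "e" },
    { "ν", "n" }, { "π", "p" }, { "ς", "s" },
}

-- Apply the rules one after another; the Greek patterns contain no
-- Lua pattern magic characters, so they can be passed to gsub as they are.
local function transliterate(word)
    for _, rule in ipairs(rules) do
        word = word:gsub(rule[1], rule[2])
    end
    return word
end

print(transliterate("Καπανεύς"))
print(transliterate("Τυδεύς"))
```

With the diphthong rules listed first, the “ευ” in both names is handled before the single “υ” rule can turn it into “y”, while the plain “υ” in the second name still becomes “y”. A complete rule set would need the other diphthongs and all accented variants, which is exactly why setting the key by hand is simpler for a handful of words.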

Thanks for the hint and sorry for posting a non-solution,

Philipp


-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: two buglets
  2010-10-05 21:55                     ` Philipp Gesang
@ 2010-10-06  7:50                       ` Hans Hagen
  0 siblings, 0 replies; 19+ messages in thread
From: Hans Hagen @ 2010-10-06  7:50 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Philipp Gesang

On 5-10-2010 11:55, Philipp Gesang wrote:

> I assume by “shapes” you mean the base symbol (all diacritics
> stripped).

indeed (and we might have to add or patch a few more shcodes in 
char-def.lua)

Hans


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-10-06  7:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-11 15:52 two buglets Thomas A. Schmitz
2010-02-11 17:17 ` Hans Hagen
2010-02-11 17:35   ` Thomas A. Schmitz
2010-02-11 19:29     ` Hans Hagen
2010-02-11 21:27       ` Thomas A. Schmitz
2010-02-11 18:14   ` David Rogers
2010-10-03  8:24   ` Thomas A. Schmitz
2010-10-03 10:29     ` Hans Hagen
2010-10-03 10:58       ` Thomas A. Schmitz
2010-10-03 15:10         ` Hans Hagen
2010-10-03 15:43           ` Thomas A. Schmitz
2010-10-05 12:15             ` Philipp Gesang
2010-10-05 12:39               ` Hans Hagen
2010-10-05 13:29               ` Thomas A. Schmitz
2010-10-05 21:17                 ` Philipp Gesang
2010-10-05 21:27                   ` Hans Hagen
2010-10-05 21:55                     ` Philipp Gesang
2010-10-06  7:50                       ` Hans Hagen
2010-02-11 17:19 ` Hans Hagen
