* two buglets
From: Thomas A. Schmitz @ 2010-02-11 15:52 UTC
To: mailing list for ConTeXt users
Hi all,
working on a book project with index and bibliography, I discovered two small bugs (at least I think they are bugs):
1. index sorts uppercase letters after lowercase letters. Minimal example:
\starttext
\index{Aardvark}Aardvark
\index{azygous}azygous
\page
\setupregister[index][n=1]
\placeregister[index]
\stoptext
I would expect azygous to follow Aardvark, but it is sorted before.
2. (Maybe not a bug, but somewhat unfriendly behavior): When a \cite command refers to a non-existent key and sorttype=bbl is set, ConTeXt bombs out with a Lua error:
! LuaTeX error ...text/tex/texmf-context/tex/context/base/bibl-tra.lua:77: attempt to compare nil with number
stack traceback:
...text/tex/texmf-context/tex/context/base/bibl-tra.lua:77: in function <...text/tex/texmf-context/tex/context/base/bibl-tra.lua:76>
[C]: in function 'sort'
...text/tex/texmf-context/tex/context/base/bibl-tra.lua:84: in function 'flush'
<main ctx instance>:1: in main chunk.
\typesetpubslist ...hacks.flush("\@@pbsorttype ")}
\doendoflist
\dodoplacepublications ...sttrue \typesetpubslist
\inpublistfalse \endgroup ...
l.37 \placepublications[criterium=all]
minimal example (the typo \cite[clarke199] instead of \cite[clarke1999a] is there on purpose to demonstrate the problem):
\setuppublications[state=start,
sorttype=bbl,
refcommand=authornum,
numbering=yes]
\setuppublicationlist[samplesize={VSdK90},totalnumber=2]
\startpublication[k=champion2004,t=book,
a={{Champion}},y=2004,
n=10,s=Cha04]
\author[]{Craige~B.}[C.~B.]{}{Champion}
\pubyear{2004}
\title{Cultural Politics in Polybius's {\em Histories}}
\city{Berkeley}
\pubname{Univ. of California Pr.}
\stoppublication
\startpublication[k=clarke1999a,t=book,
a={{Clarke}},y=1999b,
n=9,s=Cla99b]
\author[]{Katherine}[K.]{}{Clarke}
\pubyear{1999\maybeyear{b}}
\title{Between Geography and History: Hellenistic Constructions of the Roman
World}
\city{Oxford}
\pubname{Oxford UP}
\stoppublication
\starttext
\cite[champion2004]
\cite[clarke199]
\page
\placepublications[criterium=all]
\stoptext
Could this error be handled more gracefully, i.e. intercepted?
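One way such an error could be intercepted, sketched in plain Lua: the comparison function handed to table.sort treats a missing sort number as larger than any real one, so unresolved keys end up at the end of the list instead of triggering "attempt to compare nil with number". The entry layout (fields `key` and `n`) and the name `safe_compare` are assumptions made for illustration; this is not the actual code in bibl-tra.lua.

```lua
-- Hypothetical entries: "n" is the sort number taken from the bbl data;
-- it is nil for a key that could not be resolved.
local entries = {
    { key = "champion2004", n = 10  },
    { key = "clarke199",    n = nil }, -- typo: no such publication
    { key = "clarke1999a",  n = 9   },
}

-- Comparator that never compares nil with a number:
-- entries without a sort number go last.
local function safe_compare(a, b)
    local na, nb = a.n, b.n
    if na and nb then
        return na < nb
    end
    -- true only if a has a number and b does not
    return na ~= nil and nb == nil
end

table.sort(entries, safe_compare)

for _, e in ipairs(entries) do
    print(e.key, e.n or "unresolved")
end
```

Whether an unresolved key should be dropped, flagged, or sorted last is a design decision; the sketch only shows that the sort itself need not fail.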
All best
Thomas
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: two buglets
From: Hans Hagen @ 2010-02-11 17:17 UTC
To: mailing list for ConTeXt users; +Cc: Thomas A. Schmitz
On 11-2-2010 16:52, Thomas A. Schmitz wrote:
> Hi all,
>
> working on a book project with index and bibliography, I discovered two small bugs (at least I think they are bugs):
>
> 1. index sorts uppercase letters after lowercase letters. Minimal example:
>
> \starttext
>
> \index{Aardvark}Aardvark
>
> \index{azygous}azygous
>
> \page
>
> \setupregister[index][n=1]
> \placeregister[index]
>
> \stoptext
>
> I would expect azygous to follow Aardvark, but it is sorted before.
are you sure that that's the convention for english? it's easy to change
it ...
\startluacode
sorters.mappings['en'] = {
["a"] = 2, ["b"] = 4, ["c"] = 6, ["d"] = 8, ["e"] = 10,
["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
["z"] = 52,
["A"] = 1, ["B"] = 3, ["C"] = 5, ["D"] = 7, ["E"] = 9,
["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
["Z"] = 51,
}
\stopluacode
\starttext
\index{Aardvark}Aardvark \par
\index{azygous}azygous
\placeregister[index][n=1]
\stoptext
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: two buglets
From: Hans Hagen @ 2010-02-11 17:19 UTC
To: mailing list for ConTeXt users; +Cc: Thomas A. Schmitz
On 11-2-2010 16:52, Thomas A. Schmitz wrote:
> 2. (Maybe not a bug, but a somewhat unfriendly behavior): When a \cite command refers to a non-existent key and sorttype=bbl, ConTeXt bombs out with a Lua error:
so what do you expect? to drop that entry? or else, what default key to
use?
Hans
* Re: two buglets
From: Thomas A. Schmitz @ 2010-02-11 17:35 UTC
To: mailing list for ConTeXt users
On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:
> are you sure that that's the convention for english? it's easy to change it ...
No, I'm not sure at all. All I can say is that a quick check in my scholarly books didn't turn up a single example where uppercase and lowercase were treated differently. If I apply your code, I will have the same problem the other way round: Azygous would sort before aardvark. How would I write the table so that lowercase and uppercase are not distinguished at all? I tried
\startluacode
sorters.mappings['en'] = {
["a"] = 1, ["b"] = 2, ["c"] = 3, ["d"] = 4, ["e"] = 5,
["f"] = 6, ["g"] = 7, ["h"] = 8, ["i"] = 9, ["j"] = 10,
["k"] = 11, ["l"] = 12, ["m"] = 13, ["n"] = 14, ["o"] = 15,
["p"] = 16, ["q"] = 17, ["r"] = 18, ["s"] = 19, ["t"] = 20,
["u"] = 21, ["v"] = 22, ["w"] = 23, ["x"] = 24, ["y"] = 25,
["z"] = 26,
}
\stopluacode
but that didn't work.
Thomas
* Re: two buglets
From: David Rogers @ 2010-02-11 18:14 UTC
To: mailing list for ConTeXt users
* Hans Hagen <pragma@wxs.nl> [2010-02-11 18:17]:
>are you sure that that's the convention for english? it's easy to
>change it ...
I've never seen an ordinary English index that was sorted by case.
English indexes should definitely default to case-insensitive.
(Has anyone here ever been asked for an index in English sorted by
case?)
--
David
* Re: two buglets
From: Hans Hagen @ 2010-02-11 19:29 UTC
To: mailing list for ConTeXt users; +Cc: Thomas A. Schmitz
On 11-2-2010 18:35, Thomas A. Schmitz wrote:
> No, I'm not sure at all. All I can say is that a quick check in my scholarly books didn't bring up a single example where uppercase and lowercase were treated differently. If I apply your code, I will have the same problem with Azygous -> aardvark. How would I write the table so that lowercase and uppercase are not distinguished at all? I tried
>
> \startluacode
> sorters.mappings['en'] = {
> ["a"] = 1, ["b"] = 2, ["c"] = 3, ["d"] = 4, ["e"] = 5,
> ["f"] = 6, ["g"] = 7, ["h"] = 8, ["i"] = 9, ["j"] = 10,
> ["k"] = 11, ["l"] = 12, ["m"] = 13, ["n"] = 14, ["o"] = 15,
> ["p"] = 16, ["q"] = 17, ["r"] = 18, ["s"] = 19, ["t"] = 20,
> ["u"] = 21, ["v"] = 22, ["w"] = 23, ["x"] = 24, ["y"] = 25,
> ["z"] = 26,
> }
> \stopluacode
>
> but that didn't work.
just give them the same code, so "A"=1, "a"=1
(we could make that an option: upper first, lower first, mixed)
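The "same code for both cases" table can be generated rather than typed out. The following is a minimal sketch in plain Lua, assuming an ASCII alphabet; the names `mapping` and `less` are illustrative, and whether assigning such a table to sorters.mappings['en'] has any effect depends on the ConTeXt version in use.

```lua
-- Build a mapping in which "A" and "a" share the same sort code.
-- The name "mapping" is illustrative; in the thread the table is
-- assigned to sorters.mappings['en'].
local mapping = { }
local alphabet = "abcdefghijklmnopqrstuvwxyz"

for i = 1, #alphabet do
    local lower = alphabet:sub(i, i)
    mapping[lower] = i
    mapping[lower:upper()] = i
end

-- Compare two words letter by letter using the mapping;
-- characters outside the mapping get code 0.
local function less(a, b)
    for i = 1, math.min(#a, #b) do
        local ca = mapping[a:sub(i, i)] or 0
        local cb = mapping[b:sub(i, i)] or 0
        if ca ~= cb then
            return ca < cb
        end
    end
    return #a < #b
end

local words = { "azygous", "Aardvark", "Azygous", "aardvark" }
table.sort(words, less)
print(table.concat(words, " "))
```

With this mapping "Aardvark" and "aardvark" compare as equal, so their relative order is left to the sort routine; a second criterion would be needed to make it deterministic.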
Hans
* Re: two buglets
From: Thomas A. Schmitz @ 2010-02-11 21:27 UTC
To: mailing list for ConTeXt users
On Feb 11, 2010, at 8:29 PM, Hans Hagen wrote:
> just give them the same code, so "A"=1, "a"=1
>
> (we could make that an option: upper first, lower first, mixed)
>
> Hans
Thank you, Hans, that works nicely! It would be good to have this as an option. And I would vote for having the "mixed" setting as default. I wasn't even aware that there were indexes that sort according to case.
All best
Thomas
* Re: two buglets
From: Thomas A. Schmitz @ 2010-10-03 8:24 UTC
To: mailing list for ConTeXt users
Hi all, Hans,
On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:
>> 1. index sorts uppercase letters after lowercase letters. Minimal example:
>>
>> \starttext
>>
>> \index{Aardvark}Aardvark
>>
>> \index{azygous}azygous
>>
>> \page
>>
>> \setupregister[index][n=1]
>> \placeregister[index]
>>
>> \stoptext
>>
>> I would expect azygous to follow Aardvark, but it is sorted before.
>
>
> are you sure that that's the convention for english? it's easy to change it ...
>
> \startluacode
> sorters.mappings['en'] = {
> ["a"] = 2, ["b"] = 4, ["c"] = 6, ["d"] = 8, ["e"] = 10,
> ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
> ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
> ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
> ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
> ["z"] = 52,
> ["A"] = 1, ["B"] = 3, ["C"] = 5, ["D"] = 7, ["E"] = 9,
> ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
> ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
> ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
> ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
> ["Z"] = 51,
> }
> \stopluacode
>
> \starttext
> \index{Aardvark}Aardvark \par
> \index{azygous}azygous
> \placeregister[index][n=1]
> \stoptext
>
We had this pretty old thread about sorting in indexes. AFAICS, the latest beta defaults to case-sensitive sorting. Two quick questions:
1. Is there a setup command that will make index sorting case-insensitive? The code above doesn't work anymore, so maybe you made it user-configurable now?
2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.
All best
Thomas
* Re: two buglets
From: Hans Hagen @ 2010-10-03 10:29 UTC
To: mailing list for ConTeXt users; +Cc: Thomas A. Schmitz
On 3-10-2010 10:24, Thomas A. Schmitz wrote:
> we had this pretty old thread about sorting in indexes. AFAICS, the latest beta defaults to case-sensitive sorting. Two quick questions:
>
> 1. Is there a setup command that will make index sorting case-insensitive? The code above doesn't work anymore, so maybe you made it user-configurable now?
indeed, and in a nice obscure way ...
\setuplayout[topspace=1cm,height=middle]
\setupbodyfont[11pt]
\starttext
\def\Test#1%
{\vbox{{\bf#1}\blank\placeregister[index][language=cz,n=1,method={#1}]}\blank}
wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank
\startcolumns[n=3]
\Test{mc,mm,uc} \Test{mc,zm,uc} \Test{mc,pm,uc}
\Test{zc,mm,uc} \Test{zc,zm,uc} \Test{zc,pm,uc}
\Test{pc,mm,uc} \Test{pc,zm,uc} \Test{pc,pm,uc}
\stopcolumns
\page
wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank
\startcolumns[n=3]
\Test{mm,mc,uc} \Test{zm,mc,uc} \Test{pm,mc,uc}
\Test{mm,zc,uc} \Test{zm,zc,uc} \Test{pm,zc,uc}
\Test{mm,pc,uc} \Test{zm,pc,uc} \Test{pm,pc,uc}
\stopcolumns
\page
\dorecurse {2} {
\page \recurselevel:
\index{oá} \index{öb} \index{Oč} \index{Öď}
\index{oo} \index{öo} \index{Oo} \index{Öo}
\index{Öq} \index{öř} \index{Oš} \index{oů}
done
}
\stoptext
> 2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.
Currently Jano and I are figuring out some details (as Jano does the
testing with more complex multilingual indices).
I have no preference ... we can configure each language independently
using the method key in the entries in sort-lan.lua. As I seldom consult
an index I have no clue what to expect or default to, so feel free to
tell me what the defaults should be. We now have predefined:
local predefinedmethods = {
[variables.before] = "mm,mc,uc",
[variables.after] = "pm,mc,uc",
[variables.first] = "pc,mm,uc",
[variables.last] = "mc,mm,uc",
}
Hans
* Re: two buglets
From: Thomas A. Schmitz @ 2010-10-03 10:58 UTC
To: Hans Hagen; +Cc: mailing list for ConTeXt users
On Oct 3, 2010, at 12:29 PM, Hans Hagen wrote:
> indeed, and in a nice obscure way ...
Give me a chance to understand :-) I tried looking in sort-ini.lua, but I couldn't figure out what the different methods meant. What do the abbreviations stand for? Also, I seem to obtain the desired case-insensitive sorting with
method=zm,pc,uc
but I also get spurious empty lines in the index. I'll try and come up with a minimal example.
>
>> 2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.
>
> Currently Jano and I are figuring out some details (as Jano does the testing with more complex multilingual indices).
>
> I have no preference ... we can configure each language independently using the method key in the entries in sort-lan.lua. As I seldom consult an index I have no clue what to expect or default to, so feel free to tell me what the defaults should be. We now have predefined:
>
> local predefinedmethods = {
> [variables.before] = "mm,mc,uc",
> [variables.after] = "pm,mc,uc",
> [variables.first] = "pc,mm,uc",
> [variables.last] = "mc,mm,uc",
> }
Hmm, if this is easy to configure, it doesn't make much of a difference. Just as a default, for English and German, I would suggest having no case-sensitivity. In German, umlauts are somewhat contentious, but nowadays, most people would sort them just like normal letters. But this is something that others on the list or on the wiki should express their opinion on.
Thanks, and all best
Thomas
* Re: two buglets
From: Hans Hagen @ 2010-10-03 15:10 UTC
To: Thomas A. Schmitz; +Cc: mailing list for ConTeXt users
On 3-10-2010 12:58, Thomas A. Schmitz wrote:
> Give me a chance to understand :-) I tried looking in sort-ini.lua, but I couldn't figure out what the different methods meant. What do the abbreviations stand for? Also, I seem to obtain the desired case-insensitive sorting with
> method=zm,pc,uc
> but I also get spurious empty lines in the index. I'll try and come up with a minimal example.
mm, zm, pm: use mapping order, add -1, 0, or +1 for the other case, and use
shape info for missing entries (similar shapes)
mc, zc, pc: use mapping order, add -1, 0, or +1 for the other case
uc: unicode order
so, you define a sequence of comparisons where for instance
U -> order of u +/- 1
\"u -> order of shape u +/- 1
etc .. a bit cryptic I admit ... some combinations give the same result
depending on the vectors used. (Jano promised to write up something.)
numbers are sorted in a special way
so, at some point we simplify characters and start looking at shapes and
sort based on shapes which of course leads to clashes so in a next step
we look at unicodes etc etc
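The idea of a sequence of comparisons can be illustrated with a toy model in plain Lua. This is only one reading of the description above, not the code in sort-ini.lua: the `order` and `shape` tables are invented, doubling the mapping order to leave room for the case offset is an assumption, characters outside the mapping simply get 0, and the special treatment of numbers is left out.

```lua
-- Toy data, made up for this sketch: a mapping order for lowercase base
-- letters and a shape table that reduces other characters to a base letter.
local order = { a = 1, o = 2, u = 3 }
local shape = { ["ö"] = "o", ["Ö"] = "O", ["ü"] = "u", ["Ü"] = "U" }

-- Split a UTF-8 string into characters.
local function chars(s)
    local t = { }
    for c in s:gmatch("[\1-\127\194-\244][\128-\191]*") do
        t[#t + 1] = c
    end
    return t
end

-- Build a pass: "offset" is added for uppercase letters (-1, 0 or +1),
-- "use_shape" decides whether a character falls back on its shape.
local function mapping_pass(offset, use_shape)
    return function(word)
        local key = { }
        for i, c in ipairs(chars(word)) do
            if use_shape and shape[c] then
                c = shape[c]
            end
            local lower = c:lower()
            local n = order[lower]
            if n then
                n = 2 * n              -- leave room for the case offset
                if c ~= lower then
                    n = n + offset     -- uppercase letter
                end
            else
                n = 0                  -- not in the mapping
            end
            key[i] = n
        end
        return key
    end
end

-- Byte order of UTF-8 strings equals code point order.
local function unicode_pass(word)
    return { word:byte(1, -1) }
end

local passes = {
    mm = mapping_pass(-1, true),  zm = mapping_pass(0, true),  pm = mapping_pass(1, true),
    mc = mapping_pass(-1, false), zc = mapping_pass(0, false), pc = mapping_pass(1, false),
    uc = unicode_pass,
}

-- Compare two numeric keys element by element: -1, 0 or 1.
local function compare_keys(a, b)
    for i = 1, math.min(#a, #b) do
        if a[i] ~= b[i] then
            return a[i] < b[i] and -1 or 1
        end
    end
    if #a == #b then return 0 end
    return #a < #b and -1 or 1
end

-- Turn a method string such as "zm,pc,uc" into a comparison function:
-- a later pass is only consulted when all earlier passes report a tie.
local function sorter(method)
    local sequence = { }
    for name in method:gmatch("[^,]+") do
        sequence[#sequence + 1] = assert(passes[name], "unknown pass " .. name)
    end
    return function(a, b)
        for _, pass in ipairs(sequence) do
            local result = compare_keys(pass(a), pass(b))
            if result ~= 0 then
                return result < 0
            end
        end
        return false
    end
end

local words = { "ou", "Oa", "öa", "oa" }
table.sort(words, sorter("zm,pc,uc"))
print(table.concat(words, " "))
```

In this toy the first pass (zm) cannot separate "Oa", "öa" and "oa", so the second pass (pc) decides among them, and uc would only be reached for entries that are still tied after that.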
> Hmm, if this is easy to configure, it doesn't make much of a difference. Just as a default, for English and German, I would suggest having no case-sensitivity. In German, umlauts are somewhat contentious, but nowadays, most people would sort them just like normal letters. But this is something that others on the list or on the wiki should express their opinion on.
best would be to have a test file per language, with the expected order
given in comments; such tests should also include foreign entries
for instance, how would you mix german and greek in your books; we
probably need some specialized vectors then, which is possible as the
sorting language can be configured independent from the text language
Hans
* Re: two buglets
From: Thomas A. Schmitz @ 2010-10-03 15:43 UTC
To: Hans Hagen; +Cc: mailing list for ConTeXt users
On Oct 3, 2010, at 5:10 PM, Hans Hagen wrote:
>
> mm zm pm : use mapping order, add -1,0, +1 to different case and use shape info for missing entries (similar shapes)
> mc zc pc : use mapping order, add -1,0, +1 to different case
> uc: unicode order
>
> so, you define a sequence of comparisons where for instance
>
> U -> order u +/- 1
> \"u -> order of shape u +/- 1
>
> etc .. a bit cryptic I admit ... some combinations give the same result depending on the vectors used. (Jano promised to write up something.)
>
> numbers are sorted in a special way
>
> so, at some point we simplify characters and start looking at shapes and sort based on shapes which of course leads to clashes so in a next step we look at unicodes etc etc
>
OK, that makes sense. I'll play with it, but having a few choice pages on the wiki would be great!
>
> best would be to have a test file per language with in comments the expected order; such tests should also provide foreign entries
>
> for instance, how would you mix german and greek in your books; we probably need some specialized vectors then, which is possible as the sorting language can be configured independent from the text language
>
OK, I'll write something for German and English, but the thing is that we need more input on what users expect. For mixtures with foreign languages, there might not be generally accepted rules at all, so people will define something on an ad hoc basis.
For Greek: I just looked at a dozen books here on my shelf. Most English books have a separate index for Greek terms; when they sort Greek terms with English words, they use transliteration. The problem with polytonic Greek is that so many different unicode characters need to have the same sort entry. If I ever see the necessity of setting this up, I'll be in touch off-list, but it's such an unusual thing that I think you shouldn't bother now.
All best
Thomas
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: two buglets
2010-10-03 15:43 ` Thomas A. Schmitz
@ 2010-10-05 12:15 ` Philipp Gesang
2010-10-05 12:39 ` Hans Hagen
2010-10-05 13:29 ` Thomas A. Schmitz
0 siblings, 2 replies; 19+ messages in thread
From: Philipp Gesang @ 2010-10-05 12:15 UTC (permalink / raw)
To: mailing list for ConTeXt users
[-- Attachment #1.1.1: Type: text/plain, Size: 2291 bytes --]
On 2010-10-03 <17:43:21>, Thomas A. Schmitz wrote:
> OK, I'll write something for German and English, but the thing
> is that we need more input on what users expect. For mixtures with
> foreign languages, there might not be generally accepted rules at
> all, so people will define something on an ad-hoc basis.
Hi Thomas and others,
technically speaking the problem is solved by ISO 14651.[1]
In praxi multilingual sorting depends on local rules, of
which “One index per script|language.” seems to be the most
common.
Some time ago I made an lpeg from the bnf in [1]. It matches the
collation rules from [2], but as I couldn’t figure out how to map
them onto context’s sorting mechanism I never got around to
actually capturing the information. As I won’t be having the time
to try it with the new structure of sort-lan I guess I’ll just
attach the peg grammar for anyone to use as a starting point.
Unicode collation would be great to have in context.
> transliteration. The problem with polytonic Greek is that so many
> different unicode characters need to have the same sort entry. If
Isn’t that just what the Greek rules in sort-lan.lua do? If not
then it would be a bug.
····startsnippet·················································
definitions["gr"] = {
entries = {
["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α", ["ἆ"] = "α",
["ἁ"] = "α", ["ἅ"] = "α", ["ἃ"] = "α", ["ἇ"] = "α", ["ᾁ"] = "α",
["ᾴ"] = "α", ["ᾲ"] = "α", ["ᾷ"] = "α", ["ᾄ"] = "α", ["ᾂ"] = "α",
["ᾅ"] = "α", ["ᾃ"] = "α", ["ᾆ"] = "α", ["ᾇ"] = "α", ["β"] = "β",
····stopsnippet··················································
Always nice to have a decent discussion on sorting ;)
Philipp
[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip
[2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
[-- Attachment #1.1.2: iso14651-parser.lua --]
[-- Type: text/plain, Size: 3747 bytes --]
local lpeg = require "lpeg" -- bind the module locally; newer LPeg versions do not set a global
local C, Cs, Ct, P, R, S, V, match = lpeg.C, lpeg.Cs, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S, lpeg.V, lpeg.match
local iso_parser
rules = P{
[1] = "weight_table",
-- Define collation tables as sequences of lines
weight_table = V"common_template_table" + V"tailored_table",
common_template_table = V"simple_line"^0,
tailored_table = V"table_line"^0,
-- Define the line types
simple_line = (V"symbol_definition" + V"collating_element" +
V"weight_assignment" + V"order_end")^-1 * V"line_completion" --/ function (first) io.write("simple: "..first) end
,
--table_line = V"simple_line" + V"tailoring_line",
table_line = V"tailoring_line" + V"simple_line",
tailoring_line = (V"reorder_after" + V"order_start" + V"reorder_end" +
V"section_definition" + V"reorder_section_after") *
V"line_completion" --/ function (first) io.write("tailoring: "..first) end
,
-- Define the basic syntax for collation weighting
symbol_definition = P"collating-symbol" * V"space"^1 * V"symbol_element",
symbol_element = V"symbol"-V"symbol_range" + V"symbol_range",
symbol_range = V"symbol" * P".." * V"symbol",
symbol = V"simple_symbol" + V"ucs_symbol",
ucs_symbol = (P"<U" * V"one_to_eight_digit_hex_string" * P">") +
(P"<U-" * V"one_to_eight_digit_hex_string" * P">"),
simple_symbol = P"<" * V"identifier" * P">",
collating_element = P"collating-element" * V"space"^1 * V"symbol" * V"space"^1 *
P"from" * V"space"^1 * V"quoted_symbol_sequence",
quoted_symbol_sequence = P'"' * V"simple_weight"^1 * P'"',
--weight_assignment = V"simple_weight" + V"symbol_weight",
weight_assignment = V"symbol_weight" + V"simple_weight",
simple_weight = V"symbol_element" + P"UNDEFINED",
symbol_weight = V"symbol_element" * V"space"^1 * V"weight_list",
weight_list = V"level_token" * (V"semicolon" * V"level_token")^0,
level_token = V"symbol_group" + P"IGNORE",
symbol_group = V"symbol_element" + V"quoted_symbol_sequence",
order_end = P"order_end",
-- Define the tailoring syntax
reorder_after = P"reorder-after" * V"space"^1 * V"target_symbol",
target_symbol = V"symbol",
order_start = P"order_start" * V"space"^1 * V"multiple_level_direction",
multiple_level_direction = V"direction" * (V"semicolon" * V"direction")^0 * P",position"^-1,
direction = P"forward" + P"backward",
reorder_end = P"reorder-end",
section_definition = V"section_definition_simple" + V"section_definition_list",
section_definition_simple = P"section" * V"space"^1 * V"section_identifier",
section_identifier = V"identifier",
section_definition_list = P"section" * V"space"^1 * V"section_identifier" * V"space"^1 * V"symbol_list",
symbol_list = V"symbol_element" * (V"semicolon" * V"symbol_element")^0,
reorder_section_after = P"reorder-section-after" * V"space"^1 * V"section_identifier" * V"space"^1 * V"target_symbol",
-- Define low-level tokens used by the rest of the syntax
identifier = (V"letter" + V"digit") * V"id_part"^0,
id_part = V"letter" + V"digit" + S"-_",
line_completion = V"space"^0 * V"comment"^-1 * V"EOL",
comment = V"comment_char" * V"character"^0,
one_to_eight_digit_hex_string = V"hex_upper" * V"hex_upper"^-7, -- at least one digit, at most eight
hex_numeric_string = V"hex_upper"^1,
space = S" \t",
semicolon = P";",
comment_char = P"%",
digit = R"09",
hex_upper = V"digit" + S"ABCDEF",
letter = R"az" + R"AZ",
EOL = P"\n",
character = 1-V"EOL",
}
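-- Read the whole collation table; assert gives a clear error if the file is missing
local f = assert(io.open("iso14651.txt", "r"))
local tab = f:read("*all")
f:close()
--rules:print()
-- Prints the position after the last matched character (nil if nothing matched)
print(rules:match(tab))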
* Re: two buglets
2010-10-05 12:15 ` Philipp Gesang
@ 2010-10-05 12:39 ` Hans Hagen
2010-10-05 13:29 ` Thomas A. Schmitz
1 sibling, 0 replies; 19+ messages in thread
From: Hans Hagen @ 2010-10-05 12:39 UTC (permalink / raw)
To: mailing list for ConTeXt users; +Cc: Philipp Gesang
On 5-10-2010 2:15, Philipp Gesang wrote:
> [1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip
> [2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt
I'll have a look at it when I've time for it (I didn't know that doc;
it's more fun figuring it out oneself anyway).
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: two buglets
2010-10-05 12:15 ` Philipp Gesang
2010-10-05 12:39 ` Hans Hagen
@ 2010-10-05 13:29 ` Thomas A. Schmitz
2010-10-05 21:17 ` Philipp Gesang
1 sibling, 1 reply; 19+ messages in thread
From: Thomas A. Schmitz @ 2010-10-05 13:29 UTC (permalink / raw)
To: mailing list for ConTeXt users
On Oct 5, 2010, at 2:15 PM, Philipp Gesang wrote:
>
> Hi Thomas and others,
>
> technically speaking the problem is solved by ISO 14651.[1]
>
> In praxi multilingual sorting depends on local rules, of
> which “One index per script|language.” seems to be the most
> common.
Yes, that's what I was trying to say. In practice, hardly anyone will want an individual index for Spanish if they have just two Spanish words in an English book. And someone (me) might say that they want three Greek terms in their German index at logical places.
>
> Some time ago I made an lpeg from the bnf in [1]. It matches the
> collation rules from [2], but as I couldn’t figure out how to map
> them onto context’s sorting mechanism I never got around to
> actually capturing the information. As I won’t be having the time
> to try it with the new structure of sort-lan I guess I’ll just
> attach the peg grammar for anyone to use as a starting point.
> Unicode collation would be great to have in context.
>
>> transliteration. The problem with polytonic Greek is that so many
>> different unicode characters need to have the same sort entry. If
>
> Isn’t that just what the Greek rules in sort-lan.lua do? If not
> then it would be a bug.
>
Oh yes, you're right, I missed that. Thanks for pointing that out!
Thomas
* Re: two buglets
2010-10-05 13:29 ` Thomas A. Schmitz
@ 2010-10-05 21:17 ` Philipp Gesang
2010-10-05 21:27 ` Hans Hagen
0 siblings, 1 reply; 19+ messages in thread
From: Philipp Gesang @ 2010-10-05 21:17 UTC (permalink / raw)
To: mailing list for ConTeXt users
[-- Attachment #1.1.1: Type: text/plain, Size: 1781 bytes --]
On 2010-10-05 <15:29:38>, Thomas A. Schmitz wrote:
> And someone (me) might
> say that they want three Greek terms in their German index at
> logical places.
Try the definitions in the attachment. For three words only they
will be fine. But if the count increases you will soon run into a
situation where it’s not easy to determine where those “logical
places” are. E.g. would you want the letter “υ” under Latin “y”
or “u”? Phonologically (might depend on your stance on historical
phonology -- could be a minefield) you might find it reasonable
to treat “ου” as “u” (or “ū” if that matters), but your audience
might expect it at the graphetic location, Latin “ou”, instead.
As you can see in the example, when mapping both omega and
omicron onto Latin “o” the result is that “χρῶμα” will appear
before “Χρόνος”, which looks a bit odd.
This ad-hoc solution is troublesome when two words (a German and
a Greek one) occupy the same spot in the search order, like
“Polyneikes” and “Πολυνείκης”. My index output is:
Polyneikes 2
Πολυνείκης 2
Polyneikes 3
Πολυνείκης 3
which should rather be
Polyneikes 2, 3
Πολυνείκης 2, 3
I guess there is some testing going on in order to determine
whether to proceed with the current entry or switch to the next
one. The position is the same, however the comparison with the
last item fails and a new one is created instead. (Only
guessing.)
If you run into this problem you might have to ask Hans for
advice.
Hth,
Philipp
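
A minimal sketch of the symptom described above, in plain Lua. It is
not ConTeXt's register code: the record layout and the helpers by and
render are hypothetical, and it only assumes that entries are merged
when they end up next to each other after sorting. If the page number
is compared before the entry text, the two spellings alternate and
nothing merges; if the entry text is compared first, the pages collapse
as expected.

```lua
-- Hypothetical index records: display text, sort key, page number.
local refs = {
    { entry = "Polyneikes", key = "polyneikes", page = 2 },
    { entry = "Πολυνείκης", key = "polyneikes", page = 2 },
    { entry = "Polyneikes", key = "polyneikes", page = 3 },
    { entry = "Πολυνείκης", key = "polyneikes", page = 3 },
}

-- Build a comparator that checks the given fields in order;
-- the first field that differs decides.
local function by(...)
    local fields = { ... }
    return function(a, b)
        for _, f in ipairs(fields) do
            if a[f] ~= b[f] then return a[f] < b[f] end
        end
        return false -- equal on every field
    end
end

-- Sort a copy of the records, then merge runs of identical display text.
local function render(records, compare)
    local sorted = {}
    for i, r in ipairs(records) do sorted[i] = r end
    table.sort(sorted, compare)
    local lines, last = {}, nil
    for _, r in ipairs(sorted) do
        if last and last.entry == r.entry then
            last.pages[#last.pages + 1] = r.page
        else
            last = { entry = r.entry, pages = { r.page } }
            lines[#lines + 1] = last
        end
    end
    for i, line in ipairs(lines) do
        lines[i] = line.entry .. " " .. table.concat(line.pages, ", ")
    end
    return lines
end

local split  = render(refs, by("key", "page", "entry")) -- spellings alternate
local merged = render(refs, by("key", "entry", "page")) -- spellings grouped

print(table.concat(split, "\n"))
print(table.concat(merged, "\n"))
```

With the second comparator each spelling gets a single line carrying
both page numbers.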
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
[-- Attachment #1.1.2: greek-german.tex --]
[-- Type: text/x-tex, Size: 6773 bytes --]
\startluacode
sorters.definitions["de-gr"] = {
method = "before",
replacements = {
-- German
{ "ä", 'ae' }, { "Ä", 'Ae' },
{ "ö", 'oe' }, { "Ö", 'Oe' },
{ "ü", 'ue' }, { "Ü", 'Ue' },
{ "ß", 'ss' },
-- Greek
{ "α", "a" }, { "ά", "a" }, { "ὰ", "a" }, { "ᾶ", "a" }, { "ᾳ", "a" },
{ "ἀ", "a" }, { "ἁ", "a" }, { "ἄ", "a" }, { "ἂ", "a" }, { "ἆ", "a" },
{ "ἁ", "a" }, { "ἅ", "a" }, { "ἃ", "a" }, { "ἇ", "a" }, { "ᾁ", "a" },
{ "ᾴ", "a" }, { "ᾲ", "a" }, { "ᾷ", "a" }, { "ᾄ", "a" }, { "ᾂ", "a" },
{ "ᾅ", "a" }, { "ᾃ", "a" }, { "ᾆ", "a" }, { "ᾇ", "a" }, { "β", "b" },
{ "γ", "g" }, { "δ", "d" }, { "ε", "e" }, { "έ", "e" }, { "ὲ", "e" },
{ "ἐ", "e" }, { "ἔ", "e" }, { "ἒ", "e" }, { "ἑ", "e" }, { "ἕ", "e" },
{ "ἓ", "e" }, { "ζ", "z" }, { "η", "e" }, { "η", "e" }, { "ή", "e" },
{ "ὴ", "e" }, { "ῆ", "e" }, { "ῃ", "e" }, { "ἠ", "e" }, { "ἤ", "e" },
{ "ἢ", "e" }, { "ἦ", "e" }, { "ᾐ", "e" }, { "ἡ", "e" }, { "ἥ", "e" },
{ "ἣ", "e" }, { "ἧ", "e" }, { "ᾑ", "e" }, { "ῄ", "e" }, { "ῂ", "e" },
{ "ῇ", "e" }, { "ᾔ", "e" }, { "ᾒ", "e" }, { "ᾕ", "e" }, { "ᾓ", "e" },
{ "ᾖ", "e" }, { "ᾗ", "e" }, { "θ", "th" }, { "ι", "i" }, { "ί", "i" },
{ "ὶ", "i" }, { "ῖ", "i" }, { "ἰ", "i" }, { "ἴ", "i" }, { "ἲ", "i" },
{ "ἶ", "i" }, { "ἱ", "i" }, { "ἵ", "i" }, { "ἳ", "i" }, { "ἷ", "i" },
{ "ϊ", "i" }, { "ΐ", "i" }, { "ῒ", "i" }, { "ῗ", "i" }, { "κ", "k" },
{ "λ", "l" }, { "μ", "m" }, { "ν", "n" }, { "ξ", "x" }, { "ο", "o" },
{ "ό", "o" }, { "ὸ", "o" }, { "ὀ", "o" }, { "ὄ", "o" }, { "ὂ", "o" },
{ "ὁ", "o" }, { "ὅ", "o" }, { "ὃ", "o" }, { "π", "p" }, { "ρ", "r" },
{ "ῤ", "r" }, { "ῥ", "r" }, { "σ", "s" }, { "ς", "s" }, { "τ", "t" },
{ "υ", "y" }, { "ύ", "y" }, { "ὺ", "y" }, { "ῦ", "y" }, { "ὐ", "y" },
{ "ὔ", "y" }, { "ὒ", "y" }, { "ὖ", "y" }, { "ὑ", "y" }, { "ὕ", "y" },
{ "ὓ", "y" }, { "ὗ", "y" }, { "ϋ", "y" }, { "ΰ", "y" }, { "ῢ", "y" },
{ "ῧ", "y" }, { "φ", "ph" }, { "χ", "ch" }, { "ψ", "ps" }, { "ω", "o" },
{ "ώ", "o" }, { "ὼ", "o" }, { "ῶ", "o" }, { "ῳ", "o" }, { "ὠ", "o" },
{ "ὤ", "o" }, { "ὢ", "o" }, { "ὦ", "o" }, { "ᾠ", "o" }, { "ὡ", "o" },
{ "ὥ", "o" }, { "ὣ", "o" }, { "ὧ", "o" }, { "ᾡ", "o" }, { "ῴ", "o" },
{ "ῲ", "o" }, { "ῷ", "o" }, { "ᾤ", "o" }, { "ᾢ", "o" }, { "ᾥ", "o" },
{ "ᾣ", "o" }, { "ᾦ", "o" }, { "ᾧ", "o" },
{ "Α", "A" }, { "Ά", "A" }, { "Ὰ", "A" }, { "ᾼ", "A" }, { "Ἀ", "A" },
{ "Ἁ", "A" }, { "Ἄ", "A" }, { "Ἂ", "A" }, { "Ἆ", "A" }, { "Ἁ", "A" },
{ "Ἅ", "A" }, { "Ἃ", "A" }, { "Ἇ", "A" }, { "ᾉ", "A" }, { "ᾌ", "A" },
{ "ᾊ", "A" }, { "ᾍ", "A" }, { "ᾋ", "A" }, { "ᾎ", "A" }, { "ᾏ", "A" },
{ "Β", "B" }, { "Γ", "G" }, { "Δ", "D" }, { "Ε", "E" }, { "Έ", "E" },
{ "Ὲ", "E" }, { "Ἐ", "E" }, { "Ἔ", "E" }, { "Ἒ", "E" }, { "Ἑ", "E" },
{ "Ἕ", "E" }, { "Ἓ", "E" }, { "Ζ", "Z" }, { "Η", "E" }, { "Η", "E" },
{ "Ή", "E" }, { "Ὴ", "E" }, { "ῌ", "E" }, { "Ἠ", "E" }, { "Ἤ", "E" },
{ "Ἢ", "E" }, { "Ἦ", "E" }, { "ᾘ", "E" }, { "Ἡ", "E" }, { "Ἥ", "E" },
{ "Ἣ", "E" }, { "Ἧ", "E" }, { "ᾙ", "E" }, { "ᾜ", "E" }, { "ᾚ", "E" },
{ "ᾝ", "E" }, { "ᾛ", "E" }, { "ᾞ", "E" }, { "ᾟ", "E" }, { "Θ", "Th" },
{ "Ι", "I" }, { "Ί", "I" }, { "Ὶ", "I" }, { "Ἰ", "I" }, { "Ἴ", "I" },
{ "Ἲ", "I" }, { "Ἶ", "I" }, { "Ἱ", "I" }, { "Ἵ", "I" }, { "Ἳ", "I" },
{ "Ἷ", "I" }, { "Ϊ", "I" }, { "Κ", "K" }, { "Λ", "L" }, { "Μ", "M" },
{ "Ν", "N" }, { "Ξ", "X" }, { "Ο", "O" }, { "Ό", "O" }, { "Ὸ", "O" },
{ "Ὀ", "O" }, { "Ὄ", "O" }, { "Ὂ", "O" }, { "Ὁ", "O" }, { "Ὅ", "O" },
{ "Ὃ", "O" }, { "Π", "P" }, { "Ρ", "R" }, { "Ῥ", "R" }, { "Σ", "S" },
{ "Σ", "S" }, { "Τ", "T" }, { "Υ", "Y" }, { "Ύ", "Y" }, { "Ὺ", "Y" },
{ "Ὑ", "Y" }, { "Ὕ", "Y" }, { "Ὓ", "Y" }, { "Ὗ", "Y" }, { "Ϋ", "Y" },
{ "Φ", "Ph" }, { "Χ", "Ch" }, { "Ψ", "Ps" }, { "Ω", "O" }, { "Ώ", "O" },
{ "Ὼ", "O" }, { "ῼ", "O" }, { "Ὠ", "O" }, { "Ὤ", "O" }, { "Ὢ", "O" },
{ "Ὦ", "O" }, { "ᾨ", "O" }, { "Ὡ", "O" }, { "Ὥ", "O" }, { "Ὣ", "O" },
{ "Ὧ", "O" }, { "ᾩ", "O" }, { "ᾬ", "O" }, { "ᾪ", "O" }, { "ᾭ", "O" },
{ "ᾫ", "O" }, { "ᾮ", "O" }, { "ᾯ", "O" },
},
entries = {
["a"] = "a", ["b"] = "b", ["c"] = "c", ["d"] = "d", ["e"] = "e",
["f"] = "f", ["g"] = "g", ["h"] = "h", ["i"] = "i", ["j"] = "j",
["k"] = "k", ["l"] = "l", ["m"] = "m", ["n"] = "n", ["o"] = "o",
["p"] = "p", ["q"] = "q", ["r"] = "r", ["s"] = "s", ["t"] = "t",
["u"] = "u", ["v"] = "v", ["w"] = "w", ["x"] = "x", ["y"] = "y",
["z"] = "z",
},
orders = {
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
"k", "l", "m", "n", "o", "p", "q", "r", "s", "t",
"u", "v", "w", "x", "y", "z",
},
}
\stopluacode
\unexpanded\def\ind#1{\index{#1} #1}
\setupbodyfont[cmu]
\starttext
\startcolumns[n=3]
\placeregister[index][language=de-gr,method={zm,pc,uc}]
\stopcolumns
\dorecurse {2} {
\page\title{Iteration No. \recurselevel}
\ind{Adrastos} \ind{Ἄδραστος}\par
\ind{Amphiaraos} \ind{Ἀμφιάραος}\par
\ind{Hippomedon} \ind{Ἱππομέδων}\par
\ind{Kapaneus} \ind{Καπανεύς}\par
\ind{Parthenopaios} \ind{Παρθενοπαῖоς}\par
\ind{Polyneikes} \ind{Πολυνείκης}\par
\ind{Tydeus} \ind{Τυδεύς}\par
\ind{ἀναλύειν} \ind{Ἀναλυτικὰ}
\ind{analysiert} \ind{Analytik} \ind{analytisch}\par
\ind{ψυχὴ} \ind{Psyche} \ind{psychisch} \par
\ind{Χρόνος} \ind{χρόνιος} \ind{χρῶμα}
\ind{Chronos} \ind{Chronometer} \ind{chronologisch} \ind{chronisch}
\ind{Chrom} \ind{Chromatik} \ind{chromatisch}
}
\stoptext
* Re: two buglets
2010-10-05 21:17 ` Philipp Gesang
@ 2010-10-05 21:27 ` Hans Hagen
2010-10-05 21:55 ` Philipp Gesang
0 siblings, 1 reply; 19+ messages in thread
From: Hans Hagen @ 2010-10-05 21:27 UTC (permalink / raw)
To: mailing list for ConTeXt users; +Cc: Philipp Gesang
On 5-10-2010 11:17, Philipp Gesang wrote:
> I guess there is some testing going on in order to determine
> whether to proceed with the current entry or switch to the next
> one. The position is the same, however the comparison with the
> last item fails and a new one is created instead. (Only
> guessing.)
it's a sequence of tests per comparison, like
Polyneikes
polyneikes % lowercased
polyneikes % shapes
Polyneikes % unicode
Πολυνείκης
Πολυνείκης % lowercased
polyneikes % shapes
Πολυνείκης % unicode
casing and shapes depend on the mapping vectors and the order can be
influenced; you can see this in action with
\enabletrackers[sorters.tests]
Hans
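
The "sequence of tests per comparison" can be sketched in plain Lua as
a chain in which the first test that differs decides. This is only an
illustration, not the code in ConTeXt's sorter: the entries table, its
precomputed shape and unicode keys, and the compare helper are
assumptions made up for the example.

```lua
-- Hypothetical entries: each carries one precomputed key per test.
local entries = {
    { shape = "tydeus",     unicode = "Tydeus" },
    { shape = "polyneikes", unicode = "Πολυνείκης" },
    { shape = "polyneikes", unicode = "Polyneikes" },
}

-- Run the tests in the given order; the first one that differs decides.
local function compare(a, b, tests)
    for _, test in ipairs(tests) do
        if a[test] ~= b[test] then
            return a[test] < b[test]
        end
    end
    return false -- equal under every test
end

-- The order of the tests is configurable; here shapes come first and the
-- raw (byte-wise) string only breaks ties.
local tests = { "shape", "unicode" }
table.sort(entries, function(a, b) return compare(a, b, tests) end)

for _, e in ipairs(entries) do print(e.unicode) end
```

Both spellings of Polyneikes share a shape, so they stay adjacent and
only the last test separates them.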
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: two buglets
2010-10-05 21:27 ` Hans Hagen
@ 2010-10-05 21:55 ` Philipp Gesang
2010-10-06 7:50 ` Hans Hagen
0 siblings, 1 reply; 19+ messages in thread
From: Philipp Gesang @ 2010-10-05 21:55 UTC (permalink / raw)
To: mailing list for ConTeXt users
[-- Attachment #1.1: Type: text/plain, Size: 1670 bytes --]
On 2010-10-05 <23:27:33>, Hans Hagen wrote:
> On 5-10-2010 11:17, Philipp Gesang wrote:
>
> >I guess there is some testing going on in order to determine
> >whether to proceed with the current entry or switch to the next
> >one. The position is the same, however the comparison with the
> >last item fails and a new one is created instead. (Only
> >guessing.)
>
> it's a sequence of tests per comparison, like
>
> Polyneikes
> polyneikes % lowercased
> polyneikes % shapes
I assume by “shapes” you mean the base symbol (all diacritics
stripped).
> Polyneikes % unicode
>
> Πολυνείκης
> Πολυνείκης % lowercased
> polyneikes % shapes
> Πολυνείκης % unicode
>
> casing and shapes depend on the mapping vectors and the order can
> be influenced; you can see this in action with
>
> \enabletrackers[sorters.tests]
Bingo! The tracker instantly revealed a really nasty flaw in the
German standard transcription for Greek: “υ” is normally
converted to Latin “y”, but is retained as Latin “u” in
diphthongs like “ευ” and “ηυ”. So with the sorting definition I
posted I get amongst the results:
sorters > Kapaneys > Kapaneus
because all “υ” are lazily mapped to “y”. Thus, for those
occasional three words per book, determining the sorting position
by hand (e.g. “\index[Kapaneus]{Καπανεύς}”) might be less prone
to error.
Thanks for the hint and sorry for posting a non-solution,
Philipp
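
A minimal sketch of the diphthong problem, in plain Lua. The helper
transliterate and the two small rule tables are hypothetical and cover
only the letters needed for this one word; whether the replacements
table of a ConTeXt sorting definition is applied in list order like
this is an assumption that has not been verified here. The point is
only that multi-letter rules have to be tried before single-letter
ones.

```lua
-- Hypothetical helper (not a ConTeXt function): apply { from, to } pairs in
-- list order, so rules placed earlier take precedence over later ones.
local function transliterate(word, rules)
    for _, rule in ipairs(rules) do
        -- Greek letters are multi-byte UTF-8 and contain no Lua pattern
        -- magic characters, so they can serve as gsub patterns directly.
        word = word:gsub(rule[1], rule[2])
    end
    return word
end

-- Single-letter rules only: every upsilon becomes "y".
local letters = {
    { "Κ", "K" }, { "α", "a" }, { "π", "p" }, { "ν", "n" },
    { "ε", "e" }, { "ύ", "y" }, { "υ", "y" }, { "ς", "s" },
}

-- Diphthong rules that keep upsilon as "u" after epsilon or eta.
local diphthongs = {
    { "εύ", "eu" }, { "ευ", "eu" }, { "ηύ", "eu" }, { "ηυ", "eu" },
}

-- Diphthongs first, then the single letters.
local careful = {}
for _, rule in ipairs(diphthongs) do careful[#careful + 1] = rule end
for _, rule in ipairs(letters) do careful[#careful + 1] = rule end

print(transliterate("Καπανεύς", letters)) -- Kapaneys
print(transliterate("Καπανεύς", careful)) -- Kapaneus
```

For a handful of words, giving the sort key by hand as shown above is
still the simpler route.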
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
* Re: two buglets
2010-10-05 21:55 ` Philipp Gesang
@ 2010-10-06 7:50 ` Hans Hagen
0 siblings, 0 replies; 19+ messages in thread
From: Hans Hagen @ 2010-10-06 7:50 UTC (permalink / raw)
To: mailing list for ConTeXt users; +Cc: Philipp Gesang
On 5-10-2010 11:55, Philipp Gesang wrote:
> I assume by “shapes” you mean the base symbol (all diacritics
> stripped).
indeed (and we might need to add/patch a few more shcodes to
char-def.lua if needed)
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
end of thread, other threads:[~2010-10-06 7:50 UTC | newest]
Thread overview: 19+ messages
2010-02-11 15:52 two buglets Thomas A. Schmitz
2010-02-11 17:17 ` Hans Hagen
2010-02-11 17:35 ` Thomas A. Schmitz
2010-02-11 19:29 ` Hans Hagen
2010-02-11 21:27 ` Thomas A. Schmitz
2010-02-11 18:14 ` David Rogers
2010-10-03 8:24 ` Thomas A. Schmitz
2010-10-03 10:29 ` Hans Hagen
2010-10-03 10:58 ` Thomas A. Schmitz
2010-10-03 15:10 ` Hans Hagen
2010-10-03 15:43 ` Thomas A. Schmitz
2010-10-05 12:15 ` Philipp Gesang
2010-10-05 12:39 ` Hans Hagen
2010-10-05 13:29 ` Thomas A. Schmitz
2010-10-05 21:17 ` Philipp Gesang
2010-10-05 21:27 ` Hans Hagen
2010-10-05 21:55 ` Philipp Gesang
2010-10-06 7:50 ` Hans Hagen
2010-02-11 17:19 ` Hans Hagen