On 2010-10-03 <17:43:21>, Thomas A. Schmitz wrote:
> OK, I'll write something for German and English, but the thing
> is that we need more input what users expect. For mixtures with
> foreign languages, there might not be generally accepted rules at
> all, so people will define something on an ad-hoc basis.

Hi Thomas and others,

technically speaking the problem is solved by ISO 14651.[1] In
practice, multilingual sorting depends on local rules, of which
“One index per script|language.” seems to be the most common.

Some time ago I made an lpeg from the bnf in [1]. It matches the
collation rules from [2], but as I couldn’t figure out how to map
them onto context’s sorting mechanism I never got around to
actually capturing the information. As I won’t have the time to try
it with the new structure of sort-lan, I guess I’ll just attach the
peg grammar for anyone to use as a starting point. Unicode
collation would be great to have in context.

> transliteration. The problem with polytonic Greek is that so many
> different unicode characters need to have the same sort entry. If

Isn’t that just what the Greek rules in sort-lan.lua do? If not,
then it would be a bug.

····startsnippet·················································
definitions["gr"] = {
    entries = {
        ["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
        ["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α", ["ἆ"] = "α",
        ["ἁ"] = "α", ["ἅ"] = "α", ["ἃ"] = "α", ["ἇ"] = "α", ["ᾁ"] = "α",
        ["ᾴ"] = "α", ["ᾲ"] = "α", ["ᾷ"] = "α", ["ᾄ"] = "α", ["ᾂ"] = "α",
        ["ᾅ"] = "α", ["ᾃ"] = "α", ["ᾆ"] = "α", ["ᾇ"] = "α", ["β"] = "β",
····stopsnippet··················································

Always nice to have a decent discussion on sorting ;)

Philipp

[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip
[2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments