On 2010-10-03 <17:43:21>, Thomas A. Schmitz wrote:
> OK, I'll write something for German and English, but the thing
> is that we need more input what users expect. For mixtures with
> foreign languages, there might not be generally accepted rules at
> all, so people will define something on an ad-hoc basis.

Hi Thomas and others,

Technically speaking, the problem is solved by ISO 14651.[1]

In practice, multilingual sorting depends on local rules, of
which “one index per script or language” seems to be the most
common.

Some time ago I wrote an LPeg grammar from the BNF in [1]. It matches
the collation rules from [2], but as I couldn’t figure out how to map
them onto ConTeXt’s sorting mechanism, I never got around to
actually capturing the information. As I won’t have the time
to try it with the new structure of sort-lan, I guess I’ll just
attach the PEG grammar for anyone to use as a starting point.
Unicode collation would be great to have in ConTeXt.

> transliteration. The problem with polytonic Greek is that so many
> different unicode characters need to have the same sort entry. If

Isn’t that just what the Greek rules in sort-lan.lua do? If not,
then it would be a bug.

····startsnippet·················································

definitions["gr"] = {
    entries = {
        ["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
        ["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α", ["ἆ"] = "α",
        ["ἁ"] = "α", ["ἅ"] = "α", ["ἃ"] = "α", ["ἇ"] = "α", ["ᾁ"] = "α",
        ["ᾴ"] = "α", ["ᾲ"] = "α", ["ᾷ"] = "α", ["ᾄ"] = "α", ["ᾂ"] = "α",
        ["ᾅ"] = "α", ["ᾃ"] = "α", ["ᾆ"] = "α", ["ᾇ"] = "α", ["β"] = "β",

····stopsnippet··················································
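
To illustrate what I mean, here is a minimal sketch of how such an
entries table collapses several polytonic characters into one sort
entry. It is not how the ConTeXt sorter is actually implemented: the
table is a trimmed-down, hypothetical copy of the one above, and the
names utf8char and sortkey are made up for this example.

```lua
-- Hypothetical, shortened copy of the "entries" table quoted above.
local entries = {
    ["α"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
    ["ἀ"] = "α", ["ἁ"] = "α", ["β"] = "β",
}

-- Pattern matching a single UTF-8 encoded character (NUL excluded);
-- plain Lua, so it does not depend on any ConTeXt helper.
local utf8char = "[\1-\127\194-\244][\128-\191]*"

-- Replace every character by its sort entry. Characters without an
-- entry are kept unchanged, because gsub keeps the match when the
-- callback returns nil.
local function sortkey(word)
    return (word:gsub(utf8char, function(c) return entries[c] end))
end

-- Both spellings end up with the same key.
print(sortkey("ἀβ") == sortkey("αβ"))  --> true
```

With keys like these, words that differ only in breathings or accents
compare as equal, so any tie-breaking would have to happen in a
separate step.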

Always nice to have a decent discussion on sorting ;)

Philipp


[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip
[2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt
