ntg-context - mailing list for ConTeXt users
From: Philipp Gesang <pgesang@ix.urz.uni-heidelberg.de>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: polish sorting
Date: Thu, 19 Aug 2010 10:13:48 +0200
Message-ID: <20100819081348.GA27552@aides>
In-Reply-To: <4C6C632E.70505@wxs.nl>


Hi Hans,

1.  changing the English sorting rules as you suggested had no effect,
    neither did the “add_uppercase_mappings('pl',1)”.

2.  I think my original question was stated imprecisely, so let me
    emphasize what I'm after:

    Suppose you've got three strings: aaa, Aaa and aab. They are first
    compared _as if_ they had the same case, i.e. “aaa == Aaa” (the sorter
    returns 0). Then (only if the case-insensitive test returned equal)
    another check is done for the _first_ character only. If the two
    strings differ in the case of the first char, then the string with
    the lowercase one gets precedence. The correct order will be:

    [1] = "aaa", [2] = "Aaa", [3] = "aab"

    whereas with uppercase after lowercase (as I understand it) you'd
    get:

    [1] = "aaa", [2] = "aab", [3] = "Aaa"

    And that is why I extended the splitter (a) to keep the state of the
    first character as a boolean as well as (b) to return lowercase sort
    strings, and the comparer to do an extra check for this whenever
    basicsort returns 0.
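
A minimal sketch of the rule from point 2 in plain Lua, for illustration only. The names `split`, `compare` and `sort_words` are hypothetical and do not mirror the actual functions in sort-ini.lua. The sketch also assumes ASCII input, since `string.lower` does not fold characters such as “Ą”.

```lua
-- Hypothetical sketch, not the sort-ini.lua implementation.

-- split: keep a lowercase sort key plus a boolean for the case of the
-- first character (ASCII-only case folding).
local function split(str)
    local first = str:sub(1, 1)
    return {
        raw   = str,
        key   = str:lower(),
        upper = first ~= first:lower(), -- true if the first char is uppercase
    }
end

-- compare: case-insensitive comparison first; only when the keys are
-- equal does the case of the first character decide (lowercase wins).
local function compare(a, b)
    if a.key ~= b.key then
        return a.key < b.key
    end
    return (not a.upper) and b.upper
end

local function sort_words(words)
    local entries = {}
    for i, w in ipairs(words) do
        entries[i] = split(w)
    end
    table.sort(entries, compare)
    local result = {}
    for i, e in ipairs(entries) do
        result[i] = e.raw
    end
    return result
end

-- sort_words({ "aab", "Aaa", "aaa" }) yields "aaa", "Aaa", "aab"
```

Doing the case check only on a tie is what keeps “Aaa” ahead of “aab”; a mapping that simply ranks uppercase after lowercase would compare the first characters before the rest of the string is looked at.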

I really don't expect you to change the sorter, far from it. Perhaps you
can keep an extra comparer around to do the job -- after all the table
is called “comparers” but for now contains only a single one. Same for
splitters. And as this rule seems to be quite popular around the world,
it may well become useful someday. If you decide against it, I'll
just put it on the wiki, which will be fine enough, I guess.

Philipp


On 2010-08-19 <00:48:14>, Hans Hagen wrote:
> On 18-8-2010 6:08, Philipp Gesang wrote:
> >Hi,
> >
> >I'm creating some sorting tables. While researching this topic I
> >stumbled on the Polish dictionary sorting rules: if two strings are
> >equal except for case then the one gets precedence that begins
> >lowercase.[1] (This seems to apply to the Swedish order as well but I
> >have no means to verify that. Apparently, my German dictionary (from
> >1991) follows the same rule without explicitly stating so.)
> >
> >Context seems to prefer it the other way round, so I modified two
> >functions from sort-ini.lua to handle that; but I'm not happy with
> >this solution.
> >
> >So my question: is there already, or could we have some mechanism
> >to influence the details of sorting in context?
> 
> i wonder if this works out ok (needs a test index):
> 
> sorters.replacements["pl"] = {
>     -- no replacements
> }
> 
> sorters.entries["pl"] = {
>     ["a"] = "a", ["ą"] = "ą", ["b"] = "b", ["c"] = "c", ["ć"] = "ć",
>     ["d"] = "d", ["e"] = "e", ["ę"] = "ę", ["f"] = "f", ["g"] = "g",
>     ["h"] = "h", ["i"] = "i", ["j"] = "j", ["k"] = "k", ["l"] = "l",
>     ["ł"] = "ł", ["m"] = "m", ["n"] = "n", ["ń"] = "ń", ["o"] = "o",
>     ["ó"] = "ó", ["p"] = "p", ["q"] = "q", ["r"] = "r", ["s"] = "s",
>     ["ś"] = "ś", ["t"] = "t", ["u"] = "u", ["v"] = "v", ["w"] = "w",
>     ["x"] = "x", ["y"] = "y", ["z"] = "z", ["ź"] = "ź", ["ż"] = "ż",
> }
> 
> sorters.mappings["pl"] = {
>     ["a"] =  1, ["ą"] =  2, ["b"] =  3, ["c"] =  4, ["ć"] =  5,
>     ["d"] =  6, ["e"] =  7, ["ę"] =  8, ["f"] =  9, ["g"] = 10,
>     ["h"] = 11, ["i"] = 12, ["j"] = 13, ["k"] = 14, ["l"] = 15,
>     ["ł"] = 16, ["m"] = 17, ["n"] = 18, ["ń"] = 19, ["o"] = 20,
>     ["ó"] = 21, ["p"] = 22, ["q"] = 23, ["r"] = 24, ["s"] = 25,
>     ["ś"] = 26, ["t"] = 27, ["u"] = 28, ["v"] = 29, ["w"] = 30,
>     ["x"] = 31, ["y"] = 32, ["z"] = 33, ["ź"] = 34, ["ż"] = 35,
> }
> 
> add_uppercase_entries ('pl')
> add_uppercase_mappings('pl',1)
> 
> 
> 
> 
> -----------------------------------------------------------------
>                                           Hans Hagen | PRAGMA ADE
>               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
>     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
>                                              | www.pragma-pod.nl
> -----------------------------------------------------------------
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
> 
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


Thread overview: 7+ messages
2010-08-18 16:08 Philipp Gesang
2010-08-18 22:38 ` Hans Hagen
2010-08-18 22:48 ` Hans Hagen
2010-08-19  8:13   ` Philipp Gesang [this message]
2010-08-19  9:41     ` Hans Hagen
2010-08-19 10:35       ` Philipp Gesang
2010-08-19 11:13         ` Hans Hagen
