From: Philipp Gesang
To: mailing list for ConTeXt users
Newsgroups: gmane.comp.tex.context
Subject: Re: polish sorting
Date: Thu, 19 Aug 2010 10:13:48 +0200
Message-ID: <20100819081348.GA27552@aides>

Hi Hans,

1. Changing the English sorting rules as you suggested had no effect, and neither did “add_uppercase_mappings('pl',1)”.

2. I think my original question was stated imprecisely, so let me emphasize what I'm after:

Suppose you've got three strings: aaa, Aaa and aab. They are first compared _as if_ they had the same case, i.e. “aaa == Aaa” (the sorter returns 0). Then, and only if this case-insensitive test returned equal, another check is done on the _first_ character only: if the two strings differ in the case of their first character, the string that begins with the lowercase character takes precedence. The correct order will be:

  [1] = "aaa", [2] = "Aaa", [3] = "aab"

whereas with uppercase after lowercase (as I understand it) you'd get:

  [1] = "aaa", [2] = "aab", [3] = "Aaa"

That is why I extended the splitter (a) to record the case of the first character as a boolean and (b) to return lowercase sort strings, and extended the comparer to do this extra check whenever basicsort returns 0.

I really don't expect you to change the sorter, far from it. Perhaps you could keep an extra comparer around to do the job; after all, the table is called “comparers” but for now contains only a single one. The same goes for splitters.
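
The two-step rule in point 2 can be sketched in plain Lua, the language ConTeXt's sorters are written in. This is a hypothetical, stand-alone illustration, not the modified splitter and comparer from the actual sorter code: the names first_is_lower, compare and before are made up here, and the sketch assumes ASCII input, because string.lower works byte by byte and, in the default C locale, does not fold letters such as "Ą" to "ą".

```lua
-- Hypothetical sketch of the "lowercase first on a case-only tie" rule;
-- not ConTeXt's sorter code. ASCII only: string.lower folds single bytes
-- and, in the default C locale, leaves non-ASCII letters untouched.

-- True if the string starts with a lowercase (ASCII) letter.
local function first_is_lower(s)
  return s:match("^%l") ~= nil
end

-- Returns -1, 0 or 1, like a classic comparer.
local function compare(a, b)
  local la, lb = a:lower(), b:lower()
  if la ~= lb then
    -- Step 1: the case-insensitive comparison decides whenever it can.
    return la < lb and -1 or 1
  end
  -- Step 2: only on a tie, look at the case of the first character;
  -- the string that begins in lowercase comes first.
  local lowa, lowb = first_is_lower(a), first_is_lower(b)
  if lowa == lowb then
    return 0
  end
  return lowa and -1 or 1
end

-- Adapter for table.sort, which expects a strict "less than" predicate.
local function before(a, b)
  return compare(a, b) < 0
end

local words = { "aab", "Aaa", "aaa" }
table.sort(words, before)
print(table.concat(words, ", "))  -- aaa, Aaa, aab
```

Since compare effectively orders by the pair (lowercased string, whether the first character is a lowercase letter), before is a valid strict ordering for table.sort. Strings that differ only in the case of later characters, such as "abc" and "aBc", compare as equal, and because table.sort is not stable their relative order is unspecified.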

And as this rule seems to be quite popular around the world, it might well become useful someday. If you decide against it, I'll just put it on the wiki, which will be good enough, I guess.

Philipp

On 2010-08-19 <00:48:14>, Hans Hagen wrote:
> On 18-8-2010 6:08, Philipp Gesang wrote:
> >Hi,
> >
> >I'm creating some sorting tables. While researching this topic I
> >stumbled on the Polish dictionary sorting rules: if two strings are
> >equal except for case then the one gets precedence that begins
> >lowercase.[1] (This seems to apply to the Swedish order as well but I
> >have no means to verify that. Apparently, my German dictionary (from
> >1991) follows the same rule without explicitly stating so.)
> >
> >Context seems to prefer it the other way round, so I modified two
> >functions from sort-ini.lua to handle that; but I'm not happy with
> >this solution.
> >
> >So my question: is there already, or could we have some mechanism
> >to influence the details of sorting in context?
>
> i wonder if this works out ok (needs a test index):
>
> sorters.replacements["pl"] = {
>     -- no replacements
> }
>
> sorters.entries["pl"] = {
>     ["a"] = "a", ["ą"] = "ą", ["b"] = "b", ["c"] = "c", ["ć"] = "ć",
>     ["d"] = "d", ["e"] = "e", ["ę"] = "ę", ["f"] = "f", ["g"] = "g",
>     ["h"] = "h", ["i"] = "i", ["j"] = "j", ["k"] = "k", ["l"] = "l",
>     ["ł"] = "ł", ["m"] = "m", ["n"] = "n", ["ń"] = "ń", ["o"] = "o",
>     ["ó"] = "ó", ["p"] = "p", ["q"] = "q", ["r"] = "r", ["s"] = "s",
>     ["ś"] = "ś", ["t"] = "t", ["u"] = "u", ["v"] = "v", ["w"] = "w",
>     ["x"] = "x", ["y"] = "y", ["z"] = "z", ["ź"] = "ź", ["ż"] = "ż",
> }
>
> sorters.mappings["pl"] = {
>     ["a"] = 1, ["ą"] = 2, ["b"] = 3, ["c"] = 4, ["ć"] = 5,
>     ["d"] = 6, ["e"] = 7, ["ę"] = 8, ["f"] = 9, ["g"] = 10,
>     ["h"] = 11, ["i"] = 12, ["j"] = 13, ["k"] = 14, ["l"] = 15,
>     ["ł"] = 16, ["m"] = 17, ["n"] = 18, ["ń"] = 19, ["o"] = 20,
>     ["ó"] = 21, ["p"] = 22, ["q"] = 23, ["r"] = 24, ["s"] = 25,
>     ["ś"] = 26, ["t"] = 27, ["u"] = 28, ["v"] = 29, ["w"] = 30,
>     ["x"] = 31, ["y"] = 32, ["z"] = 33, ["ź"] = 34, ["ż"] = 35,
> }
>
> add_uppercase_entries ('pl')
> add_uppercase_mappings('pl',1)
>
> -----------------------------------------------------------------
> Hans Hagen | PRAGMA ADE
> Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
> tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
> | www.pragma-pod.nl
> -----------------------------------------------------------------
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
>
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________
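
Hans's proposal above gives every Polish letter a numeric rank in sorters.mappings["pl"]. As a hypothetical illustration of why such a table is needed, the following Lua 5.3+ sketch (it uses the utf8 library) sorts words by per-letter ranks in the same 35-letter order. It is not ConTeXt's implementation: the names pl_keys and pl_before are made up, giving uppercase letters the same rank as their lowercase counterparts is an assumption of this sketch (not a claim about what add_uppercase_mappings does), and characters missing from the table are simply placed after all letters by code point.

```lua
-- Hypothetical sketch (Lua 5.3+): rank-based ordering of Polish words,
-- following the 35-letter order of sorters.mappings["pl"].
-- Not ConTeXt's sort-ini.lua; names and the fallback rule are made up.

local lower = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
local upper = "AĄBCĆDEĘFGHIJKLŁMNŃOÓPQRSŚTUVWXYZŹŻ"

-- Map each code point to its rank (1..35); uppercase letters share the
-- rank of their lowercase counterparts (an assumption of this sketch).
local rank = {}
for _, letters in ipairs({ lower, upper }) do
  local n = 0
  for _, cp in utf8.codes(letters) do
    n = n + 1
    rank[cp] = n
  end
end

-- Turn a UTF-8 word into a list of numeric sort keys.
local function pl_keys(word)
  local keys = {}
  for _, cp in utf8.codes(word) do
    -- Characters not in the table go after all letters, by code point.
    keys[#keys + 1] = rank[cp] or (1000 + cp)
  end
  return keys
end

-- Strict "less than" for table.sort: compare keys position by position;
-- a proper prefix sorts first. Keys are recomputed on every call for brevity.
local function pl_before(a, b)
  local ka, kb = pl_keys(a), pl_keys(b)
  for i = 1, math.min(#ka, #kb) do
    if ka[i] ~= kb[i] then
      return ka[i] < kb[i]
    end
  end
  return #ka < #kb
end

local words = { "zupa", "żaba", "łódź", "cebula", "źródło", "lody", "ćma" }
table.sort(words, pl_before)
print(table.concat(words, ", "))
-- cebula, ćma, lody, łódź, zupa, źródło, żaba
```

By contrast, a plain byte-wise table.sort on the same list (in the default C locale) would put ćma, łódź, źródło and żaba after zupa, because the UTF-8 encodings of these Polish letters start with bytes larger than any ASCII letter.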