* Re: sort-lan.lua nitpicks and sorting
2010-05-02 13:59 sort-lan.lua nitpicks and sorting Philipp Gesang
@ 2010-05-03 7:35 ` Philipp Gesang
2010-05-07 9:21 ` Hans Hagen
1 sibling, 0 replies; 3+ messages in thread
From: Philipp Gesang @ 2010-05-03 7:35 UTC
To: ntg-context
On 2010-05-02 <15:59:53>, Philipp Gesang wrote:
> Hi again,
>
>
> 1. In sort-lan.lua, line 101 should read «['r'] = "r"», and line 144
> «['r'] = 26, -- r».
Lines 109 and 152, which handle the character “ů” (uring in Unicode
speak), contain a typo: the key should be “uc(0x016F)” instead of
“uc(0x01F6)”.
The long vowels “ó” and “ý” are missing as well. Each belongs right
after its short counterpart. I attach a diff for the file.
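To illustrate why the transposed digits matter, here is a minimal sketch
that encodes both code points as UTF-8. The local `uc` is a stand-in
written for this example only; I assume the real helper in sort-lan.lua
also turns a code point into a UTF-8 string, but it is not shown in the
diff. The stand-in covers only code points below U+0800.

```lua
-- Stand-in for the uc() helper used in sort-lan.lua (assumption: the real
-- one maps a code point to its UTF-8 string). Covers U+0000..U+07FF only.
local function uc(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  end
  error("code point outside the range covered by this sketch")
end

local uring = uc(0x016F) -- U+016F, small u with ring above
local typo  = uc(0x01F6) -- U+01F6, an unrelated letter (capital hwair)

-- The UTF-8 bytes of the character as it occurs in Czech text.
local text_char = "\197\175"

-- A table keyed on the typo never matches the character from the text.
local wrong = { [typo] = "u" }
local right = { [uring] = "u" }
print(wrong[text_char]) -- nil
print(right[text_char]) -- u
```

With the wrong key the lookup silently returns nil, so the character
simply drops out of the mapping without any error.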
Philipp
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
[-- Attachment #1.1.2: sort-lan-lua.patch --]
[-- Type: text/plain, Size: 2106 bytes --]
--- /home/laokoon/base/sort-lan.lua 2010-04-07 23:10:04.000000000 +0200
+++ sort-lan.lua 2010-05-03 09:28:23.813291928 +0200
@@ -98,7 +98,8 @@
['o'] = "o",
['p'] = "p",
['q'] = "q",
- ['s'] = "r",
+ ['r'] = "r",
+ [uc(0x00F3)] = uc(0x00F3), -- oacute
[uc(0x0147)] = uc(0x0147), -- rcaron
['s'] = "s",
[uc(0x0161)] = uc(0x0161), -- scaron
@@ -106,11 +107,12 @@
[uc(0x0165)] = uc(0x0165), -- tcaron
['u'] = "u",
[uc(0x00FA)] = "u",
- [uc(0x01F6)] = "u",
+ [uc(0x016F)] = "u",
['v'] = "v",
['w'] = "w",
['x'] = "x",
['y'] = "y",
+ [uc(0x00FD)] = uc(0x00FD), -- yacute
['z'] = "z",
[uc(0x017E)] = uc(0x017E), -- zcaron
}
@@ -139,23 +141,25 @@
['n'] = 21, -- n
[uc(0x0147)] = 22, -- ncaron
['o'] = 23, -- o
- ['p'] = 24, -- p
- ['q'] = 25, -- q
- ['s'] = 26, -- r
- [uc(0x0147)] = 27, -- rcaron
- ['s'] = 28, -- s
- [uc(0x0161)] = 29, -- scaron
- ['t'] = 30, -- t
- [uc(0x0165)] = 31, -- tcaron
- ['u'] = 32, -- u
- [uc(0x00FA)] = 33, -- uacute
- [uc(0x01F6)] = 34, -- uring
- ['v'] = 35, -- v
- ['w'] = 36, -- w
- ['x'] = 37, -- x
- ['y'] = 38, -- y
- ['z'] = 39, -- z
- [uc(0x017E)] = 40, -- zcaron
+ [uc(0x00F3)] = 24, -- oacute
+ ['p'] = 25, -- p
+ ['q'] = 26, -- q
+ ['r'] = 27, -- r
+ [uc(0x0147)] = 28, -- rcaron
+ ['s'] = 29, -- s
+ [uc(0x0161)] = 30, -- scaron
+ ['t'] = 31, -- t
+ [uc(0x0165)] = 32, -- tcaron
+ ['u'] = 33, -- u
+ [uc(0x00FA)] = 34, -- uacute
+ [uc(0x016F)] = 35, -- uring
+ ['v'] = 36, -- v
+ ['w'] = 37, -- w
+ ['x'] = 38, -- x
+ ['y'] = 39, -- y
+ [uc(0x00FD)] = 40, -- yacute
+ ['z'] = 41, -- z
+ [uc(0x017E)] = 42, -- zcaron
}
-- French
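Hand-numbered tables like the one in the last hunk are easy to get wrong
when every value shifts by one or two. The following is a minimal sketch
of a checker, assuming ranks are positive integers starting at 1; the
function name `check_ranks` and the sample table are made up for this
example and are not part of sort-lan.lua. Note that it cannot see a key
that is written twice in a table constructor, because the later entry
overwrites the earlier one; such a case only shows up as a gap.

```lua
-- Report rank values that occur more than once or are skipped in a
-- sort-order mapping (character -> integer rank, starting at 1).
local function check_ranks(mapping)
  local seen, max = {}, 0
  for char, rank in pairs(mapping) do
    seen[rank] = seen[rank] or {}
    table.insert(seen[rank], char)
    if rank > max then max = rank end
  end
  local duplicates, gaps = {}, {}
  for rank = 1, max do
    local chars = seen[rank]
    if not chars then
      gaps[#gaps + 1] = rank
    elseif #chars > 1 then
      duplicates[#duplicates + 1] = rank
    end
  end
  return duplicates, gaps
end

-- Hypothetical excerpt, not the real Czech table:
-- rank 2 is used twice and rank 3 is skipped.
local sample = { a = 1, b = 2, c = 2, d = 4 }
local duplicates, gaps = check_ranks(sample)
print("duplicate ranks: " .. table.concat(duplicates, ","))
print("missing ranks:   " .. table.concat(gaps, ","))
```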
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: sort-lan.lua nitpicks and sorting
2010-05-02 13:59 sort-lan.lua nitpicks and sorting Philipp Gesang
2010-05-03 7:35 ` Philipp Gesang
@ 2010-05-07 9:21 ` Hans Hagen
1 sibling, 0 replies; 3+ messages in thread
From: Hans Hagen @ 2010-05-07 9:21 UTC
To: mailing list for ConTeXt users; +Cc: Philipp Gesang
On 2-5-2010 3:59, Philipp Gesang wrote:
> 1. In sort-lan.lua, line 101 should read «['r'] = "r"», and line 144
> «['r'] = 26, -- r».
i patched the file
> 2. Although I read the disclaimer about said file being “preliminary and
> incomplete” -- is there some rationale behind the range of integers for
> each language mapping? The mapping for English goes from 1 to 51,
> interleaving 2 integers for each letter (which is odd because it should
> start from index 3 with “a”, shouldn't it?), while the Czech one goes
> from 1 to 40 without skipping, Finnish and Austrian from 1 to 58.
some old (ruby) code was used etc etc
> What about mapping them onto a larger but common scale that would
> alleviate multilingual sorting so that the alphabetical representation
> of the phoneme /a/ maps to the same value over different languages?†
> E.g.
> ["a"] = 3, -- in a Latin mapping,
> ["α"] = 3, -- in Greek mapping,
> ["а"] = 3, -- in a Russian mapping.
hm, interesting ... feel free to reshuffle and provide patches
> † I know this is impractical for many writing systems and even within
> the set of Latin or Greek based alphabets it largely depends on a given
> purpose how much precision you need in sorting.
indeed but we can have multiple variants and are not bound to specific
conventions
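The common-scale idea quoted above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the table `shared`,
the functions `sort_key` and `less`, and the value 23 for /k/ are all
hypothetical; only the value 3 for /a/ comes from the proposal. The source
file is assumed to be saved as UTF-8.

```lua
-- Hypothetical shared ranks: letters for the same sound get the same
-- value across scripts. Only /a/ = 3 is taken from the proposal.
local shared = {
  ["a"] = 3,  ["α"] = 3,  ["а"] = 3,  -- Latin, Greek, Cyrillic
  ["k"] = 23, ["κ"] = 23, ["к"] = 23, -- Latin, Greek, Cyrillic
}

-- Turn a word into a list of ranks, one per UTF-8 character.
local function sort_key(word)
  local key = {}
  -- One lead byte followed by any number of continuation bytes.
  for ch in word:gmatch("[\1-\127\194-\244][\128-\191]*") do
    key[#key + 1] = shared[ch] or 0 -- unknown characters sort first
  end
  return key
end

-- Compare two words rank by rank; a shorter prefix sorts first.
local function less(a, b)
  local ka, kb = sort_key(a), sort_key(b)
  for i = 1, math.min(#ka, #kb) do
    if ka[i] ~= kb[i] then return ka[i] < kb[i] end
  end
  return #ka < #kb
end

local words = { "κα", "ак", "ka", "aa" }
table.sort(words, less)
print(table.concat(words, " "))
```

Here the Greek “κα” and the Latin “ka” get identical keys, so their
relative order is left open; a real mapping would need a tie-breaker
(for example by script) if a stable result matters.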
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------