From: "Thomas A. Schmitz"
Newsgroups: gmane.comp.tex.context
Subject: Re: two buglets
Date: Sun, 3 Oct 2010 17:43:21 +0200
In-Reply-To: <4CA89CCA.8020705@wxs.nl>
Reply-To: mailing list for ConTeXt users
To: Hans Hagen
Cc: mailing list for ConTeXt users

On Oct 3, 2010, at 5:10 PM, Hans Hagen wrote:

> mm zm pm : use mapping order, add -1,0, +1 to different case and use shape info for missing entries (similar shapes)
> mc zc pc : use mapping order, add -1,0, +1 to different case
> uc: unicode order
>
> so, you define a sequence of comparisons where for instance
>
> U -> order u +/- 1
> \"u -> order of shape u +/- 1
>
> etc .. a bit cryptic I admit ... some combinations give the same result depending on the vectors used.
> (Jano promised to write up something.)
>
> numbers are sorted in a special way
>
> so, at some point we simplify characters and start looking at shapes and sort based on shapes which of course leads to clashes so in a next step we look at unicodes etc etc

OK, that makes sense. I'll play with it, but having a few choice pages on the wiki would be great!

> best would be to have a test file per language with in comments the expected order; such tests should also provide foreign entries
>
> for instance, how would you mix german and greek in your books; we probably need some specialized vectors then, which is possible as the sorting language can be configured independent from the text language

OK, I'll write something for German and English, but the thing is that we need more input on what users expect. For mixtures with foreign languages, there might not be generally accepted rules at all, so people will define something on an ad hoc basis.

For Greek: I just looked at a dozen books here on my shelf. Most English books have a separate index for Greek terms; when they sort Greek terms with English words, they use transliteration. The problem with polytonic Greek is that so many different Unicode characters need to have the same sort entry. If I ever see the necessity of setting this up, I'll be in touch off-list, but it's such an unusual thing that I think you shouldn't bother now.

All best

Thomas
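
To make the polytonic Greek point concrete, here is a minimal sketch in Python (not ConTeXt's actual sorter, and not how its sort vectors are defined). It assumes that "shape" can be approximated by Unicode canonical decomposition with the combining marks dropped, so that all accented variants of a letter share one sort entry, and that clashes are then resolved by falling back to plain code point order, roughly in the spirit of the "shapes first, then unicodes" sequence described in the quoted text. The names `shape` and `sort_key` and the word list are made up for illustration.

```python
import unicodedata


def shape(text):
    """Approximate the 'shape' of a string: decompose (NFD), drop
    combining marks (accents, breathings, iota subscript), lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def sort_key(text):
    """Two-step comparison: first by shape, then, when shapes clash,
    by the code points of the original string."""
    return (shape(text), [ord(ch) for ch in text])


# Hypothetical index entries mixing German and polytonic Greek.
words = ["über", "Uhr", "uber", "ἄνθρωπος", "ἀνήρ", "ὥρα", "ὤν"]

for word in sorted(words, key=sort_key):
    print(word)
```

With this key, the many precomposed alpha characters (with smooth or rough breathing, with an accent, with iota subscript, and so on) all collapse to a plain alpha in the first comparison step, which is the "same sort entry" behaviour mentioned above. Real language rules, such as German phone book versus dictionary ordering, would need their own mapping on top of this.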