From: Hans Hagen
Newsgroups: gmane.comp.tex.context
Subject: Re: Support for Thai in ConTeXt
Date: Wed, 15 May 2013 17:20:58 +0200
Message-ID: <5193A7DA.3070203@wxs.nl>
References: <51926385.70705@wxs.nl>
To: mailing list for ConTeXt users

On 5/15/2013 4:09 PM, Mojca Miklavec wrote:
> On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:
>> On 5/14/2013 6:07 PM, luigi scarso wrote:
>>
>>> I Hope that someone can help here
>>
>>
>> as Mojca mentioned thai at bachotex i'll add the patterns as a start
>>
>> given specs, examples and time, adding support for thai to context shouldn't
>> be too hard (assuming that there are users)
>
> But it's not trivial either.

It depends ... we're using a dictionary to determine word boundaries,
aren't we? I'm pretty sure that I've done more complex coding.

> There's an opensource project implementing word segmentation:
> http://linux.thai.net/projects/swath
> The specification (someone's thesis) can be found here:
> http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf

Ok, so there are some text files there with words.

> The ugly part of pdfTeX approach is that it requires an external text
> processor to digest an input TeX document and return a copy with word
> segmentation. Then pdfTeX is run on the resulting file. XeTeX can use
> ICU library to do the segmentation.
>
> In LuaTeX one would have to plug the word segmentation somewhere (but
> writing that part is slightly non-trivial).

I just did a quick test using those dictionaries (abusing some code that
I already had on my machine). Quite doable. It all depends on having the
dictionaries available (on the garden or in the distribution).

Anyhow, it's not that much font related, just language / script support,
and we already have that for some languages, so adding Thai to it
doesn't hurt. Of course we'd need some testing. It doesn't make much
sense to add features to context that no one would ever use.
But ... Luigi is already teaching himself Thai, so ...

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
    tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
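
Editorial note: the message above describes using a dictionary to determine
word boundaries in Thai text, which is written without spaces between words.
As a rough illustration of that idea only, here is a minimal sketch of greedy
longest-match segmentation in Python. It is not the code from the quick test
mentioned in the message, nor the algorithm used by swath or ICU; the function
name `segment` and the tiny word sets are hypothetical and chosen purely for
the example.

```python
def segment(text, dictionary):
    """Split text into words by greedy longest-match against a word set.

    Hypothetical helper for illustration; characters that start no
    dictionary word are emitted as single-character tokens.
    """
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            j = i + 1  # no dictionary word starts here: emit one character
        words.append(text[i:j])
        i = j
    return words


if __name__ == "__main__":
    # Assumed toy dictionary; a real one would be loaded from word lists.
    thai_words = {"ภาษา", "ไทย"}
    print(segment("ภาษาไทย", thai_words))
```

Greedy matching is the simplest possible strategy and can pick the wrong
split when words overlap, which is one reason real segmenters do more work.
Once boundaries are known, a typesetter could treat them as potential line
break points.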