From: "Thomas A. Schmitz"
Newsgroups: gmane.comp.tex.context
Subject: Re: Arabic-utf-8 (plus a sample)
Date: Sat, 05 Jun 2004 22:41:39 +0200
Idris,

I know a bit of Perl and would love to help. However, I fear that sending
us your material by mail will be a bit difficult because the UTF-8
characters get transformed into gibberish. Could you send the hexadecimal
codes of the characters you want to convert? Or I could simply give you the
syntax; you'll know what to do. So here is a Perl script that works for my
Greek material; I don't see why it shouldn't work for Arabic:

==================================cut here
#!/usr/bin/perl -w
use strict;

my $file = shift or die "usage: $0 file.tex\n";
# open the file named on the command line for reading, decoding UTF-8
open(my $in, '<:encoding(UTF-8)', $file) or die "cannot read $file: $!";
# open "new.tex" for writing the result, encoding UTF-8
open(my $out, '>:encoding(UTF-8)', 'new.tex') or die "cannot write new.tex: $!";
while (<$in>) {
    # the actual conversion, one s/// per character:
    # s/\x{OLD_CODE_POINT}/\x{NEW_CODE_POINT}/g, e.g. ARABIC LETTER ALEF => A
    s/\x{0627}/\x{0041}/g;
    print $out $_;    # write the converted line into "new.tex"
}
close($in);
close($out);
==================================and here

Make the script executable and call it with the name of a file as an
argument; add one s/// line for each character you want to convert.

HTH
Thomas

On Sat, 2004-06-05 at 21:32, Idris Samawi Hamid wrote:
> Hi gang,
>
> For Arabic we use a Latin transcription in Aleph/(e-)Omega (or even
> ArabTeX) unless one of the encoding filters like utf-8 is used. Even for
> utf-8 files, however, it would be very useful to be able to convert a
> utf-8 file to Latin transcription for further processing by
> Aleph/(e-)Omega. For example, adding diacritics is much easier to do in
> Latin than in an Arabic script editor because Latin transcription is
> one-dimensional and adding diacritics to Arabic is a 2-dimensional affair.
>
> The best thing would be a perl script but I don't know perl at all (except
> to run some precreated scripts). If someone out of the kindness of
> their heart could write a short and simple script for just seven
> characters I could do the rest myself and present it back here.
>
> Now all of the Arabic characters in utf-8 can be represented by extended
> ascii. I need something like this, that converts every extended ascii
> representation of Arabic utf-8 into a Latin transcription:
>
> ا => A
> ب => b
> ج => j
> د => d
> ه => h
> و => w
> ز => z
>
> If someone could write a perl script that can accomplish the above
> conversion, I can manually fill in the rest of the script. Basically I use
> a modified version of the ArabTeX transcription.
>
> Here is a "gift" in return: a sample utf-8 Arabic file that can be
> processed by Aleph/(e-)Omega in ConTeXt (you will probably need to dvips
> this, though some dvi-viewers can do the postscript/16-bit thing):
>
> ==============================================
> \hoffset=0pt % for Omega bug: has this been fixed?
>
> \def\ArabicUTF{\ocp\UTFArUni=inutf8 %% in88596
> %\ocp\UTFArUni=in88596
> \ocp\UniCUni=uni2cuni
> \ocp\CUniArab=cuni2oar
> \ocplist\UTFArOCP=
> \addbeforeocplist 1 \UTFArUni
> \addbeforeocplist 1 \UniCUni
> \addbeforeocplist 1 \CUniArab
> \nullocplist
> \pushocplist\UTFArOCP}
>
> \input m-gamma.tex
> \input type-omg.tex
> \switchtobodyfont[omarb,12pt] %
>
> \textdir TRT%
> \pardir TRT%
> \ArabicUTF
>
> \starttext
>
> ، ؛ ؟ ء آ أ ؤ إ ئ ا ب ة ت ث ج ح خ د ذ ر ز س
> ش ص ض ط ظ ع غ ـ ف ق ك ل م ن ه و ى ي
>
> \blank[big]
>
> %ً ٌ ٍ َ ُ ِ ّ
>
> ْ ٠١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩ ٪ ٫ ٬ ٰ ٱ ٲ ٳ ٴ ٵ ٶ ٷ
> ٸ ٹ ٺ ٻ ټ ٽ پ ٿ ڀ ځ ڂ ڃ ڄ څ چ ڇ ڈ ډ ڊ ڋ ڌ ڍ
> ڎ ڏ ڐ ڑ ڒ ړ ڔ ڕ ږ ڗ ژ ڙ ښ ڛ ڜ ڝ ڞ ڟ ڢ ڡ ڢ ڣ
> ڤ ڥ ڦ ڧ ڨ ک ڪ ګ ڬ ڭ ڮ گ ڰ ڱ ڲ ڳ ڴ ڵ ڶ ڷ ں ڻ
> ڼ ھ ۀ ہ ۃ ۄ ۅ ۆ ۇ ۈ ۉ ۊ ۋ ی ې ۑ ے ۓ ۔ ە ۰
> ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹
>
> \blank[big]
>
> ـً ـٌ ـٍ ـَ ـُ ـِ ـّ ـْ ـٰ
>
> ا ب ج د ه و ز
>
> \stoptext
>
> ==============================================
>
> Best
> Idris
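For the seven letters in the table quoted above, a table-driven variant of the script might save writing one s/// line per letter. This is only a sketch and not something posted in the thread: the hash %latin_for and the sub transliterate are hypothetical names, the code points are the standard Unicode ones for those seven letters, and it assumes the input file is valid UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: Arabic letter (as a Unicode code point) => Latin
# transcription, taken from the seven examples in the quoted message.
my %latin_for = (
    "\x{0627}" => 'A',    # ARABIC LETTER ALEF
    "\x{0628}" => 'b',    # ARABIC LETTER BEH
    "\x{062C}" => 'j',    # ARABIC LETTER JEEM
    "\x{062F}" => 'd',    # ARABIC LETTER DAL
    "\x{0647}" => 'h',    # ARABIC LETTER HEH
    "\x{0648}" => 'w',    # ARABIC LETTER WAW
    "\x{0632}" => 'z',    # ARABIC LETTER ZAIN
);

# Build one character class from the table keys, so a single s///g
# handles every letter in the table.
my $arabic = join '', map { quotemeta($_) } keys %latin_for;

# Hypothetical helper: replace every table letter in $text by its
# transcription; all other characters are left untouched.
sub transliterate {
    my ($text) = @_;
    $text =~ s/([$arabic])/$latin_for{$1}/g;
    return $text;
}

# Like the script above: read the file named on the command line as
# UTF-8 and write the result to "new.tex". Does nothing without a file.
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $in,  '<:encoding(UTF-8)', $file)     or die "cannot read $file: $!";
    open(my $out, '>:encoding(UTF-8)', 'new.tex') or die "cannot write new.tex: $!";
    print {$out} transliterate($_) while <$in>;
    close($in);
    close($out) or die "cannot close new.tex: $!";
}
```

Run as `perl convert.pl file.tex` (the script name is made up), it writes the transcription to new.tex just as the script above does. Extending it to the rest of the alphabet means adding one line per letter to %latin_for; since the character class is built from the hash keys, nothing else has to change.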