From: "Thomas A. Schmitz"
Newsgroups: gmane.comp.tex.context
Subject: Re: Arabic-utf-8 (plus a sample)
Date: Sat, 05 Jun 2004 23:48:18 +0200
Just a quick reply (it's bedtime over here): there may be two problems. One is that the mail program put an unwanted line break after the =~ part; just remove it, it should all be one line. The other: you'll need a fairly recent version of perl for this to work. What do you get when you run

perl --version

For UTF-8 support it should be at least 5.8.0. The "Unknown discipline class ':utf8'" error suggests an older perl whose open.pm does not know the :utf8 layer.

Your basic idea of the usage is right (I'm not a Windows person, but I assume it should be the same): save the script as utf2tex.pl, make it executable, and call it as utf2tex.pl FILENAME.txt.

I think it would be easiest to convert the UTF-8 text directly into your ASCII transcription; that way you could later convert it back. I have a set of scripts that do just that: they convert babel Greek into UTF-8 and back.

If you need more help, I'll look into it tomorrow!

Best
Thomas

On Sat, 2004-06-05 at 23:33, Idris Samawi Hamid wrote:
> On Sat, 05 Jun 2004 22:41:39 +0200, Thomas A. Schmitz
> wrote:
>
> > Idris,
> >
> > I know a bit of perl and would love to help. However, I fear that
> > sending us your stuff via mail will be a bit difficult because the utf-8
> > characters get transformed into gibberish.
>
> Thnx 4 such a speedy reply! I don't think you are getting gibberish
> though; you should be getting the extended ascii representation. So the
> letter alif (hex 0627) should look like this:
>
> Ø§
>
> Do you get a forward-slashed circle and a section symbol? If so, that's
> the ascii representation I'm trying to convert to the letter `A'.
>
> Here are the codes you want:
>
> ا [0627] => A
> ب [0628] => b
> ج [062C] => j
> د [062F] => d
> ه [0647] => h
> و [0648] => w
> ز [0632] => z
>
> Let me explain my situation more clearly :-)
>
> I have a unicode editor, Unitype Global Writer. I save a unicode document
> as a utf *.txt file. When I open that saved file in my TeX editor
> (WinEdt), it comes out as extended ascii (that's the "gibberish"). So what
> I wanted to do was convert the ascii "gibberish" to my Latin
> transcription. It seems that what you are suggesting is to use the hex
> representation and convert the unicode txt file into a Latin transcription
> file directly and bypass the gibberish.
>
> On your perl file, can you give me an example of how to use it? I tried
> (in windows, with name utf2tex.pl and unicode text in unicode-utf.txt)
> and get
>
> =========================
>
> perl utf2tex.pl unicode-utf.txt
> Unknown discipline class ':utf8' at C:/Perl/lib/open.pm line 18.
> BEGIN failed--compilation aborted at utf2tex.pl line 4.
> =========================
>
> from your script I tried, e.g.
>
> ============================
> $_ =~
> s/\x{0627}/\x{0041}/esg;
> # from alif to `A'
> ============================
>
> Your guidance will be greatly appreciated!
>
> Thnx a million!
> Idris
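For the archive, here is a minimal Perl sketch of the direct conversion discussed above. It is not the utf2tex.pl from this thread. The file name translit.pl and the subroutines to_latin, to_mojibake and from_mojibake are hypothetical, and only the seven letters from Idris's table are mapped. It reads the file through perl's :encoding(UTF-8) layer, so it also assumes perl 5.8.0 or later. to_mojibake shows where the "gibberish" comes from: alif is stored in UTF-8 as the two bytes D8 A7, and an editor that reads those bytes as Latin-1 displays them as Ø§.

```perl
#!/usr/bin/perl
# translit.pl: a hypothetical sketch, not Thomas's utf2tex.pl.
# Transliterates the Arabic letters listed in this thread into Latin
# letters. Needs perl 5.8.0 or later for the :encoding(UTF-8) layers.
use strict;
use warnings;
use 5.008;
use Encode qw(encode decode);

# Code point => Latin transcription (the pairs Idris listed).
my %latin = (
    "\x{0627}" => 'A',    # alif
    "\x{0628}" => 'b',    # ba
    "\x{062C}" => 'j',    # jim
    "\x{062F}" => 'd',    # dal
    "\x{0647}" => 'h',    # ha
    "\x{0648}" => 'w',    # waw
    "\x{0632}" => 'z',    # zay
);

# The table's letters, used below as one regex character class.
my $arabic = join '', keys %latin;

# Replace each mapped letter; everything else passes through unchanged.
sub to_latin {
    my ($text) = @_;
    $text =~ s/([$arabic])/$latin{$1}/g;
    return $text;
}

# What a Latin-1 editor shows for UTF-8 text: alif (bytes D8 A7)
# becomes "\x{D8}\x{A7}", i.e. O-slash followed by the section sign.
sub to_mojibake {
    my ($text) = @_;
    return decode('iso-8859-1', encode('UTF-8', $text));
}

# The reverse repair. Assumes the bytes were read as ISO-8859-1;
# Windows-1252 assigns different characters to bytes 0x80-0x9F.
sub from_mojibake {
    my ($text) = @_;
    return decode('UTF-8', encode('iso-8859-1', $text));
}

# Usage: perl translit.pl FILE.txt ...   (result goes to STDOUT)
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    for my $file (@ARGV) {
        open my $in, '<:encoding(UTF-8)', $file
            or die "Cannot open $file: $!\n";
        while (my $line = <$in>) {
            # Drop a byte-order mark on the first line, in case the editor wrote one.
            $line =~ s/^\x{FEFF}// if $. == 1;
            print to_latin($line);
        }
        close $in;
    }
}
```

Usage would be perl translit.pl unicode-utf.txt > unicode-latin.txt. Letters that are not in %latin pass through unchanged, so the table can be extended one line per letter. from_mojibake is only needed for a file that has already been saved as "gibberish", and it assumes that file was written as ISO-8859-1.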