From: Henning Hraban Ramm
Newsgroups: gmane.comp.tex.context
Subject: Re: converters (was: TexPaste alpha)
Date: Fri, 29 May 2009 10:14:28 +0200
References: <4A1DB914.7000006@gmail.com>
To: mailing list for ConTeXt users

On 2009-05-28 at 09:45, luigi scarso wrote:

> I guess I should build a new converter suite (there's also a
> InDesign Tags to ConTeXt converter anywhere on my harddisk).
> But I won't make GUI apps, just scripts.
> That's sound good !
> If in python, even better !
> If only scripts, the best !
>
> Can we have more details ?

Which conversion do you need?

If it's InDesign to ConTeXt, there's always custom programming needed,
e.g. you need to know which InDesign paragraph style should become which
ConTeXt section (sample attached).

I'm not good at building parsers and mostly use regular expression
replacements, so my converters are always limited and manual cleanup is
necessary, but they save a lot of manual work anyway!

Greetlings from Lake Constance!
Hraban
---
http://www.fiee.net/texnique/
http://wiki.contextgarden.net
https://www.cacert.org (I'm an assurer)

Content-Disposition: attachment; filename=latin1_to_utf8.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Universal text encoding conversion
2009-03-10 by Henning Hraban Ramm, fiëe virtuëlle

sourceencoding_to_targetencoding.py [options] sourcefile [targetfile]

Whole directories can be processed as well.

Options:
--filter=extension
--overwrite (otherwise the original file is backed up)
--hidden (otherwise hidden files are ignored)
"""

import os, os.path, sys, codecs, getopt, shutil
try:
    import latex  # optional module, ignored if it is not installed
except ImportError:
    pass

modes = ('filter', 'overwrite', 'hidden')
mode = {}

def help(message=""):
    print message
    print __doc__
    sys.exit(1)

def backup(datei):
    # copy the file to the first free name of the form name.N.ext
    original = datei
    pfad, datei = os.path.split(datei)
    datei, ext = os.path.splitext(datei)
    count = 0
    while os.path.exists(os.path.join(pfad, "%s.%d%s" % (datei, count, ext))):
        count += 1
    neudatei = os.path.join(pfad, "%s.%d%s" % (datei, count, ext))
    print "Backing up %s as %s" % (original, neudatei)
    shutil.copy(original, neudatei)
    return neudatei

def is_hidden(datei):
    return (datei.startswith('.') or os.sep+'.' in datei)

def convert(source, target, so_enc, ta_enc):
    from_exists = os.path.exists(source)
    to_exists = os.path.exists(target)
    from_isdir = os.path.isdir(source)
    to_isdir = os.path.isdir(target)
    from_path, from_name = os.path.split(source)
    to_path, to_name = os.path.split(target)
    #from_name = os.path.basename(source)
    #to_name = os.path.basename(target)
    if not from_exists:
        help("Source '%s' not found!" % from_name)
    if from_isdir:
        if is_hidden(source) and not mode['hidden']:
            print "Ignoring hidden directory %s" % source
            return
        if not to_isdir:
            help("If the source is a directory, the target must be a directory too!")
        print "Processing directory %s" % source
        dateien = os.listdir(source)
        #if not mode['hidden']:
        #    dateien = [d for d in dateien if not is_hidden(d)]
        if mode['filter']:
            dateien = [d for d in dateien if d.endswith(mode['filter'])]
        for datei in dateien:
            s = os.path.join(source, datei)
            t = os.path.join(target, datei)
            convert(s, t, so_enc, ta_enc)
    else:
        if is_hidden(from_name) and not mode['hidden']:
            print "Ignoring hidden file %s" % source
            return
        if to_isdir:
            target = os.path.join(target, from_name)
        if not mode['overwrite']:
            if source==target:
                source=backup(source)
            elif os.path.exists(target):
                backup(target)
        print "Converting %s (%s)\n\tto %s (%s)" % (source, so_enc, target, ta_enc)
        so_file = file(source, "rU")
        lines = so_file.readlines()
        so_file.close()
        ta_file = file(target, "w")
        for l in lines:
            ta_file.write(unicode(l, so_enc).encode(ta_enc))
        ta_file.close()

opts, args = getopt.getopt(sys.argv[1:], "ohf:", ["overwrite","hidden","filter="])

if len(args)<1:
    help("Too few arguments given!")

for m in modes:
    mode[m] = False
    for (o, a) in opts:
        if o=='-'+m[0] or o=='--'+m:
            if a:
                print "Mode %s = %s" % (m, a)
            else:
                a = True
                print "Mode %s active" % m
            mode[m] = a

#print "modes:", mode
#print "opts :", opts
#print "args :", args

# read the desired encodings from the script's file name
scriptname = os.path.splitext(os.path.basename(sys.argv[0]))[0]
from_enc, to_enc = scriptname.split("_to_")

from_name = to_name = args[0]
if len(args)>1:
    to_name = args[1]

convert(from_name, to_name, from_enc, to_enc)

Content-Disposition: attachment; filename=indtxt2context.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Convert InDesign tagged text to ConTeXt
"""
import sys, os
import re

quote = u'$&_%'

rePatterns = {
    # paragraph styles
    # (the InDesign style tags are missing from these four patterns in this
    # copy, which leaves the keys identical; put in your own style tags)
    ur'^((\d\.)*\s+)?(.+)$' : ur'\\chapter{\3}\n',
    ur'^((\d\.)*\s+)?(.+)$' : ur'\\section{\3}\n',
    ur'^((\d\.)*\s+)?(.+)$' : ur'\\subsection{\3}\n',
    ur'^((\d\.)*\s+)?(.+)$' : ur'\\subsubsection{\3}\n',
    # character styles (the style tags are missing here as well)
    ur'(.+?)' : ur'{\\bf \1}',
    #ur'(.*?)' : ur'\\otherfont{\1}',

    u'<.*?>' : u'', # delete all other tags

    # lines that start with dotted numbers = section titles
    ur'^\d+\s+(.+)$' : ur'\\chapter{\1}\n',
    ur'^\d+\.\d+\.?\s+(.+)$' : ur'\\section{\1}\n',
    ur'^\d+\.\d+\.\d+\.?\s+(.+)$' : ur'\\subsection{\1}\n',
    ur'^\d+\.\d+\.\d+\.\d+\.?\s+(.+)$' : ur'\\subsubsection{\1}\n',

    ur'^(\s*)[–\-·•\uf0a8]\s+' : ur'\1\\item\t', # itemization (lines starting with bullet etc.)
    ur'^(\s*)(\d+)\.?\)\s+' : ur'\1\\item[\2]\t', # itemization (numerical)
    ur'([Zusovz])\.([Baguo])\.' : ur'\1.\\,\2.', # u.a., s.o., o.g., z.B.
    ur'[„"“](.*?)[“”"]' : ur'\\quotation{\1}', # German quotation
    ur'[\'’,](.*?)[\'’‘]' : ur'\\quote{\1}', # German single quotation
    #ur'"(.*?)"' : ur'\\quotation{\1}', # quotation?
    ur' ([.?!:;])' : ur'\1', # spaces in front of punctuation
    ur'{\\em\s+}' : ur'', # empty emphasizing
    ur' (%|°)' : ur'\\,\1', # spaces in front of measure units
    u' - ' : u' – ', # en dash
    ur'(\d{4})\s*(\-|–)\s*(\d{4})' : ur'\1–\3', # year numbers

    u' +' : u' ', # multiple spaces
    u'^\s+$' : u'\n', # make empty lines really empty
    # ur'' : ur'',
}

reres = {}
status = {
    'item' : False
}

# collect parameters
if len(sys.argv) > 1:
    sourcename = sys.argv[1]
    if len(sys.argv) > 2:
        targetname = sys.argv[2]
    else:
        targetname = sourcename.replace('.txt', '.tex')
else:
    print "file name?"
    sys.exit()

# compile regular expressions
for k in rePatterns:
    p = re.compile(k)
    reres[p] = rePatterns[k]

source = open(sourcename, 'rU')
target = open(targetname, 'w')

# convert lines
for line in source.readlines():
    line = unicode(line, 'utf-16be') # "unicode" encoded InDesign tagged text is UTF-16 big-endian encoded!
    for p in reres:
        line = p.sub(reres[p], line)
    for c in quote:
        line = line.replace(c, u'\\'+c)
    # open and close the itemize environment around runs of \item lines
    if '\\item' in line and not status['item']:
        target.write('\\startitemize[]\n')
        status['item'] = True
    if status['item'] and not '\\item' in line:
        target.write('\\stopitemize\n')
        status['item'] = False
    target.write(line.encode('utf-8')) # write UTF-8

source.close()
target.close()
print "%s completed" % targetname

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
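
The mapping from InDesign paragraph styles to ConTeXt sections that the message describes could look roughly like this. It is only a minimal Python 3 sketch, not the attached script: it assumes tagged-text lines of the form <ParaStyle:Name>text, and the style names as well as STYLE_TO_SECTION, INLINE_RULES and convert_line are made up for illustration.

```python
import re

# Hypothetical mapping: InDesign paragraph style name -> ConTeXt command.
# The style names are assumptions; use the ones from your own document.
STYLE_TO_SECTION = {
    "Heading 1": "chapter",
    "Heading 2": "section",
    "Heading 3": "subsection",
}

# Assumed line format: "<ParaStyle:Name>text"
PARA_STYLE = re.compile(r"^<ParaStyle:(?P<style>[^>]*)>(?P<text>.*)$")

# Dotted number in front of a heading, e.g. "1.2 " or "3. "
HEADING_NUMBER = re.compile(r"^\d+(?:\.\d+)*\.?\s+")

# Ordered rules: a list of pairs, so they always run in this order.
INLINE_RULES = [
    (re.compile(r"<[^>]*>"), ""),        # drop all remaining tags
    (re.compile(r"([$&_%])"), r"\\\1"),  # escape ConTeXt special characters
    (re.compile(r" {2,}"), " "),         # collapse multiple spaces
]


def convert_line(line):
    """Convert one line of tagged text to ConTeXt source."""
    line = line.rstrip("\r\n")
    match = PARA_STYLE.match(line)
    if match:
        style, text = match.group("style"), match.group("text")
    else:
        style, text = None, line
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    command = STYLE_TO_SECTION.get(style)
    if command is None:
        return text  # unknown or missing style: keep as body text
    text = HEADING_NUMBER.sub("", text)  # ConTeXt numbers sections itself
    return "\\%s{%s}" % (command, text)


if __name__ == "__main__":
    print(convert_line("<ParaStyle:Heading 1>1.2 Costs & <CharStyle:bold>benefits<CharStyle:>"))
    print(convert_line("<ParaStyle:Body>100 % sure"))
```

The rules live in a list rather than a dict so that they are applied in a fixed order, e.g. tags are removed before special characters are escaped.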