From: Axel Kielhorn
Newsgroups: gmane.comp.tex.context
Subject: Reading XML with lua
Date: Fri, 3 May 2019 17:24:39 +0200
Message-ID: <6B1CE296-3FCD-4B16-960B-484FE2D25999@axelkielhorn.de>
To: ntg-context@ntg.nl

Hi,

I'm finally at a point where I can read an XML file, look at it and write it back.

Right now I have two problems:

1. There are many empty entries in the table. This makes processing the file a little tedious, since I have to pick out the real entries myself.

2. Some entries contain leading and trailing whitespace and line breaks. These are the result of formatting the XML file. Shouldn't strip_cm_and_dt take care of them? Or am I using it incorrectly?

I have included my Lua file and an example XML file. You can run it with

    mtxrun --script p.lua

Greetings
Axel

Attachment: doclist.xml

    PD 3710 DTD.html LA
    PD 3711 Prozessbeschreibung.html LA
    PD 4711 Prozess.xml Rechtsabteilung PL
    PD 4712 Prozess.pdf Rechtsabteilung Web-Server
    PD 4713 Prozess-g.pdf Rechtsabteilung Web-Server
    PD 4722 Dokumentenfluss.pdf EDV Web-Server

Attachment: p.lua

    local application = logs.application {
        name   = "Prozess",
        banner = "Prozess parser",
    }

    local report = application.report

    local settings = {}
    settings.strip_cm_and_dt = true

    -- Reading the file
    local doc = xml.load("doclist.xml", settings)
    report("Datei %s gelesen", doc.settings.currentresource)
    -- inspect(doc)
    -- inspect(doc.dt)

    -- Find the <doclist> element among the top-level entries
    local doc2
    for i, v in ipairs(doc.dt) do
        if v.tg == "doclist" then
            doc2 = v
            break
        end
    end

    -- Just the doclist part
    -- inspect(doc2)
    if doc2 then
        for i, v in ipairs(doc2.dt) do
            if v.tg == "psdoc" then
                print("PSDOC: ", i)
                -- inspect(v)
                local docan
                for j, k in ipairs(v.dt) do
                    if k.tg then -- text entries are plain strings without tg
                        print(j, k.tg, table.unpack(k.dt))
                        if k.tg == "docan" then
                            docan = k.dt[1] -- first child: the text content
                        end
                        -- inspect(k.dt)
                    end
                end
                if docan then
                    print("DOCAN: ", docan)
                end
            end
            -- inspect(v)
        end
    end

    -- Writing it back
    xml.save(doc, "outfile.xml")

    -- So far no logging
    --[[
    local f = io.open("p.log", "w")
    f:write()
    f:close()
    --]]
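One possible workaround for both problems, sketched in plain Lua: walk the tree once, drop text entries that contain nothing but whitespace, and trim the rest. It assumes only the table layout p.lua already relies on (tag name in `tg`, children in `dt`, text as plain strings). The helpers `trim` and `clean` are hypothetical names made up for this sketch, not part of ConTeXt's xml library, which may well offer its own way to do this.

```lua
-- Sketch: clean up a parsed XML tree in place (plain Lua, runs without ConTeXt).
-- Assumption: elements are tables with the tag in `tg` and the children in
-- `dt`; text children are plain strings (the layout p.lua already uses).
-- `trim` and `clean` are hypothetical helpers, not part of ConTeXt's xml API.

-- Strip leading and trailing whitespace, including line breaks.
local function trim(s)
    return (s:match("^%s*(.-)%s*$"))
end

-- Drop whitespace-only strings from every `dt` list, trim the remaining
-- strings and recurse into child elements. Returns the element itself.
local function clean(e)
    if type(e) ~= "table" or type(e.dt) ~= "table" then
        return e
    end
    local kept = {}
    for _, child in ipairs(e.dt) do
        if type(child) == "string" then
            local s = trim(child)
            if s ~= "" then
                kept[#kept + 1] = s
            end
        else
            kept[#kept + 1] = clean(child)
        end
    end
    e.dt = kept
    return e
end

-- Hypothetical sample shaped like the doclist tree (text value made up).
local sample = {
    tg = "doclist",
    dt = {
        "\n  ",
        { tg = "psdoc", dt = {
            "\n    ",
            { tg = "docan", dt = { "\n      some text\n    " } },
            "\n  ",
        } },
        "\n",
    },
}

clean(sample)
-- prints "1", "psdoc" and "some text", tab-separated
print(#sample.dt, sample.dt[1].tg, sample.dt[1].dt[1].dt[1])
```

In p.lua this would presumably be called as `clean(doc)` right after `xml.load`. Because it modifies the tree itself, `xml.save` would then write the file without the original indentation. It only suits data-like files such as doclist.xml: in mixed content (text with inline elements) the spaces around inline elements matter and would be lost.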