From: William Lupton
Newsgroups: gmane.text.pandoc
Subject: Re: Translating style sheets in reader on HTML input
Date: Sat, 25 Jun 2022 12:23:52 +0100
References: <3f7b920b-c982-5be5-fa04-9025e008e518@tuxad.com>
To: pandoc-discuss

Yes, Lua filters operate on pandoc's AST (abstract syntax tree), not on the raw HTML input.

I think that some pre-processing will be necessary because (AFAIK) the Para element (HTML p) doesn't retain attributes in the AST, so a class on a p is lost when the HTML is read.
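For illustration, such a pre-processing step could be as small as the standalone Lua sketch below. It is not part of pandoc: the function name wrap_classed_paragraphs and the file name wrap.lua are made up here, and it assumes that each classed p has class as its only attribute and that p elements are not nested. Real RoboHelp output may well need a proper HTML parser instead of a pattern.

```lua
-- Hypothetical pre-processor (assumption, not a pandoc feature):
-- wrap every <p class="..."> in a <div> carrying the same class, so the
-- class survives in pandoc's AST as an attribute of the Div.
-- Only handles <p> elements whose single attribute is class.
function wrap_classed_paragraphs(html)
    -- "." also matches newlines in Lua patterns, and ".-" is non-greedy,
    -- so each <p ...> is paired with the nearest following </p>.
    -- The outer parentheses drop gsub's second result (the match count).
    return (html:gsub('<p%s+class="([^"]*)"%s*>(.-)</p>',
                      '<div class="%1"><p>%2</p></div>'))
end
```

With a final line such as io.write(wrap_classed_paragraphs(io.read('a'))) in wrap.lua (use '*a' on Lua 5.1/5.2), it could be run in front of pandoc, e.g. lua wrap.lua < input.html | pandoc -f html -L kursiv.lua -t asciidoc.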

Here's an example using HTML derived from yours (the p element is wrapped in a div). Note: I think perhaps the Lua div logic could be simpler, but this seems to work.

% cat kursiv.html
<span class="Kursiv">span-text-with-class</span>
<div class="Normal_fett"><p>para-text-class-from-div</p></div>

% pandoc kursiv.html -L kursiv.lua
<em><span>span-text-with-class</span></em>
<div>
<p><strong>para-text-class-from-div</strong></p>
</div>

% cat kursiv.lua
-- Wrap spans of class "Kursiv" in emphasis and drop the class.
function Span(span)
    local class, index = span.attr.classes:find('Kursiv')
    if class then
        span.attr.classes:remove(index)
        return pandoc.Emph({span})
    end
end

-- Make the content of each block inside a "Normal_fett" div strong and
-- drop the class. Assumes the div's children are paragraphs (Para/Plain).
function Div(div)
    local class, index = div.attr.classes:find('Normal_fett')
    if class then
        div.attr.classes:remove(index)
        div.content = div.content:map(
            function(elem)
                elem.content = {pandoc.Strong(elem.content)}
                return elem
            end
        )
        return div
    end
end
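The filter above only matches the exact class name Normal_fett, whereas your list also contains prefixed variants such as rml10_101__Normal_fett. A possible way to cover those is sketched below as a replacement for the Div function. It rests on two assumptions of mine: that the prefixed classes should be treated like the plain one, and that the prefix is always separated by a double underscore. The helper name find_class is made up for this sketch.

```lua
-- Hypothetical helper: return the index of the first class that equals
-- `name` or ends with '__' .. name (e.g. 'rml10_101__Normal_fett'),
-- or nil if no class matches.
function find_class(classes, name)
    local suffix = '__' .. name
    for i, class in ipairs(classes) do
        if class == name or class:sub(-#suffix) == suffix then
            return i
        end
    end
    return nil
end

-- Replacement for Div in kursiv.lua, using the suffix match.
function Div(div)
    local index = find_class(div.attr.classes, 'Normal_fett')
    if index then
        div.attr.classes:remove(index)
        div.content = div.content:map(
            function(elem)
                -- Only Para and Plain hold inlines that Strong can wrap;
                -- other blocks are passed through unchanged.
                if elem.t == 'Para' or elem.t == 'Plain' then
                    elem.content = {pandoc.Strong(elem.content)}
                end
                return elem
            end
        )
        return div
    end
end
```

The same helper could be used in Span if the span classes also turn up with prefixes.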


On Sat, 25 Jun 2022 at 10:34, Frank Bergmann <pandoc-eSlkCAlw8VwAvxtiuMwx3w@public.gmane.org> wrote:
> Hi,
>
> this time I have some questions.
>
> As far as I understood the lua scripting it is not working on actual
> input but just on already translated native format.
> What I need is to do some "translations" on raw HTML input.
> (BTW - actual output here is asciidoc.)
>
> My issue is that the "HTML" input has a lot of styles like these:
>
> <span class="Kursiv">
> <span class="FettUnterstrichen">
> <p class="Normal_fett">
> <p class="rml10_101__Normal_fett">
> <p class="rml10_112__Normal_fett">
> <p class="rml10_114__Normal_fett">
> <p class="rml10_11__Normal_fett">
> <p class="rml10_122__Normal_fett">
> <p class="rml10_124__Normal_fett">
> <p class="rml10_133__Normal_fett">
> <p class="rml10_136__Normal_fett">
> <p class="rml10_138__Normal_fett">
> <p class="rml10_177__Normal_fett">
> <span class="Fett">
> <span class="FettUnterstrichen">
>
> (Note: kursiv=italic/emphasized, fett=bold, unterstrichen=underline)
>
> Is there a way in pandoc to "translate" styles like e.g. the ones with
> "fett" to e.g. a simple HTML tag "<b>" before internally doing the
> actual translation to native and then to output format?
> Can a lua script be used for this?
> Or do I need to write a translator of my own and run it BEFORE using
> pandoc?
>
> (Note: The "HTML" input is coming from Adobe RoboHelp.)
>
> kind regards,
> Frank
>
> --
> Frank Bergmann, Pödinghauser Str. 5, D-32051 Herford, Tel. +49-5221-9249753
> SAP Hybris & Linux LPIC-3, E-Mail tx2014-VEyjnN4Vo9k@public.gmane.org, USt-IdNr DE237314606
> http://tdyn.de/freel -- Redirect to profile at freelancermap
> http://www.gulp.de/freiberufler/2HNKY2YHW.html -- Profile at GULP
