Yes, Lua filters operate on the AST (abstract syntax tree), not on the raw input.

I think some pre-processing will be necessary because (AFAIK) the Para element (HTML p) can't carry attributes in the AST, so a class on a p element is lost when the HTML is read.
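
One way such a pre-processing step could look, as a rough sketch only: the helper name wrap_classed_paragraphs is made up for illustration, and it assumes simple, well-formed markup where class is the only attribute on the p element, it is double-quoted, and p elements are not nested. It is plain Lua string handling, run before pandoc, not a pandoc filter.

```lua
-- Hypothetical helper (not part of pandoc): wrap every <p class="...">
-- element in a <div> carrying the same class, so that the class survives
-- as a Div attribute when pandoc reads the HTML.
-- Assumes class is the only attribute, double-quoted, and no nested <p>.
function wrap_classed_paragraphs(html)
  -- (.-) is a lazy match; the outer parentheses drop gsub's match count.
  return (html:gsub('<p class="([^"]*)">(.-)</p>',
    function(class, body)
      return '<div class="' .. class .. '"><p>' .. body .. '</p></div>'
    end))
end

local input = '<p class="Normal_fett">para-text</p>'
print(wrap_classed_paragraphs(input))
-- prints: <div class="Normal_fett"><p>para-text</p></div>
```

For anything beyond markup this regular, a real HTML parser would be a safer choice than pattern matching.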

Here's an example using HTML derived from yours (the p element is wrapped in a div that carries the class). Note: the Lua Div logic could perhaps be simpler, but this seems to work.

% cat kursiv.html
<span class="Kursiv">span-text-with-class</span>
<div class="Normal_fett"><p>para-text-class-from-div</p></div>

% pandoc kursiv.html -L kursiv.lua
<em><span>span-text-with-class</span></em>
<div>
<p><strong>para-text-class-from-div</strong></p>
</div>

% cat kursiv.lua
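-- Wrap spans with class "Kursiv" in emphasis and drop the class.
function Span(span)
    local class, index = span.attr.classes:find('Kursiv')
    if class then
        span.attr.classes:remove(index)
        return pandoc.Emph({span})
    end
end

-- Make the text of paragraphs inside divs with class "Normal_fett" strong
-- and drop the class.
function Div(div)
    local class, index = div.attr.classes:find('Normal_fett')
    if class then
        div.attr.classes:remove(index)
        div.content = div.content:map(
            function(elem)
                -- Only Para and Plain hold inlines; leave other blocks as-is.
                if elem.t == 'Para' or elem.t == 'Plain' then
                    elem.content = {pandoc.Strong(elem.content)}
                end
                return elem
            end
        )
        return div
    end
end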
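
Your input also has prefixed class names such as rml10_101__Normal_fett. A possible variant of the Div function that handles those as well is sketched below. The helper is_normal_fett is a made-up name, and the assumption that every relevant class is either exactly Normal_fett or ends in __Normal_fett is only based on the list in your mail. I haven't run this against real RoboHelp output.

```lua
-- Hypothetical helper: true if a class is "Normal_fett" or ends with
-- "__Normal_fett" (e.g. "rml10_101__Normal_fett").
function is_normal_fett(class)
  return class == 'Normal_fett' or class:match('__Normal_fett$') ~= nil
end

function Div(div)
  -- find_if returns the first class satisfying the predicate and its index.
  local class, index = div.attr.classes:find_if(is_normal_fett)
  if class then
    div.attr.classes:remove(index)
    div.content = div.content:map(function(elem)
      -- Only Para and Plain hold inlines; leave other blocks as-is.
      if elem.t == 'Para' or elem.t == 'Plain' then
        elem.content = {pandoc.Strong(elem.content)}
      end
      return elem
    end)
    return div
  end
end
```

This would replace the Div function in kursiv.lua; the Span function stays as it is.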


On Sat, 25 Jun 2022 at 10:34, Frank Bergmann <pandoc-eSlkCAlw8VwAvxtiuMwx3w@public.gmane.org> wrote:
Hi,

this time I have some questions.

As far as I understand, Lua scripting does not work on the actual
input, only on the already translated native format.
What I need is to do some "translations" on the raw HTML input.
(BTW - actual output here is asciidoc.)

My issue is that the "HTML" input has a lot of styles like these:

<span class="Kursiv">
<span class="FettUnterstrichen">
<p class="Normal_fett">
<p class="rml10_101__Normal_fett">
<p class="rml10_112__Normal_fett">
<p class="rml10_114__Normal_fett">
<p class="rml10_11__Normal_fett">
<p class="rml10_122__Normal_fett">
<p class="rml10_124__Normal_fett">
<p class="rml10_133__Normal_fett">
<p class="rml10_136__Normal_fett">
<p class="rml10_138__Normal_fett">
<p class="rml10_177__Normal_fett">
<span class="Fett">
<span class="FettUnterstrichen">

(Note: kursiv=italic/emphasized, fett=bold, unterstrichen=underline)

Is there a way in pandoc to "translate" styles like e.g. the ones with
"fett" to e.g. a simple HTML tag "<b>" before internally doing the
actual translation to native and then to output format?
Can a lua script be used for this?
Or do I need to write a translator of my own and run it BEFORE using pandoc?

(Note: The "HTML" input is coming from Adobe RoboHelp.)

kind regards,
Frank

--
Frank Bergmann, Pödinghauser Str. 5, D-32051 Herford, Tel. +49-5221-9249753
SAP Hybris & Linux LPIC-3, E-Mail tx2014-VEyjnN4Vo9k@public.gmane.org, USt-IdNr DE237314606
http://tdyn.de/freel  -- Redirect to profile at freelancermap
http://www.gulp.de/freiberufler/2HNKY2YHW.html  -- Profile at GULP

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/c09f254c-5ccf-1ed4-97ab-4e6bccbbdcb6%40tuxad.com.
