public inbox archive for pandoc-discuss@googlegroups.com
From: BPJ <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Inserting attributes into elements
Date: Thu, 15 Jun 2023 15:00:39 +0200
Message-ID: <CADAJKhBjJTZki4np=oSfbbcmtBjEahCU4440tRfOh_6TMzSAYw@mail.gmail.com>
In-Reply-To: <37d8c191-388e-164e-6955-9014b4f0a4a0-FcZObrvlYduBUy7/sJONFg@public.gmane.org>


On Thu, 15 June 2023 at 03:20, H <agents-FcZObrvlYduBUy7/sJONFg@public.gmane.org> wrote:

> On 06/14/2023 01:30 PM, BPJ wrote:
> >
> > Pandoc's document model used to not support attributes at all. IIRC
> attributes were first introduced for fenced code blocks, then extended to
> inline code. Spans and divs (in the Pandoc sense) were introduced
> specifically to provide containers for arbitrary content to which
> attributes can be attached. At the same time (IIRC) attributes were
> extended to headings ("Header"), links and images. It was decided not to
> extend attributes to other elements as that would entail huge changes to
> the code base. Later when Pandoc's table model was changed the new table
> model included attributes.
> >
> > Code needs attributes so that highlighting information can be attached to it,
> and headings and images need them too for various reasons, and links
> probably came along for the ride together with images. Normally divs and
> spans are enough for all other cases, because in regular CSS in an external
> file or embedded in the `<head>` of an HTML document you can use a child
> selector, e.g. in Markdown you type
> >
> > ``````markdown
> > :::class
> > ****
> > :::
> > ``````
> >
> > and then you style the rule with
> >
> > ``````css
> > div.class hr { ... }
> > ``````
> >
> > Your imposed limitation of not being able to use external CSS creates
> problems which most users simply don't have. For the horizontal rule case
> you can use a raw block to insert the HTML directly, if you are not going
> to generate other formats from the source:
> >
> > ``````markdown
> > Para before.
> >
> > ```{=html}
> > <hr style="...">
> > ```
> >
> > Para after
> > ``````
> >
> > You can also use a filter to do things like this:
> >
> > ``````lua
> > local hr_filter = {
> >   HorizontalRule = function()
> >     return pandoc.RawBlock('html', '<hr style="...">')
> >   end
> > }
> > function Div(div)
> >   if div.classes:includes('class') then
> >     return div:walk(hr_filter).content
> >   end
> > end
> > ``````
> >
> >
> > I sometimes post-process HTML generated by pandoc with Mojo::DOM <
> https://metacpan.org/pod/Mojo::DOM> to transfer attributes from wrapping
> divs/spans to contained elements and remove the wrapper, or just to set
> attributes to elements contained in wrappers. The API makes such changes
> very easy. You basically find elements in an HTML document with CSS
> selectors, then loop through the found elements and change them in-place
> with Perl code. Adding/removing/changing attributes is very easy: you just
> treat the element object as if it is a hash (associative array) reference
> containing the attributes! Then when you are done you print the document
> object to a file or stdout.
> >
> Thank you for the explanation. I did resort to creating the <hr ... /> in
> the filter.
>
> Now another problem - I have multiple images in my markdown document and a
> <figure></figure> tag pair gets added around the <image> which is fine.
>
> However, while processing the <figure> block I want to make changes to the
> default style attribute <image> for some of the images. Using the logging
> module I find e.g.:
>
> (#) figure Figure {
>   attr: Attr {
>     attributes: AttributeList {
>       style: "margin: 0px;"
>     }
>     classes: List {}
>     identifier: ""
>   }
>   caption: {
>     long: Blocks {}
>   }
>   content: Blocks[1] {
>     [1] Plain {
>       content: Inlines[1] {
>         [1] Image {
>           attr: Attr {
>             attributes: AttributeList {}
>             classes: List {}
>             identifier: ""
>           }
>           caption: Inlines[1] {
>             [1] Str "whatever"
>           }
>           src: "https://www.somedomain.tld/images/someimage.jpg"
>           title: ""
>         }
>       }
>     }
>   }
> }
>
>
> and if I look at the logging output for the Image I find:
>
> #) image Image {
>   attr: Attr {
>     attributes: AttributeList {
>       style: "height: auto; width: 100%; object-fit: contain;"
>     }
>     classes: List {}
>     identifier: ""
>   }
>   caption: Inlines[1] {
>     [1] Str "whatever"
>   }
>   src: "https://www.somedomain.tld/images/someimage.jpg"
>   title: ""
> }
>
> While processing the Figure element in the filter, I want to change the
> style attributes for the Image listed above. They show up correctly in the
> logging module output for Image above but the logging output for Figure
> shows an empty list.
>



I’m not quite sure what you mean but here are basically two ways to affect
only elements inside another element of a certain kind (with a certain tag):

A.  If you want to do something with all Image elements inside any Figure
element you can use a local filter which you apply only to the Figure
element. This is done by calling the `:walk(inner_filter)` method on the
parent/ancestor element.

    Some things to keep in mind:

    a)  The `inner_filter` is a table where each key is an element tag name
and each value is a filter function to apply to descendants with that tag.
    b)  The `:walk` method does _not_ change the element it is called on.
Rather it returns a “copy” with any modifications applied. Thus you have to
assign the return value of `:walk` to a variable, or return the return
value of `:walk` from the outer filter function in turn.

    This again can take two forms:

    1.  If the action of the inner filter function does not access the
parent/ancestor element in any way it is probably more efficient (and less
cluttered) to define the inner filter only once outside the outer filter
function:

        ```lua
        -- Define static "inner" filter once
        local fig_img_filter = {
          Image = function(img)
            -- Do something with Image
            img.attributes.style = 'border: 0.2rem solid black'
            return img
          end
        }

        -- Main filter function
        function Figure(fig)
          -- Apply "inner" filter to images inside the Figure
          return fig:walk(fig_img_filter)
        end
        ```

    2.  If on the other hand the inner filter function needs to access the
parent/ancestor element you need to define the inner filter inside the
outer filter function where the parent is in scope:

        ```lua
        function Div(div)
          -- Test condition on div
          if div.classes:includes('foo') then
            -- Apply dynamic filter to nested links
            return div:walk({
              Link = function(lnk)
                -- Test condition on link
                if lnk.classes:includes('bar') then
                  -- Copy something from div
                  lnk.attributes.baz = div.attributes.zip
                else
                  -- Copy something else from div
                  lnk.attributes.baz = div.attributes.zap
                end
                return lnk
              end
            }).content
            -- Div is not needed anymore so replace it with its content
          end
        end
        ```

        This also illustrates the common situation where a Div or Span is
only used to tell the filter what to do to elements inside it. In this case
the Div is no longer needed when you have applied the changes to the
elements inside it, so rather than returning the Div you return only what
is inside it.

B.  In case you want to do something only with specific descendants of an
element you have to walk/check the children at each level manually.

    Some things to remember:

    a)  You need to check that each child exists before accessing its
properties, because trying to access properties on a nonexistent (nil)
value is a fatal error.
    b)  Don’t forget to reassign any elements which you have changed to the
right place in the structure or the change may not take effect. It’s not
all that clear when it is needed so it’s safer to do it always.

    Here is a more explicit version (with more comments and some extras :-)
of my filter for conditionally stripping the Figure around an Image:

    ```lua
    function Figure(fig)
      -- First check that we have exactly one child...
      if 1 == #fig.content then
        local child = fig.content[1]
        -- ...then check that the child is a Plain...
        if 'Plain' == child.tag then
          -- ...then check that there is exactly one grandchild...
          if 1 == #child.content then
            local grandchild = child.content[1]
            -- ...and that the grandchild is an Image.
            if 'Image' == grandchild.tag then
              -- Bingo!
              -- Now do we want it to be a figure?
              if grandchild.classes:includes('fig') then
                grandchild.attributes.style = style_for_img_in_fig
                -- Put grandchild back
                child.content[1] = grandchild
                -- Put child back
                fig.content[1] = child
                fig.attributes.style = style_for_figs
                -- Done with figure
                return fig
              end
              -- else if a standalone image
              grandchild.attributes.style = style_for_img
              -- Don't want the fig so return the image!
              return grandchild
            end
          end
        end
      end
      -- If not all the conditions hold
      return nil
    end
    ```

    (Note: the `style_for_*` variables aren’t actually defined in this
code; they are just placeholders for some strings!)
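
The nested checks above can also be flattened with a small helper. This is
only a sketch of the same idea: `only_child` is a hypothetical helper name,
not part of pandoc's API, and the style string is a placeholder just like
the `style_for_*` variables.

```lua
-- Return the sole child of `parent` if it exists and has the expected
-- tag, otherwise nil. Also returns nil when `parent` itself is nil, so
-- calls can be chained without further checks.
local function only_child(parent, tag)
  local content = parent and parent.content
  if content and #content == 1 and content[1].tag == tag then
    return content[1]
  end
  return nil
end

function Figure(fig)
  local plain = only_child(fig, 'Plain')
  local img = only_child(plain, 'Image')
  -- Some condition did not hold: leave the Figure unchanged
  if not img then return nil end
  -- Placeholder style, chosen only for illustration
  img.attributes.style = 'max-width: 100%'
  -- Put the changed elements back in place
  plain.content[1] = img
  fig.content[1] = plain
  return fig
end
```

Returning nil early keeps the "nothing matched" case in one place instead
of at the bottom of four nested `if` blocks.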



> I thought
> print(pandoc.utils.stringify(el.content[1].content[1].attr.attributes))
> would give me the attributes but it does not.
>

`pandoc.utils.stringify` concatenates and returns the Str element values in
its argument, which must be an element object. The attributes of an element
meet neither of those descriptions, but they are an object with custom
stringification (in the Lua sense), so you can just say
`print(element.attributes)` to inspect them.
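
If you want one attribute per entry rather than the default printed form,
something like the following sketch may help. It assumes that the
attributes object can be iterated with `pairs`, which I believe holds in
current pandoc versions; `format_attributes` is a hypothetical helper name.

```lua
-- Build a sorted, space-separated `name="value"` string from attributes.
local function format_attributes(attributes)
  local parts = {}
  for name, value in pairs(attributes) do
    parts[#parts + 1] = name .. '="' .. value .. '"'
  end
  -- Sort so the output order is stable
  table.sort(parts)
  return table.concat(parts, ' ')
end

function Image(img)
  -- Log to stderr so the output document on stdout is not polluted
  io.stderr:write('Image attributes: ', format_attributes(img.attributes), '\n')
  -- Returning nil leaves the Image unchanged
  return nil
end
```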



> Could this be a bug?
>

No.



-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhBjJTZki4np%3DoSfbbcmtBjEahCU4440tRfOh_6TMzSAYw%40mail.gmail.com.
