public inbox archive for pandoc-discuss@googlegroups.com
From: BPJ <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: How to manipulate with Block elements with Lua filters
Date: Wed, 5 Jul 2023 17:44:22 +0200
Message-ID: <CADAJKhDnwx5VsAv4mzukcU6MDSqDZ+cKk_x0V4Gvsb+twV7J1w@mail.gmail.com>
In-Reply-To: <ae032a8d-4d0a-4608-b479-61965cee2793n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>



On Mon, 26 June 2023 at 21:12, Ioan Muntean <imuntean-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Dear BPJ
> I read this post and thank you for illuminating some aspects of walking a
> DIV element in pandoc. My story is different but I tried to achieve
> something simpler.
> I have a docx document and I want to convert it to LaTeX to include some
> special styles. I adapted your code and all works well except that I get
> extra lines between the environment definition and content.
>

Sorry for the delay. At first I didn't notice your post and then I was very
busy.

The reason for the "extra" whitespace is that the raw text before and after
the div are separate blocks, and Pandoc inserts a blank line between
blocks because that is usually both wanted and necessary. The workaround is
to convert the div's content to LaTeX inside the filter and concatenate it
with the prefix and postfix into a single raw LaTeX block. The attached
filter shows how to do that and may work for you as is. It is a slimmed-down
version of an undocumented filter that does the same for either classes or
arbitrary attributes and for multiple output formats, with the
whitespace-avoiding code added.
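As a rough illustration of the single-raw-block idea (this is not the attached filter), here is a minimal sketch. It assumes pandoc 2.17 or later for `pandoc.write`, and that the docx was read with the `styles` extension so that the Word style name ends up in the `custom-style` attribute. The `environments` table and the style names in it are only placeholders taken from your example.

```lua
-- Hypothetical mapping from Word custom-style names to LaTeX environments.
local environments = {
  Claims = 'claims',
  Issues = 'issues',
}

function Div (div)
  -- Assumes the docx reader ran with the `styles` extension, so the
  -- style name is stored in the `custom-style` attribute.
  local env = environments[div.attributes['custom-style']]
  if not env then
    return nil -- leave every other Div untouched
  end
  -- Render the Div's blocks to a LaTeX string inside the filter.
  local body = pandoc.write(pandoc.Pandoc(div.content), 'latex')
  -- Trim trailing whitespace so no blank line precedes \end{...}.
  body = body:gsub('%s+$', '')
  -- A single raw block leaves Pandoc no block boundary to pad.
  return pandoc.RawBlock('latex',
    '\\begin{' .. env .. '}\n' .. body .. '\n\\end{' .. env .. '}')
end
```

Something like `pandoc -f docx+styles input.docx -L sketch.lua -t latex` should then run it, where `input.docx` and `sketch.lua` are placeholder file names.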


> Here is my adapted code:
>
> function Div(divclaims)
>
>   local preclaims = pandoc.RawInline('latex', '\\begin{claims}')
>   local postclaims = pandoc.RawInline('latex', '\\end{claims}')
>   local preissues = pandoc.RawInline('latex', '\\begin{issues}')
>   local postissues = pandoc.RawInline('latex', '\\end{issues}')
>
>   divbe = tostring(divclaims.t)
>   styletobe = tostring(divclaims.attr)
>
>   if (string.find(styletobe, "Claims") or string.find(styletobe, "Issues")) then
>     if (string.find(styletobe, "Claims")) then
>       pre = preclaims
>       post = postclaims
>       print("Claim found")
>     end
>     if (string.find(styletobe, "Issues")) then
>       pre = preissues
>       post = postissues
>       print("Issue found")
>     end
>
>     local content = divclaims.content
>
>     table.insert(content, 1, pre)
>     table.insert(content, post)
>
>     return content
>   end
>   return nil
> end
>
>
>
> The LaTeX looks cool, except that table.insert adds some empty lines:
>
> \begin{issues}
>
> text with formatting....
>
> \end{issues}
>
> That baffles the LaTeX interpreter. My question is: how can I modify the
> code such that my LaTeX output will have no extra empty lines?
>
> \begin{issues} text text \end{issues}
>
> Thank you in advance!
> Ioan M.
>
>
>
> On Monday, January 10, 2022 at 12:46:37 PM UTC-6 BPJ wrote:
>
>> It is neither possible nor needed to convert the whole block to HTML
>> within the filter; rather you should just inject the start and end tags:
>>
>> ``````lua
>> -- Create these only once, for speed and resources saving!
>> local pre = pandoc.RawBlock('html', '<note>')
>> local post = pandoc.RawBlock('html', '</note>')
>>
>> function Div (div)
>>   -- The order of the classes shouldn't matter!
>>   if div.classes:includes('replace-me') then
>>     local content = div.content
>>     table.insert(content, 1, pre)
>>     table.insert(content, post)
>>     return content
>>   end
>>   return nil
>> end
>> ``````
>>
>> On Mon, 10 Jan 2022 at 15:33, Tomáš Kruliš <tomas....@integromat.com> wrote:
>>
>>> Hello,
>>>
>>> I would like to ask how you should, in general, detect and manipulate
>>> Pandoc `block` elements. Currently, I am trying to replace `<div
>>> class='replace-me'>` tag with `<note>` tag in similar (highly simplified)
>>> HTML file:
>>>
>>> ```.{html}
>>> <html>
>>> <body>
>>> <p> First line. </p>
>>> <div class="replace-me another-class"> This should carry on to converted
>>> document. </div>
>>> <p>End.</p>
>>> </body>
>>> </html>
>>> ```
>>>
>>> I have tried to detect the `<div>` tag, use `walk_block` to get the
>>> `<div>` content and put it in `<note>` tag, I also found a code using
>>> `:walk` method. Lastly, I tried to convert `<div>` content to simple string
>>> and concatenate that in `RawInline` type:
>>>
>>> ```.{lua}
>>>   if elem.t == 'Div' and elem.classes[1] == "replace-me" then
>>>     content = pandoc.utils.stringify(elem.content)
>>>     return pandoc.RawInline('html', '<note>' .. content.. '</note>')
>>>   else
>>>     return elem
>>>   end
>>> ```
>>>
>>> But none of that is working. I would like to ask you, how to work in
>>> general with `pandoc_walk` or `:walk` (are they the same?) and how to deal
>>> with my specific situation?
>>> Thank you very much for any help, I hope that afterwards I will be able
>>> to help myself a little bit more :)
>>> Regards Tomas
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "pandoc-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/pandoc-discuss/590abdf0-6bc5-4f37-a978-a46ad5cff5a8n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/pandoc-discuss/590abdf0-6bc5-4f37-a978-a46ad5cff5a8n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>


[-- Attachment #2: custom-style2latex.zip --]
[-- Type: application/zip, Size: 20047 bytes --]

