Thanks, the script to strip the underlines is exactly what I needed. As for
the spaces in the bolds/italics, I'll add an issue for that. Obviously,
this is not pandoc's fault but if you have a workaround that can be ported
to the RTF reader at some point, that would be super.
In the meantime, I guess I'll investigate doing the job myself in Lua. Even
if it takes me a couple of days to figure out, it'll be faster than
processing every file manually!
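
A minimal sketch of such a filter, assuming the RTF reader emits the stray
spaces as separate Space elements at the start or end of the Strong/Emph
content (not verified against these files). The file name fixspaces.lua and
the helper name hoist_spaces are made up for illustration:

```lua
-- fixspaces.lua (hypothetical name): move leading/trailing Space elements
-- out of bold (Strong) and italic (Emph) spans.
-- Assumption: the stray spaces are separate Space inlines, not part of a Str.
local function hoist_spaces(el)
  local content = el.content
  local before, after = {}, {}
  -- Collect Space elements from the front of the span.
  while #content > 0 and content[1].t == 'Space' do
    table.insert(before, table.remove(content, 1))
  end
  -- Collect Space elements from the end of the span.
  while #content > 0 and content[#content].t == 'Space' do
    table.insert(after, 1, table.remove(content))
  end
  if #before == 0 and #after == 0 then
    return nil -- nothing to move; keep the element unchanged
  end
  -- Rebuild as: leading spaces, trimmed span (if not empty), trailing spaces.
  local result = before
  if #content > 0 then
    el.content = content
    table.insert(result, el)
  end
  for _, space in ipairs(after) do
    table.insert(result, space)
  end
  return result
end

Strong = hoist_spaces
Emph = hoist_spaces
```

Saved as fixspaces.lua, it could be run together with the underline filter:
pandoc in.rtf -L ununderline.lua -L fixspaces.lua -o out.md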
Kris
On Wednesday, April 27, 2022 at 2:28:20 PM UTC-4 John MacFarlane wrote:
>
> The issue with bold is probably because the RTF file includes
> some spaces inside the boldface emphasis. That is depressingly
> common in word processing documents, and we have code in the docx
> reader, if I recall, that handles it by converting
>
> <b>helloSPACE</b>
> to
> <b>hello</b>SPACE
>
> We could port this over to the RTF reader, I think -- can you
> put up an issue on the tracker so we don't forget?
>
> The other issue can be handled using a simple Lua filter.
> Save it as ununderline.lua and use -L ununderline.lua on
> the command line:
>
> -- Replace each Underline element with its contents.
> function Underline(el)
>   return el.content
> end
>
> You could probably handle the spacing issue with a more complex
> Lua filter, as well.
>
> Kris Wilk writes:
>
> > Sorry if anyone gets this twice, had to correct my formatting...
> >
> > I'm trying to use pandoc (for the first time) to convert some RTF files to
> > markdown. My goal is to extract the text with ***bold*** and **italics**
> > preserved and no other formatting.
> >
> > Simply converting with "pandoc in.rtf -o out.md" produces a markdown file
> > that's not quite what I need. For instance, here's a line from the output:
> >
> > **[Scientific Name]{.underline}: ***Aplysia parvula *Morch, 1863
> >
> > FIRST and foremost, pandoc tries to preserve the underlined text, which I
> > don't want. Can this be disabled? I've tried the "bracketed_spans" and
> > "native_spans" extensions but this still processes the underlines as:
> >
> > **Scientific Name: ***Aplysia parvula *Morch, 1863
> >
> > SECOND, at least when I view this in VSCode's markdown preview, the bold
> > and emphasis are not presented correctly, I guess because they touch each
> > other or have spaces (or both?)? It displays correctly if it's:
> >
> > **Scientific Name:** *Aplysia parvula* Morch, 1863
> >
> > I realize that the text in the RTF might have the bold/italic tagged
> > weirdly but is there a way to deal with this or am I just stuck? I have
> > about 500 such files to process, so I'm looking for automated methods.
> >
> > Thanks in advance for any help you can provide!
> >
> > --
> > You received this message because you are subscribed to the Google
> Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/aecd40a2-09db-4e1b-96ad-752973375e0cn%40googlegroups.com
> .
>