* Handling internal references when converting from HTML to reStructuredText
@ 2018-12-15 11:35 Carsten Fuchs
From: Carsten Fuchs @ 2018-12-15 11:35 UTC
To: pandoc-discuss
Dear Pandoc group,
I have a set of HTML files that originated from a DokuWiki wiki.
The HTML files have a few external links such as https://... and mostly
internal links to each other. For example in file introduction.html:
Refer to section <a href="/modeleditor:mainwindow" class="wikilink1" title="modeleditor:mainwindow">The Main Window</a>.
where modeleditor/mainwindow.html is another local file of the same overall
document (a former wiki page).
When converting the above HTML to RST, I would like Pandoc to treat the
link (<a href="/modeleditor:mainwindow">) as an internal link, so that the
generated reStructuredText contains e.g.
:ref:`modeleditor_mainwindow` or `the-main-window`_
The exact name of the target label is not important: even if it results in
a broken internal reference, I can fix these in a separate pre- or
postprocessing step.
The question is:
How do I prepare the <a href="..."> link to have Pandoc treat it as an
internal reference?
I have tried modifying the target of a Link node with a Lua filter script,
but the result is always an external link such as
`The Main Window <#_the_main_window>`__
in the generated RST file.
Is there something else I can try in a Lua filter or as an independent
preprocessing step to have such links treated as internal references?
Best regards,
Carsten
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/64bcee56-e23f-4206-8e78-ded10e90a2c1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: Handling internal references when converting from HTML to reStructuredText
@ 2018-12-16 0:53 ` John MacFarlane
From: John MacFarlane @ 2018-12-16 0:53 UTC
To: Carsten Fuchs, pandoc-discuss
Currently the RST writer doesn't really distinguish
between links with internal targets (#...) and those
with external targets. It won't produce
:ref:`foo` or `foo`_ for the former.
This is something we might think about changing in
the RST writer. However, in the meantime you can
deal with this by using a Lua filter. Simply
intercept Links with the targets you're interested
in, and produce a RawInline('rst', ___) with the
exact RST you want.
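
A minimal sketch of such a filter, under stated assumptions: internal links
are taken to be the DokuWiki-style targets from the original question (a
leading "/" with ":" separating namespaces, e.g. "/modeleditor:mainwindow"),
and the label scheme (separators replaced by "_") is purely illustrative.
The helper names is_internal, label_for and ref_markup are hypothetical.
Note that :ref: is a Sphinx role, so the output is only meaningful if the
RST is later processed by Sphinx.

```lua
-- Assumption: a target such as "/modeleditor:mainwindow" is internal.
-- Protocol-relative URLs ("//host/...") and absolute URLs are left alone.
local function is_internal(target)
  return target:match('^/[^/]') ~= nil
end

-- Hypothetical label scheme: drop the leading "/" and turn ":" or "/"
-- into "_", so "/modeleditor:mainwindow" becomes "modeleditor_mainwindow".
local function label_for(target)
  local label = target:sub(2):gsub('[:/]', '_')
  return label
end

-- Build the raw RST for a Sphinx cross-reference with explicit link text.
-- The text is not escaped here; backticks or "<" in it would need handling.
local function ref_markup(text, target)
  return ':ref:`' .. text .. ' <' .. label_for(target) .. '>`'
end

-- Pandoc calls this for every Link element. Returning nil keeps the
-- element unchanged, so external links pass through untouched.
function Link(el)
  if is_internal(el.target) then
    local text = pandoc.utils.stringify(el.content)
    return pandoc.RawInline('rst', ref_markup(text, el.target))
  end
end
```

Saved under a name of your choice (say internal-refs.lua), it could be
applied with something like
pandoc -f html -t rst --lua-filter=internal-refs.lua introduction.html
With the example link from the question, this should yield
:ref:`The Main Window <modeleditor_mainwindow>` in the RST output; the
matching labels still have to be created in the target files.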
* Re: Handling internal references when converting from HTML to reStructuredText
@ 2018-12-16 9:56 ` Carsten Fuchs
From: Carsten Fuchs @ 2018-12-16 9:56 UTC
To: pandoc-discuss
Hi John,
That's awesome, many thanks for your help!
Best regards,
Carsten