public inbox archive for pandoc-discuss@googlegroups.com
* Handling internal references when converting from HTML to reStructuredText
@ 2018-12-15 11:35 Carsten Fuchs
From: Carsten Fuchs @ 2018-12-15 11:35 UTC
  To: pandoc-discuss


Dear Pandoc group,

I have a set of HTML files that originated from a DokuWiki wiki.
The HTML files have a few external links such as https://... and mostly 
internal links to each other. For example in file introduction.html:

Refer to section <a href="/modeleditor:mainwindow" class="wikilink1" title=
"modeleditor:mainwindow">The Main Window</a>.

where modeleditor/mainwindow.html is another local file of the same overall 
document (a former wiki page).
When converting the above HTML to RST, I would like Pandoc to treat the 
link (<a href="/modeleditor:mainwindow">) as an internal link, so that the 
generated reStructuredText contains e.g.

:ref:`modeleditor_mainwindow` or `the-main-window`_

The exact name of the target label is not important: even if it results in 
a broken internal reference, I can fix these in a separate pre- or 
postprocessing step.

The question is:
How do I prepare the <a href="..."> link to have Pandoc treat it as an 
internal reference?
I have tried modifying the target of a Link node with a Lua filter script, 
but the result is always an external link such as 
`The Main Window <#_the_main_window>`__
in the generated RST file.

Is there something else I can try in a Lua filter or as an independent 
preprocessing step to have such links treated as internal references?

Best regards,
Carsten

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/64bcee56-e23f-4206-8e78-ded10e90a2c1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: Handling internal references when converting from HTML to reStructuredText
@ 2018-12-16  0:53 John MacFarlane
From: John MacFarlane @ 2018-12-16  0:53 UTC
  To: Carsten Fuchs, pandoc-discuss


Currently the RST writer doesn't really distinguish
between links with internal targets (#...) and those
with external targets.  It won't produce
:ref:`foo` or `foo`_ for the former.

This is something we might think about changing in
the RST writer.  However, in the meantime you can
deal with this by using a Lua filter.  Simply
intercept Links with the targets you're interested
in, and produce a RawInline('rst', ___) with the
exact RST you want.
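
A minimal sketch of such a filter follows. It is an illustration under
stated assumptions, not a tested solution for the DokuWiki export: it
assumes that internal links are exactly the root-relative ones (such as
/modeleditor:mainwindow), that a Sphinx-style :ref: role is wanted, and
that a label derived from the path is acceptable. The helper label_for,
the label scheme, and the file name internal-refs.lua are hypothetical.

```lua
-- internal-refs.lua (hypothetical name): turn root-relative links into
-- Sphinx :ref: roles by emitting raw RST instead of a Link element.

-- Assumed label scheme: "/modeleditor:mainwindow" -> "modeleditor_mainwindow".
local function label_for(target)
  local label = target:gsub("^/", ""):gsub("[:/]", "_")
  return label
end

function Link(el)
  local target = el.target
  -- Assumption: only targets starting with a single "/" are internal.
  -- Everything else (https://..., #fragment, //host/path) is left as-is.
  if not target:match("^/[^/]") then
    return nil
  end
  -- Flatten the link text to a plain string for the role's title part.
  -- Note: backticks or "<" in the link text are not escaped here.
  local text = pandoc.utils.stringify(el.content)
  local rst = ":ref:`" .. text .. " <" .. label_for(target) .. ">`"
  return pandoc.RawInline("rst", rst)
end
```

It would be run along the lines of

    pandoc -f html -t rst --lua-filter=internal-refs.lua introduction.html -o introduction.rst

The generated labels still have to exist in the target documents; any
that do not can be fixed up in a separate postprocessing step.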



* Re: Handling internal references when converting from HTML to reStructuredText
@ 2018-12-16  9:56 Carsten Fuchs
From: Carsten Fuchs @ 2018-12-16  9:56 UTC
  To: pandoc-discuss

Hi John,

That's awesome, many thanks for your help!

Best regards,
Carsten


