* Support for HTML's '<samp>' tag?
From: Werner LEMBERG @ 2019-10-06 7:46 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[pandoc 2.6]

I convert from HTML to texinfo, and I noticed that

  <samp>foo</samp>

gets translated to

  foo

instead of the expected

  @samp{foo} .

Doing

  git grep -i 'samp[^l]'

in pandoc's repository I don't see support for the '<samp>' HTML tag
at all.  Am I missing something?  Is this intentional?


    Werner
* Re: Support for HTML's '<samp>' tag?
From: BPJ @ 2019-10-06 11:06 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

The reason no doubt is that Pandoc has nothing in its document model which
corresponds to `<samp>`.

If you use the `+raw_html` extension (`--from=html+raw_html`) you can use a
simple Lua filter to convert the raw HTML tags to the appropriate raw
Texinfo:

````lua
-- Raw Texinfo replacements, keyed by the exact text of the raw HTML tag
local samp_for = {
  ['<samp>'] = pandoc.RawInline('texinfo', '@samp{'),
  ['</samp>'] = pandoc.RawInline('texinfo', '}')
}

function RawInline (elem)
  if 'html' ~= elem.format then
    -- not raw HTML: returning nil keeps the element unchanged
    return nil
  elseif samp_for[elem.text] then
    return samp_for[elem.text]
  else
    return nil
  end
end
````

I have often wished for an option or extension which will cause "unknown"
HTML elements or LaTeX commands/environments to be converted into a span or
div with a `tag` or `command`/`environment` attribute, so that e.g.
`<samp>foo</samp>` becomes what in Markdown would be `[foo]{tag=samp}`,
`\foo[bar]{baz}` becomes what in Markdown would be
`[[bar]{command="foo" arg="1"}[baz]{command="foo" arg="2"}]{command="foo"}`
and `\begin{foo}[bar]{baz}text\end{foo}` becomes

````Markdown
::: {environment="foo" arg1="bar" arg2="baz"}
text
:::
````

since this would in most cases make it much easier to use filters to make
something sensible of "unknown" tags/commands/environments.

At one point I wrote a filter in Perl which implemented this for LaTeX,
including both preserving LaTeX arguments "raw" as attributes and calling
out to pandoc to parse the content of arguments and environments. It was
much harder with HTML since I had to loop through the contents of any
elements which might contain raw HTML and "capture" raw opening tags and
what follows them up to the closing tag if any, including keeping track of
nested tags.

I suppose such a filter might be written in Lua too, but parsing LaTeX
arguments and raw HTML attributes would be considerably more hairy due to
the limitations of Lua patterns compared to Perl regexes.

On Sun, 6 Oct 2019 at 09:47, Werner LEMBERG <wl-mXXj517/zsQ@public.gmane.org> wrote:

>
> [pandoc 2.6]
>
>
> I convert from HTML to texinfo, and I noticed that
>
>   <samp>foo</samp>
>
> gets translated to
>
>   foo
>
> instead of the expected
>
>   @samp{foo} .
>
> Doing
>
>   git grep -i 'samp[^l]'
>
> in pandoc's repository I don't see support for the '<samp>' HTML tag
> at all.  Am I missing something?  Is this intentional?
>
>
>     Werner
>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhBQWxXYZS03xg000oiPDwk4q39Tb6XCUWm-6khF8Q9L8A%40mail.gmail.com.
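
For the single `<samp>` case, the span-with-`tag`-attribute idea described in this message could perhaps be approximated in Lua along the following lines. This is only a sketch, not an existing pandoc feature: it assumes the input is read with `--from=html+raw_html`, that the opening and closing tags arrive as separate raw inlines with exactly the text `<samp>` and `</samp>` (no attributes), and that the pandoc version in use supports the `Inlines` filter function. The helper name `is_raw_html` is made up for illustration.

````lua
-- Sketch only: wrap the inlines between a raw <samp> and its matching
-- </samp> in a Span carrying the attribute tag="samp".

-- Hypothetical helper: is `el` a raw HTML inline with exactly this text?
local function is_raw_html (el, text)
  return el.t == 'RawInline' and el.format == 'html' and el.text == text
end

function Inlines (inlines)
  local current = pandoc.List:new()  -- list currently being filled
  local stack = {}                   -- enclosing lists while inside <samp>
  for _, el in ipairs(inlines) do
    if is_raw_html(el, '<samp>') then
      -- start collecting the contents of a new element
      table.insert(stack, current)
      current = pandoc.List:new()
    elseif is_raw_html(el, '</samp>') and #stack > 0 then
      -- close the innermost element and append it to its parent list
      local span = pandoc.Span(current, pandoc.Attr('', {}, {tag = 'samp'}))
      current = table.remove(stack)
      current:insert(span)
    else
      current:insert(el)
    end
  end
  if #stack > 0 then
    return nil  -- unbalanced tags: leave this inline list untouched
  end
  return current
end
````

A later filter could then match on `Span` elements whose `tag` attribute is `samp`. Tags with attributes, and tags whose open and close end up in different inline lists, are not handled here; those are the cases the message above describes as the hard part.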
* Re: Support for HTML's '<samp>' tag?
From: John MacFarlane @ 2019-10-06 16:39 UTC
To: BPJ, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

This is a good workaround, but really we should parse
<samp>foo</samp> as inline code, perhaps with class 'sample'.
I'll add an issue on the tracker:
https://github.com/jgm/pandoc/issues/5792

BPJ <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> The reason no doubt is that Pandoc has nothing in its document model which
> corresponds to `<samp>`.
>
> If you use the `+raw_html` extension (`--from=html+raw_html`) you can use a
> simple Lua filter to convert the raw HTML tags to the appropriate raw
> Texinfo:
>
> [...]
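
If `<samp>` were parsed as inline code with class `sample`, as floated in this message, the Texinfo side could probably be covered by a short Lua filter. The sketch below rests on that assumption: the class name `sample` is only the tentative suggestion from this message, and the helpers `texinfo_escape` and `has_class` are illustrative names, not part of pandoc.

````lua
-- Sketch only: turn inline code with class 'sample' into @samp{...}
-- when writing Texinfo. The class name is an assumption (see above).

-- Escape the characters that are special in Texinfo: @ { }
local function texinfo_escape (s)
  return (s:gsub('[@{}]', '@%0'))
end

-- Hypothetical helper: does the element carry the given class?
local function has_class (el, class)
  for _, c in ipairs(el.classes) do
    if c == class then
      return true
    end
  end
  return false
end

function Code (el)
  if FORMAT == 'texinfo' and has_class(el, 'sample') then
    return pandoc.RawInline(
      'texinfo', '@samp{' .. texinfo_escape(el.text) .. '}')
  end
  -- returning nothing keeps the element unchanged
end
````

Keying on the class rather than on raw HTML means the filter would not need the `raw_html` extension, and other writers would still render the element as ordinary inline code.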
* Re: Support for HTML's '<samp>' tag?
From: Werner LEMBERG @ 2019-10-06 16:53 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, melroch-Re5JQEeQqe8AvxtiuMwx3w

> If you use the `+raw_html` extension (`--from=html+raw_html`) you
> can use a simple Lua filter to convert the raw HTML tags to the
> appropriate raw Texinfo: [...]

Thanks a lot for this!


    Werner