public inbox archive for pandoc-discuss@googlegroups.com
* Support for HTML's '<samp>' tag?
@ 2019-10-06  7:46 Werner LEMBERG
From: Werner LEMBERG @ 2019-10-06  7:46 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


[pandoc 2.6]


I convert from HTML to texinfo, and I noticed that

  <samp>foo</samp>

gets translated to

  foo

instead of the expected

  @samp{foo}   .

Doing

  git grep -i 'samp[^l]'

in pandoc's repository I don't see support for the '<samp>' HTML tag
at all.  Am I missing something?  Is this intentional?


    Werner



* Re: Support for HTML's '<samp>' tag?
@ 2019-10-06 11:06   ` BPJ
From: BPJ @ 2019-10-06 11:06 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


The reason no doubt is that Pandoc has nothing in its document model which
corresponds to `<samp>`.

If you use the `+raw_html` extension (`--from=html+raw_html`) you can use a
simple Lua filter to convert the raw HTML tags to the appropriate raw
Texinfo:

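````lua
-- Raw Texinfo replacements for the raw HTML tags that
-- `--from=html+raw_html` leaves in the document.
local samp_for = {
  ['<samp>'] = pandoc.RawInline('texinfo', '@samp{'),
  ['</samp>'] = pandoc.RawInline('texinfo', '}')
}

function RawInline (elem)
  -- Leave raw inlines in any other format untouched.
  if elem.format ~= 'html' then
    return nil
  end
  -- The Texinfo replacement, or nil (keep the element unchanged)
  -- if this is not a bare <samp> or </samp> tag.
  return samp_for[elem.text]
end
````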
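The table lookup above only fires when the raw text is exactly `<samp>` or
`</samp>`. If pandoc keeps a tag such as `<samp class="x">` or `<SAMP>` as
raw HTML with its attributes and original case (an assumption worth
checking with `-t native`), a pattern-based check is a little more
forgiving. A minimal sketch; `samp_texinfo` is a hypothetical helper name,
and the commented-out `RawInline` shows one way it could be wired into the
filter:

````lua
-- Hypothetical helper: map a raw HTML tag string to raw Texinfo.
-- Accepts <samp>, <SAMP>, <samp class="x"> and </samp>; returns
-- nil for anything else (including look-alikes such as <sample>).
local function samp_texinfo (tag)
  local t = tag:lower()
  if t == '<samp>' or t:match('^<samp%s[^>]*>$') then
    return '@samp{'
  elseif t:match('^</samp%s*>$') then
    return '}'
  end
  return nil
end

-- Possible use inside a pandoc Lua filter (the `pandoc` module is
-- only available when the script is run by pandoc):
--
-- function RawInline (elem)
--   local tex = elem.format == 'html' and samp_texinfo(elem.text)
--   if tex then
--     return pandoc.RawInline('texinfo', tex)
--   end
-- end
````

As with the original, the filter would be passed to pandoc with
`--lua-filter`.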

I have often wished for an option or extension which would convert "unknown"
HTML elements or LaTeX commands/environments into a span or div with a `tag`
or `command`/`environment` attribute, so that e.g. `<samp>foo</samp>` becomes
what in Markdown would be written `[foo]{tag=samp}`, `\foo[bar]{baz}` becomes
`[[bar]{command="foo" arg="1"}[baz]{command="foo" arg="2"}]{command="foo"}`,
and `\begin{foo}[bar]{baz}text\end{foo}` becomes

````Markdown
::: {environment="foo" arg1="bar" arg2="baz"}
text
:::
````

since this would in most cases make it much easier to use filters to make
something sensible of "unknown" tags/commands/environments.
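For the LaTeX half of this wish, here is a rough sketch of turning
`\foo[bar]{baz}` into the attribute layout used in the examples above
(`command`, `arg1`, `arg2`, ...). `parse_command` and `to_attributes` are
hypothetical names. Lua's `%b[]`/`%b{}` pattern items cope with nested
braces, but escaped braces (`\{`), comments and verbatim content are not
handled, which is where things get hairy:

````lua
-- Hypothetical sketch: split a LaTeX command such as \foo[bar]{baz}
-- into its name and its bracketed/braced arguments, in order.
-- %b[] and %b{} match balanced delimiters, so nested braces work,
-- but escaped braces (\{), comments and verbatim are NOT handled.
local function parse_command (src)
  local name, rest = src:match('^\\(%a+)(.*)$')
  if not name then
    return nil
  end
  local args = {}
  while true do
    local arg, after = rest:match('^%s*(%b[])(.*)$')
    if not arg then
      arg, after = rest:match('^%s*(%b{})(.*)$')
    end
    if not arg then
      break
    end
    args[#args + 1] = {
      optional = arg:sub(1, 1) == '[',
      text = arg:sub(2, -2),  -- strip the outer delimiters
    }
    rest = after
  end
  -- Also return whatever text follows the last argument.
  return { command = name, args = args }, rest
end

-- Flatten the parse into command="foo", arg1="bar", arg2="baz".
local function to_attributes (parsed)
  local attr = { command = parsed.command }
  for i, a in ipairs(parsed.args) do
    attr['arg' .. i] = a.text
  end
  return attr
end
````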

At one point I wrote a filter in Perl which implemented this for LaTeX,
including both preserving LaTeX arguments "raw" as attributes and calling
out to pandoc to parse the content of arguments and environments.  It was
much harder with HTML since I had to loop through the contents of any
elements which might contain raw HTML and "capture" raw opening tags and
what follows them up to the closing tag if any, including keeping track of
nested tags.
I suppose such a filter might be written in Lua too, but parsing LaTeX
arguments and raw HTML attributes would be considerably more hairy due to
the limitations of Lua patterns compared to Perl regexes.
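The HTML capture step described above (collect what follows an opening
tag up to its closing tag, keeping track of nesting) can be sketched in
plain Lua with a stack. This is a sketch under assumptions: the inlines
are modeled as plain tables (`{t = 'Raw', text = '<samp>'}`,
`{t = 'Str', text = 'foo'}`) rather than real pandoc elements, the helper
names are hypothetical, and a matched pair becomes
`{t = 'Span', attr = {tag = ...}}` where a real filter would build a
`pandoc.Span`. Unmatched tags stay in place as raw HTML:

````lua
-- Hypothetical sketch: fold raw HTML tag pairs in a list of inlines
-- into spans carrying a `tag` attribute. Inlines are plain tables,
-- e.g. {t = 'Raw', text = '<samp>'} or {t = 'Str', text = 'foo'}.

-- Parse a raw tag: returns the lower-cased name and whether it is a
-- closing tag, or nil if the text is not a simple opening/closing tag.
local function parse_tag (raw)
  local slash, name = raw:match('^<(/?)(%a%w*)[^>]*>$')
  if not name then
    return nil
  end
  return name:lower(), slash == '/'
end

-- Pop the innermost open frame and put its raw opening tag and
-- content back into the parent frame unchanged.
local function unwind (stack)
  local frame = table.remove(stack)
  local parent = stack[#stack].content
  parent[#parent + 1] = frame.open
  for _, el in ipairs(frame.content) do
    parent[#parent + 1] = el
  end
end

local function wrap_tags (inlines)
  local stack = { { content = {} } }  -- bottom frame = top level
  for _, el in ipairs(inlines) do
    local name, closing
    if el.t == 'Raw' then
      name, closing = parse_tag(el.text)
    end
    if name and not closing then
      -- Opening tag: start collecting its content.
      stack[#stack + 1] = { name = name, open = el, content = {} }
    else
      -- Closing tag: find the nearest open frame with the same name.
      local depth
      if closing then
        for i = #stack, 2, -1 do
          if stack[i].name == name then
            depth = i
            break
          end
        end
      end
      if depth then
        -- Tags opened inside it but never closed (e.g. <br>) are
        -- put back as raw HTML before the span is built.
        while #stack > depth do
          unwind(stack)
        end
        local frame = table.remove(stack)
        local parent = stack[#stack].content
        parent[#parent + 1] =
          { t = 'Span', attr = { tag = frame.name }, content = frame.content }
      else
        -- Ordinary inline, or a closing tag with no matching opener.
        local top = stack[#stack].content
        top[#top + 1] = el
      end
    end
  end
  while #stack > 1 do
    unwind(stack)
  end
  return stack[1].content
end
````

In a real filter this would run over each list of inlines (for instance
from an `Inlines` filter function in recent pandoc versions), so it only
pairs tags that pandoc left within the same list.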

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhBQWxXYZS03xg000oiPDwk4q39Tb6XCUWm-6khF8Q9L8A%40mail.gmail.com.



* Re: Support for HTML's '<samp>' tag?
@ 2019-10-06 16:39       ` John MacFarlane
From: John MacFarlane @ 2019-10-06 16:39 UTC
  To: BPJ, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


This is a good workaround, but really we should parse

    <samp>foo</samp>

as inline code, perhaps with class 'sample'.  I'll add an
issue on the tracker:

https://github.com/jgm/pandoc/issues/5792
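Until the reader does this natively, the proposed mapping can be
approximated in a filter. A minimal sketch on plain tables standing in for
pandoc inlines (the shapes in the comments are assumptions, and the helper
names are hypothetical); in a real filter the result would be a
`pandoc.Code` element with class `sample`, and the text could come from
`pandoc.utils.stringify`. Markup nested inside `<samp>` is flattened to
plain text:

````lua
-- Hypothetical sketch: turn <samp>...</samp> runs into inline code
-- with class "sample". Inlines are modeled as plain tables:
-- {t = 'Raw', text = '<samp>'}, {t = 'Str', text = 'ls'}, {t = 'Space'}.

-- Plain text of one inline (a stand-in for pandoc.utils.stringify).
local function plain_text (el)
  if el.t == 'Space' then
    return ' '
  end
  return el.text or ''
end

local function samp_to_code (inlines)
  local out = {}
  local pending = nil  -- elements seen since an open <samp>, if any
  for _, el in ipairs(inlines) do
    local raw = (el.t == 'Raw') and el.text:lower() or nil
    if raw == '<samp>' and not pending then
      pending = { el }
    elseif raw == '</samp>' and pending then
      local parts = {}
      for i = 2, #pending do  -- skip the opening tag itself
        parts[#parts + 1] = plain_text(pending[i])
      end
      out[#out + 1] =
        { t = 'Code', classes = { 'sample' }, text = table.concat(parts) }
      pending = nil
    elseif pending then
      pending[#pending + 1] = el
    else
      out[#out + 1] = el
    end
  end
  -- An unclosed <samp> is left exactly as it was.
  for _, el in ipairs(pending or {}) do
    out[#out + 1] = el
  end
  return out
end
````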





* Re: Support for HTML's '<samp>' tag?
@ 2019-10-06 16:53       ` Werner LEMBERG
From: Werner LEMBERG @ 2019-10-06 16:53 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, melroch-Re5JQEeQqe8AvxtiuMwx3w


> If you use the `+raw_html` extension (`--from=html+raw_html`) you
> can use a simple Lua filter to convert the raw HTML tags to the
> appropriate raw Texinfo: [...]

Thanks a lot for this!


    Werner


