public inbox archive for pandoc-discuss@googlegroups.com
From: BPJ <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Support for HTML's '<samp>' tag?
Date: Sun, 6 Oct 2019 13:06:21 +0200
Message-ID: <CADAJKhBQWxXYZS03xg000oiPDwk4q39Tb6XCUWm-6khF8Q9L8A@mail.gmail.com>
In-Reply-To: <20191006.094647.1517844175195168605.wl-mXXj517/zsQ@public.gmane.org>

The reason is no doubt that Pandoc's document model has nothing which
corresponds to `<samp>`.

If you use the `+raw_html` extension (`--from=html+raw_html`) you can use a
simple Lua filter to convert the raw HTML tags to the appropriate raw
Texinfo:

````lua
-- Map each raw HTML tag to its raw Texinfo replacement.
local samp_for = {
  ['<samp>'] = pandoc.RawInline('texinfo', '@samp{'),
  ['</samp>'] = pandoc.RawInline('texinfo', '}')
}

function RawInline (elem)
  -- Leave raw inlines of other formats untouched.
  if 'html' ~= elem.format then
    return nil
  end
  -- Return the replacement, or nil (element unchanged) for any other tag.
  return samp_for[elem.text]
end
````

I have often wished for an option or extension which would cause "unknown"
HTML elements or LaTeX commands/environments to be converted into a span or
div with a `tag` or `command`/`environment` attribute, so that e.g.
`<samp>foo</samp>` becomes what in Markdown would be `[foo]{tag=samp}`,
`\foo[bar]{baz}` becomes what in Markdown would be
`[[bar]{command="foo" arg="1"}[baz]{command="foo" arg="2"}]{command="foo"}`
and `\begin{foo}[bar]{baz}text\end{foo}` becomes

````Markdown
::: {environment="foo" arg1="bar" arg2="baz"}
text
:::
````

since this would in most cases make it much easier to use filters to make
something sensible of "unknown" tags/commands/environments.
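
To illustrate, assuming such a `tag` attribute existed (it is the
hypothetical feature described above, not an existing pandoc option), the
`<samp>` case might be handled by a filter along these lines. This is only
a sketch: it wraps the contents of the span in raw Texinfo and leaves every
other span alone.

```lua
-- Sketch only: relies on the hypothetical `tag` attribute wished for above.
function Span (el)
  if el.attributes['tag'] ~= 'samp' then
    return nil  -- not ours; leave the span unchanged
  end
  -- Build a list of inlines: opening raw Texinfo, the contents, closing brace.
  local result = { pandoc.RawInline('texinfo', '@samp{') }
  for _, inline in ipairs(el.content) do
    result[#result + 1] = inline
  end
  result[#result + 1] = pandoc.RawInline('texinfo', '}')
  return result  -- the returned list is spliced in place of the span
end
```

Since the contents stay ordinary inlines, the Texinfo writer still takes
care of escaping whatever is inside the braces.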

At one point I wrote a filter in Perl which implemented this for LaTeX,
including both preserving LaTeX arguments "raw" as attributes and calling
out to pandoc to parse the content of arguments and environments.  It was
much harder with HTML since I had to loop through the contents of any
elements which might contain raw HTML and "capture" raw opening tags and
what follows them up to the closing tag if any, including keeping track of
nested tags.
I suppose such a filter might be written in Lua too, but parsing LaTeX
arguments and raw HTML attributes would be considerably more hairy due to
the limitations of Lua patterns compared to Perl regexes.
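
To give a feel for the difficulty: Lua patterns have no alternation, so
quoted attribute values need one pass per quote style. The following is
only a rough sketch for a single raw opening tag. The function name
`parse_open_tag` is made up for illustration, and unquoted or valueless
attributes are not handled at all.

```lua
-- Sketch only: split a raw opening tag into its name and quoted attributes.
-- `parse_open_tag` is a hypothetical helper, not part of pandoc.
function parse_open_tag (raw)
  -- Tag name, then everything up to an optional "/" and the closing ">".
  local name, rest = raw:match('^<(%a[%w:%-]*)(.-)/?>$')
  if not name then
    return nil  -- closing tag, comment, or not a tag at all
  end
  local attrs = {}
  -- No alternation in Lua patterns: one pass for "..." values ...
  for key, value in rest:gmatch('([%a_:][%w_:.%-]*)%s*=%s*"([^"]*)"') do
    attrs[key] = value
  end
  -- ... and a second pass for '...' values. This can misfire on a
  -- single-quoted fragment inside a double-quoted value.
  for key, value in rest:gmatch("([%a_:][%w_:.%-]*)%s*=%s*'([^']*)'") do
    attrs[key] = value
  end
  return name:lower(), attrs
end
```

Even this much ignores nesting entirely, which is where the real work of
pairing opening and closing tags lies.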

On Sun, 6 Oct 2019 at 09:47, Werner LEMBERG <wl-mXXj517/zsQ@public.gmane.org> wrote:

>
> [pandoc 2.6]
>
>
> I convert from HTML to texinfo, and I noticed that
>
>   <samp>foo</samp>
>
> gets translated to
>
>   foo
>
> instead of the expected
>
>   @samp{foo}   .
>
> Doing
>
>   git grep -i 'samp[^l]'
>
> in pandoc's repository I don't see support for the '<samp>' HTML tag
> at all.  Am I missing something?  Is this intentional?
>
>
>     Werner
>
