* omit HTML block in html --> org conversion
@ 2016-05-20 15:30 Ista Zahn
From: Ista Zahn @ 2016-05-20 15:30 UTC (permalink / raw)
To: pandoc-discuss
Hello,
I'm using pandoc to convert HTML email to Org mode for viewing in Emacs.
This works well, except that <div> tags are not converted and are instead
stuffed into #+BEGIN_HTML ... #+END_HTML blocks. Is there any way I can
prevent that from happening? I'd really like those div tags to be
omitted from the converted Org file.
Here is a brief example.
Input:
echo '<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<div
style="width:100%;padding:24px 0 16px
0;background-color:#f5f5f5;text-align:center">
</div>
<p style="background:white"><i><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">You are
receiving this email because you took professional training and/or required
training in the past year.</span></i><o:p></o:p></p>
</body>
</html>
' | pandoc -f html -t org
Result:
#+BEGIN_HTML
<div class="WordSection1">
#+END_HTML

#+BEGIN_HTML
<div
style="width:100%;padding:24px 0 16px
0;background-color:#f5f5f5;text-align:center">
#+END_HTML

#+BEGIN_HTML
</div>
#+END_HTML

/You are receiving this email because you took professional training
and/or required training in the past year./

#+BEGIN_HTML
</div>
#+END_HTML
Desired result:
/You are receiving this email because you took professional training
and/or required training in the past year./
Thanks for any suggestions.
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/43181b14-9c6d-402c-bed5-ba790f6ee5cb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: omit HTML block in html --> org conversion
@ 2016-05-20 17:47 ` John MACFARLANE
From: John MACFARLANE @ 2016-05-20 17:47 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
You could use a filter to remove the divs.
% cat nodivs.hs
import Text.Pandoc.JSON

-- Replace each Div with the blocks it contains; keep other blocks as-is.
main :: IO ()
main = toJSONFilter nodivs
  where nodivs (Div _ bs) = bs
        nodivs b          = [b]

% ghc --make nodivs

% pandoc --filter ./nodivs -t org
<div class="hi">
ok
</div>
ok
If you don't have ghc installed, you could write the
filter in Python instead, using the pandocfilters or
panflute library.
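As a rough sketch of the Python route, the filter below does the same job using only the standard library rather than pandocfilters or panflute, so the traversal is spelled out by hand. It assumes pandoc's JSON encoding of a Div as {"t": "Div", "c": [attr, [blocks]]}; the names strip_divs, main, and nodivs.py are made up for illustration. Children are processed before their parent so that a Div nested directly inside another Div (as in the WordSection1 example) is unwrapped as well.

```python
#!/usr/bin/env python3
"""Pandoc JSON filter sketch: replace every Div with the blocks it contains."""
import json
import sys


def strip_divs(node):
    """Return a copy of a pandoc JSON AST node with all Div wrappers removed."""
    if isinstance(node, list):
        result = []
        for item in node:
            item = strip_divs(item)  # children first, so nested Divs unwrap too
            if isinstance(item, dict) and item.get("t") == "Div":
                # Assumed encoding: {"t": "Div", "c": [attr, [blocks]]}.
                result.extend(item["c"][1])
            else:
                result.append(item)
        return result
    if isinstance(node, dict):
        return {key: strip_divs(value) for key, value in node.items()}
    return node  # strings, numbers, booleans, None


def main():
    # pandoc pipes the document's JSON AST to the filter on stdin and
    # reads the modified AST back from stdout.
    text = sys.stdin.read()
    if text.strip():  # nothing piped in: nothing to do
        json.dump(strip_divs(json.loads(text)), sys.stdout)


if __name__ == "__main__":
    main()
```

Saved as nodivs.py and made executable, this should be usable in the same way as the compiled filter, e.g. pandoc -f html -t org --filter ./nodivs.py.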
* Re: omit HTML block in html --> org conversion
@ 2016-05-20 20:27 ` Ista Zahn
From: Ista Zahn @ 2016-05-20 20:27 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Wonderful, thank you!