public inbox archive for pandoc-discuss@googlegroups.com
* Converting HTML to EPUB loses paragraph styles
@ 2019-10-21 22:51 David Given
       [not found] ` <44436778-58cb-4883-bd25-c85d32358a1e-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: David Given @ 2019-10-21 22:51 UTC (permalink / raw)
  To: pandoc-discuss


I have a document in HTML which I'm trying to convert to EPUB, using a 
command line like this:

pandoc -f html -t epub2 --metadata-file=meta.yaml input.html 
--self-contained -o output.epub --toc

If my input file contains a paragraph with a CSS class like this:

<p class="something">text</p>

...then the class is stripped off in the output EPUB:

<p>text</p>

This is problematic as I need the class in order to format my text 
properly. Does anyone know why this happens and if there's any way to stop 
it?

(Additional context: I need the class because bare <br/> elements get 
transformed into <p><br/></p> paragraphs, which my CSS can't detect, so if 
anyone can stop *that* conversion instead it'd work just as well.)

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/44436778-58cb-4883-bd25-c85d32358a1e%40googlegroups.com.


* Re: Converting HTML to EPUB loses paragraph styles
       [not found] ` <44436778-58cb-4883-bd25-c85d32358a1e-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-10-22 16:10   ` John MacFarlane
       [not found]     ` <yh480ktv80kbg3.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: John MacFarlane @ 2019-10-22 16:10 UTC (permalink / raw)
  To: David Given, pandoc-discuss


Pandoc conversions can, in general, lose information.
See the beginning of the manual.

If you have a class on a div rather than a p, it should
carry through.  So if you can modify the source document
accordingly, it may work.
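
One way to apply this to an existing file is to pre-process the HTML
before handing it to pandoc. The following is a minimal sketch, not
something provided by pandoc: the function name wrap_classed_paragraphs
is hypothetical, and the regular expression assumes simple, well-formed
input with double-quoted class attributes. It moves the class from each
<p> onto a wrapping <div>.

```python
import re

# Matches <p ... class="..." ...>body</p>. Paragraphs cannot nest in HTML,
# so a non-greedy match up to the next </p> is enough for simple input.
# Assumption: class attributes are double-quoted.
_P_WITH_CLASS = re.compile(
    r'<p\b(?P<before>[^>]*?)\sclass="(?P<cls>[^"]*)"(?P<after>[^>]*)>'
    r'(?P<body>.*?)</p>',
    re.DOTALL | re.IGNORECASE,
)


def wrap_classed_paragraphs(html: str) -> str:
    """Move the class of each <p class="..."> onto a wrapping <div>."""

    def repl(m: re.Match) -> str:
        # Keep any other attributes that were on the paragraph.
        other = (m.group("before") + m.group("after")).strip()
        p_open = "<p " + other + ">" if other else "<p>"
        return '<div class="{}">{}{}</p></div>'.format(
            m.group("cls"), p_open, m.group("body")
        )

    return _P_WITH_CLASS.sub(repl, html)


print(wrap_classed_paragraphs('<p class="something">text</p>'))
# prints: <div class="something"><p>text</p></div>
```

The rewritten HTML can then be passed to the same pandoc command as
before.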


* Re: Converting HTML to EPUB loses paragraph styles
       [not found]     ` <yh480ktv80kbg3.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2019-10-22 18:19       ` David Given
       [not found]         ` <CALgV52jLsZ7Ums2_QxceDOUEYWSB_e6zgmd+VgumS0f_WAmSww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: David Given @ 2019-10-22 18:19 UTC (permalink / raw)
  To: John MacFarlane; +Cc: pandoc-discuss

Thanks --- the div trick works fine as a workaround; I hadn't thought of
that. Odd that the style is retained for a div and not for a paragraph,
though. I wonder if there's a layer translating via something semantically
equivalent to markdown.

-- 
┌─── http://www.cowlark.com ───
│ "I have always wished for my computer to be as easy to use as my
│ telephone; my wish has come true because I can no longer figure out
│ how to use my telephone." --- Bjarne Stroustrup

* Re: Converting HTML to EPUB loses paragraph styles
       [not found]         ` <CALgV52jLsZ7Ums2_QxceDOUEYWSB_e6zgmd+VgumS0f_WAmSww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-10-22 18:31           ` BPJ
  0 siblings, 0 replies; 4+ messages in thread
From: BPJ @ 2019-10-22 18:31 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Yes, everything is translated into Pandoc's internal format, which is mostly
feature-equivalent with Pandoc's Markdown. Attributes are only supported on
code, code blocks, divs, spans, links and images. With the div workaround,
the only practical difference I'm aware of is that you need to use child
selectors in CSS.
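
To illustrate the child-selector point: a rule written for p.something
has to target div.something > p once the class lives on a wrapping div.
The helper below is only a sketch under that assumption. The name
to_child_selector is hypothetical, and it expects a bare selector
string, not a whole stylesheet.

```python
import re

# A "p" type selector followed by one or more class names, e.g. p.a or p.a.b.
# The lookbehind stops matches inside other names such as "sup.x" or ".p.x".
_P_CLASS = re.compile(r'(?<![\w.#-])p((?:\.[A-Za-z_][\w-]*)+)')


def to_child_selector(selector: str) -> str:
    """Rewrite p.cls into div.cls > p, leaving other selector parts alone."""
    return _P_CLASS.sub(r'div\1 > p', selector)


print(to_child_selector('p.something'))
# prints: div.something > p
```

Selectors that do not mention a classed paragraph are returned
unchanged.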

