public inbox archive for pandoc-discuss@googlegroups.com
* map p+class and span+class to para and char style names in html to docx, odt, icml and vice versa
@ 2015-09-22 12:40 massifrg
From: massifrg @ 2015-09-22 12:40 UTC
  To: pandoc-discuss


Hello,
I'm working on documents marked up with XHTML.
I wrote some utilities to convert them to DOCX, PDF (through PrinceXML or 
ConTeXt) and ICML.
Those utilities are far from complete and I'd like to use Pandoc instead.

It would be great to convert <p class=...> and <span class=...> elements to 
some corresponding paragraph and character styles in docx, odt and ICML.
The concepts of paragraph styles and character styles are common to Word, 
OpenOffice/LibreOffice Writer and InDesign (among others).
They map well to HTML's p+class and span+class.

In Pandoc, paragraphs lack attributes (see Text.Pandoc.Definition 
<http://hackage.haskell.org/package/pandoc-types-1.12.4.4/docs/Text-Pandoc-Definition.html>), 
although there is a workaround (see 
<https://groups.google.com/forum/#!searchin/pandoc-discuss/paragraph$20attributes/pandoc-discuss/hmcT7edsHd8/SH-l8AWYiqoJ>).

It would be really useful if Pandoc mapped p+class and span+class elements 
to para and char styles with the same name in docx, odt, icml.
What do you think?

I think it should be an option that you could toggle (e.g. "--map-styles"), 
something like (or working together with) --reference-odt and --reference-docx 
(and maybe --reference-icml or --reference-idml in the future), but not 
limited to a fixed set of styles.

I don't know how these styles should be marked up in Markdown, but since the 
feature would be specific to those formats, the Markdown writer could simply 
ignore it.
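As a rough illustration of what such a mapping would have to extract from XHTML, here is a minimal Python sketch using only the standard library's `html.parser`. It is not pandoc code: the class name `StyleRunCollector` is hypothetical, the `class` value is taken verbatim as the style name (a multi-class value is not split), and only `<p>` and `<span>` are handled.

```python
from html.parser import HTMLParser


class StyleRunCollector(HTMLParser):
    """Collect each <p> as (paragraph_style, [(character_style, text), ...]).

    A style is the raw value of the class attribute, or None if absent.
    Hypothetical helper for illustration; not part of pandoc.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._span_classes = []  # class of each currently open <span>
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "p":
            self._in_p = True
            self.paragraphs.append((cls, []))
        elif tag == "span" and self._in_p:
            self._span_classes.append(cls)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False
            self._span_classes.clear()
        elif tag == "span" and self._span_classes:
            self._span_classes.pop()

    def handle_data(self, data):
        if not self._in_p:
            return  # ignore text outside paragraphs
        # The innermost open <span> that has a class decides the character style.
        style = next((c for c in reversed(self._span_classes) if c), None)
        self.paragraphs[-1][1].append((style, data))


collector = StyleRunCollector()
collector.feed('<p class="Body">A <span class="myStyle">word</span> with a custom style.</p>')
collector.close()
print(collector.paragraphs)
# [('Body', [(None, 'A '), ('myStyle', 'word'), (None, ' with a custom style.')])]
```

Each (style, text) run corresponds to one character style range, and each paragraph style to one paragraph style range, which is the structure Word, Writer and InDesign all share.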

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/6f4a2ed7-3eb3-4f09-8fc2-07c823e62ff2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

* Re: map p+class and span+class to para and char style names in html to docx, odt, icml and vice versa
@ 2015-12-29 15:07 massifrg
From: massifrg @ 2015-12-29 15:07 UTC
  To: pandoc-discuss


Let me rephrase and simplify the question.
Example:

A <span class="myStyle">word</span> with a custom style.

Convert it from markdown to HTML (pandoc -f markdown -t html) and you get:

<p>A <span class="myStyle">word</span> with a custom style.</p>

Convert it from markdown to ICML (pandoc -f markdown -t icml) and you get:

<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>A </Content>
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>word</Content>
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content> with a custom style.</Content>
  </CharacterStyleRange><Br />
</ParagraphStyleRange>

The styled word is put in a CharacterStyleRange of its own, but there's no 
trace of the class attribute.
Is there a way to get this:

<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>A </Content>
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="$ID/myStyle">
    <Content>word</Content>
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content> with a custom style.</Content>
  </CharacterStyleRange><Br />
</ParagraphStyleRange>

This way, when you import the ICML into InDesign, in a document where myStyle 
is already defined as a character style, you get the right formatting.
The same approach could work for DOCX and ODT, with reference documents that 
contain the styles you need.
I used the class attribute to carry the style name, but another attribute 
could be used instead: the choice is just a convention.
I think this "style mapping" should be disabled by default and enabled with 
a command-line option.
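A minimal Python sketch of producing the desired output, assuming a paragraph has already been reduced to a paragraph style plus a list of (character style, text) runs. `icml_paragraph` is a hypothetical helper, not part of pandoc. One assumption differs from the example above: user-defined styles are referenced as `CharacterStyle/myStyle`, by analogy with `ParagraphStyle/Paragraph`, since `$ID/` appears to be reserved for InDesign's built-in names such as `$ID/NormalCharacterStyle`; whether `$ID/myStyle` would also resolve has not been checked.

```python
from xml.sax.saxutils import escape, quoteattr

# Pandoc's ICML output (above) uses this for text without a custom character style.
NORMAL_CHARACTER_STYLE = "$ID/NormalCharacterStyle"


def icml_paragraph(para_style, runs):
    """Render one paragraph as an ICML ParagraphStyleRange.

    runs: list of (character_style, text); None means no custom style.
    Assumes user-defined styles are referenced as "ParagraphStyle/<name>"
    and "CharacterStyle/<name>".
    """
    lines = ["<ParagraphStyleRange AppliedParagraphStyle=%s>"
             % quoteattr("ParagraphStyle/" + para_style)]
    for char_style, text in runs:
        applied = (NORMAL_CHARACTER_STYLE if char_style is None
                   else "CharacterStyle/" + char_style)
        lines.append("  <CharacterStyleRange AppliedCharacterStyle=%s>"
                     % quoteattr(applied))
        lines.append("    <Content>%s</Content>" % escape(text))  # XML-escape the text
        lines.append("  </CharacterStyleRange>")
    lines[-1] += "<Br />"  # paragraph break, placed as in pandoc's output above
    lines.append("</ParagraphStyleRange>")
    return "\n".join(lines)


print(icml_paragraph("Paragraph", [(None, "A "), ("myStyle", "word"),
                                   (None, " with a custom style.")]))
```

Apart from the style prefix, the printed markup has the same shape as the desired ICML in the message.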

* Re: map p+class and span+class to para and char style names in html to docx, odt, icml and vice versa
@ 2015-12-31 17:23 mb21
From: mb21 @ 2015-12-31 17:23 UTC
  To: pandoc-discuss


So what you're proposing is to extend the functionality described in 
https://github.com/jgm/pandoc/issues/2542 so that it:
- covers not only DOCX, but also ODT and ICML
- covers not only Inlines but also Blocks (i.e. not only "character styles" 
but also "paragraph styles")

You're welcome to add your comments to that issue!

You suggest using a Span for Inlines, so using a Div for Blocks would be 
consistent. Also, as you mentioned, Para unfortunately doesn't currently 
support attributes in Pandoc's AST anyway.
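To see what a writer would have to work with under this Span/Div convention, the sketch below walks pandoc's JSON AST (the format `pandoc -t json` emits and JSON filters receive) and lists Div classes as candidate paragraph styles and Span classes as candidate character styles. The function name and the "paragraph"/"character" labels are illustrative assumptions; the sample document is abbreviated (real output also carries metadata and an API version, and the top-level layout differs between pandoc versions).

```python
def style_names(node, found=None):
    """Collect (kind, class) pairs from pandoc's JSON AST.

    Div classes are treated as candidate paragraph styles and Span classes
    as candidate character styles. Works on any subtree, since it simply
    walks every JSON list and object.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("t") in ("Div", "Span"):
            attr, _content = node["c"]
            _identifier, classes, _key_values = attr
            kind = "paragraph" if node["t"] == "Div" else "character"
            found.extend((kind, cls) for cls in classes)
        for value in node.values():
            style_names(value, found)
    elif isinstance(node, list):
        for item in node:
            style_names(item, found)
    return found


# Abbreviated AST for:
# <div class="myPara"><p>A <span class="myStyle">word</span></p></div>
example = {
    "blocks": [
        {"t": "Div", "c": [["", ["myPara"], []], [
            {"t": "Para", "c": [
                {"t": "Str", "c": "A"},
                {"t": "Space"},
                {"t": "Span", "c": [["", ["myStyle"], []],
                                    [{"t": "Str", "c": "word"}]]},
            ]},
        ]]},
    ],
}
print(style_names(example))  # [('paragraph', 'myPara'), ('character', 'myStyle')]
```

Because the Div wraps the Para, the paragraph style has to be looked up one level above the paragraph itself, which is the price of Para having no attributes.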

Btw, you can always write your own filter (see 
http://pandoc.org/scripting.html) to modify Pandoc's AST and insert, for 
example, raw ICML like: [RawBlock (Format "icml") "<ParagraphStyleRange ... 
</ParagraphStyleRange>"]
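Along the lines of that suggestion, here is a minimal JSON filter sketch in Python (standard library only) that works at the inline level: it replaces a Span whose content is plain text with a raw ICML `CharacterStyleRange` named after the Span's first class. Assumptions, none verified against a specific pandoc release: the `CharacterStyle/` prefix, the JSON shape of `RawInline` in current pandoc versions, and that the ICML writer emits raw `icml` inlines verbatim inside the surrounding `ParagraphStyleRange`. Spans containing anything other than words and spaces are left untouched, and the style is not declared in the ICML output, so this relies on the InDesign document already defining it, as in the scenario described above.

```python
import json
import sys
from xml.sax.saxutils import escape, quoteattr


def plain_text(inlines):
    """Return the text of a list of inlines, or None if it contains
    anything other than words, spaces and soft line breaks."""
    parts = []
    for inline in inlines:
        kind = inline.get("t")
        if kind == "Str":
            parts.append(inline["c"])
        elif kind in ("Space", "SoftBreak"):
            parts.append(" ")
        else:
            return None
    return "".join(parts)


def span_to_icml(span):
    """Turn a classed, plain-text Span into a raw ICML CharacterStyleRange."""
    (_identifier, classes, _key_values), inlines = span["c"]
    text = plain_text(inlines)
    if not classes or text is None:
        return span  # leave it to pandoc's normal ICML output
    raw = ("<CharacterStyleRange AppliedCharacterStyle=%s>"
           "<Content>%s</Content></CharacterStyleRange>"
           % (quoteattr("CharacterStyle/" + classes[0]), escape(text)))
    return {"t": "RawInline", "c": ["icml", raw]}


def walk(node):
    """Rebuild the JSON AST bottom-up, converting Spans on the way."""
    if isinstance(node, list):
        return [walk(item) for item in node]
    if isinstance(node, dict):
        node = {key: walk(value) for key, value in node.items()}
        return span_to_icml(node) if node.get("t") == "Span" else node
    return node


def main():
    doc = json.load(sys.stdin)
    if sys.argv[1] == "icml":  # pandoc passes the output format as argv[1]
        doc = walk(doc)
    json.dump(doc, sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved under a name of your choice (say, the hypothetical `span_styles.py`), it would be invoked with something like `pandoc --filter ./span_styles.py -f html -t icml input.html`; for other output formats it passes the document through unchanged. A Div-level variant producing a whole `ParagraphStyleRange` as a RawBlock would also have to render the paragraph's inlines itself, which is why this sketch stops at Spans.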


* Re: map p+class and span+class to para and char style names in html to docx, odt, icml and vice versa
@ 2016-01-02 14:28 massifrg
From: massifrg @ 2016-01-02 14:28 UTC
  To: pandoc-discuss


Thank you for the answer and the links, mb21.
When I have something to contribute, I'll add it to issue 2542.

I'll follow these guidelines:
- use the current AST (even if p+attrs would map better than div+attrs to 
paragraph styles)
- follow jgm's comments on issue 2542 (map only to styles that already exist, 
and use a "style-" prefix)
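A minimal sketch of how those two constraints could combine, assuming the convention is a class such as `style-myStyle` and that the set of existing style names has already been read from the reference document (reading it is not shown). The prefix handling and the name `mapped_style` are illustrative assumptions, not a description of what pandoc implements.

```python
STYLE_PREFIX = "style-"  # assumed convention, e.g. class="style-myStyle"


def mapped_style(classes, known_styles):
    """Return the style named by the first "style-" class whose name is in
    known_styles, or None so the writer falls back to its default style.

    Classes without the prefix are ignored, so ordinary CSS classes
    never turn into Word/ODT/InDesign styles by accident.
    """
    for cls in classes:
        if cls.startswith(STYLE_PREFIX):
            name = cls[len(STYLE_PREFIX):]
            if name in known_styles:
                return name
    return None


# Hypothetical style names, standing in for those found in a reference document.
reference_styles = {"myStyle", "Block Quote"}
print(mapped_style(["note", "style-myStyle"], reference_styles))  # myStyle
print(mapped_style(["style-Missing"], reference_styles))          # None
```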
