public inbox archive for pandoc-discuss@googlegroups.com
From: "Pablo Rodríguez" <oinos-S0/GAf8tV78@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: HTML attributes not being stripped off
Date: Mon, 12 Nov 2012 20:14:26 +0100
Message-ID: <50A14A92.9060301@web.de>
In-Reply-To: <20121111223615.GE4399-9Rnp8PDaXcZ2EAH53EmH34tHsfhOvSUSZkel5v8DVj8@public.gmane.org>

Thank you for your explanation, John.

I'm afraid I cannot code, and pandoc's internal representation of the
document (sorry if the naming isn't accurate, but this is really Greek
to me [χαλεπὰ τὰ καλά, I agree :-)]) is beyond my extremely limited
understanding of these matters.

I understand (and I hope I'm not wrong) that pandoc is deliberately less
flexible than HTML. (This can be a problem for some uses: the Spanish
law gazette, for example, uses no headings and instead distinguishes its
<p> elements by class.)

Leaving HTML aside, I think pandoc should allow any element, text span
or division to be given a unique identifier, one or more classes, and a
language.

I know that this is related to a couple of messages I sent yesterday.
Sorry for repeating myself, but these are basic features for writing
documents.

In terms of the documentation, this would mean attaching the type Attr
to every constructor of data Block and data Inline, and to TableCell as
well.

Although the language could be stored as a key-value pair in type Attr,
I think it is clearer to define a separate, dedicated language attribute.
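
A minimal Haskell sketch of this proposal follows. It is only an
illustration, not the actual pandoc-types definitions: the Attr synonym
mirrors the one in Text.Pandoc.Definition, but the Attributes record, the
attribute-carrying Emph, Para, Span and Div constructors, and the helper
functions are all hypothetical names introduced here.

```haskell
import Data.Maybe (maybeToList)

-- Same shape as Attr in pandoc-types: (identifier, classes, key-value pairs).
type Attr = (String, [String], [(String, String)])

-- Hypothetical: Attr plus a dedicated language field, as proposed above.
data Attributes = Attributes
  { attrCore :: Attr
  , attrLang :: Maybe String
  } deriving (Show, Eq)

-- Hypothetical, heavily reduced Inline type: every container carries Attributes.
data Inline
  = Str String
  | Space
  | Emph Attributes [Inline]
  | Span Attributes [Inline]
  deriving (Show, Eq)

-- Hypothetical, heavily reduced Block type.
data Block
  = Para Attributes [Inline]
  | Div Attributes [Block]
  deriving (Show, Eq)

-- No identifier, no classes, no key-value pairs, no language.
noAttr :: Attributes
noAttr = Attributes ("", [], []) Nothing

-- Attributes that set only the language.
withLang :: String -> Attributes
withLang l = noAttr { attrLang = Just l }

-- Models: <p>to tag <em lang="la">lingua latina</em>.</p>
sample :: Block
sample =
  Para noAttr
    [ Str "to", Space, Str "tag", Space
    , Emph (withLang "la") [Str "lingua", Space, Str "latina"]
    , Str "."
    ]

-- Collect every language tag found in an inline, outermost first.
inlineLangs :: Inline -> [String]
inlineLangs (Emph a xs) = maybeToList (attrLang a) ++ concatMap inlineLangs xs
inlineLangs (Span a xs) = maybeToList (attrLang a) ++ concatMap inlineLangs xs
inlineLangs _           = []

-- Collect every language tag found in a block.
blockLangs :: Block -> [String]
blockLangs (Para a xs) = maybeToList (attrLang a) ++ concatMap inlineLangs xs
blockLangs (Div a bs)  = maybeToList (attrLang a) ++ concatMap blockLangs bs
```

With these definitions the lang="la" on the emphasised text survives in
the document structure, so a writer could emit it again:
blockLangs sample evaluates to ["la"].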

Is there anything wrong with this approach?

Many thanks for your help,



Pablo

On 11/11/12 23:36, John MacFarlane wrote:
> You've got to remember that pandoc converts the input format to an
> internal representation of the document (the 'Pandoc' structure), and
> then converts that to the output format.
> 
> This internal representation (see
> http://hackage.haskell.org/packages/archive/pandoc-types/1.9.1/doc/html/Text-Pandoc-Definition.html)
> is much less expressive than HTML, and doesn't have a place for the
> attributes you want.  That's why they are lost on HTML -> HTML
> translation.
> 
> +++ Pablo Rodríguez [Nov 11 12 12:19 ]:
>> Hi John,
>>
>> I'm using pandoc mainly to generate ePub files.
>>
>> I used textile first as source language, but it isn't fully implemented
>> by pandoc and textile itself has issues with multiparagraph elements.
>>
>> It seems HTML is probably a much better option for pandoc as source
>> language, although I have to forget footnotes. There is no way to have
>> it all.
>>
>> But pandoc strips almost all attributes from HTML elements.
>>
>> A minimal sample:
>>
>> <ol start="2" style="list-style-type:lower-latin;">
>> <li><p>Well there is no other way to tag <em lang="la">lingua
>> latina</em>.</p></li>
>> <li><p>Or even classes or ids.</p></li>
>> </ol>
>>
>> Would it be possible that there is an option that doesn't strip off
>> attributes from HTML code?
>>
>> BTW, when converting from HTML to another HTML code, at least id, class
>> and lang attributes shouldn't be stripped off by default.
>>
>> Many thanks for your help,
>>
>>
>> Pablo
>> -- 
>> http://www.ousia.tk
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To unsubscribe from this group, send email to pandoc-discuss+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
> 

-- 
http://www.ousia.tk
