From: Pablo Rodríguez
Newsgroups: gmane.text.pandoc
Subject: Re: HTML attributes not being stripped off
Date: Mon, 12 Nov 2012 20:14:26 +0100
Message-ID: <50A14A92.9060301@web.de>
References: <509F89B3.4070403@web.de> <20121111223615.GE4399@Johns-MacBook-Air-2.local>

Thank you for your explanation, John.

I'm afraid I cannot code, and pandoc's internal representation of the
document (sorry if the naming isn't accurate, but this is really Greek
to me [χαλεπὰ τὰ καλά, I agree :-)]) is beyond my extremely limited
understanding of these matters.

I can understand (and I hope I'm not wrong) that pandoc cannot be as
flexible as HTML and that this is on purpose. (This might be problematic
for some uses, as the Spanish law gazette uses no headings, but it
distinguishes the different

with different classes.)

Not focusing specifically on HTML, I think that pandoc should allow
uniquely identifying any element, desired text span or division,
adding it to a class, and setting its language.

I know that this is related to a couple of messages I sent yesterday.
Sorry for repeating myself, but these are basic features for writing
documents.

From the documentation perspective, this would mean applying the type
Attr to every constructor of data Block and data Inline, and to data
TableCell. Although language could be defined as a key-value pair in
type Attr, I think it is clearer to define a new, specific language
attribute.

Is there anything wrong with this approach?

Many thanks for your help,

Pablo

On 11/11/12 23:36, John MacFarlane wrote:
> You've got to remember that pandoc converts the input format to an
> internal representation of the document (the 'Pandoc' structure), and
> then converts that to the output format.
>
> This internal representation (see
> http://hackage.haskell.org/packages/archive/pandoc-types/1.9.1/doc/html/Text-Pandoc-Definition.html)
> is much less expressive than HTML, and doesn't have a place for the
> attributes you want. That's why they are lost on HTML -> HTML
> translation.
>
> +++ Pablo Rodríguez [Nov 11 12 12:19 ]:
>> Hi John,
>>
>> I'm using pandoc mainly to generate ePub files.
>>
>> I used textile first as source language, but it isn't fully implemented
>> by pandoc and textile itself has issues with multiparagraph elements.
>>
>> It seems HTML is probably a much better option for pandoc as source
>> language, although I have to forget footnotes. There is no way to have
>> it all.
>>
>> But pandoc strips almost all attributes from HTML elements.
>>
>> A minimal sample:
>>

>> 1. Well there is no other way to tag lingua latina.
>> 2. Or even classes or ids.
>>
>> Would it be possible that there is an option that doesn't strip off
>> attributes from HTML code?
>>
>> BTW, when converting from HTML to another HTML code, at least id, class
>> and lang attributes shouldn't be stripped off by default.
>>
>> Many thanks for your help,
>>
>>
>> Pablo
>> --
>> http://www.ousia.tk
>>
>> --
>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To unsubscribe from this group, send email to pandoc-discuss+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.

--
http://www.ousia.tk
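
For readers following the thread, the proposal above (attaching Attr to
every Block and Inline constructor) can be illustrated with a small
Haskell sketch. This is a toy model under stated assumptions, not the
real pandoc-types definitions: only the Attr triple (identifier,
classes, key-value pairs) follows the shape used in pandoc-types, while
the Span and Div constructors, the Attr field on Para, and the helpers
langOf, renderAttr, renderInline and renderBlock are hypothetical names
made up for this example. Language is stored here as an ordinary "lang"
key-value pair, which is the simpler of the two options mentioned in the
message.

```haskell
-- Toy model only: NOT the real pandoc-types definitions.

-- Same shape as pandoc's Attr: (identifier, classes, key-value pairs).
type Attr = (String, [String], [(String, String)])

-- Hypothetical inline type in which a span of text can carry attributes.
data Inline
  = Str String
  | Space
  | Span Attr [Inline]      -- hypothetical constructor
  deriving (Show, Eq)

-- Hypothetical block type in which every constructor carries an Attr.
data Block
  = Para Attr [Inline]      -- hypothetical Attr field
  | Div Attr [Block]        -- hypothetical constructor
  deriving (Show, Eq)

-- Language looked up as a plain key-value pair (an assumption of this sketch).
langOf :: Attr -> Maybe String
langOf (_, _, kvs) = lookup "lang" kvs

-- Render an Attr as HTML attributes. No escaping is done; sketch only.
renderAttr :: Attr -> String
renderAttr (ident, classes, kvs) =
  concat $
    [ " id=\"" ++ ident ++ "\"" | not (null ident) ] ++
    [ " class=\"" ++ unwords classes ++ "\"" | not (null classes) ] ++
    [ " " ++ k ++ "=\"" ++ v ++ "\"" | (k, v) <- kvs ]

renderInline :: Inline -> String
renderInline (Str s)        = s
renderInline Space          = " "
renderInline (Span attr xs) =
  "<span" ++ renderAttr attr ++ ">" ++ concatMap renderInline xs ++ "</span>"

renderBlock :: Block -> String
renderBlock (Para attr xs) =
  "<p" ++ renderAttr attr ++ ">" ++ concatMap renderInline xs ++ "</p>"
renderBlock (Div attr bs) =
  "<div" ++ renderAttr attr ++ ">" ++ concatMap renderBlock bs ++ "</div>"

main :: IO ()
main = putStrLn (renderBlock sample)
  where
    latin  = Span ("", [], [("lang", "la")]) [Str "lingua", Space, Str "latina"]
    sample = Para ("intro", ["note"], []) [Str "Tag", Space, latin, Str "."]
-- prints: <p id="intro" class="note">Tag <span lang="la">lingua latina</span>.</p>
```

Because the attributes live in the intermediate structure itself, they
survive the reader-to-writer round trip in this toy model. With an
internal representation that has no such field, there is nowhere to
store them, which matches John's explanation of why they are lost on
HTML -> HTML conversion.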