public inbox archive for pandoc-discuss@googlegroups.com
From: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
To: Mikhail Ramendik <mr-eJ/51bLfIl8ox3rIn2DAYQ@public.gmane.org>,
	pandoc-discuss
	<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: reading html, <h1 class="title"> header ignored
Date: Tue, 27 Aug 2019 17:53:55 -0700
Message-ID: <yh480kk1ayxg7w.fsf@johnmacfarlane.net>
In-Reply-To: <684df614-496b-455f-aa2d-e602b19c96b0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


I take back one thing I said.  The <h1 class="title"> just
gets skipped; it doesn't get parsed into a metadata title.
If it did, it would appear as a title heading in your ODT
document and I think you'd have no complaints.

So maybe this can be solved by changing the HTML reader to
insert the contents of <h1 class="title"> into the title
metadata, if it hasn't already been populated by a
<title> tag in the header (I assume your document lacks one?)
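
The fallback proposed above can be sketched outside pandoc. What follows is
a minimal Python illustration of the rule only, not pandoc's HTML reader
(which is written in Haskell). The names TitleFinder and metadata_title are
hypothetical, and the sketch assumes only the first <h1 class="title"> matters.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of <title> and of the first <h1 class="title">."""

    def __init__(self):
        super().__init__()
        self.head_title = ""    # text found inside <title>
        self.h1_title = ""      # text found inside the first <h1 class="title">
        self._target = None     # name of the field currently being filled
        self._h1_done = False   # True once the first title heading has closed

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "title":
            self._target = "head_title"
        elif tag == "h1" and "title" in classes and not self._h1_done:
            self._target = "h1_title"

    def handle_endtag(self, tag):
        if tag == "h1" and self._target == "h1_title":
            self._h1_done = True
        if tag in ("title", "h1"):
            self._target = None

    def handle_data(self, data):
        if self._target:
            setattr(self, self._target, getattr(self, self._target) + data)


def metadata_title(html):
    """Prefer <title>; fall back to <h1 class="title">; else None."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.head_title.strip() or finder.h1_title.strip() or None
```

With this rule, a document that has no <title> element but does have an
<h1 class="title"> still ends up with a title, which is the case under
discussion here.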

Mikhail Ramendik <mr-eJ/51bLfIl8ox3rIn2DAYQ@public.gmane.org> writes:

> Hello, 
>
> Thank you very much for your response!
>
> On Tuesday, August 27, 2019 at 5:33:24 PM UTC+1, John MacFarlane wrote:
>>
>>
>> One possibility would be to change pandoc's HTML reader so that
>> <h1 class="title"> is normally parsed as a regular level-1
>> heading, UNLESS <meta name="generator" content="pandoc"> is
>> present in the head section.  That would allow nice round
>> tripping from pandoc but not get in the way of other
>> HTML-producers.
>>
>
>
>> However, it may be that pandoc's current behavior is actually 
>> better in many cases, even when processing HTML produced by 
>> other sources.  So it's quite possible that making this change 
>> would lead to a surge of complaints. (Comments welcome on this.) 
>>
>
> I would suggest that this behaviour become the default, BUT you add a 
> command line option to invoke the present behaviour.
>
> So:
>
> - with <meta name="generator" content="pandoc">, process <h1 class="title"> as metadata
> - with --title-metadata (or similar), process <h1 class="title"> as metadata
> - otherwise process <h1 class="title"> as a header
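
The three rules in the list above reduce to a single predicate. This is a
hedged sketch of the decision only. The function name is hypothetical, and
--title-metadata is the option proposed in this thread, not an existing
pandoc flag. The generator argument is assumed to be the content of the
page's generator meta tag, or None when there is none.

```python
def h1_title_becomes_metadata(generator, title_metadata_flag=False):
    """Decide how <h1 class="title"> should be read under the proposal.

    Returns True when the heading should become the metadata title and
    False when it should stay an ordinary level-1 header.
    """
    # Rule 1: the HTML was produced by pandoc itself.
    from_pandoc = generator is not None and generator.strip().lower() == "pandoc"
    # Rule 2: the user asked for the present behaviour explicitly
    # (the proposed, hypothetical --title-metadata option).
    # Rule 3: otherwise the heading is treated as a header.
    return from_pandoc or bool(title_metadata_flag)
```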
>  
>
>>
>> Another, probably better approach would be to parse 
>> <h1 class="title"> as a metadata title when pandoc is run 
>> with --standalone, but not when pandoc is run in fragment mode.
>
>
> But I want to get a complete ODT document as output. Don't I need to use 
> --standalone? If I do then this fix would do nothing for me.
>  
>
>>
>> A workaround for you would be to preprocess the input, or 
>> run in --standalone mode and use a lua filter that extracts 
>> the metadata title and inserts a level 1 header with its content 
>> at the beginning of the document. 
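
The filter workaround described above could look roughly like this. The
message suggests a lua filter; this sketch instead works in Python on
pandoc's JSON representation of the document, and the function name
title_to_heading is hypothetical. It only helps when the title metadata is
actually populated, which per the correction at the top of this message is
not the case for a skipped <h1 class="title">.

```python
def title_to_heading(doc):
    """Prepend the metadata title to the body as a level-1 Header.

    `doc` is a pandoc JSON document already decoded into a dict.
    The document is returned unchanged when there is no usable title.
    """
    title = doc.get("meta", {}).get("title")
    if title is None:
        return doc
    if title["t"] == "MetaInlines":
        inlines = title["c"]
    elif title["t"] == "MetaString":
        inlines = [{"t": "Str", "c": title["c"]}]
    else:
        return doc  # other metadata shapes are left alone in this sketch
    # Header content is [level, [identifier, classes, key-values], inlines].
    heading = {"t": "Header", "c": [1, ["", [], []], inlines]}
    return {**doc, "blocks": [heading] + doc.get("blocks", [])}
```

To run it as a JSON filter, decode standard input with json.load, apply
the function, and write the result to standard output with json.dump.
The sketch leaves the metadata title in place, so a writer that renders
the title would show it twice.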
>>
>
> Preprocessing the input with a mere search and replace, changing 
> class="title" to class="meow", is a simple approach that works. But it is a 
> mandatory extra step.
>
>  Yours, Mikhail Ramendik 
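
The search-and-replace preprocessing described above can be scripted. This
is a minimal sketch. The function name is hypothetical, and it assumes the
heading is written exactly as class="title", with double quotes and no
other classes. Unlike a global search and replace, it only touches <h1> tags.

```python
import re

# Matches the class="title" attribute inside an <h1 ...> start tag.
H1_TITLE = re.compile(r'(<h1\b[^>]*\bclass=")title(")', re.IGNORECASE)


def rename_title_class(html, new_class="meow"):
    """Rename class="title" on <h1> tags so the reader keeps the heading."""
    return H1_TITLE.sub(lambda m: m.group(1) + new_class + m.group(2), html)
```

The rewritten HTML can then be passed to pandoc as usual, for example
piped to pandoc -f html -o out.odt.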


