From: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
To: Mikhail Ramendik <mr-eJ/51bLfIl8ox3rIn2DAYQ@public.gmane.org>,
    pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: reading html, <h1 class="title"> header ignored
Date: Tue, 27 Aug 2019 17:53:55 -0700
Message-ID: <yh480kk1ayxg7w.fsf@johnmacfarlane.net>
In-Reply-To: <684df614-496b-455f-aa2d-e602b19c96b0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>

I take back one thing I said. The <h1 class="title"> just
gets skipped; it doesn't get parsed into a metadata title.
If it did, it would appear as a title heading in your ODT
document and I think you'd have no complaints.

So maybe this can be solved by changing the HTML reader to
insert the contents of <h1 class="title"> into the title
metadata, if it hasn't already been populated by a
<title> tag in the header (I assume your document lacks one?)
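
The fallback described above can be sketched in a few lines. This is only an
illustration of the rule (prefer <title>, otherwise use the first
<h1 class="title">), written in Python with the standard html.parser module;
it is not pandoc's HTML reader, and the names TitleScanner and metadata_title
are made up for this example.

```python
from html.parser import HTMLParser


class TitleScanner(HTMLParser):
    """Collect the text of <title> and of the first <h1 class="title">."""

    def __init__(self):
        super().__init__()
        self.head_title = ""   # text found inside <title>
        self.h1_title = ""     # text found inside the first <h1 class="title">
        self._target = None    # name of the field currently being filled
        self._h1_done = False  # True once the first matching <h1> has closed

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "title":
            self._target = "head_title"
        elif tag == "h1" and "title" in classes and not self._h1_done:
            self._target = "h1_title"

    def handle_endtag(self, tag):
        if tag == "title" and self._target == "head_title":
            self._target = None
        elif tag == "h1" and self._target == "h1_title":
            self._target = None
            self._h1_done = True

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) is collected as well.
        if self._target is not None:
            setattr(self, self._target, getattr(self, self._target) + data)


def metadata_title(html):
    """Return the <title> text if non-empty, else the <h1 class="title"> text."""
    scanner = TitleScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.head_title.strip() or scanner.h1_title.strip() or None
```

With this rule, a document that has no <title> but does have
<h1 class="title"> would still end up with a metadata title, while a document
with both keeps the one from <title>.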

Mikhail Ramendik <mr-eJ/51bLfIl8ox3rIn2DAYQ@public.gmane.org> writes:
> Hello,
>
> Thank you very much for your response!
>
> On Tuesday, August 27, 2019 at 5:33:24 PM UTC+1, John MacFarlane wrote:
>>
>>
>> One possibility would be to change pandoc's HTML reader so that
>> <h1 class="title"> is normally parsed as a regular level-1
>> heading, UNLESS <meta generator="pandoc"> is present in the
>> head section. That would allow nice round tripping from pandoc
>> but not get in the way of other HTML-producers.
>>
>
>
>> However, it may be that pandoc's current behavior is actually
>> better in many cases, even when processing HTML produced by
>> other sources. So it's quite possible that making this change
>> would lead to a surge of complaints. (Comments welcome on this.)
>>
>
> I would suggest that this behaviour become the default, BUT you add a
> command line option to invoke the present behaviour.
>
> So:
>
> - with <meta generator="pandoc">, process <h1 class="title"> as metadata
> - with --title-metadata (or similar), process <h1 class="title"> as metadata
> - otherwise process <h1 class="title"> as a header
>
>
>>
>> Another, probably better approach would be to parse
>> <h1 class="title"> as a metadata title when pandoc is run
>> with --standalone, but not when pandoc is run in fragment mode.
>
>
> But I want to get a complete ODT document as output. Don't I need to use
> --standalone? If I do then this fix would do nothing for me.
>
>
>>
>> A workaround for you would be to preprocess the input, or
>> run in --standalone mode and use a lua filter that extracts
>> the metadata title and inserts a level 1 header with its content
>> at the beginning of the document.
>>
>
> Preprocessing the input with a mere search and replace, changing
> class="title" to class="meow", is a simple approach that works. But it is a
> mandatory extra step.
>
> Yours, Mikhail Ramendik
>
>>
>>
>
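
The filter workaround quoted above (take the metadata title and insert a
level-1 header at the start of the document) was suggested as a Lua filter.
As a rough sketch of the same idea, here is a JSON filter in Python that uses
only the standard library. The function names are made up for this example.
Note that it only helps when the metadata title is actually populated, for
instance from a <title> tag; as said at the top of this message, the
<h1 class="title"> element itself is currently skipped rather than parsed
into metadata.

```python
import json
import sys


def promote_title(doc):
    """Move the metadata title, if any, into a leading level-1 Header block."""
    title = doc.get("meta", {}).pop("title", None)
    if title is None:
        return doc
    if title["t"] == "MetaInlines":
        inlines = title["c"]
    elif title["t"] == "MetaString":
        inlines = [{"t": "Str", "c": title["c"]}]
    else:
        # Other metadata shapes (e.g. MetaBlocks) are left untouched in this sketch.
        doc["meta"]["title"] = title
        return doc
    # Header content is [level, [identifier, classes, key-value pairs], inlines].
    header = {"t": "Header", "c": [1, ["", [], []], inlines]}
    doc["blocks"].insert(0, header)
    return doc


def main():
    # A JSON filter reads the document AST on stdin and writes it to stdout.
    json.dump(promote_title(json.load(sys.stdin)), sys.stdout)
```

Saved as a script that ends with a call to main() (say, a hypothetical
promote_title.py), this could be passed to pandoc with --filter while running
in --standalone mode. Removing the title from the metadata is a choice made
here so that it does not show up twice in the output; keeping it would be
equally easy.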