public inbox archive for pandoc-discuss@googlegroups.com
From: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
To: Mikhail Ramendik <mr-eJ/51bLfIl8ox3rIn2DAYQ@public.gmane.org>,
	pandoc-discuss
	<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: reading html, <h1 class="title"> header ignored
Date: Tue, 27 Aug 2019 09:33:07 -0700
Message-ID: <m2mufuefgc.fsf@johnmacfarlane.net>
In-Reply-To: <8a9e115c-2983-47d7-a7df-82af5d73822c-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


This is because pandoc uses <h1 class="title"> for metadata
titles when rendering HTML.  So, for better round-trip consistency,
we parse these as metadata when reading HTML.

Every once in a while someone runs into this issue when using HTML
created elsewhere that uses the same class.

It would probably have been better, in retrospect, to use a
class like "pandoc-title".  I'd be reluctant to change that now,
though, since it would affect lots of customized templates.

One possibility would be to change pandoc's HTML reader so that
<h1 class="title"> is normally parsed as a regular level-1
heading, UNLESS <meta name="generator" content="pandoc"> is present
in the head section.  That would allow nice round-tripping from
pandoc but not get in the way of other HTML producers.
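The proposed rule could be checked from the head alone. The Python below is a minimal sketch of that check, not pandoc's actual reader (which is written in Haskell); the names `GeneratorSniffer` and `title_h1_is_metadata` are hypothetical, and it assumes pandoc marks its standalone HTML output with a `<meta name="generator" content="pandoc">` tag.

```python
from html.parser import HTMLParser


class GeneratorSniffer(HTMLParser):
    """Records whether the head declares <meta name="generator" content="pandoc...">."""

    def __init__(self):
        super().__init__()
        self.from_pandoc = False
        self.past_head = False  # set once </head> or <body> has been seen

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.past_head = True
        elif tag == "meta" and not self.past_head:
            attr = dict(attrs)  # attribute values may be None for bare attributes
            name = (attr.get("name") or "").lower()
            content = (attr.get("content") or "").lower()
            if name == "generator" and content.startswith("pandoc"):
                self.from_pandoc = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.past_head = True


def title_h1_is_metadata(html_text: str) -> bool:
    """Hypothetical rule: read <h1 class="title"> as the metadata title only
    when the head says pandoc generated the file; otherwise it would be
    read as an ordinary level-1 heading."""
    sniffer = GeneratorSniffer()
    sniffer.feed(html_text)
    sniffer.close()
    return sniffer.from_pandoc
```

Self-closing tags such as `<meta ... />` are covered because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.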

However, it may be that pandoc's current behavior is actually
better in many cases, even when processing HTML produced by
other sources.  So it's quite possible that making this change
would lead to a surge of complaints. (Comments welcome on this.)

Another, probably better, approach would be to parse
<h1 class="title"> as a metadata title when pandoc is run
with --standalone, but not when pandoc is run in fragment mode.
(Currently, in fragment mode, the h1 just disappears, since
no metadata is created.)  Feel free to add an issue to the
tracker suggesting this (and, comments welcome from anyone).

A workaround for you would be to preprocess the input, or
run in --standalone mode and use a Lua filter that extracts
the metadata title and inserts a level-1 heading with its content
at the beginning of the document.
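The filter workaround could look roughly like the following. This is a sketch in Python as a JSON filter rather than the Lua filter mentioned; it assumes the pandoc-types 1.17+ JSON layout (top-level `meta` and `blocks`) and that the HTML reader stores the title as `MetaInlines`. The file name `title_to_header.py` is hypothetical.

```python
#!/usr/bin/env python3
"""Sketch: move the metadata title into a level-1 Header (pandoc JSON filter)."""
import json
import sys


def title_to_header(doc):
    """Remove meta.title and prepend it to the body as a level-1 Header.

    Assumes the pandoc-types 1.17+ JSON layout:
    {"pandoc-api-version": [...], "meta": {...}, "blocks": [...]}.
    """
    meta = doc.setdefault("meta", {})
    title = meta.get("title")
    # Only MetaInlines titles are handled; anything else is left untouched.
    if not title or title.get("t") != "MetaInlines":
        return doc
    # Extract the title so templates that render $title$ do not print it twice.
    del meta["title"]
    header = {"t": "Header", "c": [1, ["", [], []], title["c"]]}
    doc["blocks"] = [header] + doc.get("blocks", [])
    return doc


# pandoc runs JSON filters with the output format as the first argument;
# requiring it keeps the module importable without reading stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(title_to_header(json.load(sys.stdin)), sys.stdout)
```

It would be invoked as something like `pandoc --standalone -f html -t odt --filter ./title_to_header.py -o out.odt in.html`. Because metadata holds at most one title, this recovers a single heading. For files with many `<h1 class="title">` chapter titles, preprocessing the HTML (for instance, renaming the class) is the more direct route.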



Mikhail Ramendik <mr-eJ/51bLfIl8ox3rIn2DAYQ@public.gmane.org> writes:

> Hello,
>
> I am converting an HTML file to ODT. (The problem is with the reader, not
> the writer, as it also reproduces when converting to MediaWiki.)
>
> My HTML generator uses the <h1 class="title"> markup for chapter titles. 
> And these titles end up entirely missing on pandoc output.
>
> If I do a replace in the file so the tag looks like <h1 class="meow"> 
> instead and then convert, the titles are in place.
>
> How can I make pandoc process chapter titles that are marked up with <h1 
> class="title"> and include them in the output? Or do I need to create a 
> bug/issue somewhere?
>
> $ pandoc --version 
> pandoc 2.1.2 
> Compiled with pandoc-types 1.17.3.1, texmath 0.10.1.2, skylighting 0.6
>
> (Installed from Fedora 29 repository).
>
> Yours, Mikhail Ramendik