From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: reading html, header ignored
Date: Tue, 27 Aug 2019 17:53:55 -0700
To: Mikhail Ramendik, pandoc-discuss
References: <8a9e115c-2983-47d7-a7df-82af5d73822c@googlegroups.com> <684df614-496b-455f-aa2d-e602b19c96b0@googlegroups.com>

I
take back one thing I said. The <h1 class="title"> just gets skipped;
it doesn't get parsed into a metadata title. If it did, it would
appear as a title heading in your ODT document and I think you'd have
no complaints.

So maybe this can be solved by changing the HTML reader to insert
the contents of <h1 class="title"> into the title metadata, if it
hasn't already been populated by a <title> tag in the header (I
assume your document lacks one?)

Mikhail Ramendik <mr-eJ/51bLfIl8ox3rIn2DAYQ@public.gmane.org> writes:

> Hello,
>
> Thank you very much for your response!
>
> On Tuesday, August 27, 2019 at 5:33:24 PM UTC+1, John MacFarlane wrote:
>>
>> One possibility would be to change pandoc's HTML reader so that
>> <h1 class="title"> is normally parsed as a regular level-1
>> heading, UNLESS <meta generator="pandoc"> is present in the
>> head section. That would allow nice round tripping from pandoc
>> but not get in the way of other HTML-producers.
>>
>> However, it may be that pandoc's current behavior is actually
>> better in many cases, even when processing HTML produced by
>> other sources. So it's quite possible that making this change
>> would lead to a surge of complaints. (Comments welcome on this.)
>
> I would suggest that this behaviour become the default, BUT you add a
> command line option to invoke the present behaviour.
>
> So:
>
> - with <meta generator="pandoc">, process <h1 class="title"> as metadata
> - with --title-metadata (or similar), process <h1 class="title"> as metadata
> - otherwise process <h1 class="title"> as a header
>
>> Another, probably better approach would be to parse
>> <h1 class="title"> as a metadata title when pandoc is run
>> with --standalone, but not when pandoc is run in fragment mode.
>
> But I want to get a complete ODT document as output. Don't I need to use
> --standalone? If I do then this fix would do nothing for me.
>
>> A workaround for you would be to preprocess the input, or
>> run in --standalone mode and use a lua filter that extracts
>> the metadata title and inserts a level 1 header with its content
>> at the beginning of the document.
>
> Preprocessing the input with a mere search and replace, changing
> class="title" to class="meow", is a simple approach that works. But it is a
> mandatory extra step.
>
> Yours, Mikhail Ramendik
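
For reference, the search-and-replace preprocessing described in the quoted message could be sketched roughly as below. This is only an illustration, not anything pandoc ships: the function name rename_title_class and the regular expression are assumptions made for this example, and "meow" is just the placeholder class name used in the thread. It rewrites the class only on <h1> tags whose class attribute is exactly "title", which is a little narrower than a blind replace over the whole file.

```python
import re

# Opening <h1 ...> tag whose class attribute is exactly "title".
# Group 1: everything up to and including `class=`; group 2: the quote used.
# Regex-based HTML editing is fragile; this is a sketch for simple input.
H1_TITLE = re.compile(r"""(<h1\b[^>]*\sclass=)(["'])title\2""", re.IGNORECASE)


def rename_title_class(html: str, new_class: str = "meow") -> str:
    """Return html with <h1 class="title"> changed to <h1 class="NEW_CLASS">.

    Hypothetical helper: the name and the default class are illustrative only.
    """
    def replace(match: re.Match) -> str:
        prefix, quote = match.group(1), match.group(2)
        return f"{prefix}{quote}{new_class}{quote}"

    return H1_TITLE.sub(replace, html)


sample = '<h1 class="title">My Document</h1><p class="title">kept as is</p>'
converted = rename_title_class(sample)
```

The converted text can then be handed to pandoc as usual (for example, reading it with -f html and writing an ODT file with -o), so that the heading is treated as an ordinary level-1 heading rather than being skipped.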