From: BP Jonsson
Newsgroups: gmane.text.pandoc
Subject: Re: Docx to explicit codes
Date: Wed, 14 Aug 2019 12:36:02 +0200
To: pandoc-discuss
Cc: Andrew Brown

John, I think Andrew wants to convert to some other format, perhaps DocBook?

Andrew, is the above correct? In any case, please tell us which format you want to convert to. Also, could you please send a DOCX file containing your example as an attachment, so that I can inspect Pandoc's internal representation of it and see whether I can write a Lua filter that gives you the output you want?

It wouldn't surprise me if this is due to some buggy ebook reader(s) having a picky, nonstandard idea of what things should look like. Have you tried several ebook readers on Pandoc's output? If so, did you get different results or the same ones?

Do you mean that you don't want the spaces between words to be struck out? That should be fairly easy to fix with a Lua filter, I think. I would, however, need to know which output formats you are converting to, so that the filter can support them all.

As for colored text, underline etc.: Pandoc doesn't support all kinds of styled text out of the box. However, you should be able to work around this by using named styles instead of automatic styles in your DOCX document, together with a Pandoc Lua filter. If you already have DOCX documents using automatic styles, you may be able to convert them with a Word or LibreOffice macro. I have used Linux exclusively for several years, so my Word skills are a bit stale, but if it is OK to temporarily convert the DOCX file to an ODT file, I may be able to help you, even with batch conversion.

On Tue, 13 Aug 2019 at 20:36, John MacFarlane wrote:

> I'm confused about what problem you're having.
> In HTML, this is absolutely correct:
>
>     <del>ou <bold>cest</bold> par cette force, <bold>que</bold> la
>     planette</del>
>
> The scope of the del element is the entire thing.  (And there is
> no <bold+del> element in HTML.)
>
> Andrew Brown writes:
>
> > We have several hundred pages in .docx similar to this
> >
> > ou cest par cette force, que la planette
> >
> >
> > Having tried epub3, html, tei, db5, epub, icml, muse, rst and textile I am
> > getting at best something like this
> >
> > <del>ou <bold>cest</bold> par cette force, <bold>que</bold> la
> > planette</del>
> >
> >
> > while I need
> >
> > <del>ou </del><bold+del>cest</bold+del><del> par cette force,
> > </del><bold+del>que</bold+del><del> la planette</del>
> >
> >
> > The result will be stored in Filemaker Pro and converted by calculation to
> > Adobe Tagged Text.
> >
> > Icml comes close, but treats underlined text as normal text, a bug surely.
> >
> > Hopeless quest?
> >
> > AB

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFC_yuTGfyBcABzrU%2B0EYX5430L3o1YSOrbPqHn2rvKENkv-wQ%40mail.gmail.com.
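
The flattening Andrew asks for (nested formatting rewritten as a sequence of non-nested runs, each carrying its full set of formats) can be sketched independently of Pandoc. The following is a minimal Python illustration only. It is not a Pandoc Lua filter and nothing in it comes from the thread: the Span class, the function names and the tree-shaped input are all assumptions made for the example, and the combined tag names simply mimic the <bold+del> notation used in the messages above.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple, Union


@dataclass
class Span:
    """Hypothetical inline element: a format name wrapping child content."""
    fmt: str
    children: List[Union[str, "Span"]] = field(default_factory=list)


Run = Tuple[FrozenSet[str], str]


def flatten(nodes, active=frozenset()) -> List[Run]:
    """Walk the tree and return (formats, text) runs with no nesting left."""
    runs: List[Run] = []
    for node in nodes:
        if isinstance(node, str):
            if node:
                runs.append((active, node))
        else:
            # Descend with the enclosing formats plus this element's format.
            runs.extend(flatten(node.children, active | {node.fmt}))
    return runs


def merge(runs: List[Run]) -> List[Run]:
    """Join neighbouring runs that carry exactly the same set of formats."""
    merged: List[Run] = []
    for fmts, text in runs:
        if merged and merged[-1][0] == fmts:
            merged[-1] = (fmts, merged[-1][1] + text)
        else:
            merged.append((fmts, text))
    return merged


def render(runs: List[Run]) -> str:
    """Emit each run with one combined tag, e.g. <bold+del>...</bold+del>."""
    out = []
    for fmts, text in runs:
        if fmts:
            tag = "+".join(sorted(fmts))
            out.append(f"<{tag}>{text}</{tag}>")
        else:
            out.append(text)
    return "".join(out)


# The example from the thread, written as a nested tree.
doc = [
    Span("del", [
        "ou ",
        Span("bold", ["cest"]),
        " par cette force, ",
        Span("bold", ["que"]),
        " la planette",
    ])
]

if __name__ == "__main__":
    print(render(merge(flatten(doc))))
```

Run on the thread's example, this prints the sequence of runs Andrew says he needs, on a single line. A real solution would have to perform the same walk over Pandoc's own inline elements and emit whatever codes the target format expects.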