If you want a home-grown output format you may want to try a custom writer written in Lua. You may also want a filter which "splits" runs of styled text with other styles embedded in them into the kind of combined style runs you want. What I have done in the past is a "fake" writer, implemented as a filter, which inserted enough raw markup to make Pandoc's plain output format look like POD, the documentation format used with Perl. That was written in Perl, before there were Lua filters. A filter written in Perl or Python is probably still more powerful than one written in Lua. Another possibility is to read in the JSON representation of Pandoc's internal representation, which is much simpler than the DOCX XML, and convert it into your own format without ever sending it back to Pandoc. This may work well, especially if whitespace between block content isn't significant in your format.
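
A minimal sketch of that last approach, in Python and under several assumptions: the JSON comes from `pandoc -t json`, only `Str`, `Space`, `SoftBreak`, `Strong`, `Emph` and `Strikeout` inlines are handled, and the tag names (`del`, `bold`, `ital`) and the `+` joiner are taken from Andrew's example rather than from any real format. The sample document is hand-written to mirror his sentence; a real DOCX may well nest the styles differently.

```python
import json
from itertools import groupby

# Pandoc inline constructors mapped to the tag names used in this thread
# (the tag names are illustrative, not part of any real format).
STYLE_TAGS = {"Strikeout": "del", "Strong": "bold", "Emph": "ital"}


def runs(inlines, active=frozenset()):
    """Yield (styles, text) for every leaf inline, where styles is the
    set of all style tags enclosing that leaf."""
    for node in inlines:
        kind, content = node["t"], node.get("c")
        if kind == "Str":
            yield active, content
        elif kind in ("Space", "SoftBreak"):
            yield active, " "
        elif kind in STYLE_TAGS:
            yield from runs(content, active | {STYLE_TAGS[kind]})
        # Other inline types (links, code, notes, ...) are skipped here.


def render(inlines):
    """Merge adjacent leaves with identical style sets and delimit each run."""
    out = []
    for styles, group in groupby(runs(inlines), key=lambda run: run[0]):
        text = "".join(piece for _, piece in group)
        if styles:
            tag = "+".join(sorted(styles))  # e.g. {"del", "bold"} -> "bold+del"
            out.append(f"<{tag}>{text}</{tag}>")
        else:
            out.append(text)
    return "".join(out)


def convert(doc):
    """Render every Para/Plain block of a Pandoc JSON document."""
    return [render(b["c"]) for b in doc["blocks"] if b["t"] in ("Para", "Plain")]


# Hand-written, abridged stand-in for `pandoc -t json` output.
SAMPLE = json.loads("""
{"blocks": [{"t": "Para", "c": [{"t": "Strikeout", "c": [
  {"t": "Str", "c": "ou"}, {"t": "Space"},
  {"t": "Strong", "c": [{"t": "Str", "c": "cest"}]}, {"t": "Space"},
  {"t": "Str", "c": "par"}, {"t": "Space"},
  {"t": "Str", "c": "cette"}, {"t": "Space"},
  {"t": "Str", "c": "force,"}, {"t": "Space"},
  {"t": "Strong", "c": [{"t": "Str", "c": "que"}]}, {"t": "Space"},
  {"t": "Str", "c": "la"}, {"t": "Space"},
  {"t": "Str", "c": "planette"}
]}]}]}
""")

for line in convert(SAMPLE):
    print(line)
```

For real input, `convert` could be fed `json.load(sys.stdin)` with Pandoc's JSON output piped in. Because the result never goes back to Pandoc, the output syntax is entirely up to the script.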

As for XML, I have the same attitude to it as to taking my medicine: no fun and bad taste, but I have to deal with it. Give me an OO abstraction layer, preferably tailored to the application, any day, and please, please don't make me have to search it! In particular I would have a word or two to say to the person who came up with using XML for data!


On Wed, 14 Aug 2019 at 14:33, Andrew Brown <c18.org.c18-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Indeed, BP, I want to convert to my own format in which every change of typographic style has its own delimiters -- and my use of angle brackets is unconnected with html. Once I have my text in that state I can convert it by calculation within FMP to whatever I need, Adobe Tagged Text perhaps, or TeX, or CSS.

I probably have the solution with docconverter.pro (see my second message above) so in the first instance I shall try that on the final files, not yet received, and will return to Pandoc if need be.

Many thanks for your kind offer of assistance. I'm probably one of the older persons on this list and consider that it has all been downhill since 1980, with XML making the final fall into the abyss of unbounded prolixity, arm-in-arm with TEI. There's probably a term for my condition -- other than senility -- which is probably incurable.

AB



On Wednesday, 14 August 2019 at 12:36:16 UTC+2, BP Jonsson wrote:
John, I think Andrew wants to convert to some other format, perhaps DocBook?

Andrew, is the above correct? In any case please tell me which format you want to convert to. Also, can you please send a DOCX file containing your example as an attachment, so that I can inspect Pandoc's internal representation of it and see if I can write a Lua filter which gives you the output you want?

It wouldn't surprise me if this is due to some buggy ebook reader(s) having a picky, nonstandard idea of what things should look like. Have you tried several ebook readers on Pandoc's output? If so have you got different results or the same?

Do you mean that you don't want the space between words to be struck out? That should be fairly easy to fix with a Lua filter, I think. I would however need to know which output formats you are converting to so that the filter can support them all.
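
If the goal is only to keep the spaces between words out of the strikeout, the idea can be sketched as below. This is written in Python against Pandoc's JSON AST rather than as a Lua filter, and it is a sketch under assumptions: the function name is made up, and only spaces that are direct children of a `Strikeout` are moved out (a space inside a nested `Strong` stays where it is).

```python
def split_strikeout(inlines):
    """Return a new inline list in which no Strikeout wraps a top-level space."""
    out = []
    for node in inlines:
        if node["t"] != "Strikeout":
            out.append(node)
            continue
        word = []  # inlines collected since the last space
        for child in node["c"]:
            if child["t"] in ("Space", "SoftBreak"):
                if word:
                    out.append({"t": "Strikeout", "c": word})
                    word = []
                out.append(child)  # the space itself stays outside the Strikeout
            else:
                word.append(child)
        if word:
            out.append({"t": "Strikeout", "c": word})
    return out


# Hand-written example: one Strikeout spanning two words.
struck = [{"t": "Strikeout", "c": [
    {"t": "Str", "c": "la"}, {"t": "Space"}, {"t": "Str", "c": "planette"},
]}]
print(split_strikeout(struck))
```

Since the result is still ordinary Pandoc AST, it would work the same way for any output format that supports strikeout.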

As for colored text, underline etc., Pandoc doesn't support all kinds of styled text out of the box. However, you should be able to use named styles instead of automatic styles in your DOCX document, together with a Pandoc Lua filter, to work around it. If you already have DOCX documents using automatic styles you may be able to convert them with a Word or LibreOffice macro. I have used Linux exclusively for several years so my Word skills are a bit stale, but if it is OK to temporarily convert the DOCX file to an ODT file I may be able to help you, even with batch conversion.

On Tue, 13 Aug 2019 at 20:36, John MacFarlane <j...-TVLZxgkOlNWn+EJxYGL2xA@public.gmane.org> wrote:

I'm confused about what problem you're having.
In HTML, this is absolutely correct:

    <del>ou <bold>cest</bold> par cette force, <bold>que</bold> la
    planette</del>

The scope of the del element is the entire thing.  (And there is
no <bold+del> element in HTML.)

Andrew Brown <c18....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> We have several hundred pages in .docx similar to this
>
> ou cest par cette force, que la planette
>
>
> Having tried epub3, html, tei, db5, epub, icml, muse, rst and textile I am
> getting at best something like this
>
> <del>ou <bold>cest</bold> par cette force, <bold>que</bold> la
> planette</del>
>
>
> while I need
>
> <del>ou </del><bold+del>cest</bold+del><del> par cette force,
> </del><bold+del>que</bold+del><del> la planette</del>
>
>
> The result will be stored in Filemaker Pro and converted by calculation to
> Adobe Tagged Text.
>
> Icml comes close, but treats underlined text as normal text, a bug surely.
>
> Hopeless quest?
>
> AB
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/35ac9cc9-d30f-46f9-a2e5-4aa19cddc3b0%40googlegroups.com.
