public inbox archive for pandoc-discuss@googlegroups.com
From: Ioan Muntean <imuntean-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: "pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org"
	<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Docx reader and numbered customized styles
Date: Wed, 15 Nov 2023 05:32:00 +0000
Message-ID: <SN7PR15MB5635AB592F88C9B03F88840DF9B1A@SN7PR15MB5635.namprd15.prod.outlook.com>
In-Reply-To: <ZVPw1A54Xry2zGHT@localhost>

Bastien,
Thanks! This looks helpful. I will try the Lua ByteStringReader first and then lpeg. My first question: how do I handle a docx file as a zip archive in Lua? Or should I unzip the docx first, as a step in a .bat pipeline?
Thanks in advance!
Ioan

________________________________
From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> on behalf of Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
Sent: Tuesday, November 14, 2023 4:12 PM
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Docx reader and numbered customized styles

I guess it would involve writing a custom reader for the docx format that calls `pandoc.read(input, 'docx')` to get the Pandoc AST of the document, uncompresses the DOCX file, reads the "styles" file, and sets global metadata in the AST matching the configuration of the Heading styles. A filter could then use this metadata when exporting to LaTeX.
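
A minimal sketch of the "uncompress the DOCX file, read the styles file" step, written in Python rather than Lua so that it can be run outside pandoc. It assumes the styles part is stored at word/styles.xml inside the archive and that the heading style IDs start with "Heading" (localized Word installations may use other IDs). The function name heading_numbering is made up for illustration. It only reports whether each heading style is linked to a numbering definition (a w:numPr element); it does not follow basedOn inheritance between styles.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def w_attr(element, name):
    """Return the value of a w:-namespaced attribute, or None if absent."""
    return element.get(f"{{{W_NS}}}{name}")


def heading_numbering(docx):
    """Map heading style IDs to their numbering link, or None if unnumbered.

    `docx` is a path or a binary file-like object (hypothetical helper).
    """
    # A .docx file is a zip archive; read the styles part directly from it.
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))

    result = {}
    for style in root.findall("w:style", NS):
        style_id = w_attr(style, "styleId") or ""
        # Assumption: heading styles have IDs such as "Heading1", "Heading2".
        if not style_id.lower().startswith("heading"):
            continue
        num_pr = style.find("w:pPr/w:numPr", NS)
        if num_pr is None:
            result[style_id] = None  # style carries no numbering link
            continue
        num_id = num_pr.find("w:numId", NS)
        ilvl = num_pr.find("w:ilvl", NS)
        result[style_id] = {
            "numId": w_attr(num_id, "val") if num_id is not None else None,
            "ilvl": w_attr(ilvl, "val") if ilvl is not None else None,
        }
    return result
```

Calling heading_numbering("your-file.docx") returns a dictionary keyed by style ID. The numbering format itself (decimal, roman, and so on) is stored in word/numbering.xml under the referenced numId, so a complete solution would have to read that part as well before writing anything into the metadata.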

Still, I think it would be easier to rename the heading styles, or to insert some information at the beginning of your file that the filter processes and removes during LaTeX export.

On Tuesday 14 November 2023 at 01:55:56PM, Ioan Muntean wrote:
> Hi Bastien
> I have a related question, not about special styles but about the built-in
> Heading 1, Heading 2, etc.
> In my MS Word document, Heading 1 and so on are numbered with a specific
> multilevel list. I am curious whether there is a way to pass the type of
> numbering from the Heading 1 style in Word to markdown or later to LaTeX. I
> often work with Lua filters, but in the -t native output for docx, headings
> carry no such information: no outline numbered list, no special paragraphs. So
> how do we recover the numbering of the Heading styles?
> One way to deal with it would be to rename Heading 1 to headingsnumbered 1 and
> handle that special style. Is there any other way to do this?
> Thanks in advance!
> Ioan
>
> On Thursday, October 26, 2023 at 11:49:05 AM UTC-5 Bastien DUMONT wrote:
>
>     > So is the -f docx+styles working with the docx reader, too? If so, how?
>
>     -f docx+styles means “use the docx reader and enable the ‘styles’
>     extension”, so yes! As is written in the manual, it renders the styles as
>     divs and spans with a “custom-style” attribute. You will have to use a
>     filter to convert some of these divs and spans to whatever code you want in
>     your LaTeX file.
>
>     Or are you talking about customized lists, not custom styles?
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to [1]pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit [2]https://groups.google.com/d/msgid/
> pandoc-discuss/5652a76c-59ab-4056-ac00-92732e13698en%40googlegroups.com.
>
> References:
>
> [1] mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> [2] https://groups.google.com/d/msgid/pandoc-discuss/5652a76c-59ab-4056-ac00-92732e13698en%40googlegroups.com?utm_medium=email&utm_source=footer



Thread overview: 9+ messages
2023-10-26 15:27 Ioan Muntean
     [not found] ` <53f12b55-0d77-42de-bba2-b88e91f59eecn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-10-26 15:36   ` Bastien DUMONT
2023-10-26 16:24     ` Ioan Muntean
     [not found]       ` <6bc0ec42-4f2b-4832-8b08-827b913669cen-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-10-26 16:48         ` Bastien DUMONT
2023-10-31 15:09           ` Ioan Muntean
2023-11-14 21:55           ` Ioan Muntean
     [not found]             ` <5652a76c-59ab-4056-ac00-92732e13698en-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-11-14 22:12               ` Bastien DUMONT
2023-11-15  5:32                 ` Ioan Muntean [this message]
     [not found]                   ` <SN7PR15MB5635AB592F88C9B03F88840DF9B1A-xz2k8ToKDo8Cw+E34yPvHod3EbNNOtPMvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2023-11-15  8:06                     ` Bastien DUMONT
