BTW, I just saw that Pandoc of course also escapes the {} after index, so those need to be fixed as well. The easiest way is to push the output through sed, too:

    pandoc D.docx -t latex | sed -E 's/\+\+index\\\{([^}]*)\\\}/\\index{\1}/g' > D.tex

Hendrik Seliger wrote on Friday, 4 February 2022 at 14:46:40 UTC+1:

> Hi!
>
> I just ran into the same problem and solved it somewhat manually, but it
> worked well for a 200+ page document.
> Basic approach: unzip the docx file (these are zip archives with a
> different filename extension), then tweak it a bit to insert a text
> marker for each index entry that we can later use in the file converted
> by Pandoc, re-zip the beast and then use Pandoc to create a LaTeX file.
> With the scripts below, the converted file will have a
> '++index{Index entry}' for each entry. Of course, the '++index' needs to
> be find-and-replaced with '\index'. Done.
>
> Here are the details (please excuse the Markdown, I copied it from my
> personal wiki). Hope this helps someone out there…
>
>
> # Preserving indices
> Pandoc does not do indices. So to keep them, unzip the Word file, open
> `document.xml` in Atom, and replace the index entries with the LaTeX
> command.
>
> First make all xml-tags stand on one line. Then replace all index entries
> with a LaTeX command. I am using `++` instead of the `\` to make later
> replacement in the TeX file easier. _Of course, remember that after
> conversion in Pandoc the `++` needs to be turned into a `\` manually for
> the index commands to work._
>
> Now the difficult part: any lines before and after a line starting with
> `++index` need to be removed, back to and up to the nearest line starting
> with something other than `<`. And there could be several index commands
> running into each other without any normal text in between.
>
> So we pull out every index entry and make a `++index` out of it using
> `sed`, which is dirty, but quick.
> Then perl is dropped onto the file to pull the `++index` before any
> xml-tags, which puts it right behind the normal text, where it should go
> later. Then we write the rest out as is, to make sure all xml-tags Word
> kept open are properly closed. One small alteration: any text in the Word
> index command is simply replaced by _FOO_, so it is easier to track if
> anything went wrong or these index entries somehow pop up again.
>
> This can be achieved with the following perl script saved as
> WordIndex2LaTeX.pl (and made executable, so chmod +x WordIndex2LaTeX.pl):
> ```
> #!/usr/bin/perl
>
> # @Author: Hendrik G. Seliger
> # @Date: 4 February 2022, 11:34 +01:00
> # @Filename: WordIndex2LaTeX.pl
> # @Last modified time: 4 February 2022, 11:36 +01:00
> # @License: GPL3
> # @Copyright: © Copyright 2022 by Hendrik G. Seliger
>
> ###
> use strict;
> use warnings;
>
> my $keptlines  = '';    # buffered xml-tag and XE lines
> my $indexlines = '';    # buffered ++index lines
>
> # NOTE: the archived mail lost the text between '<' and '>', so the
> # input handle and the first test below are a reconstruction.
> while ( <STDIN> ) {
>     if ( ( $_ =~ /^</ ) || ( $_ =~ /^XE / ) ) {
>         # Found an xml-tag or the word index entry
>         $keptlines .= $_;     # save the line
>     } elsif ( $_ =~ /^\+\+index/ ) {
>         # Found an index entry. Now, put the LaTeX command BEFORE the
>         # Word tag, so that the tags are correctly opened and closed,
>         # but the LaTeX command appears first
>         $indexlines .= $_;    # save the line
>     } else {
>         # normal text line: print all kept ones and current, erase memory
>         print $indexlines;
>         print $keptlines;
>         print $_;
>         $keptlines  = '';
>         $indexlines = '';
>     }
> }
> # flush whatever is still buffered at the end of the file
> print $indexlines;
> print $keptlines;
> ```
>
> The conversion is then achieved with
>
> ```
> mkdir D
> cd D
> unzip ../MyDoc.docx
> cd word
> # NOTE: the second sed (newline before each '<') is a reconstruction;
> # the archived mail lost that part of the line.
> cat document.xml | sed -E 's/>/>\n/g' | sed -E 's/(.)</\1\n</g' |
>   sed -E 's/^XE "(.*)"/++index{\1}\nXE "FOO"/g' |
>   ../../WordIndex2LaTeX.pl | tr -d '\n' > document2.xml
> ```
>
> Back up `document.xml` and rename `document2.xml` to `document.xml`.
> Re-zip the document:
> ```
> mv document.xml ../..
> mv document2.xml document.xml
> cd ..
> zip -r ../D.docx *
> cd ..
> ```
>
>
> John MacFarlane wrote on Monday, 30 August 2021 at 04:07:17 UTC+2:
>
>> Sorry, indexes aren't supported.
>>
>> DJ Penton writes:
>>
>> > I am new to pandoc. I am under enormous time pressure to convert a docx
>> > file to latex. This has worked beautifully except that alphabetical
>> > index entries in the docx file do not seem to be preserved. I would
>> > have expected a latex \index{} tag. Is there a way to do this?
>> >
>> > I apologise for asking a question that has probably been answered
>> > repeatedly. I spent 15 minutes searching for an answer and didn't see
>> > one. Probably I just missed it. I must continue with other work on the
>> > document for now.
>> >
>> > Anyway, thanks in advance; be kind :-)
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "pandoc-discuss" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msgid/pandoc-discuss/3da0bbf7-36e8-4511-9766-7777ec427133n%40googlegroups.com.
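
For anyone following the recipe in this thread, here is a minimal sketch of the final clean-up step. It assumes, as reported here, that Pandoc writes each placeholder as `++index\{Entry\}`. The function name `fix_index` and the sample sentence are made up for illustration. The entry text is matched with `[^}]*` because `sed -E` has no non-greedy `.*?`, and a greedy `.*` would swallow everything between the first and the last entry on a line.

```shell
# Hypothetical helper: turn the escaped ++index\{...\} placeholders that
# come out of Pandoc into real \index{...} commands.
fix_index() {
  # \+\+index  literal "++index"
  # \\\{       literal "\{" (the escaped opening brace)
  # ([^}]*)    the entry text, stopping before the closing brace
  # \\\}       literal "\}" (the escaped closing brace)
  sed -E 's/\+\+index\\\{([^}]*)\\\}/\\index{\1}/g'
}

# Two entries on one line, to check that they are converted separately.
printf '%s\n' 'Cats++index\{cat\} and dogs++index\{dog\} differ.' | fix_index
# prints: Cats\index{cat} and dogs\index{dog} differ.
```

In the conversion itself this would be used as `pandoc D.docx -t latex | fix_index > D.tex`.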