Basic approach: unzip the docx file (a docx is a zip archive with a different file extension), tweak it so that each index entry becomes a plain-text marker that survives the Pandoc conversion, re-zip the beast, and then use Pandoc to create a LaTeX file. With the scripts below, the converted file has a `++index{Index entry}` for each entry. Of course, the `++index` then needs to be find-and-replaced with `\index`. Done.

# Preserving indices
Pandoc does not do indices. So to keep them, unzip the Word file, open `document.xml` in Atom, and replace the index entries with the LaTeX command.

First put every XML tag on a line of its own. Then replace all index entries by a LaTeX command. I am using `++` instead of `\` to make later replacement in the TeX file easier. _Of course, after the Pandoc conversion the `++` needs to be manually turned into a `\` for the index commands to work._
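To see what these two steps produce, here is a minimal sketch on a made-up fragment. The function name `mark_index_entries` and the sample input are illustrative only. It assumes GNU sed (for `\n` in the replacement) and that the field text starts directly with `XE "` once the tags are split off. If your document stores it with leading whitespace, the pattern would need adjusting.

```shell
#!/bin/sh
# Sketch only: mark_index_entries is a hypothetical helper name.
# Assumes GNU sed, which accepts \n in the replacement text.
mark_index_entries() {
  sed -E 's/>/>\n/g' \
    | sed -E 's/(.)</\1\n</g' \
    | sed -E 's/^XE "(.*)"/++index{\1}\nXE "FOO"/'
}

# Made-up sample fragment in the style of word/document.xml
printf '%s' '<w:t>Some text</w:t><w:instrText>XE "Pandoc"</w:instrText>' \
  | mark_index_entries
```

Each tag and each piece of text ends up on its own line, and the line `XE "Pandoc"` is replaced by the two lines `++index{Pandoc}` and `XE "FOO"`.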

Now the difficult part: the lines before and after a line starting with `++index` need to be removed, back to and up to the nearest line starting with something other than `<`. Also, several index commands can run into each other without any normal text in between.

So we pull out every index entry and make a `++index` out of it using `sed`, which is dirty, but quick. Then Perl is run over the file to pull each `++index` in front of any XML tags, which puts it right behind the normal text, where it should go later. The rest is written out as is, to make sure all XML tags Word left open are properly closed. One small alteration: any text in the Word index command is simply replaced by _FOO_, so it is easier to track if anything went wrong or if these index entries somehow pop up again.

This can be achieved with the following Perl script, saved as `WordIndex2LaTeX.pl` and made executable with `chmod +x WordIndex2LaTeX.pl`:
```
#!/usr/bin/perl

# @Author: Hendrik G. Seliger
# @Date: 4 February 2022, 11:34 +01:00
# @Filename: WordIndex2LaTeX.pl
# @Last modified time: 4 February 2022, 11:36 +01:00
# @License: GPL3
# @Copyright: © Copyright 2022 by Hendrik G. Seliger

###

use strict;
use warnings;

my $keptlines  = '';    # buffered XML tag lines and Word XE lines
my $indexlines = '';    # buffered ++index lines

while ( my $line = <STDIN> ) {
    if ( $line =~ /^</ || $line =~ /^XE / ) {    # line with xml tag or word index entry
        $keptlines .= $line;                      # save the line
    } elsif ( $line =~ /^\+\+index/ ) {           # the + signs must be escaped in the regex
        # Found an index entry. Now, put the LaTeX command BEFORE the
        # Word tag, so that the tags are correctly opened and closed, but
        # the LaTeX command appears first
        $indexlines .= $line;                     # save the line
    } else {    # normal text line, print all kept ones and current, erase memory
        print $indexlines;
        print $keptlines;
        print $line;
        $keptlines  = '';
        $indexlines = '';
    }
}
# Flush whatever is still buffered after the last text line
print $indexlines;
print $keptlines;
```

The conversion is then run as follows, with `WordIndex2LaTeX.pl` sitting in the same directory as `MyDoc.docx`:

```
mkdir D
cd D
unzip ../MyDoc.docx
cd word
sed -E 's/>/>\n/g' document.xml \
  | sed -E 's/(.)</\1\n</g' \
  | sed -E 's/^XE "(.*)"/++index{\1}\nXE "FOO"/' \
  | ../../WordIndex2LaTeX.pl \
  | tr -d '\n' > document2.xml
```
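Before swapping the files, a quick count can help with the tracking mentioned earlier. This is only a sketch: `count_matches` is a hypothetical helper, and the sample string is made up. It relies on `grep -o`, which prints each match on its own line.

```shell
#!/bin/sh
# Sketch only: count_matches is a hypothetical helper name.
# Counts non-overlapping matches of a basic regular expression read from stdin.
count_matches() {
  grep -o -e "$1" | wc -l | tr -d ' '
}

# Made-up one-line sample in the shape of document2.xml
sample='<w:t>a++index{X}</w:t><w:instrText>XE "FOO"</w:instrText><w:t>b++index{Y Z}</w:t><w:instrText>XE "FOO"</w:instrText>'
printf '%s' "$sample" | count_matches '++index{[^}]*}'   # LaTeX-style markers
printf '%s' "$sample" | count_matches 'XE "FOO"'          # neutralised Word fields
```

On the real file, `count_matches '++index{[^}]*}' < document2.xml` and `count_matches 'XE "FOO"' < document2.xml` should report the same number if every index field was rewritten.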

Back up `document.xml` (here it is moved next to `MyDoc.docx`), rename `document2.xml` to `document.xml`, and re-zip the document:
```
mv document.xml ../..
mv document2.xml document.xml
cd ..
zip -r ../D.docx *
cd ..
```
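What remains is the Pandoc run and the final find-and-replace. The following is a sketch under assumptions rather than a tested recipe: `fix_index_markers` is a hypothetical helper name, and the pattern accepts braces both plain and escaped as `\{` and `\}`, in case the LaTeX writer escapes them.

```shell
#!/bin/sh
# Sketch only: fix_index_markers is a hypothetical helper name.
# Assumption: the converter may have written the braces as \{ and \},
# so the pattern accepts both the escaped and the plain form.
fix_index_markers() {
  sed -E 's/\+\+index\\?\{([^{}]*[^{}\\])\\?\}/\\index{\1}/g'
}

# Made-up sample line in the shape of the converted LaTeX
printf '%s\n' 'Some text++index\{Pandoc\} and more++index{Word} text.' \
  | fix_index_markers
```

With the re-zipped file from above, the conversion itself would be `pandoc D.docx -o D.tex`, followed by something like `fix_index_markers < D.tex > D-indexed.tex` (the output file name is arbitrary).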

John MacFarlane wrote on Monday, 30 August 2021 at 04:07:17 UTC+2:

Sorry, indexes aren't supported.

DJ Penton <jakep...@gmail.com> writes:

> I am new to pandoc. I am under enormous time pressure to convert a docx
> file to latex. This has worked beautifully except that alphabetical index
> entries in the docx file do not seem to be preserved. I would have expected
> a latex \index{} tag. Is there a way to do this?
>
> I apologise for asking a question that has probably been answered
> repeatedly. I spent 15 minutes searching for an answer and didn't see one.
> Probably I just missed it. I must continue with other work on the document
> for now.
>
> Anyway, thanks in advance; be kind :-)