From: Aditya Mahajan <adityam@umich.edu>
Subject: Re: counting the words in a TeX document
Date: Sat, 5 Aug 2006 13:02:34 -0400 (EDT)
Message-ID: <Pine.WNT.4.63.0608051252050.3556@nqvgln>
In-Reply-To: <6faad9f00608050945g5f829eaeka4afdee9858c7df8@mail.gmail.com>
On Sat, 5 Aug 2006, Mojca Miklavec wrote:
> I would like to ask how difficult it would be to count the number of
> words in a TeX/ConTeXt document. If it's too complex, please ignore
> the rest of the message.
>
>
> Most recipes for LaTeX say that it's best to do something like
> "pdftotext" and then issue "wc" to count the words in the resulting
> text file, but Windows users don't have "wc" and sometimes you only
> need to know the length of the abstract or so ...
>
> Some time ago Hans mentioned that he counts the number of appearances
> of single characters, but I don't know how difficult it would be to
> extend it to count the number of words.
>
> The problem is not that well defined (how to handle equations, some
> would probably want to exclude headers, footers, buttons, ...), but it
> only needs to be an approximation and "backward compatibility" (in the
> sense that counter would have to result in the same number after some
> years) is not needed at all since algorithms might improve with time
> and the resulting document doesn't really depend on that number, it
> would only be written to the log file.
>
> My idea for the interface would be something like
>
> \startwordcount[abstract]
> \startframedtext
> Bla bla.
> \stopframedtext
> \stopwordcount
>
> which would write something like "abstract: 2 words" to the log file
>
> or
>
> \startstatistics[abstract][words]
> \startframedtext
> Bla bla.
> \stopframedtext
> \stopstatistics
>
> But this is really a low priority. I'm currently using Acrobat to copy
> the text, then I paste it into Office and take a look at statistics
> there when I need to obey some limitations.
>
> So, if there's a simple solution, I would be glad to use it, but if it
> takes too much time to implement it, it's probably not worth the
> effort.
A very crude approach: there is a program called detex,
http://ctan.org/tex-archive/support/detex/. I have not used it, but I
think that it strips every command of the form \something from the TeX
file. You can then filter the result through wc to get a rough estimate
of the number of words. One approach that will work is

  \startstatistics[filename][words|letters|lines]

maps to

  \startbuffer[\jobname-statistics-filename]

and \stopstatistics maps to

  \stopbuffer
  \getbuffer[\jobname-statistics-filename]
  \executesystemcommand{detex \jobname-statistics-filename.tmp | wc <flags corresponding to words|lines|letters>}

and possibly prettifies the output so that it is more clearly visible
in the log.
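
For readers without detex or wc (for example on Windows), the crude
"strip the commands, then count" idea can be approximated in a few lines
of Python. This is only a sketch and not a reimplementation of detex:
the names strip_tex and count_words and the regular expressions are
assumptions made for illustration. It drops comments, \commands (with
any [..] options that follow them directly) and braces, and counts what
is left. Equations are not handled, so letters inside math are counted
as words.

```python
import re
import sys

# Hypothetical helpers for illustration; detex itself works differently.
COMMENT = re.compile(r"(?<!\\)%.*")  # unescaped % up to the end of the line
# \name, optionally starred, plus any [..] option lists right after it,
# or a backslash followed by a single character (e.g. \% or \\).
COMMAND = re.compile(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])*|\\.")
# A word is a run of letters, optionally joined by ' or - (don't, well-known).
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def strip_tex(source: str) -> str:
    """Crudely remove comments, commands and braces from TeX source."""
    text = COMMENT.sub("", source)
    text = COMMAND.sub(" ", text)
    return text.replace("{", " ").replace("}", " ")


def count_words(source: str) -> int:
    """Return a rough word count for a piece of TeX source."""
    return len(WORD.findall(strip_tex(source)))


if __name__ == "__main__":
    # Usage: python count_words.py file.tex
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(count_words(handle.read()))
```

Applied to the "Bla bla." example quoted above, count_words gives 2. The
number is only an approximation, which matches what was asked for.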
Another approach would be to write a Vim script that counts the number
of words in a visually selected area.
Aditya