ntg-context - mailing list for ConTeXt users
From: gnwiii@gmail.com
Subject: Re: counting the words in a TeX document
Date: Sat, 5 Aug 2006 14:52:28 -0300
Message-ID: <22af238a0608051052ycc17efaia63642c7ec44f152@mail.gmail.com>
In-Reply-To: <6faad9f00608050945g5f829eaeka4afdee9858c7df8@mail.gmail.com>

On 8/5/06, Mojca Miklavec <mojca.miklavec.lists@gmail.com> wrote:
> Hello,
>
> I would like to ask how difficult it would be to count the number of
> words in a TeX/ConTeXt document. If it's too complex, please ignore
> the rest of the message.

It wasn't too complex for Michael Downes using LaTeX:

\ProvidesFile{wordcount.tex}[2000/09/27 v1.5 Michael Downes]
% Copyright 2000 Michael John Downes
% This file has no restrictions on its use, distribution, or sale.
%
% If you run LaTeX on wordcount.tex it will prompt you for the name of a
% document to be counted. For most people, however, it will be more
% convenient to run the shell script wordcount.sh, giving the document
% name as the first argument. The comments in wordcount.sh
% give further information about the usage and limitations of this tool.

% The fundamental idea is to mark each character and interword space
% with a unique tag that will show up in TeX "showbox" output. Then
% arrange to make the output routine trigger a TeX overfull vbox message
% for the page box so that everything gets reported in the TeX log.
% Then run grep -c (or an equivalent text search utility, e.g., perl) on
% the log file to count the occurrences.
% [....]
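
The final counting step described in those comments (grep -c on the log) can be sketched in Python as below. This is only an illustration of the counting, not part of wordcount.tex: the marker "WORDTAG" and the sample lines are placeholders I made up, and the real tag has to be taken from the file itself. Like grep -c, it counts matching lines, so two tags on one line would count once.

```python
import re


def count_matching_lines(log_text, pattern):
    """Count lines containing a match, like `grep -c PATTERN file`."""
    regex = re.compile(pattern)
    return sum(1 for line in log_text.splitlines() if regex.search(line))


# Placeholder input: "WORDTAG" is a hypothetical marker, and these lines
# are not real TeX log output.
sample_log = "\n".join([
    "placeholder line with WORDTAG",
    "placeholder line without the marker",
    "another WORDTAG line",
])

if __name__ == "__main__":
    print(count_matching_lines(sample_log, r"WORDTAG"))
```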

> Most recipes for LaTeX say that it's best to do something like
> "pdftotext" and then issue "wc" to count the words in the resulting
> text file, but Windows users don't have "wc" and sometimes you only
> need to know the length of the abstract or so ...

Many GNU utilities have been ported to Windows (GnuWin32), or can be
implemented in Perl or Ruby, which ConTeXt uses anyway.
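
To show how little is involved, here is a minimal stand-in for "wc -w". It is written in Python purely for illustration (the same few lines would work in a scripting language already on the system), and it counts whitespace-separated tokens, which is roughly what wc reports. The script name used below is hypothetical.

```python
import sys


def count_words(text):
    """Count whitespace-separated tokens, roughly what `wc -w` reports."""
    return len(text.split())


if __name__ == "__main__":
    # Print one "count filename" line per file named on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            print(count_words(handle.read()), path)
```

Saved as, say, wc_words.py, it would be run on the text produced by pdftotext:

    pdftotext paper.pdf paper.txt
    python wc_words.py paper.txt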

> Some time ago Hans mentioned that he counts the number of occurrences
> of single characters, but I don't know how difficult it would be to
> extend it to count the number of words.
>
> The problem is not that well defined (how to handle equations, some
> would probably want to exclude headers, footers, buttons, ...), but it
> only needs to be an approximation and "backward compatibility" (in the
> sense that the counter would have to give the same number after some
> years) is not needed at all since algorithms might improve with time
> and the resulting document doesn't really depend on that number, it
> would only be written to the log file.
>
> My idea for the interface would be something like
>
> \startwordcount[abstract]
> \startframedtext
> Bla bla.
> \stopframedtext
> \stopwordcount
>
> which would write something like "abstract: 2 words" to the log file
>
> or
>
> \startstatistics[abstract][words]
> \startframedtext
> Bla bla.
> \stopframedtext
> \stopstatistics
>
> But this is really a low priority. I'm currently using Acrobat to copy
> the text, then I paste it into Office and take a look at statistics
> there when I need to obey some limitations.
>
> So, if there's a simple solution, I would be glad to use it, but if it
> takes too much time to implement it, it's probably not worth the
> effort.

ConTeXt already analyzes the "scratch" files with Perl or Ruby, so if
you can adapt Michael Downes's idea, it shouldn't be a big deal to have
texexec print the result.
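
For a feel of what such a report could look like, here is a rough sketch that scans the source file directly. Note the assumptions: this is not Downes's method (it never looks at typeset output), \startwordcount and \stopwordcount are only the interface proposed in the quoted message and do not exist in ConTeXt as far as this thread shows, and stripping commands with a regular expression is a crude approximation that ignores math, macros that produce text, and so on. It is written in Python for illustration.

```python
import re

# \startwordcount[name] ... \stopwordcount, the interface proposed above.
BLOCK = re.compile(r"\\startwordcount\[([^\]]*)\](.*?)\\stopwordcount", re.DOTALL)
# An unescaped % starts a comment that runs to the end of the line.
COMMENT = re.compile(r"(?<!\\)%.*")
# A control word (\foo) or control symbol (\%), plus any [..] arguments.
COMMAND = re.compile(r"\\(?:[A-Za-z]+|.)(?:\[[^\]]*\])*")
# A word: a run of letters, optionally joined by apostrophes or hyphens.
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def count_blocks(source):
    """Return (name, word count) for each marked block, in document order."""
    source = COMMENT.sub("", source)
    results = []
    for name, body in BLOCK.findall(source):
        text = COMMAND.sub(" ", body)  # drop commands, keep their text
        results.append((name, len(WORD.findall(text))))
    return results


sample = r"""
\startwordcount[abstract]
\startframedtext
Bla bla.
\stopframedtext
\stopwordcount
"""

if __name__ == "__main__":
    for name, words in count_blocks(sample):
        print(f"{name}: {words} words")
```

On the example from the quoted message this prints "abstract: 2 words".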

-- 
George N. White III <aa056@chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia
