From: "Mojca Miklavec"
Newsgroups: gmane.comp.tex.context
Subject: counting the words in a TeX document
Date: Sat, 5 Aug 2006 18:45:59 +0200
Message-ID: <6faad9f00608050945g5f829eaeka4afdee9858c7df8@mail.gmail.com>
Reply-To: mailing list for ConTeXt users

Hello,

I would like to ask how difficult it would be to count the number of words in a TeX/ConTeXt document. If it's too complex, please ignore the rest of this message.

Most recipes for LaTeX say that it's best to run something like "pdftotext" and then use "wc" to count the words in the resulting text file, but Windows users don't have "wc", and sometimes you only need to know the length of the abstract or so ...

Some time ago Hans mentioned that he counts the number of occurrences of individual characters, but I don't know how difficult it would be to extend that to counting words.

The problem is not that well defined (how to handle equations; some would probably want to exclude headers, footers, buttons, ...), but it only needs to be an approximation. "Backward compatibility" (in the sense that the counter would have to give the same number some years from now) is not needed at all: algorithms might improve with time, and the resulting document doesn't really depend on that number, since it would only be written to the log file.

My idea for the interface would be something like

    \startwordcount[abstract]
    \startframedtext
    Bla bla.
    \stopframedtext
    \stopwordcount

which would write something like "abstract: 2 words" to the log file, or

    \startstatistics[abstract][words]
    \startframedtext
    Bla bla.
    \stopframedtext
    \stopstatistics

But this is really a low priority. I'm currently using Acrobat to copy the text, then I paste it into Office and look at the statistics there when I need to obey some limitations. So, if there's a simple solution, I would be glad to use it, but if it takes too much time to implement, it's probably not worth the effort.

Thanks a lot,
    Mojca
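
For the "wc" half of the pdftotext recipe mentioned above, here is a minimal sketch in Python for systems that lack "wc". It is not part of ConTeXt, and the function name count_words is hypothetical. It assumes the text has already been extracted (for example with pdftotext) and simply counts whitespace-separated tokens, which is roughly what "wc -w" reports. It makes no attempt to deal with equations, headers, or footers, so the result is only an approximation.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens, roughly like `wc -w`.

    Assumption: `text` is plain text already extracted from the PDF;
    punctuation attached to a word is counted as part of that word.
    """
    # str.split() with no argument splits on any run of whitespace
    # and ignores leading/trailing whitespace.
    return len(text.split())


# The sample body from the proposed interface has two words.
print(count_words("Bla bla."))  # prints 2
```

To count a whole file, read it first and pass the contents in, for example count_words(open("abstract.txt", encoding="utf-8").read()), where abstract.txt is a placeholder file name.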