From: Aditya Mahajan
Newsgroups: gmane.comp.tex.context
Subject: Re: counting the words in a TeX document
Date: Sun, 6 Aug 2006 13:27:23 -0400 (EDT)
Cc: Benjamin Gorinsek

On Sun, 6 Aug 2006, Mojca Miklavec wrote:

> Based on those three answers I got a clearer idea of two (different,
> but complementary) methods that might be sensible:
>
> a) ctxtools --wordcount filename[tex|pdf]
>    to do the word count for the whole document using pdftotext + ruby regexp
>
> b)
> \usemodule[wordcount]
>
> whatever
>
> \startstatistics[name][words|letters|lines]
> some more-or-less plain text
> \stopstatistics
>
> whatever
>
> and according to Aditya's idea, run a (ruby) regular expression
> (instead of detex) on it which would write the nicely formatted desired
> number to the output/log file. (I don't know if it's possible to use
> the first approach for the second problem, but it doesn't make sense
> to complicate things too much.)

If you have a script that counts words in a ConTeXt document, the second
approach is straightforward: write everything to a buffer and run the
script on the buffer. However, such a mechanism will never be perfect (or
even close to perfect) when it comes to parsing arbitrary input.

ftp://tug.ctan.org/pub/tex-archive/macros/plain/contrib/misc/xii.tex

But of course, you will not write anything like this in an abstract :-)

Aditya
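A rough illustration of "run a script on the buffer", written in Ruby since that is what the thread proposes. This is not ctxtools and not an existing ConTeXt module: the names `WORD`, `strip_comments`, `strip_tex` and `statistics` are hypothetical, and the sketch assumes the buffer contents have already been written out and read into a string. It is a detex-style regex heuristic, so it inherits exactly the imperfection described above.

```ruby
# Rough word/letter/line statistics for a chunk of TeX source, such as
# the contents of a buffer written out during a ConTeXt run.
# A regex heuristic (in the spirit of detex), not a TeX parser.

# A word: letters/digits, optionally joined by ' or - (don't, well-known).
WORD = /[[:alnum:]]+(?:['-][[:alnum:]]+)*/

# Drop comments: an unescaped % up to the end of the line (\% is kept).
def strip_comments(src)
  src.gsub(/(?<!\\)%.*$/, "")
end

# Replace control sequences and TeX special characters with spaces.
def strip_tex(src)
  text = strip_comments(src)
  text = text.gsub(/\\[a-zA-Z@]+\*?/, " ") # control words: \section, \em, ...
  text = text.gsub(/\\./, " ")             # control symbols: \%, \&, \{
  text.gsub(/[{}\[\]$&~^_#]/, " ")         # braces, brackets, ties, math shifts
end

# Returns the three counts proposed for \startstatistics (hypothetical).
# :lines counts non-blank *source* lines, not typeset lines.
def statistics(src)
  words = strip_tex(src).scan(WORD)
  {
    words:   words.size,
    letters: words.sum { |w| w.scan(/[[:alpha:]]/).size },
    lines:   src.lines.count { |l| !l.strip.empty? }
  }
end
```

Usage would be something like `statistics(File.read("name.tmp"))`, where the file name is only a placeholder for wherever the buffer ends up. Optional arguments such as `[name]` are still counted as words, and input like xii.tex will defeat it completely.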