From: "Mojca Miklavec"
Newsgroups: gmane.comp.tex.context
Subject: Re: counting the words in a TeX document
Date: Mon, 7 Aug 2006 10:24:32 +0200

On 8/6/06, Aditya Mahajan wrote:
> On Sun, 6 Aug 2006, Mojca Miklavec wrote:
> > Based on those three answers I got a clearer idea of two (different,
> > but complementary) methods that might be sensible:
> >
> > a) ctxtools --wordcount filename[tex|pdf]
> >    to do the word count for the whole document using pdftotext + Ruby regexp
> >
> > b)
> > \usemodule[wordcount]
> >
> > whatever
> >
> > \startstatistics[name][words|letters|lines]
> > some more-or-less plain text
> > \stopstatistics
> >
> > whatever
> >
> > and, following Aditya's idea, run a (Ruby) regular expression
> > (instead of detex) on it, which would write the nicely formatted
> > number to the output/log file. (I don't know if it's possible to use
> > the first approach for the second problem, but it doesn't make sense
> > to complicate things too much.)
>
> If you have a script that counts words in a ConTeXt document, the
> second approach is straightforward.
> Write everything to a buffer and
> run the script on the buffer. However, such a mechanism will never be
> perfect (or even close to perfect) in the sense of parsing arbitrary input.

The most naive solution I could think of (using a slightly modified version
of Hans's Ruby script):

\unprotect

\def\startstatistics
  {\dodoubleempty\dostartstatistics}

% #1 = buffer name, #2 = words|letters|lines (ignored for now),
% #3 = the text up to \stopstatistics
\def\dostartstatistics[#1][#2]#3\stopstatistics
  {\setbuffer[#1]#3\endbuffer%
   \executesystemcommand{ruby wordcount.rb \jobname-#1.tmp}%
   \getbuffer[#1]}

\protect

\doifnotmode{demo}{\endinput}

... but a friend who asked me for a favour actually wants to use
abbreviations and a bibliography as well, so only the first method
(creating the PDF first) would work. He currently copies the text of the
resulting PDF into Word and uses Word's statistics to count the words
and/or characters for him. But I guess his wishes will have to wait a
while longer in this case.

> ftp://tug.ctan.org/pub/tex-archive/macros/plain/contrib/misc/xii.tex
>
> But of course, you will not write anything like this in an abstract
> :-)

Nevertheless, I love the story (and especially the document that creates it)!

All the best,
Mojca
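Hans's script itself is not included in this thread, so the following is only a rough, hypothetical sketch of what a `wordcount.rb` called by the macro above might do: strip TeX comments, control sequences (with their `[...]` options) and special characters with regular expressions, then count words, letters or non-empty lines. The optional second argument (`words|letters|lines`) and the assumption of UTF-8 input are mine; like detex, the regexps are a crude approximation and will miscount arbitrary input, exactly as Aditya warns.

```ruby
#!/usr/bin/env ruby
# wordcount.rb -- hypothetical sketch, NOT Hans's actual script.
# Usage: ruby wordcount.rb FILE [words|letters|lines]

# Remove the most common TeX markup. This is a regexp approximation
# (similar in spirit to detex), not a real TeX parser.
def strip_tex(text)
  text = text.gsub(/(?<!\\)%.*$/, "")                  # comments (keeps \%)
  text = text.gsub(/\\[a-zA-Z]+\*?(\[[^\]]*\])*/, " ") # \commands with [options]
  text = text.gsub(/\\./, " ")                         # control symbols like \% or \,
  text.gsub(/[{}$&~^_#]/, " ")                         # braces, math and other specials
end

# Count words (hyphenated words and apostrophes count as one),
# letters, or non-empty lines in the stripped text.
def statistics(text, mode = "words")
  plain = strip_tex(text)
  case mode
  when "words"   then plain.scan(/[[:alnum:]]+(?:['-][[:alnum:]]+)*/).size
  when "letters" then plain.scan(/[[:alpha:]]/).size
  when "lines"   then plain.lines.count { |l| !l.strip.empty? }
  else raise ArgumentError, "unknown mode: #{mode}"
  end
end

if __FILE__ == $PROGRAM_NAME
  file, mode = ARGV
  if file.nil?
    warn "usage: ruby wordcount.rb FILE [words|letters|lines]"
  else
    # Assumes UTF-8 input; invalid bytes are replaced instead of raising.
    text = File.read(file, encoding: "UTF-8").scrub
    mode ||= "words"
    puts "#{file}: #{statistics(text, mode)} #{mode}"
  end
end
```

For example, `ruby wordcount.rb myfile-abstract.tmp letters` would print the letter count of that buffer file. Anything that only turns into text at typesetting time (expanded abbreviations, bibliography entries) is invisible to such a script, which is why counting the text of the final PDF (method a) stays the more reliable option for documents like the one described above.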