From: Taco Hoekwater
Newsgroups: gmane.comp.tex.context
Subject: TeX trie processing (\pattern loading) details visualisation
Date: Mon, 11 Jul 2005 17:11:52 +0200
Message-ID: <42D28C38.3000505@elvenkind.com>
In-Reply-To: <429F265B.3070402@elvenkind.com>
Original-To: Hans Hagen
Cc: ntg-context@ntg.nl, pdfTeX developers list

Hi,

Vaguely connected to the font reader visualisation I posted last month, I have created a visualisation of the trie (\pattern) processing source code in initex.

The files are here:

  http://tex.aanhet.net/temp/patreader.zip (12.825 bytes)
  http://tex.aanhet.net/temp/patreader.pdf (> 36 Megabytes)

Please fetch the zip file and attempt to generate a local version yourself before downloading the PDF document :)

The process itself is a bit harder to comprehend than the font reader, so some background knowledge is needed. It also helps if you have the TeX Pascal sources handy.

I should probably write a descriptive text in prose to go along with the images, but I'm bored with this stuff. It took me much longer than I had anticipated, because I kept running into limitations of MP ;-(

Roughly, the execution order <-> pages mapping is as follows:

  pages       function          action
    1 -   8   new_patterns()    % \patterns for language 0
    9 -  23   new_patterns()    % \patterns for language 2
   24 -  26   new_patterns()    % \patterns for language 1
   27 -  29   init_trie()       % initialization of arrays
   30 -  41   init_trie()       % reshuffling languages 2 and 1
   42 -  42   init_trie()       % prepare for compression
   43 - 241   compress_trie()   % trie compression
  242 - 244   init_trie()       % prepare for packing
  245 - 717   first_fit()       % trie packing
  718 - 965   init_trie()       % finalizations for run-time

The various blue items are used at run time (i.e. during hyphenation); the other arrays are used only in initex or only for statistics reporting. trie_hash is physically the same array as trie_ref, but it is cleaner to show them separately.

The supplied Perl script can in fact demonstrate the hyphenation of words using TeX's algorithm, but if you want meaningful results you have to feed it hyphen.tex instead of the three demonstration languages, and in that case you have to increase the two limits ($trie_size and $trie_op_size). Check the top (and bottom) of the Perl script for that.

Have fun,
Taco
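
P.S. For those who want a feel for the run-time side without unpacking the zip, here is a rough sketch of pattern-based hyphenation in Python. It is not the Perl script and it does not use TeX's packed arrays: the trie is a plain nested dict, and every name in it (parse_pattern, build_trie, hyphenate, DEMO_PATTERNS) is made up for illustration. The nine patterns are the ones the TeXbook uses to explain the word "hyphenation"; the limits 2 and 3 are assumed to match plain TeX's \lefthyphenmin and \righthyphenmin.

```python
# Sketch only: dict-based trie, not the packed trie that initex builds.

DEMO_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at",
                 "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pat):
    """Split 'hy3ph' into ('hyph', [0, 0, 3, 0, 0]).

    values[i] is the number standing before letter i;
    the last entry stands after the final letter.
    """
    letters, values = [], [0]
    for ch in pat:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def build_trie(patterns):
    """Store each pattern's values under the key None at the end of its path."""
    root = {}
    for pat in patterns:
        letters, values = parse_pattern(pat)
        node = root
        for ch in letters:
            node = node.setdefault(ch, {})
        node[None] = values
    return root


def hyphenate(word, trie, left_min=2, right_min=3):
    """Return the word with '-' at every position whose final value is odd."""
    work = "." + word.lower() + "."        # '.' marks the word boundaries
    scores = [0] * (len(work) + 1)         # scores[k]: value before work[k]
    for start in range(len(work)):
        node = trie
        for pos in range(start, len(work)):
            node = node.get(work[pos])
            if node is None:
                break
            # Every pattern ending here contributes; the maximum wins.
            for offset, value in enumerate(node.get(None, ())):
                k = start + offset
                scores[k] = max(scores[k], value)
    pieces, last = [], 0
    # A break before word[i] corresponds to scores[i + 1].
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


if __name__ == "__main__":
    print(hyphenate("hyphenation", build_trie(DEMO_PATTERNS)))  # hy-phen-ation
```

With only nine patterns the result means something for this one word only; as said above, sensible output for other words needs the full pattern set from hyphen.tex.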