ntg-context - mailing list for ConTeXt users
From: Hans Hagen <j.hagen@xs4all.nl>
To: autumnus <ai2472206007@yeah.net>,
	mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: [NTG-context] Re: Quickly invoke a self-defined index sorting file?
Date: Sun, 12 Jan 2025 12:01:03 +0100
Message-ID: <e8b46fdc-f024-4a11-b267-b2fee76be45d@xs4all.nl>
In-Reply-To: <173667230090.1763.3540311828271775295@cgl.ntg.nl>

On 1/12/2025 9:58 AM, autumnus wrote:
> hi,
> 
> I defined a sorting file for Chinese index entries (a bit large, almost 4 MB):
> 
> https://github.com/Soanguy/ConTeXt-Chinese-Example/blob/master/sorting/sort-alpha.lua
> 
> But I have to \input this file every time to enable it.
> How do I integrate this file into my local ConTeXt installation so that I can enable it directly with a setup command?
> 
> I would like to use just
> %%%
> \setupregister[index][n=4,language=cn-alpha,]
> %%%
> 
> instead of
> %%%%
> \input sort-alpha.lua
> \setupregister[index][
>    n=4,
>    language={cn-alpha},]
> %%%

remove \startluacode and \stopluacode from that file and load it like this instead:

\registerctxluafile{sort-hanzi}{}

\starttext
     test
\stoptext

currently the whole file is first loaded into memory as tokens (i.e. 1 
byte becomes 8 bytes), which is fine (and fast) for reasonably sized 
files, but in your case token memory has to be bumped (hence the 
"bumping category 'token'" messages)
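
The token figures in the terminal log quoted in the question fit this explanation if the fields are read as follows (an interpretation of the numbers in the log, not documented here): mem is the number of tokens, itm the size of one token in bytes, and all their product.

```latex
\[
\mathrm{all} = \mathrm{mem} \times \mathrm{itm}, \qquad
2\,000\,000 \times 8 = 16\,000\,000, \qquad
7\,000\,000 \times 8 = 56\,000\,000
\]
```

Each bump adds stp = 1,000,000 tokens, i.e. 8,000,000 bytes, which is exactly the step between consecutive all values in the log.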

also, you don't really need the huge entries table, because you're not 
going to split the index into a section for every first character

maybe something like this:

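-- compute the index section of a character on demand instead of listing
-- every character (definitions and utfbyte are assumed to be locals in
-- the sorting file, e.g. sorters.definitions and utf.byte)
definitions["cn-hanzi"].entries = table.setmetatableindex(function(t,k)
     if utfbyte(k) < 1000 then -- below code point 1000: treat as latin
         return "latin"
     else
         return "chinese"
     end
end)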

print(definitions["cn-hanzi"].entries['a'])
print(definitions["cn-hanzi"].entries['咗'])
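
To try the idea outside ConTeXt, here is a minimal plain-Lua sketch (assumes Lua 5.3 or later): setmetatable with an __index function stands in for ConTeXt's table.setmetatableindex, and utf8.codepoint stands in for utfbyte. The table itself stays empty; the section name is computed on each lookup, so no per-character entries have to be stored.

```lua
-- Plain Lua 5.3+ stand-in for the ConTeXt snippet above:
-- setmetatable replaces table.setmetatableindex, utf8.codepoint replaces utfbyte.
local entries = setmetatable({}, {
    -- Called only for keys that are not stored in the table (here: every key).
    __index = function(t, k)
        -- Classify by the code point of the first character of the key.
        if utf8.codepoint(k) < 1000 then
            return "latin"
        else
            return "chinese"
        end
    end,
})

print(entries["a"])   --> latin
print(entries["咗"])  --> chinese
```

Because the __index function never stores its result, the lookups leave the table empty; caching (rawset) would only pay off if the classification were expensive.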

but even then ... korean and japanese don't have that either, so 
basically you only need the order (is that order defined somewhere in 
unicode?)
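
If only the order matters, sorting reduces to ranking characters by their position in one ordered list. A minimal plain-Lua sketch of that idea, not of how ConTeXt's sorter consumes a language definition: the five-character order string is a hypothetical sample standing in for the full list, and characters missing from it are placed after the ranked ones, by code point.

```lua
-- Hypothetical sample order (a real list would hold every hanzi needed).
local order = "阿爱安八白"

-- Map each character of the order string to its position (1, 2, 3, ...).
local rank, n = {}, 0
for _, cp in utf8.codes(order) do
    n = n + 1
    rank[utf8.char(cp)] = n
end

-- Turn a word into a list of numeric keys, one per character.
-- Unranked characters sort after all ranked ones, by code point.
local function key(word)
    local k = {}
    for _, cp in utf8.codes(word) do
        k[#k + 1] = rank[utf8.char(cp)] or (1e9 + cp)
    end
    return k
end

-- Lexicographic comparison of the key lists; a prefix sorts first.
local function before(a, b)
    local ka, kb = key(a), key(b)
    for i = 1, math.min(#ka, #kb) do
        if ka[i] ~= kb[i] then
            return ka[i] < kb[i]
        end
    end
    return #ka < #kb
end

local words = { "白", "安", "阿爱", "阿", "八" }
table.sort(words, before)
print(table.concat(words, " "))  --> 阿 阿爱 安 八 白
```

Only one number per character is stored, which is far smaller than a full entries table; the open question is where an authoritative order list would come from.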

> In addition, I noticed that the following messages appear on the TeX terminal,
> probably because there are too many characters in the index file (tens of thousands of characters).
> Can I avoid these messages?
> 
> tex memory      > bumping category 'token' succeeded, details: all=16000000 | ext=0 | ini=568315 | itm=8 | max=10000000 | mem=2000000 | min=2000000 | ptr=1999080 | set=10000000 | stp=1000000 | top=2000000
> tex memory      > bumping category 'token' succeeded, details: all=24000000 | ext=0 | ini=568315 | itm=8 | max=10000000 | mem=3000000 | min=2000000 | ptr=2999080 | set=10000000 | stp=1000000 | top=3000000
> tex memory      > bumping category 'token' succeeded, details: all=32000000 | ext=0 | ini=568315 | itm=8 | max=10000000 | mem=4000000 | min=2000000 | ptr=3999080 | set=10000000 | stp=1000000 | top=4000000
> tex memory      > bumping category 'token' succeeded, details: all=40000000 | ext=0 | ini=568315 | itm=8 | max=10000000 | mem=5000000 | min=2000000 | ptr=4999080 | set=10000000 | stp=1000000 | top=5000000
> tex memory      > bumping category 'token' succeeded, details: all=48000000 | ext=0 | ini=568315 | itm=8 | max=10000000 | mem=6000000 | min=2000000 | ptr=5999080 | set=10000000 | stp=1000000 | top=6000000
> tex memory      > bumping category 'token' succeeded, details: all=56000000 | ext=0 | ini=568315 | itm=8 | max=10000000 | mem=7000000 | min=2000000 | ptr=6999080 | set=10000000 | stp=1000000 | top=7000000
> 
> autumnus
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
> 
> maillist : ntg-context@ntg.nl / https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
> webpage  : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
> archive  : https://github.com/contextgarden/context
> wiki     : https://wiki.contextgarden.net
> ___________________________________________________________________________________


-- 

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

Thread overview: 6+ messages
2025-01-12  8:58 [NTG-context] " autumnus 
2025-01-12 11:01 ` Hans Hagen [this message]
2025-01-12 11:53   ` [NTG-context] " autumnus 
2025-01-12 12:18     ` Hans Hagen
2025-01-12 13:54       ` autumnus 
2025-01-12 15:59         ` Hans Hagen via ntg-context