Re: Multiple cases of unexpected behaviour in luametatex

From: Hans Hagen <j.hagen@xs4all.nl>
To: "mailing list for ConTeXt users" <ntg-context@ntg.nl>,
	"Marcel Fabian Krüger" <tex@2krueger.de>
Subject: Re: Multiple cases of unexpected behaviour in luametatex
Date: Fri, 3 Jul 2020 21:56:28 +0200
Message-ID: <7d1f8ca7-2e36-3c99-c729-407a396aa0e2@xs4all.nl>
In-Reply-To: <20200703125545.orklmoqazzgyascu@yoga>

On 7/3/2020 2:55 PM, Marcel Fabian Krüger wrote:
> Hi,
> 
> I recently noticed some cases where luametatex behaved in unexpected
> ways:
> 
>    - The "Extra \fi" error isn't triggered, instead an extra `\fi`
>      freezes luametatex. (Can be reproduced by compiling a document which
>      only consists of a single \fi)

i already fixed that here (noticed it when documenting some conditionals)

>    - token.new can only create some `data` tokens, but it doesn't apply
>      bounds checking on its arguments:

there is no checking yet, there is an upper limit of 0x1FFFFF, so i'll 
add a check for that
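
A minimal sketch of what such a check could look like, in plain Lua rather than the engine code. Only the limit 0x1FFFFF comes from the reply above; the helper name `check_data_value` and the lower bound of 0 are assumptions made for illustration.

```lua
-- Hypothetical helper, not part of the token library.
local DATA_MAX = 0x1FFFFF -- upper limit mentioned above

local function check_data_value(value)
  -- Reject anything that is not a whole number.
  if type(value) ~= "number" or value % 1 ~= 0 then
    return false, "expected an integer"
  end
  -- Accept only 0 .. DATA_MAX (the lower bound of 0 is an assumption).
  if value < 0 or value > DATA_MAX then
    return false, string.format("value %d out of range 0..%d", value, DATA_MAX)
  end
  return true
end

print(check_data_value(0x1FFFFF))
print(check_data_value(0x1FFFFF + 1))
```

Returning `false` plus a message instead of raising an error leaves it to the caller to decide whether an out-of-range value is fatal.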

>      Also for all other commands LuaTeX seems to apply range-checks to
>      ensure that such overflows don't happen, even if invalid values are
>      passed as first argument.

indeed, but i hadn't yet done that for data, it also needs a stricter 
check at the tex end (i'm still not sure if i'll make a slightly different 
implementation of it, but i can add the test anyway)

>    - There is token.primitives(). My assumption is that the returned
>      table is meant to indicate the command id, mode and name
>      corresponding to every primitive. (I think it is awesome that such a
>      table is made available in luametatex) But especially the mode
>      field sometimes has values which do not correspond to the mode of
>      the actual primitives:

indeed.

>      I tried running the following in an almost iniTeX setting where all
>      primitives aside from \shipout and \Umathcodenum have their default
>      definitions:
> 
>      ```
>      \catcode`\%=12
>      \catcode`\~=12
>      \directlua{
>        local sorted = token.primitives()
>        table.sort(sorted, function(a,b) return a[1]<b[1] or a[1]==b[1] and a[2]<b[2]end)
>        for _,info in ipairs(sorted) do
>          local t = token.create(info[3])
>          local rc, rm = t.command, t.mode
>          if rc==info[1] and rm ~= info[2] then
>            if info[2] == 0 then
>              print(string.format('MODE MISMATCH, expected zero: \string\\%s: real: %i, command: %i', info[3], rm, rc))
>            else
>              print(string.format('MODE MISMATCH: \string\\%s: offset: %i, command: %i', info[3], rm-info[2], rc))
>            end
>          elseif rc~=info[1] then print(t.csname)
>          end
>        end
>      }
>      ```
> 
>      This indicates that there are two kinds of differences:
>      For some command codes, there are multiple primitives whose second
>      entry in the token.primitives table is zero even though their mode
>      is not zero. This especially affects the commands `above`,
>      `after_something`, `make_box`, `un_vbox`, `set_specification` and
>      `car_ret`.
>      E.g. for after_something, all of \atendofgrouped, \afterassigned and
>      \aftergrouped have a zero as second entry in token.primitives.

some tokens are more complex in the sense that they are combinations 
(have a follow up) and i'm not sure to what extent i want to block that 
... all a matter of experimenting and time, so

the 'mode' field will be dropped but for now i kept it

some, like after_something, i need to check (i just didn't update their 
ranges yet after adding some more primitives that use them) (maybe some 
others need an offset added but i'll check it)
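
To see which entries are affected, one could group the table by its first two fields. The following is a sketch only, in plain Lua: it assumes each entry has the shape { cmd, mode, name } as used in the test script quoted above, the helper name `find_ambiguous` is hypothetical, and the sample numbers are placeholders rather than real command codes.

```lua
-- Hypothetical helper: collect (cmd, mode) pairs shared by several primitives.
-- Each entry is assumed to be { cmd, mode, name }.
local function find_ambiguous(primitives)
  local seen, ambiguous = {}, {}
  for _, info in ipairs(primitives) do
    local key = info[1] .. ":" .. info[2]
    local names = seen[key]
    if names then
      -- Second or later primitive with this pair: record it as ambiguous.
      names[#names + 1] = info[3]
      ambiguous[key] = names
    else
      seen[key] = { info[3] }
    end
  end
  return ambiguous
end

-- Made-up sample data; the numbers are placeholders, not real command codes.
local sample = {
  { 10, 0, "atendofgrouped" },
  { 10, 0, "afterassigned" },
  { 10, 0, "aftergrouped" },
  { 11, 0, "hbox" },
}

for key, names in pairs(find_ambiguous(sample)) do
  print(key, table.concat(names, ", "))
end
```

In luametatex the input would come from token.primitives() instead of the sample table; an empty result would mean every primitive has a unique (cmd, mode) pair.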

>      The other difference is that all the internal_... commands have a
>      fixed offset which differs between commands in their mode field.
> 
>      IMO the difference for the internal_... commands makes sense because
>      it makes for easier to use numbers, but having multiple primitives
>      indicating mode 0 for the other commands seems to make this table
>      significantly less useful because it can't be used to get a unique
>      description of a primitive.
> 
>      (I may have completely misinterpreted the table of course, but given
>      that for other primitives the values match I do not think so)

it's a bit of a work in progress as there are some exceptions that use special 
chr codes (for instance in conditionals several cmd codes need to have 
exclusive codes), so adapting it is a stepwise process; one decision i 
need to make there is how close to stay to the original tex codes

eventually i want all to have reasonable ranges in the token interface 
(not per se the same as in the engine itself but that's a black box 
anyway) which involves some offsetting .. i do that stepwise in order to 
keep a working engine (the token interface is not used in context that 
much)
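
The offsetting idea can be sketched as a per-command base that is subtracted on the way out and added on the way in. This is an illustration in plain Lua, not how the engine does it: the command names, the offset values and the helper names `to_interface` and `to_engine` are all placeholders.

```lua
-- Hypothetical per-command base offsets; names and values are placeholders.
local base_offset = {
  internal_int   = 100,
  internal_dimen = 200,
}

-- Engine chr code -> mode as shown in the token interface.
local function to_interface(cmd, chr)
  return chr - (base_offset[cmd] or 0)
end

-- Interface mode -> engine chr code (inverse of the above).
local function to_engine(cmd, mode)
  return mode + (base_offset[cmd] or 0)
end

print(to_interface("internal_int", 105))
print(to_engine("internal_int", 5))
```

Commands without an entry in the table pass through unchanged, so such a mapping can be filled in one command at a time while everything else keeps working.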

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------