Re: Can't no break between a close punctuation and a open punctuation - 黄复雄 via ntg-context

From: "黄复雄 via ntg-context" <ntg-context@ntg.nl>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Cc: 黄复雄 <aahuaang@gmail.com>
Subject: Re: Can't no break between a close punctuation and a open punctuation
Date: Sat, 3 Sep 2022 15:27:12 +0800
Message-ID: <CAHN0TNgr-C_HF_yTdvn_Scr6yGVEmWHDvkuOSqDLUh=NH9MeJw@mail.gmail.com>
In-Reply-To: <2fd0db42-28aa-4036-545c-85376c3132e8@freedom.nl>

[-- Attachment #1: Type: text/plain, Size: 2038 bytes --]

> you can try this in scrp-cjk.lua (remake the format)
>
> local chinese_8 = {
>      jami_initial     = nobreak_shrink_break_stretch,
>      korean           = nobreak_autoshrink_break_stretch,
>      chinese          = stretch_break, -- nobreak_autoshrink_break_stretch,
>      hiragana         = stretch_break, -- nobreak_autoshrink_break_stretch,
>      katakana         = stretch_break, -- nobreak_autoshrink_break_stretch,
>      half_width_open  = nobreak_autoshrink_break_stretch_nobreak_autoshrink,
> half_width_open  = stretch_break,
>      half_width_close = nobreak_autoshrink_nobreak_stretch,
>      full_width_open  = nobreak_autoshrink_break_stretch_nobreak_shrink,
>      full_width_close = nobreak_autoshrink_nobreak_stretch,
>      full_width_punct = nobreak_autoshrink_nobreak_stretch,
>      hyphen           = nobreak_autoshrink_break_stretch,
>      non_starter      = nobreak_autoshrink_break_stretch,
>      other            = nobreak_autoshrink_break_stretch,
> }

Dear Hans,

I have modified these two files, scrp-cjk.lua and char-scr.lua.
I added comments on some of the modified lines, but not on all of them.
If you need every change marked, please let me know and I'll mark them all.

As I said last time, what I've done may be a bit reckless, so please
check and correct it.

I have done some testing, but it is obviously not very comprehensive.
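
To make the `——` handling in the attached scrp-cjk.lua easier to review, here is a minimal standalone sketch of the kern arithmetic it uses. The glyph metrics, the font id and the helper name `dash_kern` are made up for illustration, and the sketch assumes (as the attached code does) that the advance width of the dash corresponds to one quad.

```lua
-- Hypothetical glyph metrics in font design units (not taken from a real font).
local description = {
    width       = 2048,
    boundingbox = { 64, 500, 1984, 600 }, -- llx, lly, urx, ury
}

local quad = 655360 -- 10pt expressed in scaled points (65536 sp per pt)

local cache = { } -- one kern per font id, like the table kept in the module

-- Returns the negative kern that removes the side bearings between two
-- adjacent dashes, assuming the dash advance width equals one quad.
local function dash_kern(font, desc, quad)
    local kern = cache[font]
    if not kern then
        local left  = desc.boundingbox[1]              -- left side bearing
        local right = desc.width - desc.boundingbox[3] -- right side bearing
        kern = -(left + right) / desc.width * quad
        cache[font] = kern
    end
    return kern
end

local kern = dash_kern(1, description, quad)
print(kern)
```

With these numbers the two side bearings add up to 128 of 2048 units, so the kern is one sixteenth of the quad, negated. The cache is keyed by font id only, which mirrors the attached code and relies on a font id already implying a size.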

With best regards,

                            黄复雄（Huang Fusyong）

 ___________________________________________________________________________________
 If your question is of interest to others as well, please add an
entry to the Wiki!

 maillist : ntg-context@ntg.nl / https://www.ntg.nl/mailman/listinfo/ntg-context
 webpage  : https://www.pragma-ade.nl / http://context.aanhet.net
 archive  : https://bitbucket.org/phg/context-mirror/commits/
 wiki     : https://contextgarden.net
 ___________________________________________________________________________________

[-- Attachment #2: scrp-cjk.lua --]
[-- Type: application/octet-stream, Size: 45159 bytes --]

if not modules then modules = { } end modules ['scrp-cjk'] = {
    version   = 1.001,
    comment   = "companion to scrp-ini.mkiv",
    author    = "Hans Hagen, PRAGMA-ADE, Hasselt NL",
    copyright = "PRAGMA ADE / ConTeXt Development Team",
    license   = "see context related readme files"
}

-- We can speed this up by preallocating nodes and copying them but the gain is not
-- that large.
--
-- If needed we can speed this up (traversers and prev next and such) but cjk
-- documents don't have that many glyphs and certainly not much font processing so
-- there is not much gain in it.
--
-- The input line endings: there is no way to distinguish between inline spaces and
-- endofline turned into spaces (it would not make sense either, because otherwise a
-- wanted space at the end of a line would have to be a hard coded one).

local nuts              = nodes.nuts

local copy_node        = nuts.copy
local remove_node      = nuts.remove
local nextglyph        = nuts.traversers.glyph

local getnext          = nuts.getnext
local getprev          = nuts.getprev
local getfont          = nuts.getfont
local getchar          = nuts.getchar
local getid            = nuts.getid
local getsubtype       = nuts.getsubtype
local getwidth         = nuts.getwidth

local setchar          = nuts.setchar

local nodepool         = nuts.pool
local new_glue         = nodepool.glue
local new_kern         = nodepool.kern
local new_penalty      = nodepool.penalty

local nodecodes        = nodes.nodecodes
local gluecodes        = nodes.gluecodes

local glyph_code       = nodecodes.glyph
local penalty_code     = nodecodes.penalty
local glue_code        = nodecodes.glue

local userskip_code    = gluecodes.userskip
local spaceskip_code   = gluecodes.spaceskip
local xspaceskip_code  = gluecodes.xspaceskip

local hash             = characters.scripthash

local getscriptstatus  = scripts.getstatus
local getscriptdata    = scripts.getdata
local scriptcolors     = scripts.colors

local fonthashes       = fonts.hashes
local quaddata         = fonthashes.quads
local spacedata        = fonthashes.spaces
local fontdata         = fonthashes.identifiers

local decomposed       = characters.hangul.decomposed

local trace_details    = false  trackers.register("scripts.details", function(v) trace_details = v end)

local report_details   = logs.reporter("scripts","detail")

-- raggedleft is controlled by leftskip and we might end up with a situation where
-- the intercharacter spacing interferes with this; the solution is to patch the
-- nodelist but better is to use veryraggedleft

local insertnodeafter  = scripts.helpers.insertnodeafter
local insertnodebefore = scripts.helpers.insertnodebefore

local inter_char_shrink          = 0
local inter_char_stretch         = 0
local inter_char_half_shrink     = 0
local inter_char_half_stretch    = 0
local inter_char_quarter_shrink  = 0
local inter_char_quarter_stretch = 0

local full_char_width            = 0
local half_char_width            = 0
local quarter_char_width         = 0

local inter_char_hangul_penalty  = 0

local function set_parameters(font,data)
    -- beware: parameters can be nil in e.g. punk variants
    local quad = quaddata[font]
    full_char_width            = quad
    half_char_width            = quad/2
    quarter_char_width         = quad/4
    inter_char_shrink          = data.inter_char_shrink_factor          * quad
    inter_char_stretch         = data.inter_char_stretch_factor         * quad
    inter_char_half_shrink     = data.inter_char_half_shrink_factor     * quad
    inter_char_half_stretch    = data.inter_char_half_stretch_factor    * quad
    inter_char_quarter_shrink  = data.inter_char_quarter_shrink_factor  * quad
    inter_char_quarter_stretch = data.inter_char_quarter_stretch_factor * quad
    inter_char_hangul_penalty  = data.inter_char_hangul_penalty
end

-- a test version did compensate for crappy halfwidth but we can best do that
-- at font definition time and/or just assume a correct font

local function trace_detail(current,what)
    local prev = getprev(current)
    local c_id = getid(current)
    local p_id = prev and getid(prev)
    if c_id == glyph_code then
        local c_ch = getchar(current)
        if p_id == glyph_code then
            local p_ch = p_id and getchar(prev)
            report_details("[%C %a] [%s] [%C %a]",p_ch,hash[p_ch],what,c_ch,hash[c_ch])
        else
            report_details("[%s] [%C %a]",what,c_ch,hash[c_ch])
        end
    else
        if p_id == glyph_code then
            local p_ch = p_id and getchar(prev)
            report_details("[%C %a] [%s]",p_ch,hash[p_ch],what)
        else
            report_details("[%s]",what)
        end
    end
end

local function trace_detail_between(p,n,what)
    local p_ch = getchar(p)
    local n_ch = getchar(n)
    report_details("[%C %a] [%s] [%C %a]",p_ch,hash[p_ch],what,n_ch,hash[n_ch])
end

local function nobreak(head,current)
    if trace_details then
        trace_detail(current,"nobreak")
    end
    insertnodebefore(head,current,new_penalty(10000))
end

local function stretch_break(head,current)
    if trace_details then
        trace_detail(current,"stretch break")
    end
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
end

-- `……` and `——` are each treated as a single punctuation mark in Chinese:
-- we can break before or after the pair, but there should not be any space
-- between its two parts.
-- [0x2026]   …   Horizontal Ellipsis
-- [0x2014]   —   Em Dash
-- Is there a better way or a better time to handle this issue?

modules['scrp-cjk']["doubleEMDash_kern"] = modules['scrp-cjk']["doubleEMDash_kern"] or {}
local doubleEMDash = modules['scrp-cjk']["doubleEMDash_kern"]

local function doubleEllipsis_doubleEMDash_nobreak(head,current)
    local current_char = getchar(current)
    local prev_node = getprev(current)
    if prev_node and current_char == getchar(prev_node)
    and (current_char == 0x2026 or current_char == 0x2014) then

        nobreak(head, current)

        if current_char == 0x2014 then -- kill the font space between
            local font = getfont(current)
            local kern
            if doubleEMDash[font] then
                kern = doubleEMDash[font]
            else
                local quad = quaddata[font]
                local desc = fontdata[font].descriptions[current_char]
                local desc_width = desc.width
                local boundingbox = desc.boundingbox
                local left_space =  boundingbox[1]
                local right_space = desc_width - boundingbox[3]
                kern = -(left_space + right_space)/desc_width * quad
                doubleEMDash[font] = kern
            end
            insertnodebefore(head,current,new_kern(kern))
        end

        -- remove infinite penalty before `……` or `——`
        prev_node = getprev(prev_node)
        while prev_node do
            local node_id = getid(prev_node)
            if node_id == penalty_code then
                remove_node(head,prev_node,true)
                if trace_details then
                    trace_detail(current,"stretch break before doubleEllipsis_doubleEMDash")
                end
                break
            elseif node_id == glyph_code then
                break
            end
            prev_node = getprev(prev_node)
        end
    else
        stretch_break(head, current)
    end
end

local function shrink_break(head,current)
    if trace_details then
        trace_detail(current,"shrink break")
    end
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
end

local function stretch_shrink_break(head,current)
    if trace_details then
        trace_detail(current,"stretch shrink break")
    end
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
end

local function nobreak_stretch(head,current)
    if trace_details then
        trace_detail(current,"nobreak stretch")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
end

local function korean_break(head,current)
    if trace_details then
        trace_detail(current,"korean break")
    end
    insertnodebefore(head,current,new_penalty(inter_char_hangul_penalty))
end

local function nobreak_shrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak shrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink)) -- inter_char_shrink?
end

local function nobreak_autoshrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak autoshrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
end

local function nobreak_stretch_nobreak_shrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak stretch nobreak shrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink)) -- inter_char_shrink?
end

local function nobreak_stretch_nobreak_autoshrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak stretch nobreak autoshrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
end

local function nobreak_shrink_nobreak_stretch(head,current)
    if trace_details then
        trace_detail(current,"nobreak shrink nobreak stretch")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink)) -- inter_char_shrink?
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
end

local function nobreak_autoshrink_nobreak_stretch(head,current)
    if trace_details then
        trace_detail(current,"nobreak autoshrink nobreak stretch")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
end

local function nobreak_shrink_break_stretch(head,current)
    if trace_details then
        trace_detail(current,"nobreak shrink break stretch")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink)) -- inter_char_shrink?
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
end

local function nobreak_autoshrink_break_stretch(head,current)
    if trace_details then
        trace_detail(current,"nobreak autoshrink break stretch")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
end

local function nobreak_shrink_break_stretch_nobreak_shrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak shrink break stretch nobreak shrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink)) -- inter_char_shrink?
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_shrink))
end

local function japanese_between_full_close_open(head,current) -- todo: check width
    if trace_details then
        trace_detail(current,"japanese between full close open")
    end
    insertnodebefore(head,current,new_kern(-half_char_width))
    insertnodebefore(head,current,new_glue(half_char_width,0,inter_char_half_shrink))
    insertnodebefore(head,current,new_kern(-half_char_width))
end

local function japanese_between_full_close_full_close(head,current) -- todo: check width
    if trace_details then
        trace_detail(current,"japanese between full close full close")
    end
    insertnodebefore(head,current,new_kern(-half_char_width))
 -- insertnodebefore(head,current,new_glue(half_char_width,0,inter_char_half_shrink))
end

local function japanese_before_full_width_punct(head,current) -- todo: check width
    if trace_details then
        trace_detail(current,"japanese before full width punct")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(quarter_char_width,0,inter_char_quarter_shrink))
    insertnodebefore(head,current,new_kern(-quarter_char_width))
end

local function japanese_after_full_width_punct(head,current) -- todo: check width
    if trace_details then
        trace_detail(current,"japanese after full width punct")
    end
    insertnodebefore(head,current,new_kern(-quarter_char_width))
    insertnodebefore(head,current,new_glue(quarter_char_width,0,inter_char_quarter_shrink))
end

local function nobreak_autoshrink_break_stretch_nobreak_autoshrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak autoshrink break stretch nobreak autoshrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
end

local function nobreak_autoshrink_break_stretch_nobreak_shrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak autoshrink break stretch nobreak shrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink)) -- inter_char_shrink?
end

local function nobreak_shrink_break_stretch_nobreak_autoshrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak shrink break stretch nobreak autoshrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink)) -- inter_char_shrink?
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,0,inter_char_shrink)) -- inter_char_shrink?
end

local function nobreak_stretch_break_shrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak stretch break shrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink)) -- inter_char_shrink?
end

local function nobreak_stretch_break_autoshrink(head,current)
    if trace_details then
        trace_detail(current,"nobreak stretch break autoshrink")
    end
    insertnodebefore(head,current,new_penalty(10000))
    insertnodebefore(head,current,new_glue(0,inter_char_stretch,0))
    insertnodebefore(head,current,new_glue(0,0,inter_char_half_shrink))
end

-- Korean: hangul

local korean_0 = {
}

local korean_1 = {
    jamo_initial     = korean_break,
    korean           = korean_break,
    chinese          = korean_break,
    hiragana         = korean_break,
    katakana         = korean_break,
    half_width_open  = stretch_break,
    half_width_close = nobreak,
    full_width_open  = stretch_break,
    full_width_close = nobreak,
    full_width_punct = nobreak,
--  hyphen           = nil,
    non_starter      = korean_break,
    other            = korean_break,
}

local korean_2 = {
    jamo_initial     = stretch_break,
    korean           = stretch_break,
    chinese          = stretch_break,
    hiragana         = stretch_break,
    katakana         = stretch_break,
    half_width_open  = stretch_break,
    half_width_close = nobreak,
    full_width_open  = stretch_break,
    full_width_close = nobreak,
    full_width_punct = nobreak,
--  hyphen           = nil,
    non_starter      = stretch_break,
    other            = stretch_break,
}

local korean_3 = {
    jamo_initial     = stretch_break,
    korean           = stretch_break,
    chinese          = stretch_break,
    hiragana         = stretch_break,
    katakana         = stretch_break,
    half_width_open  = stretch_break,
    half_width_close = nobreak,
    full_width_open  = stretch_break,
    full_width_close = nobreak,
    full_width_punct = nobreak,
--  hyphen           = nil,
    non_starter      = nobreak,
    other            = nobreak,
}

local korean_4 = {
    jamo_initial     = nobreak,
    korean           = nobreak,
    chinese          = nobreak,
    hiragana         = nobreak,
    katakana         = nobreak,
    half_width_open  = nobreak,
    half_width_close = nobreak,
    full_width_open  = nobreak,
    full_width_close = nobreak,
    full_width_punct = nobreak,
    hyphen           = nobreak,
    non_starter      = nobreak,
    other            = nobreak,
}

local korean_5 = {
    jamo_initial     = stretch_break,
    korean           = stretch_break,
    chinese          = stretch_break,
    hiragana         = stretch_break,
    katakana         = stretch_break,
    half_width_open  = stretch_break,
    half_width_close = nobreak_stretch,
    full_width_open  = stretch_break,
    full_width_close = nobreak_stretch,
    full_width_punct = nobreak_stretch,
    hyphen           = nobreak_stretch,
    non_starter      = nobreak_stretch,
    other            = stretch_break,
}

local injectors = { -- [previous] [current]
    jamo_final       = korean_1,
    korean           = korean_1,
    chinese          = korean_1,
    hiragana         = korean_1,
    katakana         = korean_1,
    hyphen           = korean_2,
    start            = korean_0,
    other            = korean_2,
    non_starter      = korean_3,
    full_width_open  = korean_4,
    half_width_open  = korean_4,
    full_width_close = korean_5,
    full_width_punct = korean_5,
    half_width_close = korean_5,
}

scriptcolors.korean                     = "trace:0"
scriptcolors.chinese                    = "trace:0"
scriptcolors.basic_latin                = "trace:0"
scriptcolors.ASCII_digit                = "trace:0"
scriptcolors.katakana                   = "trace:0"
scriptcolors.hiragana                   = "trace:0"
scriptcolors.full_width_open            = "trace:1"
scriptcolors.full_width_close           = "trace:2"
scriptcolors.full_width_nospace_close   = "trace:2"
scriptcolors.half_width_open            = "trace:3"
scriptcolors.half_width_close           = "trace:4"
scriptcolors.full_width_punct           = "trace:5"
------------.hyphen                     = "trace:5"
scriptcolors.non_starter                = "trace:6"
scriptcolors.jamo_initial               = "trace:7"
scriptcolors.jamo_medial                = "trace:8"
scriptcolors.jamo_final                 = "trace:9"

local function process(head,first,last)
    if first ~= last then
        local lastfont = nil
        local previous = "start"
        local last     = nil
        while true do
            local upcoming = getnext(first)
            local id       = getid(first)
            if id == glyph_code then
                local current = getscriptstatus(first)
                local action  = injectors[previous]
                if action then
                    action = action[current]
                    if action then
                        local font = getfont(first)
                        if font ~= lastfont then
                            lastfont = font
                            set_parameters(font,getscriptdata(first))
                        end
                        action(head,first)
                    end
                end
                previous = current
            else -- glue
                local p = getprev(first)
                local n = upcoming
                if p and n then
                    local pid = getid(p)
                    local nid = getid(n)
                    if pid == glyph_code and nid == glyph_code then
                        local pcjk = getscriptstatus(p)
                        local ncjk = getscriptstatus(n)
                        if not pcjk                 or not ncjk
                            or pcjk == "korean"     or ncjk == "korean"
                            or pcjk == "other"      or ncjk == "other"
                            or pcjk == "jamo_final" or ncjk == "jamo_initial" then
                            previous = "start"
                        else -- if head ~= first then
                            remove_node(head,first,true)
                            previous = pcjk
                    --    else
                    --        previous = pcjk
                        end
                    else
                        previous = "start"
                    end
                else
                    previous = "start"
                end
            end
            if upcoming == last then -- was stop
                break
            else
                first = upcoming
            end
        end
    end
end

scripts.installmethod {
    name     = "hangul",
    injector = process,
    datasets = { -- todo: metatables and maybe some stretch and shrink factor
        default = {
            inter_char_shrink_factor          = 0.50, -- of quad
            inter_char_stretch_factor         = 0.50, -- of quad
            inter_char_half_shrink_factor     = 0.50, -- of quad
            inter_char_half_stretch_factor    = 0.50, -- of quad
            inter_char_quarter_shrink_factor  = 0.50, -- of quad
            inter_char_quarter_stretch_factor = 0.50, -- of quad
            inter_char_hangul_penalty         =   50,
        },
        tight = {
            inter_char_shrink_factor          = 0.10, -- of quad
            inter_char_stretch_factor         = 0.10, -- of quad
            inter_char_half_shrink_factor     = 0.10, -- of quad
            inter_char_half_stretch_factor    = 0.10, -- of quad
            inter_char_quarter_shrink_factor  = 0.10, -- of quad
            inter_char_quarter_stretch_factor = 0.10, -- of quad
            inter_char_hangul_penalty         =   50,
        },
    },
}

function scripts.decomposehangul(head)
    local done = false
    for current, char in nextglyph, head do
        local lead_consonant, medial_vowel, tail_consonant = decomposed(char)
        if lead_consonant then
            setchar(current,lead_consonant)
            local m = copy_node(current)
            setchar(m,medial_vowel)
            head, current = insertnodeafter(head,current,m)
            if tail_consonant then
                local t = copy_node(current)
                setchar(t,tail_consonant)
                head, current = insertnodeafter(head,current,t)
            end
            done = true
        end
    end
    return head, done
end

-- nodes.tasks.prependaction("processors","normalizers","scripts.decomposehangul")

local otffeatures         = fonts.constructors.features.otf
local registerotffeature  = otffeatures.register

registerotffeature {
    name         = "decomposehangul",
    description  = "decompose hangul",
    processors = {
        position = 1,
        node     = scripts.decomposehangul,
    }
}

-- Chinese: hanzi

local chinese_0 = {
}

local chinese_1 = {
    jamo_initial                = korean_break,
    korean                      = korean_break,
    chinese                     = stretch_break,
    hiragana                    = stretch_break,
    katakana                    = stretch_break,
    half_width_open             = stretch_break,
    half_width_close            = nobreak_stretch,
    full_width_open             = stretch_shrink_break,
    full_width_close            = nobreak_stretch,
    full_width_nospace_close    = nobreak_stretch,
    full_width_punct            = nobreak_stretch,
--  hyphen                      = nil,
    non_starter                 = stretch_break,
    other                       = stretch_break,
}

local chinese_2 = {
    jamo_initial                = stretch_break,
    korean                      = stretch_break,
    chinese                     = stretch_break,
    basic_latin                 = stretch_break,
    ASCII_digit                 = stretch_break,
    hiragana                    = stretch_break,
    katakana                    = stretch_break,
    half_width_open             = stretch_break,
    half_width_close            = nobreak_stretch,
    full_width_open             = stretch_shrink_break,
    full_width_close            = nobreak_stretch,
    full_width_nospace_close    = nobreak_stretch,
    full_width_punct            = nobreak_stretch,
    hyphen                      = nobreak_stretch,
    non_starter                 = stretch_break,
    other                       = stretch_break,
}

local chinese_3 = {
    jamo_initial                = korean_break,
    korean                      = stretch_break,
    chinese                     = stretch_break,
    hiragana                    = stretch_break,
    katakana                    = stretch_break,
    half_width_open             = stretch_break,
    half_width_close            = nobreak_stretch,
    full_width_open             = stretch_shrink_break,
    full_width_close            = nobreak_stretch,
    full_width_nospace_close    = nobreak_stretch,
    full_width_punct            = nobreak_stretch,
    hyphen                      = nobreak,
    non_starter                 = stretch_break,
    other                       = stretch_break,
}

local chinese_4 = {
    --  jamo_initial            = nil,
    --  korean                  = nil,
    --  chinese                 = nil,
    --  hiragana                = nil,
    --  katakana                = nil,
    half_width_open             = nobreak_autoshrink,
    half_width_close            = nil,
    full_width_open             = nobreak_shrink,
    full_width_close            = nobreak,
    full_width_nospace_close    = nobreak,
    full_width_punct            = nobreak,
    hyphen                      = nobreak,
    non_starter                 = nobreak,
    --  other                   = nil,
}

local chinese_5 = {
    jamo_initial                = stretch_break,
    korean                      = stretch_break,
    chinese                     = stretch_break,
    hiragana                    = stretch_break,
    katakana                    = stretch_break,
    half_width_open             = stretch_break,
    half_width_close            = nobreak_stretch,
    full_width_open             = stretch_shrink_break,
    full_width_close            = nobreak_stretch,
    full_width_nospace_close    = nobreak_stretch,
    full_width_punct            = nobreak_stretch,
    --  hyphen                  = nil,
    non_starter                 = stretch_break,
    other                       = stretch_break,
}

local chinese_6 = {
    jamo_initial                = nobreak_stretch,
    korean                      = nobreak_stretch,
    chinese                     = nobreak_stretch,
    hiragana                    = nobreak_stretch,
    katakana                    = nobreak_stretch,
    half_width_open             = nobreak_stretch_break_autoshrink,
    half_width_close            = nobreak_stretch,
    full_width_open             = nobreak_stretch_break_shrink,
    full_width_close            = nobreak_stretch,
    full_width_nospace_close    = nobreak_stretch,
    full_width_punct            = nobreak_stretch,
    hyphen                      = nobreak_stretch,
    non_starter                 = nobreak_stretch,
    other                       = nobreak_stretch,
}

local chinese_7 = {
    jamo_initial                = stretch_shrink_break,
    korean                      = stretch_shrink_break,
    chinese                     = stretch_shrink_break,
    hiragana                    = stretch_shrink_break,
    katakana                    = stretch_shrink_break,
    half_width_open             = stretch_shrink_break,
    half_width_close            = nobreak_shrink_nobreak_stretch,
    full_width_open             = shrink_break, -- don't stretch with 2 half spaces
    full_width_close            = nobreak_shrink_nobreak_stretch,
    full_width_nospace_close    = nobreak_shrink_break_stretch,
    full_width_punct            = nobreak_shrink_nobreak_stretch,
    hyphen                      = nobreak_shrink_break_stretch,
    non_starter                 = nobreak_shrink_break_stretch,
    other                       = nobreak_shrink_break_stretch,
}

local chinese_8 = {
    jamo_initial                = stretch_break,
    korean                      = stretch_break,
    chinese                     = stretch_break,
    hiragana                    = stretch_break,
    katakana                    = stretch_break,
    half_width_open             = stretch_break,
    half_width_close            = nobreak_autoshrink_nobreak_stretch,
    full_width_open             = stretch_shrink_break,
    full_width_close            = nobreak_autoshrink_nobreak_stretch,
    full_width_punct            = nobreak_autoshrink_nobreak_stretch,
    full_width_nospace_close    = nobreak_stretch,
    hyphen                      = nobreak_autoshrink_break_stretch,
    non_starter                 = nobreak_autoshrink_break_stretch,
    other                       = nobreak_autoshrink_break_stretch,
}

local chinese_9 = {
    jamo_initial                = stretch_break,
    korean                      = stretch_break,
    chinese                     = stretch_break,
    basic_latin                 = stretch_break,
    ASCII_digit                 = stretch_break,
    hiragana                    = stretch_break,
    katakana                    = stretch_break,
    half_width_open             = stretch_break,
    half_width_close            = nobreak_stretch,
    full_width_open             = stretch_shrink_break,
    full_width_close            = nobreak_stretch,
    full_width_nospace_close    = doubleEllipsis_doubleEMDash_nobreak,
    full_width_punct            = nobreak_stretch,
    hyphen                      = nobreak_stretch,
    non_starter                 = stretch_break,
    other                       = stretch_break,
}

local chinese_10 = {
    jamo_initial                = korean_break,
    korean                      = stretch_break,
    chinese                     = stretch_break,
    hiragana                    = stretch_break,
    katakana                    = stretch_break,
    half_width_open             = stretch_break,
    half_width_close            = nobreak_stretch,
    full_width_open             = stretch_shrink_break,
    full_width_close            = nobreak_stretch,
    full_width_punct            = nobreak_stretch,
    full_width_nospace_close    = nobreak_stretch,
    hyphen                      = nobreak_stretch,
    non_starter                 = stretch_break,
    other                       = stretch_break,
}

local injectors = { -- [previous] [current]
    jamo_final                  = chinese_1,
    korean                      = chinese_1,
    chinese                     = chinese_2,
    hiragana                    = chinese_2,
    katakana                    = chinese_2,
    hyphen                      = chinese_3,
    start                       = chinese_4,
    other                       = chinese_5,
    non_starter                 = chinese_5,
    full_width_open             = chinese_6,
    half_width_open             = chinese_6,
    full_width_close            = chinese_7,
    full_width_punct            = chinese_7,
    half_width_close            = chinese_8,
    full_width_nospace_close    = chinese_9,
    basic_latin                 = chinese_10, -- Will it override or interfere with the system behavior?
    ASCII_digit                 = chinese_10, -- Will it override or interfere with the system behavior?
}

local function process(head,first,last)
    if first ~= last then
        local lastfont = nil
        local previous = "start"
        local last     = nil
        while true do
            local upcoming = getnext(first)
            local id       = getid(first)
            if id == glyph_code then
                local current = getscriptstatus(first)
                local action  = injectors[previous]
                if action then
                    action = action[current]
                    if action then
                        local font = getfont(first)
                        if font ~= lastfont then
                            lastfont = font
                            set_parameters(font,getscriptdata(first))
                        end
                        action(head,first)
                    end
                end
                previous = current
            else -- glue
                local p = getprev(first)
                local n = upcoming
                if p and n then
                    local pid = getid(p)
                    local nid = getid(n)
                    if pid == glyph_code and nid == glyph_code then
                        local pcjk = getscriptstatus(p)
                        local ncjk = getscriptstatus(n)
                        if not pcjk                       or not ncjk
                            or pcjk == "basic_latin"      or ncjk == "basic_latin" -- !!!
                            or pcjk == "ASCII_digit"      or ncjk == "ASCII_digit" -- !!!
                            or pcjk == "korean"           or ncjk == "korean"
                            or pcjk == "other"            or ncjk == "other"
                            or pcjk == "jamo_final"       or ncjk == "jamo_initial"
                            or pcjk == "half_width_close" or ncjk == "half_width_open" then -- extra compared to korean
                            previous = "start"
                        else -- if head ~= first then
                            remove_node(head,first,true)
                            previous = pcjk
                    --    else
                    --        previous = pcjk
                        end
                    else
                        previous = "start"
                    end
                else
                    previous = "start"
                end
            end
            if upcoming == last then -- was stop
                break
            else
                first = upcoming
            end
        end
    end
end

scripts.installmethod {
    name     = "hanzi",
    injector = process,
    datasets = {
        default = {
            inter_char_shrink_factor          = 0.50, -- of quad
            inter_char_stretch_factor         = 0.50, -- of quad
            inter_char_half_shrink_factor     = 0.50, -- of quad
            inter_char_half_stretch_factor    = 0.50, -- of quad
            inter_char_quarter_shrink_factor  = 0.50, -- of quad
            inter_char_quarter_stretch_factor = 0.50, -- of quad
            inter_char_hangul_penalty         =   50,
        },
    },
}

-- Japanese: ideographic, hiragana, katakana, romaji / jis

local japanese_0 = {
}

local japanese_1 = {
    jamo_initial     = korean_break,
    korean           = korean_break,
    chinese          = stretch_break,
    hiragana         = stretch_break,
    katakana         = stretch_break,
    half_width_open  = nobreak_stretch_break_autoshrink,
    half_width_close = nobreak_stretch,
    full_width_open  = nobreak_stretch_break_shrink,
    full_width_close = nobreak_stretch,
    full_width_punct = nobreak_stretch,
--  hyphen           = nil,
    non_starter      = nobreak_stretch,
    other            = stretch_break,
}

local japanese_2 = {
    jamo_initial     = korean_break,
    korean           = stretch_break,
    chinese          = stretch_break,
    hiragana         = stretch_break,
    katakana         = stretch_break,
    half_width_open  = nobreak_stretch_break_autoshrink,
    half_width_close = nobreak_stretch,
    full_width_open  = nobreak_stretch_break_shrink,
    full_width_close = nobreak_stretch,
    full_width_punct = japanese_before_full_width_punct, -- nobreak_stretch,
    hyphen           = nobreak_stretch,
    non_starter      = nobreak_stretch,
    other            = stretch_break,
}

local japanese_3 = {
    jamo_initial     = korean_break,
    korean           = stretch_break,
    chinese          = stretch_break,
    hiragana         = stretch_break,
    katakana         = stretch_break,
    half_width_open  = nobreak_stretch_break_autoshrink,
    half_width_close = nobreak_stretch,
    full_width_open  = nobreak_stretch_break_shrink,
    full_width_close = nobreak_stretch,
    full_width_punct = nobreak_stretch,
--  hyphen           = nil,
    non_starter      = nobreak_stretch,
    other            = stretch_break,
}

local japanese_4 = {
--  jamo_initial     = nil,
--  korean           = nil,
--  chinese          = nil,
--  hiragana         = nil,
--  katakana         = nil,
    half_width_open  = nobreak_autoshrink,
    half_width_close = nil,
    full_width_open  = nobreak_shrink,
    full_width_close = nobreak,
    full_width_punct = nobreak,
--  hyphen           = nil,
    non_starter      = nobreak,
--  other            = nil,
}

local japanese_5 = {
    jamo_initial     = stretch_break,
    korean           = stretch_break,
    chinese          = stretch_break,
    hiragana         = stretch_break,
    katakana         = stretch_break,
    half_width_open  = nobreak_stretch_break_autoshrink,
    half_width_close = nobreak_stretch,
    full_width_open  = nobreak_stretch_break_shrink,
    full_width_close = nobreak_stretch,
    full_width_punct = nobreak_stretch,
--  hyphen           = nil,
    non_starter      = nobreak_stretch,
    other            = stretch_break,
}

local japanese_6 = {
    jamo_initial     = nobreak_stretch,
    korean           = nobreak_stretch,
    chinese          = nobreak_stretch,
    hiragana         = nobreak_stretch,
    katakana         = nobreak_stretch,
    half_width_open  = nobreak_stretch_break_autoshrink,
    half_width_close = nobreak_stretch,
    full_width_open  = nobreak_stretch_break_shrink,
    full_width_close = nobreak_stretch,
    full_width_punct = nobreak_stretch,
    hyphen           = nobreak_stretch,
    non_starter      = nobreak_stretch,
    other            = nobreak_stretch,
}

local japanese_7 = {
    jamo_initial     = nobreak_shrink_break_stretch,
    korean           = nobreak_shrink_break_stretch,
    chinese          = japanese_after_full_width_punct, -- stretch_break
    hiragana         = japanese_after_full_width_punct, -- stretch_break
    katakana         = japanese_after_full_width_punct, -- stretch_break
    half_width_open  = nobreak_shrink_break_stretch_nobreak_autoshrink,
    half_width_close = nobreak_shrink_nobreak_stretch,
    full_width_open  = japanese_between_full_close_open, -- !!
    full_width_close = japanese_between_full_close_full_close, -- nobreak_shrink_nobreak_stretch,
    full_width_punct = nobreak_shrink_nobreak_stretch,
    hyphen           = nobreak_shrink_break_stretch,
    non_starter      = nobreak_shrink_break_stretch,
    other            = nobreak_shrink_break_stretch,
}

local japanese_8 = {
    jamo_initial     = nobreak_shrink_break_stretch,
    korean           = nobreak_autoshrink_break_stretch,
    chinese          = stretch_break,
    hiragana         = stretch_break,
    katakana         = stretch_break,
    half_width_open  = nobreak_autoshrink_break_stretch_nobreak_autoshrink,
    half_width_close = nobreak_autoshrink_nobreak_stretch,
    full_width_open  = nobreak_autoshrink_break_stretch_nobreak_shrink,
    full_width_close = nobreak_autoshrink_nobreak_stretch,
    full_width_punct = nobreak_autoshrink_nobreak_stretch,
    hyphen           = nobreak_autoshrink_break_stretch,
    non_starter      = nobreak_autoshrink_break_stretch,
    other            = nobreak_autoshrink_break_stretch,
}

local injectors = { -- [previous] [current]
    jamo_final       = japanese_1,
    korean           = japanese_1,
    chinese          = japanese_2,
    hiragana         = japanese_2,
    katakana         = japanese_2,
    hyphen           = japanese_3,
    start            = japanese_4,
    other            = japanese_5,
    non_starter      = japanese_5,
    full_width_open  = japanese_6,
    half_width_open  = japanese_6,
    full_width_close = japanese_7,
    full_width_punct = japanese_7,
    half_width_close = japanese_8,
}

local function process(head,first,last)
    if first ~= last then
        local lastfont = nil
        local previous = "start"
        local last     = nil
        while true do
            local upcoming = getnext(first)
            local id       = getid(first)
            if id == glyph_code then
                local current = getscriptstatus(first)
                local action  = injectors[previous]
                if action then
                    action = action[current]
                    if action then
                        local font = getfont(first)
                        if font ~= lastfont then
                            lastfont = font
                            set_parameters(font,getscriptdata(first))
                        end
                        action(head,first)
                    end
                end
                previous = current
         -- elseif id == math_code then
         --     upcoming = getnext(endofmath(current))
         --     previous = "start"
            else -- glue
                local p = getprev(first)
                local n = upcoming
                if p and n then
                    local pid = getid(p)
                    local nid = getid(n)
                    if pid == glyph_code and nid == glyph_code then
                        local pcjk = getscriptstatus(p)
                        local ncjk = getscriptstatus(n)
                        if not pcjk                       or not ncjk
                            or pcjk == "korean"           or ncjk == "korean"
                            or pcjk == "other"            or ncjk == "other"
                            or pcjk == "jamo_final"       or ncjk == "jamo_initial"
                            or pcjk == "half_width_close" or ncjk == "half_width_open" then -- extra compared to korean
                            previous = "start"
                        else -- if head ~= first then
                            if id == glue_code then
                                -- also scriptstatus check?
                                local subtype = getsubtype(first)
                                if subtype == userskip_code or subtype == spaceskip_code or subtype == xspaceskip_code then
                                    -- for the moment no distinction possible between space and userskip
                                    local w = getwidth(first)
                                    local s = spacedata[getfont(p)]
                                    if w == s then -- could be option
                                        if trace_details then
                                            trace_detail_between(p,n,"space removed")
                                        end
                                        remove_node(head,first,true)
                                    end
                                end
                            end
                            previous = pcjk
                    --    else
                    --        previous = pcjk
                        end
                    else
                        previous = "start"
                    end
                else
                    previous = "start"
                end
            end
            if upcoming == last then -- was stop
                break
            else
                first = upcoming
            end
        end
    end
end

scripts.installmethod {
    name     = "nihongo", -- what name to use?
    injector = process,
    datasets = {
        default = {
            inter_char_shrink_factor          = 0.50, -- of quad
            inter_char_stretch_factor         = 0.50, -- of quad
            inter_char_half_shrink_factor     = 0.50, -- of quad
            inter_char_half_stretch_factor    = 0.50, -- of quad
            inter_char_quarter_shrink_factor  = 0.25, -- of quad
            inter_char_quarter_stretch_factor = 0.25, -- of quad
            inter_char_hangul_penalty         =   50,
        },
    },
}
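
-- A minimal standalone sketch (plain Lua, not part of the patch) of the
-- [previous][current] lookup that both process functions perform. The class
-- names match the tables above; the action strings, sketch_injectors and the
-- helper actions_for are hypothetical stand-ins for the real node injectors.

do
    local after_chinese = {
        chinese          = "stretch_break",
        full_width_close = "nobreak_stretch",
    }
    local after_close = {
        chinese          = "stretch_shrink_break",
    }
    local sketch_injectors = { -- [previous] [current]
        chinese          = after_chinese,
        full_width_close = after_close,
    }

    local function actions_for(classes)
        local result   = { }
        local previous = "start"
        for i=1,#classes do
            local current = classes[i]
            local row     = sketch_injectors[previous]
            local action  = row and row[current]
            if action then
                result[#result+1] = action
            end
            previous = current
        end
        return result
    end

    -- classes of "我。我": chinese, full_width_close, chinese
    local r = actions_for { "chinese", "full_width_close", "chinese" }
    assert(#r == 2)
    assert(r[1] == "nobreak_stretch")      -- no break before the closing mark
    assert(r[2] == "stretch_shrink_break") -- break allowed after it
end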

[-- Attachment #3: char-scr.lua --]
[-- Type: application/octet-stream, Size: 7918 bytes --]

if not modules then modules = { } end modules ['char-scr'] = {
    version   = 1.001,
    comment   = "companion to char-ini.mkiv",
    author    = "Hans Hagen, PRAGMA-ADE, Hasselt NL",
    copyright = "PRAGMA ADE / ConTeXt Development Team",
    license   = "see context related readme files"
}

local tonumber = tonumber

characters.scripthash = { -- we could put these presets in char-def.lua
    --
    -- half width opening parenthesis
    --
    [0x0028] = "half_width_open", -- (
    [0x005B] = "half_width_open", -- [
    [0x007B] = "half_width_open", -- {
    [0xFF62] = "half_width_open", --  ｢   left corner bracket
    --
    -- full width opening parenthesis
    --
    [0x2018] = "full_width_open", -- ‘   all according to simsun.ttc, the main font for Simplified Chinese in Win10 (not mingliu.ttc, which is only for Traditional Chinese)
    [0x201C] = "full_width_open", -- “
    [0x3008] = "full_width_open", -- 〈   Left book quote
    [0x300A] = "full_width_open", -- 《   Left double book quote
    [0x300C] = "full_width_open", -- 「   left quote
    [0x300E] = "full_width_open", -- 『   left double quote
    [0x3010] = "full_width_open", -- 【   left double book quote
    [0x3014] = "full_width_open", -- 〔   left book quote
    [0x3016] = "full_width_open", -- 〖   left double book quote
    [0x3018] = "full_width_open", -- 〘   left tortoise bracket
    [0x301A] = "full_width_open", -- 〚   left square bracket
    [0x301D] = "full_width_open", -- 〝   reverse double prime qm
    [0xFF08] = "full_width_open", -- （   left parenthesis
    [0xFF3B] = "full_width_open", -- ［   left square brackets
    [0xFF5B] = "full_width_open", -- ｛   left curve bracket
    --
    -- half width closing parenthesis
    --
    [0x0029] = "half_width_close", -- )
    [0x005D] = "half_width_close", -- ]
    [0x007D] = "half_width_close", -- }
    [0xFF63] = "half_width_close", -- ｣   right corner bracket
    --
    -- full width closing parenthesis
    --
    [0x2019] = "full_width_close", -- ’   right quote, right
    [0x201D] = "full_width_close", -- ”   right double quote
    [0x3009] = "full_width_close", -- 〉   book quote
    [0x300B] = "full_width_close", -- 》   double book quote
    [0x300D] = "full_width_close", -- 」   right quote, right
    [0x300F] = "full_width_close", -- 』   right double quote
    [0x3011] = "full_width_close", -- 】   right double book quote
    [0x3015] = "full_width_close", -- 〕   right book quote
    [0x3017] = "full_width_close", -- 〗   right double book quote
    [0x3019] = "full_width_close", -- 〙   right tortoise bracket
    [0x301B] = "full_width_close", -- 〛   right square bracket
    [0x301E] = "full_width_close", -- 〞   double prime qm
    [0x301F] = "full_width_close", -- 〟   low double prime qm
    [0xFF09] = "full_width_close", -- ）   right parenthesis
    [0xFF3D] = "full_width_close", -- ］   right square brackets
    [0xFF5D] = "full_width_close", -- ｝   right curve brackets
    --
    -- vertical opening
    --
    -- 0xFE35, 0xFE37, 0xFE39,  0xFE3B,  0xFE3D,  0xFE3F,  0xFE41,  0xFE43,  0xFE47,
    --
    -- vertical closing
    --
    -- 0xFE36, 0xFE38, 0xFE3A,  0xFE3C,  0xFE3E,  0xFE40,  0xFE42,  0xFE44,  0xFE48,
    --
    -- half width opening punctuation
    --
    -- <empty>
    --
    -- full width opening punctuation
    --
    --  0x2236, -- ∶
    --  0xFF0C, -- ，
    --
    -- half width closing punctuation
    --
    [0x0021] = "half_width_close", -- !
    [0x002C] = "half_width_close", -- ,
    [0x002E] = "half_width_close", -- .
    [0x003A] = "half_width_close", -- :
    [0x003B] = "half_width_close", -- ;
    [0x003F] = "half_width_close", -- ?
    [0xFF61] = "half_width_close", -- ｡   hw full stop
    [0x002F] = "half_width_close", -- /   Solidus. `/～·-` are added according to *General Rules for Punctuation* (GB/T 15834-2011), the standard applicable to the People's Republic of China
    --
    -- full width closing punctuation
    --
    [0x3001] = "full_width_close", -- 、
    [0x3002] = "full_width_close", -- 。
    [0xFF0C] = "full_width_close", -- ，
    [0xFF0E] = "full_width_close", -- ．
    [0x00b7] = "full_width_close", -- ·   MIDDLE DOT
    --
    -- depends on font
    --
    [0xFF01] = "full_width_close", -- ！
    [0xFF1F] = "full_width_close", -- ？
    [0xFF1A] = "full_width_close", -- ：
    [0xFF1B] = "full_width_close", -- ；
    -- 
    -- full width closing punctuation without space
    -- 
    [0x2026] = "full_width_nospace_close", -- …   ellipsis
    [0x2014] = "full_width_nospace_close", -- —   Em Dash
    [0xff5e] = "full_width_nospace_close", -- ～   FULLWIDTH TILDE
    --
    -- non starter
    --
    [0x3005] = "non_starter", [0x3041] = "non_starter", [0x3043] = "non_starter", [0x3045] = "non_starter", [0x3047] = "non_starter",
    [0x3049] = "non_starter", [0x3063] = "non_starter", [0x3083] = "non_starter", [0x3085] = "non_starter", [0x3087] = "non_starter",
    [0x308E] = "non_starter", [0x3095] = "non_starter", [0x3096] = "non_starter", [0x309B] = "non_starter", [0x309C] = "non_starter",
    [0x309D] = "non_starter", [0x309E] = "non_starter", [0x30A0] = "non_starter", [0x30A1] = "non_starter", [0x30A3] = "non_starter",
    [0x30A5] = "non_starter", [0x30A7] = "non_starter", [0x30A9] = "non_starter", [0x30C3] = "non_starter", [0x30E3] = "non_starter",
    [0x30E5] = "non_starter", [0x30E7] = "non_starter", [0x30EE] = "non_starter", [0x30F5] = "non_starter", [0x30F6] = "non_starter",
    [0x30FC] = "non_starter", [0x30FD] = "non_starter", [0x30FE] = "non_starter", [0x31F0] = "non_starter", [0x31F1] = "non_starter",
    [0x31F2] = "non_starter", [0x31F3] = "non_starter", [0x31F4] = "non_starter", [0x31F5] = "non_starter", [0x31F6] = "non_starter",
    [0x31F7] = "non_starter", [0x31F8] = "non_starter", [0x31F9] = "non_starter", [0x31FA] = "non_starter", [0x31FB] = "non_starter",
    [0x31FC] = "non_starter", [0x31FD] = "non_starter", [0x31FE] = "non_starter", [0x31FF] = "non_starter",
    --
    [0x301C] = "non_starter", [0x303B] = "non_starter", [0x303C] = "non_starter", [0x30FB] = "non_starter",
    -- [0x309B] = "non_starter", -- duplicated
    -- [0x30FE] = "non_starter", -- duplicated
    -- 
    -- hyphenation
    --
    [0x002D] = "hyphen", -- -   Hyphen-Minus. Will there be any side effects?
    -- 
    [0x1361] = "ethiopic_word", -- ፡   Ethiopic Wordspace
    [0x1362] = "ethiopic_sentence", -- ።   Ethiopic Full Stop
    --
    -- tibetan:
    --
    [0x0F0B] = "breaking_tsheg", -- ་
    [0x0F0C] = "nonbreaking_tsheg", -- ༌

}

table.setmetatableindex(characters.scripthash, function(t,k)
    local v
    if not tonumber(k)                     then v = false
    elseif (k >= 0x03040 and k <= 0x030FF)
        or (k >= 0x031F0 and k <= 0x031FF)
        or (k >= 0x032D0 and k <= 0x032FE)
        or (k >= 0x0FF00 and k <= 0x0FFEF) then v = "katakana"
    elseif (k >= 0x03400 and k <= 0x04DFF)
        or (k >= 0x04E00 and k <= 0x09FFF)
        or (k >= 0x0F900 and k <= 0x0FAFF)
        or (k >= 0x20000 and k <= 0x2A6DF)
        or (k >= 0x2F800 and k <= 0x2FA1F) then v = "chinese"
    elseif (k >= 0x00030 and k <= 0x00039) then v = "ASCII_digit" --  tested before basic_latin, whose range contains the digits; any side effects?
    elseif (k >= 0x00000 and k <= 0x0007F) then v = "basic_latin" --  any side effects?
    elseif (k >= 0x0AC00 and k <= 0x0D7A3) then v = "korean"
    elseif (k >= 0x01100 and k <= 0x0115F) then v = "jamo_initial"
    elseif (k >= 0x01160 and k <= 0x011A7) then v = "jamo_medial"
    elseif (k >= 0x011A8 and k <= 0x011FF) then v = "jamo_final"
    elseif (k >= 0x01200 and k <= 0x0139F) then v = "ethiopic_syllable"
    elseif (k >= 0x00F00 and k <= 0x00FFF) then v = "tibetan"
                                           else v = false
    end
    t[k] = v
    return v
end)

-- storage.register("characters/scripthash", hash, "characters.scripthash")
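
-- A minimal standalone sketch (plain Lua, not part of the patch) of the lazy
-- classification used above. It assumes the standard setmetatable in place of
-- table.setmetatableindex; the names classify and sketchhash are hypothetical.
-- The first matching branch wins, so a narrow range (digits) has to be tested
-- before a wider range that contains it (basic Latin).

do
    local function classify(k)
        if type(k) ~= "number"             then return false
        elseif k >= 0x0030 and k <= 0x0039 then return "ASCII_digit"
        elseif k >= 0x0000 and k <= 0x007F then return "basic_latin"
        elseif k >= 0x4E00 and k <= 0x9FFF then return "chinese"
        else                                    return false
        end
    end

    local sketchhash = setmetatable({ [0x3002] = "full_width_close" }, {
        __index = function(t,k)
            local v = classify(k)
            t[k] = v -- cache the result, so __index runs once per code point
            return v
        end
    })

    assert(sketchhash[0x0031] == "ASCII_digit")        -- 1
    assert(sketchhash[0x0061] == "basic_latin")        -- a
    assert(sketchhash[0x6211] == "chinese")            -- 我
    assert(sketchhash[0x3002] == "full_width_close")   -- preset entry, no lookup
    assert(rawget(sketchhash,0x0031) == "ASCII_digit") -- cached after first access
end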

[-- Attachment #4: watch_sys_actions_to_punctuation.lmtx --]
[-- Type: application/octet-stream, Size: 9652 bytes --]

\setscript[hanzi]
\usetypescriptfile[mscore]
\usebodyfont   [mschinese,20pt]

\showframe
% \enabletrackers[script*] % script tracing

% just for watching node list
% \startluacode
%     local watch = require("watch_sys_actions.lua")
%     watch.register()
% \stopluacode

\starttext
\define\myfill{\hskip 13.8em}

% 
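% \subject{5.1.1 The full stop, comma, enumeration comma, semicolon, and colon are placed after the corresponding text, occupy one character position, sit at the lower left, and do not appear at the start of a line.}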

\myfill\dorecurse{2}{我。} %
% [我 (U+06211) 'chinese'] [nobreak stretch] [。 (U+03002) 'full_width_close']
% [。 (U+03002) 'full_width_close'] [stretch shrink break] [我 (U+06211) 'chinese']

\myfill\dorecurse{3}{我，} %

\myfill\dorecurse{4}{我、} %

\myfill\dorecurse{5}{我；} %

\myfill\dorecurse{6}{我：} %

% 
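% \subject{5.1.2 The question mark and exclamation mark are placed after the corresponding text, occupy one character position, sit at the left, and do not appear at the start of a line. Two stacked question marks (or exclamation marks) occupy one character position; three stacked question marks (or exclamation marks) occupy two character positions; a question mark combined with an exclamation mark occupies one character position.}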

\myfill\dorecurse{2}{我？} %
% [我 (U+06211) 'chinese'] [nobreak stretch] [！ (U+0FF01) 'full_width_close']
% [！ (U+0FF01) 'full_width_close'] [stretch shrink break] [我 (U+06211) 'chinese']

\myfill\dorecurse{3}{我！} %

\myfill\dorecurse{4}{我？？} % 
% [？ (U+0FF1F) 'full_width_close'] [nobreak shrink nobreak stretch] [？ (U+0FF1F) 'full_width_close']

\myfill\dorecurse{5}{我！！} %

\myfill\dorecurse{6}{我？？？} %

\myfill\dorecurse{7}{我！！！} %

\myfill\dorecurse{8}{我！？} %

\myfill\dorecurse{9}{我？！！} %

% 
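% \subject{5.1.3 The two parts of quotation marks, parentheses, and book title marks are placed at the two ends of the corresponding item, each occupying one character position. The opening part does not appear at the end of a line, and the closing part does not appear at the start of a line.}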

\myfill\dorecurse{1}{“我”我}
% [nobreak shrink] [“ (U+0201C) 'full_width_open']
% [“ (U+0201C) 'full_width_open'] [nobreak stretch] [我 (U+06211) 'chinese']
% [我 (U+06211) 'chinese'] [nobreak stretch] [” (U+0201D) 'full_width_close']
% [” (U+0201D) 'full_width_close'] [stretch shrink break] [我 (U+06211) 'chinese']
% [我 (U+06211) 'chinese'] [stretch shrink break] [“ (U+0201C) 'full_width_open']

\myfill\dorecurse{2}{‘我’我} %

\myfill\dorecurse{3}{（我）我} %

\myfill\dorecurse{4}{［我］我} %

\myfill\dorecurse{5}{〔我〕我} %

\myfill\dorecurse{6}{【我】我} %

\myfill\dorecurse{7}{《我》我} %

\myfill\dorecurse{8}{〈我〉我} %

% 
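% \subject{5.1.4 The dash is placed between the corresponding items, occupies two character positions, is vertically centered, and must not be split between the end of one line and the start of the next.}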

\myfill\dorecurse{2}{我——} %
% [我 (U+06211) 'chinese'] [nobreak stretch] [— (U+02014) 'full_width_nospace_close']
% [nobreak] [— (U+02014) 'full_width_nospace_close']
% [stretch break before double punctuations] [— (U+02014) 'full_width_nospace_close']
% [— (U+02014) 'full_width_nospace_close'] [stretch break] [我 (U+06211) 'chinese']

\myfill\dorecurse{3}{我——} %

\myfill我\dorecurse{3}{我——} %

% 
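% \subject{5.1.5 The ellipsis occupies two character positions; two consecutive ellipses occupy four character positions and must stand on a line of their own. An ellipsis must not be split between the end of one line and the start of the next.}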

\myfill\dorecurse{2}{我……} %

\myfill\dorecurse{3}{我……} %

\myfill我\dorecurse{3}{我……} %

………… %

\myfill\dorecurse{2}{我～～} %

\myfill\dorecurse{3}{我～～} %

\myfill\dorecurse{3}{我我～～} %

\myfill我\dorecurse{3}{我～～} %

% 
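% \subject{5.1.6 Among the connecting marks, the short hyphen is slightly shorter than the Chinese character “一” and occupies half a character position; the one-em line is slightly longer than “一” and occupies one character position; the wave line occupies one character position. Connecting marks are vertically centered and do not appear at the start of a line.}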

\myfill\dorecurse{8}{1-} %
% [1 (U+00031) 'ASCII_digit'] [nobreak stretch] [- (U+0002D) 'hyphen']
% ？？？
% [1 (U+00031) 'ASCII_digit'] [nobreak stretch] [- (U+0002D) 'hyphen']

\myfill\dorecurse{15}{1-} %

\myfill\dorecurse{8}{a-} %
% [a (U+00061) 'basic_latin'] [nobreak stretch] [- (U+0002D) 'hyphen']
% ？？？
% [a (U+00061) 'basic_latin'] [nobreak stretch] [- (U+0002D) 'hyphen']

\myfill\dorecurse{15}{a-} %

\myfill\dorecurse{6}{我-} %
% [我 (U+06211) 'chinese'] [nobreak stretch] [- (U+0002D) 'hyphen']
% [- (U+0002D) 'hyphen'] [stretch break] [我 (U+06211) 'chinese']

\myfill\dorecurse{5}{1—} %
% [1 (U+00031) 'ASCII_digit'] [nobreak stretch] [— (U+02014) 'full_width_nospace_close']
% [— (U+02014) 'full_width_nospace_close'] [stretch break] [1 (U+00031) 'ASCII_digit']

\myfill\dorecurse{6}{1—} %

\myfill\dorecurse{5}{a—} %
% [a (U+00061) 'basic_latin'] [nobreak stretch] [— (U+02014) 'full_width_nospace_close']
% [— (U+02014) 'full_width_nospace_close'] [stretch break] [a (U+00061) 'basic_latin']

\myfill\dorecurse{6}{a—} %

\myfill\dorecurse{3}{我—} %
% [我 (U+06211) 'chinese'] [nobreak stretch] [— (U+02014) 'hyphen']
% [— (U+02014) 'hyphen'] [stretch break] [我 (U+06211) 'chinese']

\myfill\dorecurse{6}{我—} % good
% [我 (U+06211) 'chinese'] [nobreak stretch] [— (U+02014) 'full_width_nospace_close']
% [— (U+02014) 'full_width_nospace_close'] [stretch break] [我 (U+06211) 'chinese']

\myfill\dorecurse{5}{1～} %
% [1 (U+00031) 'ASCII_digit'] [nobreak stretch] [～ (U+0FF5E) 'full_width_nospace_close']
% [～ (U+0FF5E) 'full_width_nospace_close'] [stretch break] [1 (U+00031) 'ASCII_digit']

\myfill\dorecurse{6}{1～} %

\myfill\dorecurse{5}{a～} %
% [a (U+00061) 'basic_latin'] [nobreak stretch] [～ (U+0FF5E) 'full_width_nospace_close']
% [～ (U+0FF5E) 'full_width_nospace_close'] [stretch break] [a (U+00061) 'basic_latin']

\myfill\dorecurse{6}{a～} %

\myfill\dorecurse{3}{我～} %
% [我 (U+06211) 'chinese'] [nobreak stretch] [～ (U+0FF5E) 'full_width_nospace_close']
% [～ (U+0FF5E) 'full_width_nospace_close'] [stretch break] [我 (U+06211) 'chinese']

\myfill\dorecurse{4}{我～} %

\myfill\dorecurse{5}{我～} %

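% \subject{5.1.7 The interpunct is placed between the items to be separated, occupies half a character position, is vertically centered, and does not appear at the start of a line.}
% symbol width still to be handled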

\myfill\dorecurse{3}{我·} %
% [我 (U+06211) 'chinese'] [nobreak stretch] [· (U+000B7) 'full_width_nospace_close']
% [· (U+000B7) 'full_width_nospace_close'] [stretch shrink break] [我 (U+06211) 'chinese']

\myfill\dorecurse{4}{我·} %

\myfill\dorecurse{5}{我·} %

% \subject{5.1.8 The emphasis mark and the proper noun mark are placed below the corresponding text.}
% interlinear marks are not handled

% 
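% \subject{5.1.9 The separator (solidus) occupies half a character position and does not appear at the start or the end of a line.}
% this line-breaking rule is not applied; the mark is treated as an ordinary hyphen instead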

\myfill\dorecurse{7}{1/} %
% [1 (U+00031) 'ASCII_digit'] [nobreak stretch] [/ (U+0002F) 'half_width_close']
% ？？？
% [1 (U+00031) 'ASCII_digit'] [nobreak stretch] [/ (U+0002F) 'half_width_close']

\myfill\dorecurse{8}{1/} %

\myfill\dorecurse{9}{1/} %

\myfill\dorecurse{8}{a/} %
% [a (U+00061) 'basic_latin'] [nobreak stretch] [/ (U+0002F) 'half_width_close']
% ？？？
% [a (U+00061) 'basic_latin'] [nobreak stretch] [/ (U+0002F) 'half_width_close']

\myfill\dorecurse{9}{a/} %

\myfill\dorecurse{10}{a/} %

\myfill\dorecurse{4}{我/} %
% [我 (U+06211) 'chinese'] [nobreak stretch] [/ (U+0002F) 'half_width_close']
% [/ (U+0002F) 'half_width_close'] [stretch break] [我 (U+06211) 'chinese']

\myfill\dorecurse{5}{我/} %

\myfill\dorecurse{6}{我/} %

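% \subject{5.1.10 When a punctuation mark falls at the end of a line and is a full-width character, it should take the width of a half-width character (that is, half a character position) for a better visual result.}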

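% \subject{5.1.11 In actual editing and publishing work, the space occupied by punctuation marks may be compressed as appropriate, for attractive typesetting and ease of reading, or to avoid cases such as the last Chinese character of a section wrapping onto a new line or appearing at the top of another page (which wastes space and looks poor).}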

% 
% \subject{Tests of common punctuation combinations}

% close and open

\myfill\dorecurse{2}{我：“} %

\myfill\dorecurse{6}{我：“} %

\myfill\dorecurse{2}{我”“} %

\myfill\dorecurse{6}{我”“} %

\myfill\dorecurse{1}{我”、“} %

\myfill\dorecurse{2}{我”、“} %

\myfill\dorecurse{3}{我”、“} %

% long

\myfill\dorecurse{1}{我——”} %

\myfill\dorecurse{3}{我——”} %

\myfill\dorecurse{8}{我——”} %

\myfill\dorecurse{8}{我……”} %

\myfill\dorecurse{8}{我？？？”} %

\myfill\dorecurse{8}{我！！！”} %

\myfill\dorecurse{8}{我？？！！！”} %

\myfill\dorecurse{20}{（——《〈庄子〉“内篇”“集解”？！！》……）} %

% open and open

\myfill\dorecurse{3}{我(((} %

\myfill\dorecurse{4}{我(((} %

\myfill\dorecurse{5}{我(((} %

\myfill\dorecurse{1}{我（（（} %

\myfill\dorecurse{2}{我（（（} %

\myfill\dorecurse{3}{我（（（} %

% close and close

\myfill\dorecurse{3}{我)))} %

\myfill\dorecurse{4}{我)))} %

\myfill\dorecurse{5}{我)))} %

\myfill\dorecurse{1}{我）））} %

\myfill\dorecurse{2}{我）））} %

\myfill\dorecurse{3}{我）））} %

% Chinese Latin number

\myfill\dorecurse{1}{我1000} %

\myfill\dorecurse{2}{我1000} %

\myfill\dorecurse{3}{我1000} %

\myfill\dorecurse{8}{我1000} %

\myfill\dorecurse{1}{我word} %

\myfill\dorecurse{2}{我word} %

\myfill\dorecurse{3}{我word} %

\myfill\dorecurse{8}{我word} %

\stoptext

[-- Attachment #5: Type: text/plain, Size: 496 bytes --]

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / https://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : https://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : https://contextgarden.net
___________________________________________________________________________________