* [NTG-context] On Unicode and ConTeXt (emojis)
@ 2026-03-04 2:54 Jairo A. del Rio
From: Jairo A. del Rio @ 2026-03-04 2:54 UTC (permalink / raw)
To: mailing list for ConTeXt users
Hi again, list. char-emj.lua is automatically generated, I guess from
https://www.unicode.org/Public/17.0.0/emoji/emoji-test.txt? I regenerated
the data with a custom script (attached) in order to test the newest emojis,
and I found something odd. Emojis whose names include double quotes (U+201C
and U+201D) cannot be accessed by name, neither with the curly quotes nor
with ASCII double quotes (example attached). However, direct access by
codepoint works fine. So, my questions are:
1. How is the data in char-emj.lua generated? A script in the distribution
would make updates after Unicode releases easier and faster.
2. Is the double quote issue expected, or should it be fixed? I think
ASCII-only names would be easier to type, but that's just my opinion.
Regards,
Jairo
[-- Attachment #2: emojitest.lua --]
[-- Type: text/x-lua, Size: 1054 bytes --]
local lpeg = require("lpeg")
local Cg, Ct, P, R, V = lpeg.Cg, lpeg.Ct, lpeg.P, lpeg.R, lpeg.V
local utfR, match = lpeg.utfR, lpeg.match

local emojis = {}

-- A data line in emoji-test.txt has the form:
--   <codepoints> ; <status> # <emoji> <version> <name>
local grammar = P({
    "grammar",
    -- one hexadecimal codepoint, converted to a number
    hex = (R("09", "af", "AF") ^ 1) / function(c)
        return tonumber(c, 16)
    end,
    -- a run of UTF-8 characters up to the next space
    word = (utfR(0, 0x10FFFF) - P(" ")) ^ 1,
    words = V("word") * (P(" ") * V("word")) ^ 0,
    -- space-separated codepoints, collected in a table
    cps = Ct(V("hex") * (P(" ") * V("hex")) ^ 0),
    grammar = Ct(
        Cg(V("cps"), "codepoints")
        * P(" ") ^ 1
        * P(";")
        * P(" ") ^ 1
        * Cg(V("word"), "qualified")
        * P(" ") ^ 1
        * P("#")
        * P(" ") ^ 1
        * V("word") -- the emoji itself
        * P(" ") ^ 1
        * V("word") -- the version field
        * P(" ") ^ 1
        * Cg(V("words"), "name")
        * P(-1)
    ),
})

-- Comment and blank lines do not match the grammar and are skipped.
for line in io.lines("./emoji-test.txt") do
    local matches = match(grammar, line)
    if matches then
        if matches["qualified"] ~= "unqualified" then
            -- a later line with the same name overwrites an earlier one
            emojis[matches["name"]:lower()] = matches["codepoints"]
        end
    end
end

-- Is this OK?
-- (table.tofile comes from the ConTeXt Lua libraries)
table.tofile("char-emj.lua", emojis, true, nil, false, true)
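A minimal sketch in plain Lua (no lpeg, no ConTeXt libraries) of what ends up as the table key for one such entry. The sample line is an illustrative reconstruction of the emoji-test.txt format, not copied from the file, and `parse` is a hypothetical helper that is not part of the attached script. It only shows that the stored key keeps the curly quotes; it does not explain why the lookup with curly quotes fails inside ConTeXt.

```lua
-- Illustrative data line (assumed format: cps ; status # emoji version name).
local sample = "1F250 ; fully-qualified # 🉐 E0.6 Japanese “bargain” button"

-- Hypothetical helper: split one data line with Lua string patterns.
-- Returns nil for comment lines and anything else that does not fit.
local function parse(line)
    local cps, status, name =
        line:match("^(%x[%x ]-)%s*;%s*(%S+)%s*#%s*%S+%s+%S+%s+(.+)$")
    if not cps then
        return nil
    end
    local codepoints = {}
    for hex in cps:gmatch("%x+") do
        codepoints[#codepoints + 1] = tonumber(hex, 16)
    end
    -- string.lower leaves the multi-byte quotes untouched in the C locale
    return { codepoints = codepoints, qualified = status, name = name:lower() }
end

local entry = parse(sample)

-- The key still contains the UTF-8 bytes of U+201C and U+201D.
local has_left  = entry.name:find("\226\128\156", 1, true) ~= nil
local has_right = entry.name:find("\226\128\157", 1, true) ~= nil

print(entry.name, string.format("U+%X", entry.codepoints[1]), has_left, has_right)
```

With keys stored like this, a plain table lookup using ASCII double quotes cannot hit the entry, since the byte sequences differ.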
[-- Attachment #3: emojitest.pdf --]
[-- Type: application/pdf, Size: 7451 bytes --]
[-- Attachment #4: emojitest.tex --]
[-- Type: text/x-tex, Size: 235 bytes --]
\definefontfeature[colored][default][ccmp=yes,dist=yes,sbix=yes]
\definefont[emoj][file:NotoColorEmoji.ttf*colored]
\startTEXpage
\emoj\emoji{japanese “bargain” button} % This doesn't work
\zwj\Ux{1F250} % This does
\stopTEXpage
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive : https://github.com/contextgarden/context
wiki : https://wiki.contextgarden.net
___________________________________________________________________________________
* [NTG-context] Re: On Unicode and ConTeXt (emojis)
From: Hans Hagen via ntg-context @ 2026-03-04 10:10 UTC (permalink / raw)
To: ntg-context; +Cc: Hans Hagen
On 3/4/2026 3:54 AM, Jairo A. del Rio wrote:
> Hi again, list. char-emj.lua is automatically generated, I guess from
> here? https://www.unicode.org/Public/17.0.0/emoji/emoji-test.txt. Well, I
> regenerated data with a custom script (attached) in order to test newest
> emojis and I find something weird. Emojis whose names include double
> quotes (Ux201C and Ux201D) cannot be accessed nor with English quotes
> nor with ASCII double quotes (example attached). However, direct access
> by codepoint works fine. So, my questions are:
>
> 1. How is data from char-emj.lua generated? A script in the distribution
> would help to ease and speed updates after Unicode releases.
it has always been there: mtxrun --script unicode
concerning speed: if i know it, i do it ... also, we need to check it,
so any update outside the distribution is kind of unsupported
it's not like unicode updates are critical nowadays, so we can permit some
delay till a distribution update happens
> 2. Is the double quote issue expected or should it be fixed? I think
> ASCII-only names would be easier to type, but it's just my opinion
I'll strip these quotes in the database generator as well as the
resolver so that
\startTEXpage[offset=10pt]
\emoj \showglyphs
\emoji{japanese “bargain” button}%
\space
\emoji{japanese bargain button}%
\stopTEXpage
both work.
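
The idea of stripping the quotes on both sides can be sketched in plain Lua as below. This is only an illustration under stated assumptions, not the code that ships in ConTeXt: `normalize`, `register` and `resolve` are hypothetical names, and stripping ASCII double quotes and lowercasing the name are assumptions on top of what is said above.

```lua
-- Hypothetical normalizer, applied to names when the database is built
-- and again when a name is looked up, so both sides agree on the key.
local function normalize(name)
    name = name:lower()
    name = name:gsub('"', "")            -- ASCII double quote (assumption)
    name = name:gsub("\226\128\156", "") -- U+201C, UTF-8 bytes E2 80 9C
    name = name:gsub("\226\128\157", "") -- U+201D, UTF-8 bytes E2 80 9D
    name = name:gsub("%s+", " ")         -- squeeze runs of whitespace
    return name
end

local database = {}

-- Generator side: store the entry under its normalized name.
local function register(name, codepoints)
    database[normalize(name)] = codepoints
end

-- Resolver side: normalize the requested name before the lookup.
local function resolve(name)
    return database[normalize(name)]
end

register("Japanese “bargain” button", { 0x1F250 })

for _, name in ipairs({
    "japanese “bargain” button",
    'japanese "bargain" button',
    "japanese bargain button",
}) do
    local found = resolve(name)
    print(name, found and string.format("U+%X", found[1]) or "not found")
end
```

Normalizing in both places means the stored key never contains quotes, so the quoted and unquoted spellings resolve to the same entry.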
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------