ntg-context - mailing list for ConTeXt users
* [NTG-context] On Unicode and ConTeXt (emojis)
@ 2026-03-04  2:54 Jairo A. del Rio
  2026-03-04 10:10 ` [NTG-context] " Hans Hagen via ntg-context
From: Jairo A. del Rio @ 2026-03-04  2:54 UTC
  To: mailing list for ConTeXt users


[-- Attachment #1.1: Type: text/plain, Size: 763 bytes --]

Hi again, list. char-emj.lua is automatically generated, I guess from here?
https://www.unicode.org/Public/17.0.0/emoji/emoji-test.txt. Well, I
regenerated the data with a custom script (attached) in order to test the newest
emojis, and I found something weird. Emojis whose names include double quotes
(U+201C and U+201D) cannot be accessed with either English (curly) quotes or
ASCII double quotes (example attached). However, direct access by codepoint
works fine. So, my questions are:

1. How is the data in char-emj.lua generated? A script in the distribution
would help to ease and speed up updates after Unicode releases.
2. Is the double quote issue expected, or should it be fixed? I think
ASCII-only names would be easier to type, but that's just my opinion.
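Part of the mismatch is visible at the byte level: a name copied verbatim from emoji-test.txt keeps U+201C and U+201D, which UTF-8 encodes as E2 80 9C and E2 80 9D, so a table key built from it can never equal the same name typed with ASCII `"` (byte 22). A minimal Lua sketch, assuming a plain table keyed by the lowercased name as the attached script builds it (the `emojis` table and `hexbytes` helper here are hypothetical):

```lua
-- Hypothetical lookup table keyed like the attached script's output:
-- the lowercased name copied verbatim from emoji-test.txt.
local emojis = { ["japanese \u{201C}bargain\u{201D} button"] = { 0x1F250 } }

-- The same name typed with ASCII double quotes.
local typed = 'japanese "bargain" button'

-- Render each byte of a string as two hex digits, separated by spaces.
local function hexbytes(s)
	local t = {}
	for i = 1, #s do
		t[#t + 1] = string.format("%02X", s:byte(i))
	end
	return table.concat(t, " ")
end

print(emojis[typed])        --> nil (no such key)
print(hexbytes("\u{201C}")) --> E2 80 9C
print(hexbytes('"'))        --> 22
```

This only shows why ASCII quotes miss; it does not explain why typing the curly quotes in the TeX source also failed.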

Regards,

Jairo

[-- Attachment #1.2: Type: text/html, Size: 927 bytes --]

[-- Attachment #2: emojitest.lua --]
[-- Type: text/x-lua, Size: 1054 bytes --]

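-- Regenerate emoji name -> codepoints data from Unicode's emoji-test.txt.
-- Note: lpeg.utfR needs LPeg 1.1 or later, and table.tofile is not standard
-- Lua; it comes from ConTeXt's Lua helper libraries.
local lpeg = require("lpeg")
local Cg, Ct, utfR, P, R, V, match = lpeg.Cg, lpeg.Ct, lpeg.utfR, lpeg.P, lpeg.R, lpeg.V, lpeg.match

local emojis = {}

-- A data line looks like:
--   <hex codepoints> ; <status> # <emoji> E<version> <name>
local grammar = P({
	"grammar",
	-- one hexadecimal codepoint, converted to a number
	hex = (R("09", "af", "AF") ^ 1) / function(c)
		return tonumber(c, 16)
	end,
	-- a run of UTF-8 characters other than a space
	word = (utfR(0, 0x10FFFF) - P(" ")) ^ 1,
	-- space-separated words (the emoji name)
	words = V("word") * (P(" ") * V("word")) ^ 0,
	-- space-separated codepoints, collected in a table
	cps = Ct(V("hex") * (P(" ") * V("hex")) ^ 0),
	grammar = Ct(
		Cg(V("cps"), "codepoints")
			* P(" ") ^ 1
			* P(";")
			* P(" ") ^ 1
			* Cg(V("word"), "qualified") -- status, e.g. fully-qualified
			* P(" ") ^ 1
			* P("#")
			* P(" ") ^ 1
			* V("word") -- the emoji itself
			* P(" ") ^ 1
			* V("word") -- the E<version> field
			* P(" ") ^ 1
			* Cg(V("words"), "name")
			* P(-1)
	),
})

for line in io.lines("./emoji-test.txt") do
	-- comment and blank lines do not match, so match returns nil
	local matches = match(grammar, line)
	if matches and matches.qualified ~= "unqualified" then
		local name = matches.name:lower()
		-- keep the first entry per name: the fully-qualified form comes
		-- before its minimally-qualified variant, which shares the same
		-- name and would otherwise overwrite it
		if not emojis[name] then
			emojis[name] = matches.codepoints
		end
	end
end

-- Is this OK?
table.tofile("char-emj.lua", emojis, true, nil, false, true)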
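For comparison, the same field extraction can be sketched with plain Lua string patterns, which avoids the dependency on `lpeg.utfR`. It assumes the data-line layout shown in the comment and that a fully-qualified line precedes its variants in the file; `parseline` and `loademojis` are hypothetical helpers, not part of ConTeXt.

```lua
-- Parse one data line of emoji-test.txt, assuming the layout
--   "<hex codepoints> ; <status> # <emoji> E<version> <name>".
-- Returns nil for comments, blank lines, and anything else that does not fit.
local function parseline(line)
	local cps, status, rest = line:match("^([%x ]-)%s*;%s*(%S+)%s*#%s*(.-)%s*$")
	if not cps then
		return nil
	end
	-- rest holds "<emoji> E<version> <name>"; skip the first two fields
	local name = rest:match("^%S+%s+E%d+%.%d+%s+(.+)$")
	if not name then
		return nil
	end
	local codepoints = {}
	for hex in cps:gmatch("%x+") do
		codepoints[#codepoints + 1] = tonumber(hex, 16)
	end
	if #codepoints == 0 then
		return nil
	end
	return { codepoints = codepoints, qualified = status, name = name }
end

-- Hypothetical loader mirroring the attached script: skip unqualified
-- lines and keep the first entry seen for each lowercased name.
local function loademojis(path)
	local emojis = {}
	for line in io.lines(path) do
		local entry = parseline(line)
		if entry and entry.qualified ~= "unqualified" then
			local key = entry.name:lower()
			emojis[key] = emojis[key] or entry.codepoints
		end
	end
	return emojis
end
```

Because the status and emoji fields are split off first, names that themselves contain `#` (such as the keycap entries) still parse.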

[-- Attachment #3: emojitest.pdf --]
[-- Type: application/pdf, Size: 7451 bytes --]

[-- Attachment #4: emojitest.tex --]
[-- Type: text/x-tex, Size: 235 bytes --]

\definefontfeature[colored][default][ccmp=yes,dist=yes,sbix=yes]
\definefont[emoj][file:NotoColorEmoji.ttf*colored]

\startTEXpage

\emoj\emoji{japanese “bargain” button} % This doesn't work
\zwj\Ux{1F250} % This does

\stopTEXpage

[-- Attachment #5: Type: text/plain, Size: 511 bytes --]

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage  : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive  : https://github.com/contextgarden/context
wiki     : https://wiki.contextgarden.net
___________________________________________________________________________________

* [NTG-context] Re: On Unicode and ConTeXt (emojis)
  2026-03-04  2:54 [NTG-context] On Unicode and ConTeXt (emojis) Jairo A. del Rio
@ 2026-03-04 10:10 ` Hans Hagen via ntg-context
From: Hans Hagen via ntg-context @ 2026-03-04 10:10 UTC
  To: ntg-context; +Cc: Hans Hagen

On 3/4/2026 3:54 AM, Jairo A. del Rio wrote:
> Hi again, list. char-emj.lua is automatically generated, I guess from 
> here? https://www.unicode.org/Public/17.0.0/emoji/emoji-test.txt. Well, I 
> regenerated data with a custom script (attached) in order to test newest 
> emojis and I find something weird. Emojis whose names include double 
> quotes (Ux201C and Ux201D) cannot be accessed nor with English quotes 
> nor with ASCII double quotes (example attached). However, direct access 
> by codepoint works fine. So, my questions are:
> 
> 1. How is data from char-emj.lua generated? A script in the distribution 
> would help to ease and speed updates after Unicode releases.

it has always been there: mtxrun --script unicode

concerning speed: if i know it, i do it ... also, we need to check it, 
so any update outside the distribution is kind of unsupported

it's not as if unicode updates are critical nowadays, so we can permit some 
delay until a distribution update happens

> 2. Is the double quote issue expected or should it be fixed? I think 
> ASCII-only names would be easier to type, but it's just my opinion

I'll strip these quotes in the database generator as well as the 
resolver so that

\startTEXpage[offset=10pt]
     \emoj \showglyphs
     \emoji{japanese “bargain” button}%
     \space
     \emoji{japanese bargain button}%
\stopTEXpage

both work.
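The plan is to strip the quotes in both the database generator and the resolver, so that every spelling reduces to the same key. A minimal Lua sketch of that idea, assuming a plain name-to-codepoints table; `normalize`, `register`, and `resolve` are hypothetical names, not the actual ConTeXt code:

```lua
-- Hypothetical key normalizer: lowercase the name and drop ASCII and
-- typographic double quotes so that all spellings agree.
local function normalize(name)
	local key = name:lower()
	key = key:gsub("\u{201C}", "") -- left double quotation mark
	key = key:gsub("\u{201D}", "") -- right double quotation mark
	key = key:gsub('"', "")        -- ASCII double quote
	return key
end

-- Generator side: store every entry under its normalized name.
local emojis = {}
local function register(name, codepoints)
	emojis[normalize(name)] = codepoints
end

-- Resolver side: normalize what the user typed before the lookup.
local function resolve(name)
	return emojis[normalize(name)]
end

register("Japanese \u{201C}bargain\u{201D} button", { 0x1F250 })
print(resolve("japanese \u{201C}bargain\u{201D} button")[1] == 0x1F250) --> true
print(resolve('japanese "bargain" button')[1] == 0x1F250)             --> true
print(resolve("japanese bargain button")[1] == 0x1F250)               --> true
```

Normalizing on both sides is what makes this work: stripping only in the generator would break lookups that still contain quotes, and stripping only in the resolver would miss keys stored with them.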

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
