ntg-context - mailing list for ConTeXt users
* Typesetting unicode characters
From: Thangalin via ntg-context @ 2022-03-30  7:32 UTC
  To: mailing list for ConTeXt users; +Cc: Thangalin

Hi list,

An XML document includes the 👍 emoji, as shown in the following snippet:

<html>
  <head><meta charset="utf8"/></head>
  <body>
    <div class="bubblerx">
      <p>Thumbs up emoji: &#55357;&#56397;</p>
    </div>
  </body>
</html>

The document is typeset using ConTeXt, but the thumbs-up emoji is missing
from the PDF. It does not render with either the Noto Emoji or the Open Sans
Emoji font.

Does anyone have a minimal example that shows how to typeset such escaped
entities?

When the emoji is added directly to a document, it works fine:

\definefont [TextFontEmoji] [opensansemoji]

\starttext
  \TextFontEmoji{Thumbs up emoji: 👍}
\stoptext

Is there something special that needs to be set for ConTeXt to interpret
the escaped Unicode values as an emoji?

Thank you!

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________

* Re: Typesetting unicode characters
From: Arthur Rosendahl via ntg-context @ 2022-03-30  7:48 UTC
  To: Mailing list for ConTeXt users; +Cc: Arthur Rosendahl

On Wed, Mar 30, 2022 at 12:32:11AM -0700, Thangalin via ntg-context wrote:
> An XML document includes the 👍 emoji, as shown in the following snippet:
> 
> <html>
>   <head><meta charset="utf8"/></head>
>   <body>
>     <div class="bubblerx">
>       <p>Thumbs up emoji: &#55357;&#56397;</p>

  Try the correct escape sequence :-)  That’s &#128077; -- or
equivalently &#x1F44D;
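The arithmetic behind this answer: 55357 and 56397 are the UTF-16 high and
low surrogates 0xD83D and 0xDC4D, and combining them yields the single code
point that an XML character reference has to name. A minimal Java sketch
(the class and method names are illustrative, not from the thread) that
does the combination by hand and cross-checks it with the JDK's
Character.toCodePoint:

```java
public class SurrogateDecode {
    // Combines a UTF-16 high/low surrogate pair into one Unicode code point:
    // cp = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
    static int decode(int high, int low) {
        return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = decode(55357, 56397); // the two references from the question
        System.out.println(cp); // 128077
        // Cross-check against the JDK's own surrogate-pair decoder.
        System.out.println(cp == Character.toCodePoint((char) 55357, (char) 56397)); // true
        // The two valid spellings of the reference, decimal and hexadecimal.
        System.out.printf("&#%d; or &#x%X;%n", cp, cp); // &#128077; or &#x1F44D;
    }
}
```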

	Best,

		Arthur

* Re: Typesetting unicode characters
From: Thangalin via ntg-context @ 2022-03-31  8:06 UTC
  To: Mailing list for ConTeXt users; +Cc: Thangalin

On the off chance that someone else stumbles across this problem...

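By default, the Xalan-based transformer that ships with Java does not
correctly encode emojis when serializing XML. Instead of &#x1F44D; for the
thumbs-up emoji, Xalan writes &#55357;&#56397;, a separate reference for
each half of the UTF-16 surrogate pair. As Arthur pointed out, this is not
a valid encoding: XML character references must name a Unicode code point,
and surrogate code points are not legal XML characters.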
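The failure mode is easy to reproduce, because a Java string holds U+1F44D
as two UTF-16 chars. The sketch below illustrates the mechanism and is not
Xalan's actual code: a hypothetical escaper that works char by char emits
the invalid pair of references, while one that iterates over code points
emits a single valid reference. Markup characters such as < and & are
deliberately not handled.

```java
public class EmojiEscaping {
    // Hypothetical char-by-char escaper: each non-ASCII UTF-16 unit becomes
    // its own reference, so a surrogate pair turns into two invalid references.
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    // Hypothetical code-point escaper: String.codePoints() joins each
    // surrogate pair first, so the emoji becomes one valid reference.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("&#x%X;", cp));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String thumbsUp = "\uD83D\uDC4D"; // U+1F44D stored as two Java chars
        System.out.println(thumbsUp.length());            // 2
        System.out.println(escapePerChar(thumbsUp));      // &#55357;&#56397;
        System.out.println(escapePerCodePoint(thumbsUp)); // &#x1F44D;
    }
}
```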

One solution is to use Saxonica's Saxon 11 transformer, which produces the
expected output:

  <html>
    <head><meta charset="utf8"/></head>
    <body>
      <p id="caret">the 👍 emoji</p>
    </body>
  </html>

In Java, switching to Saxon entails adding the Saxon JAR files and the XML
resolver library it depends on to the classpath. Then set the system
property before calling TransformerFactory.newInstance():

  System.setProperty( "javax.xml.transform.TransformerFactory",
                      "net.sf.saxon.TransformerFactoryImpl" );
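A fuller sketch of the switch, assuming the Saxon JAR may or may not be on
the classpath. The hypothetical preferSaxon helper sets the property only
when net.sf.saxon.TransformerFactoryImpl can be loaded and otherwise leaves
the JDK's default factory in place (that fallback is an assumption of this
sketch); the hypothetical identity helper runs a plain identity transform
and serializes the result as UTF-8.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SaxonSwitch {
    static final String SAXON = "net.sf.saxon.TransformerFactoryImpl";

    // Selects Saxon if its class is on the classpath. The property is read by
    // TransformerFactory.newInstance(), so this must run before that call.
    static boolean preferSaxon() {
        try {
            Class.forName(SAXON);
            System.setProperty("javax.xml.transform.TransformerFactory", SAXON);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // Saxon missing: keep the JDK's default factory
        }
    }

    // Identity transform: parses the XML string and re-serializes it as UTF-8.
    static String identity(String xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        boolean saxon = preferSaxon();
        System.out.println("Using Saxon: " + saxon);
        System.out.println(identity("<p>the &#x1F44D; emoji</p>"));
    }
}
```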

ConTeXt handles the emoji from the transformed XML file without any issues.

Thank you, Arthur.
