From: Thangalin via ntg-context
Newsgroups: gmane.comp.tex.context
Subject: Re: Typesetting unicode characters
Date: Thu, 31 Mar 2022 01:06:27 -0700
To: Mailing list for ConTeXt users

On the rare chance that someone else stumbles across this problem ...

By default, Java's Xalan transformer for creating XML documents does not correctly encode emojis. Instead of &#x1F44D; for the thumbs up emoji, Xalan encodes it as &#55357;&#56397;. As Arthur pointed out, this is not a valid entity encoding.

One solution is to use Saxonica's Saxon 11 transformer, which produces the expected output:

  <html>
    <head><meta charset="utf8"/></head>
    <body>
      <p id="caret">the 👍 emoji</p>
    </body>
  </html>
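
For anyone wondering where the two decimal numbers come from: they match the UTF-16 surrogate pair that Java uses internally to store U+1F44D. The sketch below only illustrates that arithmetic. The class and method names are made up for this example, and it makes no claim about how Xalan is implemented.

```java
final class EmojiEntity {
    private EmojiEntity() {}

    /** Writes one hexadecimal character reference per Unicode code point. */
    static String byCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp ->
            out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';'));
        return out.toString();
    }

    /** Writes one decimal reference per UTF-16 code unit, which splits
     *  characters outside the Basic Multilingual Plane into two surrogates. */
    static String byCodeUnit(String text) {
        StringBuilder out = new StringBuilder();
        for (char unit : text.toCharArray()) {
            out.append("&#").append((int) unit).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build the thumbs up emoji from its code point, U+1F44D.
        String thumbsUp = new String(Character.toChars(0x1F44D));
        System.out.println(byCodePoint(thumbsUp)); // prints &#x1F44D;
        System.out.println(byCodeUnit(thumbsUp));  // prints &#55357;&#56397;
    }
}
```

Surrogate values (55296 through 57343) are not legal XML characters on their own, which is why the second form is rejected.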

In Java, switching to Saxon entails installing the JAR files for Saxon and its resolvers. Then set the system property before invoking the XML transformer:

  System.setProperty(
    "javax.xml.transform.TransformerFactory",
    "net.sf.saxon.TransformerFactoryImpl" );

ConTeXt handles the emoji from the transformed XML file without any issues.

Thank you, Arthur.
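
For anyone wanting to try the switch, here is a minimal sketch of setting the property and then running an identity transform through the standard JAXP API. The class name, method names, and sample input are hypothetical. It assumes the Saxon JAR files are already on the classpath; without them, TransformerFactory.newInstance() fails with a TransformerFactoryConfigurationError.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SaxonSwitch {
    static final String FACTORY_KEY = "javax.xml.transform.TransformerFactory";
    static final String SAXON_FACTORY = "net.sf.saxon.TransformerFactoryImpl";

    private SaxonSwitch() {}

    /** Must run before the first TransformerFactory.newInstance() call. */
    static void useSaxon() {
        System.setProperty(FACTORY_KEY, SAXON_FACTORY);
    }

    /** Copies the input XML to a string using whichever factory JAXP selects. */
    static String identityTransform(String xml) throws TransformerException {
        // newInstance() consults the system property set in useSaxon().
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        useSaxon();
        // \uD83D\uDC4D is the thumbs up emoji written as Java escapes.
        System.out.println(
            identityTransform("<p id=\"caret\">the \uD83D\uDC4D emoji</p>"));
    }
}
```

Setting the property in code affects the whole JVM. Passing -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl on the command line is an equivalent alternative.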