From: nopria
Newsgroups: gmane.text.pandoc
Subject: Re: U+200B and LaTeX
Date: Wed, 18 Sep 2019 11:21:55 -0700 (PDT)

I opened the issue https://github.com/jgm/pandoc/issues/5756

On Wednesday, 18 September 2019 18:24:40 UTC+2, John MacFarlane wrote:

The question is how we should render U+200B zero-width space
in LaTeX. Currently we are just outputting the Unicode character
(which should work okay with xelatex anyway).

Is there a better way?

We could just output {}, for example.

It's probably worth putting an issue on the tracker.

nopria <mmj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Converting from DocBook to LaTeX, I came across what may be incorrect
> handling of U+200B.
> The following DocBook MWE (the simple string "...abc")
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?asciidoc-toc?>
> <?asciidoc-numbered?>
> <article xmlns="http://docbook.org/ns/docbook"
>          xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
> <simpara>&#8230;&#8203;abc</simpara>
>
> is converted to LaTeX
>
> \ldots​abc
>
> with an invisible (but detectable in the real output) zero-width space
> between "\ldots" and "abc".
>
> I think that the correct LaTeX output should be
>
> \ldots abc
>
> with a standard space after `\ldots`, because if you try to produce a PDF
> you get
>
> [WARNING] Missing character: There is no ÔÇï (U+200B) in font [lmroman10-regular]:mapping=tex-text;!
>
> because of the presence of the zero-width space, whereas with the standard
> space you get the correct output ("...abc" and not "... abc") in the PDF (and
> no warnings).
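
As a rough sketch of the `{}` option suggested above (not pandoc's actual code, which is written in Haskell), here is a hypothetical Python helper, `latex_zwsp`, that assumes its input is LaTeX text that has already been escaped and simply replaces every U+200B with an empty group:

```python
# Hypothetical helper for illustration only: this is NOT pandoc's LaTeX
# writer. It assumes the input string is already-escaped LaTeX.
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def latex_zwsp(tex: str) -> str:
    """Replace each U+200B with an empty group `{}`.

    `{}` prints nothing, but it still terminates a preceding control word,
    so removing U+200B cannot leave `\\ldots` glued to `abc` as the
    undefined command `\\ldotsabc`.
    """
    return tex.replace(ZWSP, "{}")


if __name__ == "__main__":
    # The output from the report: \ldots, then U+200B, then abc.
    print(latex_zwsp("\\ldots" + ZWSP + "abc"))  # prints \ldots{}abc
```

A plain space, as proposed in the report, gives the same PDF output right after a control word such as `\ldots`, because TeX discards spaces that follow a control word. Between ordinary letters, however, a space would print as a visible gap, while `{}` prints nothing in either position.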