public inbox archive for pandoc-discuss@googlegroups.com
* U+200B and LaTeX
@ 2019-09-18 10:05 nopria
       [not found] ` <45688658-4762-4910-b8d1-a28a23efd91c-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: nopria @ 2019-09-18 10:05 UTC
  To: pandoc-discuss


Converting from DocBook to LaTeX, I came across what may be incorrect 
handling of U+200B (zero-width space).
The following DocBook MWE (the simple string "...abc")

<?xml version="1.0" encoding="UTF-8"?>
<?asciidoc-toc?>
<?asciidoc-numbered?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
<simpara>&#8230;&#8203;abc</simpara>
</article>

is converted to LaTeX

\ldots​abc

with a zero-width space (invisible, but present in the actual output) 
between "\ldots" and "abc".

I think that the correct LaTeX output should be

\ldots abc

with a standard space after `\ldots`, because if you try to produce a PDF 
you get

[WARNING] Missing character: There is no ​ (U+200B) in font [lmroman10-regular]:mapping=tex-text;!

because of the zero-width space, whereas with a standard space you get the 
correct output ("...abc" and not "... abc") in the PDF, with no warnings: 
TeX discards a space that directly follows a control word such as \ldots.
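
Since the character is invisible, one rough way to confirm it is really in the generated LaTeX is a short Python sketch like the one below. It is not part of pandoc; the function name `find_zero_width_spaces` is made up for illustration.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def find_zero_width_spaces(text):
    """Return 1-based (line, column) pairs for every U+200B in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == ZWSP:
                hits.append((line_no, col_no))
    return hits


# The LaTeX line from the MWE, with the zero-width space written as an escape.
print(find_zero_width_spaces("\\ldots\u200babc"))  # prints [(1, 7)]
```

The same line with a standard space instead of U+200B yields an empty list.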

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/45688658-4762-4910-b8d1-a28a23efd91c%40googlegroups.com.


* Re: U+200B and LaTeX
       [not found] ` <45688658-4762-4910-b8d1-a28a23efd91c-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-09-18 16:24   ` John MacFarlane
       [not found]     ` <m25zlp4lon.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: John MacFarlane @ 2019-09-18 16:24 UTC
  To: nopria, pandoc-discuss


The question is how we should render U+200B (zero-width space)
in LaTeX. Currently we just output the Unicode character
(which should work okay with xelatex anyway).

Is there a better way?

We could just output {}, for example.
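
To make the `{}` idea concrete, here is a minimal sketch that post-processes an already generated LaTeX string. It is only an illustration under that assumption, not how pandoc's LaTeX writer is implemented, and the name `replace_zero_width_space` is hypothetical.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def replace_zero_width_space(latex, replacement="{}"):
    """Replace every U+200B in a LaTeX string with `replacement`.

    The default "{}" ends a preceding control word such as \\ldots
    without adding a visible space.
    """
    return latex.replace(ZWSP, replacement)


print(replace_zero_width_space("\\ldots\u200babc"))  # prints \ldots{}abc
```

An empty group does not keep the line-break opportunity that U+200B represents; passing something like `\hspace{0pt}` as `replacement` would be one way to keep it.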

It's probably worth putting an issue on the tracker.





* Re: U+200B and LaTeX
       [not found]     ` <m25zlp4lon.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2019-09-18 18:21       ` nopria
  0 siblings, 0 replies; 3+ messages in thread
From: nopria @ 2019-09-18 18:21 UTC
  To: pandoc-discuss



I opened the issue https://github.com/jgm/pandoc/issues/5756



