From: Chris Jones
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Re: pandoc correctly translates U+2024 thin space to '\,' but the spaces in PDF created by Xelatex are full-width
Date: Sun, 2 Feb 2020 18:24:20 -0800 (PST)
Message-ID: <5f3b2ff3-b74f-4ba5-858b-b08b13124190@googlegroups.com>
References: <818817e7-17c7-4bf4-b9fb-e300f6faaf37@googlegroups.com>

Good… I was beginning to wonder, with all those wicked viruses going the
rounds, whether I would get some form of reply.

The problem, I think, turns out to be related to the way I write French
punctuation. I learned the hard way about this peculiarity of French
typography, where you must stick those non-breaking thin spaces where they
belong, and I made a point of doing it by hand (in vim I have mapped CTRL+K
to U+202F). Judging from his name, I guess that the gentleman who initially
came up with the polyglossia package is a Frenchman. As such he was very
focused on this particular typesetting quirk and decided to ensure that
this thin-space business would be taken care of automatically by his
package. I believe this is the way text processors handle it when you tell
them that the material is written in French (?).
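
To see what polyglossia does on its own, a stripped-down test file compiled
with xelatex might help. The sketch below is only an illustration: the file
name and the sample sentences are made up, and it assumes polyglossia's
French support behaves as described above. Comparing the gap before each '!'
in the PDF should show whether the package adds its own space and how that
interacts with one typed by hand.

```latex
% thinspace-test.tex (hypothetical file name); compile with: xelatex thinspace-test.tex
\documentclass{article}
\usepackage{polyglossia}   % loads fontspec; needs xelatex or lualatex
\setmainlanguage{french}
\begin{document}

% Line 1: nothing typed before the '!'; any space here comes from polyglossia.
Crétins! murmura-t-il.

% Line 2: thin space typed by hand as \, (what pandoc writes for U+202F).
Crétins\,! murmura-t-il.

\end{document}
```

If the gap on the second line is visibly wider than on the first, the
automatic spacing is probably being added on top of the manual one; if both
look the same, the cause is likely elsewhere.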

So what I am beginning to suspect is that pandoc invokes the polyglossia
package correctly, but since I do not rely on the package to do this for
me, it may very well turn out that polyglossia inserts a *second thin
space* next to the one that's already there, adding up to what looks
basically like a regular-width space. I am not sure how I could check the
resulting PDF to verify this hunch of mine. Another approach would obviously
be to run a few regexes to get rid of all those U+202F's in my .md files
and see what happens…

I took a look at the polyglossia doc as you suggested, and I did find
indications that polyglossia does add thin spaces automatically (that's the
kind of feature that makes polyglossia more 'modern' than babel, I imagine)
and that you can use the 'autospacing=false' option should you need to
disable the feature.

Obviously most people would NOT want pandoc to disable the feature…

So it's really up to me to change my \usepackage polyglossia invocation to
make sure my thin spaces are left alone.

What would be the recommended way to do this… hard-code something like
'\usepackage[autospacing=false]{polyglossia}' in my
~/.pandoc/templates/default.latex, I imagine?

Thanks,

CJ

P.S. How can I fix the typo in the issue's title… U+2024 instead of the
intended U+202F?

On Saturday, February 1, 2020 at 2:18:30 PM UTC-5, Chris Jones wrote:
>
> Searched online for similar cases and didn't find anything relevant.
>
> The context is that I recently was made aware that the French insist that
> a *thin space* be inserted immediately before some punctuation characters
> *',:!?»%*' etc.… So in dialogs for instance e.g. … the .md source has: «
> · bonjour mademoiselle · » where the middle dots represent a single U+202f
> non-breaking space.
>
> When I take a look at the intermediate .tex file that pandoc generates the
> thin spaces are correctly converted to '\,' which I believe is the *latex
> way* of coding thin spaces. But when I run xelatex on the latex file and
> look at the resulting PDF I can see that the thin spaces have become
> regular-width spaces.
>
> I compared the PDF output to another PDF I had created using plain latex
> rather than pandoc and the U+202F's that I typed in my .tex source clearly
> materialize as thin spaces in the PDF.
>
> What I suspect at this point is that one of the latex packages that pandoc
> sticks in the generated latex file (or the way it is invoked? perhaps a
> combination of packages? …?) is causing this.
>
> As to an *MWE*… I'm not sure it's really appropriate in this particular
> case…
>
> *Just in case… here's what I get from a minimal .md input file:*
>
> `\PassOptionsToPackage{unicode=true}{hyperref} % options for packages loaded elsewhere
> \PassOptionsToPackage{hyphens}{url}
> %
> \documentclass[oneside,10pt,french,]{extbook} % cjns1989 - 27112019 - added the oneside option: so that the text doesn't jump left & right when reading on a tablet/ereader
> \usepackage{lmodern}
> \usepackage{amssymb,amsmath}
> \usepackage{ifxetex,ifluatex}
> \usepackage{fixltx2e} % provides \textsubscript
> \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
>   \usepackage[T1]{fontenc}
>   \usepackage[utf8]{inputenc}
>   \usepackage{textcomp} % provides euro and other symbols
> \else % if luatex or xelatex
>   \usepackage{unicode-math}
>   \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
> %   \setmainfont[]{EBGaramond-Regular}
>     \setmainfont[Numbers={OldStyle,Proportional}]{EBGaramond-Regular}      % cjns1989 - 20191129 - old style numbers
> \fi
> % use upquote if available, for straight quotes in verbatim environments
> \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
> % use microtype if available
> \IfFileExists{microtype.sty}{%
> \usepackage[]{microtype}
> \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
> }{}
> \usepackage{hyperref}
> \hypersetup{
>             pdftitle={WME},
>             pdfborder={0 0 0},
>             breaklinks=true}
> \urlstyle{same}  % don't use monospace font for urls
> \usepackage[papersize={3.75 in, 6.0 in},left=.3 in,right=.3 in]{geometry}
> \setlength{\emergencystretch}{3em}  % prevent overfull lines
> \providecommand{\tightlist}{%
>   \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
> \setcounter{secnumdepth}{0}
> % Redefines (sub)paragraphs to behave more like sections
> \ifx\paragraph\undefined\else
> \let\oldparagraph\paragraph
> \renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
> \fi
> \ifx\subparagraph\undefined\else
> \let\oldsubparagraph\subparagraph
> \renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
> \fi
> % set default figure placement to htbp
> \makeatletter
> \def\fps@figure{htbp}
> \makeatother
>
> \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
>   \usepackage[shorthands=off,main=french]{babel}
> \else
>   % load polyglossia as late as possible as it *could* call bidi if RTL lang (e.g. Hebrew or Arabic)
>   \usepackage{polyglossia}
>   \setmainlanguage[]{french}
> \fi
>
> \title{WME}
> \date{}
>
> \begin{document}
> \maketitle
>
> \$ ECM
>
> \hypertarget{wme-title}{%
> \chapter{WME (title)}\label{wme-title}}
>
> en lettres capitales, soigneusement imprimées au pochoir\,:
>
> --- «\,Crétins\,!\,» murmura-t-il.
>
> \end{document}`
>
> *Customization* is minimal: old style numbers (proportional) and
> one-sided since the document is not destined for hard-copy printing…
>
> What I have in mind at this point to try and figure out what is happening
> is to work with a one line .md source that has some U+202F's and remove
> default packages until the problem goes away but before I do this I thought
> maybe someone has run into something similar or might suggest a better
> approach than plain trial and error to help determine the cause of the
> problem.
>
> Thoughts?
>
> Thanks,
>
> CJ
>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/5f3b2ff3-b74f-4ba5-858b-b08b13124190%40googlegroups.com.