public inbox archive for pandoc-discuss@googlegroups.com
From: Chris Jones <cjns1989-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: pandoc correctly translates U+2024 thin space to '\,' but the spaces in PDF created by Xelatex are full-width
Date: Sun, 2 Feb 2020 18:24:20 -0800 (PST)
Message-ID: <5f3b2ff3-b74f-4ba5-858b-b08b13124190@googlegroups.com>
In-Reply-To: <818817e7-17c7-4bf4-b9fb-e300f6faaf37-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


Good… was beginning to wonder with all those wicked viruses going the 
rounds… whether I would get some form of reply.

The problem, I think, turns out to be related to the way I write French 
punctuation. I learned the hard way about this peculiarity of French 
typography, where you must put non-breaking thin spaces where they 
belong, and I made a point of doing it by hand (in vim I have mapped CTRL+K 
to U+202F). Now, judging from his name, I guess that the gentleman who 
initially came up with the polyglossia package is a Frenchman. As such he 
was very focused on this particular typesetting quirk and decided to ensure 
that this thin-space business would be taken care of automatically by his 
package. I believe this is the way word processors handle it when you tell 
them that the material is written in French (?).

So what I am beginning to suspect is that pandoc invokes the polyglossia 
package correctly, but since I do not rely on the package doing this for 
me, it may very well turn out that polyglossia inserts a *second thin 
space* next to the one that's already there, adding up to what looks 
basically like a regular-width space. I am not sure how I could check the 
resulting PDF to verify this hunch of mine. Another approach would 
obviously be to run a few regexes to get rid of all those U+202F's in my 
.md files and see what happens…
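
One way to test this hunch without touching the .md files might be a tiny 
file, compiled with xelatex, that sets the same words with and without the 
manual thin spaces. This is only a sketch: the article class and the 
default font are assumptions made to keep it short, and the test words are 
borrowed from the sample quoted below.

```latex
% Sketch only: compile with xelatex and compare the gap before each "!".
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{french}
\begin{document}

% (a) manual thin spaces, as pandoc writes them for U+202F
«\,Crétins\,!\,»

% (b) no manual spaces: any gap here comes from polyglossia alone
«Crétins!»

\end{document}
```

If line (b) already shows a thin gap and line (a) shows a visibly wider 
one, that would support the idea that the two spaces are being added 
together.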

I took a look at the polyglossia doc as you suggested and I did find 
indications that polyglossia does add thin spaces automatically (that's the 
kind of feature that makes polyglossia more 'modern' than babel, I imagine) 
and that you can use the 'autospacing=false' option should you need to 
disable the feature.

Obviously most people would NOT want pandoc to disable the feature…

So it's really up to me to change my usepackage polyglossia invocation to 
make sure my thin spaces are left alone.

What would be the recommended way to do this… hard-code something like 
'\usepackage[autospacing=false]{polyglossia}' in my 
~/.pandoc/templates/default.latex, I imagine?
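
A hedged sketch of what that template change might look like. As far as I 
can tell, the polyglossia documentation lists autospacing among the options 
of the French language definition rather than of the package itself, so it 
is assumed here to belong on \setmainlanguage; that placement should be 
checked against the installed polyglossia version.

```latex
% Sketch only: replaces the two polyglossia lines in the xelatex/luatex
% branch of the template.
\usepackage{polyglossia}
% Assumption: autospacing is a per-language option, so it is passed to
% \setmainlanguage and not to \usepackage.
\setmainlanguage[autospacing=false]{french}
```

With automatic spacing off, the '\,' that pandoc emits for each U+202F 
should be the only space set before the punctuation.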

Thanks,

CJ

P.S. how can I fix the typo in the issue's title… U+2024 instead of the 
intended U+202f?

On Saturday, February 1, 2020 at 2:18:30 PM UTC-5, Chris Jones wrote:
>
> Searched online for similar cases and didn't find anything relevant.
>
> The context is that I recently was made aware that the French insist that 
> a *thin space* be inserted immediately before some punctuation characters 
> *',:!?»%*' etc.… So in dialogs for instance e.g. … the .md source has: « 
> · bonjour mademoiselle · » where the middle dots represent a single U+202f 
> non-breaking space.
>
> When I take a look at the intermediate .tex file that pandoc generates the 
> thin spaces are correctly converted to '\,' which I believe is the *latex 
> way* of coding thin spaces. But when I run xelatex on the latex file and 
> look at the resulting PDF I can see that the thin spaces have become 
> regular-width spaces. 
>
> I compared the PDF output to another PDF I had created using plain latex 
> rather than pandoc and the U+202F's that I typed in my .tex source clearly 
> materialize as thin spaces in the PDF.  
>
> What I suspect at this point is that one of the latex packages that pandoc 
> sticks in the generated latex file (or the way it is invoked? perhaps a 
> combination of packages? …?) is causing this.
>
> As to an *MWE*… I'm not sure it's really appropriate in this particular 
> case…
>
> *Just in case… here's what I get from a minimal .md input file:*
>
> `\PassOptionsToPackage{unicode=true}{hyperref} % options for packages loaded elsewhere
> \PassOptionsToPackage{hyphens}{url}
> %
> \documentclass[oneside,10pt,french,]{extbook} % cjns1989 - 27112019 - added the oneside option: so that the text doesn't jump left & right when reading on a tablet/ereader
> \usepackage{lmodern}
> \usepackage{amssymb,amsmath}
> \usepackage{ifxetex,ifluatex}
> \usepackage{fixltx2e} % provides \textsubscript
> \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
>   \usepackage[T1]{fontenc}
>   \usepackage[utf8]{inputenc}
>   \usepackage{textcomp} % provides euro and other symbols
> \else % if luatex or xelatex
>   \usepackage{unicode-math}
>   \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
> %   \setmainfont[]{EBGaramond-Regular}
>     \setmainfont[Numbers={OldStyle,Proportional}]{EBGaramond-Regular}      
> % cjns1989 - 20191129 - old style numbers 
> \fi
> % use upquote if available, for straight quotes in verbatim environments
> \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
> % use microtype if available
> \IfFileExists{microtype.sty}{%
> \usepackage[]{microtype}
> \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
> }{}
> \usepackage{hyperref}
> \hypersetup{
>             pdftitle={WME},
>             pdfborder={0 0 0},
>             breaklinks=true}
> \urlstyle{same}  % don't use monospace font for urls
> \usepackage[papersize={3.75 in, 6.0 in},left=.3 in,right=.3 in]{geometry}
> \setlength{\emergencystretch}{3em}  % prevent overfull lines
> \providecommand{\tightlist}{%
>   \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
> \setcounter{secnumdepth}{0}
> % Redefines (sub)paragraphs to behave more like sections
> \ifx\paragraph\undefined\else
> \let\oldparagraph\paragraph
> \renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
> \fi
> \ifx\subparagraph\undefined\else
> \let\oldsubparagraph\subparagraph
> \renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
> \fi
> % set default figure placement to htbp
> \makeatletter
> \def\fps@figure{htbp}
> \makeatother
>
> \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
>   \usepackage[shorthands=off,main=french]{babel}
> \else
>   % load polyglossia as late as possible as it *could* call bidi if RTL lang (e.g. Hebrew or Arabic)
>   \usepackage{polyglossia}
>   \setmainlanguage[]{french}
> \fi
>
> \title{WME}
> \date{}
>
> \begin{document}
> \maketitle
>
> \$ ECM
>
> \hypertarget{wme-title}{%
> \chapter{WME (title)}\label{wme-title}}
>
> en lettres capitales, soigneusement imprimées au pochoir\,:
>
> --- «\,Crétins\,!\,» murmura-t-il.
>
> \end{document}`
>
> *Customization* is minimal: old style numbers (proportional) and 
> one-sided since the document is not destined for hard-copy printing…
>
> What I have in mind at this point to try and figure out what is happening 
> is to work with a one line .md source that has some U+202F's and remove 
> default packages until the problem goes away but before I do this I thought 
> maybe someone has run into something similar or might suggest a better 
> approach than plain trial and error to help determine the cause of the 
> problem.
>
> Thoughts?
>
> Thanks,
>
> CJ
>
>


Thread overview: 7+ messages
2020-02-01 19:18 Chris Jones
     [not found] ` <818817e7-17c7-4bf4-b9fb-e300f6faaf37-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-02-02 22:36   ` Chris Jones
     [not found]     ` <158fd0ac-89bc-4fb1-9920-386bf325dad6-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-02-02 23:49       ` John MacFarlane
2020-02-03  2:24   ` Chris Jones [this message]
2020-02-03  4:12   ` Chris Jones
     [not found]     ` <561d210b-ceb6-4f9a-98e2-556f8e12e2ca-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-02-03 17:02       ` John MacFarlane
2020-02-04  2:04   ` Chris Jones
