From: david.pw.smith-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Newsgroups: gmane.text.pandoc
Subject: Re: ligatures in html
Date: Tue, 22 Sep 2015 02:52:58 -0700 (PDT)
Message-ID: <874daeba-ced3-4d7d-b2ad-b0178e5a079b@googlegroups.com>
References: <7d633ff1-c25d-436c-a66f-9a8456699db6@googlegroups.com>
To: pandoc-discuss

I think the more obvious problem is that LaTeX doesn't handle UTF-8 well.
If UTF-8 is properly supported then you don't need escape characters at
all, you can just type the æ. These examples would work fine in both html
and XeTeX/XeLaTeX:

    echo "... ærobic" | pandoc -s -f markdown -t html -o test.html

    echo "... ærobic" | pandoc -s -f markdown -t latex -o test.tex

You can see for yourself with:

    echo "... ærobic" | pandoc -s -f markdown --latex-engine=xelatex -o test.pdf

Indeed, for me, Pandoc's default tex template outputs the unicode
characters correctly, but this could be because my environment is set up to
use XeLaTeX.

Hope that clarifies a bit?

On Monday, 21 September 2015 11:57:37 UTC+1, Chris Wright wrote:
>
> I want to publish a document with an \ae ligature to html and to pdf. The
> latex form "\ae robic" converts to the appropriate form and displays
> properly in pdf, but the html just drops the ligature.
>
> Simple test case:
>
> chriswri$ cat > test.txt
> \ae robic
> chriswri$ more test.txt
> \ae robic
> chriswri$ pandoc -t native test.txt
> [Para [RawInline (Format "tex") "\\ae ",Str "robic"]]
> chriswri$ pandoc -t html test.txt
> <p>robic</p>
>
> What's the best way around this - write a filter? finding some docs that
> will help? (I've found that ... is automatically converted to an ellipsis
> - so \dots isn't necessary).
>
> with thanks
>
> Chris
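On the "write a filter?" question in the quoted message: a minimal sketch of what such a filter might look like, for readers who have to keep the \ae macro in the source rather than typing æ directly. It is not taken from the thread. It assumes pandoc's JSON filter interface (the document arrives as JSON on stdin and is read back from stdout) and that the raw TeX shows up as a node shaped like {"t": "RawInline", "c": ["tex", "\\ae "]}, matching the native output shown above. The macro table, the function name replace_ligatures and the file name ligatures.py are illustrative choices, and the exact JSON layout may differ between pandoc versions.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: turn raw TeX ligature macros into Unicode."""
import json
import sys

# Illustrative mapping only (an assumption, not from the thread); extend as needed.
LIGATURES = {
    "\\ae": "æ",
    "\\AE": "Æ",
    "\\oe": "œ",
    "\\OE": "Œ",
}


def replace_ligatures(node):
    """Recursively walk a decoded pandoc JSON AST and rewrite matching nodes."""
    if isinstance(node, list):
        return [replace_ligatures(child) for child in node]
    if isinstance(node, dict):
        content = node.get("c")
        if (node.get("t") == "RawInline"
                and isinstance(content, list) and len(content) == 2):
            fmt, text = content
            # pandoc keeps the space that terminates the macro ("\\ae "),
            # so strip it before the lookup.
            if (fmt in ("tex", "latex") and isinstance(text, str)
                    and text.strip() in LIGATURES):
                return {"t": "Str", "c": LIGATURES[text.strip()]}
        return {key: replace_ligatures(value) for key, value in node.items()}
    return node


if __name__ == "__main__":
    # Read raw bytes so the UTF-8 JSON from pandoc is decoded independently
    # of the locale; do nothing when no document is supplied.
    data = sys.stdin.buffer.read()
    if data.strip():
        sys.stdout.write(json.dumps(replace_ligatures(json.loads(data))))
```

Saved as an executable ligatures.py, it would be run along the lines of "pandoc --filter ./ligatures.py -t html test.txt". Because the macro becomes an ordinary Str node, every output format gets the character itself, so the PDF route then needs a UTF-8-capable setup such as the XeLaTeX one described above.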