From: Stefan Björk
Newsgroups: gmane.text.pandoc
Subject: Re: ligatures in html
Date: Tue, 22 Sep 2015 10:12:31 +0000
To: pandoc-discuss

Sorry for cutting in, but I wonder why use TeX macros in markdown in the
first place? As Jason and David point out, the 'Æ' ligature is available in
UTF-8 (and in Latin-1 as well, I think, since 'æ' is a character of its own
in some Nordic countries), and all modern variants of TeX, namely XeTeX and
LuaTeX, more or less require UTF-8. By far the easiest solution should be to
simply use the 'æ' character in markdown, unless there is some requirement
for ASCII-only markdown?

And yes, pandoc converts three dots '...' to an ellipsis '…', but '...' is
not a TeX macro.

On Tue, 22 Sep 2015 at 11:53, <david.pw.smith-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I think the more obvious problem is that LaTeX doesn't handle UTF-8 well.
> If UTF-8 is properly supported then you don't need escape characters at
> all, you can just type the æ. These examples would work fine in both html
> and XeTeX/XeLaTeX:
>
> echo "... ærobic" | pandoc -s -f markdown -t html -o test.html
>
> echo "... ærobic" | pandoc -s -f markdown -t latex -o test.tex
>
> You can see for yourself with:
>
> echo "... ærobic" | pandoc -s -f markdown --latex-engine=xelatex -o test.pdf
>
> Indeed, for me, Pandoc's default tex template outputs the unicode
> characters correctly, but this could be because my environment is set up to
> use XeLaTeX.
>
> Hope that clarifies a bit?
>
> On Monday, 21 September 2015 11:57:37 UTC+1, Chris Wright wrote:
>>
>> I want to publish a document with an \ae ligature to html and to pdf. The
>> latex form "\ae robic" converts to the appropriate form and displays
>> properly in pdf, but the html just drops the ligature.
>>
>> Simple test case:
>>
>> chriswri$ cat > test.txt
>> \ae robic
>> chriswri$ more test.txt
>> \ae robic
>> chriswri$ pandoc -t native test.txt
>> [Para [RawInline (Format "tex") "\\ae ",Str "robic"]]
>> chriswri$ pandoc -t html test.txt
>> <p>robic</p>
>>
>> What's the best way around this - write a filter? finding some docs that
>> will help? (I've found that ... is automatically converted to an ellipsis
>> - so \dots isn't necessary).
>>
>> with thanks
>>
>> Chris
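
On the quoted question of whether to write a filter: if the source really has to stay ASCII-only, one possible route is a small JSON filter that swaps the raw TeX inline for the Unicode letter before the HTML writer sees it. The following is only a sketch under assumptions, not something posted in this thread. The macro table and the function name `convert` are made up for illustration, and it assumes that pandoc's JSON AST represents the inline from the native output above as `{"t": "RawInline", "c": ["tex", "\\ae "]}`.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: replace raw TeX letter macros with Unicode."""
import json
import sys

# Hypothetical, deliberately small mapping; extend as needed.
TEX_TO_UNICODE = {
    "\\ae": "\u00e6",  # æ
    "\\AE": "\u00c6",  # Æ
    "\\oe": "\u0153",  # œ
    "\\OE": "\u0152",  # Œ
}


def convert(node):
    """Return a copy of the AST with known RawInline TeX macros turned into Str."""
    if isinstance(node, list):
        return [convert(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "RawInline":
            fmt, text = node["c"]
            # The native output shows the macro with its terminating space
            # ("\\ae "), so strip whitespace before the lookup.
            macro = text.strip()
            if fmt in ("tex", "latex") and macro in TEX_TO_UNICODE:
                return {"t": "Str", "c": TEX_TO_UNICODE[macro]}
        # Any other element: rebuild it, converting whatever it contains.
        return {key: convert(value) for key, value in node.items()}
    return node  # strings, numbers, None


def main():
    # pandoc writes the document AST as JSON to the filter's stdin and
    # reads the (possibly modified) AST back from its stdout.
    json.dump(convert(json.load(sys.stdin)), sys.stdout)


# pandoc passes the target format as the first argument to a JSON filter,
# so stdin is only read when such an argument is present.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as, say, `ae_filter.py` (a hypothetical name) and made executable, it could be tried with `pandoc --filter ./ae_filter.py -t html test.txt`. Macros that are not in the table are left untouched. Typing 'æ' directly, as suggested above, remains the simpler option.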
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CABsBQU7Qm4W5tnEqXsOz7AW7X%3DBitNP0PWj%2BYo68CiKknnexTA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.