From: Chris Wright
Newsgroups: gmane.text.pandoc
Subject: Re: ligatures in html
Date: Mon, 21 Sep 2015 18:03:06 -0700 (PDT)
References: <7d633ff1-c25d-436c-a66f-9a8456699db6@googlegroups.com> <20150921205458.GA92420@D25Q40BGFY13.Berkeley.EDU>
To: pandoc-discuss

Thanks for the help, folks.

I don't understand why this would be an issue with the LaTeX reader (that's a comment about me, not pandoc!), so I apologise for the laborious pace of this question.

If I change the test document to

    $ cat test.md
    ... \ae robe

(that's three periods, then the ligature) and look at the native format:

    $ pandoc -S -f markdown -t native test.md > test.native
    [Para [Str "\8230",Space,RawInline (Format "tex") "\\ae ",Str "robe"]]

the three periods are converted to the correct ellipsis character, and the ligature is parsed to a RawInline.

Outputting LaTeX from this native representation:

    $ pandoc -S -f native -t latex test.native
    \ldots{} \ae robe

Str "\8230" is converted to \ldots{}, and the ligature comes straight from the RawInline.

Converting the same native representation to HTML:

    $ pandoc -S -f native -t html test.native
    <p>… robe</p>

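A side note on the native output above: pandoc prints its native format with Haskell-style string escapes, where non-ASCII characters appear as decimal code points, so `Str "\8230"` is U+2026 (…). The small Python sketch below reproduces that escaping; `native_escape` is a hypothetical helper, not part of pandoc. It shows that a plain æ character would appear in the native tree as `Str "\230"`.

```python
# Sketch: mimic the decimal escapes that pandoc's native (Haskell-style)
# output uses for non-ASCII characters, e.g. "\8230" for the ellipsis.
# native_escape is a hypothetical helper, not a pandoc function.

def native_escape(ch: str) -> str:
    """Return the Haskell-style decimal escape for one non-ASCII character."""
    return "\\" + str(ord(ch))

# HORIZONTAL ELLIPSIS and LATIN SMALL LETTER AE; output kept ASCII-only.
for ch in ["\u2026", "\u00e6"]:
    print(f"U+{ord(ch):04X} -> {native_escape(ch)}")
```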
Here Str "\8230" is printed as the single ellipsis character, and the RawInline is dropped.

It seems as if the Markdown reader can parse three periods into an ellipsis character, but doesn't have a representation of the ligature that would work in both HTML and LaTeX. It would work in LuaLaTeX/XeLaTeX if the reader output the ligature as a character, just as the ellipsis is output as Str "\8230" (for æ that would be Str "\230").

Might it work if the ligature were recognised and something like Ligature(ae) were generated in the native format, which could then be converted by whichever writer produces the output format?

Again, thanks for your patience and help.

Chris

On Tuesday, 22 September 2015 06:55:14 UTC+10, John MacFarlane wrote:

Ideally pandoc's latex reader would recognize \ae and
convert it to the proper character, so feel free to put
an issue on the bug tracker about this.

+++ 'Jason Seeley' via pandoc-discuss [Sep 21 15 07:50 ]:
> Hello,
> Ligatures like \ae are specific to the LaTeX (and thus PDF) writer, so
> they don't work in any other formats. Pandoc just passes it through
> unchanged. For HTML output, you can use an entity: `&AElig;` or
> `&aelig;`, for upper case or lower case. Another option is to use the
> unicode character directly (how you do this depends on your system and
> text editor; in Windows hold Alt and type 0230 on the number pad; in
> vim type CTRL-K a e; use a character-map app, etc.) This should work
> for most output formats.
> It'll work with LaTeX if you use XeLaTeX or
> LuaLaTeX, as those allow unicode input.
> Jason
> On Monday, September 21, 2015 at 5:57:37 AM UTC-5, Chris Wright wrote:
>
>     I want to publish a document with an \ae ligature to html and to pdf.
>     The latex form "\ae robic" converts to the appropriate form and
>     displays properly in pdf, but the html just drops the ligature.
>
>     Simple test case:
>
>     chriswri$ cat > test.txt
>
>     \ae robic
>
>     chriswri$ more test.txt
>
>     \ae robic
>
>     chriswri$ pandoc -t native test.txt
>
>     [Para [RawInline (Format "tex") "\\ae ",Str "robic"]]
>
>     chriswri$ pandoc -t html test.txt
>
>     <p>robic</p>
>
>     What's the best way around this - write a filter? finding some docs
>     that will help? (I've found that ... is automatically converted to an
>     ellipsis - so \dots isn't necessary).
>
>     with thanks
>
>     Chris
>
>     --
>     You received this message because you are subscribed to the Google
>     Groups "pandoc-discuss" group.
>     To unsubscribe from this group and stop receiving emails from it, send
>     an email to [1]pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
>     To post to this group, send email to
>     [2]pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
>     To view this discussion on the web visit
>     [3]https://groups.google.com/d/msgid/pandoc-discuss/bbaae9b2-c139-415f-9063-86a887358b4c%40googlegroups.com.
>     For more options, visit [4]https://groups.google.com/d/optout.
>
> References
>
> 1. mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> 2. mailto:pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> 3. https://groups.google.com/d/msgid/pandoc-discuss/bbaae9b2-c139-415f-9063-86a887358b4c%40googlegroups.com?utm_medium=email&utm_source=footer
> 4. https://groups.google.com/d/optout
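
On the "write a filter?" question: until the LaTeX reader converts \ae itself, as John suggests it ideally should, one workaround is a JSON filter that rewrites the raw TeX into a plain character before any writer sees it. What follows is a minimal Python sketch, not an official pandoc tool. It assumes that pandoc's JSON AST encodes the node as {"t": "RawInline", "c": [format, text]} and that pandoc passes the target format to a filter as its first argument. The LIGATURES table is hypothetical and covers only four commands.

```python
#!/usr/bin/env python3
"""Hypothetical pandoc JSON filter: turn raw TeX ligature commands into Unicode."""
import json
import sys

# Assumed, deliberately small mapping of LaTeX text commands to characters.
LIGATURES = {
    "\\ae": "\u00e6",  # ae ligature, lower case
    "\\AE": "\u00c6",  # AE ligature, upper case
    "\\oe": "\u0153",  # oe ligature, lower case
    "\\OE": "\u0152",  # OE ligature, upper case
}

def convert(node):
    """Return a Str node for a recognised raw TeX ligature, else None."""
    if node.get("t") != "RawInline":
        return None
    fmt, text = node["c"]
    if fmt not in ("tex", "latex"):
        return None
    cmd = text.strip()          # "\\ae " -> "\\ae"
    if cmd.endswith("{}"):      # also accept the "\\ae{}" spelling
        cmd = cmd[:-2]
    char = LIGATURES.get(cmd)
    return {"t": "Str", "c": char} if char else None

def walk(x):
    """Recursively rebuild the JSON AST, replacing matching RawInline nodes.

    Recursing through every list and dict means both the older
    [meta, blocks] layout and the newer {"blocks": ...} object are handled.
    """
    if isinstance(x, list):
        return [walk(item) for item in x]
    if isinstance(x, dict):
        replacement = convert(x)
        if replacement is not None:
            return replacement
        return {k: walk(v) for k, v in x.items()}
    return x

if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc runs the filter with the output format (e.g. "html") as argv[1]
    doc = json.load(sys.stdin)
    json.dump(walk(doc), sys.stdout)
```

Saved as an executable file (hypothetical name ligatures.py), it would be run as

    pandoc -S --filter ./ligatures.py -t html test.md

Because the ligature becomes an ordinary Str, the same tree can feed both the HTML writer and the LaTeX writer. On the LaTeX side, XeLaTeX or LuaLaTeX accept the Unicode character directly, as Jason notes.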