From mboxrd@z Thu Jan 1 00:00:00 1970
From: 'Frederik Elwert' via pandoc-discuss
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Re: LaTeX: Individual words in Chinese script
Date: Fri, 10 Mar 2023 14:34:17 -0800 (PST)
Message-ID: <4b263b7a-edbb-4939-86a7-21ca9fb1b8d1n@googlegroups.com>
In-Reply-To: <2163FD26-E90E-472A-A94E-D20247FC7A9C-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
References: <667077D3-D8A5-4B6A-9253-F3F569439ADA@gmail.com> <2163FD26-E90E-472A-A94E-D20247FC7A9C@gmail.com>

Wow, thank you so much!

John MacFarlane wrote on Friday, 10 March 2023 at 23:20:35 UTC+1:

> I’ve updated pandoc with both the changes you suggested. `chinese` is
> now used with babel, and the template now includes babelfonts. With
> the new code you should be able to make this work without a filter.
>
> On Mar 10, 2023, at 12:32 PM, 'Frederik Elwert' via pandoc-discuss
> <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
>
> Thanks for the kind reply!
>
> If I see it correctly, modern babel works fine with Chinese the same
> way as with other languages. I wrote a small Lua filter that inserts
> raw `\foreignlanguage`/`\begin{otherlanguage}` commands for Chinese in
> the same way as pandoc does by default for other languages, and it
> works just fine. So maybe the best solution would be to simply add
> `chinese` to Lang.hs?
>
> Also, I modified the template a little bit to support a `babelfonts`
> metadata structure. That allows specifying custom fonts for other
> scripts. Would something like this be acceptable for the default
> template?
>
> You can see my solution in this gist:
> https://gist.github.com/frederik-elwert/fb1ab57bf88fa2c6dbd7253958b64014
>
> An example patch for the template is attached.
>
> If someone sees any issues with this approach, I’d be glad if they
> could point them out. I don’t write Chinese myself, so I don’t know if
> there are subtle issues that I’m not aware of.
>
> John MacFarlane wrote on Friday, 10 March 2023 at 19:17:56 UTC+1:
>
>> Obviously, it is important to support this kind of thing. Can someone
>> else who uses pandoc with multiple scripts comment on how you handle
>> this?
>>
>> One solution I see at
>>
>> https://tex.stackexchange.com/questions/165197/russian-and-chinese-in-the-same-document
>>
>> is
>>
>> \usepackage[encapsulated]{CJK}
>>
>> and then
>>
>> \begin{CJK}{UTF8}{gbsn}
>> 你好
>> \end{CJK}
>>
>> Perhaps you could use a Lua filter to convert [你好]{lang=zh} to this
>> form?
>>
>> > On Mar 10, 2023, at 2:35 AM, 'Frederik Elwert' via pandoc-discuss
>> > <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
>> >
>> > Hi!
>> >
>> > We are typesetting a lot of academic texts which are mainly in
>> > English, but contain words in other languages/scripts. At the
>> > moment, I need to produce an article that contains individual words
>> > in traditional Chinese.
>> >
>> > With pandoc, I currently face two issues:
>> >
>> > First, `chinese` is not defined in Writers/LaTeX/Lang.hs. This means
>> > that, in contrast to other scripts, I cannot use
>> > `[言]{lang="zh-Hant"}`. For other languages that are defined, the
>> > LaTeX writer produces `\foreignlanguage{}{}` commands, but in this
>> > case it produces only `{言}`. Thus, the only possibility is to
>> > resort to `CJKmainfont`. The issue is that this switches on xeCJK
>> > for the whole document, which has undesirable side effects (mainly
>> > concerning spacing around punctuation characters). Thus, I have to
>> > introduce manual `\makexeCJKactive`/`\makexeCJKinactive` commands in
>> > my markdown source. For short passages of Chinese, it is actually
>> > desirable not to use xeCJK, but simply babel, as with other scripts.
>> >
>> > Second, I do not see a way to define fonts for other scripts. The
>> > default LaTeX template calls `\babelprovide` for the languages
>> > used, but does not allow defining corresponding `\babelfont`s. With
>> > the old polyglossia system, IIRC, it was possible to use the
>> > `fontfamilies` metadata entry to define `\chinesefont` etc. for
>> > additional scripts. But this no longer works with babel (if I’m not
>> > missing something).
>> >
>> > So basically my question is twofold: First, how do I define
>> > additional fonts for foreign scripts with the new babel system, and
>> > second, would it be possible to support Chinese script in the LaTeX
>> > writer in the same way it supports other scripts?
>> >
>> > Thanks,
>> > Frederik
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "pandoc-discuss" group.
>> > To unsubscribe from this group and stop receiving emails from it,
>> > send an email to
>> > pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msgid/pandoc-discuss/cadd3e14-3167-447f-9f64-8158a6613cb6n%40googlegroups.com
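
For reference, the babel-only approach discussed in this thread (no xeCJK, a per-language font, and `\foreignlanguage` around individual words) might look roughly like this in plain LaTeX. This is a minimal sketch, not the code pandoc generates: it assumes XeLaTeX or LuaLaTeX, the font name "Noto Serif CJK TC" is only a placeholder for whatever CJK font is installed, and `chinese` is used as the babel locale name because that is the name mentioned above.

```latex
% Compile with xelatex or lualatex (assumption: a Unicode engine).
\documentclass{article}
\usepackage{fontspec}
\usepackage[english]{babel}

% Load the locale data for Chinese without making it the main language.
\babelprovide[import]{chinese}

% Font used only while the language is switched to Chinese.
% "Noto Serif CJK TC" is a placeholder; substitute an installed font.
\babelfont[chinese]{rm}{Noto Serif CJK TC}

\begin{document}
The character \foreignlanguage{chinese}{言} means ``speech''.

% Longer passages can use the environment form instead.
\begin{otherlanguage}{chinese}
你好
\end{otherlanguage}
\end{document}
```

Because the font is tied to the language rather than to the whole document, the English text keeps its normal spacing and punctuation handling, which is the side effect of `CJKmainfont` that the original question wanted to avoid.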