From: 'Frederik Elwert' via pandoc-discuss
Newsgroups: gmane.text.pandoc
Subject: Re: LaTeX: Individual words in Chinese script
Date: Fri, 10 Mar 2023 12:32:11 -0800 (PST)

Thanks for the kind reply!

If I understand correctly, modern babel handles Chinese the same way as other languages. I wrote a small Lua filter that inserts raw `\foreignlanguage`/`\begin{otherlanguage}` commands for Chinese, in the same way pandoc does by default for other languages, and it works just fine. So maybe the best solution would be to simply add Chinese to Lang.hs?
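
For illustration, the idea behind such a filter can be sketched in Python as a transformation of pandoc's JSON AST. This is not the Lua filter mentioned above, only a minimal sketch under assumptions: the mapping of `zh` tags to the babel name `chinese` and the helper names (`BABEL_NAMES`, `babel_name`, `span_language`, `transform`, `main`) are hypothetical, and only inline spans (`\foreignlanguage`) are handled, not the `otherlanguage` environment for block-level content.

```python
import json
import sys

# Assumed mapping from BCP 47 primary language subtags to babel names.
# Only Chinese is listed here; the name "chinese" is an assumption.
BABEL_NAMES = {"zh": "chinese"}


def babel_name(lang):
    """Return the babel name for a BCP 47 tag, or None if it is not mapped."""
    return BABEL_NAMES.get(lang.split("-")[0].lower())


def raw_latex(text):
    """Build a pandoc RawInline element holding LaTeX source."""
    return {"t": "RawInline", "c": ["latex", text]}


def span_language(item):
    """Return the babel name if item is a Span with a mapped lang attribute."""
    if not (isinstance(item, dict) and item.get("t") == "Span"):
        return None
    attributes = dict(item["c"][0][2])  # list of [key, value] pairs
    lang = attributes.get("lang")
    return babel_name(lang) if lang else None


def transform(node):
    """Wrap every mapped Span in raw \\foreignlanguage{...}{...} commands."""
    if isinstance(node, list):
        out = []
        for item in node:
            name = span_language(item)
            if name is None:
                out.append(transform(item))
            else:
                out.append(raw_latex("\\foreignlanguage{%s}{" % name))
                out.append(transform(item))  # keep the Span itself
                out.append(raw_latex("}"))
        return out
    if isinstance(node, dict):
        return {key: transform(value) for key, value in node.items()}
    return node


def main():
    """Read a pandoc JSON document from stdin, write the result to stdout."""
    json.dump(transform(json.load(sys.stdin)), sys.stdout)
```

To try it as a JSON filter, one would save the sketch as a script, call `main()` at the bottom, and pass the script to pandoc with `--filter` when writing LaTeX.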

Also, I modified the template slightly to support a `babelfonts` metadata structure. That makes it possible to specify custom fonts for other scripts. Would something like this be acceptable for the default template?
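
To make the intended metadata shape concrete, here is a small Python sketch that mimics what the template loop in the attached patch would emit for a `babelfonts` list. It only imitates the expansion and is not pandoc's template engine; the function name `render_babelfonts` is hypothetical, and the language and font values in the example are assumptions chosen for illustration.

```python
def render_babelfonts(babelfonts):
    """Emit one \\babelfont line per entry, like the patched template loop.

    Each entry is expected to have the keys "language" and "font",
    matching $babelfonts.language$ and $babelfonts.font$ in the patch.
    """
    lines = []
    for entry in babelfonts:
        lines.append(
            "\\babelfont[%s]{rm}{%s}" % (entry["language"], entry["font"])
        )
    return "\n".join(lines)


# Example metadata; language name and font are illustrative assumptions.
metadata = {
    "babelfonts": [
        {"language": "chinese", "font": "Noto Serif CJK TC"},
    ]
}

print(render_babelfonts(metadata["babelfonts"]))
```

As in the patch, only the roman (`rm`) family is set per language; sans and mono families would need additional fields.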

You can see my solution in this gist: https://gist.github.com/frederik-elwert/fb1ab57bf88fa2c6dbd7253958b64014

An example patch for the template is attached.

If anyone sees any issues with this approach, I’d be glad if they could point them out. I don’t write Chinese myself, so there may be subtle issues that I’m not aware of.




John MacFarlane wrote on Friday, 10 March 2023 at 19:17:56 UTC+1:

> Obviously, it is important to support this kind of thing. Can someone else who uses pandoc with multiple scripts comment on how you handle this?
>
> One solution I see at
> https://tex.stackexchange.com/questions/165197/russian-and-chinese-in-the-same-document
> is
>
> \usepackage[encapsulated]{CJK}
>
> and then
>
> \begin{CJK}{UTF8}{gbsn}
> 你好
> \end{CJK}
>
> Perhaps you could use a Lua filter to convert [你好]{lang=zh} to this form?
>
> > On Mar 10, 2023, at 2:35 AM, 'Frederik Elwert' via pandoc-discuss <pandoc-...@googlegroups.com> wrote:
> >
> > Hi!
> >
> > We are typesetting a lot of academic texts which are mainly in English, but contain words in other languages/scripts. At the moment, I need to produce an article that contains individual words in traditional Chinese.
> >
> > With pandoc, I currently face two issues:
> >
> > First, Chinese is not defined in Writers/LaTeX/Lang.hs. This means that, in contrast to other scripts, I cannot use `[言]{lang="zh-Hant"}`. For other languages that are defined, the LaTeX writer produces `\foreignlanguage{}{}` commands, but in this case it produces only `{言}`. Thus, the only possibility is to resort to `CJKmainfont`. The issue is that this switches on xeCJK for the whole document, which has undesirable side effects (mainly concerning spacing around punctuation characters). Thus, I have to introduce manual `\makexeCJKactive`/`\makexeCJKinactive` commands in my markdown source. For short passages of Chinese, it is actually desirable not to use xeCJK, but simply babel, as with other scripts.
> >
> > Second, I do not see a way to define fonts for other scripts. The default LaTeX template calls `\babelprovide` for the languages used, but does not allow defining corresponding `\babelfont`s. With the old polyglossia system, IIRC, it was possible to use the `fontfamilies` metadata entry to define `\chinesefont` etc. for additional scripts. But this no longer works with babel (if I’m not missing something).
> >
> > So basically my question is twofold: First, how do I define additional fonts for foreign scripts with the new babel system, and second, would it be possible to support Chinese script in the LaTeX writer in the same way it supports other scripts?
> >
> > Thanks,
> > Frederik
> >
> > --
> > You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/cadd3e14-3167-447f-9f64-8158a6613cb6n%40googlegroups.com.

Attachment: babelfont.patch

--- default.tex 2023-03-10 21:20:49.332547333 +0100
+++ langtest_template.tex 2023-03-10 20:55:38.286983982 +0100
@@ -362,6 +362,9 @@
 $for(babel-otherlangs)$
 \babelprovide[import]{$babel-otherlangs$}
 $endfor$
+$for(babelfonts)$
+\babelfont[$babelfonts.language$]{rm}{$babelfonts.font$}
+$endfor$
 % get rid of language-specific shorthands (see #6817):
 \let\LanguageShortHands\languageshorthands
 \def\languageshorthands#1{}