From: Lyndon Drake
Newsgroups: gmane.text.pandoc
Subject: Re: Going round in circles with latex output
Date: Fri, 9 Jun 2017 01:02:48 -0700 (PDT)
To: pandoc-discuss

I did once know Haskell, but these days it makes my head hurt a bit. I'll have a go later this summer once exams are out of the way. I think it's feasible (partly because I've seen it implemented elsewhere), but I could easily be wrong, in which case I'll fall back on the marked spans. Your filter does at least make them minimally invasive. I just don't like the way it clutters up the text-only view with markup, which is the number one reason I'm trying to use Markdown instead of just using LaTeX directly.

Thanks again for the help.
On Thursday, June 8, 2017 at 6:20:11 PM UTC+1, BP wrote:
The marked spans approach suits me well, as in my neck of the woods everything but Greek and modern Cyrillic-script languages is usually cited in romanization.
Setting the language based on Unicode ranges with a Pandoc filter is probably rather hard, since Pandoc splits text content into lists of alternating Str elements, containing the non-whitespace parts, and Space and LineBreak elements representing the whitespace parts. You would need to locate all places in the AST where there is such a list, step through the list looking for Str elements containing characters from one of the scripts you are interested in, and enclose sequences of alternating script-character Str elements and whitespace elements in Span elements with appropriate attributes, or in raw markup elements. It is possible that this might be done more or less efficiently with Haskell -- I don't know. If you, like me, don't know Haskell, it is going to be tricky. If you also want the resulting markup to look pretty, you also need to take such things as embedded emphasis elements into consideration, not to speak of Str elements with mixed scripts, if any. I tried a poor man's substitute once, locating Greek portions in Markdown source with a regular expression, but it was too hard to handle punctuation sufficiently elegantly (Greek portions starting and ending with punctuation in particular).
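
To make the shape of such a filter concrete, here is a minimal sketch in Haskell using `toJSONFilter` from pandoc-types. It only wraps each individual Str that contains a Hebrew code point in a Span with lang/dir attributes; it does not merge runs of Str and Space into one Span, handle embedded emphasis, or cope with mixed-script Str elements, which are exactly the hard parts described above. The script range and attribute values are illustrative.

````
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: wrap each Str containing a Hebrew code point in a Span
-- with lang/dir attributes.  Assumes a pandoc-types where Str carries
-- Text (older versions used String; drop the Data.Text bits there).
-- Does not merge adjacent Str/Space runs into a single Span.
import Text.Pandoc.JSON
import qualified Data.Text as T

isHebrew :: Char -> Bool
isHebrew c = c >= '\x0590' && c <= '\x05FF'   -- the basic Hebrew block

wrapHebrew :: Inline -> Inline
wrapHebrew s@(Str txt)
  | T.any isHebrew txt = Span ("", [], [("lang", "he"), ("dir", "rtl")]) [s]
wrapHebrew x = x

main :: IO ()
main = toJSONFilter wrapHebrew
````

Compiled (for example with ghc, with pandoc-types installed) it would be run as `pandoc --filter ./wraphebrew --latex-engine=xelatex ...`; the executable name here is made up.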

You mentioned the ucharclasses package, which may be helpful when producing PDF. My experience is, again, that punctuation inside an other-script portion is problematic, at least if the other script uses punctuation from the General Punctuation block and/or ASCII, and it still leaves you out of luck when producing HTML. It is still best practice to wrap portions with other directionality in elements with a dir attribute, not to mention making sure that portions in other languages are put in divs with a lang attribute, not least for the benefit of those using assistive technologies.
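
On the HTML side the markup meant here is nothing more than ordinary lang and dir attributes; the snippet below is only an illustration (the words and language codes are placeholders):

````
<p>… the term <span lang="grc">λόγος</span> …</p>

<div lang="he" dir="rtl">
  <p>עברית</p>
</div>
````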

On 8 Jun 2017 at 12:14, "Lyndon Drake" <lyn...-S8RYeTzMgQ3QT0dZR+AlfA@public.gmane.org> wrote:
That's very helpful, thank you!
I haven't actually rewritten the LaTeX template preamble, just moved things around a bit, retaining all the Pandoc variables but making the package load order robust. I think I'll probably update the template slightly, for two more things:

1. it's nice to have access to two places for header insertions, one near the start and the other near the end of the header material, once all the packages have been loaded;

2. the memoir class works best for books when there are \frontmatter, \mainmatter, and \backmatter calls in the document (see the small sketch below), so I will add options and a test to allow those.
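
For reference, a bare-bones illustration of those memoir calls (standard usage, not code from this thread):

````
\documentclass{memoir}
\begin{document}
\frontmatter   % roman page numbers, unnumbered chapters
\tableofcontents
\mainmatter    % arabic page numbers, numbered chapters
\chapter{Introduction}
\backmatter    % back matter: bibliography, indexes, etc.
\end{document}
````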

I like the span syntax you've got, though in the medium term I want to work on an automatic filter for setting the language based on Unicode ranges and just use spans to cover unusual cases (although in fact, there is almost no case that cannot be covered by the use of Unicode direction-breaking marks or spaces, which have the advantage of working without any markup in HTML in modern browsers with font fallback mechanisms).


On Wednesday, June 7, 2017 at 5:23:54 PM UTC+1, BP Jonsson wrote:
On 2017-06-06 at 15:26, Lyndon Drake wrote:
> Thanks for this. I'd come to the conclusion that writing a latex file and
> including the fragments that pandoc generates might be the way forward, but
> I'm also curious to know what I've been doing wrong.

You don't need to write the whole preamble by hand, just the part
where you load and configure polyglossia and define the fonts
needed for polyglossia.
Put them in a file called, for example, `poly.ltx` and then run
Pandoc with

````
pandoc -H poly.ltx --latex-engine=xelatex
````
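
As an illustration, such a `poly.ltx` might contain something along these lines; the language and font choices below are placeholders, not recommendations from this thread:

````
% poly.ltx -- preamble fragment included with -H (illustrative only)
\usepackage{polyglossia}
\setmainlanguage[variant=british]{english}
\setotherlanguage{hebrew}
\setotherlanguage{syriac}
\newfontfamily\hebrewfont[Script=Hebrew]{SBL Hebrew}
\newfontfamily\syriacfont[Script=Syriac]{Estrangelo Edessa}
````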

I *think* this will also make the bidi bug go away. The
polyglossia package loads the bidi package if needed, but bidi
wants to be loaded after a lot of other packages on which it
performs keyhole surgery, including longtable and even hyperref.
However, Pandoc's latex template loads polyglossia quite early,
as the alternative to loading babel. There may be no other way to
fix that than to use a custom template where polyglossia is loaded
quite late, perhaps even after the header-includes, lest the
latter also load some package which bidi wants to be loaded before
itself. I have made such a template
(<https://gist.github.com/bpj/5cebc975685134145cd74ca8670b1ccc>).
If it solves the problem, please let me know and I'll make a pull
request for the change.

My custom template also includes my fontspec hack which lets you
declare font families in your metadata like this:

````
font-families:
=C2=A0 =C2=A0- name: '\<language>font'
=C2=A0 =C2=A0 =C2=A0font: =C2=A0 =C2=A0<Font Name>
=C2=A0 =C2=A0 =C2=A0options:
=C2=A0 =C2=A0 =C2=A0 =C2=A0- <key>=3D'<value>'
=C2=A0 =C2=A0- name: '\greekfont'
=C2=A0 =C2=A0 =C2=A0font: =C2=A0 =C2=A0GFS Neohellenic
=C2=A0 =C2=A0 =C2=A0options:
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Language=3DGreek
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Script=3DGreek
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Scale=3DMatchLowercase
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Ligatures=3DTeX
=C2=A0 =C2=A0- name: '\sanskritfont'
=C2=A0 =C2=A0 =C2=A0font: =C2=A0 =C2=A0Sahadeva
=C2=A0 =C2=A0 =C2=A0options:
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Language=3DSanskrit
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Script=3DDevanagari
=C2=A0 =C2=A0- name: '\myfancyfont'
=C2=A0 =C2=A0 =C2=A0font: My Fancy
````
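
For orientation, I would expect an entry such as the `\greekfont` one to end up as an ordinary fontspec declaration roughly like this (my guess at the generated code, not necessarily what the template emits verbatim):

````
\newfontfamily\greekfont[Language=Greek,Script=Greek,Scale=MatchLowercase,Ligatures=TeX]{GFS Neohellenic}
````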


>
> No rush of course, but I'm keen to have a look at your filter and see what
> it does,

It is now documented and uploaded:

<https://gist.github.com/bpj/02de1ed87ff8f8d0c31a43b9dcac1c80>

(Scroll down for the rendered documentation. The first code block
should suffice to understand how it works.)

It takes some initial configuration but that should be reusable by
including a separate YAML file on the command line with the actual
document.
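
In other words, the shared configuration can be passed as an extra input file, the same way the metadata file is passed in the command quoted further down; the file names here are made up and `<the-filter-script>` stands for whatever the gist provides:

````
pandoc lang-config.yaml chapter1.md --filter <the-filter-script> -o chapter1.tex
````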




> even without docs. I've also found another filter on the list back
> in 2014 from Jesse Rosenthal that looks at Unicode ranges and wraps them in
> a latex environment, which seems like a good idea (I've done this kind of
> thing in InDesign grep styles and it works well for most normal bits of
> text).
>
> I found the lang/otherlangs documentation, but couldn't figure out from the
> manual (might just be overlooking the correct bit) how to set a div or a
> span for another language.
>
> Part of the problem is that if I set lang and otherlangs as follows:
>
>     lang: en-GB
>     otherlangs: [he, sy]
>
> I get this:
>
> ! Package bidi Error: Oops! you have loaded package longtable after bidi
> package. Please load package longtable before bidi package, and then try
> to run xelatex on your document again.
>
> See the bidi package documentation for explanation.
>
> Type H <return> for immediate help.
>  ...
>
> l.72 \begin{document}
>
> pandoc: Error producing PDF
>
> which I guess means that some kind of strange interaction in the latex
> template is producing an undesirable latex file to feed to xelatex (maybe
> pandoc-csv2table is doing something to the produced latex?). But it kind of
> put a stop to me experimenting with the spans and divs.
>
> Best,
> Lyndon
>
> On Tuesday, June 6, 2017 at 1:08:48 PM UTC+1, BPJ wrote:
>>
>> You need to use the lang and otherlangs variables as described in the
>> manual http://pandoc.org/MANUAL if I recall correctly.
>>
>> Alternatively/additionally, write a latex file containing a preamble
>> fragment where you load polyglossia and any languages and fonts you need,
>> with the options you need, in the usual polyglossia/fontspec way, and include
>> it with the -H option. You also need to mark spans/divs containing extra
>> languages with lang and dir attributes as appropriate. Use your browser's
>> page search function to find these terms in the manual.
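
In Pandoc Markdown that marking looks roughly like the following (an illustration, not an excerpt from the manual; the Hebrew word is arbitrary):

````
An inline phrase, [עברית]{lang=he dir=rtl}, in running English text.

<div lang="he" dir="rtl">
A whole block treated as Hebrew.
</div>
````
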
>>
>> I saw your other question about font/language switching yesterday and
>> started to write some documentation for the filter I use to make those
>> things easier. Alas, I couldn't finish, and today there is a national holiday
>> in Sweden. I'll get back to it tomorrow. Basically you can use spans with a
>> single short class like .g for Greek, and the filter will inject latex
>> markup, docx custom style names or extended (html) attributes you have
>> declared to correspond to the class in your metadata.
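
In the source that would look something like the line below; which classes exist and what they expand to is declared in your metadata, as described in the filter's documentation (the `.g` class name is just the example given above):

````
The Stoics made much of [λόγος]{.g} as a technical term.
````
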
>>
>> I can comfort you that you are much better off than I was when I started
>> doing multilingual work with Pandoc. We had no filters, no native spans or
>> divs, and no built-in multilingual/polyglossia support back then. Everything
>> had to be done in -H files and with raw latex in the markdown, which was a
>> pain because I needed to make things available in HTML as well.
>>
>> I'll also update my latex template on github, which contains some stuff for
>> fontspec font loading.
>>
>> I hope this helps. I'm afraid I won't be able to check my mail for the
>> rest of the day.
>>
>>
>> On Tue, 6 June 2017 at 09:02, Lyndon Drake <lyn...@arotau.com> wrote:
>>
>>> Sorry, I probably wasn't clear: I followed the instruction from Pandoc
>>> and switched to xelatex. Now I'm stuck trying to configure the language
>>> options.
>>>
>>>
>>> On Tuesday, June 6, 2017 at 7:41:32 AM UTC+1, BP wrote:
>>>
>>>> You need the --latex-engine=xelatex option.
>>>>
>>>> On Tue, 6 June 2017 at 07:53, Lyndon Drake <lyn...-S8RYeTzMgQ3QT0dZR+AlfA@public.gmane.org> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Many apologies as I'm sure this is all obvious once one knows, but I'm
>>>>> a bit stuck. I've got some Pandoc Markdown files which I'm trying to
>>>>> convert to PDF using Pandoc. They include various non-ASCII characters, all
>>>>> in Unicode. If I run:
>>>>>
>>>>> /usr/local/bin/pandoc -f
>>>>> markdown+pipe_tables+grid_tables+yaml_metadata_block --filter
>>>>> pandoc-citeproc --filter pandoc-csv2table -s -o formatted/Draft3.pdf
>>>>> text/metadata.yaml text/1-Introduction.md
>>>>>
>>>>> I get the following:
>>>>>
>>>>> ! Package inputenc Error: Unicode char ṣ (U+1E63)
>>>>>
>>>>> (inputenc)                not set up for use with LaTeX.
>>>>>
>>>>> See the inputenc package documentation for explanation.
>>>>>
>>>>> Type H <return> for immediate help.
>>>>>  ...
>>>>>
>>>>> l.125   Vandenhoeck \& Ruprecht, 1990), 39--62.}
>>>>>
>>>>> Try running pandoc with --latex-engine=xelatex.
>>>>>
>>>>> pandoc: Error producing PDF
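
Concretely, the suggested fix is the original invocation with that flag added (which is what is described next):

````
/usr/local/bin/pandoc -f markdown+pipe_tables+grid_tables+yaml_metadata_block \
  --filter pandoc-citeproc --filter pandoc-csv2table \
  --latex-engine=xelatex -s -o formatted/Draft3.pdf \
  text/metadata.yaml text/1-Introduction.md
````
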
>>>>>
>>>>>
>>>>> So the next step was to switch to xelatex based on the helpful
>>>>> suggestion from pandoc. As long as I don't try to use any babel or
>>>>> polyglossia environments, or biblatex, this works fine. But as I want to
>>>>> use both, I'm a bit stuck. First thing is that it looks like the default
>>>>> template tries to use babel rather than polyglossia if xetex is the engine.
>>>>> Is there a reason for this? (I want to use the biblatex-sbl style for my
>>>>> bibliography, and they recommend polyglossia.)
>>>>>
>>>>> I want to use English (UK) as my main language, with Hebrew and Syriac
>>>>> as other languages (I've also got some ancient Greek, but the main font
>>>>> I've chosen works fine and the output looks good for that without using a
>>>>> separate language environment).
>>>>>
>>>>> As a starting point, what language options do I set in my YAML metadata
>>>>> to enable those other two language environments, and how do I specify the
>>>>> fonts for them?
>>>>>
>>>>> Here's my YAML metadata file so far:
>>>>>
>>>>> ---
>>>>>    author: Lyndon Drake
>>>>>    documentclass: memoir
>>>>>    toc: true
>>>>>    papersize: a4
>>>>>    fontsize: 12pt
>>>>>    top-level-division: chapter
>>>>>    number-sections: true
>>>>>    mainfont: Skolar PE Light
>>>>>    mainfontoptions: Numbers=OldStyle
>>>>>    bibliography: /Users/lyndon/Documents/Media/Bibliography/0lib.bib
>>>>>    csl: /Users/lyndon/Documents/Media/Bibliography/society-of-biblical-literature-fullnote-bibliography.csl
>>>>>    notes-after-punctuation: true
>>>>> ---
>>>>>
>>>>> Many thanks in advance for any help on this,
>>>>> Lyndon
>>>>>