From: Lyndon Drake
Newsgroups: gmane.text.pandoc
Subject: Re: Going round in circles with latex output
Date: Sat, 10 Jun 2017 01:38:22 -0700 (PDT)
To: pandoc-discuss

I reckon you can do most cases just with the direction-breaking character, and I've got a custom keymap with a shortcut for doing it. Because so many apps now implement the bidi algorithm, I can see in my text editor whether it's working correctly there, with some confidence that the same output will occur elsewhere (if the copy/paste preserves the Unicode characters).

But yes, it's all much easier than in the past - thankfully I never had to do what you did back then!

On Friday, June 9, 2017 at 12:13:00 PM UTC+1, BP wrote:

On occasion I have used some of the many Unicode bracket pairs as delimiters and replaced them with the corresponding LaTeX or HTML markup by regular expression or by making them active in XeLaTeX. They're minimally invasive but can be a pain to type, and it taxes the memory to remember which is which. I would end up defining a code snippet where I anyway would type something like `..grc TEXT<tab>`, and then I think I can just as well have the same snippet expand to a span. It's so much less invasive now that Pandoc supports bracketed spans. As always, how you see things depends on where you are coming from. Back in the early nineties I had to make my own 8-bit fonts and do the necessary incantations to make them work with LaTeX, and I wrote my own script which sorted and marked up word indexes, a must-have in comparative philology. In fact I still use a descendant of it; things are both easier and harder with Unicode!
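For reference, a bracketed span with attributes looks like this in Pandoc Markdown (the class name and the lang/dir attributes here are only illustrative; they do something useful only when a filter or the writer maps them to real markup):

````
Aristotle's term is [λόγος]{.grc}, and the Hebrew equivalent is [דבר]{lang=he dir=rtl}.
````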


Den 9 jun 2017 10:03 skrev "Lyndon Drake" <lyn...@arota= u.com>:
I did once know Haskell, but these days it makes my head hurt a= bit. I'll have a go later this summer once exams are out of the way. I= think it's feasible (partly because I've seen it implemented elsew= here) but I could easily be wrong, in which case I'll fall back on the = marked spans. Your filter does at least make them minimally invasive. I jus= t don't like the way it clutters up the text-only view with markup, whi= ch is the number one reason I'm trying to use Markdown instead of just = using LaTeX directly.

Thanks again for the help.

= On Thursday, June 8, 2017 at 6:20:11 PM UTC+1, BP wrote:
The marked spans= approach suits me well as in my nook of the woods everything but Greek and= modern Cyrillic languages is usually cited in romanization.

Setting the language based on Unicode= ranges with a Pandoc filter is probably rather hard, since Pandoc splits t= ext content into lists of alternating Str elements, containing the non-whit= espace parts, and Space and LineBreak elements representing the whitespace = parts. You will need to locate all places in the AST where there is such a = list, step through the list looking for Str elements containing characters = from one of the scripts you are interested in and enclose sequences of alte= rnating script character Str elements and whitespace elements in Span eleme= nts with appropriate attributes or raw markup elements. It is possible that= this might be done more or less efficiently with Haskell -- I don't kn= ow. If you like me don't know Haskell it is going to be tricky. If you = also want the resulting markup to look pretty you also need to take such th= ings as embedded emphasis elements into consideration, not to speak of Str = elements with mixed scripts if any. I tried a poor man's substitute onc= e, locating Greek portions in Markdown source with a regular expression, bu= t it was too hard to handle punctuation sufficiently elegantly (Greek porti= ons starting and ending with punctuation in particular).
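To make the shape of such a filter concrete, here is a deliberately simplified sketch using the Python `pandocfilters` package (not the filter discussed in this thread). It only handles the easy case of single whole-word Str elements at the top level of a paragraph; the Unicode range, language code and file name are assumptions made for the example:

````
#!/usr/bin/env python
"""wrap_hebrew.py -- an illustrative sketch, not a finished filter.

Wraps each top-level Str element of a paragraph that contains a
Hebrew-block character in a Span carrying lang/dir attributes.
A real filter would also merge whole runs (word + Space + word)
and look inside emphasis etc.; this one deliberately does not.
"""
import re
from pandocfilters import toJSONFilter, Para, Plain, Span, attributes

HEBREW = re.compile(u'[\u0590-\u05FF]')  # Hebrew Unicode block

def is_hebrew_str(inline):
    return inline.get('t') == 'Str' and HEBREW.search(inline['c'])

def wrap_hebrew(key, value, fmt, meta):
    # Work at the Para/Plain level so the wrapping Spans we return are
    # not themselves re-examined and wrapped again by the tree walker.
    if key not in ('Para', 'Plain'):
        return None
    wrapped = [Span(attributes({'lang': 'he', 'dir': 'rtl'}), [inl])
               if is_hebrew_str(inl) else inl
               for inl in value]
    return Para(wrapped) if key == 'Para' else Plain(wrapped)

if __name__ == '__main__':
    toJSONFilter(wrap_hebrew)
````

It would then be run like any other filter, e.g. `pandoc --filter ./wrap_hebrew.py ...`.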

You mentioned the ucharclasses package whi= ch may be helpful when producing PDF. My experience is again that punctuati= on inside an otherscript portion is problematic, at least if the other scri= pt uses punctuation from general punctuation and/or ASCII, and it still lea= ves you out of luck when producing HTML. It is still best practice to wrap = portions with other directionality in elements with a dir attribute, not to= mention making sure that portions in other languages are put in divs with = a lang tag, not least for the benefit of those using assistive technologies= .
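In HTML output that best practice amounts to markup along these lines (the Hebrew text is only a placeholder):

````
<!-- inline RTL citation with explicit language and direction -->
<p>The first word of Genesis is <span lang="he" dir="rtl">בראשית</span>.</p>

<!-- a whole block in another language and direction -->
<div lang="he" dir="rtl">
  <p>טקסט בעברית.</p>
</div>
````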

Den 8 jun 2017 12:14 skrev= "Lyndon Drake" <lyn...-S8RYeTzMgQ3QT0dZR+AlfA@public.gmane.org>= :
That's very helpful= , thank you!

I haven't actually rewritten the LaTeX = template preamble, just moved things around a bit, retaining all the Pandoc= variables but making the package load order robust. I think I'll proba= bly slightly update the template, for two more things:

1= . it's nice to have access to two places for header insertions, one nea= r the start and the other near the end of the header material once all the = packages have been loaded;

2. the memoir class wor= ks best for books when there are \frontmatter, \mainmatter, and \endmatter = calls in the document, so I will add options and a test to allow those.
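For what it's worth, such a test in a custom template might look roughly like this; the `book` variable is hypothetical, not an existing Pandoc template variable:

````
$if(book)$
\frontmatter
$endif$
$if(toc)$
\tableofcontents
$endif$
$if(book)$
\mainmatter
$endif$

$body$
````

with something like `book: true` in the metadata to switch it on.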

I like the span syntax you've got, though in the = medium term I want to work on an automatic filter for setting the language = based on Unicode ranges and just use spans to cover unusual cases (although= in fact, there is almost no case that cannot be covered by the use of Unic= ode direction-breaking marks or spaces, which have the advantage of working= without any markup in HTML in modern browsers with font fallback mechanism= s).


On Wednesday, June 7, 2017 at 5:23:54 PM UTC+1, BP Jonsson = wrote:
Den 2017-06-06 kl. 15:26, skr= ev Lyndon Drake:
> Thanks for this. I'd come to the conclusion that writing a lat= ex file and
> including the fragments that pandoc generates might be the way for= ward, but
> I'm also curious to know what I've been doing wrong.

You don't need to write the whole preamble by hand, just the part= =20
where you load and configure polyglossia and define the fonts=20
needed for polyglossia.
put them in a file called for example `poly.ltx` and then run=20
Pandoc with

````
pandoc -H poly.ltx --latex-engine=3Dxelatex
````
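As a rough illustration (not taken from this thread), a `poly.ltx` for British English with Hebrew and Syriac as other languages could contain something like the following; the font names are placeholders for whatever is actually installed:

````
% poly.ltx -- preamble fragment pulled in with -H poly.ltx
\usepackage{polyglossia}
\setmainlanguage[variant=british]{english}
\setotherlanguage{hebrew}
\setotherlanguage{syriac}
% example fonts only; substitute your own
\newfontfamily\hebrewfont[Script=Hebrew,Scale=MatchLowercase]{SBL Hebrew}
\newfontfamily\syriacfont[Script=Syriac,Scale=MatchLowercase]{Serto Jerusalem}
````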

I *think* this will also make the bidi bug go away. The=20
polyglossia package loads the bidi package if needed, but bidi=20
wants to be loaded after a lot of other packages which it performs=20
keyhole surgery on, including longtable and even hyperref. However=20
Pandoc's latex template, loads polyglossia quite early,=20
alternatively to loading babel. There may be no other way to fix=20
that than to use a custom template where polyglossia is loaded=20
quite late, perhaps even after the header-includes, lest the=20
latter also load some package which bidi wants to be loaded before=20
itself. I have made such a template=20
(<https://gist.github.com/bpj/5cebc975= 685134145cd74ca8670b1ccc>)=20
If it solves the problem please let me know and I'll make a pull=20
request for the change.

My custom template also includes my fontspec hack which lets you=20
declare font families in your metadata like this:

````
font-families:
=C2=A0 =C2=A0- name: '\<language>font'
=C2=A0 =C2=A0 =C2=A0font: =C2=A0 =C2=A0<Font Name>
=C2=A0 =C2=A0 =C2=A0options:
=C2=A0 =C2=A0 =C2=A0 =C2=A0- <key>=3D'<value>'
=C2=A0 =C2=A0- name: '\greekfont'
=C2=A0 =C2=A0 =C2=A0font: =C2=A0 =C2=A0GFS Neohellenic
=C2=A0 =C2=A0 =C2=A0options:
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Language=3DGreek
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Script=3DGreek
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Scale=3DMatchLowercase
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Ligatures=3DTeX
=C2=A0 =C2=A0- name: '\sanskritfont'
=C2=A0 =C2=A0 =C2=A0font: =C2=A0 =C2=A0Sahadeva
=C2=A0 =C2=A0 =C2=A0options:
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Language=3DSanskrit
=C2=A0 =C2=A0 =C2=A0 =C2=A0- Script=3DDevanagari
=C2=A0 =C2=A0- name: '\myfancyfont'
=C2=A0 =C2=A0 =C2=A0font: My Fancy
````
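Presumably each entry ends up as an ordinary fontspec declaration; for the Greek entry above that would amount to something like the following (an inference, not quoted from the template):

````
\newfontfamily\greekfont[Language=Greek,Script=Greek,Scale=MatchLowercase,Ligatures=TeX]{GFS Neohellenic}
````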


>=20
> No rush of course, but I'm keen to have a look at your filter = and see what
> it does,=20

It is now documented and uploaded:

<https://gist.github.com/bpj/02de1ed87= ff8f8d0c31a43b9dcac1c80>

(Scroll down for the rendered documentation. The first code block=20
should suffice to understand how it works.)

It takes some initial configuration but that should be reusable by=20
including a separate YAML file on the command line with the actual=20
document.




even without docs. I've also found another filter on the list back
> in 2014 from Jesse Rosenthal that looks at Unicode ranges and wrap= s them in
> a latex environment, which seems like a good idea (I've done t= his kind of
> thing in InDesign grep styles and it works well for most normal bi= ts of
> text).
>=20
> I found the lang/otherlangs documentation, but couldn't figure= out from the
> manual (might just be overlooking the correct bit) how to set a di= v or a
> span for another language.
>=20
> Part of the problem is that if I set lang and otherlangs as follow= s:
>=20
> =C2=A0 =C2=A0lang: en-GB
> =C2=A0 =C2=A0otherlangs: [he, sy]
> =C2=A0 =C2=A0
> I get this:
>=20
> ! Package bidi Error: Oops! you have loaded package longtable afte= r bidi
> packag
>=20
> e. Please load package longtable before bidi package, and then try= to run
> xelat
>=20
> ex on your document again.
>=20
>=20
> See the bidi package documentation for explanation.
>=20
> Type =C2=A0H <return> =C2=A0for immediate help.
>=20
> =C2=A0 ...
>=20
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
>=20
> l.72 \begin{document}
>=20
>=20
> pandoc: Error producing PDF
>=20
>=20
> which I guess means that some kind of strange interaction in the l= atex
> template is producing an undesirable latex file to feed to xelatex= (maybe
> pandoc-csv2table is doing something to the produced latex?). But i= t kind of
> put a stop to me experimenting with the spans and divs.
>=20
> Best,
> Lyndon
>=20
> On Tuesday, June 6, 2017 at 1:08:48 PM UTC+1, BPJ wrote:
>>
>> You need to use the lang and otherlang variables as described = in the
>> manual http://pandoc.org/MANUAL if I recall correctly.
>>
>> Alternatively/additionally write a latex file containing a pre= amble
>> fragment where you load polyglossia and any languages and font= s you need
>> with the options you need in the usual polyglossia/fontspec wa= y and include
>> it with the -H option. You also need to mark spans/divs contai= ning extra
>> languages with lang and dir attributes as appropriate. Use you= r browser's
>> page search function to find these terms in the manual.
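(Editorial aside, not part of the quoted message: in the Markdown source such marking could look like the following sketch, with the Hebrew text and language code chosen purely for illustration.)

````
Genesis opens with [בראשית]{lang=he dir=rtl}, usually rendered "in the beginning".

<div lang="he" dir="rtl">

טקסט בעברית.

</div>
````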
>>
>> I saw your other question about font/language switching yesterday and
>> started to write some documentation for the filter I use to make those
>> things easier. Alas I couldn't finish and today there is a national holiday
>> in Sweden. I'll get back to it tomorrow. Basically you can use spans with a
>> single short class like .g for greek and the filter will inject latex
>> markup, docx custom style names or extended (html) attributes you have
>> declared to correspond to the class in your metadata.
>>
>> I can comfort you that you are much better off than I was when I started
>> doing multilingual work with Pandoc. We had no filters, no native spans or
>> divs and no built-in multilingual/polyglossia support back then. Everything
>> had to be done in -H files and with raw latex in the markdown, which was a
>> pain because I needed to make things available in HTML as well.
>>
>> I'll also update my latex template on github which contains some stuff for
>> fontspec font loading.
>>
>> I hope this helps. I'm afraid I won't be able to check my mail for the
>> rest of the day.
>>
>> On Tue, 6 June 2017 at 09:02, Lyndon Drake <lyn...@arotau.com> wrote:
>>
>>> Sorry, I probably wasn't clear: I followed the instruction from Pandoc
>>> and switched to xelatex. Now I'm stuck trying to configure the language
>>> options.
>>>
>>> On Tuesday, June 6, 2017 at 7:41:32 AM UTC+1, BP wrote:
>>>
>>>> You need the --latex-engine=xelatex option.
>>>>
>>>> On Tue, 6 June 2017 at 07:53, Lyndon Drake <lyn...-S8RYeTzMgQ3QT0dZR+AlfA@public.gmane.org> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Many apologies as I'm sure this is all obvious once one knows, but I'm
>>>>> a bit stuck. I've got some Pandoc Markdown files which I'm trying to
>>>>> convert to PDF using Pandoc. They include various non-ASCII characters, all
>>>>> in Unicode. If I run:
>>>>>
>>>>> /usr/local/bin/pandoc -f
>>>>> markdown+pipe_tables+grid_tables+yaml_metadata_block --filter
>>>>> pandoc-citeproc --filter pandoc-csv2table -s -o formatted/Draft3.pdf
>>>>> text/metadata.yaml text/1-Introduction.md
>>>>>
>>>>> I get the following:
>>>>>
>>>>> ! Package inputenc Error: Unicode char ṣ (U+1E63)
>>>>> (inputenc)                not set up for use with LaTeX.
>>>>>
>>>>> See the inputenc package documentation for explanation.
>>>>>
>>>>> Type H <return> for immediate help.
>>>>>  ...
>>>>>
>>>>> l.125   Vandenhoeck \& Ruprecht, 1990), 39--62.}
>>>>>
>>>>> Try running pandoc with --latex-engine=xelatex.
>>>>>
>>>>> pandoc: Error producing PDF
>>>>>
>>>>> So the next step was to switch to xelatex based on the helpful
>>>>> suggestion from pandoc. As long as I don't try to use any babel or
>>>>> polyglossia environments, or biblatex, this works fine. But as I want to
>>>>> use both, I'm a bit stuck. The first thing is that it looks like the default
>>>>> template tries to use babel rather than polyglossia if xetex is the engine.
>>>>> Is there a reason for this? (I want to use the biblatex-sbl style for my
>>>>> bibliography, and they recommend polyglossia.)
>>>>>
>>>>> I want to use English (UK) as my main language, with Hebrew and Syriac
>>>>> as other languages (I've also got some ancient Greek, but the main font
>>>>> I've chosen works fine and the output looks good for that without using a
>>>>> separate language environment).
>>>>>
>>>>> As a starting point, what language options do I set in my YAML metadata
>>>>> to enable those other two language environments, and how do I specify the
>>>>> fonts for them?
>>>>>
>>>>> Here's my YAML metadata file so far:
>>>>>
>>>>> ---
>>>>> author: Lyndon Drake
>>>>> documentclass: memoir
>>>>> toc: true
>>>>> papersize: a4
>>>>> fontsize: 12pt
>>>>> top-level-division: chapter
>>>>> number-sections: true
>>>>> mainfont: Skolar PE Light
>>>>> mainfontoptions: Numbers=OldStyle
>>>>> bibliography: /Users/lyndon/Documents/Media/Bibliography/0lib.bib
>>>>> csl: /Users/lyndon/Documents/Media/Bibliography/society-of-biblical-literature-fullnote-bibliography.csl
>>>>> notes-after-punctuation: true
>>>>> ---
>>>>>
>>>>> Many thanks in advance for any help on this,
>>>>> Lyndon