public inbox archive for pandoc-discuss@googlegroups.com
* AST and non-Latin scripts
@ 2017-06-05  9:01 Lyndon Drake
From: Lyndon Drake @ 2017-06-05  9:01 UTC (permalink / raw)
  To: pandoc-discuss


Quick question: does the Pandoc AST have a way of identifying the script 
of a sequence of characters? For example, by named Unicode range, perhaps 
tagged with the ISO 15924 script code (https://en.wikipedia.org/wiki/ISO_15924), 
similar to what various HTML implementations do.

That way a writer or filter could do different things with non-Latin 
scripts.

The particular case I'm interested in is documents with mixed Latin, 
Hebrew, Greek, Syriac, cuneiform, etc. Here the ability for a writer (or 
filter) to apply a different inline or paragraph style (DOCX or ICML 
writers) or a span/environment (LaTeX or HTML) would make life much 
easier. Typically one wants a specific font for the script in question, 
or, in LaTeX with polyglossia, a particular environment around the text.

For LaTeX I could embed the environment commands, but that's not portable 
to other output formats. And now that text editing for these scripts is so 
good, I like keeping everything in plain text without the clutter of markup.

Does that make any sense?

Cheers,
Lyndon

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a88f9304-09b6-4f8f-acea-5626ccc9e546%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: AST and non-Latin scripts
@ 2017-06-05 13:47   ` John MacFarlane
From: John MacFarlane @ 2017-06-05 13:47 UTC (permalink / raw)
  To: pandoc-discuss

Except for verbatim code, all of the actual text will be
in Str elements in the AST.  So your filter can just
match on Str elements, check the string to see if it
satisfies some condition, and then do something
depending on the result -- e.g. add a raw LaTeX
font-changing command, if the output format is
LaTeX, or put it in a special span.

Pandoc itself doesn't provide any special functions
for Unicode ranges, but you can just use whatever
functions are provided by the language in which
you're writing the filter.
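
As a rough illustration of the approach described above, here is a minimal
Python sketch that works directly on pandoc's JSON AST using only the
standard library. The names SCRIPT_RANGES, script_of, and tag_scripts, and
the "script-<code>" class convention, are hypothetical choices made for this
example, not anything pandoc provides. The ranges cover only a few Unicode
blocks and classify a whole Str by the first character that falls in a known
range, so treat it as a starting point rather than a complete solution.

```python
# Block ranges (inclusive) mapped to ISO 15924 codes. This table is an
# assumption for the example; extend it with the ranges you need.
SCRIPT_RANGES = [
    (0x0370, 0x03FF, "Grek"),    # Greek and Coptic
    (0x1F00, 0x1FFF, "Grek"),    # Greek Extended
    (0x0590, 0x05FF, "Hebr"),    # Hebrew
    (0x0700, 0x074F, "Syrc"),    # Syriac
    (0x12000, 0x123FF, "Xsux"),  # Cuneiform
]


def script_of(text):
    """Return the code for the first character in a known range, else None."""
    for ch in text:
        cp = ord(ch)
        for lo, hi, code in SCRIPT_RANGES:
            if lo <= cp <= hi:
                return code
    return None


def tag_scripts(node):
    """Walk a JSON AST fragment, wrapping matching Str elements in a Span."""
    if isinstance(node, list):
        return [tag_scripts(n) for n in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            code = script_of(node["c"])
            if code is None:
                return node
            # Span layout in the JSON AST: [[id, classes, key-values], inlines]
            attr = ["", ["script-" + code], []]
            return {"t": "Span", "c": [attr, [node]]}
        return {k: tag_scripts(v) for k, v in node.items()}
    return node


# Usage on a small hand-written fragment of inlines.
sample = [
    {"t": "Str", "c": "שלום"},
    {"t": "Space"},
    {"t": "Str", "c": "hello"},
]
tagged = tag_scripts(sample)
```

To use this as a JSON filter, one would read the document with
json.load(sys.stdin), pass it through tag_scripts, write the result with
json.dump to sys.stdout, and run pandoc with --filter. A writer-specific
variant could emit a raw LaTeX inline instead of a Span when the output
format is LaTeX.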



* Re: AST and non-Latin scripts
@ 2017-06-05 13:56       ` John Muccigrosso
From: John Muccigrosso @ 2017-06-05 13:56 UTC (permalink / raw)
  To: pandoc-discuss


I'm not great at scripting, but the ability to switch fonts by language 
would be handy for me.


* Re: AST and non-Latin scripts
@ 2017-06-05 19:38       ` Lyndon Drake
From: Lyndon Drake @ 2017-06-05 19:38 UTC (permalink / raw)
  To: pandoc-discuss


I'll have a go at that. I know the Unicode ranges I need, so it shouldn't 
be impossible.

BTW, is there a reason the default LaTeX template uses babel with XeLaTeX? 
For this I want to use polyglossia. Up till now I've just been able to 
put options in a YAML block, but I figure someone might have run into 
difficulty, so I'm reluctant to switch if it will cause problems.

