The problem with doing language tagging with a filter in the current AST model is that you want to include inter-word spaces in the span but exclude some punctuation, like brackets, while including other punctuation, like dashes and quotes. Terminal punctuation (period, comma, (semi)colon, exclamation/question mark) is especially difficult: it should be included if the whole sentence or paragraph is Greek but excluded if only one or two words in an otherwise non-Greek sentence are Greek.

I have an ugly hack of a Perl script which currently does a half-decent job of tagging parts of a text file based on Unicode scripts with a monster regular expression. It can even be told to skip code (inline and block), LaTeX environments and commands, and HTML tags. Spurred by this thread, I spent some time yesterday trying to improve it by skipping the parenthesized part of inline links. The results of running the script still need manual checking, though, so it works almost as well, or better, to just search for character spans in certain Unicode ranges in a capable editor.

For languages written in the same script, even more sophisticated language detection software easily fails. For example, with Swedish and Icelandic more than half of the words in running text could be in either language, but they need different hyphenation rules.

/bpj

On 28 Jan 2017 at 20:29, "Andrew Dunning" wrote:

> It works fine if you set your main font to something including Greek, e.g.
> Brill. All you're missing in that case is correct hyphenation.
> Alternatively, you can set a font by Unicode range rather than language (it
> doesn't do this automatically).
>
> It strikes me that what we really need is a filter that would tag
> languages in Pandoc output based on best guesses (Word does this to some
> extent already). Should theoretically be possible with the language
> detection libraries out there.
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e2f300d1-b5dd-41df-b81f-11d05516bc00%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
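
For illustration only, the span rules described at the top of this message can be sketched in a few lines of Python. This is not the Perl script mentioned above, which is not shown in this thread. The names (`tag_greek`, `SPAN`, `JOINER`) are hypothetical, "Greek" is approximated by two Unicode blocks because Python's built-in `re` module has no script properties, and the "whole paragraph is Greek" test is a crude heuristic (no letters from any other script). The output uses pandoc's bracketed-span syntax with a `lang` attribute; the default `el` is an assumption, and `grc` may be more appropriate for Ancient Greek.

```python
import re

# Approximate the Greek script by the "Greek and Coptic" and "Greek Extended"
# blocks (an assumption; a real script property would be more precise).
GREEK = r"\u0370-\u03FF\u1F00-\u1FFF"
MARKS = r"\u0300-\u036F"  # combining diacritics following a base letter
WORD = rf"[{GREEK}][{GREEK}{MARKS}]*"

# Characters allowed BETWEEN two Greek words inside one span: whitespace,
# dashes, quotes and sentence punctuation. Brackets are deliberately absent,
# so they always end a span.
JOINER = r"""[\s\-\u2010-\u2015'"\u2018\u2019\u201C\u201D\u00AB\u00BB.,;:!?]+"""

# A span is a Greek word followed by any number of (joiner, Greek word) pairs.
# Because a joiner must be followed by another Greek word, punctuation after
# the last Greek word is left outside the span.
SPAN = re.compile(rf"{WORD}(?:{JOINER}{WORD})*")
GREEK_CHAR = re.compile(rf"[{GREEK}]")


def has_foreign_letters(text):
    """Return True if text contains a letter outside the Greek ranges."""
    return any(ch.isalpha() and not GREEK_CHAR.match(ch) for ch in text)


def wrap(body, lang):
    """Wrap body in a pandoc bracketed span carrying a lang attribute."""
    return f"[{body}]{{lang={lang}}}"


def tag_greek(text, lang="el"):
    """Tag Greek spans in one paragraph of plain text."""
    if not SPAN.search(text):
        return text  # nothing to tag
    if not has_foreign_letters(text):
        # The whole paragraph is Greek: keep terminal punctuation inside.
        return wrap(text.strip(), lang)
    # Mixed paragraph: tag each run of Greek words separately.
    return SPAN.sub(lambda m: wrap(m.group(0), lang), text)


if __name__ == "__main__":
    print(tag_greek("He wrote λογος και εργον, then stopped."))
    print(tag_greek("See (λογος) and εργον."))
    print(tag_greek("λογος και εργον."))
```

Quotes at the very edge of a span are left outside in this sketch, and nothing here skips code, LaTeX, HTML or link targets, so the output would still need the kind of manual checking described above.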