Thanks John for your prompt reply.

On 31 January 2015 at 22:49, John MacFarlane wrote:

> There was a fix for UTF-8 in custom lua writers in 1.12.4, so if your
> version is earlier you should upgrade.

I’m using the latest stable version, 1.13.2.

> I have no problem with the character you mention in a custom writer:
>
> % pandoc -t data/sample.lua
> girl/woman/female: 女)
> ^D
>
> girl/woman/female: 女)
>
> Can you reproduce the problem with the sample custom writer,
> data/sample.lua?

It works fine in sample.lua. However, until about 10 AM today it also worked fine in my custom writer. I think something a little more subtle is going on here.

I should add that the problem is not being triggered from the main body of the work… it’s coming from a > block in my YAML metadata header, which I found to be a fine place to keep stuff like author’s notes.

Incidentally, I don’t know why, but for the markdown to parse correctly, you need to insert _two_ blank lines between paragraph text and the start of a bullet list in YAML metadata. If you only leave one blank line between them, the first bullet-list item gets folded into the preceding text paragraph. Kind of strange but there you are.

Gordon

+++ Gordon Steemson [Jan 31 15 18:42 ]:
>
>> I came very close to getting Pandoc to actually do what I mean today.
>> Unfortunately, when I ran my Pandoc wrapper script (it divides up my
>> custom-formatted whole-story Markdown files into individual chapters, each
>> with a prepended metadata block, then calls Pandoc on each individual
>> chapter) on a different input file, it worked the first couple of times and
>> then started complaining that a specific well-formed UTF-8 character wasn’t
>> well-formed (specifically, the CJKV ideograph for girl/woman/female: 女).
>> Pandoc is the only software I can find that makes this claim about my file,
>> so I am inclined to believe the file is not at fault, especially since it
>> worked fine yesterday. I have reinstalled both Haskell and Pandoc, without
>> effect.
>>
>> This is not the first time Pandoc has been annoying at me about UTF-8
>> interpretation; I have found that any attempt to print UTF-8 text to
>> standard output or standard error from within my custom writer is doomed to
>> failure. The individual bytes within each UTF-8 encoded character are being
>> interpreted by some layer within Pandoc as Latin-1 or some similar
>> single-byte encoding, and then erroneously re-translated into a string of
>> two or three UTF-8 characters for every single UTF-8 character I try to
>> output.
>>
>> Every software setting I have control of is set to UTF-8. Even setting the
>> locale within Lua with “os.setlocale('en_CA.UTF-8')” doesn’t have any
>> effect.
>>
>> I’m completely stumped here. Help!
>

--
The world’s only gsteemso
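
P.S. The double-encoding symptom described in the quoted message (UTF-8 bytes read as Latin-1, then re-encoded as UTF-8) can be reproduced outside Pandoc. Below is a minimal Python sketch of that effect. It only illustrates the symptom, on the assumption that a Latin-1 misread is what is happening, and says nothing about which layer of Pandoc or Lua is responsible. All variable names are made up for illustration.

```python
# Illustration only: simulates the suspected misread, not Pandoc's code path.
original = "\u5973"  # the ideograph for girl/woman/female

# Correct UTF-8 encoding: three bytes, E5 A5 B3.
utf8_bytes = original.encode("utf-8")

# Assumed fault: some layer treats each byte as one Latin-1 character.
misread = utf8_bytes.decode("latin-1")

# That text is then re-encoded as UTF-8, so every original byte
# (all are >= 0x80 here) turns into a two-byte sequence.
double_encoded = misread.encode("utf-8")

print(len(utf8_bytes), "bytes before the misread")
print(len(misread), "characters after the Latin-1 misread:", ascii(misread))
print(len(double_encoded), "bytes after re-encoding")

# Reversing the two steps recovers the original character, which is a
# quick way to confirm that garbled output is this kind of double encoding.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print("round trip restores original:", repaired == original)
```

If garbled output from a custom writer survives the same round trip, that would point towards this kind of misread rather than a malformed input file.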