Thanks, John, for your prompt reply.

On 31 January 2015 at 22:49, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

> There was a fix for UTF-8 in custom lua writers in 1.12.4, so if your
> version is earlier you should upgrade.
>

I’m using the latest stable version, 1.13.2.

> I have no problem with the character you mention in a custom writer:
>
>    % pandoc -t data/sample.lua
>    girl/woman/female: 女)
>    ^D
>    <p>girl/woman/female: 女)</p>
>
> Can you reproduce the problem with the sample custom writer,
> data/sample.lua?
>

It works fine in sample.lua. However, until about 10 AM today it also
worked fine in my custom writer. I think something a little more subtle is
going on here.

I should add that the problem is not being triggered from the main body of
the work; it’s coming from a “>” block in my YAML metadata header, which I
found to be a fine place to keep stuff like author’s notes. Incidentally, I
don’t know why, but for the Markdown to parse correctly, you need to insert
_two_ blank lines between paragraph text and the start of a bullet list in
YAML metadata. If you leave only one blank line between them, the first
bullet-list item gets folded into the preceding text paragraph. Kind of
strange, but there you are.
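
On the stdout/stderr garbling described in my original message (quoted
below), here is a minimal Python sketch of the kind of double encoding I
suspect. It assumes Latin-1 is the intermediate single-byte encoding; I have
not confirmed which layer inside Pandoc does this, so it only reproduces the
symptom and says nothing about Pandoc's internals.

```python
# Sketch only: reproduces the symptom, not Pandoc's actual code path.
original = "女"                         # U+5973
utf8_bytes = original.encode("utf-8")   # three bytes: E5 A5 B3

# Assumed faulty step: each byte is read as one Latin-1 character.
misread = utf8_bytes.decode("latin-1")  # U+00E5, U+00A5, U+00B3

# Writing that string back out as UTF-8 doubles the byte count (3 -> 6).
garbled = misread.encode("utf-8")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = garbled.decode("utf-8").encode("latin-1").decode("utf-8")

print(utf8_bytes.hex(), garbled.hex(), repaired == original)
```

If the garbled output from the writer matches the bytes this produces, that
would at least narrow down which wrong encoding is being assumed.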

Gordon





+++ Gordon Steemson [Jan 31 15 18:42 ]:
>
>> I came very close to getting Pandoc to actually do what I mean today.
>> Unfortunately, when I ran my Pandoc wrapper script (it divides up my
>> custom-formatted whole-story Markdown files into individual chapters, each
>> with a prepended metadata block, then calls Pandoc on each individual
>> chapter) on a different input file, it worked the first couple of times
>> and then started complaining that a specific well-formed UTF-8 character
>> wasn’t well-formed (specifically, the CJKV ideograph for
>> girl/woman/female: 女). Pandoc is the only software I can find that makes
>> this claim about my file, so I am inclined to believe the file is not at
>> fault, especially since it worked fine yesterday. I have reinstalled both
>> Haskell and Pandoc, without effect.
>>
>> This is not the first time Pandoc has been annoying at me about UTF-8
>> interpretation; I have found that any attempt to print UTF-8 text to
>> standard output or standard error from within my custom writer is doomed
>> to failure. The individual bytes within each UTF-8 encoded character are
>> being interpreted by some layer within Pandoc as Latin-1 or some similar
>> single-byte encoding, and then erroneously re-translated into a string of
>> two or three UTF-8 characters for every single UTF-8 character I try to
>> output.
>>
>> Every software setting I have control of is set to UTF-8. Even setting the
>> locale within Lua with “os.setlocale('en_CA.UTF-8')” doesn’t have any
>> effect.
>>
>> I’m completely stumped here. Help!
>>
>
-- 
The world’s only gsteemso
