From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: Flaws in the Pandoc Unicode (OK, UTF-8) handling
Date: Sat, 31 Jan 2015 22:49:28 -0800
Message-ID: <20150201064928.GC12964@localhost.hsd1.ca.comcast.net>
References: <97d9c232-cc87-41ca-859b-f7495db3148f@googlegroups.com>

There was a fix for UTF-8 in custom Lua writers in 1.12.4, so if your version is earlier you should upgrade.
I have no problem with the character you mention in a custom writer:

    % pandoc -t data/sample.lua
    girl/woman/female: 女)
    ^D

    girl/woman/female: 女)

Can you reproduce the problem with the sample custom writer, data/sample.lua?

+++ Gordon Steemson [Jan 31 15 18:42 ]:
>I came very close to getting Pandoc to actually do what I mean today.
>Unfortunately, when I ran my Pandoc wrapper script (it divides up my
>custom-formatted whole-story Markdown files into individual chapters, each
>with a prepended metadata block, then calls Pandoc on each individual
>chapter) on a different input file, it worked the first couple of times and
>then started complaining that a specific well-formed UTF-8 character wasn’t
>well-formed (specifically, the CJKV ideograph for girl/woman/female: 女). Pandoc
>is the only software I can find that makes this claim about my file, so I
>am inclined to believe the file is not at fault, especially since it
>worked fine yesterday. I have reinstalled both Haskell and Pandoc, without
>effect.
>
>This is not the first time Pandoc has been annoying at me about UTF-8
>interpretation; I have found that any attempt to print UTF-8 text to
>standard output or standard error from within my custom writer is doomed to
>failure. The individual bytes within each UTF-8 encoded character are being
>interpreted by some layer within Pandoc as Latin-1 or some similar
>single-byte encoding, and then erroneously re-translated into a string of
>two or three UTF-8 characters for every single UTF-8 character I try to
>output.
>
>Every software setting I have control of is set to UTF-8. Even setting the
>locale within Lua with “os.setlocale('en_CA.UTF-8')” doesn’t have any
>effect.
>
>I’m completely stumped here. Help!
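
For illustration only: the symptom described in the quoted message (each byte of a UTF-8 character being read as Latin-1 and then written out again as UTF-8) can be reproduced outside pandoc. The Python sketch below is an assumption about the mechanism being reported, not a trace of what pandoc does internally; the helper names double_encode and undo_double_encode are hypothetical.

```python
def double_encode(text: str) -> str:
    """Simulate the reported mojibake: UTF-8 bytes misread as Latin-1."""
    raw = text.encode("utf-8")      # the correct UTF-8 byte sequence
    return raw.decode("latin-1")    # each byte wrongly treated as one character


def undo_double_encode(garbled: str) -> str:
    """Reverse the misreading, assuming no bytes were lost along the way."""
    return garbled.encode("latin-1").decode("utf-8")


original = "女"                      # U+5973, bytes E5 A5 B3 in UTF-8
garbled = double_encode(original)

print(original.encode("utf-8").hex())   # bytes of the well-formed character
print(garbled)                          # three Latin-1 characters, one per byte
print(garbled.encode("utf-8").hex())    # what ends up on stdout after re-encoding
print(undo_double_encode(garbled) == original)
```

If output from a custom writer looks like the second printed line rather than the original character, comparing the byte dumps in this way may help narrow down which layer is doing the extra decoding.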