From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: Markdown writer: emit HTML entities instead of unicode
Date: Thu, 01 Nov 2018 11:07:39 -0700
To: BP Jonsson, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

tagsoup has `htmlEntities :: [(String, String)]` and we could indeed do
this lookup. We'd probably want to convert it to a map to make this
more efficient.

Maybe this is worth doing, at least for HTML5 output?
(For XML, we need to stick with numerical entities, and
probably also for HTML4.)

BP Jonsson writes:

> Just out of curiosity, since the Markdown and HTML readers presumably do a
> named entity to character lookup to resolve entities, would it be hard or
> forbiddingly expensive to have the writers do the reverse lookup under the
> `--ascii` option, falling back to (preferably hex) numeric entities
> only if no named entity is found? After all, probably everyone has an easier
> time mentally mapping named entities to characters than numeric entities. I
> know the [HTML 5 named entity list][] is huge, but AFAIK it is not official
> yet.
>
> [HTML 5 named entity list]:
> https://metacpan.org/source/TOBYINK/HTML-HTML5-Entities-0.004/lib/HTML/HTML5/Entities.pm#L23
>
> On Wed, 31 Oct 2018 at 21:37, John MacFarlane wrote:
>
>> mb21 writes:
>>
>> > You can use the --ascii flag, which will emit:

>>
>> And, just to be explicit: there's no way to keep
>> `&trade;`; pandoc throws out information about which
>> entity was used and just stores the character.
>>
>> If you really want `&trade;`, though, you could do:
>>
>> `&trade;`{=markdown}
>>
>> and this will be passed through to markdown output verbatim.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/pandoc-discuss/yh480k7ehx3la0.fsf%40johnmacfarlane.net
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
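
A minimal sketch of the reverse lookup discussed at the top of this
message: build a map from characters to entity names, and fall back to a
hexadecimal numeric reference when no name is known. This is only an
illustration, not pandoc's implementation. `sampleEntities` is a
hypothetical stand-in with the same type as tagsoup's `htmlEntities`;
`reverseEntities`, `escapeChar` and `escapeAscii` are made-up names.

```haskell
import Data.Char (isAscii, ord)
import qualified Data.Map.Strict as M
import Numeric (showHex)

-- Hypothetical stand-in for a table of type [(String, String)]:
-- (entity name, replacement text). A real table is far larger.
sampleEntities :: [(String, String)]
sampleEntities =
  [ ("trade", "\8482")   -- U+2122 TRADE MARK SIGN
  , ("copy", "\169")     -- U+00A9 COPYRIGHT SIGN
  , ("eacute", "\233")   -- U+00E9 LATIN SMALL LETTER E WITH ACUTE
  , ("TRADE", "\8482")   -- a second name for the same character
  ]

-- Reverse map from character to entity name. Entries whose replacement
-- is not exactly one character are skipped; when several names map to
-- the same character, the first one in the list is kept.
reverseEntities :: M.Map Char String
reverseEntities =
  M.fromListWith (\_new old -> old)
    [ (c, name) | (name, [c]) <- sampleEntities ]

-- ASCII passes through unchanged; other characters become a named
-- entity if one is known, otherwise a hex numeric reference.
-- (Escaping of '&' itself is left to the caller in this sketch.)
escapeChar :: Char -> String
escapeChar c
  | isAscii c = [c]
  | otherwise =
      case M.lookup c reverseEntities of
        Just name -> "&" ++ name ++ ";"
        Nothing   -> "&#x" ++ showHex (ord c) ";"

escapeAscii :: String -> String
escapeAscii = concatMap escapeChar
```

With this sample table, `escapeAscii "Acme\8482 \8594"` evaluates to
`"Acme&trade; &#x2192;"`: the trade mark sign has a name in the table,
while the arrow (U+2192) does not and falls back to a numeric reference.
Building the map once keeps each lookup logarithmic in the table size
instead of a linear scan over the association list.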