* Markdown writer: emit HTML entities instead of unicode
From: Gareth Stockwell @ 2018-10-31 16:19 UTC
To: pandoc-discuss

Is there any way to cause the markdown writer to emit HTML entities such as
&trade; rather than the unicode equivalent?

To explain what I mean, see the following example:

    $ echo "&trade;" | pandoc -f markdown -t markdown_strict+raw_html | uni2ascii
    0x2122
    Total input characters 2
    Characters converted to escapes 1
    Characters replaced with ASCII 0
    Characters deleted 0

How should I modify the pandoc options to cause this example to output
simply "&trade;"?

See also https://www.w3schools.com/charsets/ref_utf_letterlike.asp

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/d61bd26b-7257-420f-920f-4f140616e519%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: Markdown writer: emit HTML entities instead of unicode
From: mb21 @ 2018-10-31 18:57 UTC
To: pandoc-discuss

You can use the --ascii flag, which will emit: <p>&#8482;</p>
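For reference, the decimal numeric form that --ascii produces here can be sketched in a few lines of Haskell, the language pandoc is written in. This is only an illustration. It assumes that every non-ASCII character is converted and nothing else is. `toNumericEntities` is a hypothetical name, not pandoc's actual code.

```haskell
import Data.Char (isAscii, ord)

-- Hypothetical sketch: replace every non-ASCII character with a decimal
-- numeric character reference (U+2122 becomes "&#8482;"). ASCII characters,
-- including markup such as '<', are passed through unchanged; escaping
-- those is assumed to happen in a separate step.
toNumericEntities :: String -> String
toNumericEntities = concatMap escape
  where
    escape c
      | isAscii c = [c]
      | otherwise = "&#" ++ show (ord c) ++ ";"
```

In GHCi, `toNumericEntities "Acme\8482"` evaluates to `"Acme&#8482;"`.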
* Re: Markdown writer: emit HTML entities instead of unicode
From: John MacFarlane @ 2018-10-31 20:37 UTC
To: mb21, pandoc-discuss

mb21 writes:

> You can use the --ascii flag, which will emit: <p>&#8482;</p>

And, just to be explicit: there's no way to keep `&trade;`; pandoc throws
out information about which entity was used and just stores the character.

If you really want `&trade;`, though, you could do:

    `&trade;`{=markdown}

and this will be passed through to markdown output verbatim.
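The point about lost entity information can be shown with a toy model. The `Inline` type below is a hypothetical simplification and far smaller than pandoc's real AST. Once the reader has decoded `&trade;` into plain text, a writer sees only the character. Raw content tagged as markdown, on the other hand, is copied through untouched.

```haskell
-- Toy inline type (hypothetical; much simpler than pandoc's real AST).
data Inline
  = Str String               -- text; any entity has already been decoded
  | RawInline String String  -- (format, content) kept verbatim for that format

-- Render to markdown. Str content is emitted as-is, so U+2122 stays a
-- literal character. Raw markdown is copied verbatim. Raw content for
-- other formats is simply omitted in this toy model.
renderMarkdown :: [Inline] -> String
renderMarkdown = concatMap render
  where
    render (Str s) = s
    render (RawInline fmt s)
      | fmt == "markdown" = s
      | otherwise         = ""

-- Roughly what a reader would yield, in this model, for "&trade;" and for
-- "`&trade;`{=markdown}" respectively.
fromEntity, fromRawAttribute :: [Inline]
fromEntity       = [Str "\8482"]
fromRawAttribute = [RawInline "markdown" "&trade;"]
```

`renderMarkdown fromEntity` yields the bare character. `renderMarkdown fromRawAttribute` yields the literal text `&trade;`.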
* Re: Markdown writer: emit HTML entities instead of unicode
From: BP Jonsson @ 2018-11-01 11:22 UTC
To: pandoc-discuss

Just out of curiosity: since the Markdown and HTML readers presumably do a
named-entity-to-character lookup to resolve entities, would it be hard or
prohibitively expensive to have the writers do the reverse lookup under the
`--ascii` option, falling back to (preferably hex) numeric entities only if
no named entity is found? After all, probably everyone has an easier time
mentally mapping named entities to characters than numeric entities. I know
the [HTML 5 named entity list][] is huge, but AFAIK it is not official yet.

[HTML 5 named entity list]: https://metacpan.org/source/TOBYINK/HTML-HTML5-Entities-0.004/lib/HTML/HTML5/Entities.pm#L23
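The reverse lookup proposed above, with a hex fallback, might look roughly like this. It is a minimal sketch only. `escapeNamed` and the four-entry `sampleEntities` table are hypothetical stand-ins for the full HTML5 named-character list. Escaping of ASCII markup characters such as `<` and `&` is assumed to be handled elsewhere.

```haskell
import Data.Char (isAscii, ord)
import qualified Data.Map.Strict as Map
import Numeric (showHex)

-- Hypothetical sample table: character -> entity name (without '&' and ';').
sampleEntities :: Map.Map Char String
sampleEntities = Map.fromList
  [ ('\8482', "trade")   -- U+2122 TRADE MARK SIGN
  , ('\228',  "auml")    -- U+00E4
  , ('\233',  "eacute")  -- U+00E9
  , ('\229',  "aring")   -- U+00E5
  ]

-- Escape non-ASCII characters, preferring a named entity and falling back
-- to a hexadecimal numeric reference when the table has no name for it.
escapeNamed :: Map.Map Char String -> String -> String
escapeNamed names = concatMap escape
  where
    escape c
      | isAscii c = [c]
      | otherwise = case Map.lookup c names of
          Just name -> "&" ++ name ++ ";"
          Nothing   -> "&#x" ++ showHex (ord c) ";"
```

For example, `escapeNamed sampleEntities "\228 \945"` gives `"&auml; &#x3b1;"`, because the sample table has no name for U+03B1.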
* Re: Markdown writer: emit HTML entities instead of unicode
From: John MacFarlane @ 2018-11-01 18:07 UTC
To: BP Jonsson, pandoc-discuss

tagsoup has

    htmlEntities :: [(String, String)]

and we could indeed do this lookup. We'd probably want to convert it to a
map to make this more efficient.

Maybe this is worth doing, at least for HTML5 output? (For XML, we need to
stick with numerical entities, and probably also for HTML4.)
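Following the approach described above, a list shaped like tagsoup's `htmlEntities` can be inverted into a `Data.Map` keyed by character, with named entities used only for HTML5 output. This sketch rests on several assumptions:

- `sampleHtmlEntities` is a hypothetical four-row excerpt.
- Names may carry a trailing `;`, which is stripped.
- Values longer than one character are skipped.
- When several names map to the same character, the shortest wins; ties keep the earliest.

None of this is claimed to match pandoc's eventual implementation.

```haskell
import Data.Char (isAscii, ord)
import qualified Data.Map.Strict as Map

-- Hypothetical excerpt with the same shape as tagsoup's htmlEntities.
sampleHtmlEntities :: [(String, String)]
sampleHtmlEntities =
  [ ("trade;", "\8482")
  , ("TRADE;", "\8482")
  , ("auml;",  "\228")
  , ("nvlt;",  "<\8402")  -- two-character value: unusable for reverse lookup
  ]

-- Invert name -> string into char -> name, keeping only single-character
-- values and preferring the shorter name (ties keep the earlier entry,
-- because fromListWith passes the later value as the first argument).
reverseEntities :: [(String, String)] -> Map.Map Char String
reverseEntities table =
  Map.fromListWith pickShorter [ (c, stripSemi name) | (name, [c]) <- table ]
  where
    stripSemi n = if not (null n) && last n == ';' then init n else n
    pickShorter new old = if length new < length old then new else old

data Target = HTML5 | HTML4 | XML deriving (Eq, Show)

-- Named entities only for HTML5; XML (and, tentatively, HTML4) keep
-- decimal numeric references, as suggested in the message above.
escapeFor :: Target -> Map.Map Char String -> String -> String
escapeFor target names = concatMap escape
  where
    escape c
      | isAscii c = [c]
      | target == HTML5, Just n <- Map.lookup c names = "&" ++ n ++ ";"
      | otherwise = "&#" ++ show (ord c) ++ ";"
```

With `names = reverseEntities sampleHtmlEntities`, `escapeFor HTML5 names "\8482"` gives `"&trade;"`, and `escapeFor XML names "\8482"` gives `"&#8482;"`.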
* Re: Markdown writer: emit HTML entities instead of unicode
From: Gareth Stockwell @ 2018-11-01 21:45 UTC
To: pandoc-discuss
Cc: BP Jonsson

If such a reverse translation were implemented, could it also be supported
in the Markdown writer?
* Re: Markdown writer: emit HTML entities instead of unicode
From: John MacFarlane @ 2018-11-01 22:34 UTC
To: Gareth Stockwell, pandoc-discuss
Cc: BP Jonsson

Gareth Stockwell writes:

> If such a reverse translation was implemented, could it also be supported
> in the Markdown writer?

Yes, when used with --ascii.
* Re: Markdown writer: emit HTML entities instead of unicode
From: John MacFarlane @ 2018-11-01 23:33 UTC
To: BP Jonsson, pandoc-discuss

I've implemented this for the HTML5 writer and for the Markdown writer.

    % pandoc --ascii -t html5
    äéıå
    <p>äéıå</p>
* Re: Markdown writer: emit HTML entities instead of unicode
From: MarLinn @ 2018-11-02 0:50 UTC
To: pandoc-discuss

I see a potential problem. Do all browsers support all of these named
entities, especially if the list is not official yet? What about other
potential consumers?

And following along this line, is it a good idea to rely on tagsoup's list?
Or more generally, what are the criteria for an entity to be included in
such a list, and who should maintain it? Should users be able to partially
override it?
* Re: Markdown writer: emit HTML entities instead of unicode
From: John MacFarlane @ 2018-11-02 2:28 UTC
To: MarLinn, pandoc-discuss

MarLinn writes:

> I see a potential problem. Do all browsers support all of these named
> entities, especially if the list is not official yet? What about other
> potential consumers?

I don't know the answer to this. Maybe someone who is knowledgeable can
comment. I assumed that it's a standard list and that all modern browsers
support it. According to tagsoup's documentation, the list comes from
http://www.w3.org/TR/html5/syntax.html#named-character-references

> And following along this line, is it a good idea to rely on tagsoup's
> list? Or more generally, what are the criteria for an entity to be
> included in such a list, and who should maintain it? Should users be
> able to partially override it?

I'd prefer to avoid complexity here.