public inbox archive for pandoc-discuss@googlegroups.com
* Markdown writer: emit HTML entities instead of unicode
@ 2018-10-31 16:19 Gareth Stockwell
       [not found] ` <d61bd26b-7257-420f-920f-4f140616e519-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Gareth Stockwell @ 2018-10-31 16:19 UTC (permalink / raw)
  To: pandoc-discuss



Is there any way to make the Markdown writer emit HTML entities such as
&trade; rather than the Unicode equivalent?

To explain what I mean, see the following example:

$ echo "&trade;" | pandoc -f markdown -t markdown_strict+raw_html | uni2ascii
0x2122
Total input characters                     2
Characters converted to escapes            1
Characters replaced with ASCII             0
Characters deleted                         0

How should I modify the pandoc options so that this example outputs
simply "&trade;"?

See also https://www.w3schools.com/charsets/ref_utf_letterlike.asp

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/d61bd26b-7257-420f-920f-4f140616e519%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: Markdown writer: emit HTML entities instead of unicode
       [not found] ` <d61bd26b-7257-420f-920f-4f140616e519-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-10-31 18:57   ` mb21
       [not found]     ` <edf5aeb1-bd71-4f0e-8596-8203c794245d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: mb21 @ 2018-10-31 18:57 UTC (permalink / raw)
  To: pandoc-discuss



You can use the --ascii flag, which will emit: <p>&#8482;</p>




* Re: Markdown writer: emit HTML entities instead of unicode
       [not found]     ` <edf5aeb1-bd71-4f0e-8596-8203c794245d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-10-31 20:37       ` John MacFarlane
       [not found]         ` <yh480k7ehx3la0.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: John MacFarlane @ 2018-10-31 20:37 UTC (permalink / raw)
  To: mb21, pandoc-discuss

mb21 <mauro.bieg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> You can use the  --ascii flag, which will emit: <p>&#8482;</p>

And, just to be explicit: there's no way to keep
`&trade;`; pandoc throws out information about which
entity was used and just stores the character.

If you really want `&trade;`, though, you could do:

    `&trade;`{=markdown}

and this will be passed through to markdown output verbatim.
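
To illustrate the point about lost information, here is a minimal sketch in
Python. It uses the standard library's html.unescape as a stand-in for a
reader's entity decoding; that is an assumption for illustration, not pandoc's
code. Three different spellings of the trademark sign decode to the same
character, so nothing in the decoded text records which spelling was used.

```python
import html

# Three spellings of the same character: named, decimal, and hexadecimal.
spellings = ["&trade;", "&#8482;", "&#x2122;"]

# Decode each spelling; a set keeps only the distinct results.
decoded = {html.unescape(s) for s in spellings}

# Code point of the decoded character, formatted like the uni2ascii output.
code_point = hex(ord(html.unescape("&trade;")))

print(decoded)
print(code_point)
```

The set ends up with a single element, which is why a writer working from the
decoded text cannot know that `&trade;` was the original spelling.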



* Re: Markdown writer: emit HTML entities instead of unicode
       [not found]         ` <yh480k7ehx3la0.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2018-11-01 11:22           ` BP Jonsson
       [not found]             ` <CAFC_yuTO0w+oW9bdszpuP1iq30gUP0Zm0_Y=qyAeDX8WFvDz5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: BP Jonsson @ 2018-11-01 11:22 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Just out of curiosity: since the Markdown and HTML readers presumably do a
named-entity-to-character lookup to resolve entities, would it be hard or
prohibitively expensive to have the writers do the reverse lookup under the
`--ascii` option, falling back to (preferably hex) numeric entities only if
no named entity is found? After all, most people probably have an easier
time mentally mapping named entities to characters than numeric entities. I
know the [HTML 5 named entity list][] is huge, but AFAIK it is not official
yet.

[HTML 5 named entity list]:
https://metacpan.org/source/TOBYINK/HTML-HTML5-Entities-0.004/lib/HTML/HTML5/Entities.pm#L23
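
The reverse lookup proposed here could look roughly like the following Python
sketch. It is only an illustration under stated assumptions: it uses the
HTML 4.01 table codepoint2name from Python's standard html.entities module
rather than any HTML5 list, the function name ascii_escape is made up, and
none of this is pandoc's code. Non-ASCII characters get a named entity when
the table has one and a hex numeric entity otherwise.

```python
from html.entities import codepoint2name


def ascii_escape(text: str) -> str:
    """Return text with every non-ASCII character replaced by an entity.

    Hypothetical helper: prefers a named entity from the HTML 4.01 table
    and falls back to a hexadecimal numeric entity.
    """
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            # Plain ASCII is passed through untouched (no &, <, > handling).
            pieces.append(ch)
        elif code_point in codepoint2name:
            # A named entity exists for this character.
            pieces.append(f"&{codepoint2name[code_point]};")
        else:
            # No name known: fall back to a hex numeric entity.
            pieces.append(f"&#x{code_point:X};")
    return "".join(pieces)


print(ascii_escape("\u2122"))
print(ascii_escape("\u00e4\u00e9\u0131\u00e5"))
```

With this particular table, the dotless i (U+0131) has no HTML 4.01 name and
takes the numeric fallback, while the other three letters get names.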



* Re: Markdown writer: emit HTML entities instead of unicode
       [not found]             ` <CAFC_yuTO0w+oW9bdszpuP1iq30gUP0Zm0_Y=qyAeDX8WFvDz5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-11-01 18:07               ` John MacFarlane
       [not found]                 ` <yh480kva5gk6xg.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  2018-11-02  0:50               ` MarLinn
  1 sibling, 1 reply; 10+ messages in thread
From: John MacFarlane @ 2018-11-01 18:07 UTC (permalink / raw)
  To: BP Jonsson, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


tagsoup has

    htmlEntities :: [(String, String)]

and we could indeed do this lookup.  We'd probably
want to convert it to a map to make this more
efficient.

Maybe this is worth doing, at least for HTML5 output?
(For XML, we need to stick with numerical entities,
and probably also for HTML4.)
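
A rough Python sketch of the idea above: turn a list of (name, characters)
pairs into a map keyed by character, then use named entities only for HTML5
output and numeric entities otherwise. The five-entry table is a made-up
excerpt that only mimics the shape of a [(String, String)] list, and
build_reverse_map, escape_char and the format names are hypothetical; this is
not pandoc's or tagsoup's code.

```python
# Made-up excerpt shaped like a list of (entity name, characters) pairs.
ENTITY_PAIRS = [
    ("trade", "\u2122"),
    ("TRADE", "\u2122"),
    ("auml", "\u00e4"),
    ("inodot", "\u0131"),
    ("imath", "\u0131"),
]


def build_reverse_map(pairs):
    """Map characters to an entity name; the first name listed wins."""
    reverse = {}
    for name, chars in pairs:
        # setdefault keeps the earlier name when several names share a character.
        reverse.setdefault(chars, name)
    return reverse


REVERSE = build_reverse_map(ENTITY_PAIRS)


def escape_char(ch, output_format):
    """Escape one character for ASCII-only output in the given format."""
    if ord(ch) < 128:
        return ch
    if output_format == "html5" and ch in REVERSE:
        # Named entities are only used for HTML5 output.
        return f"&{REVERSE[ch]};"
    # Everything else (for example XML) gets a decimal numeric entity.
    return f"&#{ord(ch)};"


print(escape_char("\u2122", "html5"))
print(escape_char("\u2122", "xml"))
```

Building the map once means each lookup no longer scans the whole list. Since
several names can share a character, the sketch has to pick one; keeping the
first name listed is just one possible rule.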


* Re: Markdown writer: emit HTML entities instead of unicode
       [not found]                 ` <yh480kva5gk6xg.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2018-11-01 21:45                   ` Gareth Stockwell
       [not found]                     ` <CAGewFGB0xGnPgdz_ZaQxRZ5Cf=OjMJpUp4dCTiKc+o0=vgO9QA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2018-11-01 23:33                   ` John MacFarlane
  1 sibling, 1 reply; 10+ messages in thread
From: Gareth Stockwell @ 2018-11-01 21:45 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: BP Jonsson


If such a reverse translation was implemented, could it also be supported
in the Markdown writer?



* Re: Markdown writer: emit HTML entities instead of unicode
       [not found]                     ` <CAGewFGB0xGnPgdz_ZaQxRZ5Cf=OjMJpUp4dCTiKc+o0=vgO9QA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-11-01 22:34                       ` John MacFarlane
  0 siblings, 0 replies; 10+ messages in thread
From: John MacFarlane @ 2018-11-01 22:34 UTC (permalink / raw)
  To: Gareth Stockwell, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: BP Jonsson

Gareth Stockwell <gareth.stockwell-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> If such a reverse translation was implemented, could it also be supported
> in the Markdown writer?

Yes, when used with --ascii.

* Re: Markdown writer: emit HTML entities instead of unicode
       [not found]                 ` <yh480kva5gk6xg.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  2018-11-01 21:45                   ` Gareth Stockwell
@ 2018-11-01 23:33                   ` John MacFarlane
  1 sibling, 0 replies; 10+ messages in thread
From: John MacFarlane @ 2018-11-01 23:33 UTC (permalink / raw)
  To: BP Jonsson, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I've implemented this for the HTML5 writer and for
the Markdown writer.

% pandoc --ascii -t html5
äéıå
<p>&auml;&eacute;&inodot;&aring;</p>


* Re: Markdown writer: emit HTML entities instead of unicode
       [not found]             ` <CAFC_yuTO0w+oW9bdszpuP1iq30gUP0Zm0_Y=qyAeDX8WFvDz5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2018-11-01 18:07               ` John MacFarlane
@ 2018-11-02  0:50               ` MarLinn
       [not found]                 ` <16cbeca9-6954-dab7-3984-6234a32c4708-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: MarLinn @ 2018-11-02  0:50 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I see a potential problem. Do all browsers support all of these named 
entities, especially if the list is not official yet? What about other 
potential consumers?

And following along this line, is it a good idea to rely on tagsoup's 
list? Or more generally, what are the criteria for an entity to be 
included in such a list, and who should maintain it? Should users be 
able to partially override it?



* Re: Markdown writer: emit HTML entities instead of unicode
       [not found]                 ` <16cbeca9-6954-dab7-3984-6234a32c4708-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-11-02  2:28                   ` John MacFarlane
  0 siblings, 0 replies; 10+ messages in thread
From: John MacFarlane @ 2018-11-02  2:28 UTC (permalink / raw)
  To: MarLinn, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

MarLinn <monkleyon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I see a potential problem. Do all browsers support all of these named 
> entities, especially if the list is not official yet? What about other 
> potential consumers?

I don't know the answer to this.  Maybe someone who is
knowledgeable can comment.  I assumed that it's a
standard list and that all modern browsers support it.

According to tagsoup's documentation, the list comes
from

http://www.w3.org/TR/html5/syntax.html#named-character-references
 
> And following along this line, is it a good idea to rely on tagsoup's 
> list? Or more generally, what are the criteria for an entity to be 
> included in such a list, and who should maintain it? Should users be 
> able to partially override it?

I'd prefer to avoid complexity here.

