public inbox archive for pandoc-discuss@googlegroups.com
* Raw links -- beyond IANA support
@ 2018-12-18  3:46 JM Marcastel
From: JM Marcastel @ 2018-12-18  3:46 UTC
  To: pandoc-discuss


Pandoc supports raw links, i.e. plain links in angle brackets, and that
is great.

As I currently understand it, raw link support is restricted to the official IANA
schemes <http://www.iana.org/assignments/uri-schemes.htm> plus a few
extra hardcoded schemes (namely doi, isbn, javascript, and pmid).

Rather than hardcoding non-standard schemes, would it be possible to
specify custom schemes on the command line?

The benefit of this feature is twofold:

a) It makes it easy to add custom schemes that are not defined by IANA but
are supported by your operating environment (e.g. I have schemes such as `todo`
to open a to-do item in my PIM tools, `think` to open document views in
my CMS, etc.).

b) Since such links are already parsed into the Pandoc AST, it is easy to
act on those schemes and invoke whatever is necessary to follow the link;
with Pandoc's Lua filter support, this becomes trivial.


For instance, running

pandoc --custom-scheme task ...

with `<task:123>` in the input would open task `#123` in my PIM tools, by
handling this node in Pandoc's AST:

{
  "t": "Link",
  "c": [
    [
      "",
      [],
      []
    ],
    [
      {
        "t": "Str",
        "c": "task:123"
      }
    ],
    [
      "task:123",
      ""
    ]
  ]
}
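The message has Lua filters in mind; purely as an illustration, the same handling can be sketched in Python over pandoc's JSON output (e.g. from `pandoc -t json`). It assumes the Link layout shown above (`[attr, inlines, [url, title]]`); `uri_scheme` and `find_links` are hypothetical helper names, and actually opening the task in a PIM tool is left out.

```python
import json


def uri_scheme(url):
    """Hypothetical helper: lowercased scheme of `url` ("" if no colon)."""
    head, sep, _ = url.partition(":")
    return head.lower() if sep else ""


def find_links(node, scheme):
    """Hypothetical helper: recursively collect Link targets using `scheme`.

    A Link node has the form {"t": "Link", "c": [attr, inlines, [url, title]]}.
    """
    found = []
    if isinstance(node, dict):
        if node.get("t") == "Link" and uri_scheme(node["c"][2][0]) == scheme:
            found.append(node["c"][2][0])
        for value in node.values():
            found.extend(find_links(value, scheme))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_links(item, scheme))
    return found


# The Link fragment from the message above.
fragment = json.loads("""
{"t": "Link",
 "c": [["", [], []],
       [{"t": "Str", "c": "task:123"}],
       ["task:123", ""]]}
""")

for target in find_links(fragment, "task"):
    # A real handler would dispatch to the PIM tool here (not shown).
    print("would open", target)
```

Because the walk is generic, it also works on a whole document, where the Link sits somewhere inside the top-level `blocks` list.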



-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/4adad8d6-fe14-43b5-8e86-d52e5a84d0c4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: Raw links -- beyond IANA support
@ 2018-12-18  5:42 John MacFarlane
From: John MacFarlane @ 2018-12-18  5:42 UTC
  To: JM Marcastel, pandoc-discuss


For CommonMark we had a big discussion about what
schemes should be allowed, whether custom schemes
should be allowed, and so on.

In the end we changed the spec to just say that,
for the purpose of recognizing these autolinks,

> a scheme is any sequence of 2–32 characters beginning
> with an ASCII letter and followed by any combination
> of ASCII letters, digits, or the symbols plus ("+"),
> period ("."), or hyphen ("-").

This seems to have been working fine -- at least I
haven't heard any complaints about it, and we did
consider possible drawbacks.

So I think it might make sense to make pandoc
behave the same way.
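The quoted rule is easy to express as a regular expression. A sketch in Python, assuming the rest of the CommonMark autolink definition as I recall it (after the colon, any characters except ASCII control characters, space, `<` and `>`); `match_autolink` is a hypothetical helper, not pandoc's or cmark's code.

```python
import re

# Scheme, per the rule quoted above: an ASCII letter followed by 1-31 ASCII
# letters, digits, "+", "." or "-" (2-32 characters in total).
SCHEME = r"[A-Za-z][A-Za-z0-9+.\-]{1,31}"

# Whole autolink: "<", scheme, ":", then zero or more characters other than
# ASCII control characters, space, "<" and ">", then ">".
AUTOLINK = re.compile("<(" + SCHEME + r"):([^\x00-\x20\x7f<>]*)>")


def match_autolink(text):
    """Hypothetical helper: (scheme, rest) if `text` is one autolink, else None."""
    m = AUTOLINK.fullmatch(text)
    return (m.group(1), m.group(2)) if m else None


for sample in ["<task:123>", "<think:doc/42>", "<x:1>", "<task:a b>"]:
    print(sample, match_autolink(sample))
```

Under this rule custom schemes such as `task` and `think` need no dictionary at all; only one-character schemes and URIs containing spaces are rejected.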



* Re: Raw links -- beyond IANA support
@ 2018-12-18  9:46 JM Marcastel
From: JM Marcastel @ 2018-12-18  9:46 UTC
  To: John MacFarlane; +Cc: pandoc-discuss

This sounds great. However, wouldn’t such flexibility break HTML support, especially in the Microsoft world?
HTML allows tag names to contain colons (SGML heritage); how then would one distinguish a tag (<my:tag>) from a raw link (<my:scheme>)?

Since the two share the same syntax, distinguishing one from the other requires a dictionary.
IMO changing the syntax would introduce unnecessary complications.
Since we cannot change the HTML syntax, we would have to change the raw link syntax.
But angle brackets are natural and predate Markdown… so we shouldn’t change the raw link syntax either.

I would argue for keeping HTML support as is, rather than constraining it in favour of schemes for raw links.
HTML is a wild, wild world in constant change; schemes fluctuate less over time and have a much narrower scope.

One possible way forward, without a dictionary, would be to adopt the CommonMark conclusion and make it RFC-compliant for non-IANA schemes.
The correct syntax for a URI with an authority component (RFC 3986) is:

	<scheme://…>

Requiring the double slash forces the parser to recognise the input as a raw link (since it cannot be an HTML tag).
Obviously this openness comes at the cost of URIs without an authority component (e.g. relative URIs).
IMO this is acceptable.

IANA-defined schemes, for which Pandoc already has a dictionary, would remain unchanged and would not need the double slash.
Making both syntaxes available for IANA schemes would be a plus for those who want consistency :-)


* Re: Raw links -- beyond IANA support
@ 2018-12-18 17:49 John MacFarlane
From: John MacFarlane @ 2018-12-18 17:49 UTC
  To: JM Marcastel; +Cc: pandoc-discuss


Here are some relevant discussions if you really want to dive into it:

https://talk.commonmark.org/t/what-is-the-point-of-limiting-uri-schemes-in-autolinks/555

Your concern was, in fact, the reason for my
hesitation in making this change, but in the end I was
convinced.  The CommonMark spec doesn't allow colons
in HTML tags anyway. And there are no colons in any of
the officially supported HTML5 tag names. I suppose
they might be allowed in some variants of HTML, e.g.
XHTML? I have not seen them in the wild, but maybe you
can point to some examples?
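The point about tag names can be made concrete. A simplified sketch in Python, assuming the CommonMark definition of a tag name (an ASCII letter followed by ASCII letters, digits, or hyphens) and considering only attribute-less open tags; `classify` is a hypothetical helper and far from a full HTML recognizer. Since tag names cannot contain a colon while autolinks require one, the two checks can never both succeed, so their order here is arbitrary.

```python
import re

# CommonMark tag name: an ASCII letter, then ASCII letters, digits or "-".
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# CommonMark autolink: 2-32 character scheme, ":", then no controls,
# spaces, "<" or ">".
AUTOLINK = re.compile(r"<[A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20\x7f<>]*>")


def classify(text):
    """Hypothetical helper: roughly classify `<...>` text (no attributes)."""
    if AUTOLINK.fullmatch(text):
        return "autolink"
    if len(text) > 2 and text[0] == "<" and text[-1] == ">":
        if TAG_NAME.fullmatch(text[1:-1]):
            return "html tag"
    return "literal text"


for sample in ["<my:tag>", "<svg>", "<a:b>"]:
    print(sample, "->", classify(sample))
```

So under these rules `<my:tag>` is read as an autolink, `<svg>` as raw HTML, and `<a:b>` as neither.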

Note that even with this change it would still be
possible to use the raw blocks and raw spans:
e.g., `<my:htmltag>`{=html}.

> One possible way forward, without a dictionary, would be to adopt the CommonMark conclusion and make it RFC-compliant for non-IANA schemes.
> The correct syntax for a URI with an authority component (RFC 3986) is:
>
> 	<scheme://…>
>
> Requiring the double slash forces the parser to recognise the input as a raw link (since it cannot be an HTML tag).
> Obviously this openness comes at the cost of URIs without an authority component (e.g. relative URIs).
> IMO this is acceptable.

This might be a good compromise.  Relative URIs have
never been supported in autolinks. However, we have
supported `<mailto:me-hcDgGtZH8xNBDgjK7y7TUQ@public.gmane.org>`. Maybe there are
others like this?  As you note, we could retain the
dictionary here and not require the double slash for
IANA schemes.  Yes, this seems a good solution.
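The compromise agreed here (dictionary schemes may use plain `scheme:`, any other scheme must be written `scheme://…`) can be sketched as follows. `KNOWN_SCHEMES` is a tiny hypothetical stand-in for pandoc's real dictionary and `is_autolink` is illustrative only, not pandoc's parser; scheme comparison is case-insensitive.

```python
import re

# Hypothetical stand-in for pandoc's scheme dictionary (IANA schemes plus
# extras such as doi, isbn, javascript and pmid); the real list is far longer.
KNOWN_SCHEMES = {"http", "https", "mailto", "doi", "isbn", "javascript", "pmid"}

# RFC 3986 scheme syntax, ":", then the rest of the URI up to ">".
AUTOLINK = re.compile(r"<([A-Za-z][A-Za-z0-9+.\-]*):([^\x00-\x20\x7f<>]*)>")


def is_autolink(text):
    """Hypothetical rule: a known scheme needs only "scheme:", while any
    other scheme must be followed by "//" (i.e. an authority component)."""
    m = AUTOLINK.fullmatch(text)
    if not m:
        return False
    scheme, rest = m.group(1).lower(), m.group(2)
    return scheme in KNOWN_SCHEMES or rest.startswith("//")


for sample in ["<mailto:me@example.com>", "<task://123>", "<task:123>", "<my:tag>"]:
    print(sample, is_autolink(sample))
```

With this rule `<mailto:…>` keeps working without a double slash, while `<my:tag>` is left alone as possible HTML.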

* Re: Raw links -- beyond IANA support
@ 2018-12-19  2:52 JM Marcastel
From: JM Marcastel @ 2018-12-19  2:52 UTC
  To: John MacFarlane; +Cc: pandoc-discuss

Sorry for the wrong vocabulary: autolinks, not raw links. Mea culpa.

* * *

> This might be a good compromise.  Relative URIs have
> never been supported in autolinks. However, we have
> supported `<mailto:me-hcDgGtZH8xNBDgjK7y7TUQ@public.gmane.org>`. Maybe there are
> others like this?  As you note, we could retain the
> dictionary here and not require the double slash for
> IANA schemes.  Yes, this seems a good solution.


That would be great :-)

* * *

Going through the CommonMark thread, two points popped up:

a) Attributes and title

Do we need support for a title and attributes in autolinks? I would be tempted to say no: we have inline links for such situations.
In essence, autolinks are hyperlinks whose URI is meaningful to the reader… the title is the URI and the style is implicit (or we style it using inline markup).

On the other hand, since Pandoc already has a defined syntax for attributes, adding such support for autolinks could be a nice feature.
(The syntaxes in the CommonMark thread tend to reinvent the wheel.)

b) Case handling.

The CommonMark autolink specification reads like a truncated version of the RFC 3986 scheme definition, the truncated part being case handling:

> Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with
> lowercase letters.  An implementation should accept uppercase letters as equivalent to lowercase in scheme names (e.g., allow
> "HTTP" as well as "http") for the sake of robustness but should only produce lowercase scheme names for consistency.


Although this is handled behind the scenes, I believe it is important to mention that the canonical form of a scheme is lowercase.
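A minimal sketch of the case rule quoted above (accept any case, produce lowercase), in Python. `normalize_scheme` is a hypothetical helper; it only lowercases the part before the first colon and validates nothing beyond the presence of a scheme.

```python
def normalize_scheme(uri):
    """Hypothetical helper: lowercase the scheme of `uri`, leaving
    everything after the first ":" unchanged.

    RFC 3986, section 3.1: schemes are case-insensitive and should be
    accepted in uppercase, but only produced in lowercase.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError("no scheme in %r" % uri)
    return scheme.lower() + sep + rest


print(normalize_scheme("HTTP://Example.com/Path"))
print(normalize_scheme("Task:ABC-1"))
```

Only the scheme is touched: the rest of a URI can be case-sensitive, so a writer must not lowercase it wholesale.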

* * *

> And there are no colons in any of
> the officially supported HTML5 tag names. I suppose
> they might be allowed in some variants of HTML, e.g.
> XHTML? I have not seen them in the wild, but maybe you
> can point to some examples?

I don’t have HTML5-specific examples beyond those mentioned in the thread (e.g. SVG).
My viewpoint is slightly different, though.
Angle bracket support in Pandoc makes it possible to support SGML and its descendants, be they called XML, HTML, XHTML, HTML5, or something else.
Obviously the angle bracket markup of the day is HTML5, but this is transient: we’ll soon have HTML6 or HTML-NG or whatever.
All of these are (and will be) SGML-based (W3C) markup languages.
Pandoc isn’t transient. SGML support is future-proof (and helps with legacy content too… converting IBM and FileNet CMS vaults, for instance).
By analogy, backslash support in the AST is not restricted to LaTeX; it extends more broadly to TeX and its various flavours (and custom commands).
Colon support is SGML heritage. Why restrict it? Why restrict Pandoc?

Pandoc’s job in supporting angle-bracketed markup is to isolate that kind of markup in the AST, not to decide what that markup’s semantic content should be beyond the basic angle bracket syntax.
Supporting a specific flavour of SGML or TeX, be it HTML5, LaTeX, or any subsequent flavour, is a writer concern. FWIW :-)


