* Permissible characters in ConTeXt reference labels
From: Mark Szepieniec @ 2014-09-08 22:20 UTC
To: ntg-context

I'm trying to fix a problem in pandoc (see
https://github.com/jgm/pandoc/pull/1589) where it doesn't properly
sanitize the reference labels in ConTeXt output. This causes errors during
compilation when a label contains '#', for example. Note that this
sanitizing is needed in addition to the regular backslash escaping used for
control characters: '\#' is still illegal in a label, for example.

In the sanitizer function I'm writing, I'd like to properly escape all
illegal characters, but I couldn't find an explicit list of allowed or
illegal characters. Based on some testing I've conducted (see attached
file), I've arrived at the following set:

    \#[]",{}%()|=

1) Does this look like a reasonable set? Are there other characters or
sequences that should be included, or are worth testing?

2) I was told (see
https://groups.google.com/forum/#!topic/pandoc-discuss/tYpXMUkmbEY) that
if the characters " and , didn't work, it would count as a ConTeXt bug. Is
there any truth to that? Please let me know if any further info is needed
on my part.

3) Does anyone see issues with this general approach? I'm relatively new to
ConTeXt, so I might be missing either a huge problem, or an obviously
easier way to do this.

Thanks,

Mark

[Attachment: test.tex (application/x-tex, 1944 bytes)]

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
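To make the sanitizer idea in this message concrete, here is a minimal Python sketch (pandoc itself is written in Haskell, so this is not its code). It assumes, as Mark says later in the thread, that labels only need to be compared when references are resolved and are never shown to the reader, so each problem character can be replaced by a harmless code instead of being backslash-escaped. The marker `X`, the four-hex-digit format, and the names `sanitize_label` and `restore_label` are all hypothetical choices made for illustration.

```python
# Characters Mark's tests flagged as breaking ConTeXt references: \#[]",{}%()|=
UNSAFE = set('\\#[]",{}%()|=')

# Hypothetical escape marker (not taken from pandoc or ConTeXt). It is encoded
# itself whenever it occurs literally, which keeps the mapping one-to-one.
MARKER = "X"


def sanitize_label(label: str) -> str:
    """Replace each unsafe character (and MARKER) with MARKER + 4 hex digits."""
    parts = []
    for ch in label:
        if ch in UNSAFE or ch == MARKER:
            parts.append(f"{MARKER}{ord(ch):04x}")
        else:
            parts.append(ch)
    return "".join(parts)


def restore_label(safe: str) -> str:
    """Inverse of sanitize_label; used here only to show no information is lost."""
    out, i = [], 0
    while i < len(safe):
        if safe[i] == MARKER:
            # Every MARKER in the output starts exactly four lowercase hex digits.
            out.append(chr(int(safe[i + 1:i + 5], 16)))
            i += 5
        else:
            out.append(safe[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for label in ["sec#1", "a{b}", "100%", "X-ray"]:
        print(label, "->", sanitize_label(label))
```

Because nothing is backslash-escaped, there is nothing left for ConTeXt to unescape later, which sidesteps the concern Hans raises further down the thread. The cost is that labels look mangled in the generated .tex file; encoding the marker too is what prevents two different source labels from ever sanitizing to the same string.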
* Re: Permissible characters in ConTeXt reference labels
From: Mark Szepieniec @ 2014-09-17 22:06 UTC
To: ntg-context

Bump...

If it's not too much trouble, I would greatly appreciate some feedback on
this before I propose it to be merged into pandoc; even a "looks good to
me" from one of the ConTeXt gurus would be very helpful.

Thanks in advance,

Mark
* Re: Permissible characters in ConTeXt reference labels
From: Hans Hagen @ 2014-09-17 22:18 UTC
To: mailing list for ConTeXt users

On 9/18/2014 12:06 AM, Mark Szepieniec wrote:
> On Tue, Sep 9, 2014 at 12:20 AM, Mark Szepieniec wrote:
>   Based on some testing I've conducted (see attached file), I've
>   arrived at the following set:
>
>       \#[]",{}%()|=

it depends on where these characters end up in

#  : always tricky as it denotes an argument, so escape
[] : depends if it gets fed into a macro that uses [] as delimiters
{} : only an issue when not balanced
%  : escaping needed as it's comment otherwise
() : depends on where it ends up, like []
|  : is special in context so needs escaping
\  : of course that one needs escaping

>   1) Does this look like a reasonable set? Are there other characters
>   or sequences that should be included, or are worth testing?

keep in mind that escapes should end up unescaped at some point

>   2) I was told (see
>   https://groups.google.com/forum/#!topic/pandoc-discuss/tYpXMUkmbEY)
>   that if the characters " and , didn't work, it would count as a
>   ConTeXt bug, is there any truth to that?

well, define bug ... one can say the same of < and > in xml -)

if the result ends up in a comma separated list then , can be an issue but
one can always wrap an argument in {} to hide that

>   3) Does anyone see issues with this general approach? I'm relatively
>   new to ConTeXt, so I might be missing either a huge problem, or an
>   obviously easier way to do this.

i don't know ... i never used pandoc input

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
    tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------
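Hans's notes amount to a checklist whose answer depends on where the label lands in the generated ConTeXt. The Python sketch below encodes that checklist as a hypothetical helper, `risky_characters`; the flag names are invented here, and the rules are only as complete as the notes themselves (they say nothing about `"` or `=`, for instance).

```python
# Hypothetical encoding of Hans Hagen's per-character notes. The keyword flags
# describe where the label is assumed to end up in the generated ConTeXt.
ALWAYS_ESCAPE = set("#%|\\")  # argument marker, comment char, ConTeXt special, escape char
DELIMITERS = set("[]()")      # only risky inside a []- or ()-delimited argument


def braces_balanced(text: str) -> bool:
    """True if every '{' has a matching '}' and no '}' closes too early."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def risky_characters(label: str, in_delimited_arg: bool = True,
                     in_comma_list: bool = True,
                     wrapped_in_braces: bool = False) -> set:
    """Return the characters of `label` that need handling for this placement."""
    risky = {ch for ch in label if ch in ALWAYS_ESCAPE}
    if in_delimited_arg:
        risky |= {ch for ch in label if ch in DELIMITERS}
    # Per Hans, wrapping the argument in {} hides commas from list parsing.
    if in_comma_list and not wrapped_in_braces:
        risky |= {ch for ch in label if ch == ","}
    # Braces are only a problem when they are unbalanced.
    if not braces_balanced(label):
        risky |= {ch for ch in label if ch in "{}"}
    return risky
```

For example, `risky_characters("x,y")` flags the comma, while the same call with `wrapped_in_braces=True` flags nothing. A writer that cannot know every placement in advance would end up treating the union of all cases as unsafe, which is roughly the conclusion Mark reaches at the end of the thread.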
* Re: Permissible characters in ConTeXt reference labels
From: Aditya Mahajan @ 2014-09-18 2:26 UTC
To: mailing list for ConTeXt users

On Thu, 18 Sep 2014, Hans Hagen wrote:
> On 9/18/2014 12:06 AM, Mark Szepieniec wrote:
>>   I'm trying to fix a problem in pandoc (see
>>   https://github.com/jgm/pandoc/pull/1589) where it doesn't properly
>>   sanitize the reference labels in ConTeXt output, causing errors
>>   during compilation when a label contains '#' for example. Note that
>>   this sanitizing is needed in addition to the regular backslash
>>   escaping used for control characters: '\#' is still illegal in a
>>   label for example.

(LaTeX label) = (ConTeXt reference). What Mark meant was references such as
\section[...]{...} or \startplacefigure[reference={...}].

>>   2) I was told (see
>>   https://groups.google.com/forum/#!topic/pandoc-discuss/tYpXMUkmbEY)
>>   that if the characters " and , didn't work, it would count as a
>>   ConTeXt bug, is there any truth to that?
>
> well, define bug ... one can say the same of < and > in xml -)

Since I made that comment on the pandoc mailing list, let me explain.
Consider:

    \section["some" reference]{Title}

Given how " behaves elsewhere in ConTeXt, a user would expect the above to
be valid input. If it is not, then it is a bug (or at least, surprising).

The same goes for

    \section[some, reference]{Title}

> if the result ends up in a comma separated list then , can be an issue
> but one can always wrap an argument in {} to hide that

Aditya
* Re: Permissible characters in ConTeXt reference labels
From: Mark Szepieniec @ 2014-09-18 12:39 UTC
To: mailing list for ConTeXt users

OK, thanks to both of you. It looks like I need to sanitize all of the
mentioned characters, since the reference strings will generally originate
from formats other than ConTeXt, and we don't want ConTeXt to do any
processing on them aside from comparisons to resolve references.

As for Aditya's examples, the first (with ") results in a compilation error
in my test file, while the second (with ,) compiles without error and gives
the expected result.
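Mark's conclusion, that labels from other formats only need to survive comparison when references are resolved, implies one rule for the writer: run every label through the same function at the definition and at every use site, so both sides agree. A minimal Python sketch under that assumption follows; `to_reference`, `section`, `cross_reference`, and the `X`-plus-hex scheme are hypothetical, and `\in{section}[...]` is used only as a typical ConTeXt cross-reference command.

```python
import re

# Mark's character set plus a hypothetical marker "X", which is encoded too so
# that two distinct labels can never collide after sanitizing.
UNSAFE = '\\#[]",{}%()|='
MARKER = "X"
_UNSAFE_RE = re.compile("[" + re.escape(UNSAFE + MARKER) + "]")


def to_reference(label: str) -> str:
    """Replace each unsafe character with MARKER + 4 lowercase hex digits."""
    return _UNSAFE_RE.sub(lambda m: f"{MARKER}{ord(m.group()):04x}", label)


def section(title: str, label: str) -> str:
    # `title` is assumed to be escaped already by the writer's normal text escaping.
    return f"\\section[{to_reference(label)}]{{{title}}}"


def cross_reference(label: str) -> str:
    # Encoding the label identically here is what makes the two sides match.
    return f"\\in{{section}}[{to_reference(label)}]"


if __name__ == "__main__":
    # Aditya's two test labels from the thread.
    for label in ['"some" reference', "some, reference"]:
        print(section("Title", label))
        print(cross_reference(label))
```

With this approach, the `"` that broke compilation in Mark's test and the `,` that Hans warned about in comma-separated lists never reach ConTeXt at all, so the generated document no longer depends on how ConTeXt treats them raw.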
end of thread

Thread overview: 5+ messages
2014-09-08 22:20 Permissible characters in ConTeXt reference labels Mark Szepieniec
2014-09-17 22:06 ` Mark Szepieniec
2014-09-17 22:18   ` Hans Hagen
2014-09-18  2:26     ` Aditya Mahajan
2014-09-18 12:39       ` Mark Szepieniec