From: Mark Szepieniec
Newsgroups: gmane.comp.tex.context
Subject: Re: Permissible characters in ConTeXt reference labels
Date: Thu, 18 Sep 2014 14:39:05 +0200
To: mailing list for ConTeXt users

OK, thanks both of you, it looks like I need to sanitize all of the
mentioned characters, since the reference strings will generally originate
from formats other than ConTeXt, and we don't want ConTeXt to do any
processing on them, aside from comparisons to resolve references.

As for Aditya's examples, the first results in a compilation error on my
test file, while the second compiles without error and gives the expected
result.

On Thu, Sep 18, 2014 at 4:26 AM, Aditya Mahajan wrote:

> On Thu, 18 Sep 2014, Hans Hagen wrote:
>
>> On 9/18/2014 12:06 AM, Mark Szepieniec wrote:
>>
>>> Bump...
>>>
>>> If it's not too much trouble, I would greatly appreciate some feedback
>>> on this before I propose it to be merged into pandoc; even a "looks good
>>> to me" from one of the ConTeXt gurus would be very helpful.
>>>
>>> Thanks in advance,
>>>
>>> Mark
>>>
>>> On Tue, Sep 9, 2014 at 12:20 AM, Mark Szepieniec wrote:
>>>
>>>     I'm trying to fix a problem in pandoc (see
>>>     https://github.com/jgm/pandoc/pull/1589) where it doesn't properly
>>>     sanitize the reference labels in ConTeXt output, causing errors
>>>     during compilation when a label contains '#' for example. Note that
>>>     this sanitizing is needed in addition to the regular backslash
>>>     escaping used for control characters: '\#' is still illegal in a
>>>     label for example.
>
> (LaTeX label) = (ConTeXt reference). What Mark meant was references such as
>
> \section[...]{...} or \startplacefigure[reference={...}].
>
>>>     In the sanitizer function I'm writing, I'd like to properly escape
>>>     all illegal characters, but I couldn't find an explicit list of
>>>     allowed or illegal characters. Based on some testing I've conducted
>>>     (see attached file), I've arrived at the following set:
>>>
>>>     \#[]",{}%()|=
>>
>> it depends on where these characters end up in
>>
>> #  : always tricky as it denotes an argument, so escape
>> [] : depends if it gets fed into a macro that uses [] as delimiters
>> {} : only an issue when not balanced
>> %  : escaping needed as it's comment otherwise
>> () : depends on where it ends up, like []
>> |  : is special in context so needs escaping
>> \  : of course that one needs escaping
>>
>>>     1) Does this look like a reasonable set? Are there other characters
>>>     or sequences that should be included, or are worth testing?
>>
>> keep in mind that escapes should end up unescaped at some point
>>
>>>     2) I was told (see
>>>     https://groups.google.com/forum/#!topic/pandoc-discuss/tYpXMUkmbEY)
>>>     that if the characters " and , didn't work, it would count as a
>>>     ConTeXt bug, is there any truth to that? Please let me know if any
>>>     further info is needed on my part.
>>
>> well, define bug ... one can say the same of < and > in xml -)
>
> Since I made that comment on the pandoc mailing list, let me explain.
>
> Consider:
>
> \section["some" reference]{Title}
>
> Given how " behaves elsewhere in ConTeXt, a user would expect the above to
> be a valid input. If it is not, then it is a bug (or at least, surprising).
>
> The same goes for
>
> \section[some, reference]{Title}
>
>> if the result ends up in a comma separated list then , can be an issue
>> but one can always wrap an argument in {} to hide that
>>
>>>     3) Does anyone see issues with this general approach? I'm relatively
>>>     new to ConTeXt, so I might be missing either a huge problem, or an
>>>     obviously easier way to do this.
>>
>> i don't know ... i never used pandoc input
>
> Aditya
>
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to
> the Wiki!
>
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________
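A minimal sketch of the kind of sanitizer described at the top of this message, written in Python purely for illustration (pandoc itself is written in Haskell). The function name `sanitize_label`, the `ux` prefix, and the hex encoding are assumptions made up for this example; nothing in this thread settles on a particular encoding. The only idea it shows is that a label definition and every reference to it pass through the same function, so none of the characters from the tested set reach ConTeXt and the comparison still matches.

```python
# Characters found to be problematic in reference labels in this thread:
#   \ # [ ] " , { } % ( ) | =
UNSAFE_CHARS = frozenset('\\#[]",{}%()|=')


def sanitize_label(label: str, prefix: str = "ux") -> str:
    """Replace each unsafe character with `prefix` plus its hex code point.

    The `ux` prefix and the hex encoding are hypothetical choices for this
    sketch; all other characters are passed through unchanged.
    """
    return "".join(
        f"{prefix}{ord(ch):x}" if ch in UNSAFE_CHARS else ch
        for ch in label
    )


if __name__ == "__main__":
    # Show how a few raw labels would appear inside \section[...]{...}.
    for raw in ["intro", "sec#1", "some, reference", '"some" reference']:
        print(f"{raw!r} -> \\section[{sanitize_label(raw)}]{{Title}}")
```

One caveat with this particular encoding: it is not reversible and not collision-free, since a label that already contains the literal text `ux23` would clash with a sanitized `#`. If that matters, the prefix itself would also need to be encoded.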