From: Hrvoje Niksic <hniksic@srce.hr>
Subject: Re: Cool bug in URL parsing
Date: 07 May 1998 17:25:08 +0200
Message-ID: <kigemy685nv.fsf@jagor.srce.hr>
In-Reply-To: Karl Kleinpaste's message of "07 May 1998 11:15:47 -0400"

Karl Kleinpaste <karl@jprc.com> writes:
> If you haven't stopped your setup from doing highlighting of URLs
> embedded in text, here's an entertaining glitch to see.
>
> From the end of this line containing the sequence to start a "<URL:"
> everything will be highlighted as a supposed URL until, for example,
> some quoted text shows up to provide the terminator.
>
> > Such as on this line here.
>
> Methinks there's a regexp that gets a /little/ too aggressive...

What makes you think this is a bug? According to rfc1738:

   APPENDIX: Recommendations for URLs in Context

   URIs, including URLs, are intended to be transmitted through
   protocols which provide a context for their interpretation.

   In some cases, it will be necessary to distinguish URLs from other
   possible data structures in a syntactic structure. In this case, is
   recommended that URLs be preceeded with a prefix consisting of the
   characters "URL:". For example, this prefix may be used to
   distinguish URLs from other kinds of URIs.

   In addition, there are many occasions when URLs are included in other
   kinds of text; examples include electronic mail, USENET news
   messages, or printed on paper. In such cases, it is convenient to
   have a separate syntactic wrapper that delimits the URL and separates
   it from the rest of the text, and in particular from punctuation
   marks that might be mistaken for part of the URL. For this purpose,
   is recommended that angle brackets ("<" and ">"), along with the
   prefix "URL:", be used to delimit the boundaries of the URL. This
   wrapper does not form part of the URL and should not be used in
   contexts in which delimiters are already specified.

   In the case where a fragment/anchor identifier is associated with a
   URL (following a "#"), the identifier would be placed within the
   brackets as well.

   In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
   need to be added to break long URLs across lines. The whitespace
   should be ignored when extracting the URL.

   No whitespace should be introduced after a hyphen ("-") character.
   Because some typesetters and printers may (erroneously) introduce a
   hyphen at the end of line when breaking a line, the interpreter of a
   URL containing a line break immediately after a hyphen should ignore
   all unencoded whitespace around the line break, and should be aware
   that the hyphen may or may not actually be part of the URL.
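
The hyphen rule above leaves the reader of a URL with an ambiguity, which might be sketched in Python roughly as follows. This is only an illustration, not code from Gnus or from the RFC: the function name candidate_urls, the HYPHEN_BREAK pattern, and the example.org address are all assumptions made up for the example. The idea is that every hyphen directly followed by a line break yields two readings, one keeping the hyphen and one dropping it.

```python
import itertools
import re

# A hyphen immediately followed by a line break, plus any whitespace
# around that break (hypothetical pattern, chosen for this sketch).
HYPHEN_BREAK = re.compile(r"-[ \t\r]*\n\s*")


def candidate_urls(raw):
    """Return every plausible URL for text taken from between the delimiters.

    Whitespace is always dropped. Each hyphen that sits right before a
    line break may or may not belong to the URL, so both readings are kept.
    """
    # Split at each ambiguous hyphen, then strip remaining whitespace.
    parts = ["".join(piece.split()) for piece in HYPHEN_BREAK.split(raw)]
    results = []
    for joiners in itertools.product(("-", ""), repeat=len(parts) - 1):
        url = parts[0]
        for joiner, part in zip(joiners, parts[1:]):
            url += joiner + part
        results.append(url)
    return results


# Hypothetical URL broken after a hyphen.
for url in candidate_urls("http://example.org/some-\n   page.html"):
    print(url)
```

For the sample input this prints the reading with the hyphen first and the reading without it second; a caller would still have to decide which one to try.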

   Examples:

      Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;
      type=d> but you can probably pick it up from <URL:ftp://ds.in
      ternic.net/rfc>. Note the warning in <URL:http://ds.internic.
      net/instructions/overview.html#WARNING>.
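
To make the recommendation concrete, here is a minimal Python sketch of an extractor that follows the wrapper rule: it takes everything between "<URL:" and the next ">" and discards the whitespace in between. The names WRAPPED_URL and extract_wrapped_urls are hypothetical, and this is not the regexp Gnus actually uses; it only shows why a matcher written to this appendix will happily run across line breaks until it meets a ">".

```python
import re

# "<URL:" up to the next ">"; DOTALL lets the match span line breaks,
# as the appendix requires for URLs broken across lines.
WRAPPED_URL = re.compile(r"<URL:(.*?)>", re.DOTALL)


def extract_wrapped_urls(text):
    """Return the contents of each <URL:...> wrapper with whitespace removed."""
    return ["".join(m.group(1).split()) for m in WRAPPED_URL.finditer(text)]


# The example text from the RFC appendix quoted above.
sample = """Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;
type=d> but you can probably pick it up from <URL:ftp://ds.in
ternic.net/rfc>. Note the warning in <URL:http://ds.internic.
net/instructions/overview.html#WARNING>."""

for url in extract_wrapped_urls(sample):
    print(url)

# An opening "<URL:" with no closing bracket of its own: the match runs
# on until some later ">" turns up, such as a quote marker.
glitch = 'the sequence to start a "<URL:"\neverything will be highlighted\n> Such as'
print(extract_wrapped_urls(glitch))
```

On the RFC sample this yields the three URLs rejoined across their line breaks. On the second input the "URL" swallows the following line up to the quote marker, which is the behaviour Karl describes.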
--
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
I'm a Lisp variable -- bind me!