ntg-context - mailing list for ConTeXt users
From: Max Chernoff via ntg-context <ntg-context@ntg.nl>
To: ntg-context@ntg.nl
Cc: Max Chernoff <mseven@telus.net>, oinos@gmx.es
Subject: Re: issue with scite module
Date: Wed, 1 Jun 2022 15:58:13 -0600
Message-ID: <cac73198-f067-80c6-633c-b2172f65ddac@telus.net>
In-Reply-To: <0642f009-3737-f783-6b25-a346f7981667@fiee.net>

> Now, I still don’t understand LPEG and don’t know if there’s a general
> “character” class that doesn’t need a list...

Well, looking through the XML spec

     https://www.w3.org/TR/REC-xml/#NT-NameChar

you'd think that we'd want a pattern like this:

     local name = (R("az","AZ","09", "\u{C0}\u{D6}", "\u{D8}\u{F6}", "\u{F8}\u{2FF}", "\u{370}\u{37D}", "\u{37F}\u{1FFF}", "\u{200C}\u{200D}", "\u{2070}\u{218F}", "\u{2C00}\u{2FEF}", "\u{3001}\u{D7FF}", "\u{F900}\u{FDCF}", "\u{FDF0}\u{FFFD}", "\u{10000}\u{EFFFF}", "\u{0300}\u{036F}", "\u{203F}\u{2040}") + S("_-.\u{B7}"))^1

But that doesn't work, since

> The same is true for lpeg.R, although the latter will display an error message if used
> with multibyte characters. Therefore lpeg.R('aä') results in the message bad argument #1
> to 'R' (range must have two characters), since to lpeg, ä is two ’characters’ (bytes), so
> aä totals three. (https://texdoc.org/serve/luatex/0##680)
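
To make the byte counting in that quote concrete, here is a small sketch
(mine, not from the manual). It assumes Lua 5.3 or later for the \u{...}
escapes and the utf8 library; the lpeg part only runs if the module can be
loaded, and the variable names are purely illustrative.

```lua
-- Assumes Lua 5.3+ (for "\u{...}" escapes and the utf8 library).
local a_umlaut = "\u{E4}"          -- "ä", stored as the two bytes 0xC3 0xA4
local range    = "a" .. a_umlaut   -- what one would like to hand to lpeg.R

local byte_count = #range            -- length in bytes, which is what lpeg sees
local char_count = utf8.len(range)   -- length in Unicode code points

print(byte_count, char_count)        -- prints 3 and 2

-- Optional: only attempted when lpeg is available in this interpreter.
local has_lpeg, lpeg_mod = pcall(require, "lpeg")
if has_lpeg then
  -- lpeg.R wants exactly two bytes per argument, so this call raises an error.
  local ok, err = pcall(lpeg_mod.R, range)
  print(ok, err)
end
```

So the range string is three bytes long even though it holds only two
characters, which is exactly what lpeg.R complains about.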

The easiest way that I found was to just cheat and use everything with
a TeX catcode 11 ("letters"):

     local name = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1

This isn't, strictly speaking, correct, since the set of TeX letters is
not the same as the set of XML name characters, but I think that it's
close enough. It seems to work correctly for Pablo's initial example,
but it may break something else.
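
For anyone wondering why this sidesteps the lpeg.R limitation: a multibyte
character can still be matched as a literal string with P, so a set of such
characters can be written as an ordered choice of literals. The sketch below
illustrates only that idea in plain LPeg. Here `letters` is a tiny
hypothetical stand-in for characters.csletters, and `any_of` is a made-up
helper; this is not meant to show how lpeg.utfchartabletopattern is
actually implemented.

```lua
-- Works both inside LuaTeX (global lpeg) and in standalone Lua 5.3+ with LPeg.
local lpeg = lpeg or require("lpeg")
local P, R, S = lpeg.P, lpeg.R, lpeg.S

-- Hypothetical stand-in for characters.csletters: ä, ñ and α as UTF-8 strings.
local letters = { "\u{E4}", "\u{F1}", "\u{3B1}" }

-- Build an ordered choice that matches any one string from the list.
local function any_of(list)
  local pattern = P(false)            -- always fails; neutral element for "+"
  for _, char in ipairs(list) do
    pattern = pattern + P(char)       -- P matches the whole byte sequence
  end
  return pattern
end

local name = (R("az", "AZ", "09") + S("_-.") + any_of(letters))^1

-- "mañana" is 7 bytes long, so a full match returns position 8.
print(name:match("ma\u{F1}ana"))
```

With a large table a naive choice like this gets slow, which is presumably
why ConTeXt provides a dedicated helper for it.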

-- Max

diff --git a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
index e635d40..97de3fd 100644
--- a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original
+++ b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
@@ -41,7 +41,7 @@ local semicolon        = P(";")
 local equal            = P("=")
 local ampersand        = P("&")
 
-local name             = (R("az","AZ","09") + S("_-."))^1
+local name             = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1
 local openbegin        = P("<")
 local openend          = P("</")
 local closebegin       = P("/>") + P(">")




