ntg-context - mailing list for ConTeXt users
* issue with scite module
@ 2022-06-01 16:47 Pablo Rodriguez via ntg-context
  2022-06-01 16:58 ` Henning Hraban Ramm via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Rodriguez via ntg-context @ 2022-06-01 16:47 UTC (permalink / raw)
  To: ntg-context; +Cc: Pablo Rodriguez

Dear list,

I have the following sample:

  \usemodule[scite]
  \starttext
  \startTEXpage[offset=1ex]
  \type[option=xml]{<ans/>}
  \type[option=xml]{<áñß/>}
  \stopTEXpage
  \stoptext

Using scite, I don’t get the second element right.

Without scite, both elements are displayed right.

In both Geany and Notepad++ (which use Scintilla internally), the two
elements are displayed right.

Could anyone confirm the issue?

Many thanks for your help,

Pablo
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________

* Re: issue with scite module
  2022-06-01 16:47 issue with scite module Pablo Rodriguez via ntg-context
@ 2022-06-01 16:58 ` Henning Hraban Ramm via ntg-context
  2022-06-01 17:45   ` Pablo Rodriguez via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Henning Hraban Ramm via ntg-context @ 2022-06-01 16:58 UTC (permalink / raw)
  To: ntg-context; +Cc: Henning Hraban Ramm

On 01.06.22 at 18:47, Pablo Rodriguez via ntg-context wrote:
> Dear list,
> 
> I have the following sample:
> 
>    \usemodule[scite]
>    \starttext
>    \startTEXpage[offset=1ex]
>    \type[option=xml]{<ans/>}
>    \type[option=xml]{<áñß/>}
>    \stopTEXpage
>    \stoptext
> 
> Using scite, I don’t get the second element right.
> 
> Without scite, both elements are displayed right.
> 
> In both Geany and Notepad++ (which use Scintilla internally), the two
> elements are displayed right.
> 
> Could anyone confirm the issue?

Hi Pablo,

with LMTX version 2022.05.11, both elements are displayed, but the first 
in blue, the second in red. Apparently the scite highlighter doesn’t 
like non-ASCII characters in elements.

Hraban

* Re: issue with scite module
  2022-06-01 16:58 ` Henning Hraban Ramm via ntg-context
@ 2022-06-01 17:45   ` Pablo Rodriguez via ntg-context
  2022-06-01 19:00     ` Henning Hraban Ramm via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Rodriguez via ntg-context @ 2022-06-01 17:45 UTC (permalink / raw)
  To: Henning Hraban Ramm via ntg-context; +Cc: Pablo Rodriguez

On 6/1/22 18:58, Henning Hraban Ramm via ntg-context wrote:
> On 01.06.22 at 18:47, Pablo Rodriguez via ntg-context wrote:
>> [...]
>> Could anyone confirm the issue?
>
> Hi Pablo,
>
> with LMTX version 2022.05.11, both elements are displayed, but the first
> in blue, the second in red. Apparently the scite highlighter doesn’t
> like non-ASCII characters in elements.

Hi Hraban,

this is exactly what I’m experiencing (and sorry, I forgot to mention
that I was using current latest).

I ran into the same problem without scite, and Hans fixed it (in buff-imp-xml.lua).

I mentioned both Geany and Notepad++, because I think it may not be an
issue outside ConTeXt.

But I don’t know which file deals with it (so I could try to submit a
patch).

Many thanks for your help,

Pablo

* Re: issue with scite module
  2022-06-01 17:45   ` Pablo Rodriguez via ntg-context
@ 2022-06-01 19:00     ` Henning Hraban Ramm via ntg-context
  2022-06-01 21:58       ` Max Chernoff via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Henning Hraban Ramm via ntg-context @ 2022-06-01 19:00 UTC (permalink / raw)
  To: Pablo Rodriguez via ntg-context; +Cc: Henning Hraban Ramm

On 01.06.22 at 19:45, Pablo Rodriguez via ntg-context wrote:
> But I don’t know which file deals with it (so I could try to submit a
> patch).

That would be texmf-context/tex/context/modules/mkiv/m-scite.mkiv
and 
texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
and there probably

local name = (R("az","AZ","09") + S("_-."))^1

Now, I still don’t understand LPEG and don’t know if there’s a general 
“character” class that doesn’t need a list...
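
To see why an ASCII-only name pattern stops at the first accented letter, here is a rough Python analogue of that LPEG expression. It is only an illustration under the assumption that a regular-expression character class behaves like the R/S combination above; the lexer itself is Lua/LPEG, and the helper name `leading_name` is made up for this sketch.

```python
import re

# Rough regex analogue of the LPEG pattern
#   (R("az","AZ","09") + S("_-."))^1
# Illustration only: the real lexer uses LPEG, not regular expressions.
ASCII_NAME = re.compile(r"[a-zA-Z0-9_.\-]+")


def leading_name(text: str) -> str:
    """Return the longest prefix of text accepted by the ASCII-only pattern."""
    match = ASCII_NAME.match(text)
    return match.group(0) if match else ""


print(repr(leading_name("ans/>")))                  # 'ans'
print(repr(leading_name("\u00e1\u00f1\u00df/>")))   # '' (the name is "áñß")
```

The second call returns an empty string, which would be consistent with the highlighter treating the element name as something other than a name.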

Hraban

* Re: issue with scite module
  2022-06-01 19:00     ` Henning Hraban Ramm via ntg-context
@ 2022-06-01 21:58       ` Max Chernoff via ntg-context
  2022-06-02 15:36         ` Pablo Rodriguez via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Max Chernoff via ntg-context @ 2022-06-01 21:58 UTC (permalink / raw)
  To: ntg-context; +Cc: Max Chernoff, oinos

> Now, I still don’t understand LPEG and don’t know if there’s a general
> “character” class that doesn’t need a list...

Well, looking through the XML spec

     https://www.w3.org/TR/REC-xml/#NT-NameChar

you'd think that we'd want a pattern like this:

     local name = (R("az","AZ","09", "\u{C0}\u{D6}", "\u{D8}\u{F6}", "\u{F8}\u{2FF}", "\u{370}\u{37D}", "\u{37F}\u{1FFF}", "\u{200C}\u{200D}", "\u{2070}\u{218F}", "\u{2C00}\u{2FEF}", "\u{3001}\u{D7FF}", "\u{F900}\u{FDCF}", "\u{FDF0}\u{FFFD}", "\u{10000}\u{EFFFF}", "\u{0300}\u{036F}", "\u{203F}\u{2040}") + S("_-.\u{B7}"))^1

But that doesn't work, since

> The same is true for lpeg.R, although the latter will display an error message if used
> with multibyte characters. Therefore lpeg.R('aä') results in the message bad argument #1
> to 'R' (range must have two characters), since to lpeg, ä is two ’characters’ (bytes), so
> aä totals three. (https://texdoc.org/serve/luatex/0##680)
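
The byte counts behind that error message can be reproduced outside LPEG. The following Python sketch is not ConTeXt code; it only assumes that lpeg.R counts UTF-8 bytes, as the quoted manual says, and uses a hypothetical helper `utf8_len` to show the sizes involved.

```python
# lpeg.R expects each range to be exactly two bytes long.
# This sketch reproduces the byte counts with Python's UTF-8 encoder.


def utf8_len(s: str) -> int:
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


print(utf8_len("az"))              # 2: a valid two-byte range
print(utf8_len("\u00e4"))          # 2: "ä" is one character but two bytes
print(utf8_len("a\u00e4"))         # 3: "aä", the case from the manual
print(utf8_len("\u00c0\u00d6"))    # 4: the "À".."Ö" range from the pattern above
```

Every non-ASCII range in the long pattern is therefore at least four bytes long, which is why it cannot be passed to lpeg.R as it stands.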

The easiest way that I found was to just cheat and use everything with
a TeX catcode 11 ("letters"):

     local name = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1

This isn't strictly speaking correct, but I think that it's close
enough. It seems to work correctly for Pablo's initial example,
but it may break something else.

-- Max

diff --git a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
index e635d40..97de3fd 100644
--- a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original
+++ b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
@@ -41,7 +41,7 @@ local semicolon        = P(";")
  local equal            = P("=")
  local ampersand        = P("&")
  
-local name             = (R("az","AZ","09") + S("_-."))^1
+local name             = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1
  local openbegin        = P("<")
  local openend          = P("</")
  local closebegin       = P("/>") + P(">")





* Re: issue with scite module
  2022-06-01 21:58       ` Max Chernoff via ntg-context
@ 2022-06-02 15:36         ` Pablo Rodriguez via ntg-context
  2022-06-02 17:03           ` Pablo Rodriguez via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Rodriguez via ntg-context @ 2022-06-02 15:36 UTC (permalink / raw)
  To: Max Chernoff via ntg-context; +Cc: Pablo Rodriguez

On 6/1/22 23:58, Max Chernoff via ntg-context wrote:
>> Now, I still don’t understand LPEG and don’t know if there’s a general
>> “character” class that doesn’t need a list...

Many thanks for your reply, Hraban.

> The easiest way that I found was to just cheat and use everything with
> a TeX catcode 11 ("letters"):
>
>   local name = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1

Many thanks for your reply, Max,

I’m afraid I cannot make your proposed fix work.

For the sake of consistency (with buff-imp-xml.lua), I think the patch
should read (also attached to the message to avoid wrong line breaking):

--- scite-context-lexer-xml.lua	2022-06-01 17:24:38.625976000 +0200
+++
context/tex/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
2022-06-02 16:37:30.112824947 +0200
@@ -13,7 +13,7 @@
 -- todo: parse entities in attributes

 local global, string, table, lpeg = _G, string, table, lpeg
-local P, R, S, C, Cmt, Cp = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cmt,
lpeg.Cp
+local P, R, S, C, Cmt, Cp, lpatterns = lpeg.P, lpeg.R, lpeg.S, lpeg.C,
lpeg.Cmt, lpeg.Cp, lpeg.patterns
 local type = type
 local match, find = string.match, string.find

@@ -41,7 +41,8 @@
 local equal            = P("=")
 local ampersand        = P("&")

-local name             = (R("az","AZ","09") + S("_-."))^1
+local alsoname         = lpatterns.utf8two + lpatterns.utf8three +
lpatterns.utf8four
+local name             = (R("az","AZ","09") + S("_-.") + + alsoname)^1
 local openbegin        = P("<")
 local openend          = P("</")
 local closebegin       = P("/>") + P(">")

But I’m afraid I cannot make it work on my computer (Linux64).

On another Win64 computer, both patches worked perfectly fine.

Both machines run LMTX current latest. So I have an issue on my
installation that I have to fix first.

Many thanks for your help,

Pablo

[-- Attachment #2: scite-xml.diff --]
[-- Type: text/x-patch, Size: 982 bytes --]

--- scite-context-lexer-xml.lua	2022-06-01 17:24:38.625976000 +0200
+++ context/tex/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua	2022-06-02 16:37:30.112824947 +0200
@@ -13,7 +13,7 @@
 -- todo: parse entities in attributes
 
 local global, string, table, lpeg = _G, string, table, lpeg
-local P, R, S, C, Cmt, Cp = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cmt, lpeg.Cp
+local P, R, S, C, Cmt, Cp, lpatterns = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cmt, lpeg.Cp, lpeg.patterns
 local type = type
 local match, find = string.match, string.find
 
@@ -41,7 +41,8 @@
 local equal            = P("=")
 local ampersand        = P("&")
 
-local name             = (R("az","AZ","09") + S("_-."))^1
+local alsoname         = lpatterns.utf8two + lpatterns.utf8three + lpatterns.utf8four
+local name             = (R("az","AZ","09") + S("_-.") + + alsoname)^1
 local openbegin        = P("<")
 local openend          = P("</")
 local closebegin       = P("/>") + P(">")

* Re: issue with scite module
  2022-06-02 15:36         ` Pablo Rodriguez via ntg-context
@ 2022-06-02 17:03           ` Pablo Rodriguez via ntg-context
  2022-06-02 22:52             ` Max Chernoff via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Rodriguez via ntg-context @ 2022-06-02 17:03 UTC (permalink / raw)
  To: Pablo Rodriguez via ntg-context; +Cc: Pablo Rodriguez

On 6/2/22 17:36, Pablo Rodriguez via ntg-context wrote:
> On 6/1/22 23:58, Max Chernoff via ntg-context wrote:
>>
>> local name = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1
>
> I’m afraid I cannot make your proposed fix work.

Even with a brand new install, neither of the two patches works for me.

I don’t know what I may be missing on my installation.

Do you have any hint about what I am doing wrong?

Many thanks for your help,

Pablo

* Re: issue with scite module
  2022-06-02 17:03           ` Pablo Rodriguez via ntg-context
@ 2022-06-02 22:52             ` Max Chernoff via ntg-context
  2022-06-04  8:42               ` Pablo Rodriguez via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Max Chernoff via ntg-context @ 2022-06-02 22:52 UTC (permalink / raw)
  To: ntg-context; +Cc: Max Chernoff, oinos

> For the sake of consistency (with buff-imp-xml.lua), I think the patch
> should read
 > [...]
> +local alsoname         = lpatterns.utf8two + lpatterns.utf8three +
> lpatterns.utf8four

I think that pattern is a little too broad, since it will match any 
non-ASCII Unicode character. Things like U+202E (xkcd.com/1137), U+00A0 
(no-break space), etc. are valid UTF-8 characters, but not valid in XML tag 
names. Neither of these two characters is matched by the TeX catcode 
check. This doesn't make any real difference for a syntax highlighter, 
though.
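
The difference between the two approaches can be checked with a small Python sketch. The range table is transcribed from the NameStartChar and NameChar productions of the XML 1.0 spec linked earlier in the thread; `is_multibyte_utf8` is only a stand-in I am assuming for the utf8two/utf8three/utf8four combination, not the actual lpeg.patterns code, and it says nothing about the TeX catcode check.

```python
# NameChar ranges from the XML 1.0 spec (NameStartChar plus the extra
# NameChar entries). Each tuple is an inclusive code point range.
NAME_CHAR_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),  # : A-Z _ a-z
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),                # - . 0-9 middle dot
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x300, 0x36F),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x203F, 0x2040),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def is_xml_name_char(ch: str) -> bool:
    """True if ch falls in one of the spec's NameChar ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_CHAR_RANGES)


def is_multibyte_utf8(ch: str) -> bool:
    """Assumed stand-in for utf8two + utf8three + utf8four: any non-ASCII character."""
    return len(ch.encode("utf-8")) > 1


for ch in ["\u00f1", "\u00a0", "\u202e"]:
    print(f"U+{ord(ch):04X}",
          "multibyte:", is_multibyte_utf8(ch),
          "namechar:", is_xml_name_char(ch))
```

All three characters pass the broad multi-byte test, but only U+00F1 (ñ) is a NameChar.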

> +local name             = (R("az","AZ","09") + S("_-.") + + alsoname)^1

There's a doubled plus in the middle there. The patch works when I 
remove it.

> But I’m afraid I cannot make it work on my computer (Linux64).
> 
> On another Win64 computer, both patches worked perfectly fine.

Hmm, that's really weird. Both patches work for me on my main Win64 
computer (after I fixed the extra plus). I also pulled the 
"contextgarden/context:lmtx" Docker image (Debian sid), and both patches 
worked there too. I get this from inside the container:

     root@e8d29a32595c:~# cat /etc/os-release
     PRETTY_NAME="Debian GNU/Linux bookworm/sid"
     NAME="Debian GNU/Linux"
     ID=debian
     HOME_URL="https://www.debian.org/"
     SUPPORT_URL="https://www.debian.org/support"
     BUG_REPORT_URL="https://bugs.debian.org/"

     root@e8d29a32595c:~# locale
     LANG=
     LANGUAGE=
     LC_CTYPE="POSIX"
     LC_NUMERIC="POSIX"
     LC_TIME="POSIX"
     LC_COLLATE="POSIX"
     LC_MONETARY="POSIX"
     LC_MESSAGES="POSIX"
     LC_PAPER="POSIX"
     LC_NAME="POSIX"
     LC_ADDRESS="POSIX"
     LC_TELEPHONE="POSIX"
     LC_MEASUREMENT="POSIX"
     LC_IDENTIFICATION="POSIX"
     LC_ALL=

     root@e8d29a32595c:~# xxd test.tex
     00000000: 5c75 7365 6d6f 6475 6c65 5b73 6369 7465  \usemodule[scite
     00000010: 5d0a 5c73 7461 7274 7465 7874 0a5c 7374  ].\starttext.\st
     00000020: 6172 7454 4558 7061 6765 5b6f 6666 7365  artTEXpage[offse
     00000030: 743d 3165 785d 0a5c 7479 7065 5b6f 7074  t=1ex].\type[opt
     00000040: 696f 6e3d 786d 6c5d 7b3c 616e 732f 3e7d  ion=xml]{<ans/>}
     00000050: 0a5c 7479 7065 5b6f 7074 696f 6e3d 786d  .\type[option=xm
     00000060: 6c5d 7b3c c3a1 c3b1 c39f 2f3e 7d0a 5c73  l]{<....../>}.\s
     00000070: 746f 7054 4558 7061 6765 0a5c 7374 6f70  topTEXpage.\stop
     00000080: 7465 7874 0a                             text.

     root@e8d29a32595c:~# context --version
     mtx-context     | ConTeXt Process Management 1.04
     mtx-context     |
     mtx-context     | main context file: [snip]
     mtx-context     | current version: 2022.05.11 11:36
     mtx-context     | main context file: [snip]
     mtx-context     | current version: 2022.05.11 11:36

     ldd "$(type -p luametatex)"
         linux-vdso.so.1 (0x00007ffdbe9a5000)
         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4b034d4000)
         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4b034b3000)
         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4b0336f000)
         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4b03196000)
         /lib64/ld-linux-x86-64.so.2 (0x00007f4b03a55000)

Is this perhaps a weird locale or encoding issue? Maybe try compiling with:

     LC_ALL=C.UTF-8 LANG=C.UTF-8 context test.tex

or

     LC_ALL=POSIX LANG=POSIX context test.tex

I'm surprised Linux is the one not working here, since it's usually 
Windows that has text encoding issues with its weird hybrid of DOS 
codepages and UTF-16+BOM.

The only other thing that I can think of is a weird library issue with 
your distro, but LuaMetaTeX is statically linked. Not sure what else to 
check here.

-- Max

* Re: issue with scite module
  2022-06-02 22:52             ` Max Chernoff via ntg-context
@ 2022-06-04  8:42               ` Pablo Rodriguez via ntg-context
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Rodriguez via ntg-context @ 2022-06-04  8:42 UTC (permalink / raw)
  To: Max Chernoff via ntg-context; +Cc: Pablo Rodriguez

On 6/3/22 00:52, Max Chernoff via ntg-context wrote:
>> For the sake of consistency (with buff-imp-xml.lua), I think the patch
>> should read
>  > [...]
>> +local alsoname         = lpatterns.utf8two + lpatterns.utf8three +
>> lpatterns.utf8four
>
> I think that that pattern is a little too broad, since it will match any
> non-ASCII Unicode character. Things like U+202E (xkcd.com/1137), U+00A0
> (no-break space), etc are valid UTF-8 characters, but not valid XML tag
> names. Neither of these two characters are matched by the TeX catcode
> check. This doesn't make any real difference for a syntax highlighter
> though.

Hi Max,

many thanks for your reply.

At best, the patch is only a suggestion, and Hans will merge whatever
code he sees fit.

>> +local name             = (R("az","AZ","09") + S("_-.") + + alsoname)^1
>
> There's a doubled plus in the middle there. The patch works when I
> remove it.

I noticed it too just after sending the message to the list, but I had
to solve the issue with my installation first.

>> But I’m afraid I cannot make it work on my computer (Linux64).
>>
>> On another Win64 computer, both patches worked perfectly fine.
>
> Hmm, that's really weird. Both patches work for me on my main Win64
> computer (after I fixed the extra plus).

It was a stupid mistake on my side. The patch I sent before points to
the error:

--- scite-context-lexer-xml.lua	2022-06-01 17:24:38.625976000 +0200
+++
context/tex/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
2022-06-02 16:37:30.112824947 +0200

I was compiling the sample file in a directory that contained an
unmodified copy of "scite-context-lexer-xml.lua".

ConTeXt was reading the unmodified file and not the modified one, but
that was all my fault.

Now I have to find an MWE for the issues I’m experiencing with XML
sources when using the scite module.

Many thanks for your help,

Pablo

* Re: issue with scite module
  2022-06-04  9:59 Pablo Rodriguez via ntg-context
@ 2022-06-04 21:18 ` Max Chernoff via ntg-context
  0 siblings, 0 replies; 11+ messages in thread
From: Max Chernoff via ntg-context @ 2022-06-04 21:18 UTC (permalink / raw)
  To: ntg-context; +Cc: Max Chernoff, oinos

> Could anyone confirm the issue or explain to me what I am missing?

Confirmed on Win64 with the same version.

But I did find a workaround: if I convert your example from NFC 
(composed) to NFD (decomposed), it compiles fine.

     $ xxd xml.tex
     00000000: 5c75 7365 6d6f 6475 6c65 5b73 6369 7465  \usemodule[scite
     00000010: 5d0a 5c73 7461 7274 7465 7874 0a5c 7374  ].\starttext.\st
     00000020: 6172 7458 4d4c 20c3 b15c 7374 6f70 584d  artXML ..\stopXM
     00000030: 4c0a 5c73 746f 7074 6578 740a            L.\stoptext.

     $ context xml
     [...]
     ConTeXt  ver: 2022.05.11 11:36 LMTX  fmt: 2022.6.2
     [...]
     The file ended when scanning an argument.
     [...]
     mtx-context     | fatal error: return code: 1

     $ uconv -x any-nfd xml.tex | sponge xml.tex

     $ xxd xml.tex
     00000000: 5c75 7365 6d6f 6475 6c65 5b73 6369 7465  \usemodule[scite
     00000010: 5d0a 5c73 7461 7274 7465 7874 0a5c 7374  ].\starttext.\st
     00000020: 6172 7458 4d4c 206e cc83 5c73 746f 7058  artXML n..\stopX
     00000030: 4d4c 0a5c 7374 6f70 7465 7874 0a         ML.\stoptext.

     $ context xml
     [success]

This also gives us a hint as to what the problem is:

     $ echo -n 'ñ' | xxd
     00000000: c3b1                                     ..

     $ echo -n 'ñ' | uconv -x any-nfd | xxd
     00000000: 6ecc 83                                  n..

     $ xxd xml.tex
     00000020: 6172 7458 4d4c 20c3 b15c 7374 6f70 584d  artXML ..\stopXM
                                ^^ ^^
     $ xxd xml.log
     00000570: 5c73 6c78 6465 6661 756c 747b c37d 7d5c  \slxdefault{.}}\
                                             ^^

The character "ñ" in UTF-8 NFC is "0xC3, 0xB1". The "0xC3" starts a 
2-byte sequence, while "0xB1" is a continuation byte. In the error 
message from the log, we have "0xC3, 0x7D": the lead byte of a 2-byte 
sequence followed by an ASCII character ("}"), which is invalid UTF-8.
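
The byte roles described above can be checked with a few lines of Python. This is just a sketch of the UTF-8 rules, not of what m-scite.mkiv does; `classify` and `is_valid_utf8` are hypothetical helper names, and `classify` is deliberately simplified.

```python
def classify(byte: int) -> str:
    """Classify one byte by its role in UTF-8.

    Simplified: does not reject bytes that never occur in valid UTF-8
    (0xC0, 0xC1, 0xF5 and above).
    """
    if byte < 0x80:
        return "ascii"
    if byte < 0xC0:
        return "continuation"
    if byte < 0xE0:
        return "lead-2"
    if byte < 0xF0:
        return "lead-3"
    return "lead-4"


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print([classify(b) for b in b"\xc3\xb1"])   # ['lead-2', 'continuation']
print([classify(b) for b in b"\xc3\x7d"])   # ['lead-2', 'ascii']
print(is_valid_utf8(b"\xc3\xb1"))           # True
print(is_valid_utf8(b"\xc3\x7d"))           # False
```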

I'm guessing that the module code is just grabbing one byte at a time, 
which works for ASCII, but can leave orphaned bytes with multi-byte 
UTF-8 characters. The NFD form avoids this since the first byte of the 
decomposed character is the plain ASCII "n", which can freely be 
treated as a single byte.

This NFD workaround should hopefully "fix" things for basic Latin 
characters with accents, but it probably won't help with non-Latin 
characters since there isn't an ASCII character to decompose them into.
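
To make the limits of the workaround concrete, here is a short Python sketch using the standard `unicodedata` module. The Greek letter is my own choice of a non-Latin example, and `nfd_bytes` is a hypothetical helper.

```python
import unicodedata


def nfd_bytes(ch: str) -> bytes:
    """UTF-8 bytes of ch after NFD (canonical decomposition)."""
    return unicodedata.normalize("NFD", ch).encode("utf-8")


# "ñ" decomposes into ASCII "n" plus U+0303 (combining tilde).
print(nfd_bytes("\u00f1").hex())   # 6ecc83

# "λ" (U+03BB) has no decomposition, so it still starts with a lead byte.
print(nfd_bytes("\u03bb").hex())   # cebb
```

The first result starts with the ASCII byte 0x6E, matching the xxd output above; the second still starts with 0xCE, so a byte-at-a-time grab would presumably hit the same problem.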

-- Max

* issue with scite module
@ 2022-06-04  9:59 Pablo Rodriguez via ntg-context
  2022-06-04 21:18 ` Max Chernoff via ntg-context
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Rodriguez via ntg-context @ 2022-06-04  9:59 UTC (permalink / raw)
  To: ntg-context; +Cc: Pablo Rodriguez

Dear list,

I have this minimal sample (with current latest from 2022.05.11 11:36 on
Linux64):

  \usemodule[scite]
  \starttext
  \startXML ñ\stopXML
  \stoptext

Commenting out the first line avoids the compilation error.

Replacing ñ with n also allows compilation.

I think there may be an error in m-scite.mkiv.

The inclusion of non-ASCII characters in XML code seems to leave an
unclosed argument.

Could anyone confirm the issue or explain to me what I am missing?

Many thanks for your help,

Pablo

Thread overview: 11+ messages
2022-06-01 16:47 issue with scite module Pablo Rodriguez via ntg-context
2022-06-01 16:58 ` Henning Hraban Ramm via ntg-context
2022-06-01 17:45   ` Pablo Rodriguez via ntg-context
2022-06-01 19:00     ` Henning Hraban Ramm via ntg-context
2022-06-01 21:58       ` Max Chernoff via ntg-context
2022-06-02 15:36         ` Pablo Rodriguez via ntg-context
2022-06-02 17:03           ` Pablo Rodriguez via ntg-context
2022-06-02 22:52             ` Max Chernoff via ntg-context
2022-06-04  8:42               ` Pablo Rodriguez via ntg-context
2022-06-04  9:59 Pablo Rodriguez via ntg-context
2022-06-04 21:18 ` Max Chernoff via ntg-context
