ntg-context - mailing list for ConTeXt users
From: Max Chernoff via ntg-context <ntg-context@ntg.nl>
To: ntg-context@ntg.nl
Cc: Max Chernoff <mseven@telus.net>, oinos@gmx.es
Subject: Re: issue with scite module
Date: Sat, 4 Jun 2022 15:18:55 -0600
Message-ID: <8ecb9fba-0cab-0e7c-64f1-e16fa775dd46@telus.net>
In-Reply-To: <c4b62780-0c0c-34f7-c1c3-4129689d90ba@gmx.es>

> Could anyone confirm the issue or explain me what I am missing?

Confirmed on Win64 with the same version.

But I did find a workaround: if I convert your example from NFC 
(composed) to NFD (decomposed), it compiles fine.

     $ xxd xml.tex
     00000000: 5c75 7365 6d6f 6475 6c65 5b73 6369 7465  \usemodule[scite
     00000010: 5d0a 5c73 7461 7274 7465 7874 0a5c 7374  ].\starttext.\st
     00000020: 6172 7458 4d4c 20c3 b15c 7374 6f70 584d  artXML ..\stopXM
     00000030: 4c0a 5c73 746f 7074 6578 740a            L.\stoptext.

     $ context xml
     [...]
     ConTeXt  ver: 2022.05.11 11:36 LMTX  fmt: 2022.6.2
     [...]
     The file ended when scanning an argument.
     [...]
     mtx-context     | fatal error: return code: 1

     $ uconv -x any-nfd xml.tex | sponge xml.tex

     $ xxd xml.tex
     00000000: 5c75 7365 6d6f 6475 6c65 5b73 6369 7465  \usemodule[scite
     00000010: 5d0a 5c73 7461 7274 7465 7874 0a5c 7374  ].\starttext.\st
     00000020: 6172 7458 4d4c 206e cc83 5c73 746f 7058  artXML n..\stopX
     00000030: 4d4c 0a5c 7374 6f70 7465 7874 0a         ML.\stoptext.

     $ context xml
     [success]
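
For anyone without uconv or sponge at hand (for example on Windows), the same NFC-to-NFD conversion can be sketched with Python's standard unicodedata module. This is only an illustration of the workaround; the helper names to_nfd and normalize_file are made up here, and the sample bytes mirror the xml.tex content from the transcript.

```python
import unicodedata
from pathlib import Path


def to_nfd(data: bytes) -> bytes:
    """Decode UTF-8 bytes, normalize the text to NFD, and re-encode as UTF-8."""
    return unicodedata.normalize("NFD", data.decode("utf-8")).encode("utf-8")


def normalize_file(path: str) -> None:
    """Rewrite a UTF-8 file in place in NFD form (hypothetical helper)."""
    p = Path(path)
    p.write_bytes(to_nfd(p.read_bytes()))


# Same content as the NFC xml.tex above; U+00F1 is the precomposed "ñ".
source = (
    "\\usemodule[scite]\n\\starttext\n"
    "\\startXML \u00f1\\stopXML\n\\stoptext\n"
).encode("utf-8")

converted = to_nfd(source)
print(converted)
```

Calling normalize_file("xml.tex") would then play the role of the uconv/sponge pipeline.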

This also gives us a hint as to what the problem is:

     $ echo -n 'ñ' | xxd
     00000000: c3b1                                     ..

     $ echo -n 'ñ' | uconv -x any-nfd | xxd
     00000000: 6ecc 83                                  n..

     $ xxd xml.tex
     00000020: 6172 7458 4d4c 20c3 b15c 7374 6f70 584d  artXML ..\stopXM
                                ^^ ^^
     $ xxd xml.log
     00000570: 5c73 6c78 6465 6661 756c 747b c37d 7d5c  \slxdefault{.}}\
                                             ^^

The character "ñ" in UTF-8 NFC is "0xC3, 0xB1". The "0xC3" is the lead 
byte of a 2-byte sequence, while "0xB1" is a continuation byte. In the 
log, we instead have "0xC3, 0x7D": the lead byte of a 2-byte sequence 
followed by an ASCII character ("}"), which is invalid UTF-8.
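
The byte roles described above can be checked with a few lines of Python. This is a minimal sketch: classify and is_valid_utf8 are names invented for the illustration, and the ranges follow the UTF-8 definition (0x80-0xBF are continuation bytes, 0xC2-0xDF start a 2-byte sequence).

```python
def classify(byte: int) -> str:
    """Name the role a single byte value plays in UTF-8."""
    if byte < 0x80:
        return "ascii"
    if byte < 0xC0:
        return "continuation"
    if 0xC2 <= byte <= 0xDF:
        return "lead-2"
    if 0xE0 <= byte <= 0xEF:
        return "lead-3"
    if 0xF0 <= byte <= 0xF4:
        return "lead-4"
    return "invalid"  # 0xC0, 0xC1 and 0xF5-0xFF never appear in UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for pair in (b"\xc3\xb1", b"\xc3\x7d"):
    roles = [classify(b) for b in pair]
    print(pair.hex(" "), roles, "valid" if is_valid_utf8(pair) else "invalid")
```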

I'm guessing that the module code is grabbing one byte at a time, which 
works for ASCII but can leave orphaned bytes when a character is encoded 
as several bytes. The NFD form avoids this since the first byte of the 
content is the plain ASCII "n", which can safely be treated as a single 
byte.
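
To make the guess concrete, here is a small Python sketch of the difference between taking the first byte and taking the first character of a string. It is not the module's actual code, which is not shown in this thread; first_byte and first_char are hypothetical helpers, and wrapping the result in \slxdefault{...} is only meant to reproduce the byte pattern seen in the log.

```python
import unicodedata


def first_byte(text: str) -> bytes:
    """Take the first UTF-8 byte, as a byte-oriented scanner would."""
    return text.encode("utf-8")[:1]


def first_char(text: str) -> bytes:
    """Take the first code point and encode it whole."""
    return text[:1].encode("utf-8")


nfc = "\u00f1"                           # precomposed "ñ": c3 b1
nfd = unicodedata.normalize("NFD", nfc)  # "n" + U+0303: 6e cc 83

# Byte-wise grabbing splits the NFC form but not the NFD form.
broken = b"\\slxdefault{" + first_byte(nfc) + b"}}"
intact = b"\\slxdefault{" + first_byte(nfd) + b"}}"

print(broken.hex(" "))
print(intact.hex(" "))
```

With the NFC input, the wrapped result ends in the bytes c3 7d 7d, the same tail as in xml.log; with the NFD input it ends in 6e 7d 7d, which is plain ASCII.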

This NFD workaround should hopefully "fix" things for basic Latin 
characters with accents, but it probably won't help with non-Latin 
characters since there isn't an ASCII character to decompose them into.
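
Whether the workaround can help for a given character can be estimated by checking if its NFD form begins with an ASCII code point. The sketch below does this in Python; nfd_leads_with_ascii is a name made up for the example, and the sample characters are arbitrary picks. Note that a few Latin letters, such as "ß", have no canonical decomposition at all, so they would presumably not benefit either.

```python
import unicodedata


def nfd_leads_with_ascii(ch: str) -> bool:
    """True if the NFD form of ch begins with an ASCII code point."""
    return ord(unicodedata.normalize("NFD", ch)[0]) < 0x80


samples = [
    "\u00f1",  # ñ, decomposes to "n" + combining tilde
    "\u00e9",  # é, decomposes to "e" + combining acute
    "\u03ac",  # Greek alpha with tonos, decomposes to a Greek base letter
    "\u0439",  # Cyrillic short i, decomposes to a Cyrillic base letter
    "\u00df",  # ß, has no canonical decomposition
]

for ch in samples:
    verdict = "may help" if nfd_leads_with_ascii(ch) else "will not help"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: NFD {verdict}")
```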

-- Max
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
