* [NTG-context] rfc4180splitter not handling UTF-8 with BOM files
@ 2024-11-23 19:40 Marco Patzer
2024-11-23 20:09 ` [NTG-context] " Hans Hagen
0 siblings, 1 reply; 2+ messages in thread
From: Marco Patzer @ 2024-11-23 19:40 UTC (permalink / raw)
To: mailing list for ConTeXt users
[-- Attachment #1: Type: text/plain, Size: 993 bytes --]
Hi!
I ran into a problem reading certain CSV files. I narrowed it down
to the following example:
\starttext
\startluacode
    local mycsvsplitter = utilities.parsers.rfc4180splitter{
        separator = ",",
        quote = '"'}
    -- fails with
    -- token call, execute: [ctxlua]:11: attempt to index a nil value (local 'tablerows')
    -- local mycsv = io.loaddata("A.csv")
    -- works
    local mycsv = io.loaddata("B.csv")
    local tablerows = mycsvsplitter(mycsv)
    context(tablerows[1][1])
    context(" ")
    context(tablerows[1][2])
\stopluacode
\stoptext
The compilation fails with
token call, execute: [ctxlua]:11: attempt to index a nil value (local 'tablerows')
The two files are attached. The only difference is that:
A.csv: Unicode text, UTF-8 (with BOM) text
B.csv: ASCII text
Somehow the rfc4180splitter chokes on UTF-8 with BOM files.
io.loaddata succeeds as far as I can tell. Is there a way to read in
those files without pre-processing them?
Marco
version: 2024.09.17 13:15
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: A.csv --]
[-- Type: text/csv, Size: 16 bytes --]
"Date","ID"
[-- Attachment #3: B.csv --]
[-- Type: text/csv, Size: 12 bytes --]
"Date","ID"
[-- Attachment #4: Type: text/plain, Size: 511 bytes --]
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive : https://github.com/contextgarden/context
wiki : https://wiki.contextgarden.net
___________________________________________________________________________________
* [NTG-context] Re: rfc4180splitter not handling UTF-8 with BOM files
2024-11-23 19:40 [NTG-context] rfc4180splitter not handling UTF-8 with BOM files Marco Patzer
@ 2024-11-23 20:09 ` Hans Hagen
0 siblings, 0 replies; 2+ messages in thread
From: Hans Hagen @ 2024-11-23 20:09 UTC (permalink / raw)
To: ntg-context
On 11/23/2024 8:40 PM, Marco Patzer wrote:
> Somehow the rfc4180splitter chokes on UTF-8 with BOM files.
> io.loaddata succeeds as far as I can tell. Is there a way to read in
> those files without pre-processing them?
I'll send you a patch to test.
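Independent of that patch, a possible interim workaround is to strip the BOM from the loaded string before handing it to the splitter. The following is only a minimal sketch in plain Lua. It assumes the failure is caused by the three UTF-8 BOM bytes (EF BB BF) at the start of the data, which this thread does not confirm, and strip_bom is a hypothetical helper name, not a ConTeXt function.

```lua
-- Hypothetical helper (not part of ConTeXt): drop a leading UTF-8 BOM.
-- Assumption: the splitter fails only because of these three bytes.
local function strip_bom(data)
    -- The UTF-8 byte order mark is the byte sequence EF BB BF
    -- (decimal 239 187 191).
    if data and data:sub(1, 3) == "\239\187\191" then
        return data:sub(4)
    end
    -- No BOM (or no data at all): return the input unchanged.
    return data
end
```

In the example from the original message this would be used as
local mycsv = strip_bom(io.loaddata("A.csv"))
before calling mycsvsplitter(mycsv). Since the splitter returned nil for the failing file, it may also be worth checking tablerows for nil before indexing it.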
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------