ntg-context - mailing list for ConTeXt users
* [NTG-context] rfc4180splitter not handling UTF-8 with BOM files
@ 2024-11-23 19:40 Marco Patzer
  2024-11-23 20:09 ` [NTG-context] " Hans Hagen
  0 siblings, 1 reply; 2+ messages in thread
From: Marco Patzer @ 2024-11-23 19:40 UTC
  To: mailing list for ConTeXt users

[-- Attachment #1: Type: text/plain, Size: 993 bytes --]

Hi!

I ran into a problem reading certain CSV files. I narrowed it down
to the following example:

\starttext
\startluacode
  local mycsvsplitter = utilities.parsers.rfc4180splitter{
    separator = ",",
    quote = '"'}

  -- loading A.csv (UTF-8 with BOM) instead fails with:
  -- token call, execute: [ctxlua]:11: attempt to index a nil value (local 'tablerows')
  -- local mycsv = io.loaddata("A.csv")

  -- loading B.csv (plain ASCII) works
  local mycsv = io.loaddata("B.csv")

  local tablerows = mycsvsplitter(mycsv)
  context(tablerows[1][1])
  context(" ")
  context(tablerows[1][2])
\stopluacode
\stoptext

The compilation fails with

  token call, execute: [ctxlua]:11: attempt to index a nil value (local 'tablerows')

The two files are attached. They differ only in their encoding:

  A.csv: Unicode text, UTF-8 (with BOM) text
  B.csv: ASCII text

Somehow the rfc4180splitter chokes on UTF-8 with BOM files.
io.loaddata succeeds as far as I can tell. Is there a way to read in
those files without pre-processing them?

Marco

version: 2024.09.17 13:15

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: A.csv --]
[-- Type: text/csv, Size: 16 bytes --]

"Date","ID"

[-- Attachment #3: B.csv --]
[-- Type: text/csv, Size: 12 bytes --]

"Date","ID"

[-- Attachment #4: Type: text/plain, Size: 511 bytes --]

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage  : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive  : https://github.com/contextgarden/context
wiki     : https://wiki.contextgarden.net
___________________________________________________________________________________


* [NTG-context] Re: rfc4180splitter not handling UTF-8 with BOM files
  2024-11-23 19:40 [NTG-context] rfc4180splitter not handling UTF-8 with BOM files Marco Patzer
@ 2024-11-23 20:09 ` Hans Hagen
  0 siblings, 0 replies; 2+ messages in thread
From: Hans Hagen @ 2024-11-23 20:09 UTC
  To: ntg-context

On 11/23/2024 8:40 PM, Marco Patzer wrote:

> Somehow the rfc4180splitter chokes on UTF-8 with BOM files.
> io.loaddata succeeds as far as I can tell. Is there a way to read in
> those files without pre-processing them?

I'll send you a patch to test.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
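
Until a patched version is available, one possible workaround is to strip the byte order mark from the loaded string before handing it to the splitter. The following is only a sketch: it assumes that the three-byte UTF-8 BOM (EF BB BF) at the start of A.csv is what makes the splitter return nil, which the thread suggests but does not confirm. The helper name stripbom is hypothetical and not part of ConTeXt.

```lua
-- Hypothetical helper (not a ConTeXt function): remove a leading UTF-8 BOM.
-- The BOM is the byte sequence EF BB BF, written here as decimal escapes.
local function stripbom(data)
  -- gsub returns the new string and a match count;
  -- the outer parentheses keep only the string.
  return (data:gsub("^\239\187\191", ""))
end

-- Intended use inside \startluacode, with the names from the first message
-- (assumes io.loaddata found the file and returned a string):
--   local mycsv     = stripbom(io.loaddata("A.csv"))
--   local tablerows = mycsvsplitter(mycsv)
```

The pattern is anchored with ^, so only a BOM at the very start of the data is removed, and a file without a BOM, such as B.csv, passes through unchanged.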
