edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] BOM
@ 2015-11-14  7:37 Karl Dahlke
  2015-11-14 10:17 ` Adam Thompson
From: Karl Dahlke @ 2015-11-14  7:37 UTC (permalink / raw)
  To: Edbrowse-dev

The Windows port has raised the issue of the byte order mark (BOM),
which is prevalent in Windows files, but virtually nonexistent in Unix.
If we do choose to support this, I would read the BOM,
convert the file to utf8 for internal use, then convert it back, with its BOM,
if that file or any portion of it is written to disk.
There is a precedent for this.
An iso8859 file is converted to utf8, then converted back upon write.
Try it and see.
But this works only for iso8859-1, and even this we may not support for long,
as Unix / Linux is almost 100% utf8 at this point.
Anyway there is some machinery in place.
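
The round trip described above could look roughly like the following. This is only a sketch in Python, not edbrowse code: the names BOMS, to_internal and to_disk are made up for illustration, and it assumes the whole file is held in memory as bytes.

```python
import codecs

# (BOM bytes, codec name). UTF-32 is checked before UTF-16 because the
# UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
# A UTF-16 LE file whose first character is U+0000 would be misdetected;
# this sketch accepts that ambiguity.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_internal(raw):
    """Strip a leading BOM and return (utf8_bytes, bom, codec)."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            text = raw[len(bom):].decode(codec)
            return text.encode("utf-8"), bom, codec
    # No BOM: leave the bytes alone and remember that nothing was converted.
    return raw, b"", None


def to_disk(utf8_bytes, bom, codec):
    """Convert the internal utf8 buffer back to the original encoding."""
    if codec is None:
        return utf8_bytes
    return bom + utf8_bytes.decode("utf-8").encode(codec)


raw = codecs.BOM_UTF16_LE + "niño\n".encode("utf-16-le")
internal, bom, codec = to_internal(raw)
restored = to_disk(internal, bom, codec)
```

The remembered (bom, codec) pair plays the same role as the flag that marks an iso8859-1 file today: it travels with the buffer and is consulted only at write time.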

The real key for me is the search and substitute commands.
These are under control of pcre, which runs in utf8 mode.
/ni.o/ will match niño, with the dot matching
the 2-byte utf8 character ñ (n tilde).
So if everything is utf8 inside then all the searches and substitutes
will work the way our international users would want and expect.
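
To illustrate why character-aware matching matters, here is a small sketch. It uses Python's re module as a stand-in, not pcre itself: matching on a str is character-based, like pcre in utf8 mode, while matching on bytes is byte-based, like a regex engine that knows nothing of utf8.

```python
import re

word = "niño"
raw = word.encode("utf-8")   # b'ni\xc3\xb1o', the ñ occupies two bytes

# Character mode: the dot consumes the whole character ñ.
char_match = re.fullmatch(r"ni.o", word)

# Byte mode: the dot consumes a single byte, so one dot is not enough.
byte_match_one_dot = re.fullmatch(rb"ni.o", raw)
byte_match_two_dots = re.fullmatch(rb"ni..o", raw)

print(char_match is not None)           # True
print(byte_match_one_dot is not None)   # False
print(byte_match_two_dots is not None)  # True
```

In byte mode the user would have to know how many bytes each character takes, which is exactly what we want to spare them.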

This is thinking ahead; I don't expect to implement BOM support tomorrow.

Karl Dahlke
