edbrowse-dev - development list for edbrowse
From: Karl Dahlke <eklhad@comcast.net>
To: edbrowse-dev@lists.the-brannons.com
Subject: [Edbrowse-dev]  Edbrowse recognizing site as binary data
Date: Mon, 25 Jan 2016 07:39:59 -0500
Message-ID: <20160025073959.eklhad@comcast.net>
In-Reply-To: <20160122171305.GE2555@Kraftkrust>

As of this commit, edbrowse recognizes utf16 or utf32, according to the
byte order mark, and converts to utf8, the internal edbrowse format,
and the only format understood by pcre.
Text is converted back if the same file is written.
If text is sent anywhere else it remains in utf8.
This is consistent with our iso utf8 conversions.
Big and little endian are recognized.
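
The detection step can be sketched roughly as follows. This is a Python
illustration only, not the edbrowse code; the names sniff_bom and to_utf8
are made up for this example.

```python
import codecs

# Longer marks first: the utf32 little endian mark (FF FE 00 00) begins
# with the utf16 little endian mark (FF FE), so testing utf16 first
# would misidentify a utf32 file.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (codec_name, bom_length), or (None, 0) if no utf16/32 mark."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def to_utf8(data: bytes):
    """Convert marked utf16/utf32 bytes to utf8; pass anything else through.

    The detected codec is returned too, so the caller can remember the
    original format and convert back when the same file is written.
    """
    name, skip = sniff_bom(data)
    if name is None:
        return data, None
    return data[skip:].decode(name).encode("utf-8"), name
```

One corner case this sketch does not handle: a utf16 little endian file
whose first character after the mark is a null looks exactly like utf32.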

I ran a few tests, but this is not thoroughly tested;
there are lots of corner cases.

This has been much discussed, and didn't seem worth doing,
but Geoff pointed out that such files are more common on Windows
(in fact I think he first discovered the problem),
and much of the Asian world uses utf16 in files and websites
because it is more compact for such text than utf8,
typically two bytes per character instead of three.
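
A quick way to see the size difference, again just a Python illustration
with an arbitrary sample string and a made-up helper name:

```python
def sizes(text: str):
    """Return (utf8_bytes, utf16_bytes) for text, not counting any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

print(sizes("日本語"))  # (9, 6): three bytes each in utf8, two in utf16
print(sizes("abc"))     # (3, 6): plain ascii goes the other way
```

So the saving applies to text that is mostly Asian characters; markup-heavy
pages with lots of ascii can come out larger in utf16.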

So this web page, coming down as utf16, now works.
https://portal.slm.tu-dresden.de

Geoff, if you have some utf16 or utf32 files, you may wish to test,
	edbrowse whatever-file-utf32.txt
and see if it looks right.
Beyond this, make some edits and write the file,
and see if the edits stick and if the file remains in its original format.

Ok, I already found a Windows bug just by thinking about it.
Text files are opened in text mode, but when mapping back to utf16 or utf32
they need to be written in binary mode.
I may even have to stick in \r\0\0\0 manually. Arrgghh.
I'll look into it.
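
To make the problem concrete: in text mode the Windows runtime turns each
0x0A byte into 0x0D 0x0A, which drops a lone \r byte into the middle of a
2 or 4 byte code unit. A sketch of the write-back with explicit line
endings and binary mode, in Python rather than C, with hypothetical names
(encode_for_write, write_file) and an assumed crlf flag:

```python
import codecs

BOM_FOR = {
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}

def encode_for_write(utf8_bytes: bytes, codec: str, crlf: bool) -> bytes:
    """Map internal utf8 text back to the file's original wide format."""
    text = utf8_bytes.decode("utf-8")
    if crlf:
        # Add the \r ourselves, before encoding, so it becomes a full
        # code unit (\r\0\0\0 in utf32 little endian), not a stray byte.
        text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    return BOM_FOR[codec] + text.encode(codec)

def write_file(path: str, utf8_bytes: bytes, codec: str, crlf: bool) -> None:
    # Binary mode, so the runtime does no newline translation of its own.
    with open(path, "wb") as f:
        f.write(encode_for_write(utf8_bytes, codec, crlf))
```

For example, encode_for_write(b"a\n", "utf-32-le", True) ends with the
bytes \r\0\0\0 \n\0\0\0.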

Karl Dahlke
