9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] I don't understand utf8 (it seems)
@ 2015-01-05 21:52 Steve Simon
  2015-01-05 22:05 ` erik quanstrom
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Steve Simon @ 2015-01-05 21:52 UTC (permalink / raw)
  To: 9fans

I am trying to parse a stream from a tcp connection.

I think the data is utf8, here is a sample

	 20 2d 20 c8 65 73 6b fd 20 72 6f 7a 68 6c 61 73

which, when I print it, gives:

     -       e  s  k       r  o  z  h  l  a  s           
           ^          ^
        missing    missing

there are two missing characters. Ok, bad UTF8 perhaps?
but when I try unicode(1) I see:

	unicode c8 fd
	È
	ý
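
One way to poke at this away from Plan 9 is a minimal sketch in Python 3. The choice of language, the helper name try_decode, and the list of candidate encodings are all assumptions here; nothing in the stream says which encoding applies. It tries the sample bytes as UTF-8 and then as a few single-byte encodings:

```python
# The sample bytes from the hex dump above.
sample = bytes.fromhex("20 2d 20 c8 65 73 6b fd 20 72 6f 7a 68 6c 61 73")


def try_decode(data, encoding):
    """Return the decoded text, or a short description of the first error."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return "invalid {}: byte 0x{:02x} at offset {}".format(
            encoding, data[err.start], err.start)


# Candidate encodings are guesses, not something the stream declares.
for enc in ("utf-8", "latin-1", "iso8859-2", "cp1250"):
    print("{:10} {!r}".format(enc, try_decode(sample, enc)))
```

The utf-8 attempt fails at the c8 byte: c8 opens a two-byte sequence, and the following 65 is not a continuation byte. latin-1 reproduces the È and ý that unicode(1) printed, since the first 256 Unicode code points coincide with Latin-1. iso8859-2 and cp1250 both give "Český rozhlas", which reads as Czech; that suggests one of those two, but does not prove it.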

Are these 8-bit runes? (!)
Is there a name for such a thing?
Is this common?
Is it just MS code pages, where the values above 0x7f happen (or were designed) to map onto the same letters as utf8?

thanks in advance for any useful suggestions ☺

-Steve






Thread overview: 8+ messages
2015-01-05 21:52 [9fans] I don't understand utf8 (it seems) Steve Simon
2015-01-05 22:05 ` erik quanstrom
2015-01-05 22:27   ` Quintile
2015-01-05 22:15 ` Antons Suspans
2015-01-05 22:31 ` Bakul Shah
2015-01-06 19:57 ` Matěj Cepl
2015-01-06 22:09   ` Quintile
2015-01-07  9:43     ` a.f.e.belinfante
