9front - general discussion about 9front
* mothra not respecting charset=utf-8
@ 2014-02-16 10:47 Ethan Grammatikidis
  2014-02-16 11:25 ` [9front] " Ethan Grammatikidis
  2014-02-16 18:38 ` cinap_lenrek
  0 siblings, 2 replies; 5+ messages in thread
From: Ethan Grammatikidis @ 2014-02-16 10:47 UTC (permalink / raw)
  To: 9front

i just ran into a little problem because http's default encoding is
iso-8859-1. i modified rc-httpd to correct it like this:

type = `{echo $full_path | awk -f $libdir/quickfile.awk}
if(~ $"type text/plain) {
	type = 'text/plain; charset=utf-8'
}

in firefox this worked fine, but mothra still loads the file as if it
were iso-8859-1. here's the file; the problem characters are the degree
signs near the end of several lines:

http://ethan.uk.to/static/awk-stew/sensors_compact
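
For reference, the degree sign is U+00B0. UTF-8 encodes it as the two bytes 0xC2 0xB0, so a reader that treats every byte as an iso-8859-1 character shows two characters (U+00C2 followed by U+00B0) where one was meant. A minimal C sketch of the difference follows; both function names are made up for illustration and are not taken from mothra.

```c
#include <stddef.h>

/* Hypothetical helper: interpret one byte as an iso-8859-1 character.
 * Latin-1 maps byte values directly onto code points U+0000..U+00FF. */
unsigned long
latin1_codepoint(const unsigned char *s)
{
	return s[0];
}

/* Hypothetical helper: decode a two-byte UTF-8 sequence
 * (110xxxxx 10xxxxxx). Returns the code point, or 0xFFFD if the
 * bytes are not a valid two-byte sequence. Longer sequences are
 * out of scope for this sketch. */
unsigned long
utf8_2byte_codepoint(const unsigned char *s)
{
	unsigned long cp;

	if((s[0] & 0xE0) != 0xC0 || (s[1] & 0xC0) != 0x80)
		return 0xFFFD;
	cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
	if(cp < 0x80)
		return 0xFFFD;	/* overlong encoding */
	return cp;
}
```

Given the bytes 0xC2 0xB0, the UTF-8 decoder yields the single code point 0xB0, while the Latin-1 reading yields 0xC2 for the first byte and 0xB0 for the second.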



* Re: [9front] mothra not respecting charset=utf-8
  2014-02-16 10:47 mothra not respecting charset=utf-8 Ethan Grammatikidis
@ 2014-02-16 11:25 ` Ethan Grammatikidis
  2014-02-16 14:46   ` Nick Owens
  2014-02-16 18:38 ` cinap_lenrek
  1 sibling, 1 reply; 5+ messages in thread
From: Ethan Grammatikidis @ 2014-02-16 11:25 UTC (permalink / raw)
  To: 9front

mischief reports (and i've confirmed) the following patch works. i'm
almost scared to ask why.

diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
--- a/sys/src/cmd/mothra/rdhtml.c       Sat Feb 15 17:18:58 2014 -0500
+++ b/sys/src/cmd/mothra/rdhtml.c       Sat Feb 15 03:13:05 2014 -0800
@@ -166,7 +166,7 @@
 		g->hbufp=g->hbuf;
 		g->ehbuf=g->hbuf+n;
 	}
-	c=*g->hbufp++&255;
+	c=*g->hbufp++;
 	if(c=='\n') g->lineno++;
 	return c;
 }
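
What the one-line change does at the C level can be shown in isolation. The sketch below assumes the buffer holds signed chars (plain char is signed under the Plan 9 compilers); the function names are hypothetical. It only illustrates the arithmetic and does not establish why the page then renders differently in mothra.

```c
/* Both functions take a signed char explicitly, so the result does
 * not depend on whether plain char is signed on the host compiler. */

/* With the mask: the caller sees the byte value, 0..255. */
int
byte_masked(signed char b)
{
	return b & 255;
}

/* Without the mask: the char is sign-extended to int, so any byte
 * >= 0x80 reaches the caller as a negative number. */
int
byte_unmasked(signed char b)
{
	return b;
}
```

For the byte 0xB0 (stored as -80 in a signed char), the masked version returns 176 and the unmasked version returns -80. ASCII bytes such as '\n' are the same either way.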




* Re: [9front] mothra not respecting charset=utf-8
  2014-02-16 11:25 ` [9front] " Ethan Grammatikidis
@ 2014-02-16 14:46   ` Nick Owens
  0 siblings, 0 replies; 5+ messages in thread
From: Nick Owens @ 2014-02-16 14:46 UTC (permalink / raw)
  To: 9front

[-- Attachment #1: Type: text/plain, Size: 1402 bytes --]

On Sun, Feb 16, 2014 at 11:25:16AM +0000, Ethan Grammatikidis wrote:
> mischief reports (and i've confirmed) the following patch works. i'm
> almost scared to ask why.
> 
> diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
> --- a/sys/src/cmd/mothra/rdhtml.c       Sat Feb 15 17:18:58 2014 -0500
> +++ b/sys/src/cmd/mothra/rdhtml.c       Sat Feb 15 03:13:05 2014 -0800
> @@ -166,7 +166,7 @@
>  		g->hbufp=g->hbuf;
>  		g->ehbuf=g->hbuf+n;
>  	}
> -	c=*g->hbufp++&255;
> +	c=*g->hbufp++;
>  	if(c=='\n') g->lineno++;
>  	return c;
>  }
> 

i'm not sure why i said that was a good idea.
maybe this one isn't either.

here we actually read full runes from the fd.

this seems to fix visiting
https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
in mothra.


diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
--- a/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 17:18:58 2014 -0500
+++ b/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 04:10:34 2014 -0800
@@ -154,6 +154,7 @@
 	int n, c;
 	char err[1024];
 	if(g->hbufp==g->ehbuf){
+doread:
 		n=read(g->hfd, g->hbuf, NHBUF);
 		if(n<=0){
 			if(n<0){
@@ -166,7 +167,11 @@
 		g->hbufp=g->hbuf;
 		g->ehbuf=g->hbuf+n;
 	}
-	c=*g->hbufp++&255;
+	if(!fullrune(g->hbufp, g->ehbuf - g->hbufp)) {
+		goto doread;
+	}
+
+	g->hbufp += chartorune((Rune*)&c, g->hbufp);
 	if(c=='\n') g->lineno++;
 	return c;
 }
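
One thing to watch in the patch above: as far as the diff shows, when fullrune() fails the goto re-reads into the start of hbuf, so the leading bytes of a rune split across two reads are overwritten. A common way around that is to move the unread tail to the front of the buffer and read in after it. The sketch below uses portable C; utf8_full() and keep_tail() are hypothetical stand-ins (utf8_full() is a simplified approximation of what fullrune() checks) and are not code from mothra.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fullrune(): reports whether the first n
 * bytes of s hold a complete UTF-8 sequence, judged by the lead byte
 * only. A malformed lead byte counts as complete so that the caller
 * can consume it and move on. */
int
utf8_full(const unsigned char *s, size_t n)
{
	size_t need;

	if(n == 0)
		return 0;
	if(s[0] < 0x80)
		need = 1;
	else if((s[0] & 0xE0) == 0xC0)
		need = 2;
	else if((s[0] & 0xF0) == 0xE0)
		need = 3;
	else if((s[0] & 0xF8) == 0xF0)
		need = 4;
	else
		need = 1;	/* malformed lead byte */
	return n >= need;
}

/* Move the unread bytes buf[pos..end) to the start of buf and return
 * how many were kept. The next read should then store its data at
 * buf + (return value) rather than at buf. */
size_t
keep_tail(unsigned char *buf, size_t pos, size_t end)
{
	size_t left = end - pos;

	memmove(buf, buf + pos, left);
	return left;
}
```

A caller would check utf8_full() on the unread bytes, and on failure call keep_tail(), read at most (buffer size minus kept bytes) into the space after the tail, and try again.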




* Re: [9front] mothra not respecting charset=utf-8
  2014-02-16 10:47 mothra not respecting charset=utf-8 Ethan Grammatikidis
  2014-02-16 11:25 ` [9front] " Ethan Grammatikidis
@ 2014-02-16 18:38 ` cinap_lenrek
  2014-02-18 22:32   ` Ethan Grammatikidis
  1 sibling, 1 reply; 5+ messages in thread
From: cinap_lenrek @ 2014-02-16 18:38 UTC (permalink / raw)
  To: 9front

pushed fix. the problem is that pl_readc() returns bytes, not full runes.
the byte to rune translation was done in pl_nextc() but the plaintext handler
doesn't use pl_nextc() as that does additional html tokenization.

http://code.google.com/p/plan9front/source/detail?r=fa04f7734453fb580edf6af61b45a77fc8f0ee03
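
The layering described above (a byte-level reader with rune assembly done one level up) can be sketched in portable C. This is not the committed change: Src, src_readc() and src_readrune() are hypothetical names, with src_readc() playing the role of a byte reader such as pl_readc(). A plain-text path would call the rune-level function directly and skip any tokenizer.

```c
#include <stddef.h>

enum { SRC_EOF = -1, RUNE_ERROR = 0xFFFD };

/* Hypothetical byte source: an in-memory range standing in for a
 * buffered file descriptor. */
typedef struct Src Src;
struct Src {
	const unsigned char *p;	/* next unread byte */
	const unsigned char *e;	/* one past the last byte */
};

/* Byte-level reader: one byte (0..255) per call, SRC_EOF at the end. */
int
src_readc(Src *s)
{
	if(s->p == s->e)
		return SRC_EOF;
	return *s->p++;
}

/* Rune-level reader layered on src_readc(): assembles one UTF-8
 * sequence into a code point. Overlong forms and surrogates are not
 * rejected in this sketch. */
long
src_readrune(Src *s)
{
	int c, n, i;
	long r;

	c = src_readc(s);
	if(c == SRC_EOF)
		return SRC_EOF;
	if(c < 0x80)
		return c;
	if((c & 0xE0) == 0xC0){
		n = 1;
		r = c & 0x1F;
	}else if((c & 0xF0) == 0xE0){
		n = 2;
		r = c & 0x0F;
	}else if((c & 0xF8) == 0xF0){
		n = 3;
		r = c & 0x07;
	}else
		return RUNE_ERROR;	/* stray continuation or invalid lead byte */
	for(i = 0; i < n; i++){
		/* peek first, so a non-continuation byte is left unread */
		if(s->p == s->e || (*s->p & 0xC0) != 0x80)
			return RUNE_ERROR;	/* truncated or malformed sequence */
		r = (r << 6) | (src_readc(s) & 0x3F);
	}
	return r;
}
```

Because the rune reader pulls its continuation bytes through the byte reader, a real implementation whose byte reader refills its buffer on demand would handle runes that straddle a buffer boundary without extra bookkeeping.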

--
cinap



* Re: [9front] mothra not respecting charset=utf-8
  2014-02-16 18:38 ` cinap_lenrek
@ 2014-02-18 22:32   ` Ethan Grammatikidis
  0 siblings, 0 replies; 5+ messages in thread
From: Ethan Grammatikidis @ 2014-02-18 22:32 UTC (permalink / raw)
  To: 9front

On Sun, Feb 16, 2014, at 06:38 PM, cinap_lenrek@felloff.net wrote:
> pushed fix. the problem is that pl_readc() returns bytes, not full runes.
> the byte to rune translation was done in pl_nextc() but the plaintext
> handler
> doesn't use pl_nextc() as that does additional html tokenization.
> 
> http://code.google.com/p/plan9front/source/detail?r=fa04f7734453fb580edf6af61b45a77fc8f0ee03
> 
> --
> cinap

ah, thank you.


