* mothra not respecting charset=utf-8
@ 2014-02-16 10:47 Ethan Grammatikidis
2014-02-16 11:25 ` [9front] " Ethan Grammatikidis
2014-02-16 18:38 ` cinap_lenrek
0 siblings, 2 replies; 5+ messages in thread
From: Ethan Grammatikidis @ 2014-02-16 10:47 UTC (permalink / raw)
To: 9front
i just ran into a little problem because http's default encoding is
iso-8859-1. i modified rc-httpd to correct it like this:
	type = `{echo $full_path | awk -f $libdir/quickfile.awk}
	if(~ $"type text/plain) {
		type = 'text/plain; charset=utf-8'
	}
in firefox this worked fine, but mothra still loads the file as if it
were iso-8859-1. here's the file; the problem characters are the degree
signs near the end of several lines:
http://ethan.uk.to/static/awk-stew/sensors_compact
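
To make the symptom concrete: the degree sign is U+00B0, which UTF-8 encodes as the two bytes C2 B0. A reader that assumes iso-8859-1 shows each of those bytes as its own character. The sketch below decodes the same bytes both ways. The function names are made up for illustration and are not taken from mothra or rc-httpd, and the UTF-8 decoder is deliberately minimal (sequences of up to 3 bytes, no overlong check).

```c
#include <stddef.h>

/* Treat every byte as one ISO-8859-1 character: code point == byte value. */
size_t decode_latin1(const unsigned char *s, size_t n, unsigned long *out)
{
	size_t i;

	for(i = 0; i < n; i++)
		out[i] = s[i];
	return n;
}

/* Minimal UTF-8 decoder (1- to 3-byte sequences, no overlong check).
   A malformed or truncated sequence yields U+FFFD, one byte at a time. */
size_t decode_utf8(const unsigned char *s, size_t n, unsigned long *out)
{
	size_t i = 0, k = 0;

	while(i < n){
		unsigned char b = s[i];

		if(b < 0x80){
			out[k++] = b;
			i += 1;
		}else if((b & 0xE0) == 0xC0 && i+1 < n
		      && (s[i+1] & 0xC0) == 0x80){
			out[k++] = ((unsigned long)(b & 0x1F) << 6)
				| (s[i+1] & 0x3F);
			i += 2;
		}else if((b & 0xF0) == 0xE0 && i+2 < n
		      && (s[i+1] & 0xC0) == 0x80
		      && (s[i+2] & 0xC0) == 0x80){
			out[k++] = ((unsigned long)(b & 0x0F) << 12)
				| ((unsigned long)(s[i+1] & 0x3F) << 6)
				| (s[i+2] & 0x3F);
			i += 3;
		}else{
			out[k++] = 0xFFFD;
			i += 1;
		}
	}
	return k;
}
```

Fed the bytes of "42°C" in UTF-8, decode_latin1 produces five characters (with U+00C2 and U+00B0 where the degree sign should be), while decode_utf8 produces four.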
* Re: [9front] mothra not respecting charset=utf-8
From: Ethan Grammatikidis @ 2014-02-16 11:25 UTC (permalink / raw)
To: 9front
mischief reports (and i've confirmed) the following patch works. i'm
almost scared to ask why.
diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
--- a/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 17:18:58 2014 -0500
+++ b/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 03:13:05 2014 -0800
@@ -166,7 +166,7 @@
 		g->hbufp=g->hbuf;
 		g->ehbuf=g->hbuf+n;
 	}
-	c=*g->hbufp++&255;
+	c=*g->hbufp++;
 	if(c=='\n') g->lineno++;
 	return c;
 }
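
For reference, here is what the `&255` in the removed line does to a byte at or above 0x80. This is only a sketch of the C semantics, assuming the buffer is a plain char array and that char is signed on the build in question. It does not by itself explain why dropping the mask appeared to help. The function names are hypothetical, and signed char is spelled out so the result does not depend on the compiler's default.

```c
/* One buffer byte as an int, with the mask: always in the range 0..255. */
int byte_masked(const signed char *p)
{
	return *p & 255;
}

/* The same byte without the mask: sign-extended, so any byte at or
   above 0x80 comes out negative. */
int byte_unmasked(const signed char *p)
{
	return *p;
}
```

For the UTF-8 lead byte 0xC2, byte_masked gives 194 and byte_unmasked gives -62. ASCII bytes come out the same either way.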
* Re: [9front] mothra not respecting charset=utf-8
From: Nick Owens @ 2014-02-16 14:46 UTC (permalink / raw)
To: 9front
On Sun, Feb 16, 2014 at 11:25:16AM +0000, Ethan Grammatikidis wrote:
> mischief reports (and i've confirmed) the following patch works. i'm
> almost scared to ask why.
>
> diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
> --- a/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 17:18:58 2014 -0500
> +++ b/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 03:13:05 2014 -0800
> @@ -166,7 +166,7 @@
>  		g->hbufp=g->hbuf;
>  		g->ehbuf=g->hbuf+n;
>  	}
> -	c=*g->hbufp++&255;
> +	c=*g->hbufp++;
>  	if(c=='\n') g->lineno++;
>  	return c;
>  }
>
i'm not sure why i said that was a good idea.
maybe this one isn't either.
here we actually read full runes from the fd.
this seems to fix visiting
https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
in mothra.
diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
--- a/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 17:18:58 2014 -0500
+++ b/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 04:10:34 2014 -0800
@@ -154,6 +154,7 @@
 	int n, c;
 	char err[1024];
 	if(g->hbufp==g->ehbuf){
+doread:
 		n=read(g->hfd, g->hbuf, NHBUF);
 		if(n<=0){
 			if(n<0){
@@ -166,7 +167,11 @@
 		g->hbufp=g->hbuf;
 		g->ehbuf=g->hbuf+n;
 	}
-	c=*g->hbufp++&255;
+	if(!fullrune(g->hbufp, g->ehbuf - g->hbufp)) {
+		goto doread;
+	}
+
+	g->hbufp += chartorune((Rune*)&c, g->hbufp);
 	if(c=='\n') g->lineno++;
 	return c;
 }
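
One thing to watch with a refill-until-full-rune loop like the one in this patch: as far as the diff shows, the read at doread lands at the start of hbuf and resets hbufp, so any leading bytes of a partial sequence still sitting at the end of the buffer would be overwritten. Below is a portable sketch of the same idea that slides the leftover bytes to the front before reading more. It is not the mothra code. Src, Rd, srcread, seqlen, rdinit and nextrune are all hypothetical names, and the lead-byte check stands in for what fullrune() does in Plan 9's libc.

```c
#include <stddef.h>
#include <string.h>

enum { NBUF = 8 };	/* tiny on purpose, so sequences straddle refills */

typedef struct Src Src;
struct Src {			/* stands in for the file descriptor */
	const unsigned char *data;
	size_t len, off, chunk;	/* chunk: max bytes per simulated read */
};

typedef struct Rd Rd;
struct Rd {
	Src *src;
	unsigned char buf[NBUF];
	unsigned char *p, *e;	/* unread bytes are [p, e) */
};

void rdinit(Rd *r, Src *s)
{
	r->src = s;
	r->p = r->e = r->buf;
}

/* Simulated read(): copies at most chunk bytes, returns 0 at end of input. */
size_t srcread(Src *s, unsigned char *dst, size_t n)
{
	size_t left = s->len - s->off;

	if(n > s->chunk) n = s->chunk;
	if(n > left) n = left;
	memcpy(dst, s->data + s->off, n);
	s->off += n;
	return n;
}

/* Sequence length implied by a lead byte; 1 for ASCII and invalid leads. */
size_t seqlen(unsigned char b)
{
	if((b & 0xE0) == 0xC0) return 2;
	if((b & 0xF0) == 0xE0) return 3;
	if((b & 0xF8) == 0xF0) return 4;
	return 1;
}

/* Next code point, -1 at end of input, 0xFFFD for malformed input. */
long nextrune(Rd *r)
{
	size_t have, need, n, i;
	long c;

	for(;;){
		have = (size_t)(r->e - r->p);
		need = have > 0 ? seqlen(*r->p) : 1;
		if(have >= need)
			break;
		/* keep the partial sequence: slide it to the front, then append */
		memmove(r->buf, r->p, have);
		r->p = r->buf;
		r->e = r->buf + have;
		n = srcread(r->src, r->e, NBUF - have);
		if(n == 0){
			if(have == 0)
				return -1;
			r->p = r->e;	/* sequence truncated by end of input */
			return 0xFFFD;
		}
		r->e += n;
	}
	if(need == 1){
		c = *r->p++;
		return c < 0x80 ? c : 0xFFFD;
	}
	c = *r->p & (0xFF >> (need + 1));	/* payload bits of the lead byte */
	for(i = 1; i < need; i++){
		if((r->p[i] & 0xC0) != 0x80){
			r->p++;		/* skip the bad lead byte only */
			return 0xFFFD;
		}
		c = (c << 6) | (r->p[i] & 0x3F);
	}
	r->p += need;
	return c;
}
```

With chunk set to 1, every multi-byte sequence is split across reads, which exercises the memmove path. A larger chunk behaves like an ordinary buffered read.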
* Re: [9front] mothra not respecting charset=utf-8
From: cinap_lenrek @ 2014-02-16 18:38 UTC (permalink / raw)
To: 9front
pushed fix. the problem is that pl_readc() returns bytes, not full runes.
the byte to rune translation was done in pl_nextc() but the plaintext handler
doesn't use pl_nextc() as that does additional html tokenization.
http://code.google.com/p/plan9front/source/detail?r=fa04f7734453fb580edf6af61b45a77fc8f0ee03
--
cinap
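
The layering described here can be sketched roughly as follows: a byte-level reader underneath, with the byte-to-rune step as its own function, so that a consumer which skips the HTML tokenizer still gets whole code points. This is an illustration only, not the pushed fix (see the linked commit for that). Getbyte, getrune, plaintext, Mem and memget are hypothetical names, and the decoder does no pushback or overlong checking.

```c
#include <stddef.h>

typedef int (*Getbyte)(void *arg);	/* returns 0..255, or -1 at end of input */

/* Assemble one code point from a byte source. Returns -1 at end of input
   and 0xFFFD for malformed or truncated sequences. There is no pushback:
   a byte that turns out not to be a continuation is consumed. */
long getrune(Getbyte get, void *arg)
{
	int b, more;
	long c;

	b = get(arg);
	if(b < 0)
		return -1;
	if(b < 0x80)
		return b;
	if((b & 0xE0) == 0xC0){
		c = b & 0x1F;
		more = 1;
	}else if((b & 0xF0) == 0xE0){
		c = b & 0x0F;
		more = 2;
	}else if((b & 0xF8) == 0xF0){
		c = b & 0x07;
		more = 3;
	}else
		return 0xFFFD;	/* stray continuation or invalid lead byte */
	while(more-- > 0){
		b = get(arg);
		if(b < 0 || (b & 0xC0) != 0x80)
			return 0xFFFD;
		c = (c << 6) | (b & 0x3F);
	}
	return c;
}

/* Hypothetical plain-text consumer: no tokenizing, but still
   rune-at-a-time. Stores up to cap code points, returns the count. */
size_t plaintext(Getbyte get, void *arg, long *out, size_t cap)
{
	size_t n = 0;
	long c;

	while(n < cap && (c = getrune(get, arg)) >= 0)
		out[n++] = c;
	return n;
}

/* In-memory byte source for trying it out. */
typedef struct Mem Mem;
struct Mem {
	const unsigned char *s;
	size_t n, i;
};

int memget(void *arg)
{
	Mem *m = arg;

	return m->i < m->n ? m->s[m->i++] : -1;
}
```

An HTML tokenizer would call the same getrune and then do its own tag and entity handling on top, which is why keeping the decode step separate lets both paths share it.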
* Re: [9front] mothra not respecting charset=utf-8
From: Ethan Grammatikidis @ 2014-02-18 22:32 UTC (permalink / raw)
To: 9front
On Sun, Feb 16, 2014, at 06:38 PM, cinap_lenrek@felloff.net wrote:
> pushed fix. the problem is that pl_readc() returns bytes, not full runes.
> the byte to rune translation was done in pl_nextc() but the plaintext handler
> doesn't use pl_nextc() as that does additional html tokenization.
>
> http://code.google.com/p/plan9front/source/detail?r=fa04f7734453fb580edf6af61b45a77fc8f0ee03
>
> --
> cinap
ah, thank you.