* mothra not respecting charset=utf-8
From: Ethan Grammatikidis @ 2014-02-16 10:47 UTC
To: 9front

i just ran into a little problem because http's default encoding is
iso-8859-1. i modified rc-httpd to correct it like this:

	type = `{echo $full_path | awk -f $libdir/quickfile.awk}
	if(~ $"type text/plain) {
		type = 'text/plain; charset=utf-8'
	}

in firefox this worked fine, but mothra still loads the file as if it
were iso-8859-1. here's the file; the problem characters are the degree
signs near the end of several lines:

http://ethan.uk.to/static/awk-stew/sensors_compact

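The degree sign shows the problem concretely: ° is U+00B0, which UTF-8 encodes as the two bytes 0xC2 0xB0. A client that decodes the body as ISO-8859-1 maps each byte to the code point of the same value, so it sees two characters, U+00C2 and U+00B0 ("Â°"). Below is a minimal C sketch of that misreading. It re-encodes the Latin-1 interpretation back to UTF-8 so the result can be displayed. `latin1_to_utf8` is a hypothetical helper for illustration, not part of mothra or rc-httpd:

```c
#include <assert.h>
#include <stddef.h>

/* Re-encode bytes that were (wrongly) decoded as ISO-8859-1 into UTF-8.
 * Latin-1 maps each byte to the code point of the same value, so every
 * byte >= 0x80 becomes its own two-byte UTF-8 character.
 * `out` must have room for 2*n bytes. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
	size_t i, o = 0;

	assert(in != NULL && out != NULL);
	for(i = 0; i < n; i++){
		unsigned char b = in[i];

		if(b < 0x80)
			out[o++] = b;                    /* ASCII is unchanged */
		else {
			out[o++] = 0xC0 | (b >> 6);      /* lead byte: top 2 bits */
			out[o++] = 0x80 | (b & 0x3F);    /* continuation: low 6 bits */
		}
	}
	return o;
}
```

Given the bytes C2 B0 43 ("°C"), it produces C3 82 C2 B0 43, which a UTF-8 display shows as "Â°C". This is the usual look of this kind of mis-decoding.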
* Re: [9front] mothra not respecting charset=utf-8
From: Ethan Grammatikidis @ 2014-02-16 11:25 UTC
To: 9front

mischief reports (and i've confirmed) the following patch works. i'm
almost scared to ask why.

diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
--- a/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 17:18:58 2014 -0500
+++ b/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 03:13:05 2014 -0800
@@ -166,7 +166,7 @@
 		g->hbufp=g->hbuf;
 		g->ehbuf=g->hbuf+n;
 	}
-	c=*g->hbufp++&255;
+	c=*g->hbufp++;
 	if(c=='\n') g->lineno++;
 	return c;
 }

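The line this patch touches, `c=*g->hbufp++&255;`, masks the next buffered byte to the range 0..255. Below is a sketch of the two variants in portable C. It assumes the buffer holds signed chars. Whether plain `char` is signed is implementation-defined, so the sketch spells out `signed char`. The function names are hypothetical, not mothra's:

```c
#include <assert.h>

/* Original line: `c=*g->hbufp++&255;`.  The mask keeps the low 8 bits, so a
 * byte >= 0x80 stored in a signed char comes back as 128..255.  On
 * two's-complement machines this equals converting to unsigned char. */
static int next_byte_masked(const signed char **pp)
{
	int raw = *(*pp)++;
	int masked = raw & 255;

	assert(masked == (unsigned char)raw);
	return masked;
}

/* Patched line: `c=*g->hbufp++;`.  A signed char holding a byte >= 0x80
 * promotes to a negative int instead. */
static int next_byte_unmasked(const signed char **pp)
{
	return *(*pp)++;
}
```

With or without the mask, each call still yields one byte. The degree sign (0xC2 0xB0) arrives as two values: 194 and 176 masked, or -62 and -80 unmasked on a two's-complement machine. Neither line decodes UTF-8 by itself.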
* Re: [9front] mothra not respecting charset=utf-8
From: Nick Owens @ 2014-02-16 14:46 UTC
To: 9front

On Sun, Feb 16, 2014 at 11:25:16AM +0000, Ethan Grammatikidis wrote:
> mischief reports (and i've confirmed) the following patch works. i'm
> almost scared to ask why.
>
> diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
> --- a/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 17:18:58 2014 -0500
> +++ b/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 03:13:05 2014 -0800
> @@ -166,7 +166,7 @@
>  		g->hbufp=g->hbuf;
>  		g->ehbuf=g->hbuf+n;
>  	}
> -	c=*g->hbufp++&255;
> +	c=*g->hbufp++;
>  	if(c=='\n') g->lineno++;
>  	return c;
>  }

i'm not sure why i said that was a good idea. maybe this one isn't
either. here we actually read full runes from the fd. this seems to fix
visiting https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt in
mothra.

diff -r 709e18f21cad sys/src/cmd/mothra/rdhtml.c
--- a/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 17:18:58 2014 -0500
+++ b/sys/src/cmd/mothra/rdhtml.c	Sat Feb 15 04:10:34 2014 -0800
@@ -154,6 +154,7 @@
 	int n, c;
 	char err[1024];
 	if(g->hbufp==g->ehbuf){
+doread:
 		n=read(g->hfd, g->hbuf, NHBUF);
 		if(n<=0){
 			if(n<0){
@@ -166,7 +167,11 @@
 		g->hbufp=g->hbuf;
 		g->ehbuf=g->hbuf+n;
 	}
-	c=*g->hbufp++&255;
+	if(!fullrune(g->hbufp, g->ehbuf - g->hbufp)) {
+		goto doread;
+	}
+
+	g->hbufp += chartorune((Rune*)&c, g->hbufp);
 	if(c=='\n') g->lineno++;
 	return c;
 }

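This version refills the buffer until a complete UTF-8 sequence is available. As posted, though, the `goto doread` path reads into the start of `hbuf` again. The unconsumed bytes of a sequence split across two reads would then be overwritten. Below is a sketch of the same idea in portable C that moves the leftover bytes to the front before refilling. The helpers `full_rune` and `char_to_rune` are simplified stand-ins for Plan 9's `fullrune` and `chartorune`, with no overlong or surrogate checks. `Source` is a hypothetical stand-in for `read` on `hfd`. It returns data in small chunks so that splits actually happen. All of these names are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long Rune32;   /* hypothetical; wide enough for any code point */

/* Length of the UTF-8 sequence introduced by lead byte b.  Bytes that cannot
 * start a sequence count as length 1 and decode to U+FFFD below. */
static int utf8_len(unsigned char b)
{
	if(b < 0x80) return 1;
	if((b & 0xE0) == 0xC0) return 2;
	if((b & 0xF0) == 0xE0) return 3;
	if((b & 0xF8) == 0xF0) return 4;
	return 1;
}

/* Simplified fullrune(): does s[0..n) hold a complete sequence? */
static int full_rune(const unsigned char *s, size_t n)
{
	return n > 0 && (size_t)utf8_len(s[0]) <= n;
}

/* Simplified chartorune(): decode one sequence, return bytes consumed. */
static int char_to_rune(Rune32 *r, const unsigned char *s)
{
	int i, len = utf8_len(s[0]);
	Rune32 v;

	if(len == 1){
		*r = s[0] < 0x80 ? s[0] : 0xFFFD;
		return 1;
	}
	v = s[0] & (0x7F >> len);          /* payload bits of the lead byte */
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80){
			*r = 0xFFFD;               /* malformed: consume only the lead byte */
			return 1;
		}
		v = (v << 6) | (s[i] & 0x3F);
	}
	*r = v;
	return len;
}

/* Hypothetical stand-in for read(2): hands out at most `chunk` bytes per call. */
typedef struct {
	const unsigned char *data;
	size_t len, pos, chunk;
} Source;

static long src_read(Source *s, unsigned char *buf, size_t n)
{
	size_t k = s->len - s->pos;

	if(k > n) k = n;
	if(k > s->chunk) k = s->chunk;
	memcpy(buf, s->data + s->pos, k);
	s->pos += k;
	return (long)k;
}

enum { NBUF = 8 };   /* deliberately tiny so sequences straddle reads */

typedef struct {
	Source *src;
	unsigned char buf[NBUF];
	size_t rp, wp;   /* unread bytes are buf[rp..wp) */
} RuneReader;

/* Next code point, or -1 at end of input. */
static long read_rune(RuneReader *r)
{
	Rune32 c;

	while(!full_rune(r->buf + r->rp, r->wp - r->rp)){
		long n;

		/* keep the partial sequence: move it to the front, then append */
		memmove(r->buf, r->buf + r->rp, r->wp - r->rp);
		r->wp -= r->rp;
		r->rp = 0;
		assert(r->wp < NBUF);
		n = src_read(r->src, r->buf + r->wp, NBUF - r->wp);
		if(n <= 0){
			if(r->wp == 0)
				return -1;        /* clean end of input */
			r->rp = 1;            /* truncated sequence at EOF: skip its lead byte */
			return 0xFFFD;
		}
		r->wp += (size_t)n;
	}
	r->rp += (size_t)char_to_rune(&c, r->buf + r->rp);
	return (long)c;
}
```

With the input `a°b€` delivered two bytes at a time, successive calls return 'a', 0xB0, 'b', 0x20AC, then -1, even though both multibyte sequences straddle a read boundary.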
* Re: [9front] mothra not respecting charset=utf-8
From: cinap_lenrek @ 2014-02-16 18:38 UTC
To: 9front

pushed fix. the problem is that pl_readc() returns bytes, not full runes.
the byte to rune translation was done in pl_nextc() but the plaintext
handler doesn't use pl_nextc() as that does additional html tokenization.

http://code.google.com/p/plan9front/source/detail?r=fa04f7734453fb580edf6af61b45a77fc8f0ee03

--
cinap

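The diagnosis is about layering: a byte-level reader underneath, and a step above it that assembles bytes into runes, which the plaintext path skipped. Below is a minimal sketch of that rune-assembly layer in portable C. `Bytes`, `read_byte` and `next_rune` are hypothetical names standing in for the roles the thread assigns to pl_readc() and pl_nextc(). This is not the fix that was actually pushed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-level reader in the role of pl_readc():
 * returns the next byte (0..255) or -1 at end of input. */
typedef struct {
	const unsigned char *p, *end;
} Bytes;

static int read_byte(Bytes *b)
{
	assert(b->p <= b->end);
	return b->p < b->end ? *b->p++ : -1;
}

/* Rune assembly in the role the thread gives pl_nextc(): pull bytes until one
 * UTF-8 sequence is complete.  Returns the code point, U+FFFD for malformed
 * input, or -1 at end of input.  No overlong or surrogate checks. */
static long next_rune(Bytes *b)
{
	int c = read_byte(b), len, i;
	long r;

	if(c < 0x80)                       /* ASCII, or -1 at end of input */
		return c;
	if((c & 0xE0) == 0xC0)      { len = 2; r = c & 0x1F; }
	else if((c & 0xF0) == 0xE0) { len = 3; r = c & 0x0F; }
	else if((c & 0xF8) == 0xF0) { len = 4; r = c & 0x07; }
	else
		return 0xFFFD;                 /* stray continuation or invalid lead byte */
	for(i = 1; i < len; i++){
		c = read_byte(b);
		if(c < 0 || (c & 0xC0) != 0x80)
			return 0xFFFD;             /* simplification: the bad byte is consumed too */
		r = (r << 6) | (c & 0x3F);
	}
	return r;
}
```

Fed the bytes of "21°C", `read_byte` returns the degree sign as 0xC2 and then 0xB0. `next_rune` returns it as the single code point 0xB0.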
* Re: [9front] mothra not respecting charset=utf-8
From: Ethan Grammatikidis @ 2014-02-18 22:32 UTC
To: 9front

On Sun, Feb 16, 2014, at 06:38 PM, cinap_lenrek@felloff.net wrote:
> pushed fix. the problem is that pl_readc() returns bytes, not full runes.
> the byte to rune translation was done in pl_nextc() but the plaintext
> handler doesn't use pl_nextc() as that does additional html tokenization.
>
> http://code.google.com/p/plan9front/source/detail?r=fa04f7734453fb580edf6af61b45a77fc8f0ee03
>
> --
> cinap

ah, thank you.