Message-ID: <38536e85375e4f6b21154487cd83a1c7@9srv.net>
To: 9fans@9fans.net
Date: Tue, 1 Jul 2008 19:43:17 -0400
From: a@9srv.net
In-Reply-To: <88F134A3A25A8FF1B2F6BE86@F74D39FA044AA309EAEA14B9>
Subject: Re: [9fans] sad commentary

// 5. Oh, and that thing on (4) is the Discordian transliteration of whatever
// was written on the apple. Greek text input to a mail client on Windows.
// Check if you can read it on the "mother of UTF-8." If you do you're
// "almost" there, if you don't...

I was surprised by this, so I actually fired up my XP install. Yes, it looks like you can finally get some non-Latin characters into the thing. Good for them. It looks like the command prompt even *almost* gets it right:

	?α???στ?

Well, three characters out of eight isn't so bad, right? And it's just glyphs, right? Surely the GUI stuff does better. Let's stick it in the search box... ooo, look at that! All characters show up! And the search... looks for "?a???st?". Uh, what? Note the substitution of roughly similar Latin characters. It clearly has some understanding of what the characters are, but has decided to look for something else.

IE and Firefox will let me search for such things properly, but (as with the καλλιστι in your original message) the tops of many of the returned glyphs are cut off. That is to say, the Unicode is *almost* there.

Conversely, in Plan 9 the following strings together a number of tools that were certainly not designed for the task, but it works just fine:

	{echo /καλλιστι ; echo '-/^$/,+/^$/'} | sam -d `{grep -l [Α-ω] /mail/fs/mbox/*/body} >[2] /dev/null | sed -n '2,$p'

I'm curious how you'd do something similar elsewhere.
You really just haven't bothered, have you?

Anthony
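A note on the command prompt's "?α???στ?" for "καλλιστι". This assumes, without having checked, that the XP console was using the US OEM code page 437. That code page happens to contain a few Greek letters, including α, σ and τ, because they double as math symbols. It does not contain κ, λ or ι, so every one of those becomes "?". The Python sketch below only illustrates that round trip through the cp437 codec. It makes no claim about what the console does internally.

```python
# Sketch: reproduce "?α???στ?" by round-tripping "καλλιστι" through cp437.
# Assumption (not verified): the XP console used OEM code page 437.

word = "καλλιστι"

# Characters cp437 cannot represent are replaced with b"?" on encode.
console_bytes = word.encode("cp437", errors="replace")

# Decoding back shows what a cp437 console would be able to display.
console_text = console_bytes.decode("cp437")

# Count positions where the original letter survived the round trip.
kept = sum(1 for original, shown in zip(word, console_text) if original == shown)

print(console_text)              # ?α???στ?
print(f"{kept} of {len(word)}")  # 3 of 8
```

Under this assumption, the "three characters out of eight" are the Greek letters that cp437 carries, not a partial rendering of the whole word.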