* Re: [9fans] file heuristics on troff input
@ 2008-07-15 20:07 erik quanstrom
From: erik quanstrom @ 2008-07-15 20:07 UTC
To: 9fans
> From: "roger peppe" <rogpeppe@gma...>
> one thing that has bugged me in the past: upas relies on file -m to
> determine the type of attachments, but file only reads the first block
> of the file, so if you've got a utf-8 file with the first non-ascii character
> beyond the 8192nd byte, you get corrupted mail.
> IMHO for the -m option, file should probably read the whole file,
> but there are probably good reasons for not doing so.
i don't think this is correct.
upas/marshal uses file -m to determine the mime type, e.g.
"text/plain" or "text/html" or whatever. it tests for bucky bits
in the entire body to determine if it is utf-8 or us-ascii.
(cf. upas/marshal/marshal.c:^/body) it uses this information
to emit a charset.
(obviously there's a hole in this: nulls and bad utf can cause
this algorithm to go pear-shaped.)
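A minimal sketch of the whole-body test described above, in portable ANSI C rather than Plan 9 C. It is not the code in upas/marshal/marshal.c; the names guesscharset, Charset and charsetname are hypothetical. Any byte with the high bit set selects utf-8, otherwise us-ascii. It also reproduces the hole: NUL bytes pass as us-ascii, and an invalid sequence such as a lone 0xE9 byte still yields utf-8.

```c
#include <assert.h>
#include <stddef.h>

enum Charset { USASCII, UTF8 };

/* Names as they would appear in a Content-Type charset= parameter. */
static const char *const charsetname[] = { "us-ascii", "utf-8" };

/*
 * Hypothetical helper, not marshal.c: scan the *whole* body for
 * "bucky bits", i.e. bytes with the high bit set.
 */
enum Charset
guesscharset(const unsigned char *body, size_t n)
{
	size_t i;

	assert(body != NULL || n == 0);
	for(i = 0; i < n; i++)
		if(body[i] & 0x80)
			return UTF8;	/* any 8-bit byte: assume utf-8, valid or not */
	return USASCII;	/* 7-bit clean; NUL bytes are not rejected */
}
```

For example, guesscharset((const unsigned char *)"a\0b", 3) returns USASCII despite the NUL. Catching bad utf would take a real UTF-8 validity check rather than a high-bit test.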
upas/nedmail also uses file -m. but since upas/fs does
character set translation, i don't see how using file could
be to blame for corrupt email.
- erik
* [9fans] file heuristics on troff input
@ 2008-07-11 14:44 Pietro Gagliardi
2008-07-11 16:11 ` hiro
` (3 more replies)
From: Pietro Gagliardi @ 2008-07-11 14:44 UTC
To: Fans of the OS Plan 9 from Bell Labs
Just a bit of humor:
COMPUTER ME
% cd troff
% file *
advp9prog: directory yes (old attempt at plan 9 programmer's guide)
algoawk: directory yes (awk book)
bentley.ms: troff -ms input yes (Bentley paper)
bentley2.ms: Ascii text wtf? (Bentley paper retry)
cod.ms: troff input why not -ms? (paper on calculator program I'm writing)
cod.ms.part: Ascii file doesn't understand .ig and pic? (part of cod.ms that doesn't belong yet)
forloop.ms: Ascii this is pic input (flowchart on how for loops work in JavaScript)
jstut.ms: HTML file very wrong (JavaScript tutorial)
luxidejavu.ms: troff input fine (ripped from p9port, -ms .FP with DejaVu and Luxi Sans)
programming.ms: c program no one would dare put a program that big into one file, stupid (new attempt at programming tutorial, you'll see it when it's done)
school: directory good (stuff for school)
% file advp9prog/*
adv9prog/ch1: Ascii wtf?
adv9prog/ch2: c program not again
adv9prog/dates: short Ascii all right (I date everything for record keeping purposes)
adv9prog/mkfile: short Ascii don't you know about mk?
% file algoawk/*
algoawk/book_macros: Ascii ?
algoawk/ch1: Ascii text oh my gawd, something different!
algoawk/colophon: Ascii ...
algoawk/dates: short Ascii good
algoawk/intro: Ascii no
algoawk/mkfile: Ascii still no mk...
algoawk/show: rc executable file right
% file school/*
...
To save you the trouble: they're either directories or -ms input, but file reports them only as "directory" or "Ascii."
A full report of my troff directory and subdirectories is in /n/sources/contrib/pietro/file.funny.
So that's only one file that is absolutely correct. It turns out that
the problem is that file doesn't recognize the
.FP font
line as a troff -ms macro line. For the books, file doesn't read enough lines
to see that there are more .PPs than there are #includes. Ah well.
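The counting heuristic described above can be sketched like this: within a fixed prefix of the file, count lines that start with an -ms request and lines that start with #include, and call it -ms input only if the former win. This is an illustration under assumptions, not file.c's code; the request list, the function name looksms and its limit parameter are hypothetical. With a short limit, a document that opens with #include lines is misclassified even though -ms requests dominate the whole file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of -ms requests to count; not file.c's table. */
static const char *const msreqs[] = {
	".PP", ".LP", ".IP", ".TL", ".NH", ".SH", ".FP",
};

/* Does the line line[0..len) start with prefix? */
static int
startswith(const char *line, size_t len, const char *prefix)
{
	size_t plen = strlen(prefix);

	return len >= plen && memcmp(line, prefix, plen) == 0;
}

/*
 * Sketch: look only at the first limit bytes, count -ms request lines
 * and #include lines, and report -ms input (1) only when the -ms lines
 * outnumber the #include lines.
 */
int
looksms(const char *buf, size_t n, size_t limit)
{
	size_t i, k, nms = 0, ninc = 0;

	assert(buf != NULL || n == 0);
	if(n > limit)
		n = limit;	/* bytes past the prefix are never seen */
	for(i = 0; i < n; ){
		const char *line = buf + i;
		const char *nl = memchr(line, '\n', n - i);
		size_t len = nl != NULL ? (size_t)(nl - line) : n - i;

		for(k = 0; k < sizeof msreqs / sizeof msreqs[0]; k++)
			if(startswith(line, len, msreqs[k])){
				nms++;
				break;
			}
		if(startswith(line, len, "#include"))
			ninc++;
		i += len + 1;	/* skip the line and its newline */
	}
	return nms > ninc;
}
```

Reading further into the file is what lets the -ms requests outvote the #include lines in the example below.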
And if you thought that was funny, look at the example of a file that
actually seeks to more than one line from UNIX in "The UNIX-HATERS
Handbook" (now a free PDF from its authors).
* Re: [9fans] file heuristics on troff input
2008-07-11 14:44 Pietro Gagliardi
@ 2008-07-11 16:11 ` hiro
2008-07-11 17:30 ` Pietro Gagliardi
2008-07-11 16:55 ` C H Forsyth
` (2 subsequent siblings)
From: hiro @ 2008-07-11 16:11 UTC
To: Fans of the OS Plan 9 from Bell Labs
> So that's only one file that is absolutely correct. It turns out that the
> problem is file isn't reading the
>
> .FP font
>
> as a troff -ms macro line. In the books, they don't read enough lines to
> see that there are more .PPs than there are #includes. Ah well.
>
> And if you thought that was funny, look at the example of a file that
> actually seeks to more than one line from UNIX in "The UNIX-HATERS Handbook"
> (now a free PDF from its authors).
So you have shown that you are more intelligent than a computer.
Or are you just bored and trolling?
To be honest, I don't find this very funny at all, even though I wish I could.
* Re: [9fans] file heuristics on troff input
2008-07-11 16:11 ` hiro
@ 2008-07-11 17:30 ` Pietro Gagliardi
From: Pietro Gagliardi @ 2008-07-11 17:30 UTC
To: Fans of the OS Plan 9 from Bell Labs
What I'm pointing out is that file doesn't look hard enough to guess
accurately. Something like
.ds da December 4, 3210
.DA \*(da
.TL
would not be detected as -ms input.
* Re: [9fans] file heuristics on troff input
2008-07-11 14:44 Pietro Gagliardi
2008-07-11 16:11 ` hiro
@ 2008-07-11 16:55 ` C H Forsyth
2008-07-11 17:00 ` Iruata Souza
2008-07-14 16:33 ` Russ Cox
From: C H Forsyth @ 2008-07-11 16:55 UTC
To: 9fans
> bentley.ms: troff -ms input yes (Bentley paper)
doctype might produce better guesses at troff macro packages and preprocessors
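A hypothetical C sketch of the kind of guess doctype makes about preprocessors: scan every line (not just a first block) for the requests that open pic, tbl and eqn input, and build a pipeline from what it finds. The real doctype(1) handles more macro packages and preprocessors than this; the table, the names findpreps and pipeline, and the fixed troff -ms at the end are assumptions made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset: request that opens a block, and its preprocessor. */
static const struct {
	const char *request;
	const char *prog;
} preps[] = {
	{ ".PS", "pic" },	/* listed in conventional pipeline order */
	{ ".TS", "tbl" },
	{ ".EQ", "eqn" },
};
#define NPREP (sizeof preps / sizeof preps[0])

/* Scan every line; bit k of the result is set if preps[k].request appears. */
unsigned
findpreps(const char *buf, size_t n)
{
	unsigned mask = 0;
	size_t i, k;

	assert(buf != NULL || n == 0);
	for(i = 0; i < n; ){
		const char *line = buf + i;
		const char *nl = memchr(line, '\n', n - i);
		size_t len = nl != NULL ? (size_t)(nl - line) : n - i;

		for(k = 0; k < NPREP; k++){
			size_t rlen = strlen(preps[k].request);
			if(len >= rlen && memcmp(line, preps[k].request, rlen) == 0)
				mask |= 1u << k;
		}
		i += len + 1;	/* next line */
	}
	return mask;
}

/* Write e.g. "pic | tbl | troff -ms" into out, truncating if size is small. */
void
pipeline(unsigned mask, char *out, size_t size)
{
	size_t k;

	assert(out != NULL && size > 0);
	out[0] = '\0';
	for(k = 0; k < NPREP; k++)
		if(mask & (1u << k)){
			strncat(out, preps[k].prog, size - strlen(out) - 1);
			strncat(out, " | ", size - strlen(out) - 1);
		}
	strncat(out, "troff -ms", size - strlen(out) - 1);	/* assumes -ms */
}
```

For a file like forloop.ms, which holds a pic flowchart opened by .PS, this would suggest pic | troff -ms instead of plain Ascii.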
* Re: [9fans] file heuristics on troff input
2008-07-11 14:44 Pietro Gagliardi
2008-07-11 16:11 ` hiro
2008-07-11 16:55 ` C H Forsyth
@ 2008-07-11 17:00 ` Iruata Souza
2008-07-14 16:33 ` Russ Cox
From: Iruata Souza @ 2008-07-11 17:00 UTC
To: Fans of the OS Plan 9 from Bell Labs
On Fri, Jul 11, 2008 at 11:44 AM, Pietro Gagliardi <pietro10@mac.com> wrote:
> Just a bit of humor:
don't give me that crap
iru
* Re: [9fans] file heuristics on troff input
2008-07-11 14:44 Pietro Gagliardi
` (2 preceding siblings ...)
2008-07-11 17:00 ` Iruata Souza
@ 2008-07-14 16:33 ` Russ Cox
2008-07-14 17:55 ` roger peppe
From: Russ Cox @ 2008-07-14 16:33 UTC
To: 9fans
file(1):
BUGS
It can make mistakes.
thanks for playing.
russ
* Re: [9fans] file heuristics on troff input
2008-07-14 16:33 ` Russ Cox
@ 2008-07-14 17:55 ` roger peppe
2008-07-14 18:48 ` erik quanstrom
From: roger peppe @ 2008-07-14 17:55 UTC
To: Fans of the OS Plan 9 from Bell Labs
one thing that has bugged me in the past: upas relies on file -m to
determine the type of attachments, but file only reads the first block
of the file, so if you've got a utf-8 file with the first non-ascii character
beyond the 8192nd byte, you get corrupted mail.
IMHO for the -m option, file should probably read the whole file,
but there are probably good reasons for not doing so.
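To illustrate the failure mode described here, a hypothetical check that only inspects a prefix of the data. Blocksize mirrors the 8192 bytes mentioned above, and has8bit is an invented name, not a function in file or upas. A body whose first non-ASCII character sits past the prefix is reported as 7-bit clean; scanning the whole body, which erik says marshal does when choosing the charset, catches it.

```c
#include <assert.h>
#include <stddef.h>

enum { Blocksize = 8192 };	/* the first-block size mentioned above */

/*
 * Hypothetical prefix-only check: return 1 if any of the first
 * limit bytes of buf has the high bit set, else 0.  Bytes past
 * limit are never examined.
 */
int
has8bit(const unsigned char *buf, size_t n, size_t limit)
{
	size_t i;

	assert(buf != NULL || n == 0);
	if(n > limit)
		n = limit;
	for(i = 0; i < n; i++)
		if(buf[i] & 0x80)
			return 1;
	return 0;
}
```

Calling it with limit equal to n turns it into a whole-body scan; the cost is reading the entire attachment, which is presumably among the reasons file stops early.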
* Re: [9fans] file heuristics on troff input
2008-07-14 17:55 ` roger peppe
@ 2008-07-14 18:48 ` erik quanstrom
From: erik quanstrom @ 2008-07-14 18:48 UTC
To: 9fans
> one thing that has bugged me in the past: upas relies on file -m to
> determine the type of attachments, but file only reads the first block
> of the file, so if you've got a utf-8 file with the first non-ascii character
> beyond the 8192nd byte, you get corrupted mail.
>
> IMHO for the -m option, file should probably read the whole file,
> but there are probably good reasons for not doing so.
by upas, i believe you mean marshal and ned. another option these
days might be to default to utf-8 and not us-ascii. are there common
mailers that can't handle ascii when sent as utf-8? i can't think of
any off the top of my head.
- erik