9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] file heuristics on troff input
@ 2008-07-11 14:44 Pietro Gagliardi
  2008-07-11 16:11 ` hiro
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Pietro Gagliardi @ 2008-07-11 14:44 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Just a bit of humor:

	COMPUTER                                  ME
	% cd troff
	% file *
	advp9prog:            directory           yes (old attempt at plan 9 programmer's guide)
	algoawk:              directory           yes (awk book)
	bentley.ms:           troff -ms input     yes (Bentley paper)
	bentley2.ms:          Ascii text          wtf? (Bentley paper retry)
	cod.ms:               troff input         why not -ms? (paper on calculator program I'm writing)
	cod.ms.part:          Ascii               file doesn't understand .ig and pic? (part of cod.ms that doesn't belong yet)
	forloop.ms:           Ascii               this is pic input (flowchart on how for loops work in JavaScript)
	jstut.ms:             HTML file           very wrong (JavaScript tutorial)
	luxidejavu.ms:        troff input         fine (ripped from p9port, -ms .FP with DejaVu and Luxi Sans)
	programming.ms:       c program           no one would dare put a program that big into one file, stupid (new attempt at programming tutorial, you'll see it when it's done)
	school:               directory           good (stuff for school)
	% file advp9prog/*
	advp9prog/ch1:        Ascii               wtf?
	advp9prog/ch2:        c program           not again
	advp9prog/dates:      short Ascii         all right (I date everything for record keeping purposes)
	advp9prog/mkfile:     short Ascii         don't you know about mk?
	% file algoawk/*
	algoawk/book_macros:  Ascii               ?
	algoawk/ch1:          Ascii text          oh my gawd, something different!
	algoawk/colophon:     Ascii               ...
	algoawk/dates:        short Ascii         good
	algoawk/intro:        Ascii               no
	algoawk/mkfile:       Ascii               still no mk...
	algoawk/show:         rc executable file  right
	% file school/*
	...                                       To save you the trouble, they're either directories or -ms input, but showing up as either "directory" or "Ascii."

A full report of my troff directory and subdirectories is in
/n/sources/contrib/pietro/file.funny.

So that's only one file that is absolutely correct. It turns out that
the problem is file isn't reading the

	.FP font

as a troff -ms macro line. In the books, file doesn't read enough lines
to see that there are more .PPs than there are #includes. Ah well.
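
As a rough illustration of that sampling effect, here is a toy sketch in Python. It is not file(1)'s actual code: the function name guess, the 40-byte sample size, and the rule of counting .PP lines against #include lines are all assumptions made up for the example.

```python
def guess(text, sample_bytes=None):
    """Toy classifier: compare troff .PP requests against C #include lines.

    sample_bytes limits how much of the input is examined (None = all of it).
    """
    data = text.encode("utf-8")
    if sample_bytes is not None:
        data = data[:sample_bytes]
    lines = data.decode("utf-8", errors="ignore").splitlines()
    pp = sum(1 for line in lines if line.startswith(".PP"))
    inc = sum(1 for line in lines if line.startswith("#include"))
    if pp == 0 and inc == 0:
        return "Ascii"
    return "troff -ms input" if pp >= inc else "c program"


# Hypothetical document: opens with a code listing, then continues as prose.
doc = "#include <u.h>\n#include <libc.h>\n" + ".PP\nSome text.\n" * 50

print(guess(doc, sample_bytes=40))  # c program
print(guess(doc))                   # troff -ms input
```

With a short sample the two #include lines outnumber the single .PP that fits, so the guess flips once more of the file is counted.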

And if you thought that was funny, look at the example of a file that
actually seeks to more than one line from UNIX in "The UNIX-HATERS
Handbook" (now a free PDF from its authors).





* Re: [9fans] file heuristics on troff input
  2008-07-11 14:44 [9fans] file heuristics on troff input Pietro Gagliardi
@ 2008-07-11 16:11 ` hiro
  2008-07-11 17:30   ` Pietro Gagliardi
  2008-07-11 16:55 ` C H Forsyth
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: hiro @ 2008-07-11 16:11 UTC
  To: Fans of the OS Plan 9 from Bell Labs

>  So that's only one file that is absolutely correct. It turns out that the
> problem is file isn't reading the
>
>         .FP font
>
>  as a troff -ms macro line. In the books, they don't read enough lines to
> see that there are more .PPs than there are #includes. Ah well.
>
>  And if you thought that was funny, look at the example of a file that
> actually seeks to more than one line from UNIX in "The UNIX-HATERS Handbook"
> (now a free PDF from its authors).

So you have shown that you are more intelligent than a computer.
Or are you just bored and trolling?
To be honest, I don't find this very funny at all. Even though I wished I could.




* Re: [9fans] file heuristics on troff input
  2008-07-11 14:44 [9fans] file heuristics on troff input Pietro Gagliardi
  2008-07-11 16:11 ` hiro
@ 2008-07-11 16:55 ` C H Forsyth
  2008-07-11 17:00 ` Iruata Souza
  2008-07-14 16:33 ` Russ Cox
  3 siblings, 0 replies; 9+ messages in thread
From: C H Forsyth @ 2008-07-11 16:55 UTC
  To: 9fans

>	bentley.ms:				troff -ms input	yes (Bentley paper)

doctype might produce better guesses at troff macro packages and preprocessors




* Re: [9fans] file heuristics on troff input
  2008-07-11 14:44 [9fans] file heuristics on troff input Pietro Gagliardi
  2008-07-11 16:11 ` hiro
  2008-07-11 16:55 ` C H Forsyth
@ 2008-07-11 17:00 ` Iruata Souza
  2008-07-14 16:33 ` Russ Cox
  3 siblings, 0 replies; 9+ messages in thread
From: Iruata Souza @ 2008-07-11 17:00 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jul 11, 2008 at 11:44 AM, Pietro Gagliardi <pietro10@mac.com> wrote:
> Just a bit of humor:
>
>        COMPUTER
>    ME
>        % cd troff
>        % file *
>        advp9prog:                              directory
>   yes (old attempt at plan 9 programmer's guide)
>        algoawk:                                directory
>   yes (awk book)
>        bentley.ms:                             troff -ms input yes (Bentley
> paper)
>        bentley2.ms:                            Ascii text              wtf?
> (Bentley paper retry)
>        cod.ms:                                 troff input             why
> not -ms? (paper on calculator program I'm writing)
>        cod.ms.part:                            Ascii                   file
> doesn't understand .ig and pic? (part of cod.ms that doesn't belong yet)
>        forloop.ms:                             Ascii                   this
> is pic input (flowchart on how for loops work in JavaScript)
>        jstut.ms:                                       HTML file
>   very wrong (JavaScript tutorial)
>        luxidejavu.ms:                  troff input             fine (ripped
> from p9port, -ms .FP with DejaVu and Luxi Sans)
>        programming.ms:                 c program               no one would
> dare put a program that big into one file, stupid (new attempt at
> programming tutorial, you'll see it when it's done)
>        school:                                 directory
>   good (stuff for school)
>        % file advp9prog/*
>        adv9prog/ch1:                   Ascii                   wtf?
>        adv9prog/ch2:                   c program               not again
>        adv9prog/dates:                 short Ascii             all right (I
> date evereything for record keeping purposes)
>        adv9prog/mkfile:                        short Ascii             don't
> you know about mk?
>        % file algoawk/*
>        algoawk/book_macros:    Ascii                   ?
>        algoawk/ch1:                    Ascii text              oh my gawd,
> something different!
>        algoawk/colophon:               Ascii                   ...
>        algoawk/dates:                  short Ascii             good
>        algoawk/intro:                  Ascii                   no
>        algoawk/mkfile:                 Ascii                   still no
> mk...
>        algoawk/show:                   rc executable file      right
>        % file school/*
>        ...
>           To save you the trouble, they're either directories or -ms input,
> but either showing "directory" or "Ascii."
>
> A full report of my troff directory and subdirectories is in
> /n/sources/contrib/pietro/file.funny.
>
> So that's only one file that is absolutely correct. It turns out that the
> problem is file isn't reading the
>
>        .FP font
>
> as a troff -ms macro line. In the books, they don't read enough lines to see
> that there are more .PPs than there are #includes. Ah well.
>
> And if you thought that was funny, look at the example of a file that
> actually seeks to more than one line from UNIX in "The UNIX-HATERS Handbook"
> (now a free PDF from its authors).
>
>
>

give me a break

iru




* Re: [9fans] file heuristics on troff input
  2008-07-11 16:11 ` hiro
@ 2008-07-11 17:30   ` Pietro Gagliardi
  0 siblings, 0 replies; 9+ messages in thread
From: Pietro Gagliardi @ 2008-07-11 17:30 UTC
  To: Fans of the OS Plan 9 from Bell Labs

What I'm pointing out is that file doesn't look hard enough to guess
accurately. Something like

	.ds da December 4, 3210
	.DA \*(da
	.TL

would not be detected.
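
A minimal sketch of what looking harder could mean here, assuming a checker that examines the first several request lines instead of only the first one. The name looks_like_ms, the max_requests parameter, and the macro list are hypothetical; none of this is taken from file(1).

```python
# Illustrative subset of -ms macro names; not the list any real tool uses.
MS_MACROS = {"TL", "AU", "AB", "AE", "DA", "ND", "NH", "SH", "PP", "LP", "FP"}


def looks_like_ms(text, max_requests=10):
    """Return True if any of the first max_requests request lines is an -ms macro."""
    # A request line starts with a dot; the two characters after it are its name.
    requests = [line[1:3].strip() for line in text.splitlines() if line.startswith(".")]
    return any(name in MS_MACROS for name in requests[:max_requests])


sample = ".ds da December 4, 3210\n.DA \\*(da\n.TL\n"

print(looks_like_ms(sample, max_requests=1))  # False: only .ds is examined
print(looks_like_ms(sample))                  # True: .DA and .TL are seen
```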

On Jul 11, 2008, at 12:11 PM, hiro wrote:

>> So that's only one file that is absolutely correct. It turns out
>> that the
>> problem is file isn't reading the
>>
>>        .FP font
>>
>> as a troff -ms macro line. In the books, they don't read enough
>> lines to
>> see that there are more .PPs than there are #includes. Ah well.
>>
>> And if you thought that was funny, look at the example of a file that
>> actually seeks to more than one line from UNIX in "The UNIX-HATERS
>> Handbook"
>> (now a free PDF from its authors).
>
> So you have shown that you are more intelligent than a computer.
> Or are you just bored and trolling?
> To be honest, I don't find this very funny at all. Even though I
> wished I could.
>





* Re: [9fans] file heuristics on troff input
  2008-07-11 14:44 [9fans] file heuristics on troff input Pietro Gagliardi
                   ` (2 preceding siblings ...)
  2008-07-11 17:00 ` Iruata Souza
@ 2008-07-14 16:33 ` Russ Cox
  2008-07-14 17:55   ` roger peppe
  3 siblings, 1 reply; 9+ messages in thread
From: Russ Cox @ 2008-07-14 16:33 UTC
  To: 9fans

file(1):

     BUGS
          It can make mistakes.

thanks for playing.
russ




* Re: [9fans] file heuristics on troff input
  2008-07-14 16:33 ` Russ Cox
@ 2008-07-14 17:55   ` roger peppe
  2008-07-14 18:48     ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: roger peppe @ 2008-07-14 17:55 UTC
  To: Fans of the OS Plan 9 from Bell Labs

one thing that has bugged me in the past: upas relies on file -m to
determine the type of attachments, but file only reads the first block
of the file, so if you've got a utf-8 file with the first non-ascii character
beyond the 8192nd byte, you get corrupted mail.

IMHO for the -m option, file should probably read the whole file,
but there are probably good reasons for not doing so.
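
to illustrate the failure mode being described, here is a small python sketch. it is only a model of the scenario, not code from file or upas; charset_of is a hypothetical helper, and BLOCK is set to the 8192 bytes mentioned above.

```python
BLOCK = 8192  # size of the first block, as mentioned in the message


def charset_of(data):
    """Classify a byte string as us-ascii, utf-8, or unknown."""
    if all(b < 0x80 for b in data):
        return "us-ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


# A body whose first non-ASCII character sits just past the first block.
body = b"a" * BLOCK + "\u00e9".encode("utf-8")

print(charset_of(body[:BLOCK]))  # us-ascii
print(charset_of(body))          # utf-8
```

a guess based on the first block alone labels the body us-ascii, while the whole body is utf-8.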




* Re: [9fans] file heuristics on troff input
  2008-07-14 17:55   ` roger peppe
@ 2008-07-14 18:48     ` erik quanstrom
  0 siblings, 0 replies; 9+ messages in thread
From: erik quanstrom @ 2008-07-14 18:48 UTC
  To: 9fans

> one thing that has bugged me in the past: upas relies on file -m to
> determine the type of attachments, but file only reads the first block
> of the file, so if you've got a utf-8 file with the first non-ascii character
> beyond the 8192nd byte, you get corrupted mail.
>
> IMHO for the -m option, file should probably read the whole file,
> but there are probably good reasons for not doing so.

by upas, i believe you mean marshal and ned.  another option these
days might be to default to utf-8 and not us-ascii.  are there common
mailers that can't handle ascii when sent as utf-8?  i can't think of
any off the top of my head.

- erik





* Re: [9fans] file heuristics on troff input
@ 2008-07-15 20:07 erik quanstrom
  0 siblings, 0 replies; 9+ messages in thread
From: erik quanstrom @ 2008-07-15 20:07 UTC
  To: 9fans

> From: "roger peppe" <rogpeppe@gma...>

> one thing that has bugged me in the past: upas relies on file -m to
> determine the type of attachments, but file only reads the first block
> of the file, so if you've got a utf-8 file with the first non-ascii character
> beyond the 8192nd byte, you get corrupted mail.
> IMHO for the -m option, file should probably read the whole file,
> but there are probably good reasons for not doing so.

i don't think this is correct.

upas/marshal uses file -m to determine the mime type, e.g.
"text/plain" or "text/html" or whatever.  it tests for bucky bits
in the entire body to determine if it is utf-8 or us-ascii.
(cf. upas/marshal/marshal.c:^/body)  it uses this information
to emit a charset.

(obviously there's a hole in this: nulls and bad utf can cause
this algorithm to go pear shaped.)
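
a minimal sketch of that kind of whole-body test, assuming the body arrives in chunks. pick_charset is a hypothetical name and this is python for illustration, not the code in upas/marshal/marshal.c.

```python
def pick_charset(chunks):
    """Return 'utf-8' if any byte in any chunk has its high bit set, else 'us-ascii'."""
    for chunk in chunks:
        if any(b & 0x80 for b in chunk):
            return "utf-8"
    return "us-ascii"


print(pick_charset([b"plain", b"text"]))               # us-ascii
print(pick_charset([b"a" * 8192, b"caf\xc3\xa9"]))     # utf-8

# The hole: a high-bit test alone does not validate the bytes.
print(pick_charset([b"\xff\xfe"]))                     # utf-8, although this is not valid UTF-8
print(pick_charset([b"a\x00b"]))                       # us-ascii, although the body contains a NUL
```

the last two calls show why nulls and malformed sequences slip through a test that only looks at the high bit.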

upas/nedmail also uses file -m.  but since upas/fs does
character set translation, i don't see how using file could
be to blame for corrupt email.

- erik




