9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] a question of file and the history of magic
@ 2008-07-07  8:58 Harri Haataja
  2008-07-11 16:21 ` Dan Cross
  0 siblings, 1 reply; 10+ messages in thread
From: Harri Haataja @ 2008-07-07  8:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


----- Original message -----
> There are a lot of things like that.  Do we still need to
> compress man pages on 1TB disk drives? :)

I actually happened to be reading this on a small device that has a compressed system fs. Compressing data here on the "app" side with uninformed algorithms is discouraged to avoid double compression and other assorted losses. I believe you hit the same things with compression on some network protocols.

Touching GB range, this isn't that tiny but well below TB. :)





* Re: [9fans] a question of file and the history of magic
  2008-07-07  8:58 [9fans] a question of file and the history of magic Harri Haataja
@ 2008-07-11 16:21 ` Dan Cross
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Cross @ 2008-07-11 16:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I know!  We need dynamically loadable shared object files and a new
language to describe all the things that file can do, and then a
compiler for that language that generates shared objects that are
dynamically loaded at runtime....

Oh, wait.

But seriously (yes, Virginia, for the humor impaired, that *was* a
joke...) there are benefits to external description files.  Somewhat
obviously, it's the whole "little language" concept that we all know
and love, and we all know that, but the question becomes, for
something like "file", how complex does one make that little language?
At what point does the tradeoff between complexity of the description
language and hand-coded C break in favor of one versus the other?  How
often are we updating things?  And that *is* a legitimate question,
and I think it's the basis of the original question.




* Re: [9fans] a question of file and the history of magic
@ 2008-07-06 23:45 geoff
  0 siblings, 0 replies; 10+ messages in thread
From: geoff @ 2008-07-06 23:45 UTC (permalink / raw)
  To: 9fans

As far as I know, all (l)unix dd commands from at least sixth edition
unix forward accept the options if=, bs= and count=, so portability
across (l)unix systems shouldn't be a problem.  Not all (l)unix dd
commands have some of the newer options, such as iseek= and oseek=,
but the ones I have seen all take the same syntax.

An assumption behind the system v magic file is that almost all file
does is match byte strings or integers, usually at the start of the
file being examined (though that wasn't really true even on unix).
The plan 9 file command also uses more complex algorithms; it's not
all simple table-driven matching.  It understands unicode and utf.  It
may look at histograms of byte or word distributions.  It may look for
programming language keywords.  It understands a.out and elf headers.
It can recognise a tar header and distinguish posix from non-posix
headers.  It can recognise html, rfc-822 mail messages, unix-style
mailboxes, images, compressed files and encrypted ones, and more.  A
quick look suggests that under 15% of the lines in file.c are devoted
to table definitions.
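
To make the "histograms of byte distributions" idea concrete, here is a minimal sketch in C.  It is an illustration only and is not taken from Plan 9's file.c: the function name classify, the enum values and the thresholds are all assumptions chosen for the example.

```c
#include <stddef.h>

/* Result classes for this sketch; the names are hypothetical. */
enum { Empty, Ascii, Highbit, Binary };

/*
 * Classify a buffer from its byte histogram.  Illustrative heuristic
 * only; it is not the algorithm used by Plan 9's file.c.
 */
int
classify(const unsigned char *buf, size_t n)
{
	size_t hist[256] = {0};
	size_t i, ctl, high;

	if(n == 0)
		return Empty;
	for(i = 0; i < n; i++)
		hist[buf[i]]++;

	/* count control bytes other than tab, newline and carriage return, plus DEL */
	ctl = 0;
	for(i = 0; i < 0x20; i++)
		if(i != '\t' && i != '\n' && i != '\r')
			ctl += hist[i];
	ctl += hist[0x7f];

	/* count bytes with the high bit set */
	high = 0;
	for(i = 0x80; i < 0x100; i++)
		high += hist[i];

	if(ctl > 0)
		return Binary;
	if(high > 0)
		return Highbit;	/* maybe UTF-8 or Latin-1 text; needs a further check */
	return Ascii;
}
```

A check like this cannot be expressed as a fixed byte string at a fixed offset, which is the point being made about table-driven matching.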





* Re: [9fans] a question of file and the history of magic
  2008-07-06 22:31 ` Bakul Shah
@ 2008-07-06 22:44   ` Charles Forsyth
  0 siblings, 0 replies; 10+ messages in thread
From: Charles Forsyth @ 2008-07-06 22:44 UTC (permalink / raw)
  To: 9fans

> The main disadvantage of gnu file is performance.

the magic file contains surprisingly many spells,
even excluding muttered incantations.





* Re: [9fans] a question of file and the history of magic
  2008-07-06 21:20 erik quanstrom
  2008-07-06 21:59 ` Brantley Coile
@ 2008-07-06 22:31 ` Bakul Shah
  2008-07-06 22:44   ` Charles Forsyth
  1 sibling, 1 reply; 10+ messages in thread
From: Bakul Shah @ 2008-07-06 22:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, 06 Jul 2008 17:20:12 EDT erik quanstrom <quanstro@quanstro.net>  wrote:
> what is the upside to an external magic file?  as you've shown, you
> can add a file type in 1 line of code.  while the external magic file
> isn't c, i would argue that it's still code.

Yes it is code but the advantage is that the parser language
is factored out and anyone can add knowledge about new file
formats and it is easy to debug and experiment.
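
As a rough sketch of what "factoring out the parser language" involves, the C below parses and applies one rule from a hypothetical, much simplified description format (offset, literal bytes, description).  The format, the Rule struct and the function names are assumptions made for illustration; the real magic file syntax is far richer.

```c
#include <stdio.h>
#include <string.h>

/*
 * One rule from a hypothetical description file:
 *	offset  literal-bytes  description
 * for example "0 FOVb x3f image".
 */
typedef struct Rule Rule;
struct Rule {
	long off;
	char magic[32];
	char desc[64];
};

/* Parse one line into r.  Returns 1 on success, 0 for blank, comment or malformed lines. */
int
parserule(const char *line, Rule *r)
{
	if(line[0] == '#')
		return 0;
	if(sscanf(line, "%ld %31s %63[^\n]", &r->off, r->magic, r->desc) != 3)
		return 0;
	return r->off >= 0;
}

/* Return 1 if rule r matches the n bytes in buf, else 0. */
int
matchrule(const Rule *r, const unsigned char *buf, size_t n)
{
	size_t len = strlen(r->magic);

	if((size_t)r->off > n || len > n - (size_t)r->off)
		return 0;
	return memcmp(buf + r->off, r->magic, len) == 0;
}
```

Even this toy version needs its own error handling and size limits, which is the cost side of the trade being discussed; the benefit is that a new format becomes a one-line text edit with no recompilation.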

The main disadvantage of gnu file is performance.  As an
example, on about 6000 files totalling 200MB, gnu file takes
2s user, 1s system and 30s real time.  Compared to that p9p
file takes 1.65s user, 0.25s system and 1.9s real time.
Note: any cache effects have been accounted for by looking at
only the best 3 runs of each test.  As per csh, there were
no page faults and no disk io.

The magic file should really be compiled and linked w/
file(1) -- if that was done right, the rest of file(1) code
would be pretty trivial.  On the other hand file is usually
not a performance bottleneck.  On the gripping hand there are
a lot of similarities between cracking file formats and
packet formats, so maybe there is value in factoring all that
out and sticking it in a library routine.




* Re: [9fans] a question of file and the history of magic
  2008-07-06 21:20 erik quanstrom
@ 2008-07-06 21:59 ` Brantley Coile
  2008-07-06 22:31 ` Bakul Shah
  1 sibling, 0 replies; 10+ messages in thread
From: Brantley Coile @ 2008-07-06 21:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I remember the day I first saw a file magic file.  I welcomed it because
for the first time I didn't have access to the source code.  Those were
the days when you had to have $45k to get the source.  A hard thing to
ask for.  Today a separate magic file is just a leftover vestige of the
past.  There are a lot of things like that.  Do we still need to
compress man pages on 1TB disk drives? :)

erik quanstrom wrote:
>>In a sense, the question is more about the historical change and/or
>>adoption of a new file command for Plan 9 that doesn't use a magic
>>file for references.  Why opt out of a magic file other than the
>>obvious performance hit of scanning it each run?  Is it worth
>>repeating the old forms that used magic, or has anyone in the Plan 9
>>community already improved upon the idea and introduced a new, more
>>adaptable tool?
>
>
> what is the upside to an external magic file?  as you've shown, you
> can add a file type in 1 line of code.  while the external magic file
> isn't c, i would argue that it's still code.
>
> the disadvantage is that you need to write a parser for yet another
> file format.  it turns out that linux file's maintainers felt that a text file
> wasn't good enough so they implemented a magic compiler.  i really
> don't understand the logic behind the compiler, since it would seem
> to trade reduced cpu cycles for increased i/o.  that would seem to be
> a terrible trade off these days.
>
> ; wc magic magic.mgc
>   13469   69850  484372 magic
>    1301   17997 1062400 magic.mgc		# compiled version
>
> the source is pretty big, too:
>
> ; wc -l ffile-4.20/src/*.[ch]|grep total
>   9273 total
>
> according to wikipedia (http://en.wikipedia.org/wiki/File_(Unix)),
> system v introduced the external magic file.  i don't think that system v
> was in any way an ancestor of plan 9.  but i don't know anything of
> the history of plan 9 file.
>
> - erik
>




* Re: [9fans] a question of file and the history of magic
@ 2008-07-06 21:20 erik quanstrom
  2008-07-06 21:59 ` Brantley Coile
  2008-07-06 22:31 ` Bakul Shah
  0 siblings, 2 replies; 10+ messages in thread
From: erik quanstrom @ 2008-07-06 21:20 UTC (permalink / raw)
  To: jas, 9fans

> In a sense, the question is more about the historical change and/or
> adoption of a new file command for Plan 9 that doesn't use a magic
> file for references.  Why opt out of a magic file other than the
> obvious performance hit of scanning it each run?  Is it worth
> repeating the old forms that used magic, or has anyone in the Plan 9
> community already improved upon the idea and introduced a new, more
> adaptable tool?

what is the upside to an external magic file?  as you've shown, you
can add a file type in 1 line of code.  while the external magic file
isn't c, i would argue that it's still code.

the disadvantage is that you need to write a parser for yet another
file format.  it turns out that linux file's maintainers felt that a text file
wasn't good enough so they implemented a magic compiler.  i really
don't understand the logic behind the compiler, since it would seem
to trade reduced cpu cycles for increased i/o.  that would seem to be
a terrible trade off these days.

; wc magic magic.mgc
  13469   69850  484372 magic
   1301   17997 1062400 magic.mgc		# compiled version

the source is pretty big, too:

; wc -l ffile-4.20/src/*.[ch]|grep total
  9273 total

according to wikipedia (http://en.wikipedia.org/wiki/File_(Unix)),
system v introduced the external magic file.  i don't think that system v
was in any way an ancestor of plan 9.  but i don't know anything of
the history of plan 9 file.

- erik




* Re: [9fans] a question of file and the history of magic
  2008-07-06 19:30 erik quanstrom
@ 2008-07-06 21:00 ` Jeff Sickel
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Sickel @ 2008-07-06 21:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Jul 6, 2008, at 2:30 PM, erik quanstrom wrote:

>> This addition helped my scripts become a little more streamlined, but
>> of course puts in an additional entry into the source file I need to
>> track.  As file name extensions don't always work across all sorts of
>> systems, many still hamstrung by 8.3, what is the preferred or
>> recommended mechanism for checking file types the Plan 9 way since we
>> no longer have the System V magic?
>
> i'm pretty confused by what you're saying here.  why doesn't file(1)
> work?  are you saying there's something wrong with editing the source
> as opposed to editing a configuration file?


File does work, until I need to check for something not in its
compiled-in magic tables.  So I patch the code, and that works better
than many other options I've tried, while still letting me use rc to
script what I need to get done without the restart cost of
trying to remember what the script does when I need to add to it later
(unlike the brick wall I continually hit when using Perl).

In a sense, the question is more about the historical change and/or
adoption of a new file command for Plan 9 that doesn't use a magic
file for references.  Why opt out of a magic file other than the
obvious performance hit of scanning it each run?  Is it worth
repeating the old forms that used magic, or has anyone in the Plan 9
community already improved upon the idea and introduced a new, more
adaptable tool?

> either way your system is equally non-standard.  in either event,
> submitting a patch and having it accepted is the only way around this.

The beauty of standards is there are so many to choose from -- oft
quoted and most likely misappropriated from somewhere else.

> 	dd -if $infile -bs $nbytes -count 1 | xd

dang, I almost always forget about dd for some reason.  Though in that
case I'd need to pull in Plan 9's version of dd into p9p since the
arguments to dd are different on almost every system I use: Plan 9,
various Linux distros, Solaris, OS X, ...

-jas





* Re: [9fans] a question of file and the history of magic
@ 2008-07-06 19:30 erik quanstrom
  2008-07-06 21:00 ` Jeff Sickel
  0 siblings, 1 reply; 10+ messages in thread
From: erik quanstrom @ 2008-07-06 19:30 UTC (permalink / raw)
  To: jas, 9fans

> This addition helped my scripts become a little more streamlined, but
> of course puts in an additional entry into the source file I need to
> track.  As file name extensions don't always work across all sorts of
> systems, many still hamstrung by 8.3, what is the preferred or
> recommended mechanism for checking file types the Plan 9 way since we no
> longer have the System V magic?

i'm pretty confused by what you're saying here.  why doesn't file(1) work?
are you saying there's something wrong with editing the source as opposed
to editing a configuration file?

either way your system is equally non-standard.  in either event,
submitting a patch and having it accepted is the only way around this.

> In a sense, a modified xd(1) that has an option for a restricted range
> of byte sequences would work.  That would at least provide a fast seek
> into a file that can be pipelined into any other command sequence--no
> need to dump the whole file when you just need the first four
> bytes, but then it just gets to the point of having a magic file.

why would xd need modification?  how about

	dd -if $infile -bs $nbytes -count 1 | xd

there are no restrictions placed by dd on $nbytes.  it could be
4 or 99132 or whatever.  dd's -iseek option similarly can specify
any offset.
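
For anyone who wants the same effect from C rather than a pipeline, a minimal sketch follows.  The names readat and hexline are hypothetical, the offset here is in bytes rather than dd blocks, and the output format only approximates what xd prints.

```c
#include <stdio.h>

/*
 * Read up to n bytes starting at byte offset off of path into buf.
 * Returns the number of bytes read, or -1 on error.
 */
long
readat(const char *path, long off, unsigned char *buf, size_t n)
{
	FILE *f;
	size_t got;

	f = fopen(path, "rb");
	if(f == NULL)
		return -1;
	if(fseek(f, off, SEEK_SET) != 0){
		fclose(f);
		return -1;
	}
	got = fread(buf, 1, n, f);
	fclose(f);
	return (long)got;
}

/* Format n bytes as space-separated hex pairs into out (at least 3*n+1 bytes). */
void
hexline(const unsigned char *buf, size_t n, char *out)
{
	size_t i;

	for(i = 0; i < n; i++)
		out += sprintf(out, i ? " %02x" : "%02x", buf[i]);
	*out = '\0';
}
```

Calling readat(path, 0, buf, 4) followed by hexline(buf, 4, out) yields the first four bytes in hex, which is all that is needed to eyeball a magic number.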

- erik




* [9fans] a question of file and the history of magic
@ 2008-07-06 18:16 Jeff Sickel
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Sickel @ 2008-07-06 18:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

This is a comment/question about file(1) as implemented in Plan 9 and
p9p.

Over the years I've been using various versions of file with editable
magic files.  Though file "can make mistakes", this worked out rather
well when I just wanted a little more detail than 'binary' with the
tradeoff of the command being a bit slow at times.  While deciding to
use p9p's rc for a script to help with some picture processing, I
realized I needed to use file to help determine the type of data I'm
checking on the file system.  So I added the following (though it
could just be added to the long0tab just as easily):

% hg diff file.c
diff -r d7799c860a8f src/cmd/file.c
--- a/src/cmd/file.c	Sat Jul 05 10:01:43 2008 -0400
+++ b/src/cmd/file.c	Sun Jul 06 12:30:28 2008 -0500
@@ -655,6 +655,7 @@
  	"\377\330\377\340",	"jpeg",				4,	"image/jpeg",
  	"\377\330\377\341",	"jpeg",				4,	"image/jpeg",
  	"\377\330\377\333",	"jpeg",				4,	"image/jpeg",
+	"\106\117\126\142",	"x3f",				4,	"image/x3f",
  	"BM",			"bmp",				2,	"image/bmp",
  	"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",	"microsoft office document",	8,	"application/octet-stream",
  	"<MakerFile ",		"FrameMaker file",		11,	"application/framemaker",


This addition helped my scripts become a little more streamlined, but
of course puts in an additional entry into the source file I need to
track.  As file name extensions don't always work across all sorts of
systems, many still hamstrung by 8.3, what is the preferred or
recommended mechanism for checking file types the Plan 9 way since we no
longer have the System V magic?

In a sense, a modified xd(1) that has an option for a restricted range
of byte sequences would work.  That would at least provide a fast seek
into a file that can be pipelined into any other command sequence--no
need to dump the whole file when you just need the first four
bytes, but then it just gets to the point of having a magic file.

-jas





Thread overview: 10+ messages
2008-07-07  8:58 [9fans] a question of file and the history of magic Harri Haataja
2008-07-11 16:21 ` Dan Cross
  -- strict thread matches above, loose matches on Subject: below --
2008-07-06 23:45 geoff
2008-07-06 21:20 erik quanstrom
2008-07-06 21:59 ` Brantley Coile
2008-07-06 22:31 ` Bakul Shah
2008-07-06 22:44   ` Charles Forsyth
2008-07-06 19:30 erik quanstrom
2008-07-06 21:00 ` Jeff Sickel
2008-07-06 18:16 Jeff Sickel
