9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] a question of file and the history of magic
@ 2008-07-07  8:58 Harri Haataja
  2008-07-11 16:21 ` Dan Cross
  0 siblings, 1 reply; 10+ messages in thread
From: Harri Haataja @ 2008-07-07  8:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


----- Original message -----
> There are a lot of things like that.  Do we still need to
> compress man pages on 1TB disk drives? :)

I actually happened to be reading this on a small device that has a compressed system fs. Compressing data here on the "app" side with uninformed algorithms is discouraged to avoid double compression and other assorted losses. I believe you hit the same things with compression on some network protocols.

Touching GB range, this isn't that tiny but well below TB. :)




* Re: [9fans] a question of file and the history of magic
@ 2008-07-06 23:45 geoff
  0 siblings, 0 replies; 10+ messages in thread
From: geoff @ 2008-07-06 23:45 UTC (permalink / raw)
  To: 9fans

As far as I know, all (l)unix dd commands from at least sixth edition
unix forward accept the options if=, bs= and count=, so portability
across (l)unix systems shouldn't be a problem.  Not all (l)unix dd
commands have some of the newer options, such as iseek= and oseek=,
but the ones I have seen all take the same syntax.

An assumption behind the system v magic file is that almost all file
does is match byte strings or integers, usually at the start of the
file being examined (though that wasn't really true even on unix).
The plan 9 file command also uses more complex algorithms; it's not
all simple table-driven matching.  It understands unicode and utf.  It
may look at histograms of byte or word distributions.  It may look for
programming language keywords.  It understands a.out and elf headers.
It can recognise a tar header and distinguish posix from non-posix
headers.  It can recognise html, rfc-822 mail messages, unix-style
mailboxes, images, compressed files and encrypted ones, and more.  A
quick look suggests that under 15% of the lines in file.c are devoted
to table definitions.
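
For illustration only, here is a minimal sketch of the kind of byte-histogram check described above. It is not the code in Plan 9's file.c: the enum, the name guess_from_histogram and the classification rules are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical labels for this sketch; not taken from file.c. */
enum guess { GUESS_EMPTY, GUESS_ASCII, GUESS_HIGHBIT, GUESS_BINARY };

/*
 * Count how often each byte value occurs, then guess a coarse type.
 * Assumed rule: any control byte other than tab, newline, carriage
 * return or form feed marks the data as binary; otherwise any byte
 * >= 0x80 means "not plain ASCII" (utf, latin-1, or something else).
 */
enum guess
guess_from_histogram(const unsigned char *buf, size_t n)
{
	size_t hist[256] = {0};
	size_t i, ctl = 0, high = 0;

	if (n == 0)
		return GUESS_EMPTY;
	for (i = 0; i < n; i++)
		hist[buf[i]]++;
	for (i = 0; i < 256; i++) {
		if (i < 0x20 && i != '\t' && i != '\n' && i != '\r' && i != '\f')
			ctl += hist[i];
		else if (i == 0x7f)
			ctl += hist[i];
		else if (i >= 0x80)
			high += hist[i];
	}
	if (ctl > 0)
		return GUESS_BINARY;
	if (high > 0)
		return GUESS_HIGHBIT;
	return GUESS_ASCII;
}
```

A real classifier would go further (validating utf sequences, weighing the distribution rather than testing for presence), but the shape is the same: the decision comes from a computation over the data, not from a row in a table.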




* Re: [9fans] a question of file and the history of magic
@ 2008-07-06 21:20 erik quanstrom
  2008-07-06 21:59 ` Brantley Coile
  2008-07-06 22:31 ` Bakul Shah
  0 siblings, 2 replies; 10+ messages in thread
From: erik quanstrom @ 2008-07-06 21:20 UTC (permalink / raw)
  To: jas, 9fans

> In a sense, the question is more about the historical change and/or
> adoption of a new file command for Plan 9 that doesn't use a magic
> file for references.  Why opt out of a magic file other than the
> obvious performance hit of scanning it each run?  Is it worth
> repeating the old forms that used magic, or has anyone in the Plan 9
> community already improved upon the idea and introduced a new, more
> adaptable tool?

what is the upside to an external magic file?  as you've shown, you
can add a file type in 1 line of code.  while the external magic file
isn't c, i would argue that it's still code.

the disadvantage is that you need to write a parser for yet another
file format.  it turns out that linux file's maintainers felt that a text file
wasn't good enough so they implemented a magic compiler.  i really
don't understand the logic behind the compiler, since it would seem
to trade reduced cpu cycles for increased i/o.  that would seem to be
a terrible trade off these days.

; wc magic magic.mgc
  13469   69850  484372 magic
   1301   17997 1062400 magic.mgc		# compiled version

the source is pretty big, too:

; wc -l file-4.20/src/*.[ch]|grep total
  9273 total

according to wikipedia (http://en.wikipedia.org/wiki/File_(Unix)),
system v introduced the external magic file.  i don't think that system v
was in any way an ancestor of plan 9.  but i don't know anything of
the history of plan 9 file.

- erik



* Re: [9fans] a question of file and the history of magic
@ 2008-07-06 19:30 erik quanstrom
  2008-07-06 21:00 ` Jeff Sickel
  0 siblings, 1 reply; 10+ messages in thread
From: erik quanstrom @ 2008-07-06 19:30 UTC (permalink / raw)
  To: jas, 9fans

> This addition helped my scripts become a little more streamlined, but
> of course puts an additional entry into the source file I need to
> track.  As file name extensions don't always work across all sorts of
> systems, many still hamstrung by 8.3, what is the preferred or
> recommended mechanism for checking file types the Plan 9 way since we no
> longer have the System V magic?

i'm pretty confused by what you're saying here.  why doesn't file(1) work?
are you saying there's something wrong with editing the source as opposed
to editing a configuration file?

either way your system is equally non-standard.  in either event,
submitting a patch and having it accepted is the only way around this.

> In a sense, a modified xd(1) that has an option for a restricted range
> of byte sequences would work.  That would at least provide a fast seek
> into a file that can be pipelined into any other command sequence--no
> need to dump the whole file when you just need the first four
> bytes, but then it just gets to the point of having a magic file.

why would xd need modification?  how about

	dd -if $infile -bs $nbytes -count 1 | xd

there are no restrictions placed by dd on $nbytes.  it could be
4 or 99132 or whatever.  dd's -iseek option similarly can specify
any offset.
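
As a rough C counterpart to the pipeline above, the sketch below reads a byte range from a file and formats it as hex. The names read_range and hex_bytes are hypothetical, the offset is given in bytes (an assumption of this example), and nothing here is taken from dd or xd.

```c
#include <stdio.h>

/*
 * Read up to nbytes from path, starting at byte offset off.
 * Returns the number of bytes actually read, or -1 on error.
 * read_range is a hypothetical helper for this sketch.
 */
long
read_range(const char *path, long off, unsigned char *buf, size_t nbytes)
{
	FILE *f;
	size_t n;

	f = fopen(path, "rb");
	if (f == NULL)
		return -1;
	if (fseek(f, off, SEEK_SET) != 0) {
		fclose(f);
		return -1;
	}
	n = fread(buf, 1, nbytes, f);
	fclose(f);
	return (long)n;
}

/*
 * Format n bytes as space-separated lowercase hex.
 * out must have room for at least 3*n + 1 characters.
 */
void
hex_bytes(const unsigned char *buf, size_t n, char *out)
{
	size_t i;

	out[0] = '\0';
	for (i = 0; i < n; i++)
		sprintf(out + 3*i, "%02x%s", buf[i], i + 1 < n ? " " : "");
}
```

Only the requested bytes are read, so the cost does not depend on the size of the file, which is the point of the dd pipeline as well.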

- erik



* [9fans] a question of file and the history of magic
@ 2008-07-06 18:16 Jeff Sickel
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Sickel @ 2008-07-06 18:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

This is a comment/question about file(1) as implemented in Plan 9 and
p9p.

Over the years I've been using various versions of file with editable
magic files.  Though file "can make mistakes", this worked out rather
well when I just wanted a little more detail than 'binary' with the
tradeoff of the command being a bit slow at times.  While deciding to
use p9p's rc for a script to help with some picture processing, I
realized I needed to use file to help determine the type of data I'm
checking on the file system.  So I added the following (though it
could be added to long0tab just as easily):

% hg diff file.c
diff -r d7799c860a8f src/cmd/file.c
--- a/src/cmd/file.c	Sat Jul 05 10:01:43 2008 -0400
+++ b/src/cmd/file.c	Sun Jul 06 12:30:28 2008 -0500
@@ -655,6 +655,7 @@
  	"\377\330\377\340",	"jpeg",				4,	"image/jpeg",
  	"\377\330\377\341",	"jpeg",				4,	"image/jpeg",
  	"\377\330\377\333",	"jpeg",				4,	"image/jpeg",
+	"\106\117\126\142",	"x3f",				4,	"image/x3f",
  	"BM",			"bmp",				2,	"image/bmp",
  	"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",	"microsoft office document",	8,	"application/octet-stream",
  	"<MakerFile ",		"FrameMaker file",		11,	"application/framemaker",


This addition helped my scripts become a little more streamlined, but
of course puts an additional entry into the source file I need to
track.  As file name extensions don't always work across all sorts of
systems, many still hamstrung by 8.3, what is the preferred or
recommended mechanism for checking file types the Plan 9 way since we no
longer have the System V magic?
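
To make the one-line change in the diff concrete, here is a minimal, self-contained sketch of prefix-table matching. The row contents are copied from the diff; the struct layout, the table name magictab and the function match_magic are assumptions for this example and do not reproduce the actual definitions in file.c.

```c
#include <stddef.h>
#include <string.h>

/* One row per known prefix; hypothetical layout for this sketch. */
struct magic {
	const char *prefix;	/* bytes expected at offset 0 */
	size_t len;		/* number of prefix bytes to compare */
	const char *desc;	/* human-readable type */
	const char *mime;	/* mime type */
};

static const struct magic magictab[] = {
	{ "\377\330\377\340", 4, "jpeg", "image/jpeg" },
	{ "\377\330\377\341", 4, "jpeg", "image/jpeg" },
	{ "\377\330\377\333", 4, "jpeg", "image/jpeg" },
	{ "\106\117\126\142", 4, "x3f", "image/x3f" },	/* "FOVb" */
	{ "BM", 2, "bmp", "image/bmp" },
};

/*
 * Return the first row whose prefix matches the start of buf,
 * or NULL if no row matches or buf is shorter than the prefix.
 */
const struct magic *
match_magic(const unsigned char *buf, size_t n)
{
	size_t i;

	for (i = 0; i < sizeof magictab / sizeof magictab[0]; i++)
		if (n >= magictab[i].len &&
		    memcmp(buf, magictab[i].prefix, magictab[i].len) == 0)
			return &magictab[i];
	return NULL;
}
```

Adding a type is one new row, which is why tracking the change amounts to carrying a one-line patch.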

In a sense, a modified xd(1) that has an option for a restricted range
of byte sequences would work.  That would at least provide a fast seek
into a file that can be pipelined into any other command sequence--no
need to dump the whole file when you just need the first four
bytes, but then it just gets to the point of having a magic file.

-jas





Thread overview: 10+ messages
2008-07-07  8:58 [9fans] a question of file and the history of magic Harri Haataja
2008-07-11 16:21 ` Dan Cross
  -- strict thread matches above, loose matches on Subject: below --
2008-07-06 23:45 geoff
2008-07-06 21:20 erik quanstrom
2008-07-06 21:59 ` Brantley Coile
2008-07-06 22:31 ` Bakul Shah
2008-07-06 22:44   ` Charles Forsyth
2008-07-06 19:30 erik quanstrom
2008-07-06 21:00 ` Jeff Sickel
2008-07-06 18:16 Jeff Sickel
