From: "James A. Robinson" <jim.robinson@stanford.edu>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Gecko based web browser
Date: Wed, 19 Jul 2000 21:05:25 -0700
Message-ID: <200007200405.AAA27933@cse.psu.edu>

Someone just pointed out 'hget ...|awk ...' would do what I was talking
about. Yes, I did know about it, but perhaps I should explain some of
the advantages I could see coming out of a web retrieval engine.
I realize a lot of this could be built into a stand-alone binary; I
just don't see the point of doing that instead of an easier-to-navigate
fs-style server. In any case, after this I won't harp on the subject
any more.
cache	A cache is a wonderful thing. This is from my point of view
	as one of the poor schmoes who get paged when Science Magazine
	has an article or three that a million people want *right now.*
	A few weeks ago they published a breaking article about water
	on Mars, an article about some new AIDS research, and an article
	about dinosaurs who had feathers. This brought the merry hordes
	breaking down our door, and we were seeing 30 hits per second
	sustained for 48 hours.
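
	A sketch of how a shared cache might look from the shell (the
	paths and layout here are purely hypothetical):

	; cat /webfs/now/sciencemag.org/index.html	# first reader causes the fetch
	; cat /webfs/now/sciencemag.org/index.html	# everyone after that reads the cached copy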
proxy	You start up webfs on your firewall machine, and export the
	service to everyone on the local net. They bind the firewall's
	/webfs to their own root, and off they go (taking advantage of
	the cache, and not having to futz with http vs. https proxies
	like you might have to in Netscape).
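
	On Plan 9 attaching to it could be a one-liner (assuming the
	firewall is called 'fw' and serves /webfs over 9P; the name and
	paths are made up):

	; import fw /webfs			# attach the firewall's webfs locally
	; cat /webfs/now/bmj.com/index.dtl	# fetched and cached by the firewall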
past	Anyone else think that IE actually has a pretty spiffy history
	of last visited links? The breakdown into Yesterday, Last Week,
	2 Weeks Ago, etc. is a lot nicer than NS's default history buffer.
	Imagine what you could do in terms of an fs system:

	diff /webfs/past/20000718/bmj.com/index.dtl /webfs/now/bmj.com/index.dtl

	Of course I don't pretend to know how one can handle the file
	hierarchy stuff. As I wrote to someone in private e-mail
	a while ago, I don't think that just because a url has a '/'
	it should automatically be a candidate for an fs storage
	hierarchy. But I do agree that a LOT of it CAN be mapped.
	What if you could do searches in the cache of places you've
	visited over the past few weeks? For example, you remember
	seeing a real nice algorithm, but now you've forgotten some
	important detail. You can visit Google and comb the web again,
	but if you know it's somewhere in that cache on the local fs...
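
	Since the cache is just files, that search is plain grep
	(again, the hierarchy is invented for the example):

	; grep -l 'splay tree' /webfs/past/*/*/*.html	# which cached pages mention it?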
anon	I can't remember if it was Bell Labs or AT&T Labs, but someone at
	one of those places wrote about a neat proxy server which helped
	protect your identity. You browsed via the proxy, and it
	would substitute codes like '\@u', which you POST to forms, with
	an expanded username like 'anon10007' or what-not. It could
	generate a unique e-mail address for each domain you sent '\@e'
	to, so that you could tell which corporation sold your e-mail
	address.
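
	Through an fs interface the substitution could happen on write
	(everything below is hypothetical):

	; # the proxy expands '\@u' and '\@e' before forwarding the POST
	; echo 'user=\@u&email=\@e' > /webfs/anon/example.com/register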
junk	Anyone else use the Junkbuster? It strips out banner ads and other
	ad junk, replacing graphics with a transparent gif (and since
	it is transparent it can expand to the size of the gif it's
	replacing without looking weird). So sites look normal, but you
	don't have to view the ads. It also ignores HREFs to places
	like doubleclick. Wouldn't it be nice to be able to write a
	set of common rules to control such features, and allow users
	to bind their own /webfs/junkbuster/rules into place?
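
	The rules file could be ordinary text, one rule per line
	(syntax invented for the sake of the example):

	; cat /webfs/junkbuster/rules
	block	doubleclick.net		# drop fetches and HREFs to ad servers
	replace	*.gif	transparent.gif	# swap banners for a stretchy transparent gif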
secure	Strip out PostScript or junk JavaScript according to default or
	user rules.
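
	And since it's all files and namespaces, a user who wants
	stricter (or looser) filtering just binds her own rules over
	the defaults, per-process, no special privilege required:

	; bind $home/lib/webrules /webfs/junkbuster/rules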
I guess all these are variations on a theme. Like I said, I understand
you can do all this in a stand-alone binary. But for me it just seems to
call out for an fs-style approach -- I'm under the impression that it would
be easier to manage than a bazillion command-line options like wget has:
; wget --help
GNU Wget 1.5.3, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version           display the version of Wget and exit.
  -h,  --help              print this help.
  -b,  --background        go to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc' command.

Logging and input file:
  -o,  --output-file=FILE     log messages to FILE.
  -a,  --append-output=FILE   append messages to FILE.
  -d,  --debug                print debug output.
  -q,  --quiet                quiet (no output).
  -v,  --verbose              be verbose (this is the default).
  -nv, --non-verbose          turn off verboseness, without being quiet.
  -i,  --input-file=FILE      read URL-s from file.
  -F,  --force-html           treat input file as HTML.

Download:
  -t,  --tries=NUMBER           set number of retries to NUMBER (0 unlimits).
  -O   --output-document=FILE   write documents to FILE.
  -nc, --no-clobber             don't clobber existing files.
  -c,  --continue               restart getting an existing file.
       --dot-style=STYLE        set retrieval display style.
  -N,  --timestamping           don't retrieve files if older than local.
  -S,  --server-response        print server response.
       --spider                 don't download anything.
  -T,  --timeout=SECONDS        set the read timeout to SECONDS.
  -w,  --wait=SECONDS           wait SECONDS between retrievals.
  -Y,  --proxy=on/off           turn proxy on or off.
  -Q,  --quota=NUMBER           set retrieval quota to NUMBER.

Directories:
  -nd  --no-directories            don't create directories.
  -x,  --force-directories         force creation of directories.
  -nH, --no-host-directories       don't create host directories.
  -P,  --directory-prefix=PREFIX   save files to PREFIX/...
       --cut-dirs=NUMBER           ignore NUMBER remote directory components.

HTTP options:
       --http-user=USER      set http user to USER.
       --http-passwd=PASS    set http password to PASS.
  -C,  --cache=on/off        (dis)allow server-cached data (normally allowed).
       --ignore-length       ignore `Content-Length' header field.
       --header=STRING       insert STRING among the headers.
       --proxy-user=USER     set USER as proxy username.
       --proxy-passwd=PASS   set PASS as proxy password.
  -s,  --save-headers        save the HTTP headers to file.
  -U,  --user-agent=AGENT    identify as AGENT instead of Wget/VERSION.

FTP options:
       --retr-symlinks   retrieve FTP symbolic links.
  -g,  --glob=on/off     turn file name globbing on or off.
       --passive-ftp     use the "passive" transfer mode.

Recursive retrieval:
  -r,  --recursive             recursive web-suck -- use with care!.
  -l,  --level=NUMBER          maximum recursion depth (0 to unlimit).
       --delete-after          delete downloaded files.
  -k,  --convert-links         convert non-relative links to relative.
  -m,  --mirror                turn on options suitable for mirroring.
  -nr, --dont-remove-listing   don't remove `.listing' files.

Recursive accept/reject:
  -A,  --accept=LIST                list of accepted extensions.
  -R,  --reject=LIST                list of rejected extensions.
  -D,  --domains=LIST               list of accepted domains.
       --exclude-domains=LIST       comma-separated list of rejected domains.
  -L,  --relative                   follow relative links only.
       --follow-ftp                 follow FTP links from HTML documents.
  -H,  --span-hosts                 go to foreign hosts when recursive.
  -I,  --include-directories=LIST   list of allowed directories.
  -X,  --exclude-directories=LIST   list of excluded directories.
  -nh, --no-host-lookup             don't DNS-lookup hosts.
  -np, --no-parent                  don't ascend to the parent directory.

Mail bug reports and suggestions to <bug-wget@gnu.org>.
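
Compare driving the same sort of knobs through an fs, where the
"options" are just files you can echo into and cat back (the ctl
file and its syntax are invented for illustration):

	; echo 'proxy on' > /webfs/ctl		# instead of -Y on
	; echo 'tries 5' > /webfs/ctl		# instead of -t 5
	; cat /webfs/ctl			# and you can read the current settings back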