* [TUHS] Mirror with link
From: Derrik Walker @ 2011-10-13 18:51 UTC
I like shell scripts for hard-to-remember things like that.
Call it wget_mirror.sh or something like that:
#!/bin/bash
# wget_mirror.sh URL: mirror everything below URL, with links converted for local use
if [ $# -ne 1 ]; then echo "usage: $0 URL" >&2; exit 1; fi
url=$1
wget -c -m -k -np -e robots=off "$url"
(That was off the top of my head and has NOT been tested; it might have a typo or two. :)
- Derrik
On Oct 13, 2011, at 02:43 PM, Larry McVoy <lm at bitmover.com> wrote:
> On Thu, Oct 13, 2011 at 08:37:27PM +0200, Jose R. Valverde wrote:
> > Just for the record.
> >
> > The correct way to mirror a site with links corrected is
> >
> > wget -c -m -k -np -e robots=off URL
> >
> > Seems most people have trouble remembering this incantation.
>
> Wouldn't it be nice if it were
>
> wget --mirror URL
>
> ?
> --
> ---
> Larry McVoy lm at bitmover.com http://www.bitkeeper.com
> _______________________________________________
> TUHS mailing list
> TUHS at minnie.tuhs.org
> https://minnie.tuhs.org/mailman/listinfo/tuhs
* [TUHS] Mirror with link
From: SPC @ 2011-10-13 18:53 UTC
Computing wouldn't be the same :-)
--
Saludos - Greetings - Freundliche Grüße - Salutations
Sergio Pedraja
twitter: @sergio_pedraja
http://www.linkedin.com/in/sergiopedraja
http://www.quora.com/Sergio-Pedraja
-----
2011/10/13 Larry McVoy <lm at bitmover.com>
> On Thu, Oct 13, 2011 at 08:37:27PM +0200, Jose R. Valverde wrote:
> > Just for the record.
> >
> > The correct way to mirror a site with links corrected is
> >
> > wget -c -m -k -np -e robots=off URL
> >
> > Seems most people have trouble remembering this incantation.
>
> Wouldn't it be nice if it were
>
> wget --mirror URL
>
> ?
> --
> ---
> Larry McVoy lm at bitmover.com
> http://www.bitkeeper.com
* [TUHS] Mirror with link
From: A. P. Garcia @ 2011-10-13 19:35 UTC
On Thu, Oct 13, 2011 at 1:43 PM, Larry McVoy <lm at bitmover.com> wrote:
> On Thu, Oct 13, 2011 at 08:37:27PM +0200, Jose R. Valverde wrote:
>> Just for the record.
>>
>> The correct way to mirror a site with links corrected is
>>
>> wget -c -m -k -np -e robots=off URL
>>
>> Seems most people have trouble remembering this incantation.
>
> Wouldn't it be nice if it were
>
> wget --mirror URL
For a complete mirror, yes. But if you want to make sure that you
don't download any malware:
wget --mirror --evil=false
* [TUHS] Mirror with link
From: Jose R. Valverde @ 2011-10-14 15:59 UTC
That is already included: -m == --mirror
But --mirror (or -m) does not include -k (convert links for local viewing after
the transfer), -np (do not ascend to the parent directory), or an instruction
to ignore robots.txt.
The magic incantation I submitted downloads only down the hierarchy, ignores
robots.txt, and fixes the links, which addresses all three problems reported in
the thread.
Of course it is not polite to ignore robots.txt, but sometimes it may be
justified.
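For anyone who finds the short flags hard to remember, the same command can be spelled with GNU Wget's long options, one flag per line with a comment saying what it does. This is a minimal sketch, assuming bash (for the local array) and GNU Wget; the function name mirror_site is hypothetical and not part of wget.

```shell
# Sketch only: assumes bash (arrays, local) and GNU Wget.
# mirror_site is a hypothetical name; it runs the same command as
#   wget -c -m -k -np -e robots=off URL
# but spelled with long options so each flag documents itself.
mirror_site() {
  if [ $# -ne 1 ]; then
    echo "usage: mirror_site URL" >&2
    return 1
  fi
  local -a opts
  opts=(
    --continue            # -c: resume partially downloaded files
    --mirror              # -m: currently -r -N -l inf --no-remove-listing
    --convert-links       # -k: after the download, rewrite links for local viewing
    --no-parent           # -np: never ascend above the starting directory
    --execute robots=off  # -e robots=off: ignore robots.txt (impolite; use sparingly)
  )
  wget "${opts[@]}" "$1"
}
```

Dropped into ~/.bashrc, "mirror_site http://example.com/some/dir/" should fetch that directory and everything below it. Keeping the options in an array lets each flag carry its own comment without breaking the command line.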
j
On Thu, 13 Oct 2011 11:43:37 -0700
Larry McVoy <lm at bitmover.com> wrote:
> On Thu, Oct 13, 2011 at 08:37:27PM +0200, Jose R. Valverde wrote:
> > Just for the record.
> >
> > The correct way to mirror a site with links corrected is
> >
> > wget -c -m -k -np -e robots=off URL
> >
> > Seems most people have trouble remembering this incantation.
>
> Wouldn't it be nice if it were
>
> wget --mirror URL
>
> ?
> --
> ---
> Larry McVoy lm at bitmover.com http://www.bitkeeper.com
--
EMBnet/CNB
Scientific Computing Service
Solving all your computer needs for Scientific Research.
http://bioportal.cnb.csic.es
http://www.es.embnet.org