The Unix Heritage Society mailing list
* [TUHS] Mirror with link
@ 2011-10-13 18:37 Jose R. Valverde
  2011-10-13 18:43 ` Larry McVoy
  0 siblings, 1 reply; 6+ messages in thread
From: Jose R. Valverde @ 2011-10-13 18:37 UTC


Just for the record.

The correct way to mirror a site with links corrected is:

	wget -c -m -k -np -e robots=off URL

It seems most people have trouble remembering this incantation.
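To make each flag in that incantation self-describing, here is a sketch of the same command written with GNU Wget's long option names, wrapped in a bash function. The function name mirror_site and the DRY_RUN switch are hypothetical, chosen only for this example; the comments paraphrase the Wget manual's descriptions of each option.

```shell
#!/bin/bash
# Sketch: "wget -c -m -k -np -e robots=off URL" spelled with long options.
# mirror_site and DRY_RUN are hypothetical names used only for illustration.
mirror_site() {
    local url=$1
    local cmd=(
        wget
        --continue              # -c: resume partially downloaded files
        --mirror                # -m: recursion, unlimited depth, timestamping
        --convert-links         # -k: rewrite links for local viewing after the download
        --no-parent             # -np: never ascend above the starting directory
        --execute=robots=off    # -e robots=off: ignore robots.txt
        "$url"
    )
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, DRY_RUN=1 mirror_site http://example.com/docs/ prints the long-form command without downloading anything (the URL is a placeholder).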

-- 
			EMBnet/CNB
		Scientific Computing Service
	Solving all your computer needs for Scientific
			Research.

		http://bioportal.cnb.csic.es
		  http://www.es.embnet.org




* [TUHS] Mirror with link
  2011-10-13 18:37 [TUHS] Mirror with link Jose R. Valverde
@ 2011-10-13 18:43 ` Larry McVoy
  2011-10-13 18:51   ` Derrik Walker
                     ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Larry McVoy @ 2011-10-13 18:43 UTC


On Thu, Oct 13, 2011 at 08:37:27PM +0200, Jose R. Valverde wrote:
> Just for the record.
> 
> The correct way to mirror a site with links corrected is:
> 
> 	wget -c -m -k -np -e robots=off URL
> 
> It seems most people have trouble remembering this incantation.

Wouldn't it be nice if it were

	wget --mirror URL

?
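Wget can get close to that wish through its startup file: the extra settings can live in a dedicated wgetrc, after which a short "wget --mirror URL" also resumes, converts links, stays below the parent directory and ignores robots.txt. This is a sketch; the function name write_mirror_wgetrc and the file name ~/.wgetrc-mirror are hypothetical. It relies on the documented wgetrc commands continue, convert_links, no_parent and robots, and on Wget reading the file named by the WGETRC environment variable in place of ~/.wgetrc.

```shell
#!/bin/bash
# Sketch: write a dedicated wgetrc holding the settings that -m lacks.
# write_mirror_wgetrc and ~/.wgetrc-mirror are hypothetical names.
write_mirror_wgetrc() {
    local rcfile=${1:-$HOME/.wgetrc-mirror}
    cat > "$rcfile" <<'EOF'
# wgetrc equivalents of -c, -k, -np and -e robots=off
continue = on
convert_links = on
no_parent = on
robots = off
EOF
}

# Usage (WGETRC replaces ~/.wgetrc for this one invocation):
#   write_mirror_wgetrc
#   WGETRC=~/.wgetrc-mirror wget --mirror URL
```

Keeping the settings in a separate file, rather than in ~/.wgetrc itself, avoids turning off robots.txt handling for every other wget run.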
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com




* [TUHS] Mirror with link
  2011-10-13 18:43 ` Larry McVoy
@ 2011-10-13 18:51   ` Derrik Walker
  2011-10-13 18:53   ` SPC
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Derrik Walker @ 2011-10-13 18:51 UTC



I like shell scripts for hard-to-remember things like that.

Call it wget_mirror.sh or something like that:

#!/bin/bash
# wget_mirror.sh: mirror a site with links converted for local browsing

if [ $# -ne 1 ]; then
	echo "usage: $0 URL" >&2
	exit 1
fi

wget -c -m -k -np -e robots=off "$1"


(That was off the top of my head and has NOT been tested... it might have a typo or two :)

- Derrik




* [TUHS] Mirror with link
  2011-10-13 18:43 ` Larry McVoy
  2011-10-13 18:51   ` Derrik Walker
@ 2011-10-13 18:53   ` SPC
  2011-10-13 19:35   ` A. P. Garcia
  2011-10-14 15:59   ` Jose R. Valverde
  3 siblings, 0 replies; 6+ messages in thread
From: SPC @ 2011-10-13 18:53 UTC



Computing wouldn't be the same :-)

-- 
Saludos - Greetings - Freundliche Grüße - Salutations

Sergio Pedraja
http://www.linkedin.com/in/sergiopedraja
twitter: @sergio_pedraja
http://www.quora.com/Sergio-Pedraja
-----




* [TUHS] Mirror with link
  2011-10-13 18:43 ` Larry McVoy
  2011-10-13 18:51   ` Derrik Walker
  2011-10-13 18:53   ` SPC
@ 2011-10-13 19:35   ` A. P. Garcia
  2011-10-14 15:59   ` Jose R. Valverde
  3 siblings, 0 replies; 6+ messages in thread
From: A. P. Garcia @ 2011-10-13 19:35 UTC



On Thu, Oct 13, 2011 at 1:43 PM, Larry McVoy <lm at bitmover.com> wrote:
> On Thu, Oct 13, 2011 at 08:37:27PM +0200, Jose R. Valverde wrote:
>> Just for the record.
>>
>> The correct way to mirror a site with links corrected is:
>>
>>       wget -c -m -k -np -e robots=off URL
>>
>> It seems most people have trouble remembering this incantation.
>
> Wouldn't it be nice if it were
>
>        wget --mirror URL

For a complete mirror, yes. But if you want to make sure that you
don't download any malware:

	wget --mirror --evil=false




* [TUHS] Mirror with link
  2011-10-13 18:43 ` Larry McVoy
                     ` (2 preceding siblings ...)
  2011-10-13 19:35   ` A. P. Garcia
@ 2011-10-14 15:59   ` Jose R. Valverde
  3 siblings, 0 replies; 6+ messages in thread
From: Jose R. Valverde @ 2011-10-14 15:59 UTC


That already exists: -m is the short form of --mirror.

But --mirror (or -m) does not include -k (convert links for local viewing
after the transfer), -np (never ascend to the parent directory), or an
instruction to ignore robots.txt.

The incantation I posted downloads only down the hierarchy, ignores
robots.txt, and fixes the links, which covers all three problems reported
in the thread.
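To see concretely what -m does and does not cover, here is a sketch of the same command with -m expanded. The GNU Wget manual states that -m is currently equivalent to -r -N -l inf --no-remove-listing; everything else on the line is what -m leaves out. The function name expanded_mirror and the DRY_RUN switch are hypothetical.

```shell
#!/bin/bash
# Sketch: the mirroring command with -m written out as its component options.
# expanded_mirror and DRY_RUN are hypothetical names used only for illustration.
expanded_mirror() {
    local url=$1
    # What -m turns on, per the Wget manual.
    local mirror_opts=(-r -N -l inf --no-remove-listing)
    # What -m does NOT turn on.
    local extra_opts=(
        -c              # resume partially downloaded files
        -k              # convert links after the transfer
        -np             # do not ascend to the parent directory
        -e robots=off   # ignore robots.txt
    )
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the expanded command instead of running it.
        printf '%s\n' "wget ${mirror_opts[*]} ${extra_opts[*]} $url"
    else
        wget "${mirror_opts[@]}" "${extra_opts[@]}" "$url"
    fi
}
```

Running DRY_RUN=1 expanded_mirror with a placeholder URL shows the full option list without contacting any server.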

Of course it is not polite to ignore robots.txt, but sometimes it may be
justified.

				j





Thread overview: 6+ messages
2011-10-13 18:37 [TUHS] Mirror with link Jose R. Valverde
2011-10-13 18:43 ` Larry McVoy
2011-10-13 18:51   ` Derrik Walker
2011-10-13 18:53   ` SPC
2011-10-13 19:35   ` A. P. Garcia
2011-10-14 15:59   ` Jose R. Valverde
