Fernando Gont
2018-11-27 11:20:45 UTC
Folks,
I'm using wget in "--spider" mode in a script that checks a web site
for broken links.
I'd like wget to operate in recursive mode for pages in the target
domain, but not for pages in other hosts/sites.
That is, if I'm crawling www.example.com, I'd like wget to process all
pages in that domain recursively. However, if there's a link to an
external site, I just want wget to check that URL, but not process that
external reference recursively.
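
To make the intended policy concrete, here is a minimal Python sketch of
the decision I'd like applied to each link found. This is not wget itself
and says nothing about how wget works internally; the function name
link_action and its parameters are hypothetical, chosen only for
illustration.

```python
from urllib.parse import urljoin, urlsplit


def link_action(link, page_url, root_host):
    """Decide what to do with a link found on page_url.

    Hypothetical helper: returns "recurse" when the link resolves to
    root_host (the site being crawled), and "check-only" otherwise,
    meaning the URL should be requested once but not crawled further.
    """
    # Resolve relative links against the page they were found on.
    absolute = urljoin(page_url, link)
    # hostname is None for URLs without a network location.
    host = (urlsplit(absolute).hostname or "").lower()
    return "recurse" if host == root_host.lower() else "check-only"
```

So a link to /about.html found on www.example.com would be followed
recursively, while a link to a page on some other host would only be
checked once.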
"-D" would seem to prevent checking external references, so I cannot use
it. And "--level" would mean that pages on external sites may still be
processed recursively.
Any advice on how to implement this?
Thanks!
Cheers,
Fernando
--
Fernando Gont
SI6 Networks
e-mail: ***@si6networks.com
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492