[Bug-wget] "Referer" when using spider mode
Fernando Gont
2018-11-27 11:04:25 UTC
Folks,

I'm using wget in a script to find broken and "moved" links in a web site.

My problem is that, when parsing the output of "wget --spider", I cannot
tell which page triggered the retrieval of a URL (i.e., the "referer" of
such URL) -- so, while I can find that there are broken links, I cannot
easily tell which page contains the broken link.

Any clues on how to obtain such info?

Thanks!
Fernando
--
Fernando Gont
SI6 Networks
e-mail: ***@si6networks.com
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492
Darshit Shah
2018-11-27 12:34:04 UTC
Hi Fernando,

Once again, the answer is quite the same. You could parse the --debug output of
Wget to do this. Remember, though, that parsing the debug output is not always
safe, since we may change it at any point.
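
As a rough illustration of the --debug approach, here is a minimal Python sketch. It assumes that the debug log wraps each HTTP request between "---request begin---" and "---request end---" lines and that recursive requests carry a "Referer:" header. Both are assumptions about the current log format, which, as noted above, may change. The function name referers_from_debug_log is hypothetical.

```python
import re

# Assumed markers around each HTTP request in `wget --debug` output.
# These are not a stable interface and may differ between versions.
REQ_BEGIN = "---request begin---"
REQ_END = "---request end---"


def referers_from_debug_log(lines):
    """Yield (host, path, referer) for each request block found in the log.

    `referer` is None when the request carried no Referer header,
    which is expected for the start URL.
    """
    in_request = False
    path = host = referer = None
    for raw in lines:
        line = raw.strip()
        if line == REQ_BEGIN:
            # Start of a new request block: reset the collected fields.
            in_request, path, host, referer = True, None, None, None
        elif line == REQ_END:
            if in_request and path is not None:
                yield host, path, referer
            in_request = False
        elif in_request:
            m = re.match(r"(?:GET|HEAD) (\S+) HTTP/", line)
            if m:
                path = m.group(1)
            elif line.lower().startswith("host:"):
                host = line.split(":", 1)[1].strip()
            elif line.lower().startswith("referer:"):
                referer = line.split(":", 1)[1].strip()
```

One way to use it would be to write the log to a file, for example with "wget --spider -r --debug -o wget.log URL", and then pass the open file to referers_from_debug_log(). Matching each request to its response status is left out of this sketch.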

However, even for this case, I guess, using Wget2 is a better choice for you.

https://gitlab.com/gnuwget/wget2
--
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
Dale R. Worley
2018-11-28 04:02:29 UTC
Post by Fernando Gont
I'm using wget in a script to find broken and "moved" links in a web site.
My problem is that, when parsing the output of "wget --spider", I cannot
tell which page triggered the retrieval of a URL (i.e., the "referer" of
such URL) -- so, while I can find that there are broken links, I cannot
easily tell which page contains the broken link.
Any clues on how to obtain such info?

I would grep through the files in the file tree that wget constructs.
That isn't perfect, but it's probably close enough for your purposes.
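
For illustration, the grep idea could look roughly like this in Python. It is a sketch only, and it assumes a local copy of the site's pages already exists under some directory. The function name pages_linking_to and its parameters are hypothetical.

```python
import os


def pages_linking_to(root, needle):
    """Return the files under `root` whose text contains `needle`.

    `needle` is a plain substring, such as the path or file name of the
    broken link. Relative links mean the full URL often does not appear
    literally in the page, so a shorter fragment may match better.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                # Decode leniently: mirrored pages may use mixed encodings.
                with open(full, "r", encoding="utf-8", errors="replace") as fh:
                    if needle in fh.read():
                        hits.append(full)
            except OSError:
                # Skip unreadable files instead of aborting the whole scan.
                continue
    return sorted(hits)
```

For example, pages_linking_to("example.com", "missing.html") would list every saved page that mentions "missing.html". As with grep, this is a substring match, so it can report pages that mention the name without actually linking to it.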

Dale