Justin MacCarthy wrote:
> Hi
> I want to use wget to give me a list of all URIs referenced on my domain
> (only on my domain), so I can remove old files. I'm sure there is a way
> to do this with wget, but I just can't see it (really bad flu today). I
> don't want to download anything, just list the documents so I can clean
> up a local copy.
I didn't want to resort to this quick and dirty hack, but since nobody
came up with anything better in 24 hours...
The idea came from a section of the wget man page.
wget -r -nd --reject gif,jpg,png --delete-after http://whatever.com/
That will request every linked document on your site and delete each one
right after it is retrieved, so nothing is kept locally.
Right after that, take your web server's access log, grep for the IP
you ran wget from, and cut out the 7th field (cut -d' ' -f7): that gives
you the full list of requested URIs.
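The log step can be wrapped in a small shell function. This is a minimal sketch, assuming an Apache-style Common or Combined Log Format log, where the client address is the first space-separated field and the request path is the 7th; the name list_uris is hypothetical. It anchors the IP at the start of the line (with its dots escaped) so that 192.0.2.10 does not also match 192.0.2.100, drops query strings, and removes duplicates.

```shell
#!/bin/sh
# list_uris LOGFILE IP  (hypothetical helper, not part of wget)
# Print each distinct path requested by IP, assuming a Common/Combined Log
# Format access log: 'IP - - [date tz] "GET /path HTTP/1.0" ...', where
# the path is the 7th space-separated field.
list_uris() {
    logfile=$1
    ip=$2
    # Escape the dots so grep treats them literally; the trailing space in
    # the pattern stops 192.0.2.10 from also matching 192.0.2.100.
    pattern=$(printf '%s' "$ip" | sed 's/\./\\./g')
    # Keep lines from that address, take the path field, drop any query
    # string (/search.cgi?q=x -> /search.cgi), and de-duplicate.
    grep "^$pattern " "$logfile" | cut -d' ' -f7 | cut -d'?' -f1 | sort -u
}
```

For example, list_uris /var/log/apache/access_log 192.0.2.10 > uris.txt (the log path varies between installations). If the list looks short, keep in mind that wget -r stops at a depth of 5 levels by default; adding -l inf removes that limit.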
You can also tell wget not to retrieve files that are obviously "leaves"
of your web tree, such as images; that is what --reject gif,jpg,png does
above.
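To get from that list to the clean-up Justin asked about, one option is to compare it with the files in the local copy. A rough sketch, assuming the local copy mirrors the site's directory layout under a single directory and the URI list has one site-relative path per line (e.g. /docs/a.html), as the grep/cut pipeline produces; find_orphans is a hypothetical name. It uses comm, which needs both inputs sorted with the same collation, hence LC_ALL=C. Files reached only through directory URIs such as /docs/ (usually index.html), images skipped by --reject, and names that appear URL-encoded in the log (%20) will also be reported, so treat the output as candidates to review.

```shell
#!/bin/sh
# find_orphans DOCROOT URILIST  (hypothetical helper, not part of wget)
# Print the files under DOCROOT whose site-relative path does not appear
# in URILIST (one path per line, e.g. /docs/a.html). The output is a list
# of deletion candidates to review, not something to pipe into rm blindly.
find_orphans() {
    docroot=${1%/}   # drop one trailing slash so the prefix strips cleanly
    urilist=$2
    sorted=$(mktemp) || return 1
    # comm needs both inputs sorted with the same collation, so use C order.
    LC_ALL=C sort -u "$urilist" > "$sorted"
    # Turn DOCROOT/dir/file into /dir/file, then keep only the paths that
    # are in the local listing but not in the URI list.
    find "$docroot" -type f |
        awk -v n="${#docroot}" '{ print substr($0, n + 1) }' |
        LC_ALL=C sort |
        LC_ALL=C comm -23 - "$sorted"
    rm -f "$sorted"
}
```

For example, find_orphans ~/site-copy uris.txt lists the local files that nothing on the live site linked to during the crawl.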
--
Thomas
Maintained by the ILUG website team. The aim of Linux.ie is to
support and help commercial and private users of Linux in Ireland.