LINUX.IE, website of the Irish Linux Users' Group

[ILUG] wget recipe

Thomas Pedoussaut thomas at staffeurs.org
Thu May 20 23:23:41 IST 2004


Justin MacCarthy wrote:
> Hi
> 
> I want to use wget to give me a list of all URIs referenced on my domain 
> (and only my domain), so I can remove old files. I'm sure there is a way 
> to do this with wget, but I just can't see it (really bad flu today). I 
> don't want to download anything, just list the documents so I can clean 
> up a local copy.

I didn't want to resort to this quick and dirty hack, but since nobody 
came up with anything better in 24 hours...

The idea came from a section of the wget man page.

wget -r -nd --reject gif,jpg,png --delete-after http://whatever.com/

That will request all the linked documents on your site, but won't store 
the retrieved data (--delete-after removes each file once it has been 
downloaded).
Straight afterwards, fetch your web server's log file, grep for the IP 
you connected from, cut out the 7th field (cut -d' ' -f7) and you'll 
have your full list.
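The log-filtering step can be sketched as a small shell function. It assumes the log is in Common or Combined Log Format, where splitting on spaces puts the request path in field 7; the function name `list_requested_paths`, the log path and the IP address in the usage line are hypothetical. Here awk does the grep and the cut in one pass, comparing the first field exactly so that 10.0.0.5 does not also pick up lines from 110.0.0.5, and `sort -u` drops repeated requests.

```shell
#!/bin/sh
# Sketch: print the unique URL paths requested by one client IP.
# Assumes Common/Combined Log Format, e.g.
#   10.0.0.5 - - [20/May/2004:23:00:00 +0100] "GET /index.html HTTP/1.0" 200 512
# where whitespace-separated field 7 is the request path (same as cut -d' ' -f7).
list_requested_paths() {
    log_file=$1
    client_ip=$2
    # Keep only lines whose first field is exactly the client IP,
    # print the request path, then de-duplicate in a stable (C locale) order.
    awk -v ip="$client_ip" '$1 == ip { print $7 }' "$log_file" | LC_ALL=C sort -u
}
```

Usage (hypothetical log location and address): `list_requested_paths /var/log/apache/access_log 192.0.2.10 > paths.txt`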

You may also tell wget not to retrieve files that are obviously "leaves" 
of your web tree, such as images (that is what --reject gif,jpg,png does 
in the command above).
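To finish the clean-up the original question asks for, the list of requested paths can be compared against the local copy with comm. This is a rough sketch under stated assumptions: the local copy's top directory corresponds to "/" on the server, the paths file holds one path per line (for example the output of the log step), and `list_unreferenced` is a hypothetical name. Files with the extensions rejected by the wget run are skipped, because they were never requested and would otherwise all be reported as unreferenced. Directory requests such as /docs/ will not match docs/index.html, and URL-encoded names or query strings will not match either, so review the output before deleting anything.

```shell
#!/bin/sh
# Sketch: print files in a local copy of the site whose URL path never
# appears in a list of requested paths (one path per line).
# Assumption: the top of the local copy maps to "/" on the web server.
list_unreferenced() {
    local root=$1 paths_file=$2 sorted_paths
    sorted_paths=$(mktemp)
    # comm needs both inputs sorted the same way, so use the C locale throughout.
    LC_ALL=C sort -u "$paths_file" > "$sorted_paths"
    # Turn ./docs/a.html into /docs/a.html, skipping the extensions that the
    # wget run rejected (those were never requested), then keep only the
    # lines that appear in the local list and not in the requested list.
    (cd "$root" && find . -type f ! -name '*.gif' ! -name '*.jpg' ! -name '*.png') \
        | sed 's|^\.||' \
        | LC_ALL=C sort \
        | LC_ALL=C comm -23 - "$sorted_paths"
    rm -f "$sorted_paths"
}
```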

-- 
Thomas