On Thu, 20 May 2004 20:21:17 +0100 (IST)
Rory McCann <rory at netsoc.ucd.ie> wrote:
> On Wed, 19 May 2004, Justin MacCarthy wrote:
> > Hi
> > I want to use wget to give me a list of all URIs referenced
> > on my domain (and only my domain), so I can remove old files.
> > I'm sure there is a way to do this with wget, but I just can't
> > see it (really bad flu today). I don't want to download
> > anything, just list the documents so I can clean up a local
> > copy.
> > Thanks Justin
> Well, maybe you could combine wget and a bit of bash scripting
> and sed/awk/perl. Use wget to download a page, then pass the
> downloaded page into an awk/perl script to extract all the
> URLs/URIs, and save the resulting list of URLs/URIs. That might
> do the trick, if I understand your problem.
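A minimal sketch of that extraction step, written in Python rather than awk/perl. It assumes the page has already been fetched (e.g. `wget -O page.html <url>`) and that only `href` and `src` attributes matter, so links built in CSS or JavaScript are missed. The names `LinkCollector` and `extract_site_uris` are hypothetical, not part of any existing tool.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects raw href/src attribute values from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <img .../> also reach this method via
        # HTMLParser's default handle_startendtag.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_site_uris(html_text, page_url):
    """Return sorted, de-duplicated absolute URIs on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    host = urlparse(page_url).netloc
    uris = set()
    for link in parser.links:
        # Resolve relative links against the page, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, link))
        parsed = urlparse(absolute)
        # Keep only http(s) links on our own host (skips mailto:, other sites).
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            uris.add(absolute)
    return sorted(uris)
```

Writing the returned list one URI per line gives a file like the uri-list.txt mentioned below. This only covers one page; following links recursively across the whole site would need a loop over the URIs it returns.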
If you do that to build a list of URIs referenced on your site,
you can then use wget --spider -i <uri-list.txt> to check that
they all still resolve. But that's not really what you want to do.
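For the clean-up you actually asked about, one rough approach is to invert the comparison: walk the local copy and report every file that no URI in the list points to. This sketch assumes the local copy mirrors URL paths one-to-one under a directory (the hypothetical `local_root`) and that directory URLs ending in `/` are served as `index.html` (an assumption; adjust `index_name` to match your server). The function names are hypothetical.

```python
import os
from urllib.parse import unquote, urlparse


def referenced_paths(uris, index_name="index.html"):
    """Map absolute URIs to site-relative file paths (assumes 1:1 URL-to-file layout)."""
    paths = set()
    for uri in uris:
        path = unquote(urlparse(uri).path)
        # A directory URL such as http://host/dir/ is assumed to serve dir/index.html.
        if path == "" or path.endswith("/"):
            path += index_name
        paths.add(os.path.normpath(path.lstrip("/")))
    return paths


def unreferenced_files(local_root, uris, index_name="index.html"):
    """List files under local_root whose site-relative path no URI points to."""
    wanted = referenced_paths(uris, index_name)
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), local_root)
            if os.path.normpath(rel) not in wanted:
                orphans.append(rel)
    return sorted(orphans)
```

Treat the output as a list of candidates to review rather than files to delete blindly: anything reached only through scripts, forms, or pages that weren't crawled will show up as unreferenced too.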
PGP Key: 0x0E7EE8D8 (expires 06-Aug-2004)
Web: http://www.helgrim.com/ | ICQ: 109837009 | YIM: ectoraige
Visit http://ie.bsd.net - BSDs presence in Ireland