On Thu, Jul 05, 2001 at 12:24:17PM +0100, Anton McKee wrote:
> Anyhow I was wondering, is there a way of telling visiting spiders etc
> to ignore content in a page like a text menu etc.
> UDMSEARCH (The best search engine for personal sites etc IMHO) allows
> you to put comments in a page and it will ignore everything between
> the comments. (Hate to have to type in loads of Meta)
This doesn't seem to be possible with most search engines, at least. As far
as I can tell, most of them only let you say whether a whole page should be
included or not.
> Can I do this for the like of google etc. Can I specify it in
If you go with robots.txt, you can specify which bots can look at certain
pages, or make a blanket one for all engines. Have a look at:
as it will explain better than I can.
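
To give a rough idea of what per-bot and blanket rules look like, here is a
small sketch using Python's standard urllib.robotparser to check a sample
robots.txt. The bot name "ExampleBot" and the /private/ and /menus/ paths are
made up for illustration; they are not from the original question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one rule set for a named bot, one blanket rule set.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /menus/
"""


def build_parser(text):
    """Parse robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named bot is only bound by its own section.
print(parser.can_fetch("ExampleBot", "/private/page.html"))
print(parser.can_fetch("ExampleBot", "/menus/text.html"))

# Any other bot falls through to the "*" section.
print(parser.can_fetch("OtherBot", "/menus/text.html"))
print(parser.can_fetch("OtherBot", "/index.html"))
```

Note that this only works at the level of whole URLs; it does not let you
hide part of a page.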
> If anyone has any ideas that would be cool. (Hate to have to type in
> loads of Meta)
You could write a small script to put the METAs into each page, or into a set
of pages. Perl would be ideal for this sort of thing. You could also just
change your directory structure, e.g. make a /norobots directory and disallow
it in your robots.txt, although this would be considered A Bad Thing.
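
A minimal sketch of such a script follows. It is in Python rather than Perl,
purely for illustration, and the tag content ("noindex,nofollow"), the
function names and the *.html glob are my assumptions, not anything a
particular engine requires.

```python
import re
from pathlib import Path

# Assumed tag; adjust the content value to whatever you actually want.
META_TAG = '<meta name="robots" content="noindex,nofollow">'

HEAD_OPEN = re.compile(r"(<head(?:\s[^>]*)?>)", re.IGNORECASE)
HAS_ROBOTS_META = re.compile(r"<meta\s+name=[\"']robots[\"']", re.IGNORECASE)


def add_robots_meta(html):
    """Return html with META_TAG inserted right after the opening <head> tag.

    Pages that already carry a robots META, or have no <head>, are
    returned unchanged.
    """
    if HAS_ROBOTS_META.search(html):
        return html
    return HEAD_OPEN.sub(lambda m: m.group(1) + "\n" + META_TAG, html, count=1)


def tag_directory(root):
    """Rewrite every *.html file under root in place; return the files changed."""
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        original = path.read_text(encoding="utf-8")
        updated = add_robots_meta(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling tag_directory("public_html") would then tag every page under that
(hypothetical) directory, and running it twice does no harm.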
Another thought would be to create some (mod_rewrite?) rules based on known
robot user-agent types which would render a different page for them to
cache. If your page is not just static, I'm sure it would be relatively
painless to change what is being rendered for robots by looking at their
user-agent.
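
For a dynamically generated page, the user-agent idea might look something
like the sketch below. The token list is only an illustrative guess at some
robot user-agent substrings, and is_robot and render_page are hypothetical
helper names; a real site would need to maintain its own list.

```python
# Illustrative substrings only; not a complete or authoritative list.
KNOWN_ROBOT_TOKENS = ("googlebot", "slurp", "examplebot")


def is_robot(user_agent):
    """Guess whether a request comes from a robot, based on its user-agent."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_ROBOT_TOKENS)


def render_page(user_agent, menu_html, body_html):
    """Leave the text menu out for robots, include it for everyone else."""
    if is_robot(user_agent):
        return body_html
    return menu_html + "\n" + body_html
```

So render_page("Googlebot/2.1", menu, body) would return just the body, while
an ordinary browser would get the menu as well.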
I'm not sure if any of this is helpful, but I thought I would try and help.
Computers are useless. They can only give you answers.
-- Pablo Picasso