On Thu, 11 Nov 2004, John McCormac wrote:
> here.) The msnbot is so badly written that it does not use 304s and puts
> excessive loads on webservers. So many webmasters have complained about it
> that Microsoft even introduced its own robots.txt entry so that webmasters
> can use a delay between pages being fetched by its scrapers.
Ah, where would that be? I followed the URL in the client id it uses
to http://search.msn.com/webmasters/msnbot.aspx, but there's nothing
there on how to limit it except the standard robots.txt and robot
meta tags.
I've emailed their contact address asking them to rate-limit it, but
received no reply. At this stage I'm considering barring MSNBot
altogether: it's responsible for 40% of the hits to a site I maintain.
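For reference, the robots.txt entry John mentions is presumably the non-standard `Crawl-delay` directive. Below is a minimal sketch, assuming Python 3's standard `urllib.robotparser` (which parses `Crawl-delay`). It checks two hypothetical robots.txt variants: one asking msnbot to wait 10 seconds between fetches, and one barring it outright. The 10-second value and the `/index.html` path are illustrative only, and whether a crawler actually honours `Crawl-delay` is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask msnbot to wait 10 seconds between fetches
# (non-standard Crawl-delay extension); other robots are unrestricted.
ROBOTS_THROTTLE = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

# Hypothetical alternative: bar msnbot from the whole site.
ROBOTS_BLOCK = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""


def parse_robots(text):
    """Parse robots.txt content held in a string (no network fetch)."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


throttled = parse_robots(ROBOTS_THROTTLE)
blocked = parse_robots(ROBOTS_BLOCK)

print(throttled.crawl_delay("msnbot"))                 # delay requested of msnbot
print(throttled.can_fetch("msnbot", "/index.html"))    # still allowed, just slower
print(blocked.can_fetch("msnbot", "/index.html"))      # barred entirely
print(blocked.can_fetch("Googlebot", "/index.html"))   # other robots unaffected
```

The throttling variant keeps msnbot indexing the site while asking it to back off; the blocking variant is the "bar it altogether" option.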
regards,
--
Paul Jakma	paul at clubi.ie	paul at jakma.org	Key ID: 64A2FF6A
Fortune:
A citizen of America will cross the ocean to fight for democracy, but
won't cross the street to vote in a national election.
-- Bill Vaughan