On Thu, 11 Nov 2004, Paul Jakma wrote:
> On Thu, 11 Nov 2004, John McCormac wrote:
> > here.) The msnbot is so badly written that it does not use 304s and puts
> > excessive loads on webservers. So many webmasters have complained about it
> > that Microsoft even introduced its own robots.txt entry so that webmasters
> > can use a delay between pages being fetched by its scrapers.
> Ah, where would that be? I followed the URL in the client id it uses
> to http://search.msn.com/webmasters/msnbot.aspx, but there's nothing
> there on how to limit it except the standard robots.txt and robot
> meta tags.
It was on the mssearch forum at http://www.webmasterworld.com, but the
syntax was something like:
User-agent: msnbot
Crawl-delay: nn
where nn is the delay between fetches in seconds.
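One way to check that an entry like this parses as intended is Python's standard urllib.robotparser module. This is only a rough sketch: the 10-second value, the sample robots.txt body, and the helper name crawl_delay_for are made up for illustration, and it says nothing about how msnbot itself honours the directive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the 10-second delay is an arbitrary example.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("msnbot delay:", crawl_delay_for(ROBOTS_TXT, "msnbot"))
    print("other bot delay:", crawl_delay_for(ROBOTS_TXT, "otherbot"))
```

A robot with no matching entry gets None back, i.e. no delay is requested of it.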
> I've sent a mail to their mail address asking them to rate limit, but
> received no reply - at this stage I'm considering barring the MSNBot
> altogether. It's responsible for 40% of hits to a site I maintain..
Imagine what it is like on whoisireland - it kept whacking the site every
few days because the gobshites in Microsoft couldn't build a 304 capable
robot. :)
Regards...jmcc