In an earlier post, I talked about the initial reaction to WebmasterWorld.com’s decision to use its robots.txt file to exclude all bots, including search engine bots. Most search engine optimization professionals, many of them WebmasterWorld regulars, were shocked by the move. (One of the more amusing suggestions was that Brett Tabke had been taken over by body-snatchers; credit for that one goes to long-time Pubcon presenter Andy Beal.)
In a lengthy post titled “Attack of the Robots, Spiders, Crawlers, etc.,” WebmasterWorld CEO Brett Tabke explains the reasoning behind the total bot ban.
In essence, to keep the site from being inundated by page requests from non-human visitors, and to spare members the resulting drop in site performance, Tabke concluded that requiring all visitors to log in was the only workable approach. (Even this step wouldn’t eliminate all unwanted spiders, but it would cut out most of the less sophisticated bots.) The robots.txt ban does not mean that “legitimate” bots (like the spiders from Google, Yahoo, and MSN) were causing the problem. Rather, letting those bots roam the site freely while requiring human visitors to log in would probably violate search engine quality guidelines.
Feeding the search engine bots the pages they wanted would require delivering content based on the requester’s IP address, a practice commonly called “cloaking”. Search engines frown on cloaking, and generally judge whether a given case is acceptable by how much the content served to their bots differs from what other visitors see. Delivering geo-targeted content based on IP address might be acceptable. Feeding search engine bots a page of keyword-stuffed text while showing all other users something entirely different would generally be considered spam and would get the site removed from search engine indices. In Tabke’s judgment, showing human visitors a login screen while showing search engines the site content would probably be considered an unacceptable level of cloaking; hence the bot ban.
Tabke’s expanded explanation answers critics who noted that rogue bots and crawlers don’t even check robots.txt, so the ban would eliminate only the “good” bots. The real measure against the bad bots was the requirement that all visitors log in (with cookies enabled). The robots.txt ban was meant to keep search engine bots from running into millions of login screens, and to avoid the unacceptable cloaking that letting them through would otherwise require.
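To make the critics’ point concrete, here is a minimal sketch, using Python’s standard `urllib.robotparser` module, of how a crawler that honors robots.txt reacts to a blanket ban. The file contents are an assumption: a generic “User-agent: * / Disallow: /” ban of the kind described, not a copy of WebmasterWorld’s actual robots.txt. The helper name `polite_bot_may_fetch` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a generic blanket ban of the kind described in the post,
# not WebmasterWorld's actual robots.txt file.
BLANKET_BAN = """\
User-agent: *
Disallow: /
"""


def polite_bot_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if a robots.txt-respecting crawler may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every well-behaved crawler is shut out, whatever its user-agent string.
    for agent in ("Googlebot", "Slurp", "msnbot"):
        print(agent, polite_bot_may_fetch(BLANKET_BAN, agent, "http://www.webmasterworld.com/"))
```

Note that the check is entirely voluntary: a rogue bot simply never makes it, which is why the login requirement, not robots.txt, was the real defense.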
How long this change will last is unknown; Tabke’s original announcement suggested it was a trial that would run one to three months. While WebmasterWorld benefits from huge repeat traffic and thousands of inbound links, it’s hard to imagine Tabke being willing to run the site indefinitely, or even for an extended period, without the flow of new members that relevant search engine traffic brings.
