If you search for any commercial phrase at blog search leader Technorati.com, you’ll likely find that many, if not most, of the search results take you to spam blogs, or “splogs” as some call them. These sites look like regular blogs at first glance, but are easily identified by republished or even gibberish content, keyword-stuffed titles, spammy links, generic design, and a lack of identifying info.
The mainstream media has caught on, as WSJ.com shows in ‘Splogs’ Roil Web, and Some Blame Google.
To date, most discussions of blog spam have focused on automated comment posting to generate links. An entry today in the MarketingLoop blog even describes commercial software for generating phony blog comments. While blog software makers and search engines joined forces to create the “nofollow” link attribute to render such links worthless, the idea of entire blogs being spam hasn’t received nearly as much attention.
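For readers curious what “nofollow” looks like in practice, here is a minimal sketch of how blog software might tag links in submitted comments. The function name `add_nofollow` is made up for illustration, and the regex approach is an assumption that suits simple comment markup only; real blog platforms may well do this differently.

```python
import re

def add_nofollow(comment_html: str) -> str:
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet.

    Hypothetical helper for illustration, not any blog platform's actual code.
    """
    def fix(match):
        tag = match.group(0)
        # Leave tags alone if they already declare a rel attribute.
        if re.search(r'\brel\s*=', tag, flags=re.IGNORECASE):
            return tag
        # Drop the closing ">" and re-add it after the new attribute.
        return tag[:-1] + ' rel="nofollow">'

    # \b keeps the pattern from matching other tags such as <abbr>.
    return re.sub(r'<a\b[^>]*>', fix, comment_html, flags=re.IGNORECASE)

print(add_nofollow('<a href="http://example.com">cheap widgets</a>'))
```

Search engines that honor the attribute give no ranking credit to links marked this way, which is what takes the payoff out of comment spam.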
The magnitude of the problem, though, was illustrated by a few anecdotes in the WSJ article. Recently, a single “spamalanche” attack on Google’s Blogger site attempted to create 13,000 spam blogs. And IceRocket, another blog search provider, says that it deletes a million spam blogs a month from its index. Rexblog notes that “reptilian” marketers have created multiple copies of their content as part of the splog explosion.
Interestingly, Verisign may play a role in slowing the spam blog pestilence.
The long-term solution is probably mostly algorithmic in nature, with a dash of human intervention thrown in. Just as the major search engines deliver reasonably good results for most searches, blog search algorithms will have to develop to the point where they can sift out most of the spam. And, just as with web search results, humans will further improve the results by flagging spam sites when they find them and by monitoring the results for commercial search terms to keep quality up and spam down.
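To make the “algorithm plus a dash of humans” idea concrete, here is a rough sketch of how a blog search engine might combine the telltale splog signs mentioned at the top of this post with human flags. Everything in it is an assumption for illustration: the `BlogSignals` fields, the weights, and the thresholds are invented, not taken from Technorati, IceRocket, or any other search provider.

```python
from dataclasses import dataclass

@dataclass
class BlogSignals:
    """Hypothetical per-blog measurements; field names are illustrative."""
    duplicate_ratio: float          # share of posts copied from elsewhere, 0.0 to 1.0
    keyword_title: bool             # title is just a commercial keyword phrase
    outbound_links_per_post: float  # average number of outbound links per post
    has_author_info: bool           # any identifying info about the author
    human_flags: int = 0            # times readers or editors flagged the blog

def splog_score(s: BlogSignals) -> float:
    """Algorithmic part: a weighted sum of spam signals (weights are made up)."""
    score = 0.4 * s.duplicate_ratio
    if s.keyword_title:
        score += 0.2
    if s.outbound_links_per_post > 10:
        score += 0.2
    if not s.has_author_info:
        score += 0.2
    return score

def is_splog(s: BlogSignals, threshold: float = 0.6, flag_limit: int = 3) -> bool:
    """Human part: enough manual flags override a low algorithmic score."""
    return s.human_flags >= flag_limit or splog_score(s) >= threshold

suspect = BlogSignals(duplicate_ratio=0.9, keyword_title=True,
                      outbound_links_per_post=25, has_author_info=False)
print(is_splog(suspect))
```

A real system would presumably learn its weights from labeled examples rather than hard-coding them, but the division of labor is the same: the algorithm handles the bulk, and human flags catch what it misses.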
