Blog spam protection

I started this blog a few days ago. It is not very popular yet. In fact, I have not received a single comment so far. I still have hopes, though. My website has been indexed by Google and I have had a few hits from search. One of the problems I will soon have to face is blog spam.

There are many kinds of blog spam, but the most common one is comment spam. It is quite easy to do if the blog has no protection. All you need is an HTTP client library such as the HttpWebRequest class in C# or the httplib module in Python. You could write your own HTTP client, but don’t unless it is for educational purposes. “You shall not reinvent the wheel” is probably in the programmer’s commandments somewhere. Once you have your HTTP client, all you need to do is send a request to the target of the new comment form with post data such as name, email, website and comment. It is really simple. I have done it a few times in a different context.
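To show how little is involved, here is a minimal Python sketch that builds such a request without sending it. The URL and the field names (name, email, website, comment) are assumptions for illustration; a real form defines its own target and fields.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_comment_request(form_url, name, email, website, comment):
    """Build (but do not send) the POST request a comment form expects."""
    # Hypothetical field names; a real blog engine chooses its own.
    fields = {
        "name": name,
        "email": email,
        "website": website,
        "comment": comment,
    }
    # Form data is sent URL-encoded in the request body.
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(form_url, data=body, headers=headers, method="POST")


# Placeholder target, not a real blog.
req = build_comment_request(
    "https://blog.example/post-comment",
    "Alice",
    "alice@example.com",
    "https://example.com",
    "Nice post!",
)
```

Nothing in this request proves that a person typed the comment, which is exactly the gap a protection mechanism has to close.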

Imagine the possibilities if you were a spammer. You could use the same client to run a simple search on Google Blog Search, parse the resulting links, fetch the posts of each blog, parse the (X)HTML code for the comment form's data fields, and post to all of them in one big automated run to advertise your wonderful products. If I have thought of it, someone somewhere is probably already doing it.

How do you protect your blog? There are many different ways, but the most effective in my opinion is a CAPTCHA.

A CAPTCHA is a type of challenge-response test used in computing to determine whether the user is human. “CAPTCHA” is a contrived acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”.

It usually looks like a few words or characters shown distorted in an image, which you have to type in order to proceed.

Why is it effective? Because it is automated and there are still some fields in which humans are better than machines. In most cases, we are talking about character recognition. There are some alternatives that are not automated, or that rely on a task at which humans are not superior to computers. Those solutions are not effective. Not being automated means that you have a fixed set of challenges. The spammer only needs to know the set in advance to win the fight. It is similar to security through obscurity, which does not work in the long term. And if the computer is better than most people at the challenge, it wins.
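The "automated" part can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the alphabet and the in-memory dictionary (standing in for a server-side session store) are all hypothetical, and the step that renders the answer as a distorted image is left out entirely.

```python
import hmac
import secrets

# Hypothetical alphabet that skips look-alike characters such as O/0 and I/1.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

# Stand-in for a server-side session store: challenge id -> expected answer.
_pending = {}


def issue_challenge(length=6):
    """Generate a fresh random challenge instead of picking from a fixed set."""
    challenge_id = secrets.token_hex(16)
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    _pending[challenge_id] = answer
    # The answer would be drawn into a distorted image, never sent as text.
    return challenge_id, answer


def check_answer(challenge_id, user_input):
    """Accept each challenge at most once, whether the answer is right or wrong."""
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False
    given = user_input.strip().upper()
    # Constant-time comparison of the two byte strings.
    return hmac.compare_digest(expected.encode("utf-8"), given.encode("utf-8"))
```

Because every challenge is generated on demand and discarded after one attempt, there is no fixed list for a spammer to collect in advance. The remaining strength depends on how hard the rendered image is for a machine to read.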

I know that I need a CAPTCHA. Which one will I choose? I have to be careful because there are several bad CAPTCHAs. Character recognition challenges have to be well distorted. I’m still not sure which one I will use. I want one that does not need JavaScript and that produces an image that is computationally hard to recognize. During my search, I found a nice one called freeCap, which has all these qualities and is free and open source. It does not have a plugin for my blog publishing system, but I might write one if that is all that remains to be done.
