Problems with Bayesian Comment Filtering

As much as it sounded like a good idea, my previous solution to comment spam had a few issues. Well, not so much issues as annoyances. The original author of that plugin posted his list of issues, so I knew about those going in, though I was unconcerned. To me, the biggest issue is simply the fact that comment spam still gets posted. Sure, it doesn’t appear to the user, nor does it appear to search engines when they index my site, but I still get an email for each comment and I still have to sort through it on the backend. So, I decided to exclude blind people from being able to post comments on my blog (sorry). I’m now using the SCode plugin instead of (well, truthfully, in addition to) the Bayesian filter.

I did run into a couple of problems when trying to install it. After following the instructions in the README file, it still wasn’t working, so I ran the scodetest.cgi script. It kept telling me that my temporary directory wasn’t writable by the webserver, which I knew to be incorrect. So I did what any reasonable person trying to install a plugin would do: I started looking at the code.

It really is a simple module, so it didn’t take long to figure out what was going on. Both scodetest.cgi and the mt-scode.cgi image generation script call MT::SCode::scode_get($code) to retrieve a security code. An if block in that subroutine called scode_generate() and returned its value as the return value of scode_get(). That seemed like the right thing to do, but it wasn’t. The problem is that scode_generate() makes no effort whatsoever to save the value it generated, so nothing associates it with the $code originally passed to scode_get(). That is why scodetest.cgi always said my tmp directory wasn’t writable: the script simply calls scode_get() twice and checks whether it got the same result both times. So, I modified my scode_get() routine to call scode_create() (which calls scode_generate() and saves the value) instead of scode_generate(). The relevant portion of that subroutine now looks like this:

    # Random number back...if have not initialized
    if ($code <= 0 || $code > $scode_maxtmp || !-e $tmpdir.$code) {
        return scode_create($code);
    }

Because I’m still trying to return this random value to whatever called scode_get(), I needed to modify the scode_create() subroutine to return the generated code. My new scode_create() routine looks like:

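    sub scode_create {
        my $code = shift;

        return if (-e $tmpdir.$code);

        my $scode = scode_generate();
        if ($code > 0 && $code <= $scode_maxtmp) {
            # The excerpt does not show how OUTFILE is opened or closed;
            # the open/close calls here are an assumption.
            open(OUTFILE, ">", $tmpdir.$code) or die "Cannot write $tmpdir$code: $!";
            print OUTFILE $scode;
            close(OUTFILE);
        }
        return $scode;
    }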
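
To see the behavior in isolation, here is a minimal, self-contained sketch of the fixed flow. Only the subroutine names and the two variable names come from the plugin excerpts above. The temp directory, the slot limit of 50, the six-digit code format, and the file-reading branch of scode_get() are assumptions made for illustration, not the plugin's actual code. It ends with the same kind of check described for scodetest.cgi: ask for the same code twice and compare.

```perl
#!/usr/bin/perl
# Hypothetical stand-in for the plugin's logic, not the real MT::SCode module.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $tmpdir       = tempdir(CLEANUP => 1) . '/';  # assumed temp directory
my $scode_maxtmp = 50;                            # assumed slot limit

# Produce a random six-digit code (assumed format); nothing is stored here.
sub scode_generate {
    return sprintf('%06d', int(rand(1_000_000)));
}

# Generate a code and, for a valid slot, save it in that slot's file.
sub scode_create {
    my $code = shift;
    my $path = $tmpdir . $code;
    return if -e $path;

    my $scode = scode_generate();
    if ($code > 0 && $code <= $scode_maxtmp) {
        open(my $out, '>', $path) or die "Cannot write $path: $!";
        print {$out} $scode;
        close($out);
    }
    return $scode;
}

# Return the stored code for a slot, creating and saving it on first use.
sub scode_get {
    my $code = shift;
    my $path = $tmpdir . $code;
    if ($code <= 0 || $code > $scode_maxtmp || !-e $path) {
        return scode_create($code);
    }
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $scode = <$in>;
    close($in);
    return $scode;
}

# Ask for slot 7 twice and compare the results.
my $first  = scode_get(7);
my $second = scode_get(7);
print $first eq $second ? "codes match\n" : "codes differ\n";
```

Because the first call saves the value to the slot's file, the second call reads the same value back and the comparison succeeds. If scode_get() returned scode_generate() directly, nothing would be saved, so each call would produce a fresh random code and the two would almost always differ.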

And now everything works. The only other issue I noticed was an incorrect use of the alt attribute for an image in the installation instructions. The alt attribute is not supposed to be used as a tooltip; it holds the alternate text to show when the image cannot be displayed. As such, it should say something like: “Image required in order to post.” The text “Please enter the security code you see here” (what the plugin author put in the alt attribute) belongs in the title attribute instead.
