{"id":46,"date":"2005-01-06T03:17:11","date_gmt":"2005-01-06T07:17:11","guid":{"rendered":"http:\/\/www.steenhagen.us\/~jake\/trunk\/?p=46"},"modified":"2006-12-07T11:17:51","modified_gmt":"2006-12-07T16:17:51","slug":"problems-with-bayesian-comment-filtering","status":"publish","type":"post","link":"https:\/\/jacob.steenhagen.us\/blog\/2005\/01\/problems-with-bayesian-comment-filtering\/","title":{"rendered":"Problems with Bayesian Comment Filtering"},"content":{"rendered":"<p class=\"first\">As much as it sounded like a good idea, <a href=\"http:\/\/www.steenhagen.us\/~jake\/blog\/archives\/000042.html\">my previous solution<\/a> to comment spam had a few issues. Well, not so much issues as annoyances. The original author of that plugin posted his <a href=\"http:\/\/james.seng.cc\/archives\/000337.html\">list of issues<\/a>, so I knew about those going in, though I was unconcerned. To me, the biggest issue is simply the fact that comment spam still gets posted. Sure, it doesn&#8217;t appear to visitors, nor to search engines when they index my site, but I still get an email for each comment and I still have to sort through it on the backend. So, I decided to exclude blind people from being able to post comments on my blog (sorry). I&#8217;m now using the <a href=\"http:\/\/james.seng.cc\/archives\/000145.html\">SCode<\/a> plugin instead of (well, truthfully, in addition to) the Bayesian filter.<\/p>\n<p>I did run into a couple of problems when trying to install it. After following the instructions in the README file, it still wasn&#8217;t working. So, I ran the <tt>scodetest.cgi<\/tt> script. It kept telling me that my temporary directory wasn&#8217;t writable by the webserver, which I knew to be false. So I did what any reasonable person trying to install a plugin would do: I started looking at the code. It really is a simple module, so it didn&#8217;t take long to figure out what was going on. 
The <tt>scodetest.cgi<\/tt> script and the <tt>mt-scode.cgi<\/tt> image generation script were both calling <tt>MT::SCode::scode_get($code)<\/tt> to retrieve a security code. That subroutine contained an <tt>if<\/tt> block that, when no code had been initialized yet, called <tt>scode_generate()<\/tt> and passed its return value straight back as the return value of <tt>scode_get()<\/tt>. That seemed like the right thing to do, but it wasn&#8217;t. The problem is that <tt>scode_generate()<\/tt> makes no effort whatsoever to save the value it generates, so nothing ties that value to the <tt>$code<\/tt> originally passed to <tt>scode_get()<\/tt>, and each call that reaches that block hands back a fresh random code. This is what caused the <tt>scodetest.cgi<\/tt> script to always report that my tmp directory wasn&#8217;t writable: that script simply calls <tt>scode_get()<\/tt> twice and checks whether it got the same result both times, and with nothing saved between the calls, the two results didn&#8217;t match. So, I modified my <tt>scode_get()<\/tt> routine to call <tt>scode_create()<\/tt> (which calls <tt>scode_generate()<\/tt> and saves the value) instead of <tt>scode_generate()<\/tt>. The relevant portion of that subroutine now looks like this:<\/p>\n<pre class=\"code\"># No code saved for this slot yet: create one (saved only if $code is in range)\r\nif ($code &lt;= 0 || $code &gt; $scode_maxtmp || !-e $tmpdir.$code) {\r\n    return scode_create($code);\r\n}<\/pre>\n<p>Because I still want to return this random value to whatever called <tt>scode_get()<\/tt>, I also needed to modify the <tt>scode_create()<\/tt> subroutine to return the generated code. My new <tt>scode_create()<\/tt> routine looks like this:<\/p>\n<pre class=\"code\">sub scode_create {\r\n    my $code = shift;\r\n\r\n    # Do nothing if a code has already been saved for this slot\r\n    return if (-e $tmpdir.$code);\r\n\r\n    my $scode = scode_generate();\r\n    if ($code &gt; 0 && $code &lt;= $scode_maxtmp) {\r\n        # Save the value so later scode_get($code) calls return the same code\r\n        open(my $out, '&gt;', \"${tmpdir}${code}\") or return $scode;\r\n        print $out $scode;\r\n        close($out);\r\n    }\r\n\r\n    # Hand the generated code back to the caller (e.g. scode_get())\r\n    return $scode;\r\n}<\/pre>\n<p>And now everything works. 
The only other issue I noticed was an incorrect use of the <tt>alt<\/tt> attribute on an image in the installation instructions. The <tt>alt<\/tt> attribute is not meant to be a tooltip; it is alternate text to show when the image cannot be displayed. As such, it should say something like &#8220;Image required in order to post.&#8221; The text &#8220;Please enter the security code you see here&#8221; (what the plugin author put in the <tt>alt<\/tt> attribute) belongs in the <tt>title<\/tt> attribute.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As much as it sounded like a good idea, my previous solution to comment spam had a few issues. Well, not so much issues as annoyances. The original author of that plugin posted his list of issues, so I knew about those going in, though I was unconcerned. To me, the biggest issue is simply &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/jacob.steenhagen.us\/blog\/2005\/01\/problems-with-bayesian-comment-filtering\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Problems with Bayesian Comment 
Filtering&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[3,15],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p7EJi-K","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/posts\/46"}],"collection":[{"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/comments?post=46"}],"version-history":[{"count":0,"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/posts\/46\/revisions"}],"wp:attachment":[{"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/media?parent=46"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/categories?post=46"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jacob.steenhagen.us\/blog\/wp-json\/wp\/v2\/tags?post=46"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}