Bayes and fuzzyocr.hashdb working!

Posted on September 22, 2006

For well over a year now, I’ve been wondering why Bayes worked in my manual scans, where I re-ran a piece of mail that had made it through the mail server’s filters, but apparently not when spamd scanned the same mail. It turns out that spamd and the manual scan weren’t producing the same results. To get spamd working as well as a manual scan, I started debugging by looking at the mail log at /var/log/mail.log. Immediately, I noticed some permissions problems. I ran chmod 777 on the directories where spamd needed access and then set up a site-wide Bayesian database per the instructions at this link:

http://wiki.apache.org/spamassassin/SiteWideBayesSetup

With this in place, Bayes scoring and autolearning in SpamAssassin are now working fine. The only sad thing is that the MySQL database I set up for the Bayes data was unusable due to some weirdness with the Perl module DBD::mysql. It worked fine yesterday, but then it broke, so I just went back to the default DBM database. The strangest thing going on was that my manual scans were using a different local.cf and a different plugin directory than the spamd scans. Once I realized this, I knew where I had to put the latest version of fuzzyocr so that it could write to fuzzyocr.hashdb. This seems to be working well at catching the noxious image spam that has been coming in a lot more frequently these days. I’ve also upped the scores for any email that comes in with attached or inline GIFs, since those are rarely legitimate.
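If you want to check whether a manual scan and a spamd scan agree on a given message, one rough approach is to compare the rule names each one wrote into the X-Spam-Status header. The Python below is only a sketch under a few assumptions: both copies of the message carry a SpamAssassin 3.x style header with a `tests=` field, and the function names `spam_tests` and `diff_scans` are my own invention, not anything shipped with SpamAssassin.

```python
import re
from email import message_from_string

# Assumed header shape (SpamAssassin 3.x style), possibly folded over lines:
#   X-Spam-Status: Yes, score=8.1 required=5.0 tests=RULE_A,RULE_B,
#       RULE_C autolearn=no version=...
TESTS_RE = re.compile(r"tests=((?:\w+,\s*)*\w+)")


def spam_tests(raw_message):
    """Return the set of rule names listed in the X-Spam-Status header."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = TESTS_RE.search(status)
    if match is None:
        return set()
    names = {name.strip() for name in match.group(1).split(",")}
    names.discard("none")  # "tests=none" means no rules fired
    return names


def diff_scans(manual_raw, spamd_raw):
    """Report rules that fired in only one of the two scans."""
    manual = spam_tests(manual_raw)
    spamd = spam_tests(spamd_raw)
    return {
        "only_manual": sorted(manual - spamd),
        "only_spamd": sorted(spamd - manual),
    }
```

If the Bayes rules (for example BAYES_99) only ever show up under `only_manual`, that points to spamd reading a different configuration or not being able to reach the Bayes database, which is what I ran into here.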

The combination of Bayes finally working and fuzzyocr will really cut down on the amount of spam hitting the clients. I’m very happy about that!