spamassassin and OCR plugin
I wanted to install something to weed out spam that carries its content inside GIF images. Normally these can't be parsed by the standard text-based tests. AC pointed me in the direction of an OCR plugin, which scans an image and recognises the text in it. At first I found OCR, a plugin built on some Perl code plus gocr. I had problems integrating it into the existing spamassassin setup: it simply wasn't running the scans, because I didn't know where to put the 'loadplugin' statements and Perl modules.
I found that running spamassassin in debug mode is the best way to find out exactly where it looks and which tests it runs. By this point I'd also found a newer plugin, FuzzyOCR, which claimed improvements over OCR (wiki at: http://fuzzyocr.own-hero.net/wiki/Installation-3.x):
spamassassin --debug FuzzyOcr < ./gif_spam > /dev/null
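So I ran it in debug mode, and it printed all the paths it searched, the plugins it loaded and the tests it ran. (The debug output goes to stderr, which is why it still shows up even with stdout sent to /dev/null.)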
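The debug output is long, so it can help to filter it. Below is a minimal sketch with two hypothetical helper functions; the assumption is only that the interesting lines contain "plugin:" or the string "FuzzyOcr" (case-insensitive), so check the real output from your version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: keep only debug lines mentioning plugin loading or
# FuzzyOcr. Assumes those strings appear in the lines you care about.
filter_plugin_debug() {
    grep -iE 'plugin:|fuzzyocr'
}

# Hypothetical helper: run one message through spamassassin in debug mode.
# Debug output goes to stderr, so route stderr into the pipe and throw away
# the processed message that spamassassin writes to stdout.
debug_message() {
    spamassassin --debug FuzzyOcr < "$1" 2>&1 >/dev/null | filter_plugin_debug
}
```

Usage, with the sample message from above: `debug_message ./gif_spam`. The filter is a separate function so it can also be applied to a saved debug log.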
It turned out that it was using /var/lib/spamassassin, not the standard /usr/share/spamassassin (which also existed).
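To see at a glance which rule directories spamassassin actually picked, the debug output can be reduced to just the directory names. This is a sketch assuming SpamAssassin 3.x debug lines of the form `config: using "<dir>" for ...`; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct directory named in a
# 'config: using "<dir>" for ...' debug line (line format is an assumption).
rules_dirs_in_use() {
    sed -n 's/.*config: using "\([^"]*\)".*/\1/p' | sort -u
}

# Usage (needs spamassassin installed):
#   spamassassin -D --lint 2>&1 | rules_dirs_in_use
```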
The wiki has good instructions and suggests putting the plugin in /etc/spamassassin, where spamd picked it up after a reload. I did have to set the database directory to somewhere writeable by the nobody user, though. Running it in debug mode (as above) reported every issue it had.
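The install steps can be sketched as a few shell functions. Everything here is an assumption for illustration: that the unpacked tarball holds files named `FuzzyOcr*` (including a `FuzzyOcr.cf` with the `loadplugin` line), that spamd runs as `nobody`, and that its PID file is `/var/run/spamd.pid`. The FuzzyOcr option that points at the database directory is not shown; take its name from the wiki.

```shell
#!/bin/sh
# Hypothetical: copy the plugin files (FuzzyOcr.cf, FuzzyOcr.pm and any
# FuzzyOcr* support files/dirs) into the site config dir spamd reads.
install_fuzzyocr() {
    src="$1"
    conf_dir="${2:-/etc/spamassassin}"
    cp -R "$src"/FuzzyOcr* "$conf_dir/" || return 1
    # spamd only runs the plugin if a loadplugin line is in its config.
    grep -q '^loadplugin' "$conf_dir/FuzzyOcr.cf" ||
        echo "warning: no loadplugin line in $conf_dir/FuzzyOcr.cf" >&2
}

# Hypothetical: create the database directory and give it to the spamd user.
make_db_dir() {
    db_dir="$1"
    owner="${2:-nobody}"
    mkdir -p "$db_dir" && chown "$owner" "$db_dir"
}

# spamd re-reads its configuration on SIGHUP; the PID file path varies
# between distributions, so it is a parameter here.
reload_spamd() {
    kill -HUP "$(cat "${1:-/var/run/spamd.pid}")"
}
```

Most distributions also ship an init script with a reload action, which does the same job as `reload_spamd`.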
The standard SpamAssassin 3.1.x tests and their default scores are listed at http://spamassassin.apache.org/tests_3_1_x.html, for example:
Area tested | Locale | Description of test                     | Test name     | Default scores (local, net, with bayes, with bayes+net) | More info
body        |        | Generic Test for Unsolicited Bulk Email | GTUBE         | 1000.000                                                 | Wiki
body        |        | Incorporates a tracking ID number       | TRACKER_ID    | 2.000, 1.295, 2.292, 1.032                               | Wiki
body        |        | Weird repeated double-quotation marks   | WEIRD_QUOTING | 1.120, 1.200, 1.295, 1.341                               | Wiki
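GTUBE's score of 1000 makes it a handy end-to-end check that spamd is scanning at all. Below is a sketch of a hypothetical helper that prints a minimal test message carrying the GTUBE string; the headers are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: emit a minimal RFC 822-style message whose body is
# the GTUBE string, which SpamAssassin scores at 1000.
gtube_message() {
    printf '%s\n' \
        'From: test@example.com' \
        'To: test@example.com' \
        'Subject: GTUBE test' \
        '' \
        'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X'
}

# Usage (needs spamd running); spamc -c prints "score/threshold":
#   gtube_message | spamc -c
```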