FuzzyOCR for SpamAssassin on FreeBSD
Update: As of 4.1.4_2, graphics/libungif has the required patch applied, so these instructions are redundant. You should be able to just install the FuzzyOCR port directly (you might still want to define WITHOUT_X11). I will leave these instructions here as a reference in case they’re of use to someone.
Update (2007-03-12): Chris Martin wrote to me to say:
You might want to make reference to the issue discussed in this thread:
http://lists.freebsd.org/pipermail/freebsd-ports/2006-November/037098.html
I had this issue and I wish I had googled earlier…
The FuzzyOCR port for FreeBSD only installs the .pm file in /usr/local/etc/mail/spamassassin and I got the word list and .cf file from the file tarball linked to in the howto and then was stumped for a bit before I went hunting for the solution.
Thanks, Chris!
(Note: These instructions are based on the method I used to install on one machine, and I have tested them on others. There’s now a port for FuzzyOCR which you may want to use instead. If you have updates for this page, please mail james@jamesoff.net.)
FuzzyOCR is a plugin for SpamAssassin which lets it find spammy text inside image attachments, a technique spammers seem to be using more and more.
To install it on FreeBSD, you need to do the following (don’t worry, it’s pretty straightforward):
Have SpamAssassin already installed and working
Install needed ports:
# portinstall -m WITHOUT_X11=yes graphics/netpbm graphics/ImageMagick graphics/gocr devel/p5-String-Approx security/p5-Digest-MD5

These can pull in quite a few dependencies you probably won’t have installed on your mail server, so now is a good time for coffee. Keep an eye on it though, as print/ghostscript-gnu and possibly some of the other ports will ask you about configuration. You might also want to look into some of the other WITHOUT_* ImageMagick knobs in the Makefile and set them. If you want to set them on the portinstall command line, do it like this:

portinstall -m "WITHOUT_A=yes WITHOUT_B=yes" port/name ...

According to the FuzzyOCR page, you need to patch gocr and libungif, but gocr in ports already has the needed patch. libungif doesn’t (as of 2006-11-06), so you need to patch it:

# cd && fetch http://users.own-hero.net/~decoder/fuzzyocr/giftext-segfault.patch
# cd /usr/ports/graphics/libungif && make extract patch
# cd work/libungif-4.1.4/util
# patch < ~/giftext-segfault.patch
# cd ../../.. && make -DWITHOUT_X11 build install clean

Now you have the dependencies installed, you can install FuzzyOCR itself.

# cd && fetch http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-latest.tar.gz
# tar zxf fuzzyocr-latest.tar.gz
# cd FuzzyOcr-2.3b
# cp FuzzyOcr.cf FuzzyOcr.pm /usr/local/etc/mail/spamassassin
# cp FuzzyOcr.words.sample /usr/local/etc/mail/spamassassin

Now head on over to /usr/local/etc/mail/spamassassin and edit the .cf file - at the least you need to set the logfile location and change the program paths to point to where FreeBSD puts them (i.e. in /usr/local/bin not /usr/bin):

# cd /usr/local/etc/mail/spamassassin
# vi FuzzyOcr.cf

Set the logfile to an appropriate location (/var/log/FuzzyOcr.log for example) and then touch the file and make sure the user your spamd runs as can write to it.

Grab the sample spam archive from http://users.own-hero.net/~decoder/fuzzyocr/ and follow the instructions in the README to make sure spamassassin uses FuzzyOCR right. When I tested it on the JPEG one, FuzzyOCR didn’t do anything because spamassassin had already allocated it 10 points. I had to increase the verbosity in the logfile to find that out.

Once you’re happy it’s working, restart spamd!

# /usr/local/etc/rc.d/sa-spamd restart
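
If FuzzyOCR stays silent, the two things the .cf edit depends on are worth checking first: the helper programs really being under /usr/local/bin, and the log file being writable. The following is only a rough sketch of such a check, not part of FuzzyOCR. The function names missing_helpers and log_writable are made up for illustration, and which helper names you pass in is an assumption; use the program paths your own FuzzyOcr.cf lists.

```sh
#!/bin/sh
# Sketch only: missing_helpers and log_writable are hypothetical helper
# functions, not something shipped with FuzzyOCR or SpamAssassin.

# Print every named helper that is not an executable file in the given
# directory. Returns 0 when nothing is missing, 1 otherwise.
missing_helpers() {
    dir=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -x "$dir/$name" ]; then
            echo "missing: $dir/$name"
            status=1
        fi
    done
    return $status
}

# Succeed only if the log file exists and the current user can write to it.
# Run this as the user spamd runs as, since that is whose access matters.
log_writable() {
    [ -f "$1" ] && [ -w "$1" ]
}

# Example invocation (the helper names here are assumptions):
# missing_helpers /usr/local/bin gocr convert giftext giftopnm jpegtopnm
# log_writable /var/log/FuzzyOcr.log || echo "log file not writable"
```

Anything reported as missing points at a path in FuzzyOcr.cf that still needs changing, or at a port that didn’t install.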