jamesoff.net




FuzzyOCR for SpamAssassin on FreeBSD

Update: As of 4.1.4_2, graphics/libungif has the required patch applied, so these instructions are redundant. You should be able to just install the FuzzyOCR port directly (you might still want to define WITHOUT_X11). I will leave these instructions here as a reference in case they’re of use to someone.

Update (2007-03-12): Chris Martin wrote to me to say:

You might want to make reference to the issue discussed in this thread:

http://lists.freebsd.org/pipermail/freebsd-ports/2006-November/037098.html

I had this issue and I wish I had googled earlier…

The FuzzyOCR port for FreeBSD only installs the .pm file in /usr/local/etc/mail/spamassassin and I got the word list and .cf file from the file tarball linked to in the howto and then was stumped for a bit before I went hunting for the solution.

Thanks, Chris!
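
A quick way to catch the problem Chris describes is to check that all three files are in place before you restart spamd. This is only a minimal sh sketch: the function name missing_files is made up for illustration, and the file names are the ones copied in the manual steps further down, so adjust them if your tarball or port differs.

```shell
#!/bin/sh
# missing_files DIR FILE...
# Prints one line for each FILE that does not exist in DIR and
# returns non-zero if any are missing. (Hypothetical helper name.)
missing_files() {
    dir=$1
    shift
    rc=0
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            rc=1
        fi
    done
    return $rc
}
```

Calling it as missing_files /usr/local/etc/mail/spamassassin FuzzyOcr.pm FuzzyOcr.cf FuzzyOcr.words.sample should print nothing when everything is there, and list whatever the port left out otherwise.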


(Note: These instructions are based on the method I used to install FuzzyOCR on one machine, and I have tested them on others. There’s now a port for FuzzyOCR which you may want to use instead. If you have updates for this page, please mail james@jamesoff.net.)

FuzzyOCR is a plugin for SpamAssassin which lets it find spammy text inside an image attachment, a trick spammers seem to be using more and more.

To install it on FreeBSD, you need to do the following (don’t worry, it’s pretty straightforward):

  1. Have SpamAssassin already installed and working

  2. Install needed ports:

    # portinstall -m WITHOUT_X11=yes graphics/netpbm graphics/ImageMagick graphics/gocr devel/p5-String-Approx security/p5-Digest-MD5

    These can pull in quite a few dependencies you probably won’t have installed on your mail server, so now is a good time for coffee. Keep an eye on it though, as print/ghostscript-gnu and possibly some of the other ports will ask you about configuration. You might also want to look into some of the other WITHOUT_* ImageMagick knobs in the Makefile and set them. If you want to set them on the portinstall command line, quote them together as a single -m argument: portinstall -m "WITHOUT_A=yes WITHOUT_B=yes" port/name ...

  3. According to the FuzzyOCR page, you need to patch gocr and libungif, but gocr in ports already has the needed patch. libungif doesn’t (as of 2006-11-06), so you need to patch it:

    # cd && fetch http://users.own-hero.net/~decoder/fuzzyocr/giftext-segfault.patch

    # cd /usr/ports/graphics/libungif && make extract patch

    # cd work/libungif-4.1.4/util

    # patch < ~/giftext-segfault.patch

    # cd ../../.. && make -DWITHOUT_X11 build install clean

  4. Now that you have the dependencies installed, you can install FuzzyOCR itself.

    # cd && fetch http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-latest.tar.gz

    # tar zxf fuzzyocr-latest.tar.gz

    # cd FuzzyOcr-2.3b

    # cp FuzzyOcr.cf FuzzyOcr.pm /usr/local/etc/mail/spamassassin

    # cp FuzzyOcr.words.sample /usr/local/etc/mail/spamassassin

  5. Now head on over to /usr/local/etc/mail/spamassassin and edit the .cf file - at the least you need to set the logfile location and change the program paths to point to where FreeBSD puts them (i.e. in /usr/local/bin not /usr/bin):

    # cd /usr/local/etc/mail/spamassassin

    # vi FuzzyOcr.cf

    Set the logfile to an appropriate location (/var/log/FuzzyOcr.log for example) and then touch the file and make sure the user your spamd runs as can write to it.

  6. Grab the sample spam archive from http://users.own-hero.net/~decoder/fuzzyocr/ and follow the instructions in the README to make sure SpamAssassin is using FuzzyOCR correctly. When I tested it on the JPEG sample, FuzzyOCR didn’t do anything because SpamAssassin had already given the message 10 points. I had to increase the verbosity in the logfile to find that out.

  7. Once you’re happy it’s working, restart spamd!

    # /usr/local/etc/rc.d/sa-spamd restart
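
Steps 2 and 5 can be sanity-checked from the shell. What follows is a rough sketch, not something taken from the FuzzyOCR documentation: the function names are invented for illustration, and the path rewrite assumes the sample .cf refers to its helper programs under /usr/bin/, as described in step 5. It prints the rewritten config instead of editing it in place so you can review the result first.

```shell
#!/bin/sh
# check_tools NAME...
# Reports each program that cannot be found on PATH and returns
# non-zero if any are missing. (Hypothetical helper name.)
check_tools() {
    rc=0
    for t in "$@"; do
        if ! command -v "$t" >/dev/null 2>&1; then
            echo "not found: $t"
            rc=1
        fi
    done
    return $rc
}

# local_paths FILE
# Prints FILE with every /usr/bin/ rewritten to /usr/local/bin/.
# Output goes to stdout; the original file is left untouched.
local_paths() {
    sed 's|/usr/bin/|/usr/local/bin/|g' "$1"
}

# prepare_log FILE USER
# Creates FILE if it does not exist and gives it to USER, so the
# account spamd runs as can write to it.
prepare_log() {
    touch "$1" && chown "$2" "$1"
}
```

For example, check_tools gocr convert giftopnm jpegtopnm pngtopnm covers programs installed by graphics/gocr, graphics/ImageMagick and graphics/netpbm. Which of them your FuzzyOCR version actually calls is an assumption here, so check the .cf file. local_paths FuzzyOcr.cf > FuzzyOcr.cf.new gives you a copy to diff against the original, and prepare_log /var/log/FuzzyOcr.log followed by your spamd user name takes care of the logfile.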

