K9 Email Filter and Blacklist

This page exists solely to provide links related to Robin Keir’s excellent and free K9 E-mail Filter, which may be of use to my readers.

About this Blacklist

A blacklist is, technically, a list of senders from whom you do not wish to receive e-mail messages. In K9, this meaning is extended significantly: a K9 blacklist is a list of rules – relating to sender, subject, content, etc. – that define the messages you do not wish to receive.

My blacklist file is provided at the link above, in the hope that it may be useful. I receive nearly 500 messages per day, of which approximately 85% are spam and 13% come from several mailing lists to which I subscribe. That leaves only 2% of my traffic as non-mailing-list “ham,” or good e-mails. The blacklist I use catches 86.0% of my spam (77.7% of total message volume), leaving only 14.0% of the spam (12.7% of total volume) and a very small amount of non-spam for statistical processing by K9’s Bayesian filter.

[NOTE: The above information is very old and has been changing rapidly. I now get well over 1,000 spams per day (1,500+ in July 2008), and spammers are apparently having to get a little more creative. The spam tactics which are currently popular are designed to throw off Bayesian filters, but a simple Bayesian filter can still do very well. In any case, please take everything that follows with a grain of salt.]

Overall, if you train K9 properly, you should be able to reach 99.5% accuracy or better, with only about 0.1% false positives. With this blacklist and a corresponding whitelist, I am getting far better than 99.9% accuracy. These are just my statistics; your results may vary, depending on the composition of your e-mail traffic. Actual stats through 11:48 AM, 4/24/2004:

                                                            Since Thu Apr 01 2004   Since Fri Feb 13 2004
                                                            02:30:03 PM (23 days)   12:45:25 PM (71 days)
Total number of emails processed                            11,116
Number of Good emails processed                             1,063
Number of Spam emails processed                             10,053
Percentage of emails that matched whitelist rules           8.88%                   13.83%
Percentage of emails that matched blacklist rules           77.74%                  63.76%
Number of emails re-classified to Good                      0                       2
Number of emails re-classified to Spam                      2                       12
Percentage emails misidentified as Spam (false positives)   0.000%                  0.023%
Percentage emails misidentified as Good (false negatives)   0.018%                  0.039%
Overall accuracy                                            99.982%                 99.955%
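
The 23-day figures can be cross-checked against the raw counts. What follows is only a rough sketch in Python, not anything K9 produces. It assumes that the three counts belong to the 23-day period, that each percentage is taken over all processed emails, and that every blacklist match was spam. The variable names are my own.

```python
# Counts and percentages copied from the statistics above (23-day period, assumed).
total = 11_116
good = 1_063
spam = 10_053
reclassified_to_good = 0          # false positives corrected by hand
reclassified_to_spam = 2          # false negatives corrected by hand
blacklist_share_of_total = 77.74  # percent of all emails matching blacklist rules


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


spam_share = pct(spam, total)
# Assumption: every email that matched a blacklist rule was spam.
blacklist_share_of_spam = 100.0 * blacklist_share_of_total / spam_share
false_positive_rate = pct(reclassified_to_good, total)
false_negative_rate = pct(reclassified_to_spam, total)
accuracy = 100.0 - false_positive_rate - false_negative_rate

print(f"spam share of all email:    {spam_share:.1f}%")
print(f"blacklist share of spam:    {blacklist_share_of_spam:.1f}%")
print(f"false negatives:            {false_negative_rate:.3f}%")
print(f"overall accuracy:           {accuracy:.3f}%")
```

Under those assumptions the sketch reproduces the 86.0% blacklist catch rate, the 0.018% false-negative rate, and the 99.982% overall accuracy quoted for the period since April 1.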

DISCLAIMER: These stats are SLIGHTLY inflated through my whitelisting policy (though not intentionally), as three special cases were false positives not reflected in these stats:

  • a legitimate e-mail from PayPal (imagine that!),
  • a handful of (often poorly written) inquiries from users of a site I manage, sent directly to an e-mail address rather than through a web form,
  • and an e-mail from a friend in Bolivia, who had not written in almost five years.

If you want to be really picky, then, my accuracy since April 1 was actually about 99.901%. All of the above cases are now handled correctly, though; this goes to show the importance of using a whitelist if you’re going to use an aggressive blacklist.

The blacklist offered makes extensive use of regular expressions, or regexes. You need to download the PCRE Regular Expression engine in order to use regexes in K9.
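
To give a feel for what a regex-based rule does, here is a minimal sketch in Python. It is not K9's rule syntax and the patterns are not taken from my blacklist file; the rule list, the field names, and the `matches_blacklist` helper are all hypothetical. Python's `re` module is also not PCRE, although simple patterns like these are written the same way in both.

```python
import re

# Hypothetical rules as (message field, compiled pattern) pairs.
# These are illustrations only, not entries from the real blacklist.
BLACKLIST_RULES = [
    ("subject", re.compile(r"v[i1!]agra", re.IGNORECASE)),
    ("from", re.compile(r"@.*\.example\.invalid$", re.IGNORECASE)),
    ("body", re.compile(r"click\s+here\s+to\s+unsubscribe", re.IGNORECASE)),
]


def matches_blacklist(message):
    """Return (field, pattern text) for the first matching rule, or None."""
    for field, pattern in BLACKLIST_RULES:
        # A missing field is treated as empty text, so it never matches.
        if pattern.search(message.get(field, "")):
            return field, pattern.pattern
    return None


spam = {"subject": "Cheap V1AGRA now", "from": "x@mail.example.invalid"}
ham = {"subject": "Lunch on Friday?", "from": "friend@example.org",
       "body": "See you then."}

print(matches_blacklist(spam))
print(matches_blacklist(ham))
```

One pattern such as `v[i1!]agra` covers several misspellings at once, which is why regexes keep a blacklist much shorter than a list of literal strings would be.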

Note: I have not provided my personal whitelist, because it wouldn’t do anybody else any good. The people, groups, and lists whose e-mails I always want are most likely not the ones that interest you. The best way to use this blacklist is to develop your own whitelist, using the context (right-click) menus in K9.
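
The reason a whitelist matters so much alongside an aggressive blacklist can be sketched in a few lines of Python. This is an illustration under assumptions, not K9's actual code: I am assuming the whitelist is consulted first, then the blacklist, with the Bayesian filter deciding only what neither list catches. The `classify` function, its parameters, the 0.5 threshold, and the addresses are all hypothetical.

```python
def classify(message, whitelist, blacklist, bayes_score, threshold=0.5):
    """Classify a message as "good" or "spam".

    Assumed order (not confirmed K9 behavior): whitelist, blacklist, Bayesian.
    """
    sender = message["from"].lower()
    if sender in whitelist:
        return "good"   # a whitelisted sender always gets through
    if any(rule(message) for rule in blacklist):
        return "spam"   # any matching blacklist rule rejects the message
    # Neither list matched: fall back to the statistical score.
    return "spam" if bayes_score(message) >= threshold else "good"


# Hypothetical data: one whitelisted sender and one aggressive subject rule.
whitelist = {"billing@paypal.example"}
blacklist = [lambda m: "account" in m["subject"].lower()]


def never_spam(message):
    """Stand-in for a trained Bayesian score; always returns 0.0."""
    return 0.0


legit = {"from": "billing@paypal.example", "subject": "Your account statement"}
phish = {"from": "phisher@example.invalid", "subject": "Verify your account"}

print(classify(legit, whitelist, blacklist, never_spam))
print(classify(phish, whitelist, blacklist, never_spam))
```

Both messages trip the same subject rule, but only the one from the whitelisted sender is kept. Without the whitelist entry, the legitimate message would have been a false positive like the ones described above.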