Today’s breakthrough is discovering a great, free spam filter for Outlook 2000 (and up) that uses Bayesian filtering. It’s called Spammunition.
For those not familiar with these filters, a Bayesian filter operates on the assumption that history repeats itself: the odds of something being true in the future (or the present) can be predicted extremely well from the odds that the same thing was true in the past. In other words, without doing any complex combinatorics or statistical analysis, the fact that a playing card is red 50% of the time over hundreds of draws is a great predictor that the odds of the next card being red are also 50%. In terms of emails, this would imply that if emails containing the word “weight” are spam 99% of the time (they are), we can delete any future email containing that word with 99% confidence.
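To make the "history repeats itself" idea concrete, here is a minimal sketch in Python. It assumes you already have a pile of old messages labeled spam or not-spam; the function name `word_spam_rates` and the tiny `history` list are made up for illustration and have nothing to do with how Spammunition stores its data.

```python
from collections import Counter

def word_spam_rates(labeled_messages):
    """For each word, return the fraction of past messages containing it that were spam.

    labeled_messages is a list of (text, is_spam) pairs (a hypothetical format).
    """
    spam_counts = Counter()
    total_counts = Counter()
    for text, is_spam in labeled_messages:
        # Count each word once per message, however often it repeats.
        for word in set(text.lower().split()):
            total_counts[word] += 1
            if is_spam:
                spam_counts[word] += 1
    return {word: spam_counts[word] / total_counts[word] for word in total_counts}

# Toy history, invented for this example.
history = [
    ("lose weight fast", True),
    ("weight loss pills", True),
    ("meeting notes attached", False),
    ("lunch meeting tomorrow", False),
]
rates = word_spam_rates(history)
```

With only four messages the rates come out as all-or-nothing; a real filter would need a much larger history before a number like 99% means anything.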
The trick here is to get beyond that 99%. Imagine if 1% of your legitimate emails got randomly deleted, and you happened to own an online business. That could be very expensive! So, we look at the spam probabilities of all the words in the message, including the headers, subject, etc., and combine them appropriately. Some filters using this technique (called Bayesian filtering, because it’s based on Bayes’s Rule) exceed 99.95% accuracy. Now, that’s getting there. For more info, read Paul Graham’s website.
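One common way to "combine them appropriately" is the naive Bayes rule, which treats the words as independent pieces of evidence. The sketch below is only an illustration under that assumption: `combine` is a hypothetical name, the clamping value is arbitrary, and I am not claiming this is exactly what Spammunition does.

```python
import math

def combine(probabilities):
    """Combine per-word spam probabilities into one score for the whole message.

    Computes prod(p) / (prod(p) + prod(1 - p)), assuming words are independent.
    Works in log space so long messages do not underflow to zero.
    """
    eps = 1e-6  # arbitrary clamp so no single word is treated as absolute proof
    log_spam = 0.0
    log_ham = 0.0
    for p in probabilities:
        p = min(max(p, eps), 1 - eps)
        log_spam += math.log(p)
        log_ham += math.log(1 - p)
    diff = log_ham - log_spam
    if diff > 700:
        # exp() would overflow; the spam probability is effectively zero.
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

two_spammy_words = combine([0.99, 0.99])  # both words point toward spam
mixed_signals = combine([0.99, 0.01])     # one spammy word, one innocent word
```

Two words that are each 99% spammy push the combined score well past 99%, while a spammy word and an equally innocent word cancel out to roughly 50%. That is how looking at the whole message lets a filter get beyond what any single word can tell it.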