The previous post on spam covered checkpoints where antispam filters can be applied. However, the nature of the filters matters a lot when setting up an effective antispam solution.
Manual moderation
Sadly, manual moderation is the most effective method. No filter is sophisticated enough to deal with every possible spam attack, and a human decision is the best authority here. It may well be unproductive to deal with everything manually, but it’s not right to escape manual work entirely. The more human input and tuning filters get, the more effective they can become.
Custom filters
The most primitive form of filtering is custom rules. Something as simple as “if the address is not mine then it’s spam” can sometimes do wonders. Creating filters manually may not be productive, but it’s often pretty effective.
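As a rough illustration, the “address is not mine” rule could look like the sketch below. The function name and the set of addresses are made up for the example, not taken from any particular mail system.

```python
def addressed_to_me(recipients, my_addresses):
    """Custom rule: accept a message only if at least one recipient is mine.

    Both arguments are hypothetical inputs for this sketch: a list of
    recipient addresses and a set of addresses I actually use.
    """
    mine = {address.lower() for address in my_addresses}
    return any(recipient.strip().lower() in mine for recipient in recipients)
```

Anything that fails the check would be treated as spam, or at least sent to moderation.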
White list
A white list assumes that every message that doesn’t come from a previously approved sender is spam. It’s obviously good at keeping spam out, together with every message from a new sender. While it has an extreme rate of false positives on its own, white listing is often applied first to get important messages through without the risk of losing them to other filters. Some systems can automatically add sources to the white list once they have passed moderation.
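Here is a minimal sketch of the “white list first” idea, assuming senders are identified by email address. The class and method names are invented for illustration.

```python
class WhiteList:
    """Approved senders skip all other filters (illustrative sketch)."""

    def __init__(self, approved=()):
        self.approved = {self._normalize(sender) for sender in approved}

    @staticmethod
    def _normalize(sender):
        # Compare addresses case-insensitively and ignore stray whitespace.
        return sender.strip().lower()

    def allows(self, sender):
        return self._normalize(sender) in self.approved

    def passed_moderation(self, sender):
        # A sender approved once by a human is trusted from now on.
        self.approved.add(self._normalize(sender))
```

Messages for which `allows()` returns `False` are not necessarily spam in this setup; they simply go on to the remaining filters.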
Black list
A black list checks a message against a list of text strings and/or senders and assumes it is spam if a match is found. A black list is extremely effective against huge volumes of spam with roughly the same content. Overall, a black list is as good as you can make (or get) it. There is also a danger of false positives if the list reacts to words and phrases that can appear in legitimate messages.
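A simple black list check might look like the sketch below. It assumes plain substring matching, which is exactly where the false positives come from; the function and parameter names are hypothetical.

```python
def is_blacklisted(sender, text, blocked_senders, blocked_phrases):
    """Return True if the sender or any phrase in the text is on the list.

    Both lists are assumed to be stored in lower case.
    """
    if sender.strip().lower() in blocked_senders:
        return True
    lowered = text.lower()
    # Plain substring match: cheap, but it also fires inside legitimate text.
    return any(phrase in lowered for phrase in blocked_phrases)
```

A phrase like “casino” would also match a legitimate message about a trip to Casino, Australia, so short or common phrases are risky entries.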
Karma
Karma is like a light mix of white and black lists. It takes past moderation events and calculates modifiers for specific senders. It’s considered effective in the long term but doesn’t help with messages from senders that have no history yet, or with human spammers who may start with a few valid messages first.
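One possible way to keep karma is a running score per sender, as in this sketch. The plus one / minus one weights are an arbitrary assumption; real systems tune them.

```python
from collections import defaultdict


class Karma:
    """Per-sender score built from past moderation decisions (sketch)."""

    def __init__(self):
        self.scores = defaultdict(int)

    def record(self, sender, was_spam):
        # Assumed weights: +1 for an approved message, -1 for a spam one.
        self.scores[sender] += -1 if was_spam else 1

    def modifier(self, sender):
        # Unknown senders get a neutral 0, so karma says nothing about them.
        return self.scores.get(sender, 0)
```

The modifier would typically be added to whatever score the other filters produce, rather than used as a verdict on its own.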
Bayesian filtering
This one is based on pure math and is extremely effective. Bayesian filters keep a database of all the words they have ever encountered and how often each occurred in spam and non-spam messages. Upon receiving a new message, the filter looks up all the words in it and calculates the probability of it being spam. The downside is that it relies on manual correction and is slightly susceptible to poisoning, when a big chunk of valid text is used to carry a small chunk of spam along. It doesn’t handle randomly generated text well either.
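To make the idea concrete, here is a small naive Bayes sketch. It is a simplification and not how any specific product works: it only counts whether a word appears in a message, uses add-one smoothing, and the class and method names are invented.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Tiny naive Bayes spam filter (illustrative sketch)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        """label is 'spam' or 'ham'; this is the manual correction step."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        spam_n = self.message_counts["spam"]
        ham_n = self.message_counts["ham"]
        # Start from smoothed prior odds, then add evidence word by word.
        log_odds = math.log((spam_n + 1) / (ham_n + 1))
        for word in self.tokenize(text):
            p_spam = (self.word_counts["spam"][word] + 1) / (spam_n + 2)
            p_ham = (self.word_counts["ham"][word] + 1) / (ham_n + 2)
            log_odds += math.log(p_spam / p_ham)
        # Convert log odds to a probability without overflowing exp().
        if log_odds < 0:
            e = math.exp(log_odds)
            return e / (1 + e)
        return 1 / (1 + math.exp(-log_odds))
```

Poisoning is easy to see in this model: every innocent word added to a spam message pulls the sum back towards the non-spam side.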
Behavior analysis
Instead of analyzing the message, this method tries to analyze the sender. It looks for signs that are common for human-operated software but uncommon for bots. The ability to process JavaScript is very often used as an indicator on the web, but it gives false positives on old browsers (or those that have JS disabled).
Proof of work
This method makes the sender perform additional tasks. They may be tasks that can only be performed by humans (captchas, questions) or computational ones. The latter is kind of an upgraded behavior analysis, with the additional effect of slowing down the spam bot.
Removing value
Google popularized the nofollow attribute for links, claiming it would reduce online spam by removing the value of spammed links for search engine optimization. Well, it had no effect on spam at all, and nofollow was turned into a weapon against paid advertising links. Removing value is extremely ineffective because carpet bombing is the main concept behind spam. Spammers don’t really care to check whether messages bring value on a case-by-case basis.
Poisoning
Poisoning tries to render a spam bot ineffective, usually by feeding it a huge amount of falsified data. It’s not widely used, and its effectiveness is questionable.
Honey pot
This method tries to detect spam by offering an extra action that a human wouldn’t perform in the same conditions. An extra field in a form that says “don’t fill me” is the usual example.
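On the server side, the check for such a field can be very small. In this sketch the form is assumed to arrive as a dictionary, and the field name is made up.

```python
# Hypothetical name of the field that humans are told to leave empty.
HONEYPOT_FIELD = "website_url"


def fell_into_honeypot(form):
    """Return True if the 'don't fill me' field came back with a value."""
    value = form.get(HONEYPOT_FIELD) or ""
    return bool(value.strip())
```

A human sees the warning (or doesn’t see the field at all if it’s hidden with CSS) and leaves it empty, while a bot that fills in every field gives itself away.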
Dynamic method of sending
Periodically changing the method used to send a message prevents bots from remembering it. Disposable email addresses and changing contact forms fall under this. It can be effective (if automated) at reducing the amount of spam but can’t eliminate it completely. It can also lead to lost messages if an expired method is used to send a valid one.
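One way to automate disposable addresses is to derive them from a secret and the current period, as in the sketch below. The address format, the period labels and the function names are all assumptions for illustration.

```python
import hashlib
import hmac


def address_for(period, secret, domain="example.com"):
    """Derive a contact address for a period label such as '2024-05'."""
    token = hmac.new(secret, period.encode("utf-8"), hashlib.sha256).hexdigest()[:8]
    return f"contact-{token}@{domain}"


def is_still_valid(address, accepted_periods, secret, domain="example.com"):
    """Accept mail only for addresses from periods we still honor."""
    return any(
        hmac.compare_digest(address, address_for(period, secret, domain))
        for period in accepted_periods
    )
```

Honoring the previous period as well as the current one softens the lost-message problem, at the price of letting harvested addresses live a bit longer.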
Spam collection
This method usually relies on collaboration between multiple participants in creating a huge spam database. Messages are simply checked against it by hash or otherwise. The effect depends on database quality, and the downside is that such a database can be poisoned into treating valid messages as spam.
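Checking by hash could be sketched like this. The normalization step (lower case, collapsed whitespace) is an assumption; real services use their own, usually fuzzier, fingerprints.

```python
import hashlib
import re


def fingerprint(text):
    """Hash a message after light normalization (illustrative sketch)."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_known_spam(text, spam_fingerprints):
    # spam_fingerprints stands in for the shared database of reported spam.
    return fingerprint(text) in spam_fingerprints
```

An exact hash like this only catches copies that are identical after normalization, so a spammer who changes one word per message slips through.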
The next post in the series is going to cover some factors to consider when choosing a filter and examples from my personal experience.