By Dave Michmerhuizen – Research Scientist, Shawn Anderson – Engineer
On its surface, filtering spam can seem easy. Just write a program that scans your incoming mail for the three A's – Bulova, Viagra and Nigeria. Have it delete anything it finds. Done!
Really, it's not so simple.
To show you what we mean, take a look at the following email that turned up in our spam traps recently.
It looks like it should be easy to detect, right? The word “Viagra” is right there!
Except it turns out it isn't. The HTML behind this message is a huge mess of column-wise tables, which completely hides all of the words you see on the screen from anything that scans the source.
All of the text in the spam is similarly disguised, and all of the text is wrapped in links to a fake Canadian Pharmacy site.
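To make the table trick concrete, here is a minimal Python sketch. The spammer's actual markup is not reproduced in this post, so the layout below is an assumption: each visible column is its own one-letter-wide inner table, and the helper names (`build_columnwise_html`, `ColumnReader`, `source_order_text`, `rendered_lines`) are hypothetical. It shows why a scanner that reads the HTML in source order never sees the word, while reading across the columns, the way a mail client draws them, recovers it.

```python
from html.parser import HTMLParser


def build_columnwise_html(lines):
    """Lay out equal-length lines as side-by-side, one-letter-wide column tables.

    Hypothetical reconstruction of the trick, not the spammer's real markup.
    """
    cells = []
    for column in zip(*lines):  # one tuple of letters per visible column
        rows = "".join(f"<tr><td>{ch}</td></tr>" for ch in column)
        cells.append(f"<td><table>{rows}</table></td>")
    return "<table><tr>" + "".join(cells) + "</tr></table>"


class ColumnReader(HTMLParser):
    """Collect the letters found inside each inner (depth-2) table."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.columns = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth == 2:
                self.columns.append([])

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 2 and data.strip():
            self.columns[-1].append(data.strip())


def _read_columns(html):
    reader = ColumnReader()
    reader.feed(html)
    return reader.columns


def source_order_text(html):
    """Text as a naive tag-stripping scanner sees it: column by column."""
    return "".join("".join(col) for col in _read_columns(html))


def rendered_lines(html):
    """Text as the reader sees it: across the columns, row by row."""
    return ["".join(row) for row in zip(*_read_columns(html))]


html = build_columnwise_html(["VIAGRA", "CHEAP!"])
print("VIAGRA" in html)                     # keyword search on the raw HTML
print("VIAGRA" in source_order_text(html))  # keyword search after stripping tags
print(rendered_lines(html))                 # what actually appears on screen
```

Both keyword searches miss, because in source order the letters of the two lines are interleaved. A real filter would need to handle far messier markup than this, but the principle is the same: it has to reason about what is rendered, not what is written.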
Rogue pharmacy spammers can be very creative when it comes to hiding the text of their messages, and deliberate misspellings are very common. Often we see the word Viagra in a message as V1agra or Vaigra or even Vi@gra. Over the years we've seen other creative attempts to hide message content, ranging from color and font formatting, to look-alike Unicode characters, to specially generated OCR-defeating images.
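As a rough illustration of how a filter might see through the misspellings above, the following Python sketch normalizes a message before matching. It is not how any particular product works: the `LOOKALIKES` table, the function names and the choice to tolerate a single swapped pair of letters are all assumptions made for the example, and a mapping this blunt would produce false positives on real mail.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny substitution table; real look-alike
# tables are far larger.
LOOKALIKES = str.maketrans({
    "1": "i", "@": "a", "0": "o", "3": "e", "$": "s",
    "\u0430": "a",  # Cyrillic small a
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
    "\u043e": "o",  # Cyrillic small o
    "\u0435": "e",  # Cyrillic small ie
})


def normalize(text):
    """Fold compatibility forms (e.g. fullwidth letters), lowercase, map look-alikes."""
    text = unicodedata.normalize("NFKC", text).lower()
    return text.translate(LOOKALIKES)


def transposition_variants(keyword):
    """The keyword itself plus every spelling with one adjacent pair swapped."""
    variants = {keyword}
    for i in range(len(keyword) - 1):
        swapped = keyword[:i] + keyword[i + 1] + keyword[i] + keyword[i + 2:]
        variants.add(swapped)
    return variants


def mentions(text, keyword="viagra"):
    """True if any normalized word in text is the keyword or a one-swap variant."""
    targets = transposition_variants(keyword)
    tokens = re.findall(r"[a-z]+", normalize(text))
    return any(token in targets for token in tokens)


for sample in ["Cheap V1agra", "Vaigra sale", "Vi@gra now", "Meeting agenda attached"]:
    print(sample, mentions(sample))
```

The first three samples match and the last does not. Nothing here helps with text hidden by color, font tricks or images, which need different techniques altogether.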
The war between spammers and anti-spammers is never really over. It's always evolving, and we're always watching.