CMU-ISRI-04-121
Institute for Software Research International
School of Computer Science, Carnegie Mellon University




ScamSlam: An Architecture for Learning the Criminal
Relations Behind Scam Spam

Edoardo Airoldi, Bradley Malin

May 2004

Data Privacy Laboratory

CMU-ISRI-04-121.ps
CMU-ISRI-04-121.pdf


Keywords:


Unsolicited communications currently account for over sixty percent of all sent e-mail, with projections reaching the mid-eighties. While much spam is innocuous, a portion is engineered by criminals to prey upon, or scam, unsuspecting people. The senders of scam spam attempt to mask their messages as non-spam and con recipients through a range of tactics, including pyramid schemes, securities fraud, and identity theft via phisher mechanisms (e.g., faux PayPal or AOL websites). To lessen the suspicion of fraudulent activity, the same individual, or collaborating group, augments the text of its messages and assumes an endless number of pseudonyms with an equal number of different stories. In this paper, we introduce ScamSlam, a software system designed to learn the underlying number of criminal cells perpetrating a particular type of scam, as well as to identify which scam spam messages were written by which cell. The system consists of two main components: 1) a filtering mechanism based on a Poisson classifier to separate scam from general spam and non-spam messages, and 2) a message normalization and clustering technique to relate scam messages to one another. We apply ScamSlam to a corpus of approximately 500 scam messages communicating the Nigerian advance fee fraud. The scam filtration method filters out more than 99% of scam messages, vastly outperforming well-known spam filtering software, which catches only 82% of the scam messages. Through the clustering component, we discover that at least half of all scam messages are accounted for by 20 individuals or collaborating groups.
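To make the first component more concrete, the following is a minimal sketch of a Poisson classifier over word counts. It is not the authors' implementation: the function names (`train_poisson`, `classify`), the smoothing parameter `alpha`, the length-scaled rate model, and the toy messages are all assumptions chosen for illustration. Each class keeps a per-token rate for every vocabulary word, and a message is assigned to the class whose Poisson log-likelihood (plus log prior) is highest.

```python
import math
from collections import Counter


def train_poisson(docs, labels, alpha=1.0):
    """Estimate per-class, per-word Poisson rates (per token) with add-alpha smoothing.

    docs   -- list of token lists (hypothetical pre-tokenized messages)
    labels -- list of class labels, one per document
    """
    vocab = sorted({w for d in docs for w in d})
    class_docs = Counter(labels)
    counts = {c: Counter() for c in class_docs}
    for d, c in zip(docs, labels):
        counts[c].update(d)

    model = {}
    for c, cnt in counts.items():
        total = sum(cnt.values())
        # Rate of word w per token of text written by class c.
        rates = {w: (cnt[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
        log_prior = math.log(class_docs[c] / len(docs))
        model[c] = (log_prior, rates)
    return model


def classify(model, doc):
    """Return the class with the highest Poisson log-likelihood plus log prior."""
    x = Counter(doc)
    n = max(len(doc), 1)  # guard against empty messages
    best_class, best_score = None, -math.inf
    for c, (log_prior, rates) in model.items():
        score = log_prior
        for w, rate in rates.items():
            lam = n * rate  # expected count of w in a message of length n
            # log Poisson pmf: x*log(lam) - lam - log(x!)
            score += x[w] * math.log(lam) - lam - math.lgamma(x[w] + 1)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy, made-up training data (not from the paper's corpus).
docs = [
    ["urgent", "transfer", "funds", "bank"],
    ["funds", "transfer", "million", "urgent"],
    ["meeting", "tomorrow", "lunch"],
    ["lunch", "meeting", "notes", "tomorrow"],
]
labels = ["scam", "scam", "other", "other"]
model = train_poisson(docs, labels)
print(classify(model, ["urgent", "funds", "transfer"]))
```

Words outside the training vocabulary are ignored when scoring, which is a simplification; a real filter would also need the message normalization step described in the abstract before counting tokens.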

17 pages

