CMU-ISRI-04-121
Institute for Software Research International
School of Computer Science, Carnegie Mellon University
ScamSlam: An Architecture for Learning the Criminal
Relations Behind Scam Spam
Edoardo Airoldi, Bradley Malin
May 2004
Data Privacy Laboratory
Keywords:
Unsolicited communications currently account for over sixty
percent of all sent e-mail, with projections reaching the
mid-eighties. While much spam is innocuous, a portion is
engineered by criminals to prey upon, or scam, unsuspecting
people. The senders of scam spam attempt to mask their messages
as non-spam and con through a range of tactics, including pyramid
schemes, securities fraud, and identity theft via phisher mechanisms
(e.g. faux PayPal or AOL websites). To lessen the suspicion of
fraudulent activities, the same individual, or collaborating group,
alters the text of its scam messages and
assumes an endless number of pseudonyms with an equal number of
different stories. In this paper, we introduce ScamSlam, a software
system designed to learn the underlying number of criminal cells
perpetrating a particular type of scam, as well as to identify which
scam spam messages were written by which cell. The system consists
of two main components: 1) a filtering mechanism based on a Poisson
classifier to separate scam from general spam and non-spam messages,
and 2) a message normalization and clustering technique to relate
scam messages to one another. We apply ScamSlam to a corpus of
approximately 500 scam messages communicating the Nigerian advance
fee fraud. The scam filtration method catches more than 99% of
scam messages, vastly outperforming well-known spam filtering
software, which catches only 82% of the scam messages. Through the
clustering component, we discover that at least half of all scam
messages are accounted for by 20 individuals or collaborating groups.
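
The abstract names a Poisson classifier as the filtering component but gives no implementation details. As a rough illustration only, the sketch below shows one common way such a classifier can be built: each word count in a message is modeled as a Poisson variable whose rate depends on the class and the message length, and a message is assigned to the class with the highest log-likelihood plus log-prior. The names `tokenize` and `PoissonClassifier`, the whitespace tokenizer, the smoothing constant, and the toy training messages are all assumptions made for this example and are not taken from the report.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the report may differ.
    return text.lower().split()


class PoissonClassifier:
    """Naive Bayes style classifier with Poisson word-count likelihoods."""

    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing  # hypothetical additive smoothing constant
        self.vocab = set()
        self.priors = {}            # label -> log prior probability
        self.rates = {}             # label -> {word: expected count per token}

    def fit(self, messages, labels):
        word_counts = {}
        token_totals = Counter()
        for text, label in zip(messages, labels):
            tokens = tokenize(text)
            word_counts.setdefault(label, Counter()).update(tokens)
            token_totals[label] += len(tokens)
            self.vocab.update(tokens)
        label_counts = Counter(labels)
        for label, n_label in label_counts.items():
            self.priors[label] = math.log(n_label / len(labels))
            denom = token_totals[label] + self.smoothing * len(self.vocab)
            self.rates[label] = {
                w: (word_counts[label][w] + self.smoothing) / denom
                for w in self.vocab
            }
        return self

    def log_score(self, text, label):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        if not tokens:
            return self.priors[label]
        observed = Counter(tokens)
        score = self.priors[label]
        for w in self.vocab:
            lam = self.rates[label][w] * len(tokens)  # Poisson mean for w
            x = observed[w]
            # log Poisson pmf: x*log(lam) - lam - log(x!)
            score += x * math.log(lam) - lam - math.lgamma(x + 1)
        return score

    def predict(self, text):
        return max(self.priors, key=lambda label: self.log_score(text, label))


# Toy, made-up training data for illustration only.
train = [
    ("urgent transfer of funds from nigeria bank account", "scam"),
    ("confidential business proposal transfer million dollars", "scam"),
    ("cheap pills buy now discount", "spam"),
    ("buy cheap watches discount now", "spam"),
    ("meeting tomorrow at noon about the project", "non-spam"),
    ("please review the project report", "non-spam"),
]
model = PoissonClassifier().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("confidential transfer of funds"))
```

A real filter would need a much larger labeled corpus and a more careful tokenizer; the three-way split into scam, general spam, and non-spam mirrors the separation described in the abstract.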
17 pages