Institute for Software Research
School of Computer Science, Carnegie Mellon University


Learning to Detect Phishing Emails

Ian Fette, Norman Sadeh, Anthony Tomasic

June 2006

Also appears as Carnegie Mellon Cyber Laboratory
Technical Report CMU-CyLab-06-112


Keywords: Phishing, email, filtering, semantic attacks, learning

An increasing number of emails purport to be from a trusted entity and attempt to deceive users into providing account or identity information; these are commonly known as "phishing" emails. Traditional spam filters do not adequately detect these emails, which causes problems for both consumers and businesses wishing to do business online. From a learning perspective, this is a challenging problem. At first glance it appears to be a simple text classification problem, but classification is confounded by the fact that "phishing" emails are often designed to look exactly like legitimate emails. We propose a new framework for detecting these malicious emails, called PILFER. By incorporating features specifically designed to highlight the deceptive methods used to fool users, we are able to accurately classify over 92% of phishing emails while maintaining a false positive rate on the order of 0.1%.
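To make the two reported figures concrete, the following is a minimal sketch of how a detection rate and a false positive rate are conventionally computed from classifier outcomes. The function name and the counts are hypothetical, chosen only for illustration; they are not the report's data or the PILFER implementation.

```python
def detection_rates(true_pos, false_neg, false_pos, true_neg):
    """Return (detection rate, false positive rate) from confusion counts.

    Detection rate: fraction of phishing emails that were flagged.
    False positive rate: fraction of legitimate emails wrongly flagged.
    """
    detection_rate = true_pos / (true_pos + false_neg)
    false_positive_rate = false_pos / (false_pos + true_neg)
    return detection_rate, false_positive_rate


# Hypothetical counts for illustration only; not taken from the report.
tpr, fpr = detection_rates(true_pos=92, false_neg=8, false_pos=1, true_neg=999)
print(f"detection rate: {tpr:.1%}, false positive rate: {fpr:.1%}")
```

The two rates are computed over different populations (phishing emails versus legitimate emails), which is why a filter can have a high detection rate and a very low false positive rate at the same time.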

16 pages
