Targeted email attacks that enable computer network exploitation have become more prevalent, more insidious, and more widely documented in recent years. Unlike nuisance spam, or phishing designed to trick users into revealing personal information, targeted malicious email (TME) facilitates computer network exploitation and the gathering of sensitive information from targeted networks. These attacks are not isolated, unrelated events; they are coordinated and persistent campaigns that can span years. This dissertation surveys and categorizes existing email filtering techniques, proposes and implements new methods for detecting targeted malicious email, and compares these newly developed techniques to traditional detection methods. Current research and commercial methods for detecting illegitimate email are limited to addressing Internet-scale email abuse, such as spam, and do not focus on targeted malicious email. Furthermore, conventional tools such as anti-virus software are vulnerability-focused: they examine only the binary code of an email and ignore the relevant contextual metadata.
This study first documents the existence of TME and characterizes it as a form of malicious email attack distinct from spam, phishing, and other conventional illegitimate email. The quantitative research analyzes email data from a large Fortune 500 company that has been subjected to these targeted emails. Persistent threat features, such as threat actor locale and weaponization tools, and recipient-oriented features, such as reputation and role, are combined with supervised data classification algorithms to demonstrate new techniques for detecting targeted malicious email. The specific tools, techniques, procedures, and infrastructure that a threat actor uses characterize the level and capability of a threat; the recipient's role and repeated targeting speak to the intent of the threat. Both sets of features are used in a random forest classifier to separate targeted malicious email from non-targeted malicious email. The performance of this classifier is measured and compared to that of conventional email filtering techniques to demonstrate the added benefit of including these features. Performance evaluations focus on false negative reduction, since the cost of missing a targeted malicious email is far greater than the cost of mistakenly flagging a legitimate email as malicious.
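The emphasis on false negatives can be illustrated with a minimal Python sketch. It is not the dissertation's code or data: the function name `error_rates`, the label encoding (1 for targeted malicious, 0 otherwise), and the two prediction lists are all made up for illustration. It only shows how false negative and false positive rates would be computed and compared for two hypothetical detectors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRates:
    false_negative_rate: float  # share of targeted emails that were missed
    false_positive_rate: float  # share of other emails wrongly flagged


def error_rates(y_true, y_pred):
    """Compute FNR and FPR; assumed labels: 1 = targeted malicious, 0 = not."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    positives = tp + fn
    negatives = fp + tn
    fnr = fn / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return ErrorRates(fnr, fpr)


# Hypothetical labels and predictions, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
baseline_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # a conventional filter
enriched_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]  # a filter with extra features

baseline = error_rates(y_true, baseline_pred)
enriched = error_rates(y_true, enriched_pred)
print(f"baseline: FNR={baseline.false_negative_rate:.2f}, "
      f"FPR={baseline.false_positive_rate:.2f}")
print(f"enriched: FNR={enriched.false_negative_rate:.2f}, "
      f"FPR={enriched.false_positive_rate:.2f}")
```

In this invented example the second detector misses fewer targeted emails at the price of one additional false alarm, which is the kind of trade-off the evaluation is designed to weigh.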
This study makes several findings. First, targeted malicious email is associated with persistent threat features, whereas non-targeted malicious email is not. Second, targeted malicious email is associated with recipient-oriented features, whereas non-targeted malicious email is not. Finally, detecting targeted malicious email with persistent threat and recipient-oriented features results in significantly fewer false negatives than detecting it with conventional email filtering techniques. This improvement in the false negative rate comes with acceptable false positive rates.
Future research can expand upon the features introduced in this study. For example, additional persistent threat features can be harvested from file-level metadata (e.g., author names, document path locations), and additional recipient-oriented features can be incorporated from organization databases. This study defines a binary outcome: emails are either targeted malicious or non-targeted malicious. Future work can explore multi-class outcomes that pair specific threat actor campaigns with targeted recipients.
|Adviser||Julie J.C.H. Ryan|
|School||THE GEORGE WASHINGTON UNIVERSITY|
|Subjects||Statistics; Information technology; Computer science|