CMU-CS-10-122
Computer Science Department
School of Computer Science, Carnegie Mellon University




Privacy-Preserving Distributed, Automated
Signature-Based Detection of New Internet Worms

Hyang-Ah Kim

May 2010

Ph.D. Thesis



Keywords: Internet Worm Containment, Worm Signature Generation, Content Prevalence Analysis, Privacy-Preserving Collaboration, Distributed Monitoring

This dissertation develops techniques, based on monitoring network traffic, that automate signature generation for wide-spreading malicious payloads such as Internet worms. Fast signature detection is required to achieve effective content-based filtering. The main thesis is that content prevalence analysis in network payloads across distributed networks is a good basis for automated signature generation for wide-spreading malicious payloads, and can be performed without compromising the privacy of participating networks.

Content-prevalence analysis extracts unique payload patterns that are identical and invariant over all the flows that convey a wide-spreading malicious payload. Distributed monitoring enables us to rapidly capture many sample payloads, thus expediting signature generation. Extra care for privacy encourages more networks to participate in the distributed monitoring and makes the approach practical.
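
The idea of content prevalence can be illustrated with a minimal sketch. It assumes payloads are cut into fixed-length byte windows and that a pattern counts as prevalent once it appears in a given number of distinct flows. The function name `prevalent_chunks`, the parameters `chunk_len` and `min_flows`, and the sample payloads are all hypothetical and chosen for illustration; the abstract does not specify how the dissertation partitions payloads or sets thresholds.

```python
from collections import defaultdict

def prevalent_chunks(flows, chunk_len=8, min_flows=3):
    """Return byte patterns that occur in at least `min_flows` distinct flows.

    Fixed-length sliding windows are an assumption made for this sketch.
    """
    flows_seen = defaultdict(set)  # pattern -> ids of the flows that contain it
    for flow_id, payload in enumerate(flows):
        for i in range(len(payload) - chunk_len + 1):
            flows_seen[payload[i:i + chunk_len]].add(flow_id)
    return {chunk for chunk, ids in flows_seen.items() if len(ids) >= min_flows}

# Made-up payloads: three flows carry the same invariant content.
flows = [
    b"GET /a WORMBODY xx",
    b"POST /b WORMBODY yy",
    b"HEAD /c WORMBODY zz",
    b"GET /index.html",
]
signatures = prevalent_chunks(flows, chunk_len=8, min_flows=3)
```

Counting distinct flows rather than raw occurrences keeps a single flow that repeats a pattern many times from looking prevalent on its own.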

The first part of this dissertation presents a system, Autograph, that generates network payload signatures for Internet worms by utilizing the content invariance and wide-spreading communication patterns of Internet worm traffic. Signature generation speed is improved further by extending Autograph to share port scanner lists with distributed Autograph monitors. Trace-driven simulation shows the fundamental trade-off between early generation of signatures for novel worms and specificity of the generated signatures.
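
A rough sketch of how monitors might build and share port scanner lists follows. It assumes a source is flagged once it makes failed connection attempts to more than a threshold number of distinct destinations, and that sharing amounts to taking the union of the per-monitor lists. Both the heuristic and the names `local_scanners`, `shared_scanner_list`, and `threshold` are assumptions for illustration; the abstract states only that port scanner lists are shared among distributed Autograph monitors.

```python
from collections import defaultdict

def local_scanners(failed_attempts, threshold=2):
    """Flag sources with failed attempts to more than `threshold` distinct destinations.

    `failed_attempts` is an iterable of (source, destination) pairs.
    The threshold rule is an assumption made for this sketch.
    """
    targets = defaultdict(set)
    for src, dst in failed_attempts:
        targets[src].add(dst)
    return {src for src, dsts in targets.items() if len(dsts) > threshold}

def shared_scanner_list(*monitor_lists):
    """Combine the scanner lists reported by several monitors (plain set union)."""
    return set().union(*monitor_lists)

# Made-up observations at two monitors.
monitor_a = local_scanners([
    ("10.0.0.9", "192.0.2.1"), ("10.0.0.9", "192.0.2.2"),
    ("10.0.0.9", "192.0.2.3"), ("10.0.0.5", "192.0.2.1"),
])
monitor_b = local_scanners([
    ("10.0.0.7", "198.51.100.1"), ("10.0.0.7", "198.51.100.2"),
    ("10.0.0.7", "198.51.100.3"),
])
shared = shared_scanner_list(monitor_a, monitor_b)
```

With a shared list, a monitor can treat traffic from a scanner as suspicious even if that scanner has not yet probed its own network.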

Distributed monitoring is a recognized technique in security for expediting worm detection. However, extra care for privacy must be taken. The second part of the dissertation presents two techniques for privacy-preserving distributed signature generation. The first, HotItemID, protects data and owner privacy by using sampling techniques and hiding private data in a crowd. The second protects privacy using a privacy-preserving multiset operation framework. It relies on a semantically secure homomorphic cryptosystem and arithmetic operations over polynomial representations of sets. Both techniques protect privacy based on the assumption that a payload appearing in multiple locations should not be private. The dissertation confirms the assumption by studying real network traffic traces, and shows that privacy-preserving distributed worm signature detection is feasible.
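
The polynomial representation of sets can be sketched in a few lines. A multiset is encoded as the polynomial whose roots are its elements, so multiset union becomes polynomial multiplication and membership becomes a root test. This sketch works on plaintext coefficients only and leaves out the homomorphic encryption layer entirely; the modulus `P` and the function names are illustrative assumptions, not details taken from the dissertation.

```python
P = 2_147_483_647  # prime modulus (2**31 - 1), chosen only for illustration

def poly_from_multiset(items):
    """Coefficients (lowest degree first) of the product of (x - a) over items, mod P."""
    coeffs = [1]
    for a in items:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % P      # contribution of the -a term
            nxt[i + 1] = (nxt[i + 1] + c) % P  # contribution of the x term
        coeffs = nxt
    return coeffs

def poly_mul(f, g):
    """Multiply two polynomials mod P; this corresponds to multiset union."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def evaluate(f, x):
    """Evaluate f at x mod P with Horner's rule; zero means x is a root."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % P
    return acc

# Two parties hold multisets {3, 7} and {7, 11}.
f_a = poly_from_multiset([3, 7])
f_b = poly_from_multiset([7, 11])
union = poly_mul(f_a, f_b)           # represents the multiset {3, 7, 7, 11}
is_member = evaluate(union, 7) == 0  # 7 is a root, so it is in the union
```

Because the operations reduce to additions and multiplications of coefficients, they are the kind of arithmetic that an additively homomorphic cryptosystem can support on encrypted coefficients, which is the role the abstract assigns to the cryptosystem.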

173 pages

