Jackstraws: Picking Command and Control Connections from Bot Traffic
Gregoire Jacob, Ralf Hund, Christopher Kruegel, Thorsten Holz
USENIX Security Symposium, San Francisco, CA, August 2011
A distinguishing characteristic of a bot is its ability to establish a command and control (C&C) channel. The typical approach to build detection models for C&C traffic and to identify C&C endpoints (IPs and domains of C&C servers) is to execute a bot in a controlled environment and monitor its outgoing network connections. Using the bot traffic, one can then craft signatures that match C&C connections or blacklist the IPs or domains that the packets are sent to. Unfortunately, this process is not as easy as it seems. For example, bots often open a large number of additional connections to legitimate sites (to perform click fraud or to query for the current time), and a bot can deliberately produce "noise" – bogus connections that make the analysis more difficult. Thus, before one can build a model for C&C traffic or blacklist IPs/domains, one first has to pick the C&C connections among all the network traffic that a bot produces.
In this paper, we present Jackstraws, a system that accurately identifies C&C connections. To this end, we leverage host-based information that provides insights into which data is sent over each network connection as well as the ways in which a bot processes information that it receives. More precisely, we associate with each network connection a behavior graph that captures the system calls that lead to a network connection, as well as the system calls that operate on data that is returned. By using machine learning techniques and a training set of graphs that are associated with known C&C connections, we automatically extract and generalize graph templates that capture the core of different types of C&C activity. Later, we can use these C&C templates to match against behavior graphs produced by other bots. Our results show that Jackstraws can accurately detect C&C connections, even for novel bot families that were not used for template generation.
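To make the idea of a per-connection behavior graph and template matching concrete, here is a minimal sketch in Python. It is an illustration under strong assumptions, not the system described in the paper: the class `BehaviorGraph`, the function `matches_template`, the system call names, and the hand-written template are all hypothetical. The paper extracts and generalizes templates automatically with machine learning, whereas this sketch only checks whether every data-flow edge of a template also appears in a connection's graph.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorGraph:
    """System calls tied to one network connection, linked by data flow.

    Hypothetical structure for illustration; nodes are identified by
    system call name only, which is a deliberate simplification.
    """
    edges: set = field(default_factory=set)

    def add_flow(self, src_call: str, dst_call: str) -> None:
        # An edge means data produced by src_call is consumed by dst_call.
        self.edges.add((src_call, dst_call))


def matches_template(graph: BehaviorGraph, template: BehaviorGraph) -> bool:
    # Assumption: a template matches when all of its edges occur in the graph.
    # Real matching would need to be more tolerant than an exact subset test.
    return template.edges <= graph.edges


# Hypothetical "download and execute" template: received data is written
# to a file, and that file is then used to start a process.
template = BehaviorGraph()
template.add_flow("recv", "write_file")
template.add_flow("write_file", "create_process")

# Connection A: the bot fetches a binary update and runs it.
update_conn = BehaviorGraph()
update_conn.add_flow("read_registry", "send")
update_conn.add_flow("recv", "write_file")
update_conn.add_flow("write_file", "create_process")

# Connection B: the bot only queries the current time from a legitimate site.
time_conn = BehaviorGraph()
time_conn.add_flow("recv", "set_system_time")

print(matches_template(update_conn, template))
print(matches_template(time_conn, template))
```

In this toy setting, only the first connection contains the full data-flow pattern of the template, so it would be the one flagged as a C&C candidate, while the time query would be treated as noise.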