Untangling Attribution
David D. Clark* and Susan Landau**
I. Introduction
In February 2010, former Director of the National Security Agency Mike McConnell wrote, "We need to develop an early-warning system to monitor cyberspace, identify intrusions and locate the source of attacks with a trail of evidence that can support diplomatic, military and legal options and we must be able to do this in milliseconds. More specifically, we need to reengineer the Internet to make attribution, geolocation, intelligence analysis and impact assessment - who did it, from where, why and what was the result - more manageable."1 The Internet was not designed with the goal of deterrence in mind, and perhaps a future Internet should be designed differently. McConnell's statement is part of a recurring theme that a secure Internet must provide better attribution for actions occurring on the network. Although attribution generally means assigning a cause to an action, as used here attribution refers to identifying the agent responsible for the action (specifically, "determining the identity or location of an attacker or an attacker's intermediary").2 This links the word to the more general idea of identity, in its various meanings.
* David Clark, Senior Research Scientist, MIT, Cambridge, MA 02139, ddc@csail.mit.edu. Clark's effort on this work was funded by the Office of Naval Research under award number N00014-08-1-0898. Any opinions, findings, and conclusions or recommendations expressed in this Essay are those of the authors and do not necessarily reflect the views of the Office of Naval Research.
** Susan Landau, Fellow, Radcliffe Institute for Advanced Study, Harvard University, Cambridge, MA 02138, susan.landau@privacyink.org. An earlier version of this Essay appeared in COMM. ON DETERRING CYBERATTACKS, NAT'L RESEARCH COUNCIL, PROCEEDINGS OF A WORKSHOP ON DETERRING CYBERATTACKS: INFORMING STRATEGIES AND DEVELOPING OPTIONS FOR U.S. POLICY 25-40 (2010), available at http://www.nap.edu/catalog/12997.html.
1 Mike McConnell, Mike McConnell on How to Win the Cyber-war We're Losing, WASH. POST, Feb. 28, 2010, http://www.washingtonpost.com/wp-dyn/content/article/2010/02/25/AR2010022502493.html.
Copyright © 2011 by the President and Fellows of Harvard College, David D. Clark, and Susan Landau
2011 / Untangling Attribution
Attribution is central to deterrence, the idea that one can dissuade attackers from acting through fear of some sort of retaliation. Retaliation requires knowing with full certainty who the attackers are. In particular, there have been calls for a stronger form of personal identification that can be observed in the network.3 A technically nonsensical but nonetheless clear complaint might be: "Why don't packets have license plates?" This is called the attribution problem. There are many types of attribution, and different types are useful in different contexts. We believe that what has been described as the attribution problem is actually a number of problems rolled together. Attribution is certainly not one size fits all.
Attribution on the Internet can mean the owner of the machine (e.g., the Enron Corporation), the physical location of the machine (e.g., Houston, Estonia, China), or the individual who is actually responsible for the actions. The differences between these varied forms of attribution motivate this Essay. Our goal is to tease apart the attribution problems in order to determine under which circumstances which types of attribution would actually be useful.
In summary, we draw the following conclusions:
1. Network-level addresses (IP addresses) are more useful than is often thought as a starting point for attribution, in those cases where attribution is relevant.4
2. Redesigning the Internet so that all actions can be robustly attributed to a person would not help to deter the sophisticated attacks we are seeing today. At the same time, such a change would raise numerous issues with respect to privacy, freedom of expression, and freedom of action, a trait of the current Internet valued by many, including intelligence agencies.
3. The most challenging and complex attacks to deter are those we call multi-stage attacks, where the attacker infiltrates one computer to use as a platform to attack a second, and so on. These attacks, especially if they cross jurisdictional boundaries, raise technical and methodological barriers to attribution.
4. A prime problem for the research community is the issue of dealing with multi-stage attacks. This - rather than the issue of designing highly robust top-down identity schemes - is the problem that should be of central concern to network researchers.
2 DAVID A. WHEELER & GREGORY N. LARSEN, INST. FOR DEF. ANALYSIS, TECHNIQUES FOR CYBER ATTACK ATTRIBUTION ES-1 (2003), available at http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA468859.
3 See, e.g., STEWART A. BAKER, SKATING ON STILTS: WHY WE AREN'T STOPPING TOMORROW'S TERRORISM 231-32 (2010), available at http://media.hoover.org/sites/default/files/documents/Skating_on_Stilts_Big_Brothers_Revenge_223.pdf (describing the proposals for attribution put forward by the former assistant Secretary for Policy at the Department of Homeland Security); CSIS COMMISSION ON CYBERSECURITY FOR THE 44TH PRESIDENCY, SECURING CYBERSPACE FOR THE 44TH PRESIDENCY 62 (2008), available at http://csis.org/files/media/csis/pubs/081208_securingcyberspace_44.pdf ("Creating the ability to know reliably what person or device is sending a particular data stream in cyberspace must be part of an effective cybersecurity strategy.").
4 See, e.g., W. Earl Boebert, A Survey of Challenges in Attribution, in COMM. ON DETERRING CYBERATTACKS, NAT'L RESEARCH COUNCIL, PROCEEDINGS OF A WORKSHOP ON DETERRING CYBERATTACKS: INFORMING STRATEGIES AND DEVELOPING OPTIONS FOR U.S. POLICY 41-52 (2010), available at http://www.nap.edu/catalog/12997.html (focusing on sophisticated attacks from state-sponsored agencies and concluding that attribution would not be a useful tool in those situations). For simpler and less sophisticated events, where one computer engages another directly, attribution may be a useful tool, and we discuss the utility of IP addresses as a starting point for attribution in these cases.
Harvard National Security Journal / Vol. 2
To illustrate the utility of different sorts of attribution, we will use several examples of attacks. First we consider a distributed denial of service (DDoS) attack. As we discuss below, one aspect of dealing with DDoS attacks involves stopping or mitigating them as they occur. (This aspect may or may not be categorized as "deterrence," or instead just as good preparation.) To stop a DDoS attack, we want to shut off communication from the attacking machines, which would most obviously call for attribution at the level of an IP address. On the other hand, to bring the attacker - the bot-master - to justice requires a different type of attribution. We must find a person, not a machine. Unlike the information for halting the attack, this form of attribution is not needed in real time.
Next we consider a phishing attack, which attempts to extract information back from the recipient, so the attempted exploitation must include an IP address to which information is returned. The attribution question then becomes whether that address can effectively be translated into a higher-level identity (such as a person). Attribution in the cases of information theft can be easy (relatively speaking) if the information is used in criminal ways (e.g., to generate false identities and open fake accounts), but extremely hard if the stolen data, such as flight plans for U.S. military equipment, disappears into another nation-state's military planning apparatus.
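The DDoS example above separates two needs: shutting off traffic from attacking machines in real time, which requires attribution only at the level of an IP address, and later finding the person responsible. The following is a minimal sketch of the first step only, under simplifying assumptions. The helper names `build_blocklist` and `is_blocked` are hypothetical, the addresses are placeholders from the ranges reserved for documentation, and real mitigation systems are considerably more involved.

```python
import ipaddress

def build_blocklist(cidrs):
    """Parse address ranges (CIDR notation) identified as attack sources."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]

def is_blocked(source_ip, blocklist):
    """Return True if the packet's source address falls in a blocked range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in blocklist)

# Hypothetical attack sources: one whole range and one single machine.
blocklist = build_blocklist(["203.0.113.0/24", "198.51.100.42/32"])

# Source addresses seen on incoming traffic (placeholders).
incoming = ["203.0.113.9", "198.51.100.42", "192.0.2.1"]

# Only traffic from addresses outside the blocked ranges is let through.
allowed = [ip for ip in incoming if not is_blocked(ip, blocklist)]
```

Nothing in this filter says who controls the blocked machines. Identifying the bot-master is a separate form of attribution that does not need to happen in real time.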
We start by putting attribution in the context of Internet communications and then move to examining different kinds of cyberexploitations and the role attribution plays in these. We follow by considering attribution from four vantage points: type of identity, timing of attribution (before, during, and after an event), type of investigator, and jurisdiction. By considering both what information is available (through types of identity and timing of attribution) and what type of investigation is being done (type of investigator and particulars of jurisdiction), we are better able to discern what the real needs are for attribution.
II. Brief Introduction to Internet Communications
In common parlance, all parts of the Internet are often rolled together into a single phenomenon called "the Internet." Calls for better security are often framed in this simple way, but it is important to start with a more detailed model of the Internet's structure.
To its designers, the term "Internet" is reserved for the general platform that transports data from source to destination, in contrast to the various applications (email, the Web, games, voice, etc.), which are described as operating "on" or "over" the Internet. The data transport service of the Internet is based on packets - small units of data prefixed with delivery instructions. The analogy often used to describe a packet is an envelope, with an address on the outside and data on the inside. A better analogy might be a postcard, since unless the data is encrypted it too is visible as the packet is moved across the Internet.
The Internet is made up of a mesh of specialized computers called routers, and packets carry a destination address that is examined by each router in turn in order to select the next router to which to forward the packet. The format of the addresses found in packets is defined as part of the core Internet Protocol (IP), and they are usually referred to as IP addresses. Packets also carry a source IP address, which indicates where the packet came from (somewhat like the return address on a letter or postcard).
This address thus provides a form of attribution for the packet. Since the routers do not use the source address as they forward a packet, much has been made of the fact that the source address can be forged or falsified by the sender. For a variety of reasons, it is not always easy for a router to verify a source address, even if it tries.5 However, since the source address in a packet is used by the recipient of the packet to send a reply, if the initial sender is attempting to do more than send a flood of one-way packets, the source address of the packet has to be valid for the reply to arrive back. For this reason, the source address found in packets often provides a valid form of source attribution.
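The reasoning in the paragraph above can be made concrete with a toy model. This is a sketch under simplifying assumptions, not a description of real router software: the `Packet` class and the `forward` and `reply_to` functions are hypothetical names, and the addresses are placeholders from the ranges reserved for documentation. It shows that forwarding consults only the destination address, so a forged source address is delivered just the same, but the reply then goes to the forged address rather than to the real sender.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src: str      # source address as claimed by the sender (not verified)
    dst: str      # destination address, the only field used for forwarding
    payload: str

def forward(packet, hosts):
    """Deliver a packet the way the toy network does: by destination only."""
    return hosts.get(packet.dst)  # packet.src is never consulted

def reply_to(packet, payload):
    """The recipient addresses its reply to whatever source it was given."""
    return Packet(src=packet.dst, dst=packet.src, payload=payload)

# Hypothetical hosts, keyed by address.
hosts = {
    "192.0.2.10": "sender",
    "198.51.100.7": "server",
    "203.0.113.5": "bystander",
}

honest = Packet(src="192.0.2.10", dst="198.51.100.7", payload="request")
spoofed = Packet(src="203.0.113.5", dst="198.51.100.7", payload="request")

# Both packets reach the server, but the replies end up in different places.
honest_reply_lands_at = forward(reply_to(honest, "response"), hosts)
spoofed_reply_lands_at = forward(reply_to(spoofed, "response"), hosts)
```

In this model a sender who forges the source address never sees the response, which is why any exchange that needs a reply tends to carry a valid source address.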
Above the packet service of the Internet we find the rich space of applications - applications that run "over" the packet service. At this level, some applications employ very robust means for each end to identify the other. When a customer connects to a bank, for example, the bank wants to be very sure that the customer has been correctly identified. The customer similarly wants to be sure that the bank is actually the bank, and not a falsified web site pretending to be the bank. Encrypted connections from browser to bank,6 certificate hierarchies, passwords, and the like are used to achieve a level of mutual identification that is as trustworthy as is practical.
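As one illustration of identification done by the application rather than by the packet layer, the sketch below uses Python's standard `ssl` module. It covers only the customer's side of the exchange, that is, checking that the server really is the bank. The function name `open_verified_connection` is hypothetical, and this is not a description of how any particular bank or browser is built. Identifying the customer (passwords and the like) happens separately, inside the encrypted connection.

```python
import socket
import ssl

# Default client-side settings: the server must present a certificate that
# chains to a trusted authority and matches the host name being contacted.
context = ssl.create_default_context()

settings = {
    "server_certificate_required": context.verify_mode == ssl.CERT_REQUIRED,
    "hostname_checked": context.check_hostname,
}

def open_verified_connection(hostname, port=443):
    """Connect and return the server's validated certificate (needs network).

    The handshake raises an ssl.SSLError subclass if the certificate cannot
    be validated or does not match the host name.
    """
    with socket.create_connection((hostname, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

None of this identity checking touches the IP packet headers. It is carried as data between the two end points, which is the sense in which such schemes run "over" the Internet.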
There are two important points to note about these application-level identity mechanisms. First, the strength of the identification mechanism is up to the application. Some applications such as banking require robust mutual identity. Other sites need robust identity, but rely on third parties to do the vetting, e.g., credit card companies do so for online merchants. Some sites, such as those that offer information on illness and medical options, are at pains not to gather identifying information, because they believe that offering their users private and anonymous access will encourage them to make frank enquiries.
Second, these schemes do not involve the packets. An Internet engineer would say that these schemes do not involve the Internet at all, but only the services that run on top of it. Certainly, some of these identity schemes involve third parties, such as credit card companies or merchant certification services. But these, too, are "on top of" the Internet, and not "in" the Internet.
5 One recent experiment concluded that nearly a third of Internet customers could spoof their source IP address without detection. ROBERT BEVERLY ET AL., UNDERSTANDING THE EFFICACY OF DEPLOYED INTERNET SOURCE ADDRESS VALIDATION FILTERING 1 (2009), available at http://www.caida.org/publications/papers/2009/imc_spoofer/imc_spoofer.pdf.
6 The relevant protocols go by the acronyms of Secure Sockets Layer (SSL) and Transport Layer Security (TLS).
In contrast to these two forms of identity mechanisms - IP addresses and application-level exchange of identity credentials - the "license plates on packets" approach would imply some mandatory and robust form of personal-level identifier associated with packets (independent of applications) that could be recorded and used by observers in the network.
This packet-level personal identifier, which might be proposed in the future for the Internet, is one focus of our concern.
III. Classes of Attacks
It has become standard to call anything from a piece of spam to a carefully designed intrusion and exfiltration of multiple files an "attack."
However, lumping together such a wide range of events does not help us understand the issues that arise; it is valuable to clarify terminology. As a 2009 National Research Council report on cyberattacks delineated, some attacks are really exploitations. Cyberattacks and cyberexploitations are similar in that they both rely on the existence of a vulnerability, access to exploit it, and software to accomplish the task,7 but cyberattacks are directed to disrupting or destroying the host (or some attached cyber or physical system), while cyberexploitations are directed towards gaining information.
Indeed a cyberexploitation may cause no explicit disruption or destruction at all. We will use that distinction. Attacks and exploitations run the gamut from the very public to the very hidden, and we will examine cyberattacks/cyberexploitations along that axis.
A. Bot-Net Based Attacks