«Untangling Attribution David D. Clark* and Susan Landau** I. Introduction In February 2010, former Director of the National Security Agency Mike ...»
Investigation of such theft is very difficult. Tracing back across the network to the perpetrator may involve several stages through multiple machines in different jurisdictions. However, the stolen data must follow some path back to the perpetrator, which raises the possibility of tracking. Possession of the stolen information may or may not be useful as evidence, depending on the sort of retribution contemplated. From a national security perspective, these types of cases are the most important to deter. They are also the ones least likely to be solved solely through technical means.
IV. Cascades of Attribution and Multi-Stage Attacks
Many attacks and exploits are multi-stage in character: for example, A penetrates computer B to use as a platform for penetrating C, which is then used to attack D. Deterrence means focusing on computer A. It does not do much good to ask what person or actor owns machines B and C — they were just penetrated in passing. Following the chain of attribution backwards toward A involves identifying the IP addresses that lead back from D to C to B to A. If that trail can be followed, then the investigator can attempt to learn what can be discovered about A.
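The backward trace from D to C to B to A can be sketched as a walk over per-hop records. This is a minimal illustration, assuming a hypothetical table `UPSTREAM` that maps each machine to the address that drove its traffic, as an investigator might reconstruct it from that machine's logs. The addresses come from the reserved documentation ranges and are not real hosts. In practice each entry may sit in a different jurisdiction and require separate legal process to obtain.

```python
from typing import Optional

# Hypothetical per-hop records an investigator might reconstruct from each
# machine's logs: machine -> address that drove its outbound attack traffic.
UPSTREAM: dict[str, Optional[str]] = {
    "203.0.113.40": "198.51.100.7",  # D (victim) was attacked from C
    "198.51.100.7": "192.0.2.15",    # C was driven from B
    "192.0.2.15": "192.0.2.99",      # B was driven from A
    # A (192.0.2.99) has no entry: the trail ends here.
}

def trace_back(victim: str, upstream: dict[str, Optional[str]]) -> list[str]:
    """Follow the chain of IP addresses backward from the victim.

    Stops when a hop has no recorded upstream (the apparent origin, or a
    machine whose logs are missing) or when a loop is detected.
    """
    chain = [victim]
    seen = {victim}
    current = victim
    while (prev := upstream.get(current)) is not None:
        if prev in seen:  # guard against cyclic or forged records
            break
        chain.append(prev)
        seen.add(prev)
        current = prev
    return chain

print(" <- ".join(trace_back("203.0.113.40", UPSTREAM)))
# 203.0.113.40 <- 198.51.100.7 <- 192.0.2.15 <- 192.0.2.99
```

The walk stops at the first machine with no record, which may be A or may simply be a hop whose logs were never kept; and what it yields for A is an address, not a person.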
Harvard National Security Journal / Vol. 2
It is important to note both the limits of mechanisms for attribution and the intentional complexity of the various attacks and exploits, which have been crafted precisely to confound attribution. Looking at our earlier examples, we see patterns that are both multi-step and multi-stage. For example, a DDoS attack has a first step in which the array of attack machines (the bot-net) is assembled. This step will be taken in a multi-stage way, with the machines, as they fall prey to the initial event that infiltrates them, reporting back to some intermediate control computer that itself may have been first infiltrated and corrupted. Then in the step where the machines launch the attack, the instructions describing the attack will have been preloaded, and perhaps launched using a timer or a signal sent through some complex signaling channel (e.g., a message to a chat channel), so that the controller is far away by the time that the attack is evident.
Of course, the multi-stage pattern is not unique to attacks and malicious behavior. Linking services together on multiple machines, such that A asks B to carry out some action, and B invokes C as part of the task, is the general idea behind composable services such as Web 2.0. In situations like this, A and B might exchange identity credentials, B and C might also do so, but C would not know who A is.25 B is providing a service to its clients (e.g., A), and uses C as part of this service. Under normal circumstances, B would take on the responsibility of ensuring that its clients (e.g., A) are not undertaking unsuitable objectives when they invoke the service. In case of a bad event (consider the analog of a multi-car rear-end collision), C complains to B, and B complains to A.
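A toy model can make the hop-by-hop character of composed services concrete. In this sketch (all names hypothetical, not any real federation or web-services protocol), each `Service` records only the immediate caller it authenticated, plus, for each request it issues downstream, the inbound request it was serving. C therefore never learns of A, but a complaint can still be walked upstream one hop at a time, provided each intermediary kept its records.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_request_ids = itertools.count(1)  # hypothetical global source of request ids

@dataclass
class Service:
    name: str
    # Inbound request id -> the immediate caller whose credentials were checked.
    received: dict[int, str] = field(default_factory=dict)
    # Outbound request id -> inbound request id it was issued to serve.
    on_behalf_of: dict[int, int] = field(default_factory=dict)

def invoke(caller: str, service: Service, downstream: Optional[Service] = None) -> int:
    """`caller` invokes `service`, which may call `downstream` under its own name."""
    req_id = next(_request_ids)
    service.received[req_id] = caller  # service authenticates only its caller
    if downstream is not None:
        out_id = invoke(service.name, downstream)  # downstream sees `service`, not `caller`
        service.on_behalf_of[out_id] = req_id
    return req_id

def complain(victim: Service, req_id: int, services: dict[str, Service]) -> list[str]:
    """Walk a complaint upstream; each hop can name only its own immediate client."""
    path: list[str] = []
    current, rid = victim, req_id
    while rid in current.received:
        caller = current.received[rid]
        path.append(caller)
        upstream = services.get(caller)
        if upstream is None or rid not in upstream.on_behalf_of:
            break  # caller is not a known intermediary, or acted on its own behalf
        rid = upstream.on_behalf_of[rid]
        current = upstream
    return path

b, c = Service("B"), Service("C")
services = {"B": b, "C": c}
first = invoke("A", b, downstream=c)
print(c.received)                # {2: 'B'}: C saw only B
print(complain(c, 2, services))  # ['B', 'A']
```

The walk depends entirely on B behaving responsibly. If B has been penetrated, as in the malicious case discussed next, its `on_behalf_of` records are likely to be missing or untrustworthy.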
When the multi-stage activity is malicious, of course, the issue is that the intermediate machine has been infiltrated and corrupted, so it is not acting responsibly or in ways that reflect the wishes of its owner/operator. The human operator of B may appear to be the origin of the attack, but is just a victim of a security flaw in his machine.
One of the conclusions of this Essay is that multi-stage attacks must be a focus of attention when considering attribution and deterrence. First, many attacks fit in this category, including sophisticated and crafty attacks designed to avoid attribution. Assigning blame for such attacks is very difficult. Second, when computers are penetrated by an attacker to use as a platform for a further attack, that penetration usually bypasses any sort of end-to-end exchange of application-level credentials.
Therefore, the only kind of attribution that can possibly be applied here is at the level of IP addresses. Personal-level attribution will not be a useful tool in tracing attribution or assigning blame, and dealing with these sorts of attacks does not provide a justification for requiring network-based, personal-level identification.
25 One legitimate example of this occurs in federated identity management systems: the Identity Provider knows that Service Provider A and Service Provider B (for example, a hotel and a car-rental agency) are both providing services for the same customer, but through the judicious use of pseudonyms, no one else, including the two service providers, can determine that fact.
2011 / Untangling Attribution
While multi-stage attacks represent a serious challenge, we urge the research community to consider what might be done to improve the options for tracking back to an ultimate source. Any solution or improvement that might be found will certainly not be purely technical, but will be a mix of technical and policy tools. For example, one might imagine every user of the Internet being urged to keep a log of incoming and outgoing connections.
To avoid concerns about privacy, this log could be maintained under the control of the user himself — given today's technology, the sort of device called a “home router” could keep such a log with minimal additional cost for storage. But such a log would only be useful in a context where there are regulations as to when data could be requested from this log, by whom, etc.
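A minimal sketch of such a user-controlled log follows. The `RouterConnectionLog` class, its fields, and the notion of an accepted "authority" string are hypothetical stand-ins: the essay leaves open what rules would govern disclosure, so the code only shows where such a policy check would sit, and it records each disclosure for later audit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Connection:
    when: datetime
    direction: str    # "in" or "out", relative to the home network
    local_addr: str   # address of the machine inside the home
    remote_addr: str
    remote_port: int

class RouterConnectionLog:
    """Connection log kept on the user's own router (hypothetical design)."""

    def __init__(self, accepted_authorities: set[str]) -> None:
        self._entries: list[Connection] = []
        # Stand-in for whatever legal process a regulation would require.
        self._accepted = set(accepted_authorities)
        # Audit trail: (requester, remote address asked about) per disclosure.
        self.disclosures: list[tuple[str, str]] = []

    def record(self, conn: Connection) -> None:
        self._entries.append(conn)

    def query(self, requester: str, authority: str, remote_addr: str) -> list[Connection]:
        """Disclose entries involving remote_addr only under an accepted authority."""
        if authority not in self._accepted:
            raise PermissionError(f"{requester} presented unaccepted authority {authority!r}")
        self.disclosures.append((requester, remote_addr))
        return [c for c in self._entries if c.remote_addr == remote_addr]

log = RouterConnectionLog({"court-order"})
t = datetime(2011, 2, 25, 12, 0, tzinfo=timezone.utc)
log.record(Connection(t, "out", "192.168.1.20", "203.0.113.40", 443))
log.record(Connection(t, "in", "192.168.1.21", "198.51.100.7", 22))
hits = log.query("investigator", "court-order", "203.0.113.40")
```

Because the router sits inside the home, where several machines share one public address, it can record which home machine made a connection, something the public address alone cannot reveal.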
And of course, the user might have failed to maintain such a log. In such a case, the “punishment” might be that the ISP serving that user is required to log the user's traffic — the cost for failing to self-protect is a loss of privacy.
This idea may not be suitable — we offer it only as an example to illustrate how technology and policy will have to be combined as part of any solution, and also to illustrate that jurisdictional issues (and variation of regulation across jurisdictions) will be central in dealing with these sorts of attacks.
As the discussion above points out, different types of cyberattacks and cyberexploitations raise different options for prevention and deterrence.
We have found it useful to think about attribution from different vantage
An IP address in a packet identifies an attachment point on the Internet. Roughly, by analogy to a street address, it indicates a location, but not who lives there. In many cases, of course, an address (both physical and Internet) can be linked to a person, or at least a family. Since residential Internet service is almost always provided by commercial Internet Service Providers (ISPs), they have billing information for all of their customers. If they choose to maintain a database that links billing information to the Internet addresses they give out to specific customers, they can trace back from address to personal identity. In the United States, organizations that work to deter copyright infringement have secured laws allowing them to obtain subpoenas for such information from ISPs. But unless this connection has been made, Internet addresses have meaning only at the level of a network endpoint, which usually maps to one or a small cluster of machines.26 Indeed, in many cases, an IP address cannot be identified with a particular machine because the machine has been on the network for only a short time, such as in an airport lounge, hotel lobby, or coffeehouse.27

In many application-level identity schemes, such as the banking example above, identity has meaning at the level of an individual. The bank may keep track of Internet addresses as supplemental information to be used in case of abuse, but its identity system is designed to tie directly to an individual as the accountable agent, not to a machine. The IP address is not used as part of establishing that identity.
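The ISP's trace from address to billing identity is time-dependent, because addresses are commonly reassigned from one customer to another. A sketch of the lookup an ISP might run against its own records, assuming a hypothetical lease table (`Lease`, the account names, and the times are illustrative only):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class Lease:
    address: str
    start: datetime  # inclusive
    end: datetime    # exclusive
    account: str     # billing account, not a person or a machine

def subscriber_for(address: str, when: datetime, leases: list[Lease]) -> Optional[str]:
    """Return the account that held `address` at time `when`, if the ISP kept a record."""
    for lease in leases:
        if lease.address == address and lease.start <= when < lease.end:
            return lease.account
    return None  # no record: the address cannot be tied to a subscriber

# Hypothetical records: the same address handed to two customers in turn.
LEASES = [
    Lease("198.51.100.23", datetime(2011, 2, 1, 0, 0), datetime(2011, 2, 1, 12, 0), "acct-1001"),
    Lease("198.51.100.23", datetime(2011, 2, 1, 12, 0), datetime(2011, 2, 2, 0, 0), "acct-2002"),
]

print(subscriber_for("198.51.100.23", datetime(2011, 2, 1, 9, 30), LEASES))  # acct-1001
print(subscriber_for("198.51.100.23", datetime(2011, 2, 1, 18, 0), LEASES))  # acct-2002
```

Even a successful lookup names a billing account. If the subscriber shares the address among several machines behind a home router, the result still does not identify a machine, let alone the person at the keyboard.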
A related kind of individual identity is the pseudonym. The idea of a pseudonym, as the term is usually used, is an identity that links to a specific individual without revealing who that individual is. A pseudonym system should have two goals. First, the pseudonym should not be easily linked to an actual person; the goal is freedom from attribution. Second, the pseudonym should not be easily stolen and co-opted by another individual; the speaker, although anonymous, should have exclusive use of that identity. Encryption schemes can be used in various ways to achieve this sort of functionality, which is a sort of “anti-attribution.”

To fully protect pseudonymous speech and other types of anonymous activities, it is necessary to complement application-level “anti-attribution” mechanisms with tools to mask IP-level machine-based identity, since that can often be linked to human-level identity with some effort, as discussed above. Tools such as Tor28 are used to give IP-level anonymity to communications; they are employed by activists and dissidents, journalists, the military and the intelligence community, and many others to mask with whom the communication is occurring. Law enforcement uses Tor to visit websites and chat rooms without leaving behind a tell-tale government IP address, while the military uses Tor to enable personnel “in place” to communicate with headquarters without revealing their true identity.29

When Internet communications occur without the use of traffic analysis anonymizers such as Tor, the source and destination addresses in packets can be seen by every router that forwards the packet, and by any other sort of monitoring device that is in the path from the sender to the receiver. As such, these sorts of identity indicators are fairly public. In contrast, if two end-points exchange identity credentials between themselves over an encrypted connection, that exchange is private to those two end-points.30 Even if a third party, such as a credit-card company, is involved in the identity verification, that third party has been invoked with the knowledge and concurrence of the initial end-points. The knowledge of the identity is restricted to those parties.

26 Many homes have a device called a “home router,” which allows a small number of computers in the home to share one network connection. As the Internet is currently used, all these machines share one Internet address, so starting with that address there is no way to distinguish among those different machines. At a larger scale, an ISP (or a country) might use this same sort of technology to map a large number of machines to one address, making this sort of attribution even less effective.
27 See supra text accompanying note 8.
28 Tor is a tool developed by the U.S. Naval Research Lab to permit anonymous (at the IP level) use of the Internet. See TOR, https://www.torproject.org/ (last visited Feb. 25, 2011).
29 Tor allows anonymous communication and by its very nature does not reveal who is communicating with whom. Thus one cannot point to specific instances where the system was used to support military communications. However, various sources have claimed that Tor is in fact used by the military. See, e.g., Granting Anonymity, N.Y. TIMES MAGAZINE, Dec. 17, 2010, http://www.nytimes.com/2010/12/19/magazine/19FOB-Mediumt.html?_r=1&ref=virginiaheffernan. The fact that the project had its genesis in the Naval Research Laboratory and was funded for many years by the Defense Advanced Research Projects Agency is a clear indication of the value of the program to national security communications. Tor is available for public use because “anonymity loves company,” that is, broad use of the system by outsiders hides the instances of national security and law enforcement communications. See Roger Dingledine & Nick Mathewson, Anonymity Loves Company: Usability and the Network Effect, in Security and Usability: Designing Systems that People Can Use (Lorrie Cranor & Simson Garfinkel eds., 2005).
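The essay does not name a particular scheme for pseudonyms. One common way to meet both goals is a digital signature: the pseudonym is derived from a freshly generated public key, which is not tied to a real name (the first goal), and only the holder of the matching private key can produce statements that verify under it (the second goal). The sketch below swaps in a textbook Schnorr signature over a deliberately tiny group (p = 2039, q = 1019, g = 4) so that it runs with only the standard library. These parameters are insecure and for illustration only; a real system would use a vetted library and algorithm such as Ed25519. The function names are hypothetical.

```python
import hashlib
import secrets

# Toy Schnorr group: p = 2q + 1 with p and q prime; g = 4 generates the
# subgroup of order q. FAR too small to be secure; illustration only.
P, Q, G = 2039, 1019, 4

def _hash_to_int(*parts: bytes) -> int:
    digest = hashlib.sha256(b"|".join(parts)).digest()
    return int.from_bytes(digest, "big") % Q

def new_pseudonym() -> tuple[int, int, str]:
    """Return (private key, public key, pseudonym); the pseudonym hashes the public key."""
    x = secrets.randbelow(Q - 1) + 1  # private key in [1, Q-1]
    y = pow(G, x, P)                  # public key
    alias = hashlib.sha256(str(y).encode()).hexdigest()[:16]
    return x, y, alias

def sign(x: int, message: bytes) -> tuple[int, int]:
    k = secrets.randbelow(Q - 1) + 1  # fresh secret nonce for every signature
    r = pow(G, k, P)
    e = _hash_to_int(str(r).encode(), message)
    s = (k + x * e) % Q
    return e, s

def verify(y: int, message: bytes, sig: tuple[int, int]) -> bool:
    e, s = sig
    # For an honest signature, g^s * y^(-e) = g^(k + x*e) * g^(-x*e) = g^k = r.
    # y has order Q, so y^(Q - e) equals y^(-e).
    r = (pow(G, s, P) * pow(y, (Q - e) % Q, P)) % P
    return _hash_to_int(str(r).encode(), message) == e

x, y, alias = new_pseudonym()
post = b"statement published under the pseudonym"
signature = sign(x, post)
print(alias, verify(y, post, signature))  # the alias, then True
```

As the text notes, this protects only the application-level identity; the IP address of the machine that posts the signed statement still needs to be masked with a tool such as Tor.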
An analogy to monitoring IP addresses in the network might be security cameras. A camera on a public street captures our public behavior and a likeness of our face. But it does not reveal who we are unless that face can be linked to some other aspect of identity. In contrast, in various circumstances we have to identify ourselves to some other entity (show a driver’s license, passport, credit card, etc.), but this transaction is specific to the circumstances at hand, and is normally not visible to a third-party observer. A security camera in a store provides an analogy to the logging of IP addresses by an endpoint. The images might be more easily linked to a customer transaction, and thus to other aspects of identity. But the video captured by that camera is private to the store unless it chooses to reveal it (e.g., after a robbery) or it is demanded by an authorized third party (e.g., by a court order).
Using IP addresses as a starting point, one can try to derive forms of attribution other than at the level of the individual. IP addresses are usually allocated in blocks to Internet service providers (ISPs), corporations, universities, governments, and the like. Normally, the “owner” of a block of addresses is publicly recorded, so one can look up an address to see to whom it belongs. This can provide a starting point for investigation and subsequent fact-finding.
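Looking up the holder of an address amounts to finding the registered block that contains it, preferring the most specific block when a provider has sub-allocated part of its range. The sketch below assumes a small hypothetical registry extract (documentation address ranges and made-up holders); a real lookup would query a regional Internet registry's WHOIS or RDAP service.

```python
import ipaddress
from typing import Optional

# Hypothetical registry extract: allocated blocks and their registered holders.
ALLOCATIONS = [
    (ipaddress.ip_network("198.51.100.0/24"), "Example ISP (hypothetical)"),
    # A sub-allocation carved out of the ISP's block for one of its customers.
    (ipaddress.ip_network("198.51.100.128/25"), "Example University (hypothetical)"),
    (ipaddress.ip_network("203.0.113.0/24"), "Example Corp (hypothetical)"),
]

def block_holder(addr: str) -> Optional[tuple[str, str]]:
    """Return (block, holder) for the most specific block containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, owner) for net, owner in ALLOCATIONS if ip in net]
    if not matches:
        return None
    net, owner = max(matches, key=lambda m: m[0].prefixlen)  # longest prefix wins
    return str(net), owner

print(block_holder("198.51.100.200"))
# ('198.51.100.128/25', 'Example University (hypothetical)')
```

The answer names an organization such as an ISP or a university, which is a starting point for further fact-finding, not an attribution to a person or machine.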
Another potential form of attribution is the “where”: geo-locating the end-point associated with the IP address on the face of the physical landscape. IP addresses are not allocated in a way that makes geo-location automatic; they are given out to actors that may have large geographic scope. Nonetheless, for many IP addresses, one can make a very accurate guess about where the end-point is located, since many networks have a hierarchical design to their physical connectivity, and map the addresses to
30 The restriction to encrypted communication is critical here. If the observer is using technology called Deep Packet Inspection, or DPI, he can observe anything not encrypted, including identity credentials being exchanged end-to-end. Encryption does not hide everything; it is possible, for example, to determine the type of traffic (e.g., VoIP or video) even while the content itself is hidden.