The Henry and Bryna David Lecture

MICHAEL KEARNS

Economics, Computer Science, and Policy

Cross-fertilization of ideas and techniques between economics and computer science is yielding fresh insights that can help inform policy decisions.

Perhaps as little as a decade ago, it might have seemed far-fetched for scientists to apply similar methodologies to problems as diverse as vaccination against infectious disease, the eradication of email spam, screening baggage for explosives, and packet forwarding in computer networks. But there are at least two compelling commonalities between these and many other problems. The first is that they can be expressed in a strongly economic or game-theoretic framework. For instance, individuals deciding whether to seek vaccination against a disease may consider how infectious the overall population is, which in turn depends on the vaccination decisions of others. The second commonality is that the problems considered take place over an underlying network structure that may be quite complex and asymmetric. The vulnerability of a party to infectious disease or spam or explosives depends strongly on the party’s interactions with other parties.

The growing importance of network views of scientific and social problems has by now been well documented and even popularized in books such as Malcolm Gladwell’s The Tipping Point, but the central relevance of economic principles in such problems is only beginning to be studied and understood. The interaction between the network and economic approaches to diverse and challenging problems, as well as the impact that this interaction can have on matters of policy, are the subjects I will explore here. And nowhere is this interaction more relevant and actively studied than in the field of computer science.

Research at the intersection of computer science and economics has flourished in recent years and is a source of great interest and excitement for both disciplines. One of the drivers of this exchange has been the realization that many aspects of our most important information networks, such as the Internet, might be better understood, managed, and improved when viewed as economic systems rather than as purely technological ones. Indeed, such networks display all of the properties classically associated with economic behavior, including decentralization, mixtures of competition and cooperation, adaptation, free riding, and tragedies of the commons.

I will begin with simple but compelling examples of economic thought in computer science, including its potential applications to policy issues such as the management of spam. Later, I will argue that the power and scale of the models and algorithms that computer scientists have developed may in turn provide new opportunities for traditional economic modeling.

The economics of computer science

The Internet provides perhaps the richest source of examples of economic inspiration within computer science. These examples range from macroscopic insights about the economic incentives of Internet users and their service providers to very specific game-theoretic models for the behavior of low-level Internet protocols for basic functionality, such as packet routing. Across this entire range, the economic insights often suggest potential solutions to difficult problems.

To elaborate on these insights, let us begin with some background. At practically every level of detail, the Internet exhibits one of the most basic hallmarks of economic systems: decentralization. It is clear that the human users of the Internet are a decentralized population with heterogeneous needs, interests, and incentives. What is less widely known is that the same statement applies to the organizations that build, manage, and maintain what we call monolithically the Internet. In addition to being physically distributed, the Internet is a loose and continually changing amalgamation of administratively and economically distinct and disparate subnetworks (often called autonomous systems). These subnetworks vary dramatically in size and may be operated by institutions that simply need to provide local connectivity (such as the autonomous system administered by the University of Pennsylvania), or they may be in the business of providing services at a profit (such as large backbone providers like AT&T). There is great potential for insight from studying the potentially competing economic incentives of these autonomous systems and their users. Indeed, formal contractual and financial agreements between different autonomous systems specifying their connectivity, exchange of data and pricing, and other interactions are common.

Against this backdrop of decentralized administration, a number of prominent researchers have posited that many of the most common problems associated with the Internet, such as email spam, viruses, and denial-of-service attacks, are fundamentally economic problems at their core. They may be made possible by networking technology, and one may look for technological solutions, but it is often more effective to attack these problems at their economic roots.

For example, many observers argue that problems such as spam would be best addressed upstream in the network. They contend that it is more efficient to have Internet service providers (ISPs) filter spam from legitimate mail, rather than to have every end user install spam protection. But such purely technological observations ignore the question of whether the ISPs have an economic incentive to address such problems. Indeed, it has been noted that some ISPs have contractual arrangements with their corporate customers that charge fees based on the volume of data carried to and from the customer. Thus, in principle, an ISP could view spam or a denial-of-service attack as a source of potential revenue.

An economic view of the same problem is that spam has proliferated because the creation of a nearly free public resource (electronic mail) whose usage is unlimited has resulted in a favorable return on investment for email marketing, even under infinitesimal take rates for the products or services offered. One approach is to accept this economic condition and pursue technological defenses such as spam filters or whitelists and blacklists of email addresses. An alternative is to seek to alter the economic equation that makes spam profitable in the first place, by charging a fee for each email sent. The charge should be sufficiently small that email remains a nearly free resource (aside from Internet access costs) for nearly all non-spammers, but sufficiently large to eradicate or greatly reduce the spammer’s profitability. There are many challenging issues to be worked out in any such scheme, including who is to be paid and how to aggregate all the so-called micropayments. But the mere fact that computer scientists are now incorporating real-world economics directly into their solutions or policy considerations represents a significant shift in their view of technology and its management.
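
To make the arithmetic concrete, here is a toy calculation in Python (all numbers are invented for illustration and are not drawn from the lecture) showing how even a one-cent charge per message could flip a spam campaign from profitable to ruinous while remaining negligible for ordinary users.

```python
# Hypothetical illustration of the economics described above (all numbers invented).
# A spammer profits whenever expected revenue per message exceeds the cost of sending it.

messages_sent = 10_000_000          # size of a hypothetical spam campaign
take_rate = 1e-5                    # assumed fraction of recipients who buy
revenue_per_sale = 20.00            # assumed dollars earned per response
cost_per_message = 0.00001          # near-zero cost of sending email today (assumed)
proposed_fee = 0.01                 # a hypothetical one-cent per-message charge

def campaign_profit(fee_per_message):
    revenue = messages_sent * take_rate * revenue_per_sale
    cost = messages_sent * (cost_per_message + fee_per_message)
    return revenue - cost

print("Profit with free email: $%.2f" % campaign_profit(0.0))
print("Profit with 1-cent fee: $%.2f" % campaign_profit(proposed_fee))
```

Under these invented numbers, the campaign nets roughly $1,900 when email is essentially free but loses nearly $100,000 once each message costs a cent, while a user sending twenty legitimate messages a day would pay only about twenty cents.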

As an example of economic thought at the level of the Internet’s underlying protocols, consider the problem of routing, the multi-hop transmission of data packets across the network. Although a delay of a second or two is unimportant for email and many other Internet operations, it can be a serious problem for applications such as teleconferencing and Internet telephony, where any latency in transmission severely degrades usefulness. For these applications, the goal is not simply to move data from point A to point B in the Internet, but to find the fastest possible route among the innumerable possible paths through the distributed network. Of course, which route is the fastest is not static. The speed of electronic traffic, like the speed of road traffic, depends on how much other traffic is taking the same route, and the electronic routes can be similarly disrupted by “accidents” in the form of temporary outages or failures of links.

Recently, computer scientists have begun to consider this problem from a game-theoretic perspective. In this formulation, one regards a network user (whether human or software) as a player in a large-population game in which the goal is to route data from one point to another in the network. There are many possible paths between the source and destination points, and these different paths constitute the choice of actions available to the player. Being “rational” in this context means choosing the path that minimizes the latency suffered in routing the data. A series of striking recent mathematical results has established that the “price of anarchy”— a measure of how much worse the overall latency can be at competitive equilibrium in comparison to the best “socialist” or centrally mandated nonequilibrium choice of routes—is surprisingly small under certain conditions. In other words, in many cases there is not much improvement in network behavior to be had from even the most laborious centralized network design. In addition to their descriptive properties, such results also have policy implications. For example, a number of plausible schemes for levying taxes on transmission over congested links of the network have been shown to significantly reduce the price of anarchy.
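
The flavor of these results can be seen in a standard textbook example (Pigou's two-link network, which is not discussed in the lecture itself): even when every user selfishly picks the faster of two parallel roads, the resulting average delay is at most 4/3 of what the best centrally planned routing could achieve.

```python
# A standard textbook illustration (Pigou's example), not taken from the lecture.
# One unit of traffic travels from A to B over two parallel links:
#   link 1 has fixed latency 1; link 2 has latency equal to the fraction x of traffic using it.

def total_latency(x):
    """Average latency when a fraction x of the traffic uses the congestible link 2."""
    return (1 - x) * 1.0 + x * x   # (1-x) travelers suffer latency 1; x travelers suffer latency x

# Selfish equilibrium: link 2 is never slower than link 1 (x <= 1), so all traffic takes it.
equilibrium_cost = total_latency(1.0)                                  # = 1.0

# Socially optimal split: minimize average latency over x (the optimum here is x = 1/2).
optimal_cost = min(total_latency(x / 1000.0) for x in range(1001))     # ~ 0.75

print("price of anarchy ~", equilibrium_cost / optimal_cost)           # ~ 4/3
```

Selfish routing sends everyone down the congestible link for an average delay of 1, whereas splitting the traffic evenly achieves 0.75, so anarchy costs a factor of about 4/3; results of the kind referred to above show that for latencies of this linear form, no network, however complicated, fares worse than this.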

These examples are just some of the many cases of computer scientists using the insights of economics to solve problems. Others include the study of electronic commerce and the analysis and design of complex digital markets and auctions.

The computer science of economics

The flow of ideas between computer science and economics runs in both directions, as some economists have begun to apply the insights and methods of computer science to new and old problems. The computer scientist's interest in economics has been accompanied by an explosion of research on algorithmic issues in economic modeling, due in large part to the fact that the economic models being entertained in computer science are often of extraordinarily large dimension. In the game-theoretic routing example discussed above, the number of players equals the number of network users, and the number of actions equals the number of routes through the network. Representing such models in the so-called normal form of traditional game theory (where one explicitly enumerates all the possibilities) is infeasible. In recent years, computer scientists have been examining new ways of representing or encoding such high-dimensional models.
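
A back-of-the-envelope calculation (my own illustration, not a figure from the lecture) shows why the explicit normal form breaks down at scale: with n players who each choose among A actions, it must record a payoff for every player under every joint choice of actions, or n times A to the nth power numbers in all.

```python
# Rough illustration of why the explicit normal-form representation is infeasible at scale.
# With n players and A actions each, the normal form stores one payoff per player
# for every joint choice of actions: n * A**n numbers in all.

def normal_form_entries(n_players, n_actions):
    return n_players * n_actions ** n_players

print(normal_form_entries(2, 2))      # 8 entries: a textbook two-player game
print(normal_form_entries(100, 10))   # ~10^102 entries: hopeless for a routing game
```

Even a modest routing game with 100 users and 10 candidate routes each would require on the order of 10 to the 102nd power numbers, which is why compact encodings, and algorithms that exploit them, are essential.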

Such new encodings are of little value unless there are attendant algorithms that can manipulate them efficiently (for instance, performing equilibrium and related computations). Although the computational complexity of certain basic problems remains unresolved, great strides have been made in the development of fast algorithms for many high-dimensional economic models. In short, it appears that from a computational perspective, many aspects of economic and game-theoretic modeling may be ready to scale up. We can now undertake the construction and algorithmic manipulation of numerical economic models whose complexity greatly exceeds what one could have contemplated a decade ago.

Finally, it also turns out that the analytical and mathematical methods of computer science are extremely well suited to examining the ways in which the structure of an economic model might influence its expected outcomes: for instance, the way in which the topology of a routing network might influence the congestion experienced at game-theoretic equilibrium, the way in which the connectivity pattern of a goods exchange network might influence the variation in prices or the distribution of wealth, or (as we shall see shortly) the way in which transfers of passengers between air carriers might influence their investment decisions for improved security.

Interdependence in computer security

To illustrate some of these computational trends, I will examine a case study drawn from my own work on a class of economic models known as interdependent security (IDS) games, which nicely capture a wide range of commonly occurring risk management scenarios. Howard Kunreuther of the Wharton School at the University of Pennsylvania and Geoffrey Heal of Columbia University introduced the notion of IDS games, which are meant to capture settings in which decisions to invest in risk mitigation may be heavily influenced by natural notions of risk “contagion.” Interestingly, this class is sufficiently general that it models problems in areas as diverse as infectious disease vaccination, corporate compliance, computer network security, investment in research, and airline baggage screening. It also presents nontrivial computational challenges.

Let us introduce the IDS model with another example from computer science, the problem of securing a shared computer resource. Suppose you have a desktop computer with its own software and memory, but you also keep your largest and most important data files on a hard disk drive that is shared with many other users. Your primary security concern is thus that a virus or other piece of malicious software might erase the contents of this shared hard drive. Your desktop computer, with all of its contents (your email, any programs or files you download, and so on), is a potential point of entry for such "malware," but of course so are the desktop machines of all the other users of the hard disk.

Now imagine that you face the decision of whether to download the most recent updates to your standard desktop security software, such as Norton Anti-Virus. This is a distinct investment decision, not so much because of the monetary cost as because it takes time and energy for you to perform the update. If your diligence were the only factor in protecting the valued hard drive, your incentive to suffer the hassle would be high. But it is not the only factor. The safety of the hard drive depends on the diligence of all of the users whose desktop machines present potential points of compromise, since laziness on the part of just a single user could result in the breach that wipes the disk clean forever. Furthermore, some of those users may not keep any important files on the drive and therefore have considerably less concern than you about the drive's safety.

Thus, your incentive to invest is highly interdependent with the actions of the other players in this game. In particular, if there are many users and essentially none of them currently keep their security software updated, your diligence would have at best an incremental effect on an already highly vulnerable disk, and it would not be worth your time to update your security software. At the other extreme, if the others are reliable in their security updates, your negligence would constitute the primary source of vulnerability, so you can have a first-order effect on the disk's safety by investing in the virus updates.

Kunreuther and Heal propose a game-theoretic model for this and many other related problems. Although the formal mathematical details of this model are beyond our current scope, the main features are as follows (a simplified computational sketch appears after the list):

  • Each player (such as the disk users above) in the game has an investment decision (such as downloading security updates) to make. The investment can marginally reduce the risk of a catastrophic event (such as the erasure of the disk).
  • Each player’s risk can be decomposed into direct and indirect sources. The direct risk is that which arises because of a player’s own actions or inactions, and it can be reduced or eradicated by sufficient investment. The indirect risk is entirely in the hands of the rest of the player population. In the current example, your direct risk is the risk that the disk will be erased by malware entering the system through your own desktop machine. Your remaining risk is the indirect risk that the disk will be erased by malware entering through someone else’s machine. You can reduce the former by doing the updates, but you can do nothing about the latter.
  • Rational players will choose to invest according to the tradeoff presented by the two sources of risk. In the current example, you would choose to invest the least update effort when all other parties are negligent (since the disk is so vulnerable already that there is little help you alone can provide) and the most when all other parties are diligent (since you constitute the primary source of risk).
  • The predicted outcomes of the IDS model are the (Nash) equilibria that can arise when all players are rational; that is, the collective investment decisions in which no player can benefit by unilateral deviation. In such an equilibrium, every party is optimizing their behavior according to their own cost/benefit tradeoff and the behavior of the rest of the population.
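
The sketch below illustrates the cost/benefit comparison described in the list; the particular functional form and all numbers are simplifying assumptions of my own for illustration, not the exact Kunreuther-Heal formulation.

```python
# A simplified sketch of the investment decision described above. The specific formula
# and the numbers are assumptions for illustration, not the precise Kunreuther-Heal model.

def should_invest(cost, direct_risk, indirect_risk, loss):
    """
    cost          -- cost of investing in protection (e.g., effort to update software)
    direct_risk   -- probability the catastrophe originates with this player if unprotected
    indirect_risk -- probability the catastrophe arrives via the other players regardless
    loss          -- value lost if the catastrophe occurs
    Investing removes the direct risk but leaves the indirect risk untouched, so it is
    worthwhile only if the expected loss it eliminates exceeds its cost.
    """
    # Expected probability of the catastrophe if not investing: own channel or others'.
    p_no_invest = direct_risk + (1 - direct_risk) * indirect_risk
    # Expected probability if investing: only the indirect channel remains.
    p_invest = indirect_risk
    return (p_no_invest - p_invest) * loss > cost

# When everyone else is negligent, indirect risk is high and investing looks pointless:
print(should_invest(cost=5.0, direct_risk=0.01, indirect_risk=0.9, loss=1000.0))   # False
# When everyone else is diligent, indirect risk is low and investing pays:
print(should_invest(cost=5.0, direct_risk=0.01, indirect_risk=0.0, loss=1000.0))   # True
```

Running the same rule under high and low values of the indirect risk reproduces the two extremes described above: investment is rational only when the rest of the population is already diligent.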

Baggage screening unraveled

In the shared disk example, there is no interesting network structure per se, in the sense that users interact with each other solely through a shared resource, and the effect of any given user is the same on all other users: By being negligent, you reduce the security of the disk by the same amount for everyone, not differentially for different parties. In other words, there are no network asymmetries: All pairs of parties have the same interactions, even though specific individuals may influence the overall outcome differently by their different behaviors.

Kunreuther and Heal naturally first examined settings in which such asymmetries are absent, so that all parties have the same direct and indirect risks. Such models permit not only efficient computation but even the creation of simple formulas for the possible equilibria. But in more realistic settings, asymmetries among the parties will abound, precluding simple characterizations and presenting significant computational challenges. It is exactly in such problems that the interests and strengths of computer science take hold.

A practical numerical and computational example of IDS was studied in recent work done in my group. In this example, the players are air carriers, the investment decision pertains to the amount of resources devoted to luggage screening for explosives, the catastrophic event is a midair explosion, and the network structure arises from baggage transfers between pairs of carriers.

Before describing our experiments, I provide some background. In the United States, individual air carriers determine the procedures and investments they each make in baggage screening for explosives and other contraband, subject to meeting minimum federal requirements. Individual bags are thus subjected to the procedures of whichever carrier a traveler boards at the beginning of a trip. If a bag is transferred from one carrier to another, the receiving carrier does not rescreen according to its own procedures but simply accepts the implicit validation of the first carrier. The reasons for this have primarily to do with efficiency and the cost of repeated screenings. The fact that carriers are free to apply procedures that exceed the federal requirements is witnessed by the practices of El Al Airlines, which is also an exception in that it does in fact screen transferred bags.

As in the shared disk example, there is thus a clear interdependent component to the problem of baggage screening. If a carrier receives a great volume of transfers from other carriers with lax security, it may actually have little incentive to invest in improved security for the bags it screens directly: The explosion risk presented by the transferred bags is already so high that the expense of the marginal improvement in direct check security is unjustified. (Note: For simplicity, I am not considering the expensive proposition of rescreening transferred bags, but only of improving security on directly checked luggage.) Alternatively, if the other airlines maintain extremely high screening standards, a less secure carrier’s main source of risk may be its own checked baggage, creating the incentive for improved screening. Kunreuther and Heal discuss how the fatal explosion aboard Pan Am flight 103 over Lockerbie, Scotland, in 1988 can be viewed as a deliberate exploitation of the interdependent risks of baggage screening.

The network structure in this case arises from the fact that there is true pairwise interaction between carriers (as opposed to the shared disk setting, where all interactions were indirect and occurred via the shared resource). Since not all pairs of airlines transfer bags with each other, and those that do may not do so in equal volume, strong asymmetries may emerge. Within the same network of transfers, some airlines may find themselves receiving many transfers from carriers with lax security, and others may receive transfers primarily from more responsible parties. On a global scale, one can imagine that such asymmetries might arise from political or regulatory practices in different geographical regions, demographic factors, and many other sources. Such a network structure might be expected to have a strong influence on outcomes, since the asymmetries in transfers will create asymmetries of incentives and therefore of behavior.

In the work of my group, we conducted the first large-scale computational and simulation study of IDS games. This simulation was based on a data set containing 35,362 records of actual civilian commercial flight reservations (both domestic and international) made on August 26, 2002. Each record contains a complete flight itinerary for a single individual and thus documents passenger (and therefore presumably baggage) transfers between the 122 commercial air carriers appearing in the data set. The data set contained no identifying information for individuals. Furthermore, since I am describing an idealized simulation based on limited data, I will not identify specific carriers in the ensuing discussion.
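
Although the details of the simulation machinery are beyond the present scope, the sketch below indicates, under invented parameters and deliberately simplified assumptions, how best-response dynamics might be run on an IDS game built from such a transfer network; none of the function names, parameters, or numbers come from the actual study.

```python
# A hypothetical sketch (invented parameters, not the group's actual code) of how
# best-response dynamics could be run on an IDS game defined by a carrier transfer network.

def indirect_risk(carrier, transfers, investing, leak_prob):
    """Risk reaching `carrier` through bags transferred from carriers that do not invest.
    transfers[i][j] = number of bags transferred from carrier i to carrier j (assumed data)."""
    risk = 0.0
    for other, row in transfers.items():
        if other != carrier and not investing[other]:
            risk += row.get(carrier, 0) * leak_prob
    return min(risk, 1.0)

def best_response_dynamics(carriers, transfers, cost, direct_risk, loss, leak_prob=1e-6):
    """Repeatedly let each carrier invest iff the expected loss averted exceeds the cost."""
    investing = {c: False for c in carriers}
    for _ in range(100):                       # cap the number of sweeps
        changed = False
        for c in carriers:
            q = indirect_risk(c, transfers, investing, leak_prob)
            decision = direct_risk * (1 - q) * loss > cost
            if decision != investing[c]:
                investing[c] = decision
                changed = True
        if not changed:                        # a pure-strategy equilibrium has been reached
            break
    return investing

# Toy three-carrier network with invented transfer counts and parameters:
toy_transfers = {"A": {"B": 500}, "B": {"C": 500}, "C": {"A": 500}}
print(best_response_dynamics(["A", "B", "C"], toy_transfers,
                             cost=1.0, direct_risk=1e-4, loss=1e5))
```

In a study of this kind, the transfer counts would instead be read from the reservation records, and the resulting equilibria could be compared across different assumptions about screening costs and risks.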

I will begin by discussing the raw data itself—in particular, the counts of transfers between carriers. Figure 1 shows a visualization of the transfer counts between the 36 busiest carriers (as measured by the total flight legs on the carrier appearing in the data). Along each of the horizontal axes, the carriers are arranged in order of number of flight legs (with rank 1 being the busiest carrier, and rank 36 the least). At each grid cell, the vertical bar shows the number of transfers from one particular carrier to another. Thus, transfers between pairs of the busiest (highest-rank) carriers appear at the far corner of the diagram; transfers between pairs of the least busy carriers in the near corner; and so on.