# Mechanism design

## Overview

Algorithm design vs. mechanism design. The starting point of algorithm design is a specified input-output behavior: $L$ is a list of numbers, and $SORT(L)$ is the same list sorted in ascending order; $f$ is a boolean formula, and $SAT(f)$ is a satisfying assignment – if one exists; $D$ is a list of labeled examples, and $Model(D)$ is a trained model for labeling previously unseen examples. Some of these tasks are incredibly complex. Much of computer science (and applied mathematics) is about finding new ways of performing these tasks. No matter how complex the algorithmic task at hand, it is normally assumed that the input $x$ does not depend on the algorithm $A$.

When the inputs to an algorithm $A$ come from outside parties interested in the output of the algorithm, this assumption breaks down. For example, underlying an auction is typically a simple optimization algorithm (with a single item for sale, the algorithm is just the $max$ function). However, inputs come from bidders who might (and will) optimize their bids to get an improved outcome (for example, get the item for as little money as possible). The field of mechanism design studies designing algorithms where the inputs are provided by self-interested (and potentially strategic) agents. Mechanism design draws on game theory for its modeling foundations. Algorithmic mechanism design specifically studies settings where both the algorithmic and game-theoretic aspects play an important role.

Within theoretical computer science, there are several reasons to study algorithmic mechanism design now. Algorithmic mechanism design is becoming more important as algorithms enter more application domains. Even in areas where the “algorithms” involved are fairly simple, the availability of data and cheap compute means that it is easy for participants to be more strategic. This means that systems design needs to incorporate mechanism design thinking. Ideally, a set of transformations and heuristics would make this process automatic or at least routine. On the other hand, algorithms (along with data and hardware) are evolving at a rapid rate, giving rise to exciting new opportunities for making systems better using mechanisms that incorporate these algorithmic advances.

While both mechanism design and algorithm design and deployment are progressing rapidly, algorithms are being developed and deployed at a faster rate. This leads to a growing gap between the areas and ways in which algorithms are used and mechanism design theory and craft. My overall goal is to develop generic tools for narrowing this gap. In technical terms, this means developing tools, techniques, and concepts for reductions from algorithms to mechanisms.

## Black-box reductions from algorithms to mechanisms without money

Mechanism design without money. Abstractly, mechanism design addresses the problem of aggregating preferences of multiple parties into an outcome. Examples of mechanisms include the housing market, voting, college admissions, and the electricity market. Mechanisms can be (roughly) partitioned into mechanisms with and without money. Of the examples above, the housing and electricity markets explicitly involve money, while voting and college admissions do not. Mechanism design without money is generally very challenging. Even the relatively “simple” case of voting is rife with impossibility results.

The VCG mechanism with money. Mechanism design with money is also challenging, but it admits several very general solution concepts, such as the celebrated Vickrey–Clarke–Groves mechanism. In one sentence the VCG mechanism can be summarized as “optimize outcome, charge each player their externality”. A player’s externality is the effect that player has on the welfare of the other players. For example, in the special case of a single-item auction, VCG becomes the second price auction: the winner deprives the other players of the item, and the winner’s externality is the highest value for the item among the remaining players. The VCG mechanism has some wonderful properties, including dominant-strategy truthfulness: players never benefit from misreporting their preferences. It also has some significant limitations. In addition to requiring money, the mechanism is quite unstable numerically, making it challenging to combine with heuristic algorithms.
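In the single-item case the recipe is short enough to write down. A minimal sketch (the helper name `single_item_vcg` is ours), assuming truthful bids equal to the players' values:

```python
# Toy single-item VCG auction: the winner pays their externality, i.e. the
# highest bid among the OTHER players (the second-highest bid overall).

def single_item_vcg(bids):
    """Return (winner_index, payment) for a single-item VCG auction."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    # Welfare the remaining players lose: the best bid excluding the winner.
    payment = max(b for i, b in enumerate(bids) if i != winner)
    return winner, payment

# Example: the highest bidder wins and pays the second-highest bid.
winner, payment = single_item_vcg([30, 10, 25])
# winner == 0, payment == 25
```

Truthfulness is easy to see here: the winner's payment does not depend on their own bid, so overbidding or underbidding can only cost them.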

Repeated games vs. single shot. The main reason why mechanism design with money is “easier” than mechanism design without money is that money essentially turns each game into a repeated game. Someone who doesn’t win in an auction gets to keep their money for future use. It is much easier to sustain “good” behavior (such as truthful reporting) in a repeated game. Even without money, it is much easier to run a repeated mechanism by introducing a points currency. The currency has no value outside of the repeated mechanism but can be used to connect rounds of the mechanism. For example, if we need to repeatedly allocate tasks to people, we can create a point system where less desirable tasks “pay” more points (with rates determined via a market-like mechanism), and where, over the long run, all participants are expected to “earn” the same number of points.

Continuous optimization based on local search. Suppose we need to maximize a function $F(x)$ on some domain $\mathcal{X}$. A surprisingly powerful approach is gradient ascent (the maximization analogue of gradient descent): make a sequence of small local steps, each improving the value of $F$ slightly. This method is guaranteed to converge to a local maximum (and to the global maximum when $F$ is concave). Gradient methods and their variants, such as stochastic gradient descent, play an important role in online learning, optimization, and machine learning model training. Motivated primarily by the latter application, these algorithms have seen a lot of activity, both in terms of theory and implementation. Most optimization algorithms can be recast in terms of a local gradient-based search. It should be noted that while the local procedure is typically quite simple, the overall algorithm here may be quite complicated, with a lot of effort going into tweaking the objective function (e.g. through regularization), step sizes (aka learning rates) and other meta-parameters.
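A minimal sketch of the local-step idea, maximizing a one-dimensional concave function (the function, learning rate, and step count are illustrative choices of ours):

```python
# Gradient ascent sketch: repeatedly take a small step along the gradient.

def gradient_ascent(grad, x0, lr=0.1, steps=200):
    """Maximize a function given its gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x += lr * grad(x)  # small local step that slightly improves F
    return x

# F(x) = -(x - 3)^2 is concave with F'(x) = -2 * (x - 3);
# its unique maximum is at x = 3.
x_star = gradient_ascent(lambda x: -2 * (x - 3), x0=0.0)
# x_star converges to (approximately) 3.0
```

With this step size the distance to the maximizer shrinks by a constant factor every step, which is the concave-case convergence guarantee in miniature.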

Putting the pieces together: a new optimization-to-mechanism reduction. We are given a gradient-based optimization procedure, and wish to turn it into a mechanism that incentivizes participants to reveal their types truthfully. Our proposed APEX (Adaptive Pricing Equalizing Externalities) framework does this by applying VCG locally to each optimization step. This sidesteps two of the main difficulties of applying VCG in combination with optimization algorithms: (1) by making the steps local, the optimization problem becomes very simple, avoiding the sensitivity of VCG to heuristic errors; (2) by taking many small steps, the game becomes repeated, essentially turning the problem without money into one where money-like points are used across rounds.

New results. While the applied side of the framework will need to be validated experimentally, theoretical results have been very promising. Consider the setting of one-sided allocation without money, such as allocating students to school slots. This setting is “one-sided” because only one side of the match has preferences over the other side. There are $n$ participants that need to be allocated $n$ items – one item each. A classical result of Hylland and Zeckhauser (1979) showed that the allocation problem can be solved using a market-like mechanism called a competitive equilibrium from equal incomes (CEEI). Each item $j$ is assigned a price $P_j$; each player is given one unit of tokens. Each player “buys” her favorite distribution (fractional allocation) of items at prices $P_j$ without exceeding her budget. The [HZ79] result states that there are always market-clearing prices. Notably, these prices and allocations are known to not be uniquely determined by the players’ utilities.

By plugging the one-sided allocation problem into the APEX framework, we obtain a sharpening of the [HZ79] result: there is a scaling of players’ utilities, such that if we run the unit-demand VCG auction with these scaled utilities, the resulting prices are HZ prices. Interestingly, this is a proper subset of all possible HZ prices/equilibria: there are HZ allocations that are not supported by any VCG prices.

The motivation for this work comes from pricing insurance reimbursement for Medicare Advantage, but it is part of a much broader challenge of adapting problems in classical mechanism design to the “big data” regime. Informally, the “big data” regime is a regime where mining data for information, while not infinitely useful, is still useful (so more data leads to better information).

We consider a scenario where a principal wishes to outsource the care of some patients to a private insurance plan. Let us say that the care of a patient $p$ costs $C_{pub}(p)$ in the public system, and $C_{priv}(p)$ under private insurance. Note that none of the costs are known a priori, and the best one can do is estimate them from known characteristics. The principal’s goal here is to achieve an efficient allocation. Simplifying, and assuming the costs $C_{pub}(p)$ and $C_{priv}(p)$ account for quality as well as cost, an ideal mechanism would allocate $p$ to the private plan if $C_{priv}(p)<C_{pub}(p)$, and to the public plan otherwise. Patient costs are estimated using past cost data and patient characteristics.

If the principal could just allocate patients between the plans, it could produce the first-best allocation by estimating costs, and sending patients whose expected $C_{priv}(p)$ is lower than $C_{pub}(p)$ to the private plan. However, in reality, the principal is unable to just allocate patients. It writes a reimbursement contract, and then lets the private agent recruit patients: this is exactly how Medicare Advantage works. Therefore, the principal would like to design a contract that would cause the private agent to select patients if and only if $C_{priv}(p)<C_{pub}(p).$

If only a little data were available about each patient (so that the number of patients $\gg$ the data dimension), the principal could calculate a very accurate unbiased estimate $E(p)$ of $C_{pub}(p)$ given the data, and write $E(p)$ as the reimbursement. If the private agent also only has access to the same data, the only thing it can do is to produce an estimate of $C_{priv}(p)$, and select the patient if that estimate is lower than the reimbursement $E(p) \approx C_{pub}(p)$. If, on the other hand, the private agent has access to more data, it can produce a better estimate of $C_{priv}(p)$, and exploit the principal by attracting patients with $$C_{pub}(p) < C_{priv}(p) < E(p).$$ For example, if reimbursement is set based on patient age, but the private plan has access to some health information, it can exploit the scheme by attracting older patients who are unusually healthy for their age.

If rich data on an unlimited number of patients were available, the principal could produce extremely accurate estimates $E_{rich}(p)$ of $C_{pub}(p)$ given the data, and write $E_{rich}(p)$ as the reimbursement. The richness assumption would guarantee that the private plan has no way of obtaining more accurate estimates, and that only selecting patients with $C_{priv}(p) < E_{rich}(p) \approx C_{pub}(p)$ is the best response, leading to efficient selection. Here we rely crucially on having an unlimited number of patients to produce good estimates $E_{rich}$. If the private agent is able to use more data to produce better estimates, it will be able to again exploit the system as in the little data regime.

In practice, we are in what we call the big data regime – there is enough data to produce good cost estimates, but with more data and effort the private agent can refine these estimates and select over-reimbursed patients. This gap becomes more significant as data becomes high dimensional (so that even over millions of patients estimation errors are non-negligible), and advances in statistics and machine learning lead to better estimation techniques. To counter this problem, we propose a new class of strategic capitation mechanisms: instead of writing a contract in terms of a fixed formula $E(p)$ based on estimated public cost, we split the contract into two terms: (1) the first term is a fixed unbiased estimator based on a few relevant characteristics, as in the little data regime; (2) the second term is only calculated after selection is realized (e.g. at the end of the year), and accounts for all available data, as in the rich data regime. Importantly, for a private agent that does not actively engage in selection, the second term will be close to $0$. At the same time, since the second term is not revealed until after selection is realized, an agent who wishes to engage in strategic selection will not be able to use the estimator’s statistical flaws to its advantage (since these flaws are only revealed after selection has already happened).
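The effect of the two-term contract can be illustrated with a toy simulation. Everything below is hypothetical (the cost model, numbers, and names are ours, not calibrated to any Medicare data): a fixed age-based reimbursement is exploitable by an agent that observes a hidden health flag, while an ex-post term that moves the payment to the all-data estimate removes the selection profit.

```python
import random

random.seed(0)

# Each patient has an observable age group and a hidden health flag.
# Public and private costs are taken to be equal, so any profit the
# private agent makes comes purely from selection against the contract.
def cost(age, healthy):
    return 100 + 20 * age - 30 * healthy

patients = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(10_000)]

# Fixed contract: reimbursement is the average cost within each age group
# (an unbiased estimator based only on the observable characteristic).
avg = {a: sum(cost(a2, h) for a2, h in patients if a2 == a) /
          sum(1 for a2, _ in patients if a2 == a) for a in (0, 1)}

# The agent observes health and selects exactly the over-reimbursed
# (unusually healthy) patients.
selected = [(a, h) for a, h in patients if cost(a, h) < avg[a]]
profit_fixed = sum(avg[a] - cost(a, h) for a, h in selected)

# Two-term contract: an ex-post adjustment, computed from all data after
# selection, moves the payment to the rich-data estimate (here: the true
# cost). For a non-selecting agent this adjustment averages to ~0, since
# the first term is unbiased; for a selecting agent it cancels the gain.
profit_adjusted = sum((avg[a] + (cost(a, h) - avg[a])) - cost(a, h)
                      for a, h in selected)

# profit_fixed is strictly positive; profit_adjusted is (essentially) zero.
```

The simulation is the little-data exploitation story in miniature: the selected set consists exactly of the healthy patients, and only the ex-post term neutralizes the advantage of the agent's extra information.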

More broadly, the framework we propose allows a principal who is computationally weaker or has less data to use the time dimension to incentivize efficient behavior by more sophisticated agents: part of the contract is revealed only ex-post. The ex-post part is designed so that simple agents are not affected by it, while, in expectation, it has statistical properties that would be unattainable for a fixed ex-ante contract, ensuring that even sophisticated agents cannot exploit it.

## Mechanism design for learning agents

The main challenge algorithmic mechanism design aims to address is implementing algorithmically-optimal solutions in environments with self-interested parties. Classic algorithm design assumes that players are truthful (i.e. always reveal their true preferences and participate in the algorithm in prescribed ways). Classic game theory assumes that players best respond in equilibrium: that is, they take actions/reveal information to maximize their own utility. In most cases, there is a significant gap between what is attainable in a game-theoretic equilibrium, and what is attainable algorithmically (this gap is known as the price of anarchy).

When the stakes are sufficiently high, ignoring incentives will lead to unraveling, and is not an option. On the other hand, without additional assumptions, it is often impossible to compute equilibrium solutions of most games. The issue is not just computational complexity, but also the fact that the information environments faced by individual players are often incomplete and hard to model.

In practice, this means that beyond designing mechanisms where non-strategic behavior always dominates strategic behavior, it is difficult to make use of equilibrium concepts to predict players’ behavior. In multi-round games (such as repeated auctions or procurement), a slightly weaker – but algorithmically much more friendly – model of players’ behavior is to assume that the players are learning to play. This is a realistic model of players’ behavior, especially since in many cases it is impossible to separate learning the environment from learning to strategically interact with it. For example, an algorithm optimizing an online ad campaign will use online learning to tweak “strategic” aspects of the campaign (such as how much to bid for ad placement) and “non-strategic” ones (such as optimizing ads to increase conversion).

Once we have an explicit model for players’ behavior based on online learning, we can obtain new insights in scenarios which would be out of reach for “standard” equilibrium analysis. We can also introduce new strategic scenarios (such as multi-armed bandits with strategic arms), which arise specifically when one side runs an online learning algorithm.

Preliminary results show that mechanism design with learning agents indeed falls somewhere between non-strategic algorithm design and standard mechanism design based on Nash equilibria. For example, in “Selling to a no-regret buyer” we revisit the classical scenario of a seller (repeatedly) selling a single item to buyers drawn from a distribution. A classical theorem by Myerson characterizes the optimal revenue attainable with strategic buyers. Note that the maximum revenue attainable with non-strategic buyers is just the aggregate utility that the buyers get from the items. We show that if the buyers are learning to repeatedly bid in the auction using a low-regret algorithm (such as multiplicative weights update), the seller can extract significantly more revenue than Myerson’s mechanism (while also improving overall utility). At the same time, there is still a gap between what we can extract and the non-strategic optimal.
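The flavor of “learning to bid” can be sketched with a toy full-information multiplicative-weights (Hedge) buyer. The repeated posted-price setup below is our illustrative stand-in, not the mechanism analyzed in the paper: each round the buyer picks a bid from a grid, wins if the bid meets the posted price, pays that price, and then observes the price, so the counterfactual reward of every bid is known.

```python
import math
import random

random.seed(1)

v = 1.0                              # buyer's value for the item
bids = [i / 10 for i in range(11)]   # bid grid 0.0, 0.1, ..., 1.0
weights = [1.0] * len(bids)          # multiplicative weights over bids
eta = 0.5                            # learning rate

total_utility = 0.0
for t in range(2000):
    price = random.choice([0.3, 0.6, 0.9])
    # Sample a bid in proportion to the current weights.
    choice = random.choices(range(len(bids)), weights=weights)[0]
    if bids[choice] >= price:
        total_utility += v - price
    # Full-information multiplicative-weights update on every bid.
    for i, b in enumerate(bids):
        reward = (v - price) if b >= price else 0.0
        weights[i] *= math.exp(eta * reward)
    # Renormalize to keep the weights numerically bounded.
    m = max(weights)
    weights = [w / m for w in weights]

# With v = 1 every posted price is worth paying, so the weights
# concentrate on bids of 0.9 and above, which win every round.
```

The low-regret guarantee is what a strategic seller can lean on: the buyer's bids become predictable functions of its rewards, which is precisely the handle exploited in the revenue result above.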

Perhaps the best-studied online-learning scenario is that of multi-armed bandits, where the learner has to repeatedly select one of $k$ options, and only learns the reward of the selected option after it has been irrevocably selected. Multi-armed bandits (and their extensions to contextual bandits) capture many of the issues involved in online decision making. Examples range from repeatedly selecting a service contractor, to choosing which ads to display based on a query, to selecting a caching policy on a server. Importantly, in the classical bandit setting it is assumed that none of the parties are strategic.
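A minimal sketch of a standard algorithm for this non-strategic setting (UCB1, with Bernoulli arms whose reward means are our illustrative choices):

```python
import math
import random

random.seed(0)

# UCB1 sketch: repeatedly pick an arm, observe only that arm's reward,
# and balance exploration and exploitation via a confidence bonus.
def ucb1(arm_means, rounds=5000):
    k = len(arm_means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1                 # play each arm once to initialize
        else:
            # Empirical mean plus an optimism bonus that shrinks with pulls.
            arm = max(range(k), key=lambda i:
                      sums[i] / counts[i] +
                      math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8])
# The best arm (mean 0.8) is pulled the vast majority of the time.
```

The guarantee behind this sketch assumes the arms' reward distributions are fixed; the strategic-arms setting below is exactly the case where that assumption fails.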

In “Multi-armed bandit problems with strategic arms” we revisit the bandits model in the case where the arms are strategic. Consider the scenario where the learner needs to repeatedly hire a contractor to do a job. In this case, the goal of the learner is to always pick the best contractor in terms of $utility_t - cost_t$: the difference between its utility and its cost. When the utilities and costs are fixed (even adversarially), it is known from online learning theory (and practice) that this goal is attainable. However, if the “arms” are contractors, they will behave strategically to maximize their reward in the game. For example, they might misrepresent their costs in order to collect rewards in later rounds; a contractor may charge a very low fee to steer the learner away from its competitors, and then raise its prices in later rounds. Additionally, tacit collusion may emerge between the arms.

Both projects above demonstrate that applying “learning” to strategic problems is fraught even when the underlying learning problem is well-understood: whenever one party is learning, other parties may exploit it by “teaching” it to behave in their interest.