A Bellman Equation About Nothing

Cold Open [Introduction]

A few years ago I came across a short paper that I desperately wanted to understand. The magnificent title was “An Option Value Problem from Seinfeld” and the author, Professor Avinash Dixit (of Dixit-Stiglitz model fame), therein discussed methods of solving for “sponge-worthiness.” I don’t think I need to explain why I was immediately drawn to an academic article that focuses on Elaine Benes, but for those of you who didn’t learn about the realities of birth control from this episode of 1990s television, allow me to briefly explain the relevant Seinfeld-ism. The character Elaine Benes[1] loyally uses the Today sponge as her preferred form of contraception. However, one day it is taken off the market and, after trekking all over Manhattan, our heroine manages to find only one case of 60 sponges to purchase. The finite supply of sponges poses a daunting question to Elaine… namely, when should she choose to use a sponge? Ie, when is a given potential partner sponge-worthy?

JERRY: I thought you said it was imminent.

ELAINE: Yeah, it was, but then I just couldn’t decide if he was really sponge-worthy.

JERRY: “Sponge-worthy”?

ELAINE: Yeah, Jerry, I have to conserve these sponges.

JERRY: But you like this guy, isn’t that what the sponges are for?

ELAINE: Yes, yes – before they went off the market. But I mean, now I’ve got to re-evaluate my whole screening process. I can’t afford to waste any of ’em.

–“The Sponge” [Seinfeld Season 7 Episode 9]

As an undergraduate reading Professor Dixit’s introduction, I felt supremely excited that an academic article was going to delve into the decision-making processes of one of my favorite fictional characters. However, the last sentence in the introduction gave me pause: “Stochastic dynamic programming methods must be used.” Dynamic programming? Suffice it to say that I did not grasp the methodological context or mathematical machinery embedded in the short and sweet paper. After a few read-throughs, I filed wispy memories of the paper away in some cluttered corner of my mind… Maybe one day this will make more sense to me… 

Flash forward to August 2016. Professor David Laibson, the economics department chair, explains to us fresh-faced G1’s (first-year PhD’s) that he will be teaching us the first part of the macroeconomics sequence… Dynamic Programming. After a few days of talking about Bellman equations, I started to feel as if I had seen related work in some past life. Without all the eeriness of a Westworld-esque robot, I finally remembered the specifics of Professor Dixit’s paper and decided to revisit it with Professor Laibson’s lectures in mind. Accordingly, my goal here is to explain the simplified model set-up of the aforementioned paper and illustrate how basics from dynamic programming can be used in “solving for spongeworthiness.”

Act One [The Model]

Dynamic programming refers to taking a complex optimization problem and splitting it up into simpler recursive sub-problems. Consider Elaine’s decision as to when to use a sponge. We can model this as an optimal stopping problem–ie, when should Elaine use the sponge and thus give up the option value of holding it into the future? The answer lies in the solution to a mathematical object called the Bellman equation, which will represent Elaine’s expected present value of her utility recursively.

Using a simplified version of the framework from Dixit (2011), we can explain the intuition behind setting up and solving a Bellman equation. First, let’s lay out the modeling framework. For the sake of computational simplicity, assume Elaine managed to acquire only one sponge rather than the case of 60 (Dixit assumes she has a general m sponges in his set-up, so his computations are more complex than mine). With that one glorious sponge in her back pocket, Elaine goes about her life meeting potential partners, and yada yada yada. To make the yada yada’s explicit, we say Elaine lives infinitely and meets one new potential partner every day t who is of some quality Qt. Elaine is not living a regular continuous-time life; instead, she gets one romantic option each time period. This sets up the problem in discrete time since Elaine’s decisions are day-by-day rather than infinitesimally-small-moment-by-infinitesimally-small-moment. If we want to base this assumption somewhat in reality, we could think of Elaine as using Coffee Meets Bagel, a dating app that yields one match per day. Ie, one “bagel” each day.

Dixit interprets an individual’s quality as the utility Elaine receives from sleeping with said person. Now, in reality, Elaine would only be able to make some uncertain prediction of a person’s quality based on potentially noisy signals. The corresponding certainty equivalent [the true quality metric] would be realized after Elaine slept with the person. In other words, there would be a distinction between ex post and ex ante quality assessments—you could even think of a potential partner as an experience good in this sense. (Sorry to objectify you, Scott Patterson.) But, to simplify our discussion, we assume that true quality is observable to Elaine—she knows exactly how much utility she will gain if she chooses to sleep with the potential partner of the day. In defense of that assumption, she does vet potential partners pretty thoroughly.

Dixit also assumes quality is drawn from a uniform distribution over [0,1] and that Elaine discounts the future exponentially by a factor of δ in the interval (0,1). Discounting is a necessary tool for agent optimization problems since preferences are time dependent. Consider the following set-up for illustrative purposes: Say Elaine gains X utils from eating a box of Jujyfruits today. Then, using our previously defined discount factor, she would gain δX from eating the box tomorrow, δ²X from eating it the day after tomorrow, and so on. In general, she gains δⁿX utils from consuming it n days into the future, thus the terminology “exponential discounting.” Given the domain for δ, we know unambiguously that X > δX > δ²X > … and on (assuming the candy yields positive utility, which clearly it must given questionable related life decisions). That is, if the box of candy doesn’t change between periods (it is always X), Elaine will prefer to consume it in the current time period. Ie, why wait if there is no gain from waiting? On the other hand, if Elaine wants to drink a bottle of wine today that yields Y utils, but the wine improves by a factor of w>1 each day, then whether she prefers to drink it today or tomorrow depends on which is greater: Y, the present utility gain of the current state of the wine, or δ(wY), the discounted utility gain of the aged (improved) wine. (Ie, if δw>1, she’ll wait for tomorrow.) If Elaine also considers up until n days into the future, she will be comparing Y, δ(wY), δ²(w²Y), …, and δⁿ(wⁿY).
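To see the two comparisons side by side, here is a minimal sketch. The numbers (δ = 0.9, X = Y = 10, w = 1.2) are illustrative assumptions of mine, not values from Dixit’s paper, and the function names are made up for this example.

```python
def discounted_candy(x, delta, n):
    """Present value of a static payoff x consumed n days from now."""
    return delta ** n * x


def discounted_wine(y, delta, w, n):
    """Present value of wine worth y today that improves by a factor w per day."""
    return (delta * w) ** n * y


# Illustrative numbers only (assumptions, not taken from the paper).
delta, x, y, w = 0.9, 10.0, 10.0, 1.2

candy = [discounted_candy(x, delta, n) for n in range(4)]
wine = [discounted_wine(y, delta, w, n) for n in range(4)]

print(candy)  # falls every day: no gain from waiting
print(wine)   # rises every day here, because delta * w > 1
```

With these assumed numbers the candy sequence shrinks while the wine sequence grows, which is exactly the δw > 1 case described above.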

In our set-up Elaine receives some quality offer each day that is neither static (as in the Jujyfruits example) nor deterministically growing (as in the wine example); rather, the quality is drawn from a defined distribution (the uniform distribution on the unit interval, mainly chosen to allow for straightforward computations). While quality is observable in the current period, the future draws are not observable, meaning that Elaine must compare her current draw with an expectation of future draws. In short, every day Elaine has the choice whether to use the sponge and gain Qt through her utility function, or hold the sponge for a potentially superior partner in the future. In other words, Elaine’s current value function is expressed as a choice between the “flow payoff” Qt and the discounted “continuation value function.” Since she is utility maximizing, she will always choose the higher of these two options. Again, since the continuation value function is uncertain, as future quality draws are from some distribution, we must use the expectation operator in that piece of the maximization problem. Elaine’s value function is thus:

V(Qt) = max{ Qt, δEV(Qt+1) }


This is the Bellman equation of lore! It illustrates a recursive relationship between the value functions for different time periods, and formalizes Elaine’s decision as a simple optimal stopping problem.
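For readers who like to see a recursion run, the following is a rough numerical sketch of value function iteration on this Bellman equation. It assumes uniform quality on [0,1], approximates the expectation with a trapezoid rule on a grid, and uses δ = 0.8 purely as an example; the function name and grid size are my own choices.

```python
def solve_bellman(delta, grid_size=1001, tol=1e-10, max_iter=10_000):
    """Iterate V(q) = max(q, delta * E[V(Q')]) with Q' ~ U[0, 1] on a grid.

    Returns the grid, the value function on that grid, and the constant
    continuation value delta * E[V(Q')].
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    v = list(grid)  # initial guess: V(q) = q (use the sponge right away)
    for _ in range(max_iter):
        # Trapezoid rule for E[V(Q')] under the uniform density on [0, 1].
        expected_v = (sum(v) - 0.5 * (v[0] + v[-1])) / (grid_size - 1)
        continuation = delta * expected_v
        new_v = [max(q, continuation) for q in grid]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return grid, new_v, continuation
        v = new_v
    raise RuntimeError("value function iteration did not converge")


grid, v, continuation = solve_bellman(delta=0.8)
print(round(continuation, 3))
```

The iteration starts from the naive guess “always use the sponge” and repeatedly applies the right-hand side of the Bellman equation until the value function stops changing. The resulting function is flat at the continuation value for low draws and equal to the draw itself for high ones.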

Act Two [Solving for Sponge-worthiness]

To solve for sponge-worthiness, we need to find the value function that solves the Bellman equation, and derive the associated optimal policy rule. Our optimal policy rule is a function that maps each point in the state space (the space of possible quality draws) to the action space such that Elaine achieves payoff V(Qt) for all feasible quality draws in [0,1]. The distribution of Qt+1 is stationary and independent of Qt, as the draws are perpetually from U[0,1]. (Note to the confounded reader: don’t think of the space of quality draws as akin to some jar of marbles in conventional probability puzzles, those in which the draw of a red marble means there are fewer red marbles to draw later, since our distribution does not shift between periods. For more on other possible distributions, see Act Four.) Due to the aforementioned stationarity and independence, the value of holding onto the sponge [δEV(Qt+1)] is constant for all days. By this logic, if a potential partner of quality Q’ is sponge-worthy, then Q’ ≥ δEV(Qt+1)! Note that for all Q”>Q’, Q”>δEV(Qt+1), so any partner of quality Q” must also be considered sponge-worthy. Similarly, if a person of quality Q’ is not sponge-worthy, then δEV(Qt+1) ≥ Q’ and for all Q”<Q’, Q”<δEV(Qt+1), so any partner of quality Q” must also not be sponge-worthy. Thus, the functional form of the value function is:

V(Qt) = Q* if Qt ≤ Q*, and V(Qt) = Qt if Qt > Q*, where Q* denotes the constant δEV(Qt+1)


In other words, our solution will be a threshold rule where the optimal policy is to use the sponge if Q> Q* and hold onto the sponge otherwise. The free parameter we need to solve for is Q*, which we can conceptualize as the all-powerful quality level that separates the sponge-worthy from the not!

Act Three [What is Q*?]

When Q= Q*, Elaine should be indifferent between using the sponge and holding onto it. This means that the two arguments in the maximization should be equal–that is, the flow payoff [Q*] and the discounted continuation value function [δEV(Qt+1)]. We can thus set Q*=δEV(Qt+1) and exploit the fact that we defined Q ~ U[0,1] to make the following calculations:
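Written out step by step, the calculation runs roughly as follows. This is a reconstruction from the threshold form of V and the uniform density on [0,1], with the expectation split at Q* (hold below it, use above it):

```latex
\begin{align*}
Q^* &= \delta\, E\,V(Q_{t+1})
     = \delta \left[ \int_0^{Q^*} Q^* \, dQ + \int_{Q^*}^{1} Q \, dQ \right] \\
    &= \delta \left[ (Q^*)^2 + \frac{1 - (Q^*)^2}{2} \right]
     = \frac{\delta}{2}\left[ 1 + (Q^*)^2 \right] \\
\Rightarrow\quad 0 &= \delta (Q^*)^2 - 2 Q^* + \delta \\
\Rightarrow\quad Q^* &= \frac{2 \pm \sqrt{4 - 4\delta^2}}{2\delta}
     = \frac{1 \pm \sqrt{1 - \delta^2}}{\delta}
\end{align*}
```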


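The calculation leaves a quadratic in Q* with two roots. The positive root (the one with the plus sign in front of the square root) yields a Q* > 1, which would mean that Elaine never uses the sponge. This cannot be the optimal policy, so we eliminate this root. In effect, we end up with the following solution for Q*:

Q* = (1 − √(1 − δ²)) / δ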


Given this Q*, it is optimal to use the sponge if Qt > Q*, and it is optimal to hold the sponge if Q* ≥ Qt. Thus, as is required by the definition of optimal policy, for all values of Qt:

V(Qt) = max{ Qt, δEV(Qt+1) } = max{ Qt, Q* }


We can interpret the way the Q* threshold changes with the discount factor δ using basic economic intuition. As δ approaches 1 (Elaine approaches peak patience), Q* then approaches 1, meaning Elaine will accept no partner but the one of best possible quality. At the other extreme, as δ approaches 0 (Elaine approaches peak impatience), Q* then approaches 0, meaning Elaine will immediately use the sponge with the first potential partner she meets.
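These limits are easy to check numerically. The sketch below assumes the threshold is the smaller root of the quadratic implied by Q* = δEV(Qt+1) under uniform draws, which works out to (1 − √(1 − δ²))/δ; the function name is my own.

```python
import math


def sponge_threshold(delta):
    """Smaller root of delta*q**2 - 2*q + delta = 0, for 0 < delta < 1."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return (1 - math.sqrt(1 - delta ** 2)) / delta


# The threshold should rise from near 0 toward 1 as patience increases.
for delta in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(delta, round(sponge_threshold(delta), 4))
```

The printed thresholds climb with δ, which matches the intuition that a more patient Elaine is a pickier Elaine.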

To make this visually explicit, let’s use a graph to illustrate Elaine’s value function for some set δ. Take δ=0.8, then Q*=0.5, a clear-cut solution for the sponge-worthiness threshold. Given these numbers, the relationship between the value function and quality level can be drawn out as such:


What better application is there for the pgfplots package in LaTeX?!

The first diagram illustrates the two pieces that make up Elaine’s value function, while the second then uses the black line to denote the value function, as the value function takes on the maximum value across the space of quality draws. Whether the value function conforms to the red or green line hinges on whether we are in the sponge-worthy range or not. As explained earlier, before the sponge-worthiness threshold, the option value of holding the sponge is the constant such that Q*=δEV(Qt+1). After hitting the magical point of sponge-worthiness, the value function moves one-for-one with Qt. Note that alternative choices for the discount factor would yield different Q*’s, which would shift the red line up or down depending on the value, which in turn impacts the leftmost piece of the value function in the second graph. These illustrations are very similar to diagrams we drew in Professor Laibson’s module, but with some more advanced technical graph labelings than what we were exposed to in class (ie, “no sponge for you” and “sponge-worthy”).

Act Four [Extensions]

In our set-up, the dependence of the value function is simple since there is one sponge and Elaine is infinitely lived. However, it could be that we solve for a value function with more complex time and resource dependence. This could yield a more realistic solution that takes into account Elaine’s age and mortality and the 60 sponges in the valuable case of contraception. We could even perform the sponge-worthiness calculations for Elaine’s monotonically increasing string of sponge quantity requests: 3, 10, 20, 25, 60! (These numbers based in the Seinfeld canon clearly should have been in the tabular calculations performed by Dixit.)

For computational purposes, we also assumed that quality is drawn independently each period (day) from a uniform distribution on the unit interval. (Recall that a uniform distribution over some interval means that each value in the interval has equal probability.) We could alternatively consider a normal distribution, which would likely do a better job of approximating the population quality in reality. Moreover, the quality of partners could be drawn from a distribution whose bounds deterministically grow over time, as there could be an underlying trend upward in the quality of people Elaine is meeting. Perhaps Coffee Meets Bagel gets better at matching Elaine with bagels, as it learns about her preferences.

Alternatively, we could try and legitimize a more specific choice of a distribution using proper Seinfeld canon. In particular, Season 7 Episode 11 (“The Wink,” which is just 2 episodes after “The Sponge”) makes explicit that Elaine believes about 25% of the population is good looking. If we assume Elaine gains utility only from sleeping with good looking people, we could defend using a distribution such that 75% of quality draws are exactly 0 and the remaining 25% of draws are from a normal distribution truncated to the interval [0,1]. (Note that Jerry, on the other hand, believes 95% of the population is undateable, so quality draws for Jerry would display an even more extreme distribution–95% of draws would be 0 and the remaining 5% could come from a normal distribution truncated to [0,1].)
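As a rough illustration of how the threshold could be found for a distribution like this one, the sketch below solves c = δE[max(Q, c)] by fixed-point iteration on simulated draws. Everything specific here is an assumption of mine: δ = 0.8, a normal with mean 0.5 and standard deviation 0.15 that is redrawn until it lands in [0,1], the sample size, and the seed.

```python
import random


def threshold_from_draws(draws, delta, tol=1e-12, max_iter=10_000):
    """Solve c = delta * mean(max(q, c)) over the draws by fixed-point iteration."""
    c = 0.0
    for _ in range(max_iter):
        new_c = delta * sum(max(q, c) for q in draws) / len(draws)
        if abs(new_c - c) < tol:
            return new_c
        c = new_c
    raise RuntimeError("fixed-point iteration did not converge")


def elaine_draw(rng, share_good=0.25, mean=0.5, sd=0.15):
    """One quality draw: 0 with probability 1 - share_good, otherwise a
    normal draw redrawn until it lands in [0, 1] (mean and sd are assumed)."""
    if rng.random() >= share_good:
        return 0.0
    while True:
        q = rng.gauss(mean, sd)
        if 0.0 <= q <= 1.0:
            return q


rng = random.Random(0)  # fixed seed so the simulation is repeatable
draws = [elaine_draw(rng) for _ in range(20_000)]
print(threshold_from_draws(draws, delta=0.8))
```

Because the same function accepts any list of draws, swapping in Jerry’s 95%/5% split only requires changing share_good. Under these assumptions, a world with more zero-quality draws pushes the threshold down, since holding out is less likely to pay off.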

Regardless of the specific distribution or time/resource constraint choices, the key take-away here is the undeniably natural formulation of this episode’s plot line as an optimal stopping problem. Throughout the course of our six weeks with Professor Laibson, our class used dynamic programming to approach questions of growth, search, consumption, and asset pricing… while these applications are diverse and wide-ranging, don’t methods seem even more powerful when analyzing fictional romantic encounters!?


Speaking of power


As explained earlier, this write-up is primarily focused on the aforementioned Dixit (2011) paper, but also draws on materials from Harvard’s Economics 2010D sequence. In particular, “Economics 2010c: Lecture 1 Introduction to Dynamic Programming” by David Laibson (9/1/2016) & “ECON 2010c Section 1” by Argyris Tsiaras (9/2/2016).

© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

Ultimate Game Theory

An introduction to the melted, gooey mind of a post-finals PhD student

In the days preceding my game theory final, I was quarantined in my Cambridge apartment. The heat was on and pages of yellow legal paper decorated with inky matrices and tree diagrams ruled my kitchen counters. Swaddled in some convex combination of polar fleece and section notes, I would only leave my warm fortress for two activities: (1) to throw $4 at an increasingly hard-to-please chai tea habit; and (2) to play and train for my sport of choice–that is, ultimate frisbee.

When I would return from ultimate, residual thoughts about the game lingered at the edges of my legal pads. The combination of studying for my exam and ultimate exposure in the throes of winter madness led me to the inevitable: reframing game theory concepts as they apply to aspects of ultimate! While I didn’t have the time to parse out examples of “Ultimate” Game Theory back in Cambridge, I’m on winter break in San Francisco now… which means two things: (1) I am still wearing lots of fleece; and (2) I have time to tease out all the kitschy alt-sport applications of game theory that my heart desires.

To discuss game theoretic concepts in this context, I build out two games that are based in the ultimate frisbee universe.[1] First, I use The “call lines” Game to discuss some popular, well-known concepts–namely, the prisoner’s dilemma and pure Nash equilibrium. I also use this framework to talk about repeated games and subgame perfect equilibrium. In adding the concepts of offense and defense, I refine the game so that it is no longer symmetric, and provide an example of how to solve for mixed Nash equilibrium. The second game I created herein is The “throw it to the girl” Game. This game is much more complex and interesting than the former–it is a dynamic signaling game with imperfect information that allows me to illustrate how to solve for perfect Bayesian equilibrium. The “throw it to the girl” Game allows us to model one kind of dynamic that can pop up in the social context of co-ed sports.

Game I: The “call lines” Game

a. The Game Set-up

First things first, I present a simple game based on “calling lines” during an ultimate frisbee game. Ultimate is played with two teams. Each team needs to put 7 people “on the line” to play any given point. However, teams themselves consist of more than 7 people since otherwise those 7 people would probably not be super into playing this sport. (People need some rest!) In my set-up, I assume there are two teams, 1 and 2, that are identical and each always has two lines to choose from: a strong line and a weak line. The payoffs are determined by strategies employed rather than the identity of those teams employing them. In effect, the normal form of this game is a 2×2 symmetric matrix. (This is 2×2 since there are two players–team 1 and team 2–as well as two choices of lines–weak and strong.)

In order to determine the payoffs in this matrix, I need to make assumptions about the team outcomes. In expectation (which is how payoffs in a normal form matrix are presented–as expected Bernoulli utility), weak lines lose to strong lines and the same type of lines win or lose to one another with equal probability. A team gets +3/-3 for winning/losing a point. (If two lines of the same type play, they receive 0 in expectation since the probability of a win is 0.5.) Moreover, I assume that teams do not want to overuse their strong lines. Ie, teams do not want to wear out their best players for fear of fatigue or injury. Therefore, teams also receive payoffs of +1/-1 for playing a weak/strong line. Given these simple and linear assumptions,[2] the following represents the normal form game for “call lines”:

Table 1: The “call lines” game. Each cell lists (team 1 payoff, team 2 payoff).

                  Team 2 weak    Team 2 strong
Team 1 weak       (1, 1)         (-2, 2)
Team 1 strong     (2, -2)        (-1, -1)
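The payoff arithmetic can be spelled out mechanically. The sketch below simply encodes the assumptions just stated (+3/-3 for the point, a coin flip between equal lines, +1/-1 for fielding a weak/strong line); the names are mine.

```python
WIN, LOSS = 3, -3                        # payoff for winning / losing the point
LINE_BONUS = {"weak": 1, "strong": -1}   # resting vs. wearing out the best players


def expected_point_payoff(own, other):
    """Expected payoff from the point itself, given both teams' lines."""
    if own == other:
        return 0.5 * WIN + 0.5 * LOSS    # equal lines: a coin flip
    return WIN if own == "strong" else LOSS


def payoff(own, other):
    """Total expected payoff: the point outcome plus the line bonus."""
    return expected_point_payoff(own, other) + LINE_BONUS[own]


lines = ("weak", "strong")
# Keys are (team 1 line, team 2 line); values are (team 1 payoff, team 2 payoff).
table = {(a, b): (payoff(a, b), payoff(b, a)) for a in lines for b in lines}
for cell, payoffs in table.items():
    print(cell, payoffs)
```

Since payoffs depend only on the strategies and not on which team plays them, the same payoff function serves both teams with the arguments swapped.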


b. Prisoner’s Dilemma Form & Solving for Pure Nash Equilibrium

The normal form of the “call lines” game might look very familiar. While conceptually different, it is mathematically identical to everyone’s favorite simple non-cooperative game: the prisoner’s dilemma! Note that the prisoner’s dilemma has infinite representations with respect to the specific payoffs. The overarching requirement is that the game is symmetric across the two players and that the following strict ranking of payoffs holds: [the payoff to a player who “defects” (plays a strong line in this case) while the other “cooperates” (plays a weak line)]  > [the payoff to a player who “cooperates” (plays a weak line) while the other “cooperates” (plays a weak line)] > [the payoff to a player who “defects” (plays a strong line) while the other “defects” (plays a strong line)] > [the payoff to a player who “cooperates” (plays a weak line) while the other “defects” (plays a strong line)].[3] In Table 1 we can see this holds since 2>1>-1>-2. I could replace these payoffs in the normal form matrix with any set that maintains the same strict inequality and the game would remain a prisoner’s dilemma.

In the prisoner’s dilemma context, the relevant solution concept is the well-known concept of Nash equilibrium. In Nash equilibrium, no agent (team in this case) has an incentive to deviate if the agent knows the other’s strategy. In order to solve for Nash equilibrium, I underline the best responses of both teams to each other’s strategies:


(Quick refresher as to how to find these marked best responses: Imagine team 1 plays a weak line, then the payoffs to team 2 are either 1 (if it plays weak) or 2 (if it plays strong). Since 2>1, team 2 will play strong. Imagine team 1 plays a strong line, then the payoffs to team 2 are either -2 (if it plays weak) or -1 (if it plays strong). Since -1>-2, team 2 will play strong. The same logic then applies to team 1 since the game is symmetric.)

Since both payoffs in the (-1,-1) box of the matrix are underlined, it is evident that neither team has an incentive to deviate from the strong strategy given that the other team is playing strong. Thus, strong-strong is the sole pure Nash equilibrium in the “call lines” game. However, note that the weak-weak strategy, which yields payoffs (1,1), while not Nash, is Pareto optimal (no payoff duo gives both players a higher payoff) and, accordingly, Pareto dominates (-1,-1). As Prof. Maskin’s lecture slides wisely say, this “illustrates the tension between efficiency and individual maximization.”
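The underlining exercise can also be done by brute force. Here is a small sketch that checks every cell for mutual best responses, using the payoffs quoted in the refresher above; the helper names are my own.

```python
# Payoffs as (team 1, team 2), keyed by (team 1 line, team 2 line).
PAYOFFS = {
    ("weak", "weak"): (1, 1),
    ("weak", "strong"): (-2, 2),
    ("strong", "weak"): (2, -2),
    ("strong", "strong"): (-1, -1),
}
LINES = ("weak", "strong")


def pure_nash(payoffs, actions):
    """Return every cell in which each team's line is a best response."""
    equilibria = []
    for a1 in actions:
        for a2 in actions:
            best_1 = all(payoffs[(a1, a2)][0] >= payoffs[(d, a2)][0] for d in actions)
            best_2 = all(payoffs[(a1, a2)][1] >= payoffs[(a1, d)][1] for d in actions)
            if best_1 and best_2:
                equilibria.append((a1, a2))
    return equilibria


def pareto_dominates(u, v):
    """True if u is at least as good for both teams and strictly better for one."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


print(pure_nash(PAYOFFS, LINES))
print(pareto_dominates(PAYOFFS[("weak", "weak")], PAYOFFS[("strong", "strong")]))
```

The first line printed lists the cells that survive the best-response check, and the second confirms whether weak-weak Pareto dominates strong-strong.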

c. Repeated Game Prisoner’s Dilemma & Solving for Subgame Perfect Equilibrium

While the original set-up of this game was in a static context, I can also render “call lines” a repeated game and end up with a different solution concept than the traditional Nash equilibrium previously described. Let’s assume that the same normal form game shown in Table 1 will be played infinitely–this generates an “iterated prisoner’s dilemma.” In this context, I use a solution concept known as subgame perfect equilibrium. Given repetition and recall of previous outcomes/actions, teams now have the opportunity to penalize each other for previous decisions. In the “call lines” context, I investigate the following strategy: play a weak line until someone plays a strong line (play strong from then on). This is also called a “grim trigger strategy,” which alters the choice of lines if someone chooses to deviate from cooperation (playing weak lines). This strategy, therefore, incentivizes cooperation since otherwise the players punish one another by forcing reduced payoffs for the rest of the infinitely repeated game.

This strategy yields efficiency in subgame perfect equilibrium–a point I show below. Imagine teams have discount factors, meaning they discount future utility flows from points played. The following break-down illustrates how the “grim trigger strategy” is a subgame perfect equilibrium (given some condition on the discount factor):

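[Image: derivation of the condition on the discount factor under which the grim trigger strategy sustains cooperation]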

Thus, if the discount factor is greater than one-third, the grim trigger strategy is a subgame perfect equilibrium for the “call lines” game. However, note that if the number of repetitions of the game is finite and known to both teams, then (by backwards induction) the two players will play strong lines in every period. Therefore, the solution concept is the same as in the static context if the repetition is finite and known, but can diverge if the repetition is infinite and the discount factor meets some requirement. (For a more complete discussion of repeated games and cooperation, check out these slides.)
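The one-third cutoff comes from comparing cooperation forever against a one-shot deviation followed by permanent punishment. A sketch of that comparison, using the stage payoffs implied by the assumptions (1 for weak-weak, 2 for playing strong against weak, -1 for strong-strong) and writing δ for the teams’ common discount factor:

```latex
\begin{align*}
\underbrace{1 + \delta + \delta^2 + \cdots}_{\text{weak forever}}
  &\;\ge\; \underbrace{2 + \delta(-1) + \delta^2(-1) + \cdots}_{\text{deviate once, then strong forever}} \\
\frac{1}{1-\delta} &\;\ge\; 2 - \frac{\delta}{1-\delta} \\
1 &\;\ge\; 2(1-\delta) - \delta \\
\delta &\;\ge\; \tfrac{1}{3}
\end{align*}
```

At exactly δ = 1/3 a team is indifferent between cooperating and deviating. Once punishment has started, strong-strong is the static Nash equilibrium in every period, so neither team gains by deviating from the punishment either.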

d. Adding Offense and Defense & Solving for Mixed Nash Equilibrium

I now refine the “call lines” game by adding the concepts of offense and defense. This addition will change the payoffs in the normal form matrix. Assume that team 1 is on offense and team 2 is on defense. When a team starts a point on offense (meaning the other team pulls the disc down field to them, like a kick-off in football), they have an advantage for scoring. Assume accordingly that a weak offense will beat a weak defense and a strong offense will beat a strong defense. Therefore, the only offense that loses in a match-up is a weak offense against a strong defense. Maintaining the same +3/-3 for winning/losing a point and the same +1/-1 for weak/strong lines, the normal form game with team 1 on offense is as follows:

Each cell lists (team 1 payoff, team 2 payoff), with team 1 on offense.

                  Team 2 weak    Team 2 strong
Team 1 weak       (4, -2)        (-2, 2)
Team 1 strong     (2, -2)        (2, -4)


Given this change, the game is no longer symmetric. It is no longer a prisoner’s dilemma, and moreover, there is no longer a pure Nash equilibrium. This can be illustrated with the best responses marked below (ie, there is no box with both payoffs underlined):


While there is no pure Nash equilibrium, we know that all finite games have at least one Nash equilibrium (theorem of existence of Nash equilibrium). Therefore, there must be some mixed Nash equilibrium. Mixed Nash equilibrium is made up of mixed strategies, which are those by which a team plays its available pure strategies (play a weak line, play a strong line) with certain probabilities. In solving for mixed Nash, we consider three possibilities (only team 1 uses a mixed strategy, only team 2 uses a mixed strategy, both use mixed strategies) and make use of the indifference condition as follows:


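Write p for the probability that team 1 plays a weak line and q for the probability that team 2 plays a weak line. Team 2’s weak line pays -2 whatever team 1 does, while its strong line pays 2p - 4(1 - p), so team 2 is indifferent only when p = 1/3. Team 1’s weak line pays 4q - 2(1 - q), while its strong line pays 2 whatever team 2 does, so team 1 is indifferent only when q = 2/3. There is therefore one single mixed Nash equilibrium in which team 1 plays a weak line with probability 1/3 (and so a strong line with probability 2/3) and team 2 plays a weak line with probability 2/3 (and so a strong line with probability 1/3).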
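The indifference conditions can be cross-checked with exact fractions. The payoff matrix in this sketch is my reconstruction from the stated rules (+3/-3 for the point, +1/-1 for weak/strong lines, and the offense winning every match-up except weak offense against strong defense), so treat it as an assumption rather than the original table.

```python
from fractions import Fraction

# (team 1 payoff, team 2 payoff), keyed by (team 1 line, team 2 line).
# Team 1 is on offense. Reconstructed from the stated rules (an assumption).
PAYOFFS = {
    ("weak", "weak"): (4, -2),
    ("weak", "strong"): (-2, 2),
    ("strong", "weak"): (2, -2),
    ("strong", "strong"): (2, -4),
}


def mixing_probabilities(payoffs):
    """Return (p, q) = (P(team 1 plays weak), P(team 2 plays weak))."""
    u1 = {cell: Fraction(v[0]) for cell, v in payoffs.items()}
    u2 = {cell: Fraction(v[1]) for cell, v in payoffs.items()}
    # Team 1's mix p must leave team 2 indifferent between weak and strong:
    # p*u2[W,W] + (1-p)*u2[S,W] = p*u2[W,S] + (1-p)*u2[S,S]
    p = (u2["strong", "strong"] - u2["strong", "weak"]) / (
        u2["weak", "weak"] - u2["strong", "weak"]
        - u2["weak", "strong"] + u2["strong", "strong"]
    )
    # Team 2's mix q must leave team 1 indifferent between weak and strong:
    # q*u1[W,W] + (1-q)*u1[W,S] = q*u1[S,W] + (1-q)*u1[S,S]
    q = (u1["strong", "strong"] - u1["weak", "strong"]) / (
        u1["weak", "weak"] - u1["weak", "strong"]
        - u1["strong", "weak"] + u1["strong", "strong"]
    )
    return p, q


p, q = mixing_probabilities(PAYOFFS)
print("team 1 plays weak with probability", p)
print("team 2 plays weak with probability", q)
```

The point to keep straight is that each team’s mixing probability is pinned down by the other team’s indifference, not its own.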

e. Recap of “calling lines”

In sum, we have used the original and refined “call lines” set-ups and their corresponding normal forms in order to discuss the prisoner’s dilemma, pure Nash equilibrium, repeated games, subgame perfect equilibrium, and mixed Nash equilibrium. In moving to a more complex and interesting set-up, I now transition to the “throw it to the girl” game.

Game II: The “throw it to the girl” Game

a. The Game Set-up

Ultimate is played in a myriad of circumstances. The most casual form of ultimate frisbee is pick-up–that is, a group of people who get together to play who often don’t know each other. Pick-up is often mixed gender, meaning men and women are playing together, which while empowering and fun can often lead to some noticeable gender dynamics. For instance, playing pick-up in a mixed gender setting can lead to women being “looked off” by male players. [See here for an article on this exact subject that a fellow female frisbee friend recently shared!] In other words, men sometimes do not throw to open women…which can lead to the classic “throw it to the girl!” remark from the sideline as a woman appears open upfield but the dude with the disc chooses to holster the throw instead. The reasons for this trend (preference for bigger, more dramatic plays in the form of hucks to big dudes, implicit bias, etc.) are not the focus of this discussion…rather, it suffices to note that, yeah, this is a dynamic.

In my own personal experience as a female pickup player, I’ve found that calling for the disc when open is a solid way to signal that I am more experienced or confident and that men shouldn’t hesitate to throw to me. In learning about dynamic signaling games in game theory, I quickly realized that this calling/throwing situation could easily be melded into game theoretic form. Consider the moment when a male player with a disc is looking upfield for a throw. Assume there is an open female cutter upfield. In this moment, the female cutter (player 1 to us) has a choice: she can (1) call for the disc, signaling that she wants to be thrown to, or (2) remain silent and not be thrown to.

This set-up is a two-player dynamic signaling game. While conceptually distinct, note that this game is identical to the well-known “gift game”! Player 1 has two types: she is either (1) dirty, or (2) a scrub. (Yeah, frisbee vernacular. Let’s go.) In this world, we are assuming that a dirty woman is better than the average male cutter on the pick-up team, while a scrub woman is worse than the average male cutter on the team. We assume that with probability 0.7 nature makes the woman dirty and with probability 0.3 nature makes her a scrub. [This was an arbitrary choice–open to edits on this.] Once the cutter has chosen to yell out or not, the dude with the disc (player 2) has a choice. Player 2 only has one type. He has no choice if the woman is silent since he will unambiguously not throw to her, but if she calls out, he can choose to throw to her or holster (not throw to her).

  • If the woman is silent, the payoffs to both players are 0 regardless of player 1 type since no one gains from this and both players continue functioning at the status quo.
  • If the woman calls out, the payoffs are different depending on her type:
    • Let’s say she is dirty:
      • If the dude throws to her, she gains 2 since she is happy she was thrown to and she played the disc well; the dude in this case is happy since she played the disc better than the average male cutter would have and gets a payoff of 1.
      • If the dude does not throw to her, then she gets a payoff of -1. (This assumes, based on personal and shared experience, that women feel more ignored or disrespected when looked off after being openly vocal than after being silent.) Meanwhile, the dude in this case goes on with the status quo and gets a payoff of 0.
    • Let’s say she is a scrub:
      • If the dude throws to her, she gains 1 since she is happy she was thrown to. (But she doesn’t gain as much as the dirty woman since she’s not as dope at frisbee. I am assuming that people gain more utility from playing when they are dirty.) The dude, in this case, is unhappy since she doesn’t play the disc as well as the average male cutter so he gets payoff of -1.
      • If the dude does not throw to her, she again gets a payoff of -1 and he again gets a payoff of 0. (We are assuming that dirty women and scrubs receive the same payoffs when ignored, but differ in payoffs when they get to play the disc.)

Given these above assumptions for payoffs and dynamics, I used the TikZ package in LaTeX to build out an extensive form of this game. [Thank you to Dr. Chiu Yu Ko who has an incredible set of TikZ Templates openly available–Here is the signaling game one that I built off of.] See figure 1 for the extensive form of this game:

[Figure 1: extensive form of the “throw it to the girl” game]

b. Solving for Perfect Bayesian Equilibrium

In the context of such dynamic games with incomplete information, the equilibrium concept of interest is perfect Bayesian equilibrium (a refinement of Bayesian Nash equilibrium and subgame perfect equilibrium).

In order to solve for perfect bayesian equilibrium (PBE from here on), I must investigate all possible strategies for our women in the pick-up game. Since we have two types of women (dirty players/scrubs) as well as two possible actions (call out/be silent), there are four possible strategies. Two of these are what we call “separating strategies” in which the two types choose different actions:

  • dirty player is silent/scrub calls (Figure 2)
  • dirty player calls/scrub is silent (Figure 3).

The other two are called “pooling strategies” in which both types choose the same action:

  • dirty player is silent/scrub is silent (Figure 4)
  • dirty player calls/scrub calls (Figure 5)

For each of the woman’s four possible strategies, I then determine the beliefs and, accordingly, the optimal response of the dude with the disc. Given that optimal response, I check to see if either of our types of women would like to deviate. If not, then we have a perfect Bayesian equilibrium. I will now go through this systematically for the four strategies.
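For readers who like to see the procedure mechanically, here is a minimal Python sketch of the check just described. It uses the payoffs and nature's probabilities from the setup; the function and variable names are my own illustrative choices, and I am assuming that silence yields (0, 0) for both players, as the payoff comparisons in this post imply. When a call is off the equilibrium path, the sketch simply scans a coarse grid of beliefs, which is a simplification rather than a proof.

```python
from itertools import product

# Nature's a priori probabilities over the woman's type (from the setup).
PRIOR = {"dirty": 0.7, "scrub": 0.3}

# (woman, dude) payoffs once she has called for the disc.
CALL_PAYOFFS = {
    ("dirty", "throw"): (2, 1),
    ("dirty", "holster"): (-1, 0),
    ("scrub", "throw"): (1, -1),
    ("scrub", "holster"): (-1, 0),
}
SILENT_PAYOFFS = (0, 0)  # assumption: silence is the status quo for both players


def dude_best_responses(belief_dirty):
    """Dude's best response(s) to a call, given his belief that she is dirty."""
    value = {
        a: belief_dirty * CALL_PAYOFFS[("dirty", a)][1]
        + (1 - belief_dirty) * CALL_PAYOFFS[("scrub", a)][1]
        for a in ("throw", "holster")
    }
    best = max(value.values())
    return [a for a in value if value[a] == best]


def woman_payoff(woman_type, action, response):
    """Woman's payoff from `action` when the dude answers a call with `response`."""
    if action == "call":
        return CALL_PAYOFFS[(woman_type, response)][0]
    return SILENT_PAYOFFS[0]


def check_pbe(strategy):
    """Return a supporting response of the dude if `strategy` is part of a PBE, else None."""
    callers = [t for t in PRIOR if strategy[t] == "call"]
    if callers:
        # A call is on the equilibrium path: beliefs follow from Bayes' rule.
        dirty_mass = PRIOR["dirty"] if "dirty" in callers else 0.0
        beliefs = [dirty_mass / sum(PRIOR[t] for t in callers)]
    else:
        # A call has probability 0, so any belief is allowed; scan a grid.
        beliefs = [i / 100 for i in range(101)]
    for belief in beliefs:
        for response in dude_best_responses(belief):
            no_deviation = all(
                woman_payoff(t, strategy[t], response)
                >= woman_payoff(t, "silent" if strategy[t] == "call" else "call", response)
                for t in PRIOR
            )
            if no_deviation:
                return response
    return None


results = {}
for dirty_action, scrub_action in product(("silent", "call"), repeat=2):
    strategy = {"dirty": dirty_action, "scrub": scrub_action}
    results[(dirty_action, scrub_action)] = check_pbe(strategy)
    print(strategy, "->", results[(dirty_action, scrub_action)])
```

A result of `None` means some type of woman wants to deviate; otherwise the value is a response by the dude that supports the strategy as a PBE.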


Figure 2 illustrates the separating strategy in which the dirty player is silent and the scrub calls for the disc. (These actions for the two types of women are illustrated in red.) With a separating strategy, the action of player 1 signals her type, meaning that if the dude hears a “hey,” he knows she is a scrub. The dude’s best response (recall he only gets to make a choice when there has been a call for the disc) is then to holster the throw since 0>-1. (Hence holster is highlighted in red in the left information set.) Note that given that optimal response from the dude, the scrub could improve her payoff by remaining silent instead since 0>-1. Therefore, this is not a PBE.


The next strategy we consider is that in which a dirty player calls for the disc and a scrub remains silent (Figure 3). In this separating case, the dude knows that if he hears a “hey,” the woman is dirty. So, the dude’s best response is to throw since 1>0. (Throw is highlighted in red in the left information set.) Given this optimal response from the dude, the scrub could improve her payoff by deviating from silence to calling since 1>0. Therefore, this is not a PBE.


Figure 4 illustrates the total silence strategy. With such a pooling strategy, the dude’s beliefs upon hearing a call for the disc can be arbitrary, since a “hey!” occurs with probability 0 and Bayes’ rule therefore doesn’t apply. So, if the dude’s beliefs as to the woman’s type are adequately pessimistic (he believes with more than 50% certainty that she’s a scrub), then his best response is to holster the throw (holster is highlighted in the left information set, since the diagram is drawn for adequately pessimistic beliefs on the part of the dude). Regardless of the probabilities determined by nature (0.7 and 0.3), neither type of woman can improve by deviating to a call, since the resulting (-1,0) is inferior to (0,0). Therefore, this is a PBE.


The last strategy to look into is the all call strategy (Figure 5). With this pooling strategy, the dude’s beliefs as to the woman’s type are given by nature’s a priori probabilities. His expected payoff from throwing is thus (1)(0.7)+(-1)(0.3)=0.4 and his expected payoff from holstering is (0)(0.7)+(0)(0.3)=0. Since 0.4>0, the optimal response for the dude is to throw (as marked in red). Since 2>0 and 1>0, neither type of woman wants to deviate from the prescribed strategy. Therefore, this is a PBE.
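The dude’s comparison in the two pooling cases comes down to one expected-value calculation, so here is a short sketch of it in Python. The names (`throw_value`, `threshold`) are mine, and I use exact fractions purely to avoid floating-point noise; the payoffs themselves are the ones from the setup.

```python
from fractions import Fraction

# Dude's payoff from throwing after a call, by the woman's type.
DUDE_THROW = {"dirty": 1, "scrub": -1}
DUDE_HOLSTER = 0  # holstering always leaves the dude at the status quo


def throw_value(p_dirty):
    """Dude's expected payoff from throwing when he believes P(dirty) = p_dirty."""
    p = Fraction(p_dirty)
    return p * DUDE_THROW["dirty"] + (1 - p) * DUDE_THROW["scrub"]


# All call strategy: beliefs equal nature's a priori probabilities.
prior_dirty = Fraction(7, 10)
print("throw:", throw_value(prior_dirty), "holster:", DUDE_HOLSTER)

# Belief at which the dude is indifferent: p*1 + (1-p)*(-1) = 0.
threshold = Fraction(-DUDE_THROW["scrub"], DUDE_THROW["dirty"] - DUDE_THROW["scrub"])
print("indifference belief:", threshold)
```

Any belief above the indifference point makes throwing the better response, and any belief below it makes holstering better, which is the sense in which the total silence case needs "adequately pessimistic" beliefs.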

c. Refining the Set of Perfect Bayesian Equilibria

In summary, there are two PBEs for this “throw it to the girl” game: the total silence and all call strategies. However, note that the total silence strategy is not Pareto efficient while the all call strategy is. Ie, the expected payoffs of 1.7 for the woman and 0.4 for the dude (all call strategy) are larger than the payoffs of 0 for both (total silence strategy). Moreover, the total silence strategy fails “the intuitive criterion,” a refinement of the set of equilibria proposed by Cho and Kreps (1987). The idea behind this requirement is to restrict the set of equilibria to those with “reasonable” off-equilibrium beliefs. This allows me (as the creator of the model) to choose between the multiple PBEs previously outlined. Loosely put, for a PBE to satisfy the intuitive criterion there must exist no deviation for any type of woman such that the best response of the dude leads to the woman strictly preferring a deviation from the originally chosen strategy.

Let’s explain why the total silence strategy does not satisfy this requirement. Imagine a deviation for the dirty player to calling. If the woman now calls, the best response for the dude is to throw to her, which yields a payoff of 2 for the woman, which is strictly greater than 0. So, the woman prefers this deviation and the intuitive criterion is not satisfied. However, the all call strategy passes this criterion. Imagine a deviation to silence for the dirty player. Then the dude has no choice to make since the payoffs are automatically 0 and 0. Since 2>0, the woman doesn’t prefer the deviation. Similarly, a deviation to silence for the scrub yields 0 instead of 1, which is not preferred either. Thus, the all call strategy satisfies the intuitive criterion. So, when we refine the set of equilibria in this way, we have both types of women calling for the disc and the dude making the throw… Sounds like a pretty good equilibrium to me![4]
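As a quick arithmetic check on the expected payoffs used in the Pareto comparison of the two PBEs, here is a small Python sketch. It only covers that comparison (not the intuitive criterion itself), the names are my own, and the (0, 0) payoffs under silence are the status quo assumed throughout.

```python
from fractions import Fraction

# Nature's a priori probabilities over the woman's type.
PRIOR = {"dirty": Fraction(7, 10), "scrub": Fraction(3, 10)}

# (woman, dude) payoffs realized by each type in the two PBEs.
ALL_CALL = {"dirty": (2, 1), "scrub": (1, -1)}       # she calls, he throws
TOTAL_SILENCE = {"dirty": (0, 0), "scrub": (0, 0)}   # nobody calls


def expected_payoffs(outcomes):
    """Expected (woman, dude) payoffs before nature reveals the woman's type."""
    woman = sum(PRIOR[t] * outcomes[t][0] for t in PRIOR)
    dude = sum(PRIOR[t] * outcomes[t][1] for t in PRIOR)
    return woman, dude


all_call = expected_payoffs(ALL_CALL)
silence = expected_payoffs(TOTAL_SILENCE)

# All call Pareto dominates total silence if nobody is worse off and someone is better off.
dominates = all(a >= s for a, s in zip(all_call, silence)) and any(
    a > s for a, s in zip(all_call, silence)
)
print("all call:", all_call, "total silence:", silence, "dominates:", dominates)
```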

d. Recap of “throw it to the girl”

We have used this “throw it to the girl” set-up and its corresponding extensive form in order to discuss dynamic signaling games, solving for perfect Bayesian equilibrium, and refining the set of equilibria using the intuitive criterion.

Hard cap is on! [In frisbee parlance, it’s time to wrap this all up]

There are endless ways to extend or reform these games in the world of game-theoretic concepts. My formulations for “calling lines” and “throw it to the girl” are simple by design so that they lend themselves to discussing some subset of useful concepts. However, despite the simplicity of the model builds, I’m happy to be able to arrive at conclusions that involve social behaviors as complex as gender dynamics… For example, next time, instead of yelling “throw it to the girl!” from the sideline, you can always shout: “assuming a gift-giving game payoff structure, it is a perfect Bayesian equilibrium satisfying the intuitive criterion for you to throw to open women when they call for it!” No worries–if they don’t understand, you can always womansplain the concept during the next time-out.


Check out the relevant GitHub repository for all tex files necessary for reproducing the tables, tree diagrams, and solution write-ups!


[1] The good news is that since I’m pretty sure some nontrivial percentage of ultimate players have studied math, I don’t have to worry too much about this discussion being for some empty intersection of individuals.

[2] Comments on how to improve this are very welcome. For this introductory context, I feel these payoffs suffice since they allow me to get into the prisoner’s dilemma and some useful simple equilibrium concepts.

[3] These requirements render the game a non-cooperative one. Prisoner’s dilemma terminology is often used for contexts that in fact would be better categorized as cooperative games such as Stag hunt. In the Stag hunt (or cooperative game) payoff matrix, the inequality relationship would instead be: [the payoff to a player who “cooperates” while the other “cooperates”] >[the payoff to a player who “defects” while the other “cooperates”]  >= [the payoff to a player who “defects” while the other “defects”] > [the payoff to a player who “cooperates” while the other “defects”]

[4] More generally, this will be the case as long as nature’s a priori probability of the woman being dirty is 0.5 or greater.

© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

Anxious Optimization

On the morning of my dynamic programming midterm, I tried to calm myself by recalling the golden rules of grad school that have been passed down through the generations[1]:

“grades don’t matter… nothing matters.” 

However, I quickly realized that by adopting this perspective I was simply subbing out test anxiety for existential anxiety. And, come to think of it, the latter didn’t seem to be helpful, while the former could actually aid in my short-term goals–namely, jotting down lots of equations into a collection of small blue booklets.
In considering the roles that angst can play in day-to-day life, I started to become curious about whether I could model optimal choices of anxiety using dynamic programming techniques. Allow me to channel my inner Ed Glaeser[2] using a David Laibson-esque framework[3] and wax poetic on the specifics of something you could call (using very flexible terminology) an “economic model”…
Let’s assume that anxiety has some negative cost (say to my overarching mental health). However, its presence in my life also often gets me to achieve goals that I do genuinely want to achieve. Therefore, anxiety factors into my personal utility function in both positive and negative ways. In other words, it is not some force in my life that I want to erase entirely since it can lead to incredibly positive outcomes.
Let’s say, for the sake of model simplicity and for the sake of accuracy since I’m in academia now[4], that my utility function is simply equated to my academic success. Imagine that academic success is some definable and quantifiable concept–we could define this as some weighted average of number of papers published in quality journals, number of good relationships with faculty members, etc. Let’s also assume that this type of success is a function of (and increasing in) two items: idea creation and execution of ideas. This seems reasonable to me. The next piece is where the really controversial and strict assumptions come in with respect to anxiety: I assume that idea creation is a function of (and increasing in) existential anxiety, while execution is a function of (and increasing in) time/test anxiety. Assume that the functions with respect to the anxiety types have positive first derivatives and negative second derivatives–that is, they are increasing and concave. [Note: In reality, there is most definitely some level of both angsts that stops being productive… noting that this is the case calls for more complex assumptions about the functional forms beyond assuming simple concavity… suggestions are welcome!]
Then, given these assumptions and the framework of dynamic programming, the optimization problem of interest is equivalent to solving a maximization problem over the lifecycle.
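To make the verbal set-up a bit more concrete, here is one way the lifecycle problem could be written down. Treat it as a rough sketch under my own notation rather than a worked-out model: every symbol below is a hypothetical label for something described in words above, and the additive cost term and the discount factor are assumptions on my part.

```latex
% Hypothetical notation (all symbols are illustrative labels):
%   e_t       existential anxiety in period t
%   \tau_t    time/test anxiety in period t
%   I(\cdot)  idea creation,  X(\cdot)  execution
%   S(\cdot,\cdot)  academic success
%   c(\cdot,\cdot)  mental-health cost of anxiety (assumed additive)
%   \beta     discount factor (assumed)
\[
  \max_{\{e_t,\,\tau_t\}_{t=0}^{T}} \;
  \sum_{t=0}^{T} \beta^{t}
  \Bigl[ S\bigl(I(e_t),\, X(\tau_t)\bigr) - c(e_t, \tau_t) \Bigr],
  \qquad 0 < \beta < 1,
\]
\[
  I'(e) > 0,\; I''(e) < 0, \qquad
  X'(\tau) > 0,\; X''(\tau) < 0, \qquad
  \frac{\partial S}{\partial I} > 0,\; \frac{\partial S}{\partial X} > 0 .
\]
```

As written, nothing links one period to the next, so a genuinely dynamic version would also need state variables and a law of motion for them, which this sketch leaves open.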
Explicitly solving this optimization problem requires more assumptions about functional forms and the like. Ed, I’m open to your ideas! Sure, it’d be much simpler to somehow make this a one variable maximization problem–a transformation we are often able to achieve by exploiting some budget constraint and Walras’ law–however, I do not believe that anxiety measures should add up to some fixed value beyond human choice. Other potential questions: Do we think our state variables follow an Ito process? Ie, I could see the existential anxiety variable following geometric Brownian motion since maybe its expected level should rise exponentially with time?
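For reference, here is what geometric Brownian motion would mean for an anxiety variable. The symbol A_t for the level of existential anxiety, along with the drift and volatility parameters, is my own hypothetical notation; the formulas themselves are the standard ones for this process.

```latex
% A_t: level of existential anxiety (hypothetical label)
% \mu: drift parameter, \sigma: volatility parameter, W_t: standard Brownian motion
\[
  dA_t = \mu A_t\,dt + \sigma A_t\,dW_t
  \quad\Longrightarrow\quad
  A_t = A_0 \exp\!\Bigl( \bigl(\mu - \tfrac{1}{2}\sigma^{2}\bigr)t + \sigma W_t \Bigr),
  \qquad
  \mathbb{E}[A_t] = A_0\, e^{\mu t}.
\]
```

So under this assumption the variable stays positive and its expected level grows exponentially at rate \mu.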
Back to reality, an implication of my model build that comes straight out of my assumptions (don’t even need first order conditions for this) is that I should not be thinking about how “nothing matters” when there’s an upcoming test. A test falls into the category of execution tasks, rather than the realm of idea creation. The existential anxiety that grows out of repeating the mantra “nothing matters” to myself over and over would only help me come up with ideas… In fact, this whole model idea and post did come from continuing down the path of some existential thought process! So, perhaps the real question should be: is my blogging ingrained in the weighted average measure for “academic success”? If so, I’m feeling pretty optimized.

[1] Thank you to the G2s (second-years) for the soothing (yet still terrifying) words in your recent emails.

[2] Microeconomics professor of “verbal problem” fame

[3] Macroeconomics professor for the “dynamic programming” quarter of our sequence

[4] I kid, I kid. I’m off to a frisbee tournament for this entire weekend, so clearly my utility function must be more complex.

© Alexandra Albright and The Little Dataset That Could, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.